Production-Grade Container Scheduling and Management
Kubernetes Submit Queue 6de28fab7d Merge pull request #42942 from vishh/gpu-cont-fix
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs

Fixes #42412

**Background**
Support for multiple GPUs is an experimental feature in v1.6.
Container restarts were handled incorrectly, which resulted in stranded GPUs.
Kubelet is incorrectly using the runtime cache to track running pods, which can cause race conditions (as it did in other parts of kubelet). This can result in the same GPU being assigned to multiple pods.

**What does this PR do**
This PR tracks the assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
The GPU manager is updated to consume a list of active pods derived from the apiserver cache instead of the runtime cache.
The node e2e test has been extended to validate this failure scenario.
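
As a rough illustration of the idea described above, the following is a minimal sketch and not the kubelet's actual GPU manager code. The names `gpuTracker`, `containerKey`, `Allocate`, and `Reclaim` are hypothetical, and the set of active pods is simply passed in as a map, standing in for the list derived from the apiserver cache.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// containerKey identifies a container within a pod (hypothetical type for this sketch).
type containerKey struct {
	PodUID    string
	Container string
}

// gpuTracker is a hypothetical, simplified stand-in for the bookkeeping
// described above: it remembers which GPUs each container already holds.
type gpuTracker struct {
	mu       sync.Mutex
	free     []string
	assigned map[containerKey][]string
}

func newGPUTracker(devices []string) *gpuTracker {
	return &gpuTracker{
		free:     append([]string(nil), devices...),
		assigned: make(map[containerKey][]string),
	}
}

// Allocate returns n GPUs for the container. If the container already holds
// GPUs (for example after a restart), the previous assignment is returned
// instead of taking new devices from the free pool.
func (t *gpuTracker) Allocate(key containerKey, n int) ([]string, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if got, ok := t.assigned[key]; ok {
		return append([]string(nil), got...), nil
	}
	if n > len(t.free) {
		return nil, errors.New("not enough free GPUs")
	}
	got := append([]string(nil), t.free[:n]...)
	t.free = t.free[n:]
	t.assigned[key] = got
	return append([]string(nil), got...), nil
}

// Reclaim returns to the free pool every GPU held by a pod that is not in
// the caller-supplied set of active pod UIDs.
func (t *gpuTracker) Reclaim(activePodUIDs map[string]bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for key, devs := range t.assigned {
		if !activePodUIDs[key.PodUID] {
			t.free = append(t.free, devs...)
			delete(t.assigned, key)
		}
	}
}

// FreeCount reports how many GPUs are currently unassigned.
func (t *gpuTracker) FreeCount() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.free)
}

func main() {
	tr := newGPUTracker([]string{"gpu0", "gpu1"})
	key := containerKey{PodUID: "pod-a", Container: "main"}
	first, _ := tr.Allocate(key, 1)
	again, _ := tr.Allocate(key, 1) // simulated container restart
	fmt.Println(first, again, tr.FreeCount())
}
```

In this sketch a restarted container receives the GPU it already held, so the second device stays in the free pool rather than being stranded.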

**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.

**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.

cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews 

Replaces #42560
2017-03-14 10:19:17 -07:00
.github
api Update docs and client 2017-03-09 07:34:56 +00:00
build Merge pull request #42070 from luxas/remove_kube_discovery 2017-03-04 12:58:23 -08:00
cluster Merge pull request #42988 from MaciekPytel/update_ca_manifest 2017-03-14 07:31:35 -07:00
cmd Merge pull request #41429 from mikedanese/kubeadm-owners 2017-03-14 08:49:37 -07:00
docs Update docs and client 2017-03-09 07:34:56 +00:00
examples Update examples with storage.k8s.io/v1 2017-03-07 12:39:13 +01:00
federation fed: Fix flakey ingress unit test 2017-03-13 13:18:06 -07:00
Godeps Merge pull request #42669 from curtisallen/update_dep_go-oidc 2017-03-14 07:31:34 -07:00
hack Merge pull request #42942 from vishh/gpu-cont-fix 2017-03-14 10:19:17 -07:00
hooks
logo
pkg Merge pull request #42942 from vishh/gpu-cont-fix 2017-03-14 10:19:17 -07:00
plugin Use constant time compare for bootstrap tokens 2017-03-14 14:06:33 +00:00
staging Merge pull request #42669 from curtisallen/update_dep_go-oidc 2017-03-14 07:31:34 -07:00
test Merge pull request #42942 from vishh/gpu-cont-fix 2017-03-14 10:19:17 -07:00
third_party
translations Update extraction script, sort messages, add .pot file. 2017-02-23 18:53:00 +00:00
vendor Merge pull request #42669 from curtisallen/update_dep_go-oidc 2017-03-14 07:31:34 -07:00
.bazelrc
.gazelcfg.json
.generated_files Move .generated_docs to docs/ so docs OWNERS can review / approve 2017-02-16 10:11:57 -08:00
.gitattributes
.gitignore
BUILD.bazel
CHANGELOG.md Update CHANGELOG.md for v1.6.0-beta.3. 2017-03-10 21:05:30 -08:00
code-of-conduct.md
CONTRIBUTING.md
labels.yaml
LICENSE
Makefile Make make quick-release quick again 2017-02-21 14:35:55 -08:00
Makefile.generated_files
OWNERS
OWNERS_ALIASES add OWNER file to kubelet/network 2017-02-24 11:41:13 -08:00
README.md
Vagrantfile
WORKSPACE

Kubernetes


Introduction

Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications. Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF).

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.


Are you ...

Code of Conduct

The Kubernetes community abides by the CNCF code of conduct. Here is an excerpt:

As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

Community

Do you want to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented? If you are a company, you should consider joining the CNCF. For details about who's involved in CNCF and how Kubernetes plays a role, read the announcement. For general information about our community, see the website community page.

Contribute

If you're interested in being a contributor and want to get involved in developing Kubernetes, get started with this reading:

You will then most certainly gain a lot from joining a SIG, attending the regular hangouts as well as the community meeting.

If you have an idea for a new feature, see the Kubernetes Features repository for a list of features that are coming in new releases as well as details on how to propose one.

Building Kubernetes for the impatient

If you want to build Kubernetes right away there are two options:

You have a working Go environment:

$ go get -d k8s.io/kubernetes
$ cd $GOPATH/src/k8s.io/kubernetes
$ make

You have a working Docker environment:

$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make quick-release

If you are less impatient, head over to the developer's documentation.

Support

While there are many different channels that you can use to get hold of us (Slack, Stack Overflow, Issues, Forums/Mailing lists), you can help make sure that we are efficient in getting you the help that you need.

If you need support, start with the troubleshooting guide and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another. We don't bite!
