We have an e2e test which tries to ensure device plugin assignments to pods are kept across node reboots. This test has been permafailing for many weeks at the time of writing (xref: #128443). The problem is that, on closer inspection, the test turns out to be well intentioned but puzzling: it runs a pod, then restarts the kubelet, then _expects the pod to end up in admission failure_ and yet _ensures the device assignment is kept_! https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/test/e2e_node/device_plugin_test.go#L97

A reader can legitimately wonder whether this means the device will be kept busy forever. Luckily, this is not the case. The test, however, embodied the kubelet behavior of the time, which was in turn caused by #103979: the device manager used to record the last admitted pod and forcibly add it to the list of active pods. The retention logic had room for exactly one pod, the last one which attempted admission. This retention prevented the cleanup code (see: https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549 and compare to: https://github.com/kubernetes/kubernetes/blob/v1.31.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549) from clearing the registration, so the device was still (mis)reported as allocated to the failed pod. This fact was in turn leveraged by the test in question: the test uses the podresources API to learn about the device assignment, and because of the chain of events above, the pod failed admission yet was still reported as owning the device.
What happened, however, was that the next pod trying admission replaced the previous pod in the device manager data, so the previous pod was no longer forcibly added to the active list, and its assignments were correctly cleared once the cleanup code ran. The cleanup code runs, among other times, whenever the device manager is asked to allocate devices and whenever the podresources API queries the device assignment.

Later, PR https://github.com/kubernetes/kubernetes/pull/120661 removed the forced retention logic from all the resource managers, and thus also from the device manager; this is what caused the permafailure.

Because of all of the above, it should be evident that the e2e test was actually enforcing a very specific, not really work-as-intended behavior, which was also quite puzzling for users. The best we can do is to fix the test to record and ensure that pods which failed admission _do not_ retain their device assignment.

Unfortunately, we _cannot_ guarantee the desirable property that pods which go running retain their device assignment across node reboots. In the kubelet restart flow, all pods race to be admitted, and no order is enforced between device plugin pods and application pods. An application pod passes admission only if it is lucky enough to _lose_ the race with both the device plugin (which must go running before the app pod does) and the kubelet (which needs to set the devices healthy before the pod tries admission).

Signed-off-by: Francesco Romani <fromani@redhat.com>
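The admission race in the kubelet restart flow can be illustrated with a toy model that counts how many orderings of the restart steps let the application pod pass admission. This is a hypothetical sketch, not kubelet code: the `event` type, its constants and the `admitted` function are invented for illustration, and the model assumes that a health update arriving before the device plugin has registered has no effect.

```go
package main

import "fmt"

// event is a hypothetical label for the steps racing during a kubelet
// restart; it does not correspond to any real kubelet type.
type event int

const (
	pluginRunning  event = iota // the device plugin pod goes running and registers
	devicesHealthy              // the kubelet marks the plugin's devices healthy
	appAdmission                // the application pod tries admission
)

// admitted replays one ordering of events and reports whether the
// application pod passes admission. Marking devices healthy only has an
// effect once the plugin has registered (model assumption), and the app pod
// is admitted only if both steps already happened when it tries admission.
func admitted(order []event) bool {
	registered, healthy := false, false
	for _, e := range order {
		switch e {
		case pluginRunning:
			registered = true
		case devicesHealthy:
			healthy = registered
		case appAdmission:
			return registered && healthy
		}
	}
	return false
}

func main() {
	// All six orderings of the three events: the restart flow enforces none.
	orders := [][]event{
		{pluginRunning, devicesHealthy, appAdmission},
		{pluginRunning, appAdmission, devicesHealthy},
		{devicesHealthy, pluginRunning, appAdmission},
		{devicesHealthy, appAdmission, pluginRunning},
		{appAdmission, pluginRunning, devicesHealthy},
		{appAdmission, devicesHealthy, pluginRunning},
	}
	wins := 0
	for _, o := range orders {
		if admitted(o) {
			wins++
		}
	}
	fmt.Printf("%d of %d orderings admit the app pod\n", wins, len(orders))
}
```

Running it prints `1 of 6 orderings admit the app pod`: only the ordering in which the app pod loses both races succeeds, which is why the assignment cannot be guaranteed to survive a restart.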
Kubernetes (K8s)

Kubernetes, also known as K8s, is an open source system for managing containerized applications across multiple hosts. It provides basic mechanisms for the deployment, maintenance, and scaling of applications.
Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.
Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If your company wants to help shape the evolution of technologies that are container-packaged, dynamically scheduled, and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.
To start using K8s
See our documentation on kubernetes.io.
Take a free course on Scalable Microservices with Kubernetes.
To use Kubernetes code as a library in other applications, see the list of published components.
Use of the k8s.io/kubernetes module or k8s.io/kubernetes/... packages as libraries is not supported.
To start developing K8s
The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.
If you want to build Kubernetes right away there are two options:
You have a working Go environment.
git clone https://github.com/kubernetes/kubernetes
cd kubernetes
make
You have a working Docker environment.
git clone https://github.com/kubernetes/kubernetes
cd kubernetes
make quick-release
For the full story, head over to the developer's documentation.
Support
If you need support, start with the troubleshooting guide, and work your way through the process that we've outlined.
That said, if you have questions, reach out to us one way or another.
Community Meetings
The Calendar has the list of all the meetings in the Kubernetes community in a single location.
Adopters
The User Case Studies website has real-world use cases of organizations across industries that are deploying/migrating to Kubernetes.
Governance
The Kubernetes project is governed by a framework of principles, values, policies and processes to help our community and constituents towards our shared goals.
The Kubernetes Community is the launching point for learning about how we organize ourselves.
The Kubernetes Steering community repo is used by the Kubernetes Steering Committee, which oversees governance of the Kubernetes project.
Roadmap
The Kubernetes Enhancements repo provides information about Kubernetes releases, as well as feature tracking and backlogs.