We have an e2e test which tries to ensure that device plugin assignments
to pods are kept across node reboots. This test has been permafailing
for many weeks at the time of writing (xref: #128443).
The problem is that closer inspection reveals the test was well
intentioned, but puzzling:
The test runs a pod, then restarts the kubelet, then _expects the pod
to end up in admission failure_ and yet _ensures the device assignment
is kept_! https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/test/e2e_node/device_plugin_test.go#L97
A reader can legitimately wonder whether this means the device will be
kept busy forever.
Luckily, this is not the case. The test, however, encoded the kubelet
behavior at the time, which was in turn caused by #103979:
The device manager used to record the last pod that attempted admission
and forcibly add it to the list of active pods. The retention logic had
room for exactly one pod: the last one that attempted admission.
This retention prevented the cleanup code
(see: https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549
compare to: https://github.com/kubernetes/kubernetes/blob/v1.31.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549)
from clearing the registration, so the device was still (mis)reported
as allocated to the failed pod.
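In essence, the old logic looked roughly like the sketch below (heavily
simplified; the struct and helper names are illustrative, not the
actual kubelet code):

    import v1 "k8s.io/api/core/v1"

    // Illustrative, heavily simplified model of the old device manager state.
    type manager struct {
        pendingAdmissionPod *v1.Pod             // retention slot: exactly one pod
        allocations         map[string][]string // pod UID -> allocated device IDs
        activePods          func() []*v1.Pod
    }

    // allocate remembered the last pod that attempted admission,
    // overwriting whatever pod was stored before.
    func (m *manager) allocate(pod *v1.Pod) {
        m.pendingAdmissionPod = pod
        // ... actual device allocation ...
    }

    // The cleanup reclaimed devices from all non-active pods, but the pod
    // pending admission was forcibly treated as active, so its (failed)
    // allocation was never reclaimed until another pod took its slot.
    func (m *manager) updateAllocatedDevices() {
        keep := map[string]bool{}
        for _, p := range m.activePods() {
            keep[string(p.UID)] = true
        }
        if m.pendingAdmissionPod != nil {
            keep[string(m.pendingAdmissionPod.UID)] = true // forced retention
        }
        for uid := range m.allocations {
            if !keep[uid] {
                delete(m.allocations, uid)
            }
        }
    }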
This fact was in turn leveraged by the test in question: the test uses
the podresources API to learn about the device assignment, and because
of the chain of events above, the pod failed admission yet was still
reported as owning the device.
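For reference, a test can query the podresources API over the kubelet's
local gRPC socket roughly like this (a minimal sketch; error handling
is trimmed and the default socket path is assumed):

    import (
        "context"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
    )

    func listPodResources(ctx context.Context) (*podresourcesapi.ListPodResourcesResponse, error) {
        conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
            grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            return nil, err
        }
        defer conn.Close()
        client := podresourcesapi.NewPodResourcesListerClient(conn)
        // List reports, per pod and per container, the allocated devices.
        return client.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
    }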
In practice, however, the next pod attempting admission would replace
the previous pod in the device manager data, so the previous pod was no
longer forced into the active list, and its assignment was correctly
cleared once the cleanup code ran. The cleanup code runs, among other
occasions, every time the device manager is asked to allocate devices
and every time the podresources API queries the device assignment.
Later, in PR https://github.com/kubernetes/kubernetes/pull/120661,
the forced retention logic was removed from all the resource managers,
thus also from the device manager, and this is what caused the
permafailure.
Because of all of the above, it should be evident that the e2e test was
actually enforcing a very specific behavior that never really worked as
intended, and which was also quite puzzling for users.
The best we can do is to fix the test to record and ensure that pods
which failed admission _do not_ retain their device assignment.
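A sketch of the fixed expectation, in the Gomega style the e2e tests
use (listPodResources is the hypothetical helper from the sketch above;
failedPod stands for the pod that failed admission):

    resp, err := listPodResources(ctx)
    framework.ExpectNoError(err)
    for _, podRes := range resp.GetPodResources() {
        if podRes.GetName() != failedPod.Name || podRes.GetNamespace() != failedPod.Namespace {
            continue
        }
        for _, cnt := range podRes.GetContainers() {
            // a pod which failed admission must not own any device anymore
            gomega.Expect(cnt.GetDevices()).To(gomega.BeEmpty())
        }
    }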
Unfortunately, we _cannot_ guarantee the desirable property that
running pods retain their device assignment across node reboots.
In the kubelet restart flow, all pods race to be admitted, and no
ordering is enforced between device plugin pods and application pods.
An application pod keeps its devices only if it is lucky enough to
_lose_ the race with both the device plugin (which needs to go running
before the app pod does) and the kubelet (which needs to mark the
devices healthy before the pod attempts admission).
Signed-off-by: Francesco Romani <fromani@redhat.com>
The "// import <path>" comment has been superseded by Go modules.
We don't have to remove them, but doing so has some advantages:
- They are used inconsistently, which is confusing.
- We can then also remove the (currently broken) hack/update-vanity-imports.sh.
- Last but not least, it would be a first step towards avoiding the k8s.io domain.
This commit was generated with
sed -i -e 's;^package \(.*\) // import.*;package \1;' $(git grep -l '^package.*// import' | grep -v 'vendor/')
Everything was included, except for
package labels // import k8s.io/kubernetes/pkg/util/labels
because that package is marked as "read-only".
Not setting a total timeout for curl may result in a hanging command
if the connection succeeded but the data transfer did not.
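For example (the URL and the values are illustrative):

    curl --connect-timeout 5 --max-time 30 https://example.com/healthz

--max-time bounds the whole operation, while --connect-timeout alone
only bounds the connection phase.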
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
When stripping out log messages from the failure text, the original
text gets stored as <system-out>. That part was then lost when reducing
tests. Instead of dropping it, it needs to be joined from all failed
tests. The same applies to <system-err>, although that isn't used yet.
Repeating the same "Failed" message text doesn't add any
information. Separating with a blank line is more readable.
Before:
<failure message="Failed; Failed; Failed" type="">
...
--- FAIL: TestFrontProxyConfig/WithoutUID (64.89s) ; === RUN TestFrontProxyConfig/WithUID
After:
<failure message="Failed" type="">
...
--- FAIL: TestFrontProxyConfig/WithoutUID (64.89s)
=== RUN TestFrontProxyConfig/WithUID
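A sketch of the merging logic (simplified; the junitTestCase type and
its field names are illustrative, not the actual code):

    import "strings"

    type junitFailure struct{ Message string }

    type junitTestCase struct {
        SystemOut, SystemErr string
        Failure              junitFailure
    }

    // mergeFailedRuns folds several failed runs of the same test into one
    // entry, keeping a single "Failed" message and joining the outputs.
    func mergeFailedRuns(failed []junitTestCase) junitTestCase {
        merged := failed[0]
        var outs, errs []string
        for _, tc := range failed {
            if tc.SystemOut != "" {
                outs = append(outs, tc.SystemOut)
            }
            if tc.SystemErr != "" {
                errs = append(errs, tc.SystemErr)
            }
        }
        // a blank line between chunks is more readable than "; " separators
        merged.SystemOut = strings.Join(outs, "\n\n")
        merged.SystemErr = strings.Join(errs, "\n\n")
        merged.Failure.Message = "Failed"
        return merged
    }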
The kubernetes repository contains some internal Go modules that are
not part of the global Go workspace. Because apidiff is currently run
from the root of the repository, it does not work against these
internal modules.
Instead of executing apidiff from the root, we can simply cd into the
passed module path to avoid this limitation.
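In essence (a sketch; the actual arguments passed to apidiff are
elided):

    # run apidiff from within the module instead of the repository root
    (cd "${module_path}" && apidiff ...)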
When `kubeadm init phase bootstrap-token` gets invoked, it reads
the kubeconfig from disk repeatedly. This is wasteful, but, more
importantly, it blocks the use of `/dev/stdin` and other sources
of data that cannot be read repeatedly.
This change introduces a new field that caches a parsed kubeconfig;
when a new clientset is requested, it is built from this pre-parsed
kubeconfig and the code no longer reaches out to disk.
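A minimal sketch of the pattern using client-go (function names are
illustrative; kubeadm's actual code differs):

    import (
        "os"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
        clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )

    // loadOnce parses the kubeconfig a single time, which also works for
    // non-seekable sources such as /dev/stdin.
    func loadOnce(path string) (*clientcmdapi.Config, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }
        return clientcmd.Load(data)
    }

    // clientSetFromCached builds a clientset from the cached, pre-parsed
    // kubeconfig without touching the disk again.
    func clientSetFromCached(cached *clientcmdapi.Config) (kubernetes.Interface, error) {
        restCfg, err := clientcmd.NewDefaultClientConfig(*cached, &clientcmd.ConfigOverrides{}).ClientConfig()
        if err != nil {
            return nil, err
        }
        return kubernetes.NewForConfig(restCfg)
    }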