Breakdown of the steps implemented as part of this e2e test is as follows:
1. Create a file `registration` at path `/var/lib/kubelet/device-plugins/sample/`
2. Create sample device plugin with an environment variable with
`REGISTER_CONTROL_FILE=/var/lib/kubelet/device-plugins/sample/registration` that
waits for a client to delete the control file.
3. Trigger plugin registeration by deleting the abovementioned directory.
4. Create a test pod requesting devices exposed by the device plugin.
5. Stop kubelet.
6. Remove pods using CRI to ensure new pods are created after kubelet restart.
7. Restart kubelet.
8. Wait for the sample device plugin pod to be running. In this case,
the registration is not triggered.
9. Ensure that resource capacity/allocatable exported by the device plugin is zero.
10. The test pod should fail with `UnexpectedAdmissionError`
11. Delete the test pod.
12. Delete the sample device plugin pod.
13. Remove `/var/lib/kubelet/device-plugins/sample/` and its content, the directory
created to control registration
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
This commit reuses e2e tests implmented as part of https://github.com/kubernetes/kubernetes/pull/110729.
The commit is borrowed from the aforementioned PR as is to preserve
authorship. Subsequent commit will update the end to end test to
simulate the problem this PR is trying to solve by reproducing
the issue: 109595.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case of node reboot/kubelet restart, the flow of events involves
obtaining the state from the checkpoint file followed by setting
the `healthDevices`/`unhealthyDevices` to its zero value. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.
During the allocation phase, we need to check if the resources requested
by the pod have been registered AND healthy devices are present on
the node to be allocated.
Also we need to move this check above `needed==0` where needed is
required - devices allocated to the container (which is obtained from
the checkpoint file) because even in cases where no additional devices
have to be allocated (as they were pre-allocated), we still need to
make the devices that were previously allocated are healthy.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
- set higher severity and log level when unmanaged pods found and improve testing
- do not mention unsupported controller when triggering event for
unmanaged pods (this is covered by CalculateExpectedPodCountFailed
event)
- test unsupported controller
- make testing for events non blocking when event not found
go list -find takes ~60% the time:
$ time go list -e ./... | grep -E -v "/(build|third_party|vendor|staging|clientset_generated|hack)/" | md5sum
b5593b3f51f3b3cd08c33bbff9627d10 -
real 0m2.687s
user 0m3.624s
sys 0m1.552s
$ time go list -find -e ./... | grep -E -v "/(build|third_party|vendor|staging|clientset_generated|hack)/" | md5sum
b5593b3f51f3b3cd08c33bbff9627d10 -
real 0m1.721s
user 0m1.675s
sys 0m1.197s
https://github.com/kubernetes/kubernetes/pull/116166#discussion_r1123924871
To run just a specific linter based on command line flags, the default
configuration needs to be disabled (because it would enable additional ones)
and then command line flags must be passed through to "golangci-lint run".
For example, to lint with just "go vet" in verbose mode, use:
verify-golangci-lint.sh -c none -- --disable-all --enable=govet -v