
114674 Commits

Author SHA1 Message Date
Swati Sehgal
bae8a164e0 node: device-mgr: e2e: address e2e test review comments
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:15:58 +00:00
Swati Sehgal
674879a959 node: device-mgr: e2e: Update the e2e test to reproduce issue:109595
Breakdown of the steps implemented as part of this e2e test is as follows:
1. Create a file `registration` at path `/var/lib/kubelet/device-plugins/sample/`
2. Create the sample device plugin with the environment variable
   `REGISTER_CONTROL_FILE=/var/lib/kubelet/device-plugins/sample/registration` set,
   so that it waits for a client to delete the control file.
3. Trigger plugin registration by deleting the abovementioned control file.
4. Create a test pod requesting devices exposed by the device plugin.
5. Stop kubelet.
6. Remove pods using CRI to ensure new pods are created after kubelet restart.
7. Restart kubelet.
8. Wait for the sample device plugin pod to be running. In this case,
   the registration is not triggered.
9. Ensure that resource capacity/allocatable exported by the device plugin is zero.
10. The test pod should fail with `UnexpectedAdmissionError`
11. Delete the test pod.
12. Delete the sample device plugin pod.
13. Remove `/var/lib/kubelet/device-plugins/sample/` and its content, the directory
    created to control registration

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:15:58 +00:00
Swati Sehgal
db7afc1cd8 node: device-mgr: e2e: Implement End to end test
This commit reuses e2e tests implemented as part of https://github.com/kubernetes/kubernetes/pull/110729.
The commit is borrowed from the aforementioned PR as is to preserve
authorship. A subsequent commit will update the end to end test to
simulate the problem this PR is trying to solve by reproducing
the issue: 109595.

Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
5b2a3dbbdc node: device-mgr: explicitly check if pre-allocated devices are healthy
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
a799ffb571 node: device-mgr: unit-tests: admission failure due to unhealthy devices
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
7ac399c205 node: device-mgr: Handle recovery by checking if healthy devices exist
In case of a node reboot or kubelet restart, the flow of events involves
obtaining the state from the checkpoint file, followed by setting
`healthyDevices`/`unhealthyDevices` to their zero values. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.

During the allocation phase, we need to check if the resources requested
by the pod have been registered AND healthy devices are present on
the node to be allocated.

We also need to move this check above the `needed == 0` check, where
`needed` is the number of required devices minus the devices already
allocated to the container (obtained from the checkpoint file). Even
when no additional devices have to be allocated (because they were
pre-allocated), we still need to make sure that the previously
allocated devices are healthy.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Kubernetes Prow Robot
b6acf6f805
Merge pull request #116294 from p0lyn0mial/upstream-flaky-getcurrentrvfromstorage
cacher: deflake TestGetCurrentResourceVersionFromStorage
2023-03-06 03:36:30 -08:00
Wojciech Tyczyński
280651abcc Autogenerated 2023-03-06 12:08:34 +01:00
Wojciech Tyczyński
760acbbbe3 Bump QPS limits for Kubelet 2023-03-06 12:07:52 +01:00
SataQiu
528a471302 remove unused resize.go from pkg/kubelet/container 2023-03-06 18:33:13 +08:00
Lukasz Szaszkiewicz
8fd9d573f0 cacher: deflake TestGetCurrentResourceVersionFromStorage 2023-03-06 11:30:39 +01:00
Kubernetes Prow Robot
b8aaaf380a
Merge pull request #116083 from SataQiu/clean-20230227
kubelet: remove unused DockerID type
2023-03-06 02:22:58 -08:00
Alexander Constantinescu
ec917850af Add proxy healthz result to ETP=local health check
Today, the health check response that Kube-proxy gives to load balancers
asking for the status of ETP:Local services does not include the healthz
state of Kube-proxy itself. This means that Kube-proxy might indicate to
load balancers that they should forward traffic to the node in question
simply because the endpoint is running on the node. This overlooks the
fact that Kube-proxy might be unhealthy and might not have successfully
written the rules enabling traffic to reach the endpoint.
2023-03-06 10:53:17 +01:00
vinay kulkarni
b0dce923f1 Add Get interfaces for container's checkpointed ResourcesAllocated and Resize values, remove error logging for valid standalone kubelet scenario 2023-03-06 09:50:12 +00:00
Kubernetes Prow Robot
931e07de16
Merge pull request #116284 from thockin/codegen_subprojects_cleanup_verify
Codegen: subprojects: clean up verify scripts
2023-03-06 00:14:59 -08:00
Alex Wang
13b941e120 feat: graduate matchLabelKeys in podTopologySpread to beta 2023-03-06 14:46:17 +08:00
huyinhou
88274d96fc update code style
Signed-off-by: huyinhou <huyinhou@bytedance.com>
2023-03-06 14:23:14 +08:00
csDengh
f762145e06
minor code improvement
Replace repeated assignments inside loops with a single initialization outside the loop.
2023-03-06 09:00:40 +08:00
Kensei Nakada
608f4808ff support PreFilter as well 2023-03-06 00:48:30 +00:00
Tim Hockin
357bfbc436
Codegen: subprojects: clean up verify scripts
They all run successfully.
2023-03-05 15:05:26 -08:00
Kubernetes Prow Robot
fafa45d13c
Merge pull request #116279 from bart0sh/PR105-fix-CDI-spec-version
DRA: fix CDI spec version
2023-03-05 12:22:57 -08:00
Ed Bartosh
35fd124f4d DRA: fix CDI spec version
The latest CDI release includes spec version check that fails
if version is less than 0.3.0:
  https://github.com/container-orchestrated-devices/container-device-interface/blob/v0.5.4/pkg/cdi/version.go#L42

Updating CDI spec version to 0.3.0 in the test kubelet plugin code
should fix e2e test failures on the CRI runtimes that use CDI >= 0.5.4
(Containerd master atm, CRI-O soon).
2023-03-05 16:49:56 +02:00
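The minimum-version requirement mentioned above can be illustrated with a small comparison routine. This is a sketch only, assuming plain `major.minor.patch` strings; `parseVersion` and `atLeast` are hypothetical helpers and do not reproduce the check in the CDI library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor.patch" string into three integers.
// Hypothetical helper for illustration; not the CDI library's parser.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is greater than or equal to min.
func atLeast(v, min string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(min)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil // all components are equal
}

func main() {
	for _, v := range []string{"0.2.0", "0.3.0"} {
		ok, err := atLeast(v, "0.3.0")
		fmt.Println(v, "accepted:", ok, "error:", err)
	}
}
```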
Kubernetes Prow Robot
bbbbfcd967
Merge pull request #116266 from SergeyKanzhelev/ExperimentalPodPidsLimit
rename ExperimentalPodPidsLimit to PodPidsLimit
2023-03-05 06:30:56 -08:00
Mateusz Puczyński
d1877f514a
adjust comment prefixes in k8s.io/api/apps/v1beta1/types.go 2023-03-04 21:20:24 +01:00
Mateusz Puczyński
f74724a3f4
update obsolete links 2023-03-04 19:57:52 +01:00
mantuliu
83fdbd76a1 Improve the performance when Resource Clone
Signed-off-by: mantuliu <240951888@qq.com>
2023-03-05 00:35:51 +08:00
vinay kulkarni
12435b26fc Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario 2023-03-04 08:07:40 +00:00
Kubernetes Prow Robot
d48b8167f7
Merge pull request #115463 from SergeyKanzhelev/containerStatusDocs
update docs for ContainerStatus fields
2023-03-03 20:17:06 -08:00
Yoon Park
8d2c81e7ec Fix comments at fit_test.go to increase readability 2023-03-04 13:03:15 +09:00
SataQiu
eb541bb819 controller-manager: fix a bug that the kubeconfig field of kubecontrollermanager.config.k8s.io configuration is not set correctly 2023-03-04 11:17:55 +08:00
Sergey Kanzhelev
04189b1fc4 rename ExperimentalPodPidsLimit to PodPidsLimit 2023-03-04 01:48:16 +00:00
Kubernetes Prow Robot
8da8bb41bc
Merge pull request #116243 from KnVerey/applyset_parent_mgmt
Create and update the ApplySet parent object
2023-03-03 15:21:13 -08:00
Kubernetes Prow Robot
6260796b63
Merge pull request #116233 from SergeyKanzhelev/GRPCContainerProbeGA
GRPCContainerProbe is GA
2023-03-03 15:21:06 -08:00
Kubernetes Prow Robot
20c3a007f5
Merge pull request #115693 from bobbypage/shutdown_test
test: e2e node shutdown test logging improvements
2023-03-03 15:20:57 -08:00
Kubernetes Prow Robot
15c5366a1c
Merge pull request #116240 from bobbypage/devicepluginfix
test: Fix path to e2e node sample device plugin
2023-03-03 14:15:09 -08:00
Kubernetes Prow Robot
ff735dff85
Merge pull request #116166 from pohly/test-go-vet
fix "go vet" issues, check as part of golangci-lint
2023-03-03 14:14:58 -08:00
Filip Křepinský
747ffe785d improve message, log level and testing for unmanaged pods in disruption controller
- set higher severity and log level when unmanaged pods found and improve testing
- do not mention unsupported controller when triggering event for
  unmanaged pods (this is covered by CalculateExpectedPodCountFailed
  event)
- test unsupported controller
- make testing for events non blocking when event not found
2023-03-03 23:03:06 +01:00
Kubernetes Prow Robot
253ab3eda7
Merge pull request #116162 from apelisse/update-openapi
Update kube-openapi to afdc3dddf62d31f5e3868d699379c571a6007920
2023-03-03 12:29:09 -08:00
Kubernetes Prow Robot
20df9dd6b7
Merge pull request #115672 from sding3/fix-restricted-profile
fix restricted debug profile
2023-03-03 12:28:57 -08:00
Sean Sullivan
a49f132585 Tolerate empty discovery response in memcache client 2023-03-03 11:36:53 -08:00
Katrina Verey
3b0e13482e
Feedback and linter 2023-03-03 13:31:04 -05:00
Kubernetes Prow Robot
a1b12e49ea
Merge pull request #116251 from wojtek-t/fix_ready_test
Fix deadlock in ready test
2023-03-03 10:25:19 -08:00
Kubernetes Prow Robot
f7605cae7a
Merge pull request #115914 from ravisantoshgudimetla/promote-pdb
Promote pdb
2023-03-03 10:25:12 -08:00
Kubernetes Prow Robot
9f0b491953
Merge pull request #113270 from rrangith/fix/create-pvc-for-pending-pod
Automatically recreate PVC for pending STS pod
2023-03-03 10:24:58 -08:00
Dan Winship
3181db4606 Belatedly remove controller-manager IPv6DualStack feature gate 2023-03-03 13:16:36 -05:00
Kubernetes Prow Robot
37d8b5a2b8
Merge pull request #116227 from gnufied/wait-for-pod-startup-before-resize
Wait for pod to be running before expanding
2023-03-03 09:18:59 -08:00
Antoine Pelisse
736123f447 Update kube-openapi to afdc3dddf62d31f5e3868d699379c571a6007920 2023-03-03 08:43:44 -08:00
David Porter
c5a1f0188b
test: Add node e2e test to verify static pod termination
Add node e2e test to verify that static pods can be started after a
previous static pod with the same config temporarily failed termination.

The scenario is:

1. Static pod is started
2. Static pod is deleted
3. Static pod termination fails (internally `syncTerminatedPod` fails)
4. At later time, pod termination should succeed
5. New static pod with the same config is (re)-added
6. New static pod is expected to start successfully

To reproduce this scenario, set up a pod using an NFS mount. The NFS server
is stopped, which causes volumes to fail to unmount and
`syncTerminatedPod` to fail. The NFS server is later started, allowing
the volume to unmount successfully.

xref:

1. https://github.com/kubernetes/kubernetes/pull/113145#issuecomment-1289587988
2. https://github.com/kubernetes/kubernetes/pull/113065
3. https://github.com/kubernetes/kubernetes/pull/113093

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
David Porter
1c75c2cda8
test: Add e2e to verify static pod termination
Add a node e2e to verify that if a static pod is terminated while the
container runtime or CRI returns an error, the pod is eventually
terminated successfully.

This test serves as a regression test for k8s.io/issue/113145 which
fixes an issue where force deleted pods may not be terminated if the
container runtime fails during a `syncTerminatingPod`.

To test this behavior, start a static pod, stop the container runtime,
and later start the container runtime. The static pod is expected to
eventually terminate successfully.

To start and stop the container runtime, we need to find the container
runtime systemd unit name. Introduce a util function
`findContainerRuntimeServiceName` which finds the unit name by getting
the pid of the container runtime from the existing
`ContainerRuntimeProcessName` flag passed into node e2e and using
systemd dbus `GetUnitNameByPID` function to convert the pid of the
container runtime to a unit name. Using the unit name, introduce helper
functions to start and stop the container runtime.

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
Wojciech Tyczyński
39fa78fe7d Fix deadlock in ready test 2023-03-03 16:47:11 +01:00