We have an e2e test which tries to ensure device plugin assignments to pods are kept
across node reboots. And this test has been permafailing for many weeks at the
time of writing (xref: #128443).
The problem is that closer inspection reveals the test was well intentioned, but
puzzling:
The test runs a pod, then restarts the kubelet, then _expects the pod to
end up in admission failure_ and yet _ensures the device assignment is
kept_! https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/test/e2e_node/device_plugin_test.go#L97
A reader can legitimately wonder whether this means the device will be kept busy forever.
Luckily, this is not the case. The test, however, embodied the kubelet behavior
at the time, which was in turn caused by #103979:
The device manager used to record the last pod which attempted admission and
forcibly add it to the list of active pods. The retention logic had room for
exactly one pod: the last one which attempted admission.
This retention prevented the cleanup code
(see: https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549
compare to: https://github.com/kubernetes/kubernetes/blob/v1.31.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549)
from clearing the registration, so the device was still (mis)reported as
allocated to the failed pod.
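Roughly, the old pattern looked like this (a minimal, self-contained sketch with
simplified, made-up names and types, not the actual kubelet code):

    package main

    import "fmt"

    // deviceManager is a simplified stand-in for the kubelet device manager.
    type deviceManager struct {
        // devices currently recorded as assigned, keyed by pod UID
        podDevices map[string][]string
        // the single, last pod which attempted admission ("forced retention")
        pendingAdmissionPodUID string
    }

    // cleanup releases the devices of pods that are no longer active, but it
    // always keeps the pending-admission pod, even if its admission failed.
    func (m *deviceManager) cleanup(activePodUIDs map[string]bool) {
        for uid := range m.podDevices {
            if activePodUIDs[uid] || uid == m.pendingAdmissionPodUID {
                continue // retained
            }
            delete(m.podDevices, uid)
        }
    }

    func main() {
        m := &deviceManager{
            podDevices:             map[string][]string{"failed-pod": {"dev0"}},
            pendingAdmissionPodUID: "failed-pod",
        }
        // "failed-pod" is not active, yet its assignment survives the cleanup.
        m.cleanup(map[string]bool{})
        fmt.Println(m.podDevices) // map[failed-pod:[dev0]]
    }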
This fact was in turn leveraged by the test in question:
the test uses the podresources API to learn about the device assignment,
and, because of the chain of events above, the pod failed admission yet
was still reported as owning the device.
What happened, however, was that the next pod attempting admission would
replace the previous pod in the device manager data, so the previous
pod was no longer forcibly added to the active list and its
assignment was correctly cleared once the cleanup code ran.
The cleanup code runs, among other things, every time the device
manager is asked to allocate devices and every time the podresources API
queries the device assignment.
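For reference, this is roughly how the device assignment can be queried through
the podresources API, which is what the test relies on (the socket path is the
default one and may differ on a given setup; this is a sketch, not the test code):

    package main

    import (
        "context"
        "fmt"
        "time"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
    )

    func main() {
        // Assumption: the default kubelet pod-resources socket; adjust if needed.
        const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock"

        conn, err := grpc.Dial(socket, grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        client := podresourcesv1.NewPodResourcesListerClient(conn)

        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        // List reports the per-pod, per-container device assignment as the
        // device manager currently sees it.
        resp, err := client.List(ctx, &podresourcesv1.ListPodResourcesRequest{})
        if err != nil {
            panic(err)
        }
        for _, pod := range resp.GetPodResources() {
            for _, cnt := range pod.GetContainers() {
                for _, dev := range cnt.GetDevices() {
                    fmt.Printf("%s/%s [%s]: %s -> %v\n",
                        pod.GetNamespace(), pod.GetName(), cnt.GetName(),
                        dev.GetResourceName(), dev.GetDeviceIds())
                }
            }
        }
    }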
Later, in PR https://github.com/kubernetes/kubernetes/pull/120661,
the forced retention logic was removed from all the resource managers,
thus also from the device manager, and this is what caused the permafailure.
Because of all the above, it should be evident that the e2e test was
actually enforcing a very specific behavior that was not really working
as intended, and which was also quite puzzling for users.
The best we can do is to fix the test to record and ensure that
pods which failed admission _do not_ retain their device assignment.
Unfortunately, we _cannot_ guarantee the desirable property that
pods which went running retain their device assignment across node reboots.
In the kubelet restart flow, all pods race to be admitted. There is no
order enforced between device plugin pods and application pods.
The property only holds if an application pod is lucky enough to _lose_
the race with both the device plugin (which must go running before the
app pod does) and _also_ with the kubelet (which needs to mark the devices
healthy before the app pod attempts admission).
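The kind of check the fixed test can perform looks roughly like this (a
hypothetical helper over a podresources List response, not the actual test code):

    package sketch

    import (
        "fmt"

        podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
    )

    // assertNoDevicesForPod is a hypothetical helper: given a podresources List
    // response, verify that a pod which failed admission is not reported as
    // owning any device.
    func assertNoDevicesForPod(resp *podresourcesv1.ListPodResourcesResponse, ns, name string) error {
        for _, pod := range resp.GetPodResources() {
            if pod.GetNamespace() != ns || pod.GetName() != name {
                continue
            }
            for _, cnt := range pod.GetContainers() {
                if len(cnt.GetDevices()) > 0 {
                    return fmt.Errorf("pod %s/%s failed admission but container %q still owns devices",
                        ns, name, cnt.GetName())
                }
            }
        }
        return nil
    }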
Signed-off-by: Francesco Romani <fromani@redhat.com>
Previously, ValidateNodeSelector did not check that labels are valid. Now it
does for resource.k8s.io, regardless of whether an object was already created with
invalid labels in an earlier Kubernetes release. Theoretically this is a
breaking change and could cause problems during an upgrade, but that is highly
unlikely in practice.
In contrast to node affinity, DRA does not ignore parse errors
(i.e. it uses NewNodeSelector, not NewLazyErrorNodeSelector), so invalid labels
would have been found instead of being silently ignored.
Even if some object has invalid labels, this only affects an alpha -> beta
upgrade which isn't guaranteed to work seamlessly.
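For illustration, a minimal sketch of the NewNodeSelector vs.
NewLazyErrorNodeSelector difference mentioned above, using the helpers in
k8s.io/component-helpers/scheduling/corev1/nodeaffinity (the invalid label key
is made up; details of the helpers are from memory, so treat this as a sketch):

    package main

    import (
        "fmt"

        v1 "k8s.io/api/core/v1"
        "k8s.io/component-helpers/scheduling/corev1/nodeaffinity"
    )

    func main() {
        // A selector with a deliberately invalid label key.
        sel := &v1.NodeSelector{
            NodeSelectorTerms: []v1.NodeSelectorTerm{{
                MatchExpressions: []v1.NodeSelectorRequirement{{
                    Key:      "not a valid label key!",
                    Operator: v1.NodeSelectorOpExists,
                }},
            }},
        }

        // DRA-style handling: the parse error surfaces immediately.
        if _, err := nodeaffinity.NewNodeSelector(sel); err != nil {
            fmt.Println("NewNodeSelector:", err)
        }

        // Node-affinity-style handling: construction always succeeds; the
        // invalid term is skipped and its error only shows up at Match time.
        lazy := nodeaffinity.NewLazyErrorNodeSelector(sel)
        matched, err := lazy.Match(&v1.Node{})
        fmt.Println("LazyErrorNodeSelector:", matched, err)
    }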
Pruning of tests to the top-level test was added for jobs like
pull-kubernetes-unit which run many tests. For other, more focused jobs like
scheduler-perf benchmarking it would be nice to keep the more detailed
information, in particular because it includes the duration per test case.
The "disabled by label filter" message for benchmarks printed the pointer to
the filter string, not the filter string itself. This mistake gets avoided and
the code becomes simpler when not using pointers.
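Illustrative only (variable names are made up), this is the classic Go pattern
behind such a message:

    package main

    import "fmt"

    func main() {
        labelFilter := "perf" // made-up value, for illustration
        filterPtr := &labelFilter

        // Bug pattern: formatting the pointer prints an address, e.g. 0xc000014070.
        fmt.Printf("disabled by label filter %v\n", filterPtr)

        // Dereference it (or better, don't pass a pointer around at all).
        fmt.Printf("disabled by label filter %v\n", *filterPtr)
    }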
After removing a pod in the port-forward test we wait for an error from the POST
request. Since the POST doesn't have a timeout, it hangs indefinitely and
instead we hit the DefaultPodDeletionTimeout. To make sure the POST
fails, this adds a timeout to ensure we'll always get the expected
error rather than nil.
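A minimal sketch of the idea (URL and payload are placeholders, not the actual
test code): giving the HTTP client a timeout means a hung POST returns an error
instead of blocking until some outer timeout fires.

    package main

    import (
        "fmt"
        "net/http"
        "strings"
        "time"
    )

    func main() {
        client := &http.Client{Timeout: 10 * time.Second}

        resp, err := client.Post("http://127.0.0.1:8080/data", "text/plain", strings.NewReader("ping"))
        if err != nil {
            // With the timeout in place, a hung connection surfaces here as an
            // error instead of blocking indefinitely.
            fmt.Println("POST failed:", err)
            return
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
    }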
Signed-off-by: Maciej Szulik <soltysh@gmail.com>