kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2025-08-09 20:17:41 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	0da0b7a85d	Merge pull request #119251 from soltysh/issue119230 Match both old and new kubectl version for a while in e2e	2023-07-12 04:51:12 -07:00
Patrick Ohly	1b8ddf6b79	podgc controller: convert to contextual logging	2023-07-12 13:45:10 +02:00
TommyStarK	f924bf95df	dynamic resource allocation: Improve code coverage of state checkpoint Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-07-12 13:27:18 +02:00
Francesco Romani	c635a7e7d8	node: devicemgr: topomgr: add logs One of the factors that made issues #118559 and #109595 hard to debug and fix is that the devicemanager has very few logs in important flows, so it's unnecessarily hard to reconstruct the state from logs. We add minimal logs to improve troubleshooting. We keep the logs minimal to be backport-friendly, deferring a more comprehensive review of logging to later PRs. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	d78671447f	e2e: node: add test to check device-requiring pods are cleaned up Make sure orphaned pods (pods deleted while kubelet is down) are handled correctly. Outline: 1. create a pod (not a static pod) 2. stop kubelet 3. while kubelet is down, force delete the pod on the API server 4. restart kubelet. The pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups. There is a similar test already, but here we want to check device assignment. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	5cf50105a2	e2e: node: devices: improve the node reboot test The recently added e2e device plugins test covering node reboot works fine when run in a CI environment, but doesn't correctly handle a partial setup when run repeatedly on the same instance (developer setup). To accommodate both flows, we extend the error management, checking more error conditions in the flow. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	b926aba268	e2e: node: devicemanager: update tests Fix e2e device manager tests. Most notably, the workload pods need to survive a kubelet restart. Update tests to reflect that. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	3bcf4220ec	kubelet: devices: skip allocation for running pods When kubelet initializes, it runs admission for pods and possibly allocates requested resources. We need to distinguish between node reboot (no containers running) and kubelet restart (containers potentially running). Running pods should always survive a kubelet restart. This means that device allocation on admission should not be attempted, because if a container requires devices and is still running when kubelet restarts, that container already has devices allocated and working. Thus, we need to properly detect this scenario in the allocation step and handle it explicitly. We need to inform the devicemanager about which pods are already running. Note that if the container runtime is down when kubelet restarts, the approach implemented here won't work: in this scenario, containers will again fail admission on kubelet restart, hitting https://github.com/kubernetes/kubernetes/issues/118559 again. This scenario should however be pretty rare. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Maciej Szulik	ab3a0b78ea	Match both old and new kubectl version for a while in e2e	2023-07-12 12:49:33 +02:00
Kubernetes Prow Robot	745cfa35bd	Merge pull request #119147 from mengjiao-liu/contextual-logging-controller-disruption Migrate /pkg/controller/disruption to structured and contextual logging	2023-07-12 03:35:25 -07:00
Kubernetes Prow Robot	a8093823c3	Merge pull request #119042 from sttts/sttts-restcore-split cmd/kube-apiserver: turn core (legacy) rest storage into standard RESTStorageProvider	2023-07-12 03:35:17 -07:00
Patrick Ohly	c143a875ed	dra e2e: fix "reallocation works" flake The main problem probably was that https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first pod before setting up the callback which blocks allocating one claim for that pod. This is racy because allocations happen in the background. The test also was unnecessarily complex and hard to read: - The intended effect can be achieved with three instead of four claims. - It wasn't clear which claim has "external-claim-other" as name. Using the claim variable avoids that.	2023-07-12 11:20:47 +02:00
Patrick Ohly	6f1a29520f	scheduler/dra: reduce pod scheduling latency This is a combination of two related enhancements: - By implementing a PreEnqueue check, the initial pod scheduling attempt for a pod with a claim template gets avoided when the claim does not exist yet. - By implementing cluster event checks, only those pods get scheduled for which something changed, and they get scheduled immediately without delay.	2023-07-12 11:17:04 +02:00
Kubernetes Prow Robot	a8b90c9008	Merge pull request #119247 from saschagrunert/setcap setcap: update to debian bookworm v1.0.0	2023-07-12 02:11:12 -07:00
Patrick Ohly	e01db32573	scheduler util: handle cache.DeletedFinalStateUnknown in As Informer callbacks must be prepared to get cache.DeletedFinalStateUnknown as the deleted object. They can use that as hint that some information may have been missed, but typically they just retrieve the stored object inside it.	2023-07-12 11:07:59 +02:00
Patrick Ohly	ef48efc736	scheduler dynamicresources: minor logging improvements This makes some complex values a bit more readable.	2023-07-12 11:07:59 +02:00
Sascha Grunert	363874e9b5	setcap: update to debian bookworm v1.0.0 Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2023-07-12 09:29:55 +02:00
Kubernetes Prow Robot	95e915c428	Merge pull request #119229 from HirazawaUi/fix-convert-slice Fix the converts an empty string to nil.	2023-07-11 23:39:11 -07:00
Kubernetes Prow Robot	5130dad2cf	Merge pull request #118408 from danwinship/local-detector kube-proxy local traffic detector single-vs-dual-stack cleanup	2023-07-11 21:19:11 -07:00
Mengjiao Liu	19869478c1	Migrate /pkg/controller/disruption to structured and contextual logging	2023-07-12 11:30:45 +08:00
Kubernetes Prow Robot	98e7c2a751	Merge pull request #119237 from jpbetz/jpbetz-apiserver-integration-owner Add jpbetz as approver of apiserver integration tests	2023-07-11 20:03:18 -07:00
Kubernetes Prow Robot	2d9c951abe	Merge pull request #117011 from fabi200123/Add-Node-Log-Query-Tests- Add e2e tests for feature NodeLogQuery	2023-07-11 20:03:11 -07:00
Kubernetes Prow Robot	d45b6ba676	Merge pull request #119225 from iholder101/bump-cadvisor/v0.47.3 Bump cadvisor version to v0.47.3	2023-07-11 16:19:11 -07:00
Kubernetes Prow Robot	da8974157f	Merge pull request #119209 from jiahuif-forks/feature/validating-admission-policy/typechecking-expension ValidatingAdmissionPolicy: extended type checking to messageExpression	2023-07-11 14:19:12 -07:00
Monis Khan	b81f07ac9a	Add enj to apiserver options approver Signed-off-by: Monis Khan <mok@microsoft.com>	2023-07-11 16:07:44 -04:00
Kubernetes Prow Robot	4954c7bac4	Merge pull request #118540 from jiahuif-forks/feature/validating-admission-policy/authorizer-typechecking-support add support for authorizer to type checking.	2023-07-11 12:41:22 -07:00
Kubernetes Prow Robot	6ffca50136	Merge pull request #116443 from benluddy/secondary-authz-decision-caching Cache authz decisions within the scope of validating policy admission.	2023-07-11 12:41:11 -07:00
Joe Betz	6d6595d0f6	Add jpbetz as approver of apiserver integration tests	2023-07-11 14:36:45 -04:00
Maciej Skrocki	43b509de42	staging: Add endpointslice to publishing data.	2023-07-11 18:08:26 +00:00
Maciej Skrocki	7c873327b6	Convert controller name to reconciler variable.	2023-07-11 18:08:25 +00:00
Maciej Skrocki	29fad383da	move endpointslice reconciler to staging endpointslice repo	2023-07-11 18:08:12 +00:00
Kubernetes Prow Robot	a6890b361d	Merge pull request #119193 from mimowo/sync-job-context Introduce syncJobContext to limit the number of function parameters	2023-07-11 10:33:30 -07:00
Kubernetes Prow Robot	da61644869	Merge pull request #119179 from gjkim42/add-prestop-e2e-test node-e2e: Add container lifecycle e2e tests for preStop hook	2023-07-11 10:33:23 -07:00
Kubernetes Prow Robot	e0dafe57a3	Merge pull request #117351 from pohly/dra-generated-resource-claim-names DRA: generated resource claim names	2023-07-11 10:33:11 -07:00
Wojciech Tyczyński	c0030a4d27	Add support for watchlist to APF	2023-07-11 18:49:04 +02:00
HirazawaUi	9759fc3c23	Fix the converts an empty string to nil.	2023-07-12 00:02:13 +08:00
Maciej Skrocki	22c66784e0	staging: add endpointslice repo	2023-07-11 15:42:20 +00:00
Dr. Stefan Schimanski	a34e06e74c	kube-apiserver/corerest: structure Config	2023-07-11 17:27:20 +02:00
Dr. Stefan Schimanski	75e3576523	kube-apiserver: rewire service controllers: kubernetesservice + IP repair	2023-07-11 17:27:20 +02:00
Itamar Holder	f22aa42aa8	bump go.mod cadvisor to v0.47.3 Signed-off-by: Itamar Holder <iholder@redhat.com>	2023-07-11 17:22:33 +03:00
PiotrProkop	f855a23b45	topologymanager: promote TopologyManagerPolicyOptions feature to beta * Promote TopologyManagerPolicyOptions feature to beta * Promote PreferClosestNUMANodes TopologyManagerPolicyOption to beta Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:06:57 +02:00
Kubernetes Prow Robot	e1fbd0c113	Merge pull request #119207 from serathius/progress-notify Implement conditionalProgressRequester that allows requesting watch progress notification if watch cache is not fresh	2023-07-11 06:05:19 -07:00
Arda Güçlü	3267dd9d52	kubectl delete: Introduce new interactive flag for interactive deletion (#114530)	2023-07-11 06:05:11 -07:00
PiotrProkop	23833b9c81	topologymanager: Increase TopologyManager test coverage by adding negative test cases around NUMA topology discovery Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:04:32 +02:00
PiotrProkop	998654e044	topologymanager: fix TopologyManagerPolicyBetaOptions not being enabled by default Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:04:32 +02:00
Patrick Ohly	fec25785ee	dra: store generated ResourceClaims in cache This addresses the following bad sequence of events: - controller creates ResourceClaim - updating pod status fails - pod gets retried before the informer receives the created ResourceClaim - another ResourceClaim gets created Storing the generated ResourceClaim in a MutationCache ensures that the controller knows about it during the retry. A positive side effect is that ResourceClaims now get indexed by pod owner and thus iterating over existing ones becomes a bit more efficient.	2023-07-11 14:23:49 +02:00
Patrick Ohly	ba810871ad	dra e2e: check that not generating a ResourceClaim works This is not something that normally happens, but the API supports it because it might be needed at some point, so we have to test it.	2023-07-11 14:23:49 +02:00
Patrick Ohly	0fc62d5ded	dra: generated files	2023-07-11 14:23:48 +02:00
Patrick Ohly	444d23bd2f	dra: generated name for ResourceClaim from template Generating the name avoids all potential name collisions. It's not clear how much of a problem that was because users can avoid them and the deterministic names for generic ephemeral volumes have not led to reports from users. But using generated names is not too hard either. What makes it relatively easy is that the new pod.status.resourceClaimStatus map stores the generated name for kubelet and node authorizer, i.e. the information in the pod is sufficient to determine the name of the ResourceClaim. The resource claim controller becomes a bit more complex and now needs permission to modify the pod status. The new failure scenario of "ResourceClaim created, updating pod status fails" is handled with the help of a new special "resource.kubernetes.io/pod-claim-name" annotation that together with the owner reference identifies exactly for what a ResourceClaim was generated, so updating the pod status can be retried for existing ResourceClaims. The transition from deterministic names is handled with a special case for that recovery code path: a ResourceClaim with no annotation and a name that follows the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod claim and gets added to the pod status. There's no immediate need for it, but just in case that it may become relevant, the name of the generated ResourceClaim may also be left unset to record that no claim was needed. Components processing such a pod can skip whatever they normally would do for the claim. To ensure that they do and also cover other cases properly ("no known field is set", "must check ownership"), resourceclaim.Name gets extended.	2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot	86038ae590	Merge pull request #116846 from moshe010/e2e--node-pod-resources kubelet pod-resources: add e2e for KubeletPodResourcesGet feature	2023-07-11 04:53:24 -07:00


117439 Commits