Commit Graph

117461 Commits

Author SHA1 Message Date
Patrick Ohly
08d40f53a7 dra: test with and without immediate ReservedFor
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path were kube-scheduler itself adds the pod to ReservedFor.
2023-07-12 16:57:17 +02:00
Patrick Ohly
98ba89d31d resourceclaim controller: avoid caching deleted pod unnecessarily
We don't need to remember that a pod got deleted when it had no resource claims
because the code which checks the cached UIDs only checks for pods which have
resource claims.
2023-07-12 16:57:17 +02:00
Amine
28b6c90696 Move DeleteFunc logging to level 2 2023-07-12 15:50:40 +01:00
Amine
761016482d Properly setup mutatingWebhookConfigurationManager{} 2023-07-12 15:50:40 +01:00
Amine
747dbd9b6b run ./hack/verify-gofmt.sh 2023-07-12 15:50:16 +01:00
Amine
1eb60939fe Add smart reload for MutatingWebhooks 2023-07-12 15:50:16 +01:00
Amine
aeefb762ec Properly handle parameter in shareInformer.DeleteFunc 2023-07-12 15:50:16 +01:00
Amine
a01a8cb07e Fix webhook accessors caching pattern 2023-07-12 15:50:16 +01:00
Amine
7d3d44af77 Add webhookAccessors smart reloads unit tests
This patch adds few unit tests to assert that the webhook accessors are
only recreate when they are update in the api-server.

In order to test this feature we had to make few changes to wb manager
that allows us to mock `NewValidatingWebhookAccessor` external function.
2023-07-12 15:50:16 +01:00
Amine
c6f36e8702 Fix deadlock issue
This patch fixes the deadlock issue by using a map to cache already
initiated Webhooks instead of using `needRefresh` map.
2023-07-12 15:50:16 +01:00
Amine
99875b3fb7 Webhook Accessors Smart Recompilation
Addresses https://github.com/kubernetes/kubernetes/issues/116588

This is an WIP patch trying to avoid recompiling CELs expressions when
recreation Validating/Mutating WebhookAccessors.

Maybe we should also concider using generatic.Controller from
5f59f44983/staging/src/k8s.io/apiserver/pkg/admission/plugin/validatingadmissionpolicy/internal/generic/controller.go
2023-07-12 15:50:14 +01:00
Kubernetes Prow Robot
da2d500c80
Merge pull request #119252 from serathius/flakes
Fix TestConditionalProgressRequester and TestWaitUntilFreshAndListTimeout flakes
2023-07-12 07:49:26 -07:00
Kubernetes Prow Robot
be222f38f0
Merge pull request #119058 from TommyStarK/dra-state-checkpoint-unit-test
dynamic resource allocation: Improve code coverage of state checkpoint
2023-07-12 07:49:14 -07:00
Patrick Ohly
7d064812bb kube-controller-manager: finish conversion to contextual logging
This removes all exceptions and fixes the remaining unconverted log calls.
2023-07-12 14:57:29 +02:00
Kubernetes Prow Robot
3cc729fc7f
Merge pull request #119195 from pohly/dra-reallocate-flake
dra e2e: fix "reallocation works" flake
2023-07-12 05:55:25 -07:00
Kubernetes Prow Robot
529eeb78ef
Merge pull request #119078 from pohly/dra-scheduler-queueing-hints
dra: scheduler queueing hints
2023-07-12 05:55:13 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Marek Siarkowicz
7a63997c8a Improve apiserver storage size metric to allow it's graduation
Change name to make it compliant with prometheus guidelines.
Calculate it on demand instead of periodic to comply with prometheus standards.
Replace "endpoint" with "server" label to make it semantically consistent with storage factory
2023-07-12 14:33:10 +02:00
dprotaso
610509fedd Update standard app protocols
Add websocket support - see https://github.com/kubernetes/enhancements/pull/3996
2023-07-12 08:28:50 -04:00
Dr. Stefan Schimanski
f1f2fa9da8
kube-apiserver/corerest: split apart generic code 2023-07-12 14:13:10 +02:00
Francesco Romani
01c3a51a78 node: podresources: getallocatable: move to GA
lock the feature gate to GA, and remove the now-redundant code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 14:11:22 +02:00
Marek Siarkowicz
c1decb6763 Fix TestConditionalProgressRequester and TestWaitUntilFreshAndListTimeout flakes 2023-07-12 14:03:32 +02:00
Kubernetes Prow Robot
0da0b7a85d
Merge pull request #119251 from soltysh/issue119230
Match both old and new kubectl version for a while in e2e
2023-07-12 04:51:12 -07:00
Patrick Ohly
1b8ddf6b79 podgc controller: convert to contextual logging 2023-07-12 13:45:10 +02:00
TommyStarK
f924bf95df dynamic resource allocation: Improve code coverage of state checkpoint
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-12 13:27:18 +02:00
Francesco Romani
c635a7e7d8 node: devicemgr: topomgr: add logs
One of the contributing factors of issues #118559 and #109595 hard to
debug and fix is that the devicemanager has very few logs in important
flow, so it's unnecessarily hard to reconstruct the state from logs.

We add minimal logs to be able to improve troubleshooting.
We add minimal logs to be backport-friendly, deferring a more
comprehensive review of logging to later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
d78671447f e2e: node: add test to check device-requiring pods are cleaned up
Make sure orphanded pods (pods deleted while kubelet is down) are
handled correctly.
Outline:
1. create a pod (not static pod)
2. stop kubelet
3. while kubelet is down, force delete the pod on API server
4. restart kubelet
the pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups.

There is a similar test already, but here we want to check device
assignment.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
5cf50105a2 e2e: node: devices: improve the node reboot test
The recently added e2e device plugins test to cover node reboot
works fine if runs every time on CI environment (e.g CI) but
doesn't handle correctly partial setup when run repeatedly on
the same instance (developer setup).

To accomodate both flows, we extend the error management, checking
more error conditions in the flow.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
b926aba268 e2e: node: devicemanager: update tests
Fix e2e device manager tests.
Most notably, the workload pods needs to survive a kubelet
restart. Update tests to reflect that.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
3bcf4220ec kubelet: devices: skip allocation for running pods
When kubelet initializes, runs admission for pods and possibly
allocated requested resources. We need to distinguish between
node reboot (no containers running) versus kubelet restart (containers
potentially running).

Running pods should always survive kubelet restart.
This means that device allocation on admission should not be attempted,
because if a container requires devices and is still running when kubelet
is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step
and handle it explicitely. We need to inform
the devicemanager about which pods are already running.

Note that if container runtime is down when kubelet restarts, the
approach implemented here won't work. In this scenario, so on kubelet
restart containers will again fail admission, hitting
https://github.com/kubernetes/kubernetes/issues/118559 again.
This scenario should however be pretty rare.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Maciej Szulik
ab3a0b78ea
Match both old and new kubectl version for a while in e2e 2023-07-12 12:49:33 +02:00
Kubernetes Prow Robot
745cfa35bd
Merge pull request #119147 from mengjiao-liu/contextual-logging-controller-disruption
Migrate /pkg/controller/disruption to structured and contextual logging
2023-07-12 03:35:25 -07:00
Kubernetes Prow Robot
a8093823c3
Merge pull request #119042 from sttts/sttts-restcore-split
cmd/kube-apiserver: turn core (legacy) rest storage into standard RESTStorageProvider
2023-07-12 03:35:17 -07:00
Patrick Ohly
c143a875ed dra e2e: fix "reallocation works" flake
The main problem probably was that
https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first
pod before setting up the callback which blocks allocating one claim for that
pod. This is racy because allocations happen in the background.

The test also was unnecessarily complex and hard to read:
- The intended effect can be achieved with three instead of four claims.
- It wasn't clear which claim has "external-claim-other" as name.
  Using the claim variable avoids that.
2023-07-12 11:20:47 +02:00
Patrick Ohly
6f1a29520f scheduler/dra: reduce pod scheduling latency
This is a combination of two related enhancements:
- By implementing a PreEnqueue check, the initial pod scheduling
  attempt for a pod with a claim template gets avoided when the claim
  does not exist yet.
- By implementing cluster event checks, only those pods get
  scheduled for which something changed, and they get scheduled
  immediately without delay.
2023-07-12 11:17:04 +02:00
Kubernetes Prow Robot
a8b90c9008
Merge pull request #119247 from saschagrunert/setcap
setcap: update to debian bookworm v1.0.0
2023-07-12 02:11:12 -07:00
Patrick Ohly
e01db32573 scheduler util: handle cache.DeletedFinalStateUnknown in As
Informer callbacks must be prepared to get cache.DeletedFinalStateUnknown as
the deleted object. They can use that as hint that some information may have
been missed, but typically they just retrieve the stored object inside it.
2023-07-12 11:07:59 +02:00
Patrick Ohly
ef48efc736 scheduler dynamicresources: minor logging improvements
This makes some complex values a bit more readable.
2023-07-12 11:07:59 +02:00
Sascha Grunert
363874e9b5
setcap: update to debian bookworm v1.0.0
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-07-12 09:29:55 +02:00
Kubernetes Prow Robot
95e915c428
Merge pull request #119229 from HirazawaUi/fix-convert-slice
Fix the converts an empty string to nil.
2023-07-11 23:39:11 -07:00
Kubernetes Prow Robot
5130dad2cf
Merge pull request #118408 from danwinship/local-detector
kube-proxy local traffic detector single-vs-dual-stack cleanup
2023-07-11 21:19:11 -07:00
Mengjiao Liu
19869478c1 Migrate /pkg/controller/disruption to structured and contextual logging 2023-07-12 11:30:45 +08:00
Kubernetes Prow Robot
98e7c2a751
Merge pull request #119237 from jpbetz/jpbetz-apiserver-integration-owner
Add jpbetz as approver of apiserver integration tests
2023-07-11 20:03:18 -07:00
Kubernetes Prow Robot
2d9c951abe
Merge pull request #117011 from fabi200123/Add-Node-Log-Query-Tests-
Add e2e tests for feature NodeLogQuery
2023-07-11 20:03:11 -07:00
Kubernetes Prow Robot
d45b6ba676
Merge pull request #119225 from iholder101/bump-cadvisor/v0.47.3
Bump cadvisor version to v0.47.3
2023-07-11 16:19:11 -07:00
Kubernetes Prow Robot
da8974157f
Merge pull request #119209 from jiahuif-forks/feature/validating-admission-policy/typechecking-expension
ValidatingAdmissionPolicy: expended type checking to messageExpression
2023-07-11 14:19:12 -07:00
Monis Khan
b81f07ac9a
Add enj to apiserver options approver
Signed-off-by: Monis Khan <mok@microsoft.com>
2023-07-11 16:07:44 -04:00
Kubernetes Prow Robot
4954c7bac4
Merge pull request #118540 from jiahuif-forks/feature/validating-admission-policy/authorizer-typechecking-support
add support for authorizer to type checking.
2023-07-11 12:41:22 -07:00
Kubernetes Prow Robot
6ffca50136
Merge pull request #116443 from benluddy/secondary-authz-decision-caching
Cache authz decisions within the scope of validating policy admission.
2023-07-11 12:41:11 -07:00
Joe Betz
6d6595d0f6 Add jpbetz as approver of apiserver integration tests 2023-07-11 14:36:45 -04:00