Commit Graph

128939 Commits

Author SHA1 Message Date
Patrick Ohly
582b421393 DRA kubeletplugin: add RollingUpdate
When the new RollingUpdate option is used, the DRA driver gets deployed such
that it uses unique socket paths and uses file locking to serialize gRPC
calls. This enables the kubelet to pick arbitrarily between two concurrently
running instances. The handover is seamless (no downtime, no removal of ResourceSlices
by the kubelet).

For file locking, the fileutils package from etcd is used because that was
already a Kubernetes dependency. Unfortunately that package brings in some
additional indirect dependencies for DRA drivers (zap, multierr), but those
seem acceptable.
2025-03-18 12:32:35 +01:00
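The file-locking idea in the commit above can be illustrated with a minimal sketch. The commit itself uses the fileutils package from etcd in Go. The version below swaps in Python's fcntl.flock (POSIX only) purely for illustration, and the names exclusive_file_lock and serialized_call are hypothetical, not part of any Kubernetes API.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_file_lock(path):
    """Hold an exclusive advisory lock on `path` while the block runs."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def serialized_call(lock_path, handler, *args):
    """Run `handler` under the shared lock (hypothetical helper).

    Two driver instances that agree on `lock_path` never execute
    their handlers at the same time.
    """
    with exclusive_file_lock(lock_path):
        return handler(*args)
```

Because each call opens the lock file anew, the lock also excludes concurrent callers within a single process, which is the assumption this sketch relies on to mimic two driver instances.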
Patrick Ohly
b471c2c11f DRA kubelet: support rolling upgrades
The key difference is that the kubelet must remember all plugin instances
because it could always happen that the new instance dies and leaves only the
old one running.

The endpoints of each instance must be different. Registering a plugin with the
same endpoint as some other instance is not supported and triggers an error,
which should get reported as "not registered" to the plugin. This should only
happen when the kubelet missed some unregistration event and re-registers the
same instance again. The recovery in this case is for the plugin to shut down,
remove its socket, which should get observed by kubelet, and then try again
after a restart.
2025-03-18 12:32:35 +01:00
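As a rough illustration of the bookkeeping described in the commit above, the sketch below keeps every registered instance per driver and rejects a duplicate endpoint. It is not the kubelet's implementation: the class PluginRegistry, its method names, and the choice to prefer the most recently registered instance are all assumptions made for this example.

```python
class PluginRegistry:
    """Hypothetical registry that remembers all instances of each driver."""

    def __init__(self):
        # driver name -> endpoints in registration order
        self._instances = {}

    def register(self, driver, endpoint):
        endpoints = self._instances.setdefault(driver, [])
        if endpoint in endpoints:
            # Same endpoint as an existing instance: not supported.
            raise ValueError(f"{driver}: endpoint {endpoint} already registered")
        endpoints.append(endpoint)

    def unregister(self, driver, endpoint):
        endpoints = self._instances.get(driver, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)
        if not endpoints:
            self._instances.pop(driver, None)

    def pick(self, driver):
        """Return the newest remaining endpoint (an assumed policy), or None."""
        endpoints = self._instances.get(driver)
        return endpoints[-1] if endpoints else None
```

Keeping the older endpoint in the list is what lets the registry fall back to it if the new instance dies first.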
Patrick Ohly
760903c0de DRA kubelet: give DRA drivers a 30 second grace period for updates
When doing an update of a DaemonSet, first the old pod gets stopped and
then the new one is started. This causes the kubelet to remove all
ResourceSlices directly after the old pod is removed and forces the new pod to recreate all of
them.

Now the kubelet waits 30 seconds before it deletes ResourceSlices. If a new
driver registers during that period, nothing is done at all. The new driver
finds the existing ResourceSlices and only needs to update them if something
changed.

The downside is that if the driver gets removed permanently, this creates a
delay where pods might still get scheduled to the node although the driver is
not going to run there anymore and thus the pods will be stuck.
2025-03-18 12:32:35 +01:00
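The grace-period behavior described in the commit above can be sketched as a small deadline tracker. This is only an illustration under stated assumptions: the class SliceCleanup and its methods are hypothetical names, and the clock is passed in explicitly so the logic is easy to follow. Only the 30-second value comes from the commit message.

```python
GRACE_PERIOD_SECONDS = 30.0  # value taken from the commit message


class SliceCleanup:
    """Hypothetical tracker for delayed ResourceSlice deletion."""

    def __init__(self, grace=GRACE_PERIOD_SECONDS):
        self._grace = grace
        self._deadlines = {}  # driver name -> time at which deletion is due

    def driver_unregistered(self, driver, now):
        # Start the grace period instead of deleting immediately.
        self._deadlines[driver] = now + self._grace

    def driver_registered(self, driver):
        # A new instance arrived in time: cancel the pending deletion.
        self._deadlines.pop(driver, None)

    def due_for_deletion(self, now):
        """Return drivers whose grace period expired and forget them."""
        due = sorted(d for d, t in self._deadlines.items() if now >= t)
        for d in due:
            del self._deadlines[d]
        return due
```

The downside noted in the commit is visible here as well: a driver that never returns stays listed until its deadline passes, so its slices remain visible for up to 30 seconds.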
Patrick Ohly
0490b9f0b7 kubelet: document seamless upgrade support and guidance
This tries to capture the current state of affairs and a potential plan for
supporting seamless upgrades better.
2025-03-17 14:43:08 +01:00
Kubernetes Prow Robot
5a6ace2aa0 Merge pull request #130811 from serathius/watchcache-test-negative-rv
Add test cases for negative resource version in TestList
2025-03-17 05:37:48 -07:00
Kubernetes Prow Robot
e2a77c2a05 Merge pull request #130815 from serathius/watchcache-simplify-bypass-test
Simplify bypass test by just testing shouldDelegateList function
2025-03-17 04:30:00 -07:00
Kubernetes Prow Robot
d2ef120924 Merge pull request #130813 from serathius/watchcache-consistent-list-flake
Fix flaky RunTestConsistentList
2025-03-17 04:29:49 -07:00
Kubernetes Prow Robot
abad982cf8 Merge pull request #130857 from thockin/kk_small_vg_diffs
Port small deltas from validation-gen dev branch to master
2025-03-16 15:49:47 -07:00
Tim Hockin
8f69d596e8 Fix pkg names != dir in tests 2025-03-16 14:41:18 -07:00
Tim Hockin
b47e839e4e Comment on origin and JSON schema 2025-03-16 14:32:49 -07:00
Tim Hockin
46d5438c14 Fix import groupings 2025-03-16 14:29:52 -07:00
Tim Hockin
1ff4433c87 Fix whitespace in validateFalse test fixture
This diff should be entirely whitespace.
2025-03-16 14:29:19 -07:00
Tim Hockin
4c0c2d21ea Use origin in validateFalse's own test 2025-03-16 14:27:31 -07:00
Tim Hockin
d1d77cd553 Use test.Helper in helper funcs 2025-03-16 14:26:41 -07:00
Kubernetes Prow Robot
f007012f5f Merge pull request #130700 from pohly/dra-kubeletplugin-helper
DRA kubeletplugin: turn helper into wrapper
2025-03-16 01:55:47 -07:00
Kubernetes Prow Robot
157f42bff3 Merge pull request #129295 from carlory/fg-PersistentVolumeLastPhaseTransitionTime
Remove general available feature-gate PersistentVolumeLastPhaseTransitionTime
2025-03-15 05:25:47 -07:00
carlory
1f04af7947 Remove general available feature-gate PersistentVolumeLastPhaseTransitionTime 2025-03-15 16:05:34 +08:00
Kubernetes Prow Robot
555efba04a Merge pull request #128123 from felipeagger/feat/add-updatepodsandbox-cri-method
[FG:InPlacePodVerticalScaling] Add UpdatePodSandboxResources CRI method
2025-03-14 23:07:46 -07:00
Kubernetes Prow Robot
18e5a4d585 Merge pull request #130827 from thockin/kk_refactor_FunctionGen
validation-gen: Simplify FunctionGen and VariableGen
2025-03-14 19:09:53 -07:00
Kubernetes Prow Robot
b29c2f5343 Merge pull request #130738 from ritazh/dra-user-rbac
DRA: add user rbac
2025-03-14 19:09:46 -07:00
Kubernetes Prow Robot
c12006e8b4 Merge pull request #130742 from gauravkghildiyal/kep-2433-ga
Promote TopologyAwareHints feature-gate to GA
2025-03-14 17:41:53 -07:00
Kubernetes Prow Robot
8de738e336 Merge pull request #129923 from vinayakankugoyal/gitRepo
KEP-5040: Disable git_repo volume driver.
2025-03-14 17:41:46 -07:00
Kubernetes Prow Robot
88642906f0 Merge pull request #130820 from rata/userns-stub-windows
userns: Don't special-case windows for the kubelet userns mappings
2025-03-14 16:27:57 -07:00
Kubernetes Prow Robot
e981d1302b Merge pull request #130728 from jpbetz/enable-declarative-validation
Enable DeclarativeValidation feature gate by default
2025-03-14 16:27:46 -07:00
Rita Zhang
06482b6bd3 address comment
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-14 13:51:43 -07:00
Kubernetes Prow Robot
22ff6b4918 Merge pull request #130818 from macsko/fix_flaky_nonimatednodename_integration_test
Wait for node to appear in cache in TestUpdateNominatedNodeName integration test
2025-03-14 13:39:58 -07:00
Kubernetes Prow Robot
687a2f0d87 Merge pull request #130763 from aramase/aramase/t/kep_4412_alpha_plugin_unit_tests
Add unit tests for credential provider in service account mode
2025-03-14 13:39:50 -07:00
Filipe Xavier
41e3efdb60 change doPodResizeAction to call updatePodSandBoxResources inside setPodCgroupConfig 2025-03-14 16:49:15 -03:00
Tim Hockin
a758e725b8 Non-pointer VariableGen 2025-03-14 12:33:46 -07:00
Tim Hockin
4e3d114c26 Refactor VariableGen - no interface needed 2025-03-14 12:33:45 -07:00
Tim Hockin
6a59dcfa1d Non-pointer FunctionGen 2025-03-14 12:33:44 -07:00
Tim Hockin
0b29555323 Refactor FunctionGen - no interface needed 2025-03-14 12:33:43 -07:00
Kubernetes Prow Robot
c79e13c177 Merge pull request #130821 from BenTheElder/revert-procs
Revert "stop overriding max concurrency in CI, let automax procs handle it"
2025-03-14 12:32:07 -07:00
Kubernetes Prow Robot
d7a720f393 Merge pull request #130819 from jpbetz/fix-subresource-disablement
Guard declarative validation code to only validate spec since subresources are not yet supported
2025-03-14 12:32:00 -07:00
Kubernetes Prow Robot
8e147365a8 Merge pull request #130803 from siyuanfoundation/owner
chore: Add update-featuregates to update.sh
2025-03-14 12:31:53 -07:00
Kubernetes Prow Robot
3c97b23a20 Merge pull request #130795 from thockin/kk_emit_comments_before_validations
Validation-gen: emit comments before validations
2025-03-14 12:31:47 -07:00
Vinayak Goyal
282e1490d4 KEP-5040: Disable git_repo volume driver. 2025-03-14 19:29:03 +00:00
Gaurav Ghildiyal
9aeeb53095 Remove usage of TopologyAwareHints feature-gate from kube-proxy packages.
TopologyAwareHints feature-gate is GA'd and enabled by default since 1.33. Since
it is also locked-to-default, we can remove flag-usages in kube-proxy.

NOTE that as per
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/feature-gates.md#disablement-tests:
_"Disablement tests are only required to be preserved for components and
libraries that support compatibility version. Tests for node and kubelet are
unaffected by compatibility version."_
2025-03-14 12:06:40 -07:00
Gaurav Ghildiyal
25e041470e Run ./hack/update-featuregates.sh 2025-03-14 12:06:02 -07:00
Gaurav Ghildiyal
619957c976 Graduate TopologyAwareHints feature-gate to GA in 1.33 and LockToDefault 2025-03-14 12:06:02 -07:00
Joe Betz
a6c94ea605 Enable DeclarativeValidation feature gate by default 2025-03-14 14:44:10 -04:00
Anish Ramasekar
95d411382f Fix comment for GetServiceAccountFunc type
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2025-03-14 11:21:58 -07:00
Anish Ramasekar
01302639f5 Add unit tests for credential provider in service account mode
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2025-03-14 11:21:08 -07:00
Kubernetes Prow Robot
f9e92a1aa7 Merge pull request #130814 from LionelJouin/kep-4817-beta
[KEP-4817] DRAResourceClaimDeviceStatus to Beta
2025-03-14 10:57:58 -07:00
Kubernetes Prow Robot
45f5ecfefd Merge pull request #125452 from carlory/clean-e2efeatures
remove unneeded e2e features
2025-03-14 10:57:47 -07:00
Maciej Skoczeń
f6a35c55f2 Wait for node to appear in cache in TestUpdateNominatedNodeName integration test 2025-03-14 17:06:30 +00:00
Benjamin Elder
cf20c21ef8 Revert "stop overriding max concurrency in CI, let automax procs handle it"
This reverts changes from commit 9e42056a0d.

NOTE: this is not a clean revert because of further changes.
2025-03-14 09:42:04 -07:00
Rita Zhang
04ac6df8a9 add dra to edit role and add featuregate test
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-14 09:14:15 -07:00
Rita Zhang
718ed7d0b5 dra: add user rbac
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-14 09:14:15 -07:00
Joe Betz
5a98d4dbb4 Limit declarative validation to spec until subresource support is added 2025-03-14 11:58:19 -04:00