Commit Graph

129291 Commits

Author SHA1 Message Date
Jon Huhn
5760a4f282 DRA scheduler: device taints and tolerations
Thanks to the tracker, the plugin sees all taints directly in the device
definition and can compare them against the tolerations of a request while
trying to find a device for the request.

When the feature is turned off, taints are ignored during scheduling.
2025-03-19 09:18:38 +01:00
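The taint and toleration comparison described in the commit above can be illustrated with a minimal sketch. It is not the scheduler plugin's code: the `Taint` and `Toleration` types, the `device_allowed` helper, and the `feature_enabled` flag are hypothetical names, and the matching rules are assumed to mirror the usual node taint semantics (`Exists` and `Equal` operators, an empty effect matching any effect).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    # Hypothetical stand-in for a taint attached to a device.
    key: str
    value: str
    effect: str  # e.g. "NoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    # Hypothetical stand-in for a toleration carried by a request.
    key: str = ""            # empty key with operator "Exists" matches any key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration covers this single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def device_allowed(taints, tolerations, feature_enabled=True) -> bool:
    """A device is usable if every one of its taints is tolerated.

    With the feature turned off, taints are ignored entirely.
    """
    if not feature_enabled:
        return True
    return all(any(tolerates(tol, t) for tol in tolerations) for t in taints)
```

In this sketch an untainted device is always allowed, and a single untolerated taint is enough to exclude a device from the request.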
Patrick Ohly
a027b439e5 DRA: add device taint eviction controller
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
2025-03-19 09:18:38 +01:00
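Why tracking the pod UID matters can be shown with a small sketch. It assumes a toy in-memory cluster rather than the real API server; `PodRef`, `FakeCluster`, and `delete_if_uid_matches` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRef:
    # What a controller remembers when it schedules an eviction.
    namespace: str
    name: str
    uid: str


class FakeCluster:
    """Toy stand-in for the API server: maps (namespace, name) to a pod UID."""

    def __init__(self):
        self.pods = {}

    def create(self, namespace, name, uid):
        self.pods[(namespace, name)] = uid

    def delete_if_uid_matches(self, ref: PodRef) -> bool:
        """Delete the pod only if it is still the same object that was queued."""
        key = (ref.namespace, ref.name)
        if self.pods.get(key) != ref.uid:
            # The pod is gone or was replaced by a new pod with the same name.
            return False
        del self.pods[key]
        return True
```

If a pod is deleted and recreated under the same name between queuing the eviction and executing it, the UID comparison fails and the replacement pod is left alone.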
Keita Mochizuki
07a275437f
kubectl debug: Display a warning message that the debug container's capabilities may not work with a non-root user (#127696)
* Add warning message about capabilities of debug container

* fix1

* fix2

* fix3
2025-03-19 00:50:30 -07:00
vinay kulkarni
d5d008a6bd Invoke UpdateContainerResources or trigger container restarts (for RestartContainer policy) when memory requests are resized 2025-03-19 06:33:27 +00:00
Rita Zhang
0301e5a9f8
DRA: AdminAccess validate based on namespace label
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-18 22:56:54 -07:00
Kubernetes Prow Robot
3a14b619d5
Merge pull request #130910 from googs1025/fix/datarace
flake: fix data race for func TestBackoff_Step
2025-03-18 22:49:55 -07:00
Kubernetes Prow Robot
a6227695ab
Merge pull request #128402 from richabanker/mvp-agg-discovery
KEP 4020: Replace StorageVersionAPI with aggregated discovery to fetch served resources by a peer apiserver
2025-03-18 21:43:49 -07:00
Kubernetes Prow Robot
4dfed146e0
Merge pull request #130891 from pohly/dra-scheduler-plugin-unit-test-fix
DRA scheduler: fix potential panic during unit test verification
2025-03-18 20:34:16 -07:00
Kubernetes Prow Robot
9f8a84930d
Merge pull request #130573 from natasha41575/pod-conditions
[FG:PodObservedGenerationTracking] kubelet sets observedGeneration on pod conditions
2025-03-18 20:34:08 -07:00
Kubernetes Prow Robot
f287bc21b7
Merge pull request #130115 from danmillwood/danmillwood-dispatcher-test-patch
Fix intermittent failure in TestDispatcher test
2025-03-18 20:34:01 -07:00
Kubernetes Prow Robot
3b6596d1e0
Merge pull request #130020 from mozillazg/patch-3
test: fix a typo
2025-03-18 20:33:49 -07:00
googs1025
2f1f19a992 flake: fix data race for func TestBackoff_Step 2025-03-19 10:48:58 +08:00
Kubernetes Prow Robot
32b1819423
Merge pull request #130906 from serathius/streaming-validation
Update kube-openapi and integrate streaming tags validation
2025-03-18 18:46:00 -07:00
Kubernetes Prow Robot
7fb8bd8aca
Merge pull request #130905 from tallclair/ippr-beta
[FG:InPlacePodVerticalScaling] Graduate to Beta
2025-03-18 18:45:54 -07:00
Kubernetes Prow Robot
83f8513db8
Merge pull request #130550 from sanposhiho/async-preemption-beta
feat: graduate the async preemption feature to beta
2025-03-18 17:17:54 -07:00
Kubernetes Prow Robot
6a968c5789
Merge pull request #130904 from serathius/watchcache-corrupt
In TestListCorruptObject corrupt the object in etcd instead of changing encryption key
2025-03-18 16:09:55 -07:00
Marek Siarkowicz
75a4d136ab Update kube-openapi and integrate streaming tags validation 2025-03-18 23:52:55 +01:00
Tim Allclair
cd1a5c6d5c Fix Kubelet unit tests 2025-03-18 15:51:09 -07:00
Kubernetes Prow Robot
94d66387d0
Merge pull request #130553 from Phaow/vac-e2e
Add protection finalizer to vac when it is created
2025-03-18 14:59:54 -07:00
Kubernetes Prow Robot
0f7ab496c1
Merge pull request #130901 from deads2k/perms
add API approvers to generated applyconfigurations
2025-03-18 13:52:12 -07:00
Kubernetes Prow Robot
fe60c4316e
Merge pull request #130514 from xigang/daemonset
Add workqueue for node updates in DaemonSetController
2025-03-18 13:52:04 -07:00
Kubernetes Prow Robot
64621d17a6
Merge pull request #129832 from pohly/dra-seamless-upgrade
DRA: seamless driver upgrades
2025-03-18 13:51:51 -07:00
Marek Siarkowicz
506e4fed14 In TestListCorruptObject corrupt the object in etcd instead of changing encryption key
Changing the encryption key doesn't work with watch cache as it doesn't
break decoding newly written objects. A new object will be written using
a new key, and decoded using a new key.
2025-03-18 21:49:17 +01:00
Dawei Wei
413e867f53 [KEP-5100] WinDSR to Beta 2025-03-18 13:46:45 -07:00
Richa Banker
8b2cee83c1 Replace StorageVersion API with aggregated discovery to fetch served resources by a peer for MVP
Co-authored-by: Joe Betz <jpbetz@google.com>

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
2025-03-18 13:27:27 -07:00
Marek Siarkowicz
c09d87f79c Implement watchcache returning error from etcd that caused cache reinitialization 2025-03-18 21:20:11 +01:00
Patrick Ohly
13d04d4a92 DRA device taints: copy taintseviction controller
This is a verbatim copy of the current pkg/controller/taintseviction code,
revision fc268ecd09 (v1.33.0 plus one commit),
minus the TimedWorker helper.

The intent is to modify the code such that it enforces eviction of pods which
use tainted devices.
2025-03-18 20:52:54 +01:00
Patrick Ohly
6478ca5859 ktesting: fix per-test logging in TContext.Run and WithTB
WithTB was originally defined as "uses the existing logger". But what we want
there and in the newer TContext.Run is the usual per-test logging, now for the
sub-test.
2025-03-18 20:52:54 +01:00
Jon Huhn
939c9c0c6b DRA: add ResourceSlice tracker
The purpose of the tracker is to emulate a ResourceSlice informer, including
cache and event handlers. In contrast to that informer, the tracker adds taints
from a DeviceTaint such that they appear in the ResourceSlice device
definition. Code using the tracker doesn't need to care where the taints are
coming from.

The main advantage is that it enables fine-grained reactions to taints that
only affect a few devices, the common case. Without this tracker, the pod
eviction controller would have to sync all pods when any slice or any taint
changes.

In the scheduler it avoids re-evaluating the selection criteria repeatedly.
The tracker serves as a cross-pod-scheduling cache.
2025-03-18 20:52:54 +01:00
Patrick Ohly
99dbd85c45 DRA: generated files for device taints API 2025-03-18 20:52:54 +01:00
Patrick Ohly
797475e113 DRA: add device taints API
This adds the "DeviceTaint" top-level type to v1alpha3 and related fields to
ResourceSlice and ResourceClaim. It's complete enough to bring up an API server
and generate files.
2025-03-18 20:52:54 +01:00
Patrick Ohly
7fb028a433 DRA: add DRADeviceTaints feature 2025-03-18 20:52:54 +01:00
Kubernetes Prow Robot
fe27448ee4
Merge pull request #130833 from rzlink/master
Add Unit Tests for Windows DSR and Overlay Support
2025-03-18 12:45:56 -07:00
Eddie Torres
c766a52356
Implement KEP 4876 Mutable CSINode (#130007)
* Implement KEP-4876 Mutable CSINode Allocatable Count

Signed-off-by: torredil <torredil@amazon.com>

* Update TestGetNodeAllocatableUpdatePeriod

Signed-off-by: torredil <torredil@amazon.com>

* Implement CSINodeUpdater

Signed-off-by: torredil <torredil@amazon.com>

* Use sync.Once in csiNodeUpdater

Signed-off-by: torredil <torredil@amazon.com>

* Verify driver is installed before running periodic updates

Signed-off-by: torredil <torredil@amazon.com>

* Update NodeAllocatableUpdatePeriodSeconds type comment

Signed-off-by: torredil <torredil@amazon.com>

* Leverage apivalidation.ValidateImmutableField in ValidateCSINodeUpdate

Signed-off-by: torredil <torredil@amazon.com>

* Update strategy functions

Signed-off-by: torredil <torredil@amazon.com>

* Run hack/update-openapi-spec.sh

Signed-off-by: torredil <torredil@amazon.com>

* Update VolumeError.ErrorCode field

Signed-off-by: torredil <torredil@amazon.com>

* CSINodeUpdater improvements

Signed-off-by: torredil <torredil@amazon.com>

* Iron out concurrency in syncDriverUpdater

Signed-off-by: torredil <torredil@amazon.com>

* Run hack/update-openapi-spec.sh

Signed-off-by: torredil <torredil@amazon.com>

* Revise logging

Signed-off-by: torredil <torredil@amazon.com>

* Revise log in VerifyExhaustedResource

Signed-off-by: torredil <torredil@amazon.com>

* Update API validation

Signed-off-by: torredil <torredil@amazon.com>

* Add more code coverage

Signed-off-by: torredil <torredil@amazon.com>

* Fix pull-kubernetes-linter-hints

Signed-off-by: torredil <torredil@amazon.com>

* Update API types documentation

Signed-off-by: torredil <torredil@amazon.com>

* Update strategy and validation for new errorCode field

Signed-off-by: torredil <torredil@amazon.com>

* Update validation tests after strategy changes

Signed-off-by: torredil <torredil@amazon.com>

* Update VA status strategy

Signed-off-by: torredil <torredil@amazon.com>

---------

Signed-off-by: torredil <torredil@amazon.com>
2025-03-18 12:45:49 -07:00
Tim Allclair
9be73c0d67 Graduate InPlacePodVerticalScaling to beta 2025-03-18 12:26:42 -07:00
Kubernetes Prow Robot
55573a0739
Merge pull request #130823 from torredil/update-storage-csi-test-manifests
Update hostpathplugin image to v1.16.1
2025-03-18 11:28:01 -07:00
Kubernetes Prow Robot
b658aa1e79
Merge pull request #130796 from ndixita/pod-level-resources-ippr
Replace PodResourceAllocation with PodResourceInfoMap type and cleanup
2025-03-18 11:27:49 -07:00
David Eads
691398c856 add API approvers to generated applyconfigurations
API approvers review new fields and need permissions to approve the
files generated from those new fields
2025-03-18 13:29:10 -04:00
xigang
aa32537e9a Add workqueue for node updates in DaemonSetController
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-03-19 01:09:44 +08:00
mchtech
381ccf0f4c
Fix empty describedObject in hpa status (#124555)
* fix empty DescribedObject in hpa MetricStatus when object target type is AverageValue

Signed-off-by: mchtech <michu_an@126.com>

* add test

Signed-off-by: mchtech <michu_an@126.com>

---------

Signed-off-by: mchtech <michu_an@126.com>
2025-03-18 09:33:56 -07:00
Mark Sasnal
5625483527 KEP-4540: added e2e tests for strict-cpu-reservation option 2025-03-18 11:52:25 -04:00
Mark Sasnal
269bbac6e8 KEP-4540: moved StrictCPUReservationOption to beta feature gate 2025-03-18 11:52:23 -04:00
Natasha Sarkar
4c2be4bdde kubelet sets observedGeneration in conditions 2025-03-18 15:43:24 +00:00
Patrick Ohly
d95d6ba526 DRA scheduler: fix potential panic during unit test verification
If there was an unexpected status, the code extracting the expected error
message crashed with a panic. Happened once so far, for unknown reasons
because the unexpected status then didn't get logged.
2025-03-18 15:07:51 +01:00
Kubernetes Prow Robot
ded2956c83
Merge pull request #130886 from macsko/fix_race_when_closing_activeq
Fix a race when closing activeQ
2025-03-18 06:32:07 -07:00
Kubernetes Prow Robot
4b848a555f
Merge pull request #130863 from serathius/watchcache-negative-RV-consistent
Extend tests for negative RV with consistent reads
2025-03-18 06:31:57 -07:00
Kubernetes Prow Robot
8312d8e85e
Merge pull request #130560 from stlaz/remote-uid-config-beta
RemoteRequestHeaderUID: bump to beta, enabled by default
2025-03-18 06:31:49 -07:00
Kubernetes Prow Robot
8559194e11
Merge pull request #130878 from yongruilin/compatibility-version-featuregate
feat: Add alpha feature verification to feature gates
2025-03-18 05:25:49 -07:00
Patrick Ohly
582b421393 DRA kubeletplugin: add RollingUpdate
When the new RollingUpdate option is used, the DRA driver gets deployed such
that it uses unique socket paths and uses file locking to serialize gRPC
calls. This enables the kubelet to pick arbitrarily between two concurrently
running instances. The handover is seamless (no downtime, no removal of ResourceSlices
by the kubelet).

For file locking, the fileutils package from etcd is used because that was
already a Kubernetes dependency. Unfortunately that package brings in some
additional indirect dependency for DRA drivers (zap, multierr), but those
seem acceptable.
2025-03-18 12:32:35 +01:00
Patrick Ohly
b471c2c11f DRA kubelet: support rolling upgrades
The key difference is that the kubelet must remember all plugin instances
because it could always happen that the new instance dies and leaves only the
old one running.

The endpoints of each instance must be different. Registering a plugin with the
same endpoint as some other instance is not supported and triggers an error,
which should get reported as "not registered" to the plugin. This should only
happen when the kubelet missed some unregistration event and re-registers the
same instance again. The recovery in this case is for the plugin to shut down,
remove its socket, which should get observed by kubelet, and then try again
after a restart.
2025-03-18 12:32:35 +01:00
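The registration rules described in the commit above can be sketched as a small registry. This is an illustration under assumptions, not the kubelet's implementation: `PluginRegistry` and its methods are hypothetical, the socket paths in the usage are made up, and preferring the most recently registered instance is an assumption of this sketch.

```python
class PluginRegistry:
    """Remembers every registered instance of a plugin, keyed by plugin name."""

    def __init__(self):
        self._instances = {}  # plugin name -> list of endpoints, oldest first

    def register(self, name, endpoint):
        endpoints = self._instances.setdefault(name, [])
        if endpoint in endpoints:
            # Same endpoint as an existing instance: not supported.
            raise ValueError(f"endpoint {endpoint!r} already registered for {name!r}")
        endpoints.append(endpoint)

    def unregister(self, name, endpoint):
        endpoints = self._instances.get(name, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)
        if not endpoints:
            self._instances.pop(name, None)

    def pick(self, name):
        """Return an endpoint to use, or None if no instance is registered.

        Assumption of this sketch: prefer the newest instance and fall back
        to older ones when it goes away.
        """
        endpoints = self._instances.get(name)
        return endpoints[-1] if endpoints else None
```

Because older instances stay in the list, the registry can fall back to the old instance if the new one dies during an upgrade.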