6819 Commits

Author SHA1 Message Date
Jan Safranek
7c5a9b9833 Add unit test with CSIDriver.SELinuxMount=false
Add unit test with a volume plugin that does not support SELinux. That
simulates a CSI driver whose spec.SELinuxMount is empty or false.

This requires a little refactoring: each unit test now has a flag indicating
whether it runs with a volume plugin that supports SELinux.
2026-01-12 14:48:14 +01:00
Jan Safranek
6e491604a4 Use only enqueuePod to add pods to the controller queue
enqueuePod already creates the right key for a pod; it's better to reuse it
than to copy the code around.
2026-01-12 14:48:14 +01:00
Jan Safranek
1c3b0b1138 Fix policy of Pods with unknown SELinux label
Reset SELinuxChangePolicy of Pods that have no SELinux label set to
Recursive. Kubelet cannot mount with `-o context=<label>`, if the label is
not known.

This fixes the e2e test error revealed by the previous commit. That commit
changed the e2e test to check for events when no events are expected, and it
found a warning about a Pod with no label but with the MountOption policy.
2026-01-12 14:48:13 +01:00
Jan Safranek
d05bfe8123 Add new unit tests 2026-01-12 14:48:13 +01:00
Jan Safranek
5602c5e6b5 Rework unit tests to builder pattern 2026-01-12 14:48:13 +01:00
Jan Safranek
9222f08d22 selinux: Do not report conflicts with finished pods
When a Pod reaches its final state (Succeeded or Failed), its volumes are
getting unmounted and therefore their SELinux mount option will not
conflict with any other pod.

Let the SELinux controller monitor "pod updated" events to detect when a pod
has finished.
2026-01-12 14:48:13 +01:00
Jan Safranek
f02a1fc357 refactoring: use a common function to enqueue Pod
addPod and deletePod have the same implementation, merge them into
enqueuePod
2026-01-12 14:48:13 +01:00
Filip Křepinský
6a783eb8f9 mark QuotaMonitor as not running and invalidate monitors list
to prevent close of closed channel panic
2026-01-08 13:46:29 +01:00
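The panic named in the commit above comes from a general Go rule: closing an already closed channel panics. A minimal sketch of the guard, assuming a hypothetical monitorSet type (not the actual QuotaMonitor code), marks the set as not running and drops the channel list so that a second stop has nothing left to close:

```go
package main

import (
	"fmt"
	"sync"
)

// monitorSet is a hypothetical stand-in for a group of running monitors,
// each owning one stop channel. It is not the Kubernetes QuotaMonitor type.
type monitorSet struct {
	mu      sync.Mutex
	running bool
	stopChs []chan struct{}
}

// start creates n stop channels and marks the set as running.
func (m *monitorSet) start(n int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for i := 0; i < n; i++ {
		m.stopChs = append(m.stopChs, make(chan struct{}))
	}
	m.running = true
}

// stop closes every stop channel exactly once and returns how many it closed.
// Clearing the list and the running flag makes repeated calls harmless.
func (m *monitorSet) stop() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.running {
		return 0
	}
	closed := 0
	for _, ch := range m.stopChs {
		close(ch)
		closed++
	}
	m.stopChs = nil   // invalidate the list
	m.running = false // mark as not running
	return closed
}

func main() {
	m := &monitorSet{}
	m.start(2)
	fmt.Println(m.stop(), m.stop()) // prints "2 0"
}
```

Without the running flag and the cleared list, the second stop call would reach close on the same channels again and panic.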
Kubernetes Prow Robot
a73380b215 Merge pull request #134676 from klueska/automated-cherry-pick-of-#132533-release-1.33
Automated cherry pick of #132533: DRA: fix deleting orphaned ResourceClaim on startup
2025-11-08 02:54:51 -08:00
Jon Huhn
ab145bbf2c DRA: fix deleting orphaned ResourceClaim on startup 2025-10-17 09:57:01 -07:00
Jordan Liggitt
ec583fd4f1 Include relevant dimensions in pod controller indexing 2025-10-17 09:04:21 -04:00
xigang
fcad64b91c Add namespace-aware orphan pod indexing
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-10-16 11:39:29 -04:00
xigang
ce801e5561 Fix DaemonSet misscheduled status not updating on node taint changes
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-09-10 20:58:34 +08:00
Jan Safranek
c1a0f959af Add a note about Conflicts return value 2025-08-28 10:31:15 +02:00
Jan Safranek
2d6c21edd1 Fix SELinux label comparison
The comparison of SELinux labels in KCM tolerates missing fields - the
operating system is going to default them from its defaults, but in KCM we
don't know what the defaults are.

But the OS won't default the last component, "level", which also includes
categories. Make sure that a label with a level set conflicts with a label
whose level is "", because that is what will conflict on the OS too.
2025-08-28 10:31:15 +02:00
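The comparison rule described in the commit above can be sketched as follows. This is an illustration only, assuming labels in the usual user:role:type:level form; labelsConflict is a hypothetical helper, not the function in the Kubernetes source. Empty user, role, or type fields are tolerated, while the level must match exactly:

```go
package main

import (
	"fmt"
	"strings"
)

// labelsConflict is a hypothetical helper that reports whether two SELinux
// labels of the form user:role:type:level would conflict.
func labelsConflict(a, b string) bool {
	// The level may itself contain ':' (for example "s0:c1,c2"),
	// so split into at most four parts.
	pa := strings.SplitN(a, ":", 4)
	pb := strings.SplitN(b, ":", 4)
	for len(pa) < 4 {
		pa = append(pa, "")
	}
	for len(pb) < 4 {
		pb = append(pb, "")
	}
	// user, role, type: a missing field is tolerated on either side.
	for i := 0; i < 3; i++ {
		if pa[i] != "" && pb[i] != "" && pa[i] != pb[i] {
			return true
		}
	}
	// level: not defaulted, so "" conflicts with any level that is set.
	return pa[3] != pb[3]
}

func main() {
	full := "system_u:object_r:container_file_t:s0:c1,c2"
	fmt.Println(labelsConflict(full, ":::s0:c1,c2"))                        // false
	fmt.Println(labelsConflict(full, "system_u:object_r:container_file_t")) // true
}
```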
Eric Lin
d2be12ab76 Clean backoff record earlier
Once a job deletion event is received, the controller cleans the backoff
records for that job before enqueueing it. This avoids a race condition in
which syncJob() may incorrectly use stale backoff records for a newly created
job with the same key.

Co-authored-by: Michal Wozniak <michalwozniak@google.com>
2025-06-06 20:45:35 +00:00
Filip Křepinský
8db1426554 rename DeploymentPodReplacementPolicy FG to DeploymentReplicaSetTerminatingReplicas 2025-03-27 20:27:44 +01:00
Jean-Marc François
2dd9eda47f Add configurable tolerance logic. 2025-03-21 18:48:37 -04:00
Kubernetes Prow Robot
b0d6079ddc Merge pull request #130947 from pohly/dra-device-taints-flake
DRA device taints: fix some race conditions
2025-03-20 14:16:55 -07:00
Kubernetes Prow Robot
dca334e350 Merge pull request #130859 from hakuna-matatah/optimize-ds
Optimize DS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
2025-03-20 14:16:39 -07:00
Patrick Ohly
cfb9486417 DRA taint eviction: avoid nil panic
The timed worker queue actually can have nil entries in its map if the work was
kicked off immediately. This looks like an unnecessary special case (it would
be fine to call AfterFunc with a duration <= 0 and it would do the right
thing), but to avoid more sweeping changes the fix consists of documenting this
special behavior and adding a nil check.
2025-03-20 19:49:54 +01:00
Patrick Ohly
56adcd06f3 DRA device eviction: fix eviction triggered by pod scheduling
Normally the scheduler shouldn't schedule when there is a taint, but perhaps it
didn't know yet.

The TestEviction/update test covered this, but only failed under the right
timing conditions. The new event handler test case covers it reliably.
2025-03-20 19:49:54 +01:00
Patrick Ohly
5856d3ee6f DRA taint eviction: fix waiting in unit test
Events get recorded in the apiserver asynchronously, so even if the test knows
that the pod has been evicted because the pod is deleted, it still has to
also check for the event to be recorded.

This caused a flake in the "Consistently" check of events.
2025-03-20 17:59:48 +01:00
Patrick Ohly
ac6e47cb14 DRA taint eviction: improve error handling
There was one error path that led to a "controller has shut down" log
message. Other errors caused different log entries or are so unlikely (event
handler registration failure!) that they weren't checked at all.

It's clearer to let Run return an error in all cases and then log the
"controller has shut down" error at the call site. This also enables tests to
mark themselves as failed, should that ever happen.
2025-03-20 17:59:06 +01:00
Harish Kuna
a67cc3aac1 Reduce locking duration on cache to fetch data in DaemonSet Controller 2025-03-20 16:00:27 +00:00
Kubernetes Prow Robot
68ba091fca Merge pull request #130844 from danwinship/improved-traffic-distribution
KEP-3015 PreferSameZone/PreferSameNode traffic distribution
2025-03-19 13:00:48 -07:00
Kubernetes Prow Robot
ab3cec0701 Merge pull request #130447 from pohly/dra-device-taints
device taints and tolerations (KEP 5055)
2025-03-19 13:00:32 -07:00
Kubernetes Prow Robot
2b79593ece Merge pull request #130225 from ritazh/dra-admin-access-namespace
DRA: AdminAccess validate based on namespace label
2025-03-19 10:18:50 -07:00
Dan Winship
19952a2b7b Implement the EndpointSlice controller side of PreferSameZone/PreferSameNode 2025-03-19 08:39:13 -04:00
Patrick Ohly
9f161590be metrics testing: add type aliases to avoid direct prometheus imports
In tests it is sometimes unavoidable to use the Prometheus types directly,
for example when writing a custom gatherer which needs to normalize data
before testing it. device_taint_eviction_test.go does this to strip
out unpredictable data in a histogram.

With type aliases in a package that is explicitly meant for tests we
can avoid adding exceptions for such tests to the global exception list.
2025-03-19 09:18:38 +01:00
Patrick Ohly
a027b439e5 DRA: add device taint eviction controller
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
2025-03-19 09:18:38 +01:00
Rita Zhang
0301e5a9f8 DRA: AdminAccess validate based on namespace label
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-18 22:56:54 -07:00
Kubernetes Prow Robot
a6227695ab Merge pull request #128402 from richabanker/mvp-agg-discovery
KEP 4020: Replace StorageVersionAPI with aggregated discovery to fetch served resources by a peer apiserver
2025-03-18 21:43:49 -07:00
Kubernetes Prow Robot
9f8a84930d Merge pull request #130573 from natasha41575/pod-conditions
[FG:PodObservedGenerationTracking] kubelet sets observedGeneration on pod conditions
2025-03-18 20:34:08 -07:00
Kubernetes Prow Robot
fe60c4316e Merge pull request #130514 from xigang/daemonset
Add workqueue for node updates in DaemonSetController
2025-03-18 13:52:04 -07:00
Richa Banker
8b2cee83c1 Replace StorageVersion API with aggregated discovery to fetch served resources by a peer for MVP
Co-authored-by: Joe Betz <jpbetz@google.com>

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
2025-03-18 13:27:27 -07:00
Patrick Ohly
13d04d4a92 DRA device taints: copy taintseviction controller
This is a verbatim copy of the current pkg/controller/taintseviction code,
revision fc268ecd09 (v1.33.0 plus one commit),
minus the TimedWorker helper.

The intent is to modify the code such that it enforces eviction of pods which
use tainted devices.
2025-03-18 20:52:54 +01:00
Eddie Torres
c766a52356 Implement KEP 4876 Mutable CSINode (#130007)
* Implement KEP-4876 Mutable CSINode Allocatable Count

Signed-off-by: torredil <torredil@amazon.com>

* Update TestGetNodeAllocatableUpdatePeriod

Signed-off-by: torredil <torredil@amazon.com>

* Implement CSINodeUpdater

Signed-off-by: torredil <torredil@amazon.com>

* Use sync.Once in csiNodeUpdater

Signed-off-by: torredil <torredil@amazon.com>

* Verify driver is installed before running periodic updates

Signed-off-by: torredil <torredil@amazon.com>

* Update NodeAllocatableUpdatePeriodSeconds type comment

Signed-off-by: torredil <torredil@amazon.com>

* Leverage apivalidation.ValidateImmutableField in ValidateCSINodeUpdate

Signed-off-by: torredil <torredil@amazon.com>

* Update strategy functions

Signed-off-by: torredil <torredil@amazon.com>

* Run hack/update-openapi-spec.sh

Signed-off-by: torredil <torredil@amazon.com>

* Update VolumeError.ErrorCode field

Signed-off-by: torredil <torredil@amazon.com>

* CSINodeUpdater improvements

Signed-off-by: torredil <torredil@amazon.com>

* Iron out concurrency in syncDriverUpdater

Signed-off-by: torredil <torredil@amazon.com>

* Run hack/update-openapi-spec.sh

Signed-off-by: torredil <torredil@amazon.com>

* Revise logging

Signed-off-by: torredil <torredil@amazon.com>

* Revise log in VerifyExhaustedResource

Signed-off-by: torredil <torredil@amazon.com>

* Update API validation

Signed-off-by: torredil <torredil@amazon.com>

* Add more code coverage

Signed-off-by: torredil <torredil@amazon.com>

* Fix pull-kubernetes-linter-hints

Signed-off-by: torredil <torredil@amazon.com>

* Update API types documentation

Signed-off-by: torredil <torredil@amazon.com>

* Update strategy and validation for new errorCode field

Signed-off-by: torredil <torredil@amazon.com>

* Update validation tests after strategy changes

Signed-off-by: torredil <torredil@amazon.com>

* Update VA status strategy

Signed-off-by: torredil <torredil@amazon.com>

---------

Signed-off-by: torredil <torredil@amazon.com>
2025-03-18 12:45:49 -07:00
xigang
aa32537e9a Add workqueue for node updates in DaemonSetController
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-03-19 01:09:44 +08:00
mchtech
381ccf0f4c Fix empty describedObject in hpa status (#124555)
* fix empty DescribedObject in hpa MetricStatus when object target type is AverageValue

Signed-off-by: mchtech <michu_an@126.com>

* add test

Signed-off-by: mchtech <michu_an@126.com>

---------

Signed-off-by: mchtech <michu_an@126.com>
2025-03-18 09:33:56 -07:00
Natasha Sarkar
4c2be4bdde kubelet sets observedGeneration in conditions 2025-03-18 15:43:24 +00:00
xigang
5c4948ff31 controller: factor out pod node name indexer helper function
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-03-17 20:21:30 +08:00
Kubernetes Prow Robot
9fd0e20bc2 Merge pull request #129345 from pohly/log-client-go-workqueue
client-go workqueue: add optional logger
2025-03-14 06:37:53 -07:00
Kubernetes Prow Robot
af3b4cd57a Merge pull request #130718 from kei01234kei/feature/use_generic_set
Use generic set in pkg/controller/nodelifecycle
2025-03-14 01:21:47 -07:00
Kubernetes Prow Robot
04fb7ac18b Merge pull request #130536 from tenzen-y/promote-successpolicy-to-ga
KEP-3998: Promote JobSuccessPolicy to Stable
2025-03-13 13:27:54 -07:00
Kubernetes Prow Robot
1c756849d6 Merge pull request #130591 from fmuyassarov/devel/logging
Refine logging levels in job, IPAM, and replicaSet
2025-03-12 07:13:47 -07:00
Kubernetes Prow Robot
309c4c17fb Merge pull request #128499 from stlaz/ctb_betav1
ClusterTrustBundles - move to beta
2025-03-11 12:47:45 -07:00
Kubernetes Prow Robot
652f681c2b Merge pull request #130650 from natasha41575/pod-conditions-controller
[FG:PodObservedGenerationTracking] controller sets observedGeneration on pod conditions
2025-03-11 11:27:54 -07:00
Stanislav Láznička
5b3b68a3a1 KCM: CTBPublisher: use generics to handle both alpha/beta APIs 2025-03-11 18:07:29 +01:00
Stanislav Láznička
e0f536bf1f use the ClusterTrustBundles beta API 2025-03-11 18:07:24 +01:00