Patrick Ohly
67f0428769
DRA resourceslice controller: delay sync
When deleting a bunch of slices, the delete events queue the pool while it is
being synced. It then got synced again immediately, while the deleted slices
were still being removed from the informer cache. The obsolete slice in the
cache caused the controller to delete it again, which failed with "not
found". That error is ignored, but it still caused extra API calls.
Now syncing gets delayed by a configurable duration (default: 30 seconds) so
the informer cache is more likely to be up-to-date when the pool gets synced
again.
2024-10-30 15:54:32 +01:00
Patrick Ohly
99cf2d8a2e
DRA resource slice controller: add E2E test
This test covers creating and deleting 100 large ResourceSlices. It is strict
about using the minimum number of calls.
The test also verifies that creating large slices works.
2024-10-30 15:54:32 +01:00
Patrick Ohly
7473e643fa
DRA resource slice controller: use MutationCache to avoid race
This avoids the problem of creating an additional slice when the one from the
previous sync is not in the informer cache yet. It also avoids false
attempts to delete slices which were updated in the previous sync. Such
attempts would fail the ResourceVersion precondition check, but would
still cause work for the apiserver.
2024-10-30 15:54:32 +01:00
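The idea behind 7473e643fa is to overlay the controller's own recent writes on top of a possibly stale informer cache. The sketch below models that with plain maps; the slice and mutationCache types are hypothetical, and it assumes ResourceVersions can be compared as integers (client-go offers a real MutationCache, e.g. via cache.NewIntegerResourceVersionMutationCache, which this sketch does not reproduce).

```go
package main

import "fmt"

// slice is a hypothetical, trimmed-down ResourceSlice.
type slice struct {
	Name            string
	ResourceVersion int64 // assumption: comparable as an integer
}

// mutationCache overlays objects written by this controller on top of a
// possibly stale informer cache.
type mutationCache struct {
	informer  map[string]slice // what the informer has observed so far
	mutations map[string]slice // results of our own create/update calls
}

func newMutationCache() *mutationCache {
	return &mutationCache{informer: map[string]slice{}, mutations: map[string]slice{}}
}

// Mutation records the object returned by a successful API call.
func (c *mutationCache) Mutation(s slice) { c.mutations[s.Name] = s }

// Get prefers our own write unless the informer has caught up with it.
func (c *mutationCache) Get(name string) (slice, bool) {
	inf, ok := c.informer[name]
	mut, mutOK := c.mutations[name]
	if mutOK && (!ok || mut.ResourceVersion > inf.ResourceVersion) {
		return mut, true
	}
	return inf, ok
}

func main() {
	c := newMutationCache()
	// The previous sync created slice "a", but the informer has not seen it yet.
	c.Mutation(slice{Name: "a", ResourceVersion: 5})
	s, ok := c.Get("a")
	fmt.Println(s.ResourceVersion, ok) // 5 true: no duplicate slice gets created
	// Later the informer catches up with a newer version.
	c.informer["a"] = slice{Name: "a", ResourceVersion: 7}
	s, _ = c.Get("a")
	fmt.Println(s.ResourceVersion) // 7
}
```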
Patrick Ohly
e88d5c37e6
DRA resource claim controller: add statistics
This is primarily for testing. Proper metrics might be useful, but can still be
added later.
2024-10-30 15:54:32 +01:00
Patrick Ohly
d94752ebc8
DRA resourceslice controller: use preconditions for Delete
It's better to verify UID and ResourceVersion of the ResourceSlice that we want
to delete. If anything changed, the decision to remove it might not apply
anymore and we need to check again.
2024-10-30 15:54:32 +01:00
Patrick Ohly
a6d180c7d3
DRA: validate set of devices in a pool before using the pool
The ResourceSlice controller might (theoretically) end up creating too many
slices if it syncs again before its informer cache was updated. This could
cause the scheduler to allocate a device from a duplicated slice. The
duplicates should be identical, but it's still better to fail and wait until
the controller removes the redundant slice.
2024-10-30 15:54:32 +01:00
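One possible form of the check from a6d180c7d3 is to reject a pool whose slices publish the same device name more than once. This is a minimal sketch under that assumption only; the resourceSlice type and validatePool function are hypothetical, and the scheduler's actual validation may check more than duplicate names.

```go
package main

import "fmt"

// resourceSlice is a hypothetical, simplified view of one slice in a pool.
type resourceSlice struct {
	Name    string
	Devices []string // device names published in this slice
}

// validatePool returns an error if any device name occurs more than once
// across the slices of a pool, for example because a slice got duplicated.
func validatePool(slices []resourceSlice) error {
	seen := map[string]string{} // device name -> slice that published it
	for _, s := range slices {
		for _, d := range s.Devices {
			if prev, ok := seen[d]; ok {
				return fmt.Errorf("device %q published by both %s and %s", d, prev, s.Name)
			}
			seen[d] = s.Name
		}
	}
	return nil
}

func main() {
	pool := []resourceSlice{
		{Name: "slice-a", Devices: []string{"gpu-0", "gpu-1"}},
		{Name: "slice-b", Devices: []string{"gpu-0", "gpu-1"}}, // redundant copy
	}
	fmt.Println(validatePool(pool)) // device "gpu-0" published by both slice-a and slice-b
}
```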
Patrick Ohly
26650371cc
DRA resourceslice controller: support publishing multiple slices
The driver determines what each slice is meant to look like. The controller
then ensures that only those slices exist. It reuses existing slices where the
set of devices, as identified by their names, is the same as in some desired
slice. Such slices get updated to match the desired state.
In other words, attributes and the order of devices can be changed by updating
an existing slice, but adding or removing a device is done by deleting and
re-creating slices.
Co-authored-by: googs1025 <googs1025@gmail.com>
The test update is partly based on
https://github.com/kubernetes/kubernetes/pull/127645 .
2024-10-30 15:54:32 +01:00
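The matching rule in 26650371cc (reuse an existing slice when its set of device names equals that of a desired slice, otherwise delete and re-create) can be sketched as a pure planning function. The plan and deviceSetKey names and the map-based slice representation are hypothetical; the real controller works on ResourceSlice objects and issues the corresponding API calls.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// deviceSetKey identifies a slice by the set of its device names, ignoring order.
func deviceSetKey(devices []string) string {
	names := append([]string(nil), devices...)
	sort.Strings(names)
	return strings.Join(names, "\x00")
}

// plan decides, for each desired slice, whether an existing slice with the
// same device set can be updated in place; everything else is created or deleted.
// Slices are given as hypothetical name -> device list maps.
func plan(existing, desired map[string][]string) (update, create, del []string) {
	byKey := map[string]string{} // device-set key -> existing slice name
	for name, devs := range existing {
		byKey[deviceSetKey(devs)] = name
	}
	used := map[string]bool{}
	for name, devs := range desired {
		if ex, ok := byKey[deviceSetKey(devs)]; ok && !used[ex] {
			used[ex] = true
			update = append(update, ex) // same devices: update attributes/order
		} else {
			create = append(create, name) // device set changed: new slice
		}
	}
	for name := range existing {
		if !used[name] {
			del = append(del, name) // no desired slice has this device set
		}
	}
	sort.Strings(update)
	sort.Strings(create)
	sort.Strings(del)
	return update, create, del
}

func main() {
	existing := map[string][]string{
		"old-1": {"dev-a", "dev-b"},
		"old-2": {"dev-c"},
	}
	desired := map[string][]string{
		"want-1": {"dev-b", "dev-a"}, // reordered: reuse old-1
		"want-2": {"dev-c", "dev-d"}, // device added: re-create
	}
	u, c, d := plan(existing, desired)
	fmt.Println(u, c, d) // [old-1] [want-2] [old-2]
}
```

Keying on the sorted device names is what lets reordering or attribute changes go through an update, while adding or removing a device always changes the key.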
Antoni Zawodny
4afa554f65
Add --concurrent-daemonset-syncs flag to kube-controller-manager
2024-10-30 15:03:26 +01:00
dom4ha
ff584a76e0
Fix Unschedulable test by scheduling high priority churn pods to get processed right after they were injected (before the queued test pods)
2024-10-30 13:04:38 +00:00
Itamar Holder
f21473b924
Set pod-level CPUPeriod only if CPUQuota is changed
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 14:21:35 +02:00
Itamar Holder
c792c30b6a
Refactor: remove no longer needed resourceName parameter
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Itamar Holder
7207ce20f0
Refactor: remove functions that are no longer used
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Itamar Holder
510ff67528
Use libcontainer's cgroup manager to update resources through systemd
libcontainer's cgroup manager is agnostic to both the cgroup version and
whether systemd is used. When systemd is used, the cgroup manager updates
resources through systemd, so the changes are not reverted if the daemon
restarts.
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Itamar Holder
2a5a6c7fb8
Refactor: add import alias to libcontainer cgroup manager
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Kubernetes Prow Robot
d001d5684e
Merge pull request #128417 from tenzen-y/self-nominate-job-controller-reviewer
Self nominate tenzen-y as a reviewer for the Job controller
2024-10-30 11:21:39 +00:00
Kubernetes Prow Robot
a18b50e7e4
Merge pull request #128373 from mimowo/job-cover-negative-codes
Job Pod Failure policy - cover testing of negative exit codes
2024-10-30 11:21:31 +00:00
Kubernetes Prow Robot
7529696b59
Merge pull request #128334 from mimowo/job-windows-e2e-test
Job Pod Failure policy refactor e2e test using exit codes
2024-10-30 11:21:25 +00:00
yunwang0911
05493c0924
Update pkg/kubelet/status/state/state_checkpoint_test.go
Co-authored-by: Tim Allclair <timallclair@gmail.com>
2024-10-30 18:11:10 +08:00
yunwang0911
e4c8eefeb2
Update pkg/kubelet/status/state/state_checkpoint_test.go
Co-authored-by: Tim Allclair <timallclair@gmail.com>
2024-10-30 18:08:53 +08:00
Kubernetes Prow Robot
daef8c2419
Merge pull request #127266 from pohly/dra-admin-access-in-status
DRA API: AdminAccess in DeviceRequestAllocationResult + DRAAdminAccess feature gate
2024-10-30 03:41:25 +00:00
Kubernetes Prow Robot
5fcef4f79d
Merge pull request #128422 from bart0sh/PR163-density-e2e_node-adjust-limits
density test: adjust CPU and memory limits
2024-10-30 02:37:31 +00:00
Kubernetes Prow Robot
db66e397d9
Merge pull request #128359 from matteriben/disable-caching-for-authoritative-zone
disable caching for authoritative zone to comply with rfc-1035 section 6.1.2
2024-10-30 02:37:24 +00:00
Kubernetes Prow Robot
a93e3e7ae1
Merge pull request #127483 from nokia/strict-cpu-reservation-core
KEP-4540: Add CPUManager policy option to restrict reservedSystemCPUs to system daemons and interrupt processing
2024-10-30 01:21:47 +00:00
Kubernetes Prow Robot
d702d265c7
Merge pull request #127291 from zhifei92/fix-apiserver-unexpected-panic
[FG:InPlacePodVerticalScaling] Fixed the apiserver panic issue that occurred when adding a container during pod updates in the InPlacePodVerticalScaling scenario.
2024-10-30 01:21:40 +00:00
Kubernetes Prow Robot
a0e5e244b3
Merge pull request #126875 from serathius/watchcache-test-indexers
Adding tests for using indexers in tests
2024-10-30 01:21:32 +00:00
Kubernetes Prow Robot
6737352b03
Merge pull request #125708 from hshiina/dopodresizeaction-error
[FG:InPlacePodVerticalScaling] Fix order of resizing pod cgroups in doPodResizeAction()
2024-10-30 01:21:25 +00:00
Kubernetes Prow Robot
e8a75ac53f
Merge pull request #128420 from tallclair/e2e-cleanup
Reuse cached client config for exec requests in e2e
2024-10-30 00:17:37 +00:00
Kubernetes Prow Robot
42b7cfecec
Merge pull request #128274 from eddycharly/fix-cel-type-provider
fix: cel type provider should return a type type
2024-10-30 00:17:30 +00:00
Kubernetes Prow Robot
a339a36a36
Merge pull request #127506 from ffromani/cpu-pool-size-metrics
node: metrics: add metrics about cpu pool sizes
2024-10-30 00:17:24 +00:00
James Sturtevant
ac174f518c
Respond to sig-node feedback
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
2024-10-29 16:56:37 -07:00
Ed Bartosh
04f7a86001
density test: adjust CPU and memory limits
Adjusted limits based on recent job log:
I1028 20:05:42.079182 1002 resource_usage_test.go:199] Resource usage:
container   cpu(cores)  memory_working_set(MB)  memory_rss(MB)
"kubelet"   0.024       22.17                   14.20
"runtime"   0.041       409.70                  84.21
I1028 20:05:42.079274 1002 resource_usage_test.go:206] CPU usage of containers:
container   50th%   90th%   95th%   99th%   100th%
"/"         N/A     N/A     N/A     N/A     N/A
"runtime"   0.014   0.834   0.834   0.834   1.083
"kubelet"   0.023   0.093   0.093   0.093   0.164
Increasing the 95th-percentile limit for runtime CPU usage should also make
pull-kubernetes-node-kubelet-containerd-flaky less flaky.
2024-10-30 00:48:56 +02:00
Kubernetes Prow Robot
f087575f21
Merge pull request #127226 from myeunee/cleanup
Clean up unnecessary else block and redundant variable assignment
2024-10-29 22:41:25 +00:00
Matt Riben
30d9ed7203
disable caching for authoritative zone
Signed-off-by: Matt Riben <matt.riben@swirldslabs.com>
2024-10-29 17:10:07 -05:00
myeunee
9cc65ce872
Restrict cz variable scope within else clause
2024-10-30 06:31:06 +09:00
Kubernetes Release Robot
f01e0d64db
CHANGELOG: Update directory for v1.32.0-alpha.3 release
2024-10-29 20:17:52 +00:00
myeunee
2faaedbe39
Refactor error handling for configz initialization
Improved code readability and limited variable scope as per reviewer's suggestion.
2024-10-30 04:53:51 +09:00
Marek Siarkowicz
711772a1e1
Adding tests for using indexers in tests
2024-10-29 20:22:16 +01:00
Kubernetes Prow Robot
988769933e
Merge pull request #128307 from NoicFank/bugfix-scheduler-preemption
...
bugfix(scheduler): preemption picks wrong victim node with higher priority pod on it
2024-10-29 19:05:02 +00:00
Kubernetes Prow Robot
a12a32cd12
Merge pull request #127146 from bart0sh/PR156-DRA-Kubelet-latency
...
Kubelet: add DRA latency metrics
2024-10-29 19:04:55 +00:00
Tim Allclair
2407a49956
Reuse cached client config for exec requests in e2e
2024-10-29 10:00:11 -07:00
Kubernetes Prow Robot
c3980f601c
Merge pull request #128267 from benluddy/cbor-response-negotiation
...
KEP-4222: Test response content negotiation for each CBOR enablement state.
2024-10-29 16:48:55 +00:00
Yuki Iwai
eca7ee877a
Self nominate tenzen-y as a reviewer for the Job controller
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-10-30 01:14:47 +09:00
Kubernetes Prow Robot
b8e20b74dd
Merge pull request #128382 from carlory/rm-vac
remove unused vac code
2024-10-29 15:25:05 +00:00
Kubernetes Prow Robot
c5ccf59974
Merge pull request #128379 from pohly/dra-owners-wg-label
DRA: add wg/device-management label automatically
2024-10-29 15:24:57 +00:00
Kubernetes Prow Robot
eb445ac66c
Merge pull request #128414 from soltysh/improve_error
Provide link with e2e guidelines when verify-test-code.sh fails
2024-10-29 14:21:06 +00:00
Kubernetes Prow Robot
c83250d104
Merge pull request #126754 from serathius/watchcache-btree
Reimplement watch cache storage with btree
2024-10-29 14:20:58 +00:00
Kubernetes Prow Robot
d09d98e07c
Merge pull request #128022 from googs1025/cleanup/ut/preemption
chore(scheduler): add unit test for framework preemption part
2024-10-29 13:16:55 +00:00
Talor Itzhak
d64f34eb2c
memorymanager: areMemoryStatesEqual helper
Perform the memoryStates comparison in a helper function.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2024-10-29 14:22:04 +02:00
Marek Siarkowicz
50d2fab279
Implement btree based storage indexer
2024-10-29 13:13:21 +01:00
Maciej Szulik
97fcb05374
Provide link with e2e guidelines when verify-test-code.sh fails
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2024-10-29 13:07:05 +01:00