Commit Graph

126628 Commits

Author SHA1 Message Date
Itamar Holder
c792c30b6a Refactor: remove no longer needed resourceName parameter
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Itamar Holder
7207ce20f0 Refactor: remove functions that are no longer used
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Itamar Holder
510ff67528 Use libcontainer's cgroup manager to update resources through systemd
libcontainer's cgroup manager is agnostic to both the cgroup version and
to whether systemd is used. This way, when systemd is in use, the cgroup
manager can update resources properly, so that the changes are not
reverted if the daemon is restarted.

Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Itamar Holder
2a5a6c7fb8 Refactor: add import alias to libcontainer cgroup manager
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-10-30 13:58:38 +02:00
Kubernetes Prow Robot
d001d5684e
Merge pull request #128417 from tenzen-y/self-nominate-job-controller-reviewer
Self nominate tenzen-y as a reviewer for the Job controller
2024-10-30 11:21:39 +00:00
Kubernetes Prow Robot
a18b50e7e4
Merge pull request #128373 from mimowo/job-cover-negative-codes
Job Pod Failure policy - cover testing of negative exit codes
2024-10-30 11:21:31 +00:00
Kubernetes Prow Robot
7529696b59
Merge pull request #128334 from mimowo/job-windows-e2e-test
Job Pod Failure policy refactor e2e test using exit codes
2024-10-30 11:21:25 +00:00
yunwang0911
05493c0924
Update pkg/kubelet/status/state/state_checkpoint_test.go
Co-authored-by: Tim Allclair <timallclair@gmail.com>
2024-10-30 18:11:10 +08:00
yunwang0911
e4c8eefeb2
Update pkg/kubelet/status/state/state_checkpoint_test.go
Co-authored-by: Tim Allclair <timallclair@gmail.com>
2024-10-30 18:08:53 +08:00
Kubernetes Prow Robot
daef8c2419
Merge pull request #127266 from pohly/dra-admin-access-in-status
DRA API: AdminAccess in DeviceRequestAllocationResult + DRAAdminAccess feature gate
2024-10-30 03:41:25 +00:00
Kubernetes Prow Robot
5fcef4f79d
Merge pull request #128422 from bart0sh/PR163-density-e2e_node-adjust-limits
density test: adjust CPU and memory limits
2024-10-30 02:37:31 +00:00
Kubernetes Prow Robot
db66e397d9
Merge pull request #128359 from matteriben/disable-caching-for-authoritative-zone
disable caching for authoritative zone to comply with rfc-1035 section 6.1.2
2024-10-30 02:37:24 +00:00
Kubernetes Prow Robot
a93e3e7ae1
Merge pull request #127483 from nokia/strict-cpu-reservation-core
KEP-4540: Add CPUManager policy option to restrict reservedSystemCPUs to system daemons and interrupt processing
2024-10-30 01:21:47 +00:00
Kubernetes Prow Robot
d702d265c7
Merge pull request #127291 from zhifei92/fix-apiserver-unexpected-panic
[FG:InPlacePodVerticalScaling] Fixed the apiserver panic issue that occurred when adding a container during pod updates in the InPlacePodVerticalScaling scenario.
2024-10-30 01:21:40 +00:00
Kubernetes Prow Robot
a0e5e244b3
Merge pull request #126875 from serathius/watchcache-test-indexers
Adding tests for using indexers in tests
2024-10-30 01:21:32 +00:00
Kubernetes Prow Robot
6737352b03
Merge pull request #125708 from hshiina/dopodresizeaction-error
[FG:InPlacePodVerticalScaling] Fix order of resizing pod cgroups in doPodResizeAction()
2024-10-30 01:21:25 +00:00
Kubernetes Prow Robot
e8a75ac53f
Merge pull request #128420 from tallclair/e2e-cleanup
Reuse cached client config for exec requests in e2e
2024-10-30 00:17:37 +00:00
Kubernetes Prow Robot
42b7cfecec
Merge pull request #128274 from eddycharly/fix-cel-type-provider
fix: cel type provider should return a type type
2024-10-30 00:17:30 +00:00
Kubernetes Prow Robot
a339a36a36
Merge pull request #127506 from ffromani/cpu-pool-size-metrics
node: metrics: add metrics about cpu pool sizes
2024-10-30 00:17:24 +00:00
James Sturtevant
ac174f518c
Respond to sig-node feedback
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
2024-10-29 16:56:37 -07:00
Ed Bartosh
04f7a86001 density test: adjust CPU and memory limits
Adjusted limits based on recent job log:
  I1028 20:05:42.079182 1002 resource_usage_test.go:199] Resource usage:
  container cpu(cores) memory_working_set(MB) memory_rss(MB)
  "kubelet" 0.024      22.17                  14.20
  "runtime" 0.041      409.70                 84.21

  I1028 20:05:42.079274 1002 resource_usage_test.go:206] CPU usage of containers:
  container 50th% 90th% 95th% 99th% 100th%
  "/"       N/A   N/A   N/A   N/A   N/A
  "runtime" 0.014 0.834 0.834 0.834 1.083
  "kubelet" 0.023 0.093 0.093 0.093 0.164

Increasing 95th percentile for runtime CPU usage should also make
pull-kubernetes-node-kubelet-containerd-flaky less flaky.
2024-10-30 00:48:56 +02:00
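The adjusted limits above are derived from percentile summaries like the one quoted in the job log. As a minimal illustrative sketch (not the density test's actual code), a nearest-rank percentile over sampled CPU readings can be computed like this:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile of samples.
// Illustrative helper only; the resource usage test uses its own summarizer.
func percentile(samples []float64, p float64) float64 {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical per-interval CPU readings for the "runtime" container.
	cpu := []float64{0.014, 0.12, 0.30, 0.52, 0.834, 0.90, 1.083}
	fmt.Printf("p95=%.3f p100=%.3f\n", percentile(cpu, 95), percentile(cpu, 100))
}
```

A limit is then set with headroom above the observed high percentile, which is what this commit adjusts.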
Kubernetes Prow Robot
f087575f21
Merge pull request #127226 from myeunee/cleanup
Clean up unnecessary else block and redundant variable assignment
2024-10-29 22:41:25 +00:00
Matt Riben
30d9ed7203
disable caching for authoritative zone
Signed-off-by: Matt Riben <matt.riben@swirldslabs.com>
2024-10-29 17:10:07 -05:00
myeunee
9cc65ce872 Restrict cz variable scope within else clause 2024-10-30 06:31:06 +09:00
Kubernetes Release Robot
f01e0d64db CHANGELOG: Update directory for v1.32.0-alpha.3 release 2024-10-29 20:17:52 +00:00
myeunee
2faaedbe39 Refactor error handling for configz initialization
Improved code readability and limited variable scope as per reviewer's suggestion.
2024-10-30 04:53:51 +09:00
Marek Siarkowicz
711772a1e1 Adding tests for using indexers in tests 2024-10-29 20:22:16 +01:00
Kubernetes Prow Robot
988769933e
Merge pull request #128307 from NoicFank/bugfix-scheduler-preemption
bugfix(scheduler): preemption picks wrong victim node with higher priority pod on it
2024-10-29 19:05:02 +00:00
Kubernetes Prow Robot
a12a32cd12
Merge pull request #127146 from bart0sh/PR156-DRA-Kubelet-latency
Kubelet: add DRA latency metrics
2024-10-29 19:04:55 +00:00
Tim Allclair
2407a49956 Reuse cached client config for exec requests in e2e 2024-10-29 10:00:11 -07:00
Kubernetes Prow Robot
c3980f601c
Merge pull request #128267 from benluddy/cbor-response-negotiation
KEP-4222: Test response content negotiation for each CBOR enablement state.
2024-10-29 16:48:55 +00:00
Yuki Iwai
eca7ee877a Self nominate tenzen-y as a reviewer for the Job controller
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-10-30 01:14:47 +09:00
Kubernetes Prow Robot
b8e20b74dd
Merge pull request #128382 from carlory/rm-vac
remove unused vac code
2024-10-29 15:25:05 +00:00
Kubernetes Prow Robot
c5ccf59974
Merge pull request #128379 from pohly/dra-owners-wg-label
DRA: add wg/device-management label automatically
2024-10-29 15:24:57 +00:00
Kubernetes Prow Robot
eb445ac66c
Merge pull request #128414 from soltysh/improve_error
Provide link with e2e guidelines when verify-test-code.sh fails
2024-10-29 14:21:06 +00:00
Kubernetes Prow Robot
c83250d104
Merge pull request #126754 from serathius/watchcache-btree
Reimplement watch cache storage with btree
2024-10-29 14:20:58 +00:00
Kubernetes Prow Robot
d09d98e07c
Merge pull request #128022 from googs1025/cleanup/ut/preemption
chore(scheduler): add unit test for framework preemption part
2024-10-29 13:16:55 +00:00
Talor Itzhak
d64f34eb2c memorymanager: areMemoryStatesEqual helper
perform the memoryStates comparison in helper function

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2024-10-29 14:22:04 +02:00
Marek Siarkowicz
50d2fab279 Implement btree based storage indexer 2024-10-29 13:13:21 +01:00
Maciej Szulik
97fcb05374
Provide link with e2e guidelines when verify-test-code.sh fails
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2024-10-29 13:07:05 +01:00
NoicFank
68f7a7c682 bugfix(scheduler): preemption picks wrong victim node with higher priority pod on it.
Introducing PDB handling into preemption disrupted the ordering of pods in the
victims list, which could lead to picking the wrong victim node, one with a
higher-priority pod on it.
2024-10-29 19:50:55 +08:00
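The bug above is about an ordering invariant: node selection compares each candidate's highest-priority victim, which is only meaningful if every victims list is sorted by priority. A minimal, hypothetical sketch of that invariant (the real logic lives in the scheduler's preemption code, with different types):

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a simplified stand-in for the scheduler's victim bookkeeping.
type pod struct {
	name     string
	priority int32
}

// sortVictims restores the invariant: victims ordered by priority, highest first.
// Prepending PDB-violating pods without re-sorting breaks this.
func sortVictims(victims []pod) {
	sort.SliceStable(victims, func(i, j int) bool {
		return victims[i].priority > victims[j].priority
	})
}

// pickNode prefers the candidate whose highest-priority victim is lowest;
// it reads victims[0], so an unsorted list picks the wrong node.
func pickNode(candidates map[string][]pod) string {
	best := ""
	var bestTop int32
	for node, victims := range candidates {
		sortVictims(victims)
		top := victims[0].priority
		if best == "" || top < bestTop {
			best, bestTop = node, top
		}
	}
	return best
}

func main() {
	candidates := map[string][]pod{
		// On node-a a PDB-violating low-priority pod was prepended,
		// leaving the list unsorted until sortVictims runs.
		"node-a": {{"pdb-pod", 10}, {"critical", 1000}},
		"node-b": {{"normal", 100}},
	}
	fmt.Println(pickNode(candidates))
}
```

Without the re-sort, node-a would be judged by its first entry (priority 10) and wrongly win over node-b.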
Talor Itzhak
7476f46d71 memorymanager: fix checkpoint file comparison
For a resource within a group, such as memory,
we should validate the total `Free` and total `Reserved` sizes of the expected
`machineState` against the state restored from the checkpoint file after
kubelet starts. If the total `Free` and total `Reserved` sizes are equal,
the restored state is valid.

The old comparison, however, was done by reflection.

There are times when the memory accounting is equal
but the allocations vary across the NUMA nodes.

In such cases we still need to consider the states equal.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2024-10-29 12:10:27 +02:00
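A minimal sketch of the totals-based comparison this commit describes, using hypothetical simplified types rather than the real memory manager state structs:

```go
package main

import "fmt"

// memoryTable is a simplified stand-in for the memory manager's per-NUMA
// accounting (the real types live in pkg/kubelet/cm/memorymanager/state).
type memoryTable struct {
	Free     uint64
	Reserved uint64
}

// nodeState maps NUMA node ID to its memory accounting.
type nodeState map[int]memoryTable

// areMemoryStatesEqual compares only the totals, so two states that account
// for the same memory with different per-NUMA splits compare equal, unlike
// a reflect.DeepEqual over the whole structure.
func areMemoryStatesEqual(a, b nodeState) bool {
	var aFree, aRes, bFree, bRes uint64
	for _, t := range a {
		aFree += t.Free
		aRes += t.Reserved
	}
	for _, t := range b {
		bFree += t.Free
		bRes += t.Reserved
	}
	return aFree == bFree && aRes == bRes
}

func main() {
	expected := nodeState{0: {Free: 4096, Reserved: 1024}, 1: {Free: 4096, Reserved: 1024}}
	// Restored from checkpoint: same totals, different NUMA split.
	restored := nodeState{0: {Free: 6144, Reserved: 2048}, 1: {Free: 2048, Reserved: 0}}
	fmt.Println(areMemoryStatesEqual(expected, restored)) // true
}
```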
holder
6709317ae2 chore: optimize code logic
(cherry picked from commit 91a9a195ac0fe0e31301dc60af0ea868fc4756ff)
2024-10-29 12:08:28 +02:00
holder
6d7a1226d5 update the test case name
(cherry picked from commit de033352079c7d87417f88f073d6b7891e51e590)
2024-10-29 12:08:23 +02:00
holder
39726b119f fix: fix state validate error after memorymanager with static policy start
(cherry picked from commit b91951f847d0b159c9d8ef32688cc96489ac1884)
2024-10-29 12:08:16 +02:00
Patrick Ohly
4419568259 DRA: treat AdminAccess as a new feature gated field
Using the "normal" logic for a feature gated field simplifies the
implementation of the feature gate.

There is one (entirely theoretic!) problem with updating from 1.31: if a claim
was allocated in 1.31 with admin access, the status field was not set because
it didn't exist yet. If a driver now follows the current definition of "unset =
off", then it will not grant admin access even though it should. This is
theoretic because drivers are starting to support admin access with 1.32, so
there shouldn't be any claim where this problem could occur.
2024-10-29 10:22:31 +01:00
Patrick Ohly
9a7e4ccab2 DRA admin access: add feature gate
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
  field gets cleared, and likewise in the status. In both cases the scenario
  where the field was already set and a claim or claim template gets updated
  is special: in those cases, the field is not cleared.

  Also, allocating a claim with admin access is allowed regardless of the
  feature gate and the field is not cleared. In practice, the scheduler
  will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
  with the field set gets rejected. This prevents running workloads
  which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
  allocated. The effect is the same.

The alternative would have been to ignore the fields in claim controller and
scheduler. This is bad because a monitoring workload then runs, blocking
resources that probably were meant for production workloads.
2024-10-29 09:50:11 +01:00
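The apiserver behavior in the first bullet follows the usual "drop disabled fields" pattern for feature-gated API fields. A hedged, simplified sketch (illustrative names, not the real ResourceClaim types or registry strategy code):

```go
package main

import "fmt"

// deviceRequest is a simplified stand-in for a claim's device request,
// with one feature-gated pointer field.
type deviceRequest struct {
	Name        string
	AdminAccess *bool // feature-gated field
}

type claimSpec struct {
	Requests []deviceRequest
}

// adminAccessInUse reports whether any request in the old object already
// sets the gated field.
func adminAccessInUse(spec *claimSpec) bool {
	if spec == nil {
		return false
	}
	for _, r := range spec.Requests {
		if r.AdminAccess != nil {
			return true
		}
	}
	return false
}

// dropDisabledFields clears AdminAccess when the gate is off, except on
// updates of objects that already use it, so stored data survives.
func dropDisabledFields(gateEnabled bool, newSpec, oldSpec *claimSpec) {
	if gateEnabled || adminAccessInUse(oldSpec) {
		return
	}
	for i := range newSpec.Requests {
		newSpec.Requests[i].AdminAccess = nil
	}
}

func main() {
	on := true
	spec := &claimSpec{Requests: []deviceRequest{{Name: "gpu", AdminAccess: &on}}}
	dropDisabledFields(false, spec, nil) // gate off, create: field cleared
	fmt.Println(spec.Requests[0].AdminAccess == nil)
}
```

The controller- and scheduler-side checks described in the later bullets then reject or skip claims that still carry the field while the gate is off.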
Patrick Ohly
f3fef01e79 DRA API: AdminAccess in DeviceRequestAllocationResult
Drivers need to know that because admin access may also grant additional
permissions. The allocator needs to ignore such results when determining which
devices are considered as allocated.

In both cases it is conceptually cleaner to not rely on the content of the
ClaimSpec.
2024-10-29 09:50:07 +01:00
Kubernetes Prow Robot
5f594f4215
Merge pull request #128401 from tenzen-y/use-same-receiver-name
Job: Consistently use the same receiver name in the controller
2024-10-29 08:16:55 +00:00
Kubernetes Prow Robot
66b3dc1a38
Merge pull request #128400 from tenzen-y/use-uid-typed-instead-of-string
Job: Refactor uncountedTerminatedPods to avoid casting everywhere
2024-10-29 06:32:56 +00:00