Commit Graph

126399 Commits

Author SHA1 Message Date
Patrick Ohly
814c9428fd DRA scheduler: cache compiled CEL expressions
DeviceClasses and different requests are very likely to contain the same
expression string. We don't need to compile that over and over again.

To avoid hanging onto that cache longer than necessary, it's currently tied to
each PreFilter/Filter combination. It might make sense to move this up into the
scheduler plugin and thus reuse compiled expressions for different pods.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.95 ± 4%                     36.65 ± 2%   +7.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.8 ± 2%                     106.7 ± 3%        ~ (p=0.177 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.7 ± 1%                     119.7 ± 3%  +18.82% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.78 ± 1%                    121.10 ± 4%  +33.40% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      50.51 ± 7%                     63.72 ± 3%  +26.17% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.7 ± 5%                     110.2 ± 2%   +6.32% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      28.50 ± 2%                     28.16 ± 5%        ~ (p=0.102 n=6)
    geomean                                                                                                64.99                          73.15       +12.56%
2024-11-01 13:20:06 +01:00
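The caching idea in 814c9428fd can be illustrated with a minimal sketch. It is not the scheduler's code: `regexp.Compile` stands in for CEL compilation so that the example needs only the standard library, and `compileCache`, `getOrCompile`, and the `misses` counter are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// compileCache memoizes compiled expressions by their source string.
// Assumption: regexp.Compile is only a stand-in for the expensive CEL
// compilation step described in the commit message.
type compileCache struct {
	mu       sync.Mutex
	compiled map[string]*regexp.Regexp
	misses   int // number of real compilations, kept for illustration
}

func newCompileCache() *compileCache {
	return &compileCache{compiled: make(map[string]*regexp.Regexp)}
}

// getOrCompile compiles each distinct expression string at most once.
// Failed compilations are not cached in this sketch.
func (c *compileCache) getOrCompile(expr string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.compiled[expr]; ok {
		return re, nil
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	c.compiled[expr] = re
	c.misses++
	return re, nil
}

func main() {
	// One cache per scheduling attempt, mirroring the "tied to each
	// PreFilter/Filter combination" lifetime from the commit message.
	cache := newCompileCache()
	exprs := []string{`^gpu-\d+$`, `^gpu-\d+$`, `^nic-\d+$`, `^gpu-\d+$`}
	for _, e := range exprs {
		if _, err := cache.getOrCompile(e); err != nil {
			panic(err)
		}
	}
	fmt.Println("compilations:", cache.misses)
}
```

Creating the cache per scheduling attempt keeps its lifetime short. Hoisting it into a longer-lived object, as the commit message suggests for the scheduler plugin, would trade memory for reuse across pods.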
Patrick Ohly
941d17b3b8 DRA scheduler: code cleanups
Looking up the slice can be avoided by storing it when allocating a device.
The AllocationResult struct is small enough that it can be copied by value.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                       after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base              │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.30 ± 2%                     33.95 ± 4%       ~ (p=0.288 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.3 ± 2%                     105.8 ± 2%       ~ (p=0.524 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.8 ± 1%                     100.7 ± 1%       ~ (p=0.738 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.96 ± 2%                     90.78 ± 1%       ~ (p=0.952 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      49.84 ± 4%                     50.51 ± 7%       ~ (p=0.485 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.8 ± 1%                     103.7 ± 5%       ~ (p=0.582 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      27.21 ± 7%                     28.50 ± 2%       ~ (p=0.065 n=6)
    geomean                                                                                                64.26                          64.99       +1.14%
2024-11-01 13:19:51 +01:00
Patrick Ohly
1246898315 DRA scheduler: ResourceSlice with unique strings
Using unique strings instead of normal strings speeds up allocation with
structured parameters because maps that use those strings as key no longer need
to build hashes of the string content. However, care must be taken to call
unique.Make as little as possible because it is costly.

Pre-allocating the map of allocated devices reduces the need to grow the map
when adding devices.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                         │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base                │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                     18.06 ±  2%                     33.30 ± 2%   +84.31% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                    104.7 ±  2%                     105.3 ± 2%         ~ (p=0.818 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                    96.62 ±  1%                    100.75 ± 1%    +4.28% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                     83.00 ±  2%                     90.96 ± 2%    +9.59% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                     32.45 ±  7%                     49.84 ± 4%   +53.60% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                     95.22 ±  7%                    103.80 ± 1%    +9.00% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                     9.111 ± 10%                    27.215 ± 7%  +198.69% (p=0.002 n=6)
    geomean                                                                                               45.86                           64.26        +40.12%
2024-11-01 13:19:48 +01:00
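The two techniques named in 1246898315 can be sketched as follows, assuming Go 1.23 or newer for the standard `unique` package. The `deviceID` type, its fields, and `makeDeviceID` are hypothetical and are not taken from the Kubernetes source.

```go
package main

import (
	"fmt"
	"unique"
)

// deviceID is a hypothetical map key built from interned strings.
// Comparing two unique.Handle values does not compare string content.
type deviceID struct {
	driver, pool, device unique.Handle[string]
}

// makeDeviceID interns the three names. unique.Make is comparatively
// costly, so this sketch calls it once per device, up front, and not
// inside the lookup loop.
func makeDeviceID(driver, pool, device string) deviceID {
	return deviceID{
		driver: unique.Make(driver),
		pool:   unique.Make(pool),
		device: unique.Make(device),
	}
}

func main() {
	names := []string{"dev-0", "dev-1", "dev-2"}

	// Intern once while building the list of candidate devices.
	ids := make([]deviceID, 0, len(names))
	for _, n := range names {
		ids = append(ids, makeDeviceID("gpu.example.com", "pool-a", n))
	}

	// Pre-size the map so that adding devices does not force it to grow.
	allocated := make(map[deviceID]bool, len(ids))
	allocated[ids[1]] = true

	fmt.Println(allocated[ids[1]], allocated[ids[0]])
}
```

Calling `Value()` on a handle returns the original string when it is needed again, for example for logging.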
Patrick Ohly
7de6d070f2 DRA scheduler: avoid listing claims during Filter
The Allocate call used to call back into the claim lister for each node. This
was significant work which showed up at the top of the CPU profile. It's
okay to list only once during PreFilter because the Filter call does not change
the claim status between Allocate calls.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      15.04 ± 0%                    18.06 ±  2%  +20.07% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.5 ± 1%                    104.7 ±  2%        ~ (p=0.485 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     95.83 ± 1%                    96.62 ±  1%        ~ (p=0.063 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      79.67 ± 3%                    83.00 ±  2%   +4.18% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      27.11 ± 5%                    32.45 ±  7%  +19.68% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      84.00 ± 3%                    95.22 ±  7%  +13.36% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      7.110 ± 6%                    9.111 ± 10%  +28.15% (p=0.002 n=6)
    geomean                                                                                                41.05                         45.86        +11.73%
2024-11-01 12:43:17 +01:00
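The shape of the change in 7de6d070f2 can be sketched roughly as below. All types and functions here (`claimLister`, `cycleState`, `preFilter`, `filter`) are hypothetical simplifications and not the scheduler framework's API; the sketch only shows listing once per scheduling attempt and reusing the result for every node.

```go
package main

import "fmt"

// claimLister is a hypothetical stand-in for an expensive lister.
type claimLister struct {
	calls      int      // how often the expensive listing ran
	nodesInUse []string // nodes whose device is already claimed
}

// listNodesInUse simulates the costly call that used to run per node.
func (l *claimLister) listNodesInUse() map[string]bool {
	l.calls++
	inUse := make(map[string]bool, len(l.nodesInUse))
	for _, n := range l.nodesInUse {
		inUse[n] = true
	}
	return inUse
}

// cycleState carries data from preFilter to every filter call.
type cycleState struct {
	inUse map[string]bool
}

// preFilter lists once for the whole scheduling attempt.
func preFilter(l *claimLister) *cycleState {
	return &cycleState{inUse: l.listNodesInUse()}
}

// filter only reads the snapshot. This is safe under the assumption,
// stated in the commit message, that filtering does not change claim
// status between calls.
func filter(s *cycleState, node string) bool {
	return !s.inUse[node]
}

func main() {
	lister := &claimLister{nodesInUse: []string{"node-b"}}
	state := preFilter(lister)
	for _, node := range []string{"node-a", "node-b", "node-c"} {
		fmt.Println(node, filter(state, node))
	}
	fmt.Println("list calls:", lister.calls)
}
```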
Patrick Ohly
ad22b74c60 DRA scheduler: fix match attribute names in test
FullyQualifiedNames must include a domain. The current code doesn't care, but
once it does, the tests had better behave correctly.
2024-11-01 12:43:16 +01:00
Kubernetes Prow Robot
d87bf75c29
Merge pull request #128439 from oxxenix/migrate-security-components-to-contextual-logging
clustertrustbundle, token_manager: migrate to contextual logging
2024-11-01 11:17:25 +00:00
Kubernetes Prow Robot
c4eea34dcf
Merge pull request #128293 from sebastiaanspeck/fix/kubeamd-typo
Fix typo for `kubeadm`
2024-11-01 09:15:26 +00:00
Kubernetes Prow Robot
b831df733e
Merge pull request #128416 from jpbetz/reset-filter
Add optional ResetFieldsFilterStrategy interface for storage
2024-11-01 02:23:26 +00:00
Joe Betz
2bc17d1cf0 Add ResetFieldsFilterStrategy 2024-10-31 21:19:27 -04:00
Joe Betz
6fe5140366 hack/pin-dependency.sh sigs.k8s.io/structured-merge-diff/v4 v4.4.2 2024-10-31 21:19:27 -04:00
Kubernetes Prow Robot
223ac36b50
Merge pull request #128399 from JesseStutler/dra
Refactor the dynamicResources struct to DynamicResources
2024-11-01 00:33:27 +00:00
Kubernetes Prow Robot
74b9204b6a
Merge pull request #128473 from dims/copy-ParseCgroupFileUnified-and-drop-rest-of-containerd-cgroups
Copy ParseCgroupFileUnified and Drop rest of containerd/cgroups
2024-10-31 21:57:33 +00:00
Kubernetes Prow Robot
34ce75749e
Merge pull request #128463 from knrc/fix_vap_elapsed_time_tracking
Fix elapsed time tracking for validating admission policies
2024-10-31 21:57:27 +00:00
Kubernetes Prow Robot
d76a8fae67
Merge pull request #128468 from wojtek-t/fix_miss_events_tests
Fix TestCacherDontMissEventsOnReinitialization test
2024-10-31 20:25:40 +00:00
Kubernetes Prow Robot
f68a0371f1
Merge pull request #128433 from pohly/dra-admin-access-in-status
DRA API: check "AdminAccess in use" only once
2024-10-31 20:25:33 +00:00
Kubernetes Prow Robot
b337f048db
Merge pull request #127094 from sreeram-venkitesh/4818-allow-zero-for-prestop-hook
KEP-4818: Relaxed validation for allowing zero in PreStop hook sleep action
2024-10-31 20:25:26 +00:00
Kubernetes Prow Robot
d34c181465
Merge pull request #128444 from tosi3k/ds-syncs
Add --concurrent-daemonset-syncs argument to kube-controller-manager
2024-10-31 19:21:34 +00:00
Kubernetes Prow Robot
151ca569f9
Merge pull request #128426 from yongruilin/reset-label-allow-list
feat(metrics): Add util func to reset label allow lists
2024-10-31 19:21:27 +00:00
Davanum Srinivas
e86d02b60c
Copy ParseCgroupFileUnified and Drop rest of containerd/cgroups
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-10-31 13:42:39 -04:00
Kubernetes Prow Robot
365b457e3e
Merge pull request #128455 from jsafrane/refactor-kcm-plugins
Refactor KCM volume plugin probe
2024-10-31 17:17:34 +00:00
Kubernetes Prow Robot
7a43edefa1
Merge pull request #128454 from jpbetz/fix-emulated-storage-version-encoding-config
Don't select versions that have a replacement as storage version for APIs
2024-10-31 17:17:26 +00:00
Wojciech Tyczyński
d35ea217fa Fix TestCacherDontMissEventsOnReinitialization test 2024-10-31 17:08:42 +01:00
yongruilin
d2ef8a1808 feat(metrics): Add util func to reset label allow lists
Adds a utility function `ResetLabelValueAllowLists` to reset the allow lists for label values.  This facilitates testing by allowing tests to clear the global state between runs and avoid unintended side effects.
2024-10-31 09:08:00 -07:00
Kubernetes Prow Robot
50998de605
Merge pull request #128457 from neolit123/1.31-improve-dry-run-logic
kubeadm: support dryrunning upgrade without a real cluster
2024-10-31 15:21:33 +00:00
Kubernetes Prow Robot
8233d1edc8
Merge pull request #127164 from cici37/correctGoDoc
Correct go doc for admissionregistration resources
2024-10-31 15:21:26 +00:00
Kubernetes Prow Robot
69e30cd642
Merge pull request #128263 from ShazaAldawamneh/typecheck-retry-generation
CRD type check test fix
2024-10-31 13:53:33 +00:00
Kubernetes Prow Robot
ff5cb3791a
Merge pull request #127903 from soltysh/test_daemonset
Add unit tests verifying the update touches old, unhealthy pods first, and only after new pods
2024-10-31 13:53:26 +00:00
Joe Betz
d5517b7a51 Unit test for emulated storage version selection 2024-10-31 09:22:28 -04:00
Lubomir I. Ivanov
07918a59e8 kubeadm: support dryrunning upgrade without a real cluster
Make the following changes:
- When dryrunning if the given kubeconfig does not exist
  create a DryRun object without a real client. This means only
  a fake client will be used for all actions.
- Skip the preflight check if manifests exist during dryrun.
  Print "would ..." instead.
- Add new reactors that handle objects during upgrade.
- Add unit tests for new reactors.
- Print message on "upgrade node" that this is not a CP node
  if the apiserver manifest is missing.
- Add a new function GetNodeName() that uses 3 different methods
  for fetching the node name. Solves a long standing issue where
  we only used the cert in kubelet.conf for determining node name.
- Various other minor fixes.
2024-10-31 14:58:47 +02:00
Kubernetes Prow Robot
c19ffb7e72
Merge pull request #128464 from sanposhiho/flaky-sched-one
fix: flake TestSchedulerScheduleOne
2024-10-31 12:13:33 +00:00
Kubernetes Prow Robot
ac25b64847
Merge pull request #128450 from liggitt/revert-127669
Revert "Merge pull request #127669 from olyazavr/fix-probe-race"
2024-10-31 12:13:26 +00:00
Kubernetes Prow Robot
ce6396175b
Merge pull request #127318 from aroradaman/conntrack-reconciler
proxy/conntrack: reconciler
2024-10-31 10:21:33 +00:00
Kubernetes Prow Robot
f94f87795f
Merge pull request #126935 from aojea/proxy_conntrack_service_topology
e2e conntrack test for UDP Service with internalTrafficPolicy local
2024-10-31 10:21:26 +00:00
Maciej Szulik
174288d751
Add unit tests verifying the update touches old, unhealthy pods first, and only after new pods.
Signed-off-by: Maciej Szulik <soltysh@gmail.com>
2024-10-31 11:13:01 +01:00
Patrick Ohly
d1f0d24ef9 DRA API: check "AdminAccess in use" only once
This is simpler and an opportunity to explain the concept.
2024-10-31 09:42:18 +01:00
Kensei Nakada
bd8e9dd021 fix: flake TestSchedulerScheduleOne 2024-10-31 17:13:50 +09:00
Jan Safranek
9e29f95618 Refactor controller-manager volume plugins
Most of the volume plugins were removed from k/k. Refactor how KCM
controllers initialize the few leftovers.
2024-10-31 09:08:30 +01:00
Jan Safranek
cba5a93468 Remove portworx from attachable volume plugins
The volume plugin does not implement AttachableVolumePlugin interface.
2024-10-31 09:08:21 +01:00
Jan Safranek
0ecbdf3622 Remove fc from expandable plugins
FibreChannel volume plugin does not implement ExpandableVolumePlugin.
2024-10-31 09:08:21 +01:00
Jan Safranek
1fa8877c33 Add unit tests for KCM volume plugin probers 2024-10-31 09:08:19 +01:00
Kubernetes Prow Robot
453efd7a4b
Merge pull request #121604 from pacoxu/image-pull-e2e
[node-e2e] add test cases for serialize and parallel image pulling
2024-10-31 08:01:26 +00:00
Paco Xu
82df7a7d82 use cri proxy injector for parallel pulling image tests 2024-10-31 14:50:50 +08:00
Kubernetes Prow Robot
7c56aa5a58
Merge pull request #128353 from sanposhiho/patch-13
fix: register ResourceSlice to allResources
2024-10-31 04:41:25 +00:00
Kubernetes Prow Robot
5d353417cd
Merge pull request #128346 from dims/update-to-latest-advisor-for-1.32
Update to latest cadvisor - `v0.51.0`
2024-10-30 23:45:26 +00:00
Kubernetes Prow Robot
c0e0785fe4
Merge pull request #128427 from dom4ha/scheduler-perf
Fix Unschedulable test by using high priority churn pods to get processed right after they were injected
2024-10-30 22:23:25 +00:00
Joe Betz
c59fba7f26
Promote CRD field selector e2e test to conformance (#128109)
* Promote CRD field selector e2e test to conformance

* Fix release number for conformance test

* re-run update conformance
2024-10-30 21:19:25 +00:00
Kubernetes Prow Robot
dc1d7f41ef
Merge pull request #128456 from benluddy/nondeterministic-response-encoding
KEP-4222: Allow nondeterministic object encoding in HTTP response bodies.
2024-10-30 20:13:27 +00:00
Kevin Conner
9538747d4d Fix elapsed time tracking for validating admission policies
Signed-off-by: Kevin Conner <kev.conner@gmail.com>
2024-10-30 12:38:39 -07:00
Davanum Srinivas
152d342a8d
Update to latest cadvisor
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-10-30 15:25:21 -04:00
Ben Luddy
dee76a460e
Allow nondeterministic object encoding in HTTP response bodies. 2024-10-30 15:10:16 -04:00