Commit Graph

125606 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
5de3c1e93d
Merge pull request #127292 from skitt/fix-client-go-extensions-without-test
client-go: add missing template functions and types for extensions
2024-09-25 17:46:00 +01:00
YamasouA
c50884a2f9 tweak 2024-09-25 23:09:06 +09:00
YamasouA
c2ba2ea383 fix unit test 2024-09-25 22:37:05 +09:00
YamasouA
84f45c81ca tweak 2024-09-25 22:37:05 +09:00
YamasouA
b4d9fe3957 delete framework.Add 2024-09-25 22:37:05 +09:00
YamasouA
b98634c2da volumebinding: scheduler queueing hints - CSIDriver
fix if condition

add test

add log

eliminate unnecessary args from log

fix Queue condition

check original pod status

fix return value when the pod is schedulable

fix tweak

fix testcase
2024-09-25 22:37:05 +09:00
Kubernetes Prow Robot
36bbdd692f
Merge pull request #127466 from guozheng-shen/fix-return
endpointsLeasesResourceLock and configMapsLeasesResourceLock have been removed
2024-09-25 14:36:01 +01:00
Maciej Skoczeń
40154baab0 Add updateAnyOp to scheduler_perf 2024-09-25 12:42:25 +00:00
Kubernetes Prow Robot
5fc4e71a30
Merge pull request #127499 from pohly/scheduler-perf-updates
scheduler_perf: updates to enhance performance testing of DRA
2024-09-25 13:32:00 +01:00
Maciej Szulik
f11ddad99d
e2e: add test covering cronjob-scheduled-timestamp annotation added by cronjob 2024-09-25 12:47:27 +02:00
Kubernetes Prow Robot
75214d11d5
Merge pull request #127428 from googs1025/scheduler/plugin
chore(scheduler): refactor import package ordering in scheduler
2024-09-25 11:40:07 +01:00
Kubernetes Prow Robot
4c4edfede5
Merge pull request #127398 from my-git9/patch-23
kubeadm: update comment for ArgumentsFromCommand function in app/util/arguments
2024-09-25 11:40:00 +01:00
Lukasz Szaszkiewicz
ae35048cb0
adds watchListEndpointRestrictions for watchlist requests (#126996)
* endpoints/handlers/get: intro watchListEndpointRestrictions

* consistencydetector/list_data_consistency_detector: expose IsDataConsistencyDetectionForListEnabled

* e2e/watchlist: extract common function for adding unstructured secrets

* e2e/watchlist: new e2e scenarios for covering watchListEndpointRestrict
2024-09-25 10:12:01 +01:00
Patrick Ohly
d100768d94 scheduler_perf: track and visualize progress over time
This is useful to see whether pod scheduling happens in bursts and how it
behaves over time, which is relevant in particular for dynamic resource
allocation where it may become harder at the end to find the node which still
has resources available.

Besides "pods scheduled" it's also useful to know how many attempts were
needed, so schedule_attempts_total also gets sampled and stored.

To visualize the result of one or more test runs, use:

     gnuplot.sh *.dat
2024-09-25 11:09:15 +02:00
xin.li
706e939382 kubeadm: update comment for ArgumentsFromCommand function in app/util/arguments
Signed-off-by: xin.li <xin.li@daocloud.io>
2024-09-25 16:19:28 +08:00
Patrick Ohly
ded96042f7 scheduler_perf + DRA: load up cluster by allocating claims
Having to schedule 4999 pods to simulate a "full" cluster is slow. Creating
claims and then allocating them more or less like the scheduler would when
scheduling pods is much faster and in practice has the same effect on the
dynamicresources plugin because it looks at claims, not pods.

This allows defining the "steady state" workloads with a higher number of
devices ("claimsPerNode") again. This was prohibitively slow before.
2024-09-25 09:45:39 +02:00
Patrick Ohly
385599f0a8 scheduler_perf + DRA: measure pod scheduling at a steady state
The previous tests were based on scheduling pods until the cluster was
full. This is a valid scenario, but not necessarily realistic.

More realistic is how quickly the scheduler can schedule new pods when some
old pods finished running, in particular in a cluster that is properly
utilized (= almost full). To test this, pods must get created, scheduled, and
then immediately deleted. This can run for a certain period of time.

Scenarios with empty and full cluster have different scheduling rates. This was
previously visible for DRA because the 50th percentile of the scheduling
throughput was lower than the average, but one had to guess in which scenario
the throughput was lower. Now this can be measured for DRA with the new
SteadyStateClusterResourceClaimTemplateStructured test.

The metrics collector must watch pod events to figure out how many pods got
scheduled. Polling misses pods that already got deleted again. There seems to
be no relevant difference in the collected
metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions):

     │            before            │                     after                     │
     │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base         │
                         157.1 ± 0%                     157.1 ± 0%  ~ (p=0.329 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc50 │ SchedulingThroughput/Perc50  vs base         │
                        48.99 ± 8%                    47.52 ± 9%  ~ (p=0.937 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc90 │ SchedulingThroughput/Perc90  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc95 │ SchedulingThroughput/Perc95  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc99 │ SchedulingThroughput/Perc99  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)
2024-09-25 09:45:39 +02:00
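
The observation above, that a median throughput below the average hints at two different scheduling regimes, can be illustrated with a small sketch. The sample values are invented for illustration, and the nearest-rank percentile used here is an assumption, not necessarily how scheduler_perf computes Perc50:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// average returns the arithmetic mean of the samples (0 for no samples).
func average(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// percentile returns the nearest-rank p-th percentile of the samples.
// Assumption: nearest-rank is used only for illustration here.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // do not modify the input
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical pods/s samples: fast while the cluster is empty,
	// slow once it is almost full.
	samples := []float64{400, 400, 50, 50, 50, 50, 50}
	fmt.Printf("average=%.1f perc50=%.1f\n", average(samples), percentile(samples, 50))
}
```

With these made-up samples the average is 150 while the median is 50: a short fast phase pulls the average up, but most samples come from the slow phase.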
Patrick Ohly
51cafb0053 scheduler_perf: more useful errors for configuration mistakes
Before, the first error was reported, which typically was the "invalid op code"
error from the createAny operation:

    scheduler_perf.go:900: parsing test cases error: error unmarshaling JSON: while decoding JSON: cannot unmarshal {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into any known op type: invalid opcode "createPods"; expected "createAny"

Now the opcode is determined first, then decoding into exactly the matching operation is
tried and validated. Unknown fields are an error.

In the case above, decoding a string into time.Duration failed:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into *benchmark.createPodsOp: json: cannot unmarshal string into Go struct field createPodsOp.Duration of type time.Duration

Some typos:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: unknown opcode "sleeep" in {"duration":"5s","opcode":"sleeep"}

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"countParram":"$deletingPods","deletePodsPerSecond":50,"opcode":"createPods"} into *benchmark.createPodsOp: json: unknown field "countParram"
2024-09-25 09:45:39 +02:00
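
The two-step decoding described in the commit above (determine the opcode first, then decode strictly into the matching operation) might look roughly like the following sketch. The types `sleepOp` and `createPodsOp` and the function `decodeOp` are simplified, hypothetical stand-ins, not the actual scheduler_perf code; only the standard library's `encoding/json` is used:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"time"
)

// Hypothetical, simplified operation types; the field names follow the
// JSON snippets quoted in the commit message.
type sleepOp struct {
	Opcode   string `json:"opcode"`
	Duration string `json:"duration"`
}

type createPodsOp struct {
	Opcode     string `json:"opcode"`
	Count      int    `json:"count"`
	CountParam string `json:"countParam"`
}

// decodeOp reads only the opcode first, then decodes into exactly the
// matching type, treating unknown fields as errors.
func decodeOp(data []byte) (any, error) {
	var probe struct {
		Opcode string `json:"opcode"`
	}
	if err := json.Unmarshal(data, &probe); err != nil {
		return nil, fmt.Errorf("reading opcode: %w", err)
	}

	var op any
	switch probe.Opcode {
	case "sleep":
		op = &sleepOp{}
	case "createPods":
		op = &createPodsOp{}
	default:
		return nil, fmt.Errorf("unknown opcode %q in %s", probe.Opcode, data)
	}

	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields() // a misspelled field becomes an error
	if err := dec.Decode(op); err != nil {
		return nil, fmt.Errorf("decoding %s into %T: %w", data, op, err)
	}

	// Minimal validation step: a sleep duration must be parseable.
	if s, ok := op.(*sleepOp); ok {
		if _, err := time.ParseDuration(s.Duration); err != nil {
			return nil, fmt.Errorf("invalid duration in %s: %w", data, err)
		}
	}
	return op, nil
}

func main() {
	inputs := []string{
		`{"duration":"5s","opcode":"sleeep"}`,
		`{"countParram":"$deletingPods","opcode":"createPods"}`,
		`{"duration":"5s","opcode":"sleep"}`,
	}
	for _, in := range inputs {
		op, err := decodeOp([]byte(in))
		fmt.Printf("op=%v err=%v\n", op, err)
	}
}
```

Because the opcode selects a single target type, the reported error refers to the operation the user meant rather than to whichever operation type happened to be tried first.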
Stephen Kitt
13dfa4cbf5
Run codegen
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-09-25 07:59:49 +02:00
Stephen Kitt
7313fec892
client-go: add missing template functions and types
Now that imports aren't automatically added, the client-go generator
produces broken code for extensions since it references a few
functions and types directly without declaring them properly
(klog.Warningf, time.Duration, time.Second).

The generated code also references c.client and c.ns which are no
longer accessible following the generic refactor.

This fixes both issues by adding missing template functions and types,
and going through the appropriate getters.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-09-25 07:59:40 +02:00
Stephen Kitt
06f072b009
Run codegen
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-09-25 07:58:39 +02:00
Stephen Kitt
b882213f8f
Add a missing PatchOptions declaration for extensions
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-09-25 07:58:38 +02:00
Stephen Kitt
acb1b364b8
Add an example with all possible extensions
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-09-25 07:58:33 +02:00
Omer Aplatony
ade7305940 chore: moving apiserver featuregates to versioned
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2024-09-25 07:41:26 +03:00
Kubernetes Prow Robot
99ff62e87a
Merge pull request #127491 from SataQiu/fix-etcd-20240920
kubeadm: check whether the peer URL for the added etcd member already exists when the MemberAddAsLearner/MemberAdd fails
2024-09-25 05:08:07 +01:00
Kubernetes Prow Robot
2e6216170b
Merge pull request #127319 from p0lyn0mial/upstream-define-initial-events-list-blueprint
apimachinery/meta/types.go: define InitialEventsListBlueprintAnnotationKey const
2024-09-25 05:08:00 +01:00
Matthieu MOREL
f50a173bec fix: enable contains rule from testifylint in module k8s.io/client-go
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-25 06:05:30 +02:00
Matthieu MOREL
3b92b9f84d fix: enable contains rule from testifylint in module k8s.io/apiserver
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-25 06:04:37 +02:00
Matthieu MOREL
a28c2b6bf8 fix: enable error-is-as rule from testifylint in module k8s.io/client-go
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-25 06:03:50 +02:00
Matthieu MOREL
27b98be303 fix: enable nil-compare and error-nil rules from testifylint in module k8s.io/kubernetes
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-25 06:02:47 +02:00
carlory
b3913395c5 drop the option mark from the InvolvedObject field of internal event object 2024-09-25 11:43:52 +08:00
Kubernetes Prow Robot
f3a54b68f9
Merge pull request #127579 from chrishenzie/context
Propagate existing ctx instead of context.TODO() in sample-controller
2024-09-25 04:02:06 +01:00
Kubernetes Prow Robot
5dd244ff00
Merge pull request #125796 from haorenfsa/fix-gc-sync-blocked
garbagecollector: controller should not be blocking on failed cache sync
2024-09-25 04:02:00 +01:00
Kubernetes Prow Robot
8ccc878de0
Merge pull request #127583 from mmorel-35/testifylint/disable/require-error
chore: disable require-error rule from testifylint
2024-09-24 23:08:00 +01:00
Ben Luddy
de914d6e54
Support nondeterministic encode for the CBOR serializer. 2024-09-24 16:38:02 -04:00
Kubernetes Prow Robot
e9cde03b91
Merge pull request #127598 from aojea/servicecidr_seconday_dualwrite
bugfix: initialize secondary range registry with the right value
2024-09-24 21:08:08 +01:00
Kubernetes Prow Robot
63fc917521
Merge pull request #127480 from thockin/skip_test_target_normalization
Skip test target normalization
2024-09-24 21:08:01 +01:00
Kubernetes Prow Robot
9e157c5450
Merge pull request #127357 from lengrongfu/feat/add-chan-buffer
add resourceupdates.Update chan buffer
2024-09-24 20:02:01 +01:00
Antonio Ojea
7a9bca3888 bugfix: initialize secondary range registry with the right value
When the MultiCIDRServiceAllocator feature is enabled, we added an
additional feature gate DisableAllocatorDualWrite that allows enabling
a mirror behavior on the old allocator to deal with problems during
cluster upgrades.

During the implementation, the secondary range of the legacy allocator
was initialized with the value of the primary range. Hence, when a
Service tried to allocate a new IP on the secondary range, it succeeded
in the new IP allocator but failed when it tried to allocate the same IP
on the legacy allocator, since it has a different range.

Expand the integration test that runs over all the combinations of
Service ClusterIP possibilities to run with all the possible
combinations of the feature gates.

The integration test needs to change the way it starts the apiserver,
otherwise it will time out.
2024-09-24 17:48:13 +00:00
Patrick Ohly
7bbb3465e5 scheduler_perf: more realistic structured parameters tests
Real devices are likely to have a handful of attributes and (for GPUs) the
memory as capacity. Most keys will be driver specific, a few may eventually
have a domain (none standardized right now).
2024-09-24 18:52:45 +02:00
rongfu.leng
ead64fb8f0 add resourceupdates.Update chan buffer
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
2024-09-24 16:48:32 +00:00
Kubernetes Prow Robot
b071443187
Merge pull request #127592 from dims/wait-for-gpus-even-for-aws-kubetest2-ec2-harness
Wait for GPUs even for AWS kubetest2 ec2 harness
2024-09-24 17:26:08 +01:00
Kubernetes Prow Robot
56071089e2
Merge pull request #127573 from benluddy/dynamic-golden-response-test
Add test for unintended changes to dynamic client response handling.
2024-09-24 17:26:01 +01:00
Tim Hockin
8912df652b
Use Go workspaces + go list to find test targets
Plain old UNIX find requires us to do all sorts of silly filtering.
2024-09-24 09:04:13 -07:00
SataQiu
9af1b25bec kubeadm: check the member list status before adding or removing an etcd member 2024-09-24 22:53:42 +08:00
Kubernetes Prow Robot
4c24b9337f
Merge pull request #127575 from alculquicondor/acondor-apps
Stepping down from SIG Apps reviewers
2024-09-24 15:38:06 +01:00
Kubernetes Prow Robot
9571d3b6c6
Merge pull request #125995 from carlory/remove-unnecessary-permissions
remove unneeded permissions for volume controllers
2024-09-24 15:38:00 +01:00
Davanum Srinivas
472ca3b279
skip control plane nodes, they may not have GPUs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 10:09:33 -04:00
Kubernetes Prow Robot
6ded721910
Merge pull request #127496 from macsko/add_metricscollectionop_to_scheduler_perf
Add separate ops for collecting metrics from multiple namespaces in scheduler_perf
2024-09-24 14:34:00 +01:00
Davanum Srinivas
349c7136c9
Wait for GPUs even for AWS kubetest2 ec2 harness
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 09:11:18 -04:00