387 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
22a30e7cbb Merge pull request #127700 from macsko/add_option_waitforpodsprocessed
Add option to wait for pods to be attempted in barrierOp in scheduler_perf
2024-10-01 05:17:49 +01:00
Maciej Skoczeń
fdbf21e03a Allow to filter pods using labels while collecting metrics in scheduler_perf 2024-09-30 13:32:12 +00:00
Maciej Skoczeń
928670061d Allow to wait for pods to be attempted in barrierOp in scheduler_perf 2024-09-30 08:07:15 +00:00
Maciej Skoczeń
837d917d91 Make sleepOp duration parametrizable in scheduler_perf 2024-09-26 13:07:22 +00:00
Maciej Skoczeń
40154baab0 Add updateAnyOp to scheduler_perf 2024-09-25 12:42:25 +00:00
Patrick Ohly
d100768d94 scheduler_perf: track and visualize progress over time
This is useful to see whether pod scheduling happens in bursts and how it
behaves over time, which is relevant in particular for dynamic resource
allocation where it may become harder at the end to find the node which still
has resources available.

Besides "pods scheduled" it's also useful to know how many attempts were
needed, so schedule_attempts_total also gets sampled and stored.

To visualize the result of one or more test runs, use:

     gnuplot.sh *.dat
2024-09-25 11:09:15 +02:00
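
The sampling idea in d100768d94 could be sketched as follows. This is only an illustration: the `sample` type, the `formatDat` helper, and the column layout are assumptions made for this example, not the format that scheduler_perf or gnuplot.sh actually use. It relies only on the fact that gnuplot reads whitespace-separated columns and treats lines starting with `#` as comments.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// sample is a hypothetical data point: how many pods were scheduled and how
// many scheduling attempts had been made after a given elapsed time.
type sample struct {
	elapsed   time.Duration
	scheduled int
	attempts  int
}

// formatDat renders the samples as whitespace-separated columns, one row per
// sample. The column layout is an assumption made for this sketch.
func formatDat(samples []sample) string {
	var b strings.Builder
	b.WriteString("# seconds scheduled attempts\n")
	for _, s := range samples {
		fmt.Fprintf(&b, "%.1f %d %d\n", s.elapsed.Seconds(), s.scheduled, s.attempts)
	}
	return b.String()
}

func main() {
	// Made-up numbers: scheduling slows down near the end while the number
	// of attempts keeps growing.
	samples := []sample{
		{1 * time.Second, 40, 40},
		{2 * time.Second, 95, 110},
		{3 * time.Second, 100, 180},
	}
	fmt.Print(formatDat(samples))
}
```

Keeping both counters in the same row makes it possible to plot "pods scheduled" and "attempts" against the same time axis.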
Patrick Ohly
ded96042f7 scheduler_perf + DRA: load up cluster by allocating claims
Having to schedule 4999 pods to simulate a "full" cluster is slow. Creating
claims and then allocating them more or less like the scheduler would when
scheduling pods is much faster and in practice has the same effect on the
dynamicresources plugin because it looks at claims, not pods.

This allows defining the "steady state" workloads with a higher number of
devices ("claimsPerNode") again. This was prohibitively slow before.
2024-09-25 09:45:39 +02:00
Patrick Ohly
385599f0a8 scheduler_perf + DRA: measure pod scheduling at a steady state
The previous tests were based on scheduling pods until the cluster was
full. This is a valid scenario, but not necessarily realistic.

More realistic is how quickly the scheduler can schedule new pods when some
old pods finished running, in particular in a cluster that is properly
utilized (= almost full). To test this, pods must get created, scheduled, and
then immediately deleted. This can run for a certain period of time.

Scenarios with empty and full cluster have different scheduling rates. This was
previously visible for DRA because the 50th percentile of the scheduling
throughput was lower than the average, but one had to guess in which scenario
the throughput was lower. Now this can be measured for DRA with the new
SteadyStateClusterResourceClaimTemplateStructured test.

The metrics collector must watch pod events to figure out how many pods got
scheduled. Polling misses pods that already got deleted again. There seems to
be no relevant difference in the collected
metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions):

     │            before            │                     after                     │
     │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base         │
                         157.1 ± 0%                     157.1 ± 0%  ~ (p=0.329 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc50 │ SchedulingThroughput/Perc50  vs base         │
                        48.99 ± 8%                    47.52 ± 9%  ~ (p=0.937 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc90 │ SchedulingThroughput/Perc90  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc95 │ SchedulingThroughput/Perc95  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc99 │ SchedulingThroughput/Perc99  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)
2024-09-25 09:45:39 +02:00
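
Why 385599f0a8 needs to watch events instead of polling can be illustrated with a small simulation. This is a sketch under stated assumptions: `podEvent` and both counting functions are hypothetical stand-ins and do not use the real client-go informer or the scheduler_perf collector code.

```go
package main

import "fmt"

type eventType int

const (
	added eventType = iota
	scheduled
	deleted
)

// podEvent is a hypothetical, simplified pod event.
type podEvent struct {
	kind eventType
	pod  string
}

// countScheduledFromEvents counts every pod that was ever observed as
// scheduled, even if it got deleted afterwards.
func countScheduledFromEvents(events []podEvent) int {
	seen := map[string]bool{}
	for _, ev := range events {
		if ev.kind == scheduled {
			seen[ev.pod] = true
		}
	}
	return len(seen)
}

// countScheduledByPolling replays the events and then inspects only the pods
// that still exist, which is all that a single poll at the end could see.
func countScheduledByPolling(events []podEvent) int {
	current := map[string]bool{} // pod name -> is scheduled
	for _, ev := range events {
		switch ev.kind {
		case added:
			current[ev.pod] = false
		case scheduled:
			current[ev.pod] = true
		case deleted:
			delete(current, ev.pod)
		}
	}
	n := 0
	for _, isScheduled := range current {
		if isScheduled {
			n++
		}
	}
	return n
}

func main() {
	// pod-a is created, scheduled and immediately deleted (steady state);
	// pod-b is still running when the poll happens.
	events := []podEvent{
		{added, "pod-a"}, {scheduled, "pod-a"}, {deleted, "pod-a"},
		{added, "pod-b"}, {scheduled, "pod-b"},
	}
	fmt.Printf("events=%d polling=%d\n",
		countScheduledFromEvents(events), countScheduledByPolling(events))
}
```

With this input the event-based count is 2 while the poll sees only 1, because pod-a is already gone when the poll runs.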
Patrick Ohly
51cafb0053 scheduler_perf: more useful errors for configuration mistakes
Before, the first error was reported, which typically was the "invalid op code"
error from the createAny operation:

    scheduler_perf.go:900: parsing test cases error: error unmarshaling JSON: while decoding JSON: cannot unmarshal {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into any known op type: invalid opcode "createPods"; expected "createAny"

Now the opcode is determined first, then decoding into exactly the matching operation is
tried and validated. Unknown fields are an error.

In the case above, decoding a string into time.Duration failed:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into *benchmark.createPodsOp: json: cannot unmarshal string into Go struct field createPodsOp.Duration of type time.Duration

Some typos:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: unknown opcode "sleeep" in {"duration":"5s","opcode":"sleeep"}

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"countParram":"$deletingPods","deletePodsPerSecond":50,"opcode":"createPods"} into *benchmark.createPodsOp: json: unknown field "countParram"
2024-09-25 09:45:39 +02:00
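
The "determine the opcode first, then decode strictly" approach from 51cafb0053 can be sketched with the standard library alone. The op types below are simplified, hypothetical stand-ins, not the real `benchmark` package types, and the error wording is only modeled on the messages quoted above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sleepOp and createPodsOp are hypothetical, reduced versions of the
// scheduler_perf operations.
type sleepOp struct {
	Opcode   string `json:"opcode"`
	Duration string `json:"duration"`
}

type createPodsOp struct {
	Opcode     string `json:"opcode"`
	Count      int    `json:"count,omitempty"`
	CountParam string `json:"countParam,omitempty"`
}

// decodeOp reads only the opcode first, then decodes into exactly the
// matching type and rejects unknown fields.
func decodeOp(data []byte) (any, error) {
	var probe struct {
		Opcode string `json:"opcode"`
	}
	if err := json.Unmarshal(data, &probe); err != nil {
		return nil, fmt.Errorf("reading opcode: %w", err)
	}

	var op any
	switch probe.Opcode {
	case "sleep":
		op = &sleepOp{}
	case "createPods":
		op = &createPodsOp{}
	default:
		return nil, fmt.Errorf("unknown opcode %q in %s", probe.Opcode, data)
	}

	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields() // misspelled field names become errors
	if err := dec.Decode(op); err != nil {
		return nil, fmt.Errorf("decoding %s into %T: %w", data, op, err)
	}
	return op, nil
}

func main() {
	inputs := []string{
		`{"opcode":"sleep","duration":"5s"}`,
		`{"opcode":"sleeep","duration":"5s"}`,
		`{"opcode":"createPods","countParram":"$deletingPods"}`,
	}
	for _, in := range inputs {
		_, err := decodeOp([]byte(in))
		fmt.Printf("%s -> err: %v\n", in, err)
	}
}
```

Because only one candidate type is tried, the reported error names the actual problem instead of whichever operation type happened to be attempted first.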
Patrick Ohly
7bbb3465e5 scheduler_perf: more realistic structured parameters tests
Real devices are likely to have a handful of attributes and (for GPUs) the
memory as capacity. Most keys will be driver specific, a few may eventually
have a domain (none standardized right now).
2024-09-24 18:52:45 +02:00
Maciej Skoczeń
a273e5381a Add separate ops for collecting metrics from multiple namespaces in scheduler_perf 2024-09-24 12:28:53 +00:00
Maciej Skoczeń
287b61918a Add deletePodsOp to scheduler_perf 2024-09-20 09:46:27 +00:00
Kubernetes Prow Robot
2850d302ca Merge pull request #127269 from sanposhiho/patch-11
chore: tidy up labels in scheduler-perf
2024-09-19 04:18:44 +01:00
Maciej Skoczeń
2d4d7e0b5f Fix opIndex in log when deleting pod failed in scheduler_perf 2024-09-16 13:48:24 +00:00
Kensei Nakada
898cb15b18 chore: clarify the labels in scheduler-perf 2024-09-14 15:39:54 +09:00
Maciej Skoczeń
9f6fdf1b77 Decrease number of integration tests in scheduler_perf 2024-09-12 15:13:53 +00:00
Kubernetes Prow Robot
c3ebd95c83 Merge pull request #127236 from macsko/scheduler_perf_test_case_for_hints_memory_leak_scenario
Add scheduler_perf test case for queueing hints memory leak scenario
2024-09-11 16:03:11 +01:00
Maciej Skoczeń
c1f7b8e9f1 Measure event_handling and QHints duration metrics in scheduler_perf 2024-09-10 10:45:19 +00:00
Maciej Skoczeń
dba24fde78 Add scheduler_perf test case for queueing hints memory leak scenario 2024-09-10 08:15:10 +00:00
Kubernetes Prow Robot
abc056843c Merge pull request #127238 from macsko/make_scheduler_perf_integration_tests_shorter
Make scheduler_perf integration tests shorter
2024-09-10 03:17:14 +01:00
Maciej Skoczeń
ccf86f1709 Make scheduler_perf integration tests shorter 2024-09-09 09:32:13 +00:00
Maciej Skoczeń
7d4c713520 Check if InFlightEvents is empty after scheduler_perf workload 2024-09-09 08:00:34 +00:00
Maciej Skoczeń
3047ab73f5 Reset only metrics configured in collector before the createPodsOp 2024-09-06 08:26:20 +00:00
Kubernetes Prow Robot
08dd9951f5 Merge pull request #126886 from pohly/scheduler-perf-output
scheduler_perf: output
2024-08-26 22:23:40 +01:00
Kubernetes Prow Robot
8bbc0636b9 Merge pull request #126911 from macsko/scheduler_perf_throughput_fixes
Fix wrong throughput threshold for one scheduler_perf test case
2024-08-26 18:42:17 +01:00
Kubernetes Prow Robot
0bcbc3b77a Merge pull request #124003 from carlory/scheduler-rm-non-csi-limit
kube-scheduler remove non-csi volumelimit plugins
2024-08-26 12:02:13 +01:00
Maciej Skoczeń
7a88548755 Add workload name to failed threshold log 2024-08-26 07:44:52 +00:00
Maciej Skoczeń
71c9b9e2b0 Fix wrong throughput threshold for SchedulingRequiredPodAntiAffinityWithNSSelector test 2024-08-26 07:40:04 +00:00
Maciej Skoczeń
48dc6ff43c Disable scheduler_perf performance DRA tests 2024-08-26 07:35:36 +00:00
Kubernetes Prow Robot
605e94f6df Merge pull request #126871 from macsko/set_thresholds_in_scheduler_perf
Set scheduling throughput thresholds in scheduler_perf tests
2024-08-23 16:39:54 +01:00
Maciej Skoczeń
48a8cb2bc5 Document throughput thresholds in scheduler_perf readme 2024-08-23 14:22:48 +00:00
Patrick Ohly
bf1188d292 scheduler_perf: only store log output after failures
Reconfiguring the logging infrastructure with a per-test output file mimics
the behavior of per-test output (log output captured only on failures) while
still using the normal logging code, which is important for benchmarking.

To enable this behavior, the ARTIFACT env variable must be set.
2024-08-23 16:02:45 +02:00
Maciej Skoczeń
d0e3fc3561 Set scheduling throughput thresholds in scheduler_perf tests 2024-08-23 12:48:28 +00:00
Kubernetes Prow Robot
a1fc2551ba Merge pull request #126144 from likakuli/cleanup-unusedparamters
cleanup: remove scheduler_perf unused parameters
2024-08-22 19:29:40 +01:00
Maciej Skoczeń
77372cf3cf Label short workloads in scheduler_perf tests 2024-08-20 10:04:30 +00:00
Maciej Skoczeń
09fc399837 Add label to select short workloads in scheduler_perf tests 2024-08-20 10:04:30 +00:00
Maciej Skoczeń
a2cd8aa539 Make smaller workloads for scheduler_perf integration tests 2024-08-20 10:04:25 +00:00
Kubernetes Prow Robot
983875b2f5 Merge pull request #126337 from macsko/add_larger_scheduler_perf_test_cases
Add larger scheduler_perf test cases
2024-08-16 09:44:38 -07:00
Maciej Skoczeń
3b7b50a2cc Create fresh etcd instance for each workload in scheduler_perf 2024-08-16 08:19:52 +00:00
Maciej Skoczeń
5894e201fa Measure metrics only during a specific op in scheduler_perf 2024-08-13 12:34:06 +00:00
carlory
cba2b3f773 kube-scheduler remove non-csi volumelimit plugins 2024-08-05 15:02:32 +08:00
Maciej Skoczeń
1747483922 Add larger scheduler_perf test cases 2024-07-25 14:20:51 +00:00
Maciej Skoczeń
c15cdf7431 Init etcd and apiserver per test case in scheduler_perf integration tests 2024-07-23 09:10:01 +00:00
Patrick Ohly
9f36c8d718 DRA: add DRAControlPlaneController feature gate for "classic DRA"
In the API, the effect of the feature gate is that alpha fields get dropped on
create. They get preserved during updates if already set. The
PodSchedulingContext registration is *not* restricted by the feature gate.
This enables deleting stale PodSchedulingContext objects after disabling
the feature gate.

The scheduler checks the new feature gate before setting up an informer for
PodSchedulingContext objects and when deciding whether it can schedule a
pod. If any claim depends on a control plane controller, the scheduler bails
out, leading to:

    Status:       Pending
    ...
      Warning  FailedScheduling             73s   default-scheduler  0/1 nodes are available: resourceclaim depends on disabled DRAControlPlaneController feature. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

The rest of the changes prepare for testing the new feature separately from
"structured parameters". The goal is to have base "dra" jobs which just enable
and test those, then "classic-dra" jobs which add DRAControlPlaneController.
2024-07-22 18:09:34 +02:00
Patrick Ohly
599fe605f9 DRA scheduler: adapt to v1alpha3 API
The structured parameter allocation logic was written from scratch in
staging/src/k8s.io/dynamic-resource-allocation/structured where it might be
useful for out-of-tree components.

Besides the new features (amount, admin access) and API it now supports
backtracking when the initial device selection doesn't lead to a complete
allocation of all claims.

Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
Co-authored-by: John Belamaric <jbelamaric@google.com>
2024-07-22 18:09:34 +02:00
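
The backtracking mentioned in 599fe605f9 can be illustrated with a deliberately small sketch. It is not the allocator from staging/src/k8s.io/dynamic-resource-allocation/structured: the `request` type and the `allocate` function are hypothetical, and each request simply lists the device names that would satisfy it.

```go
package main

import "fmt"

// request is a hypothetical, simplified claim request.
type request struct {
	name       string
	candidates []string // devices that would satisfy this request
}

// allocate assigns one distinct device to every request. When a choice for
// an earlier request leaves a later one without a device, the choice is
// undone and the next candidate is tried (backtracking).
func allocate(reqs []request, used map[string]bool, result map[string]string) bool {
	if len(reqs) == 0 {
		return true // everything allocated
	}
	r := reqs[0]
	for _, dev := range r.candidates {
		if used[dev] {
			continue
		}
		used[dev] = true
		result[r.name] = dev
		if allocate(reqs[1:], used, result) {
			return true
		}
		// This choice did not lead to a complete allocation: undo it.
		delete(used, dev)
		delete(result, r.name)
	}
	return false
}

func main() {
	reqs := []request{
		{name: "claim-a", candidates: []string{"gpu-0", "gpu-1"}},
		{name: "claim-b", candidates: []string{"gpu-0"}},
	}
	result := map[string]string{}
	ok := allocate(reqs, map[string]bool{}, result)
	fmt.Println(ok, result)
}
```

Here the first attempt gives gpu-0 to claim-a, which leaves nothing for claim-b. After backtracking, claim-a gets gpu-1 and claim-b gets gpu-0, so both claims are allocated.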
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reservedFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io API completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
likakuli
ef9e1c39e9 cleanup: remove unused parameters
Signed-off-by: likakuli <1154584512@qq.com>
2024-07-17 16:27:12 +08:00
Kubernetes Prow Robot
a6460c4f3e Merge pull request #126036 from macsko/scheduler_perf_throughput_thresholds
Allow to set scheduling throughput thresholds in scheduler_perf tests
2024-07-16 21:43:13 -07:00
Maciej Skoczeń
767d2a3e5e Allow to set scheduling throughput thresholds in scheduler_perf tests 2024-07-15 08:06:21 +00:00