kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2025-11-03 07:11:01 +00:00

Author	SHA1	Message	Date
Maciej Skoczeń	8371a35824	Split scheduler_perf config into subdirectories	2024-11-04 08:45:34 +00:00
dom4ha	ff584a76e0	Fix Unschedulable test by scheduling high priority churn pods to get processed right after they were injected (before the queued test pods)	2024-10-30 13:04:38 +00:00
Patrick Ohly	9a7e4ccab2	DRA admin access: add feature gate The new DRAAdminAccess feature gate has the following effects: - If disabled in the apiserver, the spec.devices.requests[*].adminAccess field gets cleared. Same in the status. In both cases the scenario that it was already set and a claim or claim template get updated is special: in those cases, the field is not cleared. Also, allocating a claim with admin access is allowed regardless of the feature gate and the field is not cleared. In practice, the scheduler will not do that. - If disabled in the resource claim controller, creating ResourceClaims with the field set gets rejected. This prevents running workloads which depend on admin access. - If disabled in the scheduler, claims with admin access don't get allocated. The effect is the same. The alternative would have been to ignore the fields in claim controller and scheduler. This is bad because a monitoring workload then runs, blocking resources that probably were meant for production workloads.	2024-10-29 09:50:11 +01:00
Kensei Nakada	b5d0745db3	Fix: use pod-high-priority.yaml to trigger preemption in PreemptionAsync test case	2024-10-26 14:16:24 +09:00
dom4ha	b3c4fe48e9	Tune PreemptionAsync and Unschedulable tests threshold and params.	2024-10-23 12:24:10 +00:00
Maciej Skoczeń	84e23fcc88	Add scheduler_perf test case for NodeUpdate event handling	2024-10-22 09:03:53 +00:00
Kensei Nakada	83f9e4b6df	cleanup: remove event list	2024-10-18 11:10:10 +10:00
Kubernetes Prow Robot	b1b4e5d397	Merge pull request #128003 from pohly/dra-classic-dra-removal DRA: remove "classic DRA"	2024-10-18 00:55:17 +01:00
dom4ha	b7f55a37a0	Bring back the smallest integration test	2024-10-17 15:41:36 +00:00
dom4ha	59458573ff	Remove unschedulable test and replace it with the new one.	2024-10-17 15:41:21 +00:00
dom4ha	f2c947e36d	Add UnschedulableAsync test in scheduler_perf to monitor impact of unschedulable pods on scheduler throughput	2024-10-17 15:35:21 +00:00
dom4ha	b2b41444f2	Add PreemptionBlocking test in scheduler_perf to monitor how long the preemption process (which blocks scheduling of regular nodes) takes.	2024-10-17 09:58:32 +00:00
Patrick Ohly	f84eb5ecf8	DRA: remove "classic DRA" This removes the DRAControlPlaneController feature gate, the fields controlled by it (claim.spec.controller, claim.status.deallocationRequested, claim.status.allocation.controller, class.spec.suitableNodes), the PodSchedulingContext type, and all code related to the feature. The feature gets removed because there is no path towards beta and GA and DRA with "structured parameters" should be able to replace it.	2024-10-16 23:09:50 +02:00
Kubernetes Prow Robot	e287784a8d	Merge pull request #128050 from macsko/add_pod_add_event_handling_scheduler_perf_test_case Add scheduler_perf test case for AssignedPodAdd event handling	2024-10-16 15:37:02 +01:00
Maciej Skoczeń	0db96a0ac3	Add scheduler_perf test case for AssignedPodAdd event handling	2024-10-16 07:45:50 +00:00
Kubernetes Prow Robot	558c0b6eaa	Merge pull request #128084 from macsko/fix_panic_when_defining_featuregates_only_on_workload_level_scheduler_perf Fix panic when setting feature gates only on workload level in scheduler_perf	2024-10-15 23:05:03 +01:00
Kubernetes Prow Robot	9872b17ccc	Merge pull request #127828 from macsko/add_template_parameters_to_createnodesop_in_scheduler_perf Add template parameters to createNodesOp in scheduler_perf	2024-10-15 20:43:04 +01:00
Maciej Skoczeń	cca6f8c800	Fix panic when defining feature gates only on workload level in scheduler_perf	2024-10-15 09:50:55 +00:00
Kubernetes Prow Robot	2f7df335ad	Merge pull request #127615 from macsko/add_node_add_event_benchmark_to_scheduler_perf Add scheduler_perf test case for NodeAdd event handling	2024-10-11 18:10:19 +01:00
Kubernetes Prow Robot	1b6c993cee	Merge pull request #127952 from macsko/allow_to_specify_feature_gates_on_workload_level_scheduler_perf Allow to set feature gates on workload level in scheduler_perf	2024-10-11 15:28:19 +01:00
Maciej Skoczeń	e676d0e76a	Allow to specify feature gates on workload level in scheduler_perf	2024-10-11 08:41:08 +00:00
Maciej Skoczeń	6dbb5d84b3	Move integration tests perf utils to scheduler_perf package	2024-10-11 08:27:08 +00:00
Maciej Skoczeń	25850caf8a	Add scheduler_perf test case for NodeAdd event handling	2024-10-11 07:40:06 +00:00
Maciej Skoczeń	930ebe16db	Add template parameters to createNodesOp in scheduler_perf	2024-10-09 08:51:04 +00:00
Maciej Skoczeń	98e4892b84	Add scheduler_perf test case for pod update events handling	2024-10-09 08:35:25 +00:00
Maciej Skoczeń	2a08ce5c68	Add scheduler_perf test case for AssignedPodDelete event handling	2024-10-02 09:16:28 +00:00
Kubernetes Prow Robot	ae617c3d20	Merge pull request #127781 from macsko/use_barrier_not_sleep_where_possible_in_scheduler_perf_test_cases Use barrier instead of sleep when possible in scheduler_perf test cases	2024-10-01 22:06:10 +01:00
Maciej Skoczeń	bae0eb91d4	Use barrier instead of sleep when possible in scheduler_perf test cases	2024-10-01 13:53:04 +00:00
Maciej Skoczeń	5e2552c2b0	Allow to filter pods using labels on barrier in scheduler_perf	2024-10-01 08:48:37 +00:00
Kubernetes Prow Robot	22a30e7cbb	Merge pull request #127700 from macsko/add_option_waitforpodsprocessed Add option to wait for pods to be attempted in barrierOp in scheduler_perf	2024-10-01 05:17:49 +01:00
Maciej Skoczeń	fdbf21e03a	Allow to filter pods using labels while collecting metrics in scheduler_perf	2024-09-30 13:32:12 +00:00
Maciej Skoczeń	928670061d	Allow to wait for pods to be attempted in barrierOp in scheduler_perf	2024-09-30 08:07:15 +00:00
Maciej Skoczeń	837d917d91	Make sleepOp duration parametrizable in scheduler_perf	2024-09-26 13:07:22 +00:00
Maciej Skoczeń	40154baab0	Add updateAnyOp to scheduler_perf	2024-09-25 12:42:25 +00:00
Patrick Ohly	d100768d94	scheduler_perf: track and visualize progress over time This is useful to see whether pod scheduling happens in bursts and how it behaves over time, which is relevant in particular for dynamic resource allocation where it may become harder at the end to find the node which still has resources available. Besides "pods scheduled" it's also useful to know how many attempts were needed, so schedule_attempts_total also gets sampled and stored. To visualize the result of one or more test runs, use: gnuplot.sh *.dat	2024-09-25 11:09:15 +02:00
Patrick Ohly	ded96042f7	scheduler_perf + DRA: load up cluster by allocating claims Having to schedule 4999 pods to simulate a "full" cluster is slow. Creating claims and then allocating them more or less like the scheduler would when scheduling pods is much faster and in practice has the same effect on the dynamicresources plugin because it looks at claims, not pods. This allows defining the "steady state" workloads with higher number of devices ("claimsPerNode") again. This was prohibitively slow before.	2024-09-25 09:45:39 +02:00
Patrick Ohly	385599f0a8	scheduler_perf + DRA: measure pod scheduling at a steady state The previous tests were based on scheduling pods until the cluster was full. This is a valid scenario, but not necessarily realistic. More realistic is how quickly the scheduler can schedule new pods when some old pods finished running, in particular in a cluster that is properly utilized (= almost full). To test this, pods must get created, scheduled, and then immediately deleted. This can run for a certain period of time. Scenarios with empty and full cluster have different scheduling rates. This was previously visible for DRA because the 50% percentile of the scheduling throughput was lower than the average, but one had to guess in which scenario the throughput was lower. Now this can be measured for DRA with the new SteadyStateClusterResourceClaimTemplateStructured test. The metrics collector must watch pod events to figure out how many pods got scheduled. Polling misses pods that already got deleted again. There seems to be no relevant difference in the collected metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions): │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ 157.1 ± 0% 157.1 ± 0% ~ (p=0.329 n=6) │ before │ after │ │ SchedulingThroughput/Perc50 │ SchedulingThroughput/Perc50 vs base │ 48.99 ± 8% 47.52 ± 9% ~ (p=0.937 n=6) │ before │ after │ │ SchedulingThroughput/Perc90 │ SchedulingThroughput/Perc90 vs base │ 463.9 ± 16% 460.1 ± 13% ~ (p=0.818 n=6) │ before │ after │ │ SchedulingThroughput/Perc95 │ SchedulingThroughput/Perc95 vs base │ 463.9 ± 16% 460.1 ± 13% ~ (p=0.818 n=6) │ before │ after │ │ SchedulingThroughput/Perc99 │ SchedulingThroughput/Perc99 vs base │ 463.9 ± 16% 460.1 ± 13% ~ (p=0.818 n=6)	2024-09-25 09:45:39 +02:00
Patrick Ohly	51cafb0053	scheduler_perf: more useful errors for configuration mistakes Before, the first error was reported, which typically was the "invalid op code" error from the createAny operation: scheduler_perf.go:900: parsing test cases error: error unmarshaling JSON: while decoding JSON: cannot unmarshal {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into any known op type: invalid opcode "createPods"; expected "createAny" Now the opcode is determined first, then decoding into exactly the matching operation is tried and validated. Unknown fields are an error. In the case above, decoding a string into time.Duration failed: scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into benchmark.createPodsOp: json: cannot unmarshal string into Go struct field createPodsOp.Duration of type time.Duration Some typos: scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: unknown opcode "sleeep" in {"duration":"5s","opcode":"sleeep"} scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"countParram":"$deletingPods","deletePodsPerSecond":50,"opcode":"createPods"} into benchmark.createPodsOp: json: unknown field "countParram"	2024-09-25 09:45:39 +02:00
Patrick Ohly	7bbb3465e5	scheduler_perf: more realistic structured parameters tests Real devices are likely to have a handful of attributes and (for GPUs) the memory as capacity. Most keys will be driver specific, a few may eventually have a domain (none standardized right now).	2024-09-24 18:52:45 +02:00
Maciej Skoczeń	a273e5381a	Add separate ops for collecting metrics from multiple namespaces in scheduler_perf	2024-09-24 12:28:53 +00:00
Maciej Skoczeń	287b61918a	Add deletePodsOp to scheduler_perf	2024-09-20 09:46:27 +00:00
Kubernetes Prow Robot	2850d302ca	Merge pull request #127269 from sanposhiho/patch-11 chore: tidy up labels in scheduler-perf	2024-09-19 04:18:44 +01:00
Maciej Skoczeń	2d4d7e0b5f	Fix opIndex in log when deleting pod failed in scheduler_perf	2024-09-16 13:48:24 +00:00
Kensei Nakada	898cb15b18	chore: clarify the labels in scheduler-perf	2024-09-14 15:39:54 +09:00
Maciej Skoczeń	9f6fdf1b77	Decrease number of integration tests in scheduler_perf	2024-09-12 15:13:53 +00:00
Kubernetes Prow Robot	c3ebd95c83	Merge pull request #127236 from macsko/scheduler_perf_test_case_for_hints_memory_leak_scenario Add scheduler_perf test case for queueing hints memory leak scenario	2024-09-11 16:03:11 +01:00
Maciej Skoczeń	c1f7b8e9f1	Measure event_handling and QHints duration metrics in scheduler_perf	2024-09-10 10:45:19 +00:00
Maciej Skoczeń	dba24fde78	Add scheduler_perf test case for queueing hints memory leak scenario	2024-09-10 08:15:10 +00:00
Kubernetes Prow Robot	abc056843c	Merge pull request #127238 from macsko/make_scheduler_perf_integration_tests_shorter Make scheduler_perf integration tests shorter	2024-09-10 03:17:14 +01:00
Maciej Skoczeń	ccf86f1709	Make scheduler_perf integration tests shorter	2024-09-09 09:32:13 +00:00


416 Commits
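
Note on commit 51cafb0053: its message describes determining the opcode first, then decoding into exactly the matching operation, with unknown fields treated as an error. The following is a minimal sketch of that two-step idea, not the actual scheduler_perf code. The struct fields, the `decodeOp` helper, and the set of supported opcodes are assumptions made for illustration; only the opcode names and the general error wording are taken from the commit message.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// createPodsOp and sleepOp are hypothetical, simplified stand-ins for
// operation types; the real scheduler_perf structs have more fields.
type createPodsOp struct {
	Opcode    string `json:"opcode"`
	Count     int    `json:"count,omitempty"`
	Namespace string `json:"namespace,omitempty"`
}

type sleepOp struct {
	Opcode   string `json:"opcode"`
	Duration string `json:"duration"`
}

// decodeOp (hypothetical helper) reads only the opcode first, then decodes
// the same bytes strictly into the one type that matches that opcode.
func decodeOp(raw []byte) (any, error) {
	// Step 1: a lenient pass that extracts nothing but the opcode.
	var head struct {
		Opcode string `json:"opcode"`
	}
	if err := json.Unmarshal(raw, &head); err != nil {
		return nil, fmt.Errorf("reading opcode: %w", err)
	}

	// Step 2: pick exactly one target type, so errors refer to that type
	// instead of to whichever op type happened to be tried last.
	var op any
	switch head.Opcode {
	case "createPods":
		op = &createPodsOp{}
	case "sleep":
		op = &sleepOp{}
	default:
		return nil, fmt.Errorf("unknown opcode %q in %s", head.Opcode, raw)
	}

	// Step 3: a strict pass in which a misspelled field name is an error.
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(op); err != nil {
		return nil, fmt.Errorf("decoding %s into %T: %w", raw, op, err)
	}
	return op, nil
}
```

With this sketch, an input such as {"opcode":"sleeep","duration":"5s"} fails in step 2 with an unknown-opcode error, while {"opcode":"createPods","countParram":"$deletingPods"} fails in step 3 because `encoding/json` rejects the unknown field. Both failures name the actual mistake, which is the improvement the commit message describes.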