Commit Graph

3097 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
f852d7fead Merge pull request #118653 from pohly/volume-resource-requirements
Volume resource requirements
2023-08-21 14:08:05 -07:00
Kubernetes Prow Robot
f082fab916 Merge pull request #119556 from linxiulei/schedMF
Trim managedFields in pod informer
2023-08-21 07:03:34 -07:00
Patrick Ohly
2472291790 api: introduce separate VolumeResourceRequirements struct
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To avoid such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.

The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.

Code that uses the struct definitions needs to be updated.
2023-08-21 15:31:28 +02:00
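The effect of giving PVCs their own struct can be sketched with simplified stand-in types. This is a minimal illustration, not the real API definitions: the field types are plain string maps rather than `resource.Quantity`, and `decodeStrict` is a hypothetical helper that only loosely mimics OpenAPI validation by rejecting unknown JSON fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ResourceClaim is a simplified stand-in for illustration only.
type ResourceClaim struct {
	Name string `json:"name"`
}

// ResourceRequirements is the container-side struct; it carries Claims.
type ResourceRequirements struct {
	Limits   map[string]string `json:"limits,omitempty"`
	Requests map[string]string `json:"requests,omitempty"`
	Claims   []ResourceClaim   `json:"claims,omitempty"`
}

// VolumeResourceRequirements is the PVC-side struct; it has no Claims field,
// so extending ResourceRequirements no longer changes the PVC API.
type VolumeResourceRequirements struct {
	Limits   map[string]string `json:"limits,omitempty"`
	Requests map[string]string `json:"requests,omitempty"`
}

// decodeStrict (hypothetical) decodes JSON and rejects unknown fields.
func decodeStrict(data string, into any) error {
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(into)
}

func main() {
	doc := `{"requests":{"storage":"1Gi"},"claims":[{"name":"gpu"}]}`

	var container ResourceRequirements
	var volume VolumeResourceRequirements

	// The container struct accepts claims; the volume struct rejects them.
	fmt.Println("container:", decodeStrict(doc, &container))
	fmt.Println("volume:", decodeStrict(doc, &volume))
}
```

With separate types, a later addition to the container struct cannot leak into the volume struct by accident.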
Kubernetes Prow Robot
ea3318cb71 Merge pull request #119971 from kwakubiney/chore/include-pod-uid-in-event-log
chore: attach pod UID to event log
2023-08-21 04:13:22 -07:00
Eric Lin
f93bd699aa Trim managedFields in pod informer
Signed-off-by: Eric Lin <exlin@google.com>
2023-08-20 13:09:15 +00:00
Kubernetes Prow Robot
312dc127a9 Merge pull request #118923 from AxeZhan/volume_zone_csi
[Scheduler]Translate beta label to ga in volume_zone
2023-08-17 20:20:28 -07:00
AxeZhan
af26ebd0fa translate beta label to ga in volume_zone
2023-08-18 00:31:09 +08:00
SataQiu
ef7d404702 using wait.PollUntilContextTimeout instead of deprecated wait.Poll for pkg/scheduler
using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/integration/scheduler

using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/e2e/scheduling

using wait.ConditionWithContextFunc for PodScheduled/PodIsGettingEvicted/PodScheduledIn/PodUnschedulable/PodSchedulingError
2023-08-17 17:25:09 +08:00
kwakubiney
5752cbd8c7 chore: add pod UID in event log
This change includes preemptor pod UID in event log to allow
for easier debugging.

Signed-off-by: kwakubiney <kebiney@hotmail.com>
2023-08-16 11:00:56 +00:00
Kubernetes Prow Robot
130a5a423f Merge pull request #119785 from sanposhiho/waitonpermit-fiterror
fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-15 23:13:04 -07:00
Kubernetes Prow Robot
719d1a84f7 Merge pull request #119778 from sanposhiho/bugfix-unschedulableandunresolvable
fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-15 23:12:57 -07:00
Kubernetes Prow Robot
57212647e9 Merge pull request #119769 from Huang-Wei/bug/prefilter-preemption
Fix a bug that PostFilter plugin may not function if previous PreFilter plugins return Skip
2023-08-15 23:12:50 -07:00
Kubernetes Prow Robot
ea30d100f6 Merge pull request #119399 from wackxu/optimizecodeforNodeUnschedulable
Optimize the code of NodeUnschedulable to reduce TolerationsTolerateT…
2023-08-15 17:14:26 -07:00
Heba Elayoty
224087abfa Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-12 07:18:01 +00:00
Kensei Nakada
b008223705 fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-12 06:58:49 +00:00
Wei Huang
765f3916c2 Fix a bug that PostFilter plugin may not function if previous PreFilter plugins return Skip
2023-08-10 13:43:00 -07:00
Kensei Nakada
050c0437e6 fix: broadcast when pod is pushed back to activeQ directly in AddUnschedulableIfNotPresent
2023-08-09 03:32:14 +00:00
Patrick Ohly
2f30fae0e8 scheduler: fix data race after binding failure
When binding has failed, `Done` gets called by
`handleBindingCycleError`. Calling it again is at best redundant and, at worst,
suffers from a data race:
- the `assumedPodInfo` is placed in the backoff queue
- an event causes the `Pod` pointer to get updated in it
- reading `assumedPodInfo.Pod.UID` races with that write

This race was found with `go test -race`.
2023-08-02 11:04:10 +02:00
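The hazard described in this commit, reading a field through a pointer after the object has been handed to a queue that another goroutine may update, can be sketched generically. The types and the `requeueAndReport` function below are hypothetical and are not the scheduler's code; the actual fix removed the redundant call, whereas this sketch only shows the safe ordering of "read first, hand off second".

```go
package main

import (
	"fmt"
	"sync"
)

// Pod and PodInfo are hypothetical stand-ins for illustration.
type Pod struct{ UID string }

type PodInfo struct{ Pod *Pod }

// requeueAndReport reads the UID while the caller still owns info
// exclusively, then hands info to the queue. After the send, another
// goroutine may replace info.Pod, so info.Pod must not be read again here.
func requeueAndReport(info *PodInfo, queue chan<- *PodInfo) string {
	uid := info.Pod.UID // safe: nobody else holds info yet
	queue <- info       // ownership moves to the queue consumer
	return uid          // uses the copy, not info.Pod.UID
}

func main() {
	info := &PodInfo{Pod: &Pod{UID: "uid-1"}}
	queue := make(chan *PodInfo, 1)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Simulates an event handler updating the Pod pointer in place.
		queued := <-queue
		queued.Pod = &Pod{UID: "uid-2"}
	}()

	uid := requeueAndReport(info, queue)
	wg.Wait()
	fmt.Println(uid)
}
```

Moving the `info.Pod.UID` read to after the channel send would reintroduce the kind of race that `go test -race` reports.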
AxeZhan
2863b3d1ab Revert "refactor: simplify RunScorePlugins for readability + performance"
This reverts commit a7eb7ed5c6.
2023-07-20 10:50:32 +08:00
Kubernetes Prow Robot
15450a3f02 Merge pull request #119318 from codefromthecrypt/CycleState-docs
Improve docs on framework.CycleState
2023-07-18 07:19:10 -07:00
wackxu
a9d26ac7c7 Optimize the code of NodeUnschedulable to reduce TolerationsTolerateTaint function calls
Signed-off-by: wackxu <xushiwei5@huawei.com>
2023-07-18 21:00:05 +08:00
Adrian Cole
89ab733760 Improve docs on framework.CycleState
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Kante Yin <kerthcet@gmail.com>
2023-07-18 14:48:20 +08:00
Kensei Nakada
c7e7eee554 feature(scheduling_queue): track events per Pods (#118438)
* feature(scheduling_queue): track events per Pods

* fix typos

* record events in one slice and make each in-flight Pod to refer it

* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods

* eliminate MakeNextPodFuncs

* call Done inside the scheduling queue

* fix comment

* implement done() not to require lock in it

* fix UTs

* improve the receivedEvents implementation based on suggestions

* call DonePod when we don't call AddUnschedulableIfNotPresent

* fix UT

* use queuehint to filter out events for in-flight Pods

* fix based on suggestion from aldo

* fix based on suggestion from Wei

* rename lastEventBefore → previousEvent

* fix based on suggestion

* address comments from aldo

* fix based on the suggestion from Abdullah

* gate in-flight Pods logic by the SchedulerQueueingHints feature gate
2023-07-17 15:53:07 -07:00
Kensei Nakada
34640772ed implement SchedulerQueueingHints feature gate
2023-07-14 12:31:27 +00:00
carlory
0599b3caa0 change the QueueingHintFn to pass a logger
2023-07-13 00:56:41 +08:00
Patrick Ohly
6f1a29520f scheduler/dra: reduce pod scheduling latency
This is a combination of two related enhancements:
- By implementing a PreEnqueue check, the initial pod scheduling
  attempt for a pod with a claim template gets avoided when the claim
  does not exist yet.
- By implementing cluster event checks, only those pods get
  scheduled for which something changed, and they get scheduled
  immediately without delay.
2023-07-12 11:17:04 +02:00
Patrick Ohly
e01db32573 scheduler util: handle cache.DeletedFinalStateUnknown in As
Informer callbacks must be prepared to get cache.DeletedFinalStateUnknown as
the deleted object. They can use that as a hint that some information may have
been missed, but typically they just retrieve the stored object inside it.
2023-07-12 11:07:59 +02:00
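The unwrapping step described in this commit can be sketched as follows. `DeletedFinalStateUnknown` here is a local stand-in with the same two fields as client-go's `cache.DeletedFinalStateUnknown`, and `asType` is a hypothetical generic helper, not the scheduler's `As` function, whose signature differs.

```go
package main

import "fmt"

// DeletedFinalStateUnknown is a local stand-in mirroring the fields of
// client-go's cache.DeletedFinalStateUnknown.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// Pod is a simplified stand-in type for illustration.
type Pod struct{ Name string }

// asType (hypothetical) converts an informer callback argument to T,
// first retrieving the stored object if it arrived wrapped in a tombstone.
func asType[T any](obj interface{}) (T, error) {
	if tombstone, ok := obj.(DeletedFinalStateUnknown); ok {
		obj = tombstone.Obj
	}
	typed, ok := obj.(T)
	if !ok {
		var zero T
		return zero, fmt.Errorf("expected %T, got %T", zero, obj)
	}
	return typed, nil
}

func main() {
	direct := &Pod{Name: "a"}
	wrapped := DeletedFinalStateUnknown{Key: "default/b", Obj: &Pod{Name: "b"}}

	for _, obj := range []interface{}{direct, wrapped, "not a pod"} {
		pod, err := asType[*Pod](obj)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("pod:", pod.Name)
	}
}
```

A delete handler written this way treats a direct object and a tombstone-wrapped object the same, and reports anything else as a type error instead of panicking.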
Patrick Ohly
ef48efc736 scheduler dynamicresources: minor logging improvements
This makes some complex values a bit more readable.
2023-07-12 11:07:59 +02:00
Kubernetes Prow Robot
e0dafe57a3 Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
c95b16b280 Merge pull request #118608 from utam0k/podtopologyspread-prescore-skip
Return Skip in PodTopologySpread#PreScore under specific conditions
2023-07-10 09:27:07 -07:00
Kubernetes Prow Robot
0ae9aaacfa Merge pull request #118271 from tangwz/add_nodeports_prefilter_skip_status
feat(NodePorts): return Skip status in PreFilter
2023-07-09 20:49:04 -07:00
Kubernetes Prow Robot
09899b986f Merge pull request #118926 from mengjiao-liu/improve-scheduler-use-cmp.Diff
scheduler test: Use cmp.Diff instead of reflect.DeepEqual for pkg/scheduler/internal/cache
2023-07-08 21:51:04 -07:00
Gunju Kim
7286d122fb Mark pods with restartable init containers as UnschedulableAndUnresolvable
This marks the pods with restartable init containers as
`UnschedulableAndUnresolvable` if the feature gate is disabled to avoid
the inconsistency in resource calculation between the scheduler and the
older kubelet.
2023-07-08 07:26:13 +09:00
kerthcet
c0eb0caf4a Support fine-grained rescheduling in ReservePlugin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-07 13:30:29 +08:00
Kubernetes Prow Robot
b07a843cb5 Merge pull request #119046 from kerthcet/fix/handle-unschedule-plugins
Fix fitError in Permit plugin not handled perfectly
2023-07-06 21:01:03 -07:00
kerthcet
278a8376e1 Fix: fiterror in permit plugin not handled perfectly
We only added failed plugins, but this will not actually work unless
we make the status a fitError, because we only copy the failed plugins
to podInfo if it is a fitError.

Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-07 10:35:59 +08:00
Kubernetes Prow Robot
aeed7da616 Merge pull request #119077 from sanposhiho/follow-up-hint
clean up the implementation around QueueingHintFn
2023-07-06 13:39:15 -07:00
Kensei Nakada
be0db3f93d clean up the implementation around QueueingHintFn
2023-07-06 16:07:39 +00:00
tangwz
1bf2f6c9c0 feat(NodePorts): return Skip status in PreFilter
2023-07-06 08:42:08 +08:00
Kubernetes Prow Robot
293c1b8378 Merge pull request #118025 from AxeZhan/score-metrics
feature(scheduler): plugin_evaluation_total metric support preScore/score
2023-07-05 05:14:56 -07:00
Mengjiao Liu
443bf3b01b scheduler test: Use cmp.Diff instead of reflect.DeepEqual for pkg/scheduler/internal/cache
2023-07-05 16:00:25 +08:00
Kubernetes Prow Robot
0852a2759a Merge pull request #118965 from mengjiao-liu/use-cmp.Diff-scheduler-queue
scheduler test: Use cmp.Diff instead of reflect.DeepEqual for pkg/scheduler/internal/queue/
2023-07-04 05:29:05 -07:00
Heba Elayoty
d548983dbb Use table-driven tests for TestPerPodSchedulingMetrics
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-06-29 14:51:55 -07:00
Shingo Omura
d53762ec3a remove unnecessary comment in pkg/scheduler/framework.QueueingHintFn
The event is not passed to QueueingHintFn, yet a comment still refers to it.
The event is unnecessary in QueueingHintFn because QueueingHintFn is used in
ClusterEventWithHint, and ClusterEventWithHint already has a ClusterEvent.

Signed-off-by: Shingo Omura <everpeace@gmail.com>
2023-06-29 21:22:20 +09:00
Kubernetes Prow Robot
3a9c639d5a Merge pull request #118312 from mengjiao-liu/improve-scheduler-cache-test
scheduler: add test name and remove redundant test tables to improve cache_test.go
2023-06-29 02:51:36 -07:00
Mengjiao Liu
72294e4eff scheduler test: Use cmp.Diff instead of reflect.DeepEqual for pkg/scheduler/internal/queue/
2023-06-29 15:28:42 +08:00
utam0k
ef26510164 Return Skip in PodTopologySpread#PreScore under specific conditions
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-06-28 12:08:10 +00:00
Kubernetes Prow Robot
52457842d1 Merge pull request #117055 from cyclinder/csi_migration
remove CSI-migration gate
2023-06-28 04:28:31 -07:00