The pInfo.Timestamp is refreshed but the sort in activeQ or podBackoffQ is not be updated when pod is already present in the backoff or active queue.
AddUnschedulableIfNotPresent() return error if pod is already present in the backoff or active queue, and there is no re-add.
So refresh pInfo.Timestamp when the pod is not present in the three sub-queues, otherwise need to update the order of the pod in the active or backoff queue, for example p.activeQ.Update(pInfo)
- rename `UpdateNominatedPodForNode` to `AddNominatedPod`
- promote `update` to `UpdateNominatedPod`
- anonymous lock in nominatedMap
- pass PodNominator as an option to NewFramework
- use in-cache Pod instead of real-time Pod (by calling API server) to mark it as unschedulable
in internal schedulingQ
- remove the backoff logic as now we don't call API server
- the whole logic is changed to a synchronous call
In some cases, an Update event with no "NominatedNode" present is received right
after a node("NominatedNode") is reserved for this pod in memory.
If we go updating (delete and add) it, it actually un-reserves the node since
the newPod doesn't carry sped.status.nominatedNode.
In this case, during this time other low-priority pods have chances to take space which
was reserved for the nominatedPod.
The function AddUnschedulableIfNotPresent is responsible for
initializing or updating backoff timers for pods that could not be
scheduled. The helper function backoffPod does that work, but was not
being called in all cases.
This moves that call to be (mostly) unconditional, while cleaning up
comments and error handling.
- add incremental scheduling cycle
- instead of set a flag on move reqeust, we cache current scheduling
cycle in moveRequestCycle
- when unschedulable pods are added back, compare its cycle with
moveRequestCycle to decide whether it should be added into active queue
or not
When starvation heppens:
- a lot of unschedulable pods exists in the head of queue
- because condition.LastTransitionTime is updated only when condition.Status changed
- (this means that once a pod is marked unschedulable, the field never updated until the pod successfuly scheduled.)
What was changed:
- condition.LastProbeTime is updated everytime when pod is determined
unschedulable.
- changed sort function so to use LastProbeTime to avoid starvation
described above
Consideration:
- This changes increases k8s API server load because it updates Pod.status whenever scheduler decides it as
unschedulable.
Signed-off-by: Shingo Omura <everpeace@gmail.com>