mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-07-22 11:21:47 +00:00
Merge pull request #65987 from Random-Liu/fix-pod-worker-deadlock
Automatic merge from submit-queue (batch tested with PRs 65987, 65962). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Fix pod worker deadlock.
Preemption gets stuck forever if `killPodNow` times out even once. The sequence is:
* `killPodNow` creates the response channel (size 0) and sends it to the pod worker.
* `killPodNow` times out and returns.
* The pod worker finishes killing the pod and tries to send the response back via the channel.
However, because the channel size is 0 and the receiver has already exited, the pod worker blocks on the send forever.
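The sequence above can be reproduced in isolation. The following is a minimal sketch, not the kubelet code: `workerBlocked`, the goroutine standing in for the pod worker, and all durations are assumptions chosen for illustration. It reports whether the sender is still stuck after the receiver has timed out, for a given channel buffer size.

```go
package main

import (
	"fmt"
	"time"
)

type response struct {
	err error
}

// workerBlocked simulates the sequence with a response channel of the given
// buffer size. It reports whether the worker is still stuck on its send
// after the caller has timed out and stopped listening.
func workerBlocked(bufSize int) bool {
	ch := make(chan response, bufSize)
	done := make(chan struct{})

	// Stand-in for the pod worker: the "kill" takes longer than the caller waits.
	go func() {
		time.Sleep(50 * time.Millisecond)
		ch <- response{err: nil} // blocks forever if bufSize == 0 and nobody receives
		close(done)
	}()

	// Stand-in for killPodNow: wait for the response, but give up early.
	select {
	case <-ch:
		return false
	case <-time.After(10 * time.Millisecond):
		// Timed out; from here on nobody reads from ch.
	}

	// Give the worker ample time to finish, then check whether it got past the send.
	select {
	case <-done:
		return false
	case <-time.After(200 * time.Millisecond):
		return true
	}
}

func main() {
	fmt.Println("unbuffered, worker stuck:", workerBlocked(0))
	fmt.Println("buffered(1), worker stuck:", workerBlocked(1))
}
```

With a buffer size of 0 the worker goroutine should never get past its send, while a buffer of 1 lets the send complete even though no receiver is left; the unread value is simply dropped along with the channel.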
In @jingxu97's case, this prevents a critical system pod (apiserver) from coming up, because the csi pod can't be preempted.
I checked the history, and the bug was introduced 2 years ago in 6fefb428c1.
I think we should at least cherry-pick this to `1.11`, since preemption is beta and enabled by default in 1.11.
@kubernetes/sig-node-bugs @derekwaynecarr @dashpole @yujuhong
Signed-off-by: Lantao Liu <lantaol@google.com>
```release-note
none
```
commit 55620e2be6
@@ -306,7 +306,7 @@ func killPodNow(podWorkers PodWorkers, recorder record.EventRecorder) eviction.K
 		type response struct {
 			err error
 		}
-		ch := make(chan response)
+		ch := make(chan response, 1)
 		podWorkers.UpdatePod(&UpdatePodOptions{
 			Pod:        pod,
 			UpdateType: kubetypes.SyncPodKill,