Merge pull request #65987 from Random-Liu/fix-pod-worker-deadlock

Automatic merge from submit-queue (batch tested with PRs 65987, 65962). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix pod worker deadlock.

Preemption will stuck forever if `killPodNow` timeout once. The sequence is:
* `killPodNow` create the response channel (size 0) and send it to pod worker.
* `killPodNow` timeout and return.
*  Pod worker finishes killing the pod, and tries to send back response via the channel.

However, because the channel size is 0, and the receiver has exited, the pod worker will stuck forever.

In @jingxu97's case, this causes a critical system pod (apiserver) unable to come up, because the csi pod can't be preempted.

I checked the history, and the bug was introduced 2 years ago 6fefb428c1.

I think we should at least cherrypick this to `1.11` since preemption is beta and enabled by default in 1.11.

@kubernetes/sig-node-bugs @derekwaynecarr @dashpole @yujuhong 
Signed-off-by: Lantao Liu <lantaol@google.com>

```release-note
none
```
This commit is contained in:
Kubernetes Submit Queue 2018-07-09 16:53:59 -07:00 committed by GitHub
commit 55620e2be6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -306,7 +306,7 @@ func killPodNow(podWorkers PodWorkers, recorder record.EventRecorder) eviction.K
type response struct {
err error
}
ch := make(chan response)
ch := make(chan response, 1)
podWorkers.UpdatePod(&UpdatePodOptions{
Pod: pod,
UpdateType: kubetypes.SyncPodKill,