Merge pull request #58107 from ironcladlou/quota-controller-deadlock

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix quota controller worker deadlock

The resource quota controller worker pool can deadlock when:

* Worker goroutines are idle waiting for work from queues
* The Sync() method detects discovery updates to apply

The problem is workers acquire a read lock while idle, making write lock
acquisition dependent upon the presence of work in the queues.

The Sync() method blocks on a pending write lock acquisition and won't unblock
until every existing worker processes one item from their queue and releases
their read lock. While the Sync() method's lock is pending, all new read lock
acquisitions will block; if a worker does process work and release its lock, it
will then become blocked on a read lock acquisition; they become blocked on
Sync(). This can easily deadlock all the workers processing from one queue while
any workers on the other queue remain blocked waiting for work.

Fix the deadlock by refactoring workers to acquire a read lock *after* work is
popped from the queue. This allows writers to get locks while workers are idle,
while preserving the worker pause semantics necessary to allow safe sync.

```release-note
Fixes an infrequent problem causing the resource quota controller to become stuck in clusters with low ResourceQuota churn, potentially preventing quota from being recalculated until the controller is restarted or until bursts of diverse quota activity unstick the controller.
```

/cc @kubernetes/sig-api-machinery-bugs
This commit is contained in:
Kubernetes Submit Queue 2018-01-10 15:59:50 -08:00 committed by GitHub
commit 15b1d165fb
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -237,15 +237,13 @@ func (rq *ResourceQuotaController) addQuota(obj interface{}) {
// worker runs a worker thread that just dequeues items, processes them, and marks them done.
func (rq *ResourceQuotaController) worker(queue workqueue.RateLimitingInterface) func() {
workFunc := func() bool {
rq.workerLock.RLock()
defer rq.workerLock.RUnlock()
key, quit := queue.Get()
if quit {
return true
}
defer queue.Done(key)
rq.workerLock.RLock()
defer rq.workerLock.RUnlock()
err := rq.syncHandler(key.(string))
if err == nil {
queue.Forget(key)