Merge pull request #37431 from liggitt/namespace-leftovers

Automatic merge from submit-queue

hold namespaces briefly before processing deletion

possible fix for #36891

in HA scenarios (either HA apiserver or HA etcd), it is possible for deletion of resources from namespace cleanup to race with creation of objects in the terminating namespace

HA master timeline:
1. "delete namespace n" API call goes to apiserver 1, deletion timestamp is set in etcd
2. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
3. "create deployment d" API call goes to apiserver 2, gets persisted to etcd
4. apiserver 2 observes namespace deletion, stops allowing new objects to be created
5. namespace controller finishes deleting listed deployments, deletes namespace

HA etcd timeline:
1. "create deployment d" API call goes to apiserver, gets persisted to etcd
2. "delete namespace n" API call goes to apiserver, deletion timestamp is set in etcd
3. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
4. list call goes to non-leader etcd member that hasn't observed the new deployment or the deleted namespace yet
5. namespace controller finishes deleting the listed deployments, deletes namespace

In both cases, delaying namespace cleanup resolves the issue: the delay gives etcd members time to observe objects created in the namespace at the last second, and gives other apiservers time to observe the namespace entering the terminating phase and reject further creations.
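
The HA master timeline can be sketched as a toy model in logical ticks. This is an illustration only, not code from this PR: `leftovers`, `grace`, `observeLag`, and `createAt` are hypothetical names, and the model assumes the controller's single list call happens exactly `grace` ticks after the delete call.

```go
package main

import (
	"fmt"
	"sort"
)

// leftovers returns the objects that survive namespace cleanup in a toy model
// of the HA apiserver race. All times are logical ticks counted from the
// "delete namespace" call.
//
//	grace      - tick at which the namespace controller lists content (assumed)
//	observeLag - tick at which the second apiserver starts rejecting creates
//	createAt   - tick at which each create request reaches that apiserver
func leftovers(grace, observeLag int, createAt map[string]int) []string {
	var out []string
	for name, t := range createAt {
		accepted := t < observeLag // the apiserver has not yet seen the deletion
		listed := t < grace        // the object existed when the controller listed
		if accepted && !listed {
			out = append(out, name) // persisted, but never cleaned up
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	creates := map[string]int{"d": 2}

	// No grace period: the controller lists at tick 0, "d" lands at tick 2.
	fmt.Println(leftovers(0, 3, creates)) // prints [d]

	// Grace period longer than the observation lag: "d" is listed and deleted.
	fmt.Println(leftovers(5, 3, creates)) // prints []
}
```

In this model the delay only helps when it is at least as long as the observation lag, which is consistent with the PR describing itself as a possible fix rather than a guarantee.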

Possible other fixes:
* do a second sweep of objects before deleting the namespace
* have the namespace controller check for and clean up objects in namespaces that no longer exist
* ...?
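
The first alternative (a second sweep) could look roughly like the following. This is a sketch of an approach this PR does not take: `contentStore`, `memStore`, and `purge` are hypothetical names, not part of the namespace controller, and the fake store simulates one object being created while the first list is in flight.

```go
package main

import (
	"errors"
	"fmt"
)

// contentStore is a hypothetical stand-in for the list/delete calls the
// controller would make against the API.
type contentStore interface {
	List(namespace string) ([]string, error)
	Delete(namespace, name string) error
}

// purge sweeps the namespace repeatedly and returns nil only after a sweep
// finds nothing left, so the caller deletes the namespace only after an
// empty listing.
func purge(s contentStore, namespace string, maxSweeps int) error {
	for i := 0; i < maxSweeps; i++ {
		names, err := s.List(namespace)
		if err != nil {
			return err
		}
		if len(names) == 0 {
			return nil
		}
		for _, n := range names {
			if err := s.Delete(namespace, n); err != nil {
				return err
			}
		}
	}
	return errors.New("namespace content not confirmed empty after max sweeps")
}

// memStore is an in-memory fake. Objects in late appear right after the
// first List call, imitating a create that races with cleanup.
type memStore struct {
	objects map[string]bool
	late    []string
}

func (m *memStore) List(namespace string) ([]string, error) {
	names := make([]string, 0, len(m.objects))
	for n := range m.objects {
		names = append(names, n)
	}
	for _, n := range m.late {
		m.objects[n] = true
	}
	m.late = nil
	return names, nil
}

func (m *memStore) Delete(namespace, name string) error {
	delete(m.objects, name)
	return nil
}

func main() {
	s := &memStore{objects: map[string]bool{"a": true}, late: []string{"d"}}
	err := purge(s, "n", 3)
	fmt.Println(err, len(s.objects))
}
```

A single sweep would have missed "d" here. Note that a second sweep alone does not stop an apiserver that has not yet observed the deletion from accepting further creates.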
Kubernetes Submit Queue
2016-11-30 04:44:31 -08:00
committed by GitHub


@@ -35,6 +35,16 @@ import (
"github.com/golang/glog"
)
+const (
+// namespaceDeletionGracePeriod is the time period to wait before processing a received namespace event.
+// This allows time for the following to occur:
+// * lifecycle admission plugins on HA apiservers to also observe a namespace
+// deletion and prevent new objects from being created in the terminating namespace
+// * non-leader etcd servers to observe last-minute object creations in a namespace
+// so this controller's cleanup can actually clean up all objects
+namespaceDeletionGracePeriod = 5 * time.Second
+)
// NamespaceController is responsible for performing actions dependent upon a namespace phase
type NamespaceController struct {
// client that purges namespace content, must have list/delete privileges on all content
@@ -132,7 +142,9 @@ func (nm *NamespaceController) enqueueNamespace(obj interface{}) {
glog.Errorf("Couldn't get key for object %+v: %v", obj, err)
return
}
-nm.queue.Add(key)
+// delay processing namespace events to allow HA api servers to observe namespace deletion,
+// and HA etcd servers to observe last minute object creations inside the namespace
+nm.queue.AddAfter(key, namespaceDeletionGracePeriod)
}
// worker processes the queue of namespace objects.