Watches against etcd in the API server can hang forever if the etcd
cluster loses quorum, e.g. when a majority of nodes crashes. This fix
improves the responsiveness (detection and reaction time) of API server
watches against etcd in some rare (but still possible) edge cases so
that watches are terminated with `"etcdserver: no leader"
(ErrNoLeader)`.
Implementation behavior described by jingyih:
```
The etcd server waits until it cannot find a leader for 3 election
timeouts to cancel existing streams. 3 is currently a hard-coded
constant. The election timeout defaults to 1000ms.
If the cluster is healthy, when the leader is stopped, the leadership
transfer should be smooth (the leader transfers its leadership before
stopping). If the leader is hard-killed, the other servers will take
an election timeout to realize the leader is lost and start
campaigning.
```
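A minimal sketch of the mechanism, assuming the etcd v3.4 Go client
(`go.etcd.io/etcd/clientv3`); the endpoints and key prefix are
illustrative, not the actual API server code. A watch context wrapped
with `clientv3.WithRequireLeader` is canceled by the server once it has
been leaderless for the ~3 election timeouts quoted above, instead of
hanging forever:

```go
package main

import (
	"context"
	"log"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Without WithRequireLeader the server keeps the stream open even
	// when quorum is lost; with it, the stream is canceled and the
	// response carries "etcdserver: no leader" (rpctypes.ErrNoLeader).
	ctx := clientv3.WithRequireLeader(context.Background())
	for resp := range cli.Watch(ctx, "/registry/", clientv3.WithPrefix()) {
		if err := resp.Err(); err != nil {
			// Expected to be ErrNoLeader once quorum is gone.
			log.Fatalf("watch terminated: %v", err)
		}
		for _, ev := range resp.Events {
			log.Printf("%s %q", ev.Type, ev.Kv.Key)
		}
	}
}
```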
For further details, discussion and validation see
https://github.com/kubernetes/kubernetes/issues/89488#issuecomment-606491110
and https://github.com/etcd-io/etcd/issues/8980.
Closes: https://github.com/kubernetes/kubernetes/issues/89488
Signed-off-by: Michael Gasch <mgasch@vmware.com>
The cpumanager file-based state backend was obsoleted a few releases
ago, when the cpumanager moved to the common checkpointmanager
infrastructure.
The old test checking compatibility to/from the old format is
also no longer needed, because the checkpoint format is stable
(see
https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/checkpointmanager).
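For context, a hedged sketch of that common checkpointmanager API
(interfaces per pkg/kubelet/checkpointmanager; the checkpoint struct,
key, and directory below are illustrative, not the real cpumanager
state):

```go
package main

import (
	"encoding/json"
	"log"

	"k8s.io/kubernetes/pkg/kubelet/checkpointmanager"
	"k8s.io/kubernetes/pkg/kubelet/checkpointmanager/checksum"
)

// exampleCheckpoint implements checkpointmanager.Checkpoint; the
// fields are illustrative only.
type exampleCheckpoint struct {
	Entries  map[string]string `json:"entries"`
	Checksum checksum.Checksum `json:"checksum"`
}

func (cp *exampleCheckpoint) MarshalCheckpoint() ([]byte, error) {
	cp.Checksum = checksum.New(cp.Entries)
	return json.Marshal(cp)
}

func (cp *exampleCheckpoint) UnmarshalCheckpoint(blob []byte) error {
	return json.Unmarshal(blob, cp)
}

func (cp *exampleCheckpoint) VerifyChecksum() error {
	return cp.Checksum.Verify(cp.Entries)
}

func main() {
	mgr, err := checkpointmanager.NewCheckpointManager("/var/lib/kubelet")
	if err != nil {
		log.Fatal(err)
	}
	cp := &exampleCheckpoint{Entries: map[string]string{"cpu": "0-3"}}
	if err := mgr.CreateCheckpoint("example_state", cp); err != nil {
		log.Fatal(err)
	}
}
```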
Signed-off-by: Francesco Romani <fromani@redhat.com>
The "error waiting for expected CSI calls" is redundant because it's
immediately followed by checking that error with:
framework.ExpectNoError(err, "while waiting for all CSI calls")
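A sketch of the pattern; the helper function is illustrative, not the
real test code, and only the duplicated log line goes away:

```go
package example

import "k8s.io/kubernetes/test/e2e/framework"

// checkCSICalls is illustrative: logging err right before
// ExpectNoError duplicates the failure report, so the log line
// is removed.
func checkCSICalls(err error) {
	// Redundant and removed:
	// framework.Logf("error waiting for expected CSI calls: %v", err)
	framework.ExpectNoError(err, "while waiting for all CSI calls")
}
```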
The mock driver gets instructed to return a ResourceExhausted error
for the first CreateVolume invocation via the storage class
parameters.
How this should be handled depends on the situation: for normal
volumes, we just want the external-provisioner to retry. For late
binding, we want to reschedule the pod. It also depends on topology
support.
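A hedged sketch of the injected failure on the driver side (the
service type and call counter are illustrative, not the actual mock
driver code, which derives this behavior from the storage class
parameters):

```go
package service

import (
	"context"
	"sync/atomic"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// mockService is illustrative only.
type mockService struct {
	createCalls int64
}

func (s *mockService) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	// Fail only the first invocation; the caller is then expected to
	// retry (normal volumes) or reschedule the pod (late binding).
	if atomic.AddInt64(&s.createCalls, 1) == 1 {
		return nil, status.Error(codes.ResourceExhausted, "injected fault")
	}
	return &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:      req.GetName(),
			CapacityBytes: req.GetCapacityRange().GetRequiredBytes(),
		},
	}, nil
}
```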
This code became obsolete with the introduction of parseMockLogs,
which retrieves the log itself. For debugging a running test, the
normal pod output logging is sufficient.
parseMockLogs is potentially called multiple times while waiting for
output. Dumping all CSI calls on each invocation is quite verbose and
repetitive. To check what the driver has already done, the normal
capturing of the container log can be used instead:
csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request"...
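A hedged sketch (not the actual parseMockLogs implementation) of how
such gRPCCall lines can be extracted from a captured container log;
the struct name and selected fields are illustrative:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// mockCSICall mirrors the fields visible in the log line above.
type mockCSICall struct {
	Method  string          `json:"Method"`
	Request json.RawMessage `json:"Request"`
}

func parseCalls(log string) ([]mockCSICall, error) {
	var calls []mockCSICall
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		line := scanner.Text()
		// Each CSI invocation is logged as "... gRPCCall: {json}".
		idx := strings.Index(line, "gRPCCall: ")
		if idx < 0 {
			continue
		}
		var call mockCSICall
		if err := json.Unmarshal([]byte(line[idx+len("gRPCCall: "):]), &call); err != nil {
			return nil, err
		}
		calls = append(calls, call)
	}
	return calls, scanner.Err()
}

func main() {
	log := `csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request":{}}`
	calls, err := parseCalls(log)
	fmt.Println(calls, err)
}
```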