Use a watch to detect invalid pod status updates in the graceful node
shutdown node e2e test. With a watch, every pod status update is
captured, whereas the previous logic polled the API server and could
miss intermediate updates.
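A minimal sketch of the watch-based check, assuming a client-go
clientset; the function name and the isInvalidStatus predicate are
illustrative stand-ins for the e2e test helpers, not the actual code:

    package e2esketch

    import (
        "context"
        "fmt"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // watchPodStatuses fails as soon as any pod in the namespace reports a
    // status the test considers invalid. Unlike polling, the watch delivers
    // every intermediate update.
    func watchPodStatuses(ctx context.Context, c kubernetes.Interface, ns string, isInvalidStatus func(*v1.Pod) bool) error {
        w, err := c.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{})
        if err != nil {
            return err
        }
        defer w.Stop()
        for {
            select {
            case <-ctx.Done():
                return nil
            case ev, ok := <-w.ResultChan():
                if !ok {
                    return nil
                }
                pod, ok := ev.Object.(*v1.Pod)
                if !ok {
                    continue
                }
                if isInvalidStatus(pod) {
                    return fmt.Errorf("pod %s reported an invalid status in phase %s", pod.Name, pod.Status.Phase)
                }
            }
        }
    }
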
Signed-off-by: David Porter <david@porter.me>
Terminal pods may continue to report a ready condition of true because
there is a delay in reconciling the ready condition of the containers
from the runtime with the pod status. It is invalid for the kubelet to
report a terminal phase with a ready condition of true. To fix the
issue, explicitly override the ready condition to false for terminal
pods during status updates.
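A rough sketch of the override, assuming it runs while the kubelet
generates the API pod status; the function name and the "PodCompleted"
reason string are illustrative, not necessarily the exact code:

    package statussketch

    import v1 "k8s.io/api/core/v1"

    // overrideReadyForTerminalPod forces the Ready condition to false once a
    // pod has reached a terminal phase, regardless of what the runtime last
    // reported for its containers.
    func overrideReadyForTerminalPod(status *v1.PodStatus) {
        if status.Phase != v1.PodSucceeded && status.Phase != v1.PodFailed {
            return
        }
        for i := range status.Conditions {
            if status.Conditions[i].Type == v1.PodReady {
                status.Conditions[i].Status = v1.ConditionFalse
                status.Conditions[i].Reason = "PodCompleted" // hypothetical reason string
            }
        }
    }
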
Signed-off-by: David Porter <david@porter.me>
The pod worker owns the decision of whether a container is running or
not, and starting and stopping the probes for a given pod should be
handled during the pod sync loop. This ensures that probes do not
continue running even after eviction.
Because pod semantics allow lifecycle probes to shorten the grace
period, probes are removed only after the containers in a pod have
terminated successfully. As an optimization, if the pod has a very
short grace period (0 or 1 seconds), the probes are stopped
immediately to slightly reduce resource usage during eviction.
After this change, the probe manager is only called by the pod
worker or by the reconcile loop.
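A sketch of that ordering, assuming a pod-worker hook with roughly this
shape; the probeManager interface and the field names are placeholders
for the kubelet's internals, not its actual API:

    package podworkersketch

    import (
        "context"

        v1 "k8s.io/api/core/v1"
    )

    type probeManager interface {
        RemovePod(pod *v1.Pod)
    }

    type worker struct {
        probeManager      probeManager
        killPodContainers func(ctx context.Context, pod *v1.Pod, gracePeriod *int64) error
    }

    func (w *worker) syncTerminatingPod(ctx context.Context, pod *v1.Pod, gracePeriod *int64) error {
        // With a 0 or 1 second grace period the probes cannot usefully
        // shorten it further, so stop them up front to save a little work.
        if gracePeriod != nil && *gracePeriod <= 1 {
            w.probeManager.RemovePod(pod)
        }
        if err := w.killPodContainers(ctx, pod, gracePeriod); err != nil {
            return err
        }
        // Once the containers have terminated, lifecycle probes can no longer
        // influence the grace period, so remove them unconditionally.
        w.probeManager.RemovePod(pod)
        return nil
    }
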
In phases/kubelet/WriteConfigToDisk(), create a patch
manager for the root patches directory and apply
the user patches using the target "kubeletconfiguration".
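A rough sketch of that flow; the patchManager interface below stands in
for kubeadm's patches utility, and its method shape is an assumption
for illustration rather than the exact API:

    package kubeletphasesketch

    // patchManager abstracts the piece of the patches utility used here; the
    // real helper is created from the root patches directory and an output
    // writer.
    type patchManager interface {
        ApplyPatchesToTarget(targetName string, data []byte) ([]byte, error)
    }

    // applyKubeletConfigPatches applies user patches to the serialized
    // KubeletConfiguration before it is written to disk.
    func applyKubeletConfigPatches(pm patchManager, kubeletCfg []byte) ([]byte, error) {
        // "kubeletconfiguration" is the patch target name users reference in
        // their patch files.
        return pm.ApplyPatchesToTarget("kubeletconfiguration", kubeletCfg)
    }
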
With phases/kubelet/WriteConfigToDisk() about to support patches,
the function must accept an io.Writer to which the PatchManager can
write its output, as well as a patches directory.
Modify all call sites of WriteConfigToDisk()
to properly prepare and pass an io.Writer and a patches dir to it.
As a result, the command phases for init/join/upgrade pass
the root io.Writer (usually stdout) and the patchesDir populated
either via the config file or the --patches flag.
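A self-contained sketch of the new shape of the call, with assumed
parameter names and an example patches directory; the real signature
lives in cmd/kubeadm/app/phases/kubelet and may differ:

    package main

    import (
        "io"
        "os"
    )

    // writeConfigToDisk mirrors the shape described above: it now receives
    // the patches directory and an io.Writer for the PatchManager's output.
    func writeConfigToDisk(kubeletDir, patchesDir string, out io.Writer) error {
        // ... marshal the KubeletConfiguration, apply patches, write the file ...
        return nil
    }

    func main() {
        // Command phases for init/join/upgrade pass the root writer (usually
        // stdout) and the patchesDir from the config file or --patches flag.
        patchesDir := "/etc/kubernetes/patches" // hypothetical value
        if err := writeConfigToDisk("/var/lib/kubelet", patchesDir, os.Stdout); err != nil {
            os.Exit(1)
        }
    }
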
The test is about SCTP, and the accessed Service only forwarded SCTP
traffic to the server Pod, but the client Pod used the TCP protocol, so
the test traffic never reached the server Pod and the test
NetworkPolicy was never enforced. This led to the test passing even if
the default-deny policy was implemented wrongly. In some cases it could
even fail, if an external server happened to have the same IP as the
cluster IP and was listening on TCP port 80.
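A sketch of the corrected probe, assuming the client Pod runs the
agnhost test image; the exact flags are assumptions, the essential
point being that the client must speak SCTP to match the Service port:

    package netpolsketch

    import "fmt"

    // buildProbeCommand returns the command the client pod runs against the
    // Service. Previously the probe spoke TCP and never reached the
    // SCTP-only server, so the NetworkPolicy under test was never exercised.
    func buildProbeCommand(serviceIP string, port int) []string {
        return []string{
            "/agnhost", "connect",
            fmt.Sprintf("%s:%d", serviceIP, port),
            "--timeout=3s",
            "--protocol=sctp",
        }
    }
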
Signed-off-by: Quan Tian <qtian@vmware.com>