The KCM is moving to means of only singing apiserver (kubelet) client
and kubelet serving certificates. See:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md#signers
Up until now the experimental kubeadm functionality '--use-api'
under "kubeadm alpha certs renew" was using the KCM to sign *any*
certficate as long as the KCM has the root CA cert/key.
Post discussions with the kubeadm maintainers, it was decided that
this functionality should be removed from kubeadm due to the
requirement to have external signers for renewing the common
control-plane certificates that kubeadm manages.
The status manager syncBatch() method processes the current state
of the cache, which should include all entries in the channel. Flush
the channel before we call a batch to avoid unnecessary work and
to unblock pod workers when the node is congested.
Discovered while investigating long shutdown intervals on the node
where the status channel stayed full for tens of seconds.
Add a for loop around the select statement to avoid unnecessary
invocations of the wait.Forever closure each time.
When constructing the API status of a pod, if the pod is marked for
deletion no containers should be started. Previously, if a container
inside of a terminating pod failed to start due to a container
runtime error (that populates reasonCache) the reasonCache would
remain populated (it is only updated by syncPod for non-terminating
pods) and the delete action on the pod would be delayed until the
reasonCache entry expired due to other pods.
This dramatically reduces the amount of time the Kubelet waits to
delete pods that are terminating and encountered a container runtime
error.
After a pod reaches a terminal state and all containers are complete
we can delete the pod from the API server. The dispatchWork method
needs to wait for all container status to be available before invoking
delete. Even after the worker stops, status updates will continue to
be delivered and the sync handler will continue to sync the pods, so
dispatchWork gets multiple opportunities to see status.
The previous code assumed that a pod in Failed or Succeeded had no
running containers, but eviction or deletion of running pods could
still have running containers whose status needed to be reported.
This modifies earlier test to guarantee that the "fallback" exit
code 137 is never reported to match the expectation that all pods
exit with valid status for all containers (unless some exceptional
failure like eviction were to occur while the test is running).
The kubelet must not allow a container that was reported failed in a
restartPolicy=Never pod to be reported to the apiserver as success.
If a client deletes a restartPolicy=Never pod, the dispatchWork and
status manager race to update the container status. When dispatchWork
(specifically podIsTerminated) returns true, it means all containers
are stopped, which means status in the container is accurate. However,
the TerminatePod method then clears this status. This results in a
pod that has been reported with status.phase=Failed getting reset to
status.phase.Succeeded, which is a violation of the guarantees around
terminal phase.
Ensure the Kubelet never reports that a container succeeded when it
hasn't run or been executed by guarding the terminate pod loop from
ever reporting 0 in the absence of container status.
The kubelet can race when a pod is deleted and report that a container succeeded
when it instead failed, and thus the pod is reported as succeeded. Create an e2e
test that demonstrates this failure.
This first version of `kubectl alpha debug` is an import of the existing
kubectl-debug plugin, which supports adding ephemeral containers to
running pods. This attempts to follow patterns used by other kubectl
commands such as run, exec and scale.
After moving Permit() to the scheduling cycle test PermitPlugin should
no longer wait inside Permit() for another pod to enter Permit() and become waiting pod.
In the past this was a way to make test work regardless of order in
which pods enter Permit(), but now only one Permit() can be executed at
any given moment and waiting for another pod to enter Permit() inside
Permit() leads to timeouts.
In this change waitAndRejectPermit and waitAndAllowPermit flags make first
pod to enter Permit() a waiting pod and second pod to enter Permit()
either rejecting or allowing pod.
Mentioned in #88469
- Uses correct pause image for Windows
- Omits systemd specific flags
- Common build flags function to be used by Linux and Windows
- Uses user configured image repository for Windows pause image