When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.
This changeset adds to both etcd and apiserver endpoint retrieval the
special case in which they won't retry if we are in such cases. The
logic will retry if we find any unknown error, but will not retry in
the following cases:
- etcd annotations do not contain etcd endpoints, but the overall list
of etcd pods is greater than 0. This means that we listed at least
one etcd pod, but they are missing the annotation.
- API server annotation is not found on the api server pod for a given
node name, but no errors aside from that one were found. This means
that the API server pod is present, but is missing the annotation.
In both cases there is no point in retrying, and so, this speeds up the
upgrade path when coming from a previous existing cluster.
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.
The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
The test "should enforce egress policy allowing traffic to a server in a
different namespace based on PodSelector and NamespaceSelector
[Feature:NetworkPolicy]" is flaky because it doesn't wait for the server
Pod to be ready before testing traffic via its service, then even the
NetworkPolicy allows it, the SYN packets will be rejected by iptables
because the service has no endpoints at that moment.
This PR fixes it by making it wait for Pods to be ready like other
tests.
This is useful for logs from daemonset pods because for those it is
often relevant which node they ran on because they interact with
resources or other pods on the host.
To keep the log prefix short, it gets limited to a maximum length of
10 characters.
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- resource-consumer-controller
- test-webserver
The pod, container, and emptyDir volumes can all trigger evictions
when their limits are breached. To ensure that administrators can
alert on these type of evictions, update kubelet_evictions to include
the following signal types:
* ephemeralcontainerfs.limit - container ephemeral storage breaches its limit
* ephemeralpodfs.limit - pod ephemeral storage breaches its limit
* emptydirfs.limit - pod emptyDir storage breaches its limit
The StatefulSet controller cleans up ControllerRevisions at the end of
the reconcile loop. If something goes wrong during reconcile, it bails
out without actually performing this step. This commit moves the cleanup
to a deferred function call to guarantee it will be executed.
Fixes issue: https://github.com/kubernetes/kubernetes/issues/85690