This change optimizes the kubeadm/etcd `AddMember` client-side function
by exiting the backoff loop early when a peer conflict is found
(indicating the member has already been added to the etcd cluster). In
this situation, the function stops retrying and instead calls
`ListMembers` to fetch and return the current list of members. With this
optimization, front-loading a `ListMembers` call is no longer necessary,
as this path returns an equivalent response.
This reduces the time taken in cases where an initial client request to
add a member is accepted by the server but fails client-side.
This can happen, for example, if network latency causes the request to
time out after it was already sent to and accepted by the cluster. In
that situation, the retry loop would keep failing with an
`ErrPeerURLExist` response and stay stuck until the backoff timeout was
reached (currently roughly 2min30sec).
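A minimal sketch of the early exit, assuming the etcd v3.5 client
packages; the helper name and the exact error matching are illustrative,
not kubeadm's actual code:

```go
package etcdutil

import (
	"context"
	"errors"

	"go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// tryAddMember attempts a single MemberAdd. On a peer-URL conflict it
// returns the existing member list instead of letting the caller retry.
// The bool reports whether the operation is done.
func tryAddMember(ctx context.Context, cli *clientv3.Client, peerURL string) ([]*etcdserverpb.Member, bool, error) {
	resp, err := cli.MemberAdd(ctx, []string{peerURL})
	if err == nil {
		return resp.Members, true, nil
	}
	if errors.Is(err, rpctypes.ErrPeerURLExist) {
		// The member is already in the cluster, e.g. a previous request
		// timed out client-side after the server had accepted it.
		listResp, listErr := cli.MemberList(ctx)
		if listErr != nil {
			return nil, false, listErr
		}
		return listResp.Members, true, nil
	}
	return nil, false, err // not done; the caller's backoff loop retries
}
```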
Testing Done:
* Manual testing with an etcd cluster. The initial `AddMember` call was
successful, and the etcd manifest file was identical to the one produced
by the prior version of these files. Subsequent calls to add the same
member succeeded immediately (retaining idempotency), and the resulting
manifest file also remained identical to the previous version. The
difference is that the call finished ~2min25sec faster in an otherwise
identical test in the same environment.
- Remove the object from v1beta3 and the internal type
- Deprecate a couple of phases that were specifically designed/named to
modify the ClusterStatus object
- Adapt logic around annotation vs ClusterStatus retrieval
- Update unit tests
- Run generators
- Ensure the directory is created with 0700 via a new function
called CreateDataDirectory().
- Call this function in the init phases instead of the manual call
to MkdirAll.
- Call this function when joining control-plane nodes with local etcd.
If the directory creation is left to the kubelet via the
static Pod hostPath mounts, it will end up with 0755,
which is not desired.
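A minimal sketch of such a helper (the error wrapping style is an
assumption; the real kubeadm signature may differ):

```go
package etcdutil

import (
	"os"

	"github.com/pkg/errors"
)

// CreateDataDirectory creates the etcd data directory (e.g. /var/lib/etcd)
// with 0700 permissions, instead of leaving creation to the kubelet's
// hostPath handling, which would yield 0755.
func CreateDataDirectory(dataDir string) error {
	if err := os.MkdirAll(dataDir, 0700); err != nil {
		return errors.Wrapf(err, "failed to create etcd data directory %q", dataDir)
	}
	return nil
}
```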
In slower setups it can take more time for the existing cluster
to reach a healthy state, so the existing backoff of ~50 seconds
is evidently not sufficient.
The client dial can also fail for similar reasons.
Improve the tolerance of kubeadm join when adding new etcd members.
Wrap both the client dial and the member add in a longer backoff
(up to ~200 seconds), as sketched below.
This particular change should be backported to the support skew.
In a future change for master, all etcd client operations should be
made consistent so that the etcd logic is in a sane state.
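A hedged sketch of the shape of this change, using apimachinery's
`wait.ExponentialBackoff`; the backoff values below are illustrative and
only roughly add up to ~200 seconds:

```go
package etcdutil

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"k8s.io/apimachinery/pkg/util/wait"
)

// Illustrative values: 100ms * (2^11 - 1) is roughly 205 seconds total.
var longBackoff = wait.Backoff{
	Steps:    11,
	Duration: 100 * time.Millisecond,
	Factor:   2.0,
	Jitter:   0.1,
}

// joinMember wraps both the client dial and the member add in the same
// backoff, tolerating an existing cluster that is slow to become healthy.
func joinMember(endpoints []string, peerURL string) error {
	return wait.ExponentialBackoff(longBackoff, func() (bool, error) {
		cli, err := clientv3.New(clientv3.Config{
			Endpoints:   endpoints,
			DialTimeout: 20 * time.Second,
		})
		if err != nil {
			return false, nil // retry: the dial itself can fail
		}
		defer cli.Close()
		ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
		defer cancel()
		if _, err := cli.MemberAdd(ctx, []string{peerURL}); err != nil {
			return false, nil // retry: the cluster may still be settling
		}
		return true, nil
	})
}
```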
kubeadm uses image tags (such as `v3.4.3-0`) to specify the version of
etcd. However, the upgrade code in kubeadm uses the etcd client API to
fetch the currently deployed version. The result contains only the etcd
version without the additional information (such as image revision) that
is normally found in the tag. As a result, it would refuse an upgrade
where the etcd versions match and the only difference is the image
revision number (`v3.4.3-0` to `v3.4.3-1`).
To fix the above issue, the following changes are done:
- Replace the existing etcd version querying code, which uses the etcd
client library, with code that returns the etcd image tag from the
local static pod manifest file (see the sketch after this list).
- If an etcd `imageTag` is specified in the ClusterConfiguration during
upgrade, use that tag instead. This is done regardless of whether the
tag was specified in the configuration stored in the cluster or in a new
configuration supplied via the `--config` command line parameter.
If no custom tag is specified, kubeadm will select one depending on
the desired Kubernetes version.
- `kubeadm upgrade plan` no longer prints upgrade information about
external etcd. It's the user's responsibility to manage it in that
case.
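A sketch of the manifest-based version query; the helper name and path
handling are illustrative:

```go
package upgrade

import (
	"fmt"
	"os"
	"strings"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// etcdImageTagFromManifest reads the local etcd static pod manifest and
// returns the image tag (e.g. "v3.4.3-0"), rather than asking the etcd
// API, which only reports the bare version (e.g. "3.4.3").
func etcdImageTagFromManifest(manifestPath string) (string, error) {
	data, err := os.ReadFile(manifestPath)
	if err != nil {
		return "", err
	}
	pod := &v1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		return "", err
	}
	if len(pod.Spec.Containers) == 0 {
		return "", fmt.Errorf("no containers found in %s", manifestPath)
	}
	image := pod.Spec.Containers[0].Image // e.g. "k8s.gcr.io/etcd:v3.4.3-0"
	idx := strings.LastIndex(image, ":")
	if idx < 0 {
		return "", fmt.Errorf("image %q carries no tag", image)
	}
	return image[idx+1:], nil
}
```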
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.
This changeset adds a special case to both the etcd and API server
endpoint retrieval so that they do not retry in such cases. The
logic will retry on any unknown error, but will not retry in
the following cases:
- The etcd annotations do not contain etcd endpoints, but the overall
list of etcd pods is non-empty. This means that we listed at least
one etcd pod, but it is missing the annotation.
- The API server annotation is not found on the API server pod for a
given node name, but no errors aside from that one were found. This
means that the API server pod is present, but is missing the annotation.
In both cases there is no point in retrying, so this speeds up the
upgrade path when coming from a pre-existing cluster.
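A rough sketch of the etcd side of this decision; the annotation key and
the helper shape are assumptions for illustration:

```go
package upgrade

import (
	v1 "k8s.io/api/core/v1"
)

// etcdEndpointsFromPods returns the endpoints found in pod annotations
// and whether the caller should retry. If pods exist but none carry the
// annotation, we are coming from a ClusterStatus-era cluster and
// retrying cannot help.
func etcdEndpointsFromPods(pods []v1.Pod) (endpoints []string, retry bool) {
	const annotationKey = "kubeadm.kubernetes.io/etcd.advertise-client-urls"
	for _, pod := range pods {
		if url, ok := pod.Annotations[annotationKey]; ok {
			endpoints = append(endpoints, url)
		}
	}
	if len(endpoints) == 0 && len(pods) > 0 {
		return nil, false // pods listed but annotation missing: do not retry
	}
	return endpoints, len(endpoints) == 0 // retry only if nothing was listed
}
```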
While `ClusterStatus` will still be maintained and uploaded, it will no
longer be used by the internal `kubeadm` logic to determine the etcd
endpoints.
The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fall back to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
- Extend the exponential backoff for the add/remove/... retries to
11 steps ~=106 seconds. Experiments show that for 3 or more members
the race can take more than ~=26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds seems racy for 3 or more members.
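Expressed as the two knobs (the exact literals below are illustrative;
50ms * (2^11 - 1) is roughly 102 seconds before jitter):

```go
package etcdutil

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"k8s.io/apimachinery/pkg/util/wait"
)

// 11 steps of exponential backoff, ~106 seconds in total.
var etcdBackoff = wait.Backoff{Steps: 11, Duration: 50 * time.Millisecond, Factor: 2.0, Jitter: 0.1}

// Client creation now waits up to 40 seconds for the dial to complete.
var etcdClientConfig = clientv3.Config{DialTimeout: 40 * time.Second}
```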
For the etcd client, amend AddMember() to handle a very
rare bug where multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding. For the rest of the members with missing
names, use their member IDs as names. The etcd node is not disrupted
by the unknown names.
The important aspects are:
- The number of members in the initial cluster must match
the number of members in the cluster.
- The member we are currently adding must be present in the initial
cluster.
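A sketch of the name-assignment rule, assuming etcd v3.5's
`etcdserverpb.Member` type; the helper itself is illustrative:

```go
package etcdutil

import (
	"fmt"
	"strconv"
	"strings"

	"go.etcd.io/etcd/api/v3/etcdserverpb"
)

// initialCluster builds the --initial-cluster value from the MemberAdd
// response, tolerating members that report duplicate or empty names.
func initialCluster(members []*etcdserverpb.Member, addedPeerURL, addedName string) string {
	parts := make([]string, 0, len(members))
	for _, m := range members {
		if len(m.PeerURLs) == 0 {
			continue // defensive: skip members without peer URLs
		}
		peer := m.PeerURLs[0]
		name := m.Name
		switch {
		case peer == addedPeerURL:
			name = addedName // the member we are adding gets its real name
		case name == "":
			name = strconv.FormatUint(m.ID, 16) // fall back to the member ID
		}
		parts = append(parts, fmt.Sprintf("%s=%s", name, peer))
	}
	return strings.Join(parts, ",")
}
```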
The new etcd balancer (>3.3.14, 3.4.0) uses an asynchronous resolver for
endpoints. Without `WithBlock`, the client may return before the
connection is up.
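The dial option in question, in a minimal sketch (the wrapper function
is illustrative):

```go
package etcdutil

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc"
)

// newBlockingClient dials etcd and blocks until the connection is up;
// without WithBlock, the asynchronous resolver in etcd >3.3.14 / 3.4.0
// may hand back a client before it is actually usable.
func newBlockingClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 20 * time.Second,
		DialOptions: []grpc.DialOption{grpc.WithBlock()},
	})
}
```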
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
When adding a new etcd member, the etcd cluster can enter a state
of vote, where any new members added at exactly the same time will
fail with an error right away.
Implement exponential backoff retry around the MemberAdd call.
This solves a kubeadm problem when concurrently joining
control-plane nodes with stacked etcd members.
From experiments, a few retries spaced milliseconds apart are
sufficient to achieve the concurrent join of a 3xCP cluster.
Apply the same backoff to MemberRemove in case the concurrent
removal of members fails for similar reasons.
- move most output unrelated to phases to klog.V(1)
- rename some prefixes for consistency - e.g.
[kubelet] -> [kubelet-start]
- control-plane-prepare: print details for each generated CP
component manifest.
- uppercase the info text for all "[reset].." lines
- modify the text for one line in reset
For historical reasons InitConfiguration is used almost everywhere in kubeadm
as a carrier of various configuration components such as ClusterConfiguration,
local API server endpoint, node registration settings, etc.
Since v1alpha2, InitConfiguration is meant to be used solely as a way to supply
the kubeadm init configuration from a config file. Its usage outside of this
context is caused by technical debt; it's clunky and requires hacks to fetch a
working InitConfiguration from the cluster (as it's not stored in the config
map in its entirety).
This change is a small step towards removing all unnecessary usages of
InitConfiguration. It reduces its usage by replacing it in some places with
some of the following:
- ClusterConfiguration only.
- APIEndpoint (as local API server endpoint).
- NodeRegistrationOptions only.
- Some combinations of the above types, or, if only single fields from them
are used, only those fields.
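An illustrative before/after of the kind of signature change this
implies; both function names are hypothetical:

```go
package phases

import (
	kubeadmapi "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm"
)

// Before: a full InitConfiguration was threaded through, even though
// only a couple of its fields were actually used.
func createManifestsOld(cfg *kubeadmapi.InitConfiguration) error {
	return createManifests(&cfg.ClusterConfiguration, &cfg.LocalAPIEndpoint)
}

// After: accept only the pieces that are needed.
func createManifests(cfg *kubeadmapi.ClusterConfiguration, endpoint *kubeadmapi.APIEndpoint) error {
	// ... render static pod manifests from cfg and endpoint ...
	return nil
}
```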
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
When the etcd cluster grows, we need to explicitly wait for it to be
available. This ensures that we are not doing so implicitly in the
following steps when they try to access the apiserver.
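A rough sketch of such a wait, polling member status over the etcd v3
client; the interval and timeout values are illustrative:

```go
package etcdutil

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForClusterAvailable polls every endpoint until all of them answer
// a Status call, so later steps do not depend on etcd settling implicitly.
func waitForClusterAvailable(cli *clientv3.Client, endpoints []string) error {
	return wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
		for _, ep := range endpoints {
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			_, err := cli.Status(ctx, ep)
			cancel()
			if err != nil {
				return false, nil // not healthy yet; keep polling
			}
		}
		return true, nil
	})
}
```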
When 'kubeadm init ...' is used with an IPv6 kubeadm configuration,
kubeadm currently generates an etcd.yaml manifest that uses IP:port
combinations where the IP is an IPv6 address that is not enclosed
in square brackets, e.g.:
- --advertise-client-urls=https://fd00:20::2:2379
For IPv6 advertise addresses, this should be of the form:
- --advertise-client-urls=https://[fd00:20::2]:2379
The lack of brackets around IPv6 addresses in cases like this causes
failures to bring up IPv6-only clusters with kubeadm, as described in
kubernetes/kubeadm Issues #1212.
This format error is fixed by using net.JoinHostPort() to generate
URLs as shown above.
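For example:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// JoinHostPort brackets IPv6 literals; IPv4 addresses and hostnames
	// are left as-is.
	fmt.Println("https://" + net.JoinHostPort("fd00:20::2", "2379")) // https://[fd00:20::2]:2379
	fmt.Println("https://" + net.JoinHostPort("10.0.0.2", "2379"))   // https://10.0.0.2:2379
}
```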
Fixes kubernetes/kubeadm Issue #1212
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add the flags as necessary
- we update the other repositories that we vendor, which made a similar
change from glog to klog:
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicitly calling InitFlags() in their init() methods
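The usual pattern for those test fixes, as a minimal sketch (the flag
value set here is hypothetical):

```go
package sometest

import (
	"flag"

	"k8s.io/klog"
)

func init() {
	// Unlike glog, klog does not register its flags on the default
	// FlagSet automatically; do it explicitly.
	klog.InitFlags(nil)
	flag.Set("v", "1") // hypothetical verbosity for these tests
}
```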
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135