Commit Graph

57 Commits

Author SHA1 Message Date
Tobias Giese
ea46c91868 kubeadm: promote member after the static pod manifest was written
Signed-off-by: Tobias Giese <tobias.giese@mercedes-benz.com>
Co-authored-by: Christian Schlotter <christi.schlotter@gmail.com>
2023-01-16 11:11:58 +01:00
Paco Xu
37f5da904b kubeadm: remove nested loops for member promotion 2022-12-17 12:40:15 +08:00
Paco Xu
b3deecfb17 add etcd as learner mode and promote when fg EtcdLearnerMode is enabled
- use etcd backoff to wait; still has many warning messages
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-12-16 21:09:59 +08:00
XinYang
72fd01095d
re-order imports for kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>
2021-08-17 22:40:46 +08:00
Ian Gann
c8431f42d9 kubeadm: Reduce the backoff time of AddMember for etcd
This change optimizes the kubeadm/etcd `AddMember` client-side function
by stopping early in the backoff loop when a peer conflict is found
(indicating the member has already been added to the etcd cluster). In
this situation, the function will stop early and relay a call to
`ListMembers` to fetch the current list of members to return. With this
optimization, front-loading a `ListMembers` call is no longer necessary,
as this functionally returns the equivalent response.

This helps reduce the amount of time taken in situational cases where an
initial client request to add a member is accepted by the server, but
fails client-side.

This situation is possible situationally, such as if network latency
causes the request to timeout after it was sent and accepted by the
cluster. In this situation, the following loop would occur and fail with
an `ErrPeerURLExist` response, and would be stuck until the backoff
timeout was met (roughly ~2min30sec currently).

Testing Done:

* Manual testing with an etcd cluster. Initial "AddMember` call was
  successful, and the etcd manifest file was identical to prior version
  of these files. Subsequent calls to add the same member succeeded
  immediately (retaining idempotency), and the resulting manifest file
  remains identical to previous version as well. The difference, this
  time, is the call finished ~2min25sec faster in an identical test in
  the environment tested with.
2021-08-05 13:11:42 -07:00
XinYang
c2a8cd359f
re-order the imports in kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>

Update cmd/kubeadm/app/cmd/join.go

Co-authored-by: Lubomir I. Ivanov <neolit123@gmail.com>
2021-07-04 16:41:27 +08:00
Jordan Liggitt
2979c3325e Switch to go.etcd.io/etcd/client/v3 2021-06-15 09:53:06 -04:00
Lubomir I. Ivanov
8b9d0dceb1 kubeadm: remove the ClusterStatus object from v1beta3
- Remove the object form v1beta3 and internal type
- Deprecate a couple of phases that were specifically designed / named to
modify the ClusterStatus object
- Adapt logic around annotation vs ClusterStatus retrieval
- Update unit tests
- Run generators
2021-05-17 19:27:36 +03:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Lubomir I. Ivanov
ebf163684a kubeadm: adjust the logic around etcd data directory creation
- Ensure the directory is created with 0700 via a new function
called CreateDataDirectory().
- Call this function in the init phases instead of the manual call
to MkdirAll.
- Call this function when joining control-plane nodes with local etcd.

If the directory creation is left to the kubelet via the
static Pod hostPath mounts, it will end up with 0755
which is not desired.
2020-09-03 18:38:54 +03:00
SataQiu
800dd19fc2 increase robustness for kubeadm etcd operations
Signed-off-by: SataQiu <1527062125@qq.com>
2020-06-15 22:43:21 +08:00
Kubernetes Prow Robot
02637bb250
Merge pull request #91145 from tnqn/kubeadm-reset-error
kubeadm: skip removing last etcd member in reset phase
2020-05-27 15:04:01 -07:00
Quan Tian
9cc416e7df kubeadm: do not remove the only remaining etcd member during reset
If this is the only remaining stacked etcd member in the cluster,
calling RemoveMember() is not needed.
2020-05-21 02:12:36 +08:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Lubomir I. Ivanov
1c430ff30f kubeadm: fix flakes when performing etcd MemberAdd on slower setups
In slower setups it can take more time for the existing cluster
to be in a healthy state, so the existing backoff of ~50 seconds
is apparently not sufficient.

The client dial can also fail for similar reasons.

Improve kubeadm's join toleration of adding new etcd members.
Wrap both the client dial and member add in a longer backoff
(up to ~200 seconds).

This particular change should be backported to the support skew.
In a future change for master, all etcd client operations should be
make consistent so that the etcd logic is in a sane state.
2020-04-30 18:53:29 +03:00
SataQiu
004a61a46c kubeadm: fix some mistakes about log output 2020-04-15 14:32:46 +08:00
Rostislav M. Georgiev
c8b7e5739c kubeadm: Use image tag as version of stacked etcd
kubeadm uses image tags (such as `v3.4.3-0`) to specify the version of
etcd. However, the upgrade code in kubeadm uses the etcd client API to
fetch the currently deployed version. The result contains only the etcd
version without the additional information (such as image revision) that
is normally found in the tag. As a result it would refuse an upgrade
where the etcd versions match and the only difference is the image
revision number (`v3.4.3-0` to `v3.4.3-1`).

To fix the above issue, the following changes are done:
- Replace the existing etcd version querying code, that uses the etcd
  client library, with code that returns the etcd image tag from the
  local static pod manifest file.
- If an etcd `imageTag` is specified in the ClusterConfiguration during
  upgrade, use that tag instead. This is done regardless if the tag was
  specified in the configuration stored in the cluster or with a new
  configuration supplied by the `--config` command line parameter.
  If no custom tag is specified, kubeadm will select one depending on
  the desired Kubernetes version.
- `kubeadm upgrade plan` no longer prints upgrade information about
  external etcd. It's the user's responsibility to manage it in that
  case.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
2020-03-30 16:28:45 +03:00
Rafael Fernández López
3e59a0651f
kubeadm: optimize the upgrade path from ClusterStatus to annotations
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.

This changeset adds to both etcd and apiserver endpoint retrieval the
special case in which they won't retry if we are in such cases. The
logic will retry if we find any unknown error, but will not retry in
the following cases:

- etcd annotations do not contain etcd endpoints, but the overall list
  of etcd pods is greater than 0. This means that we listed at least
  one etcd pod, but they are missing the annotation.

- API server annotation is not found on the api server pod for a given
  node name, but no errors aside from that one were found. This means
  that the API server pod is present, but is missing the annotation.

In both cases there is no point in retrying, and so, this speeds up the
upgrade path when coming from a previous existing cluster.
2020-02-20 12:19:05 +01:00
Rafael Fernández López
b140c5d64b
kubeadm: remove ClusterStatus dependency
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.

The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
2020-02-20 12:18:56 +01:00
Lubomir I. Ivanov
a027c379f7 kubeadm: increase timeouts in the etcd client
- Extend the exponential backoff for add/remove/... retry to
11 steps ~=106 seconds. From experiments for 3 and more members
the race can take more that ~=26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds seems racy for 3 and more members.
2020-01-25 00:48:05 +02:00
Lubomir I. Ivanov
5e0c0779a1 kubeadm: handle multiple members without names during concurrent join
For the etcd client, amend AddMember() to handle a very
rare bug when multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding. For the rest of the members with missing
names use their member IDs as name. The etcd node is not disrupted
by the unknown names.

The important aspects are:
- The number of members of the initial cluster must match
the members in the cluster.
- The member we are current adding is present in the initial cluster.
2020-01-25 00:48:05 +02:00
SataQiu
72559ec693 kubeadm upgrades always persist the etcd backup for stacked 2020-01-06 12:34:28 +08:00
fabriziopandini
0573a2227f add retry to etcd operations 2019-11-14 09:27:03 +01:00
Wenjia Zhang
660b17d0ae Pin dependencies and update vendors 2019-10-24 14:09:24 -07:00
Wenjia Zhang
9ead9373f3 Resolve uncompatibility from update: etcd CAFile -> TrustedCAFIle 2019-10-24 14:09:24 -07:00
Wenjia Zhang
3b274fad2a Replace github.com/coreos/etcd by go.etcd.io/etcd 2019-10-24 14:09:24 -07:00
Kubernetes Prow Robot
7e060eec79
Merge pull request #81908 from tedyu/etcd-cluster-avail
Remove Client#ClusterAvailable from interface
2019-09-10 17:42:46 -07:00
Gyuho Lee
93b9545f48 vendor: update with "update-vendor.sh" script
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-29 08:46:02 -07:00
Gyuho Lee
eb1509a1d3 kubeadm/app/util/etcd: : block etcd client creation until connection is up
The new etcd balancer (>3.3.14, 3.4.0) uses an asynchronous resolver for
endpoints. Without "WithBlock", the client may return before the
connection is up.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-29 08:38:29 -07:00
Ted Yu
2167321adb Remove Client#ClusterAvailable from interface 2019-08-29 07:40:34 -07:00
Lubomir I. Ivanov
25668531f7 kubeadm: run MemberAdd/Remove for etcd clients with exp-backoff retry
When adding a new etcd member the etcd cluster can enter a state
of vote, where any new members added at the exact same time will
fail with an error right away.

Implement exponential backoff retry around the MemberAdd call.

This solves a kubeadm problem when concurrently joining
control-plane nodes with stacked etcd members.

From experiment, a few retries with milliseconds apart are
sufficient to achieve the concurrent join of a 3xCP cluster.

Apply the same backoff to MemberRemove in case the concurrent
removal of members fails for similar reasons.
2019-07-03 03:26:30 +03:00
Lubomir I. Ivanov
6f6b364b9c kubeadm: update output of init, join reset commands
- move most unrelated to phases output to klog.V(1)
- rename some prefixes for consistency - e.g.
[kubelet] -> [kubelet-start]
- control-plane-prepare: print details for each generated CP
component manifest.
- uppercase the info text for all "[reset].." lines
- modify the text for one line in reset
2019-03-06 03:17:35 +02:00
RA489
a0ee4b471d Refactor etcd client function have same signatures in etcd.go 2019-02-25 12:54:12 +05:30
pytimer
83f5296a14 kubeadm: Remove etcd member from the etcd cluster when reset the control plane node 2019-02-22 09:13:01 +08:00
Rostislav M. Georgiev
80e2a3cf07 kubeadm: reduce the usage of InitConfiguration
For historical reasons InitConfiguration is used almost everywhere in kubeadm
as a carrier of various configuration components such as ClusterConfiguration,
local API server endpoint, node registration settings, etc.

Since v1alpha2, InitConfiguration is meant to be used solely as a way to supply
the kubeadm init configuration from a config file. Its usage outside of this
context is caused by technical dept, it's clunky and requires hacks to fetch a
working InitConfiguration from the cluster (as it's not stored in the config
map in its entirety).

This change is a small step towards removing all unnecessary usages of
InitConfiguration. It reduces its usage by replacing it in some places with
some of the following:

- ClusterConfiguration only.
- APIEndpoint (as local API server endpoint).
- NodeRegistrationOptions only.
- Some combinations of the above types, or if single fields from them are used,
  only those field.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
2019-01-28 12:21:01 +02:00
Rafael Fernández López
b4cb3fd37c
kubeadm: wait for the etcd cluster to be available when growing it
When the etcd cluster grows we need to explicitly wait for it to be
available. This ensures that we are not implicitly doing this in
following steps when they try to access the apiserver.
2019-01-18 12:04:39 +01:00
fabriziopandini
684b80f8b8 cleanup kubeadm etcd client 2019-01-03 12:21:17 +01:00
pytimer
48d757b6bb kubeadm: fixed etcd sync endpoints 2018-12-11 10:03:22 +08:00
yuexiao-wang
39f71245b3 kubeadm: fixed cleanup upgrade from no-TLS etcd to TLS etcd
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2018-12-08 08:38:03 +08:00
yuexiao-wang
5610ac3c9c cleanup upgrade from non-TLS etcd to TLS etcd
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2018-12-05 23:10:13 +08:00
fabriziopandini
8a53031b4e Fix etcd connect for join --control-plane 2018-11-22 17:17:27 +01:00
Dane LeBlanc
99887716c5 Fix kubeadm etcd manifests to use brackets around IPv6 addrs
When 'kubeadm init ...' is used with an IPv6 kubeadm configuration,
kubeadm currently generates an etcd.yaml manifest that uses IP:port
combinatins where the IP is an IPv6 address, but it is not enclosed
in square brackets, e.g.:
    - --advertise-client-urls=https://fd00:20::2:2379
For IPv6 advertise addresses, this should be of the form:
    - --advertise-client-urls=https://[fd00:20::2]:2379

The lack of brackets around IPv6 addresses in cases like this is
causing failures to bring up IPv6-only clusters with Kubeadm as
described in kubernetes/kubeadm Issues #1212.

This format error is fixed by using net.JoinHostPort() to generate
URLs as shown above.

Fixes kubernetes/kubeadm Issue #1212
2018-11-16 15:12:29 -05:00
fabriziopandini
7f1b2a62a7 fix kubeadm upgrade 2018-11-13 09:14:16 +01:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
yuexiao-wang
cc303c8774 [kubeadm/app/]switch to github.com/pkg/errors
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2018-10-30 16:23:24 +08:00
fabriziopandini
fbd6d2d68a autogenerated 2018-10-27 18:04:44 +02:00
fabriziopandini
d30492ee8f kubeadm graduate kubelet-start phase 2018-10-27 18:04:33 +02:00
yuexiao-wang
f15410692e [kubeadm/app/util]switch to github.com/pkg/errors
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2018-10-26 15:17:21 +08:00
Lucas Käldström
52f0591ad9
Automated rename from MasterConfiguration to InitConfiguration 2018-07-09 04:55:02 +03:00