docs: Update proposal based on feedback

This commit is contained in:
Derek Parker 2016-04-18 09:39:53 -07:00
parent 2588d2516b
commit 73973b429d

View File

@ -31,25 +31,67 @@ Documentation for other releases can be found at
## Abstract
In a self-hosted Kubernetes deployment, we have the initial bootstrap problem. This proposal presents a solution to the kubelet bootstrap, and assumes a functioning control plane, and a kubelet that can securely contact the API server.
In a self-hosted Kubernetes deployment, we have the initial bootstrap problem.
When running self-hosted components, there needs to be a mechanism for pivoting
from the initial bootstrap state to the kubernetes-managed (self-hosted) state.
In the case of a self-hosted kubelet, this means pivoting from the initial
kubelet defined and run on the host, to the kubelet pod which has been scheduled
to the node.
## Background
This proposal presents a solution to the kubelet bootstrap, and assumes a
functioning control plane (e.g. an apiserver, controller-manager, scheduler, and
etcd cluster), and a kubelet that can securely contact the API server.
Our current approach to a self-hosted kubelet is a "pivot" style installation. This procedure assumes a short-lived “bootstrap” kubelet will run and start a long-running “self-hosted” kubelet. Once the self-hosted kubelet is running the bootstrap kubelet will exit. As part of this, we propose introducing a new `--bootstrap` flag to the kubelet. The behaviour of that flag will be explained in detail below.
## Background and Motivation
In order to understand the goals of this proposal, one must understand what
"self-hosted" means. This proposal defines "self-hosted" as a kubernetes cluster
that is installed and managed by the kubernetes installation itself. This means
that each kubernetes component is described by a kubernetes manifest (Daemonset,
Deployment, etc) and can be updated via kubernetes.
The overall goal of this proposal is to make kubernetes easier to install and
upgrade. We can then treat kubernetes itself just like any other application
hosted in a kubernetes cluster, and have access to easy upgrades, monitoring,
and durability for core kubernetes components themselves.
We intend to achieve this by using kubernetes to manage itself. However, in
order to do that we must first "bootstrap" the cluster, by using kubernetes to
install kubernetes components. This is where this proposal fits in, by
describing the necessary modifications, and required procedures, needed to run a
self-hosted kubelet.
The approach being proposed for a self-hosted kubelet is a "pivot" style
installation. This procedure assumes a short-lived “bootstrap” kubelet will run
and start a long-running “self-hosted” kubelet. Once the self-hosted kubelet is
running the bootstrap kubelet will exit. As part of this, we propose introducing
a new `--bootstrap` flag to the kubelet. The behaviour of that flag will be
explained in detail below.
## Proposal
We propose adding a new flag to the kubelet, the `--bootstrap` flag, which is assumed to be used in conjunction with the `--lock-file` flag. When the `--bootstrap` flag is provided, after the kubelet acquires the file lock, it will begin asynchronously waiting on [inotify](http://man7.org/linux/man-pages/man7/inotify.7.html) events. Once an "open" event is received, the kubelet will assume another kubelet is attempting to take control and will exit. In this process, the $init system would be responsible for ensuring a kubelet is always running on the node.
We propose adding a new flag to the kubelet, the `--bootstrap` flag, which is
assumed to be used in conjunction with the `--lock-file` flag. When the
`--bootstrap` flag is provided, after the kubelet acquires the file lock, it
will begin asynchronously waiting on
[inotify](http://man7.org/linux/man-pages/man7/inotify.7.html) events. Once an
"open" event is received, the kubelet will assume another kubelet is attempting
to take control and will exit by calling `exit(0)`.
Thus, the initial bootstrap becomes:
1. "bootstrap" kubelet is started by $init system.
1. "bootstrap" kubelet pulls down "self-hosted" kubelet as a pod from a daemonset
1. "self-hosted" kubelet attempts to acquire the file lock, causing "bootstrap" kubelet to exit
1. "bootstrap" kubelet pulls down "self-hosted" kubelet as a pod from a
daemonset
1. "self-hosted" kubelet attempts to acquire the file lock, causing "bootstrap"
kubelet to exit
1. "self-hosted" kubelet acquires lock and takes over
1. "bootstrap" kubelet is restarted by $init system and blocks on acquiring the file lock
1. "bootstrap" kubelet is restarted by $init system and blocks on acquiring the
file lock
During an upgrade of the kubelet, for simplicity we will consider 3 kubelets, namely "bootstrap", "v1", and "v2". We imagine the following scenario for upgrades:
During an upgrade of the kubelet, for simplicity we will consider 3 kubelets,
namely "bootstrap", "v1", and "v2". We imagine the following scenario for
upgrades:
1. Cluster administrator introduces "v2" kubelet daemonset
1. "v1" kubelet pulls down and starts "v2"
@ -57,17 +99,54 @@ During an upgrade of the kubelet, for simplicity we will consider 3 kubelets, na
1. "v1" kubelet is killed
1. Both "bootstrap" and "v2" kubelets race for file lock
1. If "v2" kubelet acquires lock, process has completed
1. If "bootstrap" kubelet acquires lock, it is assumed that "v2" kubelet will fail a health check and be killed. Once restarted, it will try to acquire the lock, triggering the "bootstrap" kubelet to exit.
1. If "bootstrap" kubelet acquires lock, it is assumed that "v2" kubelet will
fail a health check and be killed. Once restarted, it will try to acquire the
lock, triggering the "bootstrap" kubelet to exit.
Alternatively, it would also be possible via this mechanism to delete the "v1" daemonset first, allow the "bootstrap" kubelet to take over, and then introduce the "v2" kubelet daemonset, effectively eliminating the race between "bootstrap" and "v2" for lock acquisition, and the reliance on the failing health check procedure.
Alternatively, it would also be possible via this mechanism to delete the "v1"
daemonset first, allow the "bootstrap" kubelet to take over, and then introduce
the "v2" kubelet daemonset, effectively eliminating the race between "bootstrap"
and "v2" for lock acquisition, and the reliance on the failing health check
procedure.
This will allow a "self-hosted" kubelet with minimal new concepts introduced into the core Kubernetes code base, and remains flexible enough to work well with future [bootstrapping services](https://github.com/kubernetes/kubernetes/issues/5754).
Eventually this could be handled by a DaemonSet upgrade policy.
This will allow a "self-hosted" kubelet with minimal new concepts introduced
into the core Kubernetes code base, and remains flexible enough to work well
with future [bootstrapping
services](https://github.com/kubernetes/kubernetes/issues/5754).
## Production readiness considerations / Out of scope issues
* Deterministically pulling and running kubelet pod: we would prefer not to have
to loop until we finally get a kubelet pod.
* It is possible that the bootstrap kubelet version is incompatible with the
newer versions that were run in the node. For example, the cgroup
configurations might be incompatible. In the beginning, we will require
cluster admins to keep the configuration in sync. Since we want the bootstrap
kubelet to come up and run even if the API server is not available, we should
persist the configuration for bootstrap kubelet on the node. Once we have
checkpointing in kubelet, we will checkpoint the updated config and have the
bootstrap kubelet use the updated config, if it were to take over.
## Other discussion
Various similar approaches have been discussed [here](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959) and [here](https://github.com/kubernetes/kubernetes/issues/23073#issuecomment-198478997). This also relies on the kubelet being able to be run inside a container, more discussion on that is [here](https://github.com/kubernetes/kubernetes/issues/4869).
Various similar approaches have been discussed
[here](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
and
[here](https://github.com/kubernetes/kubernetes/issues/23073#issuecomment-198478997).
This also relies on the kubelet being able to be run inside a container, more
discussion on that is
[here](https://github.com/kubernetes/kubernetes/issues/4869).
Additionally, [Taints and Tolerations](../../docs/design/taint-toleration-dedicated.md), whose design has already been accepted, would make the overall kubelet bootstrap more deterministic. With this, we would also need the ability for a kubelet to register itself with a given taint when it first contacts the API server. Given that, a kubelet could register itself with a given taint such as “component=kubelet”, and a kubelet pod could exist that has a toleration to that taint, ensuring it is the only pod the “bootstrap” kubelet runs.
Additionally, [Taints and
Tolerations](../../docs/design/taint-toleration-dedicated.md), whose design has
already been accepted, would make the overall kubelet bootstrap more
deterministic. With this, we would also need the ability for a kubelet to
register itself with a given taint when it first contacts the API server. Given
that, a kubelet could register itself with a given taint such as
“component=kubelet”, and a kubelet pod could exist that has a toleration to that
taint, ensuring it is the only pod the “bootstrap” kubelet runs.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->