address issue #1488; clean up linewrap and some minor editing issues in the docs/design/* tree
Signed-off-by: mikebrow <brownwm@us.ibm.com>
@@ -54,7 +54,7 @@ ideas.
 * **High availability:** continuing to be available and work correctly
 even if some components are down or uncontactable. This typically
 involves multiple replicas of critical services, and a reliable way
-to find available replicas. Note that it's possible (but not
+to find available replicas. Note that it's possible (but not
 desirable) to have high
 availability properties (e.g. multiple replicas) in the absence of
 self-healing properties (e.g. if a replica fails, nothing replaces
@@ -109,11 +109,11 @@ ideas.
 
 ## Relative Priorities
 
-1. **(Possibly manual) recovery from catastrophic failures:** having a Kubernetes cluster, and all
-applications running inside it, disappear forever perhaps is the worst
-possible failure mode. So it is critical that we be able to
-recover the applications running inside a cluster from such
-failures in some well-bounded time period.
+1. **(Possibly manual) recovery from catastrophic failures:** having a
+Kubernetes cluster, and all applications running inside it, disappear forever
+is perhaps the worst possible failure mode. So it is critical that we be able to
+recover the applications running inside a cluster from such failures in some
+well-bounded time period.
 1. In theory a cluster can be recovered by replaying all API calls
 that have ever been executed against it, in order, but most
 often that state has been lost, and/or is scattered across
@@ -121,12 +121,12 @@ ideas.
 probably infeasible.
 1. In theory a cluster can also be recovered to some relatively
 recent non-corrupt backup/snapshot of the disk(s) backing the
-etcd cluster state. But we have no default consistent
+etcd cluster state. But we have no default consistent
 backup/snapshot, verification or restoration process. And we
 don't routinely test restoration, so even if we did routinely
 perform and verify backups, we have no hard evidence that we
 can in practice effectively recover from catastrophic cluster
-failure or data corruption by restoring from these backups. So
+failure or data corruption by restoring from these backups. So
 there's more work to be done here.
 1. **Self-healing:** Most major cloud providers provide the ability to
 easily and automatically replace failed virtual machines within a
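
The backup gap called out in the hunk above (no default consistent backup/snapshot, verification, or restoration process) is straightforward to prototype against etcd v3. Below is a minimal sketch, assuming the etcd v3 Go client (`go.etcd.io/etcd/client/v3`) and illustrative endpoints and paths; it only saves and size-checks a snapshot, whereas a credible process would also restore it into a scratch cluster and verify the result, as the document argues.

```go
// backup_sketch.go: a minimal sketch of routine, verified etcd backups.
// Assumes etcd v3 and its Go client; endpoint and file path are illustrative.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-0.example.internal:2379"}, // assumed address
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect to etcd: %v", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Stream a consistent snapshot of the etcd keyspace to local disk.
	rc, err := cli.Snapshot(ctx)
	if err != nil {
		log.Fatalf("start snapshot: %v", err)
	}
	defer rc.Close()

	out, err := os.Create("/var/backups/etcd-snapshot.db")
	if err != nil {
		log.Fatalf("create backup file: %v", err)
	}
	defer out.Close()

	n, err := io.Copy(out, rc)
	if err != nil {
		log.Fatalf("write snapshot: %v", err)
	}

	// A trivial sanity check only; a real pipeline would also restore the
	// snapshot into a scratch cluster and verify it, as argued above.
	if n == 0 {
		log.Fatal("snapshot is empty")
	}
	fmt.Printf("wrote %d bytes of etcd snapshot\n", n)
}
```
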
@@ -144,7 +144,6 @@ ideas.
 addition](https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md#add-a-new-member)
 or [backup and
 recovery](https://github.com/coreos/etcd/blob/master/Documentation/admin_guide.md#disaster-recovery)).
-
 1. and boot disks are either:
 1. truly persistent (i.e. remote persistent disks), or
 1. reconstructible (e.g. using boot-from-snapshot,
@@ -157,7 +156,7 @@ ideas.
 quorum members). In environments where cloud-assisted automatic
 self-healing might be infeasible (e.g. on-premise bare-metal
 deployments), it also gives cluster administrators more time to
-respond (e.g. replace/repair failed machines) without incurring
+respond (e.g. replace/repair failed machines) without incurring
 system downtime.
 
 ## Design and Status (as of December 2015)
@@ -174,7 +173,7 @@ ideas.
 
 Multiple stateless, self-hosted, self-healing API servers behind an HA
 load balancer, built out by the default "kube-up" automation on GCE,
-AWS and basic bare metal (BBM). Note that the single-host approach of
+AWS and basic bare metal (BBM). Note that the single-host approach of
 having etcd listen only on localhost to ensure that only the API server can
 connect to it will no longer work, so alternative security will be
 needed in this regard (either using firewall rules, SSL certs, or
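
For the API-server row above, the HA load balancer (or any external monitor) needs a way to find healthy replicas; the API server's `/healthz` endpoint is the usual signal. A minimal probe sketch follows, with illustrative replica addresses and TLS verification disabled purely for brevity; a real check should trust the cluster CA and, depending on configuration, authenticate.

```go
// healthcheck_sketch.go: probe each API server replica's /healthz endpoint,
// the signal a load balancer or monitor can use to route around failed
// replicas. Replica addresses are illustrative.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	replicas := []string{
		"https://apiserver-0.example.internal:6443",
		"https://apiserver-1.example.internal:6443",
		"https://apiserver-2.example.internal:6443",
	}

	client := &http.Client{
		Timeout: 3 * time.Second,
		// For brevity only; a real probe should verify the cluster CA instead.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	for _, r := range replicas {
		resp, err := client.Get(r + "/healthz")
		if err != nil {
			fmt.Printf("%s: unreachable: %v\n", r, err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s: %d %s\n", r, resp.StatusCode, string(body))
	}
}
```
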
@@ -189,13 +188,13 @@ design doc.
 <td>
 
 No scripted self-healing or HA on GCE, AWS or basic bare metal
-currently exists in the OSS distro. To be clear, "no self healing"
+currently exists in the OSS distro. To be clear, "no self healing"
 means that even if multiple e.g. API servers are provisioned for HA
 purposes, if they fail, nothing replaces them, so eventually the
-system will fail. Self-healing and HA can be set up
+system will fail. Self-healing and HA can be set up
 manually by following documented instructions, but this is not
 currently an automated process, and it is not tested as part of
-continuous integration. So it's probably safest to assume that it
+continuous integration. So it's probably safest to assume that it
 doesn't actually work in practice.
 
 </td>
@@ -205,8 +204,8 @@ doesn't actually work in practise.
 <td>
 
 Multiple self-hosted, self healing warm standby stateless controller
-managers and schedulers with leader election and automatic failover of API server
-clients, automatically installed by default "kube-up" automation.
+managers and schedulers with leader election and automatic failover of API
+server clients, automatically installed by default "kube-up" automation.
 
 </td>
 <td>As above.</td>
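
The warm-standby pattern in this row (only one controller manager or scheduler active, the rest waiting to take over) can be illustrated with client-go's leader-election helper. This Lease-based mechanism postdates the 2015-era components described here, so treat it as a sketch of the idea rather than what "kube-up" installed; the lease name, namespace, and identity are illustrative.

```go
// leaderelection_sketch.go: warm-standby replicas contend for a Lease; only
// the elected leader runs the control loop, and a standby takes over if the
// leader stops renewing. Names and identities are illustrative.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("in-cluster config: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	hostname, _ := os.Hostname()
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "example-controller", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				log.Println("became leader; running control loop")
				<-ctx.Done() // real controllers do their work here
			},
			OnStoppedLeading: func() {
				log.Println("lost leadership; exiting so a standby can take over")
			},
		},
	})
}
```
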
@@ -218,47 +217,49 @@ clients, automatically installed by default "kube-up" automation.
 Multiple (3-5) etcd quorum members behind a load balancer with session
 affinity (to prevent clients from being bounced from one to another).
 
-Regarding self-healing, if a node running etcd goes down, it is always necessary to do three
-things:
+Regarding self-healing, if a node running etcd goes down, it is always necessary
+to do three things:
 <ol>
 <li>allocate a new node (not necessary if running etcd as a pod, in
 which case specific measures are required to prevent user pods from
 interfering with system pods, for example using node selectors as
-described in <A HREF=")
-<li>start an etcd replica on that new node,
+described in <A HREF="),
+<li>start an etcd replica on that new node, and
 <li>have the new replica recover the etcd state.
 </ol>
 In the case of local disk (which fails in concert with the machine), the etcd
-state must be recovered from the other replicas. This is called <A HREF="https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md#add-a-new-member">dynamic member
-addition</A>.
-In the case of remote persistent disk, the etcd state can be recovered
-by attaching the remote persistent disk to the replacement node, thus
-the state is recoverable even if all other replicas are down.
+state must be recovered from the other replicas. This is called
+<A HREF="https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md#add-a-new-member">
+dynamic member addition</A>.
+
+In the case of remote persistent disk, the etcd state can be recovered by
+attaching the remote persistent disk to the replacement node, thus the state is
+recoverable even if all other replicas are down.
 
 There are also significant performance differences between local disks and remote
-persistent disks. For example, the <A HREF="https://cloud.google.com/compute/docs/disks/#comparison_of_disk_types">sustained throughput
-local disks in GCE is approximatley 20x that of remote disks</A>.
+persistent disks. For example, the
+<A HREF="https://cloud.google.com/compute/docs/disks/#comparison_of_disk_types">
+sustained throughput of local disks in GCE is approximately 20x that of remote
+disks</A>.
 
-Hence we suggest that self-healing be provided by remotely mounted persistent disks in
-non-performance critical, single-zone cloud deployments. For
-performance critical installations, faster local SSD's should be used,
-in which case remounting on node failure is not an option, so
-<A HREF="https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md ">etcd runtime configuration</A>
-should be used to replace the failed machine. Similarly, for
-cross-zone self-healing, cloud persistent disks are zonal, so
-automatic
-<A HREF="https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md">runtime configuration</A>
-is required. Similarly, basic bare metal deployments cannot generally
-rely on
-remote persistent disks, so the same approach applies there.
+Hence we suggest that self-healing be provided by remotely mounted persistent
+disks in non-performance-critical, single-zone cloud deployments. For
+performance-critical installations, faster local SSDs should be used, in which
+case remounting on node failure is not an option, so
+<A HREF="https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md">
+etcd runtime configuration</A> should be used to replace the failed machine.
+Similarly, for cross-zone self-healing, cloud persistent disks are zonal, so
+automatic <A HREF="https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md">
+runtime configuration</A> is required. Likewise, basic bare metal deployments
+cannot generally rely on remote persistent disks, so the same approach applies
+there.
 </td>
 <td>
 <A HREF="http://kubernetes.io/v1.1/docs/admin/high-availability.html">
-Somewhat vague instructions exist</A>
-on how to set some of this up manually in a self-hosted
-configuration. But automatic bootstrapping and self-healing is not
-described (and is not implemented for the non-PD cases). This all
-still needs to be automated and continuously tested.
+Somewhat vague instructions exist</A> on how to set some of this up manually in
+a self-hosted configuration. But automatic bootstrapping and self-healing is not
+described (and is not implemented for the non-PD cases). This all still needs to
+be automated and continuously tested.
 </td>
 </tr>
 </table>
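
The "dynamic member addition" step in the etcd row can be driven from the etcd v3 Go client. The sketch below removes a failed member and registers its replacement (endpoints, member name, and peer URL are illustrative); the replacement etcd process must then be started against the returned cluster membership with `--initial-cluster-state=existing`.

```go
// member_replace_sketch.go: replace a failed etcd member, i.e. the "remove the
// dead member, register the new node" half of dynamic member addition.
// Endpoints, names and URLs are illustrative.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-0.example.internal:2379", "https://etcd-1.example.internal:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect to etcd: %v", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Find the failed member by name and drop it from the quorum.
	const failedName = "etcd-2" // illustrative
	members, err := cli.MemberList(ctx)
	if err != nil {
		log.Fatalf("list members: %v", err)
	}
	for _, m := range members.Members {
		if m.Name == failedName {
			if _, err := cli.MemberRemove(ctx, m.ID); err != nil {
				log.Fatalf("remove member %s: %v", failedName, err)
			}
		}
	}

	// Register the replacement node's peer URL; etcd returns the membership
	// the new replica should be started with.
	added, err := cli.MemberAdd(ctx, []string{"https://etcd-3.example.internal:2380"})
	if err != nil {
		log.Fatalf("add member: %v", err)
	}
	fmt.Printf("added member %x; cluster now has %d members\n", added.Member.ID, len(added.Members))
}
```
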
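The node-selector measure mentioned in the etcd row (keeping user pods from interfering with system pods when etcd runs as a pod) amounts to constraining where the pod may schedule. A hypothetical client-go sketch follows, with an illustrative label and image; production clusters typically combine this with taints and tolerations.

```go
// nodeselector_sketch.go: pin an etcd pod to dedicated nodes with a node
// selector. The label key/value and image are illustrative.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("in-cluster config: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "etcd-member", Namespace: "kube-system"},
		Spec: corev1.PodSpec{
			// Only schedule onto nodes labeled for system workloads.
			NodeSelector: map[string]string{"dedicated": "master"},
			Containers: []corev1.Container{{
				Name:  "etcd",
				Image: "quay.io/coreos/etcd:v3.5.0", // illustrative
			}},
		},
	}

	if _, err := client.CoreV1().Pods("kube-system").Create(context.Background(), pod, metav1.CreateOptions{}); err != nil {
		log.Fatalf("create pod: %v", err)
	}
	log.Println("etcd pod created with node selector dedicated=master")
}
```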