mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-08-03 09:22:44 +00:00
Merge pull request #28558 from quinton-hoole/2016-07-06-excise-ubernetes-from-main-repo
Automatic merge from submit-queue
Deprecate the term "Ubernetes" in favor of "Cluster Federation" and "Multi-AZ Clusters"
commit
629f3c159e
@ -32,7 +32,7 @@ Documentation for other releases can be found at
|
|||||||
|
|
||||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||||
|
|
||||||
# Kubernetes/Ubernetes Control Plane Resilience
|
# Kubernetes and Cluster Federation Control Plane Resilience
|
||||||
|
|
||||||
## Long Term Design and Current Status
|
## Long Term Design and Current Status
|
||||||
|
|
||||||
@ -44,7 +44,7 @@ Documentation for other releases can be found at
|
|||||||
|
|
||||||
Some amount of confusion exists around how we currently, and in future
|
Some amount of confusion exists around how we currently, and in future
|
||||||
want to ensure resilience of the Kubernetes (and by implication
|
want to ensure resilience of the Kubernetes (and by implication
|
||||||
Ubernetes) control plane. This document is an attempt to capture that
|
Kubernetes Cluster Federation) control plane. This document is an attempt to capture that
|
||||||
definitively. It covers areas including self-healing, high
|
definitively. It covers areas including self-healing, high
|
||||||
availability, bootstrapping and recovery. Most of the information in
|
availability, bootstrapping and recovery. Most of the information in
|
||||||
this document already exists in the form of github comments,
|
this document already exists in the form of github comments,
|
||||||
|
@ -32,7 +32,7 @@ Documentation for other releases can be found at
|
|||||||
|
|
||||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||||
|
|
||||||
# Kubernetes Cluster Federation (a.k.a. "Ubernetes")
|
# Kubernetes Cluster Federation (previously nicknamed "Ubernetes")
|
||||||
|
|
||||||
## Cross-cluster Load Balancing and Service Discovery
|
## Cross-cluster Load Balancing and Service Discovery
|
||||||
|
|
||||||
@ -106,7 +106,7 @@ Documentation for other releases can be found at
|
|||||||
|
|
||||||
A Kubernetes application configuration (e.g. for a Pod, Replication
|
A Kubernetes application configuration (e.g. for a Pod, Replication
|
||||||
Controller, Service etc) should be able to be successfully deployed
|
Controller, Service etc) should be able to be successfully deployed
|
||||||
into any Kubernetes Cluster or Ubernetes Federation of Clusters,
|
into any Kubernetes Cluster or Federation of Clusters,
|
||||||
without modification. More specifically, a typical configuration
|
without modification. More specifically, a typical configuration
|
||||||
should work correctly (although possibly not optimally) across any of
|
should work correctly (although possibly not optimally) across any of
|
||||||
the following environments:
|
the following environments:
|
||||||
@ -154,7 +154,7 @@ environments. More specifically, for example:
|
|||||||
|
|
||||||
## Component Cloud Services
|
## Component Cloud Services
|
||||||
|
|
||||||
Ubernetes cross-cluster load balancing is built on top of the following:
|
Cross-cluster Federated load balancing is built on top of the following:
|
||||||
|
|
||||||
1. [GCE Global L7 Load Balancers](https://cloud.google.com/compute/docs/load-balancing/http/global-forwarding-rules)
|
1. [GCE Global L7 Load Balancers](https://cloud.google.com/compute/docs/load-balancing/http/global-forwarding-rules)
|
||||||
provide single, static global IP addresses which load balance and
|
provide single, static global IP addresses which load balance and
|
||||||
@ -194,10 +194,11 @@ Ubernetes cross-cluster load balancing is built on top of the following:
|
|||||||
A generic wrapper around cloud-provided L4 and L7 load balancing services, and
|
A generic wrapper around cloud-provided L4 and L7 load balancing services, and
|
||||||
roll-your-own load balancers run in pods, e.g. HA Proxy.
|
roll-your-own load balancers run in pods, e.g. HA Proxy.
|
||||||
|
|
||||||
## Ubernetes API
|
## Cluster Federation API
|
||||||
|
|
||||||
The Ubernetes API for load balancing should be compatible with the equivalent
|
The Cluster Federation API for load balancing should be compatible with the equivalent
|
||||||
Kubernetes API, to ease porting of clients between Ubernetes and Kubernetes.
|
Kubernetes API, to ease porting of clients between Kubernetes and
|
||||||
|
federations of Kubernetes clusters.
|
||||||
Further details below.
|
Further details below.
|
||||||
|
|
||||||
## Common Client Behavior
|
## Common Client Behavior
|
||||||
@ -250,13 +251,13 @@ multiple) fixed server IP(s). Nothing else matters.
|
|||||||
|
|
||||||
### General Control Plane Architecture
|
### General Control Plane Architecture
|
||||||
|
|
||||||
Each cluster hosts one or more Ubernetes master components (Ubernetes API
|
Each cluster hosts one or more Cluster Federation master components (Federation API
|
||||||
servers, controller managers with leader election, and etcd quorum members). This
|
servers, controller managers with leader election, and etcd quorum members). This
|
||||||
is documented in more detail in a separate design doc:
|
is documented in more detail in a separate design doc:
|
||||||
[Kubernetes/Ubernetes Control Plane Resilience](https://docs.google.com/document/d/1jGcUVg9HDqQZdcgcFYlWMXXdZsplDdY6w3ZGJbU7lAw/edit#).
|
[Kubernetes and Cluster Federation Control Plane Resilience](https://docs.google.com/document/d/1jGcUVg9HDqQZdcgcFYlWMXXdZsplDdY6w3ZGJbU7lAw/edit#).
|
||||||
|
|
||||||
In the description below, assume that 'n' clusters, named 'cluster-1'...
|
In the description below, assume that 'n' clusters, named 'cluster-1'...
|
||||||
'cluster-n' have been registered against an Ubernetes Federation "federation-1",
|
'cluster-n' have been registered against a Cluster Federation "federation-1",
|
||||||
each with their own set of Kubernetes API endpoints,so,
|
each with their own set of Kubernetes API endpoints,so,
|
||||||
"[http://endpoint-1.cluster-1](http://endpoint-1.cluster-1),
|
"[http://endpoint-1.cluster-1](http://endpoint-1.cluster-1),
|
||||||
[http://endpoint-2.cluster-1](http://endpoint-2.cluster-1)
|
[http://endpoint-2.cluster-1](http://endpoint-2.cluster-1)
|
||||||
@ -264,13 +265,13 @@ each with their own set of Kubernetes API endpoints,so,
|
|||||||
|
|
||||||
### Federated Services
|
### Federated Services
|
||||||
|
|
||||||
Ubernetes Services are pretty straight-forward. They're comprised of multiple
|
Federated Services are pretty straight-forward. They're comprised of multiple
|
||||||
equivalent underlying Kubernetes Services, each with their own external
|
equivalent underlying Kubernetes Services, each with their own external
|
||||||
endpoint, and a load balancing mechanism across them. Let's work through how
|
endpoint, and a load balancing mechanism across them. Let's work through how
|
||||||
exactly that works in practice.
|
exactly that works in practice.
|
||||||
|
|
||||||
Our user creates the following Ubernetes Service (against an Ubernetes API
|
Our user creates the following Federated Service (against a Federation
|
||||||
endpoint):
|
API endpoint):
|
||||||
|
|
||||||
$ kubectl create -f my-service.yaml --context="federation-1"
|
$ kubectl create -f my-service.yaml --context="federation-1"
|
||||||
|
|
||||||
@ -296,7 +297,7 @@ where service.yaml contains the following:
|
|||||||
run: my-service
|
run: my-service
|
||||||
type: LoadBalancer
|
type: LoadBalancer
|
||||||
|
|
||||||
Ubernetes in turn creates one equivalent service (identical config to the above)
|
The Cluster Federation control system in turn creates one equivalent service (identical config to the above)
|
||||||
in each of the underlying Kubernetes clusters, each of which results in
|
in each of the underlying Kubernetes clusters, each of which results in
|
||||||
something like this:
|
something like this:
|
||||||
|
|
||||||
@ -338,7 +339,7 @@ something like this:
|
|||||||
Similar services are created in `cluster-2` and `cluster-3`, each of which are
|
Similar services are created in `cluster-2` and `cluster-3`, each of which are
|
||||||
allocated their own `spec.clusterIP`, and `status.loadBalancer.ingress.ip`.
|
allocated their own `spec.clusterIP`, and `status.loadBalancer.ingress.ip`.
|
||||||
|
|
||||||
In Ubernetes `federation-1`, the resulting federated service looks as follows:
|
In the Cluster Federation `federation-1`, the resulting federated service looks as follows:
|
||||||
|
|
||||||
$ kubectl get -o yaml --context="federation-1" service my-service
|
$ kubectl get -o yaml --context="federation-1" service my-service
|
||||||
|
|
||||||
@ -382,7 +383,7 @@ Note that the federated service:
|
|||||||
1. has a federation-wide load balancer hostname
|
1. has a federation-wide load balancer hostname
|
||||||
|
|
||||||
In addition to the set of underlying Kubernetes services (one per cluster)
|
In addition to the set of underlying Kubernetes services (one per cluster)
|
||||||
described above, Ubernetes has also created a DNS name (e.g. on
|
described above, the Cluster Federation control system has also created a DNS name (e.g. on
|
||||||
[Google Cloud DNS](https://cloud.google.com/dns) or
|
[Google Cloud DNS](https://cloud.google.com/dns) or
|
||||||
[AWS Route 53](https://aws.amazon.com/route53/), depending on configuration)
|
[AWS Route 53](https://aws.amazon.com/route53/), depending on configuration)
|
||||||
which provides load balancing across all of those services. For example, in a
|
which provides load balancing across all of those services. For example, in a
|
||||||
@ -397,7 +398,8 @@ Each of the above IP addresses (which are just the external load balancer
|
|||||||
ingress IP's of each cluster service) is of course load balanced across the pods
|
ingress IP's of each cluster service) is of course load balanced across the pods
|
||||||
comprising the service in each cluster.
|
comprising the service in each cluster.
|
||||||
|
|
||||||
In a more sophisticated configuration (e.g. on GCE or GKE), Ubernetes
|
In a more sophisticated configuration (e.g. on GCE or GKE), the Cluster
|
||||||
|
Federation control system
|
||||||
automatically creates a
|
automatically creates a
|
||||||
[GCE Global L7 Load Balancer](https://cloud.google.com/compute/docs/load-balancing/http/global-forwarding-rules)
|
[GCE Global L7 Load Balancer](https://cloud.google.com/compute/docs/load-balancing/http/global-forwarding-rules)
|
||||||
which exposes a single, globally load-balanced IP:
|
which exposes a single, globally load-balanced IP:
|
||||||
@ -405,7 +407,7 @@ which exposes a single, globally load-balanced IP:
|
|||||||
$ dig +noall +answer my-service.my-namespace.my-federation.my-domain.com
|
$ dig +noall +answer my-service.my-namespace.my-federation.my-domain.com
|
||||||
my-service.my-namespace.my-federation.my-domain.com 180 IN A 107.194.17.44
|
my-service.my-namespace.my-federation.my-domain.com 180 IN A 107.194.17.44
|
||||||
|
|
||||||
Optionally, Ubernetes also configures the local DNS servers (SkyDNS)
|
Optionally, the Cluster Federation control system also configures the local DNS servers (SkyDNS)
|
||||||
in each Kubernetes cluster to preferentially return the local
|
in each Kubernetes cluster to preferentially return the local
|
||||||
clusterIP for the service in that cluster, with other clusters'
|
clusterIP for the service in that cluster, with other clusters'
|
||||||
external service IP's (or a global load-balanced IP) also configured
|
external service IP's (or a global load-balanced IP) also configured
|
||||||
@ -416,7 +418,7 @@ for failover purposes:
|
|||||||
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.74.77
|
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.74.77
|
||||||
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.38.157
|
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.38.157
|
||||||
|
|
||||||
If Ubernetes Global Service Health Checking is enabled, multiple service health
|
If Cluster Federation Global Service Health Checking is enabled, multiple service health
|
||||||
checkers running across the federated clusters collaborate to monitor the health
|
checkers running across the federated clusters collaborate to monitor the health
|
||||||
of the service endpoints, and automatically remove unhealthy endpoints from the
|
of the service endpoints, and automatically remove unhealthy endpoints from the
|
||||||
DNS record (e.g. a majority quorum is required to vote a service endpoint
|
DNS record (e.g. a majority quorum is required to vote a service endpoint
|
||||||
@ -460,7 +462,7 @@ where `my-service-rc.yaml` contains the following:
|
|||||||
- containerPort: 2380
|
- containerPort: 2380
|
||||||
protocol: TCP
|
protocol: TCP
|
||||||
|
|
||||||
Ubernetes in turn creates one equivalent replication controller
|
The Cluster Federation control system in turn creates one equivalent replication controller
|
||||||
(identical config to the above, except for the replica count) in each
|
(identical config to the above, except for the replica count) in each
|
||||||
of the underlying Kubernetes clusters, each of which results in
|
of the underlying Kubernetes clusters, each of which results in
|
||||||
something like this:
|
something like this:
|
||||||
@ -510,8 +512,8 @@ entire cluster failures, various approaches are possible, including:
|
|||||||
replicas in its cluster in response to the additional traffic
|
replicas in its cluster in response to the additional traffic
|
||||||
diverted from the failed cluster. This saves resources and is relatively
|
diverted from the failed cluster. This saves resources and is relatively
|
||||||
simple, but there is some delay in the autoscaling.
|
simple, but there is some delay in the autoscaling.
|
||||||
3. **federated replica migration**, where the Ubernetes Federation
|
3. **federated replica migration**, where the Cluster Federation
|
||||||
Control Plane detects the cluster failure and automatically
|
control system detects the cluster failure and automatically
|
||||||
increases the replica count in the remaining clusters to make up
|
increases the replica count in the remaining clusters to make up
|
||||||
for the lost replicas in the failed cluster. This does not seem to
|
for the lost replicas in the failed cluster. This does not seem to
|
||||||
offer any benefits relative to pod autoscaling above, and is
|
offer any benefits relative to pod autoscaling above, and is
|
||||||
@ -523,23 +525,24 @@ entire cluster failures, various approaches are possible, including:
|
|||||||
The implementation approach and architecture is very similar to Kubernetes, so
|
The implementation approach and architecture is very similar to Kubernetes, so
|
||||||
if you're familiar with how Kubernetes works, none of what follows will be
|
if you're familiar with how Kubernetes works, none of what follows will be
|
||||||
surprising. One additional design driver not present in Kubernetes is that
|
surprising. One additional design driver not present in Kubernetes is that
|
||||||
Ubernetes aims to be resilient to individual cluster and availability zone
|
the Cluster Federation control system aims to be resilient to individual cluster and availability zone
|
||||||
failures. So the control plane spans multiple clusters. More specifically:
|
failures. So the control plane spans multiple clusters. More specifically:
|
||||||
|
|
||||||
+ Ubernetes runs its own distinct set of API servers (typically one
|
+ Cluster Federation runs its own distinct set of API servers (typically one
|
||||||
or more per underlying Kubernetes cluster). These are completely
|
or more per underlying Kubernetes cluster). These are completely
|
||||||
distinct from the Kubernetes API servers for each of the underlying
|
distinct from the Kubernetes API servers for each of the underlying
|
||||||
clusters.
|
clusters.
|
||||||
+ Ubernetes runs its own distinct quorum-based metadata store (etcd,
|
+ Cluster Federation runs its own distinct quorum-based metadata store (etcd,
|
||||||
by default). Approximately 1 quorum member runs in each underlying
|
by default). Approximately 1 quorum member runs in each underlying
|
||||||
cluster ("approximately" because we aim for an odd number of quorum
|
cluster ("approximately" because we aim for an odd number of quorum
|
||||||
members, and typically don't want more than 5 quorum members, even
|
members, and typically don't want more than 5 quorum members, even
|
||||||
if we have a larger number of federated clusters, so 2 clusters->3
|
if we have a larger number of federated clusters, so 2 clusters->3
|
||||||
quorum members, 3->3, 4->3, 5->5, 6->5, 7->5 etc).
|
quorum members, 3->3, 4->3, 5->5, 6->5, 7->5 etc).
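The quorum sizing rule in the bullet above can be sketched as follows. This is only an illustration of the stated mapping (2->3, 3->3, 4->3, 5->5, 6->5, 7->5), not code from the Federation control plane. The function name `quorum_members` is hypothetical, and the sketch assumes at least two federated clusters because the text gives no mapping for fewer.

```python
def quorum_members(num_clusters: int) -> int:
    """Return the number of etcd quorum members for a federation.

    Assumed rule, read off the examples in the design text: use the
    largest odd number not exceeding the cluster count, but never
    fewer than 3 and never more than 5.
    """
    if num_clusters < 2:
        # The design text only describes federations of 2 or more clusters.
        raise ValueError("expected at least 2 federated clusters")
    largest_odd = num_clusters if num_clusters % 2 == 1 else num_clusters - 1
    return max(3, min(5, largest_odd))


if __name__ == "__main__":
    for n in range(2, 8):
        print(f"{n} clusters -> {quorum_members(n)} quorum members")
```

Keeping the member count odd means adding a member never lowers the number of failures the quorum can tolerate, which is presumably why the text rounds down to an odd count.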
|
||||||
|
|
||||||
Cluster Controllers in Ubernetes watch against the Ubernetes API server/etcd
|
Cluster Controllers in the Federation control system watch against the
|
||||||
|
Federation API server/etcd
|
||||||
state, and apply changes to the underlying kubernetes clusters accordingly. They
|
state, and apply changes to the underlying kubernetes clusters accordingly. They
|
||||||
also have the anti-entropy mechanism for reconciling ubernetes "desired desired"
|
also have the anti-entropy mechanism for reconciling Cluster Federation "desired desired"
|
||||||
state against kubernetes "actual desired" state.
|
state against kubernetes "actual desired" state.
|
||||||
|
|
||||||
|
|
||||||
|
@ -320,8 +320,8 @@ Below is the state transition diagram.
|
|||||||
|
|
||||||
## Replication Controller
|
## Replication Controller
|
||||||
|
|
||||||
A global workload submitted to control plane is represented as an
|
A global workload submitted to control plane is represented as a
|
||||||
Ubernetes replication controller. When a replication controller
|
replication controller in the Cluster Federation control plane. When a replication controller
|
||||||
is submitted to control plane, clients need a way to express its
|
is submitted to control plane, clients need a way to express its
|
||||||
requirements or preferences on clusters. Depending on different use
|
requirements or preferences on clusters. Depending on different use
|
||||||
cases it may be complex. For example:
|
cases it may be complex. For example:
|
||||||
@ -377,11 +377,11 @@ some implicit scheduling restrictions. For example it defines
|
|||||||
“nodeSelector” which can only be satisfied on some particular
|
“nodeSelector” which can only be satisfied on some particular
|
||||||
clusters. How to handle this will be addressed after phase one.
|
clusters. How to handle this will be addressed after phase one.
|
||||||
|
|
||||||
## Ubernetes Services
|
## Federated Services
|
||||||
|
|
||||||
The Service API object exposed by Ubernetes is similar to service
|
The Service API object exposed by the Cluster Federation is similar to service
|
||||||
objects on Kubernetes. It defines the access to a group of pods. The
|
objects on Kubernetes. It defines the access to a group of pods. The
|
||||||
Ubernetes service controller will create corresponding Kubernetes
|
federation service controller will create corresponding Kubernetes
|
||||||
service objects on underlying clusters. These are detailed in a
|
service objects on underlying clusters. These are detailed in a
|
||||||
separate design document: [Federated Services](federated-services.md).
|
separate design document: [Federated Services](federated-services.md).
|
||||||
|
|
||||||
@ -389,13 +389,13 @@ separate design document: [Federated Services](federated-services.md).
|
|||||||
|
|
||||||
In phase one we only support scheduling replication controllers. Pod
|
In phase one we only support scheduling replication controllers. Pod
|
||||||
scheduling will be supported in a later phase. This is primarily in
|
scheduling will be supported in a later phase. This is primarily in
|
||||||
order to keep the Ubernetes API compatible with the Kubernetes API.
|
order to keep the Cluster Federation API compatible with the Kubernetes API.
|
||||||
|
|
||||||
## ACTIVITY FLOWS
|
## ACTIVITY FLOWS
|
||||||
|
|
||||||
## Scheduling
|
## Scheduling
|
||||||
|
|
||||||
The below diagram shows how workloads are scheduled on the Ubernetes control\
|
The below diagram shows how workloads are scheduled on the Cluster Federation control\
|
||||||
plane:
|
plane:
|
||||||
|
|
||||||
1. A replication controller is created by the client.
|
1. A replication controller is created by the client.
|
||||||
@ -419,20 +419,20 @@ distribution policies. The scheduling rule is basically:
|
|||||||
There is a potential race condition here. Say at time _T1_ the control
|
There is a potential race condition here. Say at time _T1_ the control
|
||||||
plane learns there are _m_ available resources in a K8S cluster. As
|
plane learns there are _m_ available resources in a K8S cluster. As
|
||||||
the cluster is working independently it still accepts workload
|
the cluster is working independently it still accepts workload
|
||||||
requests from other K8S clients or even another Ubernetes control
|
requests from other K8S clients or even another Cluster Federation control
|
||||||
plane. The Ubernetes scheduling decision is based on this data of
|
plane. The Cluster Federation scheduling decision is based on this data of
|
||||||
available resources. However when the actual RC creation happens to
|
available resources. However when the actual RC creation happens to
|
||||||
the cluster at time _T2_, the cluster may not have enough resources
|
the cluster at time _T2_, the cluster may not have enough resources
|
||||||
at that time. We will address this problem in later phases with some
|
at that time. We will address this problem in later phases with some
|
||||||
proposed solutions like resource reservation mechanisms.
|
proposed solutions like resource reservation mechanisms.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
## Service Discovery
|
## Service Discovery
|
||||||
|
|
||||||
This part has been included in the section “Federated Service” of
|
This part has been included in the section “Federated Service” of
|
||||||
document
|
document
|
||||||
“[Ubernetes Cross-cluster Load Balancing and Service Discovery Requirements and System Design](federated-services.md)”.
|
“[Federated Cross-cluster Load Balancing and Service Discovery Requirements and System Design](federated-services.md)”.
|
||||||
Please refer to that document for details.
|
Please refer to that document for details.
|
||||||
|
|
||||||
|
|
||||||
|
@ -347,7 +347,7 @@ scheduler to not put more than one pod from S in the same zone, and thus by
|
|||||||
definition it will not put more than one pod from S on the same node, assuming
|
definition it will not put more than one pod from S on the same node, assuming
|
||||||
each node is in one zone. This rule is more useful as PreferredDuringScheduling
|
each node is in one zone. This rule is more useful as PreferredDuringScheduling
|
||||||
anti-affinity, e.g. one might expect it to be common in
|
anti-affinity, e.g. one might expect it to be common in
|
||||||
[Ubernetes](../../docs/proposals/federation.md) clusters.)
|
[Cluster Federation](../../docs/proposals/federation.md) clusters.)
|
||||||
|
|
||||||
* **Don't co-locate pods of this service with pods from service "evilService"**:
|
* **Don't co-locate pods of this service with pods from service "evilService"**:
|
||||||
`{LabelSelector: selector that matches evilService's pods, TopologyKey: "node"}`
|
`{LabelSelector: selector that matches evilService's pods, TopologyKey: "node"}`
|
||||||
|
@ -34,25 +34,25 @@ Documentation for other releases can be found at
|
|||||||
|
|
||||||
# Kubernetes Multi-AZ Clusters
|
# Kubernetes Multi-AZ Clusters
|
||||||
|
|
||||||
## (a.k.a. "Ubernetes-Lite")
|
## (previously nicknamed "Ubernetes-Lite")
|
||||||
|
|
||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
Full Ubernetes will offer sophisticated federation between multiple Kubernetes
|
Full Cluster Federation will offer sophisticated federation between multiple Kubernetes
|
||||||
clusters, offering true high-availability, multiple provider support &
|
clusters, offering true high-availability, multiple provider support &
|
||||||
cloud-bursting, multiple region support etc. However, many users have
|
cloud-bursting, multiple region support etc. However, many users have
|
||||||
expressed a desire for a "reasonably" high-available cluster, that runs in
|
expressed a desire for a "reasonably" high-available cluster, that runs in
|
||||||
multiple zones on GCE or availability zones in AWS, and can tolerate the failure
|
multiple zones on GCE or availability zones in AWS, and can tolerate the failure
|
||||||
of a single zone without the complexity of running multiple clusters.
|
of a single zone without the complexity of running multiple clusters.
|
||||||
|
|
||||||
Ubernetes-Lite aims to deliver exactly that functionality: to run a single
|
Multi-AZ Clusters aim to deliver exactly that functionality: to run a single
|
||||||
Kubernetes cluster in multiple zones. It will attempt to make reasonable
|
Kubernetes cluster in multiple zones. It will attempt to make reasonable
|
||||||
scheduling decisions, in particular so that a replication controller's pods are
|
scheduling decisions, in particular so that a replication controller's pods are
|
||||||
spread across zones, and it will try to be aware of constraints - for example
|
spread across zones, and it will try to be aware of constraints - for example
|
||||||
that a volume cannot be mounted on a node in a different zone.
|
that a volume cannot be mounted on a node in a different zone.
|
||||||
|
|
||||||
Ubernetes-Lite is deliberately limited in scope; for many advanced functions
|
Multi-AZ Clusters are deliberately limited in scope; for many advanced functions
|
||||||
the answer will be "use Ubernetes (full)". For example, multiple-region
|
the answer will be "use full Cluster Federation". For example, multiple-region
|
||||||
support is not in scope. Routing affinity (e.g. so that a webserver will
|
support is not in scope. Routing affinity (e.g. so that a webserver will
|
||||||
prefer to talk to a backend service in the same zone) is similarly not in
|
prefer to talk to a backend service in the same zone) is similarly not in
|
||||||
scope.
|
scope.
|
||||||
@ -122,7 +122,7 @@ zones (in the same region). For both clouds, the behaviour of the native cloud
|
|||||||
load-balancer is reasonable in the face of failures (indeed, this is why clouds
|
load-balancer is reasonable in the face of failures (indeed, this is why clouds
|
||||||
provide load-balancing as a primitive).
|
provide load-balancing as a primitive).
|
||||||
|
|
||||||
For Ubernetes-Lite we will therefore simply rely on the native cloud provider
|
For multi-AZ clusters we will therefore simply rely on the native cloud provider
|
||||||
load balancer behaviour, and we do not anticipate substantial code changes.
|
load balancer behaviour, and we do not anticipate substantial code changes.
|
||||||
|
|
||||||
One notable shortcoming here is that load-balanced traffic still goes through
|
One notable shortcoming here is that load-balanced traffic still goes through
|
||||||
@ -130,8 +130,8 @@ kube-proxy controlled routing, and kube-proxy does not (currently) favor
|
|||||||
targeting a pod running on the same instance or even the same zone. This will
|
targeting a pod running on the same instance or even the same zone. This will
|
||||||
likely produce a lot of unnecessary cross-zone traffic (which is likely slower
|
likely produce a lot of unnecessary cross-zone traffic (which is likely slower
|
||||||
and more expensive). This might be sufficiently low-hanging fruit that we
|
and more expensive). This might be sufficiently low-hanging fruit that we
|
||||||
choose to address it in kube-proxy / Ubernetes-Lite, but this can be addressed
|
choose to address it in kube-proxy / multi-AZ clusters, but this can be addressed
|
||||||
after the initial Ubernetes-Lite implementation.
|
after the initial implementation.
|
||||||
|
|
||||||
|
|
||||||
## Implementation
|
## Implementation
|
||||||
@ -182,8 +182,8 @@ region-wide, meaning that a single call will find instances and volumes in all
|
|||||||
zones. In addition, instance ids and volume ids are unique per-region (and
|
zones. In addition, instance ids and volume ids are unique per-region (and
|
||||||
hence also per-zone). I believe they are actually globally unique, but I do
|
hence also per-zone). I believe they are actually globally unique, but I do
|
||||||
not know if this is guaranteed; in any case we only need global uniqueness if
|
not know if this is guaranteed; in any case we only need global uniqueness if
|
||||||
we are to span regions, which will not be supported by Ubernetes-Lite (to do
|
we are to span regions, which will not be supported by multi-AZ clusters (to do
|
||||||
that correctly requires an Ubernetes-Full type approach).
|
that correctly requires a full Cluster Federation type approach).
|
||||||
|
|
||||||
## GCE Specific Considerations
|
## GCE Specific Considerations
|
||||||
|
|
||||||
@ -197,20 +197,20 @@ combine results from calls in all relevant zones.
|
|||||||
A further complexity is that GCE volume names are scoped per-zone, not
|
A further complexity is that GCE volume names are scoped per-zone, not
|
||||||
per-region. Thus it is permitted to have two volumes both named `myvolume` in
|
per-region. Thus it is permitted to have two volumes both named `myvolume` in
|
||||||
two different GCE zones. (Instance names are currently unique per-region, and
|
two different GCE zones. (Instance names are currently unique per-region, and
|
||||||
thus are not a problem for Ubernetes-Lite).
|
thus are not a problem for multi-AZ clusters).
|
||||||
|
|
||||||
The volume scoping leads to a (small) behavioural change for Ubernetes-Lite on
|
The volume scoping leads to a (small) behavioural change for multi-AZ clusters on
|
||||||
GCE. If you had two volumes both named `myvolume` in two different GCE zones,
|
GCE. If you had two volumes both named `myvolume` in two different GCE zones,
|
||||||
this would not be ambiguous when Kubernetes is operating only in a single zone.
|
this would not be ambiguous when Kubernetes is operating only in a single zone.
|
||||||
But, if Ubernetes-Lite is operating in multiple zones, `myvolume` is no longer
|
But, when operating a cluster across multiple zones, `myvolume` is no longer
|
||||||
sufficient to specify a volume uniquely. Worse, the fact that a volume happens
|
sufficient to specify a volume uniquely. Worse, the fact that a volume happens
|
||||||
to be unambiguous at a particular time is no guarantee that it will continue to
|
to be unambiguous at a particular time is no guarantee that it will continue to
|
||||||
be unambiguous in future, because a volume with the same name could
|
be unambiguous in future, because a volume with the same name could
|
||||||
subsequently be created in a second zone. While perhaps unlikely in practice,
|
subsequently be created in a second zone. While perhaps unlikely in practice,
|
||||||
we cannot automatically enable Ubernetes-Lite for GCE users if this then causes
|
we cannot automatically enable multi-AZ clusters for GCE users if this then causes
|
||||||
volume mounts to stop working.
|
volume mounts to stop working.
|
||||||
|
|
||||||
This suggests that (at least on GCE), Ubernetes-Lite must be optional (i.e.
|
This suggests that (at least on GCE), multi-AZ clusters must be optional (i.e.
|
||||||
there must be a feature-flag). It may be that we can make this feature
|
there must be a feature-flag). It may be that we can make this feature
|
||||||
semi-automatic in future, by detecting whether nodes are running in multiple
|
semi-automatic in future, by detecting whether nodes are running in multiple
|
||||||
zones, but it seems likely that kube-up could instead simply set this flag.
|
zones, but it seems likely that kube-up could instead simply set this flag.
|
||||||
@ -218,14 +218,14 @@ zones, but it seems likely that kube-up could instead simply set this flag.
|
|||||||
For the initial implementation, creating volumes with identical names will
|
For the initial implementation, creating volumes with identical names will
|
||||||
yield undefined results. Later, we may add some way to specify the zone for a
|
yield undefined results. Later, we may add some way to specify the zone for a
|
||||||
volume (and possibly require that volumes have their zone specified when
|
volume (and possibly require that volumes have their zone specified when
|
||||||
running with Ubernetes-Lite). We could add a new `zone` field to the
|
running in multi-AZ cluster mode). We could add a new `zone` field to the
|
||||||
PersistentVolume type for GCE PD volumes, or we could use a DNS-style dotted
|
PersistentVolume type for GCE PD volumes, or we could use a DNS-style dotted
|
||||||
name for the volume name (<name>.<zone>)
|
name for the volume name (<name>.<zone>)
|
||||||
|
|
||||||
Initially therefore, the GCE changes will be to:
|
Initially therefore, the GCE changes will be to:
|
||||||
|
|
||||||
1. change kube-up to support creation of a cluster in multiple zones
|
1. change kube-up to support creation of a cluster in multiple zones
|
||||||
1. pass a flag enabling Ubernetes-Lite with kube-up
|
1. pass a flag enabling multi-AZ clusters with kube-up
|
||||||
1. change the kuberentes cloud provider to iterate through relevant zones when resolving items
|
1. change the kuberentes cloud provider to iterate through relevant zones when resolving items
|
||||||
1. tag GCE PD volumes with the appropriate zone information
|
1. tag GCE PD volumes with the appropriate zone information
|
||||||
|
|
||||||
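The zone-iteration step and the proposed `<name>.<zone>` naming in the hunk above can be illustrated with a minimal Go sketch. Everything here is an assumption made for illustration, not the cloud provider's actual code: `volumeRef`, `parseVolumeName`, `resolveZone`, and the in-memory `disksByZone` map (which stands in for per-zone cloud API lookups) are hypothetical names. The sketch only shows why a bare `myvolume` becomes ambiguous once more than one zone is searched.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// volumeRef is a hypothetical parsed form of a "<name>.<zone>" volume name.
type volumeRef struct {
	Name string
	Zone string // empty when the name carried no zone suffix
}

var (
	errAmbiguous = errors.New("volume name matches disks in more than one zone")
	errNotFound  = errors.New("volume not found in any managed zone")
)

// parseVolumeName splits a DNS-style dotted name at its last dot.
// A name without a dot is returned with an empty Zone.
func parseVolumeName(s string) volumeRef {
	if i := strings.LastIndex(s, "."); i > 0 && i < len(s)-1 {
		return volumeRef{Name: s[:i], Zone: s[i+1:]}
	}
	return volumeRef{Name: s}
}

// resolveZone returns the zone holding the named disk. When the reference
// has no zone, it iterates over all managed zones and reports an error if
// the name matches in more than one of them.
// disksByZone (zone -> set of disk names) is a stand-in for cloud API calls.
func resolveZone(ref volumeRef, managedZones []string, disksByZone map[string]map[string]bool) (string, error) {
	if ref.Zone != "" {
		if disksByZone[ref.Zone][ref.Name] {
			return ref.Zone, nil
		}
		return "", errNotFound
	}
	found := ""
	for _, zone := range managedZones {
		if disksByZone[zone][ref.Name] {
			if found != "" {
				return "", errAmbiguous
			}
			found = zone
		}
	}
	if found == "" {
		return "", errNotFound
	}
	return found, nil
}

func main() {
	zones := []string{"us-central1-a", "us-central1-b"}
	disks := map[string]map[string]bool{
		"us-central1-a": {"myvolume": true},
		"us-central1-b": {"myvolume": true},
	}
	for _, name := range []string{"myvolume", "myvolume.us-central1-b"} {
		zone, err := resolveZone(parseVolumeName(name), zones, disks)
		fmt.Printf("name=%q zone=%q err=%v\n", name, zone, err)
	}
}
```

Returning an explicit error for the ambiguous case is a choice made in this sketch; the design text itself only says that identical names yield undefined results in the initial implementation.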
|
@ -34,7 +34,7 @@ Documentation for other releases can be found at
|
|||||||
|
|
||||||
# Kubernetes Cluster Federation
|
# Kubernetes Cluster Federation
|
||||||
|
|
||||||
## (a.k.a. "Ubernetes")
|
## (previously nicknamed "Ubernetes")
|
||||||
|
|
||||||
## Requirements Analysis and Product Proposal
|
## Requirements Analysis and Product Proposal
|
||||||
|
|
||||||
@ -413,7 +413,7 @@ detail to be added here, but feel free to shoot down the basic DNS
|
|||||||
idea in the mean time. In addition, some applications rely on private
|
idea in the mean time. In addition, some applications rely on private
|
||||||
networking between clusters for security (e.g. AWS VPC or more
|
networking between clusters for security (e.g. AWS VPC or more
|
||||||
generally VPN). It should not be necessary to forsake this in
|
generally VPN). It should not be necessary to forsake this in
|
||||||
order to use Ubernetes, for example by being forced to use public
|
order to use Cluster Federation, for example by being forced to use public
|
||||||
connectivity between clusters.
|
connectivity between clusters.
|
||||||
|
|
||||||
## Cross-cluster Scheduling
|
## Cross-cluster Scheduling
|
||||||
@ -546,7 +546,7 @@ prefers the Decoupled Hierarchical model for the reasons stated below).
|
|||||||
here, as each underlying Kubernetes cluster can be scaled
|
here, as each underlying Kubernetes cluster can be scaled
|
||||||
completely independently w.r.t. scheduling, node state management,
|
completely independently w.r.t. scheduling, node state management,
|
||||||
monitoring, network connectivity etc. It is even potentially
|
monitoring, network connectivity etc. It is even potentially
|
||||||
feasible to stack "Ubernetes" federated clusters (i.e. create
|
feasible to stack federations of clusters (i.e. create
|
||||||
federations of federations) should scalability of the independent
|
federations of federations) should scalability of the independent
|
||||||
Federation Control Plane become an issue (although the author does
|
Federation Control Plane become an issue (although the author does
|
||||||
not envision this being a problem worth solving in the short
|
not envision this being a problem worth solving in the short
|
||||||
@ -595,7 +595,7 @@ prefers the Decoupled Hierarchical model for the reasons stated below).
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
## Ubernetes API
|
## Cluster Federation API
|
||||||
|
|
||||||
It is proposed that this look a lot like the existing Kubernetes API
|
It is proposed that this look a lot like the existing Kubernetes API
|
||||||
but be explicitly multi-cluster.
|
but be explicitly multi-cluster.
|
||||||
@ -603,7 +603,8 @@ but be explicitly multi-cluster.
|
|||||||
+ Clusters become first class objects, which can be registered,
|
+ Clusters become first class objects, which can be registered,
|
||||||
listed, described, deregistered etc via the API.
|
listed, described, deregistered etc via the API.
|
||||||
+ Compute resources can be explicitly requested in specific clusters,
|
+ Compute resources can be explicitly requested in specific clusters,
|
||||||
or automatically scheduled to the "best" cluster by Ubernetes (by a
|
or automatically scheduled to the "best" cluster by the Cluster
|
||||||
|
Federation control system (by a
|
||||||
pluggable Policy Engine).
|
pluggable Policy Engine).
|
||||||
+ There is a federated equivalent of a replication controller type (or
|
+ There is a federated equivalent of a replication controller type (or
|
||||||
perhaps a [deployment](deployment.md)),
|
perhaps a [deployment](deployment.md)),
|
||||||
@ -627,14 +628,15 @@ Controllers and related Services accordingly).
|
|||||||
This should ideally be delegated to some external auth system, shared
|
This should ideally be delegated to some external auth system, shared
|
||||||
by the underlying clusters, to avoid duplication and inconsistency.
|
by the underlying clusters, to avoid duplication and inconsistency.
|
||||||
Either that, or we end up with multilevel auth. Local readonly
|
Either that, or we end up with multilevel auth. Local readonly
|
||||||
eventually consistent auth slaves in each cluster and in Ubernetes
|
eventually consistent auth slaves in each cluster and in the Cluster
|
||||||
|
Federation control system
|
||||||
could potentially cache auth, to mitigate an SPOF auth system.
|
could potentially cache auth, to mitigate an SPOF auth system.
|
||||||
|
|
||||||
## Data consistency, failure and availability characteristics
|
## Data consistency, failure and availability characteristics
|
||||||
|
|
||||||
The services comprising the Ubernetes Control Plane) have to run
|
The services comprising the Cluster Federation control plane) have to run
|
||||||
somewhere. Several options exist here:
|
somewhere. Several options exist here:
|
||||||
* For high availability Ubernetes deployments, these
|
* For high availability Cluster Federation deployments, these
|
||||||
services may run in either:
|
services may run in either:
|
||||||
* a dedicated Kubernetes cluster, not co-located in the same
|
* a dedicated Kubernetes cluster, not co-located in the same
|
||||||
availability zone with any of the federated clusters (for fault
|
availability zone with any of the federated clusters (for fault
|
||||||
@ -672,7 +674,7 @@ does the zookeeper config look like for N=3 across 3 AZs -- and how
|
|||||||
does each replica find the other replicas and how do clients find
|
does each replica find the other replicas and how do clients find
|
||||||
their primary zookeeper replica? And now how do I do a shared, highly
|
their primary zookeeper replica? And now how do I do a shared, highly
|
||||||
available redis database? Use a few common specific use cases like
|
available redis database? Use a few common specific use cases like
|
||||||
this to flesh out the detailed API and semantics of Ubernetes.
|
this to flesh out the detailed API and semantics of Cluster Federation.
|
||||||
|
|
||||||
|
|
||||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||||
|
@ -79,10 +79,11 @@ The design of the pipeline for collecting application level metrics should
|
|||||||
be revisited and it's not clear whether application level metrics should be
|
be revisited and it's not clear whether application level metrics should be
|
||||||
available in API server so the use case initially won't be supported.
|
available in API server so the use case initially won't be supported.
|
||||||
|
|
||||||
#### Ubernetes
|
#### Cluster Federation
|
||||||
|
|
||||||
Ubernetes might want to consider cluster-level usage (in addition to cluster-level request)
|
The Cluster Federation control system might want to consider cluster-level usage (in addition to cluster-level request)
|
||||||
of running pods when choosing where to schedule new pods. Although Ubernetes is still in design,
|
of running pods when choosing where to schedule new pods. Although
|
||||||
|
Cluster Federation is still in design,
|
||||||
we expect the metrics API described here to be sufficient. Cluster-level usage can be
|
we expect the metrics API described here to be sufficient. Cluster-level usage can be
|
||||||
obtained by summing over usage of all nodes in the cluster.
|
obtained by summing over usage of all nodes in the cluster.
|
||||||
|
|
||||||
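The last sentence of the hunk above (cluster-level usage as a sum over node usage) can be sketched in a few lines of Go. The `nodeUsage` and `clusterUsage` types and their fields are hypothetical and are not the metrics API types from the proposal. The sketch assumes per-node CPU and memory usage samples have already been fetched.

```go
package main

import "fmt"

// nodeUsage is a hypothetical per-node usage sample.
type nodeUsage struct {
	Node        string
	CPUMilli    int64 // CPU usage in millicores
	MemoryBytes int64 // memory usage in bytes
}

// clusterUsage is the hypothetical cluster-level aggregate.
type clusterUsage struct {
	CPUMilli    int64
	MemoryBytes int64
}

// sumUsage adds up the usage of every node in the cluster.
func sumUsage(nodes []nodeUsage) clusterUsage {
	var total clusterUsage
	for _, n := range nodes {
		total.CPUMilli += n.CPUMilli
		total.MemoryBytes += n.MemoryBytes
	}
	return total
}

func main() {
	nodes := []nodeUsage{
		{Node: "node-a", CPUMilli: 250, MemoryBytes: 512 << 20},
		{Node: "node-b", CPUMilli: 750, MemoryBytes: 1 << 30},
	}
	fmt.Printf("%+v\n", sumUsage(nodes))
}
```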
|
@ -1174,8 +1174,8 @@ func newAWSDisk(aws *Cloud, name string) (*awsDisk, error) {
|
|||||||
// The original idea of the URL-style name was to put the AZ into the
|
// The original idea of the URL-style name was to put the AZ into the
|
||||||
// host, so we could find the AZ immediately from the name without
|
// host, so we could find the AZ immediately from the name without
|
||||||
// querying the API. But it turns out we don't actually need it for
|
// querying the API. But it turns out we don't actually need it for
|
||||||
// Ubernetes-Lite, as we put the AZ into the labels on the PV instead.
|
// multi-AZ clusters, as we put the AZ into the labels on the PV instead.
|
||||||
// However, if in future we want to support Ubernetes-Lite
|
// However, if in future we want to support multi-AZ cluster
|
||||||
// volume-awareness without using PersistentVolumes, we likely will
|
// volume-awareness without using PersistentVolumes, we likely will
|
||||||
// want the AZ in the host.
|
// want the AZ in the host.
|
||||||
|
|
||||||
|
@ -81,7 +81,7 @@ type GCECloud struct {
|
|||||||
projectID string
|
projectID string
|
||||||
region string
|
region string
|
||||||
localZone string // The zone in which we are running
|
localZone string // The zone in which we are running
|
||||||
managedZones []string // List of zones we are spanning (for Ubernetes-Lite, primarily when running on master)
|
managedZones []string // List of zones we are spanning (for multi-AZ clusters, primarily when running on master)
|
||||||
networkURL string
|
networkURL string
|
||||||
nodeTags []string // List of tags to use on firewall rules for load balancers
|
nodeTags []string // List of tags to use on firewall rules for load balancers
|
||||||
nodeInstancePrefix string // If non-"", an advisory prefix for all nodes in the cluster
|
nodeInstancePrefix string // If non-"", an advisory prefix for all nodes in the cluster
|
||||||
|
@ -32,8 +32,8 @@ import (
|
|||||||
"k8s.io/kubernetes/test/e2e/framework"
|
"k8s.io/kubernetes/test/e2e/framework"
|
||||||
)
|
)
|
||||||
|
|
||||||
var _ = framework.KubeDescribe("Ubernetes Lite", func() {
|
var _ = framework.KubeDescribe("Multi-AZ Clusters", func() {
|
||||||
f := framework.NewDefaultFramework("ubernetes-lite")
|
f := framework.NewDefaultFramework("multi-az")
|
||||||
var zoneCount int
|
var zoneCount int
|
||||||
var err error
|
var err error
|
||||||
image := "gcr.io/google_containers/serve_hostname:v1.4"
|
image := "gcr.io/google_containers/serve_hostname:v1.4"
|
||||||
|