Remove all docs which are moving to http://kubernetes.github.io
All .md files are now only pointers to where they likely live on the new site. All other files are untouched.
@@ -32,42 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubernetes Cluster Admin Guide

The cluster admin guide is for anyone creating or administering a Kubernetes cluster.
It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md).

## Admin Guide Table of Contents

1. [Introduction](introduction.md)
1. [Components of a cluster](cluster-components.md)
1. [Cluster Management](cluster-management.md)
1. Kubernetes Master Components
   1. [The kube-apiserver binary](kube-apiserver.md)
   1. [Authorization](authorization.md)
   1. [Authentication](authentication.md)
   1. [Accessing the api](accessing-the-api.md)
   1. [Admission Controllers](admission-controllers.md)
   1. [Administrating Service Accounts](service-accounts-admin.md)
   1. [Resource Quotas](resource-quota.md)
   1. [The kube-scheduler binary](kube-scheduler.md)
   1. [The kube-controller-manager binary](kube-controller-manager.md)
1. [Kubernetes Node Components](node.md)
   1. [The kubelet binary](kubelet.md)
   1. [Garbage Collection](garbage-collection.md)
   1. [The kube-proxy binary](kube-proxy.md)
1. Cluster Addons
   1. [DNS](dns.md)
1. [Networking](networking.md)
   1. [OVS Networking](ovs-networking.md)
1. [Master <-> Node Communication](master-node-communication.md)
1. Example Configurations
   1. [Multiple Clusters](multi-cluster.md)
   1. [High Availability Clusters](high-availability.md)
   1. [Large Clusters](cluster-large.md)
1. [Getting started from scratch](../getting-started-guides/scratch.md)
1. [Kubernetes's use of salt](salt.md)
1. [Troubleshooting](cluster-troubleshooting.md)

This file has moved to: http://kubernetes.github.io/docs/admin/README/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,72 +32,8 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Configuring APIserver ports

This file has moved to: http://kubernetes.github.io/docs/admin/accessing-the-api/

This document describes what ports the Kubernetes apiserver
may serve on and how to reach them. The audience is
cluster administrators who want to customize their cluster
or understand the details.

Most questions about accessing the cluster are covered
in [Accessing the cluster](../user-guide/accessing-the-cluster.md).

## Ports and IPs Served On

The Kubernetes API is served by the Kubernetes apiserver process. Typically,
there is one of these running on a single kubernetes-master node.

By default, the Kubernetes APIserver serves HTTP on 2 ports:

1. Localhost Port
   - serves HTTP
   - default is port 8080; change with the `--insecure-port` flag.
   - default IP is localhost; change with the `--insecure-bind-address` flag.
   - no authentication or authorization checks in HTTP
   - protected by the need to have host access
2. Secure Port
   - default is port 6443; change with the `--secure-port` flag.
   - default IP is the first non-localhost network interface; change with the `--bind-address` flag.
   - serves HTTPS. Set the cert with `--tls-cert-file` and the key with the `--tls-private-key-file` flag.
   - uses token-file or client-certificate based [authentication](authentication.md).
   - uses policy-based [authorization](authorization.md).
3. Removed: ReadOnly Port
   - For security reasons, this had to be removed. Use the [service account](../user-guide/service-accounts.md) feature instead.
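
Putting the defaults above together, a minimal sketch of the relevant apiserver flags might look like the following (the addresses, ports, and file paths are illustrative, not required):

```console
kube-apiserver \
  --insecure-bind-address=127.0.0.1 --insecure-port=8080 \
  --bind-address=0.0.0.0 --secure-port=6443 \
  --tls-cert-file=/srv/kubernetes/server.cert \
  --tls-private-key-file=/srv/kubernetes/server.key
```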

## Proxies and Firewall rules

Additionally, in some configurations there is a proxy (nginx) running
on the same machine as the apiserver process. The proxy serves HTTPS protected
by Basic Auth on port 443, and proxies to the apiserver on localhost:8080. In
these configurations the secure port is typically set to 6443.

A firewall rule is typically configured to allow external HTTPS access to port 443.

The above are defaults and reflect how Kubernetes is deployed to Google Compute Engine using
kube-up.sh. Other cloud providers may vary.

## Use Cases vs IP:Ports

There are three differently configured serving ports because there are a
variety of use cases:

1. Clients outside of a Kubernetes cluster, such as a human running `kubectl`
   on a desktop machine. Currently, these access the Localhost Port via a proxy (nginx)
   running on the `kubernetes-master` machine. The proxy can use cert-based authentication
   or token-based authentication.
2. Processes running in Containers on Kubernetes that need to read from
   the apiserver. Currently, these can use a [service account](../user-guide/service-accounts.md).
3. Scheduler and Controller-manager processes, which need to do read-write
   API operations, using service accounts to avoid the need to be co-located.
4. Kubelets, which need to do read-write API operations and are necessarily
   on different machines than the apiserver. Kubelets use the Secure Port
   to get their pods, to find the services that a pod can see, and to
   write events. Credentials are distributed to kubelets at cluster
   setup time. Kubelet and kube-proxy can use cert-based authentication or token-based
   authentication.

## Expected changes

- Policy will limit the actions kubelets can do via the authed port.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,170 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Admission Controllers

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Admission Controllers](#admission-controllers)
  - [What are they?](#what-are-they)
  - [Why do I need them?](#why-do-i-need-them)
  - [How do I turn on an admission control plug-in?](#how-do-i-turn-on-an-admission-control-plug-in)
  - [What does each plug-in do?](#what-does-each-plug-in-do)
    - [AlwaysAdmit](#alwaysadmit)
    - [AlwaysPullImages](#alwayspullimages)
    - [AlwaysDeny](#alwaysdeny)
    - [DenyExecOnPrivileged (deprecated)](#denyexeconprivileged-deprecated)
    - [DenyEscalatingExec](#denyescalatingexec)
    - [ServiceAccount](#serviceaccount)
    - [SecurityContextDeny](#securitycontextdeny)
    - [ResourceQuota](#resourcequota)
    - [LimitRanger](#limitranger)
    - [InitialResources (experimental)](#initialresources-experimental)
    - [NamespaceExists (deprecated)](#namespaceexists-deprecated)
    - [NamespaceAutoProvision (deprecated)](#namespaceautoprovision-deprecated)
    - [NamespaceLifecycle](#namespacelifecycle)
  - [Is there a recommended set of plug-ins to use?](#is-there-a-recommended-set-of-plug-ins-to-use)

<!-- END MUNGE: GENERATED_TOC -->

## What are they?

An admission control plug-in is a piece of code that intercepts requests to the Kubernetes
API server prior to persistence of the object, but after the request is authenticated
and authorized. The plug-in code is in the API server process
and must be compiled into the binary in order to be used at this time.

Each admission control plug-in is run in sequence before a request is accepted into the cluster. If
any of the plug-ins in the sequence reject the request, the entire request is rejected immediately
and an error is returned to the end-user.

Admission control plug-ins may mutate the incoming object in some cases to apply system configured
defaults. In addition, admission control plug-ins may mutate related resources as part of request
processing to do things like increment quota usage.

## Why do I need them?

Many advanced features in Kubernetes require an admission control plug-in to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission control plug-ins is an incomplete server and will not
support all the features you expect.

## How do I turn on an admission control plug-in?

The Kubernetes API server supports a flag, `--admission-control`, that takes a comma-delimited,
ordered list of admission control choices to invoke prior to modifying objects in the cluster.
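
For example, turning on two plug-ins might look like the following sketch (the list here is only illustrative; see the recommended set at the end of this document):

```console
kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger ...
```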

## What does each plug-in do?

### AlwaysAdmit

Use this plug-in by itself to pass through all requests.

### AlwaysPullImages

This plug-in modifies every new Pod to force the image pull policy to Always. This is useful in a
multitenant cluster so that users can be assured that their private images can only be used by those
who have the credentials to pull them. Without this plug-in, once an image has been pulled to a
node, any pod from any user can use it simply by knowing the image's name (assuming the Pod is
scheduled onto the right node), without any authorization check against the image. When this plug-in
is enabled, images are always pulled prior to starting containers, which means valid credentials are
required.

### AlwaysDeny

Rejects all requests. Used for testing.

### DenyExecOnPrivileged (deprecated)

This plug-in will intercept all requests to exec a command in a pod if that pod has a privileged container.

If your cluster supports privileged containers, and you want to restrict the ability of end-users to exec
commands in those containers, we strongly encourage enabling this plug-in.

This functionality has been merged into [DenyEscalatingExec](#denyescalatingexec).

### DenyEscalatingExec

This plug-in will deny exec and attach commands to pods that run with escalated privileges that
allow host access. This includes pods that run as privileged, have access to the host IPC namespace, and
have access to the host PID namespace.

If your cluster supports containers that run with escalated privileges, and you want to
restrict the ability of end-users to exec commands in those containers, we strongly encourage
enabling this plug-in.

### ServiceAccount

This plug-in implements automation for [serviceAccounts](../user-guide/service-accounts.md).
We strongly recommend using this plug-in if you intend to make use of Kubernetes `ServiceAccount` objects.

### SecurityContextDeny

This plug-in will deny any pod with a [SecurityContext](../user-guide/security-context.md) that defines options that were not available on the `Container`.

### ResourceQuota

This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the `ResourceQuota` object in a `Namespace`. If you are using `ResourceQuota`
objects in your Kubernetes deployment, you MUST use this plug-in to enforce quota constraints.

See the [resourceQuota design doc](../design/admission_control_resource_quota.md) and the [example of Resource Quota](resourcequota/) for more details.

It is strongly encouraged that this plug-in is configured last in the sequence of admission control plug-ins. This is
so that quota is not prematurely incremented only for the request to be rejected later in admission control.

### LimitRanger

This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the `LimitRange` object in a `Namespace`. If you are using `LimitRange` objects in
your Kubernetes deployment, you MUST use this plug-in to enforce those constraints. LimitRanger can also
be used to apply default resource requests to Pods that don't specify any; currently, the default LimitRanger
applies a 0.1 CPU requirement to all Pods in the `default` namespace.

See the [limitRange design doc](../design/admission_control_limit_range.md) and the [example of Limit Range](limitrange/) for more details.

### InitialResources (experimental)

This plug-in observes pod creation requests. If a container omits compute resource requests and limits,
then the plug-in auto-populates a compute resource request based on historical usage of containers running the same image.
If there is not enough data to make a decision, the request is left unchanged.
When the plug-in sets a compute resource request, it annotates the pod with information on what compute resources it auto-populated.

See the [InitialResources proposal](../proposals/initial-resources.md) for more details.

### NamespaceExists (deprecated)

This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes `Namespace`
and reject the request if the `Namespace` was not previously created. We strongly recommend running
this plug-in to ensure integrity of your data.

The functionality of this admission controller has been merged into `NamespaceLifecycle`.

### NamespaceAutoProvision (deprecated)

This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes `Namespace`
and create a new `Namespace` if one did not already exist previously.

We strongly recommend `NamespaceLifecycle` over `NamespaceAutoProvision`.

### NamespaceLifecycle

This plug-in enforces that a `Namespace` that is undergoing termination cannot have new objects created in it,
and ensures that requests in a non-existent `Namespace` are rejected.

A `Namespace` deletion kicks off a sequence of operations that remove all objects (pods, services, etc.) in that
namespace. In order to enforce integrity of that process, we strongly recommend running this plug-in.

## Is there a recommended set of plug-ins to use?

Yes.

For Kubernetes 1.0, we strongly recommend running the following set of admission control plug-ins (order matters):

```
--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
```

This file has moved to: http://kubernetes.github.io/docs/admin/admission-controllers/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,138 +32,8 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Authentication Plugins

This file has moved to: http://kubernetes.github.io/docs/admin/authentication/

Kubernetes uses client certificates, tokens, or http basic auth to authenticate users for API calls.

**Client certificate authentication** is enabled by passing the `--client-ca-file=SOMEFILE`
option to apiserver. The referenced file must contain one or more certificate authorities
to use to validate client certificates presented to the apiserver. If a client certificate
is presented and verified, the common name of the subject is used as the user name for the
request.

**Token File** is enabled by passing the `--token-auth-file=SOMEFILE` option
to apiserver. Currently, tokens last indefinitely, and the token list cannot
be changed without restarting apiserver.

The token file format is implemented in `plugin/pkg/auth/authenticator/token/tokenfile/...`
and is a csv file with a minimum of 3 columns: token, user name, and user uid, followed by
optional group names. Note that if you have more than one group, the column must be double quoted, e.g.

```
token,user,uid,"group1,group2,group3"
```

When using token authentication from an http client, the apiserver expects an `Authorization`
header with a value of `Bearer SOMETOKEN`.
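
For example, a request using a bearer token could look like this (the token and host are hypothetical, and `curl` is used purely for illustration):

```console
curl --cacert ca.crt \
  -H "Authorization: Bearer 31ada4fd-adec-460c-809a-9e56ceb75269" \
  https://kubernetes-master:6443/api
```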

**OpenID Connect ID Token** is enabled by passing the following options to the apiserver:
- `--oidc-issuer-url` (required) tells the apiserver where to connect to the OpenID provider. Only the HTTPS scheme will be accepted.
- `--oidc-client-id` (required) is used by the apiserver to verify the audience of the token.
  A valid [ID token](http://openid.net/specs/openid-connect-core-1_0.html#IDToken) MUST have this
  client-id in its `aud` claims.
- `--oidc-ca-file` (optional) is used by the apiserver to establish and verify the secure connection
  to the OpenID provider.
- `--oidc-username-claim` (optional, experimental) specifies which OpenID claim to use as the user name. By default, `sub`
  will be used, which should be unique and immutable under the issuer's domain. Cluster administrators can
  choose other claims such as `email` to use as the user name, but their uniqueness and immutability are not guaranteed.
- `--oidc-groups-claim` (optional, experimental) the name of a custom OpenID Connect claim for specifying user groups. The claim
  value is expected to be an array of strings.

Please note that this flag is still experimental until we settle more on how to handle the mapping of the OpenID user to the Kubernetes user. Thus further changes are possible.

Currently, the ID token will be obtained by some third-party app. This means the app and apiserver
MUST share the `--oidc-client-id`.

Like **Token File**, when using token authentication from an http client the apiserver expects
an `Authorization` header with a value of `Bearer SOMETOKEN`.
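
As a sketch, the OpenID Connect flags above might be combined like this (the issuer URL, client ID, and file path are placeholders, not recommendations):

```console
kube-apiserver \
  --oidc-issuer-url=https://accounts.example.com \
  --oidc-client-id=kubernetes \
  --oidc-ca-file=/srv/kubernetes/oidc-ca.pem \
  --oidc-username-claim=sub
```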

**Basic authentication** is enabled by passing the `--basic-auth-file=SOMEFILE`
option to apiserver. Currently, the basic auth credentials last indefinitely,
and the password cannot be changed without restarting apiserver. Note that basic
authentication is currently supported for convenience while we finish making the
more secure modes described above easier to use.

The basic auth file format is implemented in `plugin/pkg/auth/authenticator/password/passwordfile/...`
and is a csv file with 3 columns: password, user name, and user id.

When using basic authentication from an http client, the apiserver expects an `Authorization` header
with a value of `Basic BASE64ENCODED(USER:PASSWORD)`.
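
For illustration (hypothetical credentials and host), `curl -u` builds this header for you:

```console
curl --cacert ca.crt -u admin:secret https://kubernetes-master:6443/api
# equivalent to sending: Authorization: Basic $(echo -n 'admin:secret' | base64)
```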

**Keystone authentication** is enabled by passing the `--experimental-keystone-url=<AuthURL>`
option to the apiserver during startup. The plugin is implemented in
`plugin/pkg/auth/authenticator/request/keystone/keystone.go`.
For details on how to use keystone to manage projects and users, refer to the
[Keystone documentation](http://docs.openstack.org/developer/keystone/). Please note that
this plugin is still experimental, which means it is subject to change.
Please refer to the [discussion](https://github.com/kubernetes/kubernetes/pull/11798#issuecomment-129655212)
and the [blueprint](https://github.com/kubernetes/kubernetes/issues/11626) for more details.

## Plugin Development

We plan for the Kubernetes API server to issue tokens
after the user has been (re)authenticated by a *bedrock* authentication
provider external to Kubernetes. We plan to make it easy to develop modules
that interface between Kubernetes and a bedrock authentication provider (e.g.
github.com, google.com, enterprise directory, kerberos, etc.).

## APPENDIX

### Creating Certificates

When using client certificate authentication, you can generate certificates manually or
using an existing deployment script.

**Deployment script** is implemented at
`cluster/saltbase/salt/generate-cert/make-ca-cert.sh`.
Execute this script with two parameters. The first is the IP address of the apiserver; the second is
a list of subject alternative names in the form `IP:<ip-address>` or `DNS:<dns-name>`.
The script will generate three files: ca.crt, server.crt, and server.key.
Finally, add these parameters
`--client-ca-file=/srv/kubernetes/ca.crt`
`--tls-cert-file=/srv/kubernetes/server.cert`
`--tls-private-key-file=/srv/kubernetes/server.key`
into the apiserver start parameters.

**easyrsa** can be used to manually generate certificates for your cluster.

1. Download, unpack, and initialize the patched version of easyrsa3.

       curl -L -O https://storage.googleapis.com/kubernetes-release/easy-rsa/easy-rsa.tar.gz
       tar xzf easy-rsa.tar.gz
       cd easy-rsa-master/easyrsa3
       ./easyrsa init-pki

1. Generate a CA. (`--batch` sets automatic mode; `--req-cn` sets the default CN to use.)

       ./easyrsa --batch "--req-cn=${MASTER_IP}@`date +%s`" build-ca nopass

1. Generate a server certificate and key.
   (build-server-full [filename]: Generate a keypair and sign locally for a client or server.)

       ./easyrsa --subject-alt-name="IP:${MASTER_IP}" build-server-full kubernetes-master nopass

1. Copy `pki/ca.crt`, `pki/issued/kubernetes-master.crt`, and
   `pki/private/kubernetes-master.key` to your directory.

1. Remember to fill in the parameters
   `--client-ca-file=/yourdirectory/ca.crt`
   `--tls-cert-file=/yourdirectory/server.cert`
   `--tls-private-key-file=/yourdirectory/server.key`
   and add them to the apiserver start parameters.

**openssl** can also be used to manually generate certificates for your cluster.

1. Generate a ca.key with 2048 bits:
   `openssl genrsa -out ca.key 2048`
1. Generate a ca.crt from the ca.key (`-days` sets the certificate validity period):
   `openssl req -x509 -new -nodes -key ca.key -subj "/CN=${MASTER_IP}" -days 10000 -out ca.crt`
1. Generate a server.key with 2048 bits:
   `openssl genrsa -out server.key 2048`
1. Generate a server.csr from the server.key:
   `openssl req -new -key server.key -subj "/CN=${MASTER_IP}" -out server.csr`
1. Generate the server.crt from the ca.key, ca.crt, and server.csr:
   `openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 10000`
1. View the certificate:
   `openssl x509 -noout -text -in ./server.crt`

Finally, do not forget to fill in the same parameters and add them to the apiserver start parameters.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,269 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Authorization Plugins

In Kubernetes, authorization happens as a separate step from authentication.
See the [authentication documentation](authentication.md) for an
overview of authentication.

Authorization applies to all HTTP accesses on the main (secure) apiserver port.

The authorization check for any request compares attributes of the context of
the request (such as user, resource, and namespace) with access
policies. An API call must be allowed by some policy in order to proceed.

The following implementations are available, and are selected by flag:
- `--authorization-mode=AlwaysDeny`
- `--authorization-mode=AlwaysAllow`
- `--authorization-mode=ABAC`
- `--authorization-mode=Webhook`

`AlwaysDeny` blocks all requests (used in tests).
`AlwaysAllow` allows all requests; use if you don't need authorization.
`ABAC` allows for user-configured authorization policy. ABAC stands for Attribute-Based Access Control.
`Webhook` allows for authorization to be driven by a remote service using REST.
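
For example, a sketch of selecting ABAC mode together with its policy file (the path is illustrative; both flags are described below):

```console
kube-apiserver --authorization-mode=ABAC \
  --authorization-policy-file=/srv/kubernetes/abac-policy.jsonl
```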

## ABAC Mode

### Request Attributes

A request has the following attributes that can be considered for authorization:
- user (the user-string which a user was authenticated as).
- group (the list of group names the authenticated user is a member of).
- whether the request is for an API resource.
- the request path.
  - allows authorizing access to miscellaneous endpoints like `/api` or `/healthz` (see [kubectl](#kubectl)).
- the request verb.
  - API verbs like `get`, `list`, `create`, `update`, `watch`, `delete`, and `deletecollection` are used for API requests
  - HTTP verbs like `get`, `post`, `put`, and `delete` are used for non-API requests
- what resource is being accessed (for API requests only)
- the namespace of the object being accessed (for namespaced API requests only)
- the API group being accessed (for API requests only)

We anticipate adding more attributes to allow finer grained access control and
to assist in policy management.

### Policy File Format

For mode `ABAC`, also specify `--authorization-policy-file=SOME_FILENAME`.

The file format is [one JSON object per line](http://jsonlines.org/). There should be no enclosing list or map, just
one map per line.

Each line is a "policy object". A policy object is a map with the following properties:

- Versioning properties:
  - `apiVersion`, type string; valid values are "abac.authorization.kubernetes.io/v1beta1". Allows versioning and conversion of the policy format.
  - `kind`, type string; valid values are "Policy". Allows versioning and conversion of the policy format.

- `spec` property set to a map with the following properties:
  - Subject-matching properties:
    - `user`, type string; the user-string from `--token-auth-file`. If you specify `user`, it must match the username of the authenticated user. `*` matches all requests.
    - `group`, type string; if you specify `group`, it must match one of the groups of the authenticated user. `*` matches all requests.

  - `readonly`, type boolean; when true, means that the policy only applies to get, list, and watch operations.

  - Resource-matching properties:
    - `apiGroup`, type string; an API group, such as `extensions`. `*` matches all API groups.
    - `namespace`, type string; a namespace string. `*` matches all resource requests.
    - `resource`, type string; a resource, such as `pods`. `*` matches all resource requests.

  - Non-resource-matching properties:
    - `nonResourcePath`, type string; matches the non-resource request paths (like `/version` and `/apis`). `*` matches all non-resource requests. `/foo/*` matches `/foo/` and all of its subpaths.

An unset property is the same as a property set to the zero value for its type (e.g. empty string, 0, false).
However, unset should be preferred for readability.

In the future, policies may be expressed in a JSON format, and managed via a REST interface.

### Authorization Algorithm

A request has attributes which correspond to the properties of a policy object.

When a request is received, the attributes are determined. Unknown attributes
are set to the zero value of their type (e.g. empty string, 0, false).

A property set to "*" will match any value of the corresponding attribute.

The tuple of attributes is checked for a match against every policy in the policy file.
If at least one line matches the request attributes, then the request is authorized (but may fail later validation).

To permit any user to do something, write a policy with the user property set to "*".
To permit a user to do anything, write a policy with the apiGroup, namespace, resource, and nonResourcePath properties set to "*".

### Kubectl

Kubectl uses the `/api` and `/apis` endpoints of api-server to negotiate client/server versions. To validate objects sent to the API by create/update operations, kubectl queries certain swagger resources. For API version `v1` those would be `/swaggerapi/api/v1` & `/swaggerapi/experimental/v1`.

When using ABAC authorization, those special resources have to be explicitly exposed via the `nonResourcePath` property in a policy (see [examples](#examples) below):

* `/api`, `/api/*`, `/apis`, and `/apis/*` for API version negotiation.
* `/version` for retrieving the server version via `kubectl version`.
* `/swaggerapi/*` for create/update operations.

To inspect the HTTP calls involved in a specific kubectl operation you can turn up the verbosity:

    kubectl --v=8 version

### Examples

1. Alice can do anything to all resources: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "alice", "namespace": "*", "resource": "*", "apiGroup": "*"}}`
2. Kubelet can read any pods: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "kubelet", "namespace": "*", "resource": "pods", "readonly": true}}`
3. Kubelet can read and write events: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "kubelet", "namespace": "*", "resource": "events"}}`
4. Bob can just read pods in namespace "projectCaribou": `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "bob", "namespace": "projectCaribou", "resource": "pods", "readonly": true}}`
5. Anyone can make read-only requests to all non-API paths: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "*", "readonly": true, "nonResourcePath": "*"}}`

[Complete file example](http://releases.k8s.io/HEAD/pkg/auth/authorizer/abac/example_policy_file.jsonl)

### A quick note on service accounts

A service account automatically generates a user. The user's name is generated according to the naming convention:

```
system:serviceaccount:<namespace>:<serviceaccountname>
```

Creating a new namespace also causes a new service account to be created, of this form:

```
system:serviceaccount:<namespace>:default
```

For example, if you wanted to grant the default service account in the kube-system namespace full privilege to the API, you would add this line to your policy file:

```json
{"apiVersion":"abac.authorization.kubernetes.io/v1beta1","kind":"Policy","spec":{"user":"system:serviceaccount:kube-system:default","namespace":"*","resource":"*","apiGroup":"*"}}
```

The apiserver will need to be restarted to pick up the new policy lines.

## Webhook Mode

When specified, mode `Webhook` causes Kubernetes to query an outside REST service when determining user privileges.

### Configuration File Format

Mode `Webhook` requires a file for HTTP configuration, specified by the `--authorization-webhook-config-file=SOME_FILENAME` flag.

The configuration file uses the [kubeconfig](../user-guide/kubeconfig-file.md) file format. Within the file, "users" refers to the API Server webhook and "clusters" refers to the remote service.

A configuration example which uses HTTPS client auth:

```yaml
# clusters refers to the remote service.
clusters:
- name: name-of-remote-authz-service
  cluster:
    certificate-authority: /path/to/ca.pem      # CA for verifying the remote service.
    server: https://authz.example.com/authorize # URL of remote service to query. Must use 'https'.

# users refers to the API Server's webhook configuration.
users:
- name: name-of-api-server
  user:
    client-certificate: /path/to/cert.pem # cert for the webhook plugin to use
    client-key: /path/to/key.pem          # key matching the cert
```

### Request Payloads

When faced with an authorization decision, the API Server POSTs a JSON-serialized api.authorization.v1beta1.SubjectAccessReview object describing the action. This object contains fields describing the user attempting to make the request, and either details about the resource being accessed or request attributes.

Note that webhook API objects are subject to the same [versioning compatibility rules](../api.md) as other Kubernetes API objects. Implementers should be aware of looser compatibility promises for beta objects and check the "apiVersion" field of the request to ensure correct deserialization. Additionally, the API Server must enable the `authorization.k8s.io/v1beta1` API extensions group (`--runtime-config=authorization.k8s.io/v1beta1=true`).

An example request body:

```json
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "spec": {
    "resourceAttributes": {
      "namespace": "kittensandponies",
      "verb": "GET",
      "group": "*",
      "resource": "pods"
    },
    "user": "jane",
    "group": [
      "group1",
      "group2"
    ]
  }
}
```

The remote service is expected to fill the SubjectAccessReviewStatus field of the request and respond to either allow or disallow access. The response body's "spec" field is ignored and may be omitted. A permissive response would return:

```json
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "status": {
    "allowed": true
  }
}
```

To disallow access, the remote service would return:

```json
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "status": {
    "allowed": false,
    "reason": "user does not have read access to the namespace"
  }
}
```

Access to non-resource paths is sent as:

```json
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "spec": {
    "nonResourceAttributes": {
      "path": "/debug",
      "verb": "GET"
    },
    "user": "jane",
    "group": [
      "group1",
      "group2"
    ]
  }
}
```

Non-resource paths include: `/api`, `/apis`, `/metrics`, `/resetMetrics`, `/logs`, `/debug`, `/healthz`, `/swagger-ui/`, `/swaggerapi/`, `/ui`, and `/version`. Clients require access to `/api`, `/api/*/`, `/apis/`, `/apis/*`, `/apis/*/*`, and `/version` to discover what resources and versions are present on the server. Access to other non-resource paths can be disallowed without restricting access to the REST API.

For further documentation refer to the authorization.v1beta1 API objects and plugin/pkg/auth/authorizer/webhook/webhook.go.

## Plugin Development

Other implementations can be developed fairly easily.
The APIserver calls the Authorizer interface:

```go
type Authorizer interface {
  Authorize(a Attributes) error
}
```

to determine whether or not to allow each API action.

An authorization plugin is a module that implements this interface.
Authorization plugin code goes in `pkg/auth/authorizer/$MODULENAME`.

An authorization module can be completely implemented in go, or can call out
to a remote authorization service. Authorization modules can implement
their own caching to reduce the cost of repeated authorization calls with the
same or similar arguments. Developers should then consider the interaction between
caching and revocation of permissions.

This file has moved to: http://kubernetes.github.io/docs/admin/authorization/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,173 +32,8 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubernetes Cluster Admin Guide: Cluster Components

This file has moved to: http://kubernetes.github.io/docs/admin/cluster-components/

**Table of Contents**

<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Kubernetes Cluster Admin Guide: Cluster Components](#kubernetes-cluster-admin-guide-cluster-components)
  - [Master Components](#master-components)
    - [kube-apiserver](#kube-apiserver)
    - [etcd](#etcd)
    - [kube-controller-manager](#kube-controller-manager)
    - [kube-scheduler](#kube-scheduler)
    - [addons](#addons)
      - [DNS](#dns)
      - [User interface](#user-interface)
      - [Container Resource Monitoring](#container-resource-monitoring)
      - [Cluster-level Logging](#cluster-level-logging)
  - [Node components](#node-components)
    - [kubelet](#kubelet)
    - [kube-proxy](#kube-proxy)
    - [docker](#docker)
    - [rkt](#rkt)
    - [supervisord](#supervisord)
    - [fluentd](#fluentd)

<!-- END MUNGE: GENERATED_TOC -->

This document outlines the various binary components that need to run to
deliver a functioning Kubernetes cluster.

## Master Components

Master components are those that provide the cluster's control plane. For
example, master components are responsible for making global decisions about the
cluster (e.g., scheduling), and for detecting and responding to cluster events
(e.g., starting up a new pod when a replication controller's 'replicas' field is
unsatisfied).

Master components could in theory be run on any node in the cluster. However,
for simplicity, current setup scripts typically start all master components on
the same VM, and do not run user containers on this VM. See
[high-availability.md](high-availability.md) for an example multi-master-VM setup.

Even in the future, when Kubernetes is fully self-hosting, it will probably be
wise to only allow master components to schedule on a subset of nodes, to limit
co-running with user-run pods, reducing the possible scope of a
node-compromising security exploit.

### kube-apiserver

[kube-apiserver](kube-apiserver.md) exposes the Kubernetes API; it is the front-end for the
Kubernetes control plane. It is designed to scale horizontally (i.e., one scales
it by running more of them; see [high-availability.md](high-availability.md)).

### etcd

[etcd](etcd.md) is used as Kubernetes' backing store. All cluster data is stored here.
Proper administration of a Kubernetes cluster includes a backup plan for etcd's
data.

### kube-controller-manager

[kube-controller-manager](kube-controller-manager.md) is a binary that runs controllers, which are the
background threads that handle routine tasks in the cluster. Logically, each
controller is a separate process, but to reduce the number of moving pieces in
the system, they are all compiled into a single binary and run in a single
process.

These controllers include:

* Node Controller
  * Responsible for noticing & responding when nodes go down.
* Replication Controller
  * Responsible for maintaining the correct number of pods for every replication
    controller object in the system.
* Endpoints Controller
  * Populates the Endpoints object (i.e., joins Services & Pods).
* Service Account & Token Controllers
  * Create default accounts and API access tokens for new namespaces.
* ... and others.

### kube-scheduler

[kube-scheduler](kube-scheduler.md) watches newly created pods that have no node assigned, and
selects a node for them to run on.

### addons

Addons are pods and services that implement cluster features. They don't run on
the master VM, but currently the default setup scripts that make the API calls
to create these pods and services do run on the master VM. See:
[kube-master-addons](http://releases.k8s.io/HEAD/cluster/saltbase/salt/kube-master-addons/kube-master-addons.sh)

Addon objects are created in the "kube-system" namespace.

#### DNS

While the other addons are not strictly required, all Kubernetes
clusters should have [cluster DNS](dns.md), as many examples rely on it.

Cluster DNS is a DNS server, in addition to the other DNS server(s) in your
environment, which serves DNS records for Kubernetes services.

Containers started by Kubernetes automatically include this DNS server
in their DNS searches.

#### User interface

The kube-ui provides a read-only overview of the cluster state. Access
[the UI using kubectl proxy](../user-guide/connecting-to-applications-proxy.md#connecting-to-the-kube-ui-service-from-your-local-workstation).

#### Container Resource Monitoring

[Container Resource Monitoring](../user-guide/monitoring.md) records generic time-series metrics
about containers in a central database, and provides a UI for browsing that data.

#### Cluster-level Logging

[Container Logging](../user-guide/monitoring.md) saves container logs
to a central log store with a search/browsing interface. There are two
implementations:

* [Cluster-level logging to Google Cloud Logging](docs/user-guide/logging.md#cluster-level-logging-to-google-cloud-logging)

* [Cluster-level Logging with Elasticsearch and Kibana](docs/user-guide/logging.md#cluster-level-logging-with-elasticsearch-and-kibana)

## Node components

Node components run on every node, maintaining running pods and providing them
the Kubernetes runtime environment.

### kubelet

[kubelet](kubelet.md) is the primary node agent. It:

* Watches for pods that have been assigned to its node (either by apiserver
  or via local configuration file) and:
  * Mounts the pod's required volumes.
  * Downloads the pod's secrets.
  * Runs the pod's containers via docker (or, experimentally, rkt).
  * Periodically executes any requested container liveness probes.
  * Reports the status of the pod back to the rest of the system, by creating a
    "mirror pod" if necessary.
* Reports the status of the node back to the rest of the system.

### kube-proxy

[kube-proxy](kube-proxy.md) enables the Kubernetes service abstraction by maintaining
network rules on the host and performing connection forwarding.

### docker

`docker` is of course used for actually running containers.

### rkt

`rkt` is supported experimentally as an alternative to docker.

### supervisord

`supervisord` is a lightweight process babysitting system for keeping kubelet and docker
running.

### fluentd

`fluentd` is a daemon which helps provide [cluster-level logging](#cluster-level-logging).

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,96 +32,8 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubernetes Large Cluster

This file has moved to: http://kubernetes.github.io/docs/admin/cluster-large/

## Support

At v1.2, Kubernetes supports clusters with up to 1000 nodes. More specifically, we support configurations that meet *all* of the following criteria:

* No more than 1000 nodes
* No more than 30000 total pods
* No more than 60000 total containers
* No more than 100 pods per node

## Setup

A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by a "master" (the cluster-level control plane).

Normally the number of nodes in a cluster is controlled by the value `NUM_NODES` in the platform-specific `config-default.sh` file (for example, see [GCE's `config-default.sh`](http://releases.k8s.io/HEAD/cluster/gce/config-default.sh)).
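
For example, a sketch of overriding that value for a single `kube-up.sh` run (the node count here is purely illustrative):

```console
export NUM_NODES=500
./cluster/kube-up.sh
```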

Simply changing that value to something very large, however, may cause the setup script to fail for many cloud providers. A GCE deployment, for example, will run into quota issues and fail to bring the cluster up.

When setting up a large Kubernetes cluster, the following issues must be considered.

### Quota Issues

To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:

* Increasing the quota for things like CPU, IPs, etc.
  * In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
    * CPUs
    * VM instances
    * Total persistent disk reserved
    * In-use IP addresses
    * Firewall Rules
    * Forwarding rules
    * Routes
    * Target pools
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs.

### Etcd storage

To improve the performance of large clusters, we store events in a separate dedicated etcd instance.

When creating a cluster, the existing salt scripts:

* start and configure an additional etcd instance
* configure the api-server to use it for storing events
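
The effect of those salt scripts is roughly to point the apiserver's event storage at the dedicated instance; a hypothetical sketch (addresses and ports are illustrative):

```console
kube-apiserver --etcd-servers=http://127.0.0.1:4001 \
  --etcd-servers-overrides=/events#http://127.0.0.1:4002
```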

### Addon Resources

To prevent memory leaks or other resource issues in [cluster addons](../../cluster/addons/) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).

For [example](../../cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml):

```yaml
containers:
- name: fluentd-cloud-logging
  image: gcr.io/google_containers/fluentd-gcp:1.16
  resources:
    limits:
      cpu: 100m
      memory: 200Mi
```

Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see [#10335](http://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.

To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:

* Scale memory and CPU limits for each of the following addons, if used, as you scale up the size of the cluster (there is one replica of each handling the entire cluster, so memory and CPU usage tends to grow proportionally with the size of / load on the cluster):
  * [InfluxDB and Grafana](http://releases.k8s.io/HEAD/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
  * [skydns, kube2sky, and dns etcd](http://releases.k8s.io/HEAD/cluster/addons/dns/skydns-rc.yaml.in)
  * [Kibana](http://releases.k8s.io/HEAD/cluster/addons/fluentd-elasticsearch/kibana-controller.yaml)
* Scale the number of replicas for the following addons, if used, along with the size of the cluster (there are multiple replicas of each, so increasing replicas should help handle increased load, but, since load per replica also increases slightly, also consider increasing CPU/memory limits):
  * [elasticsearch](http://releases.k8s.io/HEAD/cluster/addons/fluentd-elasticsearch/es-controller.yaml)
* Increase memory and CPU limits slightly for each of the following addons, if used, along with the size of the cluster (there is one replica per node, but CPU/memory usage increases slightly along with cluster load/size as well):
  * [FluentD with ElasticSearch Plugin](http://releases.k8s.io/HEAD/cluster/saltbase/salt/fluentd-es/fluentd-es.yaml)
  * [FluentD with GCP Plugin](http://releases.k8s.io/HEAD/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml)

Heapster's resource limits are set dynamically based on the initial size of your cluster (see [#16185](http://issue.k8s.io/16185) and [#21258](http://issue.k8s.io/21258)). If you find that Heapster is running
out of resources, you should adjust the formulas that compute Heapster's memory request (see those PRs for details).

For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](../user-guide/compute-resources.md#troubleshooting).

In the [future](http://issue.k8s.io/13048), we anticipate setting all cluster addon resource limits based on cluster size, and dynamically adjusting them if you grow or shrink your cluster.
We welcome PRs that implement those features.

### Allowing minor node failure at startup

For various reasons (see [#18969](https://github.com/kubernetes/kubernetes/issues/18969) for more details), running
`kube-up.sh` with a very large `NUM_NODES` may fail due to a very small number of nodes not coming up properly.
Currently you have two choices: restart the cluster (`kube-down.sh` and then `kube-up.sh` again), or before
running `kube-up.sh` set the environment variable `ALLOWED_NOTREADY_NODES` to whatever value you feel comfortable
with. This will allow `kube-up.sh` to succeed with fewer than `NUM_NODES` coming up. Depending on the
reason for the failure, those additional nodes may join later or the cluster may remain at a size of
`NUM_NODES - ALLOWED_NOTREADY_NODES`.
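
For instance, a sketch of tolerating a couple of stragglers on a large bring-up (the value is illustrative):

```console
export ALLOWED_NOTREADY_NODES=3
./cluster/kube-up.sh
```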

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,185 +32,8 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Cluster Management
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/cluster-management/
|
||||
|
||||
This document describes several topics related to the lifecycle of a cluster: creating a new cluster,
|
||||
upgrading your cluster's
|
||||
master and worker nodes, performing node maintenance (e.g. kernel upgrades), and upgrading the Kubernetes API version of a
|
||||
running cluster.
|
||||
|
||||
## Creating and configuring a Cluster
|
||||
|
||||
To install Kubernetes on a set of machines, consult one of the existing [Getting Started guides](../../docs/getting-started-guides/README.md) depending on your environment.
|
||||
|
||||
## Upgrading a cluster
|
||||
|
||||
The current state of cluster upgrades is provider dependent.
|
||||
|
||||
### Upgrading Google Compute Engine clusters
|
||||
|
||||
Google Compute Engine Open Source (GCE-OSS) support master upgrades by deleting and
|
||||
recreating the master, while maintaining the same Persistent Disk (PD) to ensure that data is retained across the
|
||||
upgrade.
|
||||
|
||||
Node upgrades for GCE use a [Managed Instance Group](https://cloud.google.com/compute/docs/instance-groups/), each node
|
||||
is sequentially destroyed and then recreated with new software. Any Pods that are running on that node need to be
|
||||
controlled by a Replication Controller, or manually re-created after the roll out.
|
||||
|
||||
Upgrades on open source Google Compute Engine (GCE) clusters are controlled by the `cluster/gce/upgrade.sh` script.
|
||||
|
||||
Get its usage by running `cluster/gce/upgrade.sh -h`.
|
||||
|
||||
For example, to upgrade just your master to a specific version (v1.0.2):
|
||||
|
||||
```console
|
||||
cluster/gce/upgrade.sh -M v1.0.2
|
||||
```
|
||||
|
||||
Alternatively, to upgrade your entire cluster to the latest stable release:
|
||||
|
||||
```console
|
||||
cluster/gce/upgrade.sh release/stable
|
||||
```
|
||||
|
||||
### Upgrading Google Container Engine (GKE) clusters
|
||||
|
||||
Google Container Engine automatically updates master components (e.g. `kube-apiserver`, `kube-scheduler`) to the latest
|
||||
version. It also handles upgrading the operating system and other components that the master runs on.
|
||||
|
||||
The node upgrade process is user-initiated and is described in the [GKE documentation.](https://cloud.google.com/container-engine/docs/clusters/upgrade)
|
||||
|
||||
### Upgrading clusters on other platforms
|
||||
|
||||
The `cluster/kube-push.sh` script will do a rudimentary update. This process is still quite experimental, we
|
||||
recommend testing the upgrade on an experimental cluster before performing the update on a production cluster.
|
||||
|
||||
## Resizing a cluster
|
||||
|
||||
If your cluster runs short on resources you can easily add more machines to it if your cluster is running in [Node self-registration mode](node.md#self-registration-of-nodes).
|
||||
If you're using GCE or GKE it's done by resizing Instance Group managing your Nodes. It can be accomplished by modifying number of instances on `Compute > Compute Engine > Instance groups > your group > Edit group` [Google Cloud Console page](https://console.developers.google.com) or using gcloud CLI:
|
||||
|
||||
```
|
||||
gcloud compute instance-groups managed resize kubernetes-minion-group --size 42 --zone $ZONE
|
||||
```
|
||||
|
||||
Instance Group will take care of putting appropriate image on new machines and start them, while Kubelet will register its Node with API server to make it available for scheduling. If you scale the instance group down, system will randomly choose Nodes to kill.
|
||||
|
||||
In other environments you may need to configure the machine yourself and tell the Kubelet on which machine API server is running.
|
||||
|
||||
|
||||
### Horizontal auto-scaling of nodes (GCE)

If you are using GCE, you can configure your cluster so that the number of nodes will be automatically scaled based on:
* CPU and memory utilization.
* The amount of CPU and memory requested by the pods (also called the reservation).

Before setting up the cluster with `kube-up.sh`, you can set the `KUBE_ENABLE_NODE_AUTOSCALER` environment variable to `true` and export it.
The script will create an autoscaler for the instance group managing your nodes.

The autoscaler will try to maintain the average CPU/memory utilization and reservation of nodes within the cluster close to the target value.
The target value can be configured by the `KUBE_TARGET_NODE_UTILIZATION` environment variable (default: 0.7) for `kube-up.sh` when creating the cluster.
Node utilization is the node's total CPU/memory usage (OS + k8s + user load) divided by the node's capacity.
Node reservation is the total CPU/memory requested by pods that are running on the node divided by the node's capacity.
If the desired numbers of nodes resulting from CPU/memory utilization and reservation differ,
the autoscaler will choose the bigger number. The number of nodes in the cluster set by the autoscaler will be limited from `KUBE_AUTOSCALER_MIN_NODES` (default: 1)
to `KUBE_AUTOSCALER_MAX_NODES` (default: the initial number of nodes in the cluster).
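
For example, a minimal sketch of enabling the autoscaler before bringing the cluster up (the specific values here are illustrative only):

```sh
# Enable the node autoscaler and tune its bounds before running kube-up.sh.
export KUBE_ENABLE_NODE_AUTOSCALER=true
export KUBE_TARGET_NODE_UTILIZATION=0.8  # target utilization/reservation (default 0.7)
export KUBE_AUTOSCALER_MIN_NODES=3       # never scale below 3 nodes
export KUBE_AUTOSCALER_MAX_NODES=10      # never scale above 10 nodes
cluster/kube-up.sh
```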

The autoscaler is implemented as a Compute Engine Autoscaler.
The initial values of the autoscaler parameters set by `kube-up.sh` and some more advanced options can be tweaked on
the `Compute > Compute Engine > Instance groups > your group > Edit group` [Google Cloud Console page](https://console.developers.google.com)
or using the gcloud CLI:

```sh
gcloud alpha compute autoscaler --zone $ZONE <command>
```

Note that autoscaling will work properly only if node metrics are accessible in Google Cloud Monitoring.
To make the metrics accessible, you need to create your cluster with `KUBE_ENABLE_CLUSTER_MONITORING`
equal to `google` or `googleinfluxdb` (`googleinfluxdb` is the default value). Please also make sure
that you have the Google Cloud Monitoring API enabled in the Google Developers Console.

## Maintenance on a Node

If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is
brief, then when the Kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer,
then the node controller will terminate the pods that are bound to the unavailable node. If there is a corresponding
replication controller, then a new copy of the pod will be started on a different node. So, in the case where all
pods are replicated, upgrades can be done without special coordination, assuming that not all nodes will go down at the same time.

If you want more control over the upgrading process, you may use the following workflow:

Mark the node to be rebooted as unschedulable:

```console
kubectl replace nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": true}}'
```

This keeps new pods from landing on the node while you are trying to get them off.

Get the pods off the machine, via any of the following strategies:
* Wait for finite-duration pods to complete.
* Delete pods with:

```console
kubectl delete pods $PODNAME
```

For pods with a replication controller, the pod will eventually be replaced by a new pod which will be scheduled to a new node. Additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.

For pods with no replication controller, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
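
To confirm that the node has actually been drained, you can list the pods still scheduled to it; a minimal sketch, assuming the node name is in `$NODENAME`:

```sh
# -o wide includes the node each pod is scheduled to.
kubectl get pods -o wide | grep $NODENAME
```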

Perform maintenance work on the node.

Make the node schedulable again:

```console
kubectl replace nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": false}}'
```

If you deleted the node's VM instance and created a new one, then a new schedulable node resource will
be created automatically when you create a new VM instance (if you're using a cloud provider that supports
node discovery; currently this is only Google Compute Engine, not including CoreOS on Google Compute Engine using kube-register). See [Node](node.md) for more details.

## Advanced Topics

### Upgrading to a different API version

When a new API version is released, you may need to upgrade a cluster to support the new API version (e.g. switching from 'v1' to 'v2' when 'v2' is launched).

This is an infrequent event, but it requires careful management. There is a sequence of steps to upgrade to a new API version.

1. Turn on the new API version.
1. Upgrade the cluster's storage to use the new version.
1. Upgrade all config files. Identify users of the old API version endpoints.
1. Update existing objects in the storage to the new version by running `cluster/update-storage-objects.sh`.
1. Turn off the old API version.

### Turn on or off an API version for your cluster

Specific API versions can be turned on or off by passing the `--runtime-config=api/<version>` flag while bringing up the API server. For example, to turn off the v1 API, pass `--runtime-config=api/v1=false`.
`--runtime-config` also supports two special keys, `api/all` and `api/legacy`, to control all and legacy APIs respectively.
For example, to turn off all API versions except v1, pass `--runtime-config=api/all=false,api/v1=true`.
For the purposes of these flags, _legacy_ APIs are those APIs which have been explicitly deprecated (e.g. `v1beta3`).
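
Putting that together, a hypothetical apiserver invocation (all other required flags omitted) might look like:

```sh
# Serve only the v1 API; every other version, including legacy APIs, is disabled.
kube-apiserver --runtime-config=api/all=false,api/v1=true ...
```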

### Switching your cluster's storage API version

The objects that are stored to disk for a cluster's internal representation of the Kubernetes resources active in the cluster are written using a particular version of the API.
When the supported API changes, these objects may need to be rewritten in the newer API. Failure to do this will eventually result in resources that are no longer decodable or usable
by the Kubernetes API server.

The `KUBE_API_VERSIONS` environment variable for the `kube-apiserver` binary controls the API versions that are supported in the cluster. The first version in the list is used as the cluster's storage version. Hence, to set a specific version as the storage version, bring it to the front of the list of versions in the value of `KUBE_API_VERSIONS`. You need to restart the `kube-apiserver` binary
for changes to this variable to take effect.
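
For illustration, a sketch of making `v1` the storage version; the exact set of versions you list depends on your cluster, and this assumes the apiserver is managed in a way that lets you set its environment and restart it:

```sh
# v1 is listed first, so it becomes the storage version.
# Restart kube-apiserver afterwards for this to take effect.
export KUBE_API_VERSIONS="v1,extensions/v1beta1"
```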

### Switching your config files to a new API version

You can use the `kubectl convert` command to convert config files between different API versions.

```console
$ kubectl convert -f pod.yaml --output-version v1
```

For more options, please refer to the usage of the [kubectl convert](../user-guide/kubectl/kubectl_convert.md) command.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()

@@ -32,114 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Cluster Troubleshooting

This doc is about cluster troubleshooting; we assume you have already ruled out your application as the root cause of the
problem you are experiencing. See
the [application troubleshooting guide](../user-guide/application-troubleshooting.md) for tips on application debugging.
You may also visit the [troubleshooting document](../troubleshooting.md) for more information.

## Listing your cluster

The first thing to debug in your cluster is whether your nodes are all registered correctly.

Run

```sh
kubectl get nodes
```

and verify that all of the nodes you expect to see are present and that they are all in the `Ready` state.

## Looking at logs

For now, digging deeper into the cluster requires logging into the relevant machines. Here are the locations
of the relevant log files. (Note that on systemd-based systems, you may need to use `journalctl` instead.)

### Master

* /var/log/kube-apiserver.log - API Server, responsible for serving the API
* /var/log/kube-scheduler.log - Scheduler, responsible for making scheduling decisions
* /var/log/kube-controller-manager.log - Controller that manages replication controllers

### Worker Nodes

* /var/log/kubelet.log - Kubelet, responsible for running containers on the node
* /var/log/kube-proxy.log - Kube Proxy, responsible for service load balancing
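
On systemd-based systems, these components typically log to the journal rather than to files; a minimal sketch, assuming the component runs as a systemd unit of the same name:

```sh
# Follow the kubelet's logs via the journal.
journalctl -u kubelet -f
```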

## A general overview of cluster failure modes

This is an incomplete list of things that could go wrong, and how to adjust your cluster setup to mitigate the problems.

Root causes:
- VM(s) shutdown
- Network partition within cluster, or between cluster and users
- Crashes in Kubernetes software
- Data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume)
- Operator error, e.g. misconfigured Kubernetes software or application software

Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
  - Results
    - unable to stop, update, or start new pods, services, or replication controllers
    - existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
  - Results
    - apiserver should fail to come up
    - kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying
    - manual recovery or recreation of apiserver state necessary before apiserver is restarted
- Supporting services (node controller, replication controller manager, scheduler, etc.) VM shutdown or crashes
  - currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
  - in future, these will be replicated as well and may not be co-located
  - they do not have their own persistent state
- Individual node (VM or physical machine) shuts down
  - Results
    - pods on that Node stop running
- Network partition
  - Results
    - partition A thinks the nodes in partition B are down; partition B thinks the apiserver is down. (Assuming the master VM ends up in partition A.)
- Kubelet software fault
  - Results
    - crashing kubelet cannot start new pods on the node
    - kubelet might delete the pods or not
    - node marked unhealthy
    - replication controllers start new pods elsewhere
- Cluster operator error
  - Results
    - loss of pods, services, etc.
    - loss of apiserver backing store
    - users unable to read the API
    - etc.

Mitigations:
- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs
  - Mitigates: Apiserver VM shutdown or apiserver crashing
  - Mitigates: Supporting services VM shutdown or crashes

- Action: Use IaaS provider's reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd
  - Mitigates: Apiserver backing storage lost

- Action: Use (experimental) [high-availability](high-availability.md) configuration
  - Mitigates: Master VM shutdown or master components (scheduler, API server, controller manager) crashing
    - Will tolerate one or more simultaneous node or component failures
  - Mitigates: Apiserver backing storage (i.e., etcd's data directory) lost
    - Assuming you used clustered etcd.

- Action: Snapshot apiserver PDs/EBS volumes periodically
  - Mitigates: Apiserver backing storage lost
  - Mitigates: Some cases of operator error
  - Mitigates: Some cases of Kubernetes software fault

- Action: Use replication controllers and services in front of pods
  - Mitigates: Node shutdown
  - Mitigates: Kubelet software fault

- Action: Design applications (containers) to tolerate unexpected restarts
  - Mitigates: Node shutdown
  - Mitigates: Kubelet software fault

- Action: [Multiple independent clusters](multi-cluster.md) (and avoid making risky changes to all clusters at once)
  - Mitigates: Everything listed above.

This file has moved to: http://kubernetes.github.io/docs/admin/cluster-troubleshooting/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,187 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Daemon Sets

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Daemon Sets](#daemon-sets)
  - [What is a _Daemon Set_?](#what-is-a-daemon-set)
  - [Writing a DaemonSet Spec](#writing-a-daemonset-spec)
    - [Required Fields](#required-fields)
    - [Pod Template](#pod-template)
    - [Pod Selector](#pod-selector)
    - [Running Pods on Only Some Nodes](#running-pods-on-only-some-nodes)
  - [How Daemon Pods are Scheduled](#how-daemon-pods-are-scheduled)
  - [Communicating with DaemonSet Pods](#communicating-with-daemonset-pods)
  - [Updating a DaemonSet](#updating-a-daemonset)
  - [Alternatives to Daemon Set](#alternatives-to-daemon-set)
    - [Init Scripts](#init-scripts)
    - [Bare Pods](#bare-pods)
    - [Static Pods](#static-pods)
    - [Replication Controller](#replication-controller)

<!-- END MUNGE: GENERATED_TOC -->

## What is a _Daemon Set_?

A _Daemon Set_ ensures that all (or some) nodes run a copy of a pod. As nodes are added to the
cluster, pods are added to them. As nodes are removed from the cluster, those pods are garbage
collected. Deleting a Daemon Set will clean up the pods it created.

Some typical uses of a Daemon Set are:

- running a cluster storage daemon, such as `glusterd` or `ceph`, on each node.
- running a logs collection daemon on every node, such as `fluentd` or `logstash`.
- running a node monitoring daemon on every node, such as [Prometheus Node Exporter](https://github.com/prometheus/node_exporter), `collectd`, New Relic agent, or Ganglia `gmond`.

In a simple case, one Daemon Set, covering all nodes, would be used for each type of daemon.
A more complex setup might use multiple DaemonSets for a single type of daemon,
but with different flags and/or different memory and CPU requests for different hardware types.

## Writing a DaemonSet Spec

### Required Fields

As with all other Kubernetes config, a DaemonSet needs `apiVersion`, `kind`, and `metadata` fields. For
general information about working with config files, see the [deploying applications](../user-guide/deploying-applications.md),
[configuring containers](../user-guide/configuring-containers.md), and [working with resources](../user-guide/working-with-resources.md) documents.

A DaemonSet also needs a [`.spec`](../devel/api-conventions.md#spec-and-status) section.

### Pod Template

The `.spec.template` is the only required field of the `.spec`.

The `.spec.template` is a [pod template](../user-guide/replication-controller.md#pod-template).
It has exactly the same schema as a [pod](../user-guide/pods.md), except
it is nested and does not have an `apiVersion` or `kind`.

In addition to required fields for a pod, a pod template in a DaemonSet has to specify appropriate
labels (see [pod selector](#pod-selector)).

A pod template in a DaemonSet must have a [`RestartPolicy`](../user-guide/pod-states.md)
equal to `Always`, or be unspecified, which defaults to `Always`.
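
To make the required fields concrete, here is a sketch of a minimal DaemonSet manifest. The name, labels, and image are hypothetical, and the `extensions/v1beta1` API group shown is the one in use for DaemonSets at the time of writing; check which versions your cluster supports:

```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: example-logging-daemon
spec:
  template:
    metadata:
      labels:
        app: example-logging-daemon   # must be matched by .spec.selector, if one is given
    spec:
      containers:
      - name: logger
        image: example/logger:1.0     # hypothetical image
```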

### Pod Selector

The `.spec.selector` field is a pod selector. It works the same as the `.spec.selector` of
a [Job](../user-guide/jobs.md) or other new resources.

The `.spec.selector` is an object consisting of two fields:
* `matchLabels` - works the same as the `.spec.selector` of a [ReplicationController](../user-guide/replication-controller.md)
* `matchExpressions` - allows building more sophisticated selectors by specifying a key,
  a list of values, and an operator that relates the key and values.

When the two are specified, the result is ANDed.

If the `.spec.selector` is specified, it must match the `.spec.template.metadata.labels`. If not
specified, they are defaulted to be equal. Config with these not matching will be rejected by the API.

Also, you should not normally create any pods whose labels match this selector, either directly, via
another DaemonSet, or via another controller such as a ReplicationController. Otherwise, the DaemonSet
controller will think that those pods were created by it. Kubernetes will not stop you from doing
this. One case where you might want to do this is to manually create a pod with a different value on
a node for testing.

### Running Pods on Only Some Nodes

If you specify a `.spec.template.spec.nodeSelector`, then the DaemonSet controller will
create pods on nodes which match that [node selector](../user-guide/node-selection/README.md).

If you do not specify a `.spec.template.spec.nodeSelector`, then the DaemonSet controller will
create pods on all nodes.

## How Daemon Pods are Scheduled

Normally, the machine that a pod runs on is selected by the Kubernetes scheduler. However, pods
created by the Daemon controller have the machine already selected (`.spec.nodeName` is specified
when the pod is created, so it is ignored by the scheduler). Therefore:

- the [`unschedulable`](node.md#manual-node-administration) field of a node is not respected
  by the DaemonSet controller.
- the DaemonSet controller can create pods even when the scheduler has not been started, which can help cluster
  bootstrap.

## Communicating with DaemonSet Pods

Some possible patterns for communicating with pods in a DaemonSet are:

- **Push**: Pods in the Daemon Set are configured to send updates to another service, such
  as a stats database. They do not have clients.
- **NodeIP and Known Port**: Pods in the Daemon Set use a `hostPort`, so that the pods are reachable
  via the node IPs. Clients know the list of node IPs somehow, and know the port by convention.
- **DNS**: Create a [headless service](../user-guide/services.md#headless-services) with the same pod selector,
  and then discover DaemonSets using the `endpoints` resource or retrieve multiple A records from
  DNS.
- **Service**: Create a service with the same pod selector, and use the service to reach a
  daemon on a random node. (There is no way to reach a specific node.)

## Updating a DaemonSet

If node labels are changed, the DaemonSet will promptly add pods to newly matching nodes and delete
pods from newly not-matching nodes.

You can modify the pods that a DaemonSet creates. However, pods do not allow all
fields to be updated. Also, the DaemonSet controller will use the original template the next
time a node (even with the same name) is created.

You can delete a DaemonSet. If you specify `--cascade=false` with `kubectl`, then the pods
will be left on the nodes. You can then create a new DaemonSet with a different template.
The new DaemonSet with the different template will recognize all the existing pods as having
matching labels. It will not modify or delete them despite a mismatch in the pod template.
You will need to force new pod creation by deleting the pod or deleting the node.

You cannot update a DaemonSet.

Support for updating DaemonSets and controlled updating of nodes is planned.

## Alternatives to Daemon Set

### Init Scripts

It is certainly possible to run daemon processes by directly starting them on a node (e.g. using
`init`, `upstartd`, or `systemd`). This is perfectly fine. However, there are several advantages to
running such processes via a DaemonSet:

- Ability to monitor and manage logs for daemons in the same way as applications.
- Same config language and tools (e.g. pod templates, `kubectl`) for daemons and applications.
- Future versions of Kubernetes will likely support integration between DaemonSet-created
  pods and node upgrade workflows.
- Running daemons in containers with resource limits increases isolation of daemons from app
  containers. However, this can also be accomplished by running the daemons in a container but not in a pod
  (e.g. start directly via Docker).

### Bare Pods

It is possible to create pods directly which specify a particular node to run on. However,
a Daemon Set replaces pods that are deleted or terminated for any reason, such as in the case of
node failure or disruptive node maintenance, such as a kernel upgrade. For this reason, you should
use a Daemon Set rather than creating individual pods.

### Static Pods

It is possible to create pods by writing a file to a certain directory watched by the Kubelet. These
are called [static pods](static-pods.md).
Unlike DaemonSet, static pods cannot be managed with kubectl
or other Kubernetes API clients. Static pods do not depend on the apiserver, making them useful
in cluster bootstrapping cases. Also, static pods may be deprecated in the future.

### Replication Controller

Daemon Sets are similar to [Replication Controllers](../user-guide/replication-controller.md) in that
they both create pods, and those pods have processes which are not expected to terminate (e.g. web servers,
storage servers).

Use a replication controller for stateless services, like frontends, where scaling up and down the
number of replicas and rolling out updates are more important than controlling exactly which host
the pod runs on. Use a Daemon Set when it is important that a copy of a pod always run on
all or certain hosts, and when it needs to start before other pods.

This file has moved to: http://kubernetes.github.io/docs/admin/daemons/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,44 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# DNS Integration with Kubernetes

As of Kubernetes 0.8, DNS is offered as a [cluster add-on](http://releases.k8s.io/HEAD/cluster/addons/README.md).
If enabled, a DNS Pod and Service will be scheduled on the cluster, and the kubelets will be
configured to tell individual containers to use the DNS Service's IP to resolve DNS names.

Every Service defined in the cluster (including the DNS server itself) will be
assigned a DNS name. By default, a client Pod's DNS search list will
include the Pod's own namespace and the cluster's default domain. This is best
illustrated by example:

Assume a Service named `foo` in the Kubernetes namespace `bar`. A Pod running
in namespace `bar` can look up this service by simply doing a DNS query for
`foo`. A Pod running in namespace `quux` can look up this service by doing a
DNS query for `foo.bar`.
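
For instance, from a pod in namespace `quux` (using a hypothetical `busybox` pod purely for illustration):

```console
$ kubectl exec busybox --namespace=quux -- nslookup foo.bar
```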

The cluster DNS server ([SkyDNS](https://github.com/skynetservices/skydns))
supports forward lookups (A records) and service lookups (SRV records).

## How it Works

The running DNS pod holds 4 containers - skydns, etcd (a private instance which skydns uses),
a Kubernetes-to-skydns bridge called kube2sky, and a health check called healthz. The kube2sky process
watches the Kubernetes master for changes in Services, and then writes the
information to etcd, which skydns reads. This etcd instance is not linked to
any other etcd clusters that might exist, including the Kubernetes master.

## Issues

The skydns service is reachable directly from Kubernetes nodes (outside
of any container) and DNS resolution works if the skydns service is targeted
explicitly. However, nodes are not configured to use the cluster DNS service or
to search the cluster's DNS domain by default. This may be resolved at a later
time.

## For more information

See [the docs for the DNS cluster addon](http://releases.k8s.io/HEAD/cluster/addons/dns/README.md).

This file has moved to: http://kubernetes.github.io/docs/admin/dns/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,51 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# etcd

[etcd](https://coreos.com/etcd/docs/2.2.1/) is a highly-available key value
store which Kubernetes uses for persistent storage of all of its REST API
objects.

## Configuration: high-level goals

Access Control: give *only* the kube-apiserver read/write access to etcd. You do not
want the apiserver's etcd exposed to every node in your cluster (or worse, to the
internet at large), because access to etcd is equivalent to root in your
cluster.

Data Reliability: for reasonable safety, either etcd needs to be run as a
[cluster](high-availability.md#clustering-etcd) (multiple machines each running
etcd) or etcd's data directory should be located on durable storage (e.g., GCE's
persistent disk). In either case, if high availability is required--as it might
be in a production cluster--the data directory ought to be [backed up
periodically](https://coreos.com/etcd/docs/2.2.1/admin_guide.html#disaster-recovery),
to reduce downtime in case of corruption.

## Default configuration

The default setup scripts use the kubelet's file-based static pods feature to run etcd in a
[pod](http://releases.k8s.io/HEAD/cluster/saltbase/salt/etcd/etcd.manifest). This manifest should only
be run on master VMs. The default location that the kubelet scans for manifests is
`/etc/kubernetes/manifests/`.

## Kubernetes's usage of etcd

By default, Kubernetes objects are stored under the `/registry` key in etcd.
This path can be prefixed by using the [kube-apiserver](kube-apiserver.md) flag
`--etcd-prefix="/foo"`.

`etcd` is the only place that Kubernetes keeps state.

## Troubleshooting

To test whether `etcd` is running correctly, you can try writing a value to a
test key. On your master VM (or somewhere with firewalls configured such that
you can talk to your cluster's etcd), try:

```sh
curl -fs -X PUT "http://${host}:${port}/v2/keys/_test"
```
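
To read the test key back and confirm the write succeeded (same assumptions about `${host}` and `${port}`):

```sh
curl -fs "http://${host}:${port}/v2/keys/_test"
```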

This file has moved to: http://kubernetes.github.io/docs/admin/etcd/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,58 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Garbage Collection

- [Introduction](#introduction)
- [Image Collection](#image-collection)
- [Container Collection](#container-collection)
- [User Configuration](#user-configuration)

### Introduction

Garbage collection is a helpful function of the kubelet that will clean up unreferenced images and unused containers. The kubelet will perform garbage collection for containers every minute and garbage collection for images every five minutes.

External garbage collection tools are not recommended, as these tools can potentially break the behavior of the kubelet by removing containers expected to exist.

### Image Collection

Kubernetes manages the lifecycle of all images through the imageManager, with the cooperation
of cAdvisor.

The policy for garbage collecting images takes two factors into consideration:
`HighThresholdPercent` and `LowThresholdPercent`. Disk usage above the high threshold
will trigger garbage collection. The garbage collection will delete least recently used images until the low
threshold has been met.

### Container Collection

The policy for garbage collecting containers considers three user-defined variables. `MinAge` is the minimum age at which a container can be garbage collected. `MaxPerPodContainer` is the maximum number of dead containers any single
pod (UID, container name) pair is allowed to have. `MaxContainers` is the maximum number of total dead containers. These variables can be individually disabled by setting `MinAge` to zero and setting `MaxPerPodContainer` and `MaxContainers` respectively to less than zero.

The kubelet will act on containers that are unidentified, deleted, or outside of the boundaries set by the previously mentioned flags. The oldest containers will generally be removed first. `MaxPerPodContainer` and `MaxContainers` may potentially conflict with each other in situations where retaining the maximum number of containers per pod (`MaxPerPodContainer`) would go outside the allowable range of global dead containers (`MaxContainers`). `MaxPerPodContainer` would be adjusted in this situation: a worst case scenario would be to downgrade `MaxPerPodContainer` to 1 and evict the oldest containers. Additionally, containers owned by pods that have been deleted are removed once they are older than `MinAge`.

Containers that are not managed by the kubelet are not subject to container garbage collection.

### User Configuration

Users can adjust the following thresholds to tune image garbage collection, using these kubelet flags:

1. `image-gc-high-threshold`, the percent of disk usage which triggers image garbage collection.
Default is 90%.
2. `image-gc-low-threshold`, the percent of disk usage to which image garbage collection attempts
to free. Default is 80%.

We also allow users to customize the garbage collection policy through the following kubelet flags:

1. `minimum-container-ttl-duration`, minimum age for a finished container before it is
garbage collected. Default is 1 minute.
2. `maximum-dead-containers-per-container`, maximum number of old instances to retain
per container. Default is 2.
3. `maximum-dead-containers`, maximum number of old instances of containers to retain globally.
Default is 100.
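
For example, a sketch of a kubelet invocation that tunes these flags; the values are illustrative only, and the other flags a real deployment requires are omitted:

```sh
kubelet \
  --image-gc-high-threshold=85 \
  --image-gc-low-threshold=75 \
  --minimum-container-ttl-duration=2m \
  --maximum-dead-containers-per-container=3 \
  --maximum-dead-containers=200
  # ...plus the other flags your deployment requires
```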

Containers can potentially be garbage collected before their usefulness has expired. These containers can contain logs and other data that can be useful for troubleshooting. A sufficiently large value for `maximum-dead-containers-per-container` is highly recommended to allow at least 2 dead containers to be retained per expected container. A higher value for `maximum-dead-containers` is also recommended for a similar reason.
See [this issue](https://github.com/kubernetes/kubernetes/issues/13287) for more details.

This file has moved to: http://kubernetes.github.io/docs/admin/garbage-collection/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,252 +32,8 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# High Availability Kubernetes Clusters

This file has moved to: http://kubernetes.github.io/docs/admin/high-availability/

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->

- [High Availability Kubernetes Clusters](#high-availability-kubernetes-clusters)
  - [Introduction](#introduction)
  - [Overview](#overview)
  - [Initial set-up](#initial-set-up)
  - [Reliable nodes](#reliable-nodes)
  - [Establishing a redundant, reliable data storage layer](#establishing-a-redundant-reliable-data-storage-layer)
    - [Clustering etcd](#clustering-etcd)
      - [Validating your cluster](#validating-your-cluster)
    - [Even more reliable storage](#even-more-reliable-storage)
  - [Replicated API Servers](#replicated-api-servers)
    - [Installing configuration files](#installing-configuration-files)
    - [Starting the API Server](#starting-the-api-server)
    - [Load balancing](#load-balancing)
  - [Master elected components](#master-elected-components)
    - [Installing configuration files](#installing-configuration-files)
    - [Running the podmaster](#running-the-podmaster)
  - [Conclusion](#conclusion)

<!-- END MUNGE: GENERATED_TOC -->

## Introduction

PLEASE NOTE: The podmaster implementation is obsoleted by https://github.com/kubernetes/kubernetes/pull/16830,
which provides a primitive for leader election in the experimental Kubernetes API.

Nevertheless, the concepts and implementation in this document are still valid, as is the podmaster implementation itself.

This document describes how to build a high-availability (HA) Kubernetes cluster. This is a fairly advanced topic.
Users who merely want to experiment with Kubernetes are encouraged to use configurations that are simpler to set up, such as
the simple [Docker based single node cluster instructions](../../docs/getting-started-guides/docker.md),
or try [Google Container Engine](https://cloud.google.com/container-engine/) for hosted Kubernetes.

Also, at this time high availability support for Kubernetes is not continuously tested in our end-to-end (e2e) testing. We will
be working to add this continuous testing, but for now the single-node master installations are more heavily tested.

## Overview

Setting up a truly reliable, highly available distributed system requires a number of steps; it is akin to
wearing underwear, pants, a belt, suspenders, another pair of underwear, and another pair of pants. We go into each
of these steps in detail, but a summary is given here to help guide and orient the user.

The steps involved are as follows:
* [Creating the reliable constituent nodes that collectively form our HA master implementation.](#reliable-nodes)
* [Setting up a redundant, reliable storage layer with clustered etcd.](#establishing-a-redundant-reliable-data-storage-layer)
* [Starting replicated, load balanced Kubernetes API servers](#replicated-api-servers)
* [Setting up master-elected Kubernetes scheduler and controller-manager daemons](#master-elected-components)

Here's what the system should look like when it's finished:


Ready? Let's get started.

## Initial set-up

The remainder of this guide assumes that you are setting up a 3-node clustered master, where each machine is running some flavor of Linux.
Examples in the guide are given for Debian distributions, but they should be easily adaptable to other distributions.
Likewise, this set up should work whether you are running in a public or private cloud provider, or if you are running
on bare metal.

The easiest way to implement an HA Kubernetes cluster is to start with an existing single-master cluster. The
instructions at [https://get.k8s.io](https://get.k8s.io)
describe easy installation for single-master clusters on a variety of platforms.

## Reliable nodes

On each master node, we are going to run a number of processes that implement the Kubernetes API. The first step in making these reliable is
to make sure that each automatically restarts when it fails. To achieve this, we need to install a process watcher. We choose to use
the `kubelet` that we run on each of the worker nodes. This is convenient, since we can use containers to distribute our binaries, we can
establish resource limits, and introspect the resource usage of each daemon. Of course, we also need something to monitor the kubelet
itself (insert who-watches-the-watcher jokes here). For Debian systems, we choose monit, but there are a number of alternate
choices. For example, on systemd-based systems (e.g. RHEL, CentOS), you can run `systemctl enable kubelet`.

If you are extending from a standard Kubernetes installation, the `kubelet` binary should already be present on your system. You can run
`which kubelet` to determine if the binary is in fact installed. If it is not installed,
you should install the [kubelet binary](https://storage.googleapis.com/kubernetes-release/release/v0.19.3/bin/linux/amd64/kubelet), the
[kubelet init file](http://releases.k8s.io/HEAD/cluster/saltbase/salt/kubelet/initd) and [high-availability/default-kubelet](high-availability/default-kubelet)
scripts.

If you are using monit, you should also install the monit daemon (`apt-get install monit`) and the [high-availability/monit-kubelet](high-availability/monit-kubelet) and
[high-availability/monit-docker](high-availability/monit-docker) configs.

On systemd systems, run `systemctl enable kubelet` and `systemctl enable docker`.

## Establishing a redundant, reliable data storage layer

The central foundation of a highly available solution is a redundant, reliable storage layer. The number one rule of high availability is
to protect the data. Whatever else happens, whatever catches on fire, if you have the data, you can rebuild. If you lose the data, you're
done.

Clustered etcd already replicates your storage to all master instances in your cluster. This means that to lose data, all three nodes would need
to have their physical (or virtual) disks fail at the same time. The probability that this occurs is relatively low, so for many people
running a replicated etcd cluster is likely reliable enough. You can add additional reliability by increasing the
size of the cluster from three to five nodes. If that is still insufficient, you can add
[even more redundancy to your storage layer](#even-more-reliable-storage).

### Clustering etcd

The full details of clustering etcd are beyond the scope of this document; lots of details are given on the
[etcd clustering page](https://github.com/coreos/etcd/blob/master/Documentation/clustering.md). This example walks through
a simple cluster set up, using etcd's built-in discovery to build our cluster.

First, hit the etcd discovery service to create a new token:

```sh
curl https://discovery.etcd.io/new?size=3
```

On each node, copy the [etcd.yaml](high-availability/etcd.yaml) file into `/etc/kubernetes/manifests/etcd.yaml`.

The kubelet on each node actively monitors the contents of that directory, and it will create an instance of the `etcd`
server from the definition of the pod specified in `etcd.yaml`.

Note that in `etcd.yaml` you should substitute the token URL you got above for `${DISCOVERY_TOKEN}` on all three machines,
and you should substitute a different name (e.g. `node-1`) for `${NODE_NAME}` and the correct IP address
for `${NODE_IP}` on each machine.
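
One hypothetical way to do that substitution, assuming GNU sed, a shell variable `TOKEN_URL` holding the discovery URL, and example name/IP values for the first machine:

```sh
# Replace the placeholders in the copied manifest in place.
# TOKEN_URL holds the URL returned by the discovery service above;
# node-1 and 10.240.0.2 are illustrative values for this machine.
sed -i -e "s|\${DISCOVERY_TOKEN}|${TOKEN_URL}|" \
       -e "s|\${NODE_NAME}|node-1|" \
       -e "s|\${NODE_IP}|10.240.0.2|" \
       /etc/kubernetes/manifests/etcd.yaml
```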

#### Validating your cluster

Once you copy this into all three nodes, you should have a clustered etcd set up. You can validate with

```sh
etcdctl member list
```

and

```sh
etcdctl cluster-health
```

You can also validate that this is working with `etcdctl set foo bar` on one node, and `etcdctl get foo`
on a different node.

### Even more reliable storage

Of course, if you are interested in increased data reliability, there are further options which make the place where etcd
installs its data even more reliable than regular disks (belts *and* suspenders, ftw!).

If you use a cloud provider, then they usually provide this
for you, for example [Persistent Disk](https://cloud.google.com/compute/docs/disks/persistent-disks) on the Google Cloud Platform. These
are block-device persistent storage that can be mounted onto your virtual machine. Other cloud providers provide similar solutions.

If you are running on physical machines, you can also use network attached redundant storage using an iSCSI or NFS interface.
Alternatively, you can run a clustered file system like Gluster or Ceph. Finally, you can also run a RAID array on each physical machine.

Regardless of how you choose to implement it, if you chose to use one of these options, you should make sure that your storage is mounted
to each machine. If your storage is shared between the three masters in your cluster, you should create a different directory on the storage
for each node. Throughout these instructions, we assume that this storage is mounted to your machine in `/var/etcd/data`.

## Replicated API Servers

Once you have replicated etcd set up correctly, you can install the apiserver, again using the kubelet.

### Installing configuration files

First you need to create the initial log file, so that Docker mounts a file instead of a directory:

```sh
touch /var/log/kube-apiserver.log
```

Next, you need to create a `/srv/kubernetes/` directory on each node. This directory includes:
* basic_auth.csv - basic auth user and password
* ca.crt - Certificate Authority cert
* known_tokens.csv - tokens that entities (e.g. the kubelet) can use to talk to the apiserver
* kubecfg.crt - Client certificate, public key
* kubecfg.key - Client certificate, private key
* server.cert - Server certificate, public key
* server.key - Server certificate, private key

The easiest way to create this directory may be to copy it from the master node of a working cluster, or you can generate these files manually yourself.
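
For instance, a sketch of copying the directory from an existing master over SSH (the host name `existing-master` is hypothetical):

```sh
# Recursively copy the credential directory from a working master.
scp -r existing-master:/srv/kubernetes /srv/
```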

### Starting the API Server

Once these files exist, copy the [kube-apiserver.yaml](high-availability/kube-apiserver.yaml) into `/etc/kubernetes/manifests/` on each master node.

The kubelet monitors this directory, and will automatically create an instance of the `kube-apiserver` container using the pod definition specified
in the file.

### Load balancing

At this point, you should have 3 apiservers all working correctly. If you set up a network load balancer, you should
be able to access your cluster via that load balancer, and see traffic balancing between the apiserver instances. Setting
up a load balancer will depend on the specifics of your platform; for example, instructions for the Google Cloud
Platform can be found [here](https://cloud.google.com/compute/docs/load-balancing/).

Note, if you are using authentication, you may need to regenerate your certificate to include the IP address of the balancer,
in addition to the IP addresses of the individual nodes.

For pods that you deploy into the cluster, the `kubernetes` service/dns name should provide a load balanced endpoint for the master automatically.

For external users of the API (e.g. the `kubectl` command line interface, continuous build pipelines, or other clients) you will want to configure
them to talk to the external load balancer's IP address.

## Master elected components

So far we have set up state storage, and we have set up the API server, but we haven't run anything that actually modifies
cluster state, such as the controller manager and scheduler. To achieve this reliably, we only want to have one actor modifying state at a time, but we want replicated
instances of these actors, in case a machine dies. To achieve this, we are going to use a lease-lock in etcd to perform
master election. On each of the three apiserver nodes, we run a small utility application named `podmaster`. Its job is to implement a master
election protocol using etcd "compare and swap". If the apiserver node wins the election, it starts the master component it is managing (e.g. the scheduler); if it
loses the election, it ensures that any master components running on the node (e.g. the scheduler) are stopped.

In the future, we expect to more tightly integrate this lease-locking into the scheduler and controller-manager binaries directly, as described in the [high availability design proposal](../proposals/high-availability.md).

### Installing configuration files

First, create empty log files on each node, so that Docker will mount the files instead of making new directories:

```sh
touch /var/log/kube-scheduler.log
touch /var/log/kube-controller-manager.log
```

Next, set up the descriptions of the scheduler and controller manager pods on each node
by copying [kube-scheduler.yaml](high-availability/kube-scheduler.yaml) and [kube-controller-manager.yaml](high-availability/kube-controller-manager.yaml) into the `/srv/kubernetes/`
directory.

### Running the podmaster

Now that the configuration files are in place, copy the [podmaster.yaml](high-availability/podmaster.yaml) config file into `/etc/kubernetes/manifests/`.

As before, the kubelet on the node monitors this directory, and will start an instance of the podmaster using the pod specification provided in `podmaster.yaml`.

Now you will have one instance of the scheduler process running on a single master node, and likewise one
controller-manager process running on a single (possibly different) master node. If either of these processes fails,
the kubelet will restart them. If any of these nodes fail, the process will move to a different instance of a master
node.

## Conclusion

At this point, you are done (yeah!) with the master components, but you still need to add worker nodes (boo!).

If you have an existing cluster, this is as simple as reconfiguring your kubelets to talk to the load-balanced endpoint, and
restarting the kubelets on each node.

If you are turning up a fresh cluster, you will need to install the kubelet and kube-proxy on each worker node, and
set the `--apiserver` flag to your replicated endpoint.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()

@@ -32,80 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubernetes Cluster Admin Guide

The cluster admin guide is for anyone creating or administering a Kubernetes cluster.
It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md).

## Planning a cluster

There are many different examples of how to set up a Kubernetes cluster. Many of them are listed in this
[matrix](../getting-started-guides/README.md). We call each of the combinations in this matrix a *distro*.

Before choosing a particular guide, here are some things to consider:

- Are you just looking to try out Kubernetes on your laptop, or build a high-availability many-node cluster? Both
  models are supported, but some distros are better for one case or the other.
- Will you be using a hosted Kubernetes cluster, such as [GKE](https://cloud.google.com/container-engine), or setting
  one up yourself?
- Will your cluster be on-premises, or in the cloud (IaaS)? Kubernetes does not directly support hybrid clusters. We
  recommend setting up multiple clusters rather than spanning distant locations.
- Will you be running Kubernetes on "bare metal" or virtual machines? Kubernetes supports both, via different distros.
- Do you just want to run a cluster, or do you expect to do active development of Kubernetes project code? If the
  latter, it is better to pick a distro actively used by other developers. Some distros only use binary releases, but
  others offer a greater variety of choices.
- Not all distros are maintained as actively. Prefer ones which are listed as tested on a more recent version of
  Kubernetes.
- If you are configuring Kubernetes on-premises, you will need to consider what [networking
  model](networking.md) fits best.
- If you are designing for very high availability, you may want [clusters in multiple zones](multi-cluster.md).
- You may want to familiarize yourself with the various
  [components](cluster-components.md) needed to run a cluster.

## Setting up a cluster

Pick one of the Getting Started Guides from the [matrix](../getting-started-guides/README.md) and follow it.
If none of the Getting Started Guides fits, you may want to pull ideas from several of the guides.

One option for custom networking is *OpenVSwitch GRE/VxLAN networking* ([ovs-networking.md](ovs-networking.md)), which
uses OpenVSwitch to set up networking between pods across
Kubernetes nodes.

If you are modifying an existing guide which uses Salt, this document explains [how Salt is used in the Kubernetes
project](salt.md).

## Managing a cluster, including upgrades

[Managing a cluster](cluster-management.md).

## Managing nodes

[Managing nodes](node.md).

## Optional Cluster Services

* **DNS Integration with SkyDNS** ([dns.md](dns.md)):
  Resolving a DNS name directly to a Kubernetes service.

* **Logging** with [Kibana](../user-guide/logging.md)

## Multi-tenant support

* **Resource Quota** ([resource-quota.md](resource-quota.md))

## Security

* **Kubernetes Container Environment** ([docs/user-guide/container-environment.md](../user-guide/container-environment.md)):
  Describes the environment for Kubelet managed containers on a Kubernetes
  node.

* **Securing access to the API Server**: [accessing the API](accessing-the-api.md)

* **Authentication**: [authentication](authentication.md)

* **Authorization**: [authorization](authorization.md)

* **Admission Controllers**: [admission controllers](admission-controllers.md)

This file has moved to: http://kubernetes.github.io/docs/admin/introduction/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -31,201 +31,8 @@ Documentation for other releases can be found at

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

Limit Range
========================================
By default, pods run with unbounded CPU and memory limits. This means that any pod in the
system will be able to consume as much CPU and memory as is available on the node that executes the pod.

Users may want to impose restrictions on the amount of resources a single pod in the system may consume
for a variety of reasons.

For example:

1. Each node in the cluster has 2GB of memory. The cluster operator does not want to accept pods
that require more than 2GB of memory since no node in the cluster can support the requirement. To prevent a
pod from being permanently unscheduled to a node, the operator instead chooses to reject pods that exceed 2GB
of memory as part of admission control.
2. A cluster is shared by two communities in an organization that run production and development workloads,
respectively. Production workloads may consume up to 8GB of memory, but development workloads may consume up
to 512MB of memory. The cluster operator creates a separate namespace for each workload, and applies limits to
each namespace.
3. Users may create a pod which consumes resources just below the capacity of a machine. The leftover space
may be too small to be useful, but big enough for the waste to be costly over the entire cluster. As a result,
the cluster operator may want to set limits so that a pod must consume at least 20% of the memory and CPU of the
average node size in order to provide for more uniform scheduling and to limit waste.

This example demonstrates how limits can be applied to a Kubernetes namespace to control
min/max resource limits per pod. In addition, this example demonstrates how you can
apply default resource limits to pods in the absence of an end-user specified value.

See the [LimitRange design doc](../../design/admission_control_limit_range.md) for more information. For a detailed description of the Kubernetes resource model, see [Resources](../../../docs/user-guide/compute-resources.md).

Step 0: Prerequisites
-----------------------------------------
This example requires a running Kubernetes cluster. See the [Getting Started guides](../../../docs/getting-started-guides/) for how to get started.

Change to the `<kubernetes>` directory if you're not already there.

Step 1: Create a namespace
-----------------------------------------
This example will work in a custom namespace to demonstrate the concepts involved.

Let's create a new namespace called limit-example:

```console
$ kubectl create -f docs/admin/limitrange/namespace.yaml
namespace "limit-example" created
$ kubectl get namespaces
NAME            LABELS    STATUS    AGE
default         <none>    Active    5m
limit-example   <none>    Active    53s
```
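
For reference, the namespace manifest used above is tiny; a minimal sketch of what `namespace.yaml` might contain (the file in the repository may differ slightly):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: limit-example
```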

Step 2: Apply a limit to the namespace
-----------------------------------------
Let's create a simple limit in our namespace.

```console
$ kubectl create -f docs/admin/limitrange/limits.yaml --namespace=limit-example
limitrange "mylimits" created
```

Let's describe the limits that we have imposed in our namespace.

```console
$ kubectl describe limits mylimits --namespace=limit-example
Name:       mylimits
Namespace:  limit-example
Type        Resource   Min    Max   Default Request   Default Limit   Max Limit/Request Ratio
----        --------   ---    ---   ---------------   -------------   -----------------------
Pod         cpu        200m   2     -                 -               -
Pod         memory     6Mi    1Gi   -                 -               -
Container   cpu        100m   2     200m              300m            -
Container   memory     3Mi    1Gi   100Mi             200Mi           -
```

In this scenario, we have said the following:

1. If a max constraint is specified for a resource (2 CPU and 1Gi memory in this case), then a limit
must be specified for that resource across all containers. Failure to specify a limit will result in
a validation error when attempting to create the pod. Note that a default value of limit is set by
*default* in file `limits.yaml` (300m CPU and 200Mi memory).
2. If a min constraint is specified for a resource (100m CPU and 3Mi memory in this case), then a
request must be specified for that resource across all containers. Failure to specify a request will
result in a validation error when attempting to create the pod. Note that a default value of request is
set by *defaultRequest* in file `limits.yaml` (200m CPU and 100Mi memory).
3. For any pod, the sum of all containers' memory requests must be >= 6Mi and the sum of all containers'
memory limits must be <= 1Gi; the sum of all containers' CPU requests must be >= 200m and the sum of all
containers' CPU limits must be <= 2.
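
For illustration, a `limits.yaml` consistent with the describe output above might look like the following sketch (reconstructed from that output; the file in the repository may differ in formatting):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mylimits
spec:
  limits:
  - type: Pod
    min:
      cpu: 200m
      memory: 6Mi
    max:
      cpu: "2"
      memory: 1Gi
  - type: Container
    min:
      cpu: 100m
      memory: 3Mi
    max:
      cpu: "2"
      memory: 1Gi
    defaultRequest:        # applied when a container omits its request
      cpu: 200m
      memory: 100Mi
    default:               # applied when a container omits its limit
      cpu: 300m
      memory: 200Mi
```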

Step 3: Enforcing limits at point of creation
-----------------------------------------
The limits enumerated in a namespace are only enforced when a pod is created or updated in
the cluster. If you change the limits to a different value range, it does not affect pods that
were previously created in the namespace.

If a resource (CPU or memory) is being restricted by a limit, the user will get an error at time
of creation explaining why.

Let's first spin up a replication controller that creates a single container pod to demonstrate
how default values are applied to each pod.

```console
$ kubectl run nginx --image=nginx --replicas=1 --namespace=limit-example
replicationcontroller "nginx" created
$ kubectl get pods --namespace=limit-example
NAME          READY     STATUS    RESTARTS   AGE
nginx-aq0mf   1/1       Running   0          35s
$ kubectl get pods nginx-aq0mf --namespace=limit-example -o yaml | grep resources -C 8
```

```yaml
  resourceVersion: "127"
  selfLink: /api/v1/namespaces/limit-example/pods/nginx-aq0mf
  uid: 51be42a7-7156-11e5-9921-286ed488f785
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 200m
        memory: 100Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
```

Note that our nginx container has picked up the namespace default cpu and memory resource *limits* and *requests*.
|
||||
|
||||
Let's create a pod that exceeds our allowed limits by including a container that requests 3 CPU cores.
|
||||
|
||||
```console
|
||||
$ kubectl create -f docs/admin/limitrange/invalid-pod.yaml --namespace=limit-example
|
||||
Error from server: error when creating "docs/admin/limitrange/invalid-pod.yaml": Pod "invalid-pod" is forbidden: [Maximum cpu usage per Pod is 2, but limit is 3., Maximum cpu usage per Container is 2, but limit is 3.]
|
||||
```
|
||||
|
||||
Let's create a pod that falls within the allowed limit boundaries.
|
||||
|
||||
```console
|
||||
$ kubectl create -f docs/admin/limitrange/valid-pod.yaml --namespace=limit-example
|
||||
pod "valid-pod" created
|
||||
$ kubectl get pods valid-pod --namespace=limit-example -o yaml | grep -C 6 resources
|
||||
```
|
||||
|
||||
```yaml
|
||||
uid: 162a12aa-7157-11e5-9921-286ed488f785
|
||||
spec:
|
||||
containers:
|
||||
- image: gcr.io/google_containers/serve_hostname
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: kubernetes-serve-hostname
|
||||
resources:
|
||||
limits:
|
||||
cpu: "1"
|
||||
memory: 512Mi
|
||||
requests:
|
||||
cpu: "1"
|
||||
memory: 512Mi
|
||||
```
|
||||
|
||||
Note that this pod specifies explicit resource *limits* and *requests* so it did not pick up the namespace
|
||||
default values.
|
||||
|
||||
Note: The *limits* for CPU resource are not enforced in the default Kubernetes setup on the physical node
|
||||
that runs the container unless the administrator deploys the kubelet with the following flag:
|
||||
|
||||
```
|
||||
$ kubelet --help
|
||||
Usage of kubelet
|
||||
....
|
||||
--cpu-cfs-quota[=false]: Enable CPU CFS quota enforcement for containers that specify CPU limits
|
||||
$ kubelet --cpu-cfs-quota=true ...
|
||||
```
|
||||
|
||||
Step 4: Cleanup
|
||||
----------------------------
|
||||
To remove the resources used by this example, you can just delete the limit-example namespace.
|
||||
|
||||
```console
|
||||
$ kubectl delete namespace limit-example
|
||||
namespace "limit-example" deleted
|
||||
$ kubectl get namespaces
|
||||
NAME LABELS STATUS AGE
|
||||
default <none> Active 20m
|
||||
```
|
||||
|
||||
Summary
|
||||
----------------------------
|
||||
Cluster operators that want to restrict the amount of resources a single container or pod may consume
|
||||
are able to define allowable ranges per Kubernetes namespace. In the absence of any explicit assignments,
|
||||
the Kubernetes system is able to apply default resource *limits* and *requests* if desired in order to
|
||||
constrain the amount of resource a pod consumes on a node.
|
||||
|
||||
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/limitrange/README/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -27,98 +27,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Master <-> Node Communication
|
||||
|
||||
**Table of Contents**
|
||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Master <-> Node Communication](#master---node-communication)
|
||||
- [Summary](#summary)
|
||||
- [Cluster -> Master](#cluster---master)
|
||||
- [Master -> Cluster](#master---cluster)
|
||||
- [SSH Tunnels](#ssh-tunnels)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
## Summary
|
||||
|
||||
This document catalogs the communication paths between the master (really the
|
||||
apiserver) and the Kubernetes cluster. The intent is to allow users to
|
||||
customize their installation to harden the network configuration such that
|
||||
the cluster can be run on an untrusted network (or on fully public IPs on a
|
||||
cloud provider).
|
||||
|
||||
## Cluster -> Master
|
||||
|
||||
All communication paths from the cluster to the master terminate at the
|
||||
apiserver (none of the other master components are designed to expose remote
|
||||
services). In a typical deployment, the apiserver is configured to listen for
|
||||
remote connections on a secure HTTPS port (443) with one or more forms of
|
||||
client [authentication](authentication.md) enabled.
|
||||
|
||||
Nodes should be provisioned with the public root certificate for the cluster
|
||||
such that they can connect securely to the apiserver along with valid client
|
||||
credentials. For example, on a default GCE deployment, the client credentials
|
||||
provided to the kubelet are in the form of a client certificate. Pods that
|
||||
wish to connect to the apiserver can do so securely by leveraging a service
|
||||
account so that Kubernetes will automatically inject the public root
|
||||
certificate and a valid bearer token into the pod when it is instantiated.
|
||||
The `kubernetes` service (in all namespaces) is configured with a virtual IP
|
||||
address that is redirected (via kube-proxy) to the HTTPS endpoint on the
|
||||
apiserver.
|
||||
|
||||
The master components communicate with the cluster apiserver over the
|
||||
insecure (not encrypted or authenticated) port. This port is typically only
|
||||
exposed on the localhost interface of the master machine, so that the master
|
||||
components, all running on the same machine, can communicate with the
|
||||
cluster apiserver. Over time, the master components will be migrated to use
|
||||
the secure port with authentication and authorization (see
|
||||
[#13598](https://github.com/kubernetes/kubernetes/issues/13598)).
|
||||
|
||||
As a result, the default operating mode for connections from the cluster
|
||||
(nodes and pods running on the nodes) to the master is secured by default
|
||||
and can run over untrusted and/or public networks.
|
||||
|
||||
## Master -> Cluster
|
||||
|
||||
There are two primary communication paths from the master (apiserver) to the
|
||||
cluster. The first is from the apiserver to the kubelet process which runs on
|
||||
each node in the cluster. The second is from the apiserver to any node, pod,
|
||||
or service through the apiserver's proxy functionality.
|
||||
|
||||
The connections from the apiserver to the kubelet are used for fetching logs
|
||||
for pods, attaching (through kubectl) to running pods, and using the kubelet's
|
||||
port-forwarding functionality. These connections terminate at the kubelet's
|
||||
HTTPS endpoint, which is typically using a self-signed certificate, and
|
||||
ignore the certificate presented by the kubelet (although you can override this
|
||||
behavior by specifying the `--kubelet-certificate-authority`,
|
||||
`--kubelet-client-certificate`, and `--kubelet-client-key` flags when starting
|
||||
the cluster apiserver). By default, these connections **are not currently safe**
|
||||
to run over untrusted and/or public networks as they are subject to
|
||||
man-in-the-middle attacks.
|
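A rough sketch of hardening this path, assuming you have issued a CA and client credentials for the kubelets' serving certificates (the file paths below are placeholders, not defaults):

```sh
# Illustrative only -- substitute the CA and client credentials you actually issued,
# and keep the rest of your normal apiserver flags.
kube-apiserver \
  --kubelet-certificate-authority=/srv/kubernetes/kubelet-ca.crt \
  --kubelet-client-certificate=/srv/kubernetes/apiserver-kubelet-client.crt \
  --kubelet-client-key=/srv/kubernetes/apiserver-kubelet-client.key
```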
||||
|
||||
The connections from the apiserver to a node, pod, or service default to plain
|
||||
HTTP connections and are therefore neither authenticated nor encrypted. They
|
||||
can be run over a secure HTTPS connection by prefixing `https:` to the node,
|
||||
pod, or service name in the API URL, but they will not validate the certificate
|
||||
provided by the HTTPS endpoint nor provide client credentials, so while the
|
||||
connection will be encrypted, it will not provide any guarantees of integrity.
|
||||
These connections **are not currently safe** to run over untrusted and/or
|
||||
public networks.
|
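For illustration, here is roughly what the difference looks like through `kubectl proxy`, assuming a hypothetical pod `my-pod` in the `default` namespace that serves HTTPS on port 443 (the exact proxy URL form may vary by release):

```console
$ kubectl proxy --port=8001 &
# apiserver -> pod hop over plain HTTP (neither authenticated nor encrypted):
$ curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/my-pod/
# apiserver -> pod hop over HTTPS (encrypted, but the pod's certificate is not validated):
$ curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/https:my-pod:443/
```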
||||
|
||||
### SSH Tunnels
|
||||
|
||||
[Google Container Engine](https://cloud.google.com/container-engine/docs/) uses
|
||||
SSH tunnels to protect the Master -> Cluster communication paths. In this
|
||||
configuration, the apiserver initiates an SSH tunnel to each node in the
|
||||
cluster (connecting to the ssh server listening on port 22) and passes all
|
||||
traffic destined for a kubelet, node, pod, or service through the tunnel.
|
||||
This tunnel ensures that the traffic is not exposed outside of the private
|
||||
GCE network in which the cluster is running.
|
||||
|
||||
|
||||
|
||||
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/master-node-communication/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -32,68 +32,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Considerations for running multiple Kubernetes clusters
|
||||
|
||||
You may want to set up multiple Kubernetes clusters, both to
|
||||
have clusters in different regions nearer to your users, and to tolerate failures and/or invasive maintenance.
|
||||
This document describes some of the issues to consider when making a decision about doing so.
|
||||
|
||||
Note that at present,
|
||||
Kubernetes does not offer a mechanism to aggregate multiple clusters into a single virtual cluster. However,
|
||||
we [plan to do this in the future](../proposals/federation.md).
|
||||
|
||||
## Scope of a single cluster
|
||||
|
||||
On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
|
||||
[zone](https://cloud.google.com/compute/docs/zones) or [availability
|
||||
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
|
||||
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:
|
||||
- compared to having a single global Kubernetes cluster, there are fewer single-points of failure
|
||||
- compared to a cluster that spans availability zones, it is easier to reason about the availability properties of a
|
||||
single-zone cluster.
|
||||
- when the Kubernetes developers are designing the system (e.g. making assumptions about latency, bandwidth, or
|
||||
correlated failures) they are assuming all the machines are in a single data center, or otherwise closely connected.
|
||||
|
||||
It is okay to have multiple clusters per availability zone, though on balance we think fewer is better.
|
||||
Reasons to prefer fewer clusters are:
|
||||
- improved bin packing of Pods in some cases with more nodes in one cluster (less resource fragmentation)
|
||||
- reduced operational overhead (though the advantage is diminished as ops tooling and processes matures)
|
||||
- reduced costs for per-cluster fixed resource costs, e.g. apiserver VMs (but small as a percentage
|
||||
of overall cluster cost for medium to large clusters).
|
||||
|
||||
Reasons to have multiple clusters include:
|
||||
- strict security policies requiring isolation of one class of work from another (but, see Partitioning Clusters
|
||||
below).
|
||||
- test clusters to canary new Kubernetes releases or other cluster software.
|
||||
|
||||
## Selecting the right number of clusters
|
||||
|
||||
The selection of the number of Kubernetes clusters may be a relatively static choice, only revisited occasionally.
|
||||
By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to
|
||||
load and growth.
|
||||
|
||||
To pick the number of clusters, first, decide which regions you need to be in to have adequate latency to all your end users, for services that will run
|
||||
on Kubernetes (if you use a Content Distribution Network, the latency requirements for the CDN-hosted content need not
|
||||
be considered). Legal issues might influence this as well. For example, a company with a global customer base might decide to have clusters in US, EU, AP, and SA regions.
|
||||
Call the number of regions to be in `R`.
|
||||
|
||||
Second, decide how many clusters should be able to be unavailable at the same time, while still being available. Call
|
||||
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.
|
||||
|
||||
If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
|
||||
you need at least the larger of `R` or `U + 1` clusters. If it is not (e.g. you want to ensure low latency for all
|
||||
users in the event of a cluster failure), then you need to have `R * (U + 1)` clusters
|
||||
(`U + 1` in each of `R` regions). For example, with `R = 3` regions and a tolerance of `U = 1` unavailable cluster, global failover needs `max(R, U + 1) = 3` clusters, while per-region failover needs `R * (U + 1) = 6`. In any case, try to put each cluster in a different zone.
|
||||
|
||||
Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
|
||||
you may need even more clusters. Kubernetes v1.0 currently supports clusters up to 100 nodes in size, but we are targeting
|
||||
1000-node clusters by early 2016.
|
||||
|
||||
## Working with multiple clusters
|
||||
|
||||
When you have multiple clusters, you would typically create services with the same config in each cluster and put each of those
|
||||
service instances behind a load balancer (AWS Elastic Load Balancer, GCE Forwarding Rule or HTTP Load Balancer) spanning all of them, so that
|
||||
failures of a single cluster are not visible to end users.
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/multi-cluster/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -32,152 +32,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Namespaces
|
||||
|
||||
## Abstract
|
||||
|
||||
A Namespace is a mechanism to partition resources created by users into
|
||||
a logically named group.
|
||||
|
||||
## Motivation
|
||||
|
||||
A single cluster should be able to satisfy the needs of multiple users or groups of users (henceforth a 'user community').
|
||||
|
||||
Each user community wants to be able to work in isolation from other communities.
|
||||
|
||||
Each user community has its own:
|
||||
|
||||
1. resources (pods, services, replication controllers, etc.)
|
||||
2. policies (who can or cannot perform actions in their community)
|
||||
3. constraints (this community is allowed this much quota, etc.)
|
||||
|
||||
A cluster operator may create a Namespace for each unique user community.
|
||||
|
||||
The Namespace provides a unique scope for:
|
||||
|
||||
1. named resources (to avoid basic naming collisions)
|
||||
2. delegated management authority to trusted users
|
||||
3. ability to limit community resource consumption
|
||||
|
||||
## Use cases
|
||||
|
||||
1. As a cluster operator, I want to support multiple user communities on a single cluster.
|
||||
2. As a cluster operator, I want to delegate authority to partitions of the cluster to trusted users
|
||||
in those communities.
|
||||
3. As a cluster operator, I want to limit the amount of resources each community can consume in order
|
||||
to limit the impact to other communities using the cluster.
|
||||
4. As a cluster user, I want to interact with resources that are pertinent to my user community in
|
||||
isolation of what other user communities are doing on the cluster.
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
Look [here](namespaces/) for an in depth example of namespaces.
|
||||
|
||||
### Viewing namespaces
|
||||
|
||||
You can list the current namespaces in a cluster using:
|
||||
|
||||
```console
|
||||
$ kubectl get namespaces
|
||||
NAME LABELS STATUS
|
||||
default <none> Active
|
||||
kube-system <none> Active
|
||||
```
|
||||
|
||||
Kubernetes starts with two initial namespaces:
|
||||
* `default` The default namespace for objects with no other namespace
|
||||
* `kube-system` The namespace for objects created by the Kubernetes system
|
||||
|
||||
You can also get the summary of a specific namespace using:
|
||||
|
||||
```console
|
||||
$ kubectl get namespaces <name>
|
||||
```
|
||||
|
||||
Or you can get detailed information with:
|
||||
|
||||
```console
|
||||
$ kubectl describe namespaces <name>
|
||||
Name: default
|
||||
Labels: <none>
|
||||
Status: Active
|
||||
|
||||
No resource quota.
|
||||
|
||||
Resource Limits
|
||||
Type Resource Min Max Default
|
||||
---- -------- --- --- ---
|
||||
Container cpu - - 100m
|
||||
```
|
||||
|
||||
Note that these details show both resource quota (if present) as well as resource limit ranges.
|
||||
|
||||
Resource quota tracks aggregate usage of resources in the *Namespace* and allows cluster operators
|
||||
to define *Hard* resource usage limits that a *Namespace* may consume.
|
||||
|
||||
A limit range defines min/max constraints on the amount of resources a single entity can consume in
|
||||
a *Namespace*.
|
||||
|
||||
See [Admission control: Limit Range](../design/admission_control_limit_range.md)
|
||||
|
||||
A namespace can be in one of two phases:
|
||||
* `Active` the namespace is in use
|
||||
* `Terminating` the namespace is being deleted, and can not be used for new objects
|
||||
|
||||
See the [design doc](../design/namespaces.md#phases) for more details.
|
||||
|
||||
### Creating a new namespace
|
||||
|
||||
To create a new namespace, first create a new YAML file called `my-namespace.yaml` with the contents:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Namespace
|
||||
metadata:
|
||||
name: <insert-namespace-name-here>
|
||||
```
|
||||
|
||||
Note that the name of your namespace must be a DNS compatible label.
|
||||
|
||||
More information on the `finalizers` field can be found in the namespace [design doc](../design/namespaces.md#finalizers).
|
||||
|
||||
Then run:
|
||||
|
||||
```console
|
||||
$ kubectl create -f ./my-namespace.yaml
|
||||
```
|
||||
|
||||
### Working in namespaces
|
||||
|
||||
See [Setting the namespace for a request](../../docs/user-guide/namespaces.md#setting-the-namespace-for-a-request)
|
||||
and [Setting the namespace preference](../../docs/user-guide/namespaces.md#setting-the-namespace-preference).
|
||||
|
||||
### Deleting a namespace
|
||||
|
||||
You can delete a namespace with
|
||||
|
||||
```console
|
||||
$ kubectl delete namespaces <insert-some-namespace-name>
|
||||
```
|
||||
|
||||
**WARNING, this deletes _everything_ under the namespace!**
|
||||
|
||||
This delete is asynchronous, so for a time you will see the namespace in the `Terminating` state.
|
||||
|
||||
## Namespaces and DNS
|
||||
|
||||
When you create a [Service](../../docs/user-guide/services.md), it creates a corresponding [DNS entry](dns.md).
|
||||
This entry is of the form `<service-name>.<namespace-name>.svc.cluster.local`, which means
|
||||
that if a container just uses `<service-name>` it will resolve to the service which
|
||||
is local to a namespace. This is useful for using the same configuration across
|
||||
multiple namespaces such as Development, Staging and Production. If you want to reach
|
||||
across namespaces, you need to use the fully qualified domain name (FQDN).
|
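For example, a service named `db` in the `production` namespace (names made up for illustration) could be reached like this:

```console
# From a pod in the production namespace, the short name resolves locally:
$ curl http://db/
# From a pod in any other namespace, use the FQDN:
$ curl http://db.production.svc.cluster.local/
```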
||||
|
||||
## Design
|
||||
|
||||
Details of the design of namespaces in Kubernetes, including a [detailed example](../design/namespaces.md#example-openshift-origin-managing-a-kubernetes-namespace)
|
||||
can be found in the [namespaces design doc](../design/namespaces.md)
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/namespaces/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -32,256 +32,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
## Kubernetes Namespaces
|
||||
|
||||
Kubernetes _[namespaces](../../../docs/admin/namespaces.md)_ help different projects, teams, or customers to share a Kubernetes cluster.
|
||||
|
||||
It does this by providing the following:
|
||||
|
||||
1. A scope for [Names](../../user-guide/identifiers.md).
|
||||
2. A mechanism to attach authorization and policy to a subsection of the cluster.
|
||||
|
||||
Use of multiple namespaces is optional.
|
||||
|
||||
This example demonstrates how to use Kubernetes namespaces to subdivide your cluster.
|
||||
|
||||
### Step Zero: Prerequisites
|
||||
|
||||
This example assumes the following:
|
||||
|
||||
1. You have an [existing Kubernetes cluster](../../getting-started-guides/).
|
||||
2. You have a basic understanding of Kubernetes _[pods](../../user-guide/pods.md)_, _[services](../../user-guide/services.md)_, and _[replication controllers](../../user-guide/replication-controller.md)_.
|
||||
|
||||
### Step One: Understand the default namespace
|
||||
|
||||
By default, a Kubernetes cluster will instantiate a default namespace when provisioning the cluster to hold the default set of pods,
|
||||
services, and replication controllers used by the cluster.
|
||||
|
||||
Assuming you have a fresh cluster, you can introspect the available namespaces by doing the following:
|
||||
|
||||
```console
|
||||
$ kubectl get namespaces
|
||||
NAME LABELS
|
||||
default <none>
|
||||
```
|
||||
|
||||
### Step Two: Create new namespaces
|
||||
|
||||
For this exercise, we will create two additional Kubernetes namespaces to hold our content.
|
||||
|
||||
Let's imagine a scenario where an organization is using a shared Kubernetes cluster for development and production use cases.
|
||||
|
||||
The development team would like to maintain a space in the cluster where they can get a view on the list of pods, services, and replication controllers
|
||||
they use to build and run their application. In this space, Kubernetes resources come and go, and the restrictions on who can or cannot modify resources
|
||||
are relaxed to enable agile development.
|
||||
|
||||
The operations team would like to maintain a space in the cluster where they can enforce strict procedures on who can or cannot manipulate the set of
|
||||
pods, services, and replication controllers that run the production site.
|
||||
|
||||
One pattern this organization could follow is to partition the Kubernetes cluster into two namespaces: development and production.
|
||||
|
||||
Let's create two new namespaces to hold our work.
|
||||
|
||||
Use the file [`namespace-dev.json`](namespace-dev.json) which describes a development namespace:
|
||||
|
||||
<!-- BEGIN MUNGE: EXAMPLE namespace-dev.json -->
|
||||
|
||||
```json
|
||||
{
|
||||
"kind": "Namespace",
|
||||
"apiVersion": "v1",
|
||||
"metadata": {
|
||||
"name": "development",
|
||||
"labels": {
|
||||
"name": "development"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
[Download example](namespace-dev.json?raw=true)
|
||||
<!-- END MUNGE: EXAMPLE namespace-dev.json -->
|
||||
|
||||
Create the development namespace using kubectl.
|
||||
|
||||
```console
|
||||
$ kubectl create -f docs/admin/namespaces/namespace-dev.json
|
||||
```
|
||||
|
||||
And then let's create the production namespace using kubectl.
|
||||
|
||||
```console
|
||||
$ kubectl create -f docs/admin/namespaces/namespace-prod.json
|
||||
```
|
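The contents of [`namespace-prod.json`](namespace-prod.json) are not reproduced here; by analogy with `namespace-dev.json`, and matching the `name=production` label that appears in the listing below, it presumably looks like:

```json
{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "production",
    "labels": {
      "name": "production"
    }
  }
}
```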
||||
|
||||
To be sure things are right, let's list all of the namespaces in our cluster.
|
||||
|
||||
```console
|
||||
$ kubectl get namespaces
|
||||
NAME LABELS STATUS
|
||||
default <none> Active
|
||||
development name=development Active
|
||||
production name=production Active
|
||||
```
|
||||
|
||||
|
||||
### Step Three: Create pods in each namespace
|
||||
|
||||
A Kubernetes namespace provides the scope for pods, services, and replication controllers in the cluster.
|
||||
|
||||
Users interacting with one namespace do not see the content in another namespace.
|
||||
|
||||
To demonstrate this, let's spin up a simple replication controller and pod in the development namespace.
|
||||
|
||||
We first check what is the current context:
|
||||
|
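The output shown below is what `kubectl config view` prints; run it yourself to inspect your client configuration (your cluster, user, and credential values will of course differ):

```console
$ kubectl config view
```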
||||
```yaml
|
||||
apiVersion: v1
|
||||
clusters:
|
||||
- cluster:
|
||||
certificate-authority-data: REDACTED
|
||||
server: https://130.211.122.180
|
||||
name: lithe-cocoa-92103_kubernetes
|
||||
contexts:
|
||||
- context:
|
||||
cluster: lithe-cocoa-92103_kubernetes
|
||||
user: lithe-cocoa-92103_kubernetes
|
||||
name: lithe-cocoa-92103_kubernetes
|
||||
current-context: lithe-cocoa-92103_kubernetes
|
||||
kind: Config
|
||||
preferences: {}
|
||||
users:
|
||||
- name: lithe-cocoa-92103_kubernetes
|
||||
user:
|
||||
client-certificate-data: REDACTED
|
||||
client-key-data: REDACTED
|
||||
token: 65rZW78y8HbwXXtSXuUw9DbP4FLjHi4b
|
||||
- name: lithe-cocoa-92103_kubernetes-basic-auth
|
||||
user:
|
||||
password: h5M0FtUUIflBSdI7
|
||||
username: admin
|
||||
```
|
||||
|
||||
The next step is to define a context for the kubectl client to work in each namespace. The values of the "cluster" and "user" fields are copied from the current context.
|
||||
|
||||
```console
|
||||
$ kubectl config set-context dev --namespace=development --cluster=lithe-cocoa-92103_kubernetes --user=lithe-cocoa-92103_kubernetes
|
||||
$ kubectl config set-context prod --namespace=production --cluster=lithe-cocoa-92103_kubernetes --user=lithe-cocoa-92103_kubernetes
|
||||
```
|
||||
|
||||
The above commands provide two request contexts you can switch between, depending on which namespace you
|
||||
wish to work in.
|
||||
|
||||
Let's switch to operate in the development namespace.
|
||||
|
||||
```console
|
||||
$ kubectl config use-context dev
|
||||
```
|
||||
|
||||
You can verify your current context by doing the following:
|
||||
|
||||
```console
|
||||
$ kubectl config view
|
||||
```
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
clusters:
|
||||
- cluster:
|
||||
certificate-authority-data: REDACTED
|
||||
server: https://130.211.122.180
|
||||
name: lithe-cocoa-92103_kubernetes
|
||||
contexts:
|
||||
- context:
|
||||
cluster: lithe-cocoa-92103_kubernetes
|
||||
namespace: development
|
||||
user: lithe-cocoa-92103_kubernetes
|
||||
name: dev
|
||||
- context:
|
||||
cluster: lithe-cocoa-92103_kubernetes
|
||||
user: lithe-cocoa-92103_kubernetes
|
||||
name: lithe-cocoa-92103_kubernetes
|
||||
- context:
|
||||
cluster: lithe-cocoa-92103_kubernetes
|
||||
namespace: production
|
||||
user: lithe-cocoa-92103_kubernetes
|
||||
name: prod
|
||||
current-context: dev
|
||||
kind: Config
|
||||
preferences: {}
|
||||
users:
|
||||
- name: lithe-cocoa-92103_kubernetes
|
||||
user:
|
||||
client-certificate-data: REDACTED
|
||||
client-key-data: REDACTED
|
||||
token: 65rZW78y8HbwXXtSXuUw9DbP4FLjHi4b
|
||||
- name: lithe-cocoa-92103_kubernetes-basic-auth
|
||||
user:
|
||||
password: h5M0FtUUIflBSdI7
|
||||
username: admin
|
||||
```
|
||||
|
||||
At this point, all requests we make to the Kubernetes cluster from the command line are scoped to the development namespace.
|
||||
|
||||
Let's create some content.
|
||||
|
||||
```console
|
||||
$ kubectl run snowflake --image=kubernetes/serve_hostname --replicas=2
|
||||
```
|
||||
|
||||
We have just created a replication controller with a replica count of 2, running a pod called snowflake with a basic container that simply serves the hostname.
|
||||
|
||||
```console
|
||||
$ kubectl get rc
|
||||
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
|
||||
snowflake snowflake kubernetes/serve_hostname run=snowflake 2
|
||||
|
||||
$ kubectl get pods
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
snowflake-8w0qn 1/1 Running 0 22s
|
||||
snowflake-jrpzb 1/1 Running 0 22s
|
||||
```
|
||||
|
||||
And this is great, developers are able to do what they want, and they do not have to worry about affecting content in the production namespace.
|
||||
|
||||
Let's switch to the production namespace and show how resources in one namespace are hidden from the other.
|
||||
|
||||
```console
|
||||
$ kubectl config use-context prod
|
||||
```
|
||||
|
||||
The production namespace should be empty.
|
||||
|
||||
```console
|
||||
$ kubectl get rc
|
||||
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
|
||||
|
||||
$ kubectl get pods
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
```
|
||||
|
||||
Production likes to run cattle, so let's create some cattle pods.
|
||||
|
||||
```console
|
||||
$ kubectl run cattle --image=kubernetes/serve_hostname --replicas=5
|
||||
|
||||
$ kubectl get rc
|
||||
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
|
||||
cattle cattle kubernetes/serve_hostname run=cattle 5
|
||||
|
||||
$ kubectl get pods
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
cattle-97rva 1/1 Running 0 12s
|
||||
cattle-i9ojn 1/1 Running 0 12s
|
||||
cattle-qj3yv 1/1 Running 0 12s
|
||||
cattle-yc7vn 1/1 Running 0 12s
|
||||
cattle-zz7ea 1/1 Running 0 12s
|
||||
```
|
||||
|
||||
At this point, it should be clear that the resources users create in one namespace are hidden from the other namespace.
|
||||
|
||||
As the policy support in Kubernetes evolves, we will extend this scenario to show how you can provide different
|
||||
authorization rules for each namespace.
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/namespaces/README/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -27,44 +27,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Network Plugins
|
||||
|
||||
__Disclaimer__: Network plugins are in alpha, and their contents will change rapidly.
|
||||
|
||||
Network plugins in Kubernetes come in a few flavors:
|
||||
* Plain vanilla exec plugins - deprecated in favor of CNI plugins.
|
||||
* CNI plugins: adhere to the appc/CNI specification, designed for interoperability.
|
||||
* Kubenet plugin: implements basic `cbr0` using the `bridge` and `host-local` CNI plugins
|
||||
|
||||
## Installation
|
||||
|
||||
The kubelet has a single default network plugin, and a default network common to the entire cluster. It probes for plugins when it starts up, remembers what it found, and executes the selected plugin at appropriate times in the pod lifecycle (this is only true for docker, as rkt manages its own CNI plugins). There are two Kubelet command line parameters to keep in mind when using plugins:
|
||||
* `network-plugin-dir`: Kubelet probes this directory for plugins on startup
|
||||
* `network-plugin`: The network plugin to use from `network-plugin-dir`. It must match the name reported by a plugin probed from the plugin directory. For CNI plugins, this is simply "cni".
|
||||
|
||||
## Network Plugin Requirements
|
||||
|
||||
Besides providing the [`NetworkPlugin` interface](../../pkg/kubelet/network/plugins.go) to configure and clean up pod networking, the plugin may also need specific support for kube-proxy. The iptables proxy obviously depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables. For example, if the plugin connects containers to a Linux bridge, the plugin must set the `net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions correctly. If the plugin does not use a Linux bridge (but instead something like Open vSwitch or some other mechanism) it should ensure container traffic is appropriately routed for the proxy.
|
||||
|
||||
By default if no kubelet network plugin is specified, the `noop` plugin is used, which sets `net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like docker with a bridge) work correctly with the iptables proxy.
|
||||
|
||||
### Exec
|
||||
|
||||
Place plugins in `network-plugin-dir/plugin-name/plugin-name`, i.e. if you have a bridge plugin and `network-plugin-dir` is `/usr/lib/kubernetes`, you'd place the bridge plugin executable at `/usr/lib/kubernetes/bridge/bridge`. See [this comment](../../pkg/kubelet/network/exec/exec.go) for more details.
|
||||
|
||||
### CNI
|
||||
|
||||
The CNI plugin is selected by passing Kubelet the `--network-plugin=cni` command-line option. Kubelet reads the first CNI configuration file from `--network-plugin-dir` and uses the CNI configuration from that file to set up each pod's network. The CNI configuration file must match the [CNI specification](https://github.com/appc/cni/blob/master/SPEC.md), and any required CNI plugins referenced by the configuration must be present in `/opt/cni/bin`.
|
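As a rough illustration, a CNI configuration file placed in the directory given by `--network-plugin-dir` might look like the following. The name, bridge, and subnet values are made up, and it assumes the standard `bridge` and `host-local` plugins are installed in `/opt/cni/bin`:

```json
{
  "name": "k8s-pod-network",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.1.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
```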
||||
|
||||
### kubenet
|
||||
|
||||
The Linux-only kubenet plugin provides functionality similar to the `--configure-cbr0` kubelet command-line option. It creates a Linux bridge named `cbr0` and creates a veth pair for each pod with the host end of each pair connected to `cbr0`. The pod end of the pair is assigned an IP address allocated from a range assigned to the node through either configuration or by the controller-manager. `cbr0` is assigned an MTU matching the smallest MTU of an enabled normal interface on the host. The kubenet plugin is currently mutually exclusive with, and will eventually replace, the --configure-cbr0 option. It is also currently incompatible with the flannel experimental overlay.
|
||||
|
||||
The plugin requires a few things (an illustrative kubelet invocation is sketched after this list):
|
||||
* The standard CNI `bridge` and `host-local` plugins to be placed in `/opt/cni/bin`.
|
||||
* Kubelet must be run with the `--network-plugin=kubenet` argument to enable the plugin
|
||||
* Kubelet must also be run with the `--reconcile-cidr` argument to ensure the IP subnet assigned to the node by configuration or the controller-manager is propagated to the plugin
|
||||
* The node must be assigned an IP subnet through either the `--pod-cidr` kubelet command-line option or the `--allocate-node-cidrs=true --cluster-cidr=<cidr>` controller-manager command-line options.
|
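Putting the pieces together, an illustrative kubelet invocation for kubenet might look like this (the CIDR is a placeholder, and your usual kubelet flags still apply):

```sh
# Sketch only -- keep your normal kubelet flags alongside these.
kubelet \
  --network-plugin=kubenet \
  --reconcile-cidr=true \
  --pod-cidr=10.244.1.0/24
```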
||||
This file has moved to: http://kubernetes.github.io/docs/admin/network-plugins/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -32,201 +32,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Networking in Kubernetes
|
||||
|
||||
**Table of Contents**
|
||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Networking in Kubernetes](#networking-in-kubernetes)
|
||||
- [Summary](#summary)
|
||||
- [Docker model](#docker-model)
|
||||
- [Kubernetes model](#kubernetes-model)
|
||||
- [How to achieve this](#how-to-achieve-this)
|
||||
- [Google Compute Engine (GCE)](#google-compute-engine-gce)
|
||||
- [L2 networks and linux bridging](#l2-networks-and-linux-bridging)
|
||||
- [Flannel](#flannel)
|
||||
- [OpenVSwitch](#openvswitch)
|
||||
- [Weave](#weave)
|
||||
- [Calico](#calico)
|
||||
- [Other reading](#other-reading)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
Kubernetes approaches networking somewhat differently than Docker does by
|
||||
default. There are 4 distinct networking problems to solve:
|
||||
1. Highly-coupled container-to-container communications: this is solved by
|
||||
[pods](../user-guide/pods.md) and `localhost` communications.
|
||||
2. Pod-to-Pod communications: this is the primary focus of this document.
|
||||
3. Pod-to-Service communications: this is covered by [services](../user-guide/services.md).
|
||||
4. External-to-Service communications: this is covered by [services](../user-guide/services.md).
|
||||
|
||||
## Summary
|
||||
|
||||
Kubernetes assumes that pods can communicate with other pods, regardless of
|
||||
which host they land on. We give every pod its own IP address so you do not
|
||||
need to explicitly create links between pods and you almost never need to deal
|
||||
with mapping container ports to host ports. This creates a clean,
|
||||
backwards-compatible model where pods can be treated much like VMs or physical
|
||||
hosts from the perspectives of port allocation, naming, service discovery, load
|
||||
balancing, application configuration, and migration.
|
||||
|
||||
To achieve this we must impose some requirements on how you set up your cluster
|
||||
networking.
|
||||
|
||||
## Docker model
|
||||
|
||||
Before discussing the Kubernetes approach to networking, it is worthwhile to
|
||||
review the "normal" way that networking works with Docker. By default, Docker
|
||||
uses host-private networking. It creates a virtual bridge, called `docker0` by
|
||||
default, and allocates a subnet from one of the private address blocks defined
|
||||
in [RFC1918](https://tools.ietf.org/html/rfc1918) for that bridge. For each
|
||||
container that Docker creates, it allocates a virtual ethernet device (called
|
||||
`veth`) which is attached to the bridge. The veth is mapped to appear as `eth0`
|
||||
in the container, using Linux namespaces. The in-container `eth0` interface is
|
||||
given an IP address from the bridge's address range.
|
||||
|
||||
The result is that Docker containers can talk to other containers only if they
|
||||
are on the same machine (and thus the same virtual bridge). Containers on
|
||||
different machines can not reach each other - in fact they may end up with the
|
||||
exact same network ranges and IP addresses.
|
||||
|
||||
In order for Docker containers to communicate across nodes, they must be
|
||||
allocated ports on the machine's own IP address, which are then forwarded or
|
||||
proxied to the containers. This obviously means that containers must either
|
||||
coordinate which ports they use very carefully or else be allocated ports
|
||||
dynamically.
|
||||
|
||||
## Kubernetes model
|
||||
|
||||
Coordinating ports across multiple developers is very difficult to do at
|
||||
scale and exposes users to cluster-level issues outside of their control.
|
||||
Dynamic port allocation brings a lot of complications to the system - every
|
||||
application has to take ports as flags, the API servers have to know how to
|
||||
insert dynamic port numbers into configuration blocks, services have to know
|
||||
how to find each other, etc. Rather than deal with this, Kubernetes takes a
|
||||
different approach.
|
||||
|
||||
Kubernetes imposes the following fundamental requirements on any networking
|
||||
implementation (barring any intentional network segmentation policies):
|
||||
* all containers can communicate with all other containers without NAT
|
||||
* all nodes can communicate with all containers (and vice-versa) without NAT
|
||||
* the IP that a container sees itself as is the same IP that others see it as
|
||||
|
||||
What this means in practice is that you can not just take two computers
|
||||
running Docker and expect Kubernetes to work. You must ensure that the
|
||||
fundamental requirements are met.
|
||||
|
||||
This model is not only less complex overall, but it is principally compatible
|
||||
with the desire for Kubernetes to enable low-friction porting of apps from VMs
|
||||
to containers. If your job previously ran in a VM, your VM had an IP and could
|
||||
talk to other VMs in your project. This is the same basic model.
|
||||
|
||||
Until now this document has talked about containers. In reality, Kubernetes
|
||||
applies IP addresses at the `Pod` scope - containers within a `Pod` share their
|
||||
network namespaces - including their IP address. This means that containers
|
||||
within a `Pod` can all reach each other’s ports on `localhost`. This does imply
|
||||
that containers within a `Pod` must coordinate port usage, but this is no
|
||||
different than processes in a VM. We call this the "IP-per-pod" model. This
|
||||
is implemented in Docker as a "pod container" which holds the network namespace
|
||||
open while "app containers" (the things the user specified) join that namespace
|
||||
with Docker's `--net=container:<id>` function.
|
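To make the mechanism concrete, here is a rough sketch of the same pattern with plain Docker commands (image names are illustrative; in a real cluster the kubelet does this for you):

```sh
# Start the "pod container" that owns the shared network namespace.
docker run -d --name pod-infra gcr.io/google_containers/pause:0.8.0
# App containers join that namespace, so they share one IP and can talk over localhost.
docker run -d --name app-a --net=container:pod-infra nginx
docker run -d --name app-b --net=container:pod-infra redis
```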
||||
|
||||
As with Docker, it is possible to request host ports, but this is reduced to a
|
||||
very niche operation. In this case a port will be allocated on the host `Node`
|
||||
and traffic will be forwarded to the `Pod`. The `Pod` itself is blind to the
|
||||
existence or non-existence of host ports.
|
||||
|
||||
## How to achieve this
|
||||
|
||||
There are a number of ways that this network model can be implemented. This
|
||||
document is not an exhaustive study of the various methods, but hopefully serves
|
||||
as an introduction to various technologies and serves as a jumping-off point.
|
||||
If some techniques become vastly preferable to others, we might detail them more
|
||||
here.
|
||||
|
||||
### Google Compute Engine (GCE)
|
||||
|
||||
For the Google Compute Engine cluster configuration scripts, we use [advanced
|
||||
routing](https://developers.google.com/compute/docs/networking#routing) to
|
||||
assign each VM a subnet (default is `/24` - 254 IPs). Any traffic bound for that
|
||||
subnet will be routed directly to the VM by the GCE network fabric. This is in
|
||||
addition to the "main" IP address assigned to the VM, which is NAT'ed for
|
||||
outbound internet access. A linux bridge (called `cbr0`) is configured to exist
|
||||
on that subnet, and is passed to docker's `--bridge` flag.
|
||||
|
||||
We start Docker with:
|
||||
|
||||
```sh
|
||||
DOCKER_OPTS="--bridge=cbr0 --iptables=false --ip-masq=false"
|
||||
```
|
||||
|
||||
This bridge is created by Kubelet (controlled by the `--configure-cbr0=true`
|
||||
flag) according to the `Node`'s `spec.podCIDR`.
|
||||
|
||||
Docker will now allocate IPs from the `cbr-cidr` block. Containers can reach
|
||||
each other and `Nodes` over the `cbr0` bridge. Those IPs are all routable
|
||||
within the GCE project network.
|
||||
|
||||
GCE itself does not know anything about these IPs, though, so it will not NAT
|
||||
them for outbound internet traffic. To achieve that we use an iptables rule to
|
||||
masquerade (aka SNAT - to make it seem as if packets came from the `Node`
|
||||
itself) traffic that is bound for IPs outside the GCE project network
|
||||
(10.0.0.0/8).
|
||||
|
||||
```sh
|
||||
iptables -t nat -A POSTROUTING ! -d 10.0.0.0/8 -o eth0 -j MASQUERADE
|
||||
```
|
||||
|
||||
Lastly we enable IP forwarding in the kernel (so the kernel will process
|
||||
packets for bridged containers):
|
||||
|
||||
```sh
|
||||
sysctl net.ipv4.ip_forward=1
|
||||
```
|
||||
|
||||
The result of all this is that all `Pods` can reach each other and can egress
|
||||
traffic to the internet.
|
||||
|
||||
### L2 networks and linux bridging
|
||||
|
||||
If you have a "dumb" L2 network, such as a simple switch in a "bare-metal"
|
||||
environment, you should be able to do something similar to the above GCE setup.
|
||||
Note that these instructions have only been tried very casually - it seems to
|
||||
work, but has not been thoroughly tested. If you use this technique and
|
||||
perfect the process, please let us know.
|
||||
|
||||
Follow the "With Linux Bridge devices" section of [this very nice
|
||||
tutorial](http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/) from
|
||||
Lars Kellogg-Stedman.
|
||||
|
||||
### Flannel
|
||||
|
||||
[Flannel](https://github.com/coreos/flannel#flannel) is a very simple overlay
|
||||
network that satisfies the Kubernetes requirements. It installs in minutes and
|
||||
should get you up and running if the above techniques are not working. Many
|
||||
people have reported success with Flannel and Kubernetes.
|
||||
|
||||
### OpenVSwitch
|
||||
|
||||
[OpenVSwitch](ovs-networking.md) is a somewhat more mature but also
|
||||
complicated way to build an overlay network. This is endorsed by several of the
|
||||
"Big Shops" for networking.
|
||||
|
||||
### Weave
|
||||
|
||||
[Weave](https://github.com/zettio/weave) is yet another way to build an overlay
|
||||
network, primarily aiming at Docker integration.
|
||||
|
||||
### Calico
|
||||
|
||||
[Calico](https://github.com/projectcalico/calico-containers) uses BGP to enable real container
|
||||
IPs.
|
||||
|
||||
## Other reading
|
||||
|
||||
The early design of the networking model and its rationale, and some future
|
||||
plans are described in more detail in the [networking design
|
||||
document](../design/networking.md).
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/networking/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -32,239 +32,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Node
|
||||
|
||||
**Table of Contents**
|
||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Node](#node)
|
||||
- [What is a node?](#what-is-a-node)
|
||||
- [Node Status](#node-status)
|
||||
- [Node Addresses](#node-addresses)
|
||||
- [Node Phase](#node-phase)
|
||||
- [Node Condition](#node-condition)
|
||||
- [Node Capacity](#node-capacity)
|
||||
- [Node Info](#node-info)
|
||||
- [Node Management](#node-management)
|
||||
- [Node Controller](#node-controller)
|
||||
- [Self-Registration of Nodes](#self-registration-of-nodes)
|
||||
- [Manual Node Administration](#manual-node-administration)
|
||||
- [Node capacity](#node-capacity)
|
||||
- [API Object](#api-object)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
## What is a node?
|
||||
|
||||
`Node` is a worker machine in Kubernetes, previously known as `Minion`. Node
|
||||
may be a VM or physical machine, depending on the cluster. Each node has
|
||||
the services necessary to run [Pods](../user-guide/pods.md) and is managed by the master
|
||||
components. The services on a node include docker, kubelet and network proxy. See
|
||||
[The Kubernetes Node](../design/architecture.md#the-kubernetes-node) section in the
|
||||
architecture design doc for more details.
|
||||
|
||||
## Node Status
|
||||
|
||||
Node status describes the current status of a node. For now, there are the following
|
||||
pieces of information:
|
||||
|
||||
### Node Addresses
|
||||
|
||||
The usage of these fields varies depending on your cloud provider or bare metal configuration.
|
||||
|
||||
* HostName: Generally not used
|
||||
|
||||
* ExternalIP: Generally the IP address of the node that is externally routable (available from outside the cluster)
|
||||
|
||||
* InternalIP: Generally the IP address of the node that is routable only within the cluster
|
||||
|
||||
|
||||
### Node Phase
|
||||
|
||||
Node Phase is the current lifecycle phase of a node, one of `Pending`,
|
||||
`Running` and `Terminated`.
|
||||
|
||||
* Pending: New nodes are created in this state. A node stays in this state until it is configured.
|
||||
|
||||
* Running: Node has been configured and the Kubernetes components are running
|
||||
|
||||
* Terminated: Node has been removed from the cluster. It will not receive any scheduling requests,
|
||||
and any running pods will be removed from the node.
|
||||
|
||||
A node in the `Running` phase is a necessary but not sufficient requirement for
|
||||
scheduling Pods. For a node to be considered a scheduling candidate, it
|
||||
must have the appropriate conditions; see below.
|
||||
|
||||
### Node Condition
|
||||
|
||||
Node Condition describes the conditions of `Running` nodes. Currently the only
|
||||
node condition is Ready. The Status of this condition can be True, False, or
|
||||
Unknown. True means the Kubelet is healthy and ready to accept pods.
|
||||
False means the Kubelet is not healthy and is not accepting pods. Unknown
|
||||
means the Node Controller, which manages node lifecycle and is responsible for
|
||||
setting the Status of the condition, has not heard from the
|
||||
node recently (currently 40 seconds).
|
||||
Node conditions are represented as JSON objects. For example,
|
||||
the following conditions mean the node is in a sane state:
|
||||
|
||||
```json
|
||||
"conditions": [
|
||||
{
|
||||
"kind": "Ready",
|
||||
"status": "True",
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
If the Status of the Ready condition
|
||||
is Unknown or False for more than five minutes, then all of the Pods on the node are terminated by the Node Controller.
|
||||
|
||||
### Node Capacity
|
||||
|
||||
Describes the resources available on the node: CPUs, memory and the maximum
|
||||
number of pods that can be scheduled onto the node.
|
||||
|
||||
### Node Info
|
||||
|
||||
General information about the node, for instance kernel version, Kubernetes version
|
||||
(kubelet version, kube-proxy version), docker version (if used), OS name.
|
||||
The information is gathered by Kubelet from the node.
|
||||
|
||||
## Node Management
|
||||
|
||||
Unlike [Pods](../user-guide/pods.md) and [Services](../user-guide/services.md), a Node is not inherently
|
||||
created by Kubernetes: it is either taken from cloud providers like Google Compute Engine,
|
||||
or from your pool of physical or virtual machines. What this means is that when
|
||||
Kubernetes creates a node, it is really just creating an object that represents the node in its internal state.
|
||||
After creation, Kubernetes will check whether the node is valid or not.
|
||||
For example, if you try to create a node from the following content:
|
||||
|
||||
```json
|
||||
{
|
||||
"kind": "Node",
|
||||
"apiVersion": "v1",
|
||||
"metadata": {
|
||||
"name": "10.240.79.157",
|
||||
"labels": {
|
||||
"name": "my-first-k8s-node"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Kubernetes will create a Node object internally (the representation), and
|
||||
validate the node by health checking based on the `metadata.name` field: we
|
||||
assume `metadata.name` can be resolved. If the node is valid, i.e. all necessary
|
||||
services are running, it is eligible to run a Pod; otherwise, it will be
|
||||
ignored for any cluster activity, until it becomes valid. Note that Kubernetes
|
||||
will keep the object for the invalid node unless it is explicitly deleted by the client, and it will keep
|
||||
checking to see if it becomes valid.
|
||||
|
||||
Currently, there are three components that interact with the Kubernetes node interface: Node Controller, Kubelet, and kubectl.
|
||||
|
||||
### Node Controller
|
||||
|
||||
Node controller is a component in Kubernetes master which manages Node
|
||||
objects. It performs two major functions: cluster-wide node synchronization
|
||||
and single node life-cycle management.
|
||||
|
||||
Node controller has a sync loop that deletes Nodes from Kubernetes
|
||||
based on all matching VM instances listed from the cloud provider. The sync period
|
||||
can be controlled via flag `--node-sync-period`. If a new VM instance
|
||||
gets created, Node Controller creates a representation for it. If an existing
|
||||
instance gets deleted, Node Controller deletes the representation. Note however,
|
||||
that Node Controller is unable to provision the node for you, i.e. it won't install
|
||||
any binary; therefore, to
|
||||
join a node to a Kubernetes cluster, you as an admin need to make sure proper services are
|
||||
running in the node. In the future, we plan to automatically provision some node
|
||||
services.
|
||||
|
||||
In general, node controller is responsible for updating the NodeReady condition of node
|
||||
status to ConditionUnknown when a node becomes unreachable (e.g. due to the node being down),
|
||||
and then later evicting all the pods from the node (using graceful termination) if the node
|
||||
continues to be unreachable. (The current timeouts for those are 40s and 5m, respectively.)
|
||||
It also allocates CIDR blocks to the new nodes.
|
||||
|
||||
### Self-Registration of Nodes
|
||||
|
||||
When kubelet flag `--register-node` is true (the default), the kubelet will attempt to
|
||||
register itself with the API server. This is the preferred pattern, used by most distros.
|
||||
|
||||
For self-registration, the kubelet is started with the following options:
|
||||
- `--api-servers=` tells the kubelet the location of the apiserver.
|
||||
- `--kubeconfig` tells kubelet where to find credentials to authenticate itself to the apiserver.
|
||||
- `--cloud-provider=` tells the kubelet how to talk to a cloud provider to read metadata about itself.
|
||||
- `--register-node` tells the kubelet to create its own node resource.
|
||||
|
||||
Currently, any kubelet is authorized to create/modify any node resource, but in practice it only creates/modifies
|
||||
its own. (In the future, we plan to limit authorization to only allow a kubelet to modify its own Node resource.)
|
||||
|
||||
#### Manual Node Administration
|
||||
|
||||
A cluster administrator can create and modify Node objects.
|
||||
|
||||
If the administrator wishes to create node objects manually, set kubelet flag
|
||||
`--register-node=false`.
|
||||
|
||||
The administrator can modify Node resources (regardless of the setting of `--register-node`).
|
||||
Modifications include setting labels on the Node, and marking it unschedulable.
|
||||
|
||||
Labels on nodes can be used in conjunction with node selectors on pods to control scheduling,
|
||||
e.g. to constrain a Pod to only be eligible to run on a subset of the nodes.
|
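For example, you might label a node and then constrain a pod to it with a `nodeSelector`; the label key/value and names here are made up:

```yaml
# First label the node: kubectl label nodes 10.240.79.157 disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: ssd-only-pod
spec:
  nodeSelector:
    disktype: ssd        # only schedule onto nodes carrying this label
  containers:
  - name: app
    image: nginx
```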
||||
|
||||
Making a node unschedulable will prevent new pods from being scheduled to that
|
||||
node, but will not affect any existing pods on the node. This is useful as a
|
||||
preparatory step before a node reboot, etc. For example, to mark a node
|
||||
unschedulable, run this command:
|
||||
|
||||
```sh
|
||||
kubectl patch nodes $NODENAME -p '{"spec": {"unschedulable": true}}'
|
||||
```
|
||||
|
||||
Note that pods which are created by a daemonSet controller bypass the Kubernetes scheduler,
|
||||
and do not respect the unschedulable attribute on a node. The assumption is that daemons belong on
|
||||
the machine even if it is being drained of applications in preparation for a reboot.
|
||||
|
||||
### Node capacity
|
||||
|
||||
The capacity of the node (number of cpus and amount of memory) is part of the node resource.
|
||||
Normally, nodes register themselves and report their capacity when creating the node resource. If
|
||||
you are doing [manual node administration](#manual-node-administration), then you need to set node
|
||||
capacity when adding a node.
|
||||
|
||||
The Kubernetes scheduler ensures that there are enough resources for all the pods on a node. It
|
||||
checks that the sum of the limits of containers on the node is no greater than the node capacity. It
|
||||
includes all containers started by kubelet, but not containers started directly by docker, nor
|
||||
processes not in containers.
|
||||
|
||||
If you want to explicitly reserve resources for non-Pod processes, you can create a placeholder
|
||||
pod. Use the following template:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: resource-reserver
|
||||
spec:
|
||||
containers:
|
||||
- name: sleep-forever
|
||||
image: gcr.io/google_containers/pause:0.8.0
|
||||
resources:
|
||||
limits:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
```
|
||||
|
||||
Set the `cpu` and `memory` values to the amount of resources you want to reserve.
|
||||
Place the file in the manifest directory (`--config=DIR` flag of kubelet). Do this
|
||||
on each kubelet where you want to reserve resources.
|
||||
|
||||
|
||||
## API Object
|
||||
|
||||
Node is a top-level resource in the kubernetes REST API. More details about the
|
||||
API object can be found at: [Node API
|
||||
object](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html#_v1_node).
|
||||
This file has moved to: http://kubernetes.github.io/docs/admin/node/
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
@@ -32,20 +32,7 @@ Documentation for other releases can be found at
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Kubernetes OpenVSwitch GRE/VxLAN networking
|
||||
|
||||
This document describes how OpenVSwitch is used to set up networking between pods across nodes.
|
||||
The tunnel type could be GRE or VxLAN. VxLAN is preferable when large scale isolation needs to be performed within the network.
|
||||
|
||||

|
||||
|
||||
The vagrant setup in Kubernetes does the following:
|
||||
|
||||
The docker bridge is replaced with a brctl generated linux bridge (kbr0) with a 256 address space subnet. Basically, a node gets 10.244.x.0/24 subnet and docker is configured to use that bridge instead of the default docker0 bridge.
|
||||
|
||||
Also, an OVS bridge is created(obr0) and added as a port to the kbr0 bridge. All OVS bridges across all nodes are linked with GRE tunnels. So, each node has an outgoing GRE tunnel to all other nodes. It does not need to be a complete mesh really, just meshier the better. STP (spanning tree) mode is enabled in the bridges to prevent loops.
|
||||
|
||||
Routing rules enable any 10.244.0.0/16 target to become reachable via the OVS bridge connected with the tunnels.
|
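
As a rough sketch of what this amounts to on one node (the peer IP, the 10.244.1.0/24 node subnet, and the port name are illustrative; these are not the exact provisioning commands):

```sh
brctl addbr kbr0                     # Linux bridge that replaces docker0
ip addr add 10.244.1.1/24 dev kbr0   # this node's pod subnet
ovs-vsctl add-br obr0                # create the OVS bridge
brctl addif kbr0 obr0                # attach the OVS bridge as a port of kbr0
# one GRE tunnel per peer node; repeat with each peer's IP
ovs-vsctl add-port obr0 gre1 -- set interface gre1 type=gre options:remote_ip=192.168.1.12
ip route add 10.244.0.0/16 dev kbr0  # reach all pod subnets via the bridge
```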

This file has moved to: http://kubernetes.github.io/docs/admin/ovs-networking/


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,156 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Resource Quotas

When several users or teams share a cluster with a fixed number of nodes,
there is a concern that one team could use more than its fair share of resources.

Resource quotas are a tool for administrators to address this concern. Resource quotas
work like this:
- Different teams work in different namespaces. Currently this is voluntary, but
  support for making this mandatory via ACLs is planned.
- The administrator creates a Resource Quota for each namespace.
- Users put compute resource requests on their pods. The sum of all resource requests across
  all pods in the same namespace must not exceed any hard resource limit in any Resource Quota
  document for the namespace. Note that we used to verify Resource Quota by taking the sum of
  resource limits of the pods, but this was altered to use resource requests. Backwards compatibility
  for those pods previously created is preserved because pods that only specify a resource limit have
  their resource requests defaulted to match their defined limits. The user is only charged for the
  resources they request in the Resource Quota versus their limits because the request is the minimum
  amount of resource guaranteed by the cluster during scheduling. For more information on overcommit,
  see [compute-resources](../user-guide/compute-resources.md).
- If creating a pod would cause the namespace to exceed any of the limits specified in
  the Resource Quota for that namespace, then the request will fail with HTTP status
  code `403 FORBIDDEN`.
- If quota is enabled in a namespace and the user does not specify *requests* on the pod for each
  of the resources for which quota is enabled, then the POST of the pod will fail with HTTP
  status code `403 FORBIDDEN`. Hint: Use the LimitRange admission controller to force default
  values of *limits* (then resource *requests* would be equal to *limits* by default, see
  [admission controller](admission-controllers.md)) before the quota is checked to avoid this problem.

Examples of policies that could be created using namespaces and quotas are:
- In a cluster with a capacity of 32 GiB RAM and 16 cores, let team A use 20 GiB and 10 cores,
  let team B use 10 GiB and 4 cores, and hold 2 GiB and 2 cores in reserve for future allocation.
- Limit the "testing" namespace to using 1 core and 1 GiB RAM. Let the "production" namespace
  use any amount.

In the case where the total capacity of the cluster is less than the sum of the quotas of the namespaces,
there may be contention for resources. This is handled on a first-come-first-served basis.

Neither contention nor changes to quota will affect already-running pods.

## Enabling Resource Quota

Resource Quota support is enabled by default for many Kubernetes distributions. It is
enabled when the apiserver `--admission-control=` flag has `ResourceQuota` as
one of its arguments.

Resource Quota is enforced in a particular namespace when there is a
`ResourceQuota` object in that namespace. There should be at most one
`ResourceQuota` object in a namespace.
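
For example, the apiserver might be launched with a flag along these lines (the other plugin names shown are illustrative of a typical list):

```sh
# among the apiserver's other flags:
kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,ResourceQuota
```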

## Compute Resource Quota

The total sum of [compute resources](../user-guide/compute-resources.md) requested by pods
in a namespace can be limited. The following compute resource types are supported:

| ResourceName | Description |
| ------------ | ----------- |
| cpu | Total cpu requests of containers |
| memory | Total memory requests of containers |

For example, `cpu` quota sums up the `resources.requests.cpu` fields of every
container of every pod in the namespace, and enforces a maximum on that sum.

## Object Count Quota

The number of objects of a given type can be restricted. The following types
are supported:

| ResourceName | Description |
| ------------ | ----------- |
| pods | Total number of pods |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of [resource quotas](admission-controllers.md#resourcequota) |
| secrets | Total number of secrets |
| persistentvolumeclaims | Total number of [persistent volume claims](../user-guide/persistent-volumes.md#persistentvolumeclaims) |

For example, `pods` quota counts and enforces a maximum on the number of `pods`
created in a single namespace.

You might want to set a pods quota on a namespace
to avoid the case where a user creates many small pods and exhausts the cluster's
supply of Pod IPs.

## Viewing and Setting Quotas

Kubectl supports creating, updating, and viewing quotas:

```console
$ kubectl namespace myspace
$ cat <<EOF > quota.json
{
  "apiVersion": "v1",
  "kind": "ResourceQuota",
  "metadata": {
    "name": "quota"
  },
  "spec": {
    "hard": {
      "memory": "1Gi",
      "cpu": "20",
      "pods": "10",
      "services": "5",
      "replicationcontrollers": "20",
      "resourcequotas": "1"
    }
  }
}
EOF
$ kubectl create -f ./quota.json
$ kubectl get quota
NAME
quota
$ kubectl describe quota quota
Name:                   quota
Resource                Used   Hard
--------                ----   ----
cpu                     0m     20
memory                  0      1Gi
pods                    5      10
replicationcontrollers  5      20
resourcequotas          1      1
services                3      5
```

## Quota and Cluster Capacity

Resource Quota objects are independent of the cluster capacity. They are
expressed in absolute units. So, if you add nodes to your cluster, this does *not*
automatically give each namespace the ability to consume more resources.

Sometimes more complex policies may be desired, such as:
- proportionally divide total cluster resources among several teams.
- allow each tenant to grow resource usage as needed, but have a generous
  limit to prevent accidental resource exhaustion.
- detect demand from one namespace, add nodes, and increase quota.

Such policies could be implemented using ResourceQuota as a building block, by
writing a 'controller' which watches the quota usage and adjusts the quota
hard limits of each namespace according to other signals.

Note that resource quota divides up aggregate cluster resources, but it creates no
restrictions around nodes: pods from several namespaces may run on the same node.

## Example

See a [detailed example for how to use resource quota](resourcequota/).

## Read More

See the [ResourceQuota design doc](../design/admission_control_resource_quota.md) for more information.

This file has moved to: http://kubernetes.github.io/docs/admin/resource-quota/


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -31,165 +31,8 @@ Documentation for other releases can be found at

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

Resource Quota
========================================
This example demonstrates how [resource quota](../../admin/admission-controllers.md#resourcequota) and
[limit ranger](../../admin/admission-controllers.md#limitranger) can be applied to a Kubernetes namespace.
See [ResourceQuota design doc](../../design/admission_control_resource_quota.md) for more information.

This example assumes you have a functional Kubernetes setup.

Step 1: Create a namespace
-----------------------------------------
This example will work in a custom namespace to demonstrate the concepts involved.

Let's create a new namespace called quota-example:

```console
$ kubectl create -f docs/admin/resourcequota/namespace.yaml
namespace "quota-example" created
$ kubectl get namespaces
NAME            LABELS    STATUS    AGE
default         <none>    Active    2m
quota-example   <none>    Active    39s
```

Step 2: Apply a quota to the namespace
-----------------------------------------
By default, a pod will run with unbounded CPU and memory requests/limits. This means that any pod in the
system will be able to consume as much CPU and memory as is available on the node that executes the pod.

Users may want to restrict how much of the cluster resources a given namespace may consume
across all of its pods in order to manage cluster usage. To do this, a user applies a quota to
a namespace. A quota lets the user set hard limits on the total amount of node resources (cpu, memory)
and API resources (pods, services, etc.) that a namespace may consume. In terms of resources, Kubernetes
checks the total resource *requests*, not resource *limits*, of all containers/pods in the namespace.

Let's create a simple quota in our namespace:

```console
$ kubectl create -f docs/admin/resourcequota/quota.yaml --namespace=quota-example
resourcequota "quota" created
```
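
The `quota.yaml` file itself is not reproduced in this walkthrough; a manifest along the following lines would yield the hard limits shown in the `describe` output below (values inferred from that output, not necessarily the exact repository file):

```console
$ cat <<EOF > quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
spec:
  hard:
    cpu: "20"
    memory: 1Gi
    persistentvolumeclaims: "10"
    pods: "10"
    replicationcontrollers: "20"
    resourcequotas: "1"
    secrets: "10"
    services: "5"
EOF
```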

Once your quota is applied to a namespace, the system will restrict any creation of content
in the namespace until the quota usage has been calculated. This should happen quickly.

You can describe your current quota usage to see what resources are being consumed in your
namespace.

```console
$ kubectl describe quota quota --namespace=quota-example
Name:                   quota
Namespace:              quota-example
Resource                Used    Hard
--------                ----    ----
cpu                     0       20
memory                  0       1Gi
persistentvolumeclaims  0       10
pods                    0       10
replicationcontrollers  0       20
resourcequotas          1       1
secrets                 1       10
services                0       5
```

Step 3: Applying default resource requests and limits
-----------------------------------------
Pod authors rarely specify resource requests and limits for their pods.

Since we applied a quota to our project, let's see what happens when an end-user creates a pod that has unbounded
cpu and memory, by creating an nginx container.

To demonstrate, let's create a replication controller that runs nginx:

```console
$ kubectl run nginx --image=nginx --replicas=1 --namespace=quota-example
replicationcontroller "nginx" created
```

Now let's look at the pods that were created.

```console
$ kubectl get pods --namespace=quota-example
NAME      READY     STATUS    RESTARTS   AGE
```

What happened? I have no pods! Let's describe the replication controller to get a view of what is happening.

```console
$ kubectl describe rc nginx --namespace=quota-example
Name:         nginx
Namespace:    quota-example
Image(s):     nginx
Selector:     run=nginx
Labels:       run=nginx
Replicas:     0 current / 1 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen   LastSeen   Count   From                        SubobjectPath   Type      Reason         Message
  ─────────   ────────   ─────   ────                        ─────────────   ────      ──────         ───────
  12s         12s        2       {replication-controller }                   Warning   FailedCreate   Error creating: Pod "nginx-" is forbidden: memory is limited by quota, must make explicit request.
```

The Kubernetes API server is rejecting the replication controller's requests to create a pod because our pods
do not specify any memory usage *request*.

So let's set some default values for the amount of cpu and memory a pod can consume:

```console
$ kubectl create -f docs/admin/resourcequota/limits.yaml --namespace=quota-example
limitrange "limits" created
$ kubectl describe limits limits --namespace=quota-example
Name:        limits
Namespace:   quota-example
Type         Resource   Min   Max   Default Request   Default Limit   Max Limit/Request Ratio
----         --------   ---   ---   ---------------   -------------   -----------------------
Container    memory     -     -     256Mi             512Mi           -
Container    cpu        -     -     100m              200m            -
```
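
The `limits.yaml` file is likewise not shown here; a LimitRange manifest along these lines would produce the defaults in the output above (values inferred from that output, not necessarily the exact repository file):

```console
$ cat <<EOF > limits.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 256Mi
    default:
      cpu: 200m
      memory: 512Mi
EOF
```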

Now any time a pod is created in this namespace, if it has not specified any resource request/limit, the default
amount of cpu and memory per container will be applied, and the request will be used as part of admission control.

Now that we have applied default resource *requests* for our namespace, our replication controller should be able to
create its pods.

```console
$ kubectl get pods --namespace=quota-example
NAME          READY     STATUS    RESTARTS   AGE
nginx-fca65   1/1       Running   0          1m
```

And if we print out our quota usage in the namespace:

```console
$ kubectl describe quota quota --namespace=quota-example
Name:                   quota
Namespace:              quota-example
Resource                Used        Hard
--------                ----        ----
cpu                     100m        20
memory                  268435456   1Gi
persistentvolumeclaims  0           10
pods                    1           10
replicationcontrollers  1           20
resourcequotas          1           1
secrets                 1           10
services                0           5
```

You can now see that the pod that was created is consuming explicit amounts of resources (specified by resource *request*),
and the usage is being tracked by the Kubernetes system properly.

Summary
----------------------------
Actions that consume node resources for cpu and memory can be subject to hard quota limits defined
by the namespace quota. The resource consumption is measured by resource *request* in the pod specification.

Any action that consumes those resources can be tweaked, or can pick up namespace-level defaults to
meet your end goal.

This file has moved to: http://kubernetes.github.io/docs/admin/resourcequota/README/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,105 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Using Salt to configure Kubernetes

The Kubernetes cluster can be configured using Salt.

The Salt scripts are shared across multiple hosting providers. Depending on where you host your Kubernetes cluster, you may be using different operating systems and different networking configurations, so it's important to understand some background information before making a Salt change, in order to minimize introducing failures for the other hosting providers.

## Salt cluster setup

The **salt-master** service runs on the kubernetes-master [(except on the default GCE setup)](#standalone-salt-configuration-on-gce).

The **salt-minion** service runs on the kubernetes-master and each kubernetes-node in the cluster.

Each salt-minion service is configured to interact with the **salt-master** service hosted on the kubernetes-master via the **master.conf** file [(except on GCE)](#standalone-salt-configuration-on-gce).

```console
[root@kubernetes-master] $ cat /etc/salt/minion.d/master.conf
master: kubernetes-master
```

The salt-master is contacted by each salt-minion and depending upon the machine information presented, the salt-master will provision the machine as either a kubernetes-master or kubernetes-node with all the required capabilities needed to run Kubernetes.

If you are running the Vagrant based environment, the **salt-api** service is running on the kubernetes-master. It is configured to enable the vagrant user to introspect the salt cluster in order to find out about machines in the Vagrant environment via a REST API.

## Standalone Salt Configuration on GCE

On GCE, the master and nodes are all configured as [standalone minions](http://docs.saltstack.com/en/latest/topics/tutorials/standalone_minion.html). The configuration for each VM is derived from the VM's [instance metadata](https://cloud.google.com/compute/docs/metadata) and then stored in Salt grains (`/etc/salt/minion.d/grains.conf`) and pillars (`/srv/salt-overlay/pillar/cluster-params.sls`) that local Salt uses to enforce state.

All remaining sections that refer to master/minion setups should be ignored for GCE. One fallout of the GCE setup is that the Salt mine doesn't exist - there is no sharing of configuration amongst nodes.
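
To see what local Salt derived on such a node, the stored values can be inspected directly (a sketch using the paths named above):

```console
[root@kubernetes-node ~] $ salt-call --local grains.items
[root@kubernetes-node ~] $ cat /srv/salt-overlay/pillar/cluster-params.sls
```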

## Salt security

*(Not applicable on default GCE setup.)*

Security is not enabled on the salt-master, and the salt-master is configured to auto-accept incoming requests from minions. It is not recommended to use this security configuration in production environments without deeper study. (In some environments this isn't as bad as it might sound if the salt master port isn't externally accessible and you trust everyone on your network.)

```console
[root@kubernetes-master] $ cat /etc/salt/master.d/auto-accept.conf
open_mode: True
auto_accept: True
```

## Salt minion configuration

Each minion in the salt cluster has an associated configuration that instructs the salt-master how to provision the required resources on the machine.

An example file is presented below using the Vagrant based environment.

```console
[root@kubernetes-master] $ cat /etc/salt/minion.d/grains.conf
grains:
  etcd_servers: $MASTER_IP
  cloud: vagrant
  roles:
    - kubernetes-master
```

Each hosting environment has a slightly different grains.conf file that is used to build conditional logic where required in the Salt files.

The following enumerates the set of defined key/value pairs that are supported today. If you add new ones, please make sure to update this list.

Key | Value
------------- | -------------
`api_servers` | (Optional) The IP address / host name where a kubelet can get read-only access to kube-apiserver
`cbr-cidr` | (Optional) The minion IP address range used for the docker container bridge.
`cloud` | (Optional) Which IaaS platform is used to host Kubernetes, *gce*, *azure*, *aws*, *vagrant*
`etcd_servers` | (Optional) Comma-delimited list of IP addresses the kube-apiserver and kubelet use to reach etcd. Uses the IP of the first machine in the kubernetes_master role, or 127.0.0.1 on GCE.
`hostnamef` | (Optional) The full host name of the machine, i.e. uname -n
`node_ip` | (Optional) The IP address to use to address this node
`hostname_override` | (Optional) Mapped to the kubelet hostname-override
`network_mode` | (Optional) Networking model to use among nodes: *openvswitch*
`networkInterfaceName` | (Optional) Networking interface to use to bind addresses, default value *eth0*
`publicAddressOverride` | (Optional) The IP address the kube-apiserver should use to bind against for external read-only access
`roles` | (Required) 1. `kubernetes-master` means this machine is the master in the Kubernetes cluster. 2. `kubernetes-pool` means this machine is a kubernetes-node. Depending on the role, the Salt scripts will provision different resources on the machine.

These keys may be leveraged by the Salt sls files to branch behavior.

In addition, a cluster may be running a Debian based operating system or Red Hat based operating system (CentOS, Fedora, RHEL, etc.). As a result, it's important to sometimes distinguish behavior based on operating system using if branches like the following.

```jinja
{% if grains['os_family'] == 'RedHat' %}
// something specific to a RedHat environment (CentOS, Fedora, RHEL) where you may use yum, systemd, etc.
{% else %}
// something specific to a Debian environment (apt-get, initd)
{% endif %}
```

## Best Practices

1. When configuring default arguments for processes, it's best to avoid the use of EnvironmentFiles (systemd in Red Hat environments) or init.d files (Debian distributions) to hold default values that should be common across operating system environments. This helps keep our Salt template files easy to understand for editors who may not be familiar with the particulars of each distribution.

## Future enhancements (Networking)

Per-pod IP configuration is provider-specific, so when making networking changes, it's important to sandbox these as all providers may not use the same mechanisms (iptables, openvswitch, etc.).

We should define a grains.conf key that captures more specifically what network configuration environment is being used to avoid future confusion across providers.

## Further reading

The [cluster/saltbase](http://releases.k8s.io/HEAD/cluster/saltbase/) tree has more details on the current SaltStack configuration.

This file has moved to: http://kubernetes.github.io/docs/admin/salt/

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,98 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Cluster Admin Guide to Service Accounts

*This is a Cluster Administrator guide to service accounts. It assumes knowledge of
the [User Guide to Service Accounts](../user-guide/service-accounts.md).*

*Support for authorization and user accounts is planned but incomplete. Sometimes
incomplete features are referred to in order to better describe service accounts.*

## User accounts vs service accounts

Kubernetes distinguishes between the concept of a user account and a service account
for a number of reasons:
- User accounts are for humans. Service accounts are for processes, which
  run in pods.
- User accounts are intended to be global. Names must be unique across all
  namespaces of a cluster, and a future user resource will not be namespaced.
  Service accounts are namespaced.
- Typically, a cluster's user accounts might be synced from a corporate
  database, where new user account creation requires special privileges and
  is tied to complex business processes. Service account creation is intended
  to be more lightweight, allowing cluster users to create service accounts for
  specific tasks (i.e., the principle of least privilege).
- Auditing considerations for humans and service accounts may differ.
- A config bundle for a complex system may include definitions of various service
  accounts for components of that system. Because service accounts can be created
  ad-hoc and have namespaced names, such config is portable.

## Service account automation

Three separate components cooperate to implement the automation around service accounts:
- A Service account admission controller
- A Token controller
- A Service account controller

### Service Account Admission Controller

The modification of pods is implemented via a plugin
called an [Admission Controller](admission-controllers.md). It is part of the apiserver.
It acts synchronously to modify pods as they are created or updated. When this plugin is active
(and it is by default on most distributions), then it does the following when a pod is created or modified:
1. If the pod does not have a `ServiceAccount` set, it sets the `ServiceAccount` to `default`.
2. It ensures that the `ServiceAccount` referenced by the pod exists, and otherwise rejects it.
3. If the pod does not contain any `ImagePullSecrets`, then `ImagePullSecrets` of the
   `ServiceAccount` are added to the pod.
4. It adds a `volume` to the pod which contains a token for API access.
5. It adds a `volumeSource` to each container of the pod, mounted at `/var/run/secrets/kubernetes.io/serviceaccount`.

### Token Controller

TokenController runs as part of controller-manager. It acts asynchronously. It:
- observes serviceAccount creation and creates a corresponding Secret to allow API access.
- observes serviceAccount deletion and deletes all corresponding ServiceAccountToken Secrets.
- observes secret addition, and ensures the referenced ServiceAccount exists, and adds a token to the secret if needed.
- observes secret deletion and removes a reference from the corresponding ServiceAccount if needed.

#### To create additional API tokens

A controller loop ensures a secret with an API token exists for each service
account. To create additional API tokens for a service account, create a secret
of type `ServiceAccountToken` with an annotation referencing the service
account, and the controller will update it with a generated token. For example,
in a file `secret.json`:

```json
{
  "kind": "Secret",
  "apiVersion": "v1",
  "metadata": {
    "name": "mysecretname",
    "annotations": {
      "kubernetes.io/service-account.name": "myserviceaccount"
    }
  },
  "type": "kubernetes.io/service-account-token"
}
```

```sh
kubectl create -f ./secret.json
kubectl describe secret mysecretname
```

#### To delete/invalidate a service account token

```sh
kubectl delete secret mysecretname
```

### Service Account Controller

Service Account Controller manages ServiceAccounts inside namespaces, and ensures
a ServiceAccount named "default" exists in every active namespace.
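
For example, you can see the default account in any namespace (the output columns are illustrative):

```console
$ kubectl get serviceaccounts
NAME      SECRETS
default   1
```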

This file has moved to: http://kubernetes.github.io/docs/admin/service-accounts-admin/


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

@@ -32,131 +32,7 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Static pods

**If you are running clustered Kubernetes and are using static pods to run a pod on every node, you should probably be using a [DaemonSet](daemons.md)!**

*Static pods* are managed directly by the kubelet daemon on a specific node, without the API server observing them. They are not associated with any replication controller; the kubelet daemon itself watches them and restarts them when they crash. There is no health check, though. Static pods are always bound to one kubelet daemon and always run on the same node as it.

The kubelet automatically creates a so-called *mirror pod* on the Kubernetes API server for each static pod, so the pods are visible there, but they cannot be controlled from the API server.

## Static pod creation

Static pods can be created in two ways: either by using configuration file(s) or by HTTP.

### Configuration files

The configuration files are just standard pod definitions in json or yaml format in a specific directory. Use `kubelet --config=<the directory>` to start the kubelet daemon, which periodically scans the directory and creates/deletes static pods as yaml/json files appear/disappear there.

For example, this is how to start a simple web server as a static pod:

1. Choose a node where we want to run the static pod. In this example, it's `my-node1`.

    ```console
    [joe@host ~] $ ssh my-node1
    ```

2. Choose a directory, say `/etc/kubelet.d`, and place a web server pod definition there, e.g. `/etc/kubelet.d/static-web.yaml`:

    ```console
    [root@my-node1 ~] $ mkdir /etc/kubelet.d/
    [root@my-node1 ~] $ cat <<EOF >/etc/kubelet.d/static-web.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: static-web
      labels:
        role: myrole
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - name: web
              containerPort: 80
              protocol: TCP
    EOF
    ```

3. Configure your kubelet daemon on the node to use this directory by running it with the `--config=/etc/kubelet.d/` argument. On Fedora 21 with Kubernetes 0.17, edit `/etc/kubernetes/kubelet` to include this line:

    ```
    KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kube.local --config=/etc/kubelet.d/"
    ```

    Instructions for other distributions or Kubernetes installations may vary.

4. Restart kubelet. On Fedora 21, this is:

    ```console
    [root@my-node1 ~] $ systemctl restart kubelet
    ```

## Pods created via HTTP

The kubelet periodically downloads a file specified by the `--manifest-url=<URL>` argument and interprets it as a json/yaml file with a pod definition. It works the same as `--config=<directory>`, i.e. it's reloaded every now and then and changes are applied to running static pods (see below).
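
For example, on a node configured as in the previous section, the kubelet could be pointed at a URL instead of a directory (the URL below is illustrative):

```
KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kube.local --manifest-url=http://my-config-server.example/static-web.yaml"
```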

## Behavior of static pods

When the kubelet starts, it automatically starts all pods defined in the directory specified by the `--config=` or `--manifest-url=` argument, i.e. our static-web. (It may take some time to pull the nginx image, be patient…):

```console
[joe@my-node1 ~] $ docker ps
CONTAINER ID   IMAGE          COMMAND   CREATED         STATUS         NAMES
f6d05272b57e   nginx:latest   "nginx"   8 minutes ago   Up 8 minutes   k8s_web.6f802af4_static-web-fk-node1_default_67e24ed9466ba55986d120c867395f3c_378e5f3c
```

If we look at our Kubernetes API server (running on host `my-master`), we see that a new mirror pod was created there too:

```console
[joe@host ~] $ ssh my-master
[joe@my-master ~] $ kubectl get pods
POD                   IP           CONTAINER(S)   IMAGE(S)   HOST                      LABELS        STATUS    CREATED      MESSAGE
static-web-my-node1   172.17.0.3                             my-node1/192.168.100.71   role=myrole   Running   11 minutes
                                   web            nginx                                              Running   11 minutes
```

Labels from the static pod are propagated into the mirror pod and can be used as usual for filtering.

Notice that we cannot delete the pod via the API server (e.g. with the [`kubectl`](../user-guide/kubectl/kubectl.md) command); the kubelet simply won't remove it.

```console
[joe@my-master ~] $ kubectl delete pod static-web-my-node1
pods/static-web-my-node1
[joe@my-master ~] $ kubectl get pods
POD                   IP           CONTAINER(S)   IMAGE(S)   HOST                      ...
static-web-my-node1   172.17.0.3                             my-node1/192.168.100.71   ...
```

Back on our `my-node1` host, we can try to stop the container manually and see that the kubelet automatically restarts it in a while:

```console
[joe@host ~] $ ssh my-node1
[joe@my-node1 ~] $ docker stop f6d05272b57e
[joe@my-node1 ~] $ sleep 20
[joe@my-node1 ~] $ docker ps
CONTAINER ID        IMAGE          COMMAND                 CREATED         ...
5b920cbaf8b1        nginx:latest   "nginx -g 'daemon of    2 seconds ago   ...
```

## Dynamic addition and removal of static pods

The running kubelet periodically scans the configured directory (`/etc/kubelet.d` in our example) for changes and adds/removes pods as files appear/disappear in this directory.

```console
[joe@my-node1 ~] $ mv /etc/kubelet.d/static-web.yaml /tmp
[joe@my-node1 ~] $ sleep 20
[joe@my-node1 ~] $ docker ps
// no nginx container is running
[joe@my-node1 ~] $ mv /tmp/static-web.yaml /etc/kubelet.d/
[joe@my-node1 ~] $ sleep 20
[joe@my-node1 ~] $ docker ps
CONTAINER ID        IMAGE          COMMAND                 CREATED          ...
e7a62e3427f1        nginx:latest   "nginx -g 'daemon of    27 seconds ago   ...
```

This file has moved to: http://kubernetes.github.io/docs/admin/static-pods/


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->