diff --git a/docs/api-conventions.md b/docs/api-conventions.md
index 20406521d21..067f691f4a8 100644
--- a/docs/api-conventions.md
+++ b/docs/api-conventions.md
@@ -118,7 +118,7 @@ In order to preserve extensibility, in the future, we intend to explicitly conve

Note that historical information status (e.g., last transition time, failure counts) is only provided at best effort, and is not guaranteed to not be lost.

-Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
+Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](./design/resources.md#usage-data), should be put into separate objects, possibly with a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.

#### References to related objects

diff --git a/docs/api.md b/docs/api.md
index bb1271c5921..e44dec9d429 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -65,7 +65,7 @@ Some important differences between v1beta1/2 and v1beta3:
* The `labels` query parameter has been renamed to `labelSelector`.
* The `fields` query parameter has been renamed to `fieldSelector`.
* The container `entrypoint` has been renamed to `command`, and `command` has been renamed to `args`.
-* Container, volume, and node resources are expressed as nested maps (e.g., `resources{cpu:1}`) rather than as individual fields, and resource values support [scaling suffixes](resources.md#resource-quantities) rather than fixed scales (e.g., milli-cores).
+* Container, volume, and node resources are expressed as nested maps (e.g., `resources{cpu:1}`) rather than as individual fields, and resource values support [scaling suffixes](design/resources.md#resource-quantities) rather than fixed scales (e.g., milli-cores).
* Restart policy is represented simply as a string (e.g., `"Always"`) rather than as a nested map (`always{}`).
* Pull policies changed from `PullAlways`, `PullNever`, and `PullIfNotPresent` to `Always`, `Never`, and `IfNotPresent`.
* The volume `source` is inlined into `volume` rather than nested.
diff --git a/docs/compute_resources.md b/docs/compute_resources.md
new file mode 100644
index 00000000000..ff835cb6241
--- /dev/null
+++ b/docs/compute_resources.md
@@ -0,0 +1,165 @@
# Compute Resources

**Table of Contents**
- Compute Resources
  - [Container and Pod Resource Limits](#container-and-pod-resource-limits)
  - [How Pods with Resource Limits are Scheduled](#how-pods-with-resource-limits-are-scheduled)
  - [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
  - [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
  - [Troubleshooting](#troubleshooting)
  - [Planned Improvements](#planned-improvements)

When creating a [pod](./pods.md), you can optionally specify how much CPU and memory (RAM) each
container needs. When containers have resource limits, the scheduler is able to make better
decisions about which nodes to place pods on, and contention for resources can be handled in a
consistent manner.
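As a quick preview (this is a fragment of the complete example given later in this document),
limits are set per container in the pod spec:

```yaml
resources:
  limits:
    cpu: "500m"
    memory: "128Mi"
```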
*CPU* and *memory* are each a *resource type*. A resource type has a base unit. CPU is specified
in units of cores. Memory is specified in units of bytes.

CPU and RAM are collectively referred to as *compute resources*, or just *resources*. Compute
resources are measurable quantities which can be requested, allocated, and consumed. They are
distinct from [API resources](./working_with_resources.md). API resources, such as pods and
[services](./services.md), are objects that can be written to and retrieved from the Kubernetes
API server.

## Container and Pod Resource Limits

Each container of a Pod can optionally specify `spec.container[].resources.limits.cpu` and/or
`spec.container[].resources.limits.memory`. The `spec.container[].resources.requests` field is not
currently used and need not be set.

Specifying resource limits is optional. In some clusters, an unset value may be replaced with a
default value when a pod is created or updated. The default value depends on how the cluster is
configured.

Although limits can only be specified on individual containers, it is convenient to talk about pod
resource limits. A *pod resource limit* for a particular resource type is the sum of the resource
limits of that type for each container in the pod, with unset values treated as zero.

The following pod has two containers. Each has a limit of 0.5 core of cpu and 128MiB
(2^27 bytes) of memory. The pod can be said to have a limit of 1 core and 256MiB of
memory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: wp
    image: wordpress
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
```

## How Pods with Resource Limits are Scheduled

When a pod is created, the Kubernetes scheduler selects a node for the pod to
run on. Each node has a maximum capacity for each of the resource types: the
amount of CPU and memory it can provide for pods. The scheduler ensures that,
for each resource type (CPU and memory), the sum of the resource limits of the
containers scheduled to the node does not exceed the capacity of the node. Note
that even if actual memory or CPU resource usage on a node is very low, the
scheduler will still refuse to place a pod onto it if the capacity check
fails. This protects against a resource shortage on a node when resource usage
later increases, such as due to a daily peak in request rate.

Note: Although the scheduler normally spreads pods out across nodes, there are currently some cases
where pods with no limits (unset values) might all land on the same node.

## How Pods with Resource Limits are Run

When the kubelet starts a container of a pod, it passes the CPU and memory limits to the container
runner (Docker or rkt).

When using Docker:
- The `spec.container[].resources.limits.cpu` is multiplied by 1024, converted to an integer, and
  used as the value of the
  [`--cpu-shares`](https://docs.docker.com/reference/run/#runtime-constraints-on-resources) flag
  to the `docker run` command.
- The `spec.container[].resources.limits.memory` is converted to an integer, and used as the value
  of the [`--memory`](https://docs.docker.com/reference/run/#runtime-constraints-on-resources)
  flag to the `docker run` command.
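For illustration, consider the `db` container from the example pod above. Its limits of
`cpu: "500m"` and `memory: "128Mi"` translate into roughly the following flags (a simplified
sketch; the kubelet passes many other flags as well, and does not literally invoke the `docker`
CLI):

```
$ docker run --cpu-shares=512 --memory=134217728 mysql
```

Here `512` comes from multiplying the cpu limit of 0.5 by 1024, and `134217728` is 128MiB
expressed as an integer number of bytes.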
**TODO: document behavior for rkt**

If a container exceeds its memory limit, it may be terminated. If it is restartable, the kubelet
will restart it, as with any other type of runtime failure. If it is killed for exceeding its
memory limit, you will see the reason `OOM Killed`, as in this example:
```
$ kubectl get pods/memhog
NAME      READY     REASON       RESTARTS   AGE
memhog    0/1       OOM Killed   0          1h
```
*OOM* stands for Out Of Memory.

A container may or may not be allowed to exceed its CPU limit for extended periods of time.
However, it will not be killed for excessive CPU usage.

## Monitoring Compute Resource Usage

The resource usage of a pod is reported as part of the Pod status.

If [optional monitoring](../cluster/addons/monitoring/README.md) is configured for your cluster,
then pod resource usage can be retrieved from the monitoring system.

## Troubleshooting

If the scheduler cannot find any node where a pod can fit, then the pod will remain unscheduled
until a place can be found. An event will be produced each time the scheduler fails to find a
place for the pod, like this:
```
$ kubectl describe pods/frontend | grep -A 3 Events
Events:
  FirstSeen                         LastSeen                          Count  From         SubobjectPath  Reason            Message
  Tue, 30 Jun 2015 09:01:41 -0700   Tue, 30 Jun 2015 09:39:27 -0700   128    {scheduler }                failedScheduling  Error scheduling: For each of these fitness predicates, pod frontend failed on at least one node: PodFitsResources.
```

If a pod or pods are pending with this message, then there are several things to try:
- Add more nodes to the cluster.
- Terminate unneeded pods to make room for pending pods.
- Check that the pod is not larger than all of your nodes. For example, if every node
has a capacity of `cpu: 1`, then a pod with a limit of `cpu: 1.1` will never be scheduled.

You can check node capacities with the `kubectl get nodes -o <format>` command.
Here are some example command lines that extract just the necessary information;
sample output from the second command is shown below:
- `kubectl get nodes -o yaml | grep '\sname\|cpu\|memory'`
- `kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, cap: .status.capacity}'`
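The `jq` command prints one object per node; the output will look something like this (the node
name and capacity values are illustrative):

```
{
  "name": "10.240.115.55",
  "cap": {
    "cpu": "1",
    "memory": "3892043776"
  }
}
```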
The [resource quota](./resource_quota_admin.md) feature can be configured
to limit the total amount of resources that can be consumed. If used in conjunction
with namespaces, it can prevent one team from hogging all the resources.

## Planned Improvements

The current system only allows resource quantities to be specified on a container.
We plan to improve accounting for resources which are shared by all containers in a pod,
such as [EmptyDir volumes](./volumes.md#emptydir).

The current system only supports container limits for CPU and memory.
We plan to add new resource types, including a node disk space
resource, and a framework for adding custom [resource types](./design/resources.md#resource-types).

The current system does not facilitate overcommitment of resources because resources reserved
with container limits are assured. We plan to support multiple levels of [Quality of
Service](https://github.com/GoogleCloudPlatform/kubernetes/issues/168).

Currently, one unit of CPU means different things on different cloud providers, and on different
machine types within the same cloud provider. For example, on AWS, the capacity of a node
is reported in [ECUs](http://aws.amazon.com/ec2/faqs/), while in GCE it is reported in logical
cores. We plan to revise the definition of the cpu resource to allow for more consistency
across providers and platforms.


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/compute_resources.md?pixel)]()
diff --git a/docs/design/access.md b/docs/design/access.md
index dd64784e4eb..72ca969cb65 100644
--- a/docs/design/access.md
+++ b/docs/design/access.md
@@ -212,7 +212,7 @@ Policy objects may be applicable only to a single namespace or to all namespaces.

## Accounting

-The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](/docs/resources.md)).
+The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources design doc](resources.md)).

Initially:
- a `quota` object is immutable.
diff --git a/docs/resources.md b/docs/design/resources.md
similarity index 97%
rename from docs/resources.md
rename to docs/design/resources.md
index b18f5f289cf..17bb5c18a93 100644
--- a/docs/resources.md
+++ b/docs/design/resources.md
@@ -1,4 +1,9 @@
-**Note that the model described in this document has not yet been implemented. The tracking issue for implementation of this model is [#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in milli-cores.**
+**Note: this is a design doc, which describes features that have not been completely implemented.
+User documentation of the current state is [here](../compute_resources.md). The tracking issue for
+implementation of this model is
+[#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and
+cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in
+milli-cores.**

# The Kubernetes resource model

@@ -208,4 +213,4 @@ This is the amount of time a container spends accessing disk, including actuator

* Compressible? yes

-[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/resources.md?pixel)]()
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/resources.md?pixel)]()
diff --git a/docs/glossary.md b/docs/glossary.md
index 54e88b5f54c..36868a013ff 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -33,7 +33,7 @@ can be created/destroyed together. See [pods](./pods.md).
for easy scaling of replicated systems, and handles restarting of a Pod when the machine it is on reboots or otherwise fails.

**Resource**
-: CPU, memory, and other things that a pod can request. See [resources](resources.md).
+: CPU, memory, and other things that a pod can request. See [compute resources](compute_resources.md).

**Secret**
: An object containing sensitive information, such as authentication tokens, which can be made available to containers upon request. See [secrets](secrets.md).
diff --git a/docs/user-guide.md b/docs/user-guide.md
index a3ab0e0c056..bfbb2287ec9 100644
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -94,7 +94,7 @@ for i in *.md; do grep -r $i . | grep -v "^\./$i" > /dev/null; rv=$?; if [[ $rv

* **Services and firewalls** ([services-firewalls.md](services-firewalls.md)): How to use firewalls.
-* **The Kubernetes Resource Model** ([resources.md](resources.md)):
-Provides resource information such as size, type, and quantity to assist in assigning Kubernetes resources appropriately.
+* **Compute Resources** ([compute_resources.md](compute_resources.md)):
+How to specify CPU and memory limits for containers, and how those limits affect scheduling and container execution.

diff --git a/docs/volumes.md b/docs/volumes.md
index 110bb74967b..772fd20b226 100644
--- a/docs/volumes.md
+++ b/docs/volumes.md
@@ -102,6 +102,6 @@ pods that mount these volumes. Secrets are described [here](secrets.md).

The storage media (Disk, SSD, or memory) of an EmptyDir volume is determined by the media of the filesystem holding the kubelet root dir (typically `/var/lib/kubelet`). There is no limit on how much space an EmptyDir or HostPath volume can consume, and no isolation between containers or between pods.

-In the future, we expect that EmptyDir and HostPath volumes will be able to request a certain amount of space using a [resource](./resources.md) specification, and to select the type of media to use, for clusters that have several media types.
+In the future, we expect that EmptyDir and HostPath volumes will be able to request a certain amount of space using a [compute resource](./compute_resources.md) specification, and to select the type of media to use, for clusters that have several media types.

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/volumes.md?pixel)]()