Mirror of https://github.com/k3s-io/kubernetes.git, synced 2025-08-03 17:30:00 +00:00
Add user-oriented compute resource doc.
Adds docs/compute_resources.md with a user-oriented explanation of compute resources. Reveals detail gradually and includes examples and troubleshooting. Examples are tested. Moves the design-focused docs/resources.md to docs/design/resources.md and updates links accordingly.
parent 99711263a1
commit fd325982c3
@@ -118,7 +118,7 @@ In order to preserve extensibility, in the future, we intend to explicitly conve

 Note that historical information status (e.g., last transition time, failure counts) is only provided at best effort, and is not guaranteed to not be lost.

-Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
+Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](./design/resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.

 #### References to related objects

@@ -65,7 +65,7 @@ Some important differences between v1beta1/2 and v1beta3:

 * The `labels` query parameter has been renamed to `labelSelector`.
 * The `fields` query parameter has been renamed to `fieldSelector`.
 * The container `entrypoint` has been renamed to `command`, and `command` has been renamed to `args`.
-* Container, volume, and node resources are expressed as nested maps (e.g., `resources{cpu:1}`) rather than as individual fields, and resource values support [scaling suffixes](resources.md#resource-quantities) rather than fixed scales (e.g., milli-cores).
+* Container, volume, and node resources are expressed as nested maps (e.g., `resources{cpu:1}`) rather than as individual fields, and resource values support [scaling suffixes](compute_resources.md#specifying-resource-quantities) rather than fixed scales (e.g., milli-cores).
 * Restart policy is represented simply as a string (e.g., `"Always"`) rather than as a nested map (`always{}`).
 * Pull policies changed from `PullAlways`, `PullNever`, and `PullIfNotPresent` to `Always`, `Never`, and `IfNotPresent`.
 * The volume `source` is inlined into `volume` rather than nested.

docs/compute_resources.md (new file, 165 lines)
@@ -0,0 +1,165 @@

# Compute Resources

**Table of Contents**

- [Compute Resources](#compute-resources)
  - [Container and Pod Resource Limits](#container-and-pod-resource-limits)
  - [How Pods with Resource Limits are Scheduled](#how-pods-with-resource-limits-are-scheduled)
  - [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
  - [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
  - [Troubleshooting](#troubleshooting)
  - [Planned Improvements](#planned-improvements)

When specifying a [pod](./pods.md), you can optionally specify how much CPU and memory (RAM) each container needs. When containers have resource limits, the scheduler is able to make better decisions about which nodes to place pods on, and contention for resources can be handled in a consistent manner.

*CPU* and *memory* are each a *resource type*. A resource type has a base unit. CPU is specified in units of cores. Memory is specified in units of bytes.

CPU and RAM are collectively referred to as *compute resources*, or just *resources*. Compute resources are measurable quantities that can be requested, allocated, and consumed. They are distinct from [API resources](./working_with_resources.md). API resources, such as pods and [services](./services.md), are objects that can be written to and retrieved from the Kubernetes API server.

## Container and Pod Resource Limits

Each container of a pod can optionally specify `spec.container[].resources.limits.cpu` and/or `spec.container[].resources.limits.memory`. The `spec.container[].resources.requests` field is not currently used and need not be set.

Specifying resource limits is optional. In some clusters, an unset value may be replaced with a default value when a pod is created or updated. The default value depends on how the cluster is configured.

Although limits can only be specified on individual containers, it is convenient to talk about pod resource limits. A *pod resource limit* for a particular resource type is the sum of the resource limits of that type for each container in the pod, with unset values treated as zero.

The following pod has two containers. Each has a limit of 0.5 core of CPU and 128MiB (2<sup>27</sup> bytes) of memory. The pod can be said to have a limit of 1 core and 256MiB of memory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: wp
    image: wordpress
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
```
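
For example, if the manifest above is saved to a file (the name `limited-pod.yaml` below is arbitrary), the pod can be created and its limits inspected with `kubectl`:

```
# Create the pod from the manifest above (the filename is an arbitrary choice).
$ kubectl create -f limited-pod.yaml

# Inspect the pod and check the limits that were applied to each container.
$ kubectl get pods/frontend -o yaml | grep -A 2 limits
```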

## How Pods with Resource Limits are Scheduled

When a pod is created, the Kubernetes scheduler selects a node for the pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for pods. The scheduler ensures that, for each resource type (CPU and memory), the sum of the resource limits of the containers scheduled to the node is less than the capacity of the node. For example, on a node with a capacity of `cpu: 2` whose scheduled containers already have CPU limits summing to 1.5 cores, a new pod with a CPU limit of 0.6 cores will not fit. Note that even if actual memory or CPU resource usage on a node is very low, the scheduler will still refuse to place pods onto that node if the capacity check fails. This protects against a resource shortage on a node when resource usage later increases, for example due to a daily peak in request rate.

Note: Although the scheduler normally spreads pods out across nodes, there are currently some cases where pods with no limits (unset values) might all land on the same node.

## How Pods with Resource Limits are Run

When kubelet starts a container of a pod, it passes the CPU and memory limits to the container runner (Docker or rkt).

When using Docker (see the example below):

- The `spec.container[].resources.limits.cpu` value is multiplied by 1024, converted to an integer, and used as the value of the [`--cpu-shares`](https://docs.docker.com/reference/run/#runtime-constraints-on-resources) flag to the `docker run` command.
- The `spec.container[].resources.limits.memory` value is converted to an integer, and used as the value of the [`--memory`](https://docs.docker.com/reference/run/#runtime-constraints-on-resources) flag to the `docker run` command.
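
For example, the `db` container above, with `cpu: "500m"` and `memory: "128Mi"`, corresponds roughly to the following Docker flags. This is an illustration of the conversion, not the exact command kubelet constructs:

```
# 0.5 core * 1024 = 512 CPU shares; 128Mi = 128 * 2^20 = 134217728 bytes.
$ docker run --cpu-shares=512 --memory=134217728 mysql
```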

**TODO: document behavior for rkt**

If a container exceeds its memory limit, it may be terminated. If it is restartable, it will be restarted by kubelet, just as with any other type of runtime failure. If it is killed for exceeding its memory limit, you will see the reason `OOM Killed`, as in this example:

```
$ kubectl get pods/memhog
NAME      READY     REASON       RESTARTS   AGE
memhog    0/1       OOM Killed   0          1h
```

*OOM* stands for Out Of Memory.
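
To dig into why a container was killed or restarted, `kubectl describe` shows the pod's container state and recent events (the exact output format varies by version):

```
# Show container state, restart count, and recent events for the pod.
$ kubectl describe pods/memhog
```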

A container may or may not be allowed to exceed its CPU limit for extended periods of time. However, it will not be killed for excessive CPU usage.

## Monitoring Compute Resource Usage

The resource usage of a pod is reported as part of the pod status.

If [optional monitoring](../cluster/addons/monitoring/README.md) is configured for your cluster, then pod resource usage can be retrieved from the monitoring system.
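
For example, you can dump a pod and look at its status section; whether usage fields are populated there depends on how your cluster's monitoring is configured:

```
# Dump the full pod object; reported resource usage, if any, appears under status.
$ kubectl get pods/frontend -o yaml
```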

## Troubleshooting

If the scheduler cannot find any node where a pod can fit, then the pod will remain unscheduled until a place can be found. An event will be produced each time the scheduler fails to find a place for the pod, like this:

```
$ kubectl describe pods/frontend | grep -A 3 Events
Events:
  FirstSeen                          LastSeen                           Count  From          SubobjectPath  Reason            Message
  Tue, 30 Jun 2015 09:01:41 -0700    Tue, 30 Jun 2015 09:39:27 -0700    128    {scheduler }                 failedScheduling  Error scheduling: For each of these fitness predicates, pod frontend failed on at least one node: PodFitsResources.
```

If a pod or pods are pending with this message, then there are several things to try:

- Add more nodes to the cluster.
- Terminate unneeded pods to make room for pending pods.
- Check that the pod is not larger than all the nodes. For example, if all the nodes have a capacity of `cpu: 1`, then a pod with a limit of `cpu: 1.1` will never be scheduled.

You can check node capacities with the `kubectl get nodes -o <format>` command. Here are some example command lines that extract just the necessary information:

- `kubectl get nodes -o yaml | grep '\sname\|cpu\|memory'`
- `kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, cap: .status.capacity}'`

The [resource quota](./resource_quota_admin.md) feature can be configured to limit the total amount of resources that can be consumed. If used in conjunction with namespaces, it can prevent one team from hogging all the resources.

## Planned Improvements

The current system only allows resource quantities to be specified on a container. It is planned to improve accounting for resources that are shared by all containers in a pod, such as [EmptyDir volumes](./volumes.md#emptydir).

The current system only supports container limits for CPU and memory. It is planned to add new resource types, including a node disk space resource, and a framework for adding custom [resource types](./design/resources.md#resource-types).

The current system does not facilitate overcommitment of resources because resources reserved with container limits are assured. It is planned to support multiple levels of [Quality of Service](https://github.com/GoogleCloudPlatform/kubernetes/issues/168).

Currently, one unit of CPU means different things on different cloud providers, and on different machine types within the same cloud provider. For example, on AWS, the capacity of a node is reported in [ECUs](http://aws.amazon.com/ec2/faqs/), while in GCE it is reported in logical cores. We plan to revise the definition of the CPU resource to allow for more consistency across providers and platforms.

@@ -212,7 +212,7 @@ Policy objects may be applicable only to a single namespace or to all namespaces

 ## Accounting

-The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](/docs/resources.md)).
+The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources design doc](resources.md)).

 Initially:
 - a `quota` object is immutable.

@@ -1,4 +1,9 @@

-**Note that the model described in this document has not yet been implemented. The tracking issue for implementation of this model is [#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in milli-cores.**
+**Note: this is a design doc, which describes features that have not been completely implemented. User documentation of the current state is [here](../compute_resources.md). The tracking issue for implementation of this model is [#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in milli-cores.**

 # The Kubernetes resource model

@@ -208,4 +213,4 @@ This is the amount of time a container spends accessing disk, including actuator

 * Compressible? yes

@@ -33,7 +33,7 @@ can be created/destroyed together. See [pods](./pods.md).

 for easy scaling of replicated systems, and handles restarting of a Pod when the machine it is on reboots or otherwise fails.

 **Resource**
-: CPU, memory, and other things that a pod can request. See [resources](resources.md).
+: CPU, memory, and other things that a pod can request. See [compute resources](compute_resources.md).

 **Secret**
 : An object containing sensitive information, such as authentication tokens, which can be made available to containers upon request. See [secrets](secrets.md).

@@ -94,7 +94,7 @@ for i in *.md; do grep -r $i . | grep -v "^\./$i" > /dev/null; rv=$?; if [[ $rv

 * **Services and firewalls** ([services-firewalls.md](services-firewalls.md)): How
   to use firewalls.

-* **The Kubernetes Resource Model** ([resources.md](resources.md)):
+* **Compute Resources** ([compute_resources.md](compute_resources.md)):
   Provides resource information such as size, type, and quantity to assist in
   assigning Kubernetes resources appropriately.

@@ -102,6 +102,6 @@ pods that mount these volumes. Secrets are described [here](secrets.md).

 The storage media (Disk, SSD, or memory) of an EmptyDir volume is determined by the media of the filesystem holding the kubelet root dir (typically `/var/lib/kubelet`).
 There is no limit on how much space an EmptyDir or HostPath volume can consume, and no isolation between containers or between pods.

-In the future, we expect that EmptyDir and HostPath volumes will be able to request a certain amount of space using a [resource](./resources.md) specification, and to select the type of media to use, for clusters that have several media types.
+In the future, we expect that EmptyDir and HostPath volumes will be able to request a certain amount of space using a [compute resource](./compute_resources.md) specification, and to select the type of media to use, for clusters that have several media types.