Merge pull request #15779 from jszczepkowski/hpa-docs

Proposal for horizontal pod autoscaler updated and moved to design.
This commit is contained in:
Piotr Szczesniak 2015-10-26 09:53:43 +01:00
commit cbf4bc5f7f
4 changed files with 281 additions and 281 deletions


@ -0,0 +1,272 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/design/horizontal-pod-autoscaler.md).
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Horizontal Pod Autoscaling
## Preface
This document briefly describes the design of the horizontal autoscaler for pods.
The autoscaler (implemented as a Kubernetes API resource and controller) is responsible for dynamically controlling
the number of replicas of some collection (e.g. the pods of a ReplicationController) to meet some objective(s),
for example a target per-pod CPU utilization.
This design supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md).
## Overview
The resource usage of a serving application usually varies over time: sometimes the demand for the application rises,
and sometimes it drops.
In Kubernetes version 1.0, a user can only manually set the number of serving pods.
Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on CPU utilization statistics
(a future version will allow autoscaling based on other resources/metrics).
## Scale Subresource
In Kubernetes version 1.1, we are introducing the Scale subresource and implementing horizontal autoscaling of pods based on it.
The Scale subresource is supported for replication controllers and deployments.
The Scale subresource is a virtual resource (it does not correspond to an object stored in etcd).
It is only present in the API as an interface that a controller (in this case the HorizontalPodAutoscaler) can use to dynamically scale
the number of replicas controlled by some other API object (currently ReplicationController and Deployment) and to learn the current number of replicas.
Scale is a subresource of the API object that it serves as the interface for.
The Scale subresource is useful because whenever we introduce another type we want to autoscale, we just need to implement the Scale subresource for it.
The wider discussion regarding Scale took place in [#1629](https://github.com/kubernetes/kubernetes/issues/1629).
The Scale subresource is exposed in the API for replication controllers and deployments under the following paths:
`apis/extensions/v1beta1/replicationcontrollers/myrc/scale`
`apis/extensions/v1beta1/deployments/mydeployment/scale`
It has the following structure:
```go
// represents a scaling request for a resource.
type Scale struct {
    unversioned.TypeMeta
    api.ObjectMeta

    // defines the behavior of the scale.
    Spec ScaleSpec

    // current status of the scale.
    Status ScaleStatus
}

// describes the attributes of a scale subresource.
type ScaleSpec struct {
    // desired number of instances for the scaled object.
    Replicas int `json:"replicas,omitempty"`
}

// represents the current status of a scale subresource.
type ScaleStatus struct {
    // actual number of observed instances of the scaled object.
    Replicas int `json:"replicas"`

    // label query over pods that should match the replicas count.
    Selector map[string]string `json:"selector,omitempty"`
}
```
Writing to `ScaleSpec.Replicas` resizes the replication controller/deployment associated with
the given Scale subresource.
`ScaleStatus.Replicas` reports how many pods are currently running in the replication controller/deployment,
and `ScaleStatus.Selector` returns the label selector for those pods.
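To make the read-status/write-spec pattern concrete, here is a minimal, self-contained Go sketch (plain structs only, no Kubernetes client library; the `scaleTo` helper and the hard-coded values are purely illustrative):

```go
package main

import "fmt"

// Simplified stand-ins for the Scale types above (illustration only).
type ScaleSpec struct{ Replicas int }
type ScaleStatus struct {
    Replicas int
    Selector map[string]string
}
type Scale struct {
    Spec   ScaleSpec
    Status ScaleStatus
}

// scaleTo mirrors how a controller uses the subresource: read the observed
// replica count from Status, then request a new size by writing Spec.Replicas.
func scaleTo(s *Scale, desired int) {
    fmt.Printf("observed %d replicas (selector %v), requesting %d\n",
        s.Status.Replicas, s.Status.Selector, desired)
    s.Spec.Replicas = desired
}

func main() {
    s := &Scale{Status: ScaleStatus{Replicas: 3, Selector: map[string]string{"app": "myrc"}}}
    scaleTo(s, 5)
}
```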
## HorizontalPodAutoscaler Object
In Kubernetes version 1.1, we are introducing the HorizontalPodAutoscaler object. It is accessible under:
`apis/extensions/v1beta1/horizontalpodautoscalers/myautoscaler`
It has the following structure:
```go
// configuration of a horizontal pod autoscaler.
type HorizontalPodAutoscaler struct {
    unversioned.TypeMeta
    api.ObjectMeta

    // behavior of autoscaler.
    Spec HorizontalPodAutoscalerSpec

    // current information about the autoscaler.
    Status HorizontalPodAutoscalerStatus
}

// specification of a horizontal pod autoscaler.
type HorizontalPodAutoscalerSpec struct {
    // reference to Scale subresource; horizontal pod autoscaler will learn the current resource
    // consumption from its status, and will set the desired number of pods by modifying its spec.
    ScaleRef SubresourceReference

    // lower limit for the number of pods that can be set by the autoscaler, default 1.
    MinReplicas *int

    // upper limit for the number of pods that can be set by the autoscaler.
    // It cannot be smaller than MinReplicas.
    MaxReplicas int

    // target average CPU utilization (represented as a percentage of requested CPU) over all the pods;
    // if not specified, it defaults to a target CPU utilization of 80% of the requested resources.
    CPUUtilization *CPUTargetUtilization
}

type CPUTargetUtilization struct {
    // fraction of the requested CPU that should be utilized/used,
    // e.g. 70 means that 70% of the requested CPU should be in use.
    TargetPercentage int
}

// current status of a horizontal pod autoscaler.
type HorizontalPodAutoscalerStatus struct {
    // most recent generation observed by this autoscaler.
    ObservedGeneration *int64

    // last time the HorizontalPodAutoscaler scaled the number of pods;
    // used by the autoscaler to control how often the number of pods is changed.
    LastScaleTime *unversioned.Time

    // current number of replicas of pods managed by this autoscaler.
    CurrentReplicas int

    // desired number of replicas of pods managed by this autoscaler.
    DesiredReplicas int

    // current average CPU utilization over all pods, represented as a percentage of requested CPU,
    // e.g. 70 means that an average pod is now using 70% of its requested CPU.
    CurrentCPUUtilizationPercentage *int
}
```
`ScaleRef` is a reference to the Scale subresource.
`MinReplicas`, `MaxReplicas` and `CPUUtilization` define autoscaler configuration.
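As an illustration of the defaults mentioned in the field comments above (a MinReplicas of 1 and an 80% CPU utilization target), defaulting the optional fields might look roughly like the sketch below; the `applyDefaults` helper is hypothetical and builds on the types defined above, it is not the actual API-server defaulting code:

```go
// applyDefaults is a hypothetical sketch of how the optional HPA fields could be
// defaulted, based on the defaults stated in the field comments above.
func applyDefaults(spec *HorizontalPodAutoscalerSpec) {
    if spec.MinReplicas == nil {
        one := 1
        spec.MinReplicas = &one // lower limit defaults to 1 pod
    }
    if spec.CPUUtilization == nil {
        // target defaults to 80% of the requested CPU
        spec.CPUUtilization = &CPUTargetUtilization{TargetPercentage: 80}
    }
}
```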
We are also introducing the HorizontalPodAutoscalerList object to enable listing all autoscalers in a namespace:
```go
// list of horizontal pod autoscaler objects.
type HorizontalPodAutoscalerList struct {
    unversioned.TypeMeta
    unversioned.ListMeta

    // list of horizontal pod autoscaler objects.
    Items []HorizontalPodAutoscaler
}
```
## Autoscaling Algorithm
The autoscaler is implemented as a control loop. It periodically queries the pods matching `ScaleStatus.Selector` of the Scale subresource and collects their CPU utilization.
Then, it compares the arithmetic mean of the pods' CPU utilization with the target defined in `Spec.CPUUtilization`,
and adjusts the number of replicas of the Scale if needed to match the target
(preserving the condition: MinReplicas <= Replicas <= MaxReplicas).
The period of the autoscaler is controlled by the `--horizontal-pod-autoscaler-sync-period` flag of the controller manager.
The default value is 30 seconds.
CPU utilization is the recent CPU usage of a pod (average across the last 1 minute) divided by the CPU requested by the pod.
In Kubernetes version 1.1, CPU usage is taken directly from Heapster.
In the future, there will be an API on the master for this purpose
(see [#11951](https://github.com/kubernetes/kubernetes/issues/11951)).
The target number of pods is calculated from the following formula:
```
TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)
```
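A minimal, runnable Go sketch of this formula follows; the `desiredReplicas` function, its parameter names, and the clamping to the MinReplicas/MaxReplicas bounds are assumptions for illustration, not the actual controller code:

```go
package main

import (
    "fmt"
    "math"
)

// desiredReplicas computes ceil(sum(CurrentPodsCPUUtilization) / Target) and
// clamps the result to [minReplicas, maxReplicas], as described above.
// Utilizations and target are percentages of requested CPU (70 means 70%).
func desiredReplicas(utilizations []float64, target float64, minReplicas, maxReplicas int) int {
    sum := 0.0
    for _, u := range utilizations {
        sum += u
    }
    replicas := int(math.Ceil(sum / target))
    if replicas < minReplicas {
        replicas = minReplicas
    }
    if replicas > maxReplicas {
        replicas = maxReplicas
    }
    return replicas
}

func main() {
    // Three pods running at 90% of their requested CPU with a 50% target:
    // ceil(270 / 50) = 6 replicas (subject to the min/max bounds).
    fmt.Println(desiredReplicas([]float64{90, 90, 90}, 50, 1, 10)) // prints 6
}
```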
Starting and stopping pods may introduce noise to the metric (for instance, starting may temporarily increase CPU).
So, after each action, the autoscaler should wait some time for reliable data.
Scale-up can only happen if there was no rescaling within the last 3 minutes.
Scale-down will wait for 5 minutes from the last rescaling.
Moreover, any scaling is only carried out if the ratio `avg(CurrentPodsCPUUtilization) / Target` drops below 0.9 or rises above 1.1 (10% tolerance); a sketch of this check follows the list below.
This approach has two benefits:
* Autoscaler works in a conservative way.
If new user load appears, it is important for us to rapidly increase the number of pods,
so that user requests will not be rejected.
Lowering the number of pods is not that urgent.
* Autoscaler avoids thrashing, i.e. it prevents rapid execution of conflicting decisions if the load is not stable.
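Below is a hedged sketch of the tolerance and cooldown checks described above; the `shouldRescale` helper, its signature, and passing the last rescale time explicitly are illustrative assumptions (the real controller derives this from `Status.LastScaleTime`):

```go
package main

import (
    "math"
    "time"
)

const (
    tolerance          = 0.1             // 10% tolerance band around the target
    scaleUpForbidden   = 3 * time.Minute // no scale-up within 3 minutes of the last rescale
    scaleDownForbidden = 5 * time.Minute // no scale-down within 5 minutes of the last rescale
)

// shouldRescale applies the rules above: do nothing while the utilization/target
// ratio stays within the tolerance band, and otherwise honor the scale-up and
// scale-down waiting periods measured from the last rescale.
func shouldRescale(avgUtilization, target float64, lastScale, now time.Time, scalingUp bool) bool {
    ratio := avgUtilization / target
    if math.Abs(ratio-1.0) <= tolerance {
        return false // within the 10% tolerance: no rescale
    }
    if scalingUp {
        return now.Sub(lastScale) >= scaleUpForbidden
    }
    return now.Sub(lastScale) >= scaleDownForbidden
}

func main() {
    // 70% average utilization against a 50% target (ratio 1.4) with the last
    // rescale 10 minutes ago: a scale-up is allowed.
    _ = shouldRescale(70, 50, time.Now().Add(-10*time.Minute), time.Now(), true)
}
```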
## Relative vs. absolute metrics
We chose the target metric to be relative (e.g. 90% of the requested CPU) rather than absolute (e.g. 0.6 core) for the following reason.
If we chose an absolute metric, the user would need to guarantee that the target is lower than the request.
Otherwise, overloaded pods might not be able to consume more than the autoscaler's absolute target utilization,
thereby preventing the autoscaler from seeing utilization high enough to trigger a scale-up.
This could be especially troublesome when the user changes the requested resources for a pod,
because they would also need to change the autoscaler's utilization threshold.
Therefore, we decided on a relative metric.
For the user, it is enough to set it to a value smaller than 100%, and further changes of the requested resources will not invalidate it.
## Support in kubectl
To make manipulation of HorizontalPodAutoscaler objects simpler, we added support for
creating/updating/deleting/listing of HorizontalPodAutoscaler objects to kubectl.
In addition, in the future, we are planning to add kubectl support for the following use cases:
* When creating a replication controller or deployment with `kubectl create [-f]`, it should be
possible to specify an additional autoscaler object.
(This should work out-of-the-box when creation of autoscaler is supported by kubectl as we may include
multiple objects in the same config file).
* *[future]* When running an image with `kubectl run`, there should be an additional option to create
an autoscaler for it.
* *[future]* We will add a new command `kubectl autoscale` that will allow for easy creation of an autoscaler object
for already existing replication controller/deployment.
## Next steps
We list here some features that are not supported in Kubernetes version 1.1.
However, we want to keep them in mind, as they will most probably be needed in the future.
Our design is in general compatible with them.
* *[future]* **Autoscale pods based on metrics different than CPU** (e.g. memory, network traffic, qps).
This includes scaling based on a custom/application metric.
* *[future]* **Autoscale pods based on an aggregate metric.**
The autoscaler, instead of computing an average of a target metric across pods, will use a single, external metric (e.g. a qps metric from a load balancer).
The metric will be aggregated while the target will remain per-pod
(e.g. when observing 100 qps on the load balancer while the target is 20 qps per pod, the autoscaler will set the number of replicas to 5).
* *[future]* **Autoscale pods based on multiple metrics.**
If the target numbers of pods for different metrics are different, choose the largest target number of pods.
* *[future]* **Scale the number of pods starting from 0.**
All pods can be turned off, and then turned on when there is demand for them.
When a request to a service with no pods arrives, kube-proxy will generate an event for the autoscaler
to create a new pod.
Discussed in [#3247](https://github.com/kubernetes/kubernetes/issues/3247).
* *[future]* **When scaling down, make a more educated decision about which pods to kill.**
E.g.: if two or more pods from the same replication controller are on the same node, kill one of them.
Discussed in [#4301](https://github.com/kubernetes/kubernetes/issues/4301).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/horizontal-pod-autoscaler.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@ -31,6 +31,14 @@ Documentation for other releases can be found at
<!-- END MUNGE: UNVERSIONED_WARNING -->
---
# WARNING:
## This document is outdated. It is superseded by [the horizontal pod autoscaler design doc](../design/horizontal-pod-autoscaler.md).
---
## Abstract
Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the


@ -1,280 +0,0 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/proposals/horizontal-pod-autoscaler.md).
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Horizontal Pod Autoscaling
**Author**: Jerzy Szczepkowski (@jszczepkowski)
## Preface
This document briefly describes the design of the horizontal autoscaler for pods.
The autoscaler (implemented as a Kubernetes control loop) will be responsible for automatically
choosing and setting the number of pods of a given type that run in a Kubernetes cluster.
This proposal supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md).
## Overview
The usage of a serving application usually varies over time: sometimes the demand for the application rises,
and sometimes it drops.
In Kubernetes version 1.0, a user can only manually set the number of serving pods.
Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on usage statistics.
## Scale Subresource
We are going to introduce the Scale subresource and implement horizontal autoscaling of pods based on it.
The Scale subresource will be supported for replication controllers and deployments.
The Scale subresource will be a virtual resource (it will not be stored in etcd as a separate object).
It will only be present in the API as an interface for accessing a replication controller or deployment,
and the values of the Scale fields will be inferred from the corresponding replication controller/deployment object.
The HorizontalPodAutoscaler object will be bound to exactly one Scale subresource and will
autoscale the associated replication controller/deployment through it.
The main advantage of such an approach is that whenever we introduce another type we want to auto-scale,
we just need to implement the Scale subresource for it (without modifying autoscaler code or API).
The wider discussion regarding Scale took place in [#1629](https://github.com/kubernetes/kubernetes/issues/1629).
The Scale subresource will be present in the API for replication controllers and deployments under the following paths:
* `api/vX/replicationcontrollers/myrc/scale`
* `api/vX/deployments/mydeployment/scale`
It will have the following structure:
```go
// Scale subresource, applicable to ReplicationControllers and (in future) Deployment.
type Scale struct {
    api.TypeMeta
    api.ObjectMeta

    // Spec defines the behavior of the scale.
    Spec ScaleSpec

    // Status represents the current status of the scale.
    Status ScaleStatus
}

// ScaleSpec describes the attributes of a Scale subresource.
type ScaleSpec struct {
    // Replicas is the number of desired replicas.
    Replicas int
}

// ScaleStatus represents the current status of a Scale subresource.
type ScaleStatus struct {
    // Replicas is the number of actual replicas.
    Replicas int

    // Selector is a label query over pods that should match the replicas count.
    Selector map[string]string
}
```
Writing ```ScaleSpec.Replicas``` will resize the replication controller/deployment associated with
the given Scale subresource.
```ScaleStatus.Replicas``` will report how many pods are currently running in the replication controller/deployment,
and ```ScaleStatus.Selector``` will return the label selector for those pods.
## HorizontalPodAutoscaler Object
We will introduce the HorizontalPodAutoscaler object; it will be accessible under:
```
api/vX/horizontalpodautoscalers/myautoscaler
```
It will have the following structure:
```go
// HorizontalPodAutoscaler represents the configuration of a horizontal pod autoscaler.
type HorizontalPodAutoscaler struct {
    api.TypeMeta
    api.ObjectMeta

    // Spec defines the behaviour of autoscaler.
    Spec HorizontalPodAutoscalerSpec

    // Status represents the current information about the autoscaler.
    Status HorizontalPodAutoscalerStatus
}

// HorizontalPodAutoscalerSpec is the specification of a horizontal pod autoscaler.
type HorizontalPodAutoscalerSpec struct {
    // ScaleRef is a reference to Scale subresource. HorizontalPodAutoscaler will learn the current
    // resource consumption from its status, and will set the desired number of pods by modifying its spec.
    ScaleRef *SubresourceReference

    // MinReplicas is the lower limit for the number of pods that can be set by the autoscaler.
    MinReplicas int

    // MaxReplicas is the upper limit for the number of pods that can be set by the autoscaler.
    // It cannot be smaller than MinReplicas.
    MaxReplicas int

    // Target is the target average consumption of the given resource that the autoscaler will try
    // to maintain by adjusting the desired number of pods.
    // Currently this can be either "cpu" or "memory".
    Target ResourceConsumption
}

// HorizontalPodAutoscalerStatus contains the current status of a horizontal pod autoscaler.
type HorizontalPodAutoscalerStatus struct {
    // CurrentReplicas is the number of replicas of pods managed by this autoscaler.
    CurrentReplicas int

    // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler.
    // The number may be different because pod downscaling is sometimes delayed to keep the number
    // of pods stable.
    DesiredReplicas int

    // CurrentConsumption is the current average consumption of the given resource that the autoscaler will
    // try to maintain by adjusting the desired number of pods.
    // Two types of resources are supported: "cpu" and "memory".
    CurrentConsumption ResourceConsumption

    // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods.
    // This is used by the autoscaler to control how often the number of pods is changed.
    LastScaleTimestamp *unversioned.Time
}

// ResourceConsumption is an object for specifying average resource consumption of a particular resource.
type ResourceConsumption struct {
    Resource api.ResourceName
    Quantity resource.Quantity
}
```
```ScaleRef``` will be a reference to the Scale subresource.
```MinReplicas```, ```MaxReplicas``` and ```Target``` will define the autoscaler configuration.
We will also introduce HorizontalPodAutoscalerList object to enable listing all autoscalers in the cluster:
```go
// HorizontalPodAutoscalerList is a collection of horizontal pod autoscalers.
type HorizontalPodAutoscalerList struct {
    api.TypeMeta
    api.ListMeta

    Items []HorizontalPodAutoscaler
}
```
## Autoscaling Algorithm
The autoscaler will be implemented as a control loop.
It will periodically (e.g. every 1 minute) query the pods described by ```Status.Selector``` of the Scale subresource,
and check their average CPU or memory usage from the last 1 minute
(there will be an API on the master for this purpose, see
[#11951](https://github.com/kubernetes/kubernetes/issues/11951)).
Then, it will compare the current CPU or memory consumption with the Target,
and adjust the replicas of the Scale if needed to match the target
(preserving the condition: MinReplicas <= Replicas <= MaxReplicas).
The target number of pods will be calculated from the following formula:
```
TargetNumOfPods = ceil(sum(CurrentPodsConsumption) / Target)
```
Starting and stopping pods may introduce noise to the metrics (for instance, starting may temporarily increase
CPU and decrease average memory consumption), so after each action the autoscaler should wait some time for reliable data.
Scale-up will only happen if there was no rescaling within the last 3 minutes.
Scale-down will wait for 10 minutes from the last rescaling. Moreover, any scaling will only be made if
```
avg(CurrentPodsConsumption) / Target
```
drops below 0.9 or increases above 1.1 (10% tolerance). This approach has two benefits:
* Autoscaler works in a conservative way.
If new user load appears, it is important for us to rapidly increase the number of pods,
so that user requests will not be rejected.
Lowering the number of pods is not that urgent.
* Autoscaler avoids thrashing, i.e. it prevents rapid execution of conflicting decisions if the load is not stable.
## Relative vs. absolute metrics
The question arises whether the values of the target metrics should be absolute (e.g.: 0.6 core, 100MB of RAM)
or relative (e.g.: 110% of resource request, 90% of resource limit).
The argument for the relative metrics is that when user changes resources for a pod,
she will not have to change the definition of the autoscaler object, as the relative metric will still be valid.
However, we want to be able to base autoscaling on custom metrics in the future.
Such metrics will rather be absolute (e.g.: the number of queries-per-second).
Therefore, we decided to give absolute values for the target metrics in the initial version.
Please note that when custom metrics are supported, it will be possible to create additional metrics
in heapster that will divide CPU/memory consumption by resource request/limit.
From the autoscaler's point of view the metrics will be absolute,
although such metrics will bring the benefits of relative metrics to the user.
## Support in kubectl
To make manipulation of HorizontalPodAutoscaler objects simpler, we will add support for
creating/updating/deleting/listing of HorizontalPodAutoscaler to kubectl.
In addition, we will add kubectl support for the following use cases:
* When running an image with ```kubectl run```, there should be an additional option to create
an autoscaler for it.
* When creating a replication controller or deployment with ```kubectl create [-f]```, there should be
a possibility to specify an additional autoscaler object.
(This should work out-of-the-box when creation of autoscaler is supported by kubectl as we may include
multiple objects in the same config file).
* We will add a new command ```kubectl autoscale``` that will allow for easy creation of an autoscaler object
for already existing replication controller/deployment.
## Next steps
We list here some features that will not be supported in the initial version of the autoscaler.
However, we want to keep them in mind, as they will most probably be needed in future.
Our design is in general compatible with them.
* Autoscale pods based on metrics different than CPU & memory (e.g.: network traffic, qps).
This includes scaling based on a custom metric.
* Autoscale pods based on multiple metrics.
If the target numbers of pods for different metrics are different, choose the largest target number of pods.
* Scale the number of pods starting from 0: all pods can be turned off,
and then turned on when there is demand for them.
When a request to a service with no pods arrives, kube-proxy will generate an event for the autoscaler
to create a new pod.
Discussed in [#3247](https://github.com/kubernetes/kubernetes/issues/3247).
* When scaling down, make a more educated decision about which pods to kill (e.g. if two or more pods are on the same node, kill one of them).
Discussed in [#4301](https://github.com/kubernetes/kubernetes/issues/4301).
* Allow rule based autoscaling: instead of specifying the target value for metric,
specify a rule, e.g.: “if average CPU consumption of pod is higher than 80% add two more replicas”.
This approach was initially suggested in
[autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md) proposal.
Before doing this, we need to evaluate why the target-based scaling described in this proposal is not sufficient.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/horizontal-pod-autoscaler.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@ -120,7 +120,7 @@ controlled by the php-apache replication controller you created in the first ste
Roughly speaking, the horizontal autoscaler will increase and decrease the number of replicas
(via the replication controller) so as to maintain an average CPU utilization across all Pods of 50%
(since each pod requests 200 milli-cores in [rc-php-apache.yaml](rc-php-apache.yaml), this means average CPU utilization of 100 milli-cores).
See [here](../../../docs/proposals/horizontal-pod-autoscaler.md#autoscaling-algorithm) for more details on the algorithm.
See [here](../../../docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm) for more details on the algorithm.
We will create the autoscaler by executing the following command: