Merge pull request #12291 from derekwaynecarr/resource_quota_requests

Update resource quota design to align with requests and limits
Dawn Chen 2015-08-06 16:07:42 -07:00
commit 256eeeda2b


@ -35,13 +35,17 @@ Documentation for other releases can be found at
## Background
This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control.
This document describes a system for enforcing hard resource usage limits per namespace as part of admission control.
## Model Changes
## Use cases
A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace.
1. Ability to enumerate resource usage limits per namespace.
2. Ability to monitor resource usage for tracked resources.
3. Ability to reject resource usage exceeding hard quotas.
A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status.
## Data Model
The **ResourceQuota** object is scoped to a **Namespace**.
```go
// The following identify resource constants for Kubernetes object types
@ -54,109 +58,139 @@ const (
ResourceReplicationControllers ResourceName = "replicationcontrollers"
// ResourceQuotas, number
ResourceQuotas ResourceName = "resourcequotas"
// ResourceSecrets, number
ResourceSecrets ResourceName = "secrets"
// ResourcePersistentVolumeClaims, number
ResourcePersistentVolumeClaims ResourceName = "persistentvolumeclaims"
)
// ResourceQuotaSpec defines the desired hard limits to enforce for Quota
type ResourceQuotaSpec struct {
// Hard is the set of desired hard limits for each named resource
Hard ResourceList `json:"hard,omitempty"`
Hard ResourceList `json:"hard,omitempty" description:"hard is the set of desired hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
}
// ResourceQuotaStatus defines the enforced hard limits and observed use
type ResourceQuotaStatus struct {
// Hard is the set of enforced hard limits for each named resource
Hard ResourceList `json:"hard,omitempty"`
Hard ResourceList `json:"hard,omitempty" description:"hard is the set of enforced hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
// Used is the current observed total usage of the resource in the namespace
Used ResourceList `json:"used,omitempty"`
Used ResourceList `json:"used,omitempty" description:"used is the current observed total usage of the resource in the namespace"`
}
// ResourceQuota sets aggregate quota restrictions enforced per namespace
type ResourceQuota struct {
TypeMeta `json:",inline"`
ObjectMeta `json:"metadata,omitempty"`
ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`
// Spec defines the desired quota
Spec ResourceQuotaSpec `json:"spec,omitempty"`
Spec ResourceQuotaSpec `json:"spec,omitempty" description:"spec defines the desired quota; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
// Status defines the actual enforced quota and its current usage
Status ResourceQuotaStatus `json:"status,omitempty"`
}
// ResourceQuotaUsage captures system observed quota status per namespace
// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage
type ResourceQuotaUsage struct {
TypeMeta `json:",inline"`
ObjectMeta `json:"metadata,omitempty"`
// Status defines the actual enforced quota and its current usage
Status ResourceQuotaStatus `json:"status,omitempty"`
Status ResourceQuotaStatus `json:"status,omitempty" description:"status defines the actual enforced quota and current usage; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
}
// ResourceQuotaList is a list of ResourceQuota items
type ResourceQuotaList struct {
TypeMeta `json:",inline"`
ListMeta `json:"metadata,omitempty"`
ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`
// Items is a list of ResourceQuota objects
Items []ResourceQuota `json:"items"`
Items []ResourceQuota `json:"items" description:"items is a list of ResourceQuota objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
}
```
## Quota Tracked Resources
The following resources are supported by the quota system.
| Resource | Description |
| ------------ | ----------- |
| cpu | Total requested cpu usage |
| memory | Total requested memory usage |
| pods | Total number of active pods where phase is pending or active. |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
| secrets | Total number of secrets |
| persistentvolumeclaims | Total number of persistent volume claims |
If a third party wants to track additional resources, it must follow the resource naming conventions prescribed
by Kubernetes. This means the resource must have a fully-qualified name (e.g. mycompany.org/shinynewresource).
## Resource Requirements: Requests vs Limits
If a resource supports the ability to distinguish between a request and a limit for a resource,
the quota tracking system will only cost the request value against the quota usage. If a resource
is tracked by quota, and no request value is provided, the associated entity is rejected as part of admission.
For example, consider the following scenarios relative to tracking quota on CPU:
| Pod | Container | Request CPU | Limit CPU | Result |
| --- | --------- | ----------- | --------- | ------ |
| X | C1 | 100m | 500m | The quota usage is incremented 100m |
| Y | C2 | 100m | none | The quota usage is incremented 100m |
| Y | C2 | none | 500m | The quota usage is incremented 500m since request will default to limit |
| Z | C3 | none | none | The pod is rejected since it does not enumerate a request. |
The rationale for accounting for the requested amount of a resource versus the limit is the belief
that a user should only be charged for what they are scheduled against in the cluster. In addition,
attempting to track usage against actual usage, where request < actual < limit, is considered highly
volatile.
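The charging rule in the table above can be sketched in Go as follows. This is a minimal illustration, not the plug-in's actual code: the `containerCPU` type, the `chargedCPU` function, and the use of integer millicores (with zero meaning "not specified") are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// containerCPU holds a container's CPU request and limit in millicores.
// Hypothetical type for illustration; zero means "not specified".
type containerCPU struct {
	RequestMilli int64
	LimitMilli   int64
}

var errNoRequest = errors.New("container does not enumerate a cpu request")

// chargedCPU returns the amount charged against quota for one container.
func chargedCPU(c containerCPU) (int64, error) {
	switch {
	case c.RequestMilli > 0:
		// A request is present: only the request is charged.
		return c.RequestMilli, nil
	case c.LimitMilli > 0:
		// No request: the request defaults to the limit.
		return c.LimitMilli, nil
	default:
		// Neither value is set: the entity is rejected at admission.
		return 0, errNoRequest
	}
}

func main() {
	cases := []containerCPU{
		{RequestMilli: 100, LimitMilli: 500},
		{RequestMilli: 100},
		{LimitMilli: 500},
		{},
	}
	for _, c := range cases {
		charged, err := chargedCPU(c)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("charged %dm\n", charged)
	}
}
```

The four cases in `main` correspond to the four rows of the table, in order.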
As a consequence of this decision, users are able to spread their usage of a resource across multiple tiers
of service. Let's demonstrate this via an example with a 4 cpu quota.
The quota may be allocated as follows:
| Pod | Container | Request CPU | Limit CPU | Tier | Quota Usage |
| --- | --------- | ----------- | --------- | ---- | ----------- |
| X | C1 | 1 | 4 | Burstable | 1 |
| Y | C2 | 2 | 2 | Guaranteed | 2 |
| Z | C3 | 1 | 3 | Burstable | 1 |
It is possible that the pods may consume 9 cpu over a given time period, depending on the cpu available on the nodes
that hold pods X and Z, but since we scheduled X and Z relative to the request, we only track the requested
value against their allocated quota. If one wants to restrict the ratio between the request and limit,
it is encouraged that the user define a **LimitRange** with **LimitRequestRatio** to control burst out behavior.
This would, in effect, let an administrator keep the difference between request and limit more in line with
tracked usage if desired.
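The arithmetic behind the 4 cpu example can be checked with a short Go sketch. The `podCPU` type and the `totals` and `tier` helpers are hypothetical names introduced here; the tier rule (request equal to limit is Guaranteed, otherwise Burstable) is inferred only from the rows of the table above.

```go
package main

import "fmt"

// podCPU records one pod's CPU request and limit in whole CPUs.
// Hypothetical type for illustration.
type podCPU struct {
	Name    string
	Request int64
	Limit   int64
}

// tier classifies a pod the way the table above does.
func tier(p podCPU) string {
	if p.Request == p.Limit {
		return "Guaranteed"
	}
	return "Burstable"
}

// totals returns the quota usage (sum of requests) and the
// worst-case consumption (sum of limits).
func totals(pods []podCPU) (requested, burst int64) {
	for _, p := range pods {
		requested += p.Request
		burst += p.Limit
	}
	return requested, burst
}

func main() {
	pods := []podCPU{
		{Name: "X", Request: 1, Limit: 4},
		{Name: "Y", Request: 2, Limit: 2},
		{Name: "Z", Request: 1, Limit: 3},
	}
	for _, p := range pods {
		fmt.Printf("pod %s: %s, charged %d cpu\n", p.Name, tier(p), p.Request)
	}
	requested, burst := totals(pods)
	fmt.Printf("quota usage: %d cpu, worst-case consumption: %d cpu\n", requested, burst)
}
```

The sum of requests is 4 cpu, which exactly fills the quota, while the sum of limits is 9 cpu.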
## Status API
A REST API endpoint to update the status section of the **ResourceQuota** is exposed. It requires an atomic compare-and-swap
in order to keep resource usage tracking consistent.
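As a rough illustration of why compare-and-swap keeps usage tracking consistent, the sketch below models the status update with an in-memory store. Everything here is an assumption for the example: `quotaStore`, `read`, `updateStatus`, and the integer version counter are stand-ins, not the server's real storage layer or REST handler.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errConflict = errors.New("resource version conflict: re-read and retry")

// quotaStore is a hypothetical in-memory stand-in for the stored status.
type quotaStore struct {
	mu      sync.Mutex
	version uint64
	used    map[string]int64
}

// read returns the current version and a copy of the recorded usage.
func (s *quotaStore) read() (uint64, map[string]int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	snapshot := make(map[string]int64, len(s.used))
	for k, v := range s.used {
		snapshot[k] = v
	}
	return s.version, snapshot
}

// updateStatus writes the new usage only if the caller's version still
// matches what is stored; otherwise it reports a conflict.
func (s *quotaStore) updateStatus(readVersion uint64, used map[string]int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if readVersion != s.version {
		return errConflict
	}
	s.used = used
	s.version++
	return nil
}

func main() {
	store := &quotaStore{}
	v, used := store.read()
	used["pods"] = 1
	fmt.Println(store.updateStatus(v, used)) // the first writer succeeds
	fmt.Println(store.updateStatus(v, used)) // the stale version is rejected
}
```

A writer that receives the conflict error would re-read the status and retry, so two concurrent increments can never silently overwrite each other.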
## Resource Quota Controller
A resource quota controller monitors observed usage for tracked resources in the **Namespace**.
If the observed usage differs from the usage recorded in the current **ResourceQuota.Status**, the controller
posts an update of the currently observed usage metrics to the **ResourceQuota** via the /status endpoint.
The resource quota controller is the only component capable of monitoring and recording usage updates after a DELETE operation
since admission control is incapable of guaranteeing a DELETE request actually succeeded.
## AdmissionControl plugin: ResourceQuota
The **ResourceQuota** plug-in introspects all incoming admission requests.
It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
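The admission decision described above can be sketched as a comparison of current usage plus the incoming object's cost against each hard limit. This is a simplified sketch under stated assumptions: `resourceList` is a hypothetical map of plain integers standing in for the **ResourceList** type, and `admit` is an illustrative name, not the plug-in's real entry point.

```go
package main

import "fmt"

// resourceList maps a resource name to a quantity. It is a simplified,
// hypothetical stand-in for the ResourceList type in the data model.
type resourceList map[string]int64

// admit returns an error if adding delta to used would push any
// resource named in hard above its hard limit.
func admit(hard, used, delta resourceList) error {
	for name, limit := range hard {
		if used[name]+delta[name] > limit {
			return fmt.Errorf("quota exceeded for %s: used %d + requested %d > hard %d",
				name, used[name], delta[name], limit)
		}
	}
	return nil
}

func main() {
	hard := resourceList{"pods": 2, "cpu": 4000} // cpu in millicores
	used := resourceList{"pods": 1, "cpu": 3000}

	// Fits within both limits, so the request is accepted.
	fmt.Println(admit(hard, used, resourceList{"pods": 1, "cpu": 1000}))
	// Would push cpu above its hard limit, so the request is denied.
	fmt.Println(admit(hard, used, resourceList{"pods": 1, "cpu": 1500}))
}
```

Resources that have no entry in `hard` are not constrained in this sketch.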
To enable the plug-in and support for ResourceQuota, the kube-apiserver must be configured as follows:
The following resource limits are imposed as part of core Kubernetes at the namespace level:
| ResourceName | Description |
| ------------ | ----------- |
| cpu | Total cpu usage |
| memory | Total memory usage |
| pods | Total number of pods |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.
This means the resource must have a fully-qualified name (e.g. mycompany.org/shinynewresource).
If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read
**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
into the system.
To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document. As a result,
it is encouraged to impose a cap of 1 on the total number of individual quotas that are tracked in the **Namespace** by explicitly
capping it in the **ResourceQuota** document.
## kube-apiserver
The server is updated to be aware of **ResourceQuota** objects.
The quota is only enforced if the kube-apiserver is started as follows:
```console
$ kube-apiserver -admission_control=ResourceQuota
```
## kube-controller-manager
It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
A new controller is defined that runs a sync loop to calculate quota usage across the namespace.
If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
**ResourceQuota.Status** document to the server to atomically update the observed usage based on the previously read
**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
into the system.
**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object.
If the observed usage differs from the recorded usage, the controller sends a **ResourceQuotaUsage** resource
to the server to atomically update the recorded usage.
The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down.
To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate
usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate
this being the resource most closely running at the prescribed quota limits.
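One pass of such a synchronization loop might look like the sketch below: recompute usage from the pods that still exist, then compare it with what was recorded. The `pod` type and the `observedUsage` and `needsUpdate` helpers are hypothetical, and only `pods` and `cpu` are tracked to keep the example short.

```go
package main

import "fmt"

// pod is a hypothetical, minimal view of a pod for usage accounting.
type pod struct {
	Name            string
	CPURequestMilli int64
}

// observedUsage recomputes usage from the pods that currently exist.
func observedUsage(pods []pod) map[string]int64 {
	usage := map[string]int64{"pods": int64(len(pods)), "cpu": 0}
	for _, p := range pods {
		usage["cpu"] += p.CPURequestMilli
	}
	return usage
}

// needsUpdate reports whether the observed usage differs from the
// recorded usage, in which case the controller would post an update.
func needsUpdate(recorded, observed map[string]int64) bool {
	if len(recorded) != len(observed) {
		return true
	}
	for name, value := range observed {
		if recordedValue, ok := recorded[name]; !ok || recordedValue != value {
			return true
		}
	}
	return false
}

func main() {
	recorded := map[string]int64{"pods": 2, "cpu": 300}

	// One of the two pods has been deleted since usage was last recorded.
	remaining := []pod{{Name: "a", CPURequestMilli: 100}}
	observed := observedUsage(remaining)

	if needsUpdate(recorded, observed) {
		fmt.Println("usage changed, posting update:", observed)
	}
}
```

In this sketch a pod deletion is only noticed on the next pass, which is why the loop frequency, or a WATCH on pod DELETE events, determines how quickly usage is ticked down.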
To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document in a **Namespace**. As a result, it is encouraged to impose a cap of 1 on the total number of individual quotas that are tracked in the **Namespace**
by capping it in the **ResourceQuota** document.
## kubectl