diff --git a/docs/design/admission_control_resource_quota.md b/docs/design/admission_control_resource_quota.md
index 136603d2c36..bb7c6e0a3fc 100644
--- a/docs/design/admission_control_resource_quota.md
+++ b/docs/design/admission_control_resource_quota.md
@@ -35,13 +35,17 @@ Documentation for other releases can be found at
 
 ## Background
 
-This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control.
+This document describes a system for enforcing hard resource usage limits per namespace as part of admission control.
 
-## Model Changes
+## Use cases
 
-A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace.
+1. Ability to enumerate resource usage limits per namespace.
+2. Ability to monitor resource usage for tracked resources.
+3. Ability to reject resource usage exceeding hard quotas.
 
-A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status.
+## Data Model
+
+The **ResourceQuota** object is scoped to a **Namespace**.
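To make the model concrete, here is a minimal sketch of constructing a quota for a namespace. It uses simplified stand-in types (plain strings instead of the real quantity type, and only the fields needed here), and the namespace name "development" is hypothetical; it is illustrative only, not the actual API definitions that follow.

```go
package main

import "fmt"

// Simplified stand-ins for the API types described in this document.
// The real ResourceList maps ResourceName to a structured quantity type;
// plain strings are used here to keep the sketch self-contained.
type ResourceName string

type ResourceList map[ResourceName]string

type ResourceQuotaSpec struct {
	Hard ResourceList
}

type ResourceQuota struct {
	Namespace string // a ResourceQuota is scoped to a single Namespace
	Spec      ResourceQuotaSpec
}

// newDevQuota returns a quota for a hypothetical "development" namespace
// that caps cpu, memory, and the number of pods.
func newDevQuota() ResourceQuota {
	return ResourceQuota{
		Namespace: "development",
		Spec: ResourceQuotaSpec{
			Hard: ResourceList{
				"cpu":    "4",
				"memory": "8Gi",
				"pods":   "10",
			},
		},
	}
}

func main() {
	q := newDevQuota()
	fmt.Printf("quota in %s caps pods at %s\n", q.Namespace, q.Spec.Hard["pods"])
	// prints: quota in development caps pods at 10
}
```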
 ```go
 // The following identify resource constants for Kubernetes object types
@@ -54,109 +58,139 @@ const (
 	ResourceReplicationControllers ResourceName = "replicationcontrollers"
 	// ResourceQuotas, number
 	ResourceQuotas ResourceName = "resourcequotas"
+	// ResourceSecrets, number
+	ResourceSecrets ResourceName = "secrets"
+	// ResourcePersistentVolumeClaims, number
+	ResourcePersistentVolumeClaims ResourceName = "persistentvolumeclaims"
 )
 
 // ResourceQuotaSpec defines the desired hard limits to enforce for Quota
 type ResourceQuotaSpec struct {
 	// Hard is the set of desired hard limits for each named resource
-	Hard ResourceList `json:"hard,omitempty"`
+	Hard ResourceList `json:"hard,omitempty" description:"hard is the set of desired hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
 }
 
 // ResourceQuotaStatus defines the enforced hard limits and observed use
 type ResourceQuotaStatus struct {
 	// Hard is the set of enforced hard limits for each named resource
-	Hard ResourceList `json:"hard,omitempty"`
+	Hard ResourceList `json:"hard,omitempty" description:"hard is the set of enforced hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
 
 	// Used is the current observed total usage of the resource in the namespace
-	Used ResourceList `json:"used,omitempty"`
+	Used ResourceList `json:"used,omitempty" description:"used is the current observed total usage of the resource in the namespace"`
 }
 
 // ResourceQuota sets aggregate quota restrictions enforced per namespace
 type ResourceQuota struct {
 	TypeMeta `json:",inline"`
-	ObjectMeta `json:"metadata,omitempty"`
+	ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`
 
 	// Spec defines the desired quota
-	Spec ResourceQuotaSpec `json:"spec,omitempty"`
+	Spec ResourceQuotaSpec `json:"spec,omitempty" description:"spec defines the desired quota; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
 
 	// Status defines the actual enforced quota and its current usage
-	Status ResourceQuotaStatus `json:"status,omitempty"`
-}
-
-// ResourceQuotaUsage captures system observed quota status per namespace
-// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage
-type ResourceQuotaUsage struct {
-	TypeMeta `json:",inline"`
-	ObjectMeta `json:"metadata,omitempty"`
-
-	// Status defines the actual enforced quota and its current usage
-	Status ResourceQuotaStatus `json:"status,omitempty"`
+	Status ResourceQuotaStatus `json:"status,omitempty" description:"status defines the actual enforced quota and current usage; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
 }
 
 // ResourceQuotaList is a list of ResourceQuota items
 type ResourceQuotaList struct {
 	TypeMeta `json:",inline"`
-	ListMeta `json:"metadata,omitempty"`
+	ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`
 
 	// Items is a list of ResourceQuota objects
-	Items []ResourceQuota `json:"items"`
+	Items []ResourceQuota `json:"items" description:"items is a list of ResourceQuota objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
 }
 ```
 
+## Quota Tracked Resources
+
+The following resources are supported by the quota system.
+
+| Resource | Description |
+| ------------ | ----------- |
+| cpu | Total requested cpu usage |
+| memory | Total requested memory usage |
+| pods | Total number of active pods, i.e. pods whose phase is pending or running |
+| services | Total number of services |
+| replicationcontrollers | Total number of replication controllers |
+| resourcequotas | Total number of resource quotas |
+| secrets | Total number of secrets |
+| persistentvolumeclaims | Total number of persistent volume claims |
+
+If a third party wants to track additional resources, it must follow the resource naming conventions prescribed
+by Kubernetes. This means the resource must have a fully-qualified name (e.g., mycompany.org/shinynewresource).
+
+## Resource Requirements: Requests vs Limits
+
+If a resource supports a distinction between a request and a limit, the quota tracking
+system charges only the request value against the quota usage. If a resource is tracked
+by quota and no request value is provided, the associated entity is rejected during admission.
+
+For example, consider the following scenarios when tracking quota on CPU:
+
+| Pod | Container | Request CPU | Limit CPU | Result |
+| --- | --------- | ----------- | --------- | ------ |
+| X | C1 | 100m | 500m | The quota usage is incremented 100m |
+| Y | C2 | 100m | none | The quota usage is incremented 100m |
+| Y | C2 | none | 500m | The quota usage is incremented 500m since the request will default to the limit |
+| Z | C3 | none | none | The pod is rejected since it does not enumerate a request. |
+
+The rationale for accounting for the requested amount of a resource rather than the limit is the belief
+that a user should only be charged for what they are scheduled against in the cluster. In addition,
+attempting to track quota against actual usage, where request < actual < limit, is considered highly
+volatile.
+
+As a consequence of this decision, a user is able to spread their usage of a resource across multiple tiers
+of service. Let's demonstrate this via an example with a 4 cpu quota.
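The request-vs-limit accounting rules above can be sketched as a small helper. This is a hypothetical illustration in milli-cpu units, not the actual plug-in code: charge the request when present, fall back to the limit when only a limit is given, and reject when neither is set.

```go
package main

import (
	"errors"
	"fmt"
)

// chargeForQuota returns the amount (in milli-cpu) to count against quota
// for a single container, per the scenario table above. A value of 0 means
// "not specified". Hypothetical helper, not the real admission-control code.
func chargeForQuota(requestMilli, limitMilli int64) (int64, error) {
	switch {
	case requestMilli > 0:
		// Charge the request, even when the limit is higher (Burstable).
		return requestMilli, nil
	case limitMilli > 0:
		// No request provided: the request defaults to the limit.
		return limitMilli, nil
	default:
		// Neither request nor limit: reject at admission.
		return 0, errors.New("resource tracked by quota must enumerate a request")
	}
}

func main() {
	charged, _ := chargeForQuota(100, 500) // pod X / C1 from the table
	fmt.Println(charged)                   // prints: 100
	_, err := chargeForQuota(0, 0) // pod Z / C3 from the table
	fmt.Println(err != nil)        // prints: true (rejected)
}
```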
+
+The quota may be allocated as follows:
+
+| Pod | Container | Request CPU | Limit CPU | Tier | Quota Usage |
+| --- | --------- | ----------- | --------- | ---- | ----------- |
+| X | C1 | 1 | 4 | Burstable | 1 |
+| Y | C2 | 2 | 2 | Guaranteed | 2 |
+| Z | C3 | 1 | 3 | Burstable | 1 |
+
+It is possible that the pods may consume 9 cpu over a given time period, depending on the available cpu
+of the nodes that held pods X and Z, but since we scheduled X and Z relative to their requests, we only track the requested
+value against their allocated quota. If one wants to restrict the ratio between the request and limit,
+it is encouraged that the user define a **LimitRange** with **LimitRequestRatio** to control burst-out behavior.
+This would, in effect, let an administrator keep the difference between request and limit more in line with
+tracked usage if desired.
+
+## Status API
+
+A REST API endpoint to update the status section of the **ResourceQuota** is exposed. It requires an atomic compare-and-swap
+in order to keep resource usage tracking consistent.
+
+## Resource Quota Controller
+
+A resource quota controller monitors observed usage for tracked resources in the **Namespace**.
+
+If there is an observed difference between the current usage stats and the current **ResourceQuota.Status**, the controller
+posts an update of the currently observed usage metrics to the **ResourceQuota** via the /status endpoint.
+
+The resource quota controller is the only component capable of monitoring and recording usage updates after a DELETE operation,
+since admission control is incapable of guaranteeing that a DELETE request actually succeeded.
+
 ## AdmissionControl plugin: ResourceQuota
 
 The **ResourceQuota** plug-in introspects all incoming admission requests.
 
-It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
-namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
+To enable the plug-in and support for ResourceQuota, the kube-apiserver must be configured as follows:
 
-The following resource limits are imposed as part of core Kubernetes at the namespace level:
-
-| ResourceName | Description |
-| ------------ | ----------- |
-| cpu | Total cpu usage |
-| memory | Total memory usage |
-| pods | Total number of pods |
-| services | Total number of services |
-| replicationcontrollers | Total number of replication controllers |
-| resourcequotas | Total number of resource quotas |
-
-Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.
-
-This means the resource must have a fully-qualified name (i.e. mycompany.org/shinynewresource)
-
-If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
-**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read
-**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
-into the system.
-
-To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document. As a result,
-its encouraged to actually impose a cap on the total number of individual quotas that are tracked in the **Namespace** to 1 by explicitly
-capping it in **ResourceQuota** document.
-
-## kube-apiserver
-
-The server is updated to be aware of **ResourceQuota** objects.
-
-The quota is only enforced if the kube-apiserver is started as follows:
-
-```console
+```
 $ kube-apiserver -admission_control=ResourceQuota
 ```
 
-## kube-controller-manager
+It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
 
-A new controller is defined that runs a synch loop to calculate quota usage across the namespace.
+If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
+**ResourceQuota.Status** document to the server to atomically update the observed usage based on the previously read
+**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
+into the system.
 
-**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object.
-
-If the observed usage is different than the recorded usage, the controller sends a **ResourceQuotaUsage** resource
-to the server to atomically update.
-
-The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down.
-
-To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate
-usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate
-this being the resource most closely running at the prescribed quota limits.
+To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document in a **Namespace**. As a result,
+it is encouraged to cap the total number of individual quotas tracked in the **Namespace** to 1 via the **ResourceQuota** document itself.
 
 ## kubectl