mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-07-23 11:50:44 +00:00
Merge pull request #12291 from derekwaynecarr/resource_quota_requests
Update resource quota design to align with requests and limits
commit
256eeeda2b
@@ -35,13 +35,17 @@ Documentation for other releases can be found at

## Background

This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control.
This document describes a system for enforcing hard resource usage limits per namespace as part of admission control.

## Model Changes
## Use cases

A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace.
1. Ability to enumerate resource usage limits per namespace.
2. Ability to monitor resource usage for tracked resources.
3. Ability to reject resource usage exceeding hard quotas.

A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status.
## Data Model

The **ResourceQuota** object is scoped to a **Namespace**.

```go
// The following identify resource constants for Kubernetes object types
@@ -54,109 +58,139 @@ const (
  ResourceReplicationControllers ResourceName = "replicationcontrollers"
  // ResourceQuotas, number
  ResourceQuotas ResourceName = "resourcequotas"
  // ResourceSecrets, number
  ResourceSecrets ResourceName = "secrets"
  // ResourcePersistentVolumeClaims, number
  ResourcePersistentVolumeClaims ResourceName = "persistentvolumeclaims"
)

// ResourceQuotaSpec defines the desired hard limits to enforce for Quota
type ResourceQuotaSpec struct {
  // Hard is the set of desired hard limits for each named resource
  Hard ResourceList `json:"hard,omitempty"`
  Hard ResourceList `json:"hard,omitempty" description:"hard is the set of desired hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
}

// ResourceQuotaStatus defines the enforced hard limits and observed use
type ResourceQuotaStatus struct {
  // Hard is the set of enforced hard limits for each named resource
  Hard ResourceList `json:"hard,omitempty"`
  Hard ResourceList `json:"hard,omitempty" description:"hard is the set of enforced hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
  // Used is the current observed total usage of the resource in the namespace
  Used ResourceList `json:"used,omitempty"`
  Used ResourceList `json:"used,omitempty" description:"used is the current observed total usage of the resource in the namespace"`
}

// ResourceQuota sets aggregate quota restrictions enforced per namespace
type ResourceQuota struct {
  TypeMeta `json:",inline"`
  ObjectMeta `json:"metadata,omitempty"`
  ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`

  // Spec defines the desired quota
  Spec ResourceQuotaSpec `json:"spec,omitempty"`
  Spec ResourceQuotaSpec `json:"spec,omitempty" description:"spec defines the desired quota; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`

  // Status defines the actual enforced quota and its current usage
  Status ResourceQuotaStatus `json:"status,omitempty"`
}

// ResourceQuotaUsage captures system observed quota status per namespace
// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage
type ResourceQuotaUsage struct {
  TypeMeta `json:",inline"`
  ObjectMeta `json:"metadata,omitempty"`

  // Status defines the actual enforced quota and its current usage
  Status ResourceQuotaStatus `json:"status,omitempty"`
  Status ResourceQuotaStatus `json:"status,omitempty" description:"status defines the actual enforced quota and current usage; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
}

// ResourceQuotaList is a list of ResourceQuota items
type ResourceQuotaList struct {
  TypeMeta `json:",inline"`
  ListMeta `json:"metadata,omitempty"`
  ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`

  // Items is a list of ResourceQuota objects
  Items []ResourceQuota `json:"items"`
  Items []ResourceQuota `json:"items" description:"items is a list of ResourceQuota objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
}
```

## Quota Tracked Resources

The following resources are supported by the quota system.

| Resource | Description |
| ------------ | ----------- |
| cpu | Total requested cpu usage |
| memory | Total requested memory usage |
| pods | Total number of active pods where phase is pending or active. |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
| secrets | Total number of secrets |
| persistentvolumeclaims | Total number of persistent volume claims |

If a third party wants to track additional resources, it must follow the resource naming conventions prescribed
by Kubernetes. This means the resource must have a fully-qualified name (e.g. mycompany.org/shinynewresource).

## Resource Requirements: Requests vs Limits

If a resource supports the ability to distinguish between a request and a limit for a resource,
the quota tracking system charges only the request value against the quota usage. If a resource
is tracked by quota, and no request value is provided, the associated entity is rejected as part of admission.

For example, consider the following scenarios relative to tracking quota on CPU:

| Pod | Container | Request CPU | Limit CPU | Result |
| --- | --------- | ----------- | --------- | ------ |
| X | C1 | 100m | 500m | The quota usage is incremented 100m |
| Y | C2 | 100m | none | The quota usage is incremented 100m |
| Y | C2 | none | 500m | The quota usage is incremented 500m since request will default to limit |
| Z | C3 | none | none | The pod is rejected since it does not enumerate a request. |

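The accounting rule in the table above can be sketched as a small function. This is only an illustration under stated assumptions: the `cpuSetting` type, the `chargedCPU` function, and the convention that zero means "not specified" are hypothetical names introduced here, not part of the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// cpuSetting is a hypothetical, simplified view of one container's CPU
// settings in millicores. Zero means the value was not specified.
type cpuSetting struct {
	RequestMilli int64
	LimitMilli   int64
}

// errNoRequest stands in for the "rejected as part of admission" outcome.
var errNoRequest = errors.New("rejected: container does not enumerate a cpu request")

// chargedCPU returns the millicores counted against quota for one container.
func chargedCPU(c cpuSetting) (int64, error) {
	switch {
	case c.RequestMilli > 0:
		// A request is present: only the request is charged.
		return c.RequestMilli, nil
	case c.LimitMilli > 0:
		// No request: the request defaults to the limit.
		return c.LimitMilli, nil
	default:
		// Neither value is present: the pod is rejected.
		return 0, errNoRequest
	}
}

func main() {
	rows := []cpuSetting{
		{RequestMilli: 100, LimitMilli: 500},
		{RequestMilli: 100},
		{LimitMilli: 500},
		{},
	}
	for _, c := range rows {
		charged, err := chargedCPU(c)
		fmt.Println(charged, err)
	}
}
```

The four entries in `rows` correspond to the four table rows, in order.
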
The rationale for accounting for the requested amount of a resource versus the limit is the belief
that a user should only be charged for what they are scheduled against in the cluster. In addition,
attempting to track usage against actual usage, where request < actual < limit, is considered highly
volatile.

As a consequence of this decision, the user is able to spread their usage of a resource across multiple tiers
of service. Let's demonstrate this via an example with a 4 cpu quota.

The quota may be allocated as follows:

| Pod | Container | Request CPU | Limit CPU | Tier | Quota Usage |
| --- | --------- | ----------- | --------- | ---- | ----------- |
| X | C1 | 1 | 4 | Burstable | 1 |
| Y | C2 | 2 | 2 | Guaranteed | 2 |
| Z | C3 | 1 | 3 | Burstable | 1 |

It is possible that the pods may consume up to 9 cpu over a given time period, depending on the cpu available on the nodes
that held pods X and Z, but since we scheduled X and Z relative to the request, we only track the requested
value against their allocated quota. If one wants to restrict the ratio between the request and limit,
it is encouraged that the user define a **LimitRange** with **LimitRequestRatio** to control burst out behavior.
This would, in effect, let an administrator keep the difference between request and limit more in line with
tracked usage if desired.

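The arithmetic behind the 4 cpu example can be checked with a short sketch. The `tieredContainer` type and the `tier` and `totals` helpers are hypothetical names used for illustration, and the tier rule (request equal to limit means Guaranteed, otherwise Burstable) is inferred only from the three rows of the table above.

```go
package main

import "fmt"

// tieredContainer is a hypothetical record of one row in the table, in whole cpus.
type tieredContainer struct {
	Name    string
	Request int64
	Limit   int64
}

// tier labels a container the way the table does: request == limit is
// Guaranteed, anything else is Burstable (an assumption drawn from the rows).
func tier(c tieredContainer) string {
	if c.Request == c.Limit {
		return "Guaranteed"
	}
	return "Burstable"
}

// totals returns the cpu charged against quota (sum of requests) and the
// most the containers could consume if every one bursts to its limit.
func totals(cs []tieredContainer) (charged, burst int64) {
	for _, c := range cs {
		charged += c.Request
		burst += c.Limit
	}
	return charged, burst
}

func main() {
	cs := []tieredContainer{
		{Name: "C1", Request: 1, Limit: 4},
		{Name: "C2", Request: 2, Limit: 2},
		{Name: "C3", Request: 1, Limit: 3},
	}
	for _, c := range cs {
		fmt.Printf("%s: %s, quota usage %d\n", c.Name, tier(c), c.Request)
	}
	charged, burst := totals(cs)
	fmt.Printf("charged %d cpu against quota; limits sum to %d cpu\n", charged, burst)
}
```

The requests sum to 4, which exactly fills the quota, while the limits sum to 9, which is where the burst figure in the text comes from.
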
## Status API

A REST API endpoint to update the status section of the **ResourceQuota** is exposed. It requires an atomic compare-and-swap
in order to keep resource usage tracking consistent.

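A minimal sketch of the compare-and-swap requirement follows, assuming a version counter that plays the role of the resource version. The `quotaStore` type and its `Read` and `UpdateStatus` methods are hypothetical in-memory stand-ins; the real endpoint and its storage are not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict is returned when the caller's view of the status is stale.
var errConflict = errors.New("conflict: stale resource version")

// quotaStore is a hypothetical in-memory stand-in for the storage behind
// the status endpoint.
type quotaStore struct {
	mu              sync.Mutex
	resourceVersion uint64
	used            map[string]int64
}

// Read returns a copy of the recorded usage and the version it was read at.
func (s *quotaStore) Read() (map[string]int64, uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]int64, len(s.used))
	for k, v := range s.used {
		out[k] = v
	}
	return out, s.resourceVersion
}

// UpdateStatus stores the new usage only if readVersion still matches the
// stored version, then bumps the version. Otherwise it reports a conflict.
func (s *quotaStore) UpdateStatus(readVersion uint64, used map[string]int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if readVersion != s.resourceVersion {
		return errConflict
	}
	s.used = make(map[string]int64, len(used))
	for k, v := range used {
		s.used[k] = v
	}
	s.resourceVersion++
	return nil
}

func main() {
	s := &quotaStore{}
	used, version := s.Read()
	used["pods"] = 1
	fmt.Println(s.UpdateStatus(version, used)) // first write at this version
	fmt.Println(s.UpdateStatus(version, used)) // same version reused: conflict
}
```

A caller that receives a conflict would re-read the status and retry, which is what keeps concurrent usage updates consistent.
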
## Resource Quota Controller

A resource quota controller monitors observed usage for tracked resources in the **Namespace**.

If there is an observed difference between the current usage stats and the current **ResourceQuota.Status**, the controller
posts an update of the currently observed usage metrics to the **ResourceQuota** via the /status endpoint.

The resource quota controller is the only component capable of monitoring and recording usage updates after a DELETE operation
since admission control is incapable of guaranteeing a DELETE request actually succeeded.

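One pass of the controller's reconciliation might look like the sketch below. The `usage` type, `sameUsage`, `syncQuota`, and the `postStatus` callback (standing in for a call to the /status endpoint) are hypothetical names; how the controller actually observes usage is outside this sketch.

```go
package main

import "fmt"

// usage maps a tracked resource name to its observed or recorded amount.
type usage map[string]int64

// sameUsage reports whether two usage maps hold identical entries.
func sameUsage(a, b usage) bool {
	if len(a) != len(b) {
		return false
	}
	for name, av := range a {
		if bv, ok := b[name]; !ok || bv != av {
			return false
		}
	}
	return true
}

// syncQuota posts the observed usage only when it differs from the recorded
// status. It returns true if an update was posted.
func syncQuota(recorded, observed usage, postStatus func(usage) error) (bool, error) {
	if sameUsage(recorded, observed) {
		return false, nil
	}
	if err := postStatus(observed); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	recorded := usage{"pods": 3, "services": 1}
	observed := usage{"pods": 2, "services": 1} // e.g. after one pod was deleted
	posted, err := syncQuota(recorded, observed, func(u usage) error {
		fmt.Println("posting status update:", u)
		return nil
	})
	fmt.Println(posted, err)
}
```

Because the controller recomputes usage from what it observes, it can tick usage down after a DELETE even though admission control never saw the deletion complete.
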
## AdmissionControl plugin: ResourceQuota

The **ResourceQuota** plug-in introspects all incoming admission requests.

It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
To enable the plug-in and support for ResourceQuota, the kube-apiserver must be configured as follows:

The following resource limits are imposed as part of core Kubernetes at the namespace level:

| ResourceName | Description |
| ------------ | ----------- |
| cpu | Total cpu usage |
| memory | Total memory usage |
| pods | Total number of pods |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |

Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.

This means the resource must have a fully-qualified name (e.g. mycompany.org/shinynewresource).

If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read
**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
into the system.

To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document. As a result,
it is encouraged to actually impose a cap on the total number of individual quotas that are tracked in the **Namespace** to 1 by explicitly
capping it in the **ResourceQuota** document.

## kube-apiserver

The server is updated to be aware of **ResourceQuota** objects.

The quota is only enforced if the kube-apiserver is started as follows:

```console
$ kube-apiserver -admission_control=ResourceQuota
```

## kube-controller-manager
It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.

A new controller is defined that runs a sync loop to calculate quota usage across the namespace.
If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
**ResourceQuota.Status** document to the server to atomically update the observed usage based on the previously read
**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
into the system.

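The admission decision described above can be sketched as a pure function over the hard limits, the recorded usage, and the usage the incoming object would add. The `resourceList` type here is a simplified hypothetical map of integers (not the API's `ResourceList`), and `admit` is an illustrative name; the atomic status update that follows a successful check is not modeled.

```go
package main

import "fmt"

// resourceList is a simplified stand-in mapping a resource name to an
// integer amount (for example millicores for cpu, a count for pods).
type resourceList map[string]int64

// admit returns the usage that would result from accepting delta, or an
// error if any resource tracked in hard would exceed its limit. The inputs
// are not modified.
func admit(hard, used, delta resourceList) (resourceList, error) {
	next := make(resourceList, len(used))
	for name, v := range used {
		next[name] = v
	}
	for name, d := range delta {
		limit, tracked := hard[name]
		if !tracked {
			continue // resources without a hard limit are not charged
		}
		if next[name]+d > limit {
			return nil, fmt.Errorf("exceeds quota for %s: used %d + requested %d > hard %d",
				name, next[name], d, limit)
		}
		next[name] += d
	}
	return next, nil
}

func main() {
	hard := resourceList{"cpu": 4000, "pods": 10}
	used := resourceList{"cpu": 3500, "pods": 4}

	if _, err := admit(hard, used, resourceList{"cpu": 1000, "pods": 1}); err != nil {
		fmt.Println("denied:", err)
	}
	next, err := admit(hard, used, resourceList{"cpu": 500, "pods": 1})
	fmt.Println(next, err)
}
```

In the design, the returned usage would then be written back using the previously read resource version, so that two concurrent admissions cannot both claim the same remaining quota.
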
**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object.

If the observed usage differs from the recorded usage, the controller sends a **ResourceQuotaUsage** resource
to the server to update it atomically.

The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down.

To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate
usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate
this being the resource most closely running at the prescribed quota limits.
To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document in a **Namespace**. As a result, it is encouraged to impose a cap on the total number of individual quotas that are tracked in the **Namespace**
to 1 in the **ResourceQuota** document.

## kubectl
