From f782089bb2152ee405809fec5f13c73af3967de0 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Sat, 16 Jan 2016 11:57:40 -0500 Subject: [PATCH] Proposal: ResourceQuota scoping support --- docs/proposals/resource-quota-scoping.md | 362 +++++++++++++++++++++++ 1 file changed, 362 insertions(+) create mode 100644 docs/proposals/resource-quota-scoping.md diff --git a/docs/proposals/resource-quota-scoping.md b/docs/proposals/resource-quota-scoping.md new file mode 100644 index 00000000000..7ed499f862a --- /dev/null +++ b/docs/proposals/resource-quota-scoping.md @@ -0,0 +1,362 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Resource Quota - Scoping resources + +## Problem Description + +### Ability to limit compute requests and limits + +The existing `ResourceQuota` API object constrains the total amount of compute +resource requests. This is useful when a cluster-admin is interested in +controlling explicit resource guarantees such that there would be a relatively +strong guarantee that pods created by users who stay within their quota will find +enough free resources in the cluster to be able to schedule. The end-user creating +the pod is expected to have intimate knowledge on their minimum required resource +as well as their potential limits. + +There are many environments where a cluster-admin does not extend this level +of trust to their end-user because user's often request too much resource, and +they have trouble reasoning about what they hope to have available for their +application versus what their application actually needs. In these environments, +the cluster-admin will often just expose a single value (the limit) to the end-user. +Internally, they may choose a variety of other strategies for setting the request. +For example, some cluster operators are focused on satisfying a particular over-commit +ratio and may choose to set the request as a factor of the limit to control for +over-commit. Other cluster operators may defer to a resource estimation tool that +sets the request based on known historical trends. In this environment, the +cluster-admin is interested in exposing a quota to their end-users that maps +to their desired limit instead of their request since that is the value the user +manages. + +### Ability to limit impact to node and promote fair-use + +The current `ResourceQuota` API object does not allow the ability +to quota best-effort pods separately from pods with resource guarantees. +For example, if a cluster-admin applies a quota that caps requested +cpu at 10 cores and memory at 10Gi, all pods in the namespace must +make an explicit resource request for cpu and memory to satisfy +quota. This prevents a namespace with a quota from supporting best-effort +pods. + +In practice, the cluster-admin wants to control the impact of best-effort +pods to the cluster, but not restrict the ability to run best-effort pods +altogether. + +As a result, the cluster-admin requires the ability to control the +max number of active best-effort pods. In addition, the cluster-admin +requires the ability to scope a quota that limits compute resources to +exclude best-effort pods. + +### Ability to quota long-running vs bounded-duration compute resources + +The cluster-admin may want to quota end-users separately +based on long-running vs bounded-duration compute resources. + +For example, a cluster-admin may offer more compute resources +for long running pods that are expected to have a more permanent residence +on the node than bounded-duration pods. Many batch style workloads +tend to consume as much resource as they can until something else applies +the brakes. As a result, these workloads tend to operate at their limit, +while many traditional web applications may often consume closer to their +request if there is no active traffic. An operator that wants to control +density will offer lower quota limits for batch workloads than web applications. + +A classic example is a PaaS deployment where the cluster-admin may +allow a separate budget for pods that run their web application vs pods that +build web applications. + +Another example is providing more quota to a database pod than a +pod that performs a database migration. + +## Use Cases + +* As a cluster-admin, I want the ability to quota + * compute resource requests + * compute resource limits + * compute resources for terminating vs non-terminating workloads + * compute resources for best-effort vs non-best-effort pods + +## Proposed Change + +### New quota tracked resources + +Support the following resources that can be tracked by quota. + +| Resource Name | Description | +| ------------- | ----------- | +| cpu | total cpu requests (backwards compatibility) | +| cpu.request | total cpu requests | +| cpu.limit | total cpu limits | +| memory | total memory requests (backwards compatibility) | +| memory.request | total memory requests | +| memory.limit | total memory limits | + +### Resource Quota Scopes + +Add the ability to associate a set of `scopes` to a quota. + +A quota will only measure usage for a `resource` if it matches +the intersection of enumerated `scopes`. + +Adding a `scope` to a quota limits the number of resources +it supports to those that pertain to the `scope`. Specifying +a resource on the quota object outside of the allowed set +would result in a validation error. + +| Scope | Description | +| ----- | ----------- | +| Terminating | Match `kind=Pod` where `spec.activeDeadlineSeconds >= 0` | +| NotTerminating | Match `kind=Pod` where `spec.activeDeadlineSeconds = nil` | +| BestEffort | Match `kind=Pod` where `status.qualityOfService in (BestEffort)` | +| NotBestEffort | Match `kind=Pod` where `status.qualityOfService not in (BestEffort)` | + +A `BestEffort` scope restricts a quota to tracking the following resources: + +* pod + +A `Terminating`, `NotTerminating`, `NotBestEffort` scope restricts a quota to +tracking the following resources: + +* pod +* memory, memory.request, memory.limit +* cpu, cpu.request, cpu.limit + +## Data Model Impact + +``` +// The following identify resource constants for Kubernetes object types +const ( + // CPU Request, in cores + ResourceCPURequest ResourceName = "cpu.request" + // CPU Limit, in bytes + ResourceCPULimit ResourceName = "cpu.limit" + // Memory Request, in bytes + ResourceMemoryRequest ResourceName = "memory.request" + // Memory Limit, in bytes + ResourceMemoryLimit ResourceName = "memory.limit" +) + +// A scope is a filter that matches an object +type ResourceQuotaScope string +const ( + ResourceQuotaScopeTerminating ResourceQuotaScope = "Terminating" + ResourceQuotaScopeNotTerminating ResourceQuotaScope = "NotTerminating" + ResourceQuotaScopeBestEffort ResourceQuotaScope = "BestEffort" + ResourceQuotaScopeNotBestEffort ResourceQuotaScope = "NotBestEffort" +) + +// ResourceQuotaSpec defines the desired hard limits to enforce for Quota +// The quota matches by default on all objects in its namespace. +// The quota can optionally match objects that satisfy a set of scopes. +type ResourceQuotaSpec struct { + // Hard is the set of desired hard limits for each named resource + Hard ResourceList `json:"hard,omitempty"` + // Scopes is the set of filters that must match an object for it to be + // tracked by the quota + Scopes []ResourceQuotaScope `json:"scopes,omitempty"` +} +``` + +## Rest API Impact + +None. + +## Security Impact + +None. + +## End User Impact + +The `kubectl` commands that render quota should display its scopes. + +## Performance Impact + +This feature will make having more quota objects in a namespace +more common in certain clusters. This impacts the number of quota +objects that need to be incremented during creation of an object +in admission control. It impacts the number of quota objects +that need to be updated during controller loops. + +## Developer Impact + +None. + +## Alternatives + +This proposal initially enumerated a solution that leveraged a +`FieldSelector` on a `ResourceQuota` object. A `FieldSelector` +grouped an `APIVersion` and `Kind` with a selector over its +fields that supported set-based requirements. It would have allowed +a quota to track objects based on cluster defined attributes. + +For example, a quota could do the following: + +* match `Kind=Pod` where `spec.restartPolicy in (Always)` +* match `Kind=Pod` where `spec.restartPolicy in (Never, OnFailure)` +* match `Kind=Pod` where `status.qualityOfService in (BestEffort)` +* match `Kind=Service` where `spec.type in (LoadBalancer)` + * see [#17484](https://github.com/kubernetes/kubernetes/issues/17484) + +Theoretically, it would enable support for fine-grained tracking +on a variety of resource types. While extremely flexible, there +are cons to to this approach that make it premature to pursue +at this time. + +* Generic field selectors are not yet settled art + * see [#1362](https://github.com/kubernetes/kubernetes/issues/1362) + * see [#19084](https://github.com/kubernetes/kubernetes/pull/19804) +* Discovery API Limitations + * Not possible to discover the set of field selectors supported by kind. + * Not possible to discover if a field is readonly, readwrite, or immutable + post-creation. + +The quota system would want to validate that a field selector is valid, +and it would only want to select on those fields that are readonly/immutable +post creation to make resource tracking work during update operations. + +The current proposal could grow to support a `FieldSelector` on a +`ResourceQuotaSpec` and support a simple migration path to convert +`scopes` to the matching `FieldSelector` once the project has identified +how it wants to handle `fieldSelector` requirements longer term. + +This proposal previously discussed a solution that leveraged a +`LabelSelector` as a mechanism to partition quota. This is potentially +interesting to explore in the future to allow `namespace-admins` to +quota workloads based on local knowledge. For example, a quota +could match all kinds that match the selector +`tier=cache, environment in (dev, qa)` separately from quota that +matched `tier=cache, environment in (prod)`. This is interesting to +explore in the future, but labels are insufficient selection targets +for `cluster-administrators` to control footprint. In those instances, +you need fields that are cluster controlled and not user-defined. + +## Example + +### Scenario 1 + +The cluster-admin wants to restrict the following: + +* limit 2 best-effort pods +* limit 2 terminating pods that can not use more than 1Gi of memory, and 2 cpu cores +* limit 4 long-running pods that can not use more than 4Gi of memory, and 4 cpu cores +* limit 6 pods in total, 10 replication controllers + +This would require the following quotas to be added to the namespace: + +``` +$ cat quota-best-effort +apiVersion: v1 +kind: ResourceQuota +metadata: + name: quota-best-effort +spec: + hard: + pods: "2" + scopes: + - BestEffort + +$ cat quota-terminating +apiVersion: v1 +kind: ResourceQuota +metadata: + name: quota-terminating +spec: + hard: + pods: "2" + memory.limit: 1Gi + cpu.limit: 2 + scopes: + - Terminating + - NotBestEffort + +$ cat quota-longrunning +apiVersion: v1 +kind: ResourceQuota +metadata: + name: quota-longrunning +spec: + hard: + pods: "2" + memory.limit: 4Gi + cpu.limit: 4 + scopes: + - NotTerminating + - NotBestEffort + +$ cat quota +apiVersion: v1 +kind: ResourceQuota +metadata: + name: quota +spec: + hard: + pods: "6" + replicationcontrollers: "10" +``` + +In the above scenario, every pod creation will result in its usage being +tracked by `quota` since it has no additional scoping. The pod will then +be tracked by at 1 additional quota object based on the scope it +matches. In order for the pod creation to succeed, it must not violate +the constraint of any matching quota. So for example, a best-effort pod +would only be created if there was available quota in `quota-best-effort` +and `quota`. + +## Implementation + +### Assignee + +@derekwaynecarr + +### Work Items + +* Add support for requests and limits +* Add support for scopes in quota-related admission and controller code + +## Dependencies + +None. + +Longer term, we should evaluate what we want to do with `fieldSelector` as +the requests around different quota semantics will continue to grow. + +## Testing + +Appropriate unit and e2e testing will be authored. + +## Documentation Impact + +Existing resource quota documentation and examples will be updated. + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-quota-scoping.md?pixel)]() +