This patch fixes a bug in the CPUManager, whereby it doesn't honor the
"effective requests/limits" of a Pod as defined by:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources
The rule states that a Pod’s "effective request/limit" for a resource
should be the larger of:
* The highest of any particular resource request or limit
defined on all init Containers
* The sum of all app Containers request/limit for a
resource
Moreover, the rule states that:
* The effective QoS tier is the same for init Containers
and app containers alike
This means that the resource requests of init Containers and app
Containers should be able to overlap, such that the larger of the two
becomes the "effective resource request/limit" for the Pod. Likewise,
if a QoS tier of "Guaranteed" is determined for the Pod, then both init
Containers and app Containers should run in this tier.
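To make the rule concrete, here is a minimal, self-contained Go sketch (hypothetical names and types, not the Kubernetes API) that computes a Pod's effective CPU request in millicores:

```go
package main

import "fmt"

// container is a simplified stand-in for a container spec, holding only a CPU
// request in millicores (1000m == 1 CPU).
type container struct {
	name     string
	cpuMilli int64
}

// effectiveCPURequest applies the documented rule: the Pod's effective CPU
// request is the larger of (a) the highest request among its init Containers
// and (b) the sum of the requests of its app Containers.
func effectiveCPURequest(initContainers, appContainers []container) int64 {
	var initMax, appSum int64
	for _, c := range initContainers {
		if c.cpuMilli > initMax {
			initMax = c.cpuMilli
		}
	}
	for _, c := range appContainers {
		appSum += c.cpuMilli
	}
	if initMax > appSum {
		return initMax
	}
	return appSum
}

func main() {
	inits := []container{{name: "init-db", cpuMilli: 4000}}
	apps := []container{{name: "app", cpuMilli: 2000}, {name: "sidecar", cpuMilli: 2000}}
	// max(4000, 2000+2000) = 4000m, i.e. 4 CPUs.
	fmt.Println(effectiveCPURequest(inits, apps))
}
```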
In its current implementation, the CPU manager honors the effective QoS
tier for both init and app containers, but doesn't honor the "effective
request/limit" correctly.
Instead, it treats the "effective request/limit" as:
* The sum of all init Containers plus the sum of all app
Containers request/limit for a resource
It does this by not proactively removing the CPUs given to previous init
containers when new containers are being created. In the worst case,
this causes the CPUManager to give non-overlapping CPUs to all
containers (whether init or app) in the "Guaranteed" QoS tier before any
of the containers in the Pod actually start.
This effectively blocks these Pods from running whenever the total number of
CPUs requested across init and app Containers exceeds what the node can
provide. For example, on a node with 8 allocatable CPUs, a Pod with one init
Container requesting 6 CPUs and one app Container requesting 4 CPUs has an
effective request of only 6 CPUs, yet the current policy tries to reserve 10
exclusive CPUs and the Pod can never start.
This patch fixes this problem by updating the CPUManager static policy
so that it proactively removes any guaranteed CPUs it has granted to
init Containers before allocating CPUs to app Containers. Since init
Containers run sequentially, it also makes sure this proactive removal
happens for previous init Containers when allocating CPUs to later
ones.
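The following is a minimal, self-contained Go sketch of that idea, using hypothetical names and a toy allocator rather than the actual static-policy code: before exclusive CPUs are handed to a container, the CPUs still held by the Pod's earlier init Containers are returned to the free pool so they can be reused.

```go
package main

import "fmt"

// allocator is a toy model of exclusive CPU accounting.
type allocator struct {
	free     map[int]bool     // CPUs currently available for exclusive assignment
	assigned map[string][]int // container name -> exclusively assigned CPUs
}

// release returns a container's exclusive CPUs to the free pool.
func (a *allocator) release(name string) {
	for _, cpu := range a.assigned[name] {
		a.free[cpu] = true
	}
	delete(a.assigned, name)
}

// allocate reserves n exclusive CPUs for a container, first releasing the CPUs
// granted to any init Containers that ran before it.
func (a *allocator) allocate(name string, n int, earlierInits []string) error {
	for _, init := range earlierInits {
		a.release(init)
	}
	if len(a.free) < n {
		return fmt.Errorf("not enough free CPUs for %s", name)
	}
	for cpu := range a.free {
		if len(a.assigned[name]) == n {
			break
		}
		a.assigned[name] = append(a.assigned[name], cpu)
		delete(a.free, cpu)
	}
	return nil
}

func main() {
	a := &allocator{
		free:     map[int]bool{0: true, 1: true, 2: true, 3: true},
		assigned: map[string][]int{},
	}
	fmt.Println(a.allocate("init-1", 4, nil))             // init Container takes all 4 CPUs
	fmt.Println(a.allocate("app", 4, []string{"init-1"})) // succeeds: init-1's CPUs are reused
}
```

Without the release step inside allocate, the second call would fail even though the Pod's effective request is only 4 CPUs, which is exactly the behavior this patch removes.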
Make the PVC protection controller robust to cases where a Pod X is deleted,
then a Pod Y with the same namespaced name is created and the two events are
delivered via a single update notification. Both pods should be processed,
because X might be blocking deletion of a PVC which is not referenced by Y.
Prior to this commit, only the newer pod is processed, which makes it
possible to leak PVCs.
Also, add unit tests to reflect the change.
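A minimal sketch of the fixed update handling, with simplified stand-in types rather than the controller's actual code: when the old and new objects in a single update carry different UIDs, the old Pod was deleted and replaced, so its PVC references are enqueued as well.

```go
package main

import "fmt"

// pod is a simplified stand-in for a Pod: a UID plus the PVCs it references.
type pod struct {
	uid  string
	pvcs []string
}

// queue collects the PVCs the protection controller should re-check.
type queue struct{ items []string }

func (q *queue) add(pvcs []string) { q.items = append(q.items, pvcs...) }

// podUpdated handles an informer update. If a delete of Pod X and a create of
// Pod Y with the same namespace/name were collapsed into one update, the UIDs
// differ and both Pods' PVCs are enqueued; processing only the new Pod could
// leave a PVC that only X referenced blocked forever.
func podUpdated(q *queue, oldPod, newPod *pod) {
	q.add(newPod.pvcs)
	if oldPod != nil && oldPod.uid != newPod.uid {
		q.add(oldPod.pvcs)
	}
}

func main() {
	q := &queue{}
	x := &pod{uid: "uid-x", pvcs: []string{"data-x"}} // deleted Pod, last user of "data-x"
	y := &pod{uid: "uid-y", pvcs: []string{"data-y"}} // replacement Pod with the same name
	podUpdated(q, x, y)
	fmt.Println(q.items) // [data-y data-x]
}
```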
The default behavior does not change: k8s.gcr.io is used by default.
This change adds two variables, KUBE_DOCKER_REGISTRY and KUBE_BASE_IMAGE_REGISTRY.
KUBE_BASE_IMAGE_REGISTRY sets the registry for the base images of the server binaries.
KUBE_DOCKER_REGISTRY sets the registry for the released images.
Users can set them like this:
`KUBE_DOCKER_REGISTRY=### KUBE_BASE_IMAGE_REGISTRY=### make quick-release`
Signed-off-by: Hui Luo <luoh@vmware.com>
* Kubectl user exec should accept zero-length environment variable values #652
* Change TestValidateAuthInfoExecInvalidEnv to allow empty strings as exec env values (a sketch of the relaxed check follows below)
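A minimal sketch of the relaxed validation, using hypothetical types and function names rather than the client-go code: an exec environment variable must have a name, but a zero-length value is accepted.

```go
package main

import (
	"errors"
	"fmt"
)

// execEnvVar is a simplified stand-in for an environment variable passed to an
// exec credential plugin in a kubeconfig user entry.
type execEnvVar struct {
	Name  string
	Value string
}

// validateExecEnv rejects unnamed variables but deliberately allows
// zero-length values.
func validateExecEnv(vars []execEnvVar) error {
	for _, v := range vars {
		if v.Name == "" {
			return errors.New("env variable name must be specified")
		}
		// v.Value == "" is valid and must not be treated as an error.
	}
	return nil
}

func main() {
	fmt.Println(validateExecEnv([]execEnvVar{{Name: "FOO", Value: ""}})) // <nil>
	fmt.Println(validateExecEnv([]execEnvVar{{Name: "", Value: "bar"}})) // error
}
```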