* podSubnet check: if a podSubnet is specified in kubeadm-config,
the e2e test will check that the Pod CIDRs of individual nodes
fall within this range.
* serviceSubnet check: if a serviceSubnet is specified in
kubeadm-config, the e2e test will check that the `kubernetes`
service created in the default namespace was assigned a service IP
from the configured range.
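As an illustration of the podSubnet check, a minimal containment test could look like the sketch below; the function name and inputs are made up here and are not the e2e test's actual code.

```go
package main

import (
	"fmt"
	"net"
)

// nodePodCIDRWithinSubnet reports whether a node's Pod CIDR is fully
// contained in the cluster podSubnet configured in kubeadm-config.
func nodePodCIDRWithinSubnet(podSubnet, nodePodCIDR string) (bool, error) {
	_, clusterNet, err := net.ParseCIDR(podSubnet)
	if err != nil {
		return false, err
	}
	nodeIP, nodeNet, err := net.ParseCIDR(nodePodCIDR)
	if err != nil {
		return false, err
	}
	clusterOnes, _ := clusterNet.Mask.Size()
	nodeOnes, _ := nodeNet.Mask.Size()
	// Contained if the node range starts inside the cluster range and its
	// prefix is at least as long (i.e. the node range is no wider).
	return clusterNet.Contains(nodeIP) && nodeOnes >= clusterOnes, nil
}

func main() {
	ok, _ := nodePodCIDRWithinSubnet("10.244.0.0/16", "10.244.1.0/24")
	fmt.Println(ok) // true
}
```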
"validates resource limits of pods that are allowed to run" test of
conformance tests are flake on some local environments.
The CPU workload pods don't seem work well and nodes have still CPU
capacity after running the workload pods. Then the conformance test
failed unexpectedly.
This adds message which shows how much CPU used by the workload pods
for investigating it easily.
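The added message could, for example, report the CPU requested by the workload pods against the node's allocatable CPU; the helper below is only an illustration of that kind of diagnostic, with the surrounding e2e framework code omitted.

```go
package diag

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// cpuUsageMessage sums the CPU requests of the workload pods scheduled to a
// node and reports them against the node's allocatable CPU (hypothetical
// helper, not the conformance test's actual code).
func cpuUsageMessage(node *v1.Node, pods []v1.Pod) string {
	var requestedMilli int64
	for _, pod := range pods {
		for _, c := range pod.Spec.Containers {
			if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
				requestedMilli += cpu.MilliValue()
			}
		}
	}
	allocatable := node.Status.Allocatable[v1.ResourceCPU]
	return fmt.Sprintf("node %s: workload pods request %dm CPU of %dm allocatable",
		node.Name, requestedMilli, allocatable.MilliValue())
}
```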
buildPodRef creates a unique key from the {podName, namespace, UID}
tuple. If the UID is omitted from the metric, duplicate metrics can be
exposed to Prometheus, causing 500s on the /metrics endpoint.
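In a custom collector, including the UID in the label set keeps each pod's series unique; the sketch below shows the pattern with illustrative metric and type names, not the collector's real ones.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// podRef mirrors the {podName, namespace, UID} tuple used as a unique key.
type podRef struct {
	Name, Namespace, UID string
}

var desc = prometheus.NewDesc(
	"example_pod_resource_usage",
	"Example per-pod usage metric keyed by name, namespace and UID.",
	[]string{"pod_name", "namespace", "uid"},
	nil,
)

type podCollector struct {
	usage map[podRef]float64 // hypothetical source of per-pod values
}

func (c *podCollector) Describe(ch chan<- *prometheus.Desc) { ch <- desc }

func (c *podCollector) Collect(ch chan<- prometheus.Metric) {
	for ref, value := range c.usage {
		// Without the uid label, two pods reusing the same name/namespace
		// would emit identical label sets, which the /metrics handler
		// rejects, surfacing as a 500.
		ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue,
			value, ref.Name, ref.Namespace, ref.UID)
	}
}
```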
* Allow aggregate-to-edit roles to get jobs status
Right now, users/accounts with the `admin` or `edit` role can create, update, and delete jobs, but are not allowed to read the status of a job they created. This change extends the `aggregate-to-edit` rules to include `jobs/status` (a rough sketch of such a rule follows this list).
* Move jobs/status to aggregate-to-view rules
* Add aggregate-to-view policy to view PVCs status
* Update fixtures to include new read permissions
* Add more status subresources
* Update cluster-roles.yaml
* Re-order deployment permissions
* Run go fmt
* Add more permissions
* Fix tests
* Re-order permissions in test data
* Automatically update yamls
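For reference, a rule of this shape can be expressed with the public rbac/v1 types as below; this is a standalone illustration, not a copy of the bootstrap policy source, and the exact verb set in the shipped policy may differ.

```go
package rbacsketch

import rbacv1 "k8s.io/api/rbac/v1"

// jobsStatusRead grants read access to the jobs/status subresource, the kind
// of rule added to the roles that aggregate into view/edit/admin.
var jobsStatusRead = rbacv1.PolicyRule{
	APIGroups: []string{"batch"},
	Resources: []string{"jobs/status"},
	Verbs:     []string{"get", "list", "watch"},
}

// pvcStatusRead does the same for persistentvolumeclaims/status, which the
// aggregate-to-view policy was extended to cover.
var pvcStatusRead = rbacv1.PolicyRule{
	APIGroups: []string{""},
	Resources: []string{"persistentvolumeclaims/status"},
	Verbs:     []string{"get", "list", "watch"},
}
```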
This patch fixes a bug in the CPUManager, whereby it doesn't honor the
"effective requests/limits" of a Pod as defined by:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources
The rule states that a Pod’s "effective request/limit" for a resource
should be the larger of:
* The highest of any particular resource request or limit
defined on all init Containers
* The sum of all app Containers' request/limit for a resource
Moreover, the rule states that:
* The effective QoS tier is the same for init Containers
and app containers alike
This means that the resource requests of init Containers and app
Containers should be able to overlap, such that the larger of the two
becomes the "effective resource request/limit" for the Pod. Likewise,
if a QoS tier of "Guaranteed" is determined for the Pod, then both init
Containers and app Containers should run in this tier.
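For CPU, the rule boils down to taking the larger of the highest init Container request and the sum of the app Container requests; the standalone helper below sketches that computation and is not the CPUManager's actual code.

```go
package effective

import v1 "k8s.io/api/core/v1"

// effectiveCPURequest returns the pod's effective CPU request in millicores:
// the larger of the highest init Container request and the sum of all app
// Container requests.
func effectiveCPURequest(pod *v1.Pod) int64 {
	var maxInit, sumApp int64
	for _, c := range pod.Spec.InitContainers {
		if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
			if m := cpu.MilliValue(); m > maxInit {
				maxInit = m
			}
		}
	}
	for _, c := range pod.Spec.Containers {
		if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
			sumApp += cpu.MilliValue()
		}
	}
	if maxInit > sumApp {
		return maxInit
	}
	return sumApp
}
```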
In its current implementation, the CPU manager honors the effective QoS
tier for both init and app containers, but doesn't honor the "effective
request/limit" correctly.
Instead, it treats the "effective request/limit" as:
* The sum of all init Containers' plus the sum of all app
Containers' request/limit for a resource
It does this by not proactively removing the CPUs given to previous init
containers when new containers are being created. In the worst case,
this causes the CPUManager to give non-overlapping CPUs to all
containers (whether init or app) in the "Guaranteed" QoS tier before any
of the containers in the Pod actually start.
This effectively blocks these Pods from running if the total number of
CPUs being requested across init and app Containers goes beyond the
limits of the system.
This patch fixes this problem by updating the CPUManager static policy
so that it proactively removes any guaranteed CPUs it has granted to
init Containers before allocating CPUs to app Containers. Since init
Containers run sequentially, it also makes sure this proactive removal
happens for previous init Containers when allocating CPUs to later
ones.
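The ordering the fix enforces can be pictured with the toy bookkeeping below; the state type and helpers are hypothetical stand-ins for the static policy's internals, and only the ordering matters: CPUs granted to earlier init containers go back to the free pool before the next container is allocated.

```go
package sketch

// cpuAssignments maps container name to the number of exclusive CPUs granted.
type cpuAssignments map[string]int

// staticPolicySketch is a toy stand-in for the static policy's bookkeeping.
type staticPolicySketch struct {
	assignments cpuAssignments
	freeCPUs    int
}

// allocate grants exclusive CPUs to one container from the free pool.
func (p *staticPolicySketch) allocate(container string, cpus int) bool {
	if cpus > p.freeCPUs {
		return false
	}
	p.freeCPUs -= cpus
	p.assignments[container] = cpus
	return true
}

// release returns a container's CPUs to the free pool.
func (p *staticPolicySketch) release(container string) {
	p.freeCPUs += p.assignments[container]
	delete(p.assignments, container)
}

// addContainer mirrors the fix: before allocating for the next container,
// proactively release CPUs held by already-finished init containers so that
// init and app container allocations overlap instead of summing.
func (p *staticPolicySketch) addContainer(name string, cpus int, finishedInit []string) bool {
	for _, init := range finishedInit {
		if _, held := p.assignments[init]; held {
			p.release(init)
		}
	}
	return p.allocate(name, cpus)
}
```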