libcontainer's cgroup manager is version agnostic, and is agnostic
to whether systemd is used. This way if systemd is used, the cgroup
manager would be able to update resources properly so that if
the daemon would be restarted the changes would not be reverted.
Signed-off-by: Itamar Holder <iholder@redhat.com>
[FG:InPlacePodVerticalScaling] Fixed the apiserver panic issue that occurred when adding a container during pod updates in the InPlacePodVerticalScaling scenario.
Introducing pdb to preemption had disrupted the orderliness of pods in the victims,
which would leads picking wrong victim node with higher priority pod on it.
For a resource within a group, such as memory,
we should validate the total `Free` and total `Reserved` size of the expected `machineState` and state restored from checkpoint file after kubelet start.
If total `Free` and total `Reserved` are equal, the restored state is valid.
The old comparison however was done by reflection.
There're times when the memory accounting is equals
but the allocations across the NUMA nodes are varies.
In such cases we still need to consider the states as equals.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Using the "normal" logic for a feature gated field simplifies the
implementation of the feature gate.
There is one (entirely theoretic!) problem with updating from 1.31: if a claim
was allocated in 1.31 with admin access, the status field was not set because
it didn't exist yet. If a driver now follows the current definition of "unset =
off", then it will not grant admin access even though it should. This is
theoretic because drivers are starting to support admin access with 1.32, so
there shouldn't be any claim where this problem could occur.
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
field gets cleared. Same in the status. In both cases the scenario
that it was already set and a claim or claim template get updated
is special: in those cases, the field is not cleared.
Also, allocating a claim with admin access is allowed regardless of the
feature gate and the field is not cleared. In practice, the scheduler
will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
with the field set gets rejected. This prevents running workloads
which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
allocated. The effect is the same.
The alternative would have been to ignore the fields in claim controller and
scheduler. This is bad because a monitoring workload then runs, blocking
resources that probably were meant for production workloads.
Drivers need to know that because admin access may also grant additional
permissions. The allocator needs to ignore such results when determining which
devices are considered as allocated.
In both cases it is conceptually cleaner to not rely on the content of the
ClaimSpec.