The test that checks container restarts in a Pod with restartable-init-1
and regular-1 is flaky. Right now, to decide whether restartable-init-1 has
restarted, we check that it has not written its "Started" log entry after
regular-1 wrote its "Started" entry.
However, even though the startup sequence begins with restartable-init-1 and
then regular-1, there is no guarantee that they finish starting in that order.
Sometimes regular-1 finishes first and writes its "Started" entry before
restartable-init-1 does:
1. restartable-init-1 Starting
2. regular-1 Starting
3. regular-1 Started
4. restartable-init-1 Started
In this test, the startup order does not actually matter; all we need to
check is whether restartable-init-1 restarted. So the test now simply looks
for more than one "Starting" entry in restartable-init-1's logs.
Other places used the same helper function DoesntStartAfter, so I replaced
those as well and deleted the helper function.
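A minimal sketch of the new check, assuming a hypothetical helper name (the
real e2e code differs): a container that restarted has written "Starting"
more than once.

```go
package example

import "strings"

// hasRestarted is a hypothetical helper: every (re)start of the container
// writes one "Starting" line, so more than one occurrence in its log means
// the container was started at least twice, i.e. it restarted.
func hasRestarted(containerLog string) bool {
	return strings.Count(containerLog, "Starting") > 1
}
```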
This test, which uses the systemd cgroup driver and the cpumanager `none`
policy, was originally planned as a correctness check for
https://issues.k8s.io/125923, but the bug was difficult to reproduce,
so it is now a regression test against it.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
With the `none` cpumanager policy, cgroup v2, and the systemd cgroup manager,
the kubelet could get into a situation where it believes the cpuset cgroup
was created (by libcontainer, directly in the cgroupfs) while systemd has
deleted it, since systemd was never asked to create it. This causes one
unnecessary restart, as the kubelet fails with
`failed to initialize top level QOS containers: root container [kubepods] doesn't exist.`
It causes only one restart because the kubelet skips recreating the cgroup
if it already exists, but it is still a nuisance and is fixed here.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
The error was only generated if both checks (generated pods and ready pods)
failed. This looks like a logic error; failing if either of them does not
match expectations seems better.
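For illustration, a sketch of the corrected condition (the function and
parameter names below are assumptions, not the actual code): the error is
raised when either count misses the expectation, not only when both do.

```go
package example

import "fmt"

// checkPods is an illustrative stand-in for the real check: fail when either
// the generated-pods count or the ready-pods count does not match the
// expectation, instead of only when both checks fail.
func checkPods(generated, ready, want int) error {
	if generated != want || ready != want {
		return fmt.Errorf("expected %d pods, got generated=%d ready=%d", want, generated, ready)
	}
	return nil
}
```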
Fix the error message when availablePhysicalCPUs = 0.
Without this change, the logic mistakenly emitted the old error message,
which is confusing when troubleshooting.
Plus, a tiny quality of life improvement:
the cpumanager static policy wants to use `cpuGroupSize` multiple times.
The value represents how many virtual CPUs per physical CPU the machine has.
So, let's cache (and log!) the value in the policy data.
We don't support dynamic update of the HW topology anyway.
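A rough sketch of the caching idea, with assumed names (not the actual
kubelet types): compute the value once when the policy is built, log it,
and reuse the cached copy.

```go
package example

import "log"

// staticPolicy is a simplified stand-in for the cpumanager static policy.
type staticPolicy struct {
	// cpuGroupSize is the number of virtual CPUs per physical CPU; the
	// hardware topology cannot change at runtime, so caching it is safe.
	cpuGroupSize int
}

// newStaticPolicy caches (and logs) cpuGroupSize in the policy data so it
// does not have to be recomputed on every allocation.
func newStaticPolicy(cpusPerPhysicalCPU int) *staticPolicy {
	p := &staticPolicy{cpuGroupSize: cpusPerPhysicalCPU}
	log.Printf("static policy: cpuGroupSize=%d", p.cpuGroupSize)
	return p
}
```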
Signed-off-by: Francesco Romani <fromani@redhat.com>
* A pod with a restartable init container that exits with
  a non-zero code is marked with the Succeeded pod phase
* A pod with restartable init containers that exit with
  a non-zero code via the PreStop hook is marked with the Succeeded pod phase
* A pod with a regular container that exceeds its termination grace period
  seconds is marked with the Failed pod phase
* A pod with restartable init containers that exceed their termination
  grace period seconds is marked with the Succeeded pod phase
* A pod with a regular container that exceeds its termination grace
  period seconds via the PreStop hook is marked with the Failed pod phase
* A pod with restartable init containers that exceed their termination
  grace period seconds via the PreStop hook is marked with the Succeeded pod phase
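For context, a minimal sketch of a pod with a restartable init container,
i.e. an init container whose restartPolicy is Always, built with the
k8s.io/api/core/v1 types; the container names and images here are only
illustrative, not taken from the added tests.

```go
package example

import corev1 "k8s.io/api/core/v1"

// restartableInitPod builds an illustrative pod spec: restartable-init-1 is
// an init container with RestartPolicy Always, regular-1 is an ordinary
// container. The cases above exercise which pod phase such pods end up in.
func restartableInitPod() *corev1.Pod {
	always := corev1.ContainerRestartPolicyAlways
	return &corev1.Pod{
		Spec: corev1.PodSpec{
			InitContainers: []corev1.Container{{
				Name:          "restartable-init-1",
				Image:         "busybox",
				RestartPolicy: &always,
			}},
			Containers: []corev1.Container{{
				Name:  "regular-1",
				Image: "busybox",
			}},
		},
	}
}
```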
Signed-off-by: Tsubasa Nagasawa <toversus2357@gmail.com>