cpumanager: validate topology in static policy.

This patch adds a check for the static policy state validation. The
check fails if the CPU topology obtained from cadvisor doesn't match
with the current topology in the state file.

If the CPU topology has changed in a node, cpu manager static policy
might try to assign non-present cores to containers.

For example in my test case, static policy had the default CPU set of
0-1,4-7. Then kubelet was shut down and CPU 7 was offlined. After
restarting the kubelet, CPU manager tries to assign the non-existent CPU
7 to containers which don't have exclusive allocations assigned to them:

 Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)

This breaks the exclusivity, since the CPUs from the shared pool don't
get assigned to non-exclusive containers, meaning that they can execute
on the exclusive CPUs.
This commit is contained in:
Ismo Puustinen 2018-07-27 14:50:51 +03:00
parent 28b6fb5f7d
commit 4f604eb73c

View File

@ -129,7 +129,7 @@ func (p *staticPolicy) validateState(s state.State) error {
}
// State has already been initialized from file (is not empty)
// 1 Check if the reserved cpuset is not part of default cpuset because:
// 1. Check if the reserved cpuset is not part of default cpuset because:
// - kube/system reserved have changed (increased) - may lead to some containers not being able to start
// - user tampered with file
if !p.reserved.Intersection(tmpDefaultCPUset).Equals(p.reserved) {
@ -145,6 +145,23 @@ func (p *staticPolicy) validateState(s state.State) error {
cID, cset.String(), tmpDefaultCPUset.String())
}
}
// 3. It's possible that the set of available CPUs has changed since
// the state was written. This can be due to for example
// offlining a CPU when kubelet is not running. If this happens,
// CPU manager will run into trouble when later it tries to
// assign non-existent CPUs to containers. Validate that the
// topology that was received during CPU manager startup matches with
// the set of CPUs stored in the state.
totalKnownCPUs := tmpDefaultCPUset.Clone()
for _, cset := range tmpAssignments {
totalKnownCPUs = totalKnownCPUs.Union(cset)
}
if !totalKnownCPUs.Equals(p.topology.CPUDetails.CPUs()) {
return fmt.Errorf("current set of available CPUs \"%s\" doesn't match with CPUs in state \"%s\"",
p.topology.CPUDetails.CPUs().String(), totalKnownCPUs.String())
}
return nil
}