Fix CPUManager algo to calculate min NUMA nodes needed for distribution

Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. That average, however, could overestimate the number of NUMA nodes
needed to satisfy a request.

By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but it's better to start
with fewer and move up than to start with too many and miss out on a better
option.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
Kevin Klues 2021-11-24 00:56:25 +00:00
parent 209cd20548
commit a160d9a8cd

@@ -367,20 +367,25 @@ func (a *cpuAccumulator) takeRemainingCPUs() {
 }
 func (a *cpuAccumulator) rangeNUMANodesNeededToSatisfy(cpuGroupSize int) (int, int) {
+	// Get the total number of NUMA nodes in the system.
+	numNUMANodes := a.topo.CPUDetails.NUMANodes().Size()
 	// Get the total number of NUMA nodes that have CPUs available on them.
 	numNUMANodesAvailable := a.details.NUMANodes().Size()
-	// Get the total number of CPUs available across all NUMA nodes.
-	numCPUsAvailable := a.details.CPUs().Size()
+	// Get the total number of CPUs in the system.
+	numCPUs := a.topo.CPUDetails.CPUs().Size()
+	// Get the total number of 'cpuGroups' in the system.
+	numCPUGroups := (numCPUs-1)/cpuGroupSize + 1
+	// Calculate the number of 'cpuGroups' per NUMA Node in the system (rounding up).
+	numCPUGroupsPerNUMANode := (numCPUGroups-1)/numNUMANodes + 1
 	// Calculate the number of available 'cpuGroups' across all NUMA nodes as
 	// well as the number of 'cpuGroups' that need to be allocated (rounding up).
-	numCPUGroupsAvailable := (numCPUsAvailable-1)/cpuGroupSize + 1
 	numCPUGroupsNeeded := (a.numCPUsNeeded-1)/cpuGroupSize + 1
-	// Calculate the number of available 'cpuGroups' per NUMA Node (rounding up).
-	numCPUGroupsPerNUMANode := (numCPUGroupsAvailable-1)/numNUMANodesAvailable + 1
 	// Calculate the minimum number of numa nodes required to satisfy the
 	// allocation (rounding up).
 	minNUMAs := (numCPUGroupsNeeded-1)/numCPUGroupsPerNUMANode + 1
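
To illustrate the overestimate described in the commit message, here is a minimal standalone sketch that runs the two calculations side by side. It is not the kubelet code: `ceilDiv`, `minNUMAsOld`, and `minNUMAsNew` are hypothetical helpers that take plain integers instead of reading `a.topo` and `a.details`, and the machine in `main` (2 NUMA nodes with 8 CPUs each, 10 CPUs free) is an assumed example.

```go
package main

import "fmt"

// ceilDiv rounds x/n up using the same (x-1)/n + 1 idiom as the diff.
// It is only meaningful for x >= 1 and n >= 1.
func ceilDiv(x, n int) int {
	return (x-1)/n + 1
}

// minNUMAsOld mirrors the previous calculation (hypothetical helper):
// it averages the *available* CPU groups over the NUMA nodes that
// still have free CPUs.
func minNUMAsOld(numCPUsAvailable, numNUMANodesAvailable, numCPUsNeeded, cpuGroupSize int) int {
	groupsAvailable := ceilDiv(numCPUsAvailable, cpuGroupSize)
	groupsNeeded := ceilDiv(numCPUsNeeded, cpuGroupSize)
	groupsPerNode := ceilDiv(groupsAvailable, numNUMANodesAvailable)
	return ceilDiv(groupsNeeded, groupsPerNode)
}

// minNUMAsNew mirrors the new calculation (hypothetical helper):
// it uses the *total* CPU groups and the *total* NUMA node count.
func minNUMAsNew(numCPUs, numNUMANodes, numCPUsNeeded, cpuGroupSize int) int {
	groups := ceilDiv(numCPUs, cpuGroupSize)
	groupsNeeded := ceilDiv(numCPUsNeeded, cpuGroupSize)
	groupsPerNode := ceilDiv(groups, numNUMANodes)
	return ceilDiv(groupsNeeded, groupsPerNode)
}

func main() {
	// Assumed machine: 2 NUMA nodes, 8 CPUs each (16 total).
	// Node 0 has all 8 CPUs free, node 1 has 2 free (10 available).
	// Request: 8 CPUs with a cpuGroupSize of 1.
	fmt.Println("old minimum:", minNUMAsOld(10, 2, 8, 1)) // old minimum: 2
	fmt.Println("new minimum:", minNUMAsNew(16, 2, 8, 1)) // new minimum: 1
}
```

In this scenario the old formula averages 10 free CPUs over 2 nodes (5 per node) and concludes that 2 nodes are needed, even though node 0 alone can hold all 8 requested CPUs. The new formula starts from 8 CPUs per node and tries a single node first.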