Update TopologyManager algorithm for selecting "best" non-preferred hint

For the 'single-numa' and 'restricted' TopologyManager policies, pods are only admitted if all of their containers have perfect alignment across the set of resources they are requesting. The best-effort policy, on the other hand, will prefer allocations that have perfect alignment, but fall back to a non-preferred alignment if perfect alignment can't be achieved. The existing algorithm of how to choose the best hint from the set of "non-preferred" hints is fairly naive and often results in choosing a sub-optimal hint. It works fine in cases where all resources would end up coming from a single NUMA node (even if its not the same NUMA nodes), but breaks down as soon as multiple NUMA nodes are required for the "best" alignment. We will never be able to achieve perfect alignment with these non-preferred hints, but we should try and do something more intelligent than simply choosing the hint with the narrowest mask. In an ideal world, we would have the TopologyManager return a set of "resources-relative" hints (as opposed to a common hint for all resources as is done today). Each resource-relative hint would indicate how many other resources could be aligned to it on a given NUMA node, and a hint provider would use this information to allocate its resources in the most aligned way possible. There are likely some edge cases to consider here, but such an algorithm would allow us to do partial-perfect-alignment of "some" resources, even if all resources could not be perfectly aligned. Unfortunately, supporting something like this would require a major redesign to how the TopologyManager interacts with its hint providers (as well as how those hint providers make decisions based on the hints they get back). That said, we can still do better than the naive algorithm we have today, and this patch provides a mechanism to do so. We start by looking at the set of hints passed into the TopologyManager for each resource and generate a list of the minimum number of NUMA nodes required to satisfy an allocation for a given resource. Each entry in this list then contains the 'minNUMAAffinity.Count()' for a given resources. Once we have this list, we find the *maximum* 'minNUMAAffinity.Count()' from the list and mark that as the 'bestNonPreferredAffinityCount' that we would like to have associated with whatever "bestHint" we ultimately generate. The intuition being that we would like to (at the very least) get alignment for those resources that *require* multiple NUMA nodes to satisfy their allocation. If we can't quite get there, then we should try to come as close to it as possible. Once we have this 'bestNonPreferredAffinityCount', the algorithm proceeds as follows: If the mergedHint and bestHint are both non-preferred, then try and find a hint whose affinity count is as close to (but not higher than) the bestNonPreferredAffinityCount as possible. To do this we need to consider the following cases and react accordingly: 1. bestHint.NUMANodeAffinity.Count() > bestNonPreferredAffinityCount 2. bestHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount 3. bestHint.NUMANodeAffinity.Count() < bestNonPreferredAffinityCount For case (1), the current bestHint is larger than the bestNonPreferredAffinityCount, so updating to any narrower mergeHint is preferred over staying where we are. For case (2), the current bestHint is equal to the bestNonPreferredAffinityCount, so we would like to stick with what we have *unless* the current mergedHint is also equal to bestNonPreferredAffinityCount and it is narrower. For case (3), the current bestHint is less than bestNonPreferredAffinityCount, so we would like to creep back up to bestNonPreferredAffinityCount as close as we can. There are three cases to consider here: 3a. mergedHint.NUMANodeAffinity.Count() > bestNonPreferredAffinityCount 3b. mergedHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount 3c. mergedHint.NUMANodeAffinity.Count() < bestNonPreferredAffinityCount For case (3a), we just want to stick with the current bestHint because choosing a new hint that is greater than bestNonPreferredAffinityCount would be counter-productive. For case (3b), we want to immediately update bestHint to the current mergedHint, making it now equal to bestNonPreferredAffinityCount. For case (3c), we know that *both* the current bestHint and the current mergedHint are less than bestNonPreferredAffinityCount, so we want to choose one that brings us back up as close to bestNonPreferredAffinityCount as possible. There are three cases to consider here: 3ca. mergedHint.NUMANodeAffinity.Count() > bestHint.NUMANodeAffinity.Count() 3cb. mergedHint.NUMANodeAffinity.Count() < bestHint.NUMANodeAffinity.Count() 3cc. mergedHint.NUMANodeAffinity.Count() == bestHint.NUMANodeAffinity.Count() For case (3ca), we want to immediately update bestHint to mergedHint because that will bring us closer to the (higher) value of bestNonPreferredAffinityCount. For case (3cb), we want to stick with the current bestHint because choosing the current mergedHint would strictly move us further away from the bestNonPreferredAffinityCount. Finally, for case (3cc), we know that the current bestHint and the current mergedHint are equal, so we simply choose the narrower of the 2. This patch implements this algorithm for the case where we must choose from a set of non-preferred hints and provides a set of unit-tests to verify its correctness. Signed-off-by: Kevin Klues <kklues@nvidia.com>
2025-07-27 21:47:07 +00:00 · 2022-02-16 08:18:39 +00:00 · 2022-02-16 08:18:39 +00:00 · 99c57828ce
commit 99c57828ce
parent f8601cb5a3
2 changed files with 357 additions and 7 deletions
--- a/pkg/kubelet/cm/topologymanager/policy.go
+++ b/pkg/kubelet/cm/topologymanager/policy.go
@ -94,7 +94,40 @@ func filterProvidersHints(providersHints []map[string][]TopologyHint) [][]Topolo
 	return allProviderHints
 }

-func compareHints(current *TopologyHint, candidate *TopologyHint) *TopologyHint {
+func narrowestHint(hints []TopologyHint) *TopologyHint {
+	if len(hints) == 0 {
+		return nil
+	}
+	var narrowestHint *TopologyHint
+	for i := range hints {
+		if hints[i].NUMANodeAffinity == nil {
+			continue
+		}
+		if narrowestHint == nil {
+			narrowestHint = &hints[i]
+		}
+		if hints[i].NUMANodeAffinity.IsNarrowerThan(narrowestHint.NUMANodeAffinity) {
+			narrowestHint = &hints[i]
+		}
+	}
+	return narrowestHint
+}
+
+func maxOfMinAffinityCounts(filteredHints [][]TopologyHint) int {
+	maxOfMinCount := 0
+	for _, resourceHints := range filteredHints {
+		narrowestHint := narrowestHint(resourceHints)
+		if narrowestHint == nil {
+			continue
+		}
+		if narrowestHint.NUMANodeAffinity.Count() > maxOfMinCount {
+			maxOfMinCount = narrowestHint.NUMANodeAffinity.Count()
+		}
+	}
+	return maxOfMinCount
+}
+
+func compareHints(bestNonPreferredAffinityCount int, current *TopologyHint, candidate *TopologyHint) *TopologyHint {
 	// Only consider candidates that result in a NUMANodeAffinity > 0 to
 	// replace the current bestHint.
 	if candidate.NUMANodeAffinity.Count() == 0 {
@ -119,18 +152,121 @@ func compareHints(current *TopologyHint, candidate *TopologyHint) *TopologyHint
 		return current
 	}

-	// If the current bestHint and the candidate hint have the same
-	// preference, only consider candidate hints that have a narrower
+	// If the current bestHint and the candidate hint are both preferred,
+	// then only consider candidate hints that have a narrower
 	// NUMANodeAffinity than the NUMANodeAffinity in the current bestHint.
-	if !candidate.NUMANodeAffinity.IsNarrowerThan(current.NUMANodeAffinity) {
+	if current.Preferred && candidate.Preferred {
+		if candidate.NUMANodeAffinity.IsNarrowerThan(current.NUMANodeAffinity) {
+			return candidate
+		}
 		return current
 	}

-	// In all other cases, update the bestHint to the candidate hint.
-	return candidate
+	// The only case left is if the current best bestHint and the candidate
+	// hint are both non-preferred. In this case, try and find a hint whose
+	// affinity count is as close to (but not higher than) the
+	// bestNonPreferredAffinityCount as possible. To do this we need to
+	// consider the following cases and react accordingly:
+	//
+	//   1. current.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
+	//   2. current.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
+	//   3. current.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount
+	//
+	// For case (1), the current bestHint is larger than the
+	// bestNonPreferredAffinityCount, so updating to any narrower mergeHint
+	// is preferred over staying where we are.
+	//
+	// For case (2), the current bestHint is equal to the
+	// bestNonPreferredAffinityCount, so we would like to stick with what
+	// we have *unless* the candidate hint is also equal to
+	// bestNonPreferredAffinityCount and it is narrower.
+	//
+	// For case (3), the current bestHint is less than
+	// bestNonPreferredAffinityCount, so we would like to creep back up to
+	// bestNonPreferredAffinityCount as close as we can. There are three
+	// cases to consider here:
+	//
+	//   3a. candidate.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
+	//   3b. candidate.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
+	//   3c. candidate.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount
+	//
+	// For case (3a), we just want to stick with the current bestHint
+	// because choosing a new hint that is greater than
+	// bestNonPreferredAffinityCount would be counter-productive.
+	//
+	// For case (3b), we want to immediately update bestHint to the
+	// candidate hint, making it now equal to bestNonPreferredAffinityCount.
+	//
+	// For case (3c), we know that *both* the current bestHint and the
+	// candidate hint are less than bestNonPreferredAffinityCount, so we
+	// want to choose one that brings us back up as close to
+	// bestNonPreferredAffinityCount as possible. There are three cases to
+	// consider here:
+	//
+	//   3ca. candidate.NUMANodeAffinity.Count() >  current.NUMANodeAffinity.Count()
+	//   3cb. candidate.NUMANodeAffinity.Count() <  current.NUMANodeAffinity.Count()
+	//   3cc. candidate.NUMANodeAffinity.Count() == current.NUMANodeAffinity.Count()
+	//
+	// For case (3ca), we want to immediately update bestHint to the
+	// candidate hint because that will bring us closer to the (higher)
+	// value of bestNonPreferredAffinityCount.
+	//
+	// For case (3cb), we want to stick with the current bestHint because
+	// choosing the candidate hint would strictly move us further away from
+	// the bestNonPreferredAffinityCount.
+	//
+	// Finally, for case (3cc), we know that the current bestHint and the
+	// candidate hint are equal, so we simply choose the narrower of the 2.
+
+	// Case 1
+	if current.NUMANodeAffinity.Count() > bestNonPreferredAffinityCount {
+		if candidate.NUMANodeAffinity.IsNarrowerThan(current.NUMANodeAffinity) {
+			return candidate
+		}
+		return current
+	}
+	// Case 2
+	if current.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount {
+		if candidate.NUMANodeAffinity.Count() != bestNonPreferredAffinityCount {
+			return current
+		}
+		if candidate.NUMANodeAffinity.IsNarrowerThan(current.NUMANodeAffinity) {
+			return candidate
+		}
+		return current
+	}
+	// Case 3a
+	if candidate.NUMANodeAffinity.Count() > bestNonPreferredAffinityCount {
+		return current
+	}
+	// Case 3b
+	if candidate.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount {
+		return candidate
+	}
+	// Case 3ca
+	if candidate.NUMANodeAffinity.Count() > current.NUMANodeAffinity.Count() {
+		return candidate
+	}
+	// Case 3cb
+	if candidate.NUMANodeAffinity.Count() < current.NUMANodeAffinity.Count() {
+		return current
+	}
+	// Case 3cc
+	if candidate.NUMANodeAffinity.IsNarrowerThan(current.NUMANodeAffinity) {
+		return candidate
+	}
+	return current
 }

 func mergeFilteredHints(numaNodes []int, filteredHints [][]TopologyHint) TopologyHint {
+	// Set bestNonPreferredAffinityCount to help decide which affinity mask is
+	// preferred amongst all non-preferred hints. We calculate this value as
+	// the maximum of the minimum affinity counts supplied for any given hint
+	// provider. In other words, prefer a hint that has an affinity mask that
+	// includes all of the NUMA nodes from the provider that requires the most
+	// NUMA nodes to satisfy its allocation.
+	bestNonPreferredAffinityCount := maxOfMinAffinityCounts(filteredHints)
+
 	var bestHint *TopologyHint
 	iterateAllProviderTopologyHints(filteredHints, func(permutation []TopologyHint) {
 		// Get the NUMANodeAffinity from each hint in the permutation and see if any
@ -139,7 +275,7 @@ func mergeFilteredHints(numaNodes []int, filteredHints [][]TopologyHint) Topolog

 		// Compare the current bestHint with the candidate mergedHint and
 		// update bestHint if appropriate.
-		bestHint = compareHints(bestHint, &mergedHint)
+		bestHint = compareHints(bestNonPreferredAffinityCount, bestHint, &mergedHint)
 	})

 	if bestHint == nil {
--- a/pkg/kubelet/cm/topologymanager/policy_test.go
+++ b/pkg/kubelet/cm/topologymanager/policy_test.go
@ -615,6 +615,220 @@ func (p *bestEffortPolicy) mergeTestCases(numaNodes []int) []policyMergeTestCase
 				Preferred:        false,
 			},
 		},
+		{
+			name: "bestNonPreferredAffinityCount (1)",
+			hp: []HintProvider{
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource1": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource2": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+			},
+			expected: TopologyHint{
+				NUMANodeAffinity: NewTestBitMask(0, 1),
+				Preferred:        false,
+			},
+		},
+		{
+			name: "bestNonPreferredAffinityCount (2)",
+			hp: []HintProvider{
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource1": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource2": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 3),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+			},
+			expected: TopologyHint{
+				NUMANodeAffinity: NewTestBitMask(0, 3),
+				Preferred:        false,
+			},
+		},
+		{
+			name: "bestNonPreferredAffinityCount (3)",
+			hp: []HintProvider{
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource1": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource2": {
+							{
+								NUMANodeAffinity: NewTestBitMask(1, 2),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+			},
+			expected: TopologyHint{
+				NUMANodeAffinity: NewTestBitMask(1, 2),
+				Preferred:        false,
+			},
+		},
+		{
+			name: "bestNonPreferredAffinityCount (4)",
+			hp: []HintProvider{
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource1": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource2": {
+							{
+								NUMANodeAffinity: NewTestBitMask(2, 3),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+			},
+			expected: TopologyHint{
+				NUMANodeAffinity: NewTestBitMask(2, 3),
+				Preferred:        false,
+			},
+		},
+		{
+			name: "bestNonPreferredAffinityCount (5)",
+			hp: []HintProvider{
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource1": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource2": {
+							{
+								NUMANodeAffinity: NewTestBitMask(1, 2),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(2, 3),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+			},
+			expected: TopologyHint{
+				NUMANodeAffinity: NewTestBitMask(1, 2),
+				Preferred:        false,
+			},
+		},
+		{
+			name: "bestNonPreferredAffinityCount (6)",
+			hp: []HintProvider{
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource1": {
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(0, 1),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+				&mockHintProvider{
+					map[string][]TopologyHint{
+						"resource2": {
+							{
+								NUMANodeAffinity: NewTestBitMask(1, 2, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(1, 2),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(1, 3),
+								Preferred:        false,
+							},
+							{
+								NUMANodeAffinity: NewTestBitMask(2, 3),
+								Preferred:        false,
+							},
+						},
+					},
+				},
+			},
+			expected: TopologyHint{
+				NUMANodeAffinity: NewTestBitMask(1, 2),
+				Preferred:        false,
+			},
+		},
 	}
 }