Commit Graph

106443 Commits

Author SHA1 Message Date
Kevin Klues
99c57828ce Update TopologyManager algorithm for selecting "best" non-preferred hint
For the 'single-numa-node' and 'restricted' TopologyManager policies, pods are only
admitted if all of their containers have perfect alignment across the set of
resources they are requesting. The 'best-effort' policy, on the other hand, will
prefer allocations that have perfect alignment, but fall back to a non-preferred
alignment if perfect alignment can't be achieved.

The existing algorithm for choosing the best hint from the set of
"non-preferred" hints is fairly naive and often results in a sub-optimal
hint. It works fine in cases where all resources would end up coming from a
single NUMA node (even if it's not the same NUMA node for every resource), but
breaks down as soon as multiple NUMA nodes are required for the "best"
alignment. We will never be able to achieve perfect alignment with these
non-preferred hints, but we should try to do something more intelligent than
simply choosing the hint with the narrowest mask.
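As a point of comparison, the naive strategy described above can be sketched in Go. This is a minimal illustration, not the TopologyManager code: `numaMask`, `narrowerThan` and `naiveBest` are hypothetical names, NUMA affinities are modelled as a plain `uint64` bitmask, and the tie-break between equally wide masks (numerically smaller wins) is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// numaMask is a hypothetical stand-in for a NUMA affinity bitmask:
// bit i set means NUMA node i is part of the affinity.
type numaMask uint64

// count returns the number of NUMA nodes in the mask.
func (m numaMask) count() int { return bits.OnesCount64(uint64(m)) }

// narrowerThan reports whether m spans fewer NUMA nodes than other.
// On a tie, the numerically smaller mask wins (an assumed, deterministic
// tie-break for this sketch).
func (m numaMask) narrowerThan(other numaMask) bool {
	if m.count() == other.count() {
		return m < other
	}
	return m.count() < other.count()
}

// naiveBest picks the narrowest mask among the candidate merged hints,
// ignoring how many NUMA nodes any individual resource actually needs.
func naiveBest(candidates []numaMask) numaMask {
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.narrowerThan(best) {
			best = c
		}
	}
	return best
}

func main() {
	// Hypothetical non-preferred merged hints on a 4-NUMA-node machine.
	candidates := []numaMask{0b1111, 0b0011, 0b0001}
	fmt.Printf("%04b\n", naiveBest(candidates))
}
```

With these inputs it returns the single-node mask `0001`. If some resource actually needs two NUMA nodes, that choice cannot align it, even though a two-node candidate (`0011`) was available.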

In an ideal world, we would have the TopologyManager return a set of
"resources-relative" hints (as opposed to a common hint for all resources as is
done today). Each resource-relative hint would indicate how many other
resources could be aligned to it on a given NUMA node, and a hint provider
would use this information to allocate its resources in the most aligned way
possible. There are likely some edge cases to consider here, but such an
algorithm would allow us to perfectly align *some* resources, even if all
resources could not be perfectly aligned.

Unfortunately, supporting something like this would require a major redesign of
how the TopologyManager interacts with its hint providers (as well as how those
hint providers make decisions based on the hints they get back).

That said, we can still do better than the naive algorithm we have today, and
this patch provides a mechanism to do so.

We start by looking at the set of hints passed into the TopologyManager for
each resource and generate a list containing, for each resource, the minimum
number of NUMA nodes required to satisfy its allocation. Each entry in this
list is the 'minNUMAAffinity.Count()' for a given resource. Once we have this
list, we find the *maximum* 'minNUMAAffinity.Count()' from the list and mark
that as the 'bestNonPreferredAffinityCount' that we would like to have
associated with whatever "bestHint" we ultimately generate. The intuition is
that we would like to (at the very least) get alignment for those resources
that *require* multiple NUMA nodes to satisfy their allocation. If we can't
quite get there, then we should try to come as close to it as possible.
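A minimal sketch of this first step, assuming each resource's hints are available as a list of NUMA bitmasks keyed by resource name. `hintsPerResource`, `numaMask`, `bestNonPreferredAffinityCount` as a Go function, and the resource names are all hypothetical; they are not the TopologyManager's actual types.

```go
package main

import (
	"fmt"
	"math/bits"
)

// numaMask is a hypothetical NUMA affinity bitmask: bit i set = NUMA node i.
type numaMask uint64

func (m numaMask) count() int { return bits.OnesCount64(uint64(m)) }

// bestNonPreferredAffinityCount computes, for each resource, the minimum
// number of NUMA nodes any of its hints requires, then returns the maximum
// of those per-resource minimums.
func bestNonPreferredAffinityCount(hintsPerResource map[string][]numaMask) int {
	best := 0
	for _, hints := range hintsPerResource {
		if len(hints) == 0 {
			continue // assumption: a resource with no hints adds no constraint
		}
		minCount := hints[0].count()
		for _, h := range hints[1:] {
			if c := h.count(); c < minCount {
				minCount = c
			}
		}
		if minCount > best {
			best = minCount
		}
	}
	return best
}

func main() {
	hints := map[string][]numaMask{
		"cpu":             {0b0001, 0b0010, 0b0011}, // fits on one node
		"example.com/gpu": {0b0011, 0b1100},         // needs two nodes
	}
	fmt.Println(bestNonPreferredAffinityCount(hints))
}
```

Here `cpu` can be satisfied from a single NUMA node, but the hypothetical GPU resource needs at least two, so the function returns 2: the target the best hint should get as close to as possible.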

Once we have this 'bestNonPreferredAffinityCount', the algorithm proceeds as
follows:

If the mergedHint and bestHint are both non-preferred, then try to find a hint
whose affinity count is as close to (but not higher than) the
bestNonPreferredAffinityCount as possible. To do this we need to consider the
following cases and react accordingly:

  1. bestHint.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
  2. bestHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
  3. bestHint.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount

For case (1), the current bestHint is larger than the
bestNonPreferredAffinityCount, so updating to any narrower mergedHint is
preferable to staying where we are.

For case (2), the current bestHint is equal to the
bestNonPreferredAffinityCount, so we would like to stick with what we have
*unless* the current mergedHint is also equal to bestNonPreferredAffinityCount
and it is narrower.

For case (3), the current bestHint is less than bestNonPreferredAffinityCount,
so we would like to creep back up as close to bestNonPreferredAffinityCount as
we can. There are three cases to consider here:

  3a. mergedHint.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
  3b. mergedHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
  3c. mergedHint.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount

For case (3a), we just want to stick with the current bestHint because choosing
a new hint that is greater than bestNonPreferredAffinityCount would be
counter-productive.

For case (3b), we want to immediately update bestHint to the current
mergedHint, making it now equal to bestNonPreferredAffinityCount.

For case (3c), we know that *both* the current bestHint and the current
mergedHint are less than bestNonPreferredAffinityCount, so we want to choose
one that brings us back up as close to bestNonPreferredAffinityCount as
possible. There are three cases to consider here:

  3ca. mergedHint.NUMANodeAffinity.Count() >  bestHint.NUMANodeAffinity.Count()
  3cb. mergedHint.NUMANodeAffinity.Count() <  bestHint.NUMANodeAffinity.Count()
  3cc. mergedHint.NUMANodeAffinity.Count() == bestHint.NUMANodeAffinity.Count()

For case (3ca), we want to immediately update bestHint to mergedHint because
that will bring us closer to the (higher) value of
bestNonPreferredAffinityCount.

For case (3cb), we want to stick with the current bestHint because choosing the
current mergedHint would strictly move us further away from the
bestNonPreferredAffinityCount.

Finally, for case (3cc), we know that the current bestHint and the current
mergedHint have equal affinity counts, so we simply choose the narrower of the two.
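The case analysis above can be condensed into a single comparison function. The Go sketch below illustrates the rules under stated assumptions and is not the patch itself: it only covers the non-preferred-versus-non-preferred comparison, models affinities as `uint64` bitmasks, uses hypothetical names (`numaMask`, `shouldReplace`, `selectBest`), and assumes equally wide masks are tie-broken by numeric value.

```go
package main

import (
	"fmt"
	"math/bits"
)

// numaMask is a hypothetical NUMA affinity bitmask: bit i set = NUMA node i.
type numaMask uint64

func (m numaMask) count() int { return bits.OnesCount64(uint64(m)) }

// narrowerThan: fewer NUMA nodes wins; on a tie the numerically smaller mask
// wins (a deterministic tie-break assumed for this sketch).
func (m numaMask) narrowerThan(other numaMask) bool {
	if m.count() == other.count() {
		return m < other
	}
	return m.count() < other.count()
}

// shouldReplace reports whether a non-preferred merged hint should replace the
// current non-preferred best hint, where target is bestNonPreferredAffinityCount.
func shouldReplace(best, merged numaMask, target int) bool {
	bc, mc := best.count(), merged.count()
	switch {
	case bc > target:
		// Case 1: best is too wide; any narrower merged hint is an improvement.
		return merged.narrowerThan(best)
	case bc == target:
		// Case 2: keep best unless merged also hits target and is narrower.
		return mc == target && merged.narrowerThan(best)
	}
	// Case 3: best is below target.
	switch {
	case mc > target: // 3a: overshooting target is counter-productive
		return false
	case mc == target: // 3b: merged hits target exactly
		return true
	case mc > bc: // 3ca: merged is closer to target
		return true
	case mc < bc: // 3cb: merged moves further away from target
		return false
	default: // 3cc: equal counts, keep the narrower mask
		return merged.narrowerThan(best)
	}
}

// selectBest folds shouldReplace over the candidate merged hints in order.
func selectBest(candidates []numaMask, target int) numaMask {
	best := candidates[0]
	for _, merged := range candidates[1:] {
		if shouldReplace(best, merged, target) {
			best = merged
		}
	}
	return best
}

func main() {
	// Hypothetical non-preferred merged hints on a 4-NUMA-node machine,
	// with a target of 2 (some resource needs at least two NUMA nodes).
	candidates := []numaMask{0b1111, 0b0001, 0b0011, 0b0110}
	fmt.Printf("%04b\n", selectBest(candidates, 2))
}
```

With a target of 2, the fold moves from `1111` down to `0001` (case 1), back up to `0011` (case 3b), and keeps it against `0110` (case 2), printing `0011`. The naive narrowest-mask rule would have stopped at `0001`.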

This patch implements this algorithm for the case where we must choose from a
set of non-preferred hints and provides a set of unit tests to verify its
correctness.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-03-01 14:38:26 +00:00
Kevin Klues
f8601cb5a3 Refactor TopologyManager to be more explicit about bestHint calculation
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-02-28 20:30:01 +00:00
Kubernetes Prow Robot
d12787bc2c
Merge pull request #104698 from weiwenli97/windows_node_reboot
Add Windows node reboot
2022-02-15 02:34:02 -08:00
Kubernetes Prow Robot
d899c39ca3
Merge pull request #108111 from yselkowitz/nfs-provisioner-v3
test: bump nfs-provisioner to 3.0.1
2022-02-14 23:32:02 -08:00
Kubernetes Prow Robot
e42e2e877f
Merge pull request #107527 from wojtek-t/remove_selflink_ga
Graduate RemoveSelfLink to Stable
2022-02-14 19:46:02 -08:00
Kubernetes Prow Robot
64e83a7e43
Merge pull request #107945 from saschagrunert/cri-verbose
Add support for CRI `verbose` fields
2022-02-14 17:58:12 -08:00
Kubernetes Prow Robot
4e30fe40df
Merge pull request #108093 from hakman/remove_e2e_flag_container-runtime
Remove unused `--container-runtime` e2e.test flag
2022-02-14 14:04:31 -08:00
Kubernetes Prow Robot
1ae7da0b68
Merge pull request #108109 from eddiezane/ez/update-sig-cli-owners
Update sig-cli OWNERS
2022-02-14 12:06:30 -08:00
Yaakov Selkowitz
f44fdaca07 test: bump nfs-provisioner to 3.0.1
This is the first version built for multiple architectures.
2022-02-14 14:02:38 -05:00
Kubernetes Prow Robot
d374c954de
Merge pull request #108027 from neolit123/1.24-update-unversioned-kubelet-cm-fg
kubeadm: switch UnversionedKubeletConfigMap to true
2022-02-14 10:59:52 -08:00
Kubernetes Prow Robot
dea5589b1b
Merge pull request #107701 from kinderyj/perf/new-logic-optimiz-for-DetermineVolumeAction
perf:logic-optimiz-for-DetermineVolumeAction
2022-02-14 10:59:45 -08:00
Eddie Zaneski
040d575e9f
Update sig-cli OWNERS
Signed-off-by: Eddie Zaneski <eddiezane@gmail.com>
2022-02-14 10:55:35 -07:00
wojtekt
9732bf0d33 Autogenerated 2022-02-14 18:35:55 +01:00
Wojciech Tyczyński
c5a98327f5 Update SelfLink OpenAPI documentation 2022-02-14 18:35:55 +01:00
Wojciech Tyczyński
e46415bfbc Bump RemoveSelfLink feature gate to GA 2022-02-14 18:35:54 +01:00
Wojciech Tyczyński
b62774f2f7 Couple remaining SelfLink references cleanup 2022-02-14 18:35:54 +01:00
Wojciech Tyczyński
9b2908ea3b Cleanup apiserver storage selflink references where possible 2022-02-14 18:35:54 +01:00
Wojciech Tyczyński
d63b79ec47 Autogenerated 2022-02-14 18:35:54 +01:00
Wojciech Tyczyński
b3267092fa Remove SelfLink from autogenerating applyconfigurations 2022-02-14 18:35:54 +01:00
Wojciech Tyczyński
2169997dfe Remove Selflink from convertors 2022-02-14 18:25:12 +01:00
Kubernetes Prow Robot
67b2b347d1
Merge pull request #108103 from xmudrii/go-update-publishing-rules
Update publishing-bot rules for Go 1.17.7 / 1.16.14
2022-02-14 06:13:29 -08:00
Marko Mudrinić
b9abd5a710
Update publishing-bot rules for Go 1.17.7 / 1.16.14
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
2022-02-14 13:04:07 +01:00
Kubernetes Prow Robot
b591acca57
Merge pull request #108047 from wojtek-t/fix_event_update
Fix validation of event updates
2022-02-14 02:27:28 -08:00
Wojciech Tyczyński
8c1e8355f8 Ensure non-nil items in lists 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
1e0b9c6e20 Remove unused selflink parameters from ContextBasedNaming 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
0ad588b27b Relax to using namer instead of selflinker in API groupversion 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
c8ee055b73 Introduce Namer interface 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
0aaef27e59 Fix apiserver selflink tests 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
0a674d3ed9 Remove selflink setting from apiserver 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
8b758fb3b9 Remove selflink references in api tests 2022-02-14 11:11:56 +01:00
Wojciech Tyczyński
41ee6a3e44 Remove selflink integration tests 2022-02-14 11:11:56 +01:00
Kubernetes Prow Robot
21c0f6f6ff
Merge pull request #107677 from pohly/scheduler-integration-benchmark
scheduler integration benchmark improvements
2022-02-14 01:23:28 -08:00
Ciprian Hacman
7d5afb322d Remove unused --container-runtime e2e.test flag
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2022-02-14 08:49:56 +02:00
Kubernetes Prow Robot
6669a016ae
Merge pull request #108091 from xmudrii/go-1.17.7
[go] Update Go to 1.17.7
2022-02-12 19:11:46 -08:00
Kubernetes Prow Robot
39ccd6f3f9
Merge pull request #108090 from aojea/slice_topology_error
endpointslice: don't try to update topology cache if node informer error
2022-02-12 16:19:46 -08:00
Kubernetes Prow Robot
f6a8044579
Merge pull request #106082 from matthyx/105820
container_manager: use oomScoreAdj instead of default when set
2022-02-12 09:07:47 -08:00
Matthias Bertschy
9500ee9d9c container_manager: use oomScoreAdj instead of default when set 2022-02-12 15:23:13 +01:00
Kubernetes Prow Robot
31dba0a435
Merge pull request #107142 from dimbleby/delete-user-completion
Completions for kubectl config delete-user
2022-02-12 06:19:48 -08:00
Marko Mudrinić
980406f083
Update Go to 1.17.7
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
2022-02-12 13:06:08 +01:00
Kubernetes Prow Robot
1659924a97
Merge pull request #108070 from jsafrane/remove-selinux
Remove util/selinux package
2022-02-11 18:19:47 -08:00
Kubernetes Prow Robot
1f041ccd54
Merge pull request #107887 from bertinatto/fix-panic-kubelet
Fix panic in Kubelet
2022-02-11 12:58:07 -08:00
Kubernetes Prow Robot
a07241e3e0
Merge pull request #107737 from gnufied/enable-node-restriction-default
Enable node restriction plugin by default for local clusters
2022-02-11 12:57:59 -08:00
Kubernetes Prow Robot
8580bbf7d7
Merge pull request #107594 from hakman/remove_container-runtime_logic
Clean up logic for deprecated flag --container-runtime in kubelet
2022-02-11 12:57:47 -08:00
Kubernetes Prow Robot
f74f91d3d3
Merge pull request #108044 from mengjiao-liu/improve_tail_test_coverage
Improve test coverage: add unit tests `TestReadAtMost` in `pkg/util/tail`
2022-02-11 11:48:22 -08:00
Kubernetes Prow Robot
a62759d57c
Merge pull request #107712 from sanposhiho/patch-1
Update CHANGELOG-1.23.md to delete reverted change
2022-02-11 11:48:11 -08:00
Patrick Ohly
e1e84c8e5f scheduler_perf: run with -v=0 by default
This provides a mechanism for overriding the forced increase of the klog
verbosity to 4 when starting the apiserver and uses that for the scheduler_perf
benchmark. Other tests run as before.

A global variable was used because adding an explicit parameter to several
helper functions would have caused a lot of code churn (test ->
integration/util.StartApiserver ->
integration/framework.RunAnAPIServerUsingServer ->
integration/framework.startAPIServerOrDie).
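The trade-off described above, a package-level override instead of threading a parameter through every helper, can be sketched as follows. All names here (`verbosityOverride`, `startServer`, `runAPIServer`) are hypothetical and the actual klog call is omitted; the sketch only shows how a nil-means-default global lets the innermost helper change behaviour while the intermediate signatures stay untouched.

```go
package main

import "fmt"

// forcedVerbosity is the level the helpers raise klog to by default.
const forcedVerbosity = 4

// verbosityOverride is a hypothetical package-level hook. When nil, the
// helpers keep their old behaviour; a benchmark sets it before starting
// the server.
var verbosityOverride *int

// startServer stands in for the innermost helper of the call chain. It reads
// the global instead of receiving the level as a parameter.
func startServer() int {
	level := forcedVerbosity
	if verbosityOverride != nil {
		level = *verbosityOverride
	}
	return level // a real helper would apply this level to klog here
}

// runAPIServer represents an intermediate helper whose signature is unchanged.
func runAPIServer() int { return startServer() }

func main() {
	fmt.Println(runAPIServer()) // other tests: the forced default
	quiet := 0
	verbosityOverride = &quiet // what a benchmark might do
	fmt.Println(runAPIServer())
}
```

The cost of this design is shared mutable state: code that sets the override should not run concurrently with tests that expect the default.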
2022-02-11 16:58:33 +01:00
Kubernetes Prow Robot
a1ac74224e
Merge pull request #108062 from aojea/lease_reconciler
apiserver: use endpoint lease reconciler as default
2022-02-11 07:37:45 -08:00
Kubernetes Prow Robot
e24b5333e5
Merge pull request #108052 from klueska/fix-topology-manager
Fix bug in TopologyManager with merging hints when NUM_NUMA > 2
2022-02-11 07:37:34 -08:00
Kubernetes Prow Robot
d79ea9ea33
Merge pull request #108038 from mengjiao-liu/remove_feature_gate_SetHostnameAsFQDN
Remove feature gate `SetHostnameAsFQDN`
2022-02-11 07:36:26 -08:00
Jan Safranek
77aa06d0c8 Remove util/selinux package
The package says:

> the libcontainer SELinux package is only built for Linux, so it is
> necessary to have a NOP wrapper which is built for non-Linux platforms

This is not true: Kubernetes now imports
github.com/opencontainers/selinux/go-selinux and it has proper
multiplatform support (i.e. NOOP on non-Linux platforms).

This commit removes the whole package and calls go-selinux directly.
2022-02-11 15:20:35 +01:00