Commit Graph

571 Commits

Author SHA1 Message Date
Davanum Srinivas
d30c489c54 Move pkg/kubelet/pluginregistration and deviceplugin
Change-Id: I06adcb43bd278b430ffad2010869e1524c8cc4ff
2019-10-06 15:28:38 -04:00
Kubernetes Prow Robot
d60bda1971 Merge pull request #83043 from ConnorDoyle/cleanup-cpumanger-topo-hints
Delegate topology hint gen to CPU manager policy
2019-10-05 00:59:39 -07:00
Kevin Klues
d2b53af7d7 Add klueska as reviewer for CPUManager and devicemanager 2019-10-03 13:01:41 -07:00
Connor Doyle
389853894d Delegate topology hint gen to CPU manager policy
- The previous implementation depended on a fixed set of policies.
2019-09-27 22:29:02 -07:00
zouyee
b1f6974f7b using online instead to fix kubelet service failed with wrong number of possible NUMA nodes
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2019-09-26 21:48:50 +08:00
Connor Doyle
e35301c19f Rename package socketmask to bitmask.
- As discussed in reviews and other public channels,
  this abstraction is used to represent numa nodes, not sockets.
- There is nothing inherently related to sockets in this package anyway.
2019-09-23 17:08:45 -07:00
Kubernetes Prow Robot
07cc813956 Merge pull request #81793 from lmdaly/topology-manager-owners
Added OWNERS file for Topology Manager
2019-09-11 18:26:52 -07:00
Louise Daly
fbccf25e29 Added OWNERS file for Topology Manager 2019-09-11 06:40:24 +01:00
Kubernetes Prow Robot
887edd2273 Merge pull request #82099 from lmdaly/single-numa-node-policy
Topology Manager Policy: single-numa-node
2019-08-30 11:21:26 -07:00
Kubernetes Prow Robot
9165f7bf56 Merge pull request #82104 from klueska/upstream-fix-cpu-manager-topology-bug
Fix bug in CPUManager with setting topology for policies
2019-08-30 08:00:44 -07:00
Louise Daly
8ad1b5ba3b Single-numa-node Topology Manager bug fix
Added one off fix for single-numa-node policy to correctly
reject pod admission on a resource allocation that spans
NUMA nodes

Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-30 07:17:56 +01:00
Louise Daly
f6c085f60e Added Single NUMA Node Policy which ensure resource are
aligned on a single NUMA node

Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-30 07:17:17 +01:00
Kevin Klues
5ed80dadcf Update CanAdmitPodResult() in TopologyManager to take a TopologyHint
Previously it only took a bool, which limited the logic it could perform
to determine if a pod should be admitted or not based on the merged hint
from the policy.
2019-08-30 07:17:17 +01:00
Kevin Klues
eb0216e54e Update semantics to set Preferred field in TopologyHint generation
We now only set Preferred to true if resources can be allocated with a
size equal to the minimimum _possible_ mask when all resources are
available.
2019-08-29 14:32:10 -05:00
Kevin Klues
e0e8b3e4fd Update CPUManager topology helpers to accept multiple ids 2019-08-29 13:22:54 -05:00
Kevin Klues
dcc9f66311 Add devicemanager tests for TopologyHint consumption 2019-08-29 08:22:50 -05:00
Kevin Klues
cc567afaf0 Consume TopologyHints in the devicemanager 2019-08-29 08:22:50 -05:00
Kevin Klues
a3320f80d9 Add devicemanager tests for TopologyHint generation 2019-08-29 07:45:43 -05:00
Kevin Klues
d3d7a8f5d4 Generate TopologyHints from the devicemanager 2019-08-29 07:45:43 -05:00
Louise Daly
9a118ceac4 Added stub support for Topology Manager to Device Manager
Co-authored-by: Conor Nolan <conor.nolan@intel.com>
Co-authored-by: Sreemanti Ghosh <sreemanti.ghosh@intel.com>
Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-29 07:45:43 -05:00
Kevin Klues
ddfd9ac0ca Fix bug in CPUManager with setting topology for policies
Also add a check in the unit tests to avoid regressions
2019-08-28 17:32:25 -05:00
Kevin Klues
df1b54fc09 Fail fast with TopologyManager on machines with more than 8 NUMA Nodes 2019-08-28 11:04:52 -05:00
Kevin Klues
5660cd3cfb Add NUMA Node awareness to the TopologyManager 2019-08-28 11:04:52 -05:00
Kubernetes Prow Robot
35867b160a Merge pull request #81951 from klueska/upstream-update-cpu-amanger-numa-mapping
Update the CPUManager to include NUMANodeID in its topology information
2019-08-28 08:55:40 -07:00
Kubernetes Prow Robot
de1cfa9bc1 Merge pull request #81787 from lmdaly/topology-manager-rename-strict-policy
Renaming strict policy to restricted policy
2019-08-28 01:38:04 -07:00
Kevin Klues
f4dbd29cdb Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
2019-08-27 16:51:05 -05:00
Kevin Klues
ecc14fe661 Update CPUManager to include NUMANodeID in CPUTopology
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custiom code to use the information
provided by cadvisor once it is made available.
2019-08-27 16:51:05 -05:00
Kevin Klues
869962fa48 Cache the discovered topology in the CPUManager instead of MachineInfo 2019-08-27 16:23:07 -05:00
Kubernetes Prow Robot
a3488b4cee Merge pull request #81206 from tallclair/staticcheck-kubelet-push
Cleanup Kubelet static analysis issues
2019-08-22 15:09:43 -07:00
Kubernetes Prow Robot
6b47754740 Merge pull request #81627 from tallclair/copy
Delete duplicate resource.Quantity.Copy()
2019-08-22 11:13:13 -07:00
Louise Daly
2fb94231d0 Renaming strict policy to restricted policy
Restricted policy will fail admission of guaranteed pods where
all requested resources are not available on a single NUMA Node
2019-08-22 07:57:55 +01:00
Tim Allclair
a2c51674cf Cleanup more static check issues (S1*,ST*) 2019-08-21 10:40:21 -07:00
Tim Allclair
8a495cb5e4 Clean up error messages (ST1005) 2019-08-21 10:40:21 -07:00
Tim Allclair
6510d26b6a Fix misc static check issues 2019-08-21 10:40:21 -07:00
Tim Allclair
3f510c69f6 Remove dead code from pkg/kubelet/... 2019-08-21 10:40:21 -07:00
Tim Allclair
49f50484b8 Delete duplicate resource.Quantity.Copy() 2019-08-19 17:23:14 -07:00
Kevin Klues
4fdd52b058 Update GetTopologyHints() API to return a map
At present, there is no way for a hint provider to return distinct hints
for different resource types via a call to GetTopologyHints(). This
means that hint providers that govern multiple resource types (e.g. the
devicemanager) must do some sort of "pre-merge" on the hints it
generates for each resource type before passing them back to the
TopologyManager.

This patch changes the GetTopologyHints() interface to allow a hint
provider to pass back raw hints for each resource type, and allow the
TopologyManager to merge them using a single unified strategy.

This change also allows the TopologyManager to recognize which
resource type a set of hints originated from, should this information
become useful in the future.
2019-08-16 08:06:12 +02:00
Kubernetes Prow Robot
f2dd24820a Merge pull request #73920 from nolancon/topology-manager-cpu-manager
Changes to make CPU Manager a Hint Provider for Topology Manager
2019-08-15 05:44:33 -07:00
Kevin Klues
b3f4bed97f Add CPUManager tests for TopologyHint consumption 2019-08-14 06:22:56 +02:00
Kevin Klues
8278d1134c Consume TopologyHints in the CPUManager
Co-Authored-By: Conor Nolan <conor.nolan@intel.com>
2019-08-14 06:22:56 +02:00
Sreemanti Ghosh
7c626a2a00 Add CPUManager tests for TopologyHint generation
Co-Authored-By: Conor Nolan <conor.nolan@intel.com>
Co-Authored-By: Kevin Klues <kklues@nvidia.com>
2019-08-14 06:22:56 +02:00
Kevin Klues
156b3f6af8 Generate TopologyHints from the CPUManager 2019-08-14 06:22:56 +02:00
Kevin Klues
9a6788cb13 Add IterateSocketMasks() function to socketmask abstraction 2019-08-14 06:22:56 +02:00
Kubernetes Prow Robot
ac2295a24d Merge pull request #78587 from kad/socketmask-string
Use go standard library for common bit operations
2019-08-13 00:03:39 -07:00
Kubernetes Prow Robot
d47f9ff132 Merge pull request #81086 from dims/fix-incorrect-readlink-check-for-checking-kernel-pids
[TOB-K8S-027] Fix Incorrect isKernelPid check
2019-08-08 17:58:04 -07:00
Davanum Srinivas
bd925d6611 [TOB-K8S-027] Fix Incorrect isKernelPid check
isKernelPid should explicitly check the error returned from os.Readlink and return true
only if the error value is ENOENT. Without this fix, if Readlink
returned say ENAMETOOLONG or EACESS, we would still count the process as
a kernel process (which is not true).
2019-08-07 11:19:19 -04:00
Davanum Srinivas
bc71c23bee [TOB-K8S-025] Incorrect docker daemon process name in container manager
The container manager used in kubelet checks for docker daemon process either via pidfile
or process name. While the pidfile points to the docker daemon process PID, the
dockerProcessName constant stores a docker cli name ( docker ) instead of docker daemon
name ( dockerd ).
2019-08-07 10:59:37 -04:00
Conor Nolan
e33af11add Add stub support for TopologyManager to CPUManager
Co-Authored-By: Louise Daly <louise.m.daly@intel.com>
2019-08-07 15:56:05 +02:00
Jianfei Bai
5726b22fbc Move docker specific const to dockershim. 2019-08-05 10:28:08 +08:00
Kubernetes Prow Robot
c63000ef81 Merge pull request #78793 from mattjmcnaughton/mattjmcnaughton/78629-fix-reserved-cgroup-systemd
Fix reserved cgroup systemd
2019-08-02 17:23:52 -07:00