Commit Graph

104273 Commits

Author SHA1 Message Date
CIPHERTron
683e5bb2e1 sort directories 2021-10-17 20:32:39 +05:30
CIPHERTron
195cd8a575 rearrange directories 2021-10-17 15:34:06 +05:30
CIPHERTron
0de2040ada mark kube-proxy structured logs as migrated 2021-10-17 12:49:39 +05:30
haoyun
bd8f26c2d7 fix: patchNode retry logic
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-17 12:36:36 +08:00
Kevin Klues
70e0f47191 Support full-pcpus-only with the new NUMA distribution policy option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
d54445a84d Generalize the NUMA distribution algorithm to take cpuGroupSize
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
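
To make the `cpuGroupSize` constraint in the commit above concrete, here is a minimal Go sketch. It is not the CPUManager implementation: the function name `distributeInGroups` and its signature are hypothetical, and it only illustrates splitting a request across NUMA nodes so that each node receives a whole number of groups (for example, both hyperthreads of a core).

```go
package main

import (
	"errors"
	"fmt"
)

// distributeInGroups is a hypothetical helper (not taken from Kubernetes).
// It splits numCPUs across numNodes so that every node receives a multiple
// of cpuGroupSize CPUs, giving leftover groups to the first nodes.
func distributeInGroups(numCPUs, numNodes, cpuGroupSize int) ([]int, error) {
	if numNodes <= 0 || cpuGroupSize <= 0 {
		return nil, errors.New("numNodes and cpuGroupSize must be positive")
	}
	if numCPUs < 0 || numCPUs%cpuGroupSize != 0 {
		return nil, fmt.Errorf("cannot hand out %d CPUs in groups of %d", numCPUs, cpuGroupSize)
	}

	groups := numCPUs / cpuGroupSize // whole groups to hand out
	base, extra := groups/numNodes, groups%numNodes

	perNode := make([]int, numNodes)
	for i := range perNode {
		perNode[i] = base * cpuGroupSize
		if i < extra {
			perNode[i] += cpuGroupSize // one leftover group for this node
		}
	}
	return perNode, nil
}

func main() {
	// 12 CPUs over 4 NUMA nodes, in groups of 2 (assumed: 2 hyperthreads per core).
	perNode, err := distributeInGroups(12, 4, 2)
	fmt.Println(perNode, err) // prints: [4 4 2 2] <nil>
}
```

With a group size of 1 the same sketch degenerates to a plain even split, which is why a single parameter can cover both cases.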
Kevin Klues
1436e33642 Add more extensive testing for NUMA distribution algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
cf3afb8602 Add 2 distinguishing test cases between the 2 takeByTopology algorithms
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
eb78e2406b Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
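
The difference between the "packing" and "distributing" behaviours described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the `takeByTopology()` code: `packCPUs` and `spreadCPUs` are hypothetical names, nodes are modelled only by their free CPU counts, and the even split over the smallest usable number of nodes is a simplification of what KEP 2902 describes.

```go
package main

import "fmt"

// packCPUs (hypothetical) fills NUMA nodes in order, taking as many CPUs as
// possible from each node before moving on. It returns nil if the request
// cannot be satisfied.
func packCPUs(request int, freePerNode []int) []int {
	out := make([]int, len(freePerNode))
	for i, free := range freePerNode {
		take := free
		if request < take {
			take = request
		}
		out[i] = take
		request -= take
	}
	if request > 0 {
		return nil
	}
	return out
}

// spreadCPUs (hypothetical) splits the request as evenly as possible across
// the first k nodes, using the smallest k for which the even split fits.
// It returns nil if no such k exists.
func spreadCPUs(request int, freePerNode []int) []int {
	for k := 1; k <= len(freePerNode); k++ {
		base, extra := request/k, request%k
		out := make([]int, len(freePerNode))
		fits := true
		for i := 0; i < k; i++ {
			out[i] = base
			if i < extra {
				out[i]++
			}
			if out[i] > freePerNode[i] {
				fits = false
				break
			}
		}
		if fits {
			return out
		}
	}
	return nil
}

func main() {
	free := []int{8, 8, 8, 8} // assumed: four NUMA nodes with 8 free CPUs each
	fmt.Println(packCPUs(12, free))   // prints: [8 4 0 0]
	fmt.Println(spreadCPUs(12, free)) // prints: [6 6 0 0]
}
```

In the example, packing leaves one node with twice as many of the workload's CPUs as the other, while spreading gives each node the same share.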
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Kubernetes Prow Robot
0cef26182c
Merge pull request #105674 from tkashem/apf-debug
apf: include seat information in per request debug dump
2021-10-15 23:53:48 -07:00
Kubernetes Prow Robot
3f40906dd8
Merge pull request #105702 from liggitt/json-strict-test
JSON decoder fixup
2021-10-15 15:45:48 -07:00
Kubernetes Prow Robot
daa83e6263
Merge pull request #105688 from mcshooter/updateNPD0810
Update the binary version file for NPD to 0.8.10-gke0.1
2021-10-15 12:55:16 -07:00
Jordan Liggitt
ffb2d12633 Test json/yaml decoding type coercion 2021-10-15 11:52:56 -04:00
Jordan Liggitt
b4632c38f0 Fix strict json decoder test 2021-10-15 11:52:56 -04:00
Jordan Liggitt
fd64f8d7ef Add missing json tag on internal unstructured list 2021-10-15 11:52:56 -04:00
Aldo Culquicondor
2c1b3fdb5b Graduate JobTrackingWithFinalizers to beta
Enable feature by default.

Update integration tests for other features to assume that finalizers are present.

Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
2021-10-15 10:29:40 -04:00
kerthcet
fc9533e72f remove scheduler ServiceAffinity plugin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-10-15 22:10:31 +08:00
Kubernetes Prow Robot
55e1d2f9a7
Merge pull request #102015 from klueska/upstream-add-numa-to-cpu-assignment-algo
Add support for consuming whole NUMA nodes in CPUManager CPU assignments
2021-10-15 05:44:54 -07:00
Konstantin Misyutin
dbc9d7b71a Remove tests when StorageObjectInUseProtection feature is disabled
Once the feature gate is locked, the tests that run with this feature
disabled will crash, so we should remove them together with locking
the feature.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:39:37 +08:00
Konstantin Misyutin
e07d736522 Lock StorageObjectInUseProtection feature gate to default
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:36:53 +08:00
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology on which each numa node
has multiple sockets. We have not yet found a real HW topology like this
in the wild, but we need one to fully exercise the code.

So, until we find a HW topology, we add a fake one flipping
the NUMA/socket config of the existing xeon dual gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones. Taken from a real dual xeon 6320 gold.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks)

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` from the `cmp` package in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
Francesco Romani
baa55935f3 node: e2e: clarify findKubeletService
Add docstrings to findKubeletService and restartKubelet,
fix typos along the way.
xref: https://github.com/kubernetes/kubernetes/pull/105516#pullrequestreview-780230582

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 11:19:03 +02:00
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Shiming Zhang
e47c78a354 Add log for creating node shutdown manager 2021-10-15 11:16:21 +08:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot
655c04d9f5
Merge pull request #105673 from andyzhangx/validate-windows-disk-num
support more than 100 disk mounts on Windows
2021-10-14 20:08:55 -07:00
Kubernetes Prow Robot
fe62fcc9b4
Merge pull request #105516 from fromanirh/e2e-kubelet-restart-improvements
e2e: node: kubelet restart improvements
2021-10-14 17:58:54 -07:00
Michelle Tandya
e9e6a7cb6b Update the binary version file for NPD to 0.8.10-gke0.1 2021-10-14 20:55:41 +00:00
Kubernetes Prow Robot
c2bff66b95
Merge pull request #104783 from YuviGold/fix-shellcheck-output-streams
Fix shellcheck output streams
2021-10-14 12:58:55 -07:00
Kubernetes Prow Robot
30a32a39a4
Merge pull request #105136 from astraw99/fix-csi-mount-log
Fix CSI `mounter.TearDownAt` log msg
2021-10-14 11:54:55 -07:00
Kubernetes Prow Robot
1f7ff80387
Merge pull request #105679 from cpanato/publishbot
staging/publishing: Set go1.16 version to go1.16.9
2021-10-14 10:50:20 -07:00
Kubernetes Prow Robot
0bfa37dfcc
Merge pull request #105676 from alculquicondor/job-name
Fix name for Pods of NonIndexed Jobs
2021-10-14 10:50:12 -07:00
Kubernetes Prow Robot
fb2556cb34
Merge pull request #105670 from pohly/restore-volume-life-cycle-check
e2e: restore volume lifecycle check for most tests, II
2021-10-14 10:50:04 -07:00
Kubernetes Prow Robot
3f85ed46db
Merge pull request #105649 from navist2020/kubeadm/kubeconfig/cfgPath
kubeadm/kubeconfig: validate flag --config to make sure it is not empty
2021-10-14 10:49:56 -07:00
Kubernetes Prow Robot
57aaa70b2c
Merge pull request #105596 from pacoxu/subresource-remove
test fix: check correct subresource patch path
2021-10-14 10:49:48 -07:00
Kubernetes Prow Robot
7c80381d98
Merge pull request #105485 from liggitt/podsecurity-limit
PodSecurity: limit webhook admission input
2021-10-14 10:49:36 -07:00
Shivanshu Raj Shrivastava
7d9a6d1de6
Migrated pkg/proxy/ipvs to structured logging (#104932)
* migrated ipset.go

* migrated graceful_termination.go

* fixed vstring

* fixed ip set entry, made it consistent

* fixed rs logging

* resolving review comments for key graceful_termination.go

* refactoring ipset.go

* included review changes
2021-10-14 09:47:29 -07:00
Shivanshu Raj Shrivastava
daf5af2917
Migrated pkg/proxy to structured logging (#104891)
* migrated service.go to structured logging

* fixing capital letter in starting

* migrated topology.go

* migrated endpointslicecache.go

* migrated endpoints.go

* nit typo

* nit plural to singular

* fixed format

* code formatting

* resolving review comment for key ipFamily

* resolving review comment for key endpoints.go

* code formatting

* Converted Warningf to ErrorS, wherever applicable

* included review changes

* included review changes
2021-10-14 09:47:17 -07:00
Kubernetes Prow Robot
dea052ceba
Merge pull request #105479 from ahg-g/ahg-mutable
Allow updating scheduling directives of suspended jobs that never started
2021-10-14 08:09:18 -07:00
Aldo Culquicondor
4ef9d18abe Fix name for Pods of NonIndexed Jobs
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Ben Luddy
1873915be6
Free APF seats for watches handled by an aggregated apiserver. 2021-10-14 10:39:15 -04:00
Carlos Panato
5ca8ae9f35
staging/publishing: Set go1.16 version to go1.16.9
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
2021-10-14 16:23:31 +02:00
Abdullah Gharaibeh
335817cbce Allow updating node affinity, selector and tolerations for suspended jobs that never started 2021-10-14 10:04:47 -04:00