Commit Graph

104179 Commits

Author SHA1 Message Date
Aldo Culquicondor
1bff5eb44d Add ready field to Job status
to keep a count of the pods that have the ready condition.

Also:
- Add feature gate JobReadyPods.
- Add Ready to describe.

Change-Id: Ib934730a430a8e2a2f485671e345fe2330006939
2021-10-19 15:18:34 -04:00
Kubernetes Prow Robot
c733594040
Merge pull request #105687 from alculquicondor/job-tracking
Graduate JobTrackingWithFinalizers to beta
2021-10-19 11:40:37 -07:00
Kubernetes Prow Robot
b2c4269992
Merge pull request #105631 from klueska/upstream-distribute-cpus-across-numa
Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them
2021-10-19 11:40:24 -07:00
Kubernetes Prow Robot
2dbdd9461d
Merge pull request #105748 from marosset/host-process-emphemeral-contianer-validation
Adding unit test coverage for API validation for ephemeral containers in hostprocess pods on Windows
2021-10-19 08:11:04 -07:00
Kubernetes Prow Robot
5cdc3407ee
Merge pull request #105738 from tkashem/apf-remove-func
apf: return nil for a request that has been removed from queue
2021-10-19 06:30:39 -07:00
Abu Kashem
cd06ba502c
apf: return nil for a request that has been removed from queue 2021-10-19 08:29:35 -04:00
Shivanshu Raj Shrivastava
3e6d122ee1
fixed using reference to loop iterator (#105433)
* fixed using reference to loop iterator

* fixed other for loops
2021-10-19 02:40:38 -07:00
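
Note on 3e6d122ee1: the commit title points at a common Go pitfall. Before Go 1.22, a `for ... range` loop reused a single iteration variable, so taking its address inside the loop produced pointers that all aliased the same variable. The sketch below shows one conventional fix, indexing into the slice. The `item` type and `pointersByIndex` helper are hypothetical and are not taken from the commit.

```go
package main

import "fmt"

// item is a hypothetical element type used only for this illustration.
type item struct{ name string }

// pointersByIndex returns one pointer per element. Indexing into the slice
// takes the address of the element itself rather than of the loop variable,
// so the result is the same on every Go version.
func pointersByIndex(items []item) []*item {
	out := make([]*item, 0, len(items))
	for i := range items {
		out = append(out, &items[i])
	}
	return out
}

func main() {
	items := []item{{"a"}, {"b"}, {"c"}}
	for _, p := range pointersByIndex(items) {
		fmt.Println(p.name) // prints a, b, c on separate lines
	}
}
```

Because the pointers refer to the slice's own elements, writes through them are visible in the original slice, which is usually what the caller intended.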
Kubernetes Prow Robot
edeab47b36
Merge pull request #105757 from MikeSpreitzer/catch-up
Fix nits noticed in recent code review
2021-10-19 00:00:38 -07:00
Mike Spreitzer
1844a05277 Fix nits noticed in recent code review 2021-10-18 23:51:48 -05:00
Kubernetes Prow Robot
b977200a5d
Merge pull request #102785 from yselkowitz/master
Enable more test images for s390x
2021-10-18 19:59:34 -07:00
Kubernetes Prow Robot
1af8a8c026
Merge pull request #105465 from marosset/remove-host-process-contianer-kubelet-annotations
Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet
2021-10-18 15:50:02 -07:00
Kubernetes Prow Robot
e526cf316f
Merge pull request #105081 from aramase/update-log-mount-util
update the log message for mount windows
2021-10-18 15:49:54 -07:00
Kubernetes Prow Robot
e595d79dfc
Merge pull request #104574 from 249043822/br-repeat-package
fix duplicate package import in pod_worker
2021-10-18 15:49:46 -07:00
Kubernetes Prow Robot
819b021ada
Merge pull request #92433 from claudiubelu/windows/etcd-image
Adds Windows support for etcd image
2021-10-18 15:49:34 -07:00
Mark Rossetti
3ddff55fe6 Adding unit test coverage for API validation for ephemeral containers in hostprocess pods on Windows
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-10-18 15:46:27 -07:00
Kubernetes Prow Robot
5889fb4fbc
Merge pull request #105652 from wzshiming/feat/structure-shutdown-config
Refactor to use structure to pass parameters for GracefulNodeShutdown
2021-10-18 14:45:20 -07:00
Kubernetes Prow Robot
a78e3133a0
Merge pull request #104327 from sxllwx/fix/dynamic-client
set the content-type Header when the dynamic client sends the request
2021-10-18 03:01:49 -07:00
Kevin Klues
86f9c266bc Add optimizations to reduce iterations in distributed NUMA algorithm
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-18 08:53:25 +00:00
Kubernetes Prow Robot
9804a83d8f
Merge pull request #105343 from jonyhy96/fix-patch-node-once
kubeadm: fix some retry logic in PatchNodeOnce
2021-10-17 09:49:49 -07:00
haoyun
bd8f26c2d7 fix: patchNode retry logic
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-17 12:36:36 +08:00
Kevin Klues
70e0f47191 Support full-pcpus-only with the new NUMA distribution policy option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
d54445a84d Generalize the NUMA distribution algorithm to take cpuGroupSize
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
1436e33642 Add more extensive testing for NUMA distribution algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
cf3afb8602 Add 2 distinguishing test cases between the 2 takeByTopology algorithms
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
eb78e2406b Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Kubernetes Prow Robot
0cef26182c
Merge pull request #105674 from tkashem/apf-debug
apf: include seat information in per request debug dump
2021-10-15 23:53:48 -07:00
Kubernetes Prow Robot
3f40906dd8
Merge pull request #105702 from liggitt/json-strict-test
JSON decoder fixup
2021-10-15 15:45:48 -07:00
Kubernetes Prow Robot
daa83e6263
Merge pull request #105688 from mcshooter/updateNPD0810
Update the binary version file for NPD to 0.8.10-gke0.1
2021-10-15 12:55:16 -07:00
Jordan Liggitt
ffb2d12633 Test json/yaml decoding type coercion 2021-10-15 11:52:56 -04:00
Jordan Liggitt
b4632c38f0 Fix strict json decoder test 2021-10-15 11:52:56 -04:00
Jordan Liggitt
fd64f8d7ef Add missing json tag on internal unstructured list 2021-10-15 11:52:56 -04:00
Aldo Culquicondor
2c1b3fdb5b Graduate JobTrackingWithFinalizers to beta
Enable feature by default.

Update integration tests for other features to assume that finalizers are present.

Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
2021-10-15 10:29:40 -04:00
Kubernetes Prow Robot
55e1d2f9a7
Merge pull request #102015 from klueska/upstream-add-numa-to-cpu-assignment-algo
Add support for consuming whole NUMA nodes in CPUManager CPU assignments
2021-10-15 05:44:54 -07:00
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology on which each numa node
has multiple sockets. We have not yet found a real HW topology like this
in the wild, but we need one to fully exercise the code.

So, until we find a HW topology, we add a fake one flipping
the NUMA/socket config of the existing xeon dual gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones. Taken from a real dual xeon 6320 gold.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks)

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` from the `cmp` package in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Shiming Zhang
e47c78a354 Add log for creating node shutdown manager 2021-10-15 11:16:21 +08:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot
655c04d9f5
Merge pull request #105673 from andyzhangx/validate-windows-disk-num
support more than 100 disk mounts on Windows
2021-10-14 20:08:55 -07:00
Kubernetes Prow Robot
fe62fcc9b4
Merge pull request #105516 from fromanirh/e2e-kubelet-restart-improvements
e2e: node: kubelet restart improvements
2021-10-14 17:58:54 -07:00
Michelle Tandya
e9e6a7cb6b Update the binary version file for NPD to 0.8.10-gke0.1 2021-10-14 20:55:41 +00:00
Kubernetes Prow Robot
c2bff66b95
Merge pull request #104783 from YuviGold/fix-shellcheck-output-streams
Fix shellcheck output streams
2021-10-14 12:58:55 -07:00
Kubernetes Prow Robot
30a32a39a4
Merge pull request #105136 from astraw99/fix-csi-mount-log
Fix CSI `mounter.TearDownAt` log msg
2021-10-14 11:54:55 -07:00
Kubernetes Prow Robot
1f7ff80387
Merge pull request #105679 from cpanato/publishbot
staging/publishing: Set go1.16 version to go1.16.9
2021-10-14 10:50:20 -07:00