105192 Commits

Author SHA1 Message Date
Kevin Klues
209cd20548 Fix unit tests following bug fix in CPUManager for map functions (2/2)
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).

In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).

In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have a balanced distribution of CPUs across all NUMA nodes. This
means allocating from Socket 1 if the only other allocated CPU is on
Socket 0.

To allow CPU allocations to be packed onto full cores, one can allocate them
from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:24 +00:00
Kevin Klues
67f719cb1d Fix unit tests following bug fix in CPUManager for map functions (1/2)
This fixes two related tests to better test our "balanced" distribution algorithm.

The first test originally provided an input with the following number of CPUs
available on each NUMA node:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 20

It then attempted to distribute 48 CPUs across them with an expectation that
each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node
0 with no more CPUs in the end).

This would have left the following number of CPUs available on each node:

Node 0: 0
Node 1: 4
Node 2: 4
Node 3: 20

Which results in a standard deviation of 7.6811

However, a more balanced solution would actually be to pull 16 CPUs from NUMA
nodes 1, 2, and 3, and leave 0 untouched, i.e.:

Node 0: 16
Node 1: 4
Node 2: 4
Node 3: 4

Which results in a standard deviation of 5.1961524227066

To fix this test we changed the original number of available CPUs to start with
4 fewer CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.:

Node 0: 18
Node 1: 20
Node 2: 20
Node 3: 16

So that we end up with a result of:

Node 0: 2
Node 1: 4
Node 2: 4
Node 3: 16

Which pulls the CPUs from where we want and results in a standard deviation of 5.5453

For the second test, we simply reverse the number of CPUs available for Nodes 0
and 3 as:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 18

Which forces the allocation to happen just as it did for the first test, except
now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:23 +00:00
Kevin Klues
4008ea0b4c Fix bug in CPUManager map.Keys() and map.Values() implementations
Previously these would return lists that were too long because we appended to
pre-initialized lists with a specific size.

Since the primary place these functions are used is in the mean and standard
deviation calculations for the NUMA distribution algorithm, it meant that the
results of these calculations were often incorrect.

As a result, some of the unit tests we have are actually incorrect (because the
results we expect do not actually produce the best balanced
distribution of CPUs across all NUMA nodes for the input provided).

These tests will be patched up in subsequent commits.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:21 +00:00
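The class of bug described in 4008ea0b4c is easy to reproduce in isolation. The sketch below is an illustration only, with hypothetical function names (`keysBuggy`, `keysFixed`) rather than the actual CPUManager code: creating a slice with a non-zero length and then appending to it leaves the zero-valued elements in front of the real data.

```go
package main

import "fmt"

// keysBuggy pre-initializes the slice with len(m) zero values and then
// appends, so the result has 2*len(m) elements.
func keysBuggy(m map[int]int) []int {
	keys := make([]int, len(m)) // length len(m), already filled with zeros
	for k := range m {
		keys = append(keys, k) // lands after the zeros
	}
	return keys
}

// keysFixed reserves capacity but starts at length 0, so append fills
// the slice from the beginning.
func keysFixed(m map[int]int) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

func main() {
	// Node ID -> available CPUs, taken from the test input quoted earlier.
	m := map[int]int{0: 16, 1: 20, 2: 20, 3: 20}
	fmt.Println("buggy length:", len(keysBuggy(m)))
	fmt.Println("fixed length:", len(keysFixed(m)))
}
```

Extra zero entries change both the sum and the element count, so any mean or standard deviation computed from such a list is skewed.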
Kevin Klues
446c58e0e7 Ensure we balance across *all* NUMA nodes in NUMA distribution algo
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:19 +00:00
Kevin Klues
c8559bc43e Short-circuit CPUManager distribute NUMA algo for unusable cpuGroupSize
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:16 +00:00
Kevin Klues
b28c1392d7 Round the CPUManager mean and stddev calculations to the nearest 1000th
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:13 +00:00
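Rounding to the nearest thousandth, as named in the b28c1392d7 subject line, is commonly done by scaling, rounding, and scaling back. The following is a sketch of that general technique under the assumption that three decimal places are wanted; the helper name `roundToThousandth` is hypothetical and this is not necessarily how the commit implements it.

```go
package main

import (
	"fmt"
	"math"
)

// roundToThousandth rounds x to three decimal places using the
// scale-round-unscale idiom. (Hypothetical helper for illustration.)
func roundToThousandth(x float64) float64 {
	return math.Round(x*1000) / 1000
}

func main() {
	fmt.Println(roundToThousandth(5.1961524227066))
	fmt.Println(roundToThousandth(7.6811))
}
```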
Anago GCB
c8c81cbfbb CHANGELOG: Update directory for v1.23.0-rc.0 release 2021-11-24 06:19:11 +00:00
Kubernetes Prow Robot
e53cf07724
Merge pull request #106611 from verult/delegate-fsgroup-disable-onrootmismatch-e2e
Delegate FSGroup CSI driver e2e: verify fsgroup is passed to CSI calls
2021-11-23 17:52:20 -08:00
Kubernetes Prow Robot
c3e6b66643
Merge pull request #106533 from haircommander/summary-page-fault-test
test: update major page fault values for summary test
2021-11-23 15:09:45 -08:00
Kubernetes Prow Robot
a5622f3f6e
Merge pull request #106616 from mattcary/pvc-race
Clean up deep copy needed for UpdateStatefulSet
2021-11-23 09:38:17 -08:00
Matthew Cary
0e2b901762 Clean up deep copy needed for UpdateStatefulSet
Change-Id: Id732358183d682d1a945cfee56f83bcaac0d7c31
2021-11-23 06:48:54 -08:00
Kubernetes Prow Robot
e31aafc4fd
Merge pull request #106348 from endocrimes/dani/rm-gpu
e2e_node: unify device tests
2021-11-22 19:46:16 -08:00
Cheng Xing
bca1b79728 Delegate FSGroup CSI driver e2e: verify fsgroup is passed to CSI calls using mock driver tests 2021-11-22 17:00:39 -08:00
Kubernetes Prow Robot
f572e4d5b4
Merge pull request #106518 from SergeyKanzhelev/tryProbeFix
Fix the bug with GRPC probe
2021-11-22 15:38:54 -08:00
Kubernetes Prow Robot
a142f86351
Merge pull request #105764 from jlebon/pr/add-ssh-mode
test/e2e_node/remote: support pure SSH mode
2021-11-22 10:53:33 -08:00
Jonathan Lebon
3ebd93cd02 test-e2e-node: support pure SSH mode
Right now, `run_remote.go` only supports GCE instances. But actually
running the tests is completely independent of GCE and could work just
as well on any SSH-accessible machine.

This patch adds a new `--mode` switch, which defaults to `gce` for
backwards compatibility, but can be set to `ssh`. In that mode, the GCE
API is not used at all, and we simply connect to the hosts given via
`--hosts`.

This is still better than `run_local.go` because the latter mixes build
environment with test environment, which doesn't fit well with
container-optimized operating systems.

This is part of an effort to set up the e2e node tests on Fedora CoreOS
(see https://github.com/coreos/fedora-coreos-tracker/issues/990).

Patch best viewed with whitespace ignored.
2021-11-22 10:13:15 -05:00
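The flag behaviour described in 3ebd93cd02 (a `--mode` switch defaulting to `gce`, with `ssh` reading targets from `--hosts`) could be modelled roughly as follows. This is a hedged sketch and not the code in `run_remote.go`: the `config` type, the `parseArgs` function, the comma-separated format for `--hosts`, and the host names in `main` are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// config holds the parsed options. (Hypothetical type for this sketch.)
type config struct {
	mode  string
	hosts []string
}

// parseArgs parses --mode and --hosts. The default of "gce" follows the
// commit message; the comma-separated host list is an assumption.
func parseArgs(args []string) (config, error) {
	fs := flag.NewFlagSet("run-remote-sketch", flag.ContinueOnError)
	mode := fs.String("mode", "gce", "either \"gce\" or \"ssh\"")
	hosts := fs.String("hosts", "", "comma-separated hosts to test")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if *mode != "gce" && *mode != "ssh" {
		return config{}, fmt.Errorf("unsupported mode %q", *mode)
	}
	cfg := config{mode: *mode}
	if *hosts != "" {
		cfg.hosts = strings.Split(*hosts, ",")
	}
	return cfg, nil
}

func main() {
	// Placeholder host names, not taken from the commit.
	cfg, err := parseArgs([]string{"--mode=ssh", "--hosts=node-a,node-b"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.mode, cfg.hosts)
}
```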
Jonathan Lebon
e0723c1e64 test-e2e-node: add SSH_OPTIONS
This allows overriding the default options.
2021-11-22 10:13:13 -05:00
Jonathan Lebon
591f4cdb77 run_remote.go: factor out prepareGceImages()
Mostly a pure code move. Only changed the `klog.Fatalf` to `fmt.Errorf`.
Prep for future patch.
2021-11-22 10:12:29 -05:00
Jonathan Lebon
032dbd2063 run_remote.go: move registerGceHostIP() call to testImage()
I.e. don't assume that `testHost` is called on a GCE host. Prep for
future patch.
2021-11-22 10:12:28 -05:00
Jonathan Lebon
36233b985b run_remote.go: factor out registerGceHostIP()
Prep for future patch.
2021-11-22 10:12:28 -05:00
Kubernetes Prow Robot
806e38aeb7
Merge pull request #106577 from liggitt/field-validation-speedup
Speed up field validation tests
2021-11-22 02:07:09 -08:00
Jordan Liggitt
d4d34085e4 Clean up field validation test logs 2021-11-21 21:29:06 -05:00
Jordan Liggitt
8fa1c612fd Speed up field validation tests 2021-11-21 21:29:06 -05:00
Kubernetes Prow Robot
a8c9dd6274
Merge pull request #106576 from liggitt/bad-request-patch
Return BadRequest for invalid large patch
2021-11-21 13:05:00 -08:00
Jordan Liggitt
2d307f47bd Return BadRequest for invalid large patch 2021-11-21 09:13:37 -05:00
Kubernetes Prow Robot
ed07515ee0
Merge pull request #106431 from Namanl2001/image-config-dir
enabling runtime-config to be passed via make file for node-e2e testing purposes
2021-11-20 08:22:59 -08:00
Kubernetes Prow Robot
21d3acc787
Merge pull request #106544 from ehashman/fix-flake-restart
Deflake "Kubelet should correctly account for terminated pods after restart"
2021-11-20 00:04:59 -08:00
Kubernetes Prow Robot
9a1d90165d
Merge pull request #106462 from jpbetz/cel-e2e2
Add e2e test for CEL Validation Rules
2021-11-19 22:04:59 -08:00
Kubernetes Prow Robot
823cc3cc36
Merge pull request #106563 from ehashman/more-etcd-validation
Validate etcd image versions in test manifests
2021-11-19 18:09:00 -08:00
Kubernetes Prow Robot
084b28f6d5
Merge pull request #106510 from robscott/topology-ready-fix-controller
Updating TopologyCache to disregard unready endpoints in calculations
2021-11-19 17:07:11 -08:00
Kubernetes Prow Robot
37ae94f9ed
Merge pull request #106507 from robscott/topology-ready-fix
Updating kube-proxy to ignore unready endpoints for Topology Hints
2021-11-19 17:06:59 -08:00
Sergey Kanzhelev
f390d49e24 fix the grpc probes 2021-11-20 00:23:53 +00:00
Kubernetes Prow Robot
c82a0f8ddc
Merge pull request #106562 from SergeyKanzhelev/BumpEtcdVersion
bumpt etcd image version for e2e tests
2021-11-19 16:02:08 -08:00
Kubernetes Prow Robot
1da209faab
Merge pull request #106220 from NikhilSharmaWe/betterOutputWindows
Changed code to improve output for test/e2e/windows
2021-11-19 16:01:56 -08:00
Elana Hashman
c9d9b548a4
Validate etcd image versions in test manifests 2021-11-19 15:13:49 -08:00
Kubernetes Prow Robot
8f9dd0a14c
Merge pull request #105916 from kevindelgado/validation-unify-all
Server Side Strict Field Validation
2021-11-19 14:27:22 -08:00
Sergey Kanzhelev
6e591ab8ed bumpt etcd image version for e2e tests 2021-11-19 22:00:28 +00:00
Kevin Delgado
e50e2bbc88 Server Side Field Validation
Implements server side field validation behind the
`ServerSideFieldValidation` feature gate. With the
feature enabled, any create/update/patch request
with the `fieldValidation` query param set to
"Strict" will error if the object in the request
body has unknown fields. A value of "Warn"
(also the default when the feature is enabled)
will let the request succeed with a warning.

When the feature is disabled (or the query param
has a value of "Ignore"), the request will succeed
as it did previously, with no indication of any
unknown or duplicate fields.
2021-11-19 21:24:36 +00:00
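To make the `fieldValidation` query parameter from e50e2bbc88 concrete, the sketch below builds a request URL carrying it. It only constructs the URL and sends nothing; the helper `withFieldValidation` and the host `apiserver.example` are hypothetical, while the parameter name and the values "Strict", "Warn", and "Ignore" come from the commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// withFieldValidation returns rawURL with the fieldValidation query
// parameter set to mode, preserving any existing query parameters.
// (Hypothetical helper for illustration.)
func withFieldValidation(rawURL, mode string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("fieldValidation", mode)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder API server address; substitute a real one.
	base := "https://apiserver.example:6443/api/v1/namespaces/default/configmaps"
	strict, err := withFieldValidation(base, "Strict")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strict)
}
```

With the feature gate enabled, a create, update, or patch sent to such a URL would be rejected if the body contains unknown fields, per the description above.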
Elana Hashman
6ddf86d422
Set startTimeout back to 3m, restore wait loop at end of test 2021-11-19 11:30:43 -08:00
Kubernetes Prow Robot
ddfc53922c
Merge pull request #106414 from jonyhy96/kubelet-fix-flake
kubelet: fix npe in test
2021-11-19 07:06:51 -08:00
haoyun
65ac99eef5 fix: npe in kubelet test
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2021-11-19 17:44:05 +08:00
Joe Betz
0b96f53f52 Add e2e test for CEL Validation Rules 2021-11-18 21:01:40 -05:00
Kubernetes Prow Robot
9b180d8913
Merge pull request #105481 from claudiubelu/tests/e2e-prepull-images
tests: Prepull images
2021-11-18 17:22:51 -08:00
Elana Hashman
b4a8861af3
Tweak resource requests for Kubelet restart test 2021-11-18 14:57:22 -08:00
Nikhil Sharma
b75acac9df Changed code to improve output for test/e2e/windows 2021-11-19 04:06:05 +05:30
Rob Scott
1983f41065
Updating kube-proxy to ignore unready endpoints for Topology Hints 2021-11-18 14:04:44 -08:00
Rob Scott
9813ec7e8a
Updating TopologyCache to disregard unready endpoints in calculations 2021-11-18 13:54:09 -08:00
Kubernetes Prow Robot
51b94de68f
Merge pull request #105451 from claudiubelu/tests/log-pod-logs
tests: Fetch the pod logs in failed cases
2021-11-18 13:33:36 -08:00
Kubernetes Prow Robot
203d145b6a
Merge pull request #106281 from ii/promote-delete-service-collection
Promote DeleteCollection service e2e test to conformance - +1 endpoint
2021-11-18 07:47:03 -08:00
Peter Hunt
76df8acb80 test: update major page fault values for summary test
as well as use a variable instead of a constant

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-11-18 09:24:41 -05:00