Commit Graph

105491 Commits

Author SHA1 Message Date
Sascha Grunert
a063a2ba3e
Revert dockershim CRI v1 changes
We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.

Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-12-03 18:37:11 +01:00
Slavik Panasovets
3c51478f65 add gce loadbalancer no-op finalizer and existingFwdRule tests 2021-12-03 13:49:31 +00:00
Slavik Panasovets
b2534483fa disable gce service handling if has rbs forwarding rule 2021-12-03 13:48:26 +00:00
Davanum Srinivas
555623c07e
staging: add dummy commit to trigger gomod update (#106794)
add newline to all staging repos

Signed-off-by: Davanum Srinivas <davanum@gmail.com>

Co-authored-by: Nikhita Raghunath <nikitaraghunath@gmail.com>
2021-12-02 20:38:34 -05:00
Slavik Panasovets
e2c49a0dd1 add ELBRbsFinalizer 2021-12-02 15:48:52 +00:00
Patrick Ohly
a39b3877e9 storage e2e: update mock deployment
These changes were created automatically with the updated update-hostpath.sh
script.
2021-12-02 16:18:26 +01:00
Patrick Ohly
48e9a39842 storage e2e: update snapshotter sidecar RBAC
The same change was already done for csi-driver-host-path master, but not
released yet because csi-snapshotter v5.0.0 itself was not ready yet.

We need this update in k/k because some canary jobs already use the new
snapshotter sidecar which causes permission issues.
2021-12-02 15:06:14 +01:00
Patrick Ohly
0605a394bf storage e2e: hostpath driver v1.7.3
This is an automatic update of the testing manifests that mirrors the v1.7.3
release. All of these changes were created with
   test/e2e/testing-manifests/storage-csi$ ./update-hostpath.sh v1.7.3
2021-12-02 15:02:57 +01:00
Amarnath Valluri
e68c9f3dec test/e2e/storage: replace mock driver with hostpath driver
This is a first step towards removing the mock CSI driver completely from
e2e testing in favor of the hostpath plugin. With the recent hostpath plugin
changes (PR #260, #269), it supports all the features supported by the mock
CSI driver.

Using the hostpath plugin for testing also covers CSI persistent feature
use cases.
2021-12-02 14:41:08 +01:00
Kubernetes Prow Robot
2ac6a4121f
Merge pull request #106781 from palnabarun/publishing-bot/remove-1.19-rules
publishing-bot: remove rules for release-1.19
2021-12-02 03:53:33 -08:00
Jian Li
8689f22821 fix mapToUnstructured error message 2021-12-02 18:07:16 +08:00
Jian Li
d4f3b5a6d1 cleanup: use present typeFrom variable to avoid another reflect.TypeOf call 2021-12-02 14:59:36 +08:00
Nabarun Pal
78e1ec2e38 publishing-bot: remove rules for release-1.19
Kubernetes 1.19 is not actively maintained anymore.

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
2021-12-02 10:08:20 +05:30
Kubernetes Prow Robot
0fe049cb93
Merge pull request #106774 from SergeyKanzhelev/grpcFieldRename
Grpc field rename
2021-12-01 12:11:18 -08:00
Sergey Kanzhelev
4c9d77d724 generated files for the grpc field rename 2021-12-01 18:25:37 +00:00
Sergey Kanzhelev
1918ecad04 update the grpc field name for consistency 2021-12-01 18:16:08 +00:00
Alexis MacAskill
8102bbe05a skip parallel volume cloning test for gce pd and fix disk not ready error for gce pd 2021-12-01 17:49:48 +00:00
Nikhil Sharma
0cd58b825f Changed code to improve output for files under test/e2e/apimachinery 2021-12-01 17:08:30 +05:30
Wojciech Tyczyński
ba5e08223d Add watchcache metrics to track its progress 2021-12-01 09:46:55 +01:00
haoyun
84a7329cf0 fix: combine assertion to prevent NPE
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-01 16:37:24 +08:00
Mike Spreitzer
88f8e8448b Clarify APF metric wrt all three stages of execution 2021-11-30 20:16:48 -05:00
Kubernetes Prow Robot
108c284a33
Merge pull request #106728 from enj/enj/o/enj_authn_approve
Add enj to sig-auth-authenticators-approvers
2021-11-30 09:18:56 -08:00
Abdullah Gharaibeh
33a04dc5f5 Added an integration test for NodeResourcesFit scoring 2021-11-30 12:13:30 -05:00
Wojciech Tyczyński
243f4faa6d Update kubemark to use EndpointSlices and proper user-agents 2021-11-30 11:38:08 +01:00
Patrick Ohly
a155010bcb OWNERS: add pohly as SIG Instrumentation review and component-base/logs approver
I've helped review PRs already, mostly around structured logging. I'm a
co-chair of that working group.
2021-11-30 09:45:21 +01:00
Monis Khan
bffdf3580b
Add enj to sig-auth-authenticators-approvers
Signed-off-by: Monis Khan <mok@vmware.com>
2021-11-29 16:49:58 -05:00
Kubernetes Prow Robot
c1153d3353
Merge pull request #106716 from aojea/http1_flake_timeout
bump TestHTTP1DoNotReuseRequestAfterTimeout timeout
2021-11-29 13:23:22 -08:00
Kevin Delgado
b35c444e42 Update fieldValidation godoc 2021-11-29 21:21:28 +00:00
Sergey Kanzhelev
a11453efbc remove ReallyCrashForTesting and cleaned up some references to HandleCrash behavior 2021-11-29 20:00:10 +00:00
Mike Spreitzer
95964c5b35 Correct Generator calls for executing seat count 2021-11-29 14:50:11 -05:00
Antonio Ojea
85797eba70 bump TestHTTP1DoNotReuseRequestAfterTimeout timeout
The test TestHTTP1DoNotReuseRequestAfterTimeout has to wait for the
request to time out to assert that subsequent requests do not
reuse the TCP connection.

It seems that the current value of 100ms causes issues in some CI
environments, and bumping the timeout seems to solve this flakiness.

We can bump the timeout value because it is really low compared to real
scenarios, and the bump still keeps it in the millisecond order.
2021-11-29 19:11:47 +01:00
calvin
d591b62b4a remove the kubeadm feature gate. 2021-11-29 18:11:02 +08:00
menglong.qi
ea31d7b813 refactor: use utilerrors instead of join error msg 2021-11-28 17:16:17 +08:00
Scott Nice
1070eb7428
Fixed issue in plugin.go for bug #106696
Fixed issue in plugin.go where valid plugin events would be skipped if any plugin had an error. This meant that valid plugins would never be installed if another was in an error state as the events fired only once.
2021-11-27 15:07:19 -05:00
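The failure mode described in this commit (one plugin in an error state causing valid plugin events to be skipped) can be illustrated with a minimal sketch. This is not the code from plugin.go; the type pluginEvent, the function processEvents, and the plugin names are all hypothetical and exist only to show the "record the error and keep going" pattern.

```go
package main

import "fmt"

// pluginEvent is a hypothetical stand-in for a plugin discovery event.
type pluginEvent struct {
	name  string
	valid bool
}

// processEvents handles every event, even when some plugins fail.
// Errors are collected instead of aborting the loop, so a broken
// plugin cannot prevent a valid one from being installed.
func processEvents(events []pluginEvent) (installed []string, errs []error) {
	for _, ev := range events {
		if !ev.valid {
			errs = append(errs, fmt.Errorf("plugin %q is in an error state", ev.name))
			continue // keep processing the remaining events
		}
		installed = append(installed, ev.name)
	}
	return installed, errs
}

func main() {
	installed, errs := processEvents([]pluginEvent{
		{name: "broken", valid: false},
		{name: "good", valid: true},
	})
	fmt.Println("installed:", installed)
	fmt.Println("errors:", len(errs))
}
```

Because the events fire only once, as the commit message notes, returning early on the first error would lose the valid event for good; collecting errors avoids that.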
wpedrak
d5e1ee4de8 Make writing version.txt more resilient
Writing a file first truncates it and only then writes the new content. Under disk space pressure this may leave the file empty. To mitigate this, we first create a file with the new version and then move it in place of the old one (to make sure that disk space is available).
2021-11-26 12:44:50 +01:00
Kubernetes Prow Robot
9a75e7b0fd
Merge pull request #106670 from palnabarun/1.23/update-publishing-bot-rules
publishing-bot: add 1.23 rules
2021-11-25 11:23:23 -08:00
Slavik Panasovets
6ba8c86fc3 add gce elb rbs opt-in annotation 2021-11-25 17:04:28 +00:00
HaoJie Liu
1dc1a37294
fix typo in /test/integration 2021-11-25 18:59:31 +08:00
Nabarun Pal
e8b177cfc1
publishing-bot: add 1.23 rules
Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
2021-11-25 11:25:39 +05:30
DingShujie
25cf49770c update k/utils to v0.0.0-20211116205334-6203023598ed 2021-11-25 09:29:03 +08:00
Kubernetes Prow Robot
aff056d8a1
Merge pull request #106660 from liggitt/smd-merge
Revert sigs.k8s.io/structured-merge-diff/v4 to v4.1.2
2021-11-24 13:37:31 -08:00
Kevin Klues
f8511877e2 Add regression test for CPUManager distribute NUMA algorithm
We witnessed this exact allocation attempt in a live cluster and witnessed the
algorithm fail with an accounting error. This test was added to verify that
this case is now handled by the updates to the algorithm and that we don't
regress from it in the future.

"test" description="ensure previous failure encountered on live machine has been fixed (1/1)"
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4 6] distribution=9 remainder=1 available=[14 2 4 4 0 3 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4] distribution=9 remainder=1 available=[0 3 4 1 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 6] distribution=9 remainder=1 available=[1 14 2 4 4 0 3 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4 6] distribution=9 remainder=1 available=[1 3 4 0 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2] distribution=9 remainder=1 available=[4 0 3 4 1 14 2 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4] distribution=9 remainder=1 available=[3 4 0 14 2 4 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[6] distribution=9 remainder=1 available=[1 13 2 4 4 1 3 4] balance=3.606
"bestCombo found" distribution=9 bestCombo=[2 4 6] bestRemainder=[6]

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:49:58 +00:00
Kevin Klues
e284c74d93 Add unit test for CPUManager distribute NUMA algorithm verifying fixes
Before Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 1] distribution=8 remainder=2 available=[-1 -1 0 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 2] distribution=8 remainder=2 available=[-1 0 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 3] distribution=8 remainder=2 available=[5 -1 0 0] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 2] distribution=8 remainder=2 available=[0 -1 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 3] distribution=8 remainder=2 available=[0 -1 0 5] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[2 3] distribution=8 remainder=2 available=[0 0 -1 5] balance=2.345
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[0 3]

--- FAIL: TestTakeByTopologyNUMADistributed (0.01s)
    --- FAIL: TestTakeByTopologyNUMADistributed/ensure_bestRemainder_chosen_with_NUMA_nodes_that_have_enough_CPUs_to_satisfy_the_request (0.00s)
        cpu_assignment_test.go:867: unexpected error [accounting error, not enough CPUs allocated, remaining: 1]

After Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[3] distribution=8 remainder=2 available=[0 0 0 4] balance=1.732
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[3]

SUCCESS

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:45:37 +00:00
Cheng Xing
4de40e90d4 DelegateFSGroupToCSIDriver e2e: skip tests with chgrp 2021-11-24 11:41:53 -08:00
Kevin Klues
031f11513d Fix accounting bug in CPUManager distribute NUMA policy
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs to allocate. Moreover, it was only considering
allocation of remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm runs to completion.

The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have
been accounted for and allocated. This ensures that we will not hit an
accounting error later on because we explicitly remove CPUs from the remainder
set until there are none left.

A follow-on commit adds a set of unit tests that will fail before these
changes, but succeed after them.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 19:18:11 +00:00
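Two parts of the updated algorithm described in this commit, dropping NUMA nodes with 0 CPUs left and considering every combination of size 1..len(remainderSet), can be sketched as below. This is an illustrative sketch only: the function remainderCandidates, its signature, and the bitmask enumeration are assumptions, not the CPUManager implementation.

```go
package main

import "fmt"

// remainderCandidates (hypothetical) returns every non-empty subset of the
// NUMA nodes in combo that still have at least one CPU available.
// available maps a NUMA node ID to its number of free CPUs.
func remainderCandidates(combo []int, available map[int]int) [][]int {
	// Omit nodes that have no CPUs left to give.
	var eligible []int
	for _, node := range combo {
		if available[node] > 0 {
			eligible = append(eligible, node)
		}
	}

	// Enumerate all subsets of size 1..len(eligible) with a bitmask.
	var out [][]int
	for mask := 1; mask < 1<<len(eligible); mask++ {
		var subset []int
		for i, node := range eligible {
			if mask&(1<<i) != 0 {
				subset = append(subset, node)
			}
		}
		out = append(out, subset)
	}
	return out
}

func main() {
	// Three eligible nodes yield 2^3 - 1 = 7 candidate remainder sets.
	all := remainderCandidates([]int{2, 4, 6}, map[int]int{2: 3, 4: 1, 6: 5})
	fmt.Println(len(all), all)

	// A node with 0 CPUs left is excluded before enumeration.
	some := remainderCandidates([]int{2, 4, 6}, map[int]int{2: 3, 4: 0, 6: 5})
	fmt.Println(len(some), some)
}
```

Each candidate set would then be scored for balance and the best one kept; that scoring step is left out here.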
Kevin Klues
5317a2e2ac Fix error handling in CPUManager distribute NUMA tests
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:31 +00:00
Kevin Klues
dc4430b663 Add a sum() helper to the CPUManager cpuassignment logic
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:29 +00:00
Kevin Klues
cfacc22459 Allow the map.Values() function in the CPUManager to take a set of keys
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:28 +00:00
Kevin Klues
a160d9a8cd Fix CPUManager algo to calculate min NUMA nodes needed for distribution
Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. Using this value, however, could result in thinking you need more NUMA
nodes to possibly satisfy a request than you actually do.

By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but it's better to start
with fewer and move up than to start with too many and miss out on a better
option.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:26 +00:00
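The minimum described in this commit, derived from the total CPUs per NUMA node rather than the currently available ones, amounts to a ceiling division. The sketch below is an assumption about the arithmetic, not the commit's code; the function name minNUMANodes and its parameters are hypothetical.

```go
package main

import "fmt"

// minNUMANodes (hypothetical) returns the smallest number of NUMA nodes that
// could hold numCPUs when every node has cpusPerNUMANode CPUs in total.
func minNUMANodes(numCPUs, cpusPerNUMANode int) int {
	if numCPUs <= 0 || cpusPerNUMANode <= 0 {
		return 0
	}
	// Integer ceiling of numCPUs / cpusPerNUMANode.
	return (numCPUs + cpusPerNUMANode - 1) / cpusPerNUMANode
}

func main() {
	fmt.Println(minNUMANodes(10, 4)) // 3
	fmt.Println(minNUMANodes(8, 4))  // 2
	fmt.Println(minNUMANodes(1, 4))  // 1
}
```

As the commit message notes, this is a lower bound: for a given current allocation more nodes may be needed, so a search would start here and move up.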
Kevin Klues
209cd20548 Fix unit tests following bug fix in CPUManager for map functions (2/2)
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).

In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).

In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have a balanced distribution of CPUs across all NUMA nodes. This
points to allocating from Socket 1 if the only other CPU allocated has been
done on Socket 0.

To allow CPU allocations to be packed onto full cores, one can allocate them
from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:24 +00:00