Commit Graph

107940 Commits

Author SHA1 Message Date
andyzhangx
ffca636024 fix: NeedResize build failure on Windows 2022-04-29 11:34:43 +00:00
Kevin Klues
57f8b31b42 Update tests to accommodate devicemanager refactoring
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-04-29 10:52:37 +00:00
Kevin Klues
f6eaa25b71 Move DevicePluginStub implementation into new plugin package
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-04-29 10:52:37 +00:00
Kevin Klues
db88676c20 Refactor all device plugin logic into separate 'plugin' package
This is the first step towards being able to support a new plugin API version
in parallel with the existing one.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-04-29 10:52:37 +00:00
Mike Spreitzer
b4a40cd43e Draft weighted and timing histograms
The following investigation occurred during development.

Add TimingHistogram impl that shares lock with WeightedHistogram

Benchmarking and profiling show that two layers of locking are
noticeably more expensive than one.

After adding this new alternative, I now get the following benchmark
results.

```
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogram$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogram-16    	22232037	        52.79 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	k8s.io/component-base/metrics/prometheusextension	1.404s

(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogram$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogram-16    	22190997	        54.50 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	k8s.io/component-base/metrics/prometheusextension	1.435s
```

and

```
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogramDirect$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramDirect-16    	28863244	        40.99 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	k8s.io/component-base/metrics/prometheusextension	1.890s
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogramDirect$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramDirect-16    	27994173	        40.37 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	k8s.io/component-base/metrics/prometheusextension	1.384s
```

So the new implementation is roughly 20% faster than the original (about 41 ns/op versus about 53 ns/op).
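
The difference between the two locking styles can be illustrated with a minimal Go sketch. It is not the code from this commit: the type names `weighted`, `layeredTiming`, and `directTiming`, and the simplified `set` method, are hypothetical stand-ins. The layered variant takes its own mutex and then the inner histogram's mutex on every update, while the direct variant keeps all state behind one mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// weighted is a hypothetical stand-in for a weighted histogram. It only
// accumulates value*weight and guards that state with its own mutex.
type weighted struct {
	mu  sync.Mutex
	sum float64
}

func (w *weighted) observe(value float64, weight uint64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.sum += value * float64(weight)
}

// layeredTiming wraps weighted and adds a second mutex for its own state,
// so every update acquires two locks.
type layeredTiming struct {
	mu    sync.Mutex
	value float64
	inner weighted
}

// set credits the old value with the elapsed ticks, then stores the new value.
func (t *layeredTiming) set(newValue float64, elapsed uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.inner.observe(t.value, elapsed) // second lock acquisition happens here
	t.value = newValue
}

// directTiming keeps the current value and the accumulated sum behind a
// single mutex, so every update acquires one lock.
type directTiming struct {
	mu    sync.Mutex
	value float64
	sum   float64
}

func (t *directTiming) set(newValue float64, elapsed uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.sum += t.value * float64(elapsed)
	t.value = newValue
}

func main() {
	l, d := &layeredTiming{}, &directTiming{}
	for i := 1; i <= 3; i++ {
		l.set(float64(i), 2)
		d.set(float64(i), 2)
	}
	// Both styles accumulate the same result; only the locking cost differs.
	fmt.Println(l.inner.sum == d.sum)
}
```

Both variants compute the same sums, which is why the choice between them can be made on benchmark results alone.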

Add overlooked exception, rename timingHistogram to timingHistogramLayered

Use the direct (one mutex) style of TimingHistogram impl

This is about a 20% gain in CPU speed on my development machine, in
benchmarks without lock contention.  Following are two consecutive
trials.

(base) mspreitz@mjs12 prometheusextension % go test  -benchmem -run=^$ -bench Histogram .
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramLayered-16    	21650905	        51.91 ns/op	       0 B/op	       0 allocs/op
BenchmarkTimingHistogramDirect-16     	29876860	        39.33 ns/op	       0 B/op	       0 allocs/op
BenchmarkWeightedHistogram-16         	49227044	        24.13 ns/op	       0 B/op	       0 allocs/op
BenchmarkHistogram-16                 	41063907	        28.82 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	k8s.io/component-base/metrics/prometheusextension	5.432s

(base) mspreitz@mjs12 prometheusextension % go test  -benchmem -run=^$ -bench Histogram .
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramLayered-16    	22483816	        51.72 ns/op	       0 B/op	       0 allocs/op
BenchmarkTimingHistogramDirect-16     	29697291	        39.39 ns/op	       0 B/op	       0 allocs/op
BenchmarkWeightedHistogram-16         	48919845	        24.03 ns/op	       0 B/op	       0 allocs/op
BenchmarkHistogram-16                 	41153044	        29.26 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	k8s.io/component-base/metrics/prometheusextension	5.044s

Remove layered implementation of TimingHistogram
2022-04-28 17:36:06 -04:00
Abirdcfly
a7cfbb3e6c
fix volumebinding test in scheduler
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-04-28 16:22:02 +08:00
Wei Huang
846ebf7814
Cleanup legacy scheduler perf tests 2022-04-27 09:57:17 -07:00
Steve Kuznetsov
138faa3799
storage/etcd3: continue unifying test setup
Previous work by liggitt in 01760927b8 improved the boilerplate
required to run an embedded etcd server for tests as well as set up the
`*etcd3.store{}` for testing. A number of tests were not ported to use the
new helpers, though, either due to custom setup or due to inconsistent
use of setup options. A follow-up by stevekuznetsov in 6aa37eb062
removed much of the inconsistency, meaning that most callers to
`newStore()` were simply using the default boilerplate and options that
`testSetup()` used.

This patch moves all users to testSetup(), adding options as necessary
to enable some fringe setup use-cases. With a unified setup, new tests
will not copy boilerplate they do not need and it will be immediately
obvious when reading a test if the client or storage setup is *not*
default, improving readability.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
2022-04-26 12:26:17 -07:00
Kubernetes Release Robot
537941765f CHANGELOG: Update directory for v1.24.0-rc.1 release 2022-04-26 16:45:00 +00:00
何庆国10193842
3d14bcb9a4 Log StructuredLog: spelling formatting
Signed-off-by: 何庆国10193842 <he.qingguo@zte.com.cn>
2022-04-26 17:04:06 +08:00
Kubernetes Prow Robot
a83cc51a19
Merge pull request #109658 from bobbypage/cadvisor-044-1
Bump cAdvisor to v0.44.1
2022-04-25 19:44:52 -07:00
David Porter
b0da29dcb8 Bump cAdvisor to v0.44.1
Bump cAdvisor to v0.44.1 to pick up fix for containerd task timeout
which resulted in empty network metrics.

Signed-off-by: David Porter <david@porter.me>
2022-04-25 17:18:38 -07:00
sanposhiho
b7b94b6b39 scheduler_perf: create sleep operation 2022-04-25 23:02:09 +00:00
Claudiu Belu
dc881cbc77 GCE Windows: Copy the CNI binaries from the right folder
A previous commit updated the containerd version used on Windows
nodes from 1.5.4 to 1.6.2. However, the folder structure of the
containerd releases changed since then from:

cni/$binary.exe

to:

cni/bin/$binary.exe

Because of this, the Windows nodes do not have the CNI binaries
needed to set up the pod networks.
2022-04-25 15:06:30 -07:00
Patrick Ohly
2664740043 e2e: move feature gate support from test/e2e to test/e2e_node
The test/e2e suite has never supported feature gates:
- it cannot discover at runtime how the cluster is configured
- its --feature-gates parameter had no effect

Despite that, tests were written that used
e2eskipper.SkipUnlessFeatureGateEnabled even though that function then only
checked the default feature gate state.  To catch such mistakes, e2e test
suites now must explicitly enable feature gate checking via
e2eskipper.InitFeatureGates. They also must register their own command line
flag. When that is not done, then using SkipUnlessFeatureGateEnabled or
SkipIfFeatureGateEnabled leads to a test failure.

test/e2e_node does both and therefore continues to work as before.
2022-04-25 15:41:41 +02:00
Patrick Ohly
12990dec40 e2e: remove useless SkipUnlessFeatureGateEnabled
These SkipUnlessFeatureGateEnabled calls are useless because:
- the tests run in test/e2e, where feature gates always
  have their default state
- CSIMigration, SizeMemoryBackedVolumes and ExecProbeTimeout are
  all enabled by default (beta or GA)
2022-04-25 15:41:25 +02:00
Sebastian Laskawiec
f0af12bb9d Warn on receiving a space before the token 2022-04-25 15:22:28 +02:00
Abhijit Hoskeri
49dc59873b e2e_node/{service,util}: use kubelet healthz port.
The readonly port could be disabled.

Since we are only using the /healthz endpoint,
we can use the healthz port for this.

Change-Id: Ie0e05a5ab4ec6f51e4d3c63226aa23c1b3a69956
2022-04-22 16:14:31 -07:00
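
A health check against the kubelet's healthz port, as described in 49dc59873b, might look like the following Go sketch. It is not the code from the commit. The helper names are hypothetical, and the port value 10248 is the kubelet's documented default for the healthz port, which a given node may override.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
	"time"
)

// defaultKubeletHealthzPort is the kubelet's default healthz port. Treat it
// as an assumption: nodes can be configured with a different value.
const defaultKubeletHealthzPort = 10248

// healthzURL builds the /healthz URL for the given host and port.
func healthzURL(host string, port int) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/healthz"
}

// kubeletHealthy reports whether the /healthz endpoint answers with 200 OK.
func kubeletHealthy(client *http.Client, host string, port int) bool {
	resp, err := client.Get(healthzURL(host, port))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println(healthzURL("127.0.0.1", defaultKubeletHealthzPort))
	// Prints false unless a kubelet is actually listening locally.
	fmt.Println(kubeletHealthy(client, "127.0.0.1", defaultKubeletHealthzPort))
}
```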
Steve Kuznetsov
809fd64b28
storage/etcd3: clarify the pagination flow in LIST
It is not possible for the nil-check to ever return anything different
from what the explicit boolean used to, but a reader can only reach that
conclusion by reading the code very, very carefully. Instead of having
this implicit flow that is difficult to follow, let's keep the boolean.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
2022-04-22 11:57:04 -07:00
Kubernetes Prow Robot
f02682c628
Merge pull request #109592 from claudiubelu/gce-updates-containerd-version
windows GCE: Bumps containerd version to 1.6.2
2022-04-22 10:18:12 -07:00
熊中谅10171568
c4579165f1 refactor: remove deprecated flags
refactor: remove deprecated deleting-pods-qps deleting-pods-burst register-retry-count flags
2022-04-22 20:28:12 +08:00
Kubernetes Prow Robot
f0791b5014
Merge pull request #109541 from dims/disable-intree-gce-pd-tests-by-default
Disable Intree GCE PD tests by default
2022-04-21 16:48:12 -07:00
Danielle Lancashire
f1f45df2c1 hack: make test-e2e-node: remove old project refs
This commit cleans up references to the old kubernetes-node-e2e-images
project. In the process it removes the `LIST_IMAGES` mode, as listing
large numbers of public cloud projects is not particularly useful and
has been somewhat broken for a long time, because we defaulted to
launching a VM in a different project than the one we listed.
2022-04-22 00:59:25 +02:00
Jonathan Dobson
f369b1234a e2e: add storage capability for offline volume expansion 2022-04-21 14:49:18 -06:00
Sanskar Jaiswal
f8df26ae80 Update comment and declaration of storage.GuaranteedUpdate to be clearer.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
2022-04-22 01:01:15 +05:30
Han Kang
a9665c2d58 remove stutter from alpha metric
Change-Id: I6669225943a4196cfe70659fa296a0f81a0ab682
2022-04-20 16:56:00 -07:00
Aldo Culquicondor
12568860cb Test Foreground deletion in job integration
Change-Id: Ia6e86da5e66422fdb653c1ee60864a1c79233ea6
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
09caa36718 Fix removing finalizer from finished jobs
In some rare race conditions, the job controller might create new pods after the job is declared finished.

Change-Id: I8a00429c8845463259cd7f82bb3c241d0011583c
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
53aa05df3a Don't mark job as failed until expectations are satisfied
Change-Id: I99206f35f6f145054c005ab362c792e71b9b15f4
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
f2c8030845 Integration test for backoff limit and finalizers
Change-Id: Ic231ce9a5504d3aae4191901d7eb5fe69bf017ac
2022-04-20 16:39:09 -04:00
Kubernetes Release Robot
f173d01c01 CHANGELOG: Update directory for v1.23.6 release 2022-04-20 19:16:56 +00:00
Danielle Lancashire
d6c184084c sig-node: endocrimes as e2e_node approver 2022-04-20 17:12:09 +00:00
Kubernetes Release Robot
c73b887bbf CHANGELOG: Update directory for v1.22.9 release 2022-04-20 17:09:21 +00:00
Danielle Lancashire
0e0e3113e2 e2e_node: remote runner: Require containerd/crio 2022-04-20 16:49:29 +00:00
Danielle Lancashire
7151ff8d5c e2e_node: remove jenkins docker_validation 2022-04-20 16:16:57 +00:00
Danielle Lancashire
3e0041b5b9 e2e_node: remove copy-e2e-image.sh
This script is unused, and the project that was formerly used for e2e
node images is in the process of being removed.
2022-04-20 16:15:25 +00:00
Danielle Lancashire
d90ba453ce e2e_node: remove unused jenkins runner script 2022-04-20 16:15:15 +00:00
Danielle Lancashire
8333bcc6ab e2e_node: remove unused jenkins/coreos-init.json 2022-04-20 16:11:36 +00:00
Kubernetes Release Robot
c96a1c0614 CHANGELOG: Update directory for v1.21.12 release 2022-04-20 16:02:57 +00:00
Kubernetes Prow Robot
56cac1b58b
Merge pull request #109567 from palnabarun/release-1.24/update-publishing-bot-rules
[release-1.24] Update publishing-bot rules
2022-04-20 08:19:43 -07:00
Nabarun Pal
88ef5f91b0
release-1.24: update publishing bot rules
This change has been generated by the `update-rules` command in the
`publishing-bot` repository. Since this is the first time we are
updating the rules using a script, there is a considerable diff,
caused by the YAML marshaller.

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
2022-04-20 19:40:04 +05:30
Claudiu Belu
70f14e16e4 windows GCE: Bumps containerd version to 1.6.2
containerd v1.6.0 introduced HostProcessContainers support [1], which
is required by the e2e tests that exercise that feature.

This addresses some of the permafailing tests for Windows GCE E2E test runs.

[1] https://github.com/containerd/containerd/pull/5131
2022-04-20 06:53:26 -07:00
Arda Güçlü
64ce73f5fe Show topologySpreadConstraints in Describe command
Currently the describe command does not show the `topologySpreadConstraints`
field. This PR adds support for showing topologySpreadConstraints in the
describe command. For simplicity, if this field is not set, it is not shown
(unlike other core fields, which are shown as `<none>` even if they are null).
2022-04-20 15:37:02 +03:00
navist2020
2a7e85bfdb Return preflightError if an error occurs when running the preflight 2022-04-20 11:39:35 +08:00
Yuan Chen
d1a2f699a7 Add PodWrapper functions for scheduler testing
Fix a typo in comment
2022-04-19 20:30:04 -07:00
Sergey Kanzhelev
462e1ae7f9 fix the image for node performance tests - model expected tensorflow version <1.9
Change-Id: I116cc38eac20f7cdafb975c88ee9a0e7ec667861
2022-04-20 00:14:21 +00:00
Kubernetes Release Robot
b36c927d95 CHANGELOG: Update directory for v1.24.0-rc.0 release 2022-04-19 16:18:18 +00:00
sanposhiho
6e0da69632 Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf 2022-04-20 00:48:12 +09:00
Wojciech Tyczyński
73da6d15f9 Fix TestPriorityLevelIsolation concurrency issue 2022-04-19 15:59:14 +02:00
Wojciech Tyczyński
e95f8f2e42 Clean apiserver shutdown in integration tests 2022-04-19 15:59:13 +02:00