Commit Graph

116722 Commits

Author SHA1 Message Date
Francesco Romani
d78671447f e2e: node: add test to check device-requiring pods are cleaned up
Make sure orphaned pods (pods deleted while kubelet is down) are
handled correctly.
Outline:
1. create a pod (not static pod)
2. stop kubelet
3. while kubelet is down, force delete the pod on API server
4. restart kubelet
The pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups.

There is a similar test already, but here we want to check device
assignment.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
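The cleanup condition this test exercises boils down to a set difference: pods the kubelet still runs locally whose UIDs no longer exist on the API server. The sketch below assumes a hypothetical, heavily simplified `Pod` type and a hypothetical `findOrphanedPods` helper. It is not the kubelet's `HandlePodCleanups` implementation, only an illustration of the idea.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, heavily simplified stand-in for a pod as seen by the kubelet.
type Pod struct {
	UID  string
	Name string
}

// findOrphanedPods returns the locally running pods whose UIDs are absent from
// the API server (e.g. force-deleted while the kubelet was down), sorted by UID.
func findOrphanedPods(running []Pod, apiServerUIDs map[string]bool) []Pod {
	var orphaned []Pod
	for _, p := range running {
		if !apiServerUIDs[p.UID] {
			orphaned = append(orphaned, p)
		}
	}
	sort.Slice(orphaned, func(i, j int) bool { return orphaned[i].UID < orphaned[j].UID })
	return orphaned
}

func main() {
	running := []Pod{{UID: "a", Name: "device-pod"}, {UID: "b", Name: "other-pod"}}
	// Step 3 of the outline: "device-pod" was force-deleted while the kubelet was down.
	apiServer := map[string]bool{"b": true}
	for _, p := range findOrphanedPods(running, apiServer) {
		fmt.Println("would kill orphaned pod:", p.Name)
	}
}
```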
Francesco Romani
5cf50105a2 e2e: node: devices: improve the node reboot test
The recently added e2e device plugin test covering node reboot
works fine when it runs on a fresh environment every time (e.g. CI), but
doesn't correctly handle a partial setup when run repeatedly on
the same instance (developer setup).

To accommodate both flows, we extend the error management, checking
more error conditions in the flow.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
b926aba268 e2e: node: devicemanager: update tests
Fix e2e device manager tests.
Most notably, the workload pods need to survive a kubelet
restart. Update tests to reflect that.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
3bcf4220ec kubelet: devices: skip allocation for running pods
When the kubelet initializes, it runs admission for pods and possibly
allocates the requested resources. We need to distinguish between
node reboot (no containers running) and kubelet restart (containers
potentially running).

Running pods should always survive kubelet restart.
This means that device allocation on admission should not be attempted,
because if a container requires devices and is still running when kubelet
is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step
and handle it explicitly. We need to inform
the devicemanager about which pods are already running.

Note that if the container runtime is down when kubelet restarts, the
approach implemented here won't work. In this scenario, on kubelet
restart containers will again fail admission, hitting
https://github.com/kubernetes/kubernetes/issues/118559 again.
This scenario should, however, be pretty rare.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
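The allocation rule described above can be illustrated with a minimal sketch. It assumes a hypothetical, heavily simplified `deviceAllocator` type that is not the kubelet devicemanager's real types or API. Pods reported as running at kubelet start skip admission-time allocation, so a restart in which every device is still held by running containers does not fail admission for those pods.

```go
package main

import "fmt"

// deviceAllocator is a hypothetical, simplified model of admission-time device
// allocation; it is not the kubelet devicemanager API.
type deviceAllocator struct {
	runningPods map[string]bool     // pod UIDs reported as running at kubelet start
	allocated   map[string][]string // pod UID -> assigned device IDs
	free        []string            // device IDs not assigned to any pod
}

// allocate assigns n devices to podUID during admission. Pods that are already
// running (kubelet restart, not node reboot) keep the devices they already
// have, so allocation is skipped for them.
func (a *deviceAllocator) allocate(podUID string, n int) error {
	if a.runningPods[podUID] {
		return nil // devices were assigned before the restart and are still in use
	}
	if len(a.free) < n {
		return fmt.Errorf("pod %s: requested %d devices, only %d free", podUID, n, len(a.free))
	}
	a.allocated[podUID] = append(a.allocated[podUID], a.free[:n]...)
	a.free = a.free[n:]
	return nil
}

func main() {
	// After a kubelet restart, the only device is still held by a running pod.
	a := &deviceAllocator{
		runningPods: map[string]bool{"pod-a": true},
		allocated:   map[string][]string{"pod-a": {"dev0"}},
	}
	fmt.Println(a.allocate("pod-a", 1)) // <nil>: allocation skipped for the running pod
	fmt.Println(a.allocate("pod-b", 1)) // error: no free devices left for a new pod
}
```

Without the `runningPods` check, the first call would also fail because no free devices remain, which roughly mirrors the admission failure this commit addresses.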
Kubernetes Prow Robot
988094878e
Merge pull request #108075 from ialidzhikov/cleanup/pointer-duration
Make use of `k8s.io/utils/pointer.Duration`
2023-06-19 05:22:21 -07:00
Kubernetes Prow Robot
bfd833baf3
Merge pull request #115982 from peaaceChoi/master
Update topology keyset initialization
2023-06-19 04:04:21 -07:00
Kubernetes Prow Robot
26f7f8e980
Merge pull request #118733 from neolit123/1.28-etcd-version-fixup
kubeadm: drop older etcd versions from kubeadm support
2023-06-18 23:32:21 -07:00
Kubernetes Prow Robot
0004ce8684
Merge pull request #118689 from bzsuni/clean
[dependencies] update prometheus/client_golang v1.14.0 to v1.16.0
2023-06-18 14:46:20 -07:00
ialidzhikov
958c8fb695 Make use of k8s.io/utils/pointer.Duration
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2023-06-18 21:46:26 +03:00
Lubomir I. Ivanov
ede2ec22b6 kubeadm: drop older etcd versions from kubeadm support
- drop versions < 1.22 in the etcd map
- use 3.5.9-0 for >= 1.22 versions
- make the minimum version for external etcd 3.4.13-4 and max 3.5.9-0
- update images_test to not rely on a pinned etcd version in tests

note: the image 3.4.18-0 was never released in registry.k8s.io!
2023-06-18 15:38:53 +03:00
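The version policy listed above can be illustrated with a small Go sketch. The helper names (`parseEtcdVersion`, `externalEtcdSupported`, `etcdImageTag`) are hypothetical and do not reflect kubeadm's actual code, which has its own version utilities. Only the external etcd bounds (3.4.13-4 to 3.5.9-0) and the 1.22 cutoff come from the commit message. Treating the `-N` image-revision suffix as a fourth numeric component is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	minExternalEtcd = "3.4.13-4" // lower bound stated in the commit message
	maxExternalEtcd = "3.5.9-0"  // upper bound stated in the commit message
)

// parseEtcdVersion turns "3.5.9-0" into [3 5 9 0]; a missing "-N" suffix counts as 0.
func parseEtcdVersion(s string) ([4]int, error) {
	var v [4]int
	base, rev, hasRev := strings.Cut(s, "-")
	parts := strings.Split(base, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid etcd version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid etcd version %q: %w", s, err)
		}
		v[i] = n
	}
	if hasRev {
		n, err := strconv.Atoi(rev)
		if err != nil {
			return v, fmt.Errorf("invalid etcd version %q: %w", s, err)
		}
		v[3] = n
	}
	return v, nil
}

// compare returns -1, 0 or 1, comparing component by component.
func compare(a, b [4]int) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// externalEtcdSupported reports whether s lies within [minExternalEtcd, maxExternalEtcd].
func externalEtcdSupported(s string) (bool, error) {
	v, err := parseEtcdVersion(s)
	if err != nil {
		return false, err
	}
	lo, _ := parseEtcdVersion(minExternalEtcd)
	hi, _ := parseEtcdVersion(maxExternalEtcd)
	return compare(v, lo) >= 0 && compare(v, hi) <= 0, nil
}

// etcdImageTag maps a Kubernetes 1.x minor version to an etcd image tag:
// minors >= 22 use 3.5.9-0, older minors are no longer in the map.
func etcdImageTag(kubeMinor int) (string, bool) {
	if kubeMinor < 22 {
		return "", false
	}
	return "3.5.9-0", true
}

func main() {
	for _, v := range []string{"3.4.13-4", "3.5.9-0", "3.5.10"} {
		ok, err := externalEtcdSupported(v)
		fmt.Println(v, ok, err)
	}
	tag, ok := etcdImageTag(27)
	fmt.Println("etcd image for Kubernetes 1.27:", tag, ok)
}
```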
Kubernetes Prow Robot
1ff1a26426
Merge pull request #118542 from cchapla/crd_webhook_metrics_updates
Updating names from webhookconversion to conversionwebhook for apiserver
2023-06-16 10:34:19 -07:00
Kubernetes Prow Robot
cef13f11fd
Merge pull request #118615 from mimowo/job-controller-backoff-cleanup
Cleanup job controller handling of backoff
2023-06-16 08:58:19 -07:00
Michal Wozniak
74c5ff97f1 Lower the constants for the rate limiter in Job controller 2023-06-16 17:00:04 +02:00
Michal Wozniak
c51a422d78 Cleanup job controller handling of backoff 2023-06-16 14:53:27 +02:00
Kubernetes Prow Robot
fa78f28f0a
Merge pull request #117522 from pawbana/auth-provider-gcp-windows
Added support for image credential provider for windows and arm64 on gce
2023-06-15 17:48:38 -07:00
Kubernetes Prow Robot
3454de64dd
Merge pull request #116863 from SergeyKanzhelev/knowninplaceBug
added known issue for 1.27 release
2023-06-15 17:48:31 -07:00
Kubernetes Prow Robot
58d7a794d2
Merge pull request #113504 from pacoxu/taint-unit-test
AddOrUpdateTaintOnNode: if node does not exist, return an error
2023-06-15 17:48:19 -07:00
Kubernetes Prow Robot
b637006302
Merge pull request #118420 from alculquicondor/job_warnings
Add warnings for big number of completions and parallelism
2023-06-15 14:24:18 -07:00
Kubernetes Prow Robot
604584d1d3
Merge pull request #118631 from champtar/ca-not-before
Make CA valid 1 hour in the past
2023-06-15 11:22:30 -07:00
Kubernetes Prow Robot
79ca192b4f
Merge pull request #118585 from twz123/fix-ginkgo-no-color-deprecation-warning
Fix ginkgo noColor deprecation warning
2023-06-15 11:22:18 -07:00
Kubernetes Prow Robot
1193ab62e2
Merge pull request #116746 from AxeZhan/csi_translate
Return name instead of whole volume when an error occurred in csi-translation
2023-06-15 06:50:18 -07:00
bzsuni
5aa5f1abc9 update prometheus/client_golang v1.14.0 to v1.16.0
Signed-off-by: bzsuni <bingzhe.sun@daocloud.io>
2023-06-15 11:24:32 +00:00
Kubernetes Prow Robot
c984d53b31
Merge pull request #117896 from kolyshkin/mount-utils-spring-cleaning
Mount utils spring cleaning and optimization
2023-06-15 01:40:17 -07:00
Kubernetes Prow Robot
41575586b4
Merge pull request #118668 from Riaankl/Uupdate-pending_eligible_endpoints.yaml-to-match-APISnoop
Update pending_eligible_endpoints.yaml to match APISnoop
2023-06-14 20:38:31 -07:00
Kubernetes Prow Robot
7bd66c4a30
Merge pull request #118666 from upodroid/simplify-node-e2e-flags
Update container runtime flags to use containerd instead of docker
2023-06-14 20:38:18 -07:00
Kubernetes Prow Robot
e56002ab04
Merge pull request #118665 from bart0sh/PR119-DRA-E2E-remove-NodeFeature
DRA Node E2E: remove NodeFeature label
2023-06-14 18:10:18 -07:00
Kubernetes Release Robot
8636f9353a CHANGELOG: Update directory for v1.27.3 release 2023-06-14 21:26:04 +00:00
Kubernetes Prow Robot
e436472e24
Merge pull request #118628 from dims/check-before-you-sudo
check before you sudo on AWS EC2 instances
2023-06-14 14:14:17 -07:00
Kubernetes Release Robot
763555814b CHANGELOG: Update directory for v1.26.6 release 2023-06-14 20:17:31 +00:00
Kubernetes Prow Robot
302564c66f
Merge pull request #118655 from aojea/glbc_up
use ingress-gce-glbc v1.23.1 image for CI
2023-06-14 12:50:30 -07:00
Kubernetes Prow Robot
78f18c1b4b
Merge pull request #116894 from enj/enj/i/encrypt_resp_sanity_checks
kmsv2: add sanity checks and refine probing logic
2023-06-14 12:50:19 -07:00
Riaan Kleinhans
3bf93156d8
Update pending_eligible_endpoints.yaml to match APISnoop 2023-06-15 07:31:31 +12:00
Kubernetes Release Robot
ff2a1f0167 CHANGELOG: Update directory for v1.25.11 release 2023-06-14 18:51:12 +00:00
upodroid
a29be0cfb0 update container runtime flags to use containerd instead of docker 2023-06-14 19:18:39 +01:00
Davanum Srinivas
89adbc6e5b
check for AWS environment before running sudo
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-06-14 14:03:44 -04:00
Kubernetes Release Robot
7f650acb3c CHANGELOG: Update directory for v1.24.15 release 2023-06-14 17:59:12 +00:00
Ed Bartosh
a83edd35c4 DRA Node E2E: relabel test suite to fix CI
Removed NodeFeature:DynamicResourceAllocation label from the
tests to fix cos-cgroupv1/v2-containerd-node-e2e-serial CI jobs.

It turned out that labeling DRA Node tests as NodeFeature was
a mistake. Re-labeling with NodeAlphaFeature would not work either:
it would fail certain containerd jobs, as DRA requires containerd >= 1.7.
2023-06-14 20:46:24 +03:00
Kubernetes Prow Robot
99e050f88e
Merge pull request #117597 from CoderSherlock/master
Added e2e_node test for sigkilled pods exit code and exit reason check
2023-06-14 10:34:29 -07:00
Kubernetes Prow Robot
6a79a8a57c
Merge pull request #115835 from HirazawaUi/fix-terminationGracePeriod-bug
fix terminationGracePeriod blocked by preStop
2023-06-14 10:34:18 -07:00
Kubernetes Prow Robot
77fd143c8d
Merge pull request #118603 from pbetkier/deflake-hpa-e2e-behavior-tests
e2e: deflake a HPA CPU test by stabilizing cpu consumption
2023-06-14 09:26:29 -07:00
Kubernetes Prow Robot
6fbf4824fd
Merge pull request #116091 from pacoxu/cleanup-terminationGracePeriodSeconds
cleanup: remove ProbeTerminationGracePeriod feature tag on test
2023-06-14 09:26:18 -07:00
Kubernetes Prow Robot
76c0be5462
Merge pull request #118659 from jsafrane/bump-iscsi-image
Bump iscsi test server image
2023-06-14 08:00:30 -07:00
Kubernetes Prow Robot
47e79b8156
Merge pull request #116910 from fatsheep9146/job-controller-contextual-logging
Migrated pkg/controller/job to contextual logging
2023-06-14 08:00:18 -07:00
Aldo Culquicondor
c27f9fdeb7
Add warnings for big number of completions and parallelism
Change-Id: I63e192b1ce9da7d8bb04f8be1a6e19ec6fbbfa5a
2023-06-14 10:38:42 -04:00
Kubernetes Prow Robot
b53411ffb8
Merge pull request #118639 from bergerhoffer/cli-help-updates
Update CLI help text for grammar and consistency
2023-06-14 06:54:18 -07:00
Andrea Hoffer
a86380c781 Update CLI help text for grammar and consistency 2023-06-14 08:54:23 -04:00
Kubernetes Prow Robot
173a473803
Merge pull request #118128 from carlory/fix-issue-118120
remove helper function for unused storage feature in pkg/proxy/util
2023-06-14 04:28:18 -07:00
Jan Safranek
96e7d5f1f2 Bump iscsi test server image 2023-06-14 12:47:16 +02:00
Antonio Ojea
e0f273ffda use ingress-gce-glbc v1.23.1 image for CI
Change-Id: Ia2dacdc1d8fd3e369b9dcc0ec8b2653f3a834057
2023-06-14 10:40:01 +00:00
Ziqi Zhao
7bc449d7e0 add contextual logging to job-controller
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-06-14 13:40:02 +08:00