Commit Graph

114639 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
da87af638f
Merge pull request #115856 from lanycrost/e2e-115780-grpc-probe-tests
Promote gRPC probe e2e test to Conformance
2023-03-09 01:06:03 -08:00
cpanato
99c80ac119
[go] Bump images, dependencies and versions to go 1.20.2 2023-03-09 09:57:45 +01:00
Kubernetes Prow Robot
f5ddaa152e
Merge pull request #116392 from seans3/fallback-verifier
Fallback query param verifier
2023-03-08 23:06:00 -08:00
Todd Neal
78ca93e39c rework init containers test to remove host file dependency
Since we can't rely on the test runner and hosts under test to
be on the same machine, we write to the terminate log from each
container and concatenate the results.
2023-03-08 23:17:17 -06:00
Clayton Coleman
6b9a381185
kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static) then the housekeeping loop will not
deliver updates to the pod worker which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase but no further attempts to terminate or cleanup will occur
until the kubelet is restarted.

The pod worker now maintains a store of the pods state that it is
attempting to reconcile and uses that to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.

The pod worker's store tracks desired updates and the last update
applied on podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.

As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.

In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) and config-
sourced pods. Update the pod worker to avoid using ToAPIPod()
and instead require the pod worker to directly use
update.Options.Pod or update.Options.RunningPod for the
correct methods. Add a new SyncTerminatingRuntimePod to prevent
accidental invocations of runtime only pod data.

Finally, fix SyncKnownPods to replay the last valid update for
undesired pods which drives the pod state machine towards
termination, and alter HandlePodCleanups to:

- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker

Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (those not already
terminal or those which have been previously rejected by
admission). We also force a refresh of the runtime cache to
ensure we don't see an older version of the state.

Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.

Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly describe the number of pods
in use. To better report, add the following metrics:

  kubelet_desired_pods: Pods the pod manager sees
  kubelet_active_pods: "Admitted" pods that gate new pods
  kubelet_mirror_pods: Mirror pods the kubelet is tracking
  kubelet_working_pods: Breakdown of pods from the last sync in
    each phase, orphaned state, and static or not
  kubelet_restarted_pods_total: A counter for pods that saw a
    CREATE before the previous pod with the same UID was finished
  kubelet_orphaned_runtime_pods_total: A counter for pods detected
    at runtime that were not known to the kubelet. Will be
    populated at Kubelet startup and should never be incremented
    after.

Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.

Adds 23 series to the kubelet /metrics endpoint.
2023-03-08 22:03:51 -06:00
Paco Xu
a1def4b9c0 pod-infra-container-image: update comments as it will be removed in couple more releases
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2023-03-09 11:14:32 +08:00
Kubernetes Prow Robot
bbe0eb7595
Merge pull request #116386 from kinvolk/rata/local-up-cleanup
hack/local-up-cluster.sh: Cleaup on SIGINT
2023-03-08 18:46:07 -08:00
Kubernetes Prow Robot
625b8be09e
Merge pull request #115371 from pacoxu/cgroup-v2-memory-tuning
default memoryThrottlingFactor to 0.9 and optimize the memory.high formulas
2023-03-08 18:46:00 -08:00
Kubernetes Prow Robot
8d5c96fed2
Merge pull request #116093 from swatisehgal/topologymanager-ga-graduation
node: topologymgr: Graduate Kubelet Topology Manager to GA
2023-03-08 16:56:06 -08:00
Kubernetes Prow Robot
30ee6914c5
Merge pull request #115149 from nilekhc/encrypt-all
Allow encryption for all resources
2023-03-08 16:55:59 -08:00
Kubernetes Prow Robot
7fe0fb7fbf
Merge pull request #116393 from liggitt/etcd-cancel-error
Recognize etcd/grpc cancel errors correctly
2023-03-08 15:42:49 -08:00
Kubernetes Prow Robot
8fa82976fc
Merge pull request #116356 from pacoxu/cleanup-bump_qps_kubelet
sync default qps of kubelet change everywhere
2023-03-08 15:42:41 -08:00
Maksim Nabokikh
c1431af4f8
KEP-3325: Promote SelfSubjectReview to Beta (#116274)
* Promote SelfSubjectReview to Beta

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fix whoami API

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fixes according to code review

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

---------

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
2023-03-08 15:42:33 -08:00
Kubernetes Prow Robot
0a5310fe9a
Merge pull request #116232 from aojea/e2e_terminating_connectivity
test connectivity for terminating pods
2023-03-08 15:42:21 -08:00
Kubernetes Prow Robot
8ee9b82b10
Merge pull request #115984 from tzneal/init-container-tests
add more init container testing
2023-03-08 15:42:08 -08:00
Jefftree
361391117d Enable aggregated discovery 2023-03-08 23:03:52 +00:00
Peter Schuurman
c57bc292de Add e2e tests for StatefulSetStartOrdinal feature 2023-03-08 14:55:58 -08:00
Kubernetes Prow Robot
4a896644de
Merge pull request #116235 from Jefftree/oas-ga
Promote OpenAPI V3 to GA
2023-03-08 14:44:20 -08:00
Kubernetes Prow Robot
b1ba5c5462
Merge pull request #116145 from seans3/discovery-stale
Surface "stale" GroupVersions from AggregatedDiscovery
2023-03-08 14:44:08 -08:00
Sean Sullivan
f5865043ed Fallback query param verifier 2023-03-08 22:20:39 +00:00
Nilekh Chaudhari
9382fab9b6
feat: implements encrypt all
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
2023-03-08 22:18:49 +00:00
Antoine Pelisse
4f3859ce91 managedfields: Move most of fieldmanager package to managefields 2023-03-08 13:44:00 -08:00
Jordan Liggitt
ac876e5038
Turn off P&F filter in standalone CRD server tests 2023-03-08 16:21:59 -05:00
Aldo Culquicondor
07a73bb2e1
One lock among PodNominator and SchedulingQueue
Change-Id: I17fe5da40250e42c04124c25b530ce6c8dea4154
2023-03-08 16:18:36 -05:00
Kubernetes Prow Robot
8319ac5274
Merge pull request #116383 from Huang-Wei/fix/sched-perf-test
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test
2023-03-08 13:12:20 -08:00
Kubernetes Prow Robot
2a22864d9c
Merge pull request #116381 from pohly/cronjob-integration-test-shutdown
cronjob: shut down integration test quickly again
2023-03-08 13:12:08 -08:00
Jordan Liggitt
267eb25e60
Recognize etcd/grpc cancel errors correctly 2023-03-08 15:51:25 -05:00
Todd Neal
123ab80333 add more init container lifetime testing
Add some additional init container tests that work via monitoring
container lifetime based on logs written to a common file. This allows
more easily writing assertions about the container lifetimes with
respect to one another.
2023-03-08 14:39:10 -06:00
Kubernetes Prow Robot
8b413d224a
Merge pull request #116342 from msau42/unlock
Unlock CSIMigrationvSphere feature gate
2023-03-08 11:27:24 -08:00
Kubernetes Prow Robot
7598ff36cf
Merge pull request #116333 from aojea/multiport_service
e2e network test for multiple protocol services on same port
2023-03-08 11:27:12 -08:00
Kubernetes Prow Robot
3307b39bba
Merge pull request #116367 from pohly/lint-config-check
golangci-lint: synchronize configs and add verification for that
2023-03-08 09:41:29 -08:00
Kubernetes Prow Robot
a66e139214
Merge pull request #116354 from pacoxu/cleanup-CronJobTimeZone
cleanup: sync testdata as CronJobTimeZone is GAed
2023-03-08 09:41:22 -08:00
Jordan Liggitt
003b6d229c
Detect and clean up unneeded after_roundtrip fixtures 2023-03-08 11:38:32 -05:00
Rodrigo Campos
5f568d51be hack/local-up-cluster.sh: Cleaup on SIGINT
Currently we only cleanup on exit. Let's trap SIGINT (ctrl-c) too, so we
always cleanup everything.

Otherwise if we ctrl-c is easy to leave something running, specially if
we ctrl-c while the cleanup function is running. And when we leave
something running and don't reused the certs ($REUSE_CERTS), that is the
default, something is left running and it fails with weird ways as we
can't auth with the new certs.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-03-08 17:37:50 +01:00
Wei Huang
c9bc2f98d0
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test 2023-03-08 08:34:44 -08:00
Patrick Ohly
be82872eff cronjob: shut down integration test quickly again
6f2cd1b5bd swapped the order of cancel() and
closeFn() so that closeFn got called first when the test was done. This caused
it to block while waiting for goroutines which themselves were waiting for
the context cancellation. The test still shut down, it just took ~86s instead
of ~30s.

The fix is to register the cancel twice: once as soon as the context is
created (to clean up in case of an unexpected panic) and once after
closeFn (because then it'll get called first, as before).
2023-03-08 17:26:47 +01:00
Kubernetes Prow Robot
713ded7368
Merge pull request #115769 from mochizuki875/revert-115732-revert-root-test
Revert "Revert #114605: its unit test requires root permission"
2023-03-08 07:53:24 -08:00
Kubernetes Prow Robot
499a03d88b
Merge pull request #115451 from zhucan/nodeexpandvolume-secret-e2e
e2e: add e2e test to node expand volume with secret
2023-03-08 07:53:12 -08:00
Kubernetes Prow Robot
03ff890ef4
Merge pull request #116329 from dims/drop-aws-kubelet-credential-provider-and-cleanup-aws-storage-e2e-tests
Drop aws kubelet credential provider and cleanup aws storage e2e tests
2023-03-08 06:49:11 -08:00
Patrick Ohly
a04e20f622 golangci-lint: synchronize configs and add verification for that
https://github.com/kubernetes/kubernetes/pull/109728 added a
golangci-strict.yaml where gingkolinter and stylecheck (some recent additions
to golangci.yaml) were missing.

To prevent such mistakes in the future, lines that are intentionally different
get annotated with a comment about golangci-strict.yaml or golangci.yaml.
Then a suitable diff command in the new verify-golangci-lint-config.sh checks
that only such lines, comments and blank lines are different.
2023-03-08 15:23:27 +01:00
Patrick Ohly
cbf7d96a85 garbagecollector: structured logging of objectReference
When using JSON as output format, we want objectReference values to be
represented as a struct. For example, "item" is such a value:

{"ts":1678135015708.349,"caller":"garbagecollector/garbagecollector.go:595","msg":"classify object references","v":5,"item":{"name":"dra-test-driver-g4tkd","namespace":"dra-1830","apiVersion":"v1","uid":"c3f88616-7282-488c-887c-3f04291e6f4f"},"solid":null,"dangling":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"dra-test-driver","uid":"dbe9a90c-9dfd-4ad0-8395-e5fa228f9851","controller":true,"blockOwnerDeletion":true}],"waitingForDependentsDeletion":null}
2023-03-08 08:37:56 -05:00
Andy Goldstein
26e3dab78b garbagecollector: use contextual logging
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
2023-03-08 08:37:56 -05:00
nilskch
0c11171b7e add tests for probe errors and ExecProbeTimeout 2023-03-08 11:59:59 +01:00
ZhangKe10140699
a239b9986b Migrated the StatefulSet controller (within `kube-controller-manager) to use [contextual logging](https://k8s.io/docs/concepts/cluster-administration/system-logs/#contextual-logging) 2023-03-08 18:57:57 +08:00
Kubernetes Prow Robot
f99c351992
Merge pull request #116174 from pacoxu/fix-TestNewNodeIpamControllerWithCIDRMasks
node ipam controller ut: run test in parallel to avoid timeout
2023-03-08 02:25:12 -08:00
Arda Güçlü
0e98533d1b Not share process namespace if user explicitly disables it
This PR sets higher priority to the `share-processes` flag than
provided profile.

For example, if user tries to use copy-to debugging with restricted
profiling, share process namespace should be false if user explicitly
disables it via `--share-processes=false`.
2023-03-08 11:58:28 +03:00
calvin0327
0ffac50126 cleanup container runtime options
Signed-off-by: calvin0327 <wen.chen@daocloud.io>
2023-03-08 16:53:19 +08:00
Paco Xu
f368413d65 sync default qps of kubelet change 2023-03-08 14:04:51 +08:00
Paco Xu
23a583dc36 cleanup: sync testdata as CronJobTimeZone is GAed 2023-03-08 13:15:46 +08:00
Paco Xu
0e6636eb33 nodeipam: return error instead of panics 2023-03-08 12:47:14 +08:00