kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2025-07-19 01:40:13 +00:00

Author	SHA1	Message	Date
Clayton Coleman	6b9a381185	kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is force deleted (API or static), the housekeeping loop will not deliver updates to the pod worker, which prevents the pod's state machine from progressing. The pod will remain in the terminating phase, but no further attempts to terminate or clean up will occur until the kubelet is restarted.
The pod worker now maintains a store of the pod's state that it is attempting to reconcile and uses that store to resync unknown pods when SyncKnownPods() is invoked, so that failures in sync methods for unknown pods no longer hang forever.
The pod worker's store tracks desired updates and the last update applied on podSyncStatuses. Each goroutine now synchronizes to acquire the next work item, context, and whether the pod can start. This synchronization moves the pending update to the stored last update, which ensures that third parties accessing pod worker state don't see updates before the pod worker begins synchronizing them. As a consequence, the update channel becomes a simple notifier (struct{}) so that SyncKnownPods can coordinate with the pod worker to create a synthetic pending update for unknown pods (i.e. pods about which no one besides the pod worker has data). Otherwise the pending update info would be hidden inside the channel.
In order to properly track pending updates, we have to be very careful not to mix RunningPods (which are calculated from the container runtime and are missing all spec info) and config-sourced pods. Update the pod worker to avoid using ToAPIPod() and instead require the pod worker to directly use update.Options.Pod or update.Options.RunningPod for the correct methods. Add a new SyncTerminatingRuntimePod to prevent accidental invocations of runtime-only pod data.
Finally, fix SyncKnownPods to replay the last valid update for undesired pods, which drives the pod state machine towards termination, and alter HandlePodCleanups to:
- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker
Any started pods receive a replay until they reach the finished state, and are then removed from the pod worker. When a desired pod is detected as not being in the worker, the usual cause is that the pod was deleted and recreated with the same UID (almost always a static pod, since API UID reuse is statistically unlikely). This simplifies the previous restartable pod support. We are careful to filter for active pods (those not already terminal and not previously rejected by admission). We also force a refresh of the runtime cache to ensure we don't see an older version of the state.
Future changes will allow other components that need to view the pod worker's actual state (not the desired state the podManager represents) to retrieve that info from the pod worker.
Several bugs in pod lifecycle have been undetectable at runtime because the kubelet does not clearly describe the number of pods in use. To report this better, add the following metrics:
- kubelet_desired_pods: Pods the pod manager sees
- kubelet_active_pods: "Admitted" pods that gate new pods
- kubelet_mirror_pods: Mirror pods the kubelet is tracking
- kubelet_working_pods: Breakdown of pods from the last sync in each phase, orphaned state, and static or not
- kubelet_restarted_pods_total: A counter for pods that saw a CREATE before the previous pod with the same UID was finished
- kubelet_orphaned_runtime_pods_total: A counter for pods detected at runtime that were not known to the kubelet. Will be populated at Kubelet startup and should never be incremented after.
Add a metric check to our e2e tests that verifies the values are captured correctly during a serial test, and then verify them in detail in unit tests.
Adds 23 series to the kubelet /metrics endpoint.	2023-03-08 22:03:51 -06:00
David Porter	c5a1f0188b	test: Add node e2e test to verify static pod termination
Add node e2e test to verify that static pods can be started after a previous static pod with the same config temporarily failed termination. The scenario is:
1. Static pod is started
2. Static pod is deleted
3. Static pod termination fails (internally `syncTerminatedPod` fails)
4. At a later time, pod termination should succeed
5. New static pod with the same config is (re)-added
6. New static pod is expected to start successfully
To reproduce this scenario, set up a pod using an NFS mount. The NFS server is stopped, which causes volumes to fail to unmount and `syncTerminatedPod` to fail. The NFS server is later started, allowing the volume to unmount successfully.
xref:
1. https://github.com/kubernetes/kubernetes/pull/113145#issuecomment-1289587988
2. https://github.com/kubernetes/kubernetes/pull/113065
3. https://github.com/kubernetes/kubernetes/pull/113093
Signed-off-by: David Porter <david@porter.me>	2023-03-03 10:00:48 -06:00
Gunju Kim	f690a0ce41	Fix createStaticPod to not use container.RestartPolicy	2023-02-23 21:18:24 +09:00
Patrick Ohly	136f89dfc5	e2e: use error wrapping with %w The recently introduced failure handling in ExpectNoError depends on error wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then ExpectNoError cannot detect that the root cause is an assertion failure, and it will then add another useless "unexpected error" prefix and will not dump the additional failure information (currently the backtrace inside the E2E framework). Instead of manually deciding on a case-by-case basis where %w is needed, all error wrapping was updated automatically with `sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)`. This may be unnecessary in some cases, but it's not wrong.	2023-02-06 15:39:13 +01:00
Antonio Ojea	7f5ae1c0c1	Revert "e2e: wait for pods with gomega"	2023-02-06 12:08:22 +01:00
Patrick Ohly	222f655062	e2e: use error wrapping with %w The recently introduced failure handling in ExpectNoError depends on error wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then ExpectNoError cannot detect that the root cause is an assertion failure, and it will then add another useless "unexpected error" prefix and will not dump the additional failure information (currently the backtrace inside the E2E framework). Instead of manually deciding on a case-by-case basis where %w is needed, all error wrapping was updated automatically with `sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)`. This may be unnecessary in some cases, but it's not wrong.	2023-01-31 13:01:39 +01:00
Patrick Ohly	2f6c4f5eab	e2e: use Ginkgo context All code must use the context from Ginkgo when doing API calls or polling for a change, otherwise the code would not return immediately when the test gets aborted.	2022-12-16 20:14:04 +01:00
Patrick Ohly	df5d84ae81	e2e: accept context from Ginkgo Every ginkgo callback should return immediately when a timeout occurs or the test run manually gets aborted with CTRL-C. To do that, they must take a ctx parameter and pass it through to all code which might block. This is a first automated step towards that: the additional parameter got added with `sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' $(git grep -l -e framework.ConformanceIt -e ginkgo.It )` followed by `$GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')`. log_test.go was left unchanged.	2022-12-10 19:50:18 +01:00
Dave Chen	857458cfa5	update ginkgo from v1 to v2 and gomega to 1.19.0 - update all the import statements - run hack/pin-dependency.sh to change pinned dependency versions - run hack/update-vendor.sh to update go.mod files and the vendor directory - update the method signatures for custom reporters Signed-off-by: Dave Chen <dave.chen@arm.com>	2022-07-08 10:44:46 +08:00
Sergiusz Urbaniak	373c08e0c7	test/e2e/framework: configure pod security admission level for e2e tests	2022-03-28 15:42:10 +02:00
Elana Hashman	47086a6623	Add test for recreating a static pod	2021-09-15 14:01:48 -04:00
wojtekt	a74737eb03	Mark remaining e2e_node tests with [sig-*] label	2021-02-23 20:11:09 +01:00
Derek Carr	d2c78b6589	Verify running mirror pod has running containers	2020-08-25 12:23:24 -04:00
Jefftree	ace97738e2	Update formatting of conformance comment	2020-07-29 20:50:44 -07:00
Mike Danese	aaf855c1e6	deref all calls to metav1.NewDeleteOptions that are passed to clients. This is gross, but because NewDeleteOptions is used by various parts of storage that still pass around pointers, the return type can't be changed without significant refactoring within the apiserver. I think this would be good to clean up, but I want to minimize apiserver-side changes as much as possible in the client signature refactor.	2020-03-05 14:59:46 -08:00
Mike Danese	3aa59f7f30	generated: run refactor	2020-02-07 18:16:47 -08:00
danielqsj	6596a14d39	add missing alias of api errors under test	2019-12-26 17:29:38 +08:00
SataQiu	d2bdf89a8b	fix golint issues in test/e2e_node	2019-11-26 16:26:55 +08:00
Tim Allclair	62e7d197e3	Add mirror pod e2e test	2019-10-29 16:14:06 -07:00
draveness	aeadd793cb	feat: update multiple files in e2e node with framework helpers	2019-08-02 14:39:05 +08:00
SataQiu	641d330f89	e2e_node: clean up non-recommended import	2019-07-28 12:49:36 +08:00
Vu Cong Tuan	c747b7f38d	Fix many typos in both code and comments Signed-off-by: Vu Cong Tuan <tuanvc@vn.fujitsu.com>	2019-02-27 14:41:02 +07:00
Aaron Crickenberger	d724e979cd	Remove [Conformance] from tests in e2e_node None of these tests actually run as part of e2e testing, which is the only way conformance tests are kicked off. They should not be included as part of the conformance suite unless they live in test/e2e/common	2018-08-07 10:43:59 -07:00
Manjunath A Kumatagi	1f7f33aaa4	Update the nginx image from hub.docker.com	2018-08-04 05:19:53 +05:30
Srini Brahmaroutu	dcb7bc313f	Adding details to Conformance Tests using RFC 2119 standards.	2018-07-31 17:21:18 -07:00
Yu-Ju Hong	4ad9aedb04	test/e2e_node: Add [NodeConformance] to tests tagged [Conformance] This has no effect yet until test configurations are updated.	2018-05-21 17:51:49 -07:00
Manjunath A Kumatagi	1bb810e749	Use pause manifest image	2018-04-06 11:00:50 +05:30
Michael Taufen	b4bddcc998	expunge the word 'manifest' from Kubelet's config API The word 'manifest' technically refers to a container-group specification that predated the Pod abstraction. We should avoid using this legacy terminology where possible. Fortunately, the Kubelet's config API will be beta in 1.10 for the first time, so we still had the chance to make this change. I left the flags alone, since they're deprecated anyway. I changed a few var names in files I touched too, but this PR is just the first shot, not the whole campaign (`git grep -i manifest | wc -l` -> 1248).	2018-02-23 11:44:06 -08:00
jianglingxia	76e90061a2	reopen #58913 Fix TODO move GetPauseImageNameForHostArch func	2018-01-31 15:06:32 +08:00
xiangpengzhao	7fdea2b0cf	Use framework.ConformanceIt for node e2e conformance tests	2017-11-17 17:28:20 +08:00
Kevin	4c8539cece	use core client with explicit version globally	2017-10-27 15:48:32 +08:00
Manjunath A Kumatagi	ee4d54c70c	Port e2e tests for multi architecture	2017-09-01 05:40:52 +05:30
Jacob Simpson	29c1b81d4c	Scripted migration from clientset_generated to client-go.	2017-07-17 15:05:37 -07:00
Chao Xu	60604f8818	run hack/update-all	2017-06-22 11:31:03 -07:00
Chao Xu	f4989a45a5	run root-rewrite-v1-..., compile	2017-06-22 10:25:57 -07:00
Dr. Stefan Schimanski	d7eb3b6870	pkg/util: move uuid and strategicpatch into k8s.io/apimachinery	2017-01-25 19:45:09 +01:00
Clayton Coleman	be6d2933df	refactor: Move *Options references to metav1	2017-01-24 13:41:51 -05:00
deads2k	77b4d55982	mechanical	2017-01-16 09:35:12 -05:00
NickrenREN	a12dea14e0	fix redundant alias clientset	2017-01-12 10:21:05 +08:00
deads2k	6a4d5cd7cc	start the apimachinery repo	2017-01-11 09:09:48 -05:00
Chao Xu	03d8820edc	rename /release_1_5 to /clientset	2016-12-14 12:39:48 -08:00
Wojciech Tyczynski	a9ec31209e	GetOptions - fix tests	2016-12-09 09:42:01 +01:00
Chao Xu	29400ac195	test/e2e_node	2016-11-23 15:53:09 -08:00
Random-Liu	090809d8ad	Remove dependency on kubelet related flags.	2016-11-17 10:17:32 -08:00
Random-Liu	f4aee8664d	Mark more conformance tests.	2016-11-05 21:11:51 -07:00
Jan Chaloupka	4fde09d308	Replace client with clientset in code	2016-10-23 22:00:35 +02:00
Random-Liu	ed411c9042	Add an image white list: images in the white list will be prepulled, and only images in the white list may be used in the test. Currently only enabled in node e2e test.	2016-09-19 14:39:23 -07:00
Random-Liu	3910a66bb5	Add run-services-mode option, and start e2e services in a separate process.	2016-08-15 14:45:01 -07:00
Harry Zhang	c495397cae	Refactor uuid into its own pkg	2016-07-30 00:07:02 -04:00
Random-Liu	9d48c76361	Make the node e2e test run in parallel.	2016-07-29 16:40:59 -07:00
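Commit 6b9a381185 describes turning the pod worker's update channel into a data-free `struct{}` notifier, with pending and last-applied updates held in a lock-protected store so that SyncKnownPods can synthesize a replay for pods nobody else knows about. The sketch below illustrates only that idea under simplifying assumptions: `podUpdate`, `podSyncStatus`, `startNext`, and `replayUnknown` are hypothetical names, not the kubelet's actual types or methods, and the real pod state machine (terminating, terminated, finished) is omitted entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// podUpdate is a hypothetical, heavily simplified work item.
type podUpdate struct {
	UID  string
	Kind string // e.g. "sync" or "kill"
}

// podSyncStatus holds the update waiting to be picked up (pending) and
// the last update the worker began processing (active).
type podSyncStatus struct {
	pending *podUpdate
	active  *podUpdate
	notify  chan struct{} // carries no data, only "check pending"
}

type podWorkers struct {
	mu       sync.Mutex
	statuses map[string]*podSyncStatus
}

func newPodWorkers() *podWorkers {
	return &podWorkers{statuses: map[string]*podSyncStatus{}}
}

// signal does a non-blocking send; the size-1 buffer coalesces signals.
func signal(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default: // a signal is already queued
	}
}

// UpdatePod stores the desired update and wakes the worker.
func (p *podWorkers) UpdatePod(u podUpdate) {
	p.mu.Lock()
	defer p.mu.Unlock()
	s, ok := p.statuses[u.UID]
	if !ok {
		s = &podSyncStatus{notify: make(chan struct{}, 1)}
		p.statuses[u.UID] = s
	}
	s.pending = &u // a newer update replaces one that has not started yet
	signal(s.notify)
}

// startNext would be called by the per-pod goroutine after a receive on
// notify. Moving pending to active under the lock means readers never see
// an update as applied before the worker has started on it.
func (p *podWorkers) startNext(uid string) (podUpdate, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	s, ok := p.statuses[uid]
	if !ok || s.pending == nil {
		return podUpdate{}, false
	}
	s.active, s.pending = s.pending, nil
	return *s.active, true
}

// replayUnknown re-queues the last started update for pods absent from
// known. This works only because pending lives in the store, not inside
// the channel.
func (p *podWorkers) replayUnknown(known map[string]bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for uid, s := range p.statuses {
		if known[uid] || s.pending != nil || s.active == nil {
			continue
		}
		replay := *s.active
		s.pending = &replay
		signal(s.notify)
	}
}

func main() {
	w := newPodWorkers()
	w.UpdatePod(podUpdate{UID: "a", Kind: "sync"})
	w.UpdatePod(podUpdate{UID: "a", Kind: "kill"}) // coalesced with the first signal
	<-w.statuses["a"].notify
	u, _ := w.startNext("a")
	fmt.Println(u.Kind) // kill
	w.replayUnknown(map[string]bool{}) // "a" is no longer known
	<-w.statuses["a"].notify
	u, _ = w.startNext("a")
	fmt.Println(u.Kind) // kill
}
```

The design point the sketch tries to capture is that a channel of values hides whatever is queued inside it, while a notifier plus a shared store lets another caller inspect and rewrite the pending work before the worker picks it up.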


57 Commits