tests: use readiness probes to wait for kata-deploy install

Now that kata-deploy has a proper readiness probe (/readyz returns 200 only
after install completes), replace the ad-hoc wait strategies with
kubectl wait --for=condition=Ready on the kata-deploy pods.

Note: helm --wait is ineffective for single-node clusters with
maxUnavailable=1 (the DaemonSet is considered ready with 0 ready pods), so
the CI uses kubectl wait on the pod readiness condition directly.

gha-run-k8s-common.sh:
- Drop the waitForProcess polling loop for Running pods
- Drop the `sleep 60s` with its FIXME comment
- Add kubectl wait --for=condition=Ready instead

helm-deploy.bash:
- Drop the extra `kubectl rollout status` after helm
- Drop the `sleep 60`
- The existing --wait on the helm command now suffices

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-18 13:46:06 +00:00 · 2026-04-28 23:40:11 +02:00
parent 49396b7991
commit ed4f6ebc9e
2 changed files with 27 additions and 17 deletions
--- a/tests/functional/kata-deploy/lib/helm-deploy.bash
+++ b/tests/functional/kata-deploy/lib/helm-deploy.bash
@@ -101,7 +101,14 @@ deploy_kata() {
 		--wait --timeout "${HELM_TIMEOUT:-10m}"
 	)

-	# Run helm install
+	# Run helm install.
+	# --wait makes helm block until all DaemonSet pods are Ready. The readiness
+	# probe returns 200 only after install completes (artifacts extracted, CRI
+	# restarted, node labeled), so no extra rollout/sleep polling is needed.
+	#
+	# Exception: on single-node clusters with maxUnavailable=1, helm --wait can
+	# consider the DaemonSet ready with 0 ready pods. Belt-and-suspenders: also
+	# kubectl wait on the pod readiness condition.
 	"${helm_cmd[@]}"
 	local ret=$?

@@ -112,11 +119,8 @@ deploy_kata() {
 		return "${ret}"
 	fi

-	# Wait for daemonset to be ready
-	kubectl -n "${HELM_NAMESPACE}" rollout status daemonset/kata-deploy --timeout=300s
-
-	# Give it a moment to configure runtimes
-	sleep 60
+	kubectl -n "${HELM_NAMESPACE}" wait pod -l name=kata-deploy \
+		--for=condition=Ready --timeout="${HELM_TIMEOUT:-10m}" 2>/dev/null || true

 	return 0
 }
--- a/tests/gha-run-k8s-common.sh
+++ b/tests/gha-run-k8s-common.sh
@@ -1049,17 +1049,23 @@ VERIFICATION_POD_EOF
 		return 1
 	fi

-	# `helm install --wait` does not take effect on single replicas and maxUnavailable=1 DaemonSets
-	# like kata-deploy on CI. So wait for pods being Running in the "traditional" way.
-	local cmd
-	cmd="kubectl -n kube-system get -l name=kata-deploy pod 2>/dev/null | grep '\<Running\>'"
-	waitForProcess "${KATA_DEPLOY_WAIT_TIMEOUT}" 10 "${cmd}"
-
-	# FIXME: This is needed as the kata-deploy pod will be set to "Ready"
-	# when it starts running, which may cause issues like not having the
-	# node properly labeled or the artefacts properly deployed when the
-	# tests actually start running.
-	sleep 60s
+	# helm --wait is ineffective for single-node clusters with maxUnavailable=1
+	# (the DaemonSet is considered ready with 0 ready pods). First wait until at
+	# least one kata-deploy pod exists, then wait on the pod readiness condition
+	# instead — the readiness probe (/readyz) returns 200 only after install
+	# completes (artifacts extracted, CRI restarted, node labeled).
+	local pod_wait_deadline=$((SECONDS + KATA_DEPLOY_WAIT_TIMEOUT))
+	while true; do
+		if [[ -n "$(kubectl -n kube-system get pod -l name=kata-deploy -o name 2>/dev/null)" ]]; then
+			break
+		fi
+		if (( SECONDS >= pod_wait_deadline )); then
+			echo "ERROR: Timed out waiting for kata-deploy pod to be created"
+			return 1
+		fi
+		sleep 1
+	done
+	kubectl -n kube-system wait pod -l name=kata-deploy --for=condition=Ready --timeout="${KATA_DEPLOY_WAIT_TIMEOUT}s"

 	echo "::group::kata-deploy logs"
 	kubectl_retry -n kube-system logs -l name=kata-deploy
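
The two-phase wait added to gha-run-k8s-common.sh (first poll until a pod exists, then wait on its Ready condition) could be factored into a reusable helper along the following lines. This is only a sketch and not part of the commit: the function name `wait_for_pods_ready` and its argument order are hypothetical. Unlike the patch, which gives each phase the full `KATA_DEPLOY_WAIT_TIMEOUT`, this version assumes both phases should share a single time budget.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit.
# Usage: wait_for_pods_ready <namespace> <label-selector> <timeout-seconds>
wait_for_pods_ready() {
	local namespace="$1"
	local selector="$2"
	local timeout="$3"
	local deadline=$(( $(date +%s) + timeout ))
	local remaining

	# Phase 1: poll until at least one matching pod object exists.
	# `kubectl wait` fails right away when the selector matches nothing,
	# so it cannot be used on its own before the DaemonSet creates its pod.
	while [ -z "$(kubectl -n "${namespace}" get pod -l "${selector}" -o name 2>/dev/null)" ]; do
		if [ "$(date +%s)" -ge "${deadline}" ]; then
			echo "ERROR: timed out waiting for pods matching '${selector}' to be created" >&2
			return 1
		fi
		sleep 1
	done

	# Phase 2: block on the Ready condition, spending only what is left
	# of the budget (at least one second, so the timeout stays positive).
	remaining=$(( deadline - $(date +%s) ))
	[ "${remaining}" -gt 0 ] || remaining=1
	kubectl -n "${namespace}" wait pod -l "${selector}" \
		--for=condition=Ready --timeout="${remaining}s"
}
```

With such a helper, the call site in the CI script would reduce to something like `wait_for_pods_ready kube-system name=kata-deploy "${KATA_DEPLOY_WAIT_TIMEOUT}"`, assuming the timeout variable holds a plain number of seconds as it does in the patch.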