tests: use readiness probes to wait for kata-deploy install

Now that kata-deploy has a proper readiness probe (/readyz returns 200
only after install completes), replace the ad-hoc wait strategies with
kubectl wait --for=condition=Ready on the kata-deploy pods.

Note: helm --wait is ineffective for single-node clusters with
maxUnavailable=1 (the DaemonSet is considered ready with 0 ready pods),
so the CI uses kubectl wait on the pod readiness condition directly.
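
A minimal sketch of the arithmetic behind that note, assuming helm treats a
DaemonSet as ready once numberReady >= desiredNumberScheduled - maxUnavailable.
The function name is hypothetical and not part of this change:

```shell
#!/usr/bin/env bash
# Assumed readiness rule (not taken from this commit):
#   ready when numberReady >= desiredNumberScheduled - maxUnavailable
daemonset_considered_ready() {
	local desired="$1" max_unavailable="$2" number_ready="$3"
	local expected_ready=$((desired - max_unavailable))
	(( number_ready >= expected_ready ))
}

# Single node, maxUnavailable=1, no pod Ready yet: 0 >= 1 - 1 already holds.
if daemonset_considered_ready 1 1 0; then
	echo "considered ready with 0 ready pods"
fi
```

Under that assumption the rule only becomes meaningful once the desired pod
count exceeds maxUnavailable, which is why the CI waits on the pod condition.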

  gha-run-k8s-common.sh:
  - Drop the waitForProcess polling loop for Running pods
  - Drop the `sleep 60s` with its FIXME comment
  - Wait for a kata-deploy pod to exist, then kubectl wait --for=condition=Ready instead
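
The replacement wait in gha-run-k8s-common.sh can be sketched roughly as
follows. This is an illustrative helper, not the code in the commit:
wait_for_pods_ready and its parameters are hypothetical names, and it assumes
kubectl wait fails immediately when the selector matches no pods yet, which is
why it polls for pod creation first:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): wait for pods matching a label
# selector to exist, then wait for them to report the Ready condition.
wait_for_pods_ready() {
	local namespace="$1" selector="$2" timeout_s="$3"
	local deadline=$((SECONDS + timeout_s))

	# Phase 1: poll until at least one matching pod object has been created.
	until [[ -n "$(kubectl -n "${namespace}" get pod -l "${selector}" -o name 2>/dev/null)" ]]; do
		if (( SECONDS >= deadline )); then
			echo "ERROR: timed out waiting for pods matching '${selector}' to be created" >&2
			return 1
		fi
		sleep 1
	done

	# Phase 2: spend only what is left of the time budget on readiness.
	local remaining=$((deadline - SECONDS))
	(( remaining > 0 )) || remaining=1
	kubectl -n "${namespace}" wait pod -l "${selector}" \
		--for=condition=Ready --timeout="${remaining}s"
}
```

It would be called as, for example,
wait_for_pods_ready kube-system name=kata-deploy "${KATA_DEPLOY_WAIT_TIMEOUT}".
Unlike the loop in the diff, this sketch shares one deadline across both phases.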

  helm-deploy.bash:
  - Drop the extra `kubectl rollout status` after helm
  - Drop the `sleep 60`
  - Keep the existing --wait on the helm command and add a best-effort
    kubectl wait --for=condition=Ready as a fallback

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Fabiano Fidêncio
2026-04-28 23:40:11 +02:00
committed by Fabiano Fidêncio
parent 49396b7991
commit ed4f6ebc9e
2 changed files with 27 additions and 17 deletions

@@ -101,7 +101,14 @@ deploy_kata() {
 --wait --timeout "${HELM_TIMEOUT:-10m}"
 )
-# Run helm install
+# Run helm install.
+# --wait makes helm block until all DaemonSet pods are Ready. The readiness
+# probe returns 200 only after install completes (artifacts extracted, CRI
+# restarted, node labeled), so no extra rollout/sleep polling is needed.
+#
+# Exception: on single-node clusters with maxUnavailable=1, helm --wait can
+# consider the DaemonSet ready with 0 ready pods. Belt-and-suspenders: also
+# kubectl wait on the pod readiness condition.
 "${helm_cmd[@]}"
 local ret=$?
@@ -112,11 +119,8 @@ deploy_kata() {
 return "${ret}"
 fi
-# Wait for daemonset to be ready
-kubectl -n "${HELM_NAMESPACE}" rollout status daemonset/kata-deploy --timeout=300s
-# Give it a moment to configure runtimes
-sleep 60
+kubectl -n "${HELM_NAMESPACE}" wait pod -l name=kata-deploy \
+--for=condition=Ready --timeout="${HELM_TIMEOUT:-10m}" 2>/dev/null || true
 return 0
 }

@@ -1049,17 +1049,23 @@ VERIFICATION_POD_EOF
 return 1
 fi
-# `helm install --wait` does not take effect on single replicas and maxUnavailable=1 DaemonSets
-# like kata-deploy on CI. So wait for pods being Running in the "traditional" way.
-local cmd
-cmd="kubectl -n kube-system get -l name=kata-deploy pod 2>/dev/null | grep '\<Running\>'"
-waitForProcess "${KATA_DEPLOY_WAIT_TIMEOUT}" 10 "${cmd}"
-# FIXME: This is needed as the kata-deploy pod will be set to "Ready"
-# when it starts running, which may cause issues like not having the
-# node properly labeled or the artefacts properly deployed when the
-# tests actually start running.
-sleep 60s
+# helm --wait is ineffective for single-node clusters with maxUnavailable=1
+# (the DaemonSet is considered ready with 0 ready pods). First wait until at
+# least one kata-deploy pod exists, then wait on the pod readiness condition
+# instead: the readiness probe (/readyz) returns 200 only after install
+# completes (artifacts extracted, CRI restarted, node labeled).
+local pod_wait_deadline=$((SECONDS + KATA_DEPLOY_WAIT_TIMEOUT))
+while true; do
+if [[ -n "$(kubectl -n kube-system get pod -l name=kata-deploy -o name 2>/dev/null)" ]]; then
+break
+fi
+if (( SECONDS >= pod_wait_deadline )); then
+echo "ERROR: Timed out waiting for kata-deploy pod to be created"
+return 1
+fi
+sleep 1
+done
+kubectl -n kube-system wait pod -l name=kata-deploy --for=condition=Ready --timeout="${KATA_DEPLOY_WAIT_TIMEOUT}s"
 echo "::group::kata-deploy logs"
 kubectl_retry -n kube-system logs -l name=kata-deploy