mirror of
https://github.com/kata-containers/kata-containers.git
synced 2026-05-18 05:36:24 +00:00
The kata-deploy DaemonSet pod had no Kubernetes health probes, so the
kubelet could not distinguish between "still installing" and "crashed",
and rolling updates would proceed to the next node before install
actually finished.
Add a lightweight HTTP health server (built on raw tokio TcpListener,
no new crate dependencies) that starts immediately in the install path:
/healthz — liveness: returns 200 as soon as the server binds
/readyz — readiness: returns 503 while installing, 200 after
install completes (artifacts extracted, CRI restarted,
node labeled)
Wire the Helm chart with startup, liveness, and readiness probes
(all individually toggleable). The startup probe allows up to 10
minutes for install to complete before the liveness probe takes over.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>