kata-containers/tools
Fabiano Fidêncio 3d732986d2 kata-deploy: add per-node staged cleanup for job mode
Add the uninstall counterpart to the install dispatcher for
deploymentMode: job. On `helm uninstall`, a single pre-delete hook Job
runs the kata-deploy-job-dispatcher, which enumerates the targeted nodes
at run time and fans out one node-pinned cleanup Job per node. Each
cleanup Job runs the install pipeline in reverse and exits:

  unlabel -> revert-cri   (initContainers, run sequentially)
  remove-artifacts        (main container)
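
The fan-out described above can be sketched as follows. This is an
illustration only, not the dispatcher's actual code: the stage names come
from the list above, while the function names, the Job name prefix, the
image, the per-stage args, and the use of spec.nodeName for pinning are
all assumptions.

```python
# Hypothetical sketch: build one node-pinned cleanup Job manifest per node.
# Only the stage names (unlabel, revert-cri, remove-artifacts) come from the
# commit message; everything else here is an illustrative assumption.

PLACEHOLDER_IMAGE = "example.invalid/kata-deploy:latest"  # assumption


def build_cleanup_job(node_name, image=PLACEHOLDER_IMAGE):
    """Return a batch/v1 Job manifest (as a dict) pinned to node_name."""

    def container(stage):
        # Passing the stage name as the only argument is an assumption.
        return {"name": stage, "image": image, "args": [stage]}

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"kata-cleanup-{node_name}"},  # hypothetical name
        "spec": {
            "template": {
                "spec": {
                    # Pinning via nodeName is an assumption; a hostname
                    # nodeSelector would pin the Pod as well.
                    "nodeName": node_name,
                    "restartPolicy": "Never",
                    # initContainers run sequentially, in list order.
                    "initContainers": [
                        container("unlabel"),
                        container("revert-cri"),
                    ],
                    # The main container runs after all initContainers succeed.
                    "containers": [container("remove-artifacts")],
                }
            }
        },
    }


def fan_out(node_names):
    """One cleanup Job manifest per targeted node."""
    return [build_cleanup_job(name) for name in node_names]
```

Relying on initContainers for the first two stages gives the sequential
ordering for free, since Kubernetes starts the main container only after
every initContainer has completed successfully.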

Running as a pre-delete hook means the dispatcher ServiceAccount/RBAC and
the kata-deploy host-mutation RBAC still exist while the Jobs run, so the
unlabel stage retains node get/patch access. revert-cri and
remove-artifacts are host-only operations (privileged nsenter / host
mount) and need no extra cluster RBAC.

Ordering mirrors install in reverse: unlabel first so the scheduler stops
placing kata workloads on the node, then revert the CRI config and
restart the runtime, then remove the on-host artifacts. Each stage is
idempotent and skips when already undone, so partially-installed nodes
and re-runs are safe.
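
A minimal model of the "idempotent, skip when already undone" behaviour,
assuming a simplified in-memory NodeState in place of the real node and
host. The class, field, and function names are hypothetical; only the
stage order and the label key come from this commit message.

```python
from dataclasses import dataclass, field

KATA_LABEL = "katacontainers.io/kata-runtime"


@dataclass
class NodeState:
    """Toy stand-in for a node's cluster and host state (assumption)."""
    labels: dict = field(default_factory=dict)
    cri_configured: bool = False
    artifacts_present: bool = False


def unlabel(node):
    if KATA_LABEL not in node.labels:
        return "skipped"
    del node.labels[KATA_LABEL]
    return "done"


def revert_cri(node):
    if not node.cri_configured:
        return "skipped"
    node.cri_configured = False  # the real stage would also restart the runtime
    return "done"


def remove_artifacts(node):
    if not node.artifacts_present:
        return "skipped"
    node.artifacts_present = False
    return "done"


# Reverse of the install order: unlabel, then revert-cri, then remove-artifacts.
STAGES = [
    ("unlabel", unlabel),
    ("revert-cri", revert_cri),
    ("remove-artifacts", remove_artifacts),
]


def cleanup(node):
    """Run every stage in order and report what each one did."""
    return [(name, stage(node)) for name, stage in STAGES]
```

For a partially-installed node (labeled, artifacts present, CRI never
configured), the first run undoes only what exists and a second run skips
every stage.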

Uninstall node selection is deliberately SEPARATE from install (a
dedicated job.cleanup.* block) and defaults to every node carrying the
katacontainers.io/kata-runtime label (set by the install label stage)
rather than re-evaluating the install selector. Because the cleanup
dispatcher resolves nodes live when it runs, this stays robust to
install-time selector drift (relabeled nodes, etc.) while remaining fully
overridable via job.cleanup.nodes / job.cleanup.nodeSelector /
job.cleanup.nodeSelectorExpressions. The default (daemonset) mode is
unaffected.
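
The selection rule can be sketched like this. It is a simplified model
under stated assumptions, not the dispatcher's implementation: the
function name, the shape of the inputs, and the precedence of nodes over
nodeSelector are hypothetical, and nodeSelectorExpressions is left out
for brevity.

```python
from typing import Dict, List, Optional

KATA_LABEL = "katacontainers.io/kata-runtime"


def resolve_cleanup_nodes(
    live_nodes: Dict[str, Dict[str, str]],
    cleanup: Optional[dict] = None,
) -> List[str]:
    """Pick the nodes to clean up from the cluster's current state.

    live_nodes maps node name -> labels as seen when the dispatcher runs.
    cleanup models the job.cleanup.* block; its exact schema is an assumption.
    """
    cleanup = cleanup or {}

    # Explicit node list (job.cleanup.nodes); unknown names are ignored here.
    if cleanup.get("nodes"):
        wanted = set(cleanup["nodes"])
        return sorted(name for name in live_nodes if name in wanted)

    # Label equality selector (job.cleanup.nodeSelector).
    selector = cleanup.get("nodeSelector")
    if selector:
        return sorted(
            name
            for name, labels in live_nodes.items()
            if all(labels.get(key) == value for key, value in selector.items())
        )

    # Default: every node that currently carries the kata-runtime label,
    # independent of whatever selector was used at install time.
    return sorted(name for name, labels in live_nodes.items() if KATA_LABEL in labels)
```

Because the default only checks for the presence of the label on live
nodes, a node that was relabeled after install and no longer matches the
install selector is still picked up for cleanup.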

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00