kata-containers/src/agent/rustjail
Alex Lyn ce3473d272 agent: Kill processes before removing container directory in destroy()
When using the multi-layer EROFS snapshotter, the destroy() method fails
to kill container processes, causing process leaks in shared PID
namespace scenarios.

Problem Background:
1. Multi-layer EROFS creates temporary mount points under the container's
  root directory:
  - /run/kata-containers/<cid>/multi-layer/upper (ext4, writable)
  - /run/kata-containers/<cid>/multi-layer/lower-0 (EROFS, read-only)
2. The original destroy() method executed in this order:
  (1) umount rootfs
  (2) fs::remove_dir_all(&self.root) <- FAILS with "Read-only file system"
  (3) cgroup cleanup and process killing <- NEVER EXECUTED
3. When remove_dir_all() encounters the read-only EROFS mount point, it
  returns an EROFS error (os error 30), causing destroy() to exit early
  without killing processes.
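
A minimal sketch of the early exit described above, not the actual rustjail
code. remove_root_stub() and old_order() are hypothetical stand-ins; the stub
simply returns errno 30 (EROFS on Linux) the way remove_dir_all() is reported
to, and the `?` operator propagates it before the kill step is reached.

```rust
use std::io;

// Hypothetical stand-in for fs::remove_dir_all(&self.root) hitting the
// read-only EROFS mount point. 30 is EROFS on Linux.
fn remove_root_stub() -> io::Result<()> {
    Err(io::Error::from_raw_os_error(30))
}

// Hypothetical model of the original ordering: directory removal first,
// process killing afterwards. `killed` records whether the kill step ran.
fn old_order(killed: &mut bool) -> io::Result<()> {
    remove_root_stub()?; // returns early on error, skipping everything below
    *killed = true;
    Ok(())
}
```

Calling old_order() with a flag initialised to false leaves the flag false and
returns the errno 30 error, which is the leak scenario in miniature.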

Why This Fix:
1. The test case k8s-kill-all-process-in-container.bats creates an init
  container with a background process (tail -f /dev/null), expecting it
  to be killed when the init container is destroyed.
2. With a shared PID namespace (shareProcessNamespace: true), the
  orphaned process continues running, causing the test to fail.

Solution:
1. Reorder the destroy() method to kill processes BEFORE attempting to
  remove the container directory:
  (1) Get PIDs from cgroup and send SIGKILL
  (2) Destroy cgroup
  (3) umount rootfs
  (4) fs::remove_dir_all(&self.root)
2. This ensures processes are always killed regardless of filesystem
  cleanup status, matching the behavior of the overlayfs snapshotter.
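
The reordered sequence can be sketched roughly as follows. This is an
illustration under assumptions, not the patch itself: Trace and its methods
are hypothetical stubs that only record the order in which the steps run, and
the read_only flag simulates remove_dir_all() failing with errno 30.

```rust
use std::io;

// Hypothetical recorder standing in for the container; each stub method
// appends its step name instead of touching cgroups or mounts.
#[derive(Default)]
struct Trace {
    steps: Vec<&'static str>,
}

impl Trace {
    fn kill_cgroup_processes(&mut self) -> io::Result<()> {
        self.steps.push("kill");
        Ok(())
    }

    fn destroy_cgroup(&mut self) -> io::Result<()> {
        self.steps.push("destroy_cgroup");
        Ok(())
    }

    fn umount_rootfs(&mut self) -> io::Result<()> {
        self.steps.push("umount");
        Ok(())
    }

    fn remove_root_dir(&mut self, read_only: bool) -> io::Result<()> {
        self.steps.push("remove_dir_all");
        if read_only {
            // Simulate the EROFS failure (errno 30 on Linux).
            return Err(io::Error::from_raw_os_error(30));
        }
        Ok(())
    }
}

// Steps (1) to (4) from the list above: processes are killed before any
// filesystem cleanup, so a later failure cannot skip the kill.
fn destroy_reordered(t: &mut Trace, read_only: bool) -> io::Result<()> {
    t.kill_cgroup_processes()?;
    t.destroy_cgroup()?;
    t.umount_rootfs()?;
    t.remove_root_dir(read_only)
}
```

In this sketch the directory removal can still return an error, but by then
the "kill" and "destroy_cgroup" steps have already been recorded.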

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00