mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-02 18:13:57 +00:00

Fabiano Fidêncio fd583d833b kata-deploy: nydus: clean containerd metadata before wiping backend

When /var/lib/nydus-snapshotter is removed, containerd's BoltDB
(meta.db at /var/lib/containerd/) still holds snapshot records for
the nydus snapshotter.  On the next install these stale records cause
image pulls to fail with:

  "unable to prepare extraction snapshot:
   target snapshot \"sha256:...\": already exists"

The failure path in core/unpack/unpacker.go:
1. sn.Prepare() → metadata layer finds the target chainID in BoltDB
   → returns AlreadyExists without touching the nydus backend.
2. sn.Stat()    → metadata layer finds the BoltDB record, then calls
   s.Snapshotter.Stat(bkey) on the nydus gRPC backend → NotFound
   (backend was wiped).
3. The unpacker treats NotFound as a transient key-collision race and
   retries 3 times; all 3 attempts hit the same dead end, and the
   pull is aborted.
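
The three steps above can be reproduced with a toy model. The sketch below is
not containerd code: the class names, the `unpack` helper and the simplified
Prepare/Stat signatures are all hypothetical stand-ins, chosen only to show why
a wiped backend combined with a surviving metadata record leads to a dead end.

```python
class NotFound(Exception):
    pass


class AlreadyExists(Exception):
    pass


class Backend:
    """Hypothetical stand-in for the nydus gRPC backend."""

    def __init__(self):
        self.snapshots = set()

    def prepare(self, key):
        self.snapshots.add(key)

    def stat(self, key):
        if key not in self.snapshots:
            raise NotFound(key)
        return key


class MetadataLayer:
    """Hypothetical stand-in for the BoltDB-backed metadata layer."""

    def __init__(self, backend):
        self.backend = backend
        self.records = set()

    def prepare(self, key):
        # The record check happens first; the backend is never consulted.
        if key in self.records:
            raise AlreadyExists(key)
        self.records.add(key)
        self.backend.prepare(key)

    def stat(self, key):
        if key not in self.records:
            raise NotFound(key)
        return self.backend.stat(key)


def unpack(sn, chain_id, attempts=3):
    """Simplified retry loop modelled on the description above."""
    for _ in range(attempts):
        try:
            sn.prepare(chain_id)
            return "prepared"
        except AlreadyExists:
            try:
                sn.stat(chain_id)
                return "already unpacked"
            except NotFound:
                continue  # treated as a transient race, so retry
    raise RuntimeError(
        f"unable to prepare extraction snapshot: "
        f"target snapshot {chain_id!r}: already exists"
    )
```

In this model, clearing `backend.snapshots` while leaving `meta.records`
untouched makes every attempt fail the same way, while clearing both lets the
next `unpack` succeed.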

The commit message of 62ad0814c ("nydus: Always start from a clean
state") assumed "containerd will re-pull/re-unpack when it finds non-
existent snapshots", but that is not what happens: the metadata layer
intercepts the Prepare call in BoltDB before the backend is ever
consulted.

Fix: call cleanup_containerd_nydus_snapshots() before stopping the
nydus service (and thus before wiping its data directory) in both
install_nydus_snapshotter and uninstall_nydus_snapshotter.

The cleanup must run while the service is still up because ctr
snapshots rm goes through the metadata layer which calls the nydus
gRPC backend to physically remove the snapshot; if the service is
already stopped the backend call fails and the BoltDB record remains.

The cleanup:
- Discovers all containerd namespaces via `ctr namespaces ls -q`
  (falls back to k8s.io if that fails).
- Removes containers whose Snapshotter field matches the nydus plugin
  name; these become dangling references once snapshots are gone and
  can confuse container reconciliation after an aborted CI run.
- Removes snapshots round by round (leaf-first) until either the list
  is empty or no progress can be made (see below).
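
As an illustration of the namespace discovery step, here is a minimal Python
sketch (the actual kata-deploy code is not shown in this message). The
function name and the injectable `run` parameter are hypothetical, and
treating an empty listing like a failure is an assumption of this sketch.

```python
import subprocess


def list_namespaces(run=subprocess.run, fallback=("k8s.io",)):
    """Return containerd namespaces, or the fallback if the listing fails."""
    try:
        result = run(
            ["ctr", "namespaces", "ls", "-q"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # ctr is missing or exited non-zero: fall back to k8s.io.
        return list(fallback)
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    # Assumption: an empty listing is handled like a failed one.
    return names or list(fallback)
```

Passing a fake `run` callable makes the fallback path easy to exercise
without a containerd installation.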

Note: containerd's GC cannot substitute for this explicit cleanup.
The image record (a GC root) references content blobs which reference
the snapshots via gc.ref labels, keeping the entire chain alive in
the GC graph even after the nydus backend is wiped.
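
The reachability argument can be sketched as a plain mark phase over a
reference graph. This is a toy model, not containerd's GC: the node names and
the exact edges below (including the snapshot-to-parent edge) are illustrative
assumptions.

```python
def reachable(roots, refs):
    """Return every node reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(refs.get(node, ()))
    return seen


# Hypothetical graph: image -> content blobs -> snapshots (via gc.ref labels).
refs = {
    "image/example": ["content/manifest"],
    "content/manifest": ["content/config"],
    "content/config": ["snapshot/layer-2"],
    "snapshot/layer-2": ["snapshot/layer-1"],  # assumed parent edge
}

live = reachable(["image/example"], refs)
```

As long as the image record remains a root, both snapshot records stay in
`live`, regardless of whether the backend still holds their data.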

Snapshot removal rounds
-----------------------
Snapshot chains are linear: an image with N layers produces a chain
of N snapshots, each parented on the previous.  Only the current leaf
can be removed each round, so N layers require exactly N rounds.
There is no fixed round cap — the loop terminates when either the
list reaches zero (success) or a round removes nothing at all
(all remaining snapshots are actively in use by running workloads).
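
A minimal sketch of the round-based loop, assuming snapshots are modelled as
a child-to-parent mapping and "in use" is a plain set. The function name,
return shape and data model are hypothetical; the real cleanup shells out to
ctr instead.

```python
def remove_in_rounds(parents, in_use=frozenset()):
    """Remove snapshots leaf-first.

    parents maps each snapshot to its parent (None for a base layer).
    Returns (success, rounds_with_progress, remaining_snapshots).
    """
    remaining = dict(parents)
    rounds = 0
    while remaining:
        has_children = {p for p in remaining.values() if p is not None}
        # Only childless snapshots that nothing is using can be removed.
        leaves = [s for s in remaining if s not in has_children and s not in in_use]
        if not leaves:
            # No progress this round: stop instead of looping forever.
            return False, rounds, set(remaining)
        for snapshot in leaves:
            del remaining[snapshot]
        rounds += 1
    return True, rounds, set()
```

For a three-layer chain this takes three rounds; if the leaf is held by a
running workload, the first round removes nothing and the loop stops.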

Active workload safety
----------------------
If active workloads still hold nydus snapshots (e.g. during a live
upgrade), no progress is made in a round and cleanup_nydus_snapshots
returns false.  Both install_nydus_snapshotter and
uninstall_nydus_snapshotter gate the fs::remove_dir_all on that
return value:

  - true  → proceed as before: stop service, wipe data dir.
  - false → stop service, skip data dir removal, log a warning.
            The new nydus instance starts on the existing backend
            state; running containers are left intact.
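
The gating described above can be sketched as follows. This is a Python
illustration only (the message refers to Rust's fs::remove_dir_all); the
function name, parameters and warning text are hypothetical.

```python
import shutil
from pathlib import Path


def stop_and_maybe_wipe(data_dir, snapshots_cleaned, stop_service, warn=print):
    """Stop the service, then wipe data_dir only if cleanup fully succeeded."""
    stop_service()
    if snapshots_cleaned:
        shutil.rmtree(Path(data_dir), ignore_errors=True)
        return True
    # Snapshots are still in use: keep the backend state for running containers.
    warn(f"nydus snapshots still in use; leaving {data_dir} in place")
    return False
```

The service is stopped in both branches; only the removal of the data
directory depends on the cleanup result.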

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor

2026-03-24 16:44:25 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| guest-image | kernel: Unify kernel and kernel-confidential | 2026-02-09 18:28:23 +01:00 |
| kata-debug | kata-debug: Make path resolution more robust | 2025-05-06 21:16:25 +09:00 |
| kata-deploy | kata-deploy: nydus: clean containerd metadata before wiping backend | 2026-03-24 16:44:25 +01:00 |
| kata-monitor | tools: kata-monitor: update go version used to build in Dockerfile | 2025-06-25 15:32:41 -07:00 |
| kernel | kernel: bump config version | 2026-03-20 15:04:15 -07:00 |
| kubectl | tools: Build kubectl image | 2026-01-12 15:48:44 +01:00 |
| qemu | versions: bump QEMU to v10.2.1 | 2026-02-18 18:18:52 +01:00 |
| release | kata-deploy: disable provenance/SBOM for quay.io compatibility | 2026-02-16 13:32:25 +01:00 |
| scripts | scripts: use temporary GPG home when verifying cached gperf tarball | 2026-02-13 19:39:55 +01:00 |
| static-build | kernel: Fix debug build and add debug symbols to installation | 2026-03-20 14:50:23 -07:00 |
| .gitignore | packaging: Remove snap package | 2023-06-12 09:24:09 +01:00 |
| artifact-list.sh | shellcheck: Fix shellcheck SC2068 | 2025-03-04 09:35:46 +00:00 |
| Makefile | packaging: Remove snap package | 2023-06-12 09:24:09 +01:00 |
| README.md | packaging: Remove snap package | 2023-06-12 09:24:09 +01:00 |

README.md

Kata Containers packaging

Introduction

Kata Containers currently supports packages for many distributions. Tooling to aid in creating these packages is contained within this repository.

Build in a container

Kata build artifacts are available within a container image, created by a Dockerfile. Reference DaemonSets are provided in kata-deploy; they make installing Kata Containers in a running Kubernetes cluster straightforward.

Build static binaries

See the static build documentation.

Build Kata Containers Kernel

See the kernel documentation.

Build QEMU

See the QEMU documentation.

Create a Kata Containers release

See the release documentation.

Packaging scripts

See the scripts documentation.

Credits

Kata Containers packaging uses packagecloud for package hosting.