When /var/lib/nydus-snapshotter is removed, containerd's BoltDB
(meta.db at /var/lib/containerd/) still holds snapshot records for
the nydus snapshotter. On the next install these stale records cause
image pulls to fail with:
"unable to prepare extraction snapshot:
target snapshot \"sha256:...\": already exists"
The failure path in core/unpack/unpacker.go:
1. sn.Prepare() → metadata layer finds the target chainID in BoltDB
→ returns AlreadyExists without touching the nydus backend.
2. sn.Stat() → metadata layer finds the BoltDB record, then calls
s.Snapshotter.Stat(bkey) on the nydus gRPC backend → NotFound
(backend was wiped).
3. The unpacker treats NotFound as a transient key-collision race and
retries 3 times; all 3 attempts hit the same dead end, and the
pull is aborted.
The commit message of 62ad0814c ("nydus: Always start from a clean
state") assumed "containerd will re-pull/re-unpack when it finds non-
existent snapshots", but that is not what happens: the metadata layer
intercepts the Prepare call in BoltDB before the backend is ever
consulted.
Fix: call cleanup_containerd_nydus_snapshots() before stopping the
nydus service (and thus before wiping its data directory) in both
install_nydus_snapshotter and uninstall_nydus_snapshotter.
The cleanup must run while the service is still up because ctr
snapshots rm goes through the metadata layer which calls the nydus
gRPC backend to physically remove the snapshot; if the service is
already stopped the backend call fails and the BoltDB record remains.
The cleanup:
- Discovers all containerd namespaces via `ctr namespaces ls -q`
(falls back to k8s.io if that fails).
- Removes containers whose Snapshotter field matches the nydus plugin
name; these become dangling references once snapshots are gone and
can confuse container reconciliation after an aborted CI run.
- Removes snapshots round by round (leaf-first) until either the list
is empty or no progress can be made (see below).
Note: containerd's GC cannot substitute for this explicit cleanup.
The image record (a GC root) references content blobs which reference
the snapshots via gc.ref labels, keeping the entire chain alive in
the GC graph even after the nydus backend is wiped.
Snapshot removal rounds
-----------------------
Snapshot chains are linear: an image with N layers produces a chain
of N snapshots, each parented on the previous. Only the current leaf
can be removed each round, so N layers require exactly N rounds.
There is no fixed round cap — the loop terminates when either the
list reaches zero (success) or a round removes nothing at all
(all remaining snapshots are actively in use by running workloads).
Active workload safety
----------------------
If active workloads still hold nydus snapshots (e.g. during a live
upgrade), no progress is made in a round and cleanup_nydus_snapshots
returns false. Both install_nydus_snapshotter and
uninstall_nydus_snapshotter gate the fs::remove_dir_all on that
return value:
- true → proceed as before: stop service, wipe data dir.
- false → stop service, skip data dir removal, log a warning.
The new nydus instance starts on the existing backend
state; running containers are left intact.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Use the container data storage feature for the k8s-nvidia-nim.bats
test pod manifests. This reduces the pods' memory requirements.
For this, enable the block-encrypted emptydir_mode for the NVIDIA
GPU TEE handlers.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
the micro_http crate was just pointing the the main branch and hadn't been updated for
around 3 years, so pin to the latest for stability and update to remediate RUSTSEC-2024-0002
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We need to explicitly pass `-O index.html` as the busybox' wget has a
different behaviour than GNU's wget.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
In case a wget fails for one reason or another, it'll leave behind an
'index.html' file. Let's make sure we allow overriding that file so the
retry loop doesn't fail for no reason.
Fixes: #12670
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Since the dragonball's vmm thread had been joined in the pod's
netns, which wouldn't access the network, thus we should make
sure the nydus's worker thread join into the runD's main thread's
netns which would access the network.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We've been using `experimental_force_guest_pull`, but now that we have a
containerd release that should work more reliably with the multi
snapshotter setup, we want to give it a try.
Note: We need containerd 2.2.2+.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
With debug/ebpf updates in place, let's bump the kata config version.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Co-authored-by: Eric Ernst <eric_ernst@apple.com>
Add missing terms to the spell check dictionary to fix CI failures
for kernel debug documentation:
- eBPF
- dwarves: Linux package with DWARF/BTF tools (pahole) required for
CONFIG_DEBUG_INFO_BTF kernel option
Also fix the casing of "ebpf" to "eBPF" in the kernel README to match
the official naming convention.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Fixed a bug with the debug kernel build where common/ was repeated
after the common path variable, resulting in the debug
confs never being picked up.
This exposed a subsequent bug where the debug conf
was included in other builds, this is also fixed by creating a
separate directory for debug confs with one file at the moment,
debug.conf that contains debug configurations and bpf specific
configs.
To enable kernel builds (specifically for bpf) the dwarves package was added
to the kernel dockerfile for the pahole package.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Add the debug kernel to the kata tarball alongside the other kernels.
Also update the kernel README documentation to describe the new debug
kernel build process.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Adds a BPF section in the debug.conf kernel configuration options
to enable eBPF and BTF support for debug kernel builds.
Signed-off-by: Agam Dua <agam_dua@apple.com>
This fixes the test_dir variable in static-checks.sh so that
when a --repo-path is provided, the test_dir variable uses that
for the location instead of the GOPATH location.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
This supersedes https://github.com/kata-containers/kata-containers/pull/12622.
I replaced Zensical with mkdocs-materialx. Materialx is a fork of mkdocs-material
created after mkdocs-material was put into maintenance mode. We'll use this
platform until Zensical is more feature complete.
Added a few of the existing docs into the site to make a more user-friendly flow.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
With the upcoming GPU operator 26.3 relase and recent changes to
kata-containers, we adapt this documentation with notes on multi
GPU passthrough, support for TDX, changed deployment instructions,
and with various other minor improvements.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
For kata tarballs we eventually release to GitHub, check their
size against the GitHub size limit. With this, we fail in case of
an ongoing release process in 'CI | Publish Kata Containers payload'
instead of only later on in the 'Release Kata Containers' action,
and we fail during PR builds, avoiding this situation at all.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
It's a dev-dependency that doesn't seem to be used, so
remove it and resolve RUSTSEC-2025-0052
Assisted-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Bump tracing-subscriber to 0.3.20 to resolve RUSTSEC-2025-0055
- Switch deprecated `slog_info!` for `slog::info!`
Generated-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The old hunspell based spell-check was causing contributors
challenges and proving a barrier to doc updates. We've replaced
it with a cspell based-solution, so clean up the old approach.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>