kata-deploy now unpacks individual component tarballs itself, so the
final `kata-static.tar.zst` no longer needs to be a merged filesystem
payload. Merging everything has two downsides for that flow:
- It pulls in everything kept on disk under build/, which previously
forced us to also drop agent/busybox/coco-guest-components/nydus
from the build set to keep them out of the final tarball.
- The merged tarball duplicates content kata-deploy will repack on
its own anyway.
Add a `passthrough` mode to kata-deploy-merge-builds.sh that, instead
of untarring each `kata-static-*.tar.zst` into a single filesystem
tree, copies the selected component tarballs into the final tarball
as-is. The existing `merge` mode remains the default to preserve the
non-kata-deploy install paths (e.g. `make install-tarball`).
Wire `nvgpu-tarball` to the new mode via `FINAL_TARBALL_MERGE_MODE=
passthrough`, paired with the existing `FINAL_TARBALL_INPUTS`
allowlist. This lets us keep agent/busybox/coco as build prereqs of
the GPU rootfs while shipping a final tarball that only contains the
NVIDIA-relevant components.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The Makefile passes $(MK_DIR)/../../../../versions.yaml — already an
absolute path — to kata-deploy-merge-builds.sh. The script then
unconditionally prepended ${PWD}/, producing a malformed path like:
/repo//repo/tools/.../local-build//../../../../versions.yaml
which made cp fail with "No such file or directory" at the merge-builds
step (the very last step of `make nvgpu-tarball`).
Only prepend ${PWD}/ when the input is relative — that preserves the
original fix for the pushd-changes-cwd issue (commit ae6e8d2b3) without
mangling absolute paths from Makefile callers.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
With `make all -j N` running multiple tarballs concurrently and silent
mode redirecting each build's stdio to its per-target log, a failing
target's "Failed to build: <name>, logs:" banner gets interleaved with
other in-flight jobs' output, making it hard to tell which target
failed.
Pass `--output-sync=target` to the recursive make so each sub-make's
output is buffered and emitted as one block when the target finishes,
keeping the failure banner contiguous with its log dump.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
build-qemu.sh runs in the per-target builddir (e.g.
build/qemu-tarball/builddir/), which persists across runs. If a previous
build left the cloned `qemu` tree behind (e.g. after an interrupted
build), the next run errors out with:
fatal: destination path 'qemu' already exists and is not an empty
directory.
Wipe `qemu` before cloning so the build is repeatable from a dirty
builddir.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
Parallel make jobs invoke kata-deploy-binaries-in-docker.sh concurrently
and collide on two shared paths:
ln: Already exists
mkdir: /home/$USER/.docker: File exists
Skip the symlink creation when the link is already in place. If a
parallel job wins the create race in the cold-start window, fall back to
re-checking that the link exists so a real ln failure (permission, disk
full, etc.) still propagates rather than being silently swallowed.
The `~/.docker` mkdir is guarded by a `[[ ! -d ]]` check that two
processes can pass simultaneously, after which one bare `mkdir` fails.
Switch to `mkdir -p` so the second invocation is a no-op.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
serial-targets now waits for the other BASE_TARBALLS items so the
inner rootfs assembly runs with DEPS= against already-built
artifacts. This also fixes a pre-existing race in the main flows
where the outer parallel and inner -j 1 makes could both build
kernel-tarball at the same time.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Leftover from #12954's rebase: the substantive sed-hack -> DEPS= change
landed on main, but the .PHONY declaration didn't make it. Add it so
the recipe always runs even if a stale `kata-artifacts` file exists in
CWD.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
containerd 2.2.0+ always imports /etc/containerd/conf.d/*.toml,
so write kata-deploy runtime config there directly, avoiding
modification of the main containerd config's imports array.
Signed-off-by: thebigbone <pacman@duck.com>
Build and publish the kata-deploy binary and CoCo guest-pull nydus
snapshotter as dedicated per-arch artifacts, then consume those tarballs
when assembling the kata-deploy image.
This avoids rebuilding those components in the payload image (which
would happen in serial) path and reduces overall CI build time.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The cached shim-v2 tarballs ship per-variant `root_hash_*.txt` files
embedded in the matching measured-rootfs image. Until now only
shim-v2-rust validated those hashes against the freshly built rootfs
images on a cache hit; shim-v2-go reused whatever was cached without
checking, even though its bundled configuration files contain the
`KERNELVERITYPARAMS_*` values baked in at build time.
When a PR changes the agent (and therefore the rootfs image and its
dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache
key stays the same and the stale tarball is reused. The resulting
guest cmdline carries a verity hash that no longer matches the new
rootfs image, so the VM panics very early in boot:
device-mapper: verity: 254:1: metadata block 0 is corrupted
erofs (device dm-0): cannot read erofs superblock
Kernel panic - not syncing: VFS: Unable to mount root fs ...
Generalize the shim-v2-rust cache validation so it also runs for
shim-v2-go, push the per-variant root-hash sidecar files for both
shims, and fall back to a full rebuild whenever the cached hash is
missing or differs from the image one.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The cached shim-v2 tarballs ship per-variant `root_hash_*.txt` files
embedded in the matching measured-rootfs image. Until now only
shim-v2-rust validated those hashes against the freshly built rootfs
images on a cache hit; shim-v2-go reused whatever was cached without
checking, even though its bundled configuration files contain the
`KERNELVERITYPARAMS_*` values baked in at build time.
When a PR changes the agent (and therefore the rootfs image and its
dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache
key stays the same and the stale tarball is reused. The resulting
guest cmdline carries a verity hash that no longer matches the new
rootfs image, so the VM panics very early in boot:
device-mapper: verity: 254:1: metadata block 0 is corrupted
erofs (device dm-0): cannot read erofs superblock
Kernel panic - not syncing: VFS: Unable to mount root fs ...
Generalize the shim-v2-rust cache validation so it also runs for
shim-v2-go, push the per-variant root-hash sidecar files for both
shims, and fall back to a full rebuild whenever the cached hash is
missing or differs from the image one.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Without the protocol in the URI, grpc-go defaults to the DNS resolver,
which results in an error for unix sockets (`name resolver error: produced
zero addresses`).
We also remove the `getAddressAndDialer(...)` and `dial(...)` functions, as
they are no longer necessary, grpc-go supports connecting to unix sockets
directly. This also removes the matching tests.
This also adds a `Makefile` and tweaks the Dockerfile to simplify building
the Docker image.
Fixes#12398
Signed-off-by: Florian Vichot <florian.vichot@gmail.com>
The `kata-deploy-merge-builds.sh` script blindly prepended `PWD` to the
`kata_versions_yaml_file` argument, assuming it was always a relative
path. However, the `Makefile` passes an absolute path using `$(MK_DIR)`.
This resulted in invalid double-concatenated paths like
`/workspace/...//workspace/...` which failed to copy.
Fix this by using `readlink -f` to safely resolve the path. This
correctly handles both relative and absolute paths, preventing path
corruption.
Signed-off-by: Huy Pham <huypham@google.com>
The retry loop added in efd468df3f still allows the install to declare
success while inside the kubelet's post-restart re-register window.
On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd
and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]`
every 2 s and returns on the first `True` observation it sees. By default
the kubelet only publishes node status every ~10 s, so that first `True`
is almost always the stale value from before the restart — the kubelet
hasn't actually finished restarting yet. `label_node_with_retry` then
applies the label, sleeps 1 s, reads back "true" (still stale, kubelet
still down), and returns Ok. Install completes, `/readyz` flips to 200,
helm releases its `--wait`, and the bats test starts — and only then
does the kubelet finish coming up, re-register the node, and clobber
the label with its cached set. The lifecycle test sees an empty
`katacontainers.io/kata-runtime` and fails:
# Node label katacontainers.io/kata-runtime:
not ok 1 Kata artifacts are present on host after install
A single-shot verification can't distinguish "still stale true" from
"truly stable true after kubelet re-register". Replace it with a
stability window: after (re)applying the label, require it to remain
at the expected value for STABILITY_CHECKS=6 consecutive observations
spaced CHECK_INTERVAL=2 s apart (≈ 12 s — comfortably more than the
kubelet's status-update period). If the value ever drifts inside the
window, re-apply and restart the stability counter. Bounded by
MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s
to install.
Also add a short polling loop to the test's own label assertion as
belt-and-suspenders for any leftover transient race, matching the
existing retry pattern used for the container-runtime version check.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The `install_cached_tarball_component` function in the binaries
packaging script contained syntax errors where it attempted to capture
the empty stdout of the `cleanup_and_fail` function inside a return
statement (e.g., `return "$(cleanup_and_fail ...)"`).
Since `cleanup_and_fail` only returns an exit status and produces no
stdout, this evaluated to `return ""`, which is invalid in bash and
causes the script to crash with `numeric argument required` instead of
returning the failure status.
Fix this by replacing the buggy inline returns with proper `if` blocks
that call `cleanup_and_fail` and explicitly return `1`.
Signed-off-by: Huy Pham <huypham@google.com>
During parallel builds of different kernel variants (e.g., generic,
debug, nvidia-gpu), the config generation script wrote to a shared
static path: `tools/packaging/kernel/configs/fragments/x86_64/.config`.
This caused critical race conditions where concurrent processes would
overwrite or delete the `.config` file while another process was reading
it, leading to sporadic build failures with "No such file or directory"
errors.
Resolve this by changing the temporary configuration path to be
build-specific, writing it inside the unique kernel build directory
(e.g., `kata-linux-.../.config.generated`). The final config is still
copied to `.config` in the kernel source tree as before, but the
intermediate merge process is now isolated.
Signed-off-by: Huy Pham <huypham@google.com>
On rke2/k3s a CRI restart also restarts the kubelet, which may briefly
re-register the node with its cached label set and clobber the
kata-runtime label that was just applied via the API.
Replace the single label_node call with a retry loop that verifies the
label value after setting it. If the label is missing or has the wrong
value, it is re-applied (up to 10 attempts with 2 s back-off). This
fixes a race condition that became more visible after the switch to
individual tarball extraction, which made install take slightly longer
and shifted the kubelet re-registration timing window.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add zstd and tar as Rust dependencies and rewrite the artifact
installation logic to extract only the component tarballs required by
the enabled runtime classes.
extract_component_tarballs reads shim-components.json to determine which
kata-static-<name>.tar.zst files are needed for the selected shims and
current architecture. Shared components (e.g. kernel, shim-v2-go) are
listed by multiple shims and must only be unpacked once per install run.
Deduplication is handled with an in-memory set passed through the call,
avoiding any risk of stale on-disk state surviving across pod restarts.
Within each tarball, opt/kata path prefixes are stripped and absolute
symlink / hard-link targets are rewritten to point at the resolved
installation directory, correctly handling MULTI_INSTALL_SUFFIX.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the Dockerfile to copy each kata-static-<name>.tar.zst directly
into the image alongside shim-components.json, replacing the old
artifact-extractor stage that unpacked a single merged tarball.
Update the publish-kata-deploy-payload and release CI workflows to
download individual per-component artifacts instead of waiting for a
merged tarball, and simplify kata-deploy-build-and-upload-payload.sh
accordingly. The kata-deploy image build is no longer blocked on the
merge step.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Introduces the human-maintained shim-components.json that maps each
runtime class to the list of kata-static-<name>.tar.zst component
tarballs it needs per architecture. This is the source of truth read
by the installer at deploy time to decide which tarballs to extract.
Key design choices encoded here:
- shim-v2-go vs shim-v2-rust: explicit per-shim, so a node running
only Rust shims never extracts the Go shim binary.
- virtiofsd and nydus are both listed for hypervisors that support
configurable shared_fs (we cannot know which the user will choose).
- fc/firecracker: no virtiofsd or nydus (devmapper only).
- remote: only the shim binary (no local hypervisor artifacts).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Split the monolithic shim-v2 build target into separate shim-v2-go and
shim-v2-rust targets in kata-deploy-binaries.sh, the local-build
Makefile, and the four architecture CI workflows.
The Go and Rust shims now each produce their own kata-static-<name>.tar.zst
artifact, allowing downstream consumers to select only the shim variant
they need. MEASURED_ROOTFS is set per-arch for the Rust job in CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Previously, the commit SHA tag was only added for specific components
(agent, agent-ctl) by setting artefact_tag in individual install
functions. This was inconsistent and error-prone.
Now, the HEAD commit SHA is always added as a tag for all builds in
the central tagging logic. This ensures:
- All components get tagged with the commit SHA
- The correct HEAD commit is used (not the last commit that modified
a specific path)
- Simpler, more maintainable code
The git command uses `git -C` to change to the repo directory before
running git log, which correctly returns the HEAD commit SHA regardless
of which files were modified in recent commits.
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The script was creating .cargo/config.toml but referencing .cargo/config
in the vendor_dir_list, causing tar to fail with 'Cannot stat' error.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
kata-deploy is a per-node infrastructure DaemonSet; if it gets evicted
under node memory/CPU pressure the node loses its Kata runtime until
the pod is rescheduled. Default to system-node-critical so the kubelet
evicts lower-priority workloads first.
The value is configurable via `priorityClassName` in values.yaml.
Fixes: #13068
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Moving agent-ctl into the root workspace moves the target
directory, so update this target to be in root, not src/tools
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The tests haven't been run at least since we moved to GHA,
so in the spirit of lean and mean, let clear them up
Fixes: #10957
Assisted-by IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This reverts commit edfb6f5716.
The NVIDIA non-TEE CI job has passed again over the last 5 nightly
runs after merging PRs #13007 and #13020.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Update CDH to a newer version and:
- adjust the NVIDIA root filesystem build to reflect the change from
using libcryptsetup to using the cryptsetup binary.
- adjust image-pull test cases to conduct parallel write operations
on the /dev/trusted_store backed guest image pull location since
issue #12721 has been solved on CDH side.
Fixes#12721
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Bump the go version to resolve CVEs:
- GO-2026-4918
- GO-2026-4971
- GO-2026-4976
- GO-2026-4977
- GO-2026-4980
- GO-2026-4981
- GO-2026-4982
- GO-2026-4986
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-by: IBM Bob
The DAX header (2 MiB of NVDIMM metadata + a duplicate MBR) is
unconditionally prepended to every image by set_dax_header(). NVIDIA
images use virtio-blk-pci with disable_image_nvdimm=true, so the
kernel reads MBR #1 directly and never touches the DAX metadata --
it is dead weight.
Add a SKIP_DAX_HEADER environment variable (default "no") that, when
set to "yes", skips the DAX header entirely:
- Removes the 2 MiB DAX overhead from image size calculations in
both the erofs and ext4 paths
- Skips the set_dax_header() call, avoiding compilation and
execution of the nsdax tool
- Passes the variable through to containerised builds
Enable SKIP_DAX_HEADER=yes for both install_image_nvidia_gpu() and
install_image_nvidia_gpu_confidential() in the build pipeline. All
other image builds are unaffected (default remains "no").
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>