rootfs.sh stops passing a host GOPATH bind-mount into the inner
osbuilder docker run. Pass INSTALL_IN_GOPATH=false so
ci/install_yq.sh installs yq under /usr/local/bin in the container.
scripts/lib.sh resolves yq after sourcing install_yq.sh and fails
clearly if yq is still missing.
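The "resolve, then fail clearly" behaviour can be sketched as follows. This is an illustrative Python sketch only, not the shell code in scripts/lib.sh; the function name `resolve_tool` and the `extra_dirs` parameter are assumptions made for the example.

```python
import shutil


def resolve_tool(name, extra_dirs=("/usr/local/bin",)):
    """Return the path to `name`, or raise an error that names the tool.

    Hypothetical helper: PATH is searched first, then each directory in
    `extra_dirs` (for example the in-container install location).
    """
    path = shutil.which(name)
    if path is None:
        for directory in extra_dirs:
            # shutil.which accepts an explicit search path string.
            path = shutil.which(name, path=directory)
            if path is not None:
                break
    if path is None:
        # Fail early with a message that says what is missing and where
        # it was looked for, instead of failing later with a vague error.
        raise FileNotFoundError(
            f"{name} not found on PATH or in {list(extra_dirs)}; "
            "install it before running the build"
        )
    return path
```

A caller would invoke `resolve_tool("yq")` once after the install step and use the returned path from then on.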
This avoids build issues on (managed) build hosts where HOME, for
example, resolves to /localhome/... while the image user record
still points at /home/... On those hosts the old flow could make
the daemon bind-mount a GOPATH path that does not exist or is not
writable on the host (e.g. mkdir or mount under /home/... denied).
Co-authored-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The kata-deploy DaemonSet pod had no Kubernetes health probes, so the
kubelet could not distinguish between "still installing" and "crashed",
and rolling updates would proceed to the next node before install
actually finished.
Add a lightweight HTTP health server (built on raw tokio TcpListener,
no new crate dependencies) that starts immediately in the install path:
  /healthz - liveness: returns 200 as soon as the server binds
  /readyz  - readiness: returns 503 while installing, 200 after
             install completes (artifacts extracted, CRI restarted,
             node labeled)
Wire the Helm chart with startup, liveness, and readiness probes
(all individually toggleable). The startup probe allows up to 10
minutes for install to complete before the liveness probe takes over.
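The endpoint semantics described above can be sketched as follows. This is a minimal Python sketch built on the standard library's http.server, not the tokio-based Rust server this change adds; `HealthState`, `make_handler` and `start_health_server` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Readiness flag shared between the installer and the HTTP handler."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        # Called once install has completed.
        self._ready.set()

    def is_ready(self):
        return self._ready.is_set()


def make_handler(state):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                status = 200  # alive as soon as the server is serving
            elif self.path == "/readyz":
                status = 200 if state.is_ready() else 503
            else:
                status = 404
            self.send_response(status)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def start_health_server(state, host="127.0.0.1", port=0):
    """Bind immediately and serve in a background thread."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The installer would call `start_health_server(state)` before doing any work and `state.mark_ready()` as its last step, so /readyz flips from 503 to 200 only after install has finished.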
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The codegen check ensures that generated files are up-to-date and
correspond to the tool versions used in CI. Requiring this check
prevents us from accidentally merging, e.g., proto changes without the
corresponding Rust/Go updates.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Apply the same test configs we use for runtime-go to the runtime-rs config.
These are:
- runtime.static_sandbox_resource_mgmt = true
- hypervisor.clh.valid_hypervisor_paths includes cloud-hypervisor-glibc
- hypervisor.clh.path = cloud-hypervisor-glibc
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
"Copy Fail" (CVE-2026-31431) is a high-severity local privilege escalation
(LPE) vulnerability found in the Linux kernel in April 2026. It affects all
major Linux distributions released since 2017, including those using Long
Term Support (LTS) kernels.
The bug allows an unprivileged user to gain root access, escape containers,
and modify the in-memory page cache reliably using a tiny 732-byte script.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Containerd 2.3.0 introduces config schema version 4 (see upstream
RELEASES.md and the version-4 server-plugin documentation). The default file
still uses the same split-CRI layout as version 3 (plugins under
io.containerd.cri.v1.runtime and io.containerd.cri.v1.images). Schema v4
mainly moves gRPC, TTRPC, debug, and metrics listener settings under
io.containerd.server.v1.*; kata-deploy does not edit those server tables except
for containerd log verbosity when DEBUG=true.
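The layout described above, where schema v4 keeps the v3 split-CRI plugin IDs, can be sketched as a small lookup. This is a hypothetical Python helper, not kata-deploy code; the version 2 plugin ID (`io.containerd.grpc.v1.cri`) is not mentioned in this message and is included only as the commonly documented containerd 1.x layout.

```python
def cri_plugin_ids(config_version):
    """Map a containerd config schema version to its CRI plugin IDs.

    Hypothetical helper for illustration only.
    """
    if config_version in (3, 4):
        # v4 keeps the split-CRI layout introduced with v3.
        return {
            "runtime": "io.containerd.cri.v1.runtime",
            "images": "io.containerd.cri.v1.images",
        }
    if config_version == 2:
        # Assumption: single CRI plugin table used by containerd 1.x.
        return {
            "runtime": "io.containerd.grpc.v1.cri",
            "images": "io.containerd.grpc.v1.cri",
        }
    raise ValueError(f"unsupported containerd config version: {config_version}")
```

Treating v3 and v4 in the same branch reflects the point of the message: only the server-level listener settings move in v4, not the CRI tables.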
Fixes: #12936
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that all of the tools except agent-ctl (still WIP) are in the
root workspace, switch the default to the root workspace and add
an exception for agent-ctl, as it's the odd one out.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
During the zizmor refactoring I changed the names of two jobs
to make all the architectures match. I forgot to update required_tests,
and because it was a workflow-only change the PR didn't check this,
so update them now.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The `runtime-rs` component of `build-checks.yaml` declared `rust`
as its only dependency, but the runtime-rs build pulls in
`prost-build v0.8.0` (via `ttrpc-codegen` -> `containerd-shim-protos`,
and via the in-tree `hypervisor` crate), and `prost-build`'s build
script needs a `protoc` binary at compile time.
This worked on x86_64 and aarch64 only because `prost-build v0.8.0`
ships bundled `protoc` binaries for those targets. On s390x (and
ppc64le, when the matrix gets there) there is no bundled binary,
so the build fails with:
    Failed to find the protoc binary. The PROTOC environment variable
    is not set, there is no bundled protoc for this platform, and
    protoc is not in the PATH
The reason this didn't show up in CI before is that `make test`
and `make check` for runtime-rs were wrapped in arch-specific
`ifeq` blocks in `src/runtime-rs/Makefile` that turned them into
no-ops on s390x/ppc64le/riscv64gc. The previous commit dropped
those gates so `make {test,check}` now actually run on every arch,
which exposes this latent CI gap.
Match what `agent`, `libs`, `agent-ctl`, `kata-ctl` and `genpolicy`
already declare and add `protobuf-compiler` to runtime-rs's needs.
The existing `Install protobuf-compiler` step in this workflow
already runs `sudo apt-get -y install protobuf-compiler`, which
the s390x/ppc64le runners support (those other components have
been using it on s390x for some time).
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The MLX5 Ethernet driver is useful well beyond the DPU/SmartNIC use case
(any guest sitting on top of a Mellanox/ConnectX NIC benefits from it),
yet the existing config fragment lived under dpu/ and was only pulled in
when the kernel was built with `-D nvidia`.
Promote it to a first-class common fragment so every Kata kernel gets
MLX5 Ethernet built in.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add the nvidia driver version to the artefact cache keys so that
a driver bump triggers image and initrd rebuilds.
Also rename the helper functions to follow a consistent
get_latest_nvidia_* naming convention.
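Why adding the driver version to the key forces a rebuild can be sketched as follows. This is an illustrative Python sketch, not the shell helpers in the repository; `cache_key`, the input names and the version strings are placeholders.

```python
import hashlib


def cache_key(component, inputs):
    """Derive a cache key from a component name and its versioned inputs.

    Hypothetical helper: every input that should invalidate the cached
    artefact must appear in `inputs`, otherwise a bump goes unnoticed.
    """
    digest = hashlib.sha256()
    for name in sorted(inputs):  # sorted so ordering does not change the key
        digest.update(f"{name}={inputs[name]}\n".encode())
    return f"{component}-{digest.hexdigest()[:16]}"


# Placeholder values; only the driver version differs between the two keys.
old_key = cache_key("rootfs-image", {"os": "x", "nvidia_driver": "1.0.0"})
new_key = cache_key("rootfs-image", {"os": "x", "nvidia_driver": "1.0.1"})
```

Because the driver version feeds the hash, `old_key` and `new_key` differ, so a lookup with the new key misses the cache and the image or initrd is rebuilt.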
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the kata-deploy Cargo.toml to use the
workspace-wide MSRV, so it's easy to track and bump
as and when necessary.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Address shellcheck warnings including proper variable quoting,
use of [[ ]] over [ ], declaring and assigning variables separately,
and adding appropriate shellcheck disable directives where needed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor