18775 Commits

Author SHA1 Message Date
Fabiano Fidêncio
947c7ff3b3 ci: Remove standalone kernel-modules-images build target
Module images are now built as part of the kernel-tarball target
via build-kernel.sh build-modules-images, so the separate CI
matrix entry is no longer needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:21:27 +02:00
Fabiano Fidêncio
68b80c998f docs: Add documentation for kernel modules images
Document the kernel_modules_images feature: building modules
volumes, TOML and Helm chart configuration, agent behavior,
and security considerations for both confidential and
non-confidential deployments.

Prominently warns that custom modules will not work with
official Kata kernel releases because the KBUILD_SIGN_PIN
used to sign modules is not public, requiring users to
rebuild the kernel with their own signing key.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:21:27 +02:00
Fabiano Fidêncio
76d83ad5f7 kernel: Bump kata_config_version
We need to do so as we're changing kernel build scripts.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:17 +02:00
Fabiano Fidêncio
848c2f95e4 kernel: Build combined kata-modules-all.img
In addition to per-set module images (kata-modules-mlx5.img,
kata-modules-ntfs.img), build a combined image containing all
module sets. This reduces the number of virtio-blk devices and
dm-mod.create kernel command line entries needed when a user
wants all available modules.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:16 +02:00
Fabiano Fidêncio
79971c7c14 kernel: Add NTFS3 modules image support
Add kernel config fragment for the NTFS3 filesystem driver as a
loadable module and register it in the orchestrator script so that
a kata-modules-ntfs.img disk image is produced alongside the MLNX
image in the same CI build.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:15 +02:00
Fabiano Fidêncio
51ffafa0ac kernel: Add MLX5 modules image build infrastructure
Add config fragment, build script, and CI integration for building
Mellanox MLX5/InfiniBand kernel modules as a standalone disk image.

The orchestrator script (build-kernel-modules-images.sh) builds the
kernel with extra module config fragments, runs modules_install,
filters modules by subsystem into per-set staging trees, and
packages each into its own disk image using build-modules-volume.sh.

Since these modules are built within the Kata CI using the same
KBUILD_SIGN_PIN, they are signed and loadable on the official
released Kata kernel.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:14 +02:00
Fabiano Fidêncio
2273a32d2c kata-deploy: Add kernel_modules_images support
Allow deploying kernel modules images via the Helm chart. Users
specify a list of images with paths and optional verity params
in values.yaml. These are rendered as a ConfigMap, mounted into
the kata-deploy pod, and used to generate a TOML drop-in with
[[hypervisor.<name>.kernel_modules_images]] array of tables.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:12 +02:00
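As a rough illustration of the drop-in described in 2273a32d2c, the sketch below renders a list of images into `[[hypervisor.<name>.kernel_modules_images]]` tables. The function name `render_dropin` and the dict layout of the input are hypothetical; the actual kata-deploy code and the values.yaml schema are not shown in the commit message. Treating a missing `verity_params` as an empty string is also an assumption, and no TOML escaping is attempted.

```python
def render_dropin(hypervisor, images):
    """Render one [[hypervisor.<name>.kernel_modules_images]] table per image.

    `images` is a list of dicts with a "path" key and an optional
    "verity_params" key (hypothetical input shape).
    """
    blocks = []
    for image in images:
        lines = [
            f"[[hypervisor.{hypervisor}.kernel_modules_images]]",
            f'path = "{image["path"]}"',
            # Assumption: a missing value is written out as an empty string.
            f'verity_params = "{image.get("verity_params", "")}"',
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(render_dropin("qemu", [
    {"path": "/path/to/modules-volume-1.img"},
    {"path": "/path/to/modules-volume-2.img", "verity_params": "root_hash=..."},
]))
```

Emitting one array-of-tables entry per image keeps the drop-in additive: each entry is self-contained, so images can be added or removed without touching the others.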
Fabiano Fidêncio
eb7f3657ba agent: Mount storages before loading kernel modules
Reorder create_sandbox to call add_storages before
load_kernel_module so that modules on separate volumes are
available when modprobe runs.

After mounting, detect any storages targeting
/lib/modules/kata-modules-* and if present, write a
/etc/depmod.d/kata-modules.conf with search directives for
those directories and run depmod -a to rebuild the module
dependency database.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:11 +02:00
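The ordering introduced by eb7f3657ba can be sketched as follows: storages are mounted first, a depmod configuration is written only if a modules volume is present, and modules are loaded last. This is a simplified model in Python, not the agent's Rust code. The helper names are hypothetical, and the exact contents of `kata-modules.conf` are an assumption (a single `search` line listing the detected directories); the commit message only says that "search directives" are written.

```python
MODULES_PREFIX = "/lib/modules/kata-modules-"


def depmod_conf(mount_points):
    """Return kata-modules.conf contents, or None when no modules volume is mounted."""
    module_dirs = [m for m in mount_points if m.startswith(MODULES_PREFIX)]
    if not module_dirs:
        return None
    # Assumption: one `search` directive naming every detected directory.
    return "search " + " ".join(module_dirs) + "\n"


def sandbox_steps(mount_points, modules):
    """List the steps in the order described by the commit message."""
    steps = [f"mount {m}" for m in mount_points]
    if depmod_conf(mount_points) is not None:
        steps.append("write /etc/depmod.d/kata-modules.conf")
        steps.append("depmod -a")
    # Modules are loaded only after the volumes are mounted and indexed.
    steps += [f"modprobe {name}" for name in modules]
    return steps


for step in sandbox_steps(["/lib/modules/kata-modules-0"], ["mlx5_core"]):
    print(step)
```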
Fabiano Fidêncio
886946fc60 rootfs: Add kmod to guest rootfs package lists
The kernel modules images feature requires modprobe and depmod
to be available inside the guest VM. Add the kmod package to
the Ubuntu, Alpine, and CentOS rootfs package lists.

Debian inherits from Ubuntu's config so it picks up kmod
automatically. The NVIDIA rootfs already installs kmod
separately in nvidia_chroot.sh.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:10 +02:00
Fabiano Fidêncio
19cc1eb7f8 runtime-rs: Add kernel_modules_images support
Add support for attaching multiple kernel modules disk images in
the Rust runtime, mirroring the Go runtime implementation.

Each configured image is cold-plugged as a read-only block device
and a Storage entry is sent to the agent to mount it at
/lib/modules/kata-modules-<N>.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:09 +02:00
Fabiano Fidêncio
f5551a5bdd runtime: Add kernel_modules_images support
Add support for attaching multiple kernel modules disk images to
the guest VM as additional block devices. This enables loading
out-of-tree kernel modules from separate, independently managed
volumes without modifying the dm-verity measured rootfs.

Configuration uses TOML array of tables:

  [[hypervisor.qemu.kernel_modules_images]]
  path = "/path/to/modules-volume-1.img"
  verity_params = ""

  [[hypervisor.qemu.kernel_modules_images]]
  path = "/path/to/modules-volume-2.img"
  verity_params = "root_hash=..."

Each image is cold-plugged as a virtio-blk device (vdb, vdc, ...)
and a Storage entry is sent to the agent to mount it read-only at
/lib/modules/kata-modules-<N>.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:08 +02:00
Fabiano Fidêncio
8516029270 kernel: Add script to build modules volume with dm-verity
Add build-modules-volume.sh to package signed kernel modules
into a standalone ext4 disk image that can be attached to a
kata guest VM as a secondary block device.

This allows loading out-of-tree modules without modifying the
dm-verity measured rootfs. The rootfs image and its root hash
remain unchanged.

The script optionally supports dm-verity on the modules volume
itself (-V flag), providing defense-in-depth alongside kernel
module signing.

Security risks documented in the script header:
- Without dm-verity, the volume relies solely on kernel module
  signing (CONFIG_MODULE_SIG_FORCE) for integrity.
- With dm-verity, the hash must be verified during attestation
  to provide actual security benefit.
- Host-side file permissions on the volume image must prevent
  unauthorized modification.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:06 +02:00
Fabiano Fidêncio
3686c4a20a kernel: Remove redundant CONFIG_MODULES from NVIDIA GPU fragments
Remove CONFIG_MODULES, CONFIG_MODULE_UNLOAD, and CONFIG_MODULE_SIG
from the NVIDIA GPU config fragments (nvidia.x86_64.conf.in and
nvidia.arm64.conf.in) since these are now provided by the shared
common/modules/modules.conf and common/signing/module_signing.conf
fragments, which are always included for confidential builds.

NVIDIA GPU builds always use -x (confidential), so these options
were redundant. CONFIG_FW_LOADER is kept as it is specific to
GPU firmware loading needs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:04 +02:00
Fabiano Fidêncio
bde96141df kernel: Enable module loading and signing for confidential builds
For confidential builds (-x), always include modules/modules.conf
(CONFIG_MODULES=y, CONFIG_MODULE_UNLOAD=y) and
signing/module_signing.conf (CONFIG_MODULE_SIG_FORCE=y, etc.).

This enables two important capabilities for confidential guests:

1. Loadable module support: allows out-of-tree kernel modules
   to be loaded from separate modules volume images without
   modifying the dm-verity measured rootfs.

2. Module signature enforcement: the kernel rejects any unsigned
   or wrongly-signed module, maintaining the trust chain from
   the attested kernel to loaded modules.

Previously, module signing was only included when KBUILD_SIGN_PIN
was set. For non-confidential builds, that behavior is preserved.
For confidential builds, module signing is now always enabled
since it is essential for the security model.

Security notes:
- CONFIG_MODULE_SIG_FORCE=y ensures the kernel rejects unsigned
  modules, preventing arbitrary code execution in the guest.
- The signing key is generated during kernel build. Users need
  this key (protected by KBUILD_SIGN_PIN) to sign out-of-tree
  modules.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:16:00 +02:00
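The selection rule described in bde96141df can be summarized in a few lines. The sketch below is a Python model of the decision, not the logic of build-kernel.sh itself; the function name `extra_fragments` and its boolean parameters are hypothetical stand-ins for the `-x` flag and for whether `KBUILD_SIGN_PIN` is set.

```python
def extra_fragments(confidential, sign_pin_set):
    """Return the conditional config fragments to include for a kernel build."""
    fragments = []
    if confidential:
        # Confidential builds always get loadable module support.
        fragments.append("common/modules/modules.conf")
    if confidential or sign_pin_set:
        # Signing is forced for confidential builds; otherwise it still
        # depends on KBUILD_SIGN_PIN, as before this change.
        fragments.append("common/signing/module_signing.conf")
    return fragments


for confidential in (False, True):
    for sign_pin_set in (False, True):
        print(confidential, sign_pin_set, extra_fragments(confidential, sign_pin_set))
```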
Fabiano Fidêncio
a6ebcc2d38 kernel: Add config fragment for module loading
Add a new conditional kernel config fragment in a subdirectory
(following the pattern of signing/ and confidential_containers/)
so it is not auto-included by the common/*.conf wildcard:

- common/modules/modules.conf: Enables CONFIG_MODULES and
  CONFIG_MODULE_UNLOAD for out-of-tree kernel module support.
  This is required for loading user-compiled modules delivered
  via separate modules volume images.

This fragment will be explicitly included by build-kernel.sh
for confidential builds.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:15:59 +02:00
Fabiano Fidêncio
f2c8f66dcd kata-deploy: Fix cleanup_and_fail returning non-numeric value
cleanup_and_fail() prints nothing to stdout and returns 1. The
callers used `return "$(cleanup_and_fail ...)"` which expands to
`return ""`, causing bash to error with "numeric argument required".

Replace the command substitution with a compound command that calls
the cleanup function and propagates its exit code via `$?`.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-27 07:15:57 +02:00
Steve Horsman
63e50dd946 Merge pull request #12817 from burgerdev/regorus-bump
genpolicy: update regorus to 0.9.1
2026-04-26 13:58:40 +01:00
Fabiano Fidêncio
120d895d60 Merge pull request #12918 from mythi/no-ita
tests: align qemu-tdx kbs tests to use Trustee AS
2026-04-26 13:13:59 +02:00
Fabiano Fidêncio
74d9d043f0 agent: raise regorus policy length limits
regorus 0.9.0 introduced a hard, per-engine ceiling on parsed-policy
size (1024 columns / 1 MiB / 20 000 lines, see lexer.rs:30 in
microsoft/regorus). The 1024-column cap rejects realistic policies
emitted by `genpolicy`: the `NVIDIA_REQUIRE_CUDA` environment variable
on `nvcr.io/nvidia/k8s/cuda-sample` is roughly 1.3 KiB on a single line,
so the agent's `set_policy()` returns an error, the agent (PID 1) exits,
the guest kernel reboots, and the runtime eventually times out
connecting to the agent's vsock.

regorus PR #624 ("feat: make policy length limits configurable per
engine") adds `Engine::set_policy_length_config`, but it has not been
released yet -- the latest published version is still 0.9.1, which
predates that change.

Pin `regorus` to the upstream commit that includes #624 and call the
new setter from `AgentPolicy::new_engine()` with values that comfortably
fit any policy we expect to evaluate (64 KiB per line, 16 MiB per file,
200 000 lines) while still rejecting pathological/minified input. Once
a regorus release > 0.9.1 ships with #624, the dependency can be moved
back to crates.io.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-26 10:18:26 +02:00
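The effect of the limits discussed in 74d9d043f0 can be illustrated with a small checker. This is a Python model, not the regorus lexer: the class and function names are hypothetical, and counting columns as characters is an assumption. The numbers are the ones quoted in the commit message, and the sample policy is a stand-in with a single line of roughly 1.3 KiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyLimits:
    max_col: int
    max_file_bytes: int
    max_lines: int


# Values quoted in the commit message.
DEFAULT_LIMITS = PolicyLimits(1024, 1 * 1024 * 1024, 20_000)
RAISED_LIMITS = PolicyLimits(64 * 1024, 16 * 1024 * 1024, 200_000)


def violations(policy, limits):
    """Return the list of limits that `policy` exceeds."""
    problems = []
    if len(policy.encode("utf-8")) > limits.max_file_bytes:
        problems.append("file too large")
    lines = policy.splitlines()
    if len(lines) > limits.max_lines:
        problems.append("too many lines")
    # Assumption: columns are counted in characters.
    if any(len(line) > limits.max_col for line in lines):
        problems.append("line too long")
    return problems


# Stand-in for a policy carrying a long environment variable on one line.
policy = "package agent_policy\n" + "NVIDIA_REQUIRE_CUDA=" + "x" * 1300 + "\n"
print(violations(policy, DEFAULT_LIMITS))  # ['line too long']
print(violations(policy, RAISED_LIMITS))   # []
```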
Markus Rudy
c8fe6a60d0 genpolicy: update regorus to 0.9.1
The version we used before was released in 2024; it's about time to use
a newer version. The new version of the crate comes with a license,
which addresses a `cargo deny` finding.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-26 10:18:26 +02:00
Fabiano Fidêncio
815db4a1df Merge pull request #12920 from zvonkok/driver-bump
cuda: Bump Driver Version
2026-04-26 00:00:00 +02:00
Mikko Ylinen
9cccfb5cb5 tests: align qemu-tdx kbs tests to use Trustee AS
There is no need to deviate from how other CoCo targets use Trustee, and
aligning enables us to add more tests (e.g., RVPS) that the ITA Trustee
implementation does not support.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-04-25 22:53:15 +02:00
Fabiano Fidêncio
749d4713e8 Merge pull request #12897 from kata-containers/dependabot/cargo/src/tools/trace-forwarder/rand-0.8.6
build(deps): bump rand from 0.8.5 to 0.8.6 in /src/tools/trace-forwarder
2026-04-25 22:49:59 +02:00
Steve Horsman
fc359d2140 Merge pull request #12901 from kata-containers/dependabot/cargo/openssl-0.10.78
build(deps): bump openssl from 0.10.76 to 0.10.78
2026-04-25 20:59:51 +01:00
Zvonko Kaiser
150e3ab4b8 cuda: Bump Driver Version
HGX B300 systems need the 595 driver branch, so bump
the guest fs driver to support those systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-04-25 19:28:31 +02:00
Fabiano Fidêncio
28d9043d4c build: Add driver version to artefact cache
Add the nvidia driver version to the artefact cache keys so that
a driver bump triggers image and initrd rebuilds.

Also rename the helper functions to follow a consistent
get_latest_nvidia_* naming convention.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-25 19:28:31 +02:00
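The idea behind 28d9043d4c, that a cache key must cover every input that should trigger a rebuild, can be sketched as below. This is a generic Python illustration, not the project's build scripts: the function `cache_key`, the component names, and the sample values are all hypothetical.

```python
import hashlib


def cache_key(components):
    """Derive a cache key from every input that should trigger a rebuild."""
    # Sort so the key does not depend on dict insertion order.
    material = "\n".join(f"{name}={components[name]}" for name in sorted(components))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


base = {"rootfs": "example-rootfs", "agent": "example-agent-rev"}  # hypothetical inputs
before = cache_key({**base, "nvidia_driver": "previous-branch"})
after = cache_key({**base, "nvidia_driver": "595"})

# A driver bump changes the key, so the cached image is not reused.
print(before != after)
```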
Fabiano Fidêncio
b3ed669d16 Merge pull request #12913 from pmores/fix-exec
runtime-rs: fix exec when selinux is disabled on guest
2026-04-25 17:34:46 +02:00
Fabiano Fidêncio
3d94620df5 Merge pull request #12900 from kata-containers/dependabot/cargo/src/tools/kata-ctl/openssl-0.10.78
build(deps): bump openssl from 0.10.73 to 0.10.78 in /src/tools/kata-ctl
2026-04-25 17:13:01 +02:00
Steve Horsman
db51842229 Merge pull request #12923 from stevenhorsman/bump-webpki-to-0.103.13
versions: Update rustls-webpki to 0.103.13
2026-04-25 16:09:47 +01:00
Fabiano Fidêncio
0a4fb4f11b Merge pull request #12891 from fidencio/topic/networking-handle-device-type-interfaces
runtimes: network: handle "device" type interfaces (mlx5 SFs)
2026-04-25 16:46:37 +02:00
dependabot[bot]
151a797fc0 build(deps): bump openssl from 0.10.76 to 0.10.78
Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.76 to 0.10.78.
- [Release notes](https://github.com/rust-openssl/rust-openssl/releases)
- [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.76...openssl-v0.10.78)

---
updated-dependencies:
- dependency-name: openssl
  dependency-version: 0.10.78
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-25 10:28:48 +00:00
dependabot[bot]
365f6c1efa build(deps): bump openssl from 0.10.73 to 0.10.78 in /src/tools/kata-ctl
Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.73 to 0.10.78.
- [Release notes](https://github.com/rust-openssl/rust-openssl/releases)
- [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.73...openssl-v0.10.78)

---
updated-dependencies:
- dependency-name: openssl
  dependency-version: 0.10.78
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-25 10:27:45 +00:00
dependabot[bot]
9a88f4f8cf build(deps): bump rand from 0.8.5 to 0.8.6 in /src/tools/trace-forwarder
Bumps [rand](https://github.com/rust-random/rand) from 0.8.5 to 0.8.6.
- [Release notes](https://github.com/rust-random/rand/releases)
- [Changelog](https://github.com/rust-random/rand/blob/0.8.6/CHANGELOG.md)
- [Commits](https://github.com/rust-random/rand/compare/0.8.5...0.8.6)

---
updated-dependencies:
- dependency-name: rand
  dependency-version: 0.8.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-25 10:27:32 +00:00
Pavel Mores
d3f56cd3a6 runtime-rs: remove process selinux label on exec if disable_guest_selinux
Without this commit, any attempt to exec a command in a container fails
if SELinux is disabled in the guest but an SELinux label is given for
the new process. That happens pretty much any time SELinux is enabled
on the host (and the container is not privileged).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-04-25 11:27:15 +01:00
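The behavior described in d3f56cd3a6 amounts to dropping the process label before the exec request reaches a guest without SELinux. The sketch below models that in Python, not in the runtime-rs Rust code. The function name `prepare_exec_process` is hypothetical; `selinuxLabel` is the field name used by the OCI runtime spec for a process, and using it here is an assumption about the shape of the data.

```python
def prepare_exec_process(process, disable_guest_selinux):
    """Return the process spec to send to the guest for an exec request."""
    if not disable_guest_selinux:
        return process
    # Copy so the caller's spec is left untouched, then drop the label.
    cleaned = dict(process)
    cleaned.pop("selinuxLabel", None)
    return cleaned


spec = {"args": ["sh"], "selinuxLabel": "system_u:system_r:container_t:s0"}
print(prepare_exec_process(spec, disable_guest_selinux=True))
print(prepare_exec_process(spec, disable_guest_selinux=False))
```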
Pavel Mores
1390ad650b runtime-rs: factor getting disable_guest_selinux value out to own function
We'll need to get the `disable_guest_selinux` value in the exec handler, too.
This will allow us to avoid duplicating the get.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-04-25 11:27:15 +01:00
stevenhorsman
d6df75853b versions: Update rustls-webpki to 0.103.13
Simple bump to fix CVE GHSA-82j2-j2ch-gfr8:
Denial of service via panic on malformed CRL BIT STRING

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-25 11:27:02 +01:00
Fabiano Fidêncio
966e9b7f80 agent: skip non-PCI addresses in PCIDEVICE env vars
Device plugins may set PCIDEVICE_* environment variables with
non-PCI identifiers (e.g. "mlx5_core.sf.10" for mlx5 Scalable
Functions). The update_env_pci() function assumed all values were
PCI BDF addresses and failed to parse them, causing container
creation to fail with:

  "PCI address mlx5_core.sf.10 should have the format DDDD:BB:SS.F"

Skip PCIDEVICE_* entries whose values don't parse as PCI addresses,
leaving them untouched for the workload. The corresponding _INFO
variable is also left as-is since no mapping is collected.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-25 12:26:20 +02:00
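The skip logic from 966e9b7f80 can be illustrated with a small filter over environment variables. This is a Python sketch, not the agent's `update_env_pci()` implementation: the function name `split_pcidevice_env` is hypothetical, the regular expression is one reading of the `DDDD:BB:SS.F` format quoted in the error message, and treating values as comma-separated lists is an assumption.

```python
import re

# DDDD:BB:SS.F with hexadecimal domain, bus and slot, and a function of 0-7.
PCI_BDF = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def split_pcidevice_env(env):
    """Split PCIDEVICE_* variables into PCI-addressed ones and skipped ones."""
    pci, skipped = {}, []
    for name, value in env.items():
        if not name.startswith("PCIDEVICE_") or name.endswith("_INFO"):
            continue
        # Assumption: a value may hold several comma-separated addresses.
        addresses = value.split(",")
        if all(PCI_BDF.match(addr) for addr in addresses):
            pci[name] = addresses
        else:
            # Non-PCI identifiers are left untouched for the workload.
            skipped.append(name)
    return pci, skipped


env = {
    "PCIDEVICE_EXAMPLE_NIC": "0000:01:00.0",
    "PCIDEVICE_EXAMPLE_SF": "mlx5_core.sf.10",
    "PATH": "/usr/bin",
}
print(split_pcidevice_env(env))
```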
Fabiano Fidêncio
8c3a0e692b runtime-rs: network: handle "device" type interfaces (mlx5 SFs)
Same fix as the Go runtime: interfaces whose drivers do not register
a specific netlink kind (e.g. mlx5 Scalable Functions) are reported
with the generic type "device", which is not handled by the endpoint
creation match, causing sandbox creation to fail with:

  "unsupported link type: device"

Add "device" as an alternative pattern alongside "veth" so these
interfaces are connected through a TAP + TC-filter bridge.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-25 12:26:20 +02:00
Fabiano Fidêncio
6436922f5b runtime: network: handle "device" type interfaces (mlx5 SFs)
Interfaces whose drivers do not register a specific netlink kind
(e.g. mlx5 Scalable Functions) are reported with the generic type
"device". The endpoint creation code did not handle this type,
causing sandbox creation to fail with:

  "Unsupported network interface: device"

This is particularly visible on arm64 with Mellanox ConnectX NICs
using Scalable Functions, where the ethtool BusInfo returns a
non-PCI identifier (e.g. "mlx5_core.sf.4") so isPhysicalIface()
cannot classify the interface as physical either.

Handle "device" type interfaces the same way as veth endpoints,
connecting them through a TAP + TC-filter bridge.

Additionally, relax getLinkForEndpoint() for VethEndpoint so it
accepts the concrete link type returned by the kernel instead of
asserting *netlink.Veth. A "device" type interface wrapped in a
VethEndpoint returns *netlink.Device from LinkByName(), which
would fail the strict type assertion. All callers only need
link.Attrs(), so accepting any link type is safe.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-25 12:26:20 +02:00
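The dispatch change in 6436922f5b (and its runtime-rs counterpart 8c3a0e692b) boils down to accepting "device" wherever "veth" was accepted. The sketch below is a Python model of that dispatch, not the Go or Rust code: the function name and return value are hypothetical, and the other link types the runtimes handle are omitted.

```python
# Link types that are connected through a TAP + TC-filter bridge.
TAP_BRIDGED_TYPES = {"veth", "device"}


def endpoint_kind(link_type):
    """Pick how an interface is attached, based on its reported netlink type."""
    if link_type in TAP_BRIDGED_TYPES:
        # "device" is the generic type reported when a driver registers no
        # specific netlink kind, e.g. mlx5 Scalable Functions.
        return "veth-style endpoint"
    raise ValueError(f"Unsupported network interface: {link_type}")


print(endpoint_kind("veth"))
print(endpoint_kind("device"))
```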
Steve Horsman
4b2d529a34 Merge pull request #12924 from fidencio/topic/temp-skip-smb-tests
ci: k8s: temporarily remove smb tests
2026-04-25 11:25:49 +01:00
Fabiano Fidêncio
df68536cd6 ci: Skip tests not working with k8s 1.36.0
At first we thought this only happened with AKS, but it seems to be a
change in k8s 1.36.0, as the tests have now started failing outside of
AKS as well.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2026-04-25 08:56:42 +02:00
Fabiano Fidêncio
e6c6aad7af ci: k8s: temporarily remove smb tests
All CI jobs are failing on these tests. To avoid blocking upstream
while giving the developers enough time to fix them properly, let's
not execute the tests for now.

This commit should be reverted once a fix is proposed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 21:13:23 +02:00
Fabiano Fidêncio
e0927e0e0c Merge pull request #12846 from RainaYL/rainax/split_irqchip_pr
dragonball: Implement userspace IOAPIC to enable split irqchip
2026-04-24 19:07:45 +02:00
Aurélien Bombo
15296fc9fe Merge pull request #12374 from microsoft/cameronbaird/add-cifs
kernel: add required configs for CIFS support
2026-04-24 10:42:09 -05:00
Steve Horsman
1cab92139c Merge pull request #12501 from ANJANA-A-R-K/vuln-fix
kata-agent: Bump serde-enum-str to v0.5.0
2026-04-24 15:03:45 +01:00
Fabiano Fidêncio
3505576a98 Merge pull request #12912 from fidencio/topic/runtime-rs-qemu-as-default
runtime-rs: Set QEMU as the default hypervisor
2026-04-24 13:37:35 +02:00
Greg Kurz
de91eda11b Merge pull request #12890 from fidencio/topic/shell-check
shell check: Let the bot fix those issues
2026-04-24 12:41:33 +02:00
Anjana A R K
d2e0e277cc kata-agent: Bump serde-enum-str to v0.5.0
Upgraded serde-enum-str to v0.5.0, which bumps serde-attributes to version 0.3.0.

Signed-off-by: Anjana A R K <anjana.a.r.k1@ibm.com>
2026-04-24 15:57:59 +05:30
Fabiano Fidêncio
785c2ca981 Merge pull request #12911 from fidencio/topic/ci-only-run-arm64-tests-on-nightly
ci: Only run arm64 k8s tests on nightly builds
2026-04-24 10:19:34 +02:00
Fabiano Fidêncio
12bb497ce2 runtime-rs: Set QEMU as the default hypervisor
Dragonball is only supported on x86_64 and aarch64, so using it as the
default hypervisor means architectures like s390x, powerpc64le, and
riscv64gc have no working default. Switch to QEMU, which is available
across all supported architectures.

Dragonball is still compiled as a feature on x86_64 and aarch64 via
USE_BUILTIN_DB, and users can still override the default with
HYPERVISOR=dragonball.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 09:42:10 +02:00