Commit Graph

18452 Commits

Author SHA1 Message Date
Alex Lyn
2eccf08a76 kata-sys-util: Add PCI helpers for VFIO cold-plug paths
The VFIO cold-plug path needs to resolve a PCI device's sysfs address
from its /dev/vfio/ group or iommufd cdev node. Extend the PCI helpers
in kata-sys-util to support this: add a function that walks
/sys/bus/pci/devices to find a device by its IOMMU group, and expose the
guest BDF that the QEMU command line will reference.

These helpers are consumed by the runtime-rs hypervisor crate when
building VFIO device descriptors for the QEMU command line.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-11 19:08:29 +02:00
Alex Lyn
8493d73507 kata-types: Add pod_resource_api_sock configuration for GPU cold-plug
The Go runtime already exposes a [runtime] pod_resource_api_sock option
that tells the shim where to find the kubelet Pod Resources API socket.
The runtime-rs VFIO cold-plug code needs the same setting so it can
query assigned GPU devices before the VM starts.

Add the field to RuntimeConfig and wire it through deserialization so
that configuration-*.toml files can set it.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-11 19:08:29 +02:00
Alex Lyn
c12bf20dea pod-resources-rs: Add kubelet Pod Resources API client
Add a gRPC client crate that speaks the kubelet PodResourcesLister
service (v1). The runtime-rs VFIO cold-plug path needs this to discover
which GPU devices the kubelet has assigned to a pod so they can be
passed through to the guest before the VM boots.

The crate is intentionally kept minimal: it wraps the upstream
pod_resources.proto, exposes a Unix-domain-socket client, and
re-exports the generated types.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-11 18:59:49 +02:00
Fabiano Fidêncio
bd6377a038 Merge pull request #12614 from manuelh-dev/mahuber/image-signing-nim
tests: nvidia: Enforce image signing for NIM test
2026-04-11 14:48:04 +02:00
Fabiano Fidêncio
5eb7844183 Merge pull request #12430 from stevenhorsman/cargo-deny-static-checks
static-checks: Rework cargo deny check
2026-04-11 12:05:53 +02:00
stevenhorsman
8be3a24112 ci: Update cargo-deny in gatekeeper
Update the name and move it to the static checks as we don't
need to ensure it's running for none code changes.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
9448988783 workflow: Update cargo deny check
The cargo deny generated action doesn't seem to work
and seems unnecessarily complex, so try using
EmbarkStudios/cargo-deny-action instead

Fixes: #11218
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
a0410e0d5c static-checks: Update cargo deny config
The previous config is not valid, so update it based on information
from https://embarkstudios.github.io/cargo-deny/checks/cfg.html

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
a32c6fd9ff mem-agent: Add package metadata
Make the authors, edition and license be inherited from the
workspace

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
5bcc006447 runtime-rs: Add missing license
The ch-config crate was missing a license

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
Manuel Huber
7daeb78b67 tests: nvidia: Enforce image signing for NIM test
Validate container image signatures for the NIM test using NVIDIA's
public signing key.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-11 09:22:50 +02:00
Fabiano Fidêncio
3ce3644c3c Merge pull request #12807 from PiotrProkop/blk-sector-rust
runtime-rs: allow specifying logical/physical sector size for block devices
2026-04-11 00:42:45 +02:00
Fabiano Fidêncio
6f3c11aec4 Merge pull request #12808 from fidencio/topic/agent-allow-configuring-launch-process-timeout
agent: Make launch_process_timeout configurable
2026-04-11 00:36:01 +02:00
Fabiano Fidêncio
d4a042a155 Merge pull request #12813 from fitzthum/bump-gc-ma-sigs
Bump guest components to pickup additional signature support
2026-04-10 23:57:19 +02:00
Fabiano Fidêncio
78fa4c88e2 Merge pull request #12814 from fidencio/topic/nvidia-always-do-vcpu-pinning
runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs
2026-04-10 23:47:44 +02:00
Fabiano Fidêncio
7244389ad4 runtime: Set enable_vcpus_pinning = true for NVIDIA configs
So we can have a better performance by default.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-10 16:41:34 +02:00
Fabiano Fidêncio
1d77c4e60f Merge pull request #12752 from LizZhang315/add-overheadEnabled
helm: add overheadEnabled switch for runtimeclass
2026-04-10 16:40:42 +02:00
Tobin Feldman-Fitzthum
ff26a6b876 versions: update image-rs to pickup signature fixes
The new version of image-rs supports more types of signed images. First,
we added supported for a few more key types. Second, we added support
for multi-arch images where the manifest digest is signed but the
individual arch manifest is not. These images are relatively common, so
let's pickup the fix asap.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-10 06:54:58 -07:00
Tobin Feldman-Fitzthum
2588a0e5a5 agent-ctl: bump image-rs version
I don't think agent-ctl will benefit from the new image-rs features, but
let's update it to be complete.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-10 06:52:53 -07:00
Fabiano Fidêncio
e8f34a2b26 agent: Update protocol
This is not related to this PR, but rather to #12734, which ended up not
running the `make src/agent generate-protocols`.

While here, let's also fix it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-10 14:47:01 +02:00
Fabiano Fidêncio
36a2d8e7f2 agent: Make launch_process_timeout configurable
The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata
agent is insufficient for environments with NVIDIA GPUs and NVSwitches,
where the attestation-agent needs significantly more time to collect
evidence during initialization (e.g. ~2 seconds per NVSwitch).

When the timeout expires, the agent (PID 1) exits with an error, causing
the guest kernel to perform an orderly shutdown before the
attestation-agent has finished starting.

Make this timeout configurable via the kernel parameter
agent.launch_process_timeout (in seconds), preserving the 6-second
default for backward compatibility. The Go runtime is wired up to pass
this value from the TOML config's [agent.kata] section through to the
kernel command line.

The NVIDIA GPU configs set the new default to 15 seconds.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-10 14:47:01 +02:00
PiotrProkop
82de35c720 runtime-rs: allow specifying logical/physical sector size for block devices
Add two new configuration knobs that control the logical and physical
sector sizes advertised by virtio-blk devices to the guest:

  block_device_logical_sector_size  (config file)
  block_device_physical_sector_size (config file)

  io.katacontainers.config.hypervisor.blk_logical_sector_size  (annotation)
  io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation)

The annotation names are abbreviated relative to the config file keys
because Kubernetes enforces a 63-character limit on annotation name
segments, and the full names would exceed it.

Both settings default to 0 (let QEMU decide). When set, they are passed
as logical_block_size and physical_block_size in the QMP device_add
command during block device hotplug.

Setting logical_sector_size smaller then container filesystem
block size will cause EINVAL on mount. The physical_sector_size can
always be set independently.

Values must be 0 or a power of 2 in the range [512, 65536]; other
values are rejected with an error at sandbox creation time.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2026-04-10 11:14:51 +02:00
Fabiano Fidêncio
fd6375d8d5 Merge pull request #12806 from kata-containers/topic/ci-run-runtime-rs-on-SNP
ci: Run qemu-snp-runtime-rs tests in the CI
2026-04-10 11:01:20 +02:00
LizZhang315
2312f67c9b helm: add overheadEnabled switch for runtimeclass
Add a global and per-shim configurable switch to enable/disable
the overhead section in generated RuntimeClasses. This allows users
to omit overhead when it's not needed or managed externally.

Priority: per-shim > global > default(true).

Signed-off-by: LizZhang315 <123134987@qq.com>
2026-04-10 10:26:11 +02:00
Fabiano Fidêncio
218077506b Merge pull request #12769 from RuoqingHe/runtime-rs-allow-install-on-riscv
runtime-rs: Allow installation on RISC-V platforms
2026-04-10 10:24:40 +02:00
Fabiano Fidêncio
dca89485f0 Merge pull request #12802 from stevenhorsman/bump-golang-1.25.9
versions: bump golang to 1.25.9
2026-04-10 06:50:35 +02:00
Fabiano Fidêncio
72fb41d33b kata-deploy: Symlink original config to per-shim runtime copy
Users were confused about which configuration file to edit because
kata-deploy copied the base config into a per-shim runtime directory
(runtimes/<shim>/) for config.d support, leaving the original file
in place untouched.  This made it look like the original was the
authoritative config, when in reality the runtime was loading the
copy from the per-shim directory.

Replace the original config file with a symlink pointing to the
per-shim runtime copy after the copy is made.  The runtime's
ResolvePath / EvalSymlinks follows the symlink and lands in the
per-shim directory, where it naturally finds config.d/ with all
drop-in fragments.  This makes it immediately obvious that the
real configuration lives in the per-shim directory and removes the
ambiguity about which file to inspect or modify.

During cleanup, the symlink at the original location is explicitly
removed before the runtime directory is deleted.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 17:16:40 +02:00
Steve Horsman
9e8069569e Merge pull request #12734 from Apokleos/rm-v9p-rs
runtime-rs: Remove virtio-9p Shared Filesystem Support
2026-04-09 16:15:55 +01:00
Fabiano Fidêncio
5e1ab0aa7d tests: Support runtime-rs QEMU cmdline format in attestation test
The k8s-confidential-attestation test extracts the QEMU command line
from journal logs to compute the SNP launch measurement. It only
matched the Go runtime's log format ("launching <path> with: [<args>]"),
but runtime-rs logs differently ("qemu args: <args>").

Handle both formats so the test works with qemu-snp-runtime-rs.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 16:35:08 +02:00
Fabiano Fidêncio
3b155ab0b1 ci: Run runtime-rs tests for SNP
As we're in the process to stabilise runtime-rs for the coming 4.0.0
release, we better start running as many tests as possible with that.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 16:35:08 +02:00
stevenhorsman
31f9a5461b versions: bump golang to 1.25.9
Bump the go version to resolve CVEs:
- GO-2026-4947
- GO-2026-4946
- GO-2026-4870
- GO-2026-4869
- GO-2026-4865
- GO-2026-4864

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-09 08:59:40 +01:00
Hyounggyu Choi
f15f7f49f1 Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt
runtime: qemu: Enable static sandbox resource management on ARM & s390x
2026-04-09 09:18:11 +02:00
Ruoqing He
98ee385220 runtime-rs: Consolidate unsupported arch
Consolidate arch we don't support at the moment, and avoid hard coding
error messages per arch.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-09 04:18:50 +00:00
Ruoqing He
26ffe1223b runtime-rs: Allow install on riscv64 platform
runtime-rs works with QEMU on RISC-V platforms, let's enable
installation on RISC-V.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-09 04:18:50 +00:00
Fabiano Fidêncio
80b0ed273f Merge pull request #12784 from hgowda-amd/sev-snp-tests-required
Add sev-snp, qemu-snp CIs as required
2026-04-09 00:22:49 +02:00
Harshitha Gowda
bb1165b23f tests: Set sev-snp, qemu-snp CIs as required
run-k8s-tests-on-tee (sev-snp, qemu-snp)

Signed-off-by: Harshitha Gowda <hgowda@amd.com>
2026-04-08 22:36:58 +02:00
Fabiano Fidêncio
2148afe243 Merge pull request #12796 from fidencio/topic/kata-deploy-run-cargo-fmt-and-cargo-check
kata-deploy: Run cargo clippy during build
2026-04-08 22:32:31 +02:00
Fabiano Fidêncio
8ff630059a Merge pull request #12778 from amd-aliem/enable-img-rootfs-snp
runtime: SNP img-based rootfs with dm-verity
2026-04-08 22:06:31 +02:00
Fabiano Fidêncio
4561ae3e29 Merge pull request #12799 from fitzthum/fixup-nv-doc-1
docs: update flow for setting nvidia devices to ready
2026-04-08 21:32:55 +02:00
Tobin Feldman-Fitzthum
9119b4982c docs: update flow for setting nvidia devices to ready
Now, we include the nvrc.smi.srs=1 flag in the default kernel cmdline.
Thus, we can remove the guidance for people to add it themselves when
not using attestation. In fact, users don't really need to know about
this flag at all.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-08 18:59:51 +00:00
Fabiano Fidêncio
21466eb4e5 kata-deploy: Fix clippy warnings across crate
Fix all clippy warnings triggered by -D warnings:

- install.rs: remove useless .into() conversions on PathBuf values
  and replace vec! with an array literal where a Vec is not needed
- utils/toml.rs: replace while-let-on-iterator with a for loop and
  drop the now-unnecessary mut on the iterator binding
- main.rs: replace match-with-single-pattern with if-let in two
  places dealing with experimental_setup_snapshotter
- utils/yaml.rs: extract repeated serde_yaml::Value::String key into
  a local variable, removing needless borrows on temporary values

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 20:47:59 +02:00
Fabiano Fidêncio
1874d4617b kata-deploy: Run cargo clippy during build
Ensure code formatting and compilation are verified early in the
Docker build pipeline, before tests and the release build.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 20:47:59 +02:00
Amanda Liem
79f844d057 runtime: SNP img-based rootfs with dm-verity
Follow-on to kata-containers/kata-containers#12396

Switch SNP config from initrd-based to image-based rootfs with
dm-verity. The runtime assembles the dm-mod.create kernel cmdline
from kernel_verity_params, and with kernel-hashes=on the root hash
is included in the SNP launch measurement.

Also add qemu-snp to the measured rootfs integration test.

Signed-off-by: Amanda Liem <aliem@amd.com>
2026-04-08 16:46:32 +00:00
Greg Kurz
817580e35d Merge pull request #12795 from fidencio/topic/kata-deploy-do-not-try-to-install-a-snapshotter-when-using-crio
kata-deploy: Skip snapshotter install/uninstall on CRI-O
2026-04-08 17:18:05 +02:00
Fabiano Fidêncio
e93bfbe01a tests: Remove qemu-coco-dev* skip from sandbox vCPU allocation test
With static_sandbox_resource_mgmt calculation fixed for runtime-rs, the
VM is correctly pre-sized at creation time. The vCPU allocation test no
longer depends on CPU hotplug, so the qemu-coco-dev* skip is no longer
needed.

Fixes: #10928

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
6bc2452664 tests: Remove aarch64 skip from sandbox vCPU allocation test
With static_sandbox_resource_mgmt now enabled for ARM on runtime-rs,
the VM is correctly pre-sized at creation time. The vCPU allocation
test no longer depends on CPU hotplug, so the aarch64 skip (issue
 #10928) is no longer needed.

Fixes: #10928

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
e0141991d3 runtime-rs: Enable static sandbox resource management on s390x
runtime-rs memory hotplug hard-codes the `pc-dimm` device driver, which
is an x86-only QEMU device model. On s390x, the `s390-ccw-virtio`
machine type does not support `pc-dimm` at all — the Go runtime handles
this by using `virtio-mem-ccw` instead (controlled by the
`enable_virtio_mem` config knob, defaulting to true on s390x).

runtime-rs has no virtio-mem support, so any attempt to dynamically
hotplug memory on s390x fails with:

  'pc-dimm' is not a valid device model name

This is a pre-existing limitation on main — it has never worked. It is
now visible because commit 45dfb6ff25 ("runtime-rs: Fix initial vCPU /
memory with static_sandbox_resource_mgmt") expanded runtime-rs test
coverage, causing k8s-memory.bats and k8s-oom.bats to actually exercise
this code path on s390x.

Let's enforce using static_sandbox_resources_mgmt also for s390x so the
VM is sized upfront at creation time, bypassing the broken dynamic
hotplug path entirely.

If someone decides to implement hotplug support for s390x, the work
would basically be an implemntation of virtio-mem-ccw support in the
runtime-rs QEMU backend (boot-time device creation, qom-set based
resize, and virtio-mem aware memory accounting), mirroring what the Go
runtime already does, but I'm not game for this (sorry).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
ffab9b7eee runtime: qemu: Enable static sandbox resource management on ARM
runtime-rs lacks several features needed for CPU hotplug on ARM:
pflash/UEFI firmware passthrough, SMP topology in -smp, nr_cpus
kernel parameter, and QMP vCPU add handling for the virt machine
type (which requires core-id only placement with socket/thread/die
set to -1).

Without static sandbox resource management, these gaps cause
failures in tests like k8s-memory.bats where the VM is not correctly
sized for the workload.

Enable static_sandbox_resource_mgmt for aarch64 in the QEMU
runtime-rs configuration so the VM is pre-sized at creation time,
sidestepping the need for hotplug entirely.

Together with this we're aligning the go runtime to the very same
behaviour.

Fixes: #10928

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
0e5e4802d7 runtime-rs: Fix initial vCPU / memory with static_sandbox_resource_mgmt
InitialSizeManager::setup_config() is responsible for applying the
sandbox workload sizing (computed from containerd/CRI-O sandbox
annotations) to the hypervisor configuration before VM creation.

Previously, the workload vCPU count was only logged but never actually
added to default_vcpus, so the VM was always created with only the base
vCPUs from the configuration/annotations. This caused the
k8s-sandbox-vcpus-allocation test to fail with qemu-snp-runtime-rs:
a pod with default_vcpus=0.75 and a container CPU limit of 1.2 should
see ceil(0.75 + 1.2) = 2 vCPUs, but only got 1.

Additionally, the workload memory was being added to default_memory
unconditionally, diverging from the Go runtime which only applies both
CPU and memory additions when static_sandbox_resource_mgmt is enabled.
In the non-static path, adding workload resources here would cause
double-counting: once from setup_config() at sandbox creation, and
again from update_cpu_resources()/update_mem_resources() when
individual containers are added.

Guard both additions behind static_sandbox_resource_mgmt, matching the
Go runtime's behavior in src/runtime/pkg/oci/utils.go:

    if sandboxConfig.StaticResourceMgmt {
        sandboxConfig.HypervisorConfig.NumVCPUsF += sandboxConfig.SandboxResources.WorkloadCPUs
        sandboxConfig.HypervisorConfig.MemorySize += sandboxConfig.SandboxResources.WorkloadMemMB
    }

Fixes: k8s-sandbox-vcpus-allocation test failure on qemu-snp-runtime-rs

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
bb051bb16a Merge pull request #12788 from fidencio/topic/kata-deploy-re-apply-GPU-specific-labels
kata-deploy: re-apply labels for the GPU runtime classes
2026-04-08 16:27:59 +02:00