kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-05-14 11:03:31 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	923162cecb	ci: Add runtime-rs GPU shims to NVIDIA GPU CI workflow Add qemu-nvidia-gpu-runtime-rs and qemu-nvidia-gpu-snp-runtime-rs to the NVIDIA GPU test matrix so CI covers the new runtime-rs shims. Introduce a `coco` boolean field in each matrix entry and use it for all CoCo-related conditionals (KBS, snapshotter, KBS deploy/cleanup steps). This replaces fragile name-string comparisons that were already broken for the runtime-rs variants: `nvidia-gpu (runtime-rs)` was incorrectly getting KBS steps, and `nvidia-gpu-snp (runtime-rs)` was not getting the right env vars. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-11 22:31:33 +02:00
Fabiano Fidêncio	09f5c85d53	kata-deploy: Add qemu-nvidia-gpu-tdx-runtime-rs shim Register the new qemu-nvidia-gpu-tdx-runtime-rs shim across the kata-deploy stack so it is built, installed, and exposed as a RuntimeClass. This adds the shim to the Rust binary's RUST_SHIMS list (so it uses the runtime-rs binary), SHIMS list, the qemu-tdx-experimental share name mapping, and the x86_64 default shim set. The Helm chart gets the new shim entry in values.yaml, try-kata-nvidia-gpu.values.yaml, and the RuntimeClass overhead definition in runtimeclasses.yaml. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-11 22:31:33 +02:00
Fabiano Fidêncio	20613bf0bf	runtime-rs: Add configuration-qemu-nvidia-gpu-tdx-runtime-rs.toml.in Add a new runtime-rs configuration template that combines the NVIDIA GPU cold-plug stack with Intel TDX confidential guest support. This is the runtime-rs counterpart of the Go runtime's configuration-qemu-nvidia-gpu-tdx template. The template merges the GPU NV settings (VFIO cold-plug, Pod Resources API, NV-specific kernel/image/firmware, extended timeouts) with TDX confidential guest settings (confidential_guest, OVMF.inteltdx.fd firmware, TDX Quote Generation Service socket, confidential NV kernel and image). The Makefile is updated with the new config file registration and the FIRMWARETDVFPATH_NV variable pointing to OVMF.inteltdx.fd. Also removes a stray tdx_quote_generation_service_socket_port setting from the SNP GPU template where it did not belong. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-11 22:31:33 +02:00
Fabiano Fidêncio	0c04b00248	kata-deploy: Add qemu-nvidia-gpu-snp-runtime-rs shim Register the new qemu-nvidia-gpu-snp-runtime-rs shim across the kata-deploy stack so it is built, installed, and exposed as a RuntimeClass. This adds the shim to the Rust binary's RUST_SHIMS list (so it uses the runtime-rs binary), SHIMS list, the qemu-snp-experimental share name mapping, and the x86_64 default shim set. The Helm chart gets the new shim entry in values.yaml, try-kata-nvidia-gpu.values.yaml, and the RuntimeClass overhead definition in runtimeclasses.yaml. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-11 22:31:33 +02:00
Fabiano Fidêncio	ffa0444cf3	runtime-rs: Add configuration-qemu-nvidia-gpu-snp-runtime-rs.toml.in Add a new runtime-rs configuration template that combines the NVIDIA GPU cold-plug stack with AMD SEV-SNP confidential guest support. This is the runtime-rs counterpart of the Go runtime's configuration-qemu-nvidia-gpu-snp template. The template merges the GPU NV settings (VFIO cold-plug, Pod Resources API, NV-specific kernel/image/firmware, extended timeouts) with the SNP confidential guest settings (confidential_guest, sev_snp_guest, SNP ID block/auth, guest policy, AMDSEV.fd firmware, confidential NV kernel and image). The Makefile is updated with the new config file registration, the CONFIDENTIAL_NV image/kernel variables, and FIRMWARESNPPATH_NV pointing to AMDSEV.fd. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-11 22:31:33 +02:00
Fabiano Fidêncio	ab68eb86fc	kata-deploy: Add qemu-nvidia-gpu-runtime-rs shim Register the Rust NVIDIA GPU runtime as a kata-deploy shim so it gets installed and configured alongside the existing Go-based qemu-nvidia-gpu shim. Add qemu-nvidia-gpu-runtime-rs to the RUST_SHIMS list and the default enabled shims, create its RuntimeClass entry in the Helm chart, and include it in the try-kata-nvidia-gpu values overlay. The kata-deploy installer will now copy the runtime-rs configuration and create the containerd runtime entry for it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-11 22:31:33 +02:00
Fabiano Fidêncio	f2eecc0229	runtime-rs: Add configuration-qemu-nvidia-gpu-runtime-rs.toml.in Add a QEMU configuration template for the NVIDIA GPU runtime-rs shim, mirroring the Go runtime's configuration-qemu-nvidia-gpu.toml.in. The template uses _NV-suffixed Makefile variables for kernel, image, and verity params so the GPU-specific rootfs and kernel are selected at build time. Wire the new config into the runtime-rs Makefile: define FIRMWAREPATH_NV with arch-specific OVMF/AAVMF paths (matching the Go runtime's PR #12780), add EDK2_NAME for x86_64, and register the config in CONFIGS/CONFIG_PATHS/SYSCONFIG_PATHS so it gets installed alongside the other runtime-rs configurations. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-11 22:31:33 +02:00
Alex Lyn	2a4a1f588e	agent: Update VFIO device handling for GPU cold-plug Extend the in-guest agent's VFIO device handler to support the cold-plug flow. When the runtime cold-plugs a GPU before the VM boots, the agent needs to bind the device to the vfio-pci driver inside the guest and set up the correct /dev/vfio/ group nodes so the workload can access the GPU. This updates the device discovery logic to handle the PCI topology that QEMU presents for cold-plugged vfio-pci devices and ensures the IOMMU group is properly resolved from the guest's sysfs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-11 22:31:33 +02:00
Alex Lyn	d097917658	runtime-rs: Add QEMU VFIO cold-plug and Pod Resources CDI injection Implement GPU passthrough for runtime-rs by cold-plugging VFIO devices into the QEMU command line before the VM boots. When cold_plug_vfio is enabled, the sandbox queries the kubelet Pod Resources API to discover which GPU devices have been assigned to the pod, resolves their host PCI addresses and IOMMU groups through sysfs, and passes them to QEMU as vfio-pci devices on dedicated PCIe root ports. The implementation adds a full VFIO device driver (discovery, topology placement, QEMU parameter generation, and QMP integration), extends the PCIe topology to allocate root ports for cold-plugged devices, and wires CDI device specs from the container runtime through the resource manager into the hypervisor layer. This also adapts the dragonball VFIO DMA mapping calls to the current vfio-ioctls API signatures, and handles iommufd cdev paths alongside legacy VFIO group paths for CDI compatibility. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Co-authored-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-11 22:31:33 +02:00
Alex Lyn	2eccf08a76	kata-sys-util: Add PCI helpers for VFIO cold-plug paths The VFIO cold-plug path needs to resolve a PCI device's sysfs address from its /dev/vfio/ group or iommufd cdev node. Extend the PCI helpers in kata-sys-util to support this: add a function that walks /sys/bus/pci/devices to find a device by its IOMMU group, and expose the guest BDF that the QEMU command line will reference. These helpers are consumed by the runtime-rs hypervisor crate when building VFIO device descriptors for the QEMU command line. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-11 19:08:29 +02:00
Alex Lyn	8493d73507	kata-types: Add pod_resource_api_sock configuration for GPU cold-plug The Go runtime already exposes a [runtime] pod_resource_api_sock option that tells the shim where to find the kubelet Pod Resources API socket. The runtime-rs VFIO cold-plug code needs the same setting so it can query assigned GPU devices before the VM starts. Add the field to RuntimeConfig and wire it through deserialization so that configuration-*.toml files can set it. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-11 19:08:29 +02:00
Alex Lyn	c12bf20dea	pod-resources-rs: Add kubelet Pod Resources API client Add a gRPC client crate that speaks the kubelet PodResourcesLister service (v1). The runtime-rs VFIO cold-plug path needs this to discover which GPU devices the kubelet has assigned to a pod so they can be passed through to the guest before the VM boots. The crate is intentionally kept minimal: it wraps the upstream pod_resources.proto, exposes a Unix-domain-socket client, and re-exports the generated types. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-11 18:59:49 +02:00
Fabiano Fidêncio	bd6377a038	Merge pull request #12614 from manuelh-dev/mahuber/image-signing-nim tests: nvidia: Enforce image signing for NIM test	2026-04-11 14:48:04 +02:00
Fabiano Fidêncio	5eb7844183	Merge pull request #12430 from stevenhorsman/cargo-deny-static-checks static-checks: Rework cargo deny check	2026-04-11 12:05:53 +02:00
stevenhorsman	8be3a24112	ci: Update cargo-deny in gatekeeper Update the name and move it to the static checks as we don't need to ensure it's running for none code changes. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	9448988783	workflow: Update cargo deny check The cargo deny generated action doesn't seem to work and seems unnecessarily complex, so try using EmbarkStudios/cargo-deny-action instead Fixes: #11218 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	a0410e0d5c	static-checks: Update cargo deny config The previous config is not valid, so update it based on information from https://embarkstudios.github.io/cargo-deny/checks/cfg.html Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	a32c6fd9ff	mem-agent: Add package metadata Make the authors, edition and license be inherited from the workspace Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	5bcc006447	runtime-rs: Add missing license The ch-config crate was missing a license Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
Manuel Huber	7daeb78b67	tests: nvidia: Enforce image signing for NIM test Validate container image signatures for the NIM test using NVIDIA's public signing key. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-11 09:22:50 +02:00
Fabiano Fidêncio	3ce3644c3c	Merge pull request #12807 from PiotrProkop/blk-sector-rust runtime-rs: allow specifying logical/physical sector size for block devices	2026-04-11 00:42:45 +02:00
Fabiano Fidêncio	6f3c11aec4	Merge pull request #12808 from fidencio/topic/agent-allow-configuring-launch-process-timeout agent: Make launch_process_timeout configurable	2026-04-11 00:36:01 +02:00
Fabiano Fidêncio	d4a042a155	Merge pull request #12813 from fitzthum/bump-gc-ma-sigs Bump guest components to pickup additional signature support	2026-04-10 23:57:19 +02:00
Fabiano Fidêncio	78fa4c88e2	Merge pull request #12814 from fidencio/topic/nvidia-always-do-vcpu-pinning runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs	2026-04-10 23:47:44 +02:00
Fabiano Fidêncio	7244389ad4	runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs So we can have a better performance by default. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 16:41:34 +02:00
Fabiano Fidêncio	1d77c4e60f	Merge pull request #12752 from LizZhang315/add-overheadEnabled helm: add overheadEnabled switch for runtimeclass	2026-04-10 16:40:42 +02:00
Tobin Feldman-Fitzthum	ff26a6b876	versions: update image-rs to pickup signature fixes The new version of image-rs supports more types of signed images. First, we added supported for a few more key types. Second, we added support for multi-arch images where the manifest digest is signed but the individual arch manifest is not. These images are relatively common, so let's pickup the fix asap. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-10 06:54:58 -07:00
Tobin Feldman-Fitzthum	2588a0e5a5	agent-ctl: bump image-rs version I don't think agent-ctl will benefit from the new image-rs features, but let's update it to be complete. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-10 06:52:53 -07:00
Fabiano Fidêncio	e8f34a2b26	agent: Update protocol This is not related to this PR, but rather to #12734, which ended up not running the `make src/agent generate-protocols`. While here, let's also fix it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 14:47:01 +02:00
Fabiano Fidêncio	36a2d8e7f2	agent: Make launch_process_timeout configurable The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata agent is insufficient for environments with NVIDIA GPUs and NVSwitches, where the attestation-agent needs significantly more time to collect evidence during initialization (e.g. ~2 seconds per NVSwitch). When the timeout expires, the agent (PID 1) exits with an error, causing the guest kernel to perform an orderly shutdown before the attestation-agent has finished starting. Make this timeout configurable via the kernel parameter agent.launch_process_timeout (in seconds), preserving the 6-second default for backward compatibility. The Go runtime is wired up to pass this value from the TOML config's [agent.kata] section through to the kernel command line. The NVIDIA GPU configs set the new default to 15 seconds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-10 14:47:01 +02:00
PiotrProkop	82de35c720	runtime-rs: allow specifying logical/physical sector size for block devices Add two new configuration knobs that control the logical and physical sector sizes advertised by virtio-blk devices to the guest: block_device_logical_sector_size (config file) block_device_physical_sector_size (config file) io.katacontainers.config.hypervisor.blk_logical_sector_size (annotation) io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation) The annotation names are abbreviated relative to the config file keys because Kubernetes enforces a 63-character limit on annotation name segments, and the full names would exceed it. Both settings default to 0 (let QEMU decide). When set, they are passed as logical_block_size and physical_block_size in the QMP device_add command during block device hotplug. Setting logical_sector_size smaller then container filesystem block size will cause EINVAL on mount. The physical_sector_size can always be set independently. Values must be 0 or a power of 2 in the range [512, 65536]; other values are rejected with an error at sandbox creation time. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-04-10 11:14:51 +02:00
Fabiano Fidêncio	fd6375d8d5	Merge pull request #12806 from kata-containers/topic/ci-run-runtime-rs-on-SNP ci: Run qemu-snp-runtime-rs tests in the CI	2026-04-10 11:01:20 +02:00
LizZhang315	2312f67c9b	helm: add overheadEnabled switch for runtimeclass Add a global and per-shim configurable switch to enable/disable the overhead section in generated RuntimeClasses. This allows users to omit overhead when it's not needed or managed externally. Priority: per-shim > global > default(true). Signed-off-by: LizZhang315 <123134987@qq.com>	2026-04-10 10:26:11 +02:00
Fabiano Fidêncio	218077506b	Merge pull request #12769 from RuoqingHe/runtime-rs-allow-install-on-riscv runtime-rs: Allow installation on RISC-V platforms	2026-04-10 10:24:40 +02:00
Fabiano Fidêncio	dca89485f0	Merge pull request #12802 from stevenhorsman/bump-golang-1.25.9 versions: bump golang to 1.25.9	2026-04-10 06:50:35 +02:00
Fabiano Fidêncio	72fb41d33b	kata-deploy: Symlink original config to per-shim runtime copy Users were confused about which configuration file to edit because kata-deploy copied the base config into a per-shim runtime directory (runtimes/<shim>/) for config.d support, leaving the original file in place untouched. This made it look like the original was the authoritative config, when in reality the runtime was loading the copy from the per-shim directory. Replace the original config file with a symlink pointing to the per-shim runtime copy after the copy is made. The runtime's ResolvePath / EvalSymlinks follows the symlink and lands in the per-shim directory, where it naturally finds config.d/ with all drop-in fragments. This makes it immediately obvious that the real configuration lives in the per-shim directory and removes the ambiguity about which file to inspect or modify. During cleanup, the symlink at the original location is explicitly removed before the runtime directory is deleted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 17:16:40 +02:00
Steve Horsman	9e8069569e	Merge pull request #12734 from Apokleos/rm-v9p-rs runtime-rs: Remove virtio-9p Shared Filesystem Support	2026-04-09 16:15:55 +01:00
Fabiano Fidêncio	5e1ab0aa7d	tests: Support runtime-rs QEMU cmdline format in attestation test The k8s-confidential-attestation test extracts the QEMU command line from journal logs to compute the SNP launch measurement. It only matched the Go runtime's log format ("launching <path> with: [<args>]"), but runtime-rs logs differently ("qemu args: <args>"). Handle both formats so the test works with qemu-snp-runtime-rs. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 16:35:08 +02:00
Fabiano Fidêncio	3b155ab0b1	ci: Run runtime-rs tests for SNP As we're in the process to stabilise runtime-rs for the coming 4.0.0 release, we better start running as many tests as possible with that. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 16:35:08 +02:00
stevenhorsman	31f9a5461b	versions: bump golang to 1.25.9 Bump the go version to resolve CVEs: - GO-2026-4947 - GO-2026-4946 - GO-2026-4870 - GO-2026-4869 - GO-2026-4865 - GO-2026-4864 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-09 08:59:40 +01:00
Hyounggyu Choi	f15f7f49f1	Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt runtime: qemu: Enable static sandbox resource management on ARM & s390x	2026-04-09 09:18:11 +02:00
Ruoqing He	98ee385220	runtime-rs: Consolidate unsupported arch Consolidate arch we don't support at the moment, and avoid hard coding error messages per arch. Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>	2026-04-09 04:18:50 +00:00
Ruoqing He	26ffe1223b	runtime-rs: Allow install on riscv64 platform runtime-rs works with QEMU on RISC-V platforms, let's enable installation on RISC-V. Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>	2026-04-09 04:18:50 +00:00
Fabiano Fidêncio	80b0ed273f	Merge pull request #12784 from hgowda-amd/sev-snp-tests-required Add sev-snp, qemu-snp CIs as required	2026-04-09 00:22:49 +02:00
Harshitha Gowda	bb1165b23f	tests: Set sev-snp, qemu-snp CIs as required run-k8s-tests-on-tee (sev-snp, qemu-snp) Signed-off-by: Harshitha Gowda <hgowda@amd.com>	2026-04-08 22:36:58 +02:00
Fabiano Fidêncio	2148afe243	Merge pull request #12796 from fidencio/topic/kata-deploy-run-cargo-fmt-and-cargo-check kata-deploy: Run cargo clippy during build	2026-04-08 22:32:31 +02:00
Fabiano Fidêncio	8ff630059a	Merge pull request #12778 from amd-aliem/enable-img-rootfs-snp runtime: SNP img-based rootfs with dm-verity	2026-04-08 22:06:31 +02:00
Fabiano Fidêncio	4561ae3e29	Merge pull request #12799 from fitzthum/fixup-nv-doc-1 docs: update flow for setting nvidia devices to ready	2026-04-08 21:32:55 +02:00
Tobin Feldman-Fitzthum	9119b4982c	docs: update flow for setting nvidia devices to ready Now, we include the nvrc.smi.srs=1 flag in the default kernel cmdline. Thus, we can remove the guidance for people to add it themselves when not using attestation. In fact, users don't really need to know about this flag at all. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-08 18:59:51 +00:00
Fabiano Fidêncio	21466eb4e5	kata-deploy: Fix clippy warnings across crate Fix all clippy warnings triggered by -D warnings: - install.rs: remove useless .into() conversions on PathBuf values and replace vec! with an array literal where a Vec is not needed - utils/toml.rs: replace while-let-on-iterator with a for loop and drop the now-unnecessary mut on the iterator binding - main.rs: replace match-with-single-pattern with if-let in two places dealing with experimental_setup_snapshotter - utils/yaml.rs: extract repeated serde_yaml::Value::String key into a local variable, removing needless borrows on temporary values Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 20:47:59 +02:00

1 2 3 4 5 ...

18461 Commits