kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Pradipta Banerjee	1487eaaaa2	kernel: Enable landlock LSM Allows using landlock LSM for the container process Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2026-05-27 13:33:46 +02:00
Fabiano Fidêncio	25491fc20c	Merge pull request #13104 from kata-containers/topic/kata-deploy-build-as-an-artefact kata-deploy: prebuild payload-specific component artifacts	2026-05-25 22:56:55 +02:00
Fabiano Fidêncio	c65d64873b	kata-deploy: prebuild payload-specific component artifacts Build and publish the kata-deploy binary and CoCo guest-pull nydus snapshotter as dedicated per-arch artifacts, then consume those tarballs when assembling the kata-deploy image. This avoids rebuilding those components in the payload image (which would happen in serial) path and reduces overall CI build time. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-25 22:13:41 +02:00
Fabiano Fidêncio	3dc02a8604	Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only runtime-rs: Support erofs snapshotter with gpt vmdk mode	2026-05-25 16:29:59 +02:00
Zvonko Kaiser	6c6c5809f1	Merge pull request #13109 from fidencio/topic/build-validate-measured-rootfs-root-hashes-for-all-shims build: Validate measured-rootfs root hashes all shims	2026-05-25 15:58:35 +02:00
Zvonko Kaiser	aeadb1af35	Merge pull request #12948 from fidencio/topic/numa runtime (go): agent: Add NUMA support for QEMU	2026-05-25 15:33:14 +02:00
Alex Lyn	53699b0170	docs: Reset max_unmerged_layers = 0 for gpt+vmdk mode max_unmerged_layers = 1 applies only to fsmerge mode. Because containerd temporarily does not support fsmerge, reset it to the default of 0. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:13:28 +08:00
Alex Lyn	a359d13476	build: Validate measured-rootfs root hashes all shims The cached shim-v2 tarballs ship per-variant `root_hash_.txt` files embedded in the matching measured-rootfs image. Until now only shim-v2-rust validated those hashes against the freshly built rootfs images on a cache hit; shim-v2-go reused whatever was cached without checking, even though its bundled configuration files contain the `KERNELVERITYPARAMS_` values baked in at build time. When a PR changes the agent (and therefore the rootfs image and its dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache key stays the same and the stale tarball is reused. The resulting guest cmdline carries a verity hash that no longer matches the new rootfs image, so the VM panics very early in boot: device-mapper: verity: 254:1: metadata block 0 is corrupted erofs (device dm-0): cannot read erofs superblock Kernel panic - not syncing: VFS: Unable to mount root fs ... Generalize the shim-v2-rust cache validation so it also runs for shim-v2-go, push the per-variant root-hash sidecar files for both shims, and fall back to a full rebuild whenever the cached hash is missing or differs from the image one. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:12:52 +08:00
Alex Lyn	fd139a1143	kata-deploy: Reset max_unmerged_layers to "0" within erofs snapshotter We should set max_unmerged_layers = 0 for the erofs snapshotter in gpt-vmdk mode. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	2036e66bc3	kata-agent: Integrate GPT partition support into multi-layer handler In GPT mode, all partitions share the same base block device, so resolving it once per uevent source and caching the result avoids redundant hotplug waits that would otherwise scale linearly with layer count. Layers are sorted by partition number before mounting to guarantee correct overlay lowerdir precedence regardless of the order the host emits Storage entries. It also removes the dead_code attributes now that the code is in use. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	17fadde6d8	kata-agent: Add GPT partition utility functions The guest agent needs to resolve individual partition devices from a single GPT-partitioned block device, but the kernel does not always create partition nodes immediately after the base device appears, especially when another fd holds the device open during hot-plug. Add utility functions that handle two problems: (1) Mapping a base device path to its partition path following the kernel naming convention (bare suffix vs 'p' separator). (2) Ensuring the partition node exists before mounting. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
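
The naming convention mentioned in point (1) of the commit above can be sketched as follows. This is a minimal Python illustration, not the agent's Rust code; the function name `partition_path` is hypothetical. It assumes the usual Linux rule that a `p` separator is inserted only when the base device name ends in a digit.

```python
def partition_path(base_dev: str, partition: int) -> str:
    """Return the device node of a partition on base_dev (hypothetical helper)."""
    if not base_dev:
        raise ValueError("base device path must not be empty")
    if partition < 1:
        raise ValueError("partition numbers start at 1")
    # A base name ending in a digit (nvme0n1, loop0) needs a 'p' separator
    # so the partition number is not confused with the device number.
    separator = "p" if base_dev[-1].isdigit() else ""
    return f"{base_dev}{separator}{partition}"
```

For example, `partition_path("/dev/vda", 3)` yields `/dev/vda3`, while `partition_path("/dev/nvme0n1", 2)` yields `/dev/nvme0n1p2`.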
Alex Lyn	8119a561ae	kata-agent: Refactor wait_and_mount_layer to return LayerMountInfo This commit makes no functional change: all callers pass None, so every call still resolves the device via uevent exactly as before. It only prepares the multi-layer EROFS handler for GPT partition and dm-verity support by widening the wait_and_mount_layer() interface without changing behavior. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	0bd150e5f1	runtime-rs: Integrate GPT+VMDK mode for multi-layer EROFS rootfs When multiple EROFS layers are present, wrap them into a single GPT-partitioned virtual disk delivered via one VMDK descriptor and a single block device hotplug. This significantly reduces PCI bus slot usage compared with the previous one-device-per-layer approach, which exhausts virtio-blk slots for large layer counts. The host detects multi-layer mounts, computes the GPT layout, generates head metadata plus a VMDK descriptor referencing all EROFS images, and hot-plugs the composite disk. Per-partition Storage entries are created with X-kata.gpt-partitioned and X-kata.partition-number options so the guest agent can resolve each layer to its partition device. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	c3b06af4c7	kata-types: Add gpt_disk module for GPT metadata generation Introduce gpt_disk.rs to compute GPT partition layouts and generate metadata files for multi-layer EROFS rootfs. The module creates GPT head metadata that are combined with EROFS layer images via VMDK descriptors, presenting a single GPT-partitioned virtual disk to the guest VM — each EROFS layer mapped to its own partition. The layout engine calculates LBA positions for an arbitrary number of EROFS layers, then writes a full protective-MBR + GPT image and extracts the head (MBR + primary GPT table) segments as standalone files for VMDK extent assembly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	148810312d	runtime-rs: Refactor VMDK writer and erofs rootfs handling logic Restructure the erofs rootfs handler to support multi-layer GPT+VMDK mode where multiple EROFS layers are wrapped into a single virtual disk with a GPT partition table. Extract VmdkDescriptorWriter as a reusable struct for atomic VMDK descriptor generation. Change erofs_storage from Option<Storage> to Vec<Storage> to hold per-layer metadata, and add GPT metadata path tracking for proper cleanup with path-traversal guards. Bump MAX_VIRTIO_BLK_DEVICES from 10 to 127 to accommodate GPT disks carrying many partitions. Pre-extract mkdir directives from overlay mounts before the main loop to avoid redundant option parsing. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	7086caaddf	kata-agent: Remove unused mode field from MkdirDirective The previously unused code carries the dead_code attribute and is never actually used, so remove it entirely. This removes the mode field from the MkdirDirective structure along with its relevant test cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	39c512bc36	kata-agent: Enhance virtio block matcher to reject partition uevents Enhance VirtioBlkPciMatcher to only match whole-disk uevents. This prevents the matcher from incorrectly matching partition uevents (e.g., /dev/vdaX) which is critical for partitioned disks where partition uevents appear alongside whole-disk uevents. This commit aims to eliminate such bad cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	56f05aa534	kata-agent: Enhance SCSI block device matcher to reject partition uevents Refactor ScsiBlockMatcher to only match whole-disk uevents. This prevents the matcher from incorrectly matching partition uevents (e.g., block/sdd/sdd9) which is critical for partitioned disks where partition uevents appear alongside whole-disk uevents. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
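
The two matcher commits above both reject partition uevents such as `block/sdd/sdd9`. One way to tell the two cases apart is by the shape of the uevent device path, sketched below in Python. This is an assumption about a possible check, not the logic of `VirtioBlkPciMatcher` or `ScsiBlockMatcher`; the function name and the sample paths are hypothetical.

```python
def is_whole_disk_devpath(devpath: str) -> bool:
    """Return True when devpath names a whole disk, not a partition.

    Assumes sysfs-style paths: a whole disk ends in ".../block/<disk>",
    while a partition ends in ".../block/<disk>/<partition>".
    """
    parts = devpath.rstrip("/").split("/")
    # The component directly above a whole disk is the "block" directory;
    # for a partition it is the parent disk name instead.
    return len(parts) >= 2 and parts[-2] == "block"
```

With this check, a path ending in `block/sdd` matches, while one ending in `block/sdd/sdd9` is rejected.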
Fabiano Fidêncio	72be31c384	build: Validate measured-rootfs root hashes all shims The cached shim-v2 tarballs ship per-variant `root_hash_.txt` files embedded in the matching measured-rootfs image. Until now only shim-v2-rust validated those hashes against the freshly built rootfs images on a cache hit; shim-v2-go reused whatever was cached without checking, even though its bundled configuration files contain the `KERNELVERITYPARAMS_` values baked in at build time. When a PR changes the agent (and therefore the rootfs image and its dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache key stays the same and the stale tarball is reused. The resulting guest cmdline carries a verity hash that no longer matches the new rootfs image, so the VM panics very early in boot: device-mapper: verity: 254:1: metadata block 0 is corrupted erofs (device dm-0): cannot read erofs superblock Kernel panic - not syncing: VFS: Unable to mount root fs ... Generalize the shim-v2-rust cache validation so it also runs for shim-v2-go, push the per-variant root-hash sidecar files for both shims, and fall back to a full rebuild whenever the cached hash is missing or differs from the image one. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-25 11:04:08 +02:00
Fabiano Fidêncio	7ddea26137	Merge pull request #13086 from fvichot/flo-kata-monitor-fix kata-monitor: use full URI for connecting to containerd	2026-05-25 10:16:11 +02:00
Fabiano Fidêncio	513d87db7e	Merge pull request #13106 from fidencio/topic/runtime-rs-ensure-bios-is-passed-to-qemu-on-non-CC-cases runtime-rs: qemu: pass -bios for non-confidential guests	2026-05-25 09:56:11 +02:00
Fabiano Fidêncio	407a6946f2	Merge pull request #13077 from hdp617/fix-kata-deploy-build packaging: fix parallel kernel build race and kata-deploy script bugs	2026-05-25 09:53:38 +02:00
Fabiano Fidêncio	f763e9cca9	tests: Add NUMA topology / GPU placement tests to the NV CIs Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour on hosts where NUMA is configured by default (qemu-nvidia-gpu, qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx): 1. Multi-node sandbox (large workload spanning all host NUMA nodes): - Guest NUMA node count matches host - Guest vCPU distribution is balanced across nodes (max-min <= 1) - Guest memory is distributed across NUMA nodes - Host-side vCPU pinning is balanced across NUMA nodes 2. Right-sized single-node sandbox (small workload fitting one node): - Guest collapses to a single NUMA node - All host vCPU threads pinned to that one NUMA node 3. GPU passthrough with VFIO, multi-node: - Guest NUMA topology is balanced (same as test 1) - Guest GPU's NUMA node matches the host GPU's NUMA node (resolved via the vfio-pci,host=<BDF> from the QEMU command line and /sys/bus/pci/devices/<BDF>/numa_node) - QEMU command line contains pxb-pcie and policy=bind - Host vCPU pinning is balanced 4. GPU passthrough with VFIO, right-sized single-node: small workload plus GPU that fits in a single host NUMA node: - Guest collapses to a single NUMA node - The chosen node is the GPU's host NUMA node, not just any node that fits — verified by matching host-nodes= in the memory backend and pxb-pcie numa_node= against the GPU's host node - Guest GPU reports the same NUMA node as the host GPU 5. Explicit numa_mapping in the runtime TOML (QEMU-only): - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the auto-derive + right-sizing path is bypassed entirely - Guest sees exactly 1 NUMA node - QEMU memory backend is bound to host node 1 (host-nodes=1, policy=bind), not host node 0 - Host-side vCPU threads land on host node 1 - Drop-in is removed on teardown so subsequent tests are unaffected Guest-side checks use a dedicated container image (quay.io/kata-containers/numa) that reads sysfs and prints results to stdout — no kubectl exec or CoCo policy overrides needed. Host-side checks (crictl, pgrep, taskset) run directly on the host via sudo; a standalone numa-pinning-check.sh script handles the vCPU thread affinity inspection. The config.d/ helpers used by test 5 are runtime-agnostic (probe Go vs runtime-rs layout on disk) but the test is gated to qemu-* shims since runtime-rs does not yet implement NUMA. Skips cleanly on single-NUMA hosts, unsupported hypervisors, or when no nvidia.com/pgpu resources are available (GPU tests only). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	20705470e9	docs: Add NUMA support guide for Kata Containers with QEMU Add a step-by-step how-to guide covering host inspection, Kata NUMA drop-in setup (via kata-deploy Helm and manual config.d/), pod deployment examples, and guest/host verification procedures. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	8787da13a9	agent: Add NUMA-aware PCI path parsing Extend pcipath_from_dev_tree_path() to support the full NUMA-aware path format "root_complex/bus/device" (e.g. "10/00/02") in addition to the legacy "bus/device" format, defaulting to root complex "00" for backward compatibility. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
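
The two path formats described in the commit above can be illustrated with a small parser. This is a Python sketch under stated assumptions, not the agent's `pcipath_from_dev_tree_path()`: the function name, the return shape (root complex plus remaining components), and the hexadecimal validation are all hypothetical.

```python
def parse_dev_tree_pci_path(path: str) -> tuple[str, list[str]]:
    """Split a device-tree PCI path into (root_complex, [bus, device])."""
    parts = path.split("/")
    if len(parts) == 3:
        # Full NUMA-aware form: "root_complex/bus/device", e.g. "10/00/02".
        root_complex, rest = parts[0], parts[1:]
    elif len(parts) == 2:
        # Legacy form: "bus/device"; default to root complex "00".
        root_complex, rest = "00", parts
    else:
        raise ValueError(f"unexpected PCI path format: {path!r}")
    for component in [root_complex, *rest]:
        int(component, 16)  # raises ValueError on non-hexadecimal input
    return root_complex, rest
```

Defaulting the root complex to `"00"` keeps legacy two-component paths working unchanged, which is the backward-compatibility behaviour the commit message describes.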
Fabiano Fidêncio	1cbe930fc9	runtime: Add pxb-pcie NUMA-aware PCIe topology for VFIO devices When NUMA placement is active and VFIO devices are cold-plugged, create a pxb-pcie (PCIe Expander Bridge) per NUMA node that has devices. Each pxb-pcie carries a numa_node property that gives the guest kernel correct NUMA affinity for all PCI devices beneath it. Root ports are created on each pxb-pcie bus instead of pcie.0, and VFIODevice.Attach() assigns each device to the root port on its host NUMA node's pxb bridge. Non-VFIO devices remain on pcie.0. NUMA placement is "active" when there is more than one guest NUMA node OR a single guest node mapped to a specific host node (the latter happens when maybeRightSizeAutoNUMA() collapses a multi-node sandbox to the GPU's host NUMA node). In both cases buildNUMATopology() also emits the matching memory-backend-ram,host-nodes=,policy=bind entries so guest memory is sourced from the right host node. So pxb-pcie can never capture a leaf virtio-pci device as the default bus, every virtio-pci device emitter (NetDevice, VSOCK, vhost-user-{net,scsi,blk,fs}) now appends bus=pcie.0 explicitly when the machine actually exposes a pcie.0 root. Detection is done via a new hasPCIeRoot() helper that returns true only for q35/virt machine types — ppc64le's pseries (pci.0), s390x's s390-ccw-virtio (CCW transport) and microvm (no PCI) intentionally skip the pin to avoid "Bus 'pcie.0' not found" at startup. This is the only QEMU mechanism that works for both regular and confidential (TDX/SNP) guests, as it operates through the PCI bus hierarchy rather than ACPI table injection. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	15292da217	config: Enable NUMA by default for nvidia-gpu configurations Enable enable_numa=true in the three nvidia-gpu QEMU configuration templates (base, SNP, TDX). On single-NUMA hosts this is a no-op since buildNUMATopology() returns nil when there is only one node. On multi-NUMA hosts it ensures GPU memory accesses are NUMA-local. Add documentation to all QEMU config templates explaining the VFIO device NUMA placement validation that occurs when NUMA is enabled. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	feeb5d8ecc	runtime-rs: Fix vCPU pinning race with backoff retry QEMU can report fewer vCPU threads during early startup, causing partial affinity setup. Let's retry with exponential backoff until the expected thread count is visible, then continue with best-effort pinning if the window is exhausted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	f53f427859	runtime: Fix vCPU pinning race for Go runtime QEMU may not have spawned all vCPU threads when pinning starts, so query_cpus_fast can return an incomplete list and leave some vCPUs unpinned. To fix it, let's add exponential backoff retries before pinning and fall back to available threads if retries are exhausted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
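
Both vCPU pinning fixes above rely on the same pattern: retry with exponential backoff until the expected thread count is visible, then fall back to whatever is available. A minimal Python sketch of that pattern follows. The function name, the retry count, and the initial delay are assumptions for illustration and are not taken from either runtime.

```python
import time


def wait_for_vcpu_threads(query, expected, retries=5, initial_delay=0.01,
                          sleep=time.sleep):
    """Poll query() until it reports at least `expected` vCPU threads.

    `query` is any callable returning the currently visible thread list.
    If the retry window is exhausted, the last (possibly partial) list is
    returned so the caller can continue with best-effort pinning.
    """
    delay = initial_delay
    threads = query()
    for _ in range(retries):
        if len(threads) >= expected:
            return threads
        sleep(delay)
        delay *= 2  # exponential backoff between attempts
        threads = query()
    return threads
```

Passing `sleep` as a parameter keeps the helper testable without real delays.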
Fabiano Fidêncio	b688619314	runtime: oci: Fix sandbox CPU sizing with cpuManagerPolicy=static When cpuManagerPolicy=static is configured, kubelet sets the sandbox CPU quota to -1 (unconstrained) because it uses cpuset pinning instead of CFS quota. This causes CalculateSandboxSizing to compute 0 workload CPUs, resulting in the VM starting with only default_vcpus. Fall back to deriving the CPU count from sandbox CPU shares (1024 shares per CPU) when the quota-based calculation yields 0. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
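
The fallback described in the commit above can be sketched as follows. This is a Python illustration rather than the Go `CalculateSandboxSizing` code; the function name is hypothetical, and rounding up in both branches is an assumption. Only the 1024-shares-per-CPU ratio and the quota of -1 come from the commit message.

```python
import math

SHARES_PER_CPU = 1024  # ratio stated in the commit message


def workload_cpus(quota: int, period: int, shares: int) -> int:
    """Derive a workload CPU count from CFS quota, falling back to shares."""
    cpus = 0
    if quota > 0 and period > 0:
        cpus = math.ceil(quota / period)
    # With cpuManagerPolicy=static the quota is -1 (unconstrained), so the
    # quota-based result is 0; derive the count from CPU shares instead.
    if cpus == 0 and shares > 0:
        cpus = math.ceil(shares / SHARES_PER_CPU)
    return cpus
```

For instance, a quota of -1 with 4096 shares gives 4 workload CPUs instead of 0.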
Fabiano Fidêncio	12e5985dbd	runtime: Add NUMA-aware vCPU pinning and cpuset.mems forwarding Make checkVCPUsPinning() NUMA-aware: when GuestNUMANodes are configured, vCPU threads are pinned to host CPUs belonging to the same NUMA node as the vCPU's guest NUMA node assignment via checkVCPUsPinningNUMA(), preserving memory locality. vCPUs are distributed proportionally across NUMA nodes, matching the distribution in buildNUMATopology(). Stop unconditionally stripping cpuset.mems in constrainGRPCSpec() and container update(). When multi-NUMA is configured, translate host NUMA node IDs to guest NUMA node IDs using translateHostMemsToGuest() before forwarding to the agent. This allows the agent to enforce NUMA-aware memory placement for containers. Filter guest NUMA nodes at VM creation time: before calling CreateVM(), prune GuestNUMANodes to only those whose HostCPUs intersect the sandbox cpuset. This avoids exposing fake NUMA topology to the guest when Kubernetes allocates CPUs from fewer nodes than the host has (e.g. all CPUs from node 0 on a 2-node host), improving memory locality and avoiding unnecessary cross-node memory traffic. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	d0d7deb262	runtime: Add host NUMA distance discovery and build guest NUMA topology Add sysfs-based host NUMA distance reading (GetHostNUMADistances) that parses /sys/devices/system/node/nodeN/distance to mirror the host NUMA distance matrix into the guest via -numa dist entries. Implement buildNUMATopology() which translates the GuestNUMANodes configuration into govmm NUMANode and NUMADist slices. Each guest NUMA node gets a floor-divided share of vCPUs and memory, with the last node absorbing any remainder. This handles the common Kata case of +1 VMM overhead vCPU gracefully. Memory backends are selected based on hugepages/virtio-fs/file-backed-mem configuration. Guard multi-NUMA topology generation to amd64 and arm64 only, since other architectures (s390x, riscv64) do not support QEMU NUMA/DIMM. Wire buildNUMATopology() into CreateVM so the QEMU config includes NUMA nodes and distances. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
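
The distribution rule in the commit above (a floor-divided share per guest NUMA node, with the last node absorbing the remainder) can be sketched in a few lines of Python. The function name is hypothetical and this is not the Go `buildNUMATopology()` code.

```python
def split_across_nodes(total: int, nodes: int) -> list[int]:
    """Give each node a floor-divided share; the last node takes the remainder."""
    if nodes < 1:
        raise ValueError("at least one NUMA node is required")
    base = total // nodes
    shares = [base] * nodes
    shares[-1] += total - base * nodes
    return shares
```

With 9 vCPUs (8 workload vCPUs plus the +1 overhead vCPU) on 2 nodes, this yields `[4, 5]`, so the odd vCPU lands on the last node.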
Fabiano Fidêncio	447e2a3faf	runtime: Add VFIO device NUMA node detection and placement validation Add PCISysFsDevicesNUMANode property and GetPCIDeviceNUMANode() helper to read /sys/bus/pci/devices/<BDF>/numa_node when discovering VFIO devices. Store the result in the new NUMANode field on VFIODev (-1 for unknown/no affinity). Wire NUMA node detection into both GetAllVFIODevicesFromIOMMUGroup() (legacy VFIO path) and GetDeviceFromVFIODev() (IOMMUFD path) so every discovered VFIO device carries its host NUMA node. Add validateVFIODeviceNUMAPlacement() which runs at the end of buildNUMATopology(). It checks every cold-plugged VFIO device's host NUMA node against the guest NUMA topology and logs a warning if a device is on a host NUMA node not covered by any guest NUMA node (indicating potential cross-NUMA memory access overhead), or an info message confirming correct placement. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	1ee8bb5740	runtime: Add NUMA-aware SMP topology Make cpuTopology() NUMA-aware by accepting a numNUMANodes parameter. When multiple NUMA nodes are configured, restructure the SMP topology so that Sockets=numNUMA and Cores=ceil(maxvcpus/numNUMA), grouping vCPUs by socket per NUMA node. Use ceiling division so that uneven vCPU counts (e.g. the +1 VMM overhead vCPU that Kata adds) produce a QEMU-valid SMP topology where MaxCPUs == Sockets * Cores * Threads. When numNUMANodes <= 1, the existing flat topology (Sockets=maxvcpus, Cores=1) is preserved. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
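
The SMP arithmetic in the commit above can be checked with a short Python sketch. It is not the Go `cpuTopology()` code: the function name and the dictionary return shape are hypothetical, and fixing threads per core at 1 is an assumption.

```python
import math


def cpu_topology(max_vcpus: int, num_numa_nodes: int) -> dict:
    """Compute an SMP layout where maxcpus == sockets * cores * threads."""
    threads = 1  # assumption: one thread per core
    if num_numa_nodes <= 1:
        # Flat topology: one socket per vCPU, one core per socket.
        sockets, cores = max_vcpus, 1
    else:
        # One socket per NUMA node; ceiling division covers uneven counts.
        sockets = num_numa_nodes
        cores = math.ceil(max_vcpus / num_numa_nodes)
    return {
        "sockets": sockets,
        "cores": cores,
        "threads": threads,
        "maxcpus": sockets * cores * threads,
    }
```

For 9 vCPUs on 2 NUMA nodes this gives 2 sockets of 5 cores, so `maxcpus` becomes 10 and the product constraint holds even though the vCPU count is odd.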
Fabiano Fidêncio	1e9da61d48	govmm: Add multi-NUMA memory backend and distance matrix support Introduce NUMANode and NUMADist types, add NUMANodes/NUMADists fields to Config, and implement appendMultiNUMAMemoryKnobs() to generate per-node memory-backend objects with host-nodes/policy=bind, -numa node entries with cpus= ranges, and -numa dist entries for the distance matrix. Gate the multi-NUMA path in appendMemoryKnobs() behind isDimmSupported() to ensure architectures without DIMM support (s390x, riscv64) fall back to the single-node path. Drop 386 from isDimmSupported since 32-bit x86 is not a supported Kata target. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	8d2ecaabb5	versions: Bump QEMU to v11.0.0 For more details see QEMU's release notes: https://www.qemu.org/2026/04/22/qemu-11-0-0/ GPU experimental variants are also using v11.0.0 plus one patch to solve issues related to NUMA mapping. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	ed4d0fb51f	runtime-rs: qemu: pass `-bios` for non-confidential guests The `boot_info.firmware` field from the hypervisor configuration is loaded by kata-types and surfaces in the TOML as `firmware = "..."`, but the qemu cmdline generator never consumed it for non-CC guests. Today, `-bios <path>` is only appended via the `Bios` device pushed by `add_{sev,sev_snp,tdx}_protection_device()` in `QemuInner::start_vm()`, which use the firmware copied into the `ProtectionDeviceConfig`. That path is taken only when `confidential_guest = true` and a SEV/SEV-SNP/TDX protection device is configured. For plain Q35 profiles (notably the nvidia-gpu one, which needs OVMF to boot the GPU passthrough VM), the `firmware` set in the TOML was silently dropped and qemu fell back to its default BIOS. Wire `boot_info.firmware` directly in `QemuCmdLine::new()` when no protection device path is going to emit `-bios` (i.e. for non-CC guests). CC paths are left untouched so we don't end up with a duplicated `-bios` argument. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 15:05:26 +02:00
Fabiano Fidêncio	4c1b3312ea	runtime-rs: nvidia-gpu: use _NV firmware substitutions in config template The `configuration-qemu-nvidia-gpu-runtime-rs.toml.in` template was using the generic `@FIRMWAREPATH@` / `@FIRMWAREVOLUMEPATH@` placeholders, which are left empty for the qemu hypervisor in the runtime-rs Makefile. As a result, no firmware (BIOS) was actually passed to qemu when launching a VM with the nvidia-gpu configuration, breaking OVMF based boot. Switch the placeholders to `@FIRMWAREPATH_NV@` / `@FIRMWAREVOLUMEPATH_NV@`, matching the runtime-go nvidia-gpu template and the substitutions exported by the runtime-rs Makefile, so the OVMF firmware path is properly plumbed through to qemu. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 14:59:11 +02:00
Florian Vichot	554e8f91b1	kata-monitor: use full URI for connecting to containerd Without the protocol in the URI, grpc-go defaults to the DNS resolver, which results in an error for unix sockets (`name resolver error: produced zero addresses`). We also remove the `getAddressAndDialer(...)` and `dial(...)` functions, as they are no longer necessary, grpc-go supports connecting to unix sockets directly. This also removes the matching tests. This also adds a `Makefile` and tweaks the Dockerfile to simplify building the Docker image. Fixes #12398 Signed-off-by: Florian Vichot <florian.vichot@gmail.com>	2026-05-23 16:47:46 +02:00
Fabiano Fidêncio	cbcdd999e4	Merge pull request #12957 from Apokleos/fix-sb-api runtime-rs: Fix sandbox-api lifecycle and CRI status handling	2026-05-23 09:26:14 +02:00
Fabiano Fidêncio	a7aa2576c6	Merge pull request #13089 from fidencio/topic/kata-deploy-fix-label-set-on-rke2 kata-deploy: verify kata-runtime label remains stable on rke2/k3s	2026-05-23 08:52:27 +02:00
Fabiano Fidêncio	7faeb9b727	Merge pull request #13091 from kata-containers/dependabot/go_modules/src/runtime/github.com/containerd/containerd-1.7.32 build(deps): bump github.com/containerd/containerd from 1.7.29 to 1.7.32 in /src/runtime	2026-05-23 08:51:36 +02:00
Huy Pham	3ec444a7df	kernel: bump config version Bump the Kata Containers kernel configuration version to 195. Signed-off-by: Huy Pham <huypham@google.com>	2026-05-22 12:26:53 -07:00
Huy Pham	c490373a78	kata-deploy: packaging: fix absolute path resolution in merge script The `kata-deploy-merge-builds.sh` script blindly prepended `PWD` to the `kata_versions_yaml_file` argument, assuming it was always a relative path. However, the `Makefile` passes an absolute path using `$(MK_DIR)`. This resulted in invalid double-concatenated paths like `/workspace/...//workspace/...` which failed to copy. Fix this by using `readlink -f` to safely resolve the path. This correctly handles both relative and absolute paths, preventing path corruption. Signed-off-by: Huy Pham <huypham@google.com>	2026-05-22 12:05:56 -07:00
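
The path bug in the commit above is easy to reproduce outside the shell script. The Python sketch below contrasts blind prefixing with canonical resolution, using `os.path.realpath` as a rough analogue of `readlink -f`. Both function names and the `/workspace` example paths are hypothetical.

```python
import os


def naive_join(pwd: str, arg: str) -> str:
    """Blindly prefix the working directory (the buggy behaviour)."""
    return f"{pwd}/{arg}"


def resolve_versions_file(pwd: str, arg: str) -> str:
    """Resolve arg against pwd, accepting relative and absolute paths."""
    # os.path.join discards pwd when arg is already absolute, and
    # realpath then canonicalizes the result.
    return os.path.realpath(os.path.join(pwd, arg))
```

Given a working directory of `/workspace/pkg` and an absolute argument `/workspace/pkg/versions.yaml`, `naive_join` produces the doubled path `/workspace/pkg//workspace/pkg/versions.yaml`, while `resolve_versions_file` resolves to the intended file for both relative and absolute arguments.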
Fabiano Fidêncio	5d3e1e6396	kata-deploy: verify kata-runtime label remains stable on rke2/k3s The retry loop added in `efd468df3f` still allows the install to declare success while inside the kubelet's post-restart re-register window. On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]` every 2 s and returns on the first `True` observation it sees. By default the kubelet only publishes node status every ~10 s, so that first `True` is almost always the stale value from before the restart — the kubelet hasn't actually finished restarting yet. `label_node_with_retry` then applies the label, sleeps 1 s, reads back "true" (still stale, kubelet still down), and returns Ok. Install completes, `/readyz` flips to 200, helm releases its `--wait`, and the bats test starts — and only then does the kubelet finish coming up, re-register the node, and clobber the label with its cached set. The lifecycle test sees an empty `katacontainers.io/kata-runtime` and fails: # Node label katacontainers.io/kata-runtime: not ok 1 Kata artifacts are present on host after install A single-shot verification can't distinguish "still stale true" from "truly stable true after kubelet re-register". Replace it with a stability window: after (re)applying the label, require it to remain at the expected value for STABILITY_CHECKS=6 consecutive observations spaced CHECK_INTERVAL=2 s apart (≈ 12 s — comfortably more than the kubelet's status-update period). If the value ever drifts inside the window, re-apply and restart the stability counter. Bounded by MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s to install. Also add a short polling loop to the test's own label assertion as belt-and-suspenders for any leftover transient race, matching the existing retry pattern used for the container-runtime version check. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-22 11:53:18 +02:00
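
The stability window described in the commit above can be sketched as a small loop. This Python version is an illustration only, not the kata-deploy implementation. The constants mirror the values quoted in the commit message, while the function name and the `apply_label` / `read_label` callables are hypothetical stand-ins for the real Kubernetes calls.

```python
import time

STABILITY_CHECKS = 6      # consecutive matching observations required
CHECK_INTERVAL = 2        # seconds between observations
MAX_APPLY_ATTEMPTS = 12   # upper bound on (re)apply attempts


def label_until_stable(apply_label, read_label, expected, sleep=time.sleep):
    """Apply a label and require it to stay at `expected` for a full window."""
    for _ in range(MAX_APPLY_ATTEMPTS):
        apply_label()
        stable = 0
        while stable < STABILITY_CHECKS:
            sleep(CHECK_INTERVAL)
            if read_label() != expected:
                break  # value drifted: re-apply and restart the counter
            stable += 1
        else:
            return True  # the whole window passed without drift
    return False
```

A single read-back cannot tell a stale value from a stable one, whereas this loop only succeeds after six matching reads in a row, which spans about 12 s.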
Alex Lyn	adf6d43e24	test: skip TestContainerMemoryUpdate for sandbox api Temporarily skip the `TestContainerMemoryUpdate` test case for sandbox api. This test case is currently skipped in other VMMs (e.g., QEMU, Cloud-Hypervisor) due to known issues and environmental stability concerns. To maintain consistency across the project, we are skipping it for sandbox as well. A follow-up PR will be dedicated to addressing these issues and properly enabling/refining this test case for all VMMs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:46:44 +08:00
Alex Lyn	b5349f4d78	versions: bump containerd to 2.3 for sandbox API tests containerd 2.3 requires Go 1.26.3, but Kata still pins Go 1.25.10. Use Go 1.26.3 for the sandbox-api job so that make cri-integration can build containerd from source. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:46:16 +08:00
Alex Lyn	9f78dc687f	tests: exclude TestContainerRestart from the cri-containerd test list Creating a new container in the same sandbox VM after the previous container has exited and been removed has never been supported by kata-containers (neither with the go-based nor the rust-based runtime). When the last container is removed the kata VM shuts down, so any attempt to start a new container in the same sandbox fails. This test exercises a use-case kata does not currently support, and it has never been part of the passing list for good reason. Mark it explicitly excluded with a comment so it is clear this is a deliberate omission rather than an oversight. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:45:50 +08:00
Alex Lyn	328fccfbbd	ci: Re-enable run-containerd-sandboxapi job The job was disabled because TestImageLoad was failing when using the shim sandboxer with runc due to a containerd bug (config.json not being written to the bundle directory). Now that check_daemon_setup uses podsandbox for the runc sanity check, the root cause of the failure is worked around on our side and the job can be re-enabled. Also update the runner to ubuntu-24.04. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:45:26 +08:00
Alex Lyn	a7739579d6	tests: Use podsandbox sandboxer for the runc sanity check The check_daemon_setup function verifies that containerd + runc are functional before the real kata tests run. Using the shim sandboxer for this runc check hits a known containerd bug where the OCI spec is not populated before NewBundle is called, so config.json is never written and containerd-shim-runc-v2 fails at startup. See containerd/containerd#11640 The sandboxer choice is irrelevant for this sanity check, so use podsandbox which works correctly with runc. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:44:38 +08:00


19143 Commits