kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 14:38:33 +00:00

Author	SHA1	Message	Date
Alex Lyn	b2d0e5b712	kata-agent: Use kata-types dmverity with optional devicemapper support Replace the agent's inline devicemapper implementation with the libs kata-types::dmverity module. The agent's devicemapper Cargo feature now forwards to kata-types/devicemapper, removing the direct libdevmapper link dependency from the agent crate. Gate all dm-verity imports, constants, and call sites behind libdevmapper. Add USE_DEVMAPPER Makefile variable (default no) that appends the devicemapper feature flag and forces LIBC=gnu when enabled. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	274a904bf7	kata-agent: Mount multi-layer EROFS partitions concurrently This commit is an enhancement with no functional changes. Replace the sequential loop in handle_multi_layer_erofs_group with join_all-based concurrent mounting. Base device paths and mount directories are pre-resolved before spawning futures to avoid lock contention. On partial failure, successfully mounted layers are unmounted and dm-verity devices cleaned up before propagating the error. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	51e8310ef3	kata-agent: Integrate dm-verity into multi-layer EROFS mount path Wire the dm-verity helpers into the layer mount flow so that GPT partitions carrying verity metadata are mounted through a verified device-mapper target instead of the raw partition. Refactor wait_and_mount_layer to resolve partition path and verity device as separate steps: create a dm-verity device when X-kata.dmverity-enabled=true is set, fall back to direct partition mount otherwise, and return the verity device path for cleanup tracking. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	963ba6c6cd	kata-agent: Add dm-verity device cleanup for GPT-partitioned layers Add per-container verity_devices tracking in Sandbox and wire the teardown path: destroy_partition_dmverity_device removes the device-mapper target via deferred-remove ioctl and deletes the mknod node, cleanup_dmverity_devices iterates all devices in reverse order. Wire into remove_container_resources (rpc.rs) so verity devices are torn down after unmount, and record verity device paths in add_storages (storage/mod.rs) for tracking. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	dce409bc35	kata-agent: Add dm-verity device creation for GPT-partitioned layers GPT-partitioned EROFS layers can carry dm-verity hashes appended after the filesystem data within the same partition. The host runtime passes the root hash and parameters as X-kata.dmverity.* storage options; the agent must set up the kernel dm-verity target before mounting so that every read is integrity-checked against the Merkle tree. Implement dm-verity device creation: option parsing from storage options, device name generation, and create helper via devicemapper ioctls with hash_start_block calculation (accounting for v1 superblock presence). Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	e900eae388	kata-agent: Add no-udev DmOptions builders and mknod device node helpers The kata guest VM runs without udev, so device-mapper nodes under /dev/mapper are never created automatically. Add the foundational helpers that subsequent dm-verity integration will rely on, focusing on two key points: (1) DmOptions builders that disable all udev synchronization flags, with read-only and deferred-remove variants. (2) mknod-based device node creation/removal under /dev/mapper, since devtmpfs nodes are not auto-created without udev. Also add the devicemapper crate dependency (default-features = false). Note that this commit depends on devicemapper no-udev support from the PR: https://github.com/stratis-storage/devicemapper-rs/pull/1036 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Chris Ayoub	4e3d257dc0	agent: Apply init subcgroup in cgroupfs manager When cgroup v2 is enabled, exec can fail with EBUSY while writing the process to cgroup.procs if the container process has been delegated to an init subcgroup. PR #10845 fixed this behavior for the systemd/D-Bus cgroup manager path, which was related to #10733. The cgroupfs manager still writes the process directly to the container cgroup, so apply the same init subcgroup handling there. Also fix the cgroupfs init-subcgroup existence check for absolute OCI cgroup paths by joining the trimmed cgroup path under the cgroup root. Fixes: #9701 Signed-off-by: Chris Ayoub <cayoub@openai.com> Generated-By: OpenAI Codex	2026-06-24 21:25:49 +00:00
Alex Lyn	9550a323ac	Merge pull request #13245 from kata-containers/unify-nix-version Unify nix version	2026-06-22 15:25:10 +08:00
PiotrProkop	c2d737c9d7	agent: report 128+signal as exit code for signal-terminated processes When a container process is terminated by a signal, the agent's SIGCHLD reaper stored the raw signal number as the process exit code. As a result a process killed by SIGKILL(9) reported exit code 9 instead of the conventional 137 (128+9). Apply the standard shell convention of 128+signal_number so that signal-terminated processes report the expected exit codes, e.g. SIGKILL(9) -> 137, SIGTERM(15) -> 143, SIGINT(2) -> 130. This mimics runc, which encodes wait-status exit codes the same way: https://github.com/opencontainers/runc/blob/v1.4.3/libcontainer/utils/utils.go#L19 Both runc and this new Kata behaviour follow the conventional exit code semantics documented at https://tldp.org/LDP/abs/html/exitcodes.html. The conversion is factored into a small helper and covered by a unit test. The runtime and shim already pass the exit code through unchanged, so no further changes are needed for the corrected value to surface. Fixes: signal-terminated containers reporting raw signal numbers Signed-off-by: PiotrProkop <pprokop@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 16:34:17 +02:00
stevenhorsman	531877f28f	deps: Upgrade nix crate from 0.26.4 to 0.31.3 Upgrade the nix crate across the workspace to version 0.30.1 to address security vulnerabilities and adopt safer file descriptor handling patterns. ### Breaking Changes in nix 0.28.0 1. File Descriptor Type Changes - Functions now return `OwnedFd` instead of `RawFd` (i32) - Functions requiring file descriptors now expect types implementing `AsFd` trait - This provides RAII-based automatic cleanup and prevents fd leaks 2. API Signature Changes - `pipe()`, `pipe2()`, `openpty()` now return `OwnedFd` tuples - `socket()` returns `OwnedFd` instead of `RawFd` - `open()`, `memfd_create()` return `OwnedFd` - `setns()`, `write()`, `fcntl()` require `AsFd` trait - `madvise()` requires `NonNull<c_void>` instead of raw pointer - `bind()`, `listen()`, `connect()` require `AsFd` and `Backlog` type 3. Module Feature Flags - Modules now require explicit feature flags (mman, reboot, etc.) ### Additional Breaking Changes in nix 0.30.1 1. symlinkat() API Change - `dirfd` parameter now requires `AsFd` trait instead of `Option<RawFd>` - Use `BorrowedFd::borrow_raw(libc::AT_FDCWD)` for current directory 2. Type Alias Deprecation - `MemFdCreateFlag` renamed to `MFdFlags` for consistency ### Changes Made Workspace Configuration (Cargo.toml) - Updated nix to 0.30.1 with features: fs, mount, sched, process, ioctl, signal, socket, feature, user, hostname, term, event, mman, reboot File Descriptor Handling Patterns - Use `BorrowedFd::borrow_raw(raw_fd)` to wrap RawFd for AsFd requirements - Use `.as_fd().as_raw_fd()` to extract raw fd without ownership transfer - Use `.into_raw_fd()` only when ownership transfer is needed - Use `NonNull::new().unwrap()` for madvise pointer conversion Deprecated API Replacements - `eventfd()` → `EventFd::from_value_and_flags()` - `Errno::from_i32()` → `Errno::from_raw()` - `listen(fd, backlog)` → `listen(&fd, Backlog::new(backlog).unwrap())` - `MemFdCreateFlag` → `MFdFlags` Generated by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-19 03:49:16 -07:00
SantoshMadhukar-K	736e07d18e	test: Improve test coverage for device handlers Add comprehensive test coverage for the device handler modules under src/agent/src/device, including matcher behavior, edge cases, and shared helper coverage across block, network, nvdimm, scsi, and vfio device paths. Assisted-by: IBM Bob Signed-off-by: SantoshMadhukar-K <SantoshMadhukar.Khandyana@ibm.com>	2026-06-18 07:18:36 -07:00
LandonTClipp	676fc90d0b	feat(agent): translate VISIBLE_CDI_DEVICES into CDI device requests Add an opt-in `visible_cdi_devices` agent option that lets a container select which of the VM's CDI-known devices it sees via a VISIBLE_CDI_DEVICES env var. The schema is `<cdi-kind>=<devices>` (e.g. "nvidia.com/gpu=all", or "kata.com/gpu=0,1"), with multiple kinds delimited by ':'. When enabled, the agent maps the value to CDI device requests and feeds them through the existing CDI injection path, so device nodes, mounts, env and createContainer hooks from the guest CDI spec (e.g. /var/run/cdi/nvidia.yaml, generated by NVRC/nvidia-ctk) are applied. The variable is intentionally distinct from NVIDIA_VISIBLE_DEVICES and does not promise identical semantics. If a requested kind is present in the guest CDI registry but the specific device index is not, the agent fails fast rather than waiting for the CDI-spec watch/timeout path. An entirely absent kind falls through to the existing wait/timeout behavior. Defaults to false; containers that don't set the env var are unaffected. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-16 11:44:09 +02:00
Thejas N	7807aa3d62	agent: fix get_oom_event deadlock after connection restart When the agent-protocol-forwarder's inbound connection restarts (e.g. during a Cloud API Adaptor restart in peer pod environments), the shim re-sends a GetOOMEvent request through the new connection. Since the forwarder→agent Unix socket survives the restart, the old handler from the previous connection remains alive, holding the event_rx lock while blocked in recv().await. The new handler acquires the sandbox lock, then attempts to acquire the event_rx lock — which is held by the old handler. Because the sandbox lock is still held during this wait, every subsequent RPC (ExecProcess, WaitProcess, StatsContainer, SignalProcess, etc.) blocks on the sandbox lock, rendering the pod completely unresponsive. The root cause is a lock ordering violation: get_oom_event held the sandbox lock while acquiring the event_rx lock. Fix this by scoping the sandbox lock acquisition so it is dropped before the event_rx lock is acquired. The sandbox lock is only needed to clone the Arc<Mutex<Receiver>> — once cloned, it can be released immediately. Assisted-by: Claude Code <noreply@anthropic.com> Signed-off-by: Thejas N <thn@redhat.com>	2026-06-15 07:47:18 +02:00
Fupan Li	9553614f32	Merge pull request #12772 from Apokleos/nydus-standalone runtime-rs: Nydus standalone mode support in runtime-rs	2026-06-12 10:36:17 +08:00
Alex Lyn	4c63b8e3de	agent: handle ENOSYS in overlayfs storage handler In standalone nydusd mode with virtio-fs passthrough, the guest-side mkdir may fail with ENOSYS. Update the overlayfs storage handler to skip directory creation when the directory already exists, logging a warning instead of failing. This ensures container rootfs setup succeeds when nydusd's native overlay manages the directory structure. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	5a00053b38	kata-agent: Implement filesystem space usage collection via statfs Add update_guest_filesystem_metrics() that collects disk space usage (total/used/available) for all read-write mounted filesystems inside the guest VM. This enables monitoring guest disk usage in kata/coco pods through the existing GetMetrics RPC. The output metrics look like this: - kata_guest_filesystem_bytes{mount="/",device="vda",item="total|used|available"} - kata_guest_filesystem_inodes{mount="/",device="vda",item="total|used|available"} Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:05 +02:00
Alex Lyn	6c66724591	kata-agent: Add filesystem space usage metric declarations Add two new GaugeVec metrics to expose guest filesystem space usage: (1) kata_guest_filesystem_bytes{mount, device, item}: space in bytes (total/used/available) (2) kata_guest_filesystem_inodes{mount, device, item}: inode counts (total/used/available) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:05 +02:00
manuelh-dev	953b306ff3	Merge pull request #12979 from manuelh-dev/mahuber/erofs-tmpfs-mount runtime-rs/agent: support EROFS snapshots without a rwlayer	2026-05-29 13:50:27 -07:00
Fabiano Fidêncio	91df041803	agent: expose guest InfiniBand devices to VFIO containers When a VF is cold-plugged in guest-kernel mode, mlx5_core binds to the PCI device inside the VM and mlx5_ib creates IB character devices under /dev/infiniband/ (uverbs, rdma_cm, umad). The container cannot reach these devices unless they are explicitly added to its OCI spec. Add expose_guest_infiniband_devices(), called from create_devices() when the container carries at least one VFIO device entry. The function: - Walks /dev/infiniband/ inside the guest VM. - Appends each char device to spec.linux.devices. - Inserts matching cgroup allow rules (rwm). - Is a no-op if /dev/infiniband/ is absent or empty (no IB driver, or VF not yet rebound), so non-RDMA pods are unaffected. Gate the call on container_has_vfio_device() so unrelated containers sharing the sandbox do not get IB device access widened. Add is_vfio_device_type() and snapshot_infiniband() to kata-sys-util/pcilibs. is_vfio_device_type() lets the agent check device type strings against the VFIO driver name constants without duplication. snapshot_infiniband() summarises /sys/class/infiniband, /sys/class/infiniband_verbs, and /dev/infiniband as a single diagnostic string for log context; it lives in pcilibs because it has no agent-specific dependencies (pure sysfs/devfs reads). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-29 13:07:45 +02:00
Fabiano Fidêncio	9893b6dc03	runtime: correctly resolve cold-plug VFIO guest PCI paths Populate missing VFIO guest PCI paths via QMP before serializing container devices so guest-kernel PCI env translation has the mappings it needs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-28 21:54:52 +02:00
Fabiano Fidêncio	118b7fa611	agent: reconcile VFIO netdev MAC before UpdateInterface lookup When a VFIO cold-plugged network device appears in guest with a different MAC than the runtime request, resolve the netdev by PCI path and apply the requested MAC before the normal by-MAC update flow. This preserves existing behavior while avoiding UpdateInterface mismatches in SR-IOV cold-plug cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-28 21:54:52 +02:00
Fabiano Fidêncio	e89eb77245	agent: keep PCIDEVICE env unchanged when pcimap is missing Avoid failing container creation when per-container PCI mappings are unavailable by preserving PCIDEVICE entries unchanged and warning instead. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-28 21:54:52 +02:00
Manuel Huber	4fbfba2f79	agent: support run-backed EROFS upper Support multi-layer EROFS storage without an explicit ext4 upper layer. When runtime-rs sends only EROFS lower storage and overlay metadata, create the overlay upper/work directories under the container bundle in /run/kata-containers. Keep the explicit ext4 rwlayer path for disk-backed snapshots, and only track real temporary mount points for cleanup. The implicit /run-backed upper is bundle-scoped state and is removed with the container bundle. Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-27 17:12:20 +00:00
Fabiano Fidêncio	5adfb27297	Merge pull request #13118 from PiotrProkop/fix-missing-cwd agent: restore process CWD auto-creation	2026-05-27 13:32:05 +02:00
PiotrProkop	60a2e27f02	agent: Restore process CWD auto-creation Commit `b56313472` ("agent: Align agent OCI spec with oci-spec-rs", PR #9944) inverted the condition guarding the create_dir_all call for process.cwd: the leading `!` was dropped during the refactor. As a result, the CWD is created only when process.cwd is the empty string. When the guest then runs chdir(process.cwd) and CWD doesn't exist it returns ENOENT. The agent propagates that to the shim, which surfaces it to containerd as "failed to create shim task: ENOENT: No such file or directory" — indistinguishable from a missing argv[0]. This regressed the original fix in PR #2375 (Fixes #2374), which deliberately mirrored runc's behavior. Put the `!` back. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-05-27 09:59:15 +02:00
Manuel Huber	e838cd7d8d	agent: compact EROFS overlay lowerdirs Use kata_types::mount::Mount for the final multi-layer EROFS overlay mount instead of calling baremount() directly. The mount helper detects overlay option strings close to the kernel mount data limit. When lowerdir entries share a common parent, it changes into that directory and rewrites lowerdir to relative paths. That avoids repeating the same long prefix for every layer. Multi-layer EROFS images can have many lower layers under /run/kata-containers/<cid>/multi-layer. Passing the raw absolute lowerdir list can exceed the mount option buffer and fail the final overlay mount, even after all layer devices mounted successfully. Reuse the helper so this path follows Kata's normal overlay mount handling, including lowerdir compaction before mount(2). Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-26 18:42:11 +00:00
Dan Mihai	c81dadaba1	Merge pull request #13064 from burgerdev/add-arp-neighbour agent: use rtnetlink to add ARP neighbour	2026-05-26 09:59:44 -07:00
Fabiano Fidêncio	3dc02a8604	Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only runtime-rs: Support erofs snapshotter with gpt vmdk mode	2026-05-25 16:29:59 +02:00
Alex Lyn	2036e66bc3	kata-agent: Integrate GPT partition support into multi-layer handler In GPT mode, all partitions share the same base block device, so resolving it once per uevent source and caching the result avoids redundant hotplug waits that would otherwise scale linearly with layer count. Layers are sorted by partition number before mounting to guarantee correct overlay lowerdir precedence regardless of the order in which the host emits Storage entries. Also remove the dead_code attributes now that the code is in use. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	17fadde6d8	kata-agent: Add GPT partition utility functions The guest agent needs to resolve individual partition devices from a single GPT-partitioned block device, but the kernel does not always create partition nodes immediately after the base device appears, especially when another fd holds the device open during hot-plug. Add utility functions that handle two problems: (1) Mapping a base device path to its partition path following the kernel naming convention (bare suffix vs 'p' separator). (2) And ensuring the partition node exists before mount. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	8119a561ae	kata-agent: Refactor wait_and_mount_layer to return LayerMountInfo This commit has no functional change: all callers pass None, so every call still resolves the device via uevent exactly as before. It just prepares the multi-layer EROFS handler for GPT partition and dm-verity support by widening the wait_and_mount_layer() interface without changing behavior. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	7086caaddf	kata-agent: Remove unused mode field from MkdirDirective The code previously marked with the dead_code attribute is never actually used, so remove it entirely. This removes the mode field from the MkdirDirective structure along with its relevant test cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	39c512bc36	kata-agent: Enhance virtio block matcher to reject partition uevents Enhance VirtioBlkPciMatcher to only match whole-disk uevents. This prevents the matcher from incorrectly matching partition uevents (e.g., /dev/vdaX) which is critical for partitioned disks where partition uevents appear alongside whole-disk uevents. This commit aims to eliminate such bad cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Alex Lyn	56f05aa534	kata-agent: Enhance SCSI block device matcher to reject partition uevents Refactor ScsiBlockMatcher to only match whole-disk uevents. This prevents the matcher from incorrectly matching partition uevents (e.g., block/sdd/sdd9) which is critical for partitioned disks where partition uevents appear alongside whole-disk uevents. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Fabiano Fidêncio	8787da13a9	agent: Add NUMA-aware PCI path parsing Extend pcipath_from_dev_tree_path() to support the full NUMA-aware path format "root_complex/bus/device" (e.g. "10/00/02") in addition to the legacy "bus/device" format, defaulting to root complex "00" for backward compatibility. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Markus Rudy	bcd3d6936e	agent: use rtnetlink to add ARP neighbour The rtnetlink crate has had an API for neighbours since 0.11. The last attempt to use this API caused problems on AKS, but looking at it again shows that not all functionality was ported back then (state, flags and lladdr). Attempt the migration again, considering all parameters. Fixes: #11942 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-18 10:01:29 +02:00
Fabiano Fidêncio	1a4074ab2e	agent: handle encrypted ephemeral storage for CCW block devices VirtioBlkCcwHandler::create_device was calling common_storage_handler directly, bypassing the handle_block_storage function that checks for the encryption_key=ephemeral driver option. This meant that encrypted emptyDir volumes on s390x would attempt a plain mount of the raw block device instead of setting up dm-crypt via the CDH, resulting in an EINVAL mount error. Route CCW block devices through handle_block_storage, matching the pattern used by VirtioBlkPciHandler. Fixes: failed to mount /dev/vda to .../storage/..., EINVAL Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-16 12:07:12 +02:00
Fabiano Fidêncio	8e1d73a4b5	Merge pull request #13052 from burgerdev/abort-later agent: wait for logs before aborting	2026-05-15 23:58:26 +02:00
Markus Rudy	32f2c5c2e4	agent: wait for logs before aborting If the policy loading encounters an error, we `abort(3)` the agent for safety. Since abort causes the process to stop immediately, the async logs might not be flushed yet, and thus won't make it to the runtime, hiding the reason for the abort. Wait a bit before aborting so that the logs are fully written. Fixes: #13031 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-15 12:36:29 +02:00
Fabiano Fidêncio	d3a9669be5	runtime-rs: implement EncryptedEmptyDirVolume Add the core volume handler for block-encrypted emptyDir support in runtime-rs, bringing it to parity with the Go runtime (PR #10559). When emptydir_mode is set to "block-encrypted", host emptyDir bind mounts are intercepted and handled as follows: 1. A sparse disk image (disk.img) is created inside the emptyDir folder, sized to match the host filesystem capacity. 2. A mountInfo.json is written under the kata direct-volume root with volume_type "blk", fs_type "ext4", and metadata encryptionKey=ephemeral. 3. The disk image is plugged into the guest VM as a virtio-blk device via the hypervisor device manager. 4. An agent::Storage is built with driver_options containing encryption_key=ephemeral and shared=true, so the kata-agent delegates formatting and encryption to CDH using LUKS2. The volume is registered in the dispatch chain before the regular block-volume check, and ephemeral disk metadata is tracked for sandbox-level cleanup at teardown. Also re-exports EMPTYDIR_MODE_* constants from kata-types::config so downstream crates can reference them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Alex Lyn	1441b2b84a	runtime-rs: Fix warnings in rust runtime Unformatted Rust code leaves uncommitted changes in the Rust runtime, its libs, and the agent sources, as running `cargo fmt --all` readily shows. Let's reduce this noise. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-08 14:56:00 +08:00
Alex Lyn	a51e0b630e	agent: Update VFIO device handling for GPU cold-plug Extend the in-guest agent's VFIO device handler to support the cold-plug flow. When the runtime cold-plugs a GPU before the VM boots, the agent needs to bind the device to the vfio-pci driver inside the guest and set up the correct /dev/vfio/ group nodes so the workload can access the GPU. This updates the device discovery logic to handle the PCI topology that QEMU presents for cold-plugged vfio-pci devices and ensures the IOMMU group is properly resolved from the guest's sysfs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-07 10:33:26 +02:00
Greg Kurz	bb933f65e4	vendor: Remove `make vendor` across the repo `make vendor` isn't required anymore. People who need vendored code should use the `tools/packaging/release/generate_vendor.sh` script instead. Assisted-by: Claude AI Signed-off-by: Greg Kurz <groug@kaod.org>	2026-05-06 09:49:52 +02:00
Markus Rudy	044c96a9d6	agent: remove standard-oci-runtime feature This feature was only added for runk, which was removed entirely in `96e1fb4ca6`. Fixes: #12849 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-04-28 10:35:14 +02:00
Spyros Seimenis	d7385eee99	genpolicy: make FileType::from portable across Darwin libc::S_IF* are u16 on Darwin/BSD and u32 on Linux. The match in FileType::from and its tests mix both widths and don't compile on Darwin. Cast everything to u32; on Linux that's a no-op, hence the clippy::unnecessary_cast allow (rust-lang/rust-clippy#6466). Fixes: #12916 Signed-off-by: Spyros Seimenis <sse@edgeless.systems>	2026-04-27 12:14:04 +03:00
Steve Horsman	d5785b4eba	Merge pull request #12872 from stevenhorsman/bump-rust-to-1.93 Bump rust to 1.93	2026-04-27 09:01:00 +01:00
Fabiano Fidêncio	74d9d043f0	agent: raise regorus policy length limits regorus 0.9.0 introduced a hard, per-engine ceiling on parsed-policy size (1024 columns / 1 MiB / 20 000 lines, see lexer.rs:30 in microsoft/regorus). The 1024-column cap rejects realistic policies emitted by `genpolicy`: the `NVIDIA_REQUIRE_CUDA` environment variable on `nvcr.io/nvidia/k8s/cuda-sample` is roughly 1.3 KiB on a single line, so the agent's `set_policy()` returns an error, the agent (PID 1) exits, the guest kernel reboots, and the runtime eventually times out connecting to the agent's vsock. regorus PR #624 ("feat: make policy length limits configurable per engine") adds `Engine::set_policy_length_config`, but it has not been released yet -- the latest published version is still 0.9.1, which predates that change. Pin `regorus` to the upstream commit that includes #624 and call the new setter from `AgentPolicy::new_engine()` with values that comfortably fit any policy we expect to evaluate (64 KiB per line, 16 MiB per file, 200 000 lines) while still rejecting pathological/minified input. Once a regorus release > 0.9.1 ships with #624, the dependency can be moved back to crates.io. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-26 10:18:26 +02:00
Markus Rudy	c8fe6a60d0	genpolicy: update regorus to 0.9.1 The version we used before was released in 2024, it's about time to use a newer version. The new version of the crate comes with a license, which addresses a `cargo deny` finding. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-04-26 10:18:26 +02:00
stevenhorsman	d1a20b1887	agent: Fix let_unit_value warning in pipestream tests Remove unnecessary let binding for unit value expression to fix clippy warning in Rust 1.93. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-25 11:27:39 +01:00
stevenhorsman	7ab2f0eeb6	agent: Fix needless_borrow warning in container tests Remove unnecessary reference operator from expression that is immediately dereferenced by the compiler to fix clippy warning in Rust 1.93. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-25 11:27:39 +01:00
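
The 128+signal exit code convention described in commit c2d737c9d7 can be sketched as below. This is only an illustration under stated assumptions, not the agent's actual helper: the `Termination` enum and the `exit_code` function are hypothetical names introduced here.

```rust
/// Hypothetical model of how a child process ended (not a kata-agent type).
#[derive(Debug, Clone, Copy)]
enum Termination {
    /// The process called exit() with this status.
    Exited(i32),
    /// The process was killed by this signal number.
    Signaled(i32),
}

/// Hypothetical helper: map a termination to the conventional exit code.
/// A normal exit keeps its status; a signal death reports 128 + signal number.
fn exit_code(t: Termination) -> i32 {
    match t {
        Termination::Exited(status) => status,
        Termination::Signaled(signal) => 128 + signal,
    }
}
```

With this mapping, `exit_code(Termination::Signaled(9))` gives 137 for SIGKILL and `exit_code(Termination::Signaled(15))` gives 143 for SIGTERM, matching the values listed in the commit message.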


1484 Commits
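
Commit 17fadde6d8 mentions mapping a base device path to its partition path following the kernel naming convention (bare suffix vs 'p' separator). A minimal sketch of that convention, assuming the usual Linux rule that a 'p' is inserted only when the base device name ends in a digit, might look like this. The function name `partition_path` is hypothetical and is not taken from the agent source.

```rust
/// Hypothetical helper: build the partition device path for a base block device.
/// Assumption: Linux inserts a 'p' separator when the base name ends in a digit
/// (e.g. nvme0n1 -> nvme0n1p2) and appends the number directly otherwise
/// (e.g. vda -> vda1).
fn partition_path(base: &str, partition: u32) -> String {
    let ends_in_digit = base.ends_with(|c: char| c.is_ascii_digit());
    if ends_in_digit {
        format!("{base}p{partition}")
    } else {
        format!("{base}{partition}")
    }
}
```

For example, `partition_path("/dev/vda", 1)` yields `/dev/vda1`, while `partition_path("/dev/nvme0n1", 2)` yields `/dev/nvme0n1p2`. The sketch only builds the name; it does not wait for the partition node to exist, which the commit treats as a separate problem.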