Commit Graph

1484 Commits

Author SHA1 Message Date
Alex Lyn
b2d0e5b712 kata-agent: Use kata-types dmverity with optional devicemapper support
Replace the agent's inline devicemapper implementation with the libs
kata-types::dmverity module. The agent's devicemapper Cargo feature
now forwards to kata-types/devicemapper, removing the direct
libdevmapper link dependency from the agent crate. Gate all dm-verity
imports, constants, and call sites behind libdevmapper.

Add USE_DEVMAPPER Makefile variable (default no) that appends the
devicemapper feature flag and forces LIBC=gnu when enabled.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-26 09:51:05 +08:00
Alex Lyn
274a904bf7 kata-agent: Mount multi-layer EROFS partitions concurrently
This commit is an enhancement with no functional changes.

Replace the sequential loop in handle_multi_layer_erofs_group with
join_all-based concurrent mounting. Base device paths and mount
directories are pre-resolved before spawning futures to avoid lock
contention. On partial failure, successfully mounted layers are
unmounted and dm-verity devices cleaned up before propagating the
error.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-26 09:51:05 +08:00
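Editorial sketch for 274a904bf7: the commit mounts layers concurrently with join_all and rolls back on partial failure. The version below swaps in `std::thread::scope` for join_all so it runs without an async runtime. `mount_layer`, `mount_all`, and the `unmount` callback are hypothetical names, not the agent's API.

```rust
use std::thread;

/// Hypothetical stand-in for one layer mount; fails when the name contains "bad".
fn mount_layer(name: &str) -> Result<String, String> {
    if name.contains("bad") {
        Err(format!("mount failed for {name}"))
    } else {
        Ok(format!("/mnt/{name}"))
    }
}

/// Mount every layer concurrently. If any mount fails, undo the successful
/// ones in reverse order and return the first error.
fn mount_all<F: FnMut(&str)>(layers: &[&str], mut unmount: F) -> Result<Vec<String>, String> {
    // Inputs are resolved before spawning, so the workers share no locks.
    let results: Vec<Result<String, String>> = thread::scope(|s| {
        let handles: Vec<_> = layers
            .iter()
            .map(|&layer| s.spawn(move || mount_layer(layer)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("mount thread panicked"))
            .collect()
    });

    let mut mounted = Vec::new();
    let mut first_err = None;
    for result in results {
        match result {
            Ok(path) => mounted.push(path),
            Err(e) => {
                if first_err.is_none() {
                    first_err = Some(e);
                }
            }
        }
    }

    match first_err {
        None => Ok(mounted),
        Some(e) => {
            // Partial failure: roll back what did mount before propagating.
            for path in mounted.iter().rev() {
                unmount(path);
            }
            Err(e)
        }
    }
}
```

Collecting all results before deciding keeps the rollback simple: every layer that did mount is known by the time the error is returned.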
Alex Lyn
51e8310ef3 kata-agent: Integrate dm-verity into multi-layer EROFS mount path
Wire the dm-verity helpers into the layer mount flow so that GPT
partitions carrying verity metadata are mounted through a verified
device-mapper target instead of the raw partition.

Refactor wait_and_mount_layer to resolve partition path and verity
device as separate steps: create a dm-verity device when
X-kata.dmverity-enabled=true is set, fall back to direct partition
mount otherwise, and return the verity device path for cleanup
tracking.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-26 09:51:05 +08:00
Alex Lyn
963ba6c6cd kata-agent: Add dm-verity device cleanup for GPT-partitioned layers
Add per-container verity_devices tracking in Sandbox and wire the
teardown path: destroy_partition_dmverity_device removes the
device-mapper target via deferred-remove ioctl and deletes the mknod
node, cleanup_dmverity_devices iterates all devices in reverse order.

Wire into remove_container_resources (rpc.rs) so verity devices are
torn down after unmount, and record verity device paths in
add_storages (storage/mod.rs) for tracking.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-26 09:51:05 +08:00
Alex Lyn
dce409bc35 kata-agent: Add dm-verity device creation for GPT-partitioned layers
GPT-partitioned EROFS layers can carry dm-verity hashes appended after
the filesystem data within the same partition. The host runtime passes
the root hash and parameters as X-kata.dmverity.* storage options; the
agent must set up the kernel dm-verity target before mounting so that
every read is integrity-checked against the Merkle tree.

Implement dm-verity device creation: option parsing from storage
options, device name generation, and create helper via devicemapper
ioctls with hash_start_block calculation (accounting for v1 superblock
presence).

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-26 09:51:05 +08:00
Alex Lyn
e900eae388 kata-agent: Add no-udev DmOptions builders and mknod device node helpers
The kata guest VM runs without udev, so device-mapper nodes under
/dev/mapper are never created automatically. Add the foundational
helpers that the subsequent dm-verity integration will rely on.

The helpers cover two key points:
(1) DmOptions builders that disable all udev synchronization flags,
  with read-only and deferred-remove variants.
(2) mknod-based device node creation/removal under /dev/mapper, since
  devtmpfs nodes are not auto-created without udev.

Also add the devicemapper crate dependency (default-features = false).

Note that this commit depends on the no-udev support for devicemapper
from the PR: https://github.com/stratis-storage/devicemapper-rs/pull/1036

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-26 09:51:05 +08:00
Chris Ayoub
4e3d257dc0 agent: Apply init subcgroup in cgroupfs manager
When cgroup v2 is enabled, exec can fail with EBUSY while writing the
process to cgroup.procs if the container process has been delegated to an
init subcgroup.

PR #10845 fixed this behavior for the systemd/D-Bus cgroup manager
path, which was related to #10733. The cgroupfs manager still writes the
process directly to the container cgroup, so apply the same init
subcgroup handling there.

Also fix the cgroupfs init-subcgroup existence check for absolute OCI
cgroup paths by joining the trimmed cgroup path under the cgroup root.

Fixes: #9701

Signed-off-by: Chris Ayoub <cayoub@openai.com>

Generated-By: OpenAI Codex
2026-06-24 21:25:49 +00:00
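Editorial sketch for 4e3d257dc0: the absolute-path fix matters because `Path::join` discards the base when its argument is absolute. The function names and the `init` directory name below are assumptions for illustration only.

```rust
use std::path::{Path, PathBuf};

/// Join an OCI cgroup path under the cgroup root.
/// `Path::join` replaces the base when given an absolute path, so the
/// leading '/' is trimmed first.
fn cgroup_dir(root: &Path, oci_cgroup_path: &str) -> PathBuf {
    root.join(oci_cgroup_path.trim_start_matches('/'))
}

/// Location of the init subcgroup; the "init" name is assumed for this sketch.
fn init_subcgroup_dir(root: &Path, oci_cgroup_path: &str) -> PathBuf {
    cgroup_dir(root, oci_cgroup_path).join("init")
}
```

Without the trim, joining `/kubepods/pod1` onto `/sys/fs/cgroup` would yield `/kubepods/pod1`, and an existence check would look in the wrong place.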
Alex Lyn
9550a323ac Merge pull request #13245 from kata-containers/unify-nix-version
Unify nix version
2026-06-22 15:25:10 +08:00
PiotrProkop
c2d737c9d7 agent: report 128+signal as exit code for signal-terminated processes
When a container process is terminated by a signal, the agent's SIGCHLD
reaper stored the raw signal number as the process exit code. As a result
a process killed by SIGKILL(9) reported exit code 9 instead of the
conventional 137 (128+9).

Apply the standard shell convention of 128+signal_number so that
signal-terminated processes report the expected exit codes, e.g.
SIGKILL(9) -> 137, SIGTERM(15) -> 143, SIGINT(2) -> 130. This mimics
runc, which encodes wait-status exit codes the same way:
https://github.com/opencontainers/runc/blob/v1.4.3/libcontainer/utils/utils.go#L19

Both runc and this new Kata behaviour follow the conventional exit code
semantics documented at https://tldp.org/LDP/abs/html/exitcodes.html.

The conversion is factored into a small helper and covered by a unit
test. The runtime and shim already pass the exit code through unchanged,
so no further changes are needed for the corrected value to surface.

Fixes: signal-terminated containers reporting raw signal numbers

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 16:34:17 +02:00
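Editorial sketch for c2d737c9d7: the 128+signal convention can be expressed as a small conversion. `ChildStatus` and `exit_code` are hypothetical names; the agent's actual helper may be shaped differently.

```rust
/// How a reaped child ended, as a wait status would report it.
enum ChildStatus {
    /// Normal exit with the given code.
    Exited(i32),
    /// Terminated by the given signal number.
    Signaled(i32),
}

/// Map a wait status to the exit code reported upward.
fn exit_code(status: ChildStatus) -> i32 {
    match status {
        ChildStatus::Exited(code) => code,
        // Shell convention: 128 + signal number.
        ChildStatus::Signaled(signal) => 128 + signal,
    }
}
```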
stevenhorsman
531877f28f deps: Upgrade nix crate from 0.26.4 to 0.31.3
Upgrade the nix crate across the workspace to version 0.30.1 to address
security vulnerabilities and adopt safer file descriptor handling patterns.

### Breaking Changes in nix 0.28.0

1. **File Descriptor Type Changes**
   - Functions now return `OwnedFd` instead of `RawFd` (i32)
   - Functions requiring file descriptors now expect types implementing `AsFd` trait
   - This provides RAII-based automatic cleanup and prevents fd leaks

2. **API Signature Changes**
   - `pipe()`, `pipe2()`, `openpty()` now return `OwnedFd` tuples
   - `socket()` returns `OwnedFd` instead of `RawFd`
   - `open()`, `memfd_create()` return `OwnedFd`
   - `setns()`, `write()`, `fcntl()` require `AsFd` trait
   - `madvise()` requires `NonNull<c_void>` instead of raw pointer
   - `bind()`, `listen()`, `connect()` require `AsFd` and `Backlog` type

3. **Module Feature Flags**
   - Modules now require explicit feature flags (mman, reboot, etc.)

### Additional Breaking Changes in nix 0.30.1

1. **symlinkat() API Change**
   - `dirfd` parameter now requires `AsFd` trait instead of `Option<RawFd>`
   - Use `BorrowedFd::borrow_raw(libc::AT_FDCWD)` for current directory

2. **Type Alias Deprecation**
   - `MemFdCreateFlag` renamed to `MFdFlags` for consistency

### Changes Made

**Workspace Configuration (Cargo.toml)**
- Updated nix to 0.30.1 with features: fs, mount, sched, process, ioctl,
  signal, socket, feature, user, hostname, term, event, mman, reboot

**File Descriptor Handling Patterns**
- Use `BorrowedFd::borrow_raw(raw_fd)` to wrap RawFd for AsFd requirements
- Use `.as_fd().as_raw_fd()` to extract raw fd without ownership transfer
- Use `.into_raw_fd()` only when ownership transfer is needed
- Use `NonNull::new().unwrap()` for madvise pointer conversion

**Deprecated API Replacements**
- `eventfd()` → `EventFd::from_value_and_flags()`
- `Errno::from_i32()` → `Errno::from_raw()`
- `listen(fd, backlog)` → `listen(&fd, Backlog::new(backlog).unwrap())`
- `MemFdCreateFlag` → `MFdFlags`

Generated by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-19 03:49:16 -07:00
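Editorial sketch for 531877f28f: the file descriptor patterns listed in the commit rest on standard library types (`OwnedFd`, `BorrowedFd`, `AsFd`), so they can be shown without the nix crate. The function names are hypothetical and no nix call is made here.

```rust
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, IntoRawFd, OwnedFd, RawFd};

/// Accept anything that can lend a descriptor, as an `AsFd`-bounded API does.
fn fd_number(fd: impl AsFd) -> RawFd {
    // Read the raw number without transferring ownership.
    fd.as_fd().as_raw_fd()
}

/// Wrap a raw descriptor we do not own so it satisfies an `AsFd` bound.
/// The caller must keep `raw` open for the duration of this call.
fn borrow_and_inspect(raw: RawFd) -> RawFd {
    // SAFETY: relies on the caller's guarantee that `raw` is a valid open fd.
    let borrowed = unsafe { BorrowedFd::borrow_raw(raw) };
    fd_number(borrowed)
}

/// Give up ownership: the returned raw fd is no longer closed automatically.
fn release(fd: OwnedFd) -> RawFd {
    fd.into_raw_fd()
}
```

Passing `&owned` to `fd_number` borrows the descriptor, while `release(owned)` consumes it. After `release`, closing the descriptor becomes the caller's job.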
SantoshMadhukar-K
736e07d18e test: Improve test coverage for device handlers
Add comprehensive test coverage for the device handler modules under
src/agent/src/device, including matcher behavior, edge cases, and
shared helper coverage across block, network, nvdimm, scsi, and vfio
device paths.

Assisted-by: IBM Bob

Signed-off-by: SantoshMadhukar-K <SantoshMadhukar.Khandyana@ibm.com>
2026-06-18 07:18:36 -07:00
LandonTClipp
676fc90d0b feat(agent): translate VISIBLE_CDI_DEVICES into CDI device requests
Add an opt-in `visible_cdi_devices` agent option that lets a container
select which of the VM's CDI-known devices it sees via a
VISIBLE_CDI_DEVICES env var. The schema is `<cdi-kind>=<devices>`
(e.g. "nvidia.com/gpu=all", or "kata.com/gpu=0,1"), with multiple kinds
delimited by ':'.

When enabled, the agent maps the value to CDI device requests and feeds
them through the existing CDI injection path, so device nodes, mounts,
env and createContainer hooks from the guest CDI spec (e.g.
/var/run/cdi/nvidia.yaml, generated by NVRC/nvidia-ctk) are applied.
The variable is intentionally distinct from NVIDIA_VISIBLE_DEVICES and
does not promise identical semantics.

If a requested kind is present in the guest CDI registry but the
specific device index is not, the agent fails fast rather than waiting
for the CDI-spec watch/timeout path. An entirely absent kind falls
through to the existing wait/timeout behavior.

Defaults to false; containers that don't set the env var are unaffected.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
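Editorial sketch for 676fc90d0b: one plausible way to turn a `VISIBLE_CDI_DEVICES` value into per-device CDI requests, following the `<cdi-kind>=<devices>` schema with ':' between kinds. It assumes devices within a kind are comma-separated, as in the commit's `0,1` example. The function name and error strings are hypothetical.

```rust
/// Parse e.g. "nvidia.com/gpu=all:kata.com/gpu=0,1" into fully qualified
/// CDI device names: "nvidia.com/gpu=all", "kata.com/gpu=0", "kata.com/gpu=1".
fn parse_visible_cdi_devices(value: &str) -> Result<Vec<String>, String> {
    let mut requests = Vec::new();
    // Kinds are delimited by ':'; empty segments are ignored.
    for entry in value.split(':').filter(|e| !e.is_empty()) {
        let (kind, devices) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {entry:?}"))?;
        if kind.is_empty() || devices.is_empty() {
            return Err(format!("empty kind or device list in {entry:?}"));
        }
        // Each comma-separated device becomes one request for that kind.
        for device in devices.split(',') {
            if device.is_empty() {
                return Err(format!("empty device name in {entry:?}"));
            }
            requests.push(format!("{kind}={device}"));
        }
    }
    Ok(requests)
}
```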
Thejas N
7807aa3d62 agent: fix get_oom_event deadlock after connection restart
When the agent-protocol-forwarder's inbound connection restarts (e.g.
during a Cloud API Adaptor restart in peer pod environments), the shim
re-sends a GetOOMEvent request through the new connection. Since the
forwarder→agent Unix socket survives the restart, the old handler from
the previous connection remains alive, holding the event_rx lock while
blocked in recv().await.

The new handler acquires the sandbox lock, then attempts to acquire the
event_rx lock — which is held by the old handler. Because the sandbox
lock is still held during this wait, every subsequent RPC
(ExecProcess, WaitProcess, StatsContainer, SignalProcess, etc.) blocks
on the sandbox lock, rendering the pod completely unresponsive.

The root cause is a lock ordering violation: get_oom_event held the
sandbox lock while acquiring the event_rx lock. Fix this by scoping the
sandbox lock acquisition so it is dropped before the event_rx lock is
acquired. The sandbox lock is only needed to clone the Arc<Mutex<Receiver>>
— once cloned, it can be released immediately.

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Thejas N <thn@redhat.com>
2026-06-15 07:47:18 +02:00
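Editorial sketch for 7807aa3d62: the lock-scoping fix, shown with `std::sync` primitives in place of the agent's async ones. `Sandbox`, `new_sandbox`, and the `String` event type are simplified, hypothetical stand-ins.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Simplified sandbox holding only the shared OOM event receiver.
struct Sandbox {
    event_rx: Arc<Mutex<Receiver<String>>>,
}

/// Build a sandbox plus the sender used to publish events into it.
fn new_sandbox() -> (Sender<String>, Mutex<Sandbox>) {
    let (tx, rx) = channel();
    (tx, Mutex::new(Sandbox { event_rx: Arc::new(Mutex::new(rx)) }))
}

/// Wait for the next OOM event; returns None once all senders are gone.
fn get_oom_event(sandbox: &Mutex<Sandbox>) -> Option<String> {
    // Hold the sandbox lock only long enough to clone the Arc.
    let event_rx = {
        let guard = sandbox.lock().unwrap();
        Arc::clone(&guard.event_rx)
    }; // sandbox lock is released here

    // Waiting on the receiver no longer blocks other users of the sandbox.
    let rx = event_rx.lock().unwrap();
    rx.recv().ok()
}
```

Because the sandbox guard is dropped at the end of the inner block, a handler stuck in `recv()` can delay only other `get_oom_event` callers, not every RPC that needs the sandbox.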
Fupan Li
9553614f32 Merge pull request #12772 from Apokleos/nydus-standalone
runtime-rs: Nydus standalone mode support in runtime-rs
2026-06-12 10:36:17 +08:00
Alex Lyn
4c63b8e3de agent: handle ENOSYS in overlayfs storage handler
In standalone nydusd mode with virtio-fs passthrough, the guest-side
mkdir may fail with ENOSYS. Update the overlayfs storage handler to
skip directory creation when the directory already exists, logging a
warning instead of failing.

This ensures container rootfs setup succeeds when nydusd's native
overlay manages the directory structure.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:25:18 +02:00
Alex Lyn
5a00053b38 kata-agent: Implement filesystem space usage collection via statfs
Add update_guest_filesystem_metrics() that collects disk space usage
(total/used/available) for all read-write mounted filesystems inside
the guest VM. This enables monitoring guest disk usage in Kata/CoCo
pods through the existing GetMetrics RPC.

The output metrics look like this:
- kata_guest_filesystem_bytes{mount="/",device="vda",item="total|used|available"}
- kata_guest_filesystem_inodes{mount="/",device="vda",item="total|used|available"}

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:05 +02:00
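Editorial sketch for 5a00053b38: how the three `item` values could be derived from statfs-style counters. The struct, its field names, and the `used = blocks - free` rule (the usual df convention) are assumptions; the commit does not spell out the agent's exact arithmetic.

```rust
/// Block counters as a statfs-style call would report them (hypothetical shape).
struct FsCounts {
    block_size: u64,
    blocks: u64,
    blocks_free: u64,
    /// Blocks available to unprivileged users.
    blocks_avail: u64,
}

/// Produce (item label, bytes) pairs for a kata_guest_filesystem_bytes sample.
fn space_items(c: &FsCounts) -> [(&'static str, u64); 3] {
    let total = c.blocks * c.block_size;
    // Used space counts every block that is not free, including reserved ones.
    let used = c.blocks.saturating_sub(c.blocks_free) * c.block_size;
    let available = c.blocks_avail * c.block_size;
    [("total", total), ("used", used), ("available", available)]
}
```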
Alex Lyn
6c66724591 kata-agent: Add filesystem space usage metric declarations
Add two new GaugeVec metrics to expose guest filesystem space usage:
(1) kata_guest_filesystem_bytes{mount, device, item}: space in bytes
  (total/used/available)
(2) kata_guest_filesystem_inodes{mount, device, item}: inode counts
  (total/used/available)

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:05 +02:00
manuelh-dev
953b306ff3 Merge pull request #12979 from manuelh-dev/mahuber/erofs-tmpfs-mount
runtime-rs/agent: support EROFS snapshots without a rwlayer
2026-05-29 13:50:27 -07:00
Fabiano Fidêncio
91df041803 agent: expose guest InfiniBand devices to VFIO containers
When a VF is cold-plugged in guest-kernel mode, mlx5_core binds to the
PCI device inside the VM and mlx5_ib creates IB character devices under
/dev/infiniband/ (uverbs*, rdma_cm, umad*). The container cannot reach
these devices unless they are explicitly added to its OCI spec.

Add expose_guest_infiniband_devices(), called from create_devices() when
the container carries at least one VFIO device entry. The function:

  - Walks /dev/infiniband/ inside the guest VM.
  - Appends each char device to spec.linux.devices.
  - Inserts matching cgroup allow rules (rwm).
  - Is a no-op if /dev/infiniband/ is absent or empty (no IB driver,
    or VF not yet rebound), so non-RDMA pods are unaffected.

Gate the call on container_has_vfio_device() so unrelated containers
sharing the sandbox do not get IB device access widened.

Add is_vfio_device_type() and snapshot_infiniband() to
kata-sys-util/pcilibs. is_vfio_device_type() lets the agent check
device type strings against the VFIO driver name constants without
duplication. snapshot_infiniband() summarises /sys/class/infiniband,
/sys/class/infiniband_verbs, and /dev/infiniband as a single diagnostic
string for log context; it lives in pcilibs because it has no
agent-specific dependencies (pure sysfs/devfs reads).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-29 13:07:45 +02:00
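Editorial sketch for 91df041803: the directory walk and the "no-op if absent" behavior, using only the standard library. It lists character devices directly under a directory. It does not build OCI device entries or cgroup rules, and `list_char_devices` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::{Path, PathBuf};

/// Collect character devices found directly under `dir`.
/// A missing directory yields an empty list, so callers become a no-op.
fn list_char_devices(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };

    let mut devices = Vec::new();
    for entry in entries {
        let entry = entry?;
        // file_type() does not follow symlinks; only real char nodes match.
        if entry.file_type()?.is_char_device() {
            devices.push(entry.path());
        }
    }
    // Sort for a deterministic order, since read_dir order is unspecified.
    devices.sort();
    Ok(devices)
}
```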
Fabiano Fidêncio
9893b6dc03 runtime: correctly resolve cold-plug VFIO guest PCI paths
Populate missing VFIO guest PCI paths via QMP before serializing
container devices so guest-kernel PCI env translation has the mappings
it needs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-28 21:54:52 +02:00
Fabiano Fidêncio
118b7fa611 agent: reconcile VFIO netdev MAC before UpdateInterface lookup
When a VFIO cold-plugged network device appears in guest with a
different MAC than the runtime request, resolve the netdev by PCI path
and apply the requested MAC before the normal by-MAC update flow.

This preserves existing behavior while avoiding UpdateInterface
mismatches in SR-IOV cold-plug cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-28 21:54:52 +02:00
Fabiano Fidêncio
e89eb77245 agent: keep PCIDEVICE env unchanged when pcimap is missing
Avoid failing container creation when per-container PCI mappings are
unavailable by preserving PCIDEVICE entries unchanged and warning
instead.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 21:54:52 +02:00
Manuel Huber
4fbfba2f79 agent: support run-backed EROFS upper
Support multi-layer EROFS storage without an explicit ext4 upper
layer. When runtime-rs sends only EROFS lower storage and overlay
metadata, create the overlay upper/work directories under the
container bundle in /run/kata-containers.

Keep the explicit ext4 rwlayer path for disk-backed snapshots, and
only track real temporary mount points for cleanup. The implicit
/run-backed upper is bundle-scoped state and is removed with the
container bundle.

Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-27 17:12:20 +00:00
Fabiano Fidêncio
5adfb27297 Merge pull request #13118 from PiotrProkop/fix-missing-cwd
agent: restore process CWD auto-creation
2026-05-27 13:32:05 +02:00
PiotrProkop
60a2e27f02 agent: Restore process CWD auto-creation
Commit b56313472 ("agent: Align agent OCI spec with oci-spec-rs",
PR #9944) inverted the condition guarding the create_dir_all call
for process.cwd: the leading `!` was dropped during the refactor.
As a result, the CWD is created only when process.cwd is the empty
string.

When the guest then runs chdir(process.cwd) and CWD doesn't exist
it returns ENOENT.  The agent propagates that to the shim, which
surfaces it to containerd as "failed to create shim task: ENOENT:
No such file or directory" — indistinguishable from a missing
argv[0].
This regressed the original fix in PR #2375 (Fixes #2374), which
deliberately mirrored runc's behavior.  Put the `!` back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2026-05-27 09:59:15 +02:00
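Editorial sketch for 60a2e27f02: the restored guard. The `rootfs` parameter and the function name are hypothetical, added so the sketch can run against a scratch directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the process working directory under `rootfs` when one is set.
fn ensure_cwd(rootfs: &Path, cwd: &str) -> io::Result<()> {
    // The regression dropped the `!`, so only an empty cwd was "created".
    if !cwd.is_empty() {
        // Trim the leading '/' so the path stays under `rootfs`.
        fs::create_dir_all(rootfs.join(cwd.trim_start_matches('/')))?;
    }
    Ok(())
}
```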
Manuel Huber
e838cd7d8d agent: compact EROFS overlay lowerdirs
Use kata_types::mount::Mount for the final multi-layer EROFS
overlay mount instead of calling baremount() directly.

The mount helper detects overlay option strings close to the kernel
mount data limit. When lowerdir entries share a common parent, it
changes into that directory and rewrites lowerdir to relative paths.
That avoids repeating the same long prefix for every layer.

Multi-layer EROFS images can have many lower layers under
/run/kata-containers/<cid>/multi-layer. Passing the raw absolute
lowerdir list can exceed the mount option buffer and fail the final
overlay mount, even after all layer devices mounted successfully.

Reuse the helper so this path follows Kata's normal overlay mount
handling, including lowerdir compaction before mount(2).

Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-26 18:42:11 +00:00
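Editorial sketch for e838cd7d8d: the lowerdir compaction idea in simplified form. It handles only the case where all entries share the same direct parent, and it skips the length check against the kernel limit and the actual chdir. `compact_lowerdirs` is a hypothetical name, not the kata_types helper.

```rust
use std::path::{Path, PathBuf};

/// If every lowerdir has the same parent directory, return that parent and a
/// `lowerdir=` option that uses names relative to it. The caller would change
/// into the parent before calling mount(2).
fn compact_lowerdirs(lowerdirs: &[&str]) -> Option<(PathBuf, String)> {
    let parent = Path::new(lowerdirs.first()?).parent()?;
    let mut names = Vec::with_capacity(lowerdirs.len());
    for dir in lowerdirs {
        let path = Path::new(dir);
        if path.parent()? != parent {
            // No shared parent: leave the absolute paths untouched.
            return None;
        }
        names.push(path.file_name()?.to_str()?.to_owned());
    }
    Some((parent.to_path_buf(), format!("lowerdir={}", names.join(":"))))
}
```

With layers under `/run/kata-containers/<cid>/multi-layer`, each entry shrinks from the full path to its final component, so the option string grows far more slowly with layer count.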
Dan Mihai
c81dadaba1 Merge pull request #13064 from burgerdev/add-arp-neighbour
agent: use rtnetlink to add ARP neighbour
2026-05-26 09:59:44 -07:00
Fabiano Fidêncio
3dc02a8604 Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only
runtime-rs: Support erofs snapshotter with gpt vmdk mode
2026-05-25 16:29:59 +02:00
Alex Lyn
2036e66bc3 kata-agent: Integrate GPT partition support into multi-layer handler
In GPT mode, all partitions share the same base block device, so
resolving it once per uevent source and caching the result avoids
redundant hotplug waits that would otherwise scale linearly with
layer count.

Layers are sorted by partition number before mounting to guarantee
correct overlay lowerdir precedence regardless of the order the host
emits Storage entries.

Also remove the dead_code attributes, since the code is now in use.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
Alex Lyn
17fadde6d8 kata-agent: Add GPT partition utility functions
The guest agent needs to resolve individual partition devices from a
single GPT-partitioned block device, but the kernel does not always
create partition nodes immediately after the base device appears,
especially when another fd holds the device open during hot-plug.

Add utility functions that handle two problems:
(1) Mapping a base device path to its partition path following the
kernel naming convention (bare suffix vs 'p' separator).
(2) And ensuring the partition node exists before mount.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
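Editorial sketch for 17fadde6d8: the kernel naming convention for partition nodes, where a `p` separator is inserted only when the disk name ends in a digit. `partition_path` is a hypothetical name, and the sketch does not wait for the node to exist.

```rust
/// Map a base block device path and partition number to the partition path.
fn partition_path(base: &str, partition: u32) -> String {
    // Names ending in a digit (nvme0n1, loop0) need a 'p' separator;
    // names ending in a letter (vda, sdb) take the bare number.
    let needs_separator = base
        .chars()
        .last()
        .map_or(false, |c| c.is_ascii_digit());
    if needs_separator {
        format!("{base}p{partition}")
    } else {
        format!("{base}{partition}")
    }
}
```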
Alex Lyn
8119a561ae kata-agent: Refactor wait_and_mount_layer to return LayerMountInfo
This commit makes no functional change: all callers pass None, so
every call still resolves the device via uevent exactly as before.

It only prepares the multi-layer EROFS handler for GPT partition and
dm-verity support by widening the wait_and_mount_layer() interface
without changing behavior.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
Alex Lyn
7086caaddf kata-agent: Remove unused mode field from MkdirDirective
The previously unused code was only kept alive by dead_code attributes
and is never actually used, so remove it entirely.

Remove the mode field from the MkdirDirective structure along with its
relevant test cases.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
Alex Lyn
39c512bc36 kata-agent: Enhance virtio block matcher to reject partition uevents
Enhance VirtioBlkPciMatcher to only match whole-disk uevents. This
prevents the matcher from incorrectly matching partition uevents
(e.g., /dev/vdaX) which is critical for partitioned disks where
partition uevents appear alongside whole-disk uevents.

This commit eliminates such false matches.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
Alex Lyn
56f05aa534 kata-agent: Enhance SCSI block device matcher to reject partition uevents
Refactor ScsiBlockMatcher to only match whole-disk uevents. This
prevents the matcher from incorrectly matching partition uevents
(e.g., block/sdd/sdd9) which is critical for partitioned disks
where partition uevents appear alongside whole-disk uevents.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
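Editorial sketch for 39c512bc36 and 56f05aa534: one way to tell whole-disk uevents from partition uevents is by the shape of the sysfs devpath. This check is an assumption for illustration. The commits do not say how the matchers decide, and `is_whole_disk_devpath` is a hypothetical name.

```rust
/// Return true when the devpath names a whole disk (".../block/sdd") rather
/// than a partition nested below it (".../block/sdd/sdd9").
fn is_whole_disk_devpath(devpath: &str) -> bool {
    let mut components = devpath.trim_end_matches('/').rsplit('/');
    // Skip the device name itself, then inspect its parent component.
    let _name = components.next();
    components.next() == Some("block")
}
```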
Fabiano Fidêncio
8787da13a9 agent: Add NUMA-aware PCI path parsing
Extend pcipath_from_dev_tree_path() to support the full NUMA-aware path
format "root_complex/bus/device" (e.g. "10/00/02") in addition to the
legacy "bus/device" format, defaulting to root complex "00" for backward
compatibility.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-24 22:00:46 +02:00
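Editorial sketch for 8787da13a9: accepting both path formats and defaulting the root complex to "00". This only splits and validates the components. The function name, the tuple return type, and the two-hex-digit check are assumptions, not the signature of pcipath_from_dev_tree_path().

```rust
/// Split "root_complex/bus/device" or legacy "bus/device" into its parts,
/// defaulting the root complex to "00" for the legacy form.
fn split_dev_tree_path(path: &str) -> Result<(String, String, String), String> {
    let parts: Vec<&str> = path.split('/').collect();
    // Assumed validation: every component is exactly two hex digits.
    for part in &parts {
        if part.len() != 2 || u8::from_str_radix(part, 16).is_err() {
            return Err(format!("invalid component {part:?} in {path:?}"));
        }
    }
    match parts.as_slice() {
        [bus, device] => Ok(("00".to_string(), bus.to_string(), device.to_string())),
        [root, bus, device] => Ok((root.to_string(), bus.to_string(), device.to_string())),
        _ => Err(format!("expected 2 or 3 components in {path:?}")),
    }
}
```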
Markus Rudy
bcd3d6936e agent: use rtnetlink to add ARP neighbour
The rtnetlink crate has had an API for neighbours since 0.11. The last
attempt to use this API caused problems on AKS, but looking at it again
shows that not all functionality was ported back then (state, flags and
lladdr). Attempt the migration again, considering all parameters.

Fixes: #11942

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-05-18 10:01:29 +02:00
Fabiano Fidêncio
1a4074ab2e agent: handle encrypted ephemeral storage for CCW block devices
VirtioBlkCcwHandler::create_device was calling common_storage_handler
directly, bypassing the handle_block_storage function that checks for
the encryption_key=ephemeral driver option. This meant that encrypted
emptyDir volumes on s390x would attempt a plain mount of the raw block
device instead of setting up dm-crypt via the CDH, resulting in an
EINVAL mount error.

Route CCW block devices through handle_block_storage, matching the
pattern used by VirtioBlkPciHandler.

Fixes: failed to mount /dev/vda to .../storage/..., EINVAL

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-16 12:07:12 +02:00
Fabiano Fidêncio
8e1d73a4b5 Merge pull request #13052 from burgerdev/abort-later
agent: wait for logs before aborting
2026-05-15 23:58:26 +02:00
Markus Rudy
32f2c5c2e4 agent: wait for logs before aborting
If the policy loading encounters an error, we `abort(3)` the agent for
safety. Since abort causes the process to stop immediately, the async
logs might not be flushed yet, and thus won't make it to the runtime,
hiding the reason for the abort. Wait a bit before aborting so that the
logs are fully written.

Fixes: #13031

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-05-15 12:36:29 +02:00
Fabiano Fidêncio
d3a9669be5 runtime-rs: implement EncryptedEmptyDirVolume
Add the core volume handler for block-encrypted emptyDir support
in runtime-rs, bringing it to parity with the Go runtime (PR #10559).

When emptydir_mode is set to "block-encrypted", host emptyDir bind
mounts are intercepted and handled as follows:

  1. A sparse disk image (disk.img) is created inside the emptyDir
     folder, sized to match the host filesystem capacity.
  2. A mountInfo.json is written under the kata direct-volume root
     with volume_type "blk", fs_type "ext4", and metadata
     encryptionKey=ephemeral.
  3. The disk image is plugged into the guest VM as a virtio-blk
     device via the hypervisor device manager.
  4. An agent::Storage is built with driver_options containing
     encryption_key=ephemeral and shared=true, so the kata-agent
     delegates formatting and encryption to CDH using LUKS2.

The volume is registered in the dispatch chain before the regular
block-volume check, and ephemeral disk metadata is tracked for
sandbox-level cleanup at teardown.

Also re-exports EMPTYDIR_MODE_* constants from kata-types::config
so downstream crates can reference them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-14 22:56:11 +02:00
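Editorial sketch for d3a9669be5: step 1 of the flow, creating a sparse `disk.img` of a given size. It covers only file creation. The function name and the caller-supplied size are assumptions, and the sizing from host filesystem capacity is left out.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Create `disk.img` inside `dir` with a logical size of `size_bytes`.
fn create_sparse_image(dir: &Path, size_bytes: u64) -> io::Result<PathBuf> {
    let path = dir.join("disk.img");
    let file = File::create(&path)?;
    // set_len extends the file without writing data, so filesystems with
    // sparse-file support allocate no blocks until the guest writes.
    file.set_len(size_bytes)?;
    Ok(path)
}
```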
Alex Lyn
1441b2b84a runtime-rs: Fix warnings in rust runtime
A large amount of unformatted Rust code in the Rust runtime, its libs,
and the agent sources shows up as uncommitted changes whenever
`cargo fmt --all` is run.

Format the code to remove this noise.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-08 14:56:00 +08:00
Alex Lyn
a51e0b630e agent: Update VFIO device handling for GPU cold-plug
Extend the in-guest agent's VFIO device handler to support the cold-plug
flow. When the runtime cold-plugs a GPU before the VM boots, the agent
needs to bind the device to the vfio-pci driver inside the guest and
set up the correct /dev/vfio/ group nodes so the workload can access
the GPU.

This updates the device discovery logic to handle the PCI topology that
QEMU presents for cold-plugged vfio-pci devices and ensures the IOMMU
group is properly resolved from the guest's sysfs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
Greg Kurz
bb933f65e4 vendor: Remove make vendor across the repo
`make vendor` isn't required anymore. People who need vendored code should
use the `tools/packaging/release/generate_vendor.sh` script instead.

Assisted-by: Claude AI
Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:49:52 +02:00
Markus Rudy
044c96a9d6 agent: remove standard-oci-runtime feature
This feature was only added for runk, which was removed entirely in
96e1fb4ca6.

Fixes: #12849
Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-28 10:35:14 +02:00
Spyros Seimenis
d7385eee99 genpolicy: make FileType::from portable across Darwin
libc::S_IF* are u16 on Darwin/BSD and u32 on Linux. The match in
FileType::from and its tests mix both widths and don't compile on
Darwin. Cast everything to u32; on Linux that's a no-op, hence the
clippy::unnecessary_cast allow (rust-lang/rust-clippy#6466).

Fixes: #12916

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>
2026-04-27 12:14:04 +03:00
Steve Horsman
d5785b4eba Merge pull request #12872 from stevenhorsman/bump-rust-to-1.93
Bump rust to 1.93
2026-04-27 09:01:00 +01:00
Fabiano Fidêncio
74d9d043f0 agent: raise regorus policy length limits
regorus 0.9.0 introduced a hard, per-engine ceiling on parsed-policy
size (1024 columns / 1 MiB / 20 000 lines, see lexer.rs:30 in
microsoft/regorus). The 1024-column cap rejects realistic policies
emitted by `genpolicy`: the `NVIDIA_REQUIRE_CUDA` environment variable
on `nvcr.io/nvidia/k8s/cuda-sample` is roughly 1.3 KiB on a single line,
so the agent's `set_policy()` returns an error, the agent (PID 1) exits,
the guest kernel reboots, and the runtime eventually times out
connecting to the agent's vsock.

regorus PR #624 ("feat: make policy length limits configurable per
engine") adds `Engine::set_policy_length_config`, but it has not been
released yet -- the latest published version is still 0.9.1, which
predates that change.

Pin `regorus` to the upstream commit that includes #624 and call the
new setter from `AgentPolicy::new_engine()` with values that comfortably
fit any policy we expect to evaluate (64 KiB per line, 16 MiB per file,
200 000 lines) while still rejecting pathological/minified input. Once
a regorus release > 0.9.1 ships with #624, the dependency can be moved
back to crates.io.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-26 10:18:26 +02:00
Markus Rudy
c8fe6a60d0 genpolicy: update regorus to 0.9.1
The version we used before was released in 2024, it's about time to use
a newer version. The new version of the crate comes with a license,
which addresses a `cargo deny` finding.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-26 10:18:26 +02:00
stevenhorsman
d1a20b1887 agent: Fix let_unit_value warning in pipestream tests
Remove unnecessary let binding for unit value expression to fix clippy
warning in Rust 1.93.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-25 11:27:39 +01:00
stevenhorsman
7ab2f0eeb6 agent: Fix needless_borrow warning in container tests
Remove unnecessary reference operator from expression that is
immediately dereferenced by the compiler to fix clippy warning in
Rust 1.93.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-25 11:27:39 +01:00