Commit Graph

6575 Commits

Author SHA1 Message Date
Charlotte Hartmann Paludo
b4be5fdcca runtime-rs: change safe-path dependency from crates.io to workspace
`safe-path` is resolved from the local workspace in all other workspace
member crates. This commit changes the dependency to a local one for
runtime-rs as well.

Signed-off-by: Charlotte Hartmann Paludo <git@charlotteharludo.com>
Co-authored-by: Markus Rudy <mr@edgeless.systems>
2026-06-18 06:32:06 +02:00
Fabiano Fidêncio
0ddb2ee1f1 Merge pull request #13160 from LandonTClipp/kata_visible_devices
feat(agent): translate KATA_VISIBLE_DEVICES into CDI GPU requests
2026-06-16 19:10:35 +02:00
Fabiano Fidêncio
3ca5742338 Merge pull request #13129 from pmores/fix-default_memory_annontation
runtime-rs: fix default_memory annotation processing
2026-06-16 18:11:19 +02:00
davidweisse
ac56ea21d8 genpolicy: support pod-level resources
Add support for resource requests and limits in the PodSpec.

Fixes #12816

Signed-off-by: davidweisse <98460960+davidweisse@users.noreply.github.com>
2026-06-16 15:30:22 +02:00
Fabiano Fidêncio
774e698aeb Merge pull request #12293 from Apokleos/graceful-errors
runtime-rs: make OOM watcher and signal handling lifecycle-aware
2026-06-16 15:02:54 +02:00
Pavel Mores
9b31e06c20 runtime-rs: bump the byte-unit dependency version
The unit tests added by the previous commit exposed a malfunction of the
byte-unit crate on big-endian systems(*), causing s390x CI to fail.
Bump the dependency's version to include a fix.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-06-16 13:15:23 +02:00
Pavel Mores
5ba5046e97 runtime-rs: fix default_memory annotation processing
The annotation value is implicitly in MiB but when presented to the
byte-unit crate this is interpreted as bytes.  When a common value like
2048, meant to mean 2048 MiB but interpreted as 2048 B, is then converted
to MiB the result is zero which is less than the minimal allowable memory
and the runtime fails to launch.

This is fixed by detecting whether the annotation value contains units.
If it doesn't, it is first converted to MiB and the rest of the processing
then proceeds as before.

This way we allow for the implicit MiB units when no units are given, thus
keeping compatibility with existing go shim behaviour, while also allowing
for any legal units to be given as well.

We take the opportunity to add some unit tests as well.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-06-16 13:15:23 +02:00
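The unit handling described in 5ba5046e97 can be sketched as follows. This is a minimal, hand-rolled illustration and not the runtime-rs code, which relies on the byte-unit crate; the function name `parse_default_memory_mib` and the small set of accepted suffixes are assumptions made for the example.

```rust
/// Parse a `default_memory`-style annotation value into MiB.
/// A bare number is treated as MiB; a number with a unit suffix is converted.
/// Hypothetical helper: only a few binary suffixes are handled, for illustration.
fn parse_default_memory_mib(value: &str) -> Option<u64> {
    let value = value.trim();
    // Split at the first non-digit character: digits on the left, unit on the right.
    let split = value
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(value.len());
    let (digits, unit) = value.split_at(split);
    let number: u64 = digits.parse().ok()?;

    // Bytes per unit; an empty unit means the implicit MiB default.
    let bytes_per_unit: u64 = match unit.trim() {
        "" | "MiB" => 1 << 20,
        "B" => 1,
        "KiB" => 1 << 10,
        "GiB" => 1 << 30,
        _ => return None,
    };
    let bytes = number.checked_mul(bytes_per_unit)?;
    // Convert back to MiB, rounding down.
    Some(bytes >> 20)
}
```

With this sketch, "2048" and "2GiB" both yield 2048 MiB, while "2048B" rounds down to 0 MiB, which is the value the bare "2048" produced before the fix.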
LandonTClipp
a1dd28cb52 feat(runtime): plumb VISIBLE_CDI_DEVICES through the Go runtime
Add a `visible_cdi_devices` TOML option to the Go runtime so the
agent.visible_cdi_devices=true kernel parameter is emitted to the guest
when enabled. Wire the option through the NVIDIA GPU configuration
templates and add tests verifying the kernel-params flow.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
LandonTClipp
b49eb577b2 feat(runtime-rs): expose visible_cdi_devices in config
Declare the `visible_cdi_devices` agent option (kernel param
agent.visible_cdi_devices) in kata-types so runtime-rs can opt into
emitting it to the guest, and expose it in the three NVIDIA GPU
configuration templates (qemu, qemu-snp, qemu-tdx) at runtime-rs/config/.

The agent consumes the corresponding VISIBLE_CDI_DEVICES env var to
drive CDI device requests.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
LandonTClipp
676fc90d0b feat(agent): translate VISIBLE_CDI_DEVICES into CDI device requests
Add an opt-in `visible_cdi_devices` agent option that lets a container
select which of the VM's CDI-known devices it sees via a
VISIBLE_CDI_DEVICES env var. The schema is `<cdi-kind>=<devices>`
(e.g. "nvidia.com/gpu=all", or "kata.com/gpu=0,1"), with multiple kinds
delimited by ':'.

When enabled, the agent maps the value to CDI device requests and feeds
them through the existing CDI injection path, so device nodes, mounts,
env and createContainer hooks from the guest CDI spec (e.g.
/var/run/cdi/nvidia.yaml, generated by NVRC/nvidia-ctk) are applied.
The variable is intentionally distinct from NVIDIA_VISIBLE_DEVICES and
does not promise identical semantics.

If a requested kind is present in the guest CDI registry but the
specific device index is not, the agent fails fast rather than waiting
for the CDI-spec watch/timeout path. An entirely absent kind falls
through to the existing wait/timeout behavior.

Defaults to false; containers that don't set the env var are unaffected.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
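The `<cdi-kind>=<devices>` schema from 676fc90d0b can be illustrated with a small parser. This is a sketch under assumptions, not the agent's implementation: the function name `parse_visible_cdi_devices`, the comma-separated device list (inferred from the "kata.com/gpu=0,1" example), the whitespace trimming, and the error strings are all hypothetical.

```rust
/// Expand a VISIBLE_CDI_DEVICES-style value into fully qualified CDI
/// device names of the form `<cdi-kind>=<device>`.
/// Hypothetical helper; error handling is simplified to strings.
fn parse_visible_cdi_devices(value: &str) -> Result<Vec<String>, String> {
    let mut requests = Vec::new();
    // Multiple kinds are delimited by ':'; empty segments are skipped (assumption).
    for entry in value.split(':').filter(|e| !e.trim().is_empty()) {
        let (kind, devices) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in entry {:?}", entry))?;
        let kind = kind.trim();
        if kind.is_empty() {
            return Err(format!("empty CDI kind in entry {:?}", entry));
        }
        // Devices within one kind are assumed to be comma-separated.
        for device in devices.split(',') {
            let device = device.trim();
            if device.is_empty() {
                return Err(format!("empty device name in entry {:?}", entry));
            }
            requests.push(format!("{}={}", kind, device));
        }
    }
    Ok(requests)
}
```

For "nvidia.com/gpu=all:kata.com/gpu=0,1" the sketch returns three requests: "nvidia.com/gpu=all", "kata.com/gpu=0" and "kata.com/gpu=1".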
Alex Lyn
8fc1a16225 runtime-rs: Make signal_process idempotent for exited init processes
Address the issue where signal_process returns an INTERNAL error when
the container's init process has already exited, and ensure teardown
is never aborted by signal failures.

Introduce is_no_such_process_error() to detect "no such process"
conditions (ESRCH/ENOENT codes or equivalent messages). When the init
process is already gone, treat it as success with an info log instead
of an error.

In stop_process(), never propagate signal failures. During sandbox
shutdown the agent connection is often already closed, causing
AgentConnectionClosed errors that bypass is_no_such_process_error().
If stop_process() aborts on such errors, cleanup_container() is skipped
and leftover mounts cause "Resource busy" failures in sandbox cleanup.
Restore "always proceed to cleanup" semantics: log the failure as a
warning, but never skip resource cleanup.

Resource cleanup must be best-effort and idempotent regardless of kill
outcome.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-16 15:12:28 +08:00
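The "always proceed to cleanup" semantics from 8fc1a16225 can be sketched as below. The closures stand in for the real signal and cleanup calls, and the generic `stop_process` signature is an assumption made for the example rather than the runtime-rs API.

```rust
/// Hypothetical stand-in for the stop path: signal first, then always clean up.
fn stop_process<S, C>(signal: S, cleanup: C) -> Result<(), String>
where
    S: FnOnce() -> Result<(), String>,
    C: FnOnce() -> Result<(), String>,
{
    if let Err(err) = signal() {
        // A failed kill is logged as a warning and never aborts teardown.
        eprintln!("warning: failed to signal process: {}", err);
    }
    // Cleanup runs regardless of the signal outcome.
    cleanup()
}
```

Only a cleanup failure is reported to the caller in this sketch; a signal failure such as a closed agent connection is downgraded to a warning.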
Alex Lyn
44dd2b1f34 runtime-rs: Refine OOM watcher error reporting for sandbox teardown
This commit refines the error handling within the OOM watcher to
distinguish between genuine failures and errors that occur as a natural
consequence of sandbox shutdown via the helper is_normal_shutdown_error.
Previously, various connection-related errors during teardown were logged
as warnings, contributing to noisy logs.

The logic now distinguishes "normal shutdown" errors (e.g., Connection
reset by peer, broken pipe) from actual OOM watcher failures.

This enhancement makes OOM event logs more informative and less prone to
clutter during normal sandbox termination.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 15:12:24 +08:00
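A classifier in the spirit of 44dd2b1f34 might look like the sketch below. It assumes errors arrive as `std::io::Error`, which may not match the real helper's input type. The exact set of error kinds and the log levels chosen in `log_level_for` are also assumptions.

```rust
use std::io::{Error, ErrorKind};

/// Returns true for errors that are expected once the peer has gone away.
/// The list of kinds is an assumption based on the examples in the commit message.
fn is_normal_shutdown_error(err: &Error) -> bool {
    matches!(
        err.kind(),
        ErrorKind::ConnectionReset
            | ErrorKind::BrokenPipe
            | ErrorKind::ConnectionAborted
            | ErrorKind::UnexpectedEof
    )
}

/// Hypothetical mapping from error class to log level.
fn log_level_for(err: &Error) -> &'static str {
    if is_normal_shutdown_error(err) {
        "debug"
    } else {
        "warn"
    }
}
```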
Alex Lyn
3095bd379b runtime-rs: Introduce cancellation for OOM watcher during teardown
This commit introduces an explicit cancellation mechanism for the OOM
watcher loop within VirtSandbox. This addresses the issue where the
watcher continues to poll for OOM events even when the sandbox is being
stopped, leading to spurious "Connection reset by peer" errors.

Key changes:
(1) A CancellationToken is added to VirtSandbox to signal the watcher
loop when the sandbox is undergoing teardown.
(2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a
tokio::select! statement. This allows it to concurrently listen for
two events:
- cancel_token.cancelled(): Triggered when the sandbox/VM is stopping.
- agent.get_oom_event(): The regular OOM event polling.
(3) In the sandbox stop/teardown path, cancel_token.cancel() is called
before stopping the VM. This ensures the OOM watcher loop exits cleanly
via the cancellation token, preventing the occurrence of ECONNRESET/EOF
errors on a closed channel.

This change improves the robustness of OOM event handling during sandbox
lifecycle management.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
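The cancellation idea in 3095bd379b can be approximated with the standard library alone. The commit uses a CancellationToken with tokio::select!, whereas this sketch swaps in an `AtomicBool` flag and a channel polled with `recv_timeout`. It therefore observes cancellation only between polls, after draining queued events, and is not equivalent to the async version. All names here are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Watch for OOM events until cancelled; returns how many events were handled.
fn watch_oom_events(events: Receiver<String>, cancelled: Arc<AtomicBool>) -> usize {
    let mut handled = 0;
    loop {
        match events.recv_timeout(Duration::from_millis(10)) {
            Ok(container_id) => {
                println!("OOM event for container {}", container_id);
                handled += 1;
            }
            // No event within the poll interval: check whether teardown has started.
            Err(RecvTimeoutError::Timeout) => {
                if cancelled.load(Ordering::SeqCst) {
                    return handled;
                }
            }
            // Sender dropped: the channel is closed, so exit without reporting an error.
            Err(RecvTimeoutError::Disconnected) => return handled,
        }
    }
}
```

In the teardown path, the flag would be set before the event source is shut down, so the loop exits through the cancellation branch instead of reporting an error on a closed channel.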
Alex Lyn
0ffdc576d3 runtime-rs: Introduce a helper to check if process/container exists
Returns `true` if the error indicates that the target process/container
no longer exists.

This is used to determine if an operation, like signaling a process,
failed because the target is no longer available. The function checks
for standard OS error codes (`ESRCH`, `ENOENT`) and common error message
patterns.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
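The check described in 0ffdc576d3 could be sketched as follows, assuming the error is a `std::io::Error`. The real helper may take a different error type. The errno values are the Linux ones, and the single message pattern is an assumption standing in for the "common error message patterns" the commit mentions.

```rust
use std::io;

const ENOENT: i32 = 2; // Linux errno: no such file or directory
const ESRCH: i32 = 3; // Linux errno: no such process

/// Returns true if the error indicates the target process no longer exists.
fn is_no_such_process_error(err: &io::Error) -> bool {
    // First check the OS error code, when one is attached.
    if matches!(err.raw_os_error(), Some(ESRCH) | Some(ENOENT)) {
        return true;
    }
    // Fall back to a message match for errors that lost their errno on the way.
    err.to_string().to_lowercase().contains("no such process")
}
```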
Alex Lyn
59677688ee runtime-rs: Introduce a helper to check for normal OOM shutdown errors
The helper checks whether an error is a normal OOM shutdown error
caused by network disconnection.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
Fabiano Fidêncio
cfab6f496b runtime-rs: Propagate block device read-only flag to the VMM
Block volumes and block-mode device nodes were attached to the guest
read-write regardless of the volume's read-only intent, so the
guest-visible virtio-blk device was always writable.

This matters beyond simple write protection: filesystems such as XFS
inspect the block device read-only state to decide whether to attempt
journal/log recovery. When the device is writable, XFS tries to replay
the log even on a read-only mount, which fails badly. Mounting with
"-o ro" inside the guest is not sufficient; the device itself must
advertise read-only (VIRTIO_BLK_F_RO), which only happens when the VMM
opens the backing image read-only.

Set is_readonly on the block device config from two signals, combined
with OR so either one marks the device read-only:

  - the read-only intent from the OCI spec:
      * bind-mounted block volumes and direct-assigned (raw block)
        volumes derive it from the "ro" mount option, and
      * block-mode volumes (e.g. Kubernetes volumeDevices) arrive as
        device nodes in spec.Linux.Devices with no mount option; their
        intent is expressed only via the cgroup device access in
        spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for
        read-only; "rwm" for read-write). handler_devices() derives the
        flag from the matching cgroup allow rule, and
  - the host block device's own read-only flag (queried via the BLKROGET
    ioctl). Both the volume path (block_volume/rawblock_volume) and the
    device-node path (handler_devices, resolving the host node via
    get_host_path) honor it, so a device that is physically read-only on
    the host is exposed read-only to the guest even when the intent is
    not encoded in the OCI spec.

All in-tree hypervisors (qemu, cloud-hypervisor, dragonball) already
honor BlockConfig.is_readonly, so no hypervisor changes are required.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor
2026-06-15 23:18:36 +02:00
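The OR-combination of the two signals in cfab6f496b, for the mount-option case, reduces to a few lines. In this sketch the host flag is passed in as a plain boolean instead of being queried with the BLKROGET ioctl, and both function names are hypothetical.

```rust
/// Read-only intent from OCI mount options: true when "ro" is present.
fn mount_options_read_only(options: &[&str]) -> bool {
    options.iter().any(|opt| *opt == "ro")
}

/// Either signal alone marks the device read-only.
/// `host_device_read_only` stands in for the result of a BLKROGET query.
fn block_device_is_readonly(options: &[&str], host_device_read_only: bool) -> bool {
    mount_options_read_only(options) || host_device_read_only
}
```

A device that is read-only on the host is therefore exposed read-only even when the mount options say "rw".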
Fabiano Fidêncio
6203e28bef runtime: Propagate block-mode device read-only flag to the VMM
Block-mode volumes (e.g. Kubernetes volumeDevices) are passed to the
container as device nodes in spec.Linux.Devices and carry no mount "ro"
option. Their read-only intent is expressed only via the cgroup device
access in spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for
read-only; "rwm" for read-write).

The device path ignored that signal: newLinuxDeviceInfo() built the
DeviceInfo without ever setting ReadOnly (it only consumed FileMode, the
node permission bits, not the read/write access), so the device was
always attached read-write.

This is problematic for filesystems such as XFS, which inspect the block
device read-only state to decide whether to attempt journal/log recovery.
When the guest device is writable, XFS tries to replay the log even for a
read-only mount, which fails badly. Mounting "-o ro" in the guest is not
enough; the device itself must advertise read-only, which only happens
when the VMM opens the backing device read-only (DeviceInfo.ReadOnly ->
BlockDrive.ReadOnly -> qemu read-only=on / clh Readonly).

Derive the read-only flag from two independent signals, combined with OR
so either one marks the device read-only:

  - the cgroup device access rule that exactly matches the device, so a
    block-mode volume marked read-only by the orchestrator (e.g. a pod
    volume with persistentVolumeClaim.readOnly: true) is honored, and
  - the host block device's own read-only flag (queried via the BLKROGET
    ioctl). Block-mode volumes frequently carry no read-only signal in
    the OCI spec at all, so the device flag is often the only reliable
    source.

The BLKROGET probe is shared (pkg/device/config.BlockDeviceIsReadOnly,
Linux-only with a stub on other platforms) between the device-node path
(newLinuxDeviceInfo, probing /dev/block/<major>:<minor>) and the
bind-mounted/filesystem block path (createDeviceInfo). None of this
relies on external host tooling such as "blockdev --setro".

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor
2026-06-15 23:18:36 +02:00
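The cgroup side of 6203e28bef can be sketched as a lookup over device access rules. The commit itself is Go code, so this Rust version is illustrative only. The struct mirrors the allow/type/major/minor/access fields of an OCI device cgroup rule, but the type and function names are hypothetical, and letting the last exactly matching rule win is an assumption.

```rust
/// Simplified device cgroup rule (wildcard rules are not modelled here).
#[derive(Debug, Clone, PartialEq)]
struct DeviceCgroupRule {
    allow: bool,
    dev_type: char, // 'b' for block devices
    major: i64,
    minor: i64,
    access: String, // e.g. "rm" or "rwm"
}

/// True when the exactly matching allow rule grants read but not write access.
fn read_only_from_cgroup(rules: &[DeviceCgroupRule], major: i64, minor: i64) -> bool {
    rules
        .iter()
        .rev() // assumption: the last matching rule takes precedence
        .find(|r| r.allow && r.dev_type == 'b' && r.major == major && r.minor == minor)
        .map(|r| r.access.contains('r') && !r.access.contains('w'))
        .unwrap_or(false)
}
```

A device with no matching rule is treated as not read-only by this signal, which leaves the decision to the host block device flag.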
Thejas N
7807aa3d62 agent: fix get_oom_event deadlock after connection restart
When the agent-protocol-forwarder's inbound connection restarts (e.g.
during a Cloud API Adaptor restart in peer pod environments), the shim
re-sends a GetOOMEvent request through the new connection. Since the
forwarder→agent Unix socket survives the restart, the old handler from
the previous connection remains alive, holding the event_rx lock while
blocked in recv().await.

The new handler acquires the sandbox lock, then attempts to acquire the
event_rx lock — which is held by the old handler. Because the sandbox
lock is still held during this wait, every subsequent RPC
(ExecProcess, WaitProcess, StatsContainer, SignalProcess, etc.) blocks
on the sandbox lock, rendering the pod completely unresponsive.

The root cause is a lock ordering violation: get_oom_event held the
sandbox lock while acquiring the event_rx lock. Fix this by scoping the
sandbox lock acquisition so it is dropped before the event_rx lock is
acquired. The sandbox lock is only needed to clone the Arc<Mutex<Receiver>>
— once cloned, it can be released immediately.

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Thejas N <thn@redhat.com>
2026-06-15 07:47:18 +02:00
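The lock scoping fix in 7807aa3d62 follows a common pattern: clone the shared handle under the outer lock, release that lock, then block on the inner one. The sketch below uses `std::sync` primitives and a blocking channel as stand-ins. The real agent is async, and the `Sandbox` struct here is a hypothetical reduction to the one relevant field.

```rust
use std::sync::mpsc::Receiver;
use std::sync::{Arc, Mutex};

/// Hypothetical sandbox holding only the shared OOM event receiver.
struct Sandbox {
    event_rx: Arc<Mutex<Receiver<String>>>,
}

fn get_oom_event(sandbox: &Mutex<Sandbox>) -> Option<String> {
    // Hold the sandbox lock only long enough to clone the Arc.
    let event_rx = {
        let guard = sandbox.lock().unwrap();
        Arc::clone(&guard.event_rx)
    }; // sandbox lock is released here

    // Waiting on the receiver no longer blocks other users of the sandbox lock.
    let rx = event_rx.lock().unwrap();
    rx.recv().ok()
}
```

Even if a stale handler is still parked on the receiver lock, other callers can acquire the sandbox lock because it is never held across the wait.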
Fabiano Fidêncio
37c4a0b6a2 Merge pull request #13128 from nikolasgkou/fix/guest-protection-fallback
runtime-rs: don't fail VM start when guest protection detection errors
2026-06-13 08:56:56 +02:00
Fabiano Fidêncio
5efc761002 Merge pull request #13211 from glingy/patch-1
runtime-rs: Fix queue_size of zero in block_rootfs
2026-06-12 22:37:18 +02:00
nikolasgkou
80b8f592a0 runtime-rs: skip guest protection detection for non-confidential guests
prepare_protection_device_config() called available_guest_protection()
unconditionally and propagated any error before the "confidential_guest
is not set" case was handled.

On AMD hosts where the kvm_amd `sev` module parameter is "Y" but the CPU
does not expose the SEV-SNP CPUID bit (8000_001f EAX[4]) -- e.g. consumer
Ryzen -- available_guest_protection() returns Err("SEV not supported"),
which blocked every non-confidential VM from booting even though no
protection was requested.

When confidential_guest is not set there is no reason to probe the host,
so return Ok(None) before calling available_guest_protection(). Detection
(and any error it produces) now runs only when a confidential guest is
actually requested.

Signed-off-by: nikolasgkou <nikolasgkou@disroot.org>
2026-06-12 22:20:13 +02:00
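The early return introduced by 80b8f592a0 can be sketched as below. The enum, the function name `prepare_protection`, and the closure standing in for available_guest_protection() are assumptions for illustration; the real function has a different signature.

```rust
/// Hypothetical subset of guest protection types.
#[derive(Debug, PartialEq)]
enum GuestProtection {
    Snp,
    Tdx,
}

/// Probe the host only when a confidential guest was requested.
fn prepare_protection<F>(
    confidential_guest: bool,
    detect: F,
) -> Result<Option<GuestProtection>, String>
where
    F: FnOnce() -> Result<GuestProtection, String>,
{
    if !confidential_guest {
        // No protection requested: skip detection, so its errors cannot surface.
        return Ok(None);
    }
    detect().map(Some)
}
```

With `confidential_guest` unset, a failing probe such as "SEV not supported" is never invoked, so a non-confidential VM can still start.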
Fabiano Fidêncio
c5d5fc6ee8 Merge pull request #13213 from burgerdev/grpc-probes
genpolicy: add missing probe fields
2026-06-12 19:04:53 +02:00
Gregory Ling
d90178c179 runtime-rs: Fix queue_size of zero in block_rootfs
Fix BlockRootfs to save the queue_size, num_queues, logical_sector_size,
and physical_sector_size of the hypervisor's block device info in the
BlockConfig passed to the VM.

Fixes #13210

Signed-off-by: Gregory Ling <17791817+glingy@users.noreply.github.com>
2026-06-12 18:24:50 +02:00
Fabiano Fidêncio
110843d6e1 Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal
runtime(-rs): remove file_mem_backend config option
2026-06-12 17:13:04 +02:00
Markus Rudy
2e8f61a575 genpolicy: add missing probe fields
This commit adds fields for readiness/liveness/startup probes that were
missing so far, and adds probes to the ignored_fields test to ensure
these stay supported. None of these fields has an influence on the
generated policy, they just allow parsing valid k8s yaml.

Co-authored-by: Spyros Seimenis <sse@edgeless.systems>
Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-06-12 13:20:16 +02:00
Fupan Li
9553614f32 Merge pull request #12772 from Apokleos/nydus-standalone
runtime-rs: Nydus standalone mode support in runtime-rs
2026-06-12 10:36:17 +08:00
Manuel Huber
70d8f1bf3d runtime: remove file_mem_backend config option
Remove the Go runtime file_mem_backend and valid_file_mem_backends
config knobs, along with the corresponding sandbox annotation handling.

The runtime still enables file-backed shared memory automatically for
virtio-fs by using /dev/shm as the backing directory. This only removes
the user-selectable backend path.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-12 00:07:16 +00:00
Manuel Huber
86fd65271c runtime-rs: remove file_mem_backend config option
The config knob is parsed but never used in the Rust shim, which makes
it useless there. Remove the file_mem_backend config option since it
has no current users. Because the option is usable in the Go shim, it
is left intact there.

For the Rust shim, /dev/shm is still used in a similar way to the Go
shim when filesystem sharing is enabled (virtio-fs). Future use cases
that need other file memory backends are expected to define them in a
similar manner: determine the proper file memory backend from the
configuration/platform, and do not let end users choose it.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-06-12 00:07:16 +00:00
Fabiano Fidêncio
b323697f37 Merge pull request #13111 from Apokleos/monitor-disk-usage
Metrics: Add support for monitoring disk usage via statfs
2026-06-12 00:41:31 +02:00
Alex Lyn
fa84eecd2d runtime-rs: Implement ShareVirtioFsNydus for standalone mode
Introduce `ShareVirtioFsNydus` to enable standalone Nydus rootfs
support. This implementation acts as the bridge between runtime-rs
and the external `nydusd` daemon.

Key Capabilities:
(1) Trait Implementation: Implements `ShareFs` (for VM device/storage) and
  `NydusShareFs` (for RAFS lifecycle) traits.
(2) Daemon Lifecycle Management: Handles `nydusd` spawning, supervision,
  and graceful shutdown.
(3) Native Overlay Support: Configures `nydusd` with `passthrough_fs`
  backend to provide native overlay (upperdir/workdir) support.
(4) API Integration: Utilizes `NydusClient` for granular control over RAFS
  mount/umount operations.
(5) QEMU Integration: Enables `virtio-fs-nydus` device support,
  facilitating standalone mode execution.

This implementation allows Kata containers to utilize an external `nydusd`
process for Nydus rootfs management, providing a cleaner separation between
the runtime and the Nydus daemon lifecycle.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Alex Lyn
edfe9ea403 runtime-rs: refine ShareFs abstraction with lifecycle and Nydus traits
Refactor the `ShareFs` trait to improve modularity and support
standalone Nydus mode:

(1) Added `stop()` method to manage daemon teardown.
(2) Introduced a dedicated trait for Nydus-specific data-plane
operations.

This refactoring cleans up the `ShareFs` trait by consolidating
daemon lifecycle handling and isolating Nydus-specific extensions,
paving the way for cleaner standalone Nydus implementation.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Alex Lyn
720a8688b4 runtime-rs: Add daemon manager for nydusd process lifecycle
Implement Nydusd to manage nydusd daemon process:
(1) start: spawn process, validate paths, wait for API ready,
    setup passthrough fs.
(2) stop: kill process, cleanup socket files.
(3) mount_rafs/mount_rafs_with_overlay: high-level filesystem
    mount operations.
(4) build_args: construct virtiofs mode command line arguments.

This provides process lifecycle management with an internal NydusClient.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Alex Lyn
c1ebf269f7 runtime-rs: Add nydus client for nydusd API communication via HTTP
Implement NydusClient to interact with nydusd daemon via Unix
socket:
(1) check_status: query daemon state via GET /api/v1/daemon.
(2) mount/umount: manage filesystem mounts via POST/DELETE
  /api/v1/mount.
(3) wait_until_ready: poll daemon until RUNNING state.

This provides a lightweight, stateless HTTP client layer for nydusd
API.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Alex Lyn
4c63b8e3de agent: handle ENOSYS in overlayfs storage handler
In standalone nydusd mode with virtio-fs passthrough, the guest-side
mkdir may fail with ENOSYS. Update the overlayfs storage handler to
skip directory creation when the directory already exists, logging a
warning instead of failing.

This ensures container rootfs setup succeeds when nydusd's native
overlay manages the directory structure.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:25:18 +02:00
Alex Lyn
8eb564dfb8 kata-sys-util: handle ENOSYS gracefully in mount destination creation
When using virtio-fs with nydusd's passthrough_fs, mkdir operations may
return ENOSYS on certain filesystem configurations. This causes mount
destination creation to fail unexpectedly.

Handle ENOSYS errors gracefully alongside AlreadyExists by verifying the
directory exists after the failed mkdir attempt, allowing the mount to
proceed if the directory is already present.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:25:18 +02:00
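The ENOSYS tolerance described in 8eb564dfb8 can be sketched with the standard library. The function names are hypothetical, the errno value is the Linux one, and the real kata-sys-util code may structure the check differently.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

const ENOSYS: i32 = 38; // Linux errno: function not implemented

/// Accept AlreadyExists or ENOSYS, but only if the directory is really there.
fn tolerate_mkdir_error(err: io::Error, path: &Path) -> io::Result<()> {
    let tolerable =
        err.kind() == ErrorKind::AlreadyExists || err.raw_os_error() == Some(ENOSYS);
    if tolerable && path.is_dir() {
        Ok(())
    } else {
        Err(err)
    }
}

/// Create a mount destination, tolerating the failure modes above.
fn create_mount_destination(path: &Path) -> io::Result<()> {
    match fs::create_dir(path) {
        Ok(()) => Ok(()),
        Err(err) => tolerate_mkdir_error(err, path),
    }
}
```

Verifying that the directory exists after the failed mkdir keeps a genuine failure from being masked: ENOSYS on a missing path is still returned as an error.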
Alex Lyn
b50f803a4e kata-types: add virtio-fs-nydus shared fs configuration support
Add "virtio-fs-nydus" as a recognized shared filesystem type in the
hypervisor configuration. This enables the standalone nydusd mode where
nydusd runs as a separate process alongside virtiofsd.

The key changes:
(1) Add VIRTIO_FS_NYDUS constant for the new shared fs type.
(2) Register virtio-fs-nydus in adjust() and validate() paths, reusing
  the same virtio-fs validation logic since both use the vhost-user protocol.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:25:18 +02:00
Alex Lyn
854e76fb47 kata-types: Refine comments and tests for independent iothreads
Refactor the comments and tests related to independent iothreads.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
Alex Lyn
b0ebbc685d runtime-rs: Add support for independent iothreads for virtio blk devices
Independent iothreads work with both virtio-scsi and virtio-blk
devices, so this commit enables the feature for virtio-blk-pci
devices.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
Alex Lyn
980ecfdd96 runtime-rs: Add support for independent iothreads within virtio-blk
1. Determine the iothread for virtio-blk devices. An iothread is
attached only when:
(1) enable_iothreads is true
(2) indep_iothreads > 0
(3) the block driver is not virtio-scsi (i.e., it's virtio-blk)
More complex cases will be handled by future enhancements.

2. Add the iothread parameter for virtio-blk devices if specified.
If an iothread is set and passed, it must be set correctly for
virtio-blk devices via the QMP device_add arguments.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
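The three attach conditions listed in 980ecfdd96 combine into a single predicate. The function name and the use of a string for the block driver are assumptions made for this sketch.

```rust
/// True when a virtio-blk device should get a dedicated iothread.
fn should_attach_iothread(
    enable_iothreads: bool,
    indep_iothreads: u32,
    block_driver: &str,
) -> bool {
    // All three conditions must hold; virtio-scsi is excluded explicitly.
    enable_iothreads && indep_iothreads > 0 && block_driver != "virtio-scsi"
}
```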
Alex Lyn
36e626649d runtime-rs: Add support for independent IO threads in qemu cmdline
Add a new method to the qemu command line for creating independent IO
threads for hotplugged virtio-blk devices.

Note that ObjectIoThread already exists, so it can be reused directly
in this case.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
Alex Lyn
86d165c0cc kata-types: Introduce a dedicated annotation for indep_iothreads
Annotations give users a second, more flexible way to set this feature.

The dedicated annotation
"io.katacontainers.config.hypervisor.indep_iothreads" is introduced
for use within k8s clusters.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
Alex Lyn
bdc57b16e5 runtime-rs: Add configurable indep_iothreads in configurations
Setting indep_iothreads together with enable_iothreads is useful for
high IO performance, so users need a way to set it.

This commit introduces two configurable items:
- Makefile: DEFINDEPIOTHREADS, set at build time.
- configurations: indep_iothreads, for users to set.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
Alex Lyn
d086d324e0 kata-types: Introduce independent IO thread for virtio-blk devices
The 'indep_iothreads' field is introduced in Hypervisor to configure
the number of independent IO threads for virtio-blk devices. When set
to a value greater than 0, independent IO threads are created that can
be attached to virtio-blk devices during hotplug.

Note that it requires 'enable_iothreads' to be true for virtio-blk
devices to use these threads.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:20 +02:00
Alex Lyn
5a00053b38 kata-agent: Implement filesystem space usage collection via statfs
Add update_guest_filesystem_metrics() that collects disk space usage
(total/used/available) for all read-write mounted filesystems inside
the guest VM. This enables monitoring guest disk usage in Kata/CoCo
pods through the existing GetMetrics RPC.

The output metrics look like this:
- kata_guest_filesystem_bytes{mount="/",device="vda",item="total|used|available"}
- kata_guest_filesystem_inodes{mount="/",device="vda",item="total|used|available"}

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:05 +02:00
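How total/used/available values might be derived from statfs-style counters for 5a00053b38 is sketched below. The struct, its field names, and the formulas (used = total blocks minus free blocks) are assumptions; the agent's actual computation is not shown in the commit message. Only the metric name and label set are taken from it.

```rust
/// Hypothetical subset of statfs counters.
struct FsStats {
    block_size: u64,
    blocks: u64,
    blocks_free: u64,
    blocks_available: u64,
}

/// Derive (item, bytes) pairs; "used" is computed as total minus free (assumption).
fn filesystem_bytes(stats: &FsStats) -> [(&'static str, u64); 3] {
    let total = stats.blocks * stats.block_size;
    let used = stats.blocks.saturating_sub(stats.blocks_free) * stats.block_size;
    let available = stats.blocks_available * stats.block_size;
    [("total", total), ("used", used), ("available", available)]
}

/// Render the values as text lines using the metric name from the commit message.
fn render_filesystem_bytes(mount: &str, device: &str, stats: &FsStats) -> Vec<String> {
    filesystem_bytes(stats)
        .iter()
        .map(|(item, value)| {
            format!(
                "kata_guest_filesystem_bytes{{mount=\"{}\",device=\"{}\",item=\"{}\"}} {}",
                mount, device, item, value
            )
        })
        .collect()
}
```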
Alex Lyn
6c66724591 kata-agent: Add filesystem space usage metric declarations
Add two new GaugeVec metrics to expose guest filesystem space usage:
(1) kata_guest_filesystem_bytes{mount, device, item}: space in bytes
  (total/used/available)
(2) kata_guest_filesystem_inodes{mount, device, item}: inode counts
  (total/used/available)

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 20:47:05 +02:00
stevenhorsman
fb4600d66a runtime-rs: Fix test breakage
In #13147, for some reason a test block was added in the middle of code
and the code was stale when merged, which meant that a second
`mod test` section was added, breaking our tests. Merge the two
to fix this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-11 19:03:33 +02:00
Fabiano Fidêncio
21657b9cd9 Merge pull request #13147 from manuelh-dev/mahuber/debug-go-rust
runtime-rs: Honor enable_debug for logs and adjust debugging documentation
2026-06-11 08:57:36 +02:00
Hyounggyu Choi
7cc6767fa2 runtime*: use static_sandbox_resource_mgmt defaults for qemu-se
Switch qemu-se config templates to use the TEE/CoCo-specific
static_sandbox_resource_mgmt defaults instead of the generic
QEMU defaults.

qemu-se-runtime-rs config now uses DEFSTATICRESOURCEMGMT_COCO
while runtime qemu-se config now uses DEFSTATICRESOURCEMGMT_TEE.
This aligns static sandbox resource management behavior with confidential
container expectations for qemu-se variants.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-06-09 14:45:50 +02:00
Alex Lyn
6500e018c0 Merge pull request #13093 from RainaYL/rainax/tdx_boot_pr
dragonball: Add steps to boot TDX VM
2026-06-09 10:13:57 +08:00
Fabiano Fidêncio
4dc288401e runtime-rs: make sandbox cgroup runtime attach idempotent
The dragonball nerdctl CI job can race when creating and attaching the
runtime process to the sandbox cgroup, surfacing an os error 17
(AlreadyExists) during shim task creation.

Let's retry add_proc once on this pre-existing cgroup condition so
startup remains robust.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>
2026-06-08 13:11:34 +02:00
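The retry-once behavior from 4dc288401e can be sketched with a closure standing in for add_proc. The wrapper name `add_proc_with_retry` and the use of `std::io::Error` are assumptions; on Linux, os error 17 (EEXIST) maps to `ErrorKind::AlreadyExists`.

```rust
use std::io::{self, ErrorKind};

/// Call `add_proc`, retrying exactly once if the cgroup already exists.
fn add_proc_with_retry<F>(mut add_proc: F) -> io::Result<()>
where
    F: FnMut() -> io::Result<()>,
{
    match add_proc() {
        // A concurrent creator won the race: the cgroup now exists, so try again.
        Err(err) if err.kind() == ErrorKind::AlreadyExists => add_proc(),
        // Success or any other error is returned unchanged.
        other => other,
    }
}
```

A second AlreadyExists failure is returned to the caller, so the retry stays bounded.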