kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Charlotte Hartmann Paludo	b4be5fdcca	runtime-rs: change `safe-path` dependency from crates.io to workspace `safe-path` is resolved from the local workspace in all other workspace member crates. This commit changes the dependency to a local one for runtime-rs as well. Signed-off-by: Charlotte Hartmann Paludo <git@charlotteharludo.com> Co-authored-by: Markus Rudy <mr@edgeless.systems>	2026-06-18 06:32:06 +02:00
Fabiano Fidêncio	0ddb2ee1f1	Merge pull request #13160 from LandonTClipp/kata_visible_devices feat(agent): translate KATA_VISIBLE_DEVICES into CDI GPU requests	2026-06-16 19:10:35 +02:00
Fabiano Fidêncio	3ca5742338	Merge pull request #13129 from pmores/fix-default_memory_annontation runtime-rs: fix default_memory annotation processing	2026-06-16 18:11:19 +02:00
davidweisse	ac56ea21d8	genpolicy: support pod-level resources Add support for resource requests and limits in the PodSpec. Fixes #12816 Signed-off-by: davidweisse <98460960+davidweisse@users.noreply.github.com>	2026-06-16 15:30:22 +02:00
Fabiano Fidêncio	774e698aeb	Merge pull request #12293 from Apokleos/graceful-errors runtime-rs: make OOM watcher and signal handling lifecycle-aware	2026-06-16 15:02:54 +02:00
Pavel Mores	9b31e06c20	runtime-rs: bump the byte-unit dependency version The unit tests added by the previous commit exposed a malfunction of the byte-unit crate on big-endian systems(*), causing s390x CI to fail. Bump the dependency's version to include a fix. Signed-off-by: Pavel Mores <pmores@redhat.com>	2026-06-16 13:15:23 +02:00
Pavel Mores	5ba5046e97	runtime-rs: fix default_memory annotation processing The annotation value is implicitly in MiB but when presented to the byte-unit crate this is interpreted as bytes. When a common value like 2048, meant to mean 2048 MiB but interpreted as 2048 B, is then converted to MiB the result is zero which is less than the minimal allowable memory and the runtime fails to launch. This is fixed by adding a detection whether the annotation value contains units or not. If it doesn't it's first converted to MiB and the rest of the processing then goes like before. This way we allow for the implicit MiB units when no units are given, thus keeping compatibility with existing go shim behaviour, while also allowing for any legal units to be given as well. We take the opportunity to add some unit tests as well. Signed-off-by: Pavel Mores <pmores@redhat.com>	2026-06-16 13:15:23 +02:00
LandonTClipp	a1dd28cb52	feat(runtime): plumb VISIBLE_CDI_DEVICES through the Go runtime Add a `visible_cdi_devices` TOML option to the Go runtime so the agent.visible_cdi_devices=true kernel parameter is emitted to the guest when enabled. Wire the option through the NVIDIA GPU configuration templates and add tests verifying the kernel-params flow. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-16 11:44:09 +02:00
LandonTClipp	b49eb577b2	feat(runtime-rs): expose visible_cdi_devices in config Declare the `visible_cdi_devices` agent option (kernel param agent.visible_cdi_devices) in kata-types so runtime-rs can opt into emitting it to the guest, and expose it in the three NVIDIA GPU configuration templates (qemu, qemu-snp, qemu-tdx) at runtime-rs/config/. The agent consumes the corresponding VISIBLE_CDI_DEVICES env var to drive CDI device requests. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-16 11:44:09 +02:00
LandonTClipp	676fc90d0b	feat(agent): translate VISIBLE_CDI_DEVICES into CDI device requests Add an opt-in `visible_cdi_devices` agent option that lets a container select which of the VM's CDI-known devices it sees via a VISIBLE_CDI_DEVICES env var. The schema is `<cdi-kind>=<devices>` (e.g. "nvidia.com/gpu=all", or "kata.com/gpu=0,1"), with multiple kinds delimited by ':'. When enabled, the agent maps the value to CDI device requests and feeds them through the existing CDI injection path, so device nodes, mounts, env and createContainer hooks from the guest CDI spec (e.g. /var/run/cdi/nvidia.yaml, generated by NVRC/nvidia-ctk) are applied. The variable is intentionally distinct from NVIDIA_VISIBLE_DEVICES and does not promise identical semantics. If a requested kind is present in the guest CDI registry but the specific device index is not, the agent fails fast rather than waiting for the CDI-spec watch/timeout path. An entirely absent kind falls through to the existing wait/timeout behavior. Defaults to false; containers that don't set the env var are unaffected. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-16 11:44:09 +02:00
Alex Lyn	8fc1a16225	runtime-rs: Make signal_process idempotent for exited init processes Address the issue where signal_process returns an INTERNAL error when the container's init process has already exited, and ensure teardown is never aborted by signal failures. Introduce is_no_such_process_error() to detect "no such process" conditions (ESRCH/ENOENT codes or equivalent messages). When the init process is already gone, treat it as success with an info log instead of an error. In stop_process(), never propagate signal failures. During sandbox shutdown the agent connection is often already closed, causing AgentConnectionClosed errors that bypass is_no_such_process_error(). If stop_process() aborts on such errors, cleanup_container() is skipped and leftover mounts cause "Resource busy" failures in sandbox cleanup. Restore "always proceed to cleanup" semantics: log the failure as a warning, but never skip resource cleanup. Resource cleanup must be best-effort and idempotent regardless of kill outcome. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-16 15:12:28 +08:00
Alex Lyn	44dd2b1f34	runtime-rs: Refine OOM watcher error reporting for sandbox teardown This commit refines the error handling within the OOM watcher to distinguish between genuine failures and errors that occur as a natural consequence of sandbox shutdown via the helper is_normal_shutdown_error. Previously, various connection-related errors during teardown were logged as warnings, contributing to noisy logs. It aims to improve OOM error handling, distinguish error types: The logic now differentiates between "normal shutdown" errors (e.g., Connection reset by peer, broken pipe) and actual OOM watcher failures. This enhancement makes OOM event logs more informative and less prone to clutter during normal sandbox termination. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 15:12:24 +08:00
Alex Lyn	3095bd379b	runtime-rs: Introduce cancellation for OOM watcher during teardown This commit introduces an explicit cancellation mechanism for the OOM watcher loop within VirtSandbox. This addresses the issue where the watcher continues to poll for OOM events even when the sandbox is being stopped, leading to spurious "Connection reset by peer" errors. Key changes: (1) A CancellationToken is added to VirtSandbox to signal the watcher loop when the sandbox is undergoing teardown. (2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a tokio::select! statement. This allows it to concurrently listen for two events: - cancel_token.cancelled(): Triggered when the sandbox/VM is stopping. - agent.get_oom_event(): The regular OOM event polling. (3) In the sandbox stop/teardown path, cancel_token.cancel() is called before stopping the VM. This ensures the OOM watcher loop exits cleanly via the cancellation token, preventing the occurrence of ECONNRESET/EOF errors on a closed channel. This change improves the robustness of OOM event handling during sandbox lifecycle management. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Alex Lyn	0ffdc576d3	runtime-rs: Introduce a helper to check if process/container exists Returns `true` if the error indicates that the target process/container no longer exists. This is used to determine if an operation, like signaling a process, failed because the target is no longer available. The function checks for standard OS error codes (`ESRCH`, `ENOENT`) and common error message patterns. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Alex Lyn	59677688ee	runtime-rs: Introduce a helper to check normal oom shutdown errors It is mainly for checking whether an error is a normal oom shutdown error due to network disconnection issues. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Fabiano Fidêncio	cfab6f496b	runtime-rs: Propagate block device read-only flag to the VMM Block volumes and block-mode device nodes were attached to the guest read-write regardless of the volume's read-only intent, so the guest-visible virtio-blk device was always writable. This matters beyond simple write protection: filesystems such as XFS inspect the block device read-only state to decide whether to attempt journal/log recovery. When the device is writable, XFS tries to replay the log even on a read-only mount, which fails badly. Mounting with "-o ro" inside the guest is not sufficient; the device itself must advertise read-only (VIRTIO_BLK_F_RO), which only happens when the VMM opens the backing image read-only. Set is_readonly on the block device config from two signals, combined with OR so either one marks the device read-only: - the read-only intent from the OCI spec: * bind-mounted block volumes and direct-assigned (raw block) volumes derive it from the "ro" mount option, and * block-mode volumes (e.g. Kubernetes volumeDevices) arrive as device nodes in spec.Linux.Devices with no mount option; their intent is expressed only via the cgroup device access in spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for read-only; "rwm" for read-write). handler_devices() derives the flag from the matching cgroup allow rule, and - the host block device's own read-only flag (queried via the BLKROGET ioctl). Both the volume path (block_volume/rawblock_volume) and the device-node path (handler_devices, resolving the host node via get_host_path) honor it, so a device that is physically read-only on the host is exposed read-only to the guest even when the intent is not encoded in the OCI spec. All in-tree hypervisors (qemu, cloud-hypervisor, dragonball) already honor BlockConfig.is_readonly, so no hypervisor changes are required. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor	2026-06-15 23:18:36 +02:00
Fabiano Fidêncio	6203e28bef	runtime: Propagate block-mode device read-only flag to the VMM Block-mode volumes (e.g. Kubernetes volumeDevices) are passed to the container as device nodes in spec.Linux.Devices and carry no mount "ro" option. Their read-only intent is expressed only via the cgroup device access in spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for read-only; "rwm" for read-write). The device path ignored that signal: newLinuxDeviceInfo() built the DeviceInfo without ever setting ReadOnly (it only consumed FileMode, the node permission bits, not the read/write access), so the device was always attached read-write. This is problematic for filesystems such as XFS, which inspect the block device read-only state to decide whether to attempt journal/log recovery. When the guest device is writable, XFS tries to replay the log even for a read-only mount, which fails badly. Mounting "-o ro" in the guest is not enough; the device itself must advertise read-only, which only happens when the VMM opens the backing device read-only (DeviceInfo.ReadOnly -> BlockDrive.ReadOnly -> qemu read-only=on / clh Readonly). Derive the read-only flag from two independent signals, combined with OR so either one marks the device read-only: - the cgroup device access rule that exactly matches the device, so a block-mode volume marked read-only by the orchestrator (e.g. a pod volume with persistentVolumeClaim.readOnly: true) is honored, and - the host block device's own read-only flag (queried via the BLKROGET ioctl). Block-mode volumes frequently carry no read-only signal in the OCI spec at all, so the device flag is often the only reliable source. The BLKROGET probe is shared (pkg/device/config.BlockDeviceIsReadOnly, Linux-only with a stub on other platforms) between the device-node path (newLinuxDeviceInfo, probing /dev/block/<major>:<minor>) and the bind-mounted/filesystem block path (createDeviceInfo). None of this relies on external host tooling such as "blockdev --setro". Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor	2026-06-15 23:18:36 +02:00
Thejas N	7807aa3d62	agent: fix get_oom_event deadlock after connection restart When the agent-protocol-forwarder's inbound connection restarts (e.g. during a Cloud API Adaptor restart in peer pod environments), the shim re-sends a GetOOMEvent request through the new connection. Since the forwarder→agent Unix socket survives the restart, the old handler from the previous connection remains alive, holding the event_rx lock while blocked in recv().await. The new handler acquires the sandbox lock, then attempts to acquire the event_rx lock — which is held by the old handler. Because the sandbox lock is still held during this wait, every subsequent RPC (ExecProcess, WaitProcess, StatsContainer, SignalProcess, etc.) blocks on the sandbox lock, rendering the pod completely unresponsive. The root cause is a lock ordering violation: get_oom_event held the sandbox lock while acquiring the event_rx lock. Fix this by scoping the sandbox lock acquisition so it is dropped before the event_rx lock is acquired. The sandbox lock is only needed to clone the Arc<Mutex<Receiver>> — once cloned, it can be released immediately. Assisted-by: Claude Code <noreply@anthropic.com> Signed-off-by: Thejas N <thn@redhat.com>	2026-06-15 07:47:18 +02:00
Fabiano Fidêncio	37c4a0b6a2	Merge pull request #13128 from nikolasgkou/fix/guest-protection-fallback runtime-rs: don't fail VM start when guest protection detection errors	2026-06-13 08:56:56 +02:00
Fabiano Fidêncio	5efc761002	Merge pull request #13211 from glingy/patch-1 runtime-rs: Fix queue_size of zero in block_rootfs	2026-06-12 22:37:18 +02:00
nikolasgkou	80b8f592a0	runtime-rs: skip guest protection detection for non-confidential guests prepare_protection_device_config() called available_guest_protection() unconditionally and propagated any error before the "confidential_guest is not set" case was handled. On AMD hosts where the kvm_amd `sev` module parameter is "Y" but the CPU does not expose the SEV-SNP CPUID bit (8000_001f EAX[4]) -- e.g. consumer Ryzen -- available_guest_protection() returns Err("SEV not supported"), which blocked every non-confidential VM from booting even though no protection was requested. When confidential_guest is not set there is no reason to probe the host, so return Ok(None) before calling available_guest_protection(). Detection (and any error it produces) now runs only when a confidential guest is actually requested. Signed-off-by: nikolasgkou <nikolasgkou@disroot.org>	2026-06-12 22:20:13 +02:00
Fabiano Fidêncio	c5d5fc6ee8	Merge pull request #13213 from burgerdev/grpc-probes genpolicy: add missing probe fields	2026-06-12 19:04:53 +02:00
Gregory Ling	d90178c179	runtime-rs: Fix queue_size of zero in block_rootfs Fix BlockRootfs to save the queue_size, num_queues, logical_sector_size, and physical_sector_size of the hypervisor's block device info in the BlockConfig passed to the vm Fixes #13210 Signed-off-by: Gregory Ling <17791817+glingy@users.noreply.github.com>	2026-06-12 18:24:50 +02:00
Fabiano Fidêncio	110843d6e1	Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal runtime(-rs): remove file_mem_backend config option	2026-06-12 17:13:04 +02:00
Markus Rudy	2e8f61a575	genpolicy: add missing probe fields This commit adds fields for readiness/liveness/startup probes that were missing so far, and adds probes to the ignored_fields test to ensure these stay supported. None of these fields has an influence on the generated policy, they just allow parsing valid k8s yaml. Co-authored-by: Spyros Seimenis <sse@edgeless.systems> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-06-12 13:20:16 +02:00
Fupan Li	9553614f32	Merge pull request #12772 from Apokleos/nydus-standalone runtime-rs: Nydus standalone mode support in runtime-rs	2026-06-12 10:36:17 +08:00
Manuel Huber	70d8f1bf3d	runtime: remove file_mem_backend config option Remove the Go runtime file_mem_backend and valid_file_mem_backends config knobs, along with the corresponding sandbox annotation handling. The runtime still enables file-backed shared memory automatically for virtio-fs by using /dev/shm as the backing directory. This only removes the user-selectable backend path. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-12 00:07:16 +00:00
Manuel Huber	86fd65271c	runtime-rs: remove file_mem_backend config option While the config knob is being parsed, it is being unused in the rust shim. This renders the config knob useless. Remove the file_mem_backend config option as there is no current users for it. As this option is being usable in the go shim, we leave it intact. For the rust shim, /dev/shm is still being used in a similar way to the go shim when filesystem sharing is enabled (virtio-fs). Future use cases where other file_mem_backends are being utilized are currently planning to define these backends in a similar manner: based on the configuration/platform, determine the proper file memory backend, but do not let end users determine the file memory backend. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-06-12 00:07:16 +00:00
Fabiano Fidêncio	b323697f37	Merge pull request #13111 from Apokleos/monitor-disk-usage Metrics: Add support for monitoring disk usage via statfs	2026-06-12 00:41:31 +02:00
Alex Lyn	fa84eecd2d	runtime-rs: Implement ShareVirtioFsNydus for standalone mode Introduce `ShareVirtioFsNydus` to enable standalone Nydus rootfs support. This implementation acts as the bridge between runtime-rs and the external `nydusd` daemon. Key Capabilities: (1) Trait Implementation: Implements `ShareFs` (for VM device/storage) and `NydusShareFs` (for RAFS lifecycle) traits. (2) Daemon Lifecycle Management: Handles `nydusd` spawning, supervision, and graceful shutdown. (3) Native Overlay Support: Configures `nydusd` with `passthrough_fs` backend to provide native overlay (upperdir/workdir) support. (4) API Integration: Utilizes `NydusClient` for granular control over RAFS mount/umount operations. (5) QEMU Integration: Enables `virtio-fs-nydus` device support, facilitating standalone mode execution. This implementation allows Kata containers to utilize an external `nydusd` process for Nydus rootfs management, providing a cleaner separation between the runtime and the Nydus daemon lifecycle. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	edfe9ea403	runtime-rs: refine ShareFs abstraction with lifecycle and Nydus traits Refactor the `ShareFs` trait to improve modularity and support standalone Nydus mode: (1) Added `stop()` method to manage daemon teardown. (2) Introduced a dedicated trait for Nydus-specific data-plane operations. This refactoring cleans up the `ShareFs` trait by consolidating daemon lifecycle handling and isolating Nydus-specific extensions, paving the way for cleaner standalone Nydus implementation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	720a8688b4	runtime-rs: Add daemon manager for nydusd process lifecycle Implement Nydusd to manage nydusd daemon process: (1) start: spawn process, validate paths, wait for API ready, setup passthrough fs. (2) stop: kill process, cleanup socket files. (3) mount_rafs/mount_rafs_with_overlay: high-level filesystem mount operations. (4) build_args: construct virtiofs mode command line arguments. This provides process lifecycle management with internal NydusClient Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	c1ebf269f7	runtime-rs: Add nydus client for nydusd API communication via HTTP Implement NydusClient to interact with nydusd daemon via Unix socket: (1) check_status: query daemon state via GET /api/v1/daemon. (2) mount/umount: manage filesystem mounts via POST/DELETE /api/v1/mount. (3) wait_until_ready: poll daemon until RUNNING state. This provides a lightweight, stateless HTTP client layer for nydusd API. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	4c63b8e3de	agent: handle ENOSYS in overlayfs storage handler In standalone nydusd mode with virtio-fs passthrough, the guest-side mkdir may fail with ENOSYS. Update the overlayfs storage handler to skip directory creation when the directory already exists, logging a warning instead of failing. This ensures container rootfs setup succeeds when nydusd's native overlay manages the directory structure. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	8eb564dfb8	kata-sys-util: handle ENOSYS gracefully in mount destination creation When using virtio-fs with nydusd's passthrough_fs, mkdir operations may return ENOSYS on certain filesystem configurations. This causes mount destination creation to fail unexpectedly. Handle ENOSYS errors gracefully alongside AlreadyExists by verifying the directory exists after the failed mkdir attempt, allowing the mount to proceed if the directory is already present. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	b50f803a4e	kata-types: add virtio-fs-nydus shared fs configuration support Add "virtio-fs-nydus" as a recognized shared filesystem type in the hypervisor configuration. This enables the standalone nydusd mode where nydusd runs as a separate process alongside virtiofsd. The key changes: (1) Add VIRTIO_FS_NYDUS constant for the new shared fs type. (2) Register virtio-fs-nydus in adjust() and validate() paths, reusing the same virtio-fs validation logic since both use vhost-user protocol Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	854e76fb47	kata-types: Enhance related stuff for independent io threads Refactor comments and tests stuff for independent iothreads. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	b0ebbc685d	runtime-rs: Add support for independent iothreads for virtio blk devices As independent iothreads can work in both virtio-scsi and virtio-blk devices, this commit aims to enable such feature in virtio-blk-pci devices. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	980ecfdd96	runtime-rs: Add support for independent iothreads within virtio-blk 1. Determine iothread for virtio-blk devices, only attach iothread when: (1) enable_iothreads is true (2) indep_iothreads > 0 (3) block driver is not virtio-scsi (i.e., it's virtio-blk) And for more complex cases, some enhancements will be done in future 2. Add iothread parameter for virtio-blk devices if specified. If iothreads set and passed, we will have to set it correctly for virtio-blk devices via qmp with device_add arguments. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	36e626649d	runtime-rs: Add support independent IO threads in qemu cmdline To make it work well for independent IO threads for virtio-blk devices. A new method for independent IO threads for virtio-blk hotplug devices within qemu command line. Note that as ObjectIoThread has been done for days, it can be directly reused in this case. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	86d165c0cc	kata-types: Introduce a dedicated annotation for indep_iothreads To make it more flexible when users want to set this feature, one more way to make it valid is via annotations. The dedicated annotation of "io.katacontainers.config.hypervisor.indep_iothreads" is introduced within k8s clusters. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	bdc57b16e5	runtime-rs: Add configurable indep_iothreads in configurations It's useful and helpful to set indep_iothreads with enable_iothreads for high IO performance. And we need provide an entry for people to set it if needed. This commit will introduce two configurable items: - Makefile: DEFINDEPIOTHREADS when make build. - configurations: indep_iothreads for people to set. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	d086d324e0	kata-types: Introduce independent IO thread for virtio-blk devices The 'indep_iothreads' field is introduced in Hypervisor to make it configurable for number of independent IO threads for virtio-blk devices. When set to a value greater than 0, creates independent IO threads that can be attached to virtio-blk devices during hotplug. Note that it requires 'enable_iothreads' to be true for virtio-blk devices to use these threads. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:20 +02:00
Alex Lyn	5a00053b38	kata-agent: Implement filesystem space usage collection via statfs Add update_guest_filesystem_metrics() that collects disk space usage (total/used/available) for all read-write mounted filesystems inside the guest VM. This enables monitoring guest disk usage in kata/coco pod through the existing GetMetrics RPC. And its output metrics looks like as below: - kata_guest_filesystem_bytes{mount="/",device="vda",item="total\|used\|available"} - kata_guest_filesystem_inodes{mount="/",device="vda",item="total\|used\|available"} Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:05 +02:00
Alex Lyn	6c66724591	kata-agent: Add filesystem space usage metric declarations Add two new GaugeVec metrics to expose guest filesystem space usage: (1) kata_guest_filesystem_bytes{mount, device, item}: space in bytes (total/used/available) (2) kata_guest_filesystem_inodes{mount, device, item}: inode counts (total/used/available) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 20:47:05 +02:00
stevenhorsman	fb4600d66a	runtime-rs: Fix test breakage In #13147, for some reason a test block was added in the middle of code and the code was stale when merged, which meant that a second `mod test` section was added, breaking our tests. Merge the two to fix this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-11 19:03:33 +02:00
Fabiano Fidêncio	21657b9cd9	Merge pull request #13147 from manuelh-dev/mahuber/debug-go-rust runtime-rs: Honor enable_debug for logs and adjust debugging documentation	2026-06-11 08:57:36 +02:00
Hyounggyu Choi	7cc6767fa2	runtime*: use static_sandbox_resource_mgmt defaults for qemu-se Switch qemu-se config templates to use the TEE/CoCo-specific static_sandbox_resource_mgmt defaults instead of the generic QEMU defaults. qemu-se-runtime-rs config now uses DEFSTATICRESOURCEMGMT_COCO while runtime qemu-se config now uses DEFSTATICRESOURCEMGMT_TEE. This aligns static sandbox resource management behavior with confidential container expectations for qemu-se variants. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-06-09 14:45:50 +02:00
Alex Lyn	6500e018c0	Merge pull request #13093 from RainaYL/rainax/tdx_boot_pr dragonball: Add steps to boot TDX VM	2026-06-09 10:13:57 +08:00
Fabiano Fidêncio	4dc288401e	runtime-rs: make sandbox cgroup runtime attach idempotent The dragonball nerdctl CI job can race when creating and attaching the runtime process to the sandbox cgroup, surfacing an os error 17 (AlreadyExists) during shim task creation. Let's retry add_proc once on this pre-existing cgroup condition so startup remains robust. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex <codex@openai.com>	2026-06-08 13:11:34 +02:00
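
Commit 5ba5046e97 above describes the default_memory fix in prose: a bare number such as 2048 is taken as MiB, while a value with an explicit unit is converted normally. The following is a minimal sketch of that rule only, not the code from the commit. The helper name `default_memory_mib` and the small unit table are assumptions made for illustration; the real change relies on the byte-unit crate, which this std-only sketch does not use.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical helper (not the runtime-rs function): turn a
/// default_memory annotation value into a number of MiB.
fn default_memory_mib(value: &str) -> Option<u64> {
    let value = value.trim();
    // Split the value into its leading digits and the unit suffix, if any.
    let split = value
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(value.len());
    let (digits, unit) = value.split_at(split);
    let number: u64 = digits.parse().ok()?;

    // Illustrative unit table; the byte-unit crate accepts many more units.
    let multiplier = match unit.trim() {
        "" => MIB, // no unit given: implicit MiB, as in the Go shim
        "B" => 1,
        "KiB" => 1024,
        "MiB" => MIB,
        "GiB" => 1024 * MIB,
        _ => return None,
    };

    // Convert to bytes first, then round down to whole MiB.
    Some(number.checked_mul(multiplier)? / MIB)
}
```

With this rule a bare "2048" yields 2048 MiB, whereas "2048B" still rounds down to 0 MiB, which is the value the commit message says is below the minimum allowed memory.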


6575 Commits
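
Commit 676fc90d0b above gives the VISIBLE_CDI_DEVICES schema as `<cdi-kind>=<devices>`, with multiple kinds delimited by ':'. The sketch below shows one way such a value could be split into per-device request strings. It is an assumption-laden illustration, not the agent's implementation: the function name `cdi_requests`, the `<cdi-kind>=<device>` output format, and the error strings are all hypothetical, and the fail-fast registry lookup described in the commit is not modeled.

```rust
/// Hypothetical helper (not the kata-agent function): expand a
/// VISIBLE_CDI_DEVICES value into one "<cdi-kind>=<device>" string
/// per requested device.
fn cdi_requests(value: &str) -> Result<Vec<String>, String> {
    let mut requests = Vec::new();

    // Kinds are separated by ':'; blank entries are skipped.
    for entry in value.split(':').filter(|e| !e.trim().is_empty()) {
        let (kind, devices) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {entry:?}"))?;
        let kind = kind.trim();
        if kind.is_empty() {
            return Err(format!("empty CDI kind in {entry:?}"));
        }

        // Devices of one kind are separated by ','.
        for device in devices.split(',') {
            let device = device.trim();
            if device.is_empty() {
                return Err(format!("empty device in {entry:?}"));
            }
            requests.push(format!("{kind}={device}"));
        }
    }

    Ok(requests)
}
```

Under these assumptions, "nvidia.com/gpu=all:kata.com/gpu=0,1" expands to three requests: nvidia.com/gpu=all, kata.com/gpu=0 and kata.com/gpu=1.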