kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 15:09:45 +00:00

Author	SHA1	Message	Date
Aurélien Bombo	e4fbddb91a	ci: rename cloud-hypervisor to clh-runtime-rs This aligns on qemu-runtime-rs and makes more sense. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-28 10:58:01 -05:00
Fabiano Fidêncio	19bb8746f8	runtime-rs: rescan network at Start RPC for Docker 26+ Docker 26+ configures the container's veth pair between the Create and Start RPCs by bind-mounting `/proc/<vmm_pid>/ns/net`. The Rust shim's network scan during sandbox creation finds no interfaces because they don't exist yet. The Go shim (commit `f7878cc`) solves this with `detectHypervisorNetns` inside `addAllEndpoints`: when the placeholder netns is empty, it switches to the hypervisor's network namespace and rescans there. Port this approach to the Rust shim: - Add `rescan_network()` to the `Sandbox` trait - Implement it on `VirtSandbox`: build a rescan config that always targets the hypervisor's netns (`/proc/<vmm_pid>/ns/net`), bypassing the placeholder netns and the `network_created` flag - Call `sandbox.rescan_network()` synchronously in the `StartProcess` handler, before `cm.start_process()`, so interfaces are wired before the container process runs Fixes: #9340 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-28 10:20:18 +02:00
Fabiano Fidêncio	0da2f00488	runtime-rs: resource: add network rescan polling for Docker 26+ Docker 26+ configures veth pairs in the hypervisor's network namespace between the Create and Start RPCs. The initial network scan during sandbox creation finds no interfaces because they do not exist yet. Add `rescan_network_if_unconfigured` which polls the network namespace (50ms intervals, 5s timeout) until interfaces appear, then pushes the configuration to the guest agent. This mirrors the Go runtime's `RescanNetwork` (commit `f7878cc`). Supporting changes: - Derive `Clone` on `NetworkWithNetNsConfig` so it can be reused across poll iterations - Add `tokio/time` feature to the resource crate - Add `apply_network_to_agent` helper to push interfaces, routes, and neighbors to the guest Fixes: #9340 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-28 10:20:18 +02:00
Fabiano Fidêncio	67679ddd15	runtime-rs: detect Docker 26+ netns from hook args and filter /proc/0/ Docker 26+ with `runtimeType` may not publish the network namespace in `linux.namespaces` at create time. Instead, the netns path can be discovered from `libnetwork-setkey` hook arguments. Additionally, filter out the invalid `/proc/0/ns/net` placeholder that appears when the task PID is not yet known. This mirrors the Go runtime's `DockerNetnsPath` fallback logic. Fixes: #9340 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-28 10:20:18 +02:00
Fabiano Fidêncio	3ad2de584f	runtime-rs: return hypervisor PID from container manager methods Docker and containerd use the PID returned by the shim to construct `/proc/<pid>/ns/net` for network namespace operations. The Rust shim was returning the shim's own PID instead of the hypervisor's PID, which meant Docker would look at the wrong network namespace. Update `create_container`, `start_process`, `state_process`, `pid`, and `connect_container` to return the VMM master thread/process ID (`vmm_master_tid`) instead of `self.pid`. For QEMU this is the QEMU process PID; for Dragonball this is the VMM thread ID — both are valid for `/proc/<id>/ns/net` on Linux. Fixes: #9340 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-28 10:20:18 +02:00
Fabiano Fidêncio	b1393f03c4	runtime-rs: fix ConnectResponse to set both shim_pid and task_pid The containerd runtime v2 `shimTask.Create()` discards the `CreateTaskResponse.Pid` and instead retrieves the task PID by calling the shim's Connect RPC, reading `ConnectResponse.task_pid`. The Rust shim only set `shim_pid` in the ConnectResponse, leaving `task_pid` at its default zero value. This caused Docker to call `sb.SetKey("/proc/0/ns/net", ...)` which fails with "no such file or directory". Set `shim_pid` to the actual shim process ID and `task_pid` to the hypervisor PID (vmm_master_tid), matching the Go shim's Connect handler behavior. Fixes: #9340 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-28 10:20:18 +02:00
Steve Horsman	d5785b4eba	Merge pull request #12872 from stevenhorsman/bump-rust-to-1.93 Bump rust to 1.93	2026-04-27 09:01:00 +01:00
Fabiano Fidêncio	b3ed669d16	Merge pull request #12913 from pmores/fix-exec runtime-rs: fix exec when selinux is disabled on guest	2026-04-25 17:34:46 +02:00
stevenhorsman	1dbfd4b7f4	runtime-rs: Fix clippy warnings for Rust 1.93 - Replace is_ok() check followed by unwrap_err() with if let Err pattern - Replace .err().expect() with .expect_err() - Replace is_some() check followed by unwrap() with if let Some pattern These changes address clippy::unnecessary_unwrap and clippy::err_expect warnings in Rust 1.93. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-25 11:27:39 +01:00
Pavel Mores	d3f56cd3a6	runtime-rs: remove process selinux label on exec if disable_guest_selinux Without this commit any attempt to exec a command in a container will fail if SELinux is disabled in the guest but an SELinux label is given for the new process. That will happen pretty much any time SELinux is enabled on the host (and the container is not privileged). Signed-off-by: Pavel Mores <pmores@redhat.com>	2026-04-25 11:27:15 +01:00
Pavel Mores	1390ad650b	runtime-rs: factor getting disable_guest_selinux value out to own function We'll need to get the `disable_guest_selinux` value in the exec handler, too. This will allow us to avoid duplicating the get. Signed-off-by: Pavel Mores <pmores@redhat.com>	2026-04-25 11:27:15 +01:00
Fabiano Fidêncio	8c3a0e692b	runtime-rs: network: handle "device" type interfaces (mlx5 SFs) Same fix as the Go runtime: interfaces whose drivers do not register a specific netlink kind (e.g. mlx5 Scalable Functions) are reported with the generic type "device", which is not handled by the endpoint creation match, causing sandbox creation to fail with: "unsupported link type: device" Add "device" as an alternative pattern alongside "veth" so these interfaces are connected through a TAP + TC-filter bridge. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-25 12:26:20 +02:00
Fabiano Fidêncio	e0927e0e0c	Merge pull request #12846 from RainaYL/rainax/split_irqchip_pr dragonball: Implement userspace IOAPIC to enable split irqchip	2026-04-24 19:07:45 +02:00
Fabiano Fidêncio	12bb497ce2	runtime-rs: Set QEMU as the default hypervisor Dragonball is only supported on x86_64 and aarch64, so using it as the default hypervisor means architectures like s390x, powerpc64le, and riscv64gc have no working default. Switch to QEMU, which is available across all supported architectures. Dragonball is still compiled as a feature on x86_64 and aarch64 via USE_BUILTIN_DB, and users can still override the default with HYPERVISOR=dragonball. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-24 09:42:10 +02:00
Xiaofan Xxf	fd39117a21	dragonball: Implement userspace IOAPIC to enable split irqchip From Linux 6.14, creating a TDX VM requires that split irqchip is enabled. Under this circumstance, device IOAPIC would be managed in userspace, instead of KVM, so a manager is needed to handle MMIO read/write to emulated IOAPIC registers. Also, with split irqchip, irqfd is no longer able to trigger an interrupt after device IO is completed. Instead, KVM_SIGNAL_MSI is used for interrupt triggering. Note that only legacy irq with edge-triggered interrupt is implemented here. And split irqchip feature is only enabled when confidential VM type is set to TDX. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-04-24 10:33:05 +08:00
Saul Paredes	ed44b363ba	runtime-rs: ch: disable nested vCPUs on MSHV This is a runtime-rs port for `7973e4e2a8` The recently-added nested property is true by default, but is not supported yet on MSHV. See https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7408 for additional information. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-04-23 21:04:53 -05:00
Fupan Li	18378145d2	Merge pull request #12821 from fidencio/topic/runtime-rs-cpu-pinning runtime-rs: Add vCPU thread pinning support	2026-04-23 16:49:18 +08:00
Aurélien Bombo	206c1d3be8	Merge pull request #12889 from fidencio/topic/ch-config hypervisor: Enable cloud-hypervisor feature by default	2026-04-21 11:04:31 -05:00
Fabiano Fidêncio	48669a894e	runtime-rs: Add vCPU thread pinning support Port the Go runtime's enable_vcpus_pinning feature to runtime-rs. The Go runtime already lets users pin each vCPU thread to a specific host CPU when the vCPU count matches the sandbox cpuset size, using sched_setaffinity. This is useful for latency-sensitive workloads that benefit from eliminating cross-CPU migration of vCPU threads. The approach mirrors the Go implementation: After VM start and on every container add/update/delete, we fetch the vCPU thread IDs (via QMP query-cpus-fast for QEMU), compute the union of all containers' OCI cpusets, and if the two counts match, pin vCPU i to cpuset[i]. If they diverge (hotplug, container removal, etc.) we reset all threads back to the full cpuset so nothing gets stuck on a single core. The pinning check lives in CgroupsResourceInner::update_sandbox_cgroups, which already runs at exactly the right points in the lifecycle. The enable_vcpus_pinning flag flows from the TOML config through CgroupConfig into the cgroup resource layer, and can also be overridden per-pod via the io.katacontainers.config.runtime.enable_vcpus_pinning annotation. The QEMU config templates default to false. The NV GPU configs will get their own default (true) in a follow-up once those templates are added. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-21 12:45:56 +02:00
Fabiano Fidêncio	2bfa94b7cb	hypervisor: Enable cloud-hypervisor feature by default The cloud-hypervisor feature has been fully functional for some time now: it's enabled by default in virt_container, used by agent-ctl, and exercised in CI. Drop the stale comments referencing issue #6264 and promote the feature to a default. Fixes: #6264 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-21 11:26:12 +02:00
Aurélien Bombo	3cf9581fbe	runtime-rs/ch: Fix errors on pod deletion * get_rootless_symlink_sandbox_path() would get without first checking for is_rootless(), meaning cleanup() would ALWAYS fail (see below error), even though the shim/CH would NOT leak thanks to containerd's recovery routine. * Cleanup wouldn't be idempotent (in case the CRI issues multiple shutdown requests). This was fixed by introducing remove_dir_all_if_exists(). Apr 17 17:53:21 containerd[4078033]: time="2026-04-17T17:53:21.821624475-05:00" level=error msg="failed to shutdown shim task and the shim might be leaked" error="Others(\"failed to handle message handler TaskRequest\\n\\nCaused by:\\n 0: do shutdown\\n 1: do the clean up\\n 2: delete hypervisor\\n 3: No such file or directory (os error 2)\\n\\nStack backtrace:\\n 0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from\\n 1: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::cleanup::{{closure}}\\n 2: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::cleanup::{{closure}}\\n 3: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::shutdown::{{closure}}\\n 4: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}::{{closure}}\\n 5: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}\\n 6: <service::task_service::TaskService as containerd_shim_protos::shim::shim_ttrpc_async::Task>::shutdown::{{closure}}\\n 7: <containerd_shim_protos::shim::shim_ttrpc_async::ShutdownMethod as ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}\\n 8: ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}\\n 9: <core::future::poll_fn::PollFn<F> as core::future::future::Future>::poll\\n 10: <ttrpc::asynchronous::server::ServerReader as ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}\\n 11: tokio::runtime::task::core::Core<T,S>::poll\\n 12: tokio::runtime::task::harness::Harness<T,S>::poll\\n 
13: tokio::runtime::scheduler::multi_thread::worker::Context::run_task\\n 14: tokio::runtime::scheduler::multi_thread::worker::Context::run\\n 15: tokio::runtime::context::scoped::Scoped<T>::set\\n 16: tokio::runtime::context::runtime::enter_runtime\\n 17: tokio::runtime::scheduler::multi_thread::worker::run\\n 18: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll\\n 19: tokio::runtime::task::core::Core<T,S>::poll\\n 20: tokio::runtime::task::harness::Harness<T,S>::poll\\n 21: tokio::runtime::blocking::pool::Inner::run\\n 22: std::sys::backtrace::__rust_begin_short_backtrace\\n 23: core::ops::function::FnOnce::call_once{{vtable.shim}}\\n 24: std::sys::thread::unix::Thread::new::thread_start\\n 25: <unknown>\\n 26: <unknown>\")" id=fca6a162b8f0ed7ef2b33cd99b6f1b58124e85c5489c193ceac487db0e4acdde Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-20 15:36:18 -05:00
Aurélien Bombo	93bd2899fb	runtime-rs/ch: Fix hang on pod deletion This serializes CH API calls to avoid a race condition where deleting a pod would hang indefinitely and leak both the shim and CH processes. The race happened because the CRI can send multiple shutdown requests for the same pod, however the CH socket wasn't guarded against concurrent usage, hence it was possible that HTTP responses would interleave (see below) on the shutdown path, leading to an error. This would repro in <15 iterations (sometime 2-3) using a 2-container pod. With this commit, I haven't observed a repro in 200+ iterations. Fixes: #12858 ORIGINAL REPRO: while true; do kubectl apply -f busybox.yaml kubectl wait --for=condition=ready po busybox kubectl exec busybox -- echo foo kubectl delete po busybox done ORIGINAL ERROR: Apr 17 20:15:54 kata[2297383]: Failed to stop process, process = ContainerProcess { container_id: ContainerID { container_id: "d4eb8984d630111bbf808c7ea30b7a21274c0193cdb8d501d20e4f26a0a69151" }, exec_id: "", process_type: Container }, err = failed to update_mem_resource Caused by: 0: resize memory 1: get vminfo 2: failed to serde {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packages":1},"kvm_hyperv":false,"max_phys_bits":46,"affinity":null,"features":{"amx":false},"nested":null},"memory":{"size":2147483648,"mergeable":false,"hotplug_method":"Acpi","hotplug_size":132024107008,"hotplugged_size":null,"shared":true,"hugepages":false,"hugepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug 
cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","initramfs":null},"rate_limit_groups":null,"disks":[{"path":"/usr/share/kata-containers/kata-containers.img","readonly":true,"direct":false,"iommu":false,"num_queues":1,"queue_size":128,"vhost_user":false,"vhost_socket":null,"rate_limit_group":null,"rate_limiter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":null,"mtu":null,"iommu":false,"num_queues":2,"queue_size":256,"vhost_user":false,"vhost_socket":null,"vhost_mode":"Client","id":"_net1","fds":[-1],"rate_limiter_config":null,"pci_segment":0,"offload_tso":true,"offload_ufo":true,"offload_csum":true}],"rng":{"src":"/dev/urandom","iommu":false},"balloon":null,"fs":[{"tag":"kataShared","socket":"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/root/virtiofsd.sock","num_queues":1,"queue_size":1024,"id":"_fs2","pci_segment":0}],"pmem":null:"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/ch-vm.sock","iommu":false,"id":"_vsock3","pci_segment":0},"pvpanic":false,"iommu":false,"numa":null,"watchdog":false,"pci_segments":null,"platform":null,"tpm":null,"landlock_enabl"index":0,"base":3891789824,"size":524288,"type_":"Mmio32","prefetchable":false}}],"parent":null,"children":["_disk0"],"pci_bdf":"0000:00:01.0"},"_virtio-pci-_vsock3":{"id":"_virtio-pci-_vsock3","resources":[{"PciBar":{"index":0,"base":70367622201344,"sizee":false}}],"parent":null,"children":["_fs2"],"pci_bdf":"0000:00:04.0"},"_vsock3":{"id":"_vsock3","resources":[],"parent":"_virtio-pci-_vsock3","children":[],"pci_bdf":null},"_net1":{"id":"_net1","resources":[],"parent":"_virtio-pci-_net1","children":[],"presources":[{"PciBar":{"index":0,"base":70367623774208,"size":524288,"type_":"Mmio64","prefetchable":false}}],"parent":null,"children":["_net1"],"pci_bdf"
:"0000:00:02.0"},"_virtio-pci-__rng":{"id":"_virtio-pci-__rng","resources":[{"PciBar":{"index":0,"baseesources":[],"parent":null,"children":[],"pci_bdf":null}}}HTTP/1.1 200 Server: Cloud Hypervisor API Connection: keep-alive Content-Type: application/json Content-Length: 4285 {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packagesepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","miter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":nu,"serial":{"file":null,"mode":"Tty","iommu":false,"socket":null},"console":{"file":null,"mode":"Off","iommu":false,"socket":null},"debug_console":{"file":null,"mode":"Off","iobase":233},"devices":[],"user_devices":null,"vdpa":null,"vsock":{"cid":3,"socket" 3: expected `,` or `}` at line 1 column 1924 Stack backtrace: 0: <E as anyhow::context::ext::StdError>::ext_context 1: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::with_context 2: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::resize_memory::{{closure}} 3: resource::manager_inner::ResourceManagerInner::update_linux_resource::{{closure}} 4: virt_container::container_manager::container::Container::stop_process::{{closure}} 5: virt_container::container_manager::process::Process::run_io_wait::{{closure}}::{{closure}} 6: 
tokio::runtime::task::core::Core<T,S>::poll 7: tokio::runtime::task::harness::Harness<T,S>::poll 8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task 9: tokio::runtime::scheduler::multi_thread::worker::Context::run 10: tokio::runtime::context::scoped::Scoped<T>::set 11: tokio::runtime::context::runtime::enter_runtime 12: tokio::runtime::scheduler::multi_thread::worker::run 13: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll 14: tokio::runtime::task::core::Core<T,S>::poll 15: tokio::runtime::task::harness::Harness<T,S>::poll 16: tokio::runtime::blocking::pool::Inner::run 17: std::sys::backtrace::__rust_begin_short_backtrace 18: core::ops::function::FnOnce::call_once{{vtable.shim}} 19: std::sys::thread::unix::Thread::new::thread_start 20: <unknown> 21: <unknown> Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-20 15:36:00 -05:00
Fupan Li	2629df2785	Merge pull request #12763 from Apokleos/fsmerged-erofs-rs runtime-rs: support erofs snapshotter with Fsmerge enabled	2026-04-20 11:54:19 +08:00
Alex Lyn	e975b3158b	Merge pull request #12837 from stevenhorsman/rand-bump-GHSA-cq8v-f236-94qc versions: Bump rand crate where possible	2026-04-20 10:05:19 +08:00
Alex Lyn	be47c2e932	runtime-rs: Avoid share-rw on readonly virtio-scsi/blk devices Hotplugging a readonly block device could fail with: Block node is read-only The backend block node was created readonly, but the virtio-scsi/blk frontend path still forced share-rw=true. This is unnecessary and can cause QEMU to reject the attach because the frontend configuration does not match the readonly backend. Fix the virtio-scsi/blk hotplug path by: - setting read-only for readonly devices where supported - skipping share-rw for readonly devices Readonly handling remains in the backend block node configuration, while the frontend keeps normal disk semantics for block devices. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-19 13:24:31 +02:00
Alex Lyn	02f975f88b	runtime-rs: Enforce read-only and shared access for RO block devices Explicitly configure `read_only` and `force_share` for readonly block devices to ensure consistency between the image's read-only state and QEMU's access mode. Motivation: Previously, EROFS images were being accessed in a way that triggered QEMU's exclusive locking (e.g., the 'resize' lock), even when the images were intended to be read-only. This conflicted with external processes (e.g., containerd snapshotter) that held read-only handles, resulting in "Failed to get shared 'resize' lock" errors during blockdev-add. Changes: - Set `read_only=true` and `force_share=true` on both format and file nodes for VMDK descriptors and Raw images. - This ensures QEMU requests shared locks, correctly matching the read-only nature of EROFS filesystems and preventing write-mode locking conflicts with concurrent processes. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-19 13:24:31 +02:00
Alex Lyn	526126904e	runtime-rs: Add support for handling vmdk hotplugging with scsi We should also support virtio-scsi driver for handling vmdk format block device, and this will help address more cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-19 13:24:31 +02:00
Alex Lyn	d8db044c63	runtime-rs: Add erofs rootfs handling logic in handler_rootfs Add handling for multi-layer EROFS rootfs in RootFsResource handler_rootfs method. It will correctly handle the multi-layers erofs rootfs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-19 13:24:31 +02:00
Alex Lyn	8d7051436a	runtime-rs: Add support for erofs rootfs with multi-layer Add erofs_rootfs.rs implementing ErofsMultiLayerRootfs for multi-layer EROFS rootfs with VMDK descriptor generation. It's the core implementation of Erofs rootfs within runtime. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-19 13:24:31 +02:00
Alex Lyn	cb706219ae	runtime-rs: Change Rootfs::get_storage return type Change Rootfs::get_storage to return Option<Vec<Storage>> to support multi-layer rootfs with multiple storages. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-18 22:46:33 +02:00
Alex Lyn	c06bc388c2	runtime-rs: Add format argument to hotplug_block_device method Add a format argument to hotplug_block_device so that different block formats can be specified. Currently raw and vmdk are supported, and other formats will be supported in the future. Besides the formats themselves, each format needs its own handling logic to set the options required by QMP blockdev-add. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-18 22:46:33 +02:00
Alex Lyn	15740439eb	runtime-rs: Add BlockDeviceFormat enum to support more block formats In practice, we need more block formats than just `Raw`. This commit adds the BlockDeviceFormat enum to support multiple block device formats, such as RAW and VMDK, along with the follow-up changes needed to make this work, including a format field in BlockConfig. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-18 19:00:44 +02:00
Alex Lyn	8ed4fa1406	runtime-rs: Add RUNTIME_ALLOW_MOUNTS to RuntimeInfo Add RUNTIME_ALLOW_MOUNTS annotation to RuntimeInfo to specify custom mount types allowed by the runtime. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-18 19:00:44 +02:00
Fabiano Fidêncio	d04bb98e09	runtime-rs: Increase reconnect_timeout_ms for confidential VMs The Go runtime's CoCo dev config uses dial_timeout = 45s, but all runtime-rs confidential VM configs had reconnect_timeout_ms set to 3000ms (3s) or 5000ms (SE). This is too short for confidential VMs, especially on arm64 where UEFI firmware (AAVMF) adds significant boot time on top of the measured boot process, causing ECONNRESET errors on the vsock connection before the agent is ready. Bump reconnect_timeout_ms to 45000ms across all confidential VM configs (coco-dev, SNP, TDX, SE) to match the Go runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-18 00:48:13 +02:00
Saul Paredes	6f6e45522e	Merge pull request #11562 from Apokleos/clh-initdata runtime-rs: Add CoCo/protected device for initdata within runtime-rs/Cloud Hypervisor	2026-04-17 11:09:19 -07:00
Fabiano Fidêncio	690f5a2b62	Merge pull request #12862 from fidencio/topic/runtime-rs-enable-measured-rootfs-tests runtime-rs: enable measured rootfs for qemu-coco-dev-runtime-rs	2026-04-17 18:48:47 +02:00
stevenhorsman	35be1a938d	versions: Bump rand crate where possible Update all versions of rand that are controlled by us to remediate GHSA-cq8v-f236-94qc. Note: There are still some usages of rand 0.8.5 that are from transitive dependencies which we can't currently update: - fail - phf_generator - opentelemetry due to them being archived, or our usage being 17 versions out of date Also update the rand API breakages, e.g.: - rand::thread_rng() → rand::rng() (function renamed) - rand::distributions::Alphanumeric → rand::distr::Alphanumeric (module renamed) - rng.gen_range() → rng.random_range() (function renamed) Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-17 15:58:58 +01:00
Fabiano Fidêncio	1ec0e344e5	runtime-rs: enable measured rootfs for qemu-coco-dev-runtime-rs Add kernel_verity_params to the qemu-coco-dev-runtime-rs configuration so the runtime can assemble dm-verity kernel parameters, and remove the test skip that was disabling measured rootfs tests for this hypervisor. Fixes: #12851 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-17 15:22:17 +02:00
Fabiano Fidêncio	eda3bc6190	runtime-rs: wire GetDiagnosticData for termination logs Add runtime-rs support for the GetDiagnosticData RPC. This extends the Agent trait, types, and protocol translation layer with the new request/response types. During container stop, when shared_fs is "none" and the terminationMessagePolicy annotation is "File", the runtime copies the termination log from the guest via GetDiagnosticData. The call is best-effort to avoid blocking container teardown. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-17 13:16:25 +02:00
Alex Lyn	c546b3c585	Merge pull request #12843 from microsoft/saul/build-opt runtime-rs: add build optimization flags	2026-04-16 09:05:20 +08:00
Alex Lyn	2f6319f130	runtime-rs: Fix unformatted code in runtime-rs When building runtime-rs, one unformatted code block comes up, as below: ``` - config - .hypervisor - .entry("qemu".to_owned()) - .and_modify(|hv| { - hv.cpu_info.default_vcpus = default_vcpus; - hv.cpu_info.default_maxvcpus = default_maxvcpus; - hv.memory_info.default_memory = default_memory; - hv.memory_info.default_maxmemory = default_maxmemory; - }); + config.hypervisor.entry("qemu".to_owned()).and_modify(|hv| { + hv.cpu_info.default_vcpus = default_vcpus; + hv.cpu_info.default_maxvcpus = default_maxvcpus; + hv.memory_info.default_memory = default_memory; + hv.memory_info.default_maxmemory = default_maxmemory; + }); ``` Let's format it now. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-15 14:48:23 +02:00
Saul Paredes	9404104aba	runtime-rs: add build optimization flags Enable the following optimizations when building runtime-rs in release mode: - lto: true - codegen-units=1 Setting these reduces the binary size and improves performance at the cost of longer build times. Without these flags: - build time: 4m 55s - binary size: 51 MB With these flags: - build time: 7m 21s - binary size: 38 MB Per https://github.com/kata-containers/kata-containers/issues/1125 and local experiments, a smaller binary size leads to a smaller shim memory footprint. - https://nnethercote.github.io/perf-book/build-configuration.html#codegen-units - https://nnethercote.github.io/perf-book/build-configuration.html#link-time-optimization Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-04-14 15:52:38 -07:00
stevenhorsman	5bcc006447	runtime-rs: Add missing license The ch-config crate was missing a license Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
Fabiano Fidêncio	3ce3644c3c	Merge pull request #12807 from PiotrProkop/blk-sector-rust runtime-rs: allow specifying logical/physical sector size for block devices	2026-04-11 00:42:45 +02:00
Fabiano Fidêncio	36a2d8e7f2	agent: Make launch_process_timeout configurable The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata agent is insufficient for environments with NVIDIA GPUs and NVSwitches, where the attestation-agent needs significantly more time to collect evidence during initialization (e.g. ~2 seconds per NVSwitch). When the timeout expires, the agent (PID 1) exits with an error, causing the guest kernel to perform an orderly shutdown before the attestation-agent has finished starting. Make this timeout configurable via the kernel parameter agent.launch_process_timeout (in seconds), preserving the 6-second default for backward compatibility. The Go runtime is wired up to pass this value from the TOML config's [agent.kata] section through to the kernel command line. The NVIDIA GPU configs set the new default to 15 seconds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-10 14:47:01 +02:00
PiotrProkop	82de35c720	runtime-rs: allow specifying logical/physical sector size for block devices Add two new configuration knobs that control the logical and physical sector sizes advertised by virtio-blk devices to the guest: block_device_logical_sector_size (config file) block_device_physical_sector_size (config file) io.katacontainers.config.hypervisor.blk_logical_sector_size (annotation) io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation) The annotation names are abbreviated relative to the config file keys because Kubernetes enforces a 63-character limit on annotation name segments, and the full names would exceed it. Both settings default to 0 (let QEMU decide). When set, they are passed as logical_block_size and physical_block_size in the QMP device_add command during block device hotplug. Setting logical_sector_size smaller than the container filesystem block size will cause EINVAL on mount. The physical_sector_size can always be set independently. Values must be 0 or a power of 2 in the range [512, 65536]; other values are rejected with an error at sandbox creation time. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-04-10 11:14:51 +02:00
Fabiano Fidêncio	218077506b	Merge pull request #12769 from RuoqingHe/runtime-rs-allow-install-on-riscv runtime-rs: Allow installation on RISC-V platforms	2026-04-10 10:24:40 +02:00
Steve Horsman	9e8069569e	Merge pull request #12734 from Apokleos/rm-v9p-rs runtime-rs: Remove virtio-9p Shared Filesystem Support	2026-04-09 16:15:55 +01:00
Hyounggyu Choi	f15f7f49f1	Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt runtime: qemu: Enable static sandbox resource management on ARM & s390x	2026-04-09 09:18:11 +02:00
Ruoqing He	98ee385220	runtime-rs: Consolidate unsupported arch Consolidate arch we don't support at the moment, and avoid hard coding error messages per arch. Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>	2026-04-09 04:18:50 +00:00
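
Commit 82de35c720 above states the rule for the new sector-size knobs: a value must be 0 (let QEMU decide) or a power of 2 in the range [512, 65536], and anything else is rejected at sandbox creation time. The following is a minimal sketch of such a check, not the code from that commit; the function name `validate_sector_size` and its error type are assumptions made for illustration.

```rust
/// Hypothetical validator (name and signature are assumptions, not taken
/// from the kata-containers tree). It encodes the rule from the commit
/// message: 0 means "let QEMU decide"; otherwise the value must be a
/// power of two within [512, 65536].
pub fn validate_sector_size(name: &str, value: u32) -> Result<(), String> {
    // 0 is the default and is always accepted.
    if value == 0 {
        return Ok(());
    }
    // Non-zero values must be a power of two inside the allowed range.
    if value.is_power_of_two() && (512..=65536).contains(&value) {
        return Ok(());
    }
    Err(format!(
        "{name} must be 0 or a power of 2 in [512, 65536], got {value}"
    ))
}
```

With this sketch, 0, 512, 4096 and 65536 pass, while 256, 1000 and 131072 return an error. The relationship between the logical sector size and the container filesystem block size, which the commit says can cause EINVAL on mount, is not checked here.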


1218 Commits
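
Commit 48669a894e in the listing describes the vCPU pinning decision: when the number of vCPU threads equals the size of the combined cpuset, vCPU i is pinned to cpuset[i]; when the counts diverge, every thread is reset to the full cpuset. The sketch below shows only that decision as a pure function. The `Affinity` type and `plan_vcpu_affinity` function are hypothetical names, and the real implementation additionally obtains the thread IDs (via QMP `query-cpus-fast` for QEMU) and applies the result with `sched_setaffinity`, which this sketch leaves out.

```rust
/// Planned affinity for one vCPU thread: its thread ID and the host CPUs
/// it is allowed to run on. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Affinity {
    pub tid: u32,
    pub cpus: Vec<u32>,
}

/// Hypothetical planner following the rule in the commit message.
/// `vcpu_tids` are the vCPU thread IDs; `cpuset` is the union of all
/// containers' OCI cpusets, in ascending order.
pub fn plan_vcpu_affinity(vcpu_tids: &[u32], cpuset: &[u32]) -> Vec<Affinity> {
    if vcpu_tids.len() == cpuset.len() {
        // Counts match: pin vCPU i to cpuset[i], one host CPU per thread.
        vcpu_tids
            .iter()
            .zip(cpuset)
            .map(|(&tid, &cpu)| Affinity { tid, cpus: vec![cpu] })
            .collect()
    } else {
        // Counts diverge (hotplug, container removal, ...): give every
        // thread the full cpuset so none stays stuck on a single core.
        vcpu_tids
            .iter()
            .map(|&tid| Affinity { tid, cpus: cpuset.to_vec() })
            .collect()
    }
}
```

Keeping the decision separate from the system call makes the matching and reset paths easy to exercise without a running VM.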