This serializes CH API calls to avoid a race condition where deleting a pod would hang indefinitely and leak both the shim and CH processes. The race happened because the CRI can send multiple shutdown requests for the same pod, but the CH socket wasn't guarded against concurrent use, so HTTP responses could interleave on the shutdown path (see below), leading to an error.

This would repro in <15 iterations (sometimes 2-3) using a 2-container pod. With this commit, I haven't observed a repro in 200+ iterations.

Fixes: #12858

ORIGINAL REPRO:

```bash
while true; do
    kubectl apply -f busybox.yaml
    kubectl wait --for=condition=ready po busybox
    kubectl exec busybox -- echo foo
    kubectl delete po busybox
done
```

ORIGINAL ERROR (note the second `HTTP/1.1 200` response interleaved into the JSON body of the first):

```
Apr 17 20:15:54 kata[2297383]: Failed to stop process, process = ContainerProcess { container_id: ContainerID { container_id: "d4eb8984d630111bbf808c7ea30b7a21274c0193cdb8d501d20e4f26a0a69151" }, exec_id: "", process_type: Container }, err = failed to update_mem_resource

Caused by:
    0: resize memory
    1: get vminfo
    2: failed to serde {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packages":1},"kvm_hyperv":false,"max_phys_bits":46,"affinity":null,"features":{"amx":false},"nested":null},"memory":{"size":2147483648,"mergeable":false,"hotplug_method":"Acpi","hotplug_size":132024107008,"hotplugged_size":null,"shared":true,"hugepages":false,"hugepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","initramfs":null},"rate_limit_groups":null,"disks":[{"path":"/usr/share/kata-containers/kata-containers.img","readonly":true,"direct":false,"iommu":false,"num_queues":1,"queue_size":128,"vhost_user":false,"vhost_socket":null,"rate_limit_group":null,"rate_limiter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":null,"mtu":null,"iommu":false,"num_queues":2,"queue_size":256,"vhost_user":false,"vhost_socket":null,"vhost_mode":"Client","id":"_net1","fds":[-1],"rate_limiter_config":null,"pci_segment":0,"offload_tso":true,"offload_ufo":true,"offload_csum":true}],"rng":{"src":"/dev/urandom","iommu":false},"balloon":null,"fs":[{"tag":"kataShared","socket":"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/root/virtiofsd.sock","num_queues":1,"queue_size":1024,"id":"_fs2","pci_segment":0}],"pmem":null:"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/ch-vm.sock","iommu":false,"id":"_vsock3","pci_segment":0},"pvpanic":false,"iommu":false,"numa":null,"watchdog":false,"pci_segments":null,"platform":null,"tpm":null,"landlock_enabl"index":0,"base":3891789824,"size":524288,"type_":"Mmio32","prefetchable":false}}],"parent":null,"children":["_disk0"],"pci_bdf":"0000:00:01.0"},"_virtio-pci-_vsock3":{"id":"_virtio-pci-_vsock3","resources":[{"PciBar":{"index":0,"base":70367622201344,"sizee":false}}],"parent":null,"children":["_fs2"],"pci_bdf":"0000:00:04.0"},"_vsock3":{"id":"_vsock3","resources":[],"parent":"_virtio-pci-_vsock3","children":[],"pci_bdf":null},"_net1":{"id":"_net1","resources":[],"parent":"_virtio-pci-_net1","children":[],"presources":[{"PciBar":{"index":0,"base":70367623774208,"size":524288,"type_":"Mmio64","prefetchable":false}}],"parent":null,"children":["_net1"],"pci_bdf":"0000:00:02.0"},"_virtio-pci-__rng":{"id":"_virtio-pci-__rng","resources":[{"PciBar":{"index":0,"baseesources":[],"parent":null,"children":[],"pci_bdf":null}}}HTTP/1.1 200 Server: Cloud Hypervisor API Connection: keep-alive Content-Type: application/json Content-Length: 4285 {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packagesepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","miter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":nu,"serial":{"file":null,"mode":"Tty","iommu":false,"socket":null},"console":{"file":null,"mode":"Off","iommu":false,"socket":null},"debug_console":{"file":null,"mode":"Off","iobase":233},"devices":[],"user_devices":null,"vdpa":null,"vsock":{"cid":3,"socket"
    3: expected `,` or `}` at line 1 column 1924

Stack backtrace:
   0: <E as anyhow::context::ext::StdError>::ext_context
   1: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::with_context
   2: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::resize_memory::{{closure}}
   3: resource::manager_inner::ResourceManagerInner::update_linux_resource::{{closure}}
   4: virt_container::container_manager::container::Container::stop_process::{{closure}}
   5: virt_container::container_manager::process::Process::run_io_wait::{{closure}}::{{closure}}
   6: tokio::runtime::task::core::Core<T,S>::poll
   7: tokio::runtime::task::harness::Harness<T,S>::poll
   8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   9: tokio::runtime::scheduler::multi_thread::worker::Context::run
  10: tokio::runtime::context::scoped::Scoped<T>::set
  11: tokio::runtime::context::runtime::enter_runtime
  12: tokio::runtime::scheduler::multi_thread::worker::run
  13: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  14: tokio::runtime::task::core::Core<T,S>::poll
  15: tokio::runtime::task::harness::Harness<T,S>::poll
  16: tokio::runtime::blocking::pool::Inner::run
  17: std::sys::backtrace::__rust_begin_short_backtrace
  18: core::ops::function::FnOnce::call_once{{vtable.shim}}
  19: std::sys::thread::unix::Thread::new::thread_start
  20: <unknown>
  21: <unknown>
```

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
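The serialization idea can be sketched with a plain `std::sync::Mutex` guarding a fake client (the actual fix guards the CH socket in async code; `ApiClient`, its log, and `serialized_calls` are illustrative stand-ins, not Kata types):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the CH API client: each call writes a request
// and reads a response over one shared connection, so the pair must never
// interleave with another caller's pair.
struct ApiClient {
    log: Vec<String>,
}

impl ApiClient {
    fn call(&mut self, name: &str) {
        self.log.push(format!("request:{name}"));
        self.log.push(format!("response:{name}"));
    }
}

// Run `n` concurrent callers, serializing every call through one mutex,
// and return the combined request/response log.
fn serialized_calls(n: usize) -> Vec<String> {
    let client = Arc::new(Mutex::new(ApiClient { log: Vec::new() }));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let client = Arc::clone(&client);
            thread::spawn(move || {
                // Holding the lock for the whole round trip is the point of
                // the fix: no two callers touch the socket at the same time.
                client.lock().unwrap().call(&format!("vm.info-{i}"));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(client).ok().unwrap().into_inner().unwrap().log
}

fn main() {
    let log = serialized_calls(4);
    // Every request is immediately followed by its matching response.
    for pair in log.chunks(2) {
        assert_eq!(pair[0].replace("request", "response"), pair[1]);
    }
    println!("no interleaving across {} calls", log.len() / 2);
}
```

Without the lock held across the full round trip, two callers could each write a request before either read its response, which is exactly the interleaved-body failure shown in the log above.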
# runtime-rs
## What is runtime-rs
runtime-rs is a core component of Kata Containers 4.0. It is a high-performance, Rust-based implementation of the containerd shim v2 runtime.
Key characteristics:
- Implementation Language: Rust, leveraging memory safety and zero-cost abstractions
- Project Maturity: Production-ready component of Kata Containers 4.0
- Architectural Design: Modular framework optimized for Kata Containers 4.0
For architecture details, see Architecture Overview.
## Architecture Overview
Key features:
- Built-in VMM (Dragonball): Deeply integrated into shim lifecycle, eliminating IPC overhead for peak performance
- Asynchronous I/O: Tokio-based async runtime for high-concurrency with reduced thread footprint
- Extensible Framework: Pluggable hypervisors, network interfaces, and storage backends
- Resource Lifecycle Management: Comprehensive sandbox and container resource management
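The extensible-framework point can be pictured with trait objects; the trait and type names below are illustrative assumptions, not the real `hypervisor` crate API (which is async and far richer):

```rust
// Illustrative sketch of pluggable hypervisors behind one trait.
trait Hypervisor {
    fn name(&self) -> &'static str;
    fn start_vm(&self) -> Result<(), String>;
}

struct Dragonball;
struct CloudHypervisor;

impl Hypervisor for Dragonball {
    fn name(&self) -> &'static str { "dragonball" }
    fn start_vm(&self) -> Result<(), String> { Ok(()) }
}

impl Hypervisor for CloudHypervisor {
    fn name(&self) -> &'static str { "cloud-hypervisor" }
    fn start_vm(&self) -> Result<(), String> { Ok(()) }
}

// The shim can then pick an implementation from configuration at runtime.
fn select(name: &str) -> Option<Box<dyn Hypervisor>> {
    match name {
        "dragonball" => Some(Box::new(Dragonball)),
        "cloud-hypervisor" => Some(Box::new(CloudHypervisor)),
        _ => None,
    }
}

fn main() {
    let hv = select("dragonball").expect("unknown hypervisor");
    hv.start_vm().unwrap();
    println!("started VM with {}", hv.name());
}
```

The same dynamic-dispatch pattern applies to the network and storage backends mentioned above.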
## Crates
| Crate | Description |
|---|---|
| `shim` | Containerd shim v2 entry point (`start`, `delete`, `run` commands) |
| `service` | Services including `TaskService` for the containerd shim protocol |
| `runtimes` | Runtime handlers: `VirtContainer` (default), `LinuxContainer` (experimental), `WasmContainer` (experimental) |
| `resource` | Resource management: network, share_fs, rootfs, volume, cgroups, cpu_mem |
| `hypervisor` | Hypervisor implementations |
| `agent` | Guest agent communication (`KataAgent`) |
| `persist` | State persistence to disk (JSON format) |
| `shim-ctl` | Development tool for testing the shim without containerd |
### shim
Entry point implementing containerd shim v2 binary protocol:
- `start`: start a new shim process
- `delete`: delete an existing shim process
- `run`: run the ttRPC service
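The three subcommands can be pictured as a simple dispatch; this is an illustrative sketch, not the real binary's argument handling (which follows containerd's shim v2 flag conventions):

```rust
use std::env;

// Hypothetical dispatch over the documented subcommand names.
fn dispatch(cmd: &str) -> Result<&'static str, String> {
    match cmd {
        "start" => Ok("spawned new shim process"),
        "delete" => Ok("cleaned up shim process"),
        "run" => Ok("serving ttRPC"),
        other => Err(format!("unknown subcommand: {other}")),
    }
}

fn main() {
    let cmd = env::args().nth(1).unwrap_or_else(|| "run".to_string());
    match dispatch(&cmd) {
        Ok(msg) => println!("{msg}"),
        Err(e) => eprintln!("{e}"),
    }
}
```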
### service
Extensible service framework. Currently implements `TaskService`, which conforms to the containerd shim protocol.
### runtimes
Runtime handlers manage sandbox and container operations:
| Handler | Feature Flag | Description |
|---|---|---|
| `VirtContainer` | `virt` (default) | Virtual machine-based containers |
| `LinuxContainer` | `linux` | Linux container runtime (experimental) |
| `WasmContainer` | `wasm` | WebAssembly runtime (experimental) |
### resource
All resources are abstracted uniformly:
- Sandbox resources: network, share-fs
- Container resources: rootfs, volume, cgroup
Sub-modules: `cpu_mem`, `cdi_devices`, `coco_data`, `network`, `share_fs`, `rootfs`, `volume`
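The uniform abstraction can be sketched as a common lifecycle trait; every name below is an assumption for illustration, not the crate's actual API:

```rust
// Illustrative: sandbox and container resources share one lifecycle.
trait Resource {
    fn name(&self) -> &'static str;
    fn setup(&mut self) -> Result<(), String>;
    fn cleanup(&mut self) -> Result<(), String>;
}

struct Rootfs { mounted: bool }
struct Network { up: bool }

impl Resource for Rootfs {
    fn name(&self) -> &'static str { "rootfs" }
    fn setup(&mut self) -> Result<(), String> { self.mounted = true; Ok(()) }
    fn cleanup(&mut self) -> Result<(), String> { self.mounted = false; Ok(()) }
}

impl Resource for Network {
    fn name(&self) -> &'static str { "network" }
    fn setup(&mut self) -> Result<(), String> { self.up = true; Ok(()) }
    fn cleanup(&mut self) -> Result<(), String> { self.up = false; Ok(()) }
}

// A manager drives heterogeneous resources through one interface,
// tearing them down in reverse order on sandbox shutdown.
fn run_lifecycle(resources: &mut Vec<Box<dyn Resource>>) -> Vec<&'static str> {
    let mut order = Vec::new();
    for r in resources.iter_mut() {
        r.setup().unwrap();
        order.push(r.name());
    }
    for r in resources.iter_mut().rev() {
        r.cleanup().unwrap();
    }
    order
}

fn main() {
    let mut rs: Vec<Box<dyn Resource>> =
        vec![Box::new(Network { up: false }), Box::new(Rootfs { mounted: false })];
    println!("set up: {:?}", run_lifecycle(&mut rs));
}
```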
### hypervisor
Supported hypervisors:
| Hypervisor | Mode | Description |
|---|---|---|
| Dragonball | Built-in | Integrated VMM for peak performance (default) |
| QEMU | External | Full-featured emulator |
| Cloud Hypervisor | External | Modern VMM (x86_64, aarch64) |
| Firecracker | External | Lightweight microVM |
| Remote | External | Remote hypervisor |
The built-in VMM mode (Dragonball) is recommended for production, offering superior performance by eliminating IPC overhead.
### agent
Communication with guest OS agent via ttRPC. Supports KataAgent for full container lifecycle management.
### persist
State serialization to disk for sandbox recovery after restart. Stores `state.json` under `/run/kata/<sandbox-id>/`.
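The persist flow can be sketched as follows; the real crate serializes with serde and a richer schema, so the `SandboxState` fields and `save` helper here are purely illustrative (and a temp dir stands in for `/run/kata` so the sketch runs unprivileged):

```rust
use std::fs;
use std::path::PathBuf;

// Illustrative snapshot of sandbox state; not the actual on-disk schema.
struct SandboxState {
    sandbox_id: String,
    vm_pid: u32,
}

// Mirrors the documented layout: <dir>/<sandbox-id>/state.json
fn state_path(dir: &str, sandbox_id: &str) -> PathBuf {
    PathBuf::from(dir).join(sandbox_id).join("state.json")
}

// Write the state as JSON so a restarted shim could read it back.
fn save(dir: &str, s: &SandboxState) -> std::io::Result<PathBuf> {
    let path = state_path(dir, &s.sandbox_id);
    fs::create_dir_all(path.parent().unwrap())?;
    let json = format!(
        "{{\"sandbox_id\":\"{}\",\"vm_pid\":{}}}",
        s.sandbox_id, s.vm_pid
    );
    fs::write(&path, json)?;
    Ok(path)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("kata-persist-demo");
    let state = SandboxState { sandbox_id: "demo".into(), vm_pid: 4242 };
    let path = save(dir.to_str().unwrap(), &state)?;
    println!("wrote {}", path.display());
    Ok(())
}
```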
## Build from Source and Install
### Prerequisites
Download Rustup and install Rust. For the required Rust version, see `languages.rust.meta.newest-version` in `versions.yaml`.
Example for x86_64:
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
rustup install ${RUST_VERSION}
rustup default ${RUST_VERSION}-x86_64-unknown-linux-gnu
```
### Musl Support (Optional)
For a fully static binary:
```bash
# Add musl target
rustup target add x86_64-unknown-linux-musl

# Install musl libc (example: musl 1.2.3)
curl -O https://git.musl-libc.org/cgit/musl/snapshot/musl-1.2.3.tar.gz
tar vxf musl-1.2.3.tar.gz
cd musl-1.2.3/
./configure --prefix=/usr/local/
make && sudo make install
```
### Install Kata 4.0 Rust Runtime Shim
```bash
git clone https://github.com/kata-containers/kata-containers.git
cd kata-containers/src/runtime-rs
make && sudo make install
```
After installation:
- Config file: `/usr/share/defaults/kata-containers/configuration.toml`
- Binary: `/usr/local/bin/containerd-shim-kata-v2`
### Install Without Built-in Dragonball VMM
To build without the built-in Dragonball hypervisor:
```bash
make USE_BUILTIN_DB=false
```
Specify the hypervisor during installation:
```bash
sudo make install HYPERVISOR=qemu
# or
sudo make install HYPERVISOR=cloud-hypervisor
```
## Configuration
Configuration files live in `config/`:
| Config File | Hypervisor | Notes |
|---|---|---|
| `configuration-dragonball.toml.in` | Dragonball | Built-in VMM |
| `configuration-qemu-runtime-rs.toml.in` | QEMU | Default external |
| `configuration-cloud-hypervisor.toml.in` | Cloud Hypervisor | Modern VMM |
| `configuration-rs-fc.toml.in` | Firecracker | Lightweight microVM |
| `configuration-remote.toml.in` | Remote | Remote hypervisor |
| `configuration-qemu-tdx-runtime-rs.toml.in` | QEMU + TDX | Intel TDX confidential computing |
| `configuration-qemu-snp-runtime-rs.toml.in` | QEMU + SEV-SNP | AMD SEV-SNP confidential computing |
| `configuration-qemu-se-runtime-rs.toml.in` | QEMU + SE | IBM Secure Execution confidential computing |
| `configuration-qemu-coco-dev-runtime-rs.toml.in` | QEMU + CoCo | CoCo development |
See runtime configuration for configuration options.
## Logging
See Developer Guide - Troubleshooting.
## Debugging
For development, use `shim-ctl` to test the shim without containerd dependencies.
## Limitations
See Limitations for details.