Block volumes and block-mode device nodes were attached to the guest
read-write regardless of the volume's read-only intent, so the
guest-visible virtio-blk device was always writable.
This matters beyond simple write protection: filesystems such as XFS
inspect the block device read-only state to decide whether to attempt
journal/log recovery. When the device is writable, XFS tries to replay
the log even on a read-only mount, which fails badly. Mounting with
"-o ro" inside the guest is not sufficient; the device itself must
advertise read-only (VIRTIO_BLK_F_RO), which only happens when the VMM
opens the backing image read-only.
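As an illustration of what "advertise read-only" means on the guest side, here is a minimal Rust sketch of checking the feature bit. It assumes the bit position given in the virtio specification (VIRTIO_BLK_F_RO is feature bit 5). The helper name `device_is_readonly` is hypothetical and is not part of Kata Containers.

```rust
/// Feature bit number of VIRTIO_BLK_F_RO ("device is read-only"),
/// as assigned by the virtio specification.
const VIRTIO_BLK_F_RO: u64 = 5;

/// Hypothetical helper: returns true when the feature bitmap offered by
/// the device has the read-only bit set.
fn device_is_readonly(device_features: u64) -> bool {
    (device_features & (1u64 << VIRTIO_BLK_F_RO)) != 0
}
```

A guest filesystem sees the device as read-only only when this bit is offered, independent of the mount options used inside the guest.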
Set is_readonly on the block device config from two signals, combined
with OR so either one marks the device read-only:
- the read-only intent from the OCI spec:
  * bind-mounted block volumes and direct-assigned (raw block)
    volumes derive it from the "ro" mount option, and
  * block-mode volumes (e.g. Kubernetes volumeDevices) arrive as
    device nodes in spec.Linux.Devices with no mount option; their
    intent is expressed only via the cgroup device access in
    spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for
    read-only; "rwm" for read-write). handler_devices() derives the
    flag from the matching cgroup allow rule, and
- the host block device's own read-only flag (queried via the BLKROGET
  ioctl). Both the volume path (block_volume/rawblock_volume) and the
  device-node path (handler_devices, resolving the host node via
  get_host_path) honor it, so a device that is physically read-only on
  the host is exposed read-only to the guest even when the intent is
  not encoded in the OCI spec.
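The combination of signals described above can be sketched in Rust as follows. This is a simplified illustration, not the code from this change: all three function names are hypothetical, the host flag is passed in as a plain boolean instead of being queried with BLKROGET, and treating a missing or empty cgroup rule as "no read-only intent" is an assumption.

```rust
/// Read-only intent for bind-mounted and direct-assigned block volumes:
/// true when the OCI mount options contain "ro".
fn readonly_from_mount_options(options: &[&str]) -> bool {
    options.iter().any(|opt| *opt == "ro")
}

/// Read-only intent for block-mode device nodes: the matching cgroup allow
/// rule grants no write access (for example "rm" rather than "rwm").
/// Assumption: no matching rule, or an empty access string, yields false.
fn readonly_from_cgroup_access(access: Option<&str>) -> bool {
    matches!(access, Some(a) if !a.is_empty() && !a.contains('w'))
}

/// Combine the OCI intent with the host device's own read-only flag
/// (the value a BLKROGET query would report). Either signal is enough.
fn is_readonly(oci_intent: bool, host_readonly: bool) -> bool {
    oci_intent || host_readonly
}
```

Because the two signals are ORed, a writable intent in the OCI spec never overrides a host device that is itself read-only.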
All in-tree hypervisors (qemu, cloud-hypervisor, dragonball) already
honor BlockConfig.is_readonly, so no hypervisor changes are required.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor
runtime-rs
What is runtime-rs
runtime-rs is a core component of Kata Containers 4.0. It is a high-performance, Rust-based implementation of the containerd shim v2 runtime.
Key characteristics:
- Implementation Language: Rust, leveraging memory safety and zero-cost abstractions
- Project Maturity: Production-ready component of Kata Containers 4.0
- Architectural Design: Modular framework optimized for Kata Containers 4.0
For architecture details, see Architecture Overview.
Architecture Overview
Key features:
- Built-in VMM (Dragonball): Deeply integrated into shim lifecycle, eliminating IPC overhead for peak performance
- Asynchronous I/O: Tokio-based async runtime for high-concurrency with reduced thread footprint
- Extensible Framework: Pluggable hypervisors, network interfaces, and storage backends
- Resource Lifecycle Management: Comprehensive sandbox and container resource management
Crates
| Crate | Description |
|---|---|
| shim | Containerd shim v2 entry point (start, delete, run commands) |
| service | Services including TaskService for containerd shim protocol |
| runtimes | Runtime handlers: VirtContainer (default), LinuxContainer (experimental), WasmContainer (experimental) |
| resource | Resource management: network, share_fs, rootfs, volume, cgroups, cpu_mem |
| hypervisor | Hypervisor implementations |
| agent | Guest agent communication (KataAgent) |
| persist | State persistence to disk (JSON format) |
| shim-ctl | Development tool for testing shim without containerd |
shim
Entry point implementing containerd shim v2 binary protocol:
- start: Start new shim process
- delete: Delete existing shim process
- run: Run ttRPC service
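A minimal Rust sketch of dispatching on these three subcommands is shown here. The `ShimCommand` enum and `parse_command` function are hypothetical names for illustration and do not reflect how the shim crate actually parses its arguments.

```rust
/// The three subcommands of the shim entry point.
#[derive(Debug, PartialEq)]
enum ShimCommand {
    Start,
    Delete,
    Run,
}

/// Hypothetical parser: maps a command-line word to a subcommand,
/// returning None for anything unrecognized.
fn parse_command(arg: &str) -> Option<ShimCommand> {
    match arg {
        "start" => Some(ShimCommand::Start),
        "delete" => Some(ShimCommand::Delete),
        "run" => Some(ShimCommand::Run),
        _ => None,
    }
}
```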
service
Extensible service framework. Currently implements TaskService conforming to containerd shim protocol.
runtimes
Runtime handlers manage sandbox and container operations:
| Handler | Feature Flag | Description |
|---|---|---|
| VirtContainer | virt (default) | Virtual machine-based containers |
| LinuxContainer | linux | Linux container runtime (experimental) |
| WasmContainer | wasm | WebAssembly runtime (experimental) |
resource
All resources abstracted uniformly:
- Sandbox resources: network, share-fs
- Container resources: rootfs, volume, cgroup
Sub-modules: cpu_mem, cdi_devices, coco_data, network, share_fs, rootfs, volume
hypervisor
Supported hypervisors:
| Hypervisor | Mode | Description |
|---|---|---|
| Dragonball | Built-in | Integrated VMM for peak performance (default) |
| QEMU | External | Full-featured emulator |
| Cloud Hypervisor | External | Modern VMM (x86_64, aarch64) |
| Firecracker | External | Lightweight microVM |
| Remote | External | Remote hypervisor |
The built-in VMM mode (Dragonball) is recommended for production, offering superior performance by eliminating IPC overhead.
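The built-in versus external distinction in the table can be modelled with a small Rust sketch. The `Hypervisor` and `Mode` types below are hypothetical and exist only to illustrate the classification; they are not the types used by the hypervisor crate.

```rust
/// How a hypervisor is run relative to the shim process.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    BuiltIn,
    External,
}

/// The hypervisors listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Hypervisor {
    Dragonball,
    Qemu,
    CloudHypervisor,
    Firecracker,
    Remote,
}

impl Hypervisor {
    /// Dragonball is the only built-in VMM; every other entry is external.
    fn mode(self) -> Mode {
        match self {
            Hypervisor::Dragonball => Mode::BuiltIn,
            Hypervisor::Qemu
            | Hypervisor::CloudHypervisor
            | Hypervisor::Firecracker
            | Hypervisor::Remote => Mode::External,
        }
    }
}
```

Listing every variant in the `match` instead of using a wildcard means the compiler flags this function if a new hypervisor is added without deciding its mode.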
agent
Communication with guest OS agent via ttRPC. Supports KataAgent for full container lifecycle management.
persist
State serialization to disk for sandbox recovery after restart. Stores state.json under /run/kata/<sandbox-id>/.
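The on-disk layout can be sketched with the Rust standard library alone. The helpers `state_file_path` and `save_state` are hypothetical, the state is passed in as an already serialized JSON string, and the root directory is a parameter so the sketch can be tried outside /run/kata; the real persist crate may differ in all of these respects.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: location of the persisted state for one sandbox,
/// following the layout <root>/<sandbox-id>/state.json.
fn state_file_path(root: &Path, sandbox_id: &str) -> PathBuf {
    root.join(sandbox_id).join("state.json")
}

/// Hypothetical helper: write an already serialized JSON document to the
/// sandbox's state file, creating the sandbox directory if needed.
fn save_state(root: &Path, sandbox_id: &str, json: &str) -> io::Result<PathBuf> {
    let path = state_file_path(root, sandbox_id);
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    fs::write(&path, json)?;
    Ok(path)
}
```

With `/run/kata` as the root, `state_file_path` yields the path described above for a given sandbox ID.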
Build from Source and Install
Prerequisites
Download Rustup and install Rust. For the required Rust version, see languages.rust.meta.newest-version in versions.yaml.
Example for x86_64:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
rustup install ${RUST_VERSION}
rustup default ${RUST_VERSION}-x86_64-unknown-linux-gnu
Musl Support (Optional)
For a fully static binary:
# Add musl target
rustup target add x86_64-unknown-linux-musl
# Install musl libc (example: musl 1.2.3)
curl -O https://git.musl-libc.org/cgit/musl/snapshot/musl-1.2.3.tar.gz
tar vxf musl-1.2.3.tar.gz
cd musl-1.2.3/
./configure --prefix=/usr/local/
make && sudo make install
Install Kata 4.0 Rust Runtime Shim
git clone https://github.com/kata-containers/kata-containers.git
cd kata-containers/src/runtime-rs
make && sudo make install
After installation:
- Config file: /usr/share/defaults/kata-containers/configuration.toml
- Binary: /usr/local/bin/containerd-shim-kata-v2
Install Without Built-in Dragonball VMM
To build without the built-in Dragonball hypervisor:
make USE_BUILTIN_DB=false
Specify hypervisor during installation:
sudo make install HYPERVISOR=qemu
# or
sudo make install HYPERVISOR=clh-runtime-rs
Configuration
Configuration files in config/:
| Config File | Hypervisor | Notes |
|---|---|---|
| configuration-dragonball.toml.in | Dragonball | Built-in VMM |
| configuration-qemu-runtime-rs.toml.in | QEMU | Default external |
| configuration-clh-runtime-rs.toml.in | Cloud Hypervisor | Modern VMM |
| configuration-rs-fc.toml.in | Firecracker | Lightweight microVM |
| configuration-remote.toml.in | Remote | Remote hypervisor |
| configuration-qemu-tdx-runtime-rs.toml.in | QEMU + TDX | Intel TDX confidential computing |
| configuration-qemu-snp-runtime-rs.toml.in | QEMU + SEV-SNP | AMD SEV-SNP confidential computing |
| configuration-qemu-se-runtime-rs.toml.in | QEMU + SE | IBM Secure Execution (s390x) confidential computing |
| configuration-qemu-coco-dev-runtime-rs.toml.in | QEMU + CoCo | CoCo development |
See runtime configuration for configuration options.
Logging
See Developer Guide - Troubleshooting.
Debugging
For development, use shim-ctl to test shim without containerd dependencies.
Limitations
See Limitations for details.