kata-containers/src/runtime-rs
Fabiano Fidêncio cfab6f496b runtime-rs: Propagate block device read-only flag to the VMM
Block volumes and block-mode device nodes were attached to the guest
read-write regardless of the volume's read-only intent, so the
guest-visible virtio-blk device was always writable.

This matters beyond simple write protection: filesystems such as XFS
inspect the block device read-only state to decide whether to attempt
journal/log recovery. When the device is writable, XFS tries to replay
the log even on a read-only mount, which fails badly. Mounting with
"-o ro" inside the guest is not sufficient; the device itself must
advertise read-only (VIRTIO_BLK_F_RO), which only happens when the VMM
opens the backing image read-only.

Set is_readonly on the block device config from two signals, combined
with OR so either one marks the device read-only:

  - the read-only intent from the OCI spec:
      * bind-mounted block volumes and direct-assigned (raw block)
        volumes derive it from the "ro" mount option, and
      * block-mode volumes (e.g. Kubernetes volumeDevices) arrive as
        device nodes in spec.Linux.Devices with no mount option; their
        intent is expressed only via the cgroup device access in
        spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for
        read-only; "rwm" for read-write). handler_devices() derives the
        flag from the matching cgroup allow rule, and
  - the host block device's own read-only flag (queried via the BLKROGET
    ioctl). Both the volume path (block_volume/rawblock_volume) and the
    device-node path (handler_devices, resolving the host node via
    get_host_path) honor it, so a device that is physically read-only on
    the host is exposed read-only to the guest even when the intent is
    not encoded in the OCI spec.

All in-tree hypervisors (qemu, cloud-hypervisor, dragonball) already
honor BlockConfig.is_readonly, so no hypervisor changes are required.
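
As an illustration only, the OR-combination described above can be sketched
in a few lines of Rust. The function names below are hypothetical and are not
the ones used in runtime-rs; the host flag is passed in as a plain bool rather
than queried with the BLKROGET ioctl, and treating an access string as
read-only only when it contains "r" and no "w" is an assumption of this sketch.

```rust
/// Read-only intent from a mount's options: true when "ro" is present.
/// (Hypothetical helper, not the runtime-rs implementation.)
pub fn readonly_from_mount_options(options: &[&str]) -> bool {
    options.iter().any(|opt| *opt == "ro")
}

/// Read-only intent from a cgroup device access string such as "rm" or "rwm".
/// Assumption: read access without write access means read-only.
pub fn readonly_from_cgroup_access(access: &str) -> bool {
    access.contains('r') && !access.contains('w')
}

/// Final flag: either signal alone marks the device read-only.
/// `host_device_ro` stands in for the result of the BLKROGET query.
pub fn block_device_is_readonly(spec_intent_ro: bool, host_device_ro: bool) -> bool {
    spec_intent_ro || host_device_ro
}
```

With these helpers, a volume mounted with "ro" or a device node whose allow
rule is "rm" yields a read-only intent, and a host device that is itself
read-only forces the flag even when the spec carries no such intent.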

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor
2026-06-15 23:18:36 +02:00

runtime-rs

What is runtime-rs

runtime-rs is a core component of Kata Containers 4.0. It is a high-performance, Rust-based implementation of the containerd shim v2 runtime.

Key characteristics:

  • Implementation Language: Rust, leveraging memory safety and zero-cost abstractions
  • Project Maturity: Production-ready component of Kata Containers 4.0
  • Architectural Design: Modular framework optimized for Kata Containers 4.0

For architecture details, see Architecture Overview.

Architecture Overview

Key features:

  • Built-in VMM (Dragonball): Deeply integrated into shim lifecycle, eliminating IPC overhead for peak performance
  • Asynchronous I/O: Tokio-based async runtime for high-concurrency with reduced thread footprint
  • Extensible Framework: Pluggable hypervisors, network interfaces, and storage backends
  • Resource Lifecycle Management: Comprehensive sandbox and container resource management

[Figure: crates overview]

Crates

| Crate | Description |
|---|---|
| shim | Containerd shim v2 entry point (start, delete, run commands) |
| service | Services including TaskService for containerd shim protocol |
| runtimes | Runtime handlers: VirtContainer (default), LinuxContainer (experimental), WasmContainer (experimental) |
| resource | Resource management: network, share_fs, rootfs, volume, cgroups, cpu_mem |
| hypervisor | Hypervisor implementations |
| agent | Guest agent communication (KataAgent) |
| persist | State persistence to disk (JSON format) |
| shim-ctl | Development tool for testing shim without containerd |

shim

Entry point implementing containerd shim v2 binary protocol:

  • start: Start new shim process
  • delete: Delete existing shim process
  • run: Run ttRPC service
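
The three commands listed above can be modelled as a small enum with a parser. This is only a sketch of the dispatch idea: `ShimCommand` and its methods are hypothetical names introduced here, and the way the real shim binary parses its arguments is not shown in this README.

```rust
/// The shim commands named in the list above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShimCommand {
    Start,
    Delete,
    Run,
}

impl ShimCommand {
    /// Map a command-line word to a command; unknown words yield `None`.
    pub fn parse(arg: &str) -> Option<Self> {
        match arg {
            "start" => Some(Self::Start),
            "delete" => Some(Self::Delete),
            "run" => Some(Self::Run),
            _ => None,
        }
    }

    /// Short description, mirroring the wording of the list above.
    pub fn describe(self) -> &'static str {
        match self {
            Self::Start => "start new shim process",
            Self::Delete => "delete existing shim process",
            Self::Run => "run ttRPC service",
        }
    }
}
```

Returning `Option` rather than panicking leaves the caller free to print a usage message for unrecognised input.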

service

Extensible service framework. Currently implements TaskService, conforming to the containerd shim protocol.

runtimes

Runtime handlers manage sandbox and container operations:

| Handler | Feature Flag | Description |
|---|---|---|
| VirtContainer | virt (default) | Virtual machine-based containers |
| LinuxContainer | linux | Linux container runtime (experimental) |
| WasmContainer | wasm | WebAssembly runtime (experimental) |

resource

All resources abstracted uniformly:

  • Sandbox resources: network, share-fs
  • Container resources: rootfs, volume, cgroup

Sub-modules: cpu_mem, cdi_devices, coco_data, network, share_fs, rootfs, volume

hypervisor

Supported hypervisors:

| Hypervisor | Mode | Description |
|---|---|---|
| Dragonball | Built-in | Integrated VMM for peak performance (default) |
| QEMU | External | Full-featured emulator |
| Cloud Hypervisor | External | Modern VMM (x86_64, aarch64) |
| Firecracker | External | Lightweight microVM |
| Remote | External | Remote hypervisor |

The built-in VMM mode (Dragonball) is recommended for production, offering superior performance by eliminating IPC overhead.
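
The built-in versus external distinction can be captured in a small type. This is a minimal sketch, assuming only what the hypervisor list in this section states; `Hypervisor` and `VmmMode` are hypothetical names and do not correspond to types in the hypervisor crate.

```rust
/// How a VMM relates to the shim process (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmmMode {
    /// Runs inside the shim process.
    BuiltIn,
    /// Runs as a separate process that the shim talks to.
    External,
}

/// The hypervisors listed in this section (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hypervisor {
    Dragonball,
    Qemu,
    CloudHypervisor,
    Firecracker,
    Remote,
}

impl Hypervisor {
    /// Every variant, in the order the section lists them.
    pub const ALL: [Hypervisor; 5] = [
        Hypervisor::Dragonball,
        Hypervisor::Qemu,
        Hypervisor::CloudHypervisor,
        Hypervisor::Firecracker,
        Hypervisor::Remote,
    ];

    /// Only Dragonball is built in; all others are external.
    pub fn mode(self) -> VmmMode {
        match self {
            Hypervisor::Dragonball => VmmMode::BuiltIn,
            _ => VmmMode::External,
        }
    }
}
```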

agent

Communication with the guest OS agent via ttRPC. Supports KataAgent for full container lifecycle management.

persist

State serialization to disk for sandbox recovery after restart. Stores state.json under /run/kata/<sandbox-id>/.
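
As a rough illustration of that layout, the helper below builds the path of a sandbox's state file. `state_file_path` is a hypothetical name, not a function of the persist crate, and rejecting IDs that could escape the base directory is an extra precaution assumed by this sketch.

```rust
use std::path::{Path, PathBuf};

/// Build `<base>/<sandbox-id>/state.json` (hypothetical helper).
/// Returns `None` for IDs that are empty or could escape `base`.
pub fn state_file_path(base: &Path, sandbox_id: &str) -> Option<PathBuf> {
    let unsafe_id = sandbox_id.is_empty()
        || sandbox_id == "."
        || sandbox_id == ".."
        || sandbox_id.contains('/');
    if unsafe_id {
        return None;
    }
    Some(base.join(sandbox_id).join("state.json"))
}
```

Calling it with the base directory `/run/kata` and a sandbox ID gives the location described above.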

Build from Source and Install

Prerequisites

Download Rustup and install Rust. For the required Rust version, see languages.rust.meta.newest-version in versions.yaml.

Example for x86_64:

# Set RUST_VERSION to the value of languages.rust.meta.newest-version in versions.yaml
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup install "${RUST_VERSION}"
rustup default "${RUST_VERSION}-x86_64-unknown-linux-gnu"

Musl Support (Optional)

For a fully static binary:

# Add musl target
rustup target add x86_64-unknown-linux-musl

# Install musl libc (example: musl 1.2.3)
curl -O https://git.musl-libc.org/cgit/musl/snapshot/musl-1.2.3.tar.gz
tar vxf musl-1.2.3.tar.gz
cd musl-1.2.3/
./configure --prefix=/usr/local/
make && sudo make install

Install Kata 4.0 Rust Runtime Shim

git clone https://github.com/kata-containers/kata-containers.git
cd kata-containers/src/runtime-rs
make && sudo make install

After installation:

  • Config file: /usr/share/defaults/kata-containers/configuration.toml
  • Binary: /usr/local/bin/containerd-shim-kata-v2

Install Without Built-in Dragonball VMM

To build without the built-in Dragonball hypervisor:

make USE_BUILTIN_DB=false

Specify the hypervisor during installation:

sudo make install HYPERVISOR=qemu
# or
sudo make install HYPERVISOR=clh-runtime-rs

Configuration

Configuration files in config/:

| Config File | Hypervisor | Notes |
|---|---|---|
| configuration-dragonball.toml.in | Dragonball | Built-in VMM |
| configuration-qemu-runtime-rs.toml.in | QEMU | Default external |
| configuration-clh-runtime-rs.toml.in | Cloud Hypervisor | Modern VMM |
| configuration-rs-fc.toml.in | Firecracker | Lightweight microVM |
| configuration-remote.toml.in | Remote | Remote hypervisor |
| configuration-qemu-tdx-runtime-rs.toml.in | QEMU + TDX | Intel TDX confidential computing |
| configuration-qemu-snp-runtime-rs.toml.in | QEMU + SEV-SNP | AMD SEV-SNP confidential computing |
| configuration-qemu-se-runtime-rs.toml.in | QEMU + SE | IBM Secure Execution confidential computing |
| configuration-qemu-coco-dev-runtime-rs.toml.in | QEMU + CoCo | CoCo development |
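
For tooling that needs to pick one of these templates, a lookup like the following may be enough. It is a sketch only: `config_template` and its lookup keys ("dragonball", "qemu", and so on) are hypothetical and are not the values accepted by the Makefile's HYPERVISOR variable; only the file names come from the list above, and the confidential-computing variants are left out for brevity.

```rust
/// Map a hypervisor key (hypothetical keys) to its template in config/.
pub fn config_template(hypervisor: &str) -> Option<&'static str> {
    match hypervisor {
        "dragonball" => Some("configuration-dragonball.toml.in"),
        "qemu" => Some("configuration-qemu-runtime-rs.toml.in"),
        "cloud-hypervisor" => Some("configuration-clh-runtime-rs.toml.in"),
        "firecracker" => Some("configuration-rs-fc.toml.in"),
        "remote" => Some("configuration-remote.toml.in"),
        // Unknown keys are reported to the caller instead of guessed.
        _ => None,
    }
}
```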

See runtime configuration for configuration options.

Logging

See Developer Guide - Troubleshooting.

Debugging

For development, use shim-ctl to test the shim without containerd dependencies.

Limitations

See Limitations for details.