Files
kata-containers/docs/design/virtualization.md
Alex Lyn 302b2c8d75 docs: Restructure and modernize virtualization design document
Comprehensive rewrite of docs/design/virtualization.md to improve
clarity, completeness, and usability.

This document now serves as the authoritative guide for
understanding and selecting hypervisors in Kata Containers deployments.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00

12 KiB

Virtualization in Kata Containers

Overview

Kata Containers creates a second layer of isolation on top of traditional namespace-based containers using hardware virtualization. Kata launches a lightweight virtual machine (VM) and uses the guest Linux kernel to create container workloads. In Kubernetes, the sandbox is implemented at the pod level using VMs.

This document describes:

  • How Kata Containers maps container technologies to virtualization technologies
  • The multiple hypervisors and Virtual Machine Monitors (VMMs) supported by Kata
  • Guidance for selecting the appropriate hypervisor for your use case

Architecture

A typical Kata Containers deployment integrates with Kubernetes through a Container Runtime Interface (CRI) implementation:

Kubelet → CRI (containerd/CRI-O) → Kata Containers (OCI runtime) → VM → Containers

The CRI API requires Kata to support the following constructs:

CRI Construct VM Equivalent Virtualization Technology
Pod Sandbox VM Hypervisor/VMM
Container Process in VM Namespace/Cgroup in guest
Network Network Interface virtio-net, vhost-net, physical, etc.
Storage Block/File Device virtio-block, virtio-scsi, virtio-fs
Compute vCPU/Memory KVM, ACPI hotplug

Mapping Container Concepts to Virtualization Technologies

Kata Containers implements the Kubernetes Container Runtime Interface (CRI) to provide pod and container lifecycle management. The CRI API defines abstractions that Kata must translate into virtualization primitives.

The mapping from CRI constructs to virtualization technologies follows a three-layer model:

CRI API Constructs → VM Abstractions → Para-virtualized Devices

Layer 1: CRI API Constructs

The CRI API (kubernetes/cri-api) defines the following abstractions that Kata must implement:

Construct Description
Pod Sandbox Isolated execution environment for containers
Container Process workload within a sandbox
Network Pod and container networking interfaces
Storage Volume mounts and image storage
RuntimeConfig Resource constraints (CPU, memory, cgroups)

CRI API to Kata Constructs

Layer 2: VM Abstractions

Kata translates CRI constructs into VM-level concepts:

CRI Construct VM Equivalent
Pod Sandbox Virtual Machine
Container Process/namespace in guest OS
Network Virtual NIC (vNIC)
Storage Virtual block device or filesystem
RuntimeConfig VM resources (vCPU, memory)

Kata Constructs to VM Concepts

Layer 3: Para-virtualized Devices

VM abstractions are realized through para-virtualized drivers for optimal performance:

VM Concept Device Technology
vNIC virtio-net, vhost-net, macvtap
Block Storage virtio-block, virtio-scsi
Shared Filesystem virtio-fs
Agent Communication virtio-vsock
Device Passthrough VFIO with IOMMU

VM Concepts to Underlying Technology

Note: Each hypervisor implements these mappings differently based on its device model and feature set. See the Hypervisor Details section for specific implementations.

Device Mapping

Container constructs map to para-virtualized devices:

Construct Device Type Technology
Network Network Interface virtio-net, vhost-net
Storage (ephemeral) Block Device virtio-block, virtio-scsi
Storage (shared) Filesystem virtio-fs
Communication Socket virtio-vsock
GPU/Passthrough PCI Device VFIO, IOMMU

Supported Hypervisors and VMMs

Kata Containers supports multiple hypervisors, each with different characteristics:

Hypervisor Language Architectures Type
QEMU C x86_64, aarch64, ppc64le, s390x, risc-v Type 2 (KVM)
Cloud Hypervisor Rust x86_64, aarch64 Type 2 (KVM)
Firecracker Rust x86_64, aarch64 Type 2 (KVM)
Dragonball Rust x86_64, aarch64 Type 2 (KVM) Built-in

Note: All supported hypervisors use KVM (Kernel-based Virtual Machine) as the underlying hardware virtualization interface on Linux.

Hypervisor Details

QEMU/KVM

QEMU is the most mature and feature-complete hypervisor option for Kata Containers.

Machine Types:

  • q35 (x86_64, default)
  • s390x (s390x)
  • virt (aarch64)
  • pseries (ppc64le)
  • risc-v (riscv64, experimental)

Devices and Features:

  • virtio-vsock (agent communication)
  • virtio-block or virtio-scsi (storage)
  • virtio-net/vhost-net/vhost-user-net (networking)
  • virtio-fs (shared filesystem, virtio-fs recommended)
  • VFIO (device passthrough)
  • CPU and memory hotplug
  • NVDIMM (x86_64, for rootfs as persistent memory)

Use Cases:

  • Production workloads requiring full CRI API compatibility
  • Scenarios requiring device passthrough (VFIO)
  • Multi-architecture deployments

Configuration: See configuration-qemu.toml

Dragonball (Built-in VMM)

Dragonball is a Rust-based VMM integrated directly into the Kata Containers Rust runtime as a library.

Advantages:

  • Zero IPC overhead: VMM runs in the same process as the runtime
  • Unified lifecycle: Simplified resource management and error handling
  • Optimized for containers: Purpose-built for container workloads
  • Upcall support: Direct VMM-to-Guest communication for efficient hotplug operations
  • Low resource overhead: Minimal CPU and memory footprint

Architecture:

┌─────────────────────────────────────────┐
│     Kata Containers Runtime (Rust)      │
│  ┌─────────────────────────────────┐    │
│  │      Dragonball VMM Library     │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Features:

  • Built-in virtio-fs/nydus support
  • Async I/O via Tokio
  • Single binary deployment
  • Optimized startup latency

Use Cases:

  • Default choice for most container workloads
  • High-density container deployments and low resource overhead scenarios
  • Scenarios requiring optimal startup performance

Configuration: See configuration-dragonball.toml

Cloud Hypervisor/KVM

Cloud Hypervisor is a Rust-based VMM designed for modern cloud workloads with a focus on performance and security.

Features:

  • CPU and memory resize
  • Device hotplug (disk, VFIO)
  • virtio-fs (shared filesystem)
  • virtio-pmem (persistent memory)
  • virtio-block (block storage)
  • virtio-vsock (agent communication)
  • Fine-grained seccomp filters per VMM thread
  • HTTP OpenAPI for management

Use Cases:

  • High-performance cloud-native workloads
  • Applications requiring memory/CPU resizing
  • Security-sensitive deployments (seccomp isolation)

Configuration: See configuration-cloud-hypervisor.toml

Firecracker/KVM

Firecracker is a minimalist VMM built on rust-vmm crates, optimized for serverless and FaaS workloads.

Devices:

  • virtio-vsock (agent communication)
  • virtio-block (block storage)
  • virtio-net (networking)

Limitations:

  • No filesystem sharing (virtio-fs not supported)
  • No device hotplug
  • No VFIO/passthrough support
  • No CPU/memory hotplug
  • Limited CRI API support

Use Cases:

  • Serverless/FaaS workloads
  • Single-tenant microVMs
  • Scenarios prioritizing minimal attack surface

Configuration: See configuration-fc.toml

Hypervisor Comparison Summary

Feature QEMU Cloud Hypervisor Firecracker Dragonball
Maturity Excellent Good Good Good
CRI Compatibility Full Full Partial Full
Filesystem Sharing
Device Hotplug
VFIO/Passthrough
CPU/Memory Hotplug
Security Isolation Good Excellent (seccomp) Excellent Excellent
Startup Latency Good Excellent Excellent Best
Resource Overhead Medium Low Lowest Lowest

Choosing a Hypervisor

Decision Matrix

Requirement Recommended Hypervisor
Full CRI API compatibility QEMU, Cloud Hypervisor, Dragonball
Device passthrough (VFIO) QEMU, Cloud Hypervisor, Dragonball
Minimal resource overhead Dragonball, Firecracker
Fastest startup time Dragonball, Firecracker
Serverless/FaaS Dragonball, Firecracker
Production workloads Dragonball, QEMU
Memory/CPU resizing Dragonball, Cloud Hypervisor, QEMU
Maximum security isolation Cloud Hypervisor (seccomp), Firecracker, Dragonball
Multi-architecture QEMU

Recommendations

For Most Users: Use the default Dragonball VMM with the Kata Containers Rust runtime. It provides the best balance of performance, security, and container density.

For Device Passthrough: Use QEMU, Cloud Hypervisor, or Dragonball if you require VFIO device assignment.

For Serverless: Use Dragonball or Firecracker for ultra-lightweight, single-tenant microVMs.

For Legacy/Ecosystem Compatibility: Use QEMU for its extensive hardware emulation and multi-architecture support.

Hypervisor Configuration

Configuration Files

Each hypervisor has a dedicated configuration file:

Hypervisor Rust Runtime Configuration Go Runtime Configuration
QEMU configuration-qemu-runtime-rs.toml configuration-qemu.toml
Cloud Hypervisor configuration-cloud-hypervisor.toml configuration-clh.toml
Firecracker configuration-rs-fc.toml configuration-fc.toml
Dragonball configuration-dragonball.toml (default) No

Note: Configuration files are typically installed in /opt/kata/share/defaults/kata-containers/ or /opt/kata/share/defaults/kata-containers/runtime-rs/ or /usr/share/defaults/kata-containers/.

Switching Hypervisors

Use the kata-manager tool to switch the configured hypervisor:

# List available hypervisors
$ kata-manager -L

# Switch to a different hypervisor
$ sudo kata-manager -S <hypervisor-name>

For detailed instructions, see the kata-manager documentation.

Hypervisor Versions

The following versions are used in this release (from versions.yaml):

Hypervisor Version Repository
Cloud Hypervisor v51.1 https://github.com/cloud-hypervisor/cloud-hypervisor
Firecracker v1.12.1 https://github.com/firecracker-microvm/firecracker
QEMU v10.2.1 https://github.com/qemu/qemu
Dragonball builtin https://github.com/kata-containers/kata-containers/tree/main/src/dragonball

Note: Dragonball is integrated into the Kata Containers Rust runtime and does not have a separate version number. For the latest hypervisor versions, see the versions.yaml file in the Kata Containers repository.

References