188 Commits

Author SHA1 Message Date
LandonTClipp
4a9da5d37a chore(docs): Add info on building and running custom artifacts
I created this over the course of testing my VISIBLE_CDI_DEVICES
changes. I think this will be useful to folks who don't understand the
right way to deploy custom artifacts.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
Fabiano Fidêncio
110843d6e1 Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal
runtime(-rs): remove file_mem_backend config option
2026-06-12 17:13:04 +02:00
Manuel Huber
70d8f1bf3d runtime: remove file_mem_backend config option
Remove the Go runtime file_mem_backend and valid_file_mem_backends
config knobs, along with the corresponding sandbox annotation handling.

The runtime still enables file-backed shared memory automatically for
virtio-fs by using /dev/shm as the backing directory. This only removes
the user-selectable backend path.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-12 00:07:16 +00:00
Manuel Huber
86fd65271c runtime-rs: remove file_mem_backend config option
While the config knob is being parsed, it is being unused in the
rust shim. This renders the config knob useless. Remove the
file_mem_backend config option as there is no current users for it.
As this option is being usable in the go shim, we leave it intact.

For the rust shim, /dev/shm is still being used in a similar way to
the go shim when filesystem sharing is enabled (virtio-fs). Future
use cases where other file_mem_backends are being utilized are
currently planning to define these backends in a similar manner:
based on the configuration/platform, determine the proper file
memory backend, but do not let end users determine the file memory
backend.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-06-12 00:07:16 +00:00
Alex Lyn
4eb7512e7b docs: Update how-to guide for virtio-fs-nydus with runtime-rs
Add comprehensive documentation for using virtio-fs-nydus shared
filesystem with Kata Containers. This guide covers:
(1) Clarify configuration options for virtio-fs-nydus and nydus image
    preparation and usage.
(2) Update daemon configuration and lifecycle management and introduce
    standalone, inline nydus architecture.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Fabiano Fidêncio
2b6efda67d docs: document the standalone kata-monitor image
kata-monitor is published as a standalone container image starting
with 3.32.0; point users at it from the metrics design doc and the
Prometheus-on-Kubernetes how-to, and switch the DaemonSet manifest to
the dedicated image (keeping the runtime endpoint/listen settings and
hostPath cleanups).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
3dc02a8604 Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only
runtime-rs: Support erofs snapshotter with gpt vmdk mode
2026-05-25 16:29:59 +02:00
Alex Lyn
53699b0170 docs: Reset max_unmerged_layers = 0 for gpt+vmdk mode
As max_unmerged_layers = 1 is just for fsmerge mode, as containerd
temperally unsupport fsmerge, we just reset it with default 0.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:13:28 +08:00
Fabiano Fidêncio
f763e9cca9 tests: Add NUMA topology / GPU placement tests to the NV CIs
Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour
on hosts where NUMA is configured by default (qemu-nvidia-gpu,
qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx):

1. Multi-node sandbox (large workload spanning all host NUMA nodes):
   - Guest NUMA node count matches host
   - Guest vCPU distribution is balanced across nodes (max-min <= 1)
   - Guest memory is distributed across NUMA nodes
   - Host-side vCPU pinning is balanced across NUMA nodes

2. Right-sized single-node sandbox (small workload fitting one node):
   - Guest collapses to a single NUMA node
   - All host vCPU threads pinned to that one NUMA node

3. GPU passthrough with VFIO, multi-node:
   - Guest NUMA topology is balanced (same as test 1)
   - Guest GPU's NUMA node matches the host GPU's NUMA node
     (resolved via the vfio-pci,host=<BDF> from the QEMU command
     line and /sys/bus/pci/devices/<BDF>/numa_node)
   - QEMU command line contains pxb-pcie and policy=bind
   - Host vCPU pinning is balanced

4. GPU passthrough with VFIO, right-sized single-node: small workload
   plus GPU that fits in a single host NUMA node:
   - Guest collapses to a single NUMA node
   - The chosen node is the GPU's host NUMA node, not just any node
     that fits — verified by matching host-nodes= in the memory
     backend and pxb-pcie numa_node= against the GPU's host node
   - Guest GPU reports the same NUMA node as the host GPU

5. Explicit numa_mapping in the runtime TOML (QEMU-only):
   - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the
     auto-derive + right-sizing path is bypassed entirely
   - Guest sees exactly 1 NUMA node
   - QEMU memory backend is bound to host node 1 (host-nodes=1,
     policy=bind), not host node 0
   - Host-side vCPU threads land on host node 1
   - Drop-in is removed on teardown so subsequent tests are unaffected

Guest-side checks use a dedicated container image
(quay.io/kata-containers/numa) that reads sysfs and prints results to
stdout — no kubectl exec or CoCo policy overrides needed.

Host-side checks (crictl, pgrep, taskset) run directly on the host
via sudo; a standalone numa-pinning-check.sh script handles the vCPU
thread affinity inspection.  The config.d/ helpers used by test 5 are
runtime-agnostic (probe Go vs runtime-rs layout on disk) but the test
is gated to qemu-* shims since runtime-rs does not yet implement
NUMA.

Skips cleanly on single-NUMA hosts, unsupported hypervisors, or when
no nvidia.com/pgpu resources are available (GPU tests only).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-24 22:00:46 +02:00
Fabiano Fidêncio
20705470e9 docs: Add NUMA support guide for Kata Containers with QEMU
Add a step-by-step how-to guide covering host inspection, Kata NUMA
drop-in setup (via kata-deploy Helm and manual config.d/), pod
deployment examples, and guest/host verification procedures.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-24 22:00:46 +02:00
Fabiano Fidêncio
ffa59ce3aa Merge commit from fork
runtime: disable virtiofsd extra-args annotation by default
2026-05-19 08:22:12 +02:00
stevenhorsman
7aa3f7777a runtime-rs: Actually send cdh_api_timeout as kernel_param
The cdh_api_timeout_ms configuration parameter wasn't being used
anywhere, so add the logic to process it as an annotation into the runtime-rs
agent config and then use that as a kernel_param.

Assisted-by IBM Bob

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-12 08:41:05 +01:00
Fabiano Fidêncio
c945d2701c runtime: disable virtiofsd extra-args annotation by default
Keep virtio_fs_extra_args support in code, but remove it from default
enable_annotations and add explicit security warnings in Makefiles and
docs.

Release-note note: mirror this hardening in release notes so operators
know this remains opt-in and carries host-side risk when enabled.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-09 13:21:39 +02:00
Fabiano Fidêncio
3ef2c5db65 docs: docker: Update docs to mention runtime-rs and what's tested
Now that we're adding support for the rust runtime, let's also update
the docs.

We may also need to update the docs again once we start testing with
different VMMs, but that's not in the scope for this PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-28 10:22:21 +02:00
Alex Tibbles
90286d3072 docs: add a simple how-to on using kata from docker
Create a new how-to covering simple installation and configuration of
kata as a docker daemon runtime.

Signed-off-by: Alex Tibbles <alex@bleg.org>
2026-04-27 17:51:13 +02:00
Fabiano Fidêncio
56c6f8bbb2 docs: Fix shellcheck issues in offline_cpu.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:08 +02:00
Alex Lyn
59609463e0 docs: Update kernel modules loading document
- Restructure document with clearer sections and better readability
- Add configuration format examples for both runtimes
- Add technical details including data flow and implementation references
- Add debugging section for troubleshooting

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Alex Lyn
27341f45f1 docs: Add how-to guide for using fsmerged EROFS rootfs with Kata
Document the end-to-end workflow for using the containerd EROFS
snapshotter with Kata Containers runtime-rs, covering containerd
configuration, Kata QEMU settings, and pod deployment examples
via crictl/ctr/Kubernetes.

Include prerequisites (containerd >= 2.2, runtime-rs main branch),
QEMU VMDK format verification command, architecture diagram,
VMDK descriptor format reference, and troubleshooting guide.

Note that Cloud Hypervisor, Firecracker, and Dragonball do not
support VMDK block devices and are currently unsupported for
fsmerged EROFS rootfs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
LandonTClipp
56cdfa831f docs: Add annotation config to doc site
Adding the pod annotation config to the doc site. A symlink is created
at docs/pod-annotations.md that points to
how-to/how-to-set-sandbox-config-kata.md so that the URL for this file will be
created at `/pod-annotations`. Also adding brief contrbuting guidelines and
how-to's for running the documentation site locally for local previews.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-04-15 14:48:01 +01:00
Alex Lyn
9f6bce9517 docs: Remove containerd settings from crio dedicated document
As the document is just for CRI-O, we need remove containerd related
settings from it and make it clear for users.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00
Alex Lyn
b04260f926 docs: Rename run-kata-with-k8s with adding crio
As previous document of run-kata-with-k8s.md is not clear for new comers
to quickly find the way to run kata with k8s/crio. In this commit, it
just rename the document name and make it clear.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00
Alex Lyn
004333ed71 docs: Update containerd-kata.md with clear settings
In this commit:
(1) Update containerd config with kata configurations
(2) Add more comments to guide how to use containerd/kata with default
setting and customized configure setting;
(3) Update the usage of containerd cmd tool ctr with explicitly
specified runtime-config-path options to make it work.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00
Alex Lyn
8dae67794a docs: switch to blockfile snapshotter for SEV-SNP in runtime-rs
Updated the configuration guide to use `shared_fs = "none"`. This
change reflects that `virtio-9p` is deprecated in `runtime-rs` and
recommends the blockfile snapshotter as a stable alternative to
the buggy `virtio-fs` in SEV-SNP QEMU versions.

But this's limited in the nerdctl or ctr tools.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00
Alex Lyn
75ecfe3fe2 docs: Fix volume type and fs type
Correct the volume type with `volume-type` and fix the fs type
with `fstype`.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00
Alex Lyn
a923bb2917 docs: Add document for how-to-use passthroughfd-IO within runtime-rs
This document describes the Passthrough-FD (pass-fd) technology
implemented in Kata Containers to optimize IO performance. By bypassing
the intermediate proxy layers, this technology significantly reduces
latency and CPU overhead for container IO streams.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-29 19:17:03 +02:00
PiotrProkop
64735222c6 runtime: allow specifying logical/physical sector size for block devices
Add two new configuration knobs that control the logical and physical
sector sizes advertised by virtio-blk devices to the guest:

  block_device_logical_sector_size  (config file)
  block_device_physical_sector_size (config file)

  io.katacontainers.config.hypervisor.blk_logical_sector_size  (annotation)
  io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation)

The annotation names are abbreviated relative to the config file keys
because Kubernetes enforces a 63-character limit on annotation name
segments, and the full names would exceed it.

Both settings default to 0 (let QEMU decide). When set, they are passed
as logical_block_size and physical_block_size in the QMP device_add
command during block device hotplug.

Setting logical_sector_size smaller then container filesystem
block size will cause EINVAL on mount. The physical_sector_size can
always be set independently.

Values must be 0 or a power of 2 in the range [512, 65536]; other
values are rejected with an error at sandbox creation time.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2026-03-27 18:56:54 +01:00
stevenhorsman
d06dadd8ef docs: Spelling updates
Either fixing typos, or including program/repo name in
backticks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
Dan Mihai
3ea23528a5 docs: require user/group/fsGroup/supplementalGroups
Add a nydus guest-pull limitation explaining that specifying runAsUser,
runAsGroup, fsGroup, and supplementalGroups are required.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-02 23:48:36 +01:00
Fabiano Fidêncio
96c20f8baa tests: k8s: set CreateContainerRequest (on free runners) timeout to 600s
Set KubeletConfiguration runtimeRequestTimeout to 600s mainly for CoCo
(Confidential Containers) tests, so container creation (attestation,
policy, image pull, VM start) does not hit the default CRI timeout.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
5c0269881e tests: Make editorconfig-checker happy
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
  *.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
  checks project code
- Leave generated and binary assets unchanged (excluded from checker)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 21:58:28 +01:00
Dan Mihai
20ca4d2d79 runtime: DEFDISABLEBLOCK := true
1. Add disable_block_device_use to CLH settings file, for parity with
   the already existing QEMU settings.

2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After
   this change, Kata Guests will use by default virtio-fs to access
   container rootfs directories from their Hosts. Hosts that were
   designed to use Host block devices attached to the Guests can
   re-enable these rootfs block devices by changing the value of
   disable_block_device_use back to false in their settings files.

3. Add test using container image without any rootfs layers. Depending
   on the container runtime and image snapshotter being used, the empty
   container rootfs image might get stored on a host block device that
   cannot be safely hotplugged to a guest VM, because the host is using
   the same block device.

4. Add block device hotplug safety warning into the Kata Shim
   configuration files.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Cameron McDermott <cameron@northflank.com>
2026-01-28 19:47:49 +01:00
Manuel Huber
65aa99f291 docs: Fix trusted-image-storage reference
The sample uses a volume device name which does not exist,
hence fix.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-09 11:41:18 +00:00
Alex Lyn
82e8e9fbe0 doc: add block device's settings to the doc page
Add the block device specific annotations which is dedicated within
runtime-rs for num_queues and queue_sie to the document to help
users set the two parameters.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-11 21:10:22 +01:00
Anton Ippolitov
23c46b8a00 docs: Update devmapper containerd plugin name
The Firecracker installation docs had an outaded containerd configuration for the devmapper plugin.
This commit updates the instructions so that they are compatible with more recent versions of containerd.

Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
2025-11-05 18:42:29 +01:00
ssc
551caad4b1 docs: add guide on VM templating usage in runtime-rs
- Explained the concept and benefits of VM templating
- Provided step-by-step instructions for enabling VM templating
- Detailed the setup for using snapshotter in place of VirtioFS for template-based VM creation
- Added performance test results comparing template-based and direct VM creation

Signed-off-by: ssc <741026400@qq.com>
2025-10-30 15:18:31 +08:00
wangxinge
8e1b33cc14 docs: add document for seccomp
This commit adds a document to use
seccomp in runtime-rs

Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>
2025-10-09 13:25:17 +08:00
Aurélien Bombo
476c827fca Merge pull request #11878 from kata-containers/sprt/privileged-docs
docs: Document `privileged_without_host_devices=false` as unsupported
2025-10-08 11:12:45 -05:00
Fabiano Fidêncio
8c4bad68a8 kata-deploy: Remove kustomize yamls, rely on helm-chart only
As the kata-deploy helm chart has been the only way we've been testing
kata-containers deployment as part of our CI, it's time to finally get
rid of the kustomize yamls and avoid us having to maintain two different
methods (with one of those not being tested).

Here I removed:
* kata-deploy yamls and kustomize yamls
* kata-cleanup yamls and kustomize yamls
* kata-rbac yals and kustomize yamls
* README.md for the kustomize yamls was removed

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-08 16:54:19 +02:00
Aurélien Bombo
6ff78373cf docs: Document privileged_without_host_devices=false as unsupported
Document that privileged containers with
privileged_without_host_devices=false are not generally supported.

When you try the above, the runtime will pass all the host devices to Kata
in the OCI spec, and Kata will fail to create the container for various
reasons depending on the setup, e.g.:

 - Attempting to hotplug uninitialized loop devices.
 - Attempting to remount /dev devices on themselves when the agent had
   already created them as default devices (e.g. /dev/full).
 - "Conflicting device updates" errors.
 - And more...

privileged_without_host_devices was originally created to support
Kata [1][2] and lots of people are having issues when it's set to
false [3].

[1] https://github.com/kata-containers/runtime/issues/1568
[2] https://github.com/containerd/cri/pull/1225
[3] https://github.com/kata-containers/kata-containers/issues?q=is%3Aissue%20%20in%3Atitle%20privileged

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-10-02 15:21:19 -05:00
Alex Lyn
f254eeb0e9 CI: Keep base64 output is a single line
This commit addresses an issue where base64 output, when used with a
default configuration, would introduce newlines, causing decoding to
fail on the runtime.

The fix ensures base64 output is a single, continuous line using the -w0
flag. This guarantees the encoded string is a valid Base64 sequence,
preventing potential runtime errors caused by invalid characters.

Note that: When you use the base64 command without any parameters, it
typically automatically adds newlines to the output, usually every 76 chars.

In contrast, base64 -w0 explicitly tells the command not to add any
newlines (-w for wrap, and 0 for a width of zero), which results in a
continuous string with no whitespace.

This is a critical distinction because if you pass a Base64 string with
newlines to a runtime, it may be treated as an invalid string, causing
the decoding process to fail.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-09-23 11:58:53 +08:00
Saul Paredes
cc73b14e26 docs: update policy docs
Update policy docs to use initdata annotation and encoding

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2025-09-15 11:40:29 -07:00
Fabiano Fidêncio
ad240a39e6 kata-deploy: tools: tests: Use zstd instead of xz
Although the compress ratio is not as optimal as using xz, it's way
faster to compress / uncompress, and it's "good enough".

This change is not small, but it's still self-contained, and has to get
in at once, in order to help bisects in the future.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-08-21 19:53:55 +02:00
Paul Meyer
c4815eb3ad runtime: add option to force guest pull
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.

Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-05-27 12:42:00 +02:00
Hui Zhu
17af28acad docs: Add how-to-use-memory-agent.md to howto
Add how-to-use-memory-agent.md (How to use mem-agent to decrease the
memory usage of Kata container) to docs to show how to use mem-agent.

Fixes: #11013

Signed-off-by: Hui Zhu <teawater@gmail.com>
2025-04-02 17:45:59 +08:00
Ryan Savino
90e2b7d1bc docs: updated build and host setup instructions for SNP
Referenced AMD developer page for latest SEV firmware.
Instructions to point to upstream 6.11 kernel or later.
Referenced sev-utils and AMDESE fork for kernel setup.

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2025-01-28 18:09:40 -06:00
Shunsuke Kimura
706e8bce89 docs: change from OVMF.fd to AmdSev.fd
change the build method to generate OVMF for AmdSev.
This commit adds `ovmf_build=sev` env parameter.
<638c2c4164>

Fixes #10378

Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
2024-11-15 11:24:45 +09:00
Shunsuke Kimura
d7f6fabe65 docs: fix build-kernel.sh option
`build-kernel.sh` no longer takes an argument for the -x option.
<6c3338271b>

Fixes #10378

Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
2024-11-15 11:24:45 +09:00
Pradipta Banerjee
6f1ba007ed runtime: Add GPU annotations for remote hypervisor
Add GPU annotations for remote hypervisor to help
with the right instance selection based on number of GPUs
and model

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2024-10-29 10:28:21 -04:00
Fabiano Fidêncio
fefcf7cfa4 acrn: Drop support
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-19 16:05:43 +02:00
Hyounggyu Choi
8d609e47fb doc: Update how-to-run-kata-containers-with-SE-VMs.md
The following changes have been made:

- Remove unnecessary `sudo`
- Add an error message where an incorrect host key document is used
- Add a missing artifact `kernel-confidential-modules`
- Make a variable `kernel_version` and replace it with relevant hits

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-16 12:53:30 +02:00