Fabiano Fidêncio f763e9cca9 tests: Add NUMA topology / GPU placement tests to the NV CIs
Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour
on hosts where NUMA is configured by default (qemu-nvidia-gpu,
qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx):

1. Multi-node sandbox (large workload spanning all host NUMA nodes):
   - Guest NUMA node count matches host
   - Guest vCPU distribution is balanced across nodes (max-min <= 1)
   - Guest memory is distributed across NUMA nodes
   - Host-side vCPU pinning is balanced across NUMA nodes

2. Right-sized single-node sandbox (small workload fitting one node):
   - Guest collapses to a single NUMA node
   - All host vCPU threads pinned to that one NUMA node

3. GPU passthrough with VFIO, multi-node:
   - Guest NUMA topology is balanced (same as test 1)
   - Guest GPU's NUMA node matches the host GPU's NUMA node
     (resolved via the vfio-pci,host=<BDF> from the QEMU command
     line and /sys/bus/pci/devices/<BDF>/numa_node)
   - QEMU command line contains pxb-pcie and policy=bind
   - Host vCPU pinning is balanced

4. GPU passthrough with VFIO, right-sized single-node: small workload
   plus GPU that fits in a single host NUMA node:
   - Guest collapses to a single NUMA node
   - The chosen node is the GPU's host NUMA node, not just any node
     that fits — verified by matching host-nodes= in the memory
     backend and pxb-pcie numa_node= against the GPU's host node
   - Guest GPU reports the same NUMA node as the host GPU

5. Explicit numa_mapping in the runtime TOML (QEMU-only):
   - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the
     auto-derive + right-sizing path is bypassed entirely
   - Guest sees exactly 1 NUMA node
   - QEMU memory backend is bound to host node 1 (host-nodes=1,
     policy=bind), not host node 0
   - Host-side vCPU threads land on host node 1
   - Drop-in is removed on teardown so subsequent tests are unaffected

Guest-side checks use a dedicated container image
(quay.io/kata-containers/numa) that reads sysfs and prints results to
stdout — no kubectl exec or CoCo policy overrides needed.

Host-side checks (crictl, pgrep, taskset) run directly on the host
via sudo; a standalone numa-pinning-check.sh script handles the vCPU
thread affinity inspection.  The config.d/ helpers used by test 5 are
runtime-agnostic (probe Go vs runtime-rs layout on disk) but the test
is gated to qemu-* shims since runtime-rs does not yet implement
NUMA.

Skips cleanly on single-NUMA hosts, unsupported hypervisors, or when
no nvidia.com/pgpu resources are available (GPU tests only).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-24 22:00:46 +02:00
2026-05-24 22:00:46 +02:00
2025-09-16 19:16:14 +02:00
2026-02-09 15:03:26 -08:00
2026-03-09 14:52:17 -05:00
2026-05-18 09:47:15 +01:00
2023-11-16 16:09:20 +00:00
2022-02-21 17:01:09 +00:00
2026-05-18 09:47:15 +01:00
2017-12-06 23:01:13 -06:00
2026-05-19 15:32:50 +02:00
2026-05-24 22:00:46 +02:00

CI | Publish Kata Containers payload Kata Containers Nightly CI OpenSSF Scorecard

Kata Containers

Welcome to Kata Containers!

This repository is the home of the Kata Containers code for the 2.0 and newer releases.

If you want to learn about Kata Containers, visit the main Kata Containers website.

Introduction

Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs.

License

The code is licensed under the Apache 2.0 license. See the license file for further details.

Platform support

Kata Containers currently runs on 64-bit systems supporting the following technologies:

Architecture Virtualization technology
x86_64, amd64 Intel VT-x, AMD SVM
aarch64 ("arm64") ARM Hyp
ppc64le IBM Power
s390x IBM Z & LinuxONE SIE

Hardware requirements

The Kata Containers runtime provides a command to determine if your host system is capable of running and creating a Kata Container:

$ kata-runtime check

Notes:

  • This command runs a number of checks including connecting to the network to determine if a newer release of Kata Containers is available on GitHub. If you do not wish this to check to run, add the --no-network-checks option.

  • By default, only a brief success / failure message is printed. If more details are needed, the --verbose flag can be used to display the list of all the checks performed.

  • If the command is run as the root user additional checks are run (including checking if another incompatible hypervisor is running). When running as root, network checks are automatically disabled.

Getting started

See the installation documentation.

Documentation

See the official documentation including:

Configuration

Kata Containers uses a single configuration file which contains a number of sections for various parts of the Kata Containers system including the runtime, the agent and the hypervisor.

Hypervisors

See the hypervisors document and the Hypervisor specific configuration details.

Community

To learn more about the project, its community and governance, see the community repository. This is the first place to go if you wish to contribute to the project.

Getting help

See the community section for ways to contact us.

Raising issues

Please raise an issue in this repository.

Note: If you are reporting a security issue, please follow the vulnerability reporting process

Developers

See the developer guide.

Components

Main components

The table below lists the core parts of the project:

Component Type Description
runtime core Main component run by a container manager and providing a containerd shimv2 runtime implementation.
runtime-rs core The Rust version runtime.
agent core Management process running inside the virtual machine / POD that sets up the container environment.
dragonball core An optional built-in VMM brings out-of-the-box Kata Containers experience with optimizations on container workloads
documentation documentation Documentation common to all components (such as design and install documentation).
tests tests Excludes unit tests which live with the main code.

Additional components

The table below lists the remaining parts of the project:

Component Type Description
packaging infrastructure Scripts and metadata for producing packaged binaries
(components, hypervisors, kernel and rootfs).
kernel kernel Linux kernel used by the hypervisor to boot the guest image. Patches are stored here.
osbuilder infrastructure Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor.
kata-debug infrastructure Utility tool to gather Kata Containers debug information from Kubernetes clusters.
agent-ctl utility Tool that provides low-level access for testing the agent.
kata-ctl utility Tool that provides advanced commands and debug facilities.
trace-forwarder utility Agent tracing helper.
ci CI Continuous Integration configuration files and scripts.
ocp-ci CI Continuous Integration configuration for the OpenShift pipelines.
katacontainers.io Source for the katacontainers.io site.
Webhook utility Example of a simple admission controller webhook to annotate pods with the Kata runtime class

Packaging and releases

Kata Containers is now available natively for most distributions.

General tests

See the tests documentation.

Glossary of Terms

See the glossary of terms related to Kata Containers.

Languages
Rust 60.8%
Go 23.3%
Shell 9%
RPC 4.7%
Makefile 1%
Other 1.2%