mirror of
https://github.com/kata-containers/kata-containers.git
synced 2026-07-01 14:38:33 +00:00
Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour
on hosts where NUMA is configured by default (qemu-nvidia-gpu,
qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx):
1. Multi-node sandbox (large workload spanning all host NUMA nodes):
   - Guest NUMA node count matches host
   - Guest vCPU distribution is balanced across nodes (max-min <= 1)
   - Guest memory is distributed across NUMA nodes
   - Host-side vCPU pinning is balanced across NUMA nodes
2. Right-sized single-node sandbox (small workload fitting one node):
   - Guest collapses to a single NUMA node
   - All host vCPU threads pinned to that one NUMA node
3. GPU passthrough with VFIO, multi-node:
   - Guest NUMA topology is balanced (same as test 1)
   - Guest GPU's NUMA node matches the host GPU's NUMA node
     (resolved via the vfio-pci,host=<BDF> from the QEMU command
     line and /sys/bus/pci/devices/<BDF>/numa_node)
   - QEMU command line contains pxb-pcie and policy=bind
   - Host vCPU pinning is balanced
4. GPU passthrough with VFIO, right-sized single-node (small workload
   plus GPU that fits in a single host NUMA node):
   - Guest collapses to a single NUMA node
   - The chosen node is the GPU's host NUMA node, not just any node
     that fits; verified by matching host-nodes= in the memory
     backend and pxb-pcie numa_node= against the GPU's host node
   - Guest GPU reports the same NUMA node as the host GPU
5. Explicit numa_mapping in the runtime TOML (QEMU-only):
   - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the
     auto-derive + right-sizing path is bypassed entirely
   - Guest sees exactly 1 NUMA node
   - QEMU memory backend is bound to host node 1 (host-nodes=1,
     policy=bind), not host node 0
   - Host-side vCPU threads land on host node 1
   - Drop-in is removed on teardown so subsequent tests are unaffected
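The QEMU command-line checks in tests 3 to 5 can be illustrated with a short Python sketch. This is not the code in k8s-nvidia-numa.bats; the function names `parse_qemu_numa_args` and `sysfs_numa_node_path` are hypothetical, and the sketch assumes the command line is available as a single string (for example, read from the QEMU process on the host).

```python
import shlex


def parse_qemu_numa_args(cmdline: str) -> dict:
    """Collect NUMA-related properties from a QEMU command line (illustrative only)."""
    info = {"vfio_bdfs": [], "host_nodes": [], "policies": [], "pxb_numa_nodes": []}
    for arg in shlex.split(cmdline):
        fields = arg.split(",")
        head = fields[0]
        # Turn "key=value" fields into a dict; bare flags are ignored.
        props = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        if head == "vfio-pci" and "host" in props:
            info["vfio_bdfs"].append(props["host"])
        elif head == "pxb-pcie" and "numa_node" in props:
            info["pxb_numa_nodes"].append(int(props["numa_node"]))
        elif head.startswith("memory-backend-"):
            if "host-nodes" in props:
                info["host_nodes"].append(props["host-nodes"])
            if "policy" in props:
                info["policies"].append(props["policy"])
    return info


def sysfs_numa_node_path(bdf: str) -> str:
    """Build the sysfs path that reports a PCI device's host NUMA node."""
    # Assumption: a BDF given without a PCI domain belongs to domain 0000.
    if bdf.count(":") == 1:
        bdf = "0000:" + bdf
    return f"/sys/bus/pci/devices/{bdf}/numa_node"


if __name__ == "__main__":
    # Made-up command line for demonstration; ids and sizes are placeholders.
    demo = (
        "qemu-system-x86_64 "
        "-object memory-backend-ram,id=mem0,size=4G,host-nodes=1,policy=bind "
        "-device pxb-pcie,id=pxb0,bus_nr=32,numa_node=1 "
        "-device vfio-pci,host=0000:41:00.0,bus=rp0"
    )
    parsed = parse_qemu_numa_args(demo)
    print(parsed)
    print(sysfs_numa_node_path(parsed["vfio_bdfs"][0]))
```

A test in the style of test 5 would then compare `host_nodes` and `policies` against the expected `1` and `bind`, and a GPU test would compare the value read from the sysfs path against the node reported inside the guest.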
Guest-side checks use a dedicated container image
(quay.io/kata-containers/numa) that reads sysfs and prints results to
stdout — no kubectl exec or CoCo policy overrides needed.
Host-side checks (crictl, pgrep, taskset) run directly on the host
via sudo; a standalone numa-pinning-check.sh script handles the vCPU
thread affinity inspection. The config.d/ helpers used by test 5 are
runtime-agnostic (they probe the on-disk layout to tell the Go runtime
from runtime-rs), but the test is gated to qemu-* shims since
runtime-rs does not yet implement NUMA.
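The host-side balance rule (max-min <= 1) can be sketched in Python as follows. This is an illustration of the idea, not a port of numa-pinning-check.sh. It assumes each vCPU thread's affinity and each node's CPU set are available as Linux cpulist strings such as "0-3,8", and all function names are hypothetical.

```python
def parse_cpulist(text: str) -> set:
    """Expand a Linux cpulist string such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def threads_per_node(affinities: list, node_cpus: dict) -> dict:
    """Count vCPU threads whose affinity lies entirely within one NUMA node."""
    counts = {node: 0 for node in node_cpus}
    for affinity in affinities:
        owner = None
        for node, cpus in node_cpus.items():
            if affinity and affinity <= cpus:
                owner = node
                break
        if owner is None:
            # The thread is unpinned or its affinity spans several nodes.
            raise ValueError(f"affinity {sorted(affinity)} is not confined to one node")
        counts[owner] += 1
    return counts


def is_balanced(counts: dict) -> bool:
    """Apply the max-min <= 1 rule to per-node thread counts."""
    values = list(counts.values())
    return not values or max(values) - min(values) <= 1


if __name__ == "__main__":
    # Made-up two-node host with three vCPU threads.
    nodes = {0: parse_cpulist("0-3"), 1: parse_cpulist("4-7")}
    vcpus = [parse_cpulist("0-3"), parse_cpulist("4-7"), parse_cpulist("0-3")]
    per_node = threads_per_node(vcpus, nodes)
    print(per_node, is_balanced(per_node))
```

For the single-node tests the same counts apply with a different expectation: every thread should be counted on one node and the other nodes should stay at zero.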
Skips cleanly on single-NUMA hosts, unsupported hypervisors, or when
no nvidia.com/pgpu resources are available (GPU tests only).
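The skip conditions above can be sketched in Python as a single decision function. The names `host_numa_node_count` and `skip_reason` are hypothetical, and the sketch assumes host NUMA nodes can be counted from the nodeN directories under /sys/devices/system/node and that the number of allocatable nvidia.com/pgpu resources has already been obtained elsewhere.

```python
import os
import re
from typing import Optional

# Shim names taken from the commit message above.
NUMA_SHIMS = ("qemu-nvidia-gpu", "qemu-nvidia-gpu-snp", "qemu-nvidia-gpu-tdx")


def host_numa_node_count(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count nodeN directories under the NUMA sysfs root."""
    return sum(
        1
        for name in os.listdir(sysfs_root)
        if re.fullmatch(r"node\d+", name)
        and os.path.isdir(os.path.join(sysfs_root, name))
    )


def skip_reason(node_count: int, shim: str, needs_gpu: bool, pgpu_count: int) -> Optional[str]:
    """Return why a test should be skipped, or None if it should run."""
    if node_count < 2:
        return "host has a single NUMA node"
    if shim not in NUMA_SHIMS:
        return f"hypervisor {shim} is not supported by the NUMA tests"
    if needs_gpu and pgpu_count < 1:
        return "no nvidia.com/pgpu resources available"
    return None


if __name__ == "__main__":
    # Placeholder inputs: a two-node host running a GPU test with no GPUs.
    print(skip_reason(2, "qemu-nvidia-gpu", needs_gpu=True, pgpu_count=0))
```

Returning a reason string rather than a boolean lets the caller pass the message straight to the test framework's skip mechanism.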
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Documentation
The Kata Containers documentation repository hosts overall system documentation, with information common to multiple components.
For details of the other Kata Containers repositories, see the repository summary.
Getting Started
- Installation guides: Install and run Kata Containers with Docker or Kubernetes
Tracing
See the tracing documentation.
More User Guides
- Upgrading: how to upgrade from Clear Containers and runV to Kata Containers and how to upgrade an existing Kata Containers system to the latest version.
- Limitations: differences and limitations compared with the default Docker runtime, runc.
How-to guides
See the how-to documentation.
Kata Use-Cases
- GPU Passthrough with Kata
- SR-IOV with Kata
- Intel QAT with Kata
- SPDK vhost-user with Kata
- Intel SGX with Kata
- IBM Crypto Express passthrough with Confidential Containers
Developer Guide
Documents that help you understand and contribute to Kata Containers.
Design and Implementations
- Kata Containers Architecture: Architectural overview of Kata Containers
- Kata Containers CI: Kata Containers CI document
- Kata Containers E2E Flow: The entire end-to-end flow of Kata Containers
- Kata Containers design: More Kata Containers design documents
- Kata Containers threat model: Kata Containers threat model
How to Contribute
- Developer Guide: Set up the Kata Containers development environment
- How to contribute to Kata Containers
- Code of Conduct
- How to submit a blog post
Help Writing a Code PR
Help Writing Unit Tests
Help Improving the Documents
Code Licensing
- Licensing: About the licensing strategy of Kata Containers.
The Release Process
Presentations
Website Changes
If you have a suggestion for how we can improve the website, please raise an issue (or a PR) on the repository that holds the source for the website.