mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 14:38:33 +00:00


Fabiano Fidêncio f763e9cca9 tests: Add NUMA topology / GPU placement tests to the NV CIs

Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour
for the shims that configure NUMA by default (qemu-nvidia-gpu,
qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx):

1. Multi-node sandbox (large workload spanning all host NUMA nodes):
   - Guest NUMA node count matches host
   - Guest vCPU distribution is balanced across nodes (max-min <= 1)
   - Guest memory is distributed across NUMA nodes
   - Host-side vCPU pinning is balanced across NUMA nodes

2. Right-sized single-node sandbox (small workload fitting one node):
   - Guest collapses to a single NUMA node
   - All host vCPU threads pinned to that one NUMA node

3. GPU passthrough with VFIO, multi-node:
   - Guest NUMA topology is balanced (same as test 1)
   - Guest GPU's NUMA node matches the host GPU's NUMA node
     (resolved via the vfio-pci,host=<BDF> from the QEMU command
     line and /sys/bus/pci/devices/<BDF>/numa_node)
   - QEMU command line contains pxb-pcie and policy=bind
   - Host vCPU pinning is balanced

4. GPU passthrough with VFIO, right-sized single-node (small workload
   plus a GPU that together fit in a single host NUMA node):
   - Guest collapses to a single NUMA node
   - The chosen node is the GPU's host NUMA node, not just any node
     that fits — verified by matching host-nodes= in the memory
     backend and pxb-pcie numa_node= against the GPU's host node
   - Guest GPU reports the same NUMA node as the host GPU

5. Explicit numa_mapping in the runtime TOML (QEMU-only):
   - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the
     auto-derive + right-sizing path is bypassed entirely
   - Guest sees exactly 1 NUMA node
   - QEMU memory backend is bound to host node 1 (host-nodes=1,
     policy=bind), not host node 0
   - Host-side vCPU threads land on host node 1
   - Drop-in is removed on teardown so subsequent tests are unaffected
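The QEMU command-line checks in tests 3 to 5 (vfio-pci,host=<BDF>, pxb-pcie,
host-nodes=, policy=bind) can be sketched in Python as below. This is not
the bats implementation. It assumes the arguments are read from
/proc/<pid>/cmdline and that each -device/-object value is a plain
comma-separated "driver,key=value" list; all helper names are hypothetical.

```python
# Hypothetical sketch, not the code in k8s-nvidia-numa.bats.
# Assumes QEMU's argv comes from /proc/<pid>/cmdline and that -device/-object
# values are plain "driver,key=value,..." strings without escaped commas.
from pathlib import Path


def read_qemu_args(pid: int) -> list[str]:
    """Return QEMU's argv; /proc/<pid>/cmdline is NUL-separated."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return [arg.decode() for arg in raw.split(b"\0") if arg]


def parse_props(arg: str) -> tuple[str, dict[str, str]]:
    """Split 'driver,key=value,...' into the driver name and its properties."""
    driver, *items = arg.split(",")
    props = {}
    for item in items:
        key, _, value = item.partition("=")
        props[key] = value  # a repeated key keeps only its last value
    return driver, props


def placement_summary(args: list[str]) -> dict:
    """Collect the NUMA-related settings the tests look for."""
    parsed = [parse_props(arg) for arg in args]
    backends = [p for d, p in parsed if d.startswith("memory-backend")]
    return {
        "gpu_bdfs": [p["host"] for d, p in parsed
                     if d == "vfio-pci" and "host" in p],
        "pxb_numa_nodes": [int(p["numa_node"]) for d, p in parsed
                           if d == "pxb-pcie" and "numa_node" in p],
        "backend_host_nodes": [p["host-nodes"] for p in backends
                               if "host-nodes" in p],
        "policy_bind": any(p.get("policy") == "bind" for p in backends),
    }


def host_gpu_numa_node(bdf: str) -> int:
    """NUMA node of a host PCI device; the kernel reports -1 when unknown."""
    return int(Path("/sys/bus/pci/devices", bdf, "numa_node").read_text())
```

A right-sized check in the spirit of test 4 would then require each
backend_host_nodes value and each pxb_numa_nodes value to equal
host_gpu_numa_node(bdf) for the passed-through GPU.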

Guest-side checks use a dedicated container image
(quay.io/kata-containers/numa) that reads sysfs and prints results to
stdout — no kubectl exec or CoCo policy overrides needed.
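The kind of sysfs read the guest-side image performs can be sketched as
follows. It is an illustration, not the source of
quay.io/kata-containers/numa: it counts CPUs per node from
/sys/devices/system/node/node*/cpulist, takes MemTotal from each node's
meminfo, and applies the max - min <= 1 balance rule from test 1. The
function names are hypothetical.

```python
# Hypothetical guest-side sketch; not the code shipped in the numa image.
from pathlib import Path

NODE_ROOT = "/sys/devices/system/node"


def parse_cpulist(text: str) -> list[int]:
    """Expand a sysfs cpulist such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_dirs(root: str = NODE_ROOT) -> list[Path]:
    """Return node0, node1, ... sorted numerically (empty if root is absent)."""
    return sorted(Path(root).glob("node[0-9]*"), key=lambda p: int(p.name[4:]))


def guest_numa_report(root: str = NODE_ROOT) -> tuple[dict[int, int], dict[int, int]]:
    """Return ({node: vCPU count}, {node: MemTotal in kB})."""
    cpus, mem_kb = {}, {}
    for node in node_dirs(root):
        nid = int(node.name[4:])
        cpus[nid] = len(parse_cpulist((node / "cpulist").read_text()))
        for line in (node / "meminfo").read_text().splitlines():
            fields = line.split()  # e.g. "Node 0 MemTotal:  16384000 kB"
            if len(fields) >= 4 and fields[2] == "MemTotal:":
                mem_kb[nid] = int(fields[3])
    return cpus, mem_kb


def is_balanced(counts: dict[int, int]) -> bool:
    """True when per-node counts differ by at most one (max - min <= 1)."""
    values = list(counts.values())
    return bool(values) and max(values) - min(values) <= 1
```

Inside the guest, printing the result of guest_numa_report() to stdout
mirrors the approach described above: the node count is len(cpus), and
the balance check is is_balanced(cpus).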

Host-side checks (crictl, pgrep, taskset) run directly on the host
via sudo; a standalone numa-pinning-check.sh script handles the vCPU
thread affinity inspection. The config.d/ helpers used by test 5 are
runtime-agnostic (they probe for the Go or runtime-rs layout on disk),
but the test is gated to qemu-* shims because runtime-rs does not yet
implement NUMA.
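A Python sketch of the vCPU affinity inspection that numa-pinning-check.sh
performs could look like this. It is not that script. It assumes QEMU
names its vCPU threads "CPU <n>/KVM" (as it does when thread naming is
enabled, e.g. with -name ...,debug-threads=on), reads each thread's
allowed CPUs with os.sched_getaffinity() instead of taskset, and maps
CPUs to nodes through the /sys/devices/system/cpu/cpu*/node* links. All
helper names are hypothetical.

```python
# Hypothetical sketch of host-side vCPU pinning checks; not numa-pinning-check.sh.
# Assumes vCPU threads are named "CPU <n>/KVM" (QEMU thread naming enabled).
import os
from pathlib import Path
from typing import Optional


def cpu_to_node(root: str = "/sys/devices/system/cpu") -> dict[int, int]:
    """Map each host CPU to its NUMA node via the cpuN/nodeM sysfs links."""
    mapping = {}
    for link in Path(root).glob("cpu[0-9]*/node[0-9]*"):
        mapping[int(link.parent.name[3:])] = int(link.name[4:])
    return mapping


def vcpu_affinities(qemu_pid: int) -> dict[int, set[int]]:
    """Return {tid: allowed CPUs} for the QEMU process's vCPU threads."""
    result = {}
    for task in Path(f"/proc/{qemu_pid}/task").iterdir():
        name = (task / "comm").read_text().strip()
        if name.startswith("CPU ") and name.endswith("/KVM"):
            tid = int(task.name)
            result[tid] = os.sched_getaffinity(tid)  # same mask taskset -p reports
    return result


def threads_per_node(affinities: dict[int, set[int]],
                     cpu_node: dict[int, int]) -> dict[Optional[int], int]:
    """Count threads confined to one node; key None means a thread spans nodes."""
    counts: dict[Optional[int], int] = {}
    for cpus in affinities.values():
        nodes = {cpu_node.get(c, -1) for c in cpus}  # -1: CPU has no node link
        key = nodes.pop() if len(nodes) == 1 else None
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With counts = threads_per_node(vcpu_affinities(pid), cpu_to_node()), the
balanced pinning of test 1 means no None key and per-node counts within
one of each other across all host nodes, while tests 2 and 5 expect a
single key (the chosen host node, node 1 in test 5).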

The tests skip cleanly on single-NUMA hosts and unsupported
hypervisors; the GPU tests also skip when no nvidia.com/pgpu resources
are available.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>

2026-05-24 22:00:46 +02:00

assets/images

docs: Move to mkdocs-material, port Helm to docs site

2026-03-20 14:51:39 -05:00

design

ci: rename cloud-hypervisor to clh-runtime-rs

2026-04-28 10:58:01 -05:00

how-to

tests: Add NUMA topology / GPU placement tests to the NV CIs

2026-05-24 22:00:46 +02:00

install

docs: Remove the dedicated installation guide for runtime-rs

2026-03-29 19:17:03 +02:00

presentations

doc: Fix uninlined_format_args in examples

2025-12-22 19:49:27 +00:00

threat-model

docs: Spelling updates

2026-03-19 10:22:54 +00:00

use-cases

docs: Add NUMA support guide for Kata Containers with QEMU

2026-05-24 22:00:46 +02:00

.nav.yml

docs: Add NUMA support guide for Kata Containers with QEMU

2026-05-24 22:00:46 +02:00

Blog-Post-Submission-Guide.md

tests: Make editorconfig-checker happy

2026-02-10 21:58:28 +01:00

code-pr-advice.md

static-checks: Delete kata-spell-check

2026-03-19 10:22:54 +00:00

Debug-shim-guide.md

debugging: adding a script and instructions for debugging the GO shim

2024-05-14 11:12:31 +02:00

Developer-Guide.md

docs: Rename run-kata-with-k8s with adding crio

2026-03-29 19:17:03 +02:00

doc-contributing.md

docs: Add annotation config to doc site

2026-04-15 14:48:01 +01:00

Dockerfile

docs: Move to mkdocs-material, port Helm to docs site

2026-03-20 14:51:39 -05:00

Documentation-Requirements.md

static-checks: Delete kata-spell-check

2026-03-19 10:22:54 +00:00

helm-configuration.md

ci: rename cloud-hypervisor to clh-runtime-rs

2026-04-28 10:58:01 -05:00

hypervisors.md

docs: Move to mkdocs-material, port Helm to docs site

2026-03-20 14:51:39 -05:00

index.md

docs: Move to mkdocs-material, port Helm to docs site

2026-03-20 14:51:39 -05:00

installation.md

docs: Move to mkdocs-material, port Helm to docs site

2026-03-20 14:51:39 -05:00

Licensing-strategy.md

docs: Change location of static checks script

2023-12-15 17:13:02 -08:00

Limitations.md

docs: require user/group/fsGroup/supplementalGroups

2026-03-02 23:48:36 +01:00

Makefile

docs: merge documentation repository

2020-06-23 21:27:23 -07:00

pod-annotations.md

docs: Add annotation config to doc site

2026-04-15 14:48:01 +01:00

prerequisites.md

docs: Move to mkdocs-material, port Helm to docs site

2026-03-20 14:51:39 -05:00

README.md

doc: Add MSRV comments to toolchain guidance

2026-04-16 12:06:46 +01:00

Release-Process.md

docs: Update release process notes

2026-03-19 15:14:23 -07:00

requirements.txt

fix: add click 8.3.3 to docs requirements

2026-05-13 10:11:58 +01:00

runtime-configuration.md

runtime: disable virtiofsd extra-args annotation by default

2026-05-09 13:21:39 +02:00

Toolchain-Guidance.md

doc: Add MSRV comments to toolchain guidance

2026-04-16 12:06:46 +01:00

tracing.md

tests: Make editorconfig-checker happy

2026-02-10 21:58:28 +01:00

Unit-Test-Advice.md

doc: Fix uninlined_format_args in examples

2025-12-22 19:49:27 +00:00

Upgrading.md

docs: Adjust release documentation

2024-03-27 12:41:48 +01:00

README.md

Documentation

This Kata Containers documentation directory hosts overall system documentation, with information common to multiple components.

For details of the other Kata Containers repositories, see the repository summary.

Getting Started

Installation guides: install and run Kata Containers with Docker or Kubernetes.

Tracing

See the tracing documentation.

More User Guides

Upgrading: how to upgrade from Clear Containers and runV to Kata Containers and how to upgrade an existing Kata Containers system to the latest version.
Limitations: differences and limitations compared with the default Docker runtime, runc.