Commit Graph

17705 Commits

Author SHA1 Message Date
dependabot[bot]
c164f8dc55 build(deps): bump github.com/containerd/containerd/api in /src/runtime
Bumps [github.com/containerd/containerd/api](https://github.com/containerd/containerd) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/api/v1.8.0...api/v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd/api
  dependency-version: 1.9.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-21 11:43:14 +00:00
Greg Kurz
cf3441bd2c agent: Refresh Cargo.lock
Downstream builders at Red Hat complain that `Cargo.lock` doesn't match
`Cargo.toml`.

Run `cargo check` to refresh `Cargo.lock`.

`git bisect` shows that 7cfb97d41b is the first commit where
`cargo check` has an effect in `src/agent`.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-01-20 14:44:47 +01:00
Fabiano Fidêncio
e0158869b1 tests: Add common bats test runner function
Add run_bats_tests() function to common.bash that provides consistent
test execution and reporting across all test suites (k8s, nvidia,
kata-deploy).

This removes duplicated test runner code from run_kubernetes_tests.sh,
run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-20 12:31:55 +01:00
Fabiano Fidêncio
5aff81198f helm-chart: Fix warnings on README
nydus -> `nydus`
erofs -> `erofs`

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 22:41:50 +01:00
Fabiano Fidêncio
b5a986eacf kata-deploy: Add runtime-rs TDX / SNP runtimeclasses
https://github.com/kata-containers/kata-containers/pull/11534 has been
merged and it added all the needed bits to deploy the QEMU SNP / TDX
runtime-rs variants, apart from the kata-deploy additions, which is done
by this PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 22:41:50 +01:00
Fabiano Fidêncio
c7570427d2 tests: Add report generation to NVIDIA tests
The NVIDIA GPU test runner script was not generating test reports,
causing the report_tests() function in gha-run.sh to have nothing
to display. This aligns the script with run_kubernetes_tests.sh by:

- Adding set -o pipefail for proper pipeline error handling
- Creating a reports directory with timestamped subdirectory
- Capturing test output to files with ok-/not_ok- prefixes
- Adding --timing flag to bats for timing information

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 18:21:43 +01:00
Fabiano Fidêncio
c1216598e8 static-checks: Fix kata-deploy reference
Let's just point to the official documentation rather than explaining
exactly how to deploy (and the current text was very outdated).

Removing fluentd / minikube examples is out of context of this commit.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 15:09:20 +01:00
Fabiano Fidêncio
96e1fb4ca6 tools: Remove runk
The runk tool hasn't been supported for a few years, with no maintainers
since ManaSugi stopped being involved in the project and the CI was
disabled in 2024.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:43:53 +01:00
Fabiano Fidêncio
f68c25de6a kata-deploy: Switch to the rust version
Let's remove the script and rely only on the rust version from now on.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:07:49 +01:00
Fabiano Fidêncio
d7aa793dde Revert "ci: Run a nightly job using the kata-deploy rust"
This reverts commit 6130d7330f, as we're
officially swithcing to the rust version of kata-deploy.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:07:49 +01:00
Fabiano Fidêncio
17472f3f10 release: scripts: Accept KATA_TOOLS_STATIC_TARBALL env var
a2534e7bc8 introduced the logic to also
release a kata-tools tarball, but it missed allowing
KATA_TOOLS_STATIC_TARBALL env var to be passed to the release script,
leading to the following error during the release process:
```
ERROR: Invalid environment variable "KATA_TOOLS_STATIC_TARBALL"
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
3.25.0
2026-01-19 13:03:23 +01:00
Fabiano Fidêncio
882862d711 release: Bump version to 3.25.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 11:33:45 +01:00
XanderC
93beb58c5d runtime: fix network initialization for non-hotplug VMMs
In startVM(), for VMMs without hotplug support (e.g., Firecracker or
QEMU microvm), the runtime runs prestart hooks but misses rescanning
the network namespace. This causes VMs to boot with uninitialized
network configs, as updates from CNI plugins are not captured.

This patch adds a network rescan via AddEndpoints after prestart hooks
for the non-hotplug path, ensuring correct network info is passed to
the VMM configuration before the VM starts.

Fixes #11500

Signed-off-by: XanderC <xanderc@qq.com>
2026-01-17 23:56:59 +01:00
Zvonko Kaiser
428cc5d586 gpu: Chroot Cleanup
With the newest NVRC we do not need the supported GPUs
anymore.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-17 19:27:24 +01:00
Fabiano Fidêncio
1c154b4c15 kernel: Add DAX fix for arm64
The patch has been provided upstream by Seunguk Shin and is already
approved.

We'll drop it once it becomes available in the LTS tree.

Reference:
https://lore.kernel.org/all/18af3213-6c46-4611-ba75-da5be5a1c9b0@arm.coum

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-17 19:15:53 +01:00
Fabiano Fidêncio
33b1f0786e Revert "arm64: Do not use DAX with the rootfs image"
This reverts commit 2acb94ef2d, as we have
a kernel patch approved fixing the issue.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-17 19:15:53 +01:00
Alex Lyn
fe15f2fa47 runtime-rs: Remove deprecated virtio-9p
The virtio-9p is not supported for a long time, specially within
the runtime-rs, we have no such plan to support it. Removal of the
related items is reasonable.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
b7cfc6fd72 runtime-rs: Remove mem-agent section from TDX/SNP configurations
As Memory Agent feature is not used within CoCo(TDX/SNP) scenarios,
with this fact, it's better to just remove the related sections.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
634ec2b56d runtime-rs: Add configurable SNP items in Makefile when make build
It aims to introduce some related items within Makefile to enable
Intel SNP settings in configuration when do make build. And make it
possible to generate the rendered qemu-snp-runtime-rs configuration
based on the *.in template.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
0abdb8e016 runtime-rs: Introduce a qemu-runtime-rs/SEV-SNP dedicated configuration
To make it work well on the SEV-SNP platforms for qemu-runtime-rs with
coco, a dedicated SEV-SNP configuration should be introduced to help
prepare related CVM resources.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
b0a82f7bb8 runtime-rs: Enable measured rootfs within configuration when make build
Enable measured rootfs within configuration when make build. And add
some other important items to make the configuration work well.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
3799855040 runtime-rs: Add configurable TDX items in Makefile when make build
It aims to introduce some related items within Makefile to enable
Intel TDX settings in configuration when do make build. And make it
possible to generate the rendered qemu-tdx-runtime-rs configuration
based on the *.in template.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
4d55e2c8c8 runtime-rs: Introduce a dedicated configuration for qemu-runtime-rs/TDX
To make it work well on the TDX platforms for qemu-runtime-rs with
coco, a dedicated TDX configuration should be introduced to help
prepare related CVM resources.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Manuel Huber
956f43c6c6 runtime: skip MoveTo for systemd cgroups
Systemd-managed cgroups use the slice:prefix:name format, which is
not a filesystem path. Calling MoveTo() on such paths fails with
"invalid group path" and can abort cleanup before Delete() runs.
In some cases, this causes pod teardown delays.
Skip MoveTo for systemd-formatted sandbox/overhead cgroup paths when
sandbox_cgroup_only is true; systemd moves tasks on unit deletion.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-16 16:41:38 +01:00
Manuel Huber
6b70923e55 docs: Update NVIDIA GPU passthrough QEMU scenario
With cold-plug becoming by design the only supported mode with the
update of NVRC to v0.1.1, resolving references to hot-plug.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-16 13:50:10 +01:00
Steve Horsman
610a8bdfd5 Merge pull request #12346 from Amulyam24/ppc64le-payload
ci: move the job publish kata payload after push to an alternate runner for ppc64le
2026-01-16 11:41:53 +00:00
Fabiano Fidêncio
ea18f543b4 tests: kata-deploy: Enable verification during helm install
Enable post-install verification in kata-deploy CI tests. When
HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created
that runs with the Kata runtime to confirm deployment succeeded.

The verification pod prints kernel info and exits - success indicates
the Kata runtime is properly configured and functional.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-16 10:52:43 +01:00
Fabiano Fidêncio
a188f04d75 kata-deploy: helm: Add optional post-install verification
Add optional verification that runs after kata-deploy installation.
When a pod spec is provided via --set-file verification.pod=<file>,
a verification job runs after install/upgrade to validate deployment.

The user is fully responsible for the verification pod content:
- Pod name, runtimeClassName, annotations, and verification logic
- Pod must exit 0 on success, non-zero on failure

The verification job simply:
1. Waits for kata-deploy DaemonSet to be ready
2. Applies the user-provided pod spec
3. Waits for the pod to complete
4. Shows logs and cleans up

Usage:
  helm install kata-deploy ... \
    --set-file verification.pod=/path/to/your-pod.yaml

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-16 10:52:43 +01:00
Amulyam24
859313d904 ci: move the job payload after push to an alternate runner for ppc64le
To unlock the release, move the job to publish kata payload after push to an alternate runner(IBM owned) for ppc64le.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-01-16 11:14:42 +05:30
Alex Lyn
c0cca81993 runtime-rs: Set default_bridges with 0 for dragonball vmm
As Dragonball VMM does not support PCI hotplug options, it should
be set 0.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-15 20:32:15 +01:00
Alex Lyn
1a76d44e16 kata-types: Chanage the default bridges with 1
It aims to align it with the Makefile and configuration's
setting.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-15 20:32:15 +01:00
Alex Lyn
6375b3881d runtime-rs: Set the default bridges with default 1
As runtime-go use the default bridges with 1, it should be
kept as 1 to avoid alignment issues.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-15 20:32:15 +01:00
Alex Lyn
8728b262fb Merge pull request #12338 from zvonkok/nvrc-update
gpu: Bump NVRC Version
2026-01-15 19:36:07 +08:00
Zvonko Kaiser
adce41c432 gpu: Bump NVRC Version
The new NVRC version works for CC and non-CC use cases,
no --feature confidential needed anymore.

Bump versions.yaml and adjust deployment instructions.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-15 01:51:10 +00:00
Manuel Huber
6753c3ac08 runtime: nvidia: Disable NVDIMM
Disable NVDIMM. When using GPU passthrough, using NVDIMM would create
a r/o file-backed memory region. When using a GPU, QEMU tries to DMA-
map guest memory for the device, resulting in a mapping error:
memory listener initialization failed: Region mem0:
vfio_container_dma_map ... -22 (Invalid argument).
For the CC configs, NVDIMM is disabled by default in qemu_amd64.go
with a warning, but we also explicitly disable the setting in the
shim configuration file.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-14 22:51:07 +01:00
Fabiano Fidêncio
a9dda0e52b versions: nvidia: Bump kernel to the latest LTS
As now that we have the decoupled rootfs / kernel, doing the bump
becomes trivial.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 20:45:54 +01:00
Fabiano Fidêncio
4e99860fd2 workflows: nvidia: Adjust to kernel / roots build decouple
We don't need to store the kernel headers anymore. We do need to store
the kernel modules, instead.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
02d2b6bdf2 kernel: bump kata_config_version
We have kernel build changes bump the config version

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
a075c3740a gpu: build_image.sh use versions.yaml
We've done some bad file based driver determination,
now with versions.yaml there is a single source of truth.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
ffc8725164 gpu: rootfs update decoupling
Remove all the driver build instructions,
sicne those are now done in the kernel target.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
cca973772d gpu: deploy modules for kernel build
We need to package the build modules for the rootfs
to be able to consume it. We package the whole
/lib/modules/$(uname -r)  directory strip=2.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
13ed3cdff9 gpu: Add NVIDA modules to build-kernel.sh
Checkout and build the kernel modules along
with the kernel to avoid the kernel rootfs dependency.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
2a11910acb gpu: Remove building of Headers
Since we build along the kernel we do not need to
carry over the headers to the rootfs build.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
b1870fef07 gpu: versions.yaml nvidia driver pinning
We want to have deterministic behaviour and only
one valid driver version acceptable via versions.yaml

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
229481b348 kernel: bugfix install yq
We actually never installed yq to the kernel build,
there are  some path that use yq but were never hit,
for the GPU use-case we need to read values from versions.yaml

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Steve Horsman
6db3a4cf8d Merge pull request #12333 from fitzthum/bump-v0180
Update Trustee and guest-components for upcoming releases
2026-01-14 19:44:55 +00:00
Tobin Feldman-Fitzthum
ca29e68acb agent-ctl: bump image-rs version
In preparation for coco v0.18.0, bump the version of image-rs we use in
agent-ctl to match what we have in versions.yaml.

Drop the snapshotter-overlayfs feature. This was dropped from image-rs
when we removed enclave-cc support.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-01-14 06:54:29 -08:00
Tobin Feldman-Fitzthum
25a08ef739 versions: bump Trustee and guest-components
Before cutting the Kata release that will be used with CoCo v0.18.0,
let's bump the versions of Trustee and guest-components to latest.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-01-14 06:43:30 -08:00
Steve Horsman
0f5f914a04 Merge pull request #12330 from LandonTClipp/docs_improvement
docs: Navigation improvements and bug fixes to Pages
2026-01-14 14:13:29 +00:00
stevenhorsman
70e3e2b0c9 genpolicy: Bump openssl-src
This is a vulnerability (CVE-2025-9230) in openssl, so move
to 3.5.4 which has a fix for this

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-14 14:05:48 +01:00