18178 Commits

Author SHA1 Message Date
Manuel Huber
660e3bb653 gpu: Obsolete the NVIDIA initrd build
As the NVIDIA stack has shifted to using an image for both the
confidential and non-confidential variants, we retire the initrd
build.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
3.28.0
2026-03-16 21:29:58 -04:00
Aurélien Bombo
f8e234c6f9 Merge pull request #12650 from kata-containers/sprt/remove-csi
ci: Stop building/deploying CSI driver
2026-03-16 16:53:02 -05:00
Steve Horsman
294c367063 Merge pull request #12668 from manuelh-dev/release/3.28.0
release: Bump version to 3.28.0
2026-03-16 19:47:12 +00:00
Manuel Huber
5210584f95 release: Bump version to 3.28.0
Bump VERSION and helm-charts versions.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:52:35 -07:00
Manuel Huber
e13748f46d tests: Adapt trusted ephemeral storage test
With the new CDH version, the LUKS header is moved off of the disk
into guest memory. We hence adapt the test's filesystem type checks.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
5bbc0abb81 tests: use pre-created, signed sealed secrets
With signature support for sealed secret, use pre-created signed
sealed secrets and provision the signing public key to the KBS.

Add instructions for re-creating these signed secrets.

Improve k8s-sealed-secrets.bats by reducing repeated kubectl logs
calls. A test run showed a SIGPIPE error one one of the grep-logs
while the printouts of the initial kubectl logs invocation showed
that the expected values were actually in the logs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
a9b222f91e gpu: Update chiseled rootfs with new CDH deps
With CDH requiring libcryptsetup, mkfs.ext4, dd, and their
dependencies, we will need to update the chiseled NVIDIA rootfs
accordingly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
169f92ff09 agent: cdh: Update CDH and API
With the new CDH version, the secure_mount API changes.
Further, the new CDH version no longer uses the luks-encrypt-storage
script but utilizes libcryptsetup as well as mkfs.ext4 and dd. Hence, adapt
some of the CDH and Kata components build steps

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Alex Lyn
ef5db0a01f Merge pull request #12607 from zvonkok/system-map
kernel: Ship System.map as part of the kernel build
2026-03-16 09:37:44 +08:00
Zvonko Kaiser
99f32de1e5 kata-deploy: Update RuntimeClass PodOverhead
Align the podOverhead with the default_memory updated
in the previous commit.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
6a853a9684 gpu: Bump NVRC
We have a new release add this one to the next
Kata release.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
8ff5d164c6 runtime: make CDI annotation vendor-agnostic with lookup table
Replace hardcoded NVIDIA vendor ID (0x10de) and class (0x030) checks
with a vendor-agnostic lookup table (cdiDeviceKind) that maps PCI
vendor/class pairs to CDI device kinds. This makes it straightforward
to add support for new device types by adding entries to the table.

Refactor siblingAnnotation to resolve device BDFs once upfront and
reuse them for both CDI type detection and sibling matching, eliminating
redundant sysfs reads. Devices not in the lookup table (e.g. NVSwitches)
are skipped with errNoSiblingFound, while known device types that fail
to match a sibling produce a hard error.

Consolidate the hot-plug and cold-plug device loops into a single loop
over extracted container paths, removing duplicated filtering logic.

Export GetPCIDeviceProperty from the device drivers package to allow
vendor/class lookup from sysfs in the container annotation path.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
d4c21f50b5 gpu: Bump default memory to 8G for GPU runtimes
We need enough inital memory to prepare more complex
platforms like HGX H100 or HGX B200 systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
5c9683f006 gpu: Remove devtmpfs.mount=0
With the newest NVRC release this is solved and does
not need to be overriden.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
d22c314e91 gpu: Increase dial_timeout=1200
For cold-plug when running with nerdctl the timeouts in the config
are being used, increase the dial_timeout (e.g. for CreateSandbox) to match
create_container_timeout.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
7fe84c8038 gpu: HGX Rootfs Fixes
Various smaller fixes to enable HGX systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Joji Mekkattuparamban
1fd66db271 nvidia-gpu: add missing libraries to rootfs
Added the missing packages to the nvidia rootfs.

Fixes #12534

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-03-13 16:24:32 -07:00
Dan Mihai
9332b75c04 Merge pull request #12661 from stevenhorsman/runtime-go-1.25.8
runtime: bump go.mod version
2026-03-13 14:06:08 -07:00
Zvonko Kaiser
d382379571 kernel: Ship System.map as part of the kernel build
Some use-cases need the System.map of the running kernel,
ship it via kernel-artifact.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-13 19:27:18 +00:00
Manuel Huber
4a7022d2f4 tests: nvidia: call genpolicy auth for all tests
Call the setup_genpolicy_registry_auth in run_kubernetes_nv_tests.sh.
Authenticate before exercising any tests.

Recently, we have seen UnauthorizedError messages for the CUDA
vectorAdd image. While this image is not gated behind authentication,
rate limiting may be a possible issue.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-13 09:03:01 -07:00
Zvonko Kaiser
4c450a5b01 Merge pull request #12648 from manuelh-dev/mahuber/trustee-upgrade
versions: bump trustee to latest version
2026-03-12 14:09:15 -04:00
Steve Horsman
7d2e18575c Merge pull request #12343 from zvonkok/release-model
doc: Release model update
2026-03-12 14:44:51 +00:00
Zvonko Kaiser
7f662662cf lint: Fix 80 char column size
Make markdownlint happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-12 12:03:29 +00:00
Zvonko Kaiser
6e03a95730 doc: Update Release Process
Add how Kata is doing the rolling release.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-12 12:03:29 +00:00
stevenhorsman
f25fa6ab25 runtime: bump go.mod version
Update the runtime's go.mod go version to 1.25.8 to
keep in sync with versions.yaml

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-12 08:53:40 +00:00
Steve Horsman
a29eb3751a Merge pull request #12517 from kata-containers/osv-scanner-bump-2.3.3
workflows: Bump OSV scanner
2026-03-12 08:48:52 +00:00
stevenhorsman
064a960aaa workflows: Bump OSV scanner
Bump to the latest version to pick up bug fixes

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-12 07:00:11 +00:00
Steve Horsman
f41edcb4c0 Merge pull request #12653 from kata-containers/dependabot/cargo/src/tools/agent-ctl/quinn-proto-0.11.14
build(deps): bump quinn-proto from 0.11.8 to 0.11.14 in /src/tools/agent-ctl
2026-03-12 06:53:59 +00:00
Manuel Huber
0926c92aa0 versions: bump trustee to latest version
Ingest various recent fixes and dependency updates.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-11 14:45:42 -07:00
Manuel Huber
8162d15b46 nvidia: fix invalid CTK reference
Use proper reference from versions yaml structure.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-11 12:49:29 -07:00
Aurélien Bombo
32444737b5 gatekeeper: Remove csi-kata-directvolume build from required tests
Since we don't build that anymore.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
64aed13d5f Revert "ci: Add no-op step to compile CSI driver"
This reverts commit e43c59a2c6.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
dd2c4c0db3 Revert "coco: ci: Add no-op steps to deploy CSI driver"
This reverts commit 5e4990bcf5.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
d598e0baf1 Revert "ci: Implement build step for CSI driver"
This partially reverts commit fb87bf221f.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
dependabot[bot]
d366d103cc build(deps): bump quinn-proto in /src/tools/agent-ctl
Bumps [quinn-proto](https://github.com/quinn-rs/quinn) from 0.11.8 to 0.11.14.
- [Release notes](https://github.com/quinn-rs/quinn/releases)
- [Commits](https://github.com/quinn-rs/quinn/compare/quinn-proto-0.11.8...quinn-proto-0.11.14)

---
updated-dependencies:
- dependency-name: quinn-proto
  dependency-version: 0.11.14
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-11 16:04:34 +00:00
Dan Mihai
04f180434e Merge pull request #12640 from burgerdev/genpolicy-workspace
genpolicy: add to Cargo workspace
2026-03-11 09:02:39 -07:00
Steve Horsman
ba0f5b98fe Merge pull request #12643 from stevenhorsman/bump-golang-to-1.25.8
versions: bump golang to 1.25.8
2026-03-11 08:53:21 +00:00
Markus Rudy
cf7d4c33b3 kata-deploy: fix binary location for genpolicy
Moving the genpolicy crate into the root workspace causes the build
outputs to go into the root workspace's target directory, instead of
src/tools/genpolicy/target, invalidating assumptions made by the
kata-deploy-binaries script.

This commit adds a special case for the lookup path of the genpolicy
binary, and fixes two bugs that made identifying this problem harder.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:48 +01:00
Markus Rudy
221a22bd7d genpolicy: ignore RUSTSEC-2024-0320
The yaml-rust dependency is unmaintained, but no suitable alternatives
exist. We log an exception for this now and will revisit the topic after
some time.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:48 +01:00
Markus Rudy
6643b258bb genpolicy: update oci-client to v0.16.1
The older version we used transitively depends on an unmaintained crate.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:48 +01:00
Markus Rudy
8dfeeea924 genpolicy: add to Cargo workspace
This commit adds the genpolicy utility to the root workspace. For now,
only dependencies that are already in the root workspace are consumed
from there, the genpolicy-specific ones should be added later.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:46 +01:00
Markus Rudy
fc4eaf8b66 runtime-rs: specify the subpackage to build
Before this change, `make test` for runtime-rs used to test all crates
in the root workspace (due to the `--all` flag). This was not intended
but happened to be mostly working. However, genpolicy needs additional
steps before it can build, so this behavior blocks adding genpolicy to
the root workspace.

The solution here is to only build the inteded packages. For the build
and run commands, this is the runtime-rs crate itself. For testing, we
need to include the sub-crates, too, which needs a bit of cargo metadata
scraping.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:28:24 +01:00
Aurélien Bombo
b6c60d9229 Merge pull request #10559 from sprt/conf-local-storage
coco: Implement trusted ephemeral data storage
2026-03-10 10:39:40 -05:00
Dan Mihai
f9a8eb6ecc genpolicy: allow_mount improvements for emptyDir
1. Reduce the complexity of the new allow_mount rules for emptyDir.

2. Reverse the order of the two allow_mount versions, as a hint to the
   rego engine that the first version is more often matching the input.

3. Remove `p_mount.source != ""` from mount_source_allows, because:
 - Policy rules typically test the values from input, not values read
   from Policy.
 - mount_source_allows is no longer called for emptyDir mounts after
   these changes, so p_mount.source is not empty.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-03-09 14:52:17 -05:00
Fabiano Fidêncio
374b0abe29 tests: Fix kubelet data dir for k0s in trusted ephemeral storage test
k0s uses /var/lib/k0s/kubelet instead of /var/lib/kubelet as its
kubelet data directory. Introduce get_kubelet_data_dir() in
tests_common.sh and use it in k8s-trusted-ephemeral-data-storage.bats
instead of hardcoding /var/lib/kubelet.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
718632bfe0 build: Add artifacts to .gitignore
This adds various files that are generated during development.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
68bdbef676 tests: Improve logging for some tests
Use modern test semantics to ease debugging.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
3dd77bf576 tests: Introduce new env variables to ease development
It can be useful to set these variables during local testing:

 * AZ_REGION: Region for the cluster.
 * AZ_NODEPOOL_TAGS: Node pool tags for the cluster.
 * GENPOLICY_BINARY: Path to the genpolicy binary.
 * GENPOLICY_SETTINGS_DIR: Directory holding the genpolicy settings.

I've also made it so that tests_common.sh modifies the duplicated
genpolicy-settings.json (used for testing) instead of the original git-tracked
one.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
aae54f704c ci: Stop deploying the CSI driver
The design moved away from CSI driver so stop deploying that.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
a98e328359 tests: Add test for trusted ephemeral data storage
This tests the feature on CoCo machines.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00