Commit Graph

18408 Commits

Author SHA1 Message Date
Fabiano Fidêncio
80b0ed273f Merge pull request #12784 from hgowda-amd/sev-snp-tests-required
Add sev-snp, qemu-snp CIs as required
2026-04-09 00:22:49 +02:00
Harshitha Gowda
bb1165b23f tests: Set sev-snp, qemu-snp CIs as required
run-k8s-tests-on-tee (sev-snp, qemu-snp)

Signed-off-by: Harshitha Gowda <hgowda@amd.com>
2026-04-08 22:36:58 +02:00
Fabiano Fidêncio
2148afe243 Merge pull request #12796 from fidencio/topic/kata-deploy-run-cargo-fmt-and-cargo-check
kata-deploy: Run cargo clippy during build
2026-04-08 22:32:31 +02:00
Fabiano Fidêncio
8ff630059a Merge pull request #12778 from amd-aliem/enable-img-rootfs-snp
runtime: SNP img-based rootfs with dm-verity
2026-04-08 22:06:31 +02:00
Fabiano Fidêncio
4561ae3e29 Merge pull request #12799 from fitzthum/fixup-nv-doc-1
docs: update flow for setting nvidia devices to ready
2026-04-08 21:32:55 +02:00
Tobin Feldman-Fitzthum
9119b4982c docs: update flow for setting nvidia devices to ready
Now, we include the nvrc.smi.srs=1 flag in the default kernel cmdline.
Thus, we can remove the guidance for people to add it themselves when
not using attestation. In fact, users don't really need to know about
this flag at all.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-08 18:59:51 +00:00
Fabiano Fidêncio
21466eb4e5 kata-deploy: Fix clippy warnings across crate
Fix all clippy warnings triggered by -D warnings:

- install.rs: remove useless .into() conversions on PathBuf values
  and replace vec! with an array literal where a Vec is not needed
- utils/toml.rs: replace while-let-on-iterator with a for loop and
  drop the now-unnecessary mut on the iterator binding
- main.rs: replace match-with-single-pattern with if-let in two
  places dealing with experimental_setup_snapshotter
- utils/yaml.rs: extract repeated serde_yaml::Value::String key into
  a local variable, removing needless borrows on temporary values

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 20:47:59 +02:00
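A hedged illustration of two of the rewrites named in 21466eb4e5 (while-let-on-iterator to a for loop, and match-with-single-pattern to if-let). The function names and inputs below are hypothetical and are not taken from the kata-deploy crate; only the before/after shapes are the point.

```rust
/// Hypothetical helper: count the non-empty lines of a text.
/// The pre-fix shape would be `let mut it = text.lines();` followed by
/// `while let Some(line) = it.next() { ... }`; a `for` loop drops the
/// explicit `next()` call and the `mut` binding.
fn count_non_empty(text: &str) -> usize {
    let mut count = 0;
    for line in text.lines() {
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    count
}

/// Hypothetical helper: describe an optional snapshotter setting.
/// The pre-fix shape would be `match setting { Some(n) => ..., _ => {} }`;
/// with a single interesting pattern, `if let` says the same thing.
fn describe_snapshotter(setting: Option<&str>) -> String {
    let mut out = String::from("snapshotter:");
    if let Some(name) = setting {
        out.push(' ');
        out.push_str(name);
    }
    out
}
```

Calling `count_non_empty` or `describe_snapshotter` from a `main` function is enough to try the sketch; neither depends on anything outside the standard library.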
Fabiano Fidêncio
1874d4617b kata-deploy: Run cargo clippy during build
Ensure code formatting and compilation are verified early in the
Docker build pipeline, before tests and the release build.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 20:47:59 +02:00
Amanda Liem
79f844d057 runtime: SNP img-based rootfs with dm-verity
Follow-on to kata-containers/kata-containers#12396

Switch SNP config from initrd-based to image-based rootfs with
dm-verity. The runtime assembles the dm-mod.create kernel cmdline
from kernel_verity_params, and with kernel-hashes=on the root hash
is included in the SNP launch measurement.

Also add qemu-snp to the measured rootfs integration test.

Signed-off-by: Amanda Liem <aliem@amd.com>
2026-04-08 16:46:32 +00:00
Greg Kurz
817580e35d Merge pull request #12795 from fidencio/topic/kata-deploy-do-not-try-to-install-a-snapshotter-when-using-crio
kata-deploy: Skip snapshotter install/uninstall on CRI-O
2026-04-08 17:18:05 +02:00
Fabiano Fidêncio
bb051bb16a Merge pull request #12788 from fidencio/topic/kata-deploy-re-apply-GPU-specific-labels
kata-deploy: re-apply labels for the GPU runtime classes
2026-04-08 16:27:59 +02:00
Fabiano Fidêncio
bacc3f4ef1 Merge pull request #12785 from fidencio/topic/runtime-rs-deny-config
runtime-rs: Deny config of unknown fields & change dbg_monitor_socket name
2026-04-08 15:12:53 +02:00
Fabiano Fidêncio
f27def1a5b kata-deploy: Skip snapshotter install/uninstall on CRI-O
Snapshotters (nydus, erofs) are containerd-specific. The validation code
already warned that EXPERIMENTAL_SETUP_SNAPSHOTTER would be ignored on
CRI-O, but the actual install/configure and uninstall loops still ran
unconditionally, attempting containerd-specific operations on CRI-O
nodes.

Guard both the install and cleanup snapshotter loops with a `runtime !=
"crio"` check so the binary itself skips snapshotter work when it
detects CRI-O as the container runtime.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-08 14:41:49 +02:00
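A minimal sketch of the guard described in f27def1a5b, assuming the runtime is available as a plain string and the requested snapshotters as a slice. The function name and signature are hypothetical; the actual kata-deploy code guards its install and cleanup loops in place.

```rust
/// Hypothetical helper: return the snapshotters that should be processed.
/// Snapshotters are containerd-specific, so nothing is returned on CRI-O.
fn snapshotters_to_process<'a>(runtime: &str, requested: &[&'a str]) -> Vec<&'a str> {
    if runtime == "crio" {
        // Skip all snapshotter work when CRI-O is the container runtime.
        return Vec::new();
    }
    requested.to_vec()
}
```

The same filter can back both the install and the uninstall path, so the two loops cannot drift apart.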
Fabiano Fidêncio
bc719a66eb kata-deploy: nvidia: Align force_guest_pull with default values.yaml
The default is already false, but let's keep those aligned on
explicitly setting the default.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-08 14:41:21 +02:00
Fabiano Fidêncio
78f02f2155 kata-deploy: nvidia: Align labels with default values.yaml
Joji's added the labels for the default values.yaml, but we missed
adding those to the nvidia specific values.yaml file.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-08 14:41:21 +02:00
Fabiano Fidêncio
f00b589ccd Revert "kata-deploy: Temporarily comment GPU specific labels"
This reverts commit 02c9a4b23c, as GPU
Operator v26.3.0 is out, and becomes a requirement.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-08 14:41:21 +02:00
Alex Lyn
c00f895338 kata-deploy: Fix noise caused by unformatted code
When running cargo fmt --all, some files show up as changed because
they were not formatted with `cargo fmt`. This commit just addresses that.

Just use this as an example:
```
         // Generate the common drop-in files (shared with standard
         // runtimes)
-        write_common_drop_ins(config, &runtime.base_config,
         &config_d_dir, container_runtime)?;
+        write_common_drop_ins(
+            config,
+            &runtime.base_config,
+            &config_d_dir,
+            container_runtime,
+        )?;
```

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-08 14:39:57 +02:00
Fabiano Fidêncio
6269b3ecde Merge pull request #12792 from fidencio/topic/nvidia-rootfs-take-nvrc-and-nvat-versions-in-consideration
build: cache: Take NVRC & NVAT version into consideration
2026-04-08 12:44:41 +02:00
Fabiano Fidêncio
a12e0f1204 build: cache: Take NVRC & NVAT version into consideration
Without those, we'd end up pulling the same / old rootfs that's cached
without re-building it in case of a bump in any of those components.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-08 10:14:11 +02:00
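To illustrate why a12e0f1204 matters, here is a sketch of a cache key that includes the component versions. The key layout, the function name, and the version strings used in the usage note are assumptions made for illustration; the real build scripts may derive their cache identity differently.

```rust
/// Hypothetical cache key: any bump of NVRC or NVAT changes the key,
/// so a stale cached rootfs no longer matches and a rebuild is triggered.
fn rootfs_cache_key(base: &str, nvrc_version: &str, nvat_version: &str) -> String {
    format!("{base}-nvrc-{nvrc_version}-nvat-{nvat_version}")
}
```

With made-up versions, `rootfs_cache_key("rootfs", "1.0", "2.0")` and `rootfs_cache_key("rootfs", "1.1", "2.0")` produce different keys, which is the property the commit is after.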
RuoqingHe
a4fb9aef54 Merge pull request #12789 from kata-containers/pin-actions-rs-toolchain
gha: Pin action for cargo-deny workflow
2026-04-08 08:36:13 +08:00
Fabiano Fidêncio
995767330d Merge pull request #12782 from pavithiran34/pavi-ras-version-update
fix: updated image-rs to v0.18.0
2026-04-07 23:32:05 +02:00
Aurélien Bombo
8916f5f301 gha: Pin action for cargo-deny workflow
The cargo-deny workflow should be the last workflow to not use a pinned version.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-07 15:41:09 -05:00
pavithiran34
528fa80953 fix: updated image-rs to v0.18.0
- Updated image-rs from rev 026694d4 to tag v0.18.0
- This update brings rsa 0.9.10 which fixes CVE-2026-21895
- Resolves vulnerability in indirect dependencies

Signed-off-by: pavithiran34 <pavithiran.p@ibm.com>
2026-04-07 21:40:01 +02:00
Fabiano Fidêncio
b3ae6ef99c Merge pull request #12760 from fitzthum/bump-nvat
Bump trustee and guest-components to add nvswitch / ppcie support
2026-04-07 19:07:50 +02:00
Aurélien Bombo
79fab93041 Merge pull request #12779 from rophy/fix/strip-cr-from-tty-exec
tests: strip \r from kubectl exec output for TTY containers
2026-04-07 10:19:21 -05:00
Tobin Feldman-Fitzthum
e40abcf72d nvidia: add nvrc.smi.srs=1 to default nvidia kernel params
The attestation-agent no longer sets nvidia devices to ready
automatically. Instead, we should use nvrc for this. Since this is
required for all nvidia workloads, add it to the default nv kernel
params.

With bounce buffers, the timing of attesting a device versus setting it
to ready is not so important.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 14:28:50 +00:00
Manuel Huber
0fd4559f7e docs: Update NVIDIA GPU passthrough QEMU scenario
Updates for the NVIDIA GPU passthrough scenario for the
kata-containers release 3.29.0.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-07 14:58:40 +02:00
Fabiano Fidêncio
9a5aaf7ecb runtime-rs: move create_container_timeout before [mem_agent] section
The create_container_timeout key was placed after the
[agent.@PROJECT_TYPE@.mem_agent] TOML section header, which meant
TOML parsed it as a field of mem_agent rather than of the parent
agent table. This was silently ignored before, but now that
MemAgent has #[serde(deny_unknown_fields)] it causes a parse error.

Move the key above the [mem_agent] section so it belongs to the
correct [agent.@PROJECT_TYPE@] table.

Also fix configuration-qemu-coco-dev which had a duplicate entry:
keep only the correctly placed one with the COCO timeout value.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 11:23:59 +02:00
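The scoping rule behind 9a5aaf7ecb can be sketched with a toy line scanner: in TOML, a key belongs to the most recent table header above it. This is not a TOML parser and not the runtime-rs code; the table name `agent.kata` (standing in for `agent.@PROJECT_TYPE@`) and the timeout value are placeholders.

```rust
use std::collections::BTreeMap;

/// Toy scanner: map each table header to the keys that follow it.
/// The empty string stands for the root table.
fn keys_by_table(doc: &str) -> BTreeMap<String, Vec<String>> {
    let mut tables: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut current = String::new();
    for line in doc.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(name) = line.strip_prefix('[').and_then(|l| l.strip_suffix(']')) {
            // A header switches the table that subsequent keys belong to.
            current = name.to_string();
            tables.entry(current.clone()).or_default();
        } else if let Some((key, _value)) = line.split_once('=') {
            tables
                .entry(current.clone())
                .or_default()
                .push(key.trim().to_string());
        }
    }
    tables
}
```

Feeding it a document where `create_container_timeout` follows the `[agent.kata.mem_agent]` header attributes the key to `mem_agent`, which is exactly the placement that a struct with `#[serde(deny_unknown_fields)]` then rejects.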
Fabiano Fidêncio
a6e891e733 runtime-rs: s/dbg_monitor_socket/extra_monitor_socket/g
Let's align this with what's been already used for the go runtime.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 10:50:42 +02:00
Paul Meyer
b32c5234f4 runtime-rs: deny unknown fields in config
..where possible. Failing on unknown fields makes migration easier,
as we do not silently ignore configuration options that previously
worked in runtime-go. However, serde can't deny unknown fields
where flatten is used, so this can't be used everywhere sadly.

There were also errors in test fixtures that were unnoticed.
These are fixed here, too.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 10:50:25 +02:00
Tobin Feldman-Fitzthum
7385938c57 tests: fix default KBS Policy path
We recently moved the default policy in the Trustee repo. Now it's in
the same place as all the other policies. Update the test code to match.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 05:46:27 +00:00
Tobin Feldman-Fitzthum
38e04bb6d8 versions: bump guest-components for switch attestation
Pick up the new version of guest-components which uses NVAT bindings
instead of NVML bindings. This will allow us to attest guests with
nvswitches.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 05:46:27 +00:00
RuoqingHe
feaec78ad0 Merge pull request #12776 from fidencio/topic/kata-deploy-move-into-the-root-workspace
kata-deploy: Move into the root workspace
2026-04-07 12:45:26 +08:00
Fabiano Fidêncio
461907918d kata-deploy: pin nydus-snapshotter via versions.yaml
Resolve externals.nydus-snapshotter version and url in the Docker image build
with yq from the repo-root versions.yaml instead of Dockerfile ARG defaults.

Drop the redundant workflow that only enforced parity between those two sources.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 10:07:06 +08:00
Fabiano Fidêncio
9e1f595160 kata-deploy: add Rust binary to root workspace
Add tools/packaging/kata-deploy/binary as a workspace member, inherit shared
dependency versions from the root manifest, and refresh Cargo.lock.

Build the kata-deploy image from the repository root: copy the workspace
layout into the rust-builder stage, run cargo test/build with -p kata-deploy,
and adjust artifact and static asset COPY paths. Update the payload build
script to invoke docker buildx with -f .../Dockerfile from the repo root.

Add a repo-root .dockerignore to keep the Docker build context smaller.
Document running unit tests with cargo test -p kata-deploy from the root.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 10:07:06 +08:00
Rophy Tsai
f7d9024249 tests: strip \r from kubectl exec output for TTY containers
The busybox-pod.yaml test fixture sets tty: true on the second
container. When a container has a TTY, kubectl exec may return
\r\n line endings. The invisible \r causes string comparisons
to fail:

  container_name=$(kubectl exec ... -- env | grep CONTAINER_NAME)
  [ "$container_name" == "CONTAINER_NAME=second-test-container" ]

This comparison fails because $container_name contains a trailing
\r character.

Fix by piping through tr -d '\r' after grep. This is harmless
when \r is absent and fixes the mismatch when present.

Fixes: #9136

Signed-off-by: Rophy Tsai <rophy@users.noreply.github.com>
2026-04-07 01:35:10 +00:00
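The failure mode described in f7d9024249 is easy to reproduce outside the shell. The sketch below is a Rust analogue of `tr -d '\r'`, not the test code itself (which is a shell script); the function name is hypothetical.

```rust
/// Remove every carriage return, mirroring `tr -d '\r'`.
/// Input without `\r` is returned unchanged.
fn strip_cr(s: &str) -> String {
    s.chars().filter(|&c| c != '\r').collect()
}
```

A string such as `"CONTAINER_NAME=second-test-container\r"` prints identically to the expected value on most terminals but does not compare equal to it until the trailing `\r` is stripped.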
Alex Lyn
46a7b9e75d Merge pull request #12775 from RuoqingHe/put-libs-to-root-workspace
libs: Move libs into root workspace
2026-04-07 09:25:26 +08:00
Tobin Feldman-Fitzthum
3d60196735 versions: bump Trustee to pickup PPCIE support
Trustee is compatible with old guest components (using NVML bindings) or
new guest components (using NVAT). If we have the new version of gc, we
can attest PPCIE guests, which we need the new version of Trustee to
verify.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-06 17:51:12 +00:00
Tobin Feldman-Fitzthum
0444d70704 rootfs: add runtime support for NVAT
Update NVIDIA rootfs builder to include runtime dependencies for NVAT
Rust bindings.

The nvattest package does not include the .so file, so we need to build
from source.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-06 17:51:12 +00:00
Tobin Feldman-Fitzthum
78c61459f8 packaging: add build-time support for NVAT
The attestation agent will soon rely on the NVAT rust bindings, which
have some build-time dependencies.

There is currently no nvattest-dev package, so we need to build from
source to get the headers and .so file.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-06 17:51:12 +00:00
Dan Mihai
9b770793ba Merge pull request #12728 from manuelh-dev/mahuber/empty-dir-fsgrou-policy
genpolicy: adjust GID after passwd GID handling and set fs_group for encrypted emptyDir volumes
2026-04-06 10:22:34 -07:00
Fabiano Fidêncio
47770daa3b helm: Align values.yaml with try-kata-nvidia-gpu.values.yaml
We've switched to nydus there, but never did for the values.yaml.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-06 18:51:54 +02:00
Fabiano Fidêncio
1300145f7a tests: add k3s/rke2 to OCI 1.3.0 drop-in overlay condition
k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0
drop-in overlay. Move them from the separate OCI 1.2.1 branch into
the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx,
and custom container engine versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-06 18:50:20 +02:00
Fabiano Fidêncio
0a739b3b55 Merge pull request #12755 from katexochen/runtime-rs-config-cleanup
runtime-rs: cleanup config
2026-04-06 13:14:58 +02:00
Ruoqing He
cb7c790dc7 libs: Specify crates explicitly in Makefile
--all option would trigger building and testing for everything within
our root workspace, which is not desired here. Let's specify the crates
of libs explicitly in our Makefile.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-06 11:03:38 +02:00
Ruoqing He
2a024f55d0 libs: Move libs into root workspace
Remove libs from exclude list, and move them explicitly into root
workspace to make sure our core components are in a consistent state.

This is a follow up of #12413.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-06 11:03:38 +02:00
Fabiano Fidêncio
9a2825a429 runtime: config: Use OVMF for the qemu-nvidia-gpu
2ba0cb0d4a7 did the ground work for using OVMF even for the
qemu-nvidia-gpu, but missed actually setting the OVMF path to be used,
which we're fixing now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-06 03:54:56 +02:00
Fabiano Fidêncio
e1fae11509 Merge pull request #12392 from Apokleos/enhance-tdx
runtime-rs: Enhance TDX in qemu
2026-04-05 20:54:43 +02:00
Alex Lyn
35cafe8715 runtime-rs: configure TDX machine options with kernel_irqchip=split
When TDX confidential guest support is enabled, set `kernel_irqchip=split`
for TDX CVM:
...
-machine \
   q35,accel=kvm,kernel_irqchip=split,confidential-guest-support=tdx \
...

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-05 10:18:47 +02:00
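A hedged sketch of how the `-machine` value shown in 35cafe8715 could be assembled. The function name and the boolean switch are hypothetical, and the non-TDX branch is an assumption; only the TDX option string is taken from the commit message.

```rust
/// Hypothetical builder for the QEMU `-machine` option value.
fn machine_options(tdx_enabled: bool) -> String {
    let mut opts = vec!["q35", "accel=kvm"];
    if tdx_enabled {
        // For a TDX CVM the commit sets kernel_irqchip=split and
        // points confidential-guest-support at the tdx object.
        opts.push("kernel_irqchip=split");
        opts.push("confidential-guest-support=tdx");
    }
    opts.join(",")
}
```

Keeping the options in a list and joining them at the end avoids hand-managing the commas when further options are added conditionally.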
Fabiano Fidêncio
f074ceec6d Merge pull request #12682 from PiotrProkop/fix-direct-io-kata
runtime-rs: fix setting directio via config file
2026-04-03 16:11:57 +02:00