2004 Commits

Author SHA1 Message Date
Fabiano Fidêncio
1c2d5cb57d Merge pull request #12848 from kata-containers/sprt/fix-block-vol-test
tests: make k8s-block-volume more robust
2026-04-21 11:27:43 +02:00
Dan Mihai
b2ea9a8fc6 Merge pull request #12460 from microsoft/danmihai1/k8s-openvpn-runtime
tests: annotations for all k8s-openvpn yaml files
2026-04-20 09:47:02 -07:00
stevenhorsman
c75c432c01 ci: Update TEE scope
`k8s-confidential.bats` technically doesn't need attestation, but only runs
on TEE hardware, so include it in the attestation list so we can test it in PRs

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 09:36:10 +01:00
stevenhorsman
7179e92142 tests/confidentials: Remove pointless skip
The skip conditional is wrong, but it's not needed as the setup
and teardown only allow confidential hardware anyway

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 09:36:10 +01:00
Fabiano Fidêncio
cf1e6f82f2 tests: Show full kata-deploy pod logs in CI
Remove --tail=N limits from `kubectl logs` for kata-deploy pods so
the complete output is visible in CI job logs for debugging.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
c26f647a3a test: Improve process verification and robustness in kill test
During tests, the following error occurred:
```
..k8s-kill-all-process-in-container.bats: line 40: [: too many arguments
```
This commit addresses the issue as follows:
(1) Update process query command to "ps aux || ps" to ensure
  compatibility across different container images while maximizing
  process visibility.
(2) Use "[t]ail" in grep to reliably match the process without
  self-matching.
(3) Quote variable in assertion to resolve "too many arguments" bash
  error.
(4) Improve test reliability by ensuring the process list is actually
  visible to the verification logic.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
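The two shell pitfalls named in commit c26f647a3a (grep self-matching and the unquoted variable) can be reproduced without a cluster. The following is a minimal sketch, not the actual test code: `listing_with_grep` is a hypothetical helper that fakes a process listing in which the grep command itself appears.

```shell
# Build a hypothetical process listing in which the grep command itself
# appears, as it can when ps and grep run inside the same container.
# $1 is the pattern text that would show up on grep's own command line.
listing_with_grep() {
    printf '%s\n' \
        '    1 root     tail -f /dev/null' \
        "   12 root     grep $1"
}

# Plain pattern: matches the real process and the grep line.
plain_count=$(listing_with_grep 'tail' | grep -c 'tail')

# Bracket pattern: the regex [t]ail matches "tail", but grep's own command
# line contains the literal text "[t]ail", which the regex does not match.
bracket_count=$(listing_with_grep '[t]ail' | grep -c '[t]ail')

echo "plain=${plain_count} bracket=${bracket_count}"

# Quoting: left unquoted, a multi-word value expands to several arguments
# and triggers "[: too many arguments"; quoted, it is a single operand.
process_line=$(listing_with_grep '[t]ail' | grep '[t]ail')
if [ -n "${process_line}" ]; then
    echo "process found"
fi
```

With the plain pattern the count is 2, because the grep line matches itself; with the bracket pattern only the real `tail` process is counted.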
Alex Lyn
f4f6c78e9e tests: Update expectation for no-layer-image test case
The 'no-layer-image' test case was failing because the underlying shim
returned an "unsupported rootfs mounts count" error instead of the
expected application-level "file not found" or "ENOENT" error.

This change updates the BATS test to accept the shim-level rootfs
validation error as a valid failure condition for this unsupported
image scenario, ensuring the CI remains green while reflecting
current runtime behavior.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Fabiano Fidêncio
cdd09c3c65 ci: enable erofs tests with runtime-rs
Now that erofs snapshotter support has been added, let's make sure this is tested.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
27341f45f1 docs: Add how-to guide for using fsmerged EROFS rootfs with Kata
Document the end-to-end workflow for using the containerd EROFS
snapshotter with Kata Containers runtime-rs, covering containerd
configuration, Kata QEMU settings, and pod deployment examples
via crictl/ctr/Kubernetes.

Include prerequisites (containerd >= 2.2, runtime-rs main branch),
QEMU VMDK format verification command, architecture diagram,
VMDK descriptor format reference, and troubleshooting guide.

Note that Cloud Hypervisor, Firecracker, and Dragonball do not
support VMDK block devices and are currently unsupported for
fsmerged EROFS rootfs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Fabiano Fidêncio
edfaeec316 tests: arm64: Skip tests which do not have a multi-arch image
The image used has some special (as in weird) properties that are being
taken advantage of to implement policy-related tests.

Changing the image is a no-go at this point, otherwise we break the
tests ... so let's just skip those for now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
35e48fdfd1 ci: run qemu-coco-dev-runtime-rs tests on arm64
Add qemu-coco-dev-runtime-rs to the arm64 k8s test matrix so that the
CoCo non-TEE configuration is exercised on aarch64 runners.

Also enable auto-generated policy for qemu-coco-dev on aarch64 (matching
the existing x86_64 behavior) and register the new job as a required
gatekeeper check.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
588a67a3fb kata-deploy: add arm64 support for qemu-coco-dev shims
Add aarch64/arm64 to the list of supported architectures for
qemu-coco-dev and qemu-coco-dev-runtime-rs shims across kata-deploy
configuration, Helm chart values, and test helper scripts.

Note that guest-components and the related build dependencies are not
yet wired for arm64 in these configurations; those will be addressed
separately.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Dan Mihai
0828784a03 tests: k8s: fix add_annotations_to_yaml
Don't hard-code caller's "${K8S_TEST_YAML}" - use the local
"${yaml_file}" as intended.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-04-17 17:38:11 +00:00
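The class of bug fixed in commit 0828784a03 (a helper that ignores its own parameter and writes to the caller's global instead) can be illustrated with a small sketch. `add_marker_to_file` is a hypothetical stand-in that appends a comment line rather than editing YAML, so that the example has no dependencies; it is not the real `add_annotations_to_yaml`.

```shell
# Hypothetical stand-in for the real helper. It operates on the file it
# was given ("${yaml_file}"), not on the caller's "${K8S_TEST_YAML}".
add_marker_to_file() {
    local yaml_file="$1"
    printf '# annotated\n' >> "${yaml_file}"
}

K8S_TEST_YAML=$(mktemp)   # the caller's usual file
other_yaml=$(mktemp)      # a file from a different directory

add_marker_to_file "${other_yaml}"

# Only the file passed as the argument should have been modified.
target_lines=$(wc -l < "${other_yaml}")
caller_lines=$(wc -l < "${K8S_TEST_YAML}")
echo "target=${target_lines} caller=${caller_lines}"

rm -f "${K8S_TEST_YAML}" "${other_yaml}"
```

Had the function body referenced `${K8S_TEST_YAML}`, the marker would have landed in the caller's file regardless of the argument.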
Dan Mihai
4fc479cac9 tests: k8s-openvpn: runtime handler annotations
This test uses YAML files from a different directory than the other
k8s CI tests, so annotations have to be added into these separate
files.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-04-17 17:15:45 +00:00
Dan Mihai
7158148ab7 tests: k8s-openvpn: enable kata for init pod
Enable Kata for the init secrets pod of this test, to be consistent
with the other CI pods.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-04-17 17:10:09 +00:00
Fabiano Fidêncio
690f5a2b62 Merge pull request #12862 from fidencio/topic/runtime-rs-enable-measured-rootfs-tests
runtime-rs: enable measured rootfs for qemu-coco-dev-runtime-rs
2026-04-17 18:48:47 +02:00
Fabiano Fidêncio
1ec0e344e5 runtime-rs: enable measured rootfs for qemu-coco-dev-runtime-rs
Add kernel_verity_params to the qemu-coco-dev-runtime-rs configuration
so the runtime can assemble dm-verity kernel parameters, and remove the
test skip that was disabling measured rootfs tests for this hypervisor.

Fixes: #12851

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-17 15:22:17 +02:00
Fabiano Fidêncio
7205fd8579 tests: add integration tests for termination log via GetDiagnosticData
Add BATS tests for the GetDiagnosticData termination log feature on
CoCo platforms where shared_fs=none.

Three test cases cover:
- Successful exit (exit 0): termination message is propagated when
  GetDiagnosticDataRequest is allowed by policy.
- Failed exit (exit 1): termination message is propagated when
  GetDiagnosticDataRequest is allowed by policy.
- Policy denied: with default CoCo policy (GetDiagnosticDataRequest
  is false), the container stops cleanly but no termination message
  is propagated (best-effort behavior).

Tests are skipped on non-CoCo platforms where shared_fs is not "none".

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-17 13:16:25 +02:00
Steve Horsman
1db12f8ccf Merge pull request #12812 from stevenhorsman/tee-test-refactor
ci: Refactor confidential TEE support
2026-04-17 11:12:13 +01:00
Aurélien Bombo
5c7a886246 tests: make k8s-block-volume more robust
* Put the loop device creation code in the test itself, so that we get proper
  logs if that part fails (following other tests).

* Reuse the $node variable to fix the test on multi-node clusters.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-16 09:55:01 -05:00
Fabiano Fidêncio
88ce64819d Merge pull request #12726 from LandonTClipp/doc_annotations
docs: Add annotation config to doc site
2026-04-16 13:07:53 +02:00
stevenhorsman
ff246f9538 ci: Remove deploy_snapshotter
Snapshotter deployment is a no-op now that
kata-deploy handles this, so clean up this code.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00
stevenhorsman
fce6415865 tests: Use hypervisor helpers
Utilise the new hypervisor helpers in our CI and test
code to help add clarity and reduce duplication

Note: `kubernetes_dir` is declared as readonly in
tests/integration/kubernetes/setup.sh which is sourced
by tests_common.sh, so we update it to only be set if
unset

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00
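The readonly note in commit fce6415865 describes a common shell constraint: a variable declared `readonly` cannot be assigned again, so a script sourced later has to check before setting it. The following is a minimal sketch under that reading; the paths are illustrative, not the repository's actual values.

```shell
# Run in a subshell so the readonly declaration does not leak into the
# calling shell.
result=$(
    # Stand-in for setup.sh: declares the variable readonly.
    readonly kubernetes_dir="/repo/tests/integration/kubernetes"

    # Stand-in for a script sourced afterwards: assign only when the
    # variable is still unset, so the readonly value is never touched.
    if [ -z "${kubernetes_dir:-}" ]; then
        kubernetes_dir="/some/other/default"
    fi

    printf '%s' "${kubernetes_dir}"
)

echo "kubernetes_dir=${result}"
```

An unconditional `kubernetes_dir=...` in the second script would instead fail with a "readonly variable" error.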
stevenhorsman
2f3fec9727 tests: Add new hypervisor helper script
Add a pure shell script which the CI and integration tests can
use to check for different categories of runtime

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00
Aurélien Bombo
1602e04b2d ci: Change Azure region to eastus2
I'm doing some bookkeeping in the Azure subscription that requires we move
from eastus to eastus2. This should have no user-facing impact.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-15 14:37:13 -05:00
LandonTClipp
56cdfa831f docs: Add annotation config to doc site
Adding the pod annotation config to the doc site. A symlink is created
at docs/pod-annotations.md that points to
how-to/how-to-set-sandbox-config-kata.md so that the URL for this file will be
created at `/pod-annotations`. Also adding brief contributing guidelines and
how-to's for running the documentation site locally for local previews.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-04-15 14:48:01 +01:00
Fabiano Fidêncio
d29b77e953 tests: Update images used for signed tests
I've updated the images on the Confidential Containers side, in order to
add arm64 support, but I didn't realize it'd break tests not using
those.

Apologies!

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-15 12:11:37 +02:00
Fabiano Fidêncio
4c567a9c05 ci: Reduce TEE test scope for PR runs
TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full
test suite on every PR consumes these resources unnecessarily, since
most tests exercise what is already exercised by the -coco-dev CIs.

Introduce a `tee-test-scope` workflow input (small/full) and a new
`baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests
that are TEE-relevant: attestation tests (encrypted/authenticated/
signed image pull, confidential attestation) plus policy and trusted
ephemeral data storage tests.

PR runs default to "small" (12 tests), nightly runs use "full" (59
tests), and manual dispatch offers a dropdown to choose.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-13 20:26:46 +02:00
Manuel Huber
7daeb78b67 tests: nvidia: Enforce image signing for NIM test
Validate container image signatures for the NIM test using NVIDIA's
public signing key.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-11 09:22:50 +02:00
Fabiano Fidêncio
fd6375d8d5 Merge pull request #12806 from kata-containers/topic/ci-run-runtime-rs-on-SNP
ci: Run qemu-snp-runtime-rs tests in the CI
2026-04-10 11:01:20 +02:00
Fabiano Fidêncio
5e1ab0aa7d tests: Support runtime-rs QEMU cmdline format in attestation test
The k8s-confidential-attestation test extracts the QEMU command line
from journal logs to compute the SNP launch measurement. It only
matched the Go runtime's log format ("launching <path> with: [<args>]"),
but runtime-rs logs differently ("qemu args: <args>").

Handle both formats so the test works with qemu-snp-runtime-rs.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 16:35:08 +02:00
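Handling the two log formats quoted in commit 5e1ab0aa7d can be sketched with a single `sed` invocation. This is an illustration only, not the test's actual implementation: `extract_qemu_args` is a hypothetical helper, and the sample log lines (including the binary path) are made up.

```shell
# Extract the QEMU arguments from a log line in either format:
#   Go runtime:  launching <path> with: [<args>]
#   runtime-rs:  qemu args: <args>
extract_qemu_args() {
    sed -n \
        -e 's/.*launching .* with: \[\(.*\)].*/\1/p' \
        -e 's/.*qemu args: \(.*\)/\1/p'
}

# Made-up sample lines for illustration.
go_line='launching /opt/kata/bin/qemu with: [-name sandbox -m 2048M]'
rs_line='qemu args: -name sandbox -m 2048M'

go_args=$(printf '%s\n' "${go_line}" | extract_qemu_args)
rs_args=$(printf '%s\n' "${rs_line}" | extract_qemu_args)

echo "go: ${go_args}"
echo "rs: ${rs_args}"
```

Both calls yield the same argument string, so whatever consumes the arguments downstream does not need to know which runtime produced the log.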
Fabiano Fidêncio
3b155ab0b1 ci: Run runtime-rs tests for SNP
As we're in the process of stabilising runtime-rs for the coming 4.0.0
release, we'd better start running as many tests as possible with it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 16:35:08 +02:00
stevenhorsman
31f9a5461b versions: bump golang to 1.25.9
Bump the go version to resolve CVEs:
- GO-2026-4947
- GO-2026-4946
- GO-2026-4870
- GO-2026-4869
- GO-2026-4865
- GO-2026-4864

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-09 08:59:40 +01:00
Hyounggyu Choi
f15f7f49f1 Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt
runtime: qemu: Enable static sandbox resource management on ARM & s390x
2026-04-09 09:18:11 +02:00
Amanda Liem
79f844d057 runtime: SNP img-based rootfs with dm-verity
Follow-on to kata-containers/kata-containers#12396

Switch SNP config from initrd-based to image-based rootfs with
dm-verity. The runtime assembles the dm-mod.create kernel cmdline
from kernel_verity_params, and with kernel-hashes=on the root hash
is included in the SNP launch measurement.

Also add qemu-snp to the measured rootfs integration test.

Signed-off-by: Amanda Liem <aliem@amd.com>
2026-04-08 16:46:32 +00:00
Fabiano Fidêncio
e93bfbe01a tests: Remove qemu-coco-dev* skip from sandbox vCPU allocation test
With static_sandbox_resource_mgmt calculation fixed for runtime-rs, the
VM is correctly pre-sized at creation time. The vCPU allocation test no
longer depends on CPU hotplug, so the qemu-coco-dev* skip is no longer
needed.

Fixes: #10928

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
6bc2452664 tests: Remove aarch64 skip from sandbox vCPU allocation test
With static_sandbox_resource_mgmt now enabled for ARM on runtime-rs,
the VM is correctly pre-sized at creation time. The vCPU allocation
test no longer depends on CPU hotplug, so the aarch64 skip (issue
 #10928) is no longer needed.

Fixes: #10928

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 16:36:00 +02:00
Fabiano Fidêncio
b3ae6ef99c Merge pull request #12760 from fitzthum/bump-nvat
Bump trustee and guest-components to add nvswitch / ppcie support
2026-04-07 19:07:50 +02:00
Aurélien Bombo
79fab93041 Merge pull request #12779 from rophy/fix/strip-cr-from-tty-exec
tests: strip \r from kubectl exec output for TTY containers
2026-04-07 10:19:21 -05:00
Tobin Feldman-Fitzthum
e40abcf72d nvidia: add nvrc.smi.srs=1 to default nvidia kernel params
The attestation-agent no longer sets nvidia devices to ready
automatically. Instead, we should use nvrc for this. Since this is
required for all nvidia workloads, add it to the default nv kernel
params.

With bounce buffers, the timing of attesting a device versus setting it
to ready is not so important.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 14:28:50 +00:00
Tobin Feldman-Fitzthum
7385938c57 tests: fix default KBS Policy path
We recently moved the default policy in the Trustee repo. Now it's in
the same place as all the other policies. Update the test code to match.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 05:46:27 +00:00
Rophy Tsai
f7d9024249 tests: strip \r from kubectl exec output for TTY containers
The busybox-pod.yaml test fixture sets tty: true on the second
container. When a container has a TTY, kubectl exec may return
\r\n line endings. The invisible \r causes string comparisons
to fail:

  container_name=$(kubectl exec ... -- env | grep CONTAINER_NAME)
  [ "$container_name" == "CONTAINER_NAME=second-test-container" ]

This comparison fails because $container_name contains a trailing
\r character.

Fix by piping through tr -d '\r' after grep. This is harmless
when \r is absent and fixes the mismatch when present.

Fixes: #9136

Signed-off-by: Rophy Tsai <rophy@users.noreply.github.com>
2026-04-07 01:35:10 +00:00
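The failure and the fix described in commit f7d9024249 can be reproduced locally. In this sketch, `tty_env_output` is a hypothetical function that stands in for the `kubectl exec ... -- env` call and emits `\r\n` line endings, as a TTY container may.

```shell
# Stand-in for `kubectl exec ... -- env` on a container with tty: true.
tty_env_output() {
    printf 'PATH=/bin\r\nCONTAINER_NAME=second-test-container\r\n'
}

expected="CONTAINER_NAME=second-test-container"

# Without the fix: command substitution strips the trailing newline but
# keeps the carriage return.
raw=$(tty_env_output | grep CONTAINER_NAME)

# With the fix: tr deletes every \r before the comparison.
clean=$(tty_env_output | grep CONTAINER_NAME | tr -d '\r')

if [ "${raw}" = "${expected}" ]; then echo "raw matches"; else echo "raw differs"; fi
if [ "${clean}" = "${expected}" ]; then echo "clean matches"; else echo "clean differs"; fi
```

Only the `clean` value compares equal; the `raw` value differs by one invisible trailing character.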
Dan Mihai
9b770793ba Merge pull request #12728 from manuelh-dev/mahuber/empty-dir-fsgrou-policy
genpolicy: adjust GID after passwd GID handling and set fs_group for encrypted emptyDir volumes
2026-04-06 10:22:34 -07:00
Fabiano Fidêncio
1300145f7a tests: add k3s/rke2 to OCI 1.3.0 drop-in overlay condition
k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0
drop-in overlay. Move them from the separate OCI 1.2.1 branch into
the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx,
and custom container engine versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-06 18:50:20 +02:00
llink5
f7878cc385 runtime: fix Docker 26+ networking by rescanning after Start
Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).

Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.

Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
  field index, fixing parsing with optional mount tags (shared:,
  master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
  args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
  to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start

Fixes: #9340

Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 21:23:16 +02:00
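The mountinfo fix listed in commit f7878cc385 relies on the layout of `/proc/<pid>/mountinfo`: six fixed fields, then zero or more optional tags, then a lone `-` separator followed by the filesystem type. The runtime change itself is in Go; the sketch below only illustrates the parsing idea in shell, with `fs_type_of` as a hypothetical helper and sample lines modelled on the proc(5) manual page.

```shell
# Print the filesystem type from one mountinfo line read on stdin.
# The "-" separator is searched for instead of assuming a fixed index,
# because the number of optional tags before it varies.
fs_type_of() {
    awk '{
        for (i = 7; i <= NF; i++) {
            if ($i == "-") { print $(i + 1); exit }
        }
    }'
}

no_tags='36 35 98:0 /mnt1 /mnt2 rw,noatime - ext3 /dev/root rw,errors=continue'
one_tag='36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue'
two_tags='36 35 98:0 /mnt1 /mnt2 rw,noatime shared:2 master:1 - ext3 /dev/root rw,errors=continue'

fs_none=$(printf '%s\n' "${no_tags}" | fs_type_of)
fs_one=$(printf '%s\n' "${one_tag}" | fs_type_of)
fs_two=$(printf '%s\n' "${two_tags}" | fs_type_of)

echo "${fs_none} ${fs_one} ${fs_two}"
```

A parser that always read one fixed field would return the right value for only one of these three lines.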
Manuel Huber
dd868dee6d tests: nvidia: onboard NIM service test
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.

For now, the TEE-based flow uses an allow-all policy. In future
work, we strive to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
being generated on the fly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-02 16:58:54 +02:00
Manuel Huber
57e42b10f1 tests: nvidia: Do not use elevated privileges
Do not run the NIM containers with elevated privileges. Note that
using hostPath requires proper host folder permissions, and that
using emptyDir requires a proper fsGroup ID.
Once issue 11162 is resolved, we can further refine the securityContext
fields for the TEE manifests.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-01 10:23:26 -07:00
Manuel Huber
a762b136de tests: generate policy for pod-empty-dir-fsgroup
The logic in the k8s-empty-dirs.bats file did not add a security
policy for the pod-empty-dir-fsgroup.yaml manifest. With this change,
we add the policy annotation.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-01 10:23:26 -07:00
Fabiano Fidêncio
2131147360 tests: add kata-deploy lifecycle tests for restart resilience and cleanup
Add functional tests that cover two previously untested kata-deploy
behaviors:

1. Restart resilience (regression test for #12761): deploys a
   long-running kata pod, triggers a kata-deploy DaemonSet restart via
   rollout restart, and verifies the kata pod survives with the same
   UID and zero additional container restarts.

2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
   are removed, the kata-runtime node label is cleared, /opt/kata is
   gone from the host filesystem, and containerd remains healthy.

3. Artifact presence: after install, verifies /opt/kata and the shim
   binary exist on the host, RuntimeClasses are created, and the node
   is labeled.

Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:53 +02:00
Fabiano Fidêncio
8b9ce3b6cb tests: remove k3s/rke2 V3 containerd template workaround
Remove the workaround that wrote a synthetic containerd V3 config
template for k3s/rke2 in CI. This was added to test kata-deploy's
drop-in support before the upstream k3s/rke2 patch shipped. Now that
k3s and rke2 include the drop-in imports in their default template,
the workaround is no longer needed and breaks newer versions.

Removed:
- tests/containerd-config-v3.tmpl (synthetic Go template)
- _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers
- Calls from deploy_k3s() and deploy_rke2()

This reverts the test infrastructure part of a2216ec05.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00