Commit Graph

1086 Commits

Author SHA1 Message Date
Fabiano Fidêncio
59f487d7ab do-not-merge: tests/cri-containerd: temporarily use containerd fork with getRuncOptions fix
The cri-containerd integration tests fail with the shim sandboxer when
running non-runc runtimes (e.g. Kata). The root cause is a bug in
containerd's client/task.go: getRuncOptions() unconditionally tries to
unmarshal the container's stored runtimeOptions into containerd.runc.v1.Options,
but Kata containers store runtimeoptions.v1.Options. This causes:

  failed to create containerd task: failed to get runtime v2 options:
  can't unmarshal type "runtimeoptions.v1.Options" to output
  "containerd.runc.v1.Options"

A fix has been submitted upstream. Until it is merged and released,
clone containerd from the fork that carries the fix so that
`make cri-integration` (which builds and runs its own containerd daemon)
picks up the corrected binary.

TODO: revert once the fix is in an upstream containerd release and
versions.yaml is updated accordingly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-06 17:16:19 +01:00
Fabiano Fidêncio
1f9260d978 tests: exclude TestContainerRestart from the cri-containerd test list
Creating a new container in the same sandbox VM after the previous
container has exited and been removed has never been supported by
kata-containers (neither with the go-based nor the rust-based runtime).
When the last container is removed the kata VM shuts down, so any
attempt to start a new container in the same sandbox fails.

This test exercises a use-case kata does not currently support, and it
has never been part of the passing list for good reason.  Mark it
explicitly excluded with a comment so it is clear this is a deliberate
omission rather than an oversight.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-06 17:15:57 +01:00
Fabiano Fidêncio
458a64e9b9 tests: Use podsandbox sandboxer for the runc sanity check
The check_daemon_setup function verifies that containerd + runc are
functional before the real kata tests run. Using the shim sandboxer
for this runc check hits a known containerd bug where the OCI spec is
not populated before NewBundle is called, so config.json is never
written and containerd-shim-runc-v2 fails at startup.

See https://github.com/containerd/containerd/issues/11640

The sandboxer choice is irrelevant for this sanity check, so use
podsandbox which works correctly with runc.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-06 11:25:53 +01:00
Alex Lyn
a35dcf952e ci: Fix YAML parsing flakiness caused by mktemp random suffixes
In some CI runs, `mktemp` generates random characters that accidentally
form file extensions like `.cSV` or `.Xml`. This triggers downstream
parsing errors because the YAML content is misidentified as CSV/XML.
The issues look like as below:
```
'/tmp/bats-run-KodZEA/.../pod-guest-pull-in-trusted-storage.yaml.in.cSV':
...
```

This commit fixes the issue by:
1. Moving the `XXXXXX` placeholder before the `.yaml` extension.
2. Ensuring the generated file always ends in `.yaml`.

This prevents format misidentification while maintaining filename
uniqueness and security.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:21:29 +08:00
Fabiano Fidêncio
8f35c31b30 Merge pull request #12542 from fidencio/topic/genpolicy-distribute-different-settings-rather-than-patching-for-ci
genpolicy: settings.d drop-ins and scenario example drop-ins
2026-03-05 07:37:30 +01:00
Fabiano Fidêncio
b5e0a5b7d6 Merge pull request #12555 from fidencio/topic/tests-use-local-pv-pvc-for-policy-tests
k8s-policy-pvc: use local PV/PVC when no default StorageClass exists
2026-03-05 07:37:11 +01:00
Fabiano Fidêncio
a0b9d965e5 k8s-policy-pvc: use local PV/PVC when no default StorageClass exists
Create local block storage (loop device, StorageClass, PV) in the test
only when the cluster has no default StorageClass, matching the approach
used in k8s-volume.bats. Set our StorageClass as default so the PVC
binds to our PV; tear it down after the test.

When a default already exists (e.g. AKS), skip creation and cleanup so
we do not change the cluster's default storage class.

Fixes: #9846

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 21:50:51 +01:00
Fabiano Fidêncio
d40afe592c genpolicy: add settings drop-in directory and RFC 6902 JSON Patch support
Allow genpolicy -j to accept a directory instead of a single file.
When given a directory, genpolicy loads genpolicy-settings.json from it
and applies all genpolicy-settings.d/*.json files (sorted by name) as
RFC 6902 JSON Patches. This gives precise control over settings with
explicit operations (add, remove, replace, move, copy, test), including
array index manipulation and assertions.

Ship composable drop-in examples in drop-in-examples/:
- 10-* files set platform base settings (non-CoCo, AKS, CBL-Mariner)
- 20-* files overlay specific adjustments (OCI version, guest pull)
Users copy the combination they need into genpolicy-settings.d/.

Replace the old adapt_common_policy_settings_* jq-patching functions
in tests_common.sh with install_genpolicy_drop_ins(), which copies the
right combination of 10-* and 20-* drop-ins for the CI scenario.
Tests still generate 99-test-overrides.json on the fly for per-test
request/exec overrides.

Packaging installs 10-* and 20-* drop-ins from drop-in-examples/ into
the tarball; the default genpolicy-settings.d/ is left empty.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 20:13:21 +01:00
Dan Mihai
3f845af9d4 tests: k8s: basic test for subPathExpr
Add basic genpolicy test coverage for subPathExpr and corresponding
container mounts.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-03-04 16:28:29 +00:00
Fabiano Fidêncio
56c3618c1d tests: kata-deploy: wait for API recovery after uninstall
kata-deploy's SIGTERM cleanup restarts the CRI runtime, which on
k3s/rke2 takes down the API server temporarily. The helm uninstall
may complete with errors, and the next test suite would start with
a dead API. Add a wait loop after uninstall to ensure the API is
available before proceeding.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
9725df658f tests: k8s: policy: set OCI bundle 1.2.1 for k3s/rke2
k3s and rke2 use containerd that expects OCI bundle 1.2.1; otherwise
autogenerated policy tests fail. Add adapt_common_policy_settings_for_k3s_rke2
and call it from adapt_common_policy_settings when KUBERNETES is k3s or rke2.

Tested with k3s v1.34.4+k3s1, rke2 v1.34.4+rke2r1.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
stevenhorsman
66e58d6490 tests: Delete install_go.sh
Having a script to install go is legacy from Jenkins, so
delete it, so there is less code in our repo.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-27 12:42:43 +00:00
Hyounggyu Choi
be5ae7d1e1 Merge pull request #12573 from BbolroC/support-memory-hotplug-go-runtime-s390x
runtime: Support memory hotplug via virtio-mem on s390x
2026-02-27 09:59:40 +01:00
Aurélien Bombo
2a13f33d50 Merge pull request #12565 from microsoft/danmihai1/clh-51.1
versions: update cloud hypervisor to v51.1
2026-02-26 07:52:57 -06:00
Hyounggyu Choi
b1847f9598 tests: Run TestContainerMemoryUpdate() on s390x only with virtio-mem
Let's run `TestContainerMemoryUpdate` on s390x
only when virtio-mem is enabled.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-26 14:21:34 +01:00
Steve Horsman
da0ca483b0 Merge pull request #12572 from fitzthum/bump-trustee
versions: bump Trustee to latest version
2026-02-26 08:48:37 +00:00
Dan Mihai
2361dc7ca0 tests: k8s: reinstate testing on mariner hosts
Reinstate mariner host testing - including the Agent Policy tests on
these hosts - now that a new CLH version brought in the required fixes.

This reverts commit ea53779b90.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-25 21:01:25 +00:00
Tobin Feldman-Fitzthum
b4b5db2f1c tests: fixup SNP attestation test for new Trustee version
Trustee now returns the binary SNP TCB claims as hex rather than base64
(for consistency with other platforms). Fortunately, the sev-snp-measure
tool has a flag for setting the output type of the launch digest.

I think hex is the default, but let's keep the flag here to be explicit.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-02-25 09:57:36 -08:00
Steve Horsman
a655605e8f Merge pull request #12566 from manuelh-dev/mahuber/fail-exp-timeout
tests: Extend fail timeout for failure test
2026-02-25 16:11:53 +00:00
Hyounggyu Choi
3d71be3dd3 tests: Improve assertion handling for runtime-rs hypervisor
Since runtime-rs added support for virtio-blk-ccw on s390x in #12531,
the assertion in k8s-guest-pull-image.bats should be generalized
to apply to all hypervisors ending with `-runtime-rs`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-24 11:48:07 +01:00
Manuel Huber
566bb306f1 tests: enable policy for openvpn on nydus
Specify runAsUser, runAsGroup, supplementalGroups values embedded
in the image's /etc/group file explicitly in the security context.
With this, both genpolicy and containerd, which in case of using
nydus guest-pull, lack image introspection capabilities, use the
same values for user/group/additionalG IDs at policy generation
time and at runtime when the OCI spec is passed.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-24 08:08:15 +01:00
Fupan Li
0bfb6b3c45 Merge pull request #12531 from BbolroC/blkdev-hotplug-s390x-runtime-rs
runtime-rs: Support for block device hotplug on s390x
2026-02-24 13:03:59 +08:00
Fabiano Fidêncio
a0d954cf7c tests: Enable auto-generated policies for experimental_force_guest_pull
We want to run with auto-generated policies when using experimental_force_guest_pull.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-23 22:15:18 +01:00
Manuel Huber
e15c18f05c tests: Extend fail timeout for failure test
Extend the timeout for the assert_pod_fail function call for the
test case "Test we cannot pull a large image that pull time exceeds
createcontainer timeout inside the guest" when the experimental
force guest-pull method is being used. In this method, the image is
first pulled on the host before creating the pod sandbox. While
image pull times can suddenly spike, we already time out in the
assert_pod_fail function before the image is even pulled on the
host.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-23 12:56:23 -08:00
Hyounggyu Choi
4e533f82e7 tests: Remove skip condition for runtime-rs on s390x in k8s-block-volume
This commit removes the skip condition for qemu-runtime-rs on s390x
in k8s-block-volume.bats.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-23 09:00:29 +01:00
Fabiano Fidêncio
a6b7a2d8a4 tests: assert_pod_fail accept RunContainerError and StartError
Treat waiting.reason RunContainerError and terminated.reason StartError/Error
as container failure, so tests that expect guest image-pull failure (e.g.
wrong credentials) pass when the container fails with those states instead
of only BackOff.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
42d980815a tests: skip k8s-policy-pvc on non-AKS
Otherwise it'll fail as we cannot bind the device.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
1b9b53248e tests: k8s: coco: rely more on free runners
Run all CoCo non-TEE variants in a single job on the free runner with an
explicit environment matrix (vmm, snapshotter, pull_type, kbs,
containerd_version).

Here we're testing CoCo only with the "active" version of containerd.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
1fa3475e36 tests: k8s: rely more on free runners
We were running most of the k8s integration tests on AKS. The ones that
don't actually depend on AKS's environment now run on normal
ubuntu-24.04 GitHub runners instead: we bring up a kubeadm cluster
there, test with both containerd lts and active, and skip attestation
tests since those runtimes don't need them. AKS is left only for the
jobs that do depend on it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-21 08:44:47 +01:00
Zvonko Kaiser
6d1eaa1065 Merge pull request #12461 from manuelh-dev/mahuber/guest-pull-bats
tests: enable more scenarios for k8s-guest-pull-image.bats
2026-02-20 08:48:54 -05:00
Dan Mihai
ea53779b90 ci: k8s: temporarily disable mariner host
Disable mariner host testing in CI, and auto-generated policy testing
for the temporary replacements of these hosts (based on ubuntu), to work
around missing:

1. cloud-hypervisor/cloud-hypervisor@0a5e79a, that will allow Kata
   in the future to disable the nested property of guest VPs. Nested
   is enabled by default and doesn't work yet with mariner's MSHV.
2. cloud-hypervisor/cloud-hypervisor@bf6f0f8, exposed by the large
   ttrpc replies intentionally produced by the Kata CI Policy tests.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-19 20:42:50 +01:00
Manuel Huber
fd340ac91c tests: remove skips for some guest-pull scenarios
Issue 10838 is resolved by the prior commit, enabling the -m
option of the kernel build for confidential guests which are
not users of the measured rootfs, and by commit
976df22119, which ensures
relevant user space packages are present.
Not every confidential guest has the measured rootfs option
enabled. Every confidential guest is assumed to support CDH's
secure storage features, in contrast.

We also adjust test timeouts to account for occasional spikes on
our bare metal runners (e.g., SNP, TDX, s390x).

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-19 10:10:55 -08:00
Steve Horsman
6a67250397 Merge commit from fork
runtime-go/rs: Disable virtio-pmem for Cloud Hypervisor
2026-02-19 09:00:56 +00:00
Chiranjeevi Uddanti
88203cbf8d tests: Add regression test for sandbox_cgroup_only=false
Add unit test for get_ch_vcpu_tids() and integration test that creates
a pod with sandbox_cgroup_only=false to verify it starts successfully.

Signed-off-by: Chiranjeevi Uddanti <244287281+chiranjeevi-max@users.noreply.github.com>
Co-authored-by: Antigravity <antigravityagent@google.com>
2026-02-18 20:20:14 +01:00
Aurélien Bombo
8ff9cd1f12 Merge pull request #12455 from ajaypvictor/secret-cm-without-sharedfs
ci: Add integration tests for secret & configmap propagation
2026-02-18 12:06:48 -06:00
Aurélien Bombo
336b922d4f tests/cbl-mariner: Stop disabling NVDIMM explicitly
This is not needed anymore since now disable_image_nvdimm=true for
Cloud Hypervisor.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-18 11:52:51 -06:00
Dan Mihai
eee25095b5 tests: mariner annotations for k8s-openvpn
This test uses YAML files from a different directory than the other
k8s CI tests, so annotations have to be added into these separate
files.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-18 07:17:04 +01:00
Manuel Huber
d3742ca877 tests: enable guest pull bats for force guest pull
Similar to k8s-guest-pull-image-authenticated and to
k8s-guest-pull-image-signature, enabling k8s-guest-pull-image to
run against the experimental force guest pull method.
Only k8s-guest-pull-image-encrypted requires nydus.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-17 12:44:50 -08:00
Ajay Victor
83935e005c ci: Add integration tests for secret & configmap propagation
Enhance k8s-configmap.bats and k8s-credentials-secrets.bats to test that ConfigMap and Secret updates propagate to volume-mounted pods.

- Enhanced k8s-configmap.bats to test ConfigMap propagation
  * Added volume mount test for ConfigMap consumption
  * Added verification that ConfigMap updates propagate to volume-mounted pods

- Enhanced k8s-credentials-secrets.bats to test Secret propagation
  * Added verification that Secret updates propagate to volume-mounted pods

Fixes #8015

Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>
2026-02-14 08:56:21 +05:30
Fabiano Fidêncio
2930c68c0b ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats
This is a follow-up for 25962e9325

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 20:56:08 +01:00
Fabiano Fidêncio
8cb7d0be9d tests: nvidia: Fix genpolicy error when pulling nvcr.io images
genpolicy pulls image manifests from nvcr.io to generate policy and was
failing with 'UnauthorizedError' because it had no registry credentials.

Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential()
in registry.rs, which reads from DOCKER_CONFIG/config.json. Add
setup_genpolicy_registry_auth() to create a Docker config with nvcr.io
auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it
can authenticate when pulling manifests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 13:12:55 +01:00
Mikko Ylinen
25962e9325 tests/coco: disable k8s-sandbox-vcpus-allocation.bats for TDX
After the move to Linux 6.17 and QEMU 10.2 from Kata,
k8s-sandbox-vcpus-allocation.bats started failing on TDX.

2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created
2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created
2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created
2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met
2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits
2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits
2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating

A manual test without agent policies added it seems to work OK but disable
the test for now to get CI stable.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-11 22:02:59 +01:00
Hyounggyu Choi
c84e37f6ac Merge pull request #12486 from BbolroC/cpu-hotplug-s390x-runtime-rs
runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P
2026-02-11 09:40:21 +01:00
Hyounggyu Choi
67f54bdcb5 tests: Remove skip condition for runtime-rs on s390x in k8s-cpu-ns
This commit removes the skip condition for qemu-runtime-rs on s390x
in k8s-cpu-ns.bats.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-11 05:52:13 +01:00
Fabiano Fidêncio
5c0269881e tests: Make editorconfig-checker happy
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
  *.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
  checks project code
- Leave generated and binary assets unchanged (excluded from checker)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
cb652e0da1 tests: Update NVRC trace to use drop-in config mechanism
Update the enable_nvrc_trace() function to use the new drop-in
configuration mechanism instead of directly modifying the base
configuration file. The function now creates a 90-nvrc-trace.toml
drop-in file that properly combines existing kernel parameters
with the nvrc.log=trace setting.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Manuel Huber
525192832f tests: Clean up superfluous GPU annotation
This annotation was required for GPU cold-plug before using a
newer device plugin and before querying the pod resources API.
As this annotation is no longer required, cleaning it up.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-09 11:28:24 -08:00
Alex Lyn
3fda59e27d tests: rename pod_exec_with_retries to pod_exec and update callers
It will do following works in this commit:

(1) Rename pod_exec_with_retries() to pod_exec().
(2) Update implementation to call container_exec().
(3) Replace all usages of pod_exec_with_retries across tests
    with pod_exec.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
861d39305c tests: drop kubectl exec retries in container_exec
This commit aims to drop retries when kubectl exec a container:

(1) Rename container_exec_with_retries() to container_exec().
(2) Remove the retry loop and sleep backoff around kubectl exec.

Keep the same logging and container-selection logic and return
kubectl exec exit status directly.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Manuel Huber
cf7f340b39 tests: Read and overwrite kernel_verity_parameters
Read the kernel_verity_paramers from the shim config and adjust
the root hash for the negative test.
Further, improve some of the test logic by using shared
functions. This especially ensures we don't read the full
journalctl logs on a node but only the portion of the logs we are
actually supposed to look at.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00