kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 07:02:16 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	57c61e0c2f	tests: unskip hard-coded policy tests on qemu-tdx-runtime-rs Enable the hard-coded init-data policy test gate for qemu-tdx-runtime-rs so runtime-rs and Go TDX variants exercise the same Kubernetes policy coverage. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-06 22:48:20 +02:00
Fabiano Fidêncio	43321c7a78	Merge pull request #12931 from mythi/qemu-tdx-tests tests: fix TDX runtime-rs and initdata tests	2026-06-06 11:42:19 +02:00
Fabiano Fidêncio	f6ff9578d4	Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner ci: remove Mariner annotations and use new config	2026-06-05 20:22:58 +02:00
Mikko Ylinen	013e901f1b	tests: re-enable initdata tests for qemu-tdx The coco initdata tests signature verification and authenticated registry never worked on qemu-tdx and so they have been disabled since. Add them back now that all necessary fixes are in place. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-05 16:04:05 +03:00
Mikko Ylinen	9313e336b5	tests: set image.image_pull_proxy for CDH initdata initdata tests set kernel arguments to "" which resets the kernel arguments configured by Helm install. However, TDX runner depends on agent.https_proxy= kernel arguments to pull images. In order for initdata tests to work on TDX, the same needs to be added to CDH configuration via image.image_pull_proxy. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-05 16:04:05 +03:00
Mikko Ylinen	f3a0ef6a7c	tests: use kubectl set to configure KBS env No need to patch yamls locally. Also, set RUST_LOG=debug and enable https_proxy for all TDX targets when the runner has HTTPS_PROXY set. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-05 16:04:05 +03:00
Aurélien Bombo	de5333f275	ci: remove Mariner annotations and use new config This is a follow-up to #13126 where we forgot to remove this now-unused code. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-03 09:25:12 -05:00
Mikko Ylinen	018389cb22	tests: enable k8s-sandbox-vcpus-allocation.bats for tdx and coco-dev k8s-sandbox-vcpus-allocation.bats was disabled for qemu-tdx due to errors when moving to use "upstream" TDX KVM code. The failing test is vcpus-less-than-one-with-no-limits pod which ends up getting x86 default MaxCPU = 240 and erroring: Number of hotpluggable cpus requested (240) exceeds the maximum cpus supported by KVM (224) TDX max vcpus is capped to host's logical CPUs so 240 is too much. With the maxcpus logic fixed (=maxcpus not set at all) for configurations where confidential guest is enabled, qemu-tdx can be enabled for k8s-sandbox-vcpus-allocation.bats again. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-03 15:27:35 +03:00
Fabiano Fidêncio	230e01b04e	Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs runtime/runtime-rs: introduce Azure specific configs	2026-06-02 09:17:09 +02:00
Fabiano Fidêncio	81ce51a9aa	ci: target Azure CLH runtimes directly in AKS tests Switch AKS Mariner matrix entries to clh-azure handlers and remove the temporary host-OS based helm value overrides. Update integration test wiring and required test labels so CI tracks the new runtime names. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-28 23:32:37 +02:00
Manuel Huber	3e874d0eaf	tests: accept EROFS empty-image rootfs rejection The empty-image test expects pod creation to fail. With an EROFS snapshot that has a disk-backed rwlayer, runtime-rs can still reject that pod with the existing unsupported mount-count error. With default_size=0, there is no rwlayer mount. The same negative test can instead reach the bind rootfs shape produced for the empty active snapshot, which runtime-rs rejects as an unsupported rootfs mount. Accept both messages so the test covers the expected failure for both EROFS rwlayer modes. Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-27 17:12:20 +00:00
Manuel Huber	6a715cf4f7	tests: nvidia: No policy for runtime-rs path The current if condition causes agent security policies to be attached to the non-TEE NVIDIA runtime-rs runtime class. While this is good to see that it works, this is not intended. Thus, replacing the condition with is_confidential_gpu_hypervisor. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-25 16:00:49 -07:00
Fabiano Fidêncio	f763e9cca9	tests: Add NUMA topology / GPU placement tests to the NV CIs Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour on hosts where NUMA is configured by default (qemu-nvidia-gpu, qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx): 1. Multi-node sandbox (large workload spanning all host NUMA nodes): - Guest NUMA node count matches host - Guest vCPU distribution is balanced across nodes (max-min <= 1) - Guest memory is distributed across NUMA nodes - Host-side vCPU pinning is balanced across NUMA nodes 2. Right-sized single-node sandbox (small workload fitting one node): - Guest collapses to a single NUMA node - All host vCPU threads pinned to that one NUMA node 3. GPU passthrough with VFIO, multi-node: - Guest NUMA topology is balanced (same as test 1) - Guest GPU's NUMA node matches the host GPU's NUMA node (resolved via the vfio-pci,host=<BDF> from the QEMU command line and /sys/bus/pci/devices/<BDF>/numa_node) - QEMU command line contains pxb-pcie and policy=bind - Host vCPU pinning is balanced 4. GPU passthrough with VFIO, right-sized single-node: small workload plus GPU that fits in a single host NUMA node: - Guest collapses to a single NUMA node - The chosen node is the GPU's host NUMA node, not just any node that fits — verified by matching host-nodes= in the memory backend and pxb-pcie numa_node= against the GPU's host node - Guest GPU reports the same NUMA node as the host GPU 5. Explicit numa_mapping in the runtime TOML (QEMU-only): - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the auto-derive + right-sizing path is bypassed entirely - Guest sees exactly 1 NUMA node - QEMU memory backend is bound to host node 1 (host-nodes=1, policy=bind), not host node 0 - Host-side vCPU threads land on host node 1 - Drop-in is removed on teardown so subsequent tests are unaffected Guest-side checks use a dedicated container image (quay.io/kata-containers/numa) that reads sysfs and prints results to stdout — no kubectl exec or CoCo policy overrides needed. Host-side checks (crictl, pgrep, taskset) run directly on the host via sudo; a standalone numa-pinning-check.sh script handles the vCPU thread affinity inspection. The config.d/ helpers used by test 5 are runtime-agnostic (probe Go vs runtime-rs layout on disk) but the test is gated to qemu-* shims since runtime-rs does not yet implement NUMA. Skips cleanly on single-NUMA hosts, unsupported hypervisors, or when no nvidia.com/pgpu resources are available (GPU tests only). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-24 22:00:46 +02:00
Alex Lyn	adf6d43e24	test: skip TestContainerMemoryUpdate for sandbox api Temporarily skip the `TestContainerMemoryUpdate` test case for sandbox api. This test case is currently skipped in other VMMs (e.g., QEMU, Cloud-Hypervisor) due to known issues and environmental stability concerns. To maintain consistency across the project, we are skipping it for sandbox as well. A follow-up PR will be dedicated to addressing these issues and properly enabling/refining this test case for all VMMs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:46:44 +08:00
Alex Lyn	b5349f4d78	versions: bump containerd to 2.3 for sandbox API tests containerd 2.3 requires Go 1.26.3, but Kata still pins Go 1.25.10. Use Go 1.26.3 for the sandbox-api job so that make cri-integration can build containerd from source. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:46:16 +08:00
Alex Lyn	9f78dc687f	tests: exclude TestContainerRestart from the cri-containerd test list Creating a new container in the same sandbox VM after the previous container has exited and been removed has never been supported by kata-containers (neither with the go-based nor the rust-based runtime). When the last container is removed the kata VM shuts down, so any attempt to start a new container in the same sandbox fails. This test exercises a use-case kata does not currently support, and it has never been part of the passing list for good reason. Mark it explicitly excluded with a comment so it is clear this is a deliberate omission rather than an oversight. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:45:50 +08:00
Alex Lyn	a7739579d6	tests: Use podsandbox sandboxer for the runc sanity check The check_daemon_setup function verifies that containerd + runc are functional before the real kata tests run. Using the shim sandboxer for this runc check hits a known containerd bug where the OCI spec is not populated before NewBundle is called, so config.json is never written and containerd-shim-runc-v2 fails at startup. See containerd/containerd#11640 The sandboxer choice is irrelevant for this sanity check, so use podsandbox which works correctly with runc. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:44:38 +08:00
Fabiano Fidêncio	05f836ea23	Merge pull request #13038 from stevenhorsman/move-k8s-measured-rootfs ci: Move measure-rootfs to run on TEE PRs	2026-05-18 17:29:25 +02:00
Hyounggyu Choi	f6fce19e01	Merge pull request #13062 from BbolroC/skip-coco-test-with-no-reference-values-ibm-sel test: skip CDH resource test for qemu-se without reference values	2026-05-18 14:47:50 +02:00
Hyounggyu Choi	540986bc8f	test: skip CDH resource test for qemu-se without reference values Since gc and trustee were bumped (#13046), the test "Cannot get CDH resource when affirming policy is set without reference values" has started failing for IBM SEL. The attestation policy for IBM SEL returns an "affirming" result whenever the claim can be parsed successfully, meaning the evidence verification succeeds. As a result, the negative test above always produces a positive result. Skip this negative test for IBM SEL environments (e.g. qemu-se*). Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-05-18 08:40:16 +02:00
Dan Mihai	ddc36060d2	gha: k8s: reject unsupported KATA_HYPERVISOR values Exit early with an error message instead of starting kata-deploy if the value of KATA_HYPERVISOR is not expected during CI. For example: "cloud-hypervisor" was renamed recently to "clh-runtime-rs" and user scripts depending on the old name were getting tangled in kata-deploy instead of just rejecting the old value quickly. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-16 01:04:31 +00:00
Dan Mihai	0f3df5d1e4	Merge pull request #13025 from manuelh-dev/mahuber/img-pull-policy tests: generate guest-pull image pull agent security policies	2026-05-15 14:09:00 -07:00
Fabiano Fidêncio	c19bdbf23b	tests: nvidia-nim: use trusted storage templates for runtime-rs Now that runtime-rs supports block-encrypted emptyDir volumes, remove the no-trusted-storage workaround templates and the is_runtime_rs branching in the NIM test. Runtime-rs now uses the same TEE templates as the Go runtime with emptyDir + PVC at 48Gi memory, instead of the 128Gi workaround that compensated for lacking trusted storage. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Fabiano Fidêncio	54aaa1ea2a	tests: enable trusted ephemeral storage for runtime-rs Remove the runtime-rs skip from the trusted ephemeral data storage test now that runtime-rs implements block-encrypted emptyDir volumes. Also remove the genpolicy drop-in that disabled encrypted_emptydir for runtime-rs and the corresponding copy logic in tests_common.sh. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Manuel Huber	ed4233bf91	rootfs: cdh: Update CDH to new version Update CDH to a newer version and: - adjust the NVIDIA root filesystem build to reflect the change from using libcryptsetup to using the cryptsetup binary. - adjust image-pull test cases to conduct parallel write operations on the /dev/trusted_store backed guest image pull location since issue #12721 has been solved on CDH side. Fixes #12721 Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-13 20:20:45 +02:00
stevenhorsman	5c55726d11	tests/k8s: Update measured-rootfs image Try and switch the docker nginx image to our versions.yaml one so we avoid rate limit issues Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-13 17:17:48 +01:00
stevenhorsman	2870f7c2dd	ci: Move measure-rootfs to run on TEE PRs k8s-measured-rootfs only runs on confidential runtime, so we should move it into the subset of tests that run on TEEs Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-13 17:01:50 +01:00
Dan Mihai	3799473041	Merge pull request #13010 from microsoft/danmihai1/label-references genpolicy: support env variable values sourced from metadata.labels values	2026-05-12 15:41:11 -07:00
Manuel Huber	da4307efb7	tests: generate policies for guest-pull images Replace guest-pull image allow-all placeholders with explicit auto-generated policies for each generated pod manifest. Generate policy after the final YAML edits so initdata and image pull secrets are represented in the policy inputs. Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-12 15:03:15 -07:00
Manuel Huber	6a274b5110	tests: seed auto-generated policy from initdata Teach auto_generate_policy to reuse a cc_init_data annotation by decoding it into the temporary default-initdata.toml file. This lets tests preserve CDH initdata while genpolicy appends the generated agent security policy for the workload. Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-12 15:03:14 -07:00
Manuel Huber	e774b13c95	tests: share genpolicy registry auth setup helper Move the Docker auth setup into common.bash so tests beyond the NVIDIA runner can provide credentials for genpolicy image pulls. Make the registry, username, password and output directory explicit while preserving the nvcr.io setup used by the NIM tests. Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-12 15:03:14 -07:00
Manuel Huber	c265e4905f	tests: nvidia: avoid NIM journal dumps on success BATS_TEST_COMPLETED is per-test and remains empty in teardown_file. Track file-level state so successful NIM runs skip the journal dump while setup or test failures still include node diagnostics. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-10 09:10:01 -07:00
Manuel Huber	1c081ff434	tests: nvidia: place NIM service into namespace Place the NIM service into our test namespace. We are still observing various situations where for some reasons, the NIM service appears in the default namespace in our CI. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-10 07:36:23 +00:00
Fabiano Fidêncio	f7be57efe2	Merge pull request #13007 from manuelh-dev/mahuber/dbg-nim-svc tests: nvidia: Wait for NIM operator pod and print	2026-05-08 20:58:51 +02:00
Manuel Huber	714adec3f8	tests: nvidia: Wait for NIM operator pod and print Wait for the NIM operator pod to run before deploying NIM services. Add a temporary debug function to print resource placement into the different namespaces. Remove this function again when the NIM tests are stabilized. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-08 06:27:48 +00:00
Ubuntu	b95be5332a	genpolicy: env variables from metadata.labels Add basic genpolicy support for container environment variables sourced from metadata.labels. In this implementation, the relevant labels must be available as input to the policy tool. This is slightly different from the way variables sourced from metadata.annotations are treated by the tool: when the relevant annotation is not available as input, the generated Policy allows any value. Depending on metadata.labels use cases that we might encounter maybe the labels will be handled the same way as the annotations in the future. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-07 23:35:56 +00:00
Dan Mihai	39b9c318e2	tests: k8s: merge two policy-pod test cases One of these test cases was a subset of the other, so remove that redundancy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-07 22:39:23 +00:00
Fabiano Fidêncio	0f3160276b	ci: k8s: skip no-op Helm uninstall on free runners In cleanup_kata_deploy, bail out early when no kata-deploy Helm release exists so baremetal-* pre-deploy cleanup on fresh clusters does not block on helm uninstall --wait (up to 10m). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-07 13:40:55 +02:00
Fabiano Fidêncio	19c194aa94	ci: Add runtime-rs GPU shims to NVIDIA GPU CI workflow Add qemu-nvidia-gpu-runtime-rs and qemu-nvidia-gpu-snp-runtime-rs to the NVIDIA GPU test matrix so CI covers the new runtime-rs shims. Introduce a `coco` boolean field in each matrix entry and use it for all CoCo-related conditionals (KBS, snapshotter, KBS deploy/cleanup steps). This replaces fragile name-string comparisons that were already broken for the runtime-rs variants: `nvidia-gpu (runtime-rs)` was incorrectly getting KBS steps, and `nvidia-gpu-snp (runtime-rs)` was not getting the right env vars. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-07 10:33:26 +02:00
Dan Mihai	fcee4864e7	genpolicy: ignore additional PodAffinity fields 1. Ignore PodAffinity's preferredDuringSchedulingIgnoredDuringExecution. 2. Ignore additional PodAffinityTerm fields. 3. Add basic tests for the new fields. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-06 01:38:02 +00:00
Dan Mihai	b6349f50ab	genpolicy: ignore preemptionPolicy Ignore the pod preemptionPolicy field from input YAML - irrelevant for building the Policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-06 00:35:27 +00:00
Dan Mihai	9f4a7a9d55	Merge pull request #12978 from microsoft/danmihai1/empty-env-var genpolicy: support empty environment variables	2026-05-05 14:10:35 -07:00
Dan Mihai	99dd897814	genpolicy: support empty environment variables K8s supports them, so genpolicy should support them too. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-05 18:53:25 +00:00
Fabiano Fidêncio	29e63c21a1	tests: k8s-cron-job: set runtimeClassName to kata The cron-job test workload was missing `runtimeClassName: kata`, which meant the cron job was not actually being executed under the Kata runtime, defeating the purpose of the test. Set it explicitly, consistent with the sibling `job.yaml` workload. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-05 11:21:05 +02:00
Dan Mihai	0a6dc2fae0	ci: mariner: use OCI version 1.2.1 Mariner moved from version 1.2.0 to version 1.2.1. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-05-05 02:23:30 +00:00
Fabiano Fidêncio	8c3c7aa871	ci: Drop ITA_KEY usage from CI workflows The ITA_KEY secret was conditionally passed to TDX jobs for Intel Trust Authority attestation, but it is no longer needed. Remove it from all workflow files and the test helper export. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-03 18:05:51 +02:00
Aurélien Bombo	f3dc71a770	Revert "tests: k8s: policy: improve settings selection for runtime-rs hypervisors" This reverts commit `cafdd278ba`.	2026-04-28 10:58:01 -05:00
Aurélien Bombo	e4fbddb91a	ci: rename cloud-hypervisor to clh-runtime-rs This aligns on qemu-runtime-rs and makes more sense. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-28 10:58:01 -05:00
Saul Paredes	7c8df3b9e6	Revert "test: temp skip failing tests on AKS" This reverts commit `90e94ab305`.	2026-04-27 09:36:51 -07:00
Saul Paredes	3273c4e1cc	Revert "ci: Skip tests not working with k8s 1.36.0" This reverts commit `df68536cd6`.	2026-04-27 08:08:27 -07:00


1221 Commits