kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-05-14 11:03:31 +00:00

Author	SHA1	Message	Date
stevenhorsman	5c618dc8e2	tests: Switch nginx images to use version.yaml details - Swap out the hard-coded nginx registry and versions for reading the test image details from version.yaml, which can also ensure that the quay.io mirror is used rather than the Docker Hub versions, which can hit pull limits - Try setting imagePullPolicy Always to fix issues with the arm CI Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-02 10:04:09 +01:00
Manuel Huber	5a5c43429e	ci: nvidia: remove kubectl_retry calls When tests regress, the CI wait time can increase significantly with the current kubectl_retry attempt logic. Thus, align with other tests and remove kubectl_retry invocations. Instead, rely on proper timeouts. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-28 19:00:57 +01:00
Alex Lyn	4e450691f4	tests: Unify nydus configuration to containerd v3 schema Containerd configuration syntax (`config.toml`) varies across versions, requiring per-version logic for fields like `runtime`. However, testing confirms that containerd LTS (1.7.x) and newer versions fully support the v3 schema for the nydus remote snapshotter. This commit changes the previous containerd v1 settings in `config.toml`. Instead, it introduces a unified v3-style configuration for nydus, which is valid for both LTS and active containerd versions. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-26 17:58:16 +08:00
Alex Lyn	ebe084e093	Merge pull request #12122 from fidencio/topic/configs-do-no-have-commented-out-options runtimes: config: Do NOT have commented fields	2025-11-26 10:33:32 +08:00
Fabiano Fidêncio	e859537c74	runtimes: config: Do NOT have commented fields In order to have a better way to set things up using a toml editor, we should take the containerd approach and actually have everything uncommented. This will help us to unify how we deal with such values in the future from the kata-deploy POV. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-25 19:26:56 +01:00
Manuel Huber	331515e1b8	ci: enable security policy for openvpn test With issue 11777 being resolved, this commit enables openvpn policy testing. The remaining work on the security policy required to successfully run this test case was to enable UDP ports for Service kinds and to use the mount path's last component instead of the volume name to construct the expected storage source path. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-23 17:23:43 +00:00
Manuel Huber	dfc229f51e	tests: nvidia: cc: Remove nvrc.smi.srs=1 parameter Remove the nvrc.smi.srs=1 parameter from the kernel command line. In CC use cases, the attestation agent is expected to set the GPU ready state. For the CUDA vectorAdd case where attestation agent is not being used, we set the ready state by adding the kernel command line parameter through an annotation. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:35:05 +01:00
Manuel Huber	6c6fc50aa5	tests: nvidia: cc: allow-all policy and init-data Add an allow-all policy for the CC GPU tests and ensure the init-data device is being created (hypervisor annotations). Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Manuel Huber	7e20118c8e	tests: nvidia: move secret definitions to bottom The add_allow_all_policy_to_yaml in tests_common.sh needs some improvements so that this function can support pod manifests with different resource kinds. For now, moving the Secret definition to the bottom so that we can create a default policy for the Pod. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Manuel Huber	ffd5443637	tests: nvidia: adapt is_aks_cluster The qemu-nvidia-gpu handlers should not cause is_aks_cluster to return 1. Otherwise, CI logic will assume these hypervisors run on AKS hosts, see the following message in CI w/o this change: INFO: Adapting common policy settings for AKS Hosts Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Manuel Huber	f2bdd12e5e	tests: nvidia: Check KATA_HYPERVISOR var Fail explicitly when a wrong KATA_HYPERVISOR variable is provided. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Fabiano Fidêncio	6b40b59861	tests: Reduce KBS deployment check flakiness We currently start a pod that does a `wget` to the KBS address, and fails after 5 seconds. By the time it fails and reports back, we can see that KBS is actually running, but the workflow failed as the checker failed. :-/ Let's give it more time for the KBS to show up, and the flakiness should go away. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-20 19:29:26 +01:00
Fabiano Fidêncio	35672ec5ee	tests: cc: Test authenticated images with force guest pull As this should simply work. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-20 19:02:15 +01:00
Manuel Huber	477ca3980b	tests: nvidia: cc: Re-enable multi GPU test case Use the pod name variable so that kubectl wait finds the pod. Currently, kubectl waits for nvidia-nim-llama-3-2-nv-embedqa-1b-v2, not for nvidia-nim-llama-3-2-nv-embedqa-1b-v2-tee Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-20 10:05:46 +01:00
Fabiano Fidêncio	ae463642ed	tests: k8s: Fix typo in authenticated tests The person who introduced the check, someone named Fabiano Fidêncio, forgot a `$` in a variable assignment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-19 11:59:59 +01:00
Alex Lyn	1da225efc5	tests: Enable AUTO_GENERATE_POLICY for qemu-coco-dev-runtime-rs Enable auto-generate policy on cbl-mariner Hosts for qemu-coco-dev-runtime-rs if the user didn't specify an AUTO_GENERATE_POLICY value. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-19 10:44:03 +08:00
Fabiano Fidêncio	8c02b5b913	tests: nvidia: cc: Temporarily skip multi GPU for nim tests We will re-enable this one later on once the changes to properly cold plug multi GPUs are merged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	94ed4051b0	tests: nvidia: cc: Increase RAM for NIM pods Those need to pull the models inside the guest, and the guest has 50% of its memory "allowed" to be used as tmpfs, so, we gotta use the RAM that we have. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	e5062a056e	tests: nvidia: cc: Adjust timeouts on NIM pods Timeout increases for confidential computing slowness: * livenessProbe: * initialDelaySeconds: 15 → 120 seconds * timeoutSeconds: 1 → 10 seconds * failureThreshold: 3 → 10 * readinessProbe: * initialDelaySeconds: 15 → 120 seconds * timeoutSeconds: 1 → 10 seconds * failureThreshold: 3 → 10 * startupProbe: * initialDelaySeconds: 40 → 180 seconds * timeoutSeconds: 1 → 10 seconds * failureThreshold: 180 → 300 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	6be43b2308	tests: nvidia: Retry kubectl commands As with CoCo some of the commands may take longer, way longer than expected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	bb5bf6b864	tests: nvidia: nims: Use the current auths format for KBS We cannot use the same format used for docker, as it includes username and password, while what's expected when using Trustee does not. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	92da54c088	tests: nvidia: cc: Enable NIM tests Now that we've bumped Trustee to a version that supports the NVIDIA remote verifier, let's re-enable the tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	8eca0814bd	tests: Run authenticated tests with experimental_force_guest_pull As it should be supported. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 14:46:48 +01:00
Fabiano Fidêncio	75996945aa	kata-deploy: try-kata-values.yaml -> values.yaml This makes the user experience better, as the admin can deploy Kata Containers without having to download / set up any additional file. Of course, if the admin wants something more specific, examples are provided. Tests and documentation are updated to reflect this change. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-17 12:16:17 +01:00
Fabiano Fidêncio	2e000129a9	kata-deploy: tests: Add example values files for easy Kata deployment Add three example values files to make it easier for users to try out different Kata Containers configurations: - try-kata.values.yaml: Enables all available shims - try-kata-tee.values.yaml: Enables only TEE/confidential computing shims - try-kata-nvidia-gpu.values.yaml: Enables only NVIDIA GPU shims These files use the new structured configuration format and serve as ready-to-use examples for common deployment scenarios. Also update the README.md to document these example files and how to use them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	8717312599	tests: Migrate helm_helper to use new structured configuration Update the helm_helper function in gha-run-k8s-common.sh to use the new structured configuration format instead of the legacy env.* format. All possible settings have been migrated to the structured format: - HELM_DEBUG now sets root-level 'debug' boolean - HELM_SHIMS now enables shims in structured format with automatic architecture detection based on shim name - HELM_DEFAULT_SHIM now sets per-architecture defaultShim mapping - HELM_EXPERIMENTAL_SETUP_SNAPSHOTTER now sets snapshotter.setup array - HELM_ALLOWED_HYPERVISOR_ANNOTATIONS now sets per-shim allowedHypervisorAnnotations - HELM_SNAPSHOTTER_HANDLER_MAPPING now sets per-shim containerd.snapshotter - HELM_AGENT_HTTPS_PROXY and HELM_AGENT_NO_PROXY now set per-shim agent proxy settings - HELM_PULL_TYPE_MAPPING now sets per-shim forceGuestPull/guestPull settings - HELM_EXPERIMENTAL_FORCE_GUEST_PULL now sets per-shim forceGuestPull/guestPull The test helper automatically determines supported architectures for each shim (e.g., qemu-se supports s390x, qemu-cca supports arm64, qemu-snp/qemu-tdx support amd64, etc.) and applies per-shim settings to the appropriate shims based on HELM_SHIMS. Only HELM_HOST_OS remains in legacy env.* format as it doesn't have a structured equivalent yet. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Dan Mihai	5cc1024936	ci: k8s: AUTO_GENERATE_POLICY for coco-dev Re-enable AUTO_GENERATE_POLICY for coco-dev Hosts, unless PULL_TYPE is "experimental-force-guest-pull", or the caller specified a different value for AUTO_GENERATE_POLICY. Auto-generated Policy has been disabled accidentally and recently for these Hosts, by a GHA workflow change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-14 15:53:34 +01:00
stevenhorsman	b7abcc4c37	tests: Fix wildcard skip in k8s-cpu-ns The formatting wasn't quite right, so the `qemu-coco-dev-runtime-rs` hypervisor wasn't skipping this test Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:21:05 +00:00
stevenhorsman	b51af53bc7	tests/k8s: call teardown_common in some policy tests The teardown_common will print the description of the running pods, kill them all and print the system's syslogs afterwards. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	0335012824	tests/k8s: Enable tests for qemu-coco-dev-runtime-rs Add the runtime class to the non-tee tests and enable it to run in the test code Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	a1ddd2c3dd	kata-deploy: Add kata-qemu-coco-dev-runtime-rs runtime class Add the runtime class and shim references for the new non-tee runtime-rs class Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
Hyounggyu Choi	2dec247a54	Merge pull request #12038 from lifupan/fix_smaller-memeory runtime-rs: fix the issue of hot-unplug memory smaller	2025-11-12 11:22:04 +01:00
Zvonko Kaiser	d783e59b42	Merge pull request #12055 from fidencio/topic/coco-bump-trustee versions: Bump Trustee	2025-11-12 02:48:16 -05:00
Zvonko Kaiser	76e4e6bc24	Merge pull request #12061 from Apokleos/correct-unexpected-cap tests: Correct unexpected capability for policy failure test	2025-11-11 12:20:33 -05:00
Fabiano Fidêncio	d82eb8d0f1	ci: Drop docker tests We have had those tests broken for months. It's time to get rid of those. NOTE that we could easily revert this commit and re-add those tests as soon as we find someone to maintain and be responsible for such integration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 17:02:02 +01:00
Fabiano Fidêncio	2d2b0de160	tests: kbs: Try to get the pod logs on deployment failure As this helps immensely to figure out what went wrong with the deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:24 +01:00
Fabiano Fidêncio	58df06d90e	versions: Bump Trustee This is a bump pre-release, which brings several fixes and some improvements related to initData, and NVIDIA's remote verifier. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:05 +01:00
Alex Lyn	c225cba0e6	tests: Correct unexpected capability for policy failure test The test case designed to verify policy failures due to an "unexpected capability" was misconfigured. It was using "CAP_SYS_CHROOT" as the unexpected capability to be added. This configuration was flawed for two main reasons: 1. Incorrect Syntax: Kubernetes Pod specs expect capability names without the "CAP_" prefix (e.g., "SYS_CHROOT", not "CAP_SYS_CHROOT"). This made the test case's premise incorrect from a K8s API perspective. 2. Part of Default Set: "SYS_CHROOT" is already included in the `default_caps` list for a standard container. Therefore, adding it would not trigger a policy violation, defeating the purpose of the "unexpected capability" test. Furthermore, a related issue was observed where a malformed capability like "CAP_CAP_SYS_CHROOT" was being generated, causing parsing failures in the `oci-spec-rs` library. This was a symptom of incorrect string manipulation when handling capabilities. This commit corrects the test by selecting "SYS_NICE" as the unexpected capability. "SYS_NICE" is a more suitable choice because: - It is a valid Linux capability. - It is relatively harmless. - It is not part of the default capability set defined in `genpolicy-settings.json`. By using "SYS_NICE", the test now accurately simulates a scenario where a Pod requests a legitimate but non-default capability, which the policy (generated from a baseline Pod without this capability) should correctly reject. This change fixes the test's logic and also resolves the downstream `oci-spec-rs` parsing error by ensuring only valid capability names are processed. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-11 14:06:30 +08:00
Fabiano Fidêncio	92226d0a19	tests: nvidia: Be prepared for TDX Thankfully there's only one piece that's still SNP specific (for the supported TEEs). Let's adjust it so we can have an easy and smooth execution when adding a TDX CI machine. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	4d314e8676	tests: nvidia: nims: Adjust to CC There are several changes needed in order to get this test working with CC, and yet we still are skipping it. Basically, we need to: * Pull an authenticated image inside the guest, which requires: * Using Trustee to release the credential * We still depend on a PR to be merged on Trustee side * https://github.com/confidential-containers/trustee/pull/1035 * We still depend on a Trustee bump (including the PR above) on our side Apart from those changes, I ended up "duplicating" the tests by adding a "-tee" version of those, which already have: * The proper kbs annotations set up * Dropped host mounts * Increases the memory needed Last but not least, as "bats" probably means "being a terrible script", I had to re-arrange a few things otherwise the tests would not even run due to bats-isms that I am sincerely not able to pin-point. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	8cedd96d54	tests: nvidia: k8s: Enforce experimental_force_guest_pull We added the tests using virtio-9p as we knew it'd require incremental changes to be able to use any kind of guest-pull method. Now, as in the coming commits we'll be actually ensuring that guest-pull works and is in use, we can enforce the experimental_force_guest_pull usage for the nvidia cases. Note: We're using experimental_force_guest_pull instead of nydus-snapshotter due to stability concerns with the snapshotter. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	e85cf83573	k8s: tests: Fix default for EXPERIMENTAL_FORCE_GUEST_PULL It takes either a shim name or "", but we were treating this (thankfully only in this specific file) as a boolean. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Manuel Huber	8b39468b36	tests: nvidia: Logging for NIM Adjust output to the setup_file and teardown_file behavior. With this, we will be able to observe relevant logging rather than adding to the output variable. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	812191c1f3	tests: nvidia: Do not deploy NFD on nvidia-gpu cases As it'll come from the GPU Operator for now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Dan Mihai	df7ee2dd38	ci: k8s: AUTO_GENERATE_POLICY for cbl-mariner Auto-generate policy on cbl-mariner Hosts if the user didn't specify an AUTO_GENERATE_POLICY value. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	53acb74f26	genpolicy: adapt to new AKS pause container behavior The new image reference has changed to mcr.microsoft.com/oss/v2/kubernetes/pause:3.6 from mcr.microsoft.com/oss/kubernetes/pause:3.6. The new image uses UID=0, GID=0 by default, while the older image had UID=65535, GID=65535. There is a new pause_container_id_policy field in genpolicy-settings.json, informing genpolicy about the way AdditionalGids gets updated - "v1" for the older behavior and "v2" for the newer AKS version: - When using v1, the default value of AdditionalGids is {65535}. - When using v2, the default value of AdditionalGids is {}. UID=65535 and GID=65535 are still hard-coded by default in genpolicy-settings.json. We might be able to remove/ignore these fields in the future, if we'll stop relying on policy::KataSpec::get_process_fields to use these fields. A new CI function adapt_common_policy_settings_for_aks() changes the pause container UID, GID, pause_container_id_policy, and image ref settings values when testing on AKS Hosts - i.e., when testing coco-dev or mariner Hosts. The genpolicy workarounds for the unexpected behavior with guest pull enabled have been improved to use the current container's GID instead of hard-coding GID=0 as the guest pull default. Also, AdditionalGids gets updated when the current container's GID is changing, instead of always changing the AdditionalGids at the very end of policy::AgentPolicy::get_container_process(), when the relevant evolution of the GID value was no longer available. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	cacd37ee6e	tests: genpolicy: restore test settings for non-Coco configMap These settings got broken recently because the non-CoCo tests were disabled for unrelated reasons. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Manuel Huber	c6dc176a03	tests: nvidia: cc: Enable NIMs tests Same deal as the previous commit, just enabling the tests here, with the same list of improvements that we will need to go through in order to get it working in a perfect way. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	8ca77f2655	tests: nvidia: cc: Run CUDA vectorAdd tests on CC mode While the primary goal of this change is to detect regressions to the NVIDIA SNP GPU scenario, various improvements to reflect a more realistic CC setting are planned in subsequent changes, such as: * moving away from the overlayfs snapshotter * disabling filesystem sharing * applying a pod security policy * activating the GPUs only after attestation * using a refined approach for GPU cold-plugging without requiring annotations * revisiting pod timeout and overhead parameters (the podOverhead value was increased due to CUDA vectorAdd requiring about 6Gi of podOverhead, as well as the inference and embedqa requiring at least 12Gi, respectively, 14Gi of podOverhead to run without invoking the host's oom-killer. We will revisit this aspect after addressing points 1. and 2.) Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Fupan Li	bfe8da6c8a	tests: disable the qemu-runtime-rs cpu hotplug test Since there's something wrong with the cpu hotplug on qemu-runtime-rs, disable this test temporarily. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-06 21:37:01 +08:00

1759 Commits