kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-22 06:43:41 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	500146bfee	versions: Bump Go to 1.24.12 Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities in the standard library: - GO-2026-4342: Excessive CPU consumption in archive/zip - GO-2026-4341: Memory exhaustion in net/url query parsing - GO-2026-4340: TLS handshake encryption level issue in crypto/tls Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-29 00:23:26 +01:00
Dan Mihai	20ca4d2d79	runtime: DEFDISABLEBLOCK := true 1. Add disable_block_device_use to CLH settings file, for parity with the already existing QEMU settings. 2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After this change, Kata Guests will use by default virtio-fs to access container rootfs directories from their Hosts. Hosts that were designed to use Host block devices attached to the Guests can re-enable these rootfs block devices by changing the value of disable_block_device_use back to false in their settings files. 3. Add test using container image without any rootfs layers. Depending on the container runtime and image snapshotter being used, the empty container rootfs image might get stored on a host block device that cannot be safely hotplugged to a guest VM, because the host is using the same block device. 4. Add block device hotplug safety warning into the Kata Shim configuration files. Signed-off-by: Dan Mihai <dmihai@microsoft.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Cameron McDermott <cameron@northflank.com>	2026-01-28 19:47:49 +01:00
Fabiano Fidêncio	d0fe60e784	tests: Fix empty string handling for helm Fix empty string handling in format conversion When HELM_ALLOWED_HYPERVISOR_ANNOTATIONS, HELM_AGENT_HTTPS_PROXY, or HELM_AGENT_NO_PROXY are empty, the pattern matching condition `!= :` or `!= =` evaluates to true, causing the conversion loop to create invalid entries like "qemu-tdx: qemu-snp:". Add -n checks to ensure conversion only runs when variables are non-empty. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	4b2d4e96ae	tests: Add qemu-{tdx,snp}-runtime-rs to the list of tee shims We missed doing this as part of `b5a986eacf`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	26c534d610	tests: Use shims.disableAll in test helpers Update the CI and functional test helpers to use the new shims.disableAll option instead of iterating over every shim to disable them individually. Also adds helm repo for node-feature-discovery before building dependencies to fix CI failures on some distributions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	d8a3272f85	kata-deploy: Add tests for custom runtimes Helm templates Add Bats tests to verify the custom runtimes Helm template rendering, and that we can start a pod with the custom runtime. Tests were written with Cursor's help. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Manuel Huber	0d35b36652	Revert "ci: Ensure the KBS resources are created" This reverts commit `c0d7222194`. Soon, guest components will switch to using a DB instead of storing resources in the filesystem. Further, I don't see any more indicators why kbs-client would struggle to set simple resources. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-23 16:18:10 -08:00
Fabiano Fidêncio	5b82b160e2	runtime-rs: Add arm64 QEMU support Add the necessary configuration and code changes to support QEMU on arm64 architecture in runtime-rs. Changes: - Set MACHINETYPE to "virt" for arm64 - Add machine accelerators "usb=off,gic-version=host" required for proper arm64 virtualization - Add arm64-specific kernel parameter "iommu.passthrough=0" - Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported These changes align runtime-rs with the Go runtime's arm64 QEMU support. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2026-01-23 19:48:31 +01:00
Fabiano Fidêncio	ec18dd79ba	tests: Simplify kata-deploy test to use helm directly The kata-deploy test was using helm_helper which made it hard to debug failures (die() calls would cause "Executed 0 tests" errors) and added unnecessary complexity. The test now calls helm directly like a user would, making it simpler and more representative of real-world usage. The verification job status is explicitly checked with proper failure detection instead of relying on helm --wait. Timeouts are configurable via environment variables to account for different network speeds and image sizes: - KATA_DEPLOY_TIMEOUT (default: 600s) - KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s) - KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s) Documentation has been added to explain what each timeout controls and how to customize them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	2369cf585d	tests: Fix retry loop bugs in helm_helper The retry loop in helm_helper had two bugs: 1. Counter initialized to 10 instead of 0, causing immediate failure 2. Exit condition used -eq instead of -ge, incorrect for loop logic These bugs would cause helm_helper to fail immediately on the first retry attempt instead of properly retrying up to max_tries times. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	e0158869b1	tests: Add common bats test runner function Add run_bats_tests() function to common.bash that provides consistent test execution and reporting across all test suites (k8s, nvidia, kata-deploy). This removes duplicated test runner code from run_kubernetes_tests.sh, run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-20 12:31:55 +01:00
Fabiano Fidêncio	b5a986eacf	kata-deploy: Add runtime-rs TDX / SNP runtimeclasses https://github.com/kata-containers/kata-containers/pull/11534 has been merged and it added all the needed bits to deploy the QEMU SNP / TDX runtime-rs variants, apart from the kata-deploy additions, which is done by this PR. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 22:41:50 +01:00
Fabiano Fidêncio	c7570427d2	tests: Add report generation to NVIDIA tests The NVIDIA GPU test runner script was not generating test reports, causing the report_tests() function in gha-run.sh to have nothing to display. This aligns the script with run_kubernetes_tests.sh by: - Adding set -o pipefail for proper pipeline error handling - Creating a reports directory with timestamped subdirectory - Capturing test output to files with ok-/not_ok- prefixes - Adding --timing flag to bats for timing information Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 18:21:43 +01:00
Fabiano Fidêncio	96e1fb4ca6	tools: Remove runk The runk tool hasn't been supported for a few years, with no maintainers since ManaSugi stopped being involved in the project and the CI was disabled in 2024. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:43:53 +01:00
Fabiano Fidêncio	ea18f543b4	tests: kata-deploy: Enable verification during helm install Enable post-install verification in kata-deploy CI tests. When HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created that runs with the Kata runtime to confirm deployment succeeded. The verification pod prints kernel info and exits - success indicates the Kata runtime is properly configured and functional. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-16 10:52:43 +01:00
Alex Lyn	fba92880c9	tests: make set_container_command idempotent and add debug output set_container_command() previously appended command arguments one-by-one with '.command += [...]'. This makes the helper non-idempotent and can lead to unexpected command arrays when invoked multiple times. Update the helper to set the full command array in a single yq v4 expression and print the target YAML path plus the command being applied to simplify debugging when tests fail. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 17:56:28 +01:00
Alex Lyn	38296a41b2	tests: Generate pod config with stable .yaml suffix The pod config file created by new_pod_config() was generated via mktemp using the template "pod-config.yaml.in.XXX", which produces filenames that do not end with ".yaml" (e.g. pod-config.yaml.in.ABC). If the random suffix happens to look like another file extension (".Csv", ".Xml", etc.), the subsequent yq operations fail. Some helpers and tooling assume the config path ends with ".yaml". Switch the mktemp template to place the random suffix before the extension so the returned path always ends with ".yaml". Fixes: #12268, #12319 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 17:56:28 +01:00
Manuel Huber	183507beeb	agent: change secure_storage_integrity default Change the secure_storage_integrity option's default value to true. With this, integrity protection for encrypted block device contents will be requested from the confidential data hub by default, see the agent's cdh_handler_trusted_storage function in rpc.rs. This behavior can be disabled by explicitly setting the agent.secure_storage_integrity parameter to 0 or false via kernel command line parameters. This will affect the trusted storage implementation for the guest-pull mechanism, and it will affect future implementations using this code path, such as implementations for ephemeral secure storage. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-10 16:54:03 +01:00
Manuel Huber	df2896c298	docs: Create NVIDIA GPU passthrough QEMU scenario Create a new page for a reference implementation for Kubernetes using QEMU, the go shim and an NVIDIA rootfs. The new page contains information on: - components involved in the NVIDIA (TEE) GPU scenario - orchestration flow for GPU passthrough scenarios - deployment guidance Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-09 19:02:56 +01:00
Saul Paredes	02979a13e3	Merge pull request #12208 from romoh/patch-1 ci: Update AKS setup post Pod Sandboxing GA	2026-01-08 11:02:05 -08:00
Fabiano Fidêncio	6b3953dd51	tests: k8s: liveness-probes: Adjust events grep Till k8s 1.34 we could grep by "Started containerd". From k8s 1.35 onwards the event message changed and we should, instead, grep by "Container started". Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-07 23:01:59 +01:00
Roaa Sakr	44c79cf14a	ci: Update AKS setup post Pod Sandboxing GA Update workload-runtime value to align with current AKS Pod Sandboxing documentation post GA. Signed-off-by: Roaa Sakr <romoh@microsoft.com>	2026-01-05 13:47:33 -08:00
Hyounggyu Choi	3fa1d93f85	tests: remove re-declared local variable in k8s-empty-dirs.bats Since #12204 was merged, the following error has been observed: ``` bats warning: Executed 1 instead of expected 2 tests [run_kubernetes_tests.sh:162] ERROR: Tests FAILED from suites: k8s-empty-dirs.bats ``` The cause is that `pod_logs_file` is re-declared as a local variable in the second test before skipping, which makes it inaccessible in `teardown()` and leads to an error. This commit removes the re-declaration of the variable. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-18 18:57:16 +01:00
Manuel Huber	78c41b61f4	tests: nvidia: Update images, probes and timeouts Changes in NIM/RAG samples: - update image references - update memory requirements, timeouts, model name - sanitize some of the probes and print-out Further refinements can be made in the future. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-18 10:57:14 +01:00
Manuel Huber	0373428de4	tests: nvidia: Use secret for NGC API key This is a slight change in the manifest to at least use a secret for the environment variable. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-18 10:57:14 +01:00
Hyounggyu Choi	56ec8d7788	Merge pull request #12204 from kata-containers/runtime-rs-stability-debug CI: Upgrade log details for improved error analysis	2025-12-18 10:54:54 +01:00
Tobin Feldman-Fitzthum	decc09e975	tests: cc: add test with SNP reference values Add two attestation tests. The first one sets a resource policy that requires CPU0 to have an affirming trust level. This is a negative test which can run on any platform. Setting this policy without setting any reference values should result in an attestation failure. Next, a second test will set the same policy, but this time it will use the journal log to find the QEMU command line from the previous test and calculate the expected reference values. Currently this is only supported on SNP using the sev-snp-measure tool, but the same flow should work on other platforms. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2025-12-18 00:12:11 +01:00
Alex Lyn	3696d9143a	tests: Correct the teardown_common in cpu-ns.bats It will address the issue: "# bats warning: Executed 0 instead of expected 1 tests" Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	a28f24ef8c	tests: move get_pod_config_dir into setup_common Since every test case needs the get_pod_config_dir preparation, move it directly into setup_common. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	5778b0a001	tests: Introduce measure_node_time to get test case end time To bound the journal window, print the journal start and end time for each test case, which helps ensure the collected journal log covers only that case's period. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	648f0913ca	tests: Load lib.rs in bats to ensure related functions are available The lib.rs file must be loaded before any of its functions are called. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	bbec15d695	tests: delete policy_settings_dir only for first test case Currently policy_settings_dir is created only when BATS_TEST_NUMBER == "1", but delete_tmp_policy_settings_dir "${policy_settings_dir}" is called in teardown() for every test. This means that for tests after the first one teardown() may attempt to delete a directory that was already removed by a previous test, or rely on a value that does not belong to the current test execution. Adjust teardown logic so that policy_settings_dir is only deleted for the first test case (BATS_TEST_NUMBER == "1") and ignored for subsequent tests. This keeps the original optimization of running genpolicy only once, while avoiding unnecessary or confusing cleanup attempts in later test cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	24e68b246f	tests: Add missing bin env at the head of bats Add the missing part of `#!/bin/bash/env` in bats. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	93ba6a8e76	tests: Make pod_name a global variable Previously pod_name was declared local, so it was not visible inside the teardown() function, causing a failure. This commit removes the `local pod_name` declaration to make it a global variable. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	89dce4eff6	tests: Enhance debug log output Introduce setup_common in setup() and teardown_common() in teardown() to get enough log to help debug Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Hyounggyu Choi	7f72acc266	Merge pull request #12180 from BbolroC/enable-vfio-ap-passthrough-runtime-rs runtime-rs: Enable VFIO-AP passthrough (hotplug only) on s390x	2025-12-17 15:50:10 +01:00
Fabiano Fidêncio	830d15d4c8	tests: Adapt to using kata-tools Instead of relying on the fully bloated kata tarball. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 12:55:07 +01:00
Fabiano Fidêncio	50b853eb93	tests: nvidia: Always rely on the "kata" default runtime class This is a pattern already followed by all the other tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	ff2396aeec	tests: nvidia: Declare KATA_HYPERVISOR variable Align with other test logic - declare the KATA_HYPERVISOR in the run bash script, then declare the RUNTIME_CLASS_NAME variable in the bats files. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	6e31cf2156	tests: nvidia: cc: USE is_confidential_gpu_hw This function has recently been introduced, so we align patterns. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	cd1f55b41c	tests: nvidia: cc: Set GPU0 policy for NIM tests Now that we have a more restrictive resource policy for KBS, let us start adopting it across all NVIDIA test cases. This policy was previously introduced by the NVIDIA attestation test. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	edbac264cb	tests: nvidia: cc: Remove KBS variable The variable is now set in the CI YAML file, thus removing the assignment. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	9665b74653	tests: nvidia: cc: address shellcheck warnings Address shellcheck warnings for run_kubernetes_nv_tests.sh Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	5f9e7a03a8	tests: nvidia: do not use teardown_common Clean up in each NVIDIA bats file according to our needs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	1781fb8b06	tests: nvidia: cc: Use CUDA image from NVCR Pull from nvcr.io to avoid hitting unauthenticated pull rate limits. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	f63f95f315	tests: nvidia: cc: generate pod security policies With these changes, we create pod security policies when running against NVIDIA TEE GPU handlers where AUTO_GENERATE_POLICY is set. For the non-TEE GPU tests, the added functions bail out by design. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	bf26ad9532	nvidia: tests: remove outer CDI annotations With the new device plugin being used by CI runners, these annotations are no longer necessary. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	37b4f6ae8b	tests: Adapt NVIDIA common policy settings Following existing patterns, we adapt the common policy settings for NVIDIA GPU CI platforms. For instance, for our CI runners, we use containerd 2.x. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	f4c0c8546e	tests: Enable AUTO_GENERATE_POLICY for NVIDIA TEEs Enable auto-generate policy for qemu-nvidia-gpu-* if the user didn't specify an AUTO_GENERATE_POLICY value. Setting this in run_kubernetes_nv_tests.sh is too late as gha-run.sh calls into run_tests, setup.sh, and then into create_common_genpolicy_settings() where the rules.rego and genpolicy-settings file are being copied to the right locations. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	560f6f6c74	tests: nvidia: cc: Affirming attestation policy Set the attestation policy for GPU0 to affirming. This requires the GPU, for instance, to have production properties, such as properly signed VBIOS firmware. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-11 10:16:58 +01:00
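
The retry-loop fix described in commit 2369cf585d (counter starting at 0, exit test using -ge) can be illustrated with a minimal sketch. This is not the actual helm_helper code: the function name `retry` and the `RETRY_INTERVAL` variable are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper; NOT the real helm_helper implementation.
# Usage: retry <max_tries> <command> [args...]
retry() {
    local max_tries="$1"
    shift
    local tries=0    # start at 0; starting at the limit would fail at once

    until "$@"; do
        tries=$((tries + 1))
        # -ge rather than -eq: still stops if the counter ever overshoots.
        if [ "${tries}" -ge "${max_tries}" ]; then
            return 1
        fi
        # RETRY_INTERVAL is an assumed knob for this sketch (seconds).
        sleep "${RETRY_INTERVAL:-1}"
    done
    return 0
}
```

A call such as `retry 5 some_command` runs the command at most five times and returns non-zero only if every attempt failed.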

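Commits 93ba6a8e76 and 3fa1d93f85 both hinge on bash scoping: a variable declared `local` inside a test function is gone by the time teardown() runs. The sketch below shows that behaviour in plain bash. The function names `test_with_local` and `test_with_global` are hypothetical stand-ins for bats test bodies, and the `teardown` here is a simplified illustration rather than the project's own.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for two bats test bodies and a teardown().

test_with_local() {
    # 'local' limits pod_name to this function; it vanishes on return.
    local pod_name="local-pod"
}

test_with_global() {
    # Without 'local', the assignment is visible to later functions.
    pod_name="global-pod"
}

teardown() {
    # Prints a placeholder when pod_name was never set globally.
    echo "cleaning up: ${pod_name:-<unset>}"
}
```

Calling `test_with_local` followed by `teardown` reports the placeholder, while `test_with_global` followed by `teardown` reports the pod name.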

1836 Commits