kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-21 22:34:29 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	2930c68c0b	ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats This is a follow-up for `25962e9325` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 20:56:08 +01:00
Fabiano Fidêncio	8cb7d0be9d	tests: nvidia: Fix genpolicy error when pulling nvcr.io images genpolicy pulls image manifests from nvcr.io to generate policy and was failing with 'UnauthorizedError' because it had no registry credentials. Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential() in registry.rs, which reads from DOCKER_CONFIG/config.json. Add setup_genpolicy_registry_auth() to create a Docker config with nvcr.io auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it can authenticate when pulling manifests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 13:12:55 +01:00
Fabiano Fidêncio	6a3bbb1856	tests: Retry k8s deployment We've seen a lot of spurious issues when deploying the infra needed for the tests. Let's give it a few tries before actually failing. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-12 20:13:59 +01:00
Mikko Ylinen	25962e9325	tests/coco: disable k8s-sandbox-vcpus-allocation.bats for TDX After the move to Linux 6.17 and QEMU 10.2 from Kata, k8s-sandbox-vcpus-allocation.bats started failing on TDX. 2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created 2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created 2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created 2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met 2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits 2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits 2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating In a manual test without agent policies added, it seems to work OK, but disable the test for now to get CI stable. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-11 22:02:59 +01:00
Hyounggyu Choi	c84e37f6ac	Merge pull request #12486 from BbolroC/cpu-hotplug-s390x-runtime-rs runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P	2026-02-11 09:40:21 +01:00
Hyounggyu Choi	67f54bdcb5	tests: Remove skip condition for runtime-rs on s390x in k8s-cpu-ns This commit removes the skip condition for qemu-runtime-rs on s390x in k8s-cpu-ns.bats. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-02-11 05:52:13 +01:00
stevenhorsman	15d6a681ed	doc: Fix spelling issues Put things in backticks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	5c0269881e	tests: Make editorconfig-checker happy - Trim trailing whitespace and ensure final newline in non-vendor files - Add .editorconfig-checker.json excluding vendor dirs, .patch, .img, .dtb, .drawio, *.svg, and pkg/cloud-hypervisor/client so CI only checks project code - Leave generated and binary assets unchanged (excluded from checker) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	cb652e0da1	tests: Update NVRC trace to use drop-in config mechanism Update the enable_nvrc_trace() function to use the new drop-in configuration mechanism instead of directly modifying the base configuration file. The function now creates a 90-nvrc-trace.toml drop-in file that properly combines existing kernel parameters with the nvrc.log=trace setting. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Manuel Huber	a6ca5c6628	ci: add editorconfig checker This adds a basic configuration for editorconfig checker. The supplied configuration checks against trailing whitespaces and issues with newlines. Example: tools/packaging/kernel/configs/fragments/x86_64/numa.conf: Wrong line endings or no final newline; tools/packaging/release/generate_vendor.sh: 44: Trailing whitespace Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-09 15:03:26 -08:00
Manuel Huber	525192832f	tests: Clean up superfluous GPU annotation This annotation was required for GPU cold-plug before using a newer device plugin and before querying the pod resources API. As this annotation is no longer required, cleaning it up. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-09 11:28:24 -08:00
Alex Lyn	3fda59e27d	tests: rename pod_exec_with_retries to pod_exec and update callers This commit does the following: (1) Rename pod_exec_with_retries() to pod_exec(). (2) Update implementation to call container_exec(). (3) Replace all usages of pod_exec_with_retries across tests with pod_exec. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	861d39305c	tests: drop kubectl exec retries in container_exec This commit drops the retries around kubectl exec into a container: (1) Rename container_exec_with_retries() to container_exec(). (2) Remove the retry loop and sleep backoff around kubectl exec. Keep the same logging and container-selection logic and return kubectl exec exit status directly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
stevenhorsman	b29312289f	versions: Bump go to 1.24.13 Bump go to 1.24.13 to fix CVE GO-2026-4337 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
Manuel Huber	cf7f340b39	tests: Read and overwrite kernel_verity_parameters Read the kernel_verity_parameters from the shim config and adjust the root hash for the negative test. Further, improve some of the test logic by using shared functions. This especially ensures we don't read the full journalctl logs on a node but only the portion of the logs we are actually supposed to look at. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	e120dd4cc6	tests: cc: Remove quotes from kernel command line With dm-mod.create parameters using quotes, we remove the backslashes used to escape these quotes from the output we retrieve. This will enable attestation tests to work with the kernelinit dm-verity mode. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	282014000f	tests: cc: support initrd, image for attestation Allow using an image instead of an initrd. For confidential guests using images, the assumption is that the guest kernel uses dm-verity protection, implicitly measuring the rootfs image via the kernel command line's dm-verity information. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Fabiano Fidêncio	dda1b30c34	tests: nvidia-nim: Use sealed secrets for NGC_API_KEY Convert the NGC_API_KEY from a regular Kubernetes secret to a sealed secret for the CC GPU tests. This ensures the API key is only accessible within the confidential enclave after successful attestation. The sealed secret uses the "vault" type which points to a resource stored in the Key Broker Service (KBS). The Confidential Data Hub (CDH) inside the guest will unseal this secret by fetching it from KBS after attestation. The initdata file is created AFTER create_tmp_policy_settings_dir() copies the empty default file, and BEFORE auto_generate_policy() runs. This allows genpolicy to add the generated policy.rego to our custom CDH configuration. The sealed secret format follows the CoCo specification: sealed.<JWS header>.<JWS payload>.<signature> Where the payload contains: - version: "0.1.0" - type: "vault" (pointer to KBS resource) - provider: "kbs" - resource_uri: KBS path to the actual secret Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:34:44 +01:00
Fabiano Fidêncio	c9061f9e36	tests: kata-deploy: Increase post-deployment wait time Increase the sleep time after kata-deploy deployment from 10s to 60s to give more time for runtimes to be configured. This helps avoid race conditions on slower K8s distributions like k3s where the RuntimeClass may not be immediately available after the DaemonSet rollout completes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	0fb2c500fd	tests: kata-deploy: Merge E2E tests to avoid timing issues Merge the two E2E tests ("Custom RuntimeClass exists with correct properties" and "Custom runtime can run a pod") into a single test, as the two are heavily dependent on each other. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	fef93f1e08	tests: kata-deploy: Use die() instead of fail() for error handling Replace fail() calls with die() which is already provided by common.bash. The fail() function doesn't exist in the test infrastructure, causing "command not found" errors when tests fail. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Dan Mihai	d7ff54769c	tests: policy: remove the need for using sudo Modify the copy of root user's settings file, instead of modifying the original file. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-02-01 20:09:50 +01:00
Dan Mihai	4d860dcaf5	tests: policy: avoid redundant debug output Avoid redundant and confusing teardown_common() debug output for k8s-policy-pod.bats and k8s-policy-pvc.bats. The Policy tests skip the Message field when printing information about their pods, because unfortunately that field might contain a truncated Policy log - for the test cases that intentionally cause Policy failures. The non-truncated Policy log is already available from other "kubectl describe" fields. So, avoid the redundant pod information from teardown_common(), that also included the confusing Message field. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-02-01 20:09:50 +01:00
Steve Horsman	4d1095e653	Merge pull request #12350 from manuelh-dev/mahuber/term-grace-period tests: Remove terminationGracePeriod in manifests	2026-01-29 15:17:17 +00:00
Fabiano Fidêncio	500146bfee	versions: Bump Go to 1.24.12 Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities in the standard library: - GO-2026-4342: Excessive CPU consumption in archive/zip - GO-2026-4341: Memory exhaustion in net/url query parsing - GO-2026-4340: TLS handshake encryption level issue in crypto/tls Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-29 00:23:26 +01:00
Dan Mihai	20ca4d2d79	runtime: DEFDISABLEBLOCK := true 1. Add disable_block_device_use to CLH settings file, for parity with the already existing QEMU settings. 2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After this change, Kata Guests will use virtio-fs by default to access container rootfs directories from their Hosts. Hosts that were designed to use Host block devices attached to the Guests can re-enable these rootfs block devices by changing the value of disable_block_device_use back to false in their settings files. 3. Add test using container image without any rootfs layers. Depending on the container runtime and image snapshotter being used, the empty container rootfs image might get stored on a host block device that cannot be safely hotplugged to a guest VM, because the host is using the same block device. 4. Add block device hotplug safety warning into the Kata Shim configuration files. Signed-off-by: Dan Mihai <dmihai@microsoft.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Cameron McDermott <cameron@northflank.com>	2026-01-28 19:47:49 +01:00
Fabiano Fidêncio	d0fe60e784	tests: Fix empty string handling for helm Fix empty string handling in format conversion When HELM_ALLOWED_HYPERVISOR_ANNOTATIONS, HELM_AGENT_HTTPS_PROXY, or HELM_AGENT_NO_PROXY are empty, the pattern matching condition `!= :` or `!= =` evaluates to true, causing the conversion loop to create invalid entries like "qemu-tdx: qemu-snp:". Add -n checks to ensure conversion only runs when variables are non-empty. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	4b2d4e96ae	tests: Add qemu-{tdx,snp}-runtime-rs to the list of tee shims We missed doing this as part of `b5a986eacf`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	26c534d610	tests: Use shims.disableAll in test helpers Update the CI and functional test helpers to use the new shims.disableAll option instead of iterating over every shim to disable them individually. Also adds helm repo for node-feature-discovery before building dependencies to fix CI failures on some distributions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	d8a3272f85	kata-deploy: Add tests for custom runtimes Helm templates Add Bats tests to verify the custom runtimes Helm template rendering, and that we can start a pod with the custom runtime. Tests were written with Cursor's help. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Manuel Huber	6438fe7f2d	tests: Remove terminationGracePeriod in manifests Do not kill containers immediately, instead use Kubernetes' default termination grace period. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-23 16:18:44 -08:00
Manuel Huber	0d35b36652	Revert "ci: Ensure the KBS resources are created" This reverts commit `c0d7222194`. Soon, guest components will switch to using a DB instead of storing resources in the filesystem. Further, I don't see any more indicators why kbs-client would struggle to set simple resources. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-23 16:18:10 -08:00
Fabiano Fidêncio	5b82b160e2	runtime-rs: Add arm64 QEMU support Add the necessary configuration and code changes to support QEMU on arm64 architecture in runtime-rs. Changes: - Set MACHINETYPE to "virt" for arm64 - Add machine accelerators "usb=off,gic-version=host" required for proper arm64 virtualization - Add arm64-specific kernel parameter "iommu.passthrough=0" - Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported These changes align runtime-rs with the Go runtime's arm64 QEMU support. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2026-01-23 19:48:31 +01:00
Fabiano Fidêncio	ec18dd79ba	tests: Simplify kata-deploy test to use helm directly The kata-deploy test was using helm_helper which made it hard to debug failures (die() calls would cause "Executed 0 tests" errors) and added unnecessary complexity. The test now calls helm directly like a user would, making it simpler and more representative of real-world usage. The verification job status is explicitly checked with proper failure detection instead of relying on helm --wait. Timeouts are configurable via environment variables to account for different network speeds and image sizes: - KATA_DEPLOY_TIMEOUT (default: 600s) - KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s) - KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s) Documentation has been added to explain what each timeout controls and how to customize them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	2369cf585d	tests: Fix retry loop bugs in helm_helper The retry loop in helm_helper had two bugs: 1. Counter initialized to 10 instead of 0, causing immediate failure 2. Exit condition used -eq instead of -ge, incorrect for loop logic These bugs would cause helm_helper to fail immediately on the first retry attempt instead of properly retrying up to max_tries times. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	e0158869b1	tests: Add common bats test runner function Add run_bats_tests() function to common.bash that provides consistent test execution and reporting across all test suites (k8s, nvidia, kata-deploy). This removes duplicated test runner code from run_kubernetes_tests.sh, run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-20 12:31:55 +01:00
Fabiano Fidêncio	b5a986eacf	kata-deploy: Add runtime-rs TDX / SNP runtimeclasses https://github.com/kata-containers/kata-containers/pull/11534 has been merged and it added all the needed bits to deploy the QEMU SNP / TDX runtime-rs variants, apart from the kata-deploy additions, which is done by this PR. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 22:41:50 +01:00
Fabiano Fidêncio	c7570427d2	tests: Add report generation to NVIDIA tests The NVIDIA GPU test runner script was not generating test reports, causing the report_tests() function in gha-run.sh to have nothing to display. This aligns the script with run_kubernetes_tests.sh by: - Adding set -o pipefail for proper pipeline error handling - Creating a reports directory with timestamped subdirectory - Capturing test output to files with ok-/not_ok- prefixes - Adding --timing flag to bats for timing information Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 18:21:43 +01:00
Fabiano Fidêncio	96e1fb4ca6	tools: Remove runk The runk tool hasn't been supported for a few years, with no maintainers since ManaSugi stopped being involved in the project and the CI was disabled in 2024. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:43:53 +01:00
Fabiano Fidêncio	ea18f543b4	tests: kata-deploy: Enable verification during helm install Enable post-install verification in kata-deploy CI tests. When HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created that runs with the Kata runtime to confirm deployment succeeded. The verification pod prints kernel info and exits - success indicates the Kata runtime is properly configured and functional. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-16 10:52:43 +01:00
Alex Lyn	fba92880c9	tests: make set_container_command idempotent and add debug output set_container_command() previously appended command arguments one-by-one with '.command += [...]'. This makes the helper non-idempotent and can lead to unexpected command arrays when invoked multiple times. Update the helper to set the full command array in a single yq v4 expression and print the target YAML path plus the command being applied to simplify debugging when tests fail. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 17:56:28 +01:00
Alex Lyn	38296a41b2	tests: Generate pod config with stable .yaml suffix The pod config file created by new_pod_config() was generated via mktemp using the template "pod-config.yaml.in.XXX", which produces filenames that do not end with ".yaml" (e.g. pod-config.yaml.in.ABC). If the random suffix happens to form an extension such as ".Csv" or ".Xml", subsequent operations with yq fail. Some helpers and tooling assume the config path ends with ".yaml". Switch the mktemp template to place the random suffix before the extension so the returned path always ends with ".yaml". Fixes: #12268, #12319 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 17:56:28 +01:00
Manuel Huber	183507beeb	agent: change secure_storage_integrity default Change the secure_storage_integrity option's default value to true. With this, integrity protection for encrypted block device contents will be requested from the confidential data hub by default, see the agent's cdh_handler_trusted_storage function in rpc.rs. This behavior can be disabled by explicitly setting the agent.secure_storage_integrity parameter to 0 or false via kernel command line parameters. This will affect the trusted storage implementation for the guest-pull mechanism, and it will affect future implementations using this code path, such as implementations for ephemeral secure storage. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-10 16:54:03 +01:00
Manuel Huber	df2896c298	docs: Create NVIDIA GPU passthrough QEMU scenario Create a new page for a reference implementation for Kubernetes using QEMU, the go shim and an NVIDIA rootfs. The new page contains information on: - components involved in the NVIDIA (TEE) GPU scenario - orchestration flow for GPU passthrough scenarios - deployment guidance Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-09 19:02:56 +01:00
Saul Paredes	02979a13e3	Merge pull request #12208 from romoh/patch-1 ci: Update AKS setup post Pod Sandboxing GA	2026-01-08 11:02:05 -08:00
Fabiano Fidêncio	6b3953dd51	tests: k8s: liveness-probes: Adjust events grep Till k8s 1.34 we could grep by "Started containerd". From k8s 1.35 onwards the event message changed and we should, instead, grep by "Container started". Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-07 23:01:59 +01:00
Roaa Sakr	44c79cf14a	ci: Update AKS setup post Pod Sandboxing GA Update workload-runtime value to align with current AKS Pod Sandboxing documentation post GA. Signed-off-by: Roaa Sakr <romoh@microsoft.com>	2026-01-05 13:47:33 -08:00
Hyounggyu Choi	3fa1d93f85	tests: remove re-declared local variable in k8s-empty-dirs.bats Since #12204 was merged, the following error has been observed: ``` bats warning: Executed 1 instead of expected 2 tests [run_kubernetes_tests.sh:162] ERROR: Tests FAILED from suites: k8s-empty-dirs.bats ``` The cause is that `pod_logs_file` is re-declared as a local variable in the second test before skipping, which makes it inaccessible in `teardown()` and leads to an error. This commit removes the re-declaration of the variable. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-18 18:57:16 +01:00
Manuel Huber	78c41b61f4	tests: nvidia: Update images, probes and timeouts Changes in NIM/RAG samples: - update image references - update memory requirements, timeouts, model name - sanitize some of the probes and print-out Further refinements can be made in the future. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-18 10:57:14 +01:00
Manuel Huber	0373428de4	tests: nvidia: Use secret for NGC API key This is a slight change in the manifest to at least use a secret for the environment variable. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-18 10:57:14 +01:00


1861 Commits