Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities
in the standard library:
- GO-2026-4342: Excessive CPU consumption in archive/zip
- GO-2026-4341: Memory exhaustion in net/url query parsing
- GO-2026-4340: TLS handshake encryption level issue in crypto/tls
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
1. Add disable_block_device_use to CLH settings file, for parity with
the already existing QEMU settings.
2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After
this change, Kata Guests will, by default, use virtio-fs to access
container rootfs directories from their Hosts. Hosts that were set up
to use Host block devices attached to the Guests can re-enable these
rootfs block devices by changing the value of
disable_block_device_use back to false in their settings files.
3. Add test using container image without any rootfs layers. Depending
on the container runtime and image snapshotter being used, the empty
container rootfs image might get stored on a host block device that
cannot be safely hotplugged to a guest VM, because the host is using
the same block device.
4. Add a block device hotplug safety warning to the Kata Shim
configuration files.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Cameron McDermott <cameron@northflank.com>
Remove the initrd function and add the image function to align
with the functions that actually exist in this file.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Confidential guests cannot use traditional IOMMU Group based VFIO.
Instead, they need to use IOMMUFD. This is mainly because the group
abstraction is incompatible with a confidential device model.
If traditional VFIO is specified for a confidential guest, detect
the error and bail out early.
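A minimal sketch of the early check, assuming hypothetical helper and
type names (the actual types and error handling live in the runtime
device code):

  // Hypothetical sketch: fail fast when a confidential guest is combined
  // with traditional IOMMU-group-based VFIO.
  #[derive(PartialEq)]
  enum VfioMode {
      GroupBased, // traditional /dev/vfio/<group> passthrough
      Iommufd,    // IOMMUFD-based passthrough
  }

  fn validate_vfio_mode(confidential_guest: bool, mode: &VfioMode) -> Result<(), String> {
      if confidential_guest && *mode == VfioMode::GroupBased {
          // The group abstraction is incompatible with the confidential
          // device model, so bail out early instead of failing at hotplug.
          return Err("confidential guests require IOMMUFD-based VFIO".to_string());
      }
      Ok(())
  }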
Fixes: #12393
Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
In CI we are testing the latest kata-deploy, which requires the latest
helm chart. The previous query doesn't work anymore, but these days we
should be able to rely on the "0.0.0-dev" tag and on helm to print the
to-be-installed version to the console.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
I keep struggling to find the debug images, so let's include them in the
peer-pods-azure.sh script so people can find them more easily.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This comment was first introduced in e111093 with secure_join(),
but then we forgot to remove it when we switched to the safe-path
lib in c0ceaf6.
Signed-off-by: Qingyuan Hou <lenohou@gmail.com>
We want to enable local and remote CUDA repository builds.
Move the CUDA and tools repositories to versions.yaml, with a
unified build for both types.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Fix empty string handling in format conversion
When HELM_ALLOWED_HYPERVISOR_ANNOTATIONS, HELM_AGENT_HTTPS_PROXY, or
HELM_AGENT_NO_PROXY are empty, the pattern matching condition
`!= *:*` or `!= *=*` evaluates to true, causing the conversion loop
to create invalid entries like "qemu-tdx: qemu-snp:".
Add -n checks to ensure conversion only runs when variables are
non-empty.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the CI and functional test helpers to use the new
shims.disableAll option instead of iterating over every shim
to disable them individually.
Also add the helm repo for node-feature-discovery before building
dependencies, to fix CI failures on some distributions.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the Helm chart README to document the new shims.disableAll
option and simplify the examples that previously required listing
every shim to disable.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Simplify the example values files by using the new shims.disableAll
option instead of listing every shim to disable.
Before (try-kata-nvidia-gpu.values.yaml):
  shims:
    clh:
      enabled: false
    cloud-hypervisor:
      enabled: false
    # ... 15 more lines ...
After:
  shims:
    disableAll: true
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a new `shims.disableAll` option that disables all standard shims
at once. This is useful when:
- Enabling only specific shims without listing every other shim
- Using custom runtimes only mode (no standard Kata shims)
Usage:
  shims:
    disableAll: true
    qemu:
      enabled: true # Only qemu is enabled
All helper templates are updated to check for this flag before
iterating over shims.
One important thing to note here is that Helm recursively merges user
values with chart defaults, which makes a naive `disableAll` flag
problematic: if the defaults have `enabled: true`, the user's
`disableAll: true` gets merged with those defaults, and all shims end
up enabled anyway.
The workaround is to use null (`~`) as the default for the `enabled`
field. The template logic interprets null differently based on
disableAll:
| enabled value | disableAll: false | disableAll: true |
|---------------|-------------------|------------------|
| ~ (null)      | Enabled           | Disabled         |
| true          | Enabled           | Enabled          |
| false         | Disabled          | Disabled         |
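As a standalone illustration of the decision logic the table encodes
(the real logic lives in the Helm templates, not in Rust):

  // enabled is tri-state: None corresponds to the chart's `~` (null) default.
  fn shim_is_enabled(enabled: Option<bool>, disable_all: bool) -> bool {
      match enabled {
          Some(explicit) => explicit, // explicit true/false always wins
          None => !disable_all,       // null follows disableAll
      }
  }

  // shim_is_enabled(None, false)        == true  (~, disableAll: false)
  // shim_is_enabled(None, true)         == false (~, disableAll: true)
  // shim_is_enabled(Some(true), true)   == true
  // shim_is_enabled(Some(false), false) == false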
This is backward compatible:
- Default behavior unchanged: all shims enabled when disableAll: false
- Users can set `disableAll: true` to disable all, then explicitly
enable specific shims with `enabled: true`
- Explicit `enabled: false` always disables, regardless of disableAll
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add Bats tests to verify the custom runtimes Helm template rendering,
and that we can start a pod with the custom runtime.
Tests were written with Cursor's help.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add functions to install and remove custom runtime configuration files.
Each custom runtime gets an isolated directory structure:
  custom-runtimes/{handler}/
    configuration-{baseConfig}.toml  # Copied from base config
    config.d/
      50-overrides.toml              # User's drop-in overrides
The base config is copied AFTER kata-deploy has applied its modifications
(debug settings, proxy configuration, annotations), so custom runtimes
inherit these settings.
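A rough sketch of the install step for the layout above, with
illustrative function and parameter names (not the actual kata-deploy
code):

  use std::fs;
  use std::path::Path;

  // Hypothetical sketch: create the isolated per-handler directory and
  // copy the (already kata-deploy-modified) base configuration into it.
  fn install_custom_runtime(root: &Path, handler: &str, base_config_name: &str,
                            base_config_path: &Path, dropin: &str) -> std::io::Result<()> {
      let dir = root.join("custom-runtimes").join(handler);
      let config_d = dir.join("config.d");
      fs::create_dir_all(&config_d)?;
      // Base config is copied AFTER kata-deploy applied its modifications.
      fs::copy(base_config_path,
               dir.join(format!("configuration-{base_config_name}.toml")))?;
      // User's drop-in overrides.
      fs::write(config_d.join("50-overrides.toml"), dropin)?;
      Ok(())
  }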
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add functions to configure custom runtimes in containerd and CRI-O.
Custom runtimes use an isolated config directory under:
  custom-runtimes/{handler}/
Custom runtimes automatically derive the shim binary path from the
baseConfig field using the existing is_rust_shim() logic.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add support for parsing custom runtime configurations from a mounted
ConfigMap. This allows users to define their own RuntimeClasses with
custom Kata configurations.
The ConfigMap format uses a custom-runtimes.list file with entries:
  handler:baseConfig:containerd_snapshotter:crio_pulltype
Drop-in files are read from dropin-{handler}.toml, if present.
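A minimal parsing sketch for one entry line, with an illustrative struct
name (only the four colon-separated fields described above are assumed):

  // Hypothetical sketch of parsing one line of custom-runtimes.list.
  struct CustomRuntime {
      handler: String,
      base_config: String,
      containerd_snapshotter: String,
      crio_pulltype: String,
  }

  fn parse_entry(line: &str) -> Option<CustomRuntime> {
      let mut fields = line.trim().splitn(4, ':');
      Some(CustomRuntime {
          handler: fields.next()?.to_string(),
          base_config: fields.next()?.to_string(),
          containerd_snapshotter: fields.next()?.to_string(),
          crio_pulltype: fields.next()?.to_string(),
      })
  }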
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's extract the common logic from configure_containerd_runtime and
configure_crio_runtime into reusable helper functions. This reduces
code duplication and prepares for adding custom runtime support.
For containerd:
- Add ContainerdRuntimeParams struct to encapsulate common parameters
- Add get_containerd_pluginid() to extract version detection logic
- Add get_containerd_output_path() to extract file path resolution
- Add write_containerd_runtime_config() to write common TOML values
For CRI-O:
- Add CrioRuntimeParams struct to encapsulate common parameters
- Add write_crio_runtime_config() to write common configuration
While here, let's also simplify pod_annotations to always use
"[\"io.katacontainers.*\"]" for all runtimes, as the NVIDIA specific
case has been removed from the shell script, but we forgot to do so
here.
No functional changes intended.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add -info flag handling to containerd-shim-kata-v2 (Rust version).
This outputs RuntimeInfo protobuf (name, version, revision) to stdout,
providing compatibility with containerd v2.0+ which queries runtime
information via this flag.
This is the runtime-rs counterpart to the Go implementation.
Fixes: #12133
Signed-off-by: tak-ka3 <takumi.hiraoka@acompany-ac.com>
It aims to make QMP initialization robust by retrying the QMP handshake
under a global deadline, to handle slow QEMU bring-up.
Qmp::new() used DEFAULT_QMP_READ_TIMEOUT as the effective deadline
for the QMP handshake read. When QEMU initialization is slow (e.g.
heavy host load, large memory/device init, slow storage, confidential
guests, etc.), the QMP greeting may not become readable within a small
per-read timeout (e.g. 250ms). This caused QMP init to fail with
"Resource temporarily unavailable (os error 11)" and spam
"couldn't initialise QMP", while subsequent retries might eventually
succeed once QEMU became ready.
To address this issue, keep a short per-read timeout to avoid
indefinite blocking, but add a global "wait for QMP ready" deadline
that retries the handshake with a small backoff. This improves startup
reliability under load and avoids unnecessary reconnect failures.
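The rough shape of the retry loop, with illustrative names and without
the actual QMP types (a sketch, not the real runtime-rs code):

  use std::thread::sleep;
  use std::time::{Duration, Instant};

  // Keep a short per-read timeout inside try_handshake(), but retry the
  // whole handshake until an overall "QMP ready" deadline expires.
  fn wait_for_qmp_ready<T, E: std::fmt::Display>(
      mut try_handshake: impl FnMut() -> Result<T, E>,
      deadline: Duration,
      backoff: Duration,
  ) -> Result<T, String> {
      let start = Instant::now();
      loop {
          match try_handshake() {
              Ok(qmp) => return Ok(qmp),
              Err(_) if start.elapsed() < deadline => {
                  // e.g. EAGAIN while QEMU is still bringing QMP up.
                  sleep(backoff);
              }
              Err(e) => return Err(format!("QMP not ready within deadline: {e}")),
          }
      }
  }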
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
HashMap does not guarantee iteration order, so the generated kernel
command line kept changing between runs. This commit changes the
key-value container of get_agent_kernel_params to a BTreeMap so the
command line stays stable.
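A tiny illustration of why the switch matters: BTreeMap iterates in
sorted key order, so rendering the parameters is deterministic:

  use std::collections::BTreeMap;

  fn render_kernel_params(params: &BTreeMap<String, String>) -> String {
      // Sorted iteration order: repeated runs produce the same command
      // line, whereas HashMap could yield a different order each time.
      params.iter()
          .map(|(k, v)| format!("{k}={v}"))
          .collect::<Vec<_>>()
          .join(" ")
  }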
Fixes: #10977
Signed-off-by: Hui Zhu <teawater@antgroup.com>
It aims to address the issue:
"run_io_copy[Stdout]: failed to copy stream: Not a socket (os error 88)"
The `Not a socket (os error 88)` error was caused by incorrectly wrapping
a FIFO file descriptor in a `UnixStream`. The following changes:
(1) Refactor `open_fifo_write` to return `tokio::fs::File` (or a generic
async reader/writer) instead of `AsyncUnixStream`.
(2) Ensure IO copying logic treats stdout/stderr streams as file-like
objects rather than sockets.
This fix eliminates the "failed to copy stream" errors in the IO loop
and ensures reliable log forwarding for legacy-io.
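A minimal sketch of the file-based approach, assuming a tokio runtime
(illustrative signatures, not the actual agent/runtime functions):

  use tokio::fs::{File, OpenOptions};
  use tokio::io::AsyncWriteExt;

  // Open a FIFO for writing as a plain async File instead of wrapping
  // its fd in a UnixStream (which is what triggered ENOTSOCK).
  async fn open_fifo_write(path: &str) -> std::io::Result<File> {
      OpenOptions::new().write(true).open(path).await
  }

  async fn forward(mut fifo: File, data: &[u8]) -> std::io::Result<()> {
      fifo.write_all(data).await?;
      fifo.flush().await
  }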
Fixes: #12387
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Move the private closure out and make it a public method which is
responsible for clearing O_NONBLOCK on an fd, turning it into blocking
mode.
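Roughly, the helper boils down to a pair of fcntl calls; a standalone
sketch using libc (the real method lives on the relevant runtime-rs type):

  use std::io;
  use std::os::unix::io::RawFd;

  // Clear O_NONBLOCK on `fd`, switching it to blocking mode.
  fn set_fd_blocking(fd: RawFd) -> io::Result<()> {
      // SAFETY: plain fcntl calls on a caller-provided fd.
      let flags = unsafe { libc::fcntl(fd, libc::F_GETFL) };
      if flags < 0 {
          return Err(io::Error::last_os_error());
      }
      if unsafe { libc::fcntl(fd, libc::F_SETFL, flags & !libc::O_NONBLOCK) } < 0 {
          return Err(io::Error::last_os_error());
      }
      Ok(())
  }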
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This reverts commit c0d7222194.
Soon, guest components will switch to using a DB instead of
storing resources in the filesystem. Further, I no longer see any
indication that kbs-client would struggle to set simple
resources.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add the necessary configuration and code changes to support QEMU
on arm64 architecture in runtime-rs.
Changes:
- Set MACHINETYPE to "virt" for arm64
- Add machine accelerators "usb=off,gic-version=host" required for
proper arm64 virtualization
- Add arm64-specific kernel parameter "iommu.passthrough=0"
- Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported
These changes align runtime-rs with the Go runtime's arm64 QEMU support.
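A rough sketch of the arch gating, with illustrative names and an
assumed x86_64 fallback (the actual settings live across the runtime-rs
QEMU configuration code):

  // Hypothetical sketch of the arm64-specific QEMU settings.
  struct QemuArchConfig {
      machine_type: &'static str,
      machine_accelerators: &'static str,
      extra_kernel_params: Vec<&'static str>,
      viommu_supported: bool,
  }

  fn qemu_arch_config(arch: &str) -> QemuArchConfig {
      match arch {
          "aarch64" => QemuArchConfig {
              machine_type: "virt",
              machine_accelerators: "usb=off,gic-version=host",
              extra_kernel_params: vec!["iommu.passthrough=0"],
              viommu_supported: false, // skip Intel IOMMU on arm64
          },
          // Assumed default for other architectures, for illustration only.
          _ => QemuArchConfig {
              machine_type: "q35",
              machine_accelerators: "",
              extra_kernel_params: vec![],
              viommu_supported: true,
          },
      }
  }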
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Add support for the -info flag that containerd v2.0+ passes to shims.
The flag outputs RuntimeInfo protobuf to stdout containing the shim
name and version information.
Fixes: #12133
Signed-off-by: tak-ka3 <takumi.hiraoka@acompany-ac.com>
The enable_debug parameter was explicitly set to false rather than
being commented out (e.g., # enable_debug = true). The previous
enabling logic did not account for this explicit setting, so it had
no effect. This commit updates the matching logic to correctly
handle and toggle the explicit false value.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
It was observed that some kata-deploy cleanup steps could hang,
causing the workflow to never finish properly. In these cases,
a QEMU process was not cleaned up and kept printing debug logs
to the journal. Over time, this maxed out the runner’s disk
usage and caused the runner service to stop.
Set timeouts for the relevant cleanup steps to avoid this.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The verification job mounts a ConfigMap containing the pod spec for
the Kata runtime test. Previously, both the ConfigMap and the Job were
Helm hooks with different weights (-5 and 0 respectively).
On k3s, a race condition was observed where the Job pod would be
scheduled before the kubelet's informer cache had registered the
ConfigMap, causing a FailedMount error:
  MountVolume.SetUp failed for volume "pod-spec": object
  "kube-system"/"kata-deploy-verification-spec" not registered
This happened because k3s's lightweight architecture schedules pods
very quickly, and the hook weight difference only controls Helm's
ordering, not actual timing between resource creation and cache sync.
By making the ConfigMap a regular chart resource (removing hook
annotations), it is created during the main chart installation phase,
well before any post-install hooks run. This guarantees the ConfigMap
is fully propagated to all kubelets before the verification Job starts.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The verification job needs to list nodes to check for the
katacontainers.io/kata-runtime label and list events to detect
FailedCreatePodSandBox errors during pod creation.
This was discovered when testing with k0s, where the service account
lacked the required cluster-scope permissions to list nodes.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove k0s-worker and k0s-controller from
RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT and always return true for
k0s in is_containerd_capable_of_using_drop_in_files since k0s auto-loads
from the containerd.d/ directory, regardless of the containerd version.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add microk8s case to get_containerd_paths() method and remove microk8s
from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT to enable dynamic
containerd version checking.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Introduce ContainerdPaths struct and get_containerd_paths() method to
centralize the complex logic for determining containerd configuration
file paths across different Kubernetes distributions.
The new ContainerdPaths struct includes:
- config_file: File to read containerd version from and write to
- backup_file: Backup file path before modification
- imports_file: File to add/remove drop-in imports from (Option<String>)
- drop_in_file: Path to the drop-in configuration file
- use_drop_in: Whether drop-in files can be used
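In shape, the struct looks roughly like this (types are illustrative):

  struct ContainerdPaths {
      config_file: String,          // file to read the containerd version from and write to
      backup_file: String,          // backup taken before modification
      imports_file: Option<String>, // file to add/remove drop-in imports from
      drop_in_file: String,         // path to the drop-in configuration file
      use_drop_in: bool,            // whether drop-in files can be used
  }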
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The JSONPath parser was incorrectly splitting on escaped dots (\.)
causing microk8s detection to fail. Labels like "microk8s.io/cluster"
were being split into ["microk8s\", "io/cluster"] instead of being
treated as a single key.
This adds a split_jsonpath() helper that properly handles escaped dots,
allowing the automatic microk8s detection via the node label to work
correctly.
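A minimal sketch of such a helper (illustrative, not the exact
implementation):

  // Split a JSONPath expression on '.', but treat escaped dots (\.)
  // as part of the current key, e.g. "metadata.labels.microk8s\.io/cluster"
  // -> ["metadata", "labels", "microk8s.io/cluster"].
  fn split_jsonpath(path: &str) -> Vec<String> {
      let mut parts = vec![String::new()];
      let mut chars = path.chars().peekable();
      while let Some(c) = chars.next() {
          match c {
              '\\' if chars.peek() == Some(&'.') => {
                  // Escaped dot: keep it inside the current key.
                  parts.last_mut().unwrap().push('.');
                  chars.next();
              }
              '.' => parts.push(String::new()),
              _ => parts.last_mut().unwrap().push(c),
          }
      }
      parts.into_iter().filter(|p| !p.is_empty()).collect()
  }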
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The kata-deploy test was using helm_helper which made it hard to debug
failures (die() calls would cause "Executed 0 tests" errors) and added
unnecessary complexity.
The test now calls helm directly like a user would, making it simpler
and more representative of real-world usage. The verification job status
is explicitly checked with proper failure detection instead of relying
on helm --wait.
Timeouts are configurable via environment variables to account for
different network speeds and image sizes:
- KATA_DEPLOY_TIMEOUT (default: 600s)
- KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s)
- KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s)
Documentation has been added to explain what each timeout controls and
how to customize them.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>