Commit Graph

2037 Commits

Author SHA1 Message Date
Manuel Huber
d9d1073cf1 gpu: Install packages for devkit
Introduce a new function to install additional packages into the
devkit flavor. With modprobe, we avoid errors on pod startup
related to loading nvidia kernel modules in the NVRC phase.
Note that the production flavor gets modprobe from busybox; see its
configuration file, which contains CONFIG_MODPROBE=y.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-06 09:58:32 +01:00
Manuel Huber
a786582d0b rootfs: deprecate initramfs dm-verity mode
Remove the initramfs folder and its build steps, and use the
kernel-based dm-verity enforcement for the handlers that previously
used the initramfs mode. Also, remove the initramfs verity mode
capability from the shims and their configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
976df22119 rootfs: Change condition for cryptsetup-bin
The measured rootfs mode and the CDH secure storage feature require
the cryptsetup-bin and e2fsprogs components in the guest.
This change makes that more explicit: confidential guests are
users of the CDH secure container image layer storage feature.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
a3c4e0b64f rootfs: Introduce kernelinit dm-verity mode
This change introduces the kernelinit dm-verity mode, allowing
initramfs-less dm-verity enforcement against the rootfs image.
For this, the change introduces a new variable with dm-verity
information. This variable will be picked up by shim
configurations in subsequent commits.
This will allow the shims to build the kernel command line
with dm-verity information based on the existing
kernel_parameters configuration knob and a new
kernel_verity_params configuration knob. The latter
specifically provides the relevant dm-verity information.
This new configuration knob avoids merging the verity
parameters into the kernel_params field. As a result, no
cumbersome escape logic is required: we do not need to pass the
dm-mod.create="..." parameter directly in the kernel_parameters,
but only the relevant dm-verity parameters in a semi-structured
manner (see above). The shims remain the only place where the
final command line is assembled. Further, this line is easy to
comment out for developers who want to disable dm-verity
enforcement (or for CI tasks).
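
A minimal sketch of how a shim could combine the two knobs into the
final command line (illustrative only; the function signature is an
assumption, not the actual shim code):

```
// Hypothetical sketch: append the dm-verity parameters, kept in their
// own knob, to the general kernel parameters.
fn assemble_kernel_cmdline(kernel_params: &str, kernel_verity_params: Option<&str>) -> String {
    match kernel_verity_params {
        // Commenting out / omitting the verity knob disables enforcement.
        Some(verity) if !verity.is_empty() => format!("{kernel_params} {verity}"),
        _ => kernel_params.to_string(),
    }
}
```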

This change produces the new kernelinit dm-verity parameters for
the NVIDIA runtime handlers, and modifies the format of how
these parameters are prepared for all handlers. With this, the
parameters are currently no longer provided to the
kernel_params configuration knob for any runtime handler.
This change should thus not be used on its own, as dm-verity
information would otherwise no longer be picked up by the shims.

systemd-analyze on the coco-dev handler shows that, with the
kernelinit mode on a local machine, less time is spent in the
kernel phase, slightly speeding up pod start-up. On that machine,
the average kernel phase duration dropped from 172.5ms to 141ms
(4 measurements, each with a basic pod manifest), an improvement
of about 18 percent.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
83a0bd1360 gpu: use dm-verity for the non-TEE GPU handler
Use a dm-verity protected rootfs image for the non-TEE NVIDIA
GPU handler as well.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
02ed4c99bc rootfs: Use maxdepth=1 to search for kata tarballs
These tarballs are at the top level of the build directory;
there is no need to traverse all sub-directories.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
d37db5f068 rootfs: Restore "gpu: Handle root_hash.txt ..."
This reverts commit 923f97bc66 in
order to reinstate the logic from commit
e4a13b9a4a.

The latter commit was previously reverted due to the NVIDIA GPU TEE
handler using an initrd, not an image.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
f1ca547d66 initramfs: introduce log function
Log to /dev/kmsg; this way, logs will show up and not get lost.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Fabiano Fidêncio
f90c12d4df kata-deploy: Avoid text file busy error with nydus-snapshotter
We cannot overwrite a binary that's currently in use. That's why,
elsewhere, we remove / unlink the binary (the running process
keeps its file descriptor, so we're good doing that) and only then
copy the binary into place. However, we missed doing this for the
nydus-snapshotter deployment.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 10:24:49 +01:00
Steve Horsman
6bb77a2f13 Merge pull request #12390 from mythi/tdx-updates-2026-2
runtime: tdx QEMU configuration changes
2026-02-02 16:58:44 +00:00
Zvonko Kaiser
6702b48858 Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state
kata-deploy: nydus: Always start from a clean state
2026-02-02 11:21:26 -05:00
Steve Horsman
0530a3494f Merge pull request #12415 from nlle/make-helm-updatestrategy-configurable
kata-deploy: Make update strategy configurable for kata-deploy DaemonSet
2026-02-02 10:29:01 +00:00
Fabiano Fidêncio
62ad0814c5 kata-deploy: nydus: Always start from a clean state
Clean up existing nydus-snapshotter state to ensure a fresh start with
the new version.

This is safe across all K8s distributions (k3s, rke2, k0s, microk8s,
etc.) because we only touch the nydus data directory, not containerd's
internals.

When containerd tries to use non-existent snapshots, it will
re-pull/re-unpack.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-02 11:06:37 +01:00
Mikko Ylinen
870630c421 kata-deploy: drop custom TDX installation steps
As we have moved to using QEMU (and, earlier, OVMF) from
kata-deploy, the custom tdx configurations and distro checks
are no longer needed.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-02 11:11:26 +02:00
Nikolaj Lindberg Lerche
6e98df2bac kata-deploy: Make update strategy configurable for kata-deploy DaemonSet
This allows the updateStrategy to be configured for the kata-deploy helm
chart, enabling administrators to control the aggressiveness of
updates. For a less aggressive approach, the strategy can be set to
`OnDelete`. Alternatively, the update process can be made more
aggressive by adjusting the `maxUnavailable` parameter.

Signed-off-by: Nikolaj Lindberg Lerche <nlle@ambu.com>
2026-02-01 20:14:29 +01:00
Manuel Huber
8b0c199f43 packaging: Delete pause_bundle dir before unpack
Delete the pause_bundle directory before running the umoci unpack
operation. This makes builds idempotent, so they no longer fail with
errors like "create runtime bundle: config.json already exists in
.../build/pause-image/destdir/pause_bundle". This makes life
better when building locally.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-31 19:43:11 +01:00
Fabiano Fidêncio
b85393e70b release: Bump version to 3.26.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-29 00:23:26 +01:00
Fabiano Fidêncio
500146bfee versions: Bump Go to 1.24.12
Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities
in the standard library:

- GO-2026-4342: Excessive CPU consumption in archive/zip
- GO-2026-4341: Memory exhaustion in net/url query parsing
- GO-2026-4340: TLS handshake encryption level issue in crypto/tls

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-29 00:23:26 +01:00
Manuel Huber
5e60d384a2 kata-deploy: Update for mariner in all target
Remove the initrd function and add the image function to align
with the functions that actually exist in this file.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-28 08:58:45 -08:00
Manuel Huber
0d8fbdef07 kernel: Readjust kernel version after decrement
Readjust the kata_config_version counter after it was
accidentally decremented in commit c7f5ff4.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-28 10:48:12 +01:00
Zvonko Kaiser
a59f791bf5 gpu: Move CUDA repo selection to versions.yaml
We want to enable local and remote CUDA repository builds.
Move the cuda and tools repos to versions.yaml with a
unified build for both types.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-26 22:19:40 +01:00
Fabiano Fidêncio
04f45a379c kata-deploy: docs: Document shims.disableAll option
Update the Helm chart README to document the new shims.disableAll
option and simplify the examples that previously required listing
every shim to disable.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
c9e9a682ab kata-deploy: Use disableAll in example values files
Simplify the example values files by using the new shims.disableAll
option instead of listing every shim to disable.

Before (try-kata-nvidia-gpu.values.yaml):
  shims:
    clh:
      enabled: false
    cloud-hypervisor:
      enabled: false
    # ... 15 more lines ...

After:
  shims:
    disableAll: true

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
cfe9bcbaf1 kata-deploy: Add shims.disableAll option to Helm chart
Add a new `shims.disableAll` option that disables all standard shims
at once. This is useful when:
- Enabling only specific shims without listing every other shim
- Using custom runtimes only mode (no standard Kata shims)

Usage:
  shims:
    disableAll: true
    qemu:
      enabled: true  # Only qemu is enabled

All helper templates are updated to check for this flag before
iterating over shims.

One thing that's super important to note here is that helm recursively
merges user values with chart defaults, making a simple
`disableAll` flag problematic: if defaults have `enabled: true`, user's
`disableAll: true` gets merged with those defaults, resulting in all
shims still being enabled.

The workaround found is to use null (`~`) as the default for `enabled`
field. The template logic interprets null differently based on
disableAll:

| enabled value | disableAll: false | disableAll: true |
|---------------|-------------------|------------------|
| ~ (null)      | Enabled           | Disabled         |
| true          | Enabled           | Enabled          |
| false         | Disabled          | Disabled         |

This is backward compatible:
- Default behavior unchanged: all shims enabled when disableAll: false
- Users can set `disableAll: true` to disable all, then explicitly
  enable specific shims with `enabled: true`
- Explicit `enabled: false` always disables, regardless of disableAll
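
For illustration, the effective decision boils down to the following
sketch (written in Rust purely to mirror the truth table above; the
real logic lives in the Helm templates, and the function name is made
up):

```
// Illustrative only: mirrors the `enabled` / `disableAll` truth table.
fn shim_is_enabled(enabled: Option<bool>, disable_all: bool) -> bool {
    match enabled {
        Some(explicit) => explicit, // explicit true/false always wins
        None => !disable_all,       // null (~): follow disableAll
    }
}

fn main() {
    assert!(shim_is_enabled(None, false));         // ~ with disableAll: false
    assert!(!shim_is_enabled(None, true));         // ~ with disableAll: true
    assert!(shim_is_enabled(Some(true), true));    // explicit enable wins
    assert!(!shim_is_enabled(Some(false), false)); // explicit disable wins
}
```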

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
3be57bb501 kata-deploy: Add Helm chart support for custom runtimes
Add Helm chart configuration for defining custom RuntimeClasses with
base configuration and drop-in overrides.

Usage:
  helm install kata-deploy ./kata-deploy \
    -f custom-runtimes.values.yaml

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
a76cdb5814 kata-deploy: Add custom runtime config installation/removal
Add functions to install and remove custom runtime configuration files.
Each custom runtime gets an isolated directory structure:

  custom-runtimes/{handler}/
    configuration-{baseConfig}.toml  # Copied from base config
    config.d/
      50-overrides.toml              # User's drop-in overrides

The base config is copied AFTER kata-deploy has applied its modifications
(debug settings, proxy configuration, annotations), so custom runtimes
inherit these settings.
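
A hedged, fragmentary sketch of what installing one custom runtime's
files could look like (function and parameter names are assumptions for
illustration, not the actual kata-deploy code):

```
// Hypothetical sketch of installing one custom runtime's config files,
// following the directory layout described above.
use std::{fs, path::Path};

fn install_custom_runtime(
    dest_root: &Path,
    handler: &str,
    base_config: &str,
    modified_base_config_path: &Path,
    dropin_toml: &str,
) -> std::io::Result<()> {
    let dir = dest_root.join("custom-runtimes").join(handler);
    fs::create_dir_all(dir.join("config.d"))?;
    // Copy the base config AFTER kata-deploy applied its own modifications,
    // so the custom runtime inherits debug/proxy/annotation settings.
    fs::copy(
        modified_base_config_path,
        dir.join(format!("configuration-{base_config}.toml")),
    )?;
    // The user's drop-in overrides.
    fs::write(dir.join("config.d").join("50-overrides.toml"), dropin_toml)?;
    Ok(())
}
```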

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
4c3989c3e4 kata-deploy: Add custom runtime configuration for containerd/CRI-O
Add functions to configure custom runtimes in containerd and CRI-O.
Custom runtimes use an isolated config directory under:
  custom-runtimes/{handler}/

Custom runtimes automatically derive the shim binary path from the
baseConfig field using the existing is_rust_shim() logic.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
678b560e6d kata-deploy: Add CustomRuntime struct and parsing
Add support for parsing custom runtime configurations from a mounted
ConfigMap. This allows users to define their own RuntimeClasses with
custom Kata configurations.

The ConfigMap format uses a custom-runtimes.list file with entries:
  handler:baseConfig:containerd_snapshotter:crio_pulltype

Drop-in files are read from dropin-{handler}.toml, if present.
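
A hedged sketch of how one entry of that list might be parsed (field
names and the parsing code are assumptions for illustration, not
necessarily the actual kata-deploy implementation):

```
// Hypothetical illustration: parse one line of custom-runtimes.list,
// formatted as handler:baseConfig:containerd_snapshotter:crio_pulltype.
#[derive(Debug)]
struct CustomRuntime {
    handler: String,
    base_config: String,
    containerd_snapshotter: String,
    crio_pulltype: String,
}

fn parse_entry(line: &str) -> Option<CustomRuntime> {
    let mut parts = line.trim().splitn(4, ':');
    Some(CustomRuntime {
        handler: parts.next()?.to_string(),
        base_config: parts.next()?.to_string(),
        containerd_snapshotter: parts.next()?.to_string(),
        crio_pulltype: parts.next()?.to_string(),
    })
}

fn main() {
    // Example values are made up.
    println!("{:?}", parse_entry("kata-custom:qemu:nydus:default"));
}
```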

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
609a25e643 kata-deploy: Refactor runtime configuration with helper functions
Let's extract the common logic from configure_containerd_runtime and
configure_crio_runtime into reusable helper functions. This reduces
code duplication and prepares for adding custom runtime support.

For containerd:
- Add ContainerdRuntimeParams struct to encapsulate common parameters
- Add get_containerd_pluginid() to extract version detection logic
- Add get_containerd_output_path() to extract file path resolution
- Add write_containerd_runtime_config() to write common TOML values

For CRI-O:
- Add CrioRuntimeParams struct to encapsulate common parameters
- Add write_crio_runtime_config() to write common configuration

While here, let's also simplify pod_annotations to always use
"[\"io.katacontainers.*\"]" for all runtimes, as the NVIDIA specific
case has been removed from the shell script, but we forgot to do so
here.

No functional changes intended.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Bo Liu
c7f5ff45a2 arm64: Update ptp.conf to correct time sync
Given that the patch has been merged into upstream Linux, it's safe to
enable these two options.

Signed-off-by: Bo Liu <152475812+liubocflt@users.noreply.github.com>
2026-01-24 21:08:21 +01:00
Fabiano Fidêncio
5b82b160e2 runtime-rs: Add arm64 QEMU support
Add the necessary configuration and code changes to support QEMU
on arm64 architecture in runtime-rs.

Changes:
- Set MACHINETYPE to "virt" for arm64
- Add machine accelerators "usb=off,gic-version=host" required for
  proper arm64 virtualization
- Add arm64-specific kernel parameter "iommu.passthrough=0"
- Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported

These changes align runtime-rs with the Go runtime's arm64 QEMU support.
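
A rough Rust sketch of what these arm64 defaults amount to (purely
illustrative; the actual constants and structure in runtime-rs may
differ):

```
// Illustrative only: the arm64-specific QEMU defaults described above.
struct Arm64QemuDefaults {
    machine_type: &'static str,
    machine_accelerators: &'static str,
    extra_kernel_param: &'static str,
    viommu_supported: bool,
}

fn arm64_qemu_defaults() -> Arm64QemuDefaults {
    Arm64QemuDefaults {
        machine_type: "virt",
        machine_accelerators: "usb=off,gic-version=host",
        extra_kernel_param: "iommu.passthrough=0",
        viommu_supported: false, // vIOMMU (Intel IOMMU) is skipped on arm64
    }
}

fn main() {
    let d = arm64_qemu_defaults();
    // Roughly how these could appear on a QEMU command line.
    println!("-machine {},{}", d.machine_type, d.machine_accelerators);
    println!("extra kernel param: {} (vIOMMU: {})",
             d.extra_kernel_param, d.viommu_supported);
}
```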

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
2026-01-23 19:48:31 +01:00
Fabiano Fidêncio
ac8436e326 kata-deploy: Update debian in the container image to 13 (trixie)
Just a bump to the latest version, as requested by Mikko.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-22 12:32:59 +01:00
Fabiano Fidêncio
dacb14619d kata-deploy: Make verification ConfigMap a regular resource
The verification job mounts a ConfigMap containing the pod spec for
the Kata runtime test. Previously, both the ConfigMap and the Job were
Helm hooks with different weights (-5 and 0 respectively).

On k3s, a race condition was observed where the Job pod would be
scheduled before the kubelet's informer cache had registered the
ConfigMap, causing a FailedMount error:

  MountVolume.SetUp failed for volume "pod-spec": object
  "kube-system"/"kata-deploy-verification-spec" not registered

This happened because k3s's lightweight architecture schedules pods
very quickly, and the hook weight difference only controls Helm's
ordering, not actual timing between resource creation and cache sync.

By making the ConfigMap a regular chart resource (removing hook
annotations), it is created during the main chart installation phase,
well before any post-install hooks run. This guarantees the ConfigMap
is fully propagated to all kubelets before the verification Job starts.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
89e287c3b2 kata-deploy: Add more permissions to verification job's RBAC
The verification job needs to list nodes to check for the
katacontainers.io/kata-runtime label and list events to detect
FailedCreatePodSandBox errors during pod creation.

This was discovered when testing with k0s, where the service account
lacked the required cluster-scope permissions to list nodes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
869dd5ac65 kata-deploy: Enable dynamic drop-in support for k0s
Remove k0s-worker and k0s-controller from
RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT and always return true for
k0s in is_containerd_capable_of_using_drop_in_files since k0s auto-loads
from containerd.d/ directory regardless of containerd version.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
d4ea02e339 kata-deploy: Add microk8s support with dynamic version detection
Add microk8s case to get_containerd_paths() method and remove microk8s
from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT to enable dynamic
containerd version checking.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
69dd9679c2 kata-deploy: Centralize containerd path management
Introduce ContainerdPaths struct and get_containerd_paths() method to
centralize the complex logic for determining containerd configuration
file paths across different Kubernetes distributions.

The new ContainerdPaths struct includes:
- config_file: File to read containerd version from and write to
- backup_file: Backup file path before modification
- imports_file: File to add/remove drop-in imports from (Option<String>)
- drop_in_file: Path to the drop-in configuration file
- use_drop_in: Whether drop-in files can be used
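
A minimal Rust sketch matching the fields listed above (only
imports_file has a stated type; the other field types and the example
paths are assumptions):

```
// Sketch of ContainerdPaths based on the field descriptions above.
#[allow(dead_code)]
struct ContainerdPaths {
    /// File to read the containerd version from and write to.
    config_file: String,
    /// Backup file path before modification.
    backup_file: String,
    /// File to add/remove drop-in imports from, when applicable.
    imports_file: Option<String>,
    /// Path to the drop-in configuration file.
    drop_in_file: String,
    /// Whether drop-in files can be used.
    use_drop_in: bool,
}

fn main() {
    // Placeholder example only; actual paths depend on the distribution.
    let paths = ContainerdPaths {
        config_file: "/etc/containerd/config.toml".into(),
        backup_file: "/etc/containerd/config.toml.bak".into(),
        imports_file: None,
        drop_in_file: "/etc/containerd/config.d/kata-deploy.toml".into(),
        use_drop_in: true,
    };
    println!("use drop-in: {}", paths.use_drop_in);
}
```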

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
606c12df6d kata-deploy: fix JSONPath parsing for labels with dots
The JSONPath parser was incorrectly splitting on escaped dots (\.),
causing microk8s detection to fail. Labels like "microk8s.io/cluster"
were being split into ["microk8s\", "io/cluster"] instead of being
treated as a single key.

This adds a split_jsonpath() helper that properly handles escaped dots,
allowing the automatic microk8s detection via the node label to work
correctly.
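
A hedged sketch of what such a splitter could look like (not
necessarily the exact kata-deploy implementation):

```
// Illustrative splitter: split a JSONPath-like key on '.', but treat an
// escaped dot ("\.") as a literal dot within a single segment.
fn split_jsonpath(path: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' if chars.peek() == Some(&'.') => {
                current.push('.');
                chars.next(); // consume the escaped dot
            }
            '.' => segments.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    segments.push(current);
    segments
}

fn main() {
    // "microk8s\.io/cluster" stays one segment instead of splitting in two.
    assert_eq!(
        split_jsonpath(r"metadata.labels.microk8s\.io/cluster"),
        vec!["metadata", "labels", "microk8s.io/cluster"]
    );
}
```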

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
86e0b08b13 kata-deploy: Improve verification job timing and failure detection
The verification job now supports configurable timeouts to accommodate
different environments and network conditions. The daemonset timeout
defaults to 1200 seconds (20 minutes) to allow for large image downloads,
while the verification pod timeout defaults to 180 seconds.

The job now waits for the DaemonSet to exist, pods to be scheduled,
rollout to complete, and nodes to be labeled before creating the
verification pod. A 15-second delay is added after node labeling to
allow kubelet time to refresh runtime information.

Retry logic with 3 attempts and a 10-second delay handles transient
FailedCreatePodSandBox errors that can occur during runtime
initialization. The job only fails on pod errors after a 30-second
grace period to avoid false positives from timing issues.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
5aff81198f helm-chart: Fix warnings on README
nydus -> `nydus`
erofs -> `erofs`

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 22:41:50 +01:00
Fabiano Fidêncio
b5a986eacf kata-deploy: Add runtime-rs TDX / SNP runtimeclasses
https://github.com/kata-containers/kata-containers/pull/11534 has been
merged and it added all the needed bits to deploy the QEMU SNP / TDX
runtime-rs variants, apart from the kata-deploy additions, which are done
by this PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 22:41:50 +01:00
Fabiano Fidêncio
96e1fb4ca6 tools: Remove runk
The runk tool hasn't been supported for a few years, with no maintainers
since ManaSugi stopped being involved in the project and the CI was
disabled in 2024.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:43:53 +01:00
Fabiano Fidêncio
f68c25de6a kata-deploy: Switch to the rust version
Let's remove the script and rely only on the rust version from now on.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:07:49 +01:00
Fabiano Fidêncio
d7aa793dde Revert "ci: Run a nightly job using the kata-deploy rust"
This reverts commit 6130d7330f, as we're
officially switching to the rust version of kata-deploy.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:07:49 +01:00
Fabiano Fidêncio
17472f3f10 release: scripts: Accept KATA_TOOLS_STATIC_TARBALL env var
a2534e7bc8 introduced the logic to also
release a kata-tools tarball, but it missed allowing
KATA_TOOLS_STATIC_TARBALL env var to be passed to the release script,
leading to the following error during the release process:
```
ERROR: Invalid environment variable "KATA_TOOLS_STATIC_TARBALL"
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 13:03:23 +01:00
Fabiano Fidêncio
882862d711 release: Bump version to 3.25.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 11:33:45 +01:00
Zvonko Kaiser
428cc5d586 gpu: Chroot Cleanup
With the newest NVRC we do not need the supported GPUs
anymore.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-17 19:27:24 +01:00
Fabiano Fidêncio
1c154b4c15 kernel: Add DAX fix for arm64
The patch has been provided upstream by Seunguk Shin and is already
approved.

We'll drop it once it becomes available in the LTS tree.

Reference:
https://lore.kernel.org/all/18af3213-6c46-4611-ba75-da5be5a1c9b0@arm.coum

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-17 19:15:53 +01:00
Fabiano Fidêncio
33b1f0786e Revert "arm64: Do not use DAX with the rootfs image"
This reverts commit 2acb94ef2d, as we have
a kernel patch approved fixing the issue.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-17 19:15:53 +01:00
Fabiano Fidêncio
a188f04d75 kata-deploy: helm: Add optional post-install verification
Add optional verification that runs after kata-deploy installation.
When a pod spec is provided via --set-file verification.pod=<file>,
a verification job runs after install/upgrade to validate deployment.

The user is fully responsible for the verification pod content:
- Pod name, runtimeClassName, annotations, and verification logic
- Pod must exit 0 on success, non-zero on failure

The verification job simply:
1. Waits for kata-deploy DaemonSet to be ready
2. Applies the user-provided pod spec
3. Waits for the pod to complete
4. Shows logs and cleans up

Usage:
  helm install kata-deploy ... \
    --set-file verification.pod=/path/to/your-pod.yaml

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-16 10:52:43 +01:00