kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-21 22:34:29 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	50923b6d62	kata-deploy: run cleanup on uninstall via DaemonSet preStop On helm uninstall let's rely on a preStop hook to run kata-deploy cleanup so each pod cleans its node before exiting. We must keep RBAC (resource-policy: keep) so pods retain API access during termination, and then can properly delete the NodeFeatureRules and remove the labels from the nodes. The post-delete hook Job, which runs on a single node, is now only responsible for cleaning up the kept RBAC (cluster-wide resources) after uninstall, leaving no resource or artefact behind. The changes in this commit lead to a "resources were kept" message when running `helm uninstall`, which we document as normal, as the post-delete job will remove those resources. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	6e0cbc28a3	kata-deploy: fix node label removal When removing a node label, JSON merge patch semantics require setting the key to null; omitting the key leaves it unchanged. Fix label_node to send a patch with the label key set to null so the API server actually removes katacontainers.io/kata-runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
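The label-removal fix above hinges on JSON merge patch semantics (RFC 7386): a key set to `null` is deleted, and a key that is omitted is left alone. The following minimal sketch models those semantics on a flat label map, with `None` standing in for JSON `null`. It also builds the patch body that removes a label. The function names are hypothetical, and this is not kata-deploy's actual `label_node` code.

```rust
use std::collections::BTreeMap;

/// Applies a JSON merge patch (RFC 7386 semantics) to a flat label map:
/// `Some(v)` sets or overwrites a key, `None` (JSON `null`) deletes it,
/// and keys that do not appear in the patch are left untouched.
fn apply_label_merge_patch(
    labels: &mut BTreeMap<String, String>,
    patch: &BTreeMap<String, Option<String>>,
) {
    for (key, value) in patch {
        match value {
            Some(v) => {
                labels.insert(key.clone(), v.clone());
            }
            None => {
                labels.remove(key);
            }
        }
    }
}

/// Builds a merge patch body that removes `label` from a Node object.
/// The key must be present with a `null` value; omitting it is a no-op.
fn label_removal_patch_body(label: &str) -> String {
    format!(r#"{{"metadata":{{"labels":{{"{}":null}}}}}}"#, label)
}
```

An empty patch leaves the labels unchanged. That is exactly the bug described above: omitting the key did not remove it.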
Fabiano Fidêncio	510d2a69ae	kata-deploy: exit with 0 on SIGTERM in install mode Wait for SIGTERM after install and exit(0) so the container terminates cleanly. If registering the SIGTERM handler fails, log a warning and sleep forever instead of exiting with an error (fallback to the old behaviour). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	5c0269881e	tests: Make editorconfig-checker happy - Trim trailing whitespace and ensure final newline in non-vendor files - Add .editorconfig-checker.json excluding vendor dirs, .patch, .img, .dtb, .drawio, *.svg, and pkg/cloud-hypervisor/client so CI only checks project code - Leave generated and binary assets unchanged (excluded from checker) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	4cb2aea9dd	kata-deploy: Document drop-in configuration and add warning to config files When kata-deploy installs Kata Containers, the base configuration files should not be modified directly. This change adds documentation explaining how to use drop-in configuration files for customization, and prepends a warning comment to all deployed configuration files reminding users to use drop-in files instead. The warning is added to both standard shim configurations and custom runtime configurations. It includes a brief explanation of how drop-in files work and points users to the documentation for more details. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	d5d561abe5	kata-deploy: Add detailed logging for drop-in configuration Add clear INFO-level messages when creating drop-in configuration files, making it easy to understand what kata-deploy is doing during installation: - "Setting up runtime directory for shim: X" - "Generating drop-in configuration files for shim: X" - "Created drop-in file: <path>" When DEBUG mode is enabled (via DEBUG=true environment variable), also log the full content of each drop-in file to aid troubleshooting. The log level is now automatically set to Debug when the DEBUG environment variable is set, ensuring debug messages are visible. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	eddd1b507e	kata-deploy: Extract common drop-in generation into shared helper Deduplicate the drop-in file generation logic between configure_shim_config and install_custom_runtime_configs by extracting it into a shared write_common_drop_ins helper function. This ensures both standard and custom runtimes use the same code path for generating drop-in configuration files. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	577aa6b319	kata-deploy: Propagate drop-in configs to custom runtime classes Ensure custom runtime classes receive the same drop-in configuration files as standard runtimes: - 10-installation-prefix.toml (if custom dest_dir) - 20-debug.toml (if debug enabled) - 30-kernel-params.toml (proxy + debug kernel params) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
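The three drop-in files listed above can be selected with a small decision function. This is a sketch, not the kata-deploy implementation. The `DropInInputs` struct and its field names are hypothetical. The sketch also assumes that 30-kernel-params.toml is written only when a proxy or debug setting actually changes the kernel parameters. The numeric prefixes keep the returned list in lexical order, which is the order in which the drop-ins are expected to be applied.

```rust
/// Default installation prefix, as stated in the commit log.
const DEFAULT_DEST_DIR: &str = "/opt/kata";

/// Hypothetical inputs that drive drop-in generation.
struct DropInInputs<'a> {
    dest_dir: &'a str,
    debug: bool,
    https_proxy: Option<&'a str>,
    no_proxy: Option<&'a str>,
}

/// Returns the drop-in file names to generate, in application order.
fn drop_in_files(inputs: &DropInInputs<'_>) -> Vec<&'static str> {
    let mut files = Vec::new();
    // Only needed when paths differ from the default prefix.
    if inputs.dest_dir != DEFAULT_DEST_DIR {
        files.push("10-installation-prefix.toml");
    }
    // Boolean debug flags for hypervisor, runtime and agent.
    if inputs.debug {
        files.push("20-debug.toml");
    }
    // Assumption: only written when proxy or debug kernel params are added.
    if inputs.debug || inputs.https_proxy.is_some() || inputs.no_proxy.is_some() {
        files.push("30-kernel-params.toml");
    }
    files
}
```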
Fabiano Fidêncio	8c60a88bda	kata-deploy: Add combined kernel_params drop-in Add a combined drop-in file (30-kernel-params.toml) that handles all kernel_params modifications. This approach reads the base kernel_params from the original untouched config file and combines them with: - Proxy settings (agent.https_proxy, agent.no_proxy) - Debug settings (agent.log=debug, initcall_debug) Using a single drop-in file for kernel_params avoids the TOML merge behavior where scalar values are replaced rather than appended. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
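A single combined drop-in is needed because `kernel_params` is a scalar string. If both a debug drop-in and a proxy drop-in set it, the later file replaces the earlier value instead of appending to it. The following minimal sketch builds the combined value by appending the proxy and debug parameters to the base parameters, in the order described above. Two details are assumptions: the `[hypervisor.<name>]` table name used in the rendered drop-in, and the absence of TOML escaping (values containing `"` would need escaping).

```rust
/// Combines base kernel params (read from the untouched config) with
/// proxy and debug params into one space-separated string.
fn combined_kernel_params(
    base: &str,
    https_proxy: Option<&str>,
    no_proxy: Option<&str>,
    debug: bool,
) -> String {
    let mut params: Vec<String> = base.split_whitespace().map(str::to_string).collect();
    if let Some(proxy) = https_proxy {
        params.push(format!("agent.https_proxy={proxy}"));
    }
    if let Some(np) = no_proxy {
        params.push(format!("agent.no_proxy={np}"));
    }
    if debug {
        params.push("agent.log=debug".to_string());
        params.push("initcall_debug".to_string());
    }
    params.join(" ")
}

/// Renders the 30-kernel-params.toml body. The table name is an assumption,
/// and values are not TOML-escaped.
fn kernel_params_drop_in(hypervisor: &str, kernel_params: &str) -> String {
    format!("[hypervisor.{hypervisor}]\nkernel_params = \"{kernel_params}\"\n")
}
```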
Fabiano Fidêncio	fae96f1f82	kata-deploy: Add drop-in file for debug configuration When debug mode is enabled, generate a drop-in configuration file (20-debug.toml) with the boolean debug flags for hypervisor, runtime, and agent sections. Note: kernel_params for debug (agent.log=debug, initcall_debug) will be handled by a separate combined kernel_params drop-in file to avoid the TOML merge replacement behavior. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	bb65e516e5	kata-deploy: Add drop-in file for installation prefix When the installation prefix differs from the default /opt/kata, generate a drop-in configuration file (10-installation-prefix.toml) with the adjusted paths instead of modifying the original config file. This removes the need for adjust_installation_prefix and adjust_qemu_cmdline functions which are now deleted along with their tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	cd76d61a3d	kata-deploy: Add infrastructure for per-shim drop-in configuration Instead of modifying original config files directly, set up a per-shim directory structure that uses symlinks to the original configs and config.d/ directories for drop-in overrides. This enables cleaner configuration management where the original files remain untouched and all kata-deploy customizations are in separate drop-in files that can be easily inspected and removed. Directory structure: {config_path}/runtimes/{shim}/ {config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink {config_path}/runtimes/{shim}/config.d/ Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
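The per-shim directory structure above maps directly onto path joins. This sketch computes only the three paths. It does not create the symlink, and it does not assume where the symlink points. The `ShimLayout` type and `shim_layout` function are hypothetical, and the example `config_path` in the test is illustrative.

```rust
use std::path::{Path, PathBuf};

/// Paths used by the per-shim drop-in layout.
struct ShimLayout {
    /// {config_path}/runtimes/{shim}/
    runtime_dir: PathBuf,
    /// {config_path}/runtimes/{shim}/configuration-{shim}.toml (a symlink)
    config_link: PathBuf,
    /// {config_path}/runtimes/{shim}/config.d/
    drop_in_dir: PathBuf,
}

fn shim_layout(config_path: &Path, shim: &str) -> ShimLayout {
    let runtime_dir = config_path.join("runtimes").join(shim);
    ShimLayout {
        config_link: runtime_dir.join(format!("configuration-{shim}.toml")),
        drop_in_dir: runtime_dir.join("config.d"),
        // Moved last, after the other fields have borrowed it.
        runtime_dir,
    }
}
```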
Lukáš Doktor	f7baa394d4	tools.gatekeeper: Add support to paginate workflows The number of workflows has grown beyond 30, so we need to paginate them as well as the jobs. This commit extracts the existing pagination from jobs and uses it for both jobs and workflows. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-02-10 06:53:47 +00:00
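The GitHub REST API returns 30 items per page by default, which is why 30 is the threshold mentioned above. The following language-neutral sketch, written in Rust for consistency, shows one shared pagination loop that can serve both jobs and workflows. The gatekeeper's real code and its stopping criterion may differ. This sketch assumes that a short or empty page marks the end, and `fetch_page` is a hypothetical stand-in for the HTTP request.

```rust
/// Collects all items from a paginated API by requesting pages 1, 2, ...
/// until a page returns fewer than `per_page` items (or none at all).
fn fetch_all_pages<T, F>(per_page: usize, mut fetch_page: F) -> Vec<T>
where
    F: FnMut(u32, usize) -> Vec<T>,
{
    let mut all = Vec::new();
    let mut page = 1;
    loop {
        let items = fetch_page(page, per_page);
        let n = items.len();
        all.extend(items);
        // An empty page also ends the loop, which guards against per_page == 0.
        if n == 0 || n < per_page {
            break;
        }
        page += 1;
    }
    all
}
```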
stevenhorsman	33d494b07e	kata-deploy: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
Fabiano Fidêncio	ab515712d4	kernel: Unify kernel and kernel-confidential Build a single kernel for both kernel and kernel-confidential on x86_64 and s390x. The kernel is built with TEE support (-x) on those arches only. This helps to simplify and maintain the code, and having a single kernel was the original plan since forever. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Fabiano Fidêncio	c5b5433866	kernel: Unify nvidia-gpu and nvidia-gpu-confidential Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential, simplifying and reducing code maintenance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
stevenhorsman	b29312289f	versions: Bump go to 1.24.13 Bump go to 1.24.13 to fix CVE GO-2026-4337 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
Manuel Huber	d9d1073cf1	gpu: Install packages for devkit Introduce a new function to install additional packages into the devkit flavor. With modprobe, we avoid errors on pod startup related to loading nvidia kernel modules in the NVRC phase. Note, the production flavor gets modprobe from busybox, see its configuration file containing CONFIG_MODPROBE=y. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-06 09:58:32 +01:00
Manuel Huber	a786582d0b	rootfs: deprecate initramfs dm-verity mode Remove the initramfs folder, its build steps, and use the kernel based dm-verity enforcement for the handlers which used the initramfs mode. Also, remove the initramfs verity mode capability from the shims and their configs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	976df22119	rootfs: Change condition for cryptsetup-bin Measured rootfs mode and the CDH secure storage feature require the cryptsetup-bin and e2fsprogs components in the guest. This change makes this more explicit: confidential guests are users of the CDH secure container image layer storage feature. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	a3c4e0b64f	rootfs: Introduce kernelinit dm-verity mode This change introduces the kernelinit dm-verity mode, allowing initramfs-less dm-verity enforcement against the rootfs image. For this, the change introduces a new variable with dm-verity information. This variable will be picked up by shim configurations in subsequent commits. This will allow the shims to build the kernel command line with dm-verity information based on the existing kernel_parameters configuration knob and a new kernel_verity_params configuration knob. The latter specifically provides the relevant dm-verity information. This new configuration knob avoids merging the verity parameters into the kernel_params field. As a result, no cumbersome escaping logic is required: we do not need to pass the dm-mod.create="..." parameter directly in the kernel_parameters, only the relevant dm-verity parameters in a semi-structured manner (see above). The final command line is assembled only in the shims. Further, this is a single line that developers (or CI tasks) can easily comment out to disable dm-verity enforcement. This change produces the new kernelinit dm-verity parameters for the NVIDIA runtime handlers, and modifies the format of how these parameters are prepared for all handlers. With this, the parameters are currently no longer provided to the kernel_params configuration knob for any runtime handler. This change alone should thus not be used, as dm-verity information will no longer be picked up by the shims. systemd-analyze on the coco-dev handler shows that, with the kernelinit mode on a local machine, less time is spent in the kernel phase, slightly speeding up pod start-up. On that machine, the average of 172.5ms was reduced to 141ms (4 measurements, each with a basic pod manifest), i.e., the kernel phase duration is improved by about 18 percent. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	83a0bd1360	gpu: use dm-verity for the non-TEE GPU handler Use a dm-verity protected rootfs image for the non-TEE NVIDIA GPU handler as well. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	02ed4c99bc	rootfs: Use maxdepth=1 to search for kata tarballs These tarballs are in the top layer of the build directory, no need to traverse all sub-directories. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	d37db5f068	rootfs: Restore "gpu: Handle root_hash.txt ..." This reverts commit `923f97bc66` in order to re-instantiate the logic from commit `e4a13b9a4a`. The latter commit was previously reverted due to the NVIDIA GPU TEE handler using an initrd, not an image. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f1ca547d66	initramfs: introduce log function Log to /dev/kmsg so that logs show up and do not get lost. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Fabiano Fidêncio	f90c12d4df	kata-deploy: Avoid text file busy error with nydus-snapshotter We cannot overwrite a binary that's currently in use, which is why elsewhere we remove / unlink the binary (the running process keeps its file descriptor, so we're good doing that) and only then copy the binary. However, we missed doing this for the nydus-snapshotter deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 10:24:49 +01:00
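The unlink-then-copy pattern described above can be sketched with the Rust standard library alone. On Linux, opening a running executable for writing fails with ETXTBSY ("Text file busy"). Removing the file first deletes only the directory entry, while the running process keeps using the old inode, so a fresh file can then be written at the same path. The `install_binary` name is hypothetical, and this is not kata-deploy's actual helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Installs `src` at `dst`, even if `dst` is a binary that is currently running.
/// Returns the number of bytes copied.
fn install_binary(src: &Path, dst: &Path) -> io::Result<u64> {
    // Unlink the old binary first; a missing destination is fine.
    match fs::remove_file(dst) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // Creates a new file (new inode) at `dst`; fs::copy also copies permissions.
    fs::copy(src, dst)
}
```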
Steve Horsman	6bb77a2f13	Merge pull request #12390 from mythi/tdx-updates-2026-2 runtime: tdx QEMU configuration changes	2026-02-02 16:58:44 +00:00
Zvonko Kaiser	6702b48858	Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state kata-deploy: nydus: Always start from a clean state	2026-02-02 11:21:26 -05:00
Steve Horsman	0530a3494f	Merge pull request #12415 from nlle/make-helm-updatestrategy-configurable kata-deploy: Make update strategy configurable for kata-deploy DaemonSet	2026-02-02 10:29:01 +00:00
Fabiano Fidêncio	62ad0814c5	kata-deploy: nydus: Always start from a clean state Clean up existing nydus-snapshotter state to ensure fresh start with new version. This is safe across all K8s distributions (k3s, rke2, k0s, microk8s, etc.) because we only touch the nydus data directory, not containerd's internals. When containerd tries to use non-existent snapshots, it will re-pull/re-unpack. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-02 11:06:37 +01:00
Mikko Ylinen	870630c421	kata-deploy: drop custom TDX installation steps As we have moved to use QEMU (and OVMF already earlier) from kata-deploy, the custom tdx configurations and distro checks are no longer needed. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-02 11:11:26 +02:00
Nikolaj Lindberg Lerche	6e98df2bac	kata-deploy: Make update strategy configurable for kata-deploy DaemonSet This allows the updateStrategy to be configured for the kata-deploy helm chart, enabling administrators to control the aggressiveness of updates. For a less aggressive approach, the strategy can be set to `OnDelete`. Alternatively, the update process can be made more aggressive by adjusting the `maxUnavailable` parameter. Signed-off-by: Nikolaj Lindberg Lerche <nlle@ambu.com>	2026-02-01 20:14:29 +01:00
Manuel Huber	8b0c199f43	packaging: Delete pause_bundle dir before unpack Delete the pause_bundle directory before running the umoci unpack operation. This will make builds idempotent and not fail with errors like "create runtime bundle: config.json already exists in .../build/pause-image/destdir/pause_bundle". This will make life better when building locally. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-31 19:43:11 +01:00
Fabiano Fidêncio	b85393e70b	release: Bump version to 3.26.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-29 00:23:26 +01:00
Fabiano Fidêncio	500146bfee	versions: Bump Go to 1.24.12 Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities in the standard library: - GO-2026-4342: Excessive CPU consumption in archive/zip - GO-2026-4341: Memory exhaustion in net/url query parsing - GO-2026-4340: TLS handshake encryption level issue in crypto/tls Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-29 00:23:26 +01:00
Manuel Huber	5e60d384a2	kata-deploy: Update for mariner in all target Remove the initrd function and add the image function to align with the actually existing functions in this file. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-28 08:58:45 -08:00
Manuel Huber	0d8fbdef07	kernel: Readjust kernel version after decrement Readjust the kata_config_version counter after it was accidentally decremented in commit `c7f5ff4`. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-28 10:48:12 +01:00
Zvonko Kaiser	a59f791bf5	gpu: Move CUDA repo selection to versions.yaml We want to enable local and remote CUDA repository builds. Move the cuda and tools repos to versions.yaml, with a unified build for both types. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-26 22:19:40 +01:00
Fabiano Fidêncio	04f45a379c	kata-deploy: docs: Document shims.disableAll option Update the Helm chart README to document the new shims.disableAll option and simplify the examples that previously required listing every shim to disable. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	c9e9a682ab	kata-deploy: Use disableAll in example values files Simplify the example values files by using the new shims.disableAll option instead of listing every shim to disable. Before (try-kata-nvidia-gpu.values.yaml): shims: clh: enabled: false cloud-hypervisor: enabled: false # ... 15 more lines ... After: shims: disableAll: true Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	cfe9bcbaf1	kata-deploy: Add shims.disableAll option to Helm chart Add a new `shims.disableAll` option that disables all standard shims at once. This is useful when: - Enabling only specific shims without listing every other shim - Using custom runtimes only mode (no standard Kata shims) Usage: shims: disableAll: true qemu: enabled: true # Only qemu is enabled All helper templates are updated to check for this flag before iterating over shims. One thing that's super important to note here is that helm recursively merges user values with chart defaults, making a simple `disableAll` flag problematic: if defaults have `enabled: true`, user's `disableAll: true` gets merged with those defaults, resulting in all shims still being enabled. The workaround found is to use null (`~`) as the default for `enabled` field. The template logic interprets null differently based on disableAll: | enabled value | disableAll: false | disableAll: true | |---------------|-------------------|------------------| | ~ (null) | Enabled | Disabled | | true | Enabled | Enabled | | false | Disabled | Disabled | This is backward compatible: - Default behavior unchanged: all shims enabled when disableAll: false - Users can set `disableAll: true` to disable all, then explicitly enable specific shims with `enabled: true` - Explicit `enabled: false` always disables, regardless of disableAll Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
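The tri-state decision table above reduces to a single match. The chart itself implements this in Helm (Go) templates. The following sketch restates the same logic in Rust so that each row of the table can be checked. Here `None` stands for the YAML null (`~`) default, and the function name is hypothetical.

```rust
/// Resolves whether a shim is enabled from its `enabled` value
/// (`None` = YAML null `~`) and the chart-wide `shims.disableAll` flag.
fn shim_enabled(enabled: Option<bool>, disable_all: bool) -> bool {
    match enabled {
        // An explicit true/false always wins, regardless of disableAll.
        Some(explicit) => explicit,
        // Null defers to disableAll: enabled unless everything is disabled.
        None => !disable_all,
    }
}
```

Because `Some(false)` never consults `disable_all`, an explicit `enabled: false` always disables the shim, which matches the backward-compatibility note above.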
Fabiano Fidêncio	3be57bb501	kata-deploy: Add Helm chart support for custom runtimes Add Helm chart configuration for defining custom RuntimeClasses with base configuration and drop-in overrides. Usage: helm install kata-deploy ./kata-deploy \ -f custom-runtimes.values.yaml Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	a76cdb5814	kata-deploy: Add custom runtime config installation/removal Add functions to install and remove custom runtime configuration files. Each custom runtime gets an isolated directory structure: custom-runtimes/{handler}/ configuration-{baseConfig}.toml # Copied from base config config.d/ 50-overrides.toml # User's drop-in overrides The base config is copied AFTER kata-deploy has applied its modifications (debug settings, proxy configuration, annotations), so custom runtimes inherit these settings. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
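The isolated layout for each custom runtime can be expressed as a handful of path joins. This is a sketch under stated assumptions. The root directory passed as `config_path` is hypothetical, because the commit gives only the relative `custom-runtimes/{handler}/` path, and the type and function names are illustrative. One plausible reason for the `50-` prefix on the user's overrides: if drop-ins are applied in lexical order and kata-deploy's own numbered drop-ins (10-, 20-, 30-) also land in the same config.d/, the user's overrides sort after them.

```rust
use std::path::{Path, PathBuf};

/// Paths for one custom runtime's isolated configuration.
struct CustomRuntimeLayout {
    /// custom-runtimes/{handler}/
    dir: PathBuf,
    /// custom-runtimes/{handler}/configuration-{baseConfig}.toml (copied from the base config)
    config_file: PathBuf,
    /// custom-runtimes/{handler}/config.d/50-overrides.toml (user's drop-in overrides)
    overrides_file: PathBuf,
}

fn custom_runtime_layout(config_path: &Path, handler: &str, base_config: &str) -> CustomRuntimeLayout {
    let dir = config_path.join("custom-runtimes").join(handler);
    CustomRuntimeLayout {
        config_file: dir.join(format!("configuration-{base_config}.toml")),
        overrides_file: dir.join("config.d").join("50-overrides.toml"),
        dir,
    }
}
```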
Fabiano Fidêncio	4c3989c3e4	kata-deploy: Add custom runtime configuration for containerd/CRI-O Add functions to configure custom runtimes in containerd and CRI-O. Custom runtimes use an isolated config directory under: custom-runtimes/{handler}/ Custom runtimes automatically derive the shim binary path from the baseConfig field using the existing is_rust_shim() logic. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	678b560e6d	kata-deploy: Add CustomRuntime struct and parsing Add support for parsing custom runtime configurations from a mounted ConfigMap. This allows users to define their own RuntimeClasses with custom Kata configurations. The ConfigMap format uses a custom-runtimes.list file with entries: handler:baseConfig:containerd_snapshotter:crio_pulltype Drop-in files are read from dropin-{handler}.toml, if present. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
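The following minimal parser handles the `handler:baseConfig:containerd_snapshotter:crio_pulltype` entry format described above. It is a sketch, not kata-deploy's `CustomRuntime` parser, and it makes several assumptions. Empty snapshotter and pull-type fields map to `None`. Blank lines and lines starting with `#` are skipped. Handler and baseConfig are required. The example values in the test are illustrative.

```rust
#[derive(Debug, PartialEq)]
struct CustomRuntime {
    handler: String,
    base_config: String,
    containerd_snapshotter: Option<String>,
    crio_pull_type: Option<String>,
}

/// Parses one `handler:baseConfig:containerd_snapshotter:crio_pulltype` entry.
fn parse_custom_runtime(line: &str) -> Result<CustomRuntime, String> {
    let fields: Vec<&str> = line.trim().split(':').collect();
    if fields.len() != 4 {
        return Err(format!("expected 4 ':'-separated fields, got {}: {line:?}", fields.len()));
    }
    if fields[0].is_empty() || fields[1].is_empty() {
        return Err(format!("handler and baseConfig are required: {line:?}"));
    }
    // Assumption: an empty optional field means "not set".
    let opt = |s: &str| if s.is_empty() { None } else { Some(s.to_string()) };
    Ok(CustomRuntime {
        handler: fields[0].to_string(),
        base_config: fields[1].to_string(),
        containerd_snapshotter: opt(fields[2]),
        crio_pull_type: opt(fields[3]),
    })
}

/// Parses the whole custom-runtimes.list file, skipping blank and '#' lines.
fn parse_custom_runtimes_list(contents: &str) -> Result<Vec<CustomRuntime>, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .map(parse_custom_runtime)
        .collect()
}

/// Name of the optional drop-in file for a handler in the mounted ConfigMap.
fn drop_in_file_name(handler: &str) -> String {
    format!("dropin-{handler}.toml")
}
```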
Fabiano Fidêncio	609a25e643	kata-deploy: Refactor runtime configuration with helper functions Let's extract the common logic from configure_containerd_runtime and configure_crio_runtime into reusable helper functions. This reduces code duplication and prepares for adding custom runtime support. For containerd: - Add ContainerdRuntimeParams struct to encapsulate common parameters - Add get_containerd_pluginid() to extract version detection logic - Add get_containerd_output_path() to extract file path resolution - Add write_containerd_runtime_config() to write common TOML values For CRI-O: - Add CrioRuntimeParams struct to encapsulate common parameters - Add write_crio_runtime_config() to write common configuration While here, let's also simplify pod_annotations to always use "[\"io.katacontainers.*\"]" for all runtimes, as the NVIDIA specific case has been removed from the shell script, but we forgot to do so here. No functional changes intended. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Bo Liu	c7f5ff45a2	arm64: Update ptp.conf to correct time sync Given that the patch has been merged upstream in Linux, it's safe to enable these two options. Signed-off-by: Bo Liu <152475812+liubocflt@users.noreply.github.com>	2026-01-24 21:08:21 +01:00
Fabiano Fidêncio	5b82b160e2	runtime-rs: Add arm64 QEMU support Add the necessary configuration and code changes to support QEMU on arm64 architecture in runtime-rs. Changes: - Set MACHINETYPE to "virt" for arm64 - Add machine accelerators "usb=off,gic-version=host" required for proper arm64 virtualization - Add arm64-specific kernel parameter "iommu.passthrough=0" - Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported These changes align runtime-rs with the Go runtime's arm64 QEMU support. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2026-01-23 19:48:31 +01:00
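The arm64 settings listed above can be grouped as per-architecture defaults. This sketch is illustrative, and the type and function names are hypothetical rather than runtime-rs code. It returns `None` for other architectures because their values are not given here. The key `"aarch64"` matches what `std::env::consts::ARCH` reports on arm64 hosts. The machine type and accelerators are joined the way QEMU's `-machine` option accepts them, for example `virt,usb=off,gic-version=host`.

```rust
/// arm64 QEMU defaults as described in the commit message.
struct QemuArchDefaults {
    machine_type: &'static str,
    machine_accelerators: &'static str,
    extra_kernel_params: &'static [&'static str],
    /// Intel vIOMMU is not supported on arm64, so it must be skipped.
    supports_intel_viommu: bool,
}

fn qemu_arch_defaults(arch: &str) -> Option<QemuArchDefaults> {
    match arch {
        "aarch64" => Some(QemuArchDefaults {
            machine_type: "virt",
            machine_accelerators: "usb=off,gic-version=host",
            extra_kernel_params: &["iommu.passthrough=0"],
            supports_intel_viommu: false,
        }),
        // Other architectures are outside the scope of this sketch.
        _ => None,
    }
}

/// Builds the value for QEMU's `-machine` option.
fn machine_arg(d: &QemuArchDefaults) -> String {
    format!("{},{}", d.machine_type, d.machine_accelerators)
}
```

A caller could pass `std::env::consts::ARCH` to pick the defaults for the host it runs on.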
Fabiano Fidêncio	ac8436e326	kata-deploy: Update debian in the container image to 13 (trixie) Just a bump to the latest version, as requested by Mikko. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-22 12:32:59 +01:00
Fabiano Fidêncio	dacb14619d	kata-deploy: Make verification ConfigMap a regular resource The verification job mounts a ConfigMap containing the pod spec for the Kata runtime test. Previously, both the ConfigMap and the Job were Helm hooks with different weights (-5 and 0 respectively). On k3s, a race condition was observed where the Job pod would be scheduled before the kubelet's informer cache had registered the ConfigMap, causing a FailedMount error: MountVolume.SetUp failed for volume "pod-spec": object "kube-system"/"kata-deploy-verification-spec" not registered This happened because k3s's lightweight architecture schedules pods very quickly, and the hook weight difference only controls Helm's ordering, not actual timing between resource creation and cache sync. By making the ConfigMap a regular chart resource (removing hook annotations), it is created during the main chart installation phase, well before any post-install hooks run. This guarantees the ConfigMap is fully propagated to all kubelets before the verification Job starts. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00

2054 Commits