kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-21 22:34:29 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	855f4dc7fa	release: Bump version to 3.27.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-19 14:01:26 +01:00
Amulyam24	a22c59a204	kata-deploy: enable kata-remote for ppc64le When kata-deploy is deployed with cloud-api-adaptor, it defaults to qemu instead of configuring the remote shim. Support ppc64le to enable it correctly when shims.remote.enabled=true Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-02-19 11:14:27 +01:00
Zvonko Kaiser	1d09e70233	Merge pull request #12538 from fidencio/topic/kata-deploy-fix-regression-on-hardcopying-symlinks kata-deploy: preserve symlinks when installing artifacts	2026-02-18 12:44:46 -05:00
Mikko Ylinen	5622ab644b	versions: bump QEMU to v10.2.1 v10.2.1 is the latest patch release in v10.2 series. Changes: https://github.com/qemu/qemu/compare/v10.2.0...v10.2.1 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-18 18:18:52 +01:00
Mikko Ylinen	d68adc54da	versions: bump to Linux v6.18.12 (LTS) Latest changelog in https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.18.12 Also other changes for 6..11 updates are available. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-18 18:18:52 +01:00
Fabiano Fidêncio	34336f87c7	kata-deploy: convert install.rs get_hypervisor_name tests to rstest Use rstest parameterized tests for QEMU variants, other hypervisors, and unknown/empty shim cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:41:55 +01:00
Fabiano Fidêncio	bb11bf0403	kata-deploy: preserve symlinks when installing artifacts When copying artifacts from the container to the host, detect source entries that are symlinks and recreate them as symlinks at the destination instead of copying the target file. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:29:14 +01:00
Fabiano Fidêncio	f0a0425617	kata-deploy: convert a few toml.rs tests to rstest Turn test_toml_value_types into a parameterized test with one case per type (string, bool, int). Merge the two invalid-TOML tests (get and set) into one rstest with two cases, and the two "not an array" tests into one rstest with two cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	899005859c	kata-deploy: avoid leading/blank lines in written TOML config When writing containerd drop-in or other TOML (e.g. initially empty file), the serialized document could start with many newlines. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cfa8188cad	kata-deploy: convert containerd version support tests to rstest Replace multiple #[test] functions for snapshotter and erofs version checks with parameterized #[rstest] #[case] tests for consistency and easier extension. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cadac7a960	kata-deploy: runtime_platform -> runtime_platforms Fix runtime_platforms typo. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Hyounggyu Choi	8bc60a0761	Merge pull request #12521 from fidencio/topic/kata-deploy-auto-add-nfd-tee-labels-to-the-runtime-class kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected	2026-02-16 18:06:18 +01:00
Fabiano Fidêncio	a04df4f4cb	kata-deploy: disable provenance/SBOM for quay.io compatibility Disable provenance and SBOM when building per-arch kata-deploy images so each tag is a single image manifest. quay.io rejects pushing multi-arch manifest lists that include attestation manifests (400 manifest invalid). Add a note in the release script documenting this. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:32:25 +01:00
Fabiano Fidêncio	0e8e30d6b5	kata-deploy: fix default RuntimeClass + nodeSelectors The default RuntimeClass (e.g. kata) is meant to point at the default shim handler (e.g. kata-qemu-$tee). We were building it in a separate block and only sometimes adding the same TEE nodeSelectors as the shim-specific RuntimeClass, leading to kata ending up without the SE/SNP/TDX nodeSelector while kata-qemu-$tee had it. The fix is to stop duplicating the RuntimeClass definition, having a single template that renders one RuntimeClass (name, handler, overhead, nodeSelectors). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:09:03 +01:00
Fabiano Fidêncio	80a175d09b	kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected When NFD is detected (deployed by the chart or existing in the cluster), apply shim-specific nodeSelectors only for TEE runtime classes (snp, tdx, and se). Non-TEE shims keep existing behavior (e.g. runtimeClass.nodeSelector for nvidia GPU from `f3bba0885` is unchanged). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 12:07:51 +01:00
Fabiano Fidêncio	d000acfe08	infra: fix multi-arch manifest publish Per-arch images were failing publish-multiarch-manifest with 'X is a manifest list' because Buildx now enables attestations by default, so each arch tag became an image index. Use 'docker buildx imagetools create' instead of 'docker manifest create' so we can merge those indexes into the final multi-arch manifest while keeping provenance and SBOM on per-arch images. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-14 19:49:00 +01:00
Fabiano Fidêncio	02c9a4b23c	kata-deploy: Temporarily comment GPU specific labels We depend on GPU Operator v26.3 release, which is not out yet. Although we have been testing with it, it's not yet publicly available, which would break anyone actually trying to use the GPU runtime classes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 09:25:14 +01:00
Fabiano Fidêncio	5106e7b341	build: Add gnupg to the agent's builder container Otherwise we'll fail to check gperf's GPG signing key when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 00:33:45 +01:00
Fabiano Fidêncio	d8acc403c8	kata-deploy: set CRI images runtime_platform snapshotter for containerd v3 In containerd config v3 the CRI plugin is split into runtime and images, and setting the snapshotter only on the runtime plugin is not enough for image pull/prepare. The images plugin must have runtime_platform.<runtime>.snapshotter so it uses the correct snapshotter per runtime (e.g. nydus, erofs). A PR on the containerd side is open so we can rely on the runtime plugin snapshotter alone: https://github.com/containerd/containerd/pull/12836 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 22:15:02 +01:00
Fabiano Fidêncio	f6e0a7c33c	scripts: use temporary GPG home when verifying cached gperf tarball In CI the default GPG keyring is often read-only or missing, so 'gpg --import' of the cached keyring fails and verification cannot succeed. Use a temporary GNUPGHOME for import and verify so cached gperf can be verified without writing to the system keyring. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 19:39:55 +01:00
Joji Mekkattuparamban	f3bba08851	kata-deploy: add node selector to nvidia runtime classes The CC runtime classes kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx are mutually exclusive with kata-qemu-nvidia-gpu, as dictated by the gpu cc mode setting. In order to properly support a cluster that has both CC and non-CC nodes, we use a node selector so the scheduling is consistent with the GPU mode. The GPU operator sets a label nvidia.com/cc.ready.state=[true, false] to indicate the gpu mode setting Fixes #12431 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-02-13 15:58:06 +01:00
Fabiano Fidêncio	f4dcb66a3c	ci: add workflow to push ORAS tarball cache Add push-oras-tarball-cache workflow that runs on push to main when versions.yaml changes (and on workflow_dispatch). It populates the ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml. Remove the push_to_cache call from download-with-oras-cache.sh since it was never triggered in CI. Cache population is now done solely by the new workflow and by populate-oras-tarball-cache.sh when run manually. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 12:57:48 +01:00
Fabiano Fidêncio	a01e95b988	kata-deploy: test k3s/rke2 template handling / version checks Add tests for the split_non_toml_header helper that strips Go template directives before TOML parsing, and for every TOML operation (set, get, append, remove, set_array) on files that start with {{ template "base" . }}. Also converts the containerd version detection tests in manager.rs from individual #[test] functions with helper wrappers to parametrized #[rstest] cases, which is more readable and easier to extend. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Fabiano Fidêncio	2e7633674f	kata-deploy: use k3s/rke2 base template K3s docs (https://docs.k3s.io/advanced#configuring-containerd) say that the right way to customize containerd is to extend the base template with {{ template "base" . }} and append your own TOML blocks, rather than copying a prerendered config.toml into the template file. We were copying config.toml into config.toml.tmpl / config-v3.toml.tmpl, which meant we were replacing the K3s defaults with a snapshot that gets stale as soon as K3s is upgraded. Now we create the template files with just the base directive and let our regular set_toml_value code path append the Kata runtime configuration on top. To make that work, the TOML utils learned to handle files that start with a Go template line ({{ ... }}): strip it before parsing, put it back when writing. This keeps the K3s/RKE2 path identical to every other runtime -- no special append logic needed. refs: * k3s: https://docs.k3s.io/advanced#configuring-containerd * rke2: https://docs.rke2.io/advanced?_highlight=conyainerd#configuring-containerd Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Manuel Huber	ed7de905b5	build: Tighten upstream download path for ORAS The gperf-3.3 tarball frequently fails to download on my end with cryptic error messages such as: "tar: This does not look like a tar archive". This change tightens the download logic a bit: We fail at the point in time when we're supposed to fail. This way we detect rate limiting issues right away, and this way, the actual hashsum and signature checks are effective, not only printouts. This change also updates the key reference and allows for an array, for instance, when a different signer was used for a cache vs upstream version. The change also makes it clear, that signature verification is only implemented for the gperf tarball. Improvements can be made in a subsequent change. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-12 19:20:35 +01:00
Fabiano Fidêncio	9fc5be47d0	kata-deploy: fix custom runtime config path for runtime-rs shims Custom runtimes whose base config lives under runtime-rs/ (e.g. dragonball, cloud-hypervisor) were not found because the path was always built under share/defaults/kata-containers/. Use get_kata_containers_original_config_path for the handler so rust shim configs are read from .../runtime-rs/. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 18:08:47 +01:00
Fabiano Fidêncio	50923b6d62	kata-deploy: run cleanup on uninstall via DaemonSet preStop On helm uninstall let's rely on a preStop hook to run kata-deploy cleanup so each pod cleans its node before exiting. We must keep RBAC (resource-policy: keep) so pods retain API access during termination, and then can properly delete the NodeFeatureRules and remove the labels from the nodes. The post-delete hook Job, which runs on a single node, now is only responsible for cleaning the kept RBAC (cluster-wide resource) after uninstall, not leaving any resource or artefact behind. The changes in this commit lead to a "resources were kept" message when running `helm uninstall`, which we document as being normal, as the post-delete job will remove those. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	6e0cbc28a3	kata-deploy: fix node label removal When removing a node label, JSON merge patch semantics require setting the key to null; omitting the key leaves it unchanged. Fix label_node to send a patch with the label key set to null so the API server actually removes katacontainers.io/kata-runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	510d2a69ae	kata-deploy: exit with 0 on SIGTERM in install mode Wait for SIGTERM after install and exit(0) so the container terminates cleanly. If registering the SIGTERM handler fails, log a warning and sleep forever instead of exiting with an error (fallback to the old behaviour). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	5c0269881e	tests: Make editorconfig-checker happy - Trim trailing whitespace and ensure final newline in non-vendor files - Add .editorconfig-checker.json excluding vendor dirs, .patch, .img, .dtb, .drawio, *.svg, and pkg/cloud-hypervisor/client so CI only checks project code - Leave generated and binary assets unchanged (excluded from checker) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	4cb2aea9dd	kata-deploy: Document drop-in configuration and add warning to config files When kata-deploy installs Kata Containers, the base configuration files should not be modified directly. This change adds documentation explaining how to use drop-in configuration files for customization, and prepends a warning comment to all deployed configuration files reminding users to use drop-in files instead. The warning is added to both standard shim configurations and custom runtime configurations. It includes a brief explanation of how drop-in files work and points users to the documentation for more details. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	d5d561abe5	kata-deploy: Add detailed logging for drop-in configuration Add clear INFO-level messages when creating drop-in configuration files, making it easy to understand what kata-deploy is doing during installation: - "Setting up runtime directory for shim: X" - "Generating drop-in configuration files for shim: X" - "Created drop-in file: <path>" When DEBUG mode is enabled (via DEBUG=true environment variable), also log the full content of each drop-in file to aid troubleshooting. The log level is now automatically set to Debug when the DEBUG environment variable is set, ensuring debug messages are visible. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	eddd1b507e	kata-deploy: Extract common drop-in generation into shared helper Deduplicate the drop-in file generation logic between configure_shim_config and install_custom_runtime_configs by extracting it into a shared write_common_drop_ins helper function. This ensures both standard and custom runtimes use the same code path for generating drop-in configuration files. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	577aa6b319	kata-deploy: Propagate drop-in configs to custom runtime classes Ensure custom runtime classes receive the same drop-in configuration files as standard runtimes: - 10-installation-prefix.toml (if custom dest_dir) - 20-debug.toml (if debug enabled) - 30-kernel-params.toml (proxy + debug kernel params) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	8c60a88bda	kata-deploy: Add combined kernel_params drop-in Add a combined drop-in file (30-kernel-params.toml) that handles all kernel_params modifications. This approach reads the base kernel_params from the original untouched config file and combines them with: - Proxy settings (agent.https_proxy, agent.no_proxy) - Debug settings (agent.log=debug, initcall_debug) Using a single drop-in file for kernel_params avoids the TOML merge behavior where scalar values are replaced rather than appended. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	fae96f1f82	kata-deploy: Add drop-in file for debug configuration When debug mode is enabled, generate a drop-in configuration file (20-debug.toml) with the boolean debug flags for hypervisor, runtime, and agent sections. Note: kernel_params for debug (agent.log=debug, initcall_debug) will be handled by a separate combined kernel_params drop-in file to avoid the TOML merge replacement behavior. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	bb65e516e5	kata-deploy: Add drop-in file for installation prefix When the installation prefix differs from the default /opt/kata, generate a drop-in configuration file (10-installation-prefix.toml) with the adjusted paths instead of modifying the original config file. This removes the need for adjust_installation_prefix and adjust_qemu_cmdline functions which are now deleted along with their tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	cd76d61a3d	kata-deploy: Add infrastructure for per-shim drop-in configuration Instead of modifying original config files directly, set up a per-shim directory structure that uses symlinks to the original configs and config.d/ directories for drop-in overrides. This enables cleaner configuration management where the original files remain untouched and all kata-deploy customizations are in separate drop-in files that can be easily inspected and removed. Directory structure: {config_path}/runtimes/{shim}/ {config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink {config_path}/runtimes/{shim}/config.d/ Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
stevenhorsman	33d494b07e	kata-deploy: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
Fabiano Fidêncio	ab515712d4	kernel: Unify kernel and kernel-confidential Build a single kernel for both kernel and kernel-confidential on x86_64 and s390x. The kernel is built with TEE support (-x) on those arches only. This helps to simplify and to maintain the code, and having a single kernel was the original plan since forever. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Fabiano Fidêncio	c5b5433866	kernel: Unify nvidia-gpu and nvidia-gpu-confidential Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential, simplifying and reducing code maintenance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Manuel Huber	a786582d0b	rootfs: deprecate initramfs dm-verity mode Remove the initramfs folder, its build steps, and use the kernel based dm-verity enforcement for the handlers which used the initramfs mode. Also, remove the initramfs verity mode capability from the shims and their configs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	a3c4e0b64f	rootfs: Introduce kernelinit dm-verity mode This change introduces the kernelinit dm-verity mode, allowing initramfs-less dm-verity enforcement against the rootfs image. For this, the change introduces a new variable with dm-verity information. This variable will be picked up by shim configurations in subsequent commits. This will allow the shims to build the kernel command line with dm-verity information based on the existing kernel_parameters configuration knob and a new kernel_verity_params configuration knob. The latter specifically provides the relevant dm-verity information. This new configuration knob avoids merging the verity parameters into the kernel_params field. Avoiding this, no cumbersome escape logic is required as we do not need to pass the dm-mod.create="..." parameter directly in the kernel_parameters, but only relevant dm-verity parameters in semi-structured manner (see above). The only place where the final command line is assembled is in the shims. Further, this is a line easy to comment out for developers to disable dm-verity enforcement (or for CI tasks). This change produces the new kernelinit dm-verity parameters for the NVIDIA runtime handlers, and modifies the format of how these parameters are prepared for all handlers. With this, the parameters are currently no longer provided to the kernel_params configuration knob for any runtime handler. This change alone should thus not be used as dm-verity information will no longer be picked up by the shims. systemd-analyze on the coco-dev handler shows that using the kernelinit mode on a local machine, less time is spent in the kernel phase, slightly speeding up pod start-up. On that machine, the average of 172.5ms was reduced to 141ms (4 measurements, each with a basic pod manifest), i.e., the kernel phase duration is improved by about 18 percent. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	83a0bd1360	gpu: use dm-verity for the non-TEE GPU handler Use a dm-verity protected rootfs image for the non-TEE NVIDIA GPU handler as well. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	02ed4c99bc	rootfs: Use maxdepth=1 to search for kata tarballs These tarballs are in the top layer of the build directory, no need to traverse all sub-directories. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	d37db5f068	rootfs: Restore "gpu: Handle root_hash.txt ..." This reverts commit `923f97bc66` in order to re-instantiate the logic from commit `e4a13b9a4a`. The latter commit was previously reverted due to the NVIDIA GPU TEE handler using an initrd, not an image. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f1ca547d66	initramfs: introduce log function Log to /dev/kmsg, this way logs will show up and not get lost. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Fabiano Fidêncio	f90c12d4df	kata-deploy: Avoid text file busy error with nydus-snapshotter We cannot overwrite a binary that's currently in use, and that's the reason that elsewhere we remove / unlink the binary (the running process keeps its file descriptor, so we're good doing that) and only then we copy the binary. However, we missed doing this for the nydus-snapshotter deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 10:24:49 +01:00
Steve Horsman	6bb77a2f13	Merge pull request #12390 from mythi/tdx-updates-2026-2 runtime: tdx QEMU configuration changes	2026-02-02 16:58:44 +00:00
Zvonko Kaiser	6702b48858	Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state kata-deploy: nydus: Always start from a clean state	2026-02-02 11:21:26 -05:00
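
Commit 6e0cbc28a3 above depends on JSON merge patch semantics (RFC 7386): a key set to null is deleted, while an omitted key is left unchanged. The following is a minimal Python sketch of that rule, not kata-deploy's Rust code. The function name `json_merge_patch`, the sample node object, and the label value "true" are assumptions made for illustration.

```python
def json_merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386) and return the patched value."""
    # A non-object patch replaces the target wholesale.
    if not isinstance(patch, dict):
        return patch
    # Patching a non-object target starts from an empty object.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null means "delete this key".
            result.pop(key, None)
        else:
            # Anything else is merged recursively.
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical node object; only the label key comes from the commit message.
node = {
    "metadata": {
        "labels": {
            "katacontainers.io/kata-runtime": "true",
            "kubernetes.io/os": "linux",
        }
    }
}

# Omitting the key: the label survives.
patch_omit = {"metadata": {"labels": {}}}
after_omit = json_merge_patch(node, patch_omit)

# Setting the key to null (None in Python): the label is removed.
patch_null = {"metadata": {"labels": {"katacontainers.io/kata-runtime": None}}}
after_null = json_merge_patch(node, patch_null)
```

In this sketch `after_omit` still carries the kata-runtime label, while `after_null` keeps only the unrelated label, which mirrors the behaviour the commit message describes.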
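
Commit 2e7633674f above describes stripping a leading Go template line before TOML parsing and putting it back when writing. A rough Python sketch of that idea follows. The helper names `split_template_header` and `append_toml_block` and the `[example]` table are hypothetical, and the real kata-deploy helper (`split_non_toml_header`, written in Rust) may behave differently.

```python
def split_template_header(text):
    """Split a leading Go template line ({{ ... }}) from the TOML body.

    Returns (header, body); header is "" when the first line is not a
    template directive.
    """
    first, sep, rest = text.partition("\n")
    stripped = first.strip()
    if stripped.startswith("{{") and stripped.endswith("}}"):
        return first + sep, rest
    return "", text


def append_toml_block(text, block):
    """Append a TOML block while keeping the template header as line one."""
    header, body = split_template_header(text)
    # A real implementation would parse and edit `body` with a TOML
    # library here; this sketch only concatenates text.
    if header and not header.endswith("\n"):
        header += "\n"
    if body and not body.endswith("\n"):
        body += "\n"
    return header + body + block


template = '{{ template "base" . }}\n'
updated = append_toml_block(template, '[example]\nkey = "value"\n')
```

The point of the split is that the body handed to the TOML parser never contains the `{{ ... }}` line, so template files can go through the same code path as plain TOML files.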
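
Commit bb11bf0403 above recreates symlinks at the destination instead of copying the files they point to. A small Python sketch of that behaviour, under the assumption of a flat directory of files, is shown here. `install_entry` is a hypothetical name and this is not the kata-deploy implementation.

```python
import os
import shutil


def install_entry(src, dst):
    """Copy one entry, recreating symlinks as symlinks."""
    if os.path.islink(src):
        # Read the link target as stored, without resolving it.
        target = os.readlink(src)
        # Remove any existing entry first so os.symlink does not fail.
        if os.path.lexists(dst):
            os.unlink(dst)
        os.symlink(target, dst)
    else:
        # Regular file: copy contents and metadata.
        shutil.copy2(src, dst)
```

Copying a link this way keeps a relative target relative, so it only resolves at the destination if the file it names is installed alongside it.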

1534 Commits