kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 07:02:16 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	48ebbbec3a	kata-deploy: honor debug mode with CLI log-level Make the chart pass --log-level debug automatically when debug=true so CI and troubleshooting runs emit full rendered config dumps without requiring a separate log-level override. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	b63494345d	kata-deploy: add configurable verbosity for full CRI config dumps Allow operators to force kata-deploy log verbosity and emit the fully rendered containerd/CRI-O config and drop-in files in debug mode so install troubleshooting can rely on exact effective configuration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	b119b051cb	kata-deploy: support drop-in configs for default runtimes Allow operators to provide per-shim drop-in TOML for built-in runtimes and reconcile stale override files so upgrades and migrations remain safe when drop-ins are added or removed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex	2026-06-08 13:31:03 +02:00
Fabiano Fidêncio	1ca7129581	Merge pull request #13176 from Amulyam24/kata-deploy-fix kata-deploy: add the imports directive explicitly if expected but not found	2026-06-05 22:24:16 +02:00
Fabiano Fidêncio	f6ff9578d4	Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner ci: remove Mariner annotations and use new config	2026-06-05 20:22:58 +02:00
Fabiano Fidêncio	e9ee97f751	kata-deploy: inherit custom RuntimeClass overhead from baseConfig Default custom runtime RuntimeClass overhead.podFixed to the selected baseConfig values, so equivalent runtimes behave consistently without repeating boilerplate. In case the user wants to enforce that no overhead is set on the custom RuntimeClass, disable inheritance with inheritBaseOverhead=false. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-05 17:22:25 +02:00
Amulyam24	b15a5fbe36	kata-deploy: add the imports directive explicitly if expected but not found For containerd v2.2+, the flow assumes that the imports directive would be present. It is better to check it and add if it doesn't exist. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-06-05 18:47:07 +05:30
dependabot[bot]	4ab63d0a5d	build(deps): bump tar from 0.4.45 to 0.4.46 Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46. - [Release notes](https://github.com/composefs/tar-rs/releases) - [Commits](https://github.com/composefs/tar-rs/compare/0.4.45...0.4.46) --- updated-dependencies: - dependency-name: tar dependency-version: 0.4.46 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-04 07:52:44 +00:00
Aurélien Bombo	de5333f275	ci: remove Mariner annotations and use new config This is a follow-up to #13126 where we forgot to remove this now-unused code. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-03 09:25:12 -05:00
Fabiano Fidêncio	230e01b04e	Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs runtime/runtime-rs: introduce Azure specific configs	2026-06-02 09:17:09 +02:00
Fabiano Fidêncio	57de50f43c	Merge pull request #13141 from fidencio/topic/kata-deploy-fix-stale-containerd-import kata-deploy: scrub stale containerd import on conf.d migration	2026-06-01 18:13:08 +02:00
Greg Kurz	8a49ecb159	Merge pull request #13097 from BbolroC/fix-shim-components-for-s390x ci: Refactor boot-image-se build and update shim components	2026-06-01 11:43:42 +02:00
Fabiano Fidêncio	f788997253	kata-deploy: scrub stale containerd import on conf.d migration Since the conf.d migration (containerd >= 2.2.0), kata-deploy writes its drop-in to the auto-imported /etc/containerd/conf.d/ and no longer manages the main config's `imports` array. A node upgraded from a pre-conf.d kata-deploy keeps the legacy `{dest_dir}/containerd/config.d/kata-deploy.toml` entry in `imports`, since the new code neither adds nor removes it. On uninstall, remove_artifacts() deletes the artifacts dir (including the file that import still points at) and then restarts containerd, which fails to load the now-dangling import and wedges the node: pods get stuck Terminating and new pods cannot start. This broke the lifecycle-manager E2E tests (TC-02..TC-07) which repeatedly upgrade then reinstall across the 3.30.0 -> latest version boundary. Defensively scrub the legacy import from the main containerd config in both configure_containerd (at conf.d migration time) and cleanup_containerd (before artifacts are removed and containerd is restarted). The helper is a no-op when the config is absent, has no `imports` array, or does not contain the legacy entry. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-01 11:07:13 +02:00
Fabiano Fidêncio	02fd572195	Merge pull request #13134 from jojimt/rc-version kata-deploy: Add a version annotation to runtimeclass	2026-06-01 08:21:30 +02:00
manuelh-dev	953b306ff3	Merge pull request #12979 from manuelh-dev/mahuber/erofs-tmpfs-mount runtime-rs/agent: support EROFS snapshots without a rwlayer	2026-05-29 13:50:27 -07:00
Fabiano Fidêncio	f349d19bf4	Merge pull request #12956 from zvonkok/nvgpu-tarball-chart build: add kata-deploy-publish target	2026-05-29 21:22:44 +02:00
Joji Mekkattuparamban	8549d71c6f	kata-deploy: Add a version annotation to runtimeclass Enables automations to determine version with a simple read RBAC on the runtime class. Helpful when versions need to match with other tools (e.g. genpolicy) or when simple version determination is needed for other reasons. Fixes #13123 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-05-29 10:50:19 -07:00
Zvonko Kaiser	7f906ec95d	build: add kata-deploy-publish target Mirror the CI payload publish flow in local builds, including image and helm chart publishing, while reusing the same chart upload helper in payload-after-push to avoid duplicated chart packaging logic. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-29 16:22:12 +02:00
Zvonko Kaiser	fb73ccc352	build: include kata-deploy static artifacts in nvgpu bundle Build and package kata-deploy binary and nydus snapshotter component tarballs as part of nvgpu-tarball so local publish can consume a single kata-static.tar.zst without rebuilding extra artifacts. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-29 16:22:12 +02:00
Fabiano Fidêncio	9729ed9993	kernel: enable InfiniBand/RoCE support in mlx5 kernel config fragment Add the kernel configuration options required for RDMA / RoCE operation with Mellanox ConnectX / BlueField VFs: - CONFIG_INFINIBAND: IB subsystem core - CONFIG_INFINIBAND_ADDR_TRANS: RoCEv2 GID table management - CONFIG_INFINIBAND_USER_ACCESS: userspace verbs (/dev/infiniband/uverbs*) - CONFIG_INFINIBAND_USER_MAD: userspace MAD interface - CONFIG_MLX5_INFINIBAND: mlx5_ib ConnectX IB/RoCE driver - CONFIG_CGROUP_RDMA: RDMA cgroup controller (required by mlx5_ib) Bump kata_config_version to 196 to trigger a kernel rebuild. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-29 13:07:45 +02:00
Hyounggyu Choi	640fa488a5	ci: Refactor boot-image-se build and update shim components - Add FAKE_SE_IMAGE mode support in SE image build scripts for CI without real SE setup - Simplify workflow by removing build-asset-boot-image-se job - Integrate fake-boot-image-se into build matrix instead of separate job - Skip attestation for fake-boot-image-se builds - Update qemu-se and qemu-se-runtime-rs shim components to use: - rootfs-initrd-confidential instead of rootfs-image-confidential - boot-image-se component This change streamlines the s390x SE build process and makes it easier to test without requiring actual Secure Execution infrastructure. This fixes deployment issues on non-TEE systems where TEE-specific artifacts (like boot-image-se for IBM SEL) are not included in the kata-deploy image, while ensuring TEE systems still get all required components. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-05-29 11:35:40 +02:00
Fabiano Fidêncio	bddf1ecab4	build: stop producing cloud-hypervisor-glibc artifacts Drop cloud-hypervisor-glibc from local and CI kata-deploy build targets now that Azure CLH uses the standard cloud-hypervisor artifact set. This removes obsolete build matrix entries and installer target handling. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-28 23:32:37 +02:00
Fabiano Fidêncio	8c3a2c1a95	kata-deploy: register clh-azure shim families Add clh-azure and clh-azure-runtime-rs as first-class shims across installer logic, helm defaults, runtimeclass overhead mapping, and shim component catalogs. This aligns deploy payload selection with the new native Azure-specific CLH configs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-28 23:32:37 +02:00
Fabiano Fidêncio	76212b9e0c	kata-deploy: allow containerd user drop-in overrides Add an optional user-provided containerd drop-in that is loaded after kata-deploy's generated drop-in so operators can override snapshotter and other runtime settings without patching kata-deploy. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-27 17:26:55 +00:00
Fabiano Fidêncio	a423cf9526	Merge pull request #13087 from bpradipt/landlock kernel: Enable landlock LSM	2026-05-27 17:34:47 +02:00
Pradipta Banerjee	1487eaaaa2	kernel: Enable landlock LSM Allows using landlock LSM for the container process Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2026-05-27 13:33:46 +02:00
Fabiano Fidêncio	238dd51039	Merge pull request #13108 from thebigbone/containerd-config containerd: use /etc/containerd/conf.d/ drop-in for containerd >= 2.2.0	2026-05-27 10:14:51 +02:00
Fabiano Fidêncio	64056add0d	build: add passthrough mode to kata-deploy-merge-builds kata-deploy now unpacks individual component tarballs itself, so the final `kata-static.tar.zst` no longer needs to be a merged filesystem payload. Merging everything has two downsides for that flow: - It pulls in everything kept on disk under build/, which previously forced us to also drop agent/busybox/coco-guest-components/nydus from the build set to keep them out of the final tarball. - The merged tarball duplicates content kata-deploy will repack on its own anyway. Add a `passthrough` mode to kata-deploy-merge-builds.sh that, instead of untarring each `kata-static-*.tar.zst` into a single filesystem tree, copies the selected component tarballs into the final tarball as-is. The existing `merge` mode remains the default to preserve the non-kata-deploy install paths (e.g. `make install-tarball`). Wire `nvgpu-tarball` to the new mode via `FINAL_TARBALL_MERGE_MODE= passthrough`, paired with the existing `FINAL_TARBALL_INPUTS` allowlist. This lets us keep agent/busybox/coco as build prereqs of the GPU rootfs while shipping a final tarball that only contains the NVIDIA-relevant components. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-26 21:55:08 +02:00
Zvonko Kaiser	9b85bff2b4	build: don't double-prefix absolute versions.yaml path in merge-builds The Makefile passes $(MK_DIR)/../../../../versions.yaml — already an absolute path — to kata-deploy-merge-builds.sh. The script then unconditionally prepended ${PWD}/, producing a malformed path like: /repo//repo/tools/.../local-build//../../../../versions.yaml which made cp fail with "No such file or directory" at the merge-builds step (the very last step of `make nvgpu-tarball`). Only prepend ${PWD}/ when the input is relative — that preserves the original fix for the pushd-changes-cwd issue (commit `ae6e8d2b3`) without mangling absolute paths from Makefile callers. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Assisted-By: Claude <noreply@anthropic.com>	2026-05-26 21:55:08 +02:00
Zvonko Kaiser	5aa6229eba	build: group parallel build output by target With `make all -j N` running multiple tarballs concurrently and silent mode redirecting each build's stdio to its per-target log, a failing target's "Failed to build: <name>, logs:" banner gets interleaved with other in-flight jobs' output, making it hard to tell which target failed. Pass `--output-sync=target` to the recursive make so each sub-make's output is buffered and emitted as one block when the target finishes, keeping the failure banner contiguous with its log dump. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Assisted-By: Claude <noreply@anthropic.com>	2026-05-26 21:55:08 +02:00
Zvonko Kaiser	3be370d2d6	qemu: clean stale clone before fetching sources build-qemu.sh runs in the per-target builddir (e.g. build/qemu-tarball/builddir/), which persists across runs. If a previous build left the cloned `qemu` tree behind (e.g. after an interrupted build), the next run errors out with: fatal: destination path 'qemu' already exists and is not an empty directory. Wipe `qemu` before cloning so the build is repeatable from a dirty builddir. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Assisted-By: Claude <noreply@anthropic.com>	2026-05-26 21:55:08 +02:00
Zvonko Kaiser	18cee00df9	build: guard parallel races on build symlink and ~/.docker Parallel make jobs invoke kata-deploy-binaries-in-docker.sh concurrently and collide on two shared paths: ln: Already exists mkdir: /home/$USER/.docker: File exists Skip the symlink creation when the link is already in place. If a parallel job wins the create race in the cold-start window, fall back to re-checking that the link exists so a real ln failure (permission, disk full, etc.) still propagates rather than being silently swallowed. The `~/.docker` mkdir is guarded by a `[[ ! -d ]]` check that two processes can pass simultaneously, after which one bare `mkdir` fails. Switch to `mkdir -p` so the second invocation is a no-op. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-26 21:55:08 +02:00
Zvonko Kaiser	815ebc340d	build: add nvgpu-tarball target serial-targets now waits for the other BASE_TARBALLS items so the inner rootfs assembly runs with DEPS= against already-built artifacts. This also fixes a pre-existing race in the main flows where the outer parallel and inner -j 1 makes could both build kernel-tarball at the same time. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-26 21:55:08 +02:00
Zvonko Kaiser	6a367ab777	build: declare install-prebuilt-artifacts as .PHONY Leftover from #12954's rebase: the substantive sed-hack -> DEPS= change landed on main, but the .PHONY declaration didn't make it. Add it so the recipe always runs even if a stale `kata-artifacts` file exists in CWD. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Assisted-By: Claude <noreply@anthropic.com>	2026-05-26 21:55:08 +02:00
thebigbone	d9f2aa895e	containerd: use /etc/containerd/conf.d/ drop-in for containerd >= 2.2.0 containerd 2.2.0+ always imports /etc/containerd/conf.d/*.toml, so write kata-deploy runtime config there directly, avoiding modification of the main containerd config's imports array. Signed-off-by: thebigbone <pacman@duck.com>	2026-05-26 21:29:46 +02:00
Fabiano Fidêncio	25491fc20c	Merge pull request #13104 from kata-containers/topic/kata-deploy-build-as-an-artefact kata-deploy: prebuild payload-specific component artifacts	2026-05-25 22:56:55 +02:00
Fabiano Fidêncio	c65d64873b	kata-deploy: prebuild payload-specific component artifacts Build and publish the kata-deploy binary and CoCo guest-pull nydus snapshotter as dedicated per-arch artifacts, then consume those tarballs when assembling the kata-deploy image. This avoids rebuilding those components in the payload image (which would happen in serial) path and reduces overall CI build time. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-25 22:13:41 +02:00
Fabiano Fidêncio	3dc02a8604	Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only runtime-rs: Support erofs snapshotter with gpt vmdk mode	2026-05-25 16:29:59 +02:00
Zvonko Kaiser	6c6c5809f1	Merge pull request #13109 from fidencio/topic/build-validate-measured-rootfs-root-hashes-for-all-shims build: Validate measured-rootfs root hashes all shims	2026-05-25 15:58:35 +02:00
Zvonko Kaiser	aeadb1af35	Merge pull request #12948 from fidencio/topic/numa runtime (go): agent: Add NUMA support for QEMU	2026-05-25 15:33:14 +02:00
Alex Lyn	a359d13476	build: Validate measured-rootfs root hashes all shims The cached shim-v2 tarballs ship per-variant `root_hash_.txt` files embedded in the matching measured-rootfs image. Until now only shim-v2-rust validated those hashes against the freshly built rootfs images on a cache hit; shim-v2-go reused whatever was cached without checking, even though its bundled configuration files contain the `KERNELVERITYPARAMS_` values baked in at build time. When a PR changes the agent (and therefore the rootfs image and its dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache key stays the same and the stale tarball is reused. The resulting guest cmdline carries a verity hash that no longer matches the new rootfs image, so the VM panics very early in boot: device-mapper: verity: 254:1: metadata block 0 is corrupted erofs (device dm-0): cannot read erofs superblock Kernel panic - not syncing: VFS: Unable to mount root fs ... Generalize the shim-v2-rust cache validation so it also runs for shim-v2-go, push the per-variant root-hash sidecar files for both shims, and fall back to a full rebuild whenever the cached hash is missing or differs from the image one. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:12:52 +08:00
Alex Lyn	fd139a1143	kata-deploy: Reset max_unmerged_layers to "0" within erofs snapshotter we should set max_unmerged_layers = 0 for erofs snapshotter gpt-vmdk mode. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Fabiano Fidêncio	72be31c384	build: Validate measured-rootfs root hashes all shims The cached shim-v2 tarballs ship per-variant `root_hash_.txt` files embedded in the matching measured-rootfs image. Until now only shim-v2-rust validated those hashes against the freshly built rootfs images on a cache hit; shim-v2-go reused whatever was cached without checking, even though its bundled configuration files contain the `KERNELVERITYPARAMS_` values baked in at build time. When a PR changes the agent (and therefore the rootfs image and its dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache key stays the same and the stale tarball is reused. The resulting guest cmdline carries a verity hash that no longer matches the new rootfs image, so the VM panics very early in boot: device-mapper: verity: 254:1: metadata block 0 is corrupted erofs (device dm-0): cannot read erofs superblock Kernel panic - not syncing: VFS: Unable to mount root fs ... Generalize the shim-v2-rust cache validation so it also runs for shim-v2-go, push the per-variant root-hash sidecar files for both shims, and fall back to a full rebuild whenever the cached hash is missing or differs from the image one. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-25 11:04:08 +02:00
Fabiano Fidêncio	7ddea26137	Merge pull request #13086 from fvichot/flo-kata-monitor-fix kata-monitor: use full URI for connecting to containerd	2026-05-25 10:16:11 +02:00
Fabiano Fidêncio	407a6946f2	Merge pull request #13077 from hdp617/fix-kata-deploy-build packaging: fix parallel kernel build race and kata-deploy script bugs	2026-05-25 09:53:38 +02:00
Fabiano Fidêncio	8d2ecaabb5	versions: Bump QEMU to v11.0.0 For more details see QEMU's release notes: https://www.qemu.org/2026/04/22/qemu-11-0-0/ GPU experimental variants are also using v11.0.0 plus one patch to solve issues related to NUMA mapping. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Florian Vichot	554e8f91b1	kata-monitor: use full URI for connecting to containerd Without the protocol in the URI, grpc-go defaults to the DNS resolver, which results in an error for unix sockets (`name resolver error: produced zero addresses`). We also remove the `getAddressAndDialer(...)` and `dial(...)` functions, as they are no longer necessary, grpc-go supports connecting to unix sockets directly. This also removes the matching tests. This also adds a `Makefile` and tweaks the Dockerfile to simplify building the Docker image. Fixes #12398 Signed-off-by: Florian Vichot <florian.vichot@gmail.com>	2026-05-23 16:47:46 +02:00
Huy Pham	3ec444a7df	kernel: bump config version Bump the Kata Containers kernel configuration version to 195. Signed-off-by: Huy Pham <huypham@google.com>	2026-05-22 12:26:53 -07:00
Huy Pham	c490373a78	kata-deploy: packaging: fix absolute path resolution in merge script The `kata-deploy-merge-builds.sh` script blindly prepended `PWD` to the `kata_versions_yaml_file` argument, assuming it was always a relative path. However, the `Makefile` passes an absolute path using `$(MK_DIR)`. This resulted in invalid double-concatenated paths like `/workspace/...//workspace/...` which failed to copy. Fix this by using `readlink -f` to safely resolve the path. This correctly handles both relative and absolute paths, preventing path corruption. Signed-off-by: Huy Pham <huypham@google.com>	2026-05-22 12:05:56 -07:00
Fabiano Fidêncio	5d3e1e6396	kata-deploy: verify kata-runtime label remains stable on rke2/k3s The retry loop added in `efd468df3f` still allows the install to declare success while inside the kubelet's post-restart re-register window. On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]` every 2 s and returns on the first `True` observation it sees. By default the kubelet only publishes node status every ~10 s, so that first `True` is almost always the stale value from before the restart — the kubelet hasn't actually finished restarting yet. `label_node_with_retry` then applies the label, sleeps 1 s, reads back "true" (still stale, kubelet still down), and returns Ok. Install completes, `/readyz` flips to 200, helm releases its `--wait`, and the bats test starts — and only then does the kubelet finish coming up, re-register the node, and clobber the label with its cached set. The lifecycle test sees an empty `katacontainers.io/kata-runtime` and fails: # Node label katacontainers.io/kata-runtime: not ok 1 Kata artifacts are present on host after install A single-shot verification can't distinguish "still stale true" from "truly stable true after kubelet re-register". Replace it with a stability window: after (re)applying the label, require it to remain at the expected value for STABILITY_CHECKS=6 consecutive observations spaced CHECK_INTERVAL=2 s apart (≈ 12 s — comfortably more than the kubelet's status-update period). If the value ever drifts inside the window, re-apply and restart the stability counter. Bounded by MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s to install. Also add a short polling loop to the test's own label assertion as belt-and-suspenders for any leftover transient race, matching the existing retry pattern used for the container-runtime version check. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-22 11:53:18 +02:00
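
The stability window described in the last entry above (5d3e1e6396) can be sketched roughly as follows. This is an illustrative Python sketch only, not the kata-deploy implementation: the function name `label_with_stability_window` and the `apply_label`, `read_label`, and `sleep` callables are hypothetical stand-ins, and only the three constants (6 checks, 2 s interval, 12 apply attempts) come from the commit message.

```python
import time
from typing import Callable


def label_with_stability_window(
    apply_label: Callable[[], None],   # hypothetical: (re)applies the node label
    read_label: Callable[[], str],     # hypothetical: reads the current label value
    expected: str,
    stability_checks: int = 6,         # STABILITY_CHECKS from the commit message
    check_interval: float = 2.0,       # CHECK_INTERVAL (seconds) from the commit message
    max_apply_attempts: int = 12,      # MAX_APPLY_ATTEMPTS from the commit message
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the label has held `expected` for a full window."""
    for _ in range(max_apply_attempts):
        apply_label()
        stable = 0
        while stable < stability_checks:
            sleep(check_interval)
            if read_label() != expected:
                # The value drifted inside the window: re-apply the label
                # and restart the stability counter from zero.
                break
            stable += 1
        else:
            # The loop ended without a break, so every observation matched.
            return True
    return False
```

A single read-back cannot tell a stale value from a stable one, which is why the sketch requires consecutive matching observations and resets the counter on any drift. Passing a no-op `sleep` makes the loop easy to exercise without waiting.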

1772 Commits