2411 Commits

Author SHA1 Message Date
stevenhorsman
1d854ad7af ci: Update required tests
publish-kata-deploy-payload got renamed in #13107, which broke the CI.

Now, instead of tracking all those intermediate steps, let's make sure
we only track the tests themselves.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-11 19:02:23 +02:00
Fabiano Fidêncio
5731d30554 helm: add optional kata-monitor deployment to kata-deploy
Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart,
including image/configuration values so operators can enable monitor shipping as
part of the same deployment workflow when needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
0d6234e7be ci: share kata image publishing workflows
Unify kata-deploy and kata-monitor image publishing behind a single
reusable workflow, and rename workflow files to generic kata-images
names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
e04a4326ec tools: build kata-monitor image from shim-v2-go tarball
Build kata-monitor images by extracting the binary from the
shim-v2-go tarball and shipping it on top of
gcr.io/distroless/static-debian13.

Because the binary is built inside an Ubuntu (glibc) toolchain it
cannot run on a pure musl/alpine base — users hit __fprintf_chk /
__vfprintf_chk relocation errors. To get a small, distroless
runtime image we use the same pattern as
tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries
the binary needs (plus the dynamic linker) via ldd from a glibc
base image.

In order to do so, we also added a helper script to build and
publish architecture-specific monitor images from tarball
artifacts.

Reported-by: Steve Linde <stevenlinde@google.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
ac2221a6a5 Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3
versions: Bump containerd to 2.3
2026-06-09 08:21:58 +02:00
Fabiano Fidêncio
48ebbbec3a kata-deploy: honor debug mode with CLI log-level
Make the chart pass --log-level debug automatically when debug=true so
CI and troubleshooting runs emit full rendered config dumps without
requiring a separate log-level override.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:25:48 +02:00
Fabiano Fidêncio
b63494345d kata-deploy: add configurable verbosity for full CRI config dumps
Allow operators to force kata-deploy log verbosity and emit the fully
rendered containerd/CRI-O config and drop-in files in debug mode so
install troubleshooting can rely on exact effective configuration.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:25:48 +02:00
Fabiano Fidêncio
fc08218f55 gatekeeper: rename required tests to minimum/latest
The containerd_version matrix values were renamed from lts/active to
minimum/latest, which changes the generated CI job names.  Update the
required-tests list so the gatekeeper waits on the checks that are
actually produced.

The amd64 run-containerd-stability, run-nydus, run-cri-containerd and
free-runner run-k8s-tests jobs map lts -> minimum and active -> latest.
The s390x cri-containerd job maps active -> latest, matching its
updated matrix.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
b119b051cb kata-deploy: support drop-in configs for default runtimes
Allow operators to provide per-shim drop-in TOML for built-in runtimes
and reconcile stale override files so upgrades and migrations remain
safe when drop-ins are added or removed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex
2026-06-08 13:31:03 +02:00
Fabiano Fidêncio
1ca7129581 Merge pull request #13176 from Amulyam24/kata-deploy-fix
kata-deploy: add the imports directive explicitly if expected but not found
2026-06-05 22:24:16 +02:00
Fabiano Fidêncio
f6ff9578d4 Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner
ci: remove Mariner annotations and use new config
2026-06-05 20:22:58 +02:00
Fabiano Fidêncio
e9ee97f751 kata-deploy: inherit custom RuntimeClass overhead from baseConfig
Default custom runtime RuntimeClass overhead.podFixed to the selected
baseConfig values, so equivalent runtimes behave consistently without
repeating boilerplate.

In case the user wants to enforce that no overhead is set on the custom
RuntimeClass, disable inheritance with inheritBaseOverhead=false.
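
The inheritance rule can be illustrated with a short Python sketch. The function name, its parameters, and the precedence of an explicitly set overhead over the inherited one are assumptions made for illustration; the chart itself implements this in Helm templates.

```python
from typing import Optional


def effective_pod_fixed(
    custom: Optional[dict],
    base: Optional[dict],
    inherit_base_overhead: bool = True,
) -> Optional[dict]:
    """Pick the overhead.podFixed values for a custom RuntimeClass.

    Assumption: values set explicitly on the custom runtime win over
    anything inherited from the selected baseConfig.
    """
    if custom is not None:
        return custom
    if inherit_base_overhead and base is not None:
        # Copy so later edits to the result do not alter the base values.
        return dict(base)
    # Inheritance disabled (or nothing to inherit): no overhead is set.
    return None
```

With `inherit_base_overhead=False` and no explicit values, the sketch returns `None`, which corresponds to a RuntimeClass without any overhead.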

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-05 17:22:25 +02:00
Amulyam24
b15a5fbe36 kata-deploy: add the imports directive explicitly if expected but not found
For containerd v2.2+, the flow assumes that the imports directive is present.
Check for it explicitly and add it if it doesn't exist.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-06-05 18:47:07 +05:30
Steve Horsman
1624ebe362 Merge pull request #13135 from kata-containers/dependabot/cargo/tar-0.4.46
build(deps): bump tar from 0.4.45 to 0.4.46
2026-06-05 09:44:46 +01:00
Fabiano Fidêncio
743b0a4839 Merge pull request #13165 from stevenhorsman/bump-go-to-1.25.11
versions: bump golang to 1.25.11
2026-06-04 20:24:57 +02:00
stevenhorsman
81c7dde0ae ci: Remove kata-monitor test from required
The kata-monitor test is currently failing and runs a long-EoL
version of cri-o. This area is being actively reworked in #13107,
so remove the test for now; once the kata-monitor tests are stable
we can re-add the new versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-04 14:40:17 +01:00
dependabot[bot]
4ab63d0a5d build(deps): bump tar from 0.4.45 to 0.4.46
Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46.
- [Release notes](https://github.com/composefs/tar-rs/releases)
- [Commits](https://github.com/composefs/tar-rs/compare/0.4.45...0.4.46)

---
updated-dependencies:
- dependency-name: tar
  dependency-version: 0.4.46
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-04 07:52:44 +00:00
stevenhorsman
879912be25 versions: bump golang to 1.25.11
Bump the go version to resolve CVEs:
- GO-2026-5037
- GO-2026-5038
- GO-2026-5039

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-06-04 08:49:17 +01:00
Aurélien Bombo
de5333f275 ci: remove Mariner annotations and use new config
This is a follow-up to #13126 where we forgot to remove this now-unused code.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-06-03 09:25:12 -05:00
stevenhorsman
51eee428f4 testing/webhook: bump golang.org/x dependencies
Bump golang.org/x/net from v0.53.0 to v0.55.0 and golang.org/x/sys
from v0.43.0 to v0.44.0 to resolve CVEs:
- GO-2026-5024
- GO-2026-5025
- GO-2026-5026
- GO-2026-5027
- GO-2026-5028
- GO-2026-5029
- GO-2026-5030

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-03 09:56:54 +01:00
Fabiano Fidêncio
230e01b04e Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs
runtime/runtime-rs: introduce Azure specific configs
2026-06-02 09:17:09 +02:00
Fabiano Fidêncio
57de50f43c Merge pull request #13141 from fidencio/topic/kata-deploy-fix-stale-containerd-import
kata-deploy: scrub stale containerd import on conf.d migration
2026-06-01 18:13:08 +02:00
Greg Kurz
8a49ecb159 Merge pull request #13097 from BbolroC/fix-shim-components-for-s390x
ci: Refactor boot-image-se build and update shim components
2026-06-01 11:43:42 +02:00
Fabiano Fidêncio
f788997253 kata-deploy: scrub stale containerd import on conf.d migration
Since the conf.d migration (containerd >= 2.2.0), kata-deploy writes its
drop-in to the auto-imported /etc/containerd/conf.d/ and no longer manages
the main config's `imports` array. A node upgraded from a pre-conf.d
kata-deploy keeps the legacy `{dest_dir}/containerd/config.d/kata-deploy.toml`
entry in `imports`, since the new code neither adds nor removes it.

On uninstall, remove_artifacts() deletes the artifacts dir (including the
file that import still points at) and then restarts containerd, which fails
to load the now-dangling import and wedges the node: pods get stuck
Terminating and new pods cannot start. This broke the lifecycle-manager E2E
tests (TC-02..TC-07) which repeatedly upgrade then reinstall across the
3.30.0 -> latest version boundary.

Defensively scrub the legacy import from the main containerd config in both
configure_containerd (at conf.d migration time) and cleanup_containerd
(before artifacts are removed and containerd is restarted). The helper is a
no-op when the config is absent, has no `imports` array, or does not contain
the legacy entry.
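
The no-op conditions listed above can be sketched in Python as follows. This is only an illustration under assumptions: `scrub_legacy_import` is a hypothetical name, and the config is modelled as an already-parsed dictionary rather than the TOML file the real helper edits.

```python
from typing import Optional


def scrub_legacy_import(config: Optional[dict], legacy_entry: str) -> bool:
    """Drop legacy_entry from config["imports"]; return True if it changed."""
    if config is None:
        # The main containerd config is absent: nothing to do.
        return False
    imports = config.get("imports")
    if not isinstance(imports, list):
        # No `imports` array: nothing to do.
        return False
    if legacy_entry not in imports:
        # The legacy entry is not listed: nothing to do.
        return False
    # Keep every other import untouched and in its original order.
    config["imports"] = [entry for entry in imports if entry != legacy_entry]
    return True
```

Returning a boolean lets a caller rewrite the config file only when something was actually removed.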

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-01 11:07:13 +02:00
Fabiano Fidêncio
02fd572195 Merge pull request #13134 from jojimt/rc-version
kata-deploy: Add a version annotation to runtimeclass
2026-06-01 08:21:30 +02:00
manuelh-dev
953b306ff3 Merge pull request #12979 from manuelh-dev/mahuber/erofs-tmpfs-mount
runtime-rs/agent: support EROFS snapshots without a rwlayer
2026-05-29 13:50:27 -07:00
Fabiano Fidêncio
f349d19bf4 Merge pull request #12956 from zvonkok/nvgpu-tarball-chart
build: add kata-deploy-publish target
2026-05-29 21:22:44 +02:00
Joji Mekkattuparamban
8549d71c6f kata-deploy: Add a version annotation to runtimeclass
Enables automation to determine the version with simple read RBAC
on the RuntimeClass. Helpful when versions need to match those of other
tools (e.g. genpolicy) or when simple version determination is needed
for other reasons.

Fixes #13123

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-05-29 10:50:19 -07:00
Zvonko Kaiser
7f906ec95d build: add kata-deploy-publish target
Mirror the CI payload publish flow in local builds, including image and
helm chart publishing, while reusing the same chart upload helper in
payload-after-push to avoid duplicated chart packaging logic.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-29 16:22:12 +02:00
Zvonko Kaiser
fb73ccc352 build: include kata-deploy static artifacts in nvgpu bundle
Build and package kata-deploy binary and nydus snapshotter component
tarballs as part of nvgpu-tarball so local publish can consume a single
kata-static.tar.zst without rebuilding extra artifacts.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-29 16:22:12 +02:00
Fabiano Fidêncio
9729ed9993 kernel: enable InfiniBand/RoCE support in mlx5 kernel config fragment
Add the kernel configuration options required for RDMA / RoCE operation
with Mellanox ConnectX / BlueField VFs:

  - CONFIG_INFINIBAND: IB subsystem core
  - CONFIG_INFINIBAND_ADDR_TRANS: RoCEv2 GID table management
  - CONFIG_INFINIBAND_USER_ACCESS: userspace verbs (/dev/infiniband/uverbs*)
  - CONFIG_INFINIBAND_USER_MAD: userspace MAD interface
  - CONFIG_MLX5_INFINIBAND: mlx5_ib ConnectX IB/RoCE driver
  - CONFIG_CGROUP_RDMA: RDMA cgroup controller (required by mlx5_ib)

Bump kata_config_version to 196 to trigger a kernel rebuild.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-29 13:07:45 +02:00
Hyounggyu Choi
640fa488a5 ci: Refactor boot-image-se build and update shim components
- Add FAKE_SE_IMAGE mode support in SE image build scripts for CI without real SE setup
- Simplify workflow by removing build-asset-boot-image-se job
- Integrate fake-boot-image-se into build matrix instead of separate job
- Skip attestation for fake-boot-image-se builds
- Update qemu-se and qemu-se-runtime-rs shim components to use:
  - rootfs-initrd-confidential instead of rootfs-image-confidential
  - boot-image-se component

This change streamlines the s390x SE build process and makes it easier
to test without requiring actual Secure Execution infrastructure.
This fixes deployment issues on non-TEE systems where TEE-specific artifacts
(like boot-image-se for IBM SEL) are not included in the kata-deploy image,
while ensuring TEE systems still get all required components.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-05-29 11:35:40 +02:00
Fabiano Fidêncio
bddf1ecab4 build: stop producing cloud-hypervisor-glibc artifacts
Drop cloud-hypervisor-glibc from local and CI kata-deploy build targets
now that Azure CLH uses the standard cloud-hypervisor artifact set.

This removes obsolete build matrix entries and installer target
handling.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Fabiano Fidêncio
81ce51a9aa ci: target Azure CLH runtimes directly in AKS tests
Switch AKS Mariner matrix entries to clh-azure handlers and remove the
temporary host-OS based helm value overrides.

Update integration test wiring and required test labels so CI tracks the
new runtime names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Fabiano Fidêncio
8c3a2c1a95 kata-deploy: register clh-azure shim families
Add clh-azure and clh-azure-runtime-rs as first-class shims across
installer logic, helm defaults, runtimeclass overhead mapping, and shim
component catalogs.

This aligns deploy payload selection with the new native Azure-specific
CLH configs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Fabiano Fidêncio
76212b9e0c kata-deploy: allow containerd user drop-in overrides
Add an optional user-provided containerd drop-in that is loaded after
kata-deploy's generated drop-in so operators can override snapshotter
and other runtime settings without patching kata-deploy.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-27 17:26:55 +00:00
Fabiano Fidêncio
a423cf9526 Merge pull request #13087 from bpradipt/landlock
kernel: Enable landlock LSM
2026-05-27 17:34:47 +02:00
Pradipta Banerjee
1487eaaaa2 kernel: Enable landlock LSM
Allows using the Landlock LSM for the container process.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2026-05-27 13:33:46 +02:00
Fabiano Fidêncio
238dd51039 Merge pull request #13108 from thebigbone/containerd-config
containerd: use /etc/containerd/conf.d/ drop-in for containerd >= 2.2.0
2026-05-27 10:14:51 +02:00
Fabiano Fidêncio
64056add0d build: add passthrough mode to kata-deploy-merge-builds
kata-deploy now unpacks individual component tarballs itself, so the
final `kata-static.tar.zst` no longer needs to be a merged filesystem
payload. Merging everything has two downsides for that flow:

  - It pulls in everything kept on disk under build/, which previously
    forced us to also drop agent/busybox/coco-guest-components/nydus
    from the build set to keep them out of the final tarball.
  - The merged tarball duplicates content kata-deploy will repack on
    its own anyway.

Add a `passthrough` mode to kata-deploy-merge-builds.sh that, instead
of untarring each `kata-static-*.tar.zst` into a single filesystem
tree, copies the selected component tarballs into the final tarball
as-is. The existing `merge` mode remains the default to preserve the
non-kata-deploy install paths (e.g. `make install-tarball`).

Wire `nvgpu-tarball` to the new mode via `FINAL_TARBALL_MERGE_MODE=
passthrough`, paired with the existing `FINAL_TARBALL_INPUTS`
allowlist. This lets us keep agent/busybox/coco as build prereqs of
the GPU rootfs while shipping a final tarball that only contains the
NVIDIA-relevant components.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-26 21:55:08 +02:00
Zvonko Kaiser
9b85bff2b4 build: don't double-prefix absolute versions.yaml path in merge-builds
The Makefile passes $(MK_DIR)/../../../../versions.yaml — already an
absolute path — to kata-deploy-merge-builds.sh. The script then
unconditionally prepended ${PWD}/, producing a malformed path like:

  /repo//repo/tools/.../local-build//../../../../versions.yaml

which made cp fail with "No such file or directory" at the merge-builds
step (the very last step of `make nvgpu-tarball`).

Only prepend ${PWD}/ when the input is relative — that preserves the
original fix for the pushd-changes-cwd issue (commit ae6e8d2b3) without
mangling absolute paths from Makefile callers.
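
As an illustration of the relative-versus-absolute check, here is a minimal Python sketch. The function name `resolve_input_path` and its parameters are hypothetical; the actual fix is in the shell script.

```python
import posixpath


def resolve_input_path(path: str, cwd: str) -> str:
    """Anchor a relative path at cwd; leave an absolute path untouched."""
    if posixpath.isabs(path):
        # Already absolute: prefixing cwd would yield a malformed path.
        return path
    # Relative: pin it to the caller's directory so a later directory
    # change (such as pushd) cannot alter what it points at.
    return posixpath.join(cwd, path)
```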

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
2026-05-26 21:55:08 +02:00
Zvonko Kaiser
5aa6229eba build: group parallel build output by target
With `make all -j N` running multiple tarballs concurrently and silent
mode redirecting each build's stdio to its per-target log, a failing
target's "Failed to build: <name>, logs:" banner gets interleaved with
other in-flight jobs' output, making it hard to tell which target
failed.

Pass `--output-sync=target` to the recursive make so each sub-make's
output is buffered and emitted as one block when the target finishes,
keeping the failure banner contiguous with its log dump.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
2026-05-26 21:55:08 +02:00
Zvonko Kaiser
3be370d2d6 qemu: clean stale clone before fetching sources
build-qemu.sh runs in the per-target builddir (e.g.
build/qemu-tarball/builddir/), which persists across runs. If a previous
build left the cloned `qemu` tree behind (e.g. after an interrupted
build), the next run errors out with:

  fatal: destination path 'qemu' already exists and is not an empty
  directory.

Wipe `qemu` before cloning so the build is repeatable from a dirty
builddir.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
2026-05-26 21:55:08 +02:00
Zvonko Kaiser
18cee00df9 build: guard parallel races on build symlink and ~/.docker
Parallel make jobs invoke kata-deploy-binaries-in-docker.sh concurrently
and collide on two shared paths:

  ln: Already exists
  mkdir: /home/$USER/.docker: File exists

Skip the symlink creation when the link is already in place. If a
parallel job wins the create race in the cold-start window, fall back to
re-checking that the link exists so a real ln failure (permission, disk
full, etc.) still propagates rather than being silently swallowed.

The `~/.docker` mkdir is guarded by a `[[ ! -d ]]` check that two
processes can pass simultaneously, after which one bare `mkdir` fails.
Switch to `mkdir -p` so the second invocation is a no-op.
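
Both guards can be sketched in Python, assuming a POSIX system. The helper names are hypothetical and the script itself is written in shell; this only illustrates the race-tolerant pattern.

```python
import os


def ensure_dir(path: str) -> None:
    """Create path like `mkdir -p`: an existing directory is not an error."""
    os.makedirs(path, exist_ok=True)


def ensure_symlink(target: str, link: str) -> None:
    """Create link -> target, tolerating a concurrent creator."""
    if os.path.islink(link):
        # The link is already in place: skip creation.
        return
    try:
        os.symlink(target, link)
    except FileExistsError:
        # Something appeared at `link` after the check above. Accept it
        # only if it is a symlink; otherwise surface the failure.
        if not os.path.islink(link):
            raise
```

Other failures, such as a permission error or a full disk, are not caught and therefore still propagate to the caller.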

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-26 21:55:08 +02:00
Zvonko Kaiser
815ebc340d build: add nvgpu-tarball target
serial-targets now waits for the other BASE_TARBALLS items so the
inner rootfs assembly runs with DEPS= against already-built
artifacts. This also fixes a pre-existing race in the main flows
where the outer parallel and inner -j 1 makes could both build
kernel-tarball at the same time.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-26 21:55:08 +02:00
Zvonko Kaiser
6a367ab777 build: declare install-prebuilt-artifacts as .PHONY
Leftover from #12954's rebase: the substantive sed-hack -> DEPS= change
landed on main, but the .PHONY declaration didn't make it. Add it so
the recipe always runs even if a stale `kata-artifacts` file exists in
CWD.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Assisted-By: Claude <noreply@anthropic.com>
2026-05-26 21:55:08 +02:00
thebigbone
d9f2aa895e containerd: use /etc/containerd/conf.d/ drop-in for containerd >= 2.2.0
containerd 2.2.0+ always imports /etc/containerd/conf.d/*.toml,
so write kata-deploy runtime config there directly, avoiding
modification of the main containerd config's imports array.
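
The version gate could look roughly like the following Python sketch. The function name is hypothetical, and ignoring pre-release and build suffixes is a simplifying assumption, not necessarily how kata-deploy compares versions.

```python
def uses_conf_d_dropin(version: str) -> bool:
    """Return True when the containerd version is 2.2.0 or newer."""
    # Drop an optional leading "v" and any pre-release or build suffix.
    core = version.lstrip("v").split("-", 1)[0].split("+", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    # Pad short versions such as "2.2" to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts) >= (2, 2, 0)
```

Comparing integer tuples avoids the pitfall of string comparison, where "2.10.0" would sort before "2.2.0".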

Signed-off-by: thebigbone <pacman@duck.com>
2026-05-26 21:29:46 +02:00
Fabiano Fidêncio
25491fc20c Merge pull request #13104 from kata-containers/topic/kata-deploy-build-as-an-artefact
kata-deploy: prebuild payload-specific component artifacts
2026-05-25 22:56:55 +02:00
Fabiano Fidêncio
c65d64873b kata-deploy: prebuild payload-specific component artifacts
Build and publish the kata-deploy binary and CoCo guest-pull nydus
snapshotter as dedicated per-arch artifacts, then consume those tarballs
when assembling the kata-deploy image.

This avoids rebuilding those components in the payload image path
(where the builds would run serially) and reduces overall CI build time.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-25 22:13:41 +02:00
Fabiano Fidêncio
3dc02a8604 Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only
runtime-rs: Support erofs snapshotter with gpt vmdk mode
2026-05-25 16:29:59 +02:00