Commit Graph

2429 Commits

Author SHA1 Message Date
Fabiano Fidêncio
5959549645 release: do not publish a kata-monitor-job-dispatcher manifest
The shared _publish_multiarch_manifest() helper always derived a
"-job-dispatcher" registry from the registries it was given. However, the
dispatcher is a kata-deploy-specific sidecar image, so when the helper
was reused to publish the kata-monitor multi-arch manifest it wrongly
tried to push a non-existent kata-monitor-job-dispatcher image.

Let's gate the dispatcher derivation behind
KATA_DEPLOY_PUBLISH_JOB_DISPATCHER (defaulting to true so the
kata-deploy path is unchanged) and opt out of it when publishing the
kata-monitor manifest.
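
The gating described above can be illustrated with a minimal Python sketch. The real helper is a shell function; `manifest_registries` and the example registry names are hypothetical, and treating any value other than "true" as opt-out is an assumption.

```python
import os


def manifest_registries(registries, env=None):
    """Return the registries to publish a multi-arch manifest for."""
    env = os.environ if env is None else env
    # Defaults to "true" so the kata-deploy path keeps its current behaviour.
    # Assumption: any other value disables the dispatcher derivation.
    publish_dispatcher = env.get("KATA_DEPLOY_PUBLISH_JOB_DISPATCHER", "true") == "true"
    targets = list(registries)
    if publish_dispatcher:
        # Derive the sidecar registry name from each registry given.
        targets += [f"{registry}-job-dispatcher" for registry in registries]
    return targets
```

With the variable set to "false", as the kata-monitor path would do, only the registries passed in are returned.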

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-15 16:29:29 +02:00
Hyounggyu Choi
59fd29fb33 Merge pull request #13225 from BbolroC/use-tar-in-exporting-kate-deploy-files
packaging: Optimize kata-deploy build export using tar output
2026-06-15 14:47:01 +02:00
Hyounggyu Choi
46b8e9f027 packaging: Optimize kata-deploy build export using tar output
Replace `type=local` with `type=tar` in kata-deploy build to reduce
export time and avoid build hangs during the export-to-client-directory
phase.

Update callers to extract binaries directly from the tar archive instead
of copying from an intermediate directory.
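
As a rough illustration of "extract directly from the tar archive", the Python sketch below reads one member out of a tar stream without unpacking the rest. The callers in the repository are build scripts; `extract_member` and any member path passed to it are hypothetical.

```python
import tarfile


def extract_member(archive, member_name):
    """Read one file's bytes out of a tar archive (a seekable file object)."""
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        # extractfile() returns None for members that are not regular files.
        member = tar.extractfile(member_name)
        if member is None:
            raise FileNotFoundError(member_name)
        return member.read()
```

No intermediate directory is created; only the requested member is read.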

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-06-15 11:52:33 +02:00
Steve Horsman
ea999aa033 Merge pull request #13221 from manuelh-dev/mahuber/nydus-root-export
kata-deploy: export nydus snapshotter root
2026-06-15 08:55:08 +01:00
Manuel Huber
639420e7f5 kata-deploy: export nydus snapshotter root
containerd uses the proxy plugin root export when reporting CRI image
filesystem paths. Without this export, the CRI plugin falls back to
/var/lib/containerd/io.containerd.snapshotter.v1.<snapshotter>.

For nydus-for-kata-tee this fallback does not match the actual
snapshotter root under /var/lib/nydus-for-kata-tee.
Kubelet/cAdvisor then fails stats collection when it tries to inspect
the nonexistent fallback path.

Export the nydus proxy snapshotter root so containerd reports the real
filesystem path for resource accounting.

Without it, when using trusted ephemeral storage or the new
work-in-progress ephemeral storage feature for providing plain disks,
resource accounting did not kick in and pods that exhausted their
emptyDir sizeLimits were not evicted.
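
The path mismatch can be sketched in Python as follows. This only models the lookup described above, not containerd's code; `image_fs_path` and the `exported_roots` mapping are hypothetical names.

```python
def image_fs_path(snapshotter, exported_roots):
    """Return the filesystem path reported for a snapshotter.

    exported_roots maps snapshotter name -> root exported by its proxy plugin.
    """
    # Fallback used when the proxy plugin exports no root.
    fallback = f"/var/lib/containerd/io.containerd.snapshotter.v1.{snapshotter}"
    return exported_roots.get(snapshotter, fallback)
```

Without an export, nydus-for-kata-tee resolves to the fallback path, which does not exist on the node; with the export, the real root under /var/lib/nydus-for-kata-tee is reported.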

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-06-12 19:06:01 +02:00
Fabiano Fidêncio
aa27490801 kata-deploy: track distroless static base by tag, not digest
The kata-deploy main image pinned its gcr.io/distroless/static-debian13
base by sha256 digest. distroless does not publish versioned tags, so a
pinned digest just goes stale with no clear upgrade path. Track the
rolling tag instead (guarded with a hadolint DL3007 ignore plus a comment
explaining why), matching the kata-deploy-job-dispatcher image base.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
c23fe11529 kata-deploy: make verification Job aware of job deployment mode
The verification Job assumed the DaemonSet model: it waited for the
DaemonSet to exist, for its pods, and for `rollout status daemonset/...`,
then required every node in the cluster to be labeled. None of that holds
for deploymentMode: job, where install happens via the dispatcher and the
per-node Jobs it fans out, and only the targeted (worker) nodes get
labeled.

Make the hook mode-aware:
  - Hook weight: in job mode the install dispatcher runs as a
    post-install hook at weight 5, so verification now runs at weight 10
    (after it); daemonset mode keeps weight 0 (the DaemonSet is a normal
    resource).
  - Readiness wait: in job mode, wait for the install dispatcher Job to
    complete and then for the per-node install Jobs
    (kata-deploy/stage=install) to finish (with the same CRI-restart
    retry logic) instead of a DaemonSet rollout.
  - Label check: in job mode, verify exactly the nodes the dispatcher
    targeted are labeled, rather than comparing the labeled count against
    all nodes in the cluster.
  - Grant the verification ClusterRole read access to batch/jobs (used by
    the job-mode waits; harmless in daemonset mode).

The daemonset code path is unchanged and the default render (no
verification.pod) is byte-for-byte identical.
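
The mode-dependent decisions listed above can be summarized in a small Python sketch. The actual hook is a Helm template plus shell; the function names are hypothetical, and reading "exactly the nodes the dispatcher targeted" as "every targeted node is labeled" is an assumption.

```python
def verification_hook_weight(deployment_mode):
    """Helm hook weight for the verification Job."""
    # In job mode the install dispatcher hook runs at weight 5,
    # so verification must run after it.
    return 10 if deployment_mode == "job" else 0


def labels_ok(deployment_mode, labeled, targeted, all_nodes):
    """Check node labeling according to the deployment mode."""
    if deployment_mode == "job":
        # Assumption: success means every targeted node carries the label.
        return set(targeted) <= set(labeled)
    # daemonset mode: the labeled count is compared against all nodes.
    return len(labeled) == len(all_nodes)
```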

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
3d732986d2 kata-deploy: add per-node staged cleanup for job mode
Add the uninstall counterpart to the install dispatcher for
deploymentMode: job. On `helm uninstall`, a single pre-delete hook Job
runs the kata-deploy-job-dispatcher, which enumerates the targeted nodes
live and fans out one node-pinned cleanup Job per node that runs the
install pipeline in reverse and exits:

  unlabel -> revert-cri   (initContainers, run sequentially)
  remove-artifacts        (main container)

Running as a pre-delete hook means the dispatcher ServiceAccount/RBAC and
the kata-deploy host-mutation RBAC still exist while the Jobs run, so the
unlabel stage retains node get/patch access. revert-cri and
remove-artifacts are host-only operations (privileged nsenter / host
mount) and need no extra cluster RBAC.

Ordering mirrors install in reverse: unlabel first so the scheduler stops
placing kata workloads here, then revert the CRI config + restart the
runtime, then remove the on-host artifacts. Each stage is idempotent and
skips when already undone, so partially-installed nodes and re-runs are
safe.

Uninstall node selection is deliberately SEPARATE from install (a
dedicated job.cleanup.* block) and defaults to every node carrying the
katacontainers.io/kata-runtime label (set by the install label stage)
rather than re-evaluating the install selector. Because the cleanup
dispatcher resolves nodes live when it runs, this stays robust to
install-time selector drift (relabeled nodes, etc.) while remaining fully
overridable via job.cleanup.nodes / job.cleanup.nodeSelector /
job.cleanup.nodeSelectorExpressions. The default (daemonset) mode is
unaffected.
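
A minimal Python sketch of the default cleanup selection, under stated assumptions: the dispatcher itself is a Rust binary, `cleanup_targets` is a hypothetical name, and the handling of an explicit node list (keep only nodes that still exist) is an assumption rather than the chart's documented behaviour.

```python
KATA_RUNTIME_LABEL = "katacontainers.io/kata-runtime"


def cleanup_targets(live_nodes, explicit_nodes=None):
    """Pick the nodes to clean up.

    live_nodes maps node name -> labels dict, as seen when the dispatcher runs.
    """
    if explicit_nodes:
        # Assumed override behaviour: explicitly named nodes that still exist.
        return [name for name in explicit_nodes if name in live_nodes]
    # Default: every node still carrying the label set by the install label stage.
    return sorted(name for name, labels in live_nodes.items() if KATA_RUNTIME_LABEL in labels)
```

Because the selection looks at live labels rather than the install selector, a node relabeled after install is still cleaned up as long as it carries the kata-runtime label.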

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
54878fa373 kata-deploy: add job deployment mode driven by the job-dispatcher
Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in
`deploymentMode: job` that installs Kata via short-lived, per-node
install Jobs instead of the long-running DaemonSet. The DaemonSet remains
the default and is now gated behind `deploymentMode == daemonset`.

Rather than render one Job per node into the Helm release (which grows
the release secret O(nodes) and offers no rollout pacing), job mode ships
a single tiny post-install/post-upgrade hook Job that runs the
kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes
LIVE from the API server and stamps out one node-pinned install Job per
node from a constant-size ConfigMap of Job templates, keeping at most
`job.parallelism` in flight and refilling as they finish. This guarantees
per-node coverage with a paced rollout while the Helm release stays O(1)
regardless of fleet size. New nodes are picked up by re-running
`helm upgrade`; there is no always-on component.

Each per-node Job runs the staged install pipeline as ordered
initContainers and exits:

  host-check -> artifacts -> cri   (initContainers, run sequentially)
  label                            (main container)

The privilege split is explicit: the dispatcher pod is a pure
control-plane client (lists nodes, manages Jobs in its own namespace) and
runs fully unprivileged under a dedicated, least-privilege ServiceAccount
(kata-rbac.yaml); only the per-node Jobs it creates carry the privileged
kata-deploy host-mutation rights.

Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob):
  - job.nodes: explicit node-name list passed to the dispatcher, and
  - job.nodeSelector (equality map) ANDed with
  - job.nodeSelectorExpressions (k8s label-selector requirements:
    In / NotIn / Exists / DoesNotExist),
compiled into a single label-selector string the dispatcher resolves
live. The default expressions target worker (non-control-plane) nodes, so
no custom node labeling is required; set the expressions to [] to target
all discovered nodes.
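
To illustrate how an equality map and a list of expressions can be compiled into one Kubernetes label-selector string, here is a hedged Python sketch. The chart does this in Helm templates; `compile_selector` is a hypothetical name and the sorted ordering of equality terms is an arbitrary choice.

```python
def compile_selector(node_selector, expressions):
    """Compile an equality map ANDed with expressions into a label selector."""
    parts = [f"{key}={value}" for key, value in sorted(node_selector.items())]
    for expr in expressions:
        key, operator = expr["key"], expr["operator"]
        values = ",".join(expr.get("values", []))
        if operator == "In":
            parts.append(f"{key} in ({values})")
        elif operator == "NotIn":
            parts.append(f"{key} notin ({values})")
        elif operator == "Exists":
            parts.append(key)
        elif operator == "DoesNotExist":
            parts.append(f"!{key}")
        else:
            raise ValueError(f"unsupported operator: {operator}")
    # Comma-separated requirements are ANDed by the API server.
    return ",".join(parts)
```

For example, an expression with operator DoesNotExist on `node-role.kubernetes.io/control-plane` (an assumed form of the worker-only default) compiles to `!node-role.kubernetes.io/control-plane`, and an empty expression list with an empty map yields an empty selector that matches all nodes.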

Reuses the commonEnv/commonVolume* helpers and adds the stageContainer,
serviceAccountName, dispatcherServiceAccountName, dispatcherImage and
perNodeJob helpers shared by the dispatcher and the staged Jobs. The
default (daemonset) render is unchanged.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
28fce44b70 kata-deploy: extract shared pod env/volumes into helm helpers
Pull the kata-deploy container's environment block and host
volume/volumeMount definitions out of the DaemonSet template into
reusable named templates in _helpers.tpl:

  - kata-deploy.commonEnv
  - kata-deploy.commonVolumeMounts
  - kata-deploy.commonVolumes

These are derived purely from chart values and are independent of the
deployment model, so they can be shared verbatim by upcoming per-node
install/cleanup Jobs without duplicating the (large) env wiring.

Pure refactor: the rendered DaemonSet is byte-for-byte identical to
before (verified via normalized `helm template` diff across default and
multiInstallSuffix/userDropIn/customRuntimes permutations).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
225ff2209e kata-deploy: split install/cleanup into staged actions
Phase 1 of migrating kata-deploy from a DaemonSet to a staged JobSet
workflow: refactor the binary's install/cleanup flows into discrete,
independently invocable stages while keeping the existing DaemonSet
path fully working.

Add new staged subcommands that each run one step and exit, so a JobSet
can drive them as ordered initContainers/Jobs per node:

  install: host-check -> artifacts -> cri -> label
  cleanup (reverse): unlabel -> revert-cri -> remove-artifacts

`install` becomes a compatibility wrapper composing the install stages
in the canonical order, so the DaemonSet deployment model is unchanged.
The DaemonSet `cleanup` (with its DaemonSet-presence gating) is left
intact; the staged cleanup actions are added alongside it and skip that
gating since the JobSet workflow only schedules them on a real uninstall.

Each stage has an idempotent skip check so reruns are safe:
  - install label / cleanup unlabel: short-circuit via the node label
  - cleanup remove-artifacts: skip when the install dir is already gone
  - cleanup revert-cri: skip the disruptive runtime restart when the CRI
    drop-ins are already absent (new cri_drop_in_present helper)
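
The staged flow with skip checks can be pictured with this Python sketch. The binary is written in Rust; `run_stages` and its callback parameters are hypothetical, and only the stage names and their order come from the message above.

```python
INSTALL_STAGES = ["host-check", "artifacts", "cri", "label"]
CLEANUP_STAGES = ["unlabel", "revert-cri", "remove-artifacts"]


def run_stages(stages, already_done, run):
    """Run stages in order, skipping any whose work is already done."""
    executed = []
    for stage in stages:
        if already_done(stage):
            # Idempotent skip: a rerun does not repeat finished work.
            continue
        run(stage)
        executed.append(stage)
    return executed
```

Rerunning the install pipeline on a node where `host-check` and `artifacts` are already satisfied would only execute `cri` and `label`.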

Introduce a shared KATA_RUNTIME_LABEL constant and add rstest-based
tests covering the subcommand-name -> Action mapping, rejection of
unknown actions, and the visible/hidden help semantics.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
d4205c7fcc kata-deploy: build and publish the kata-deploy-job-dispatcher image
Package and ship the dispatcher built in the previous commit so the
job-mode Helm chart has an image to run.

  - Dockerfile.components: build kata-deploy and kata-deploy-job-dispatcher
    from the same rust-builder stage (one compile), and run fmt/clippy/
    test for both crates.
  - job-dispatcher/Dockerfile: a minimal distroless/static image containing
    only the dispatcher binary and CA certs - it is an API client, so it
    needs nothing from the host.
  - local-build: kata-deploy-job-dispatcher becomes its own build component
    with its own static tarball
    (kata-deploy-static-kata-deploy-job-dispatcher.tar.zst); the shared
    rust-builder output is reused so the two components do not recompile
    the workspace locally. The payload script builds and pushes a separate
    "<kata-deploy registry>-job-dispatcher" image with the same tag scheme,
    and release.sh publishes its multi-arch manifest symmetrically.
  - CI: add kata-deploy-job-dispatcher to the build-kata-deploy-components
    matrices (its tarball is picked up by the existing kata-artifacts-*
    glob), and gate it in the kata-deploy rust static checks.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
87d27e0cc8 kata-deploy-job-dispatcher: add generic per-node Job dispatcher
Add a small, deployment-agnostic dispatcher binary that runs exactly one
Kubernetes Job per selected node and paces the rollout, so callers get
guaranteed per-node coverage without encoding the fan-out in Helm.

Motivation: templating one Job per node into a Helm release does not
scale (the release Secret hits etcd's 1 MiB limit and hooks run
sequentially), and a single Indexed Job cannot guarantee per-node
coverage when paced - the scheduler ignores completed pods when
evaluating topology spread, so nodes get uneven numbers of pods. A tiny
dispatcher that enumerates nodes live and creates node-pinned Jobs itself
sidesteps both problems and keeps the Helm release O(1) in fleet size.

The dispatcher:
  - enumerates target nodes live (explicit --nodes list or
    --node-selector label selector), paginating the API;
  - stamps out one Job per node from a YAML template, pinning it with
    nodeName and an owner label for server-side filtering;
  - keeps at most --parallelism Jobs in flight, refilling as they finish,
    and sets an OwnerReference to the owner Job so the per-node Jobs are
    garbage-collected with it;
  - is a plain API client (kube): it never touches the host, so it can
    run fully unprivileged.
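
The pacing behaviour (at most --parallelism Jobs in flight, refilled as they finish) can be sketched in Python as below. This is not the dispatcher's Rust code; `dispatch`, `create_job` and `poll_finished` are hypothetical, and `poll_finished` is assumed to eventually report every in-flight node as finished.

```python
from collections import deque


def dispatch(nodes, parallelism, create_job, poll_finished):
    """Create one Job per node while keeping at most `parallelism` in flight."""
    pending = deque(nodes)
    in_flight = set()
    done = []
    while pending or in_flight:
        # Refill free slots before waiting on running Jobs.
        while pending and len(in_flight) < parallelism:
            node = pending.popleft()
            create_job(node)
            in_flight.add(node)
        # poll_finished returns the subset of in-flight nodes whose Job ended.
        finished = set(poll_finished(in_flight))
        in_flight -= finished
        done.extend(sorted(finished))
    return done
```

Every node gets exactly one Job, and the number of concurrent Jobs never exceeds the limit regardless of fleet size.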

Node membership is resolved live on each run, not frozen at Helm
template-render time: re-running the dispatcher (e.g. via `helm upgrade`)
picks up nodes added since the last run and skips ones already done, as
the per-node stages are idempotent. The dispatcher is one-shot, however
- it does not watch the API, so nodes added while it is not running are
only covered by the next run.

job.rs holds the pure helpers (node-name sanitization, deterministic Job
naming, template instantiation, status interpretation) with rstest unit
tests; main.rs wires up the CLI and the fan-out loop.
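
One possible shape for node-name sanitization and deterministic Job naming is sketched below in Python. The scheme actually used in job.rs is not described here, so everything in this block (the `sanitize` and `job_name` names, the hash suffix, the 63-character limit) is an assumption chosen for illustration.

```python
import hashlib
import re


def sanitize(node_name):
    """Reduce a node name to lowercase alphanumerics and dashes."""
    cleaned = re.sub(r"[^a-z0-9-]", "-", node_name.lower()).strip("-")
    return cleaned or "node"


def job_name(prefix, node_name, max_len=63):
    """Build a deterministic Job name for a node (hypothetical scheme)."""
    # A short digest of the original name keeps truncated names unique.
    digest = hashlib.sha256(node_name.encode()).hexdigest()[:8]
    room = max_len - len(prefix) - len(digest) - 2
    stem = sanitize(node_name)[:room].rstrip("-")
    return f"{prefix}-{stem}-{digest}"
```

The same node always maps to the same Job name, which is what makes "skip nodes already done" possible on a rerun.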

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
56da8097c2 Merge pull request #13204 from fidencio/topic/versions-bump-qemu
versions: Bump QEMU to 11.0.1
2026-06-12 17:14:57 +02:00
Fabiano Fidêncio
110843d6e1 Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal
runtime(-rs): remove file_mem_backend config option
2026-06-12 17:13:04 +02:00
Greg Kurz
eac5dd2907 generate_vendor: Fix heavily broken logic
While checking the content of the vendor tarball artifact in the 3.31.0
release page, I realized that it is lacking most of the Rust code and
all the Go code. It turns out that the script is badly broken in many
ways:

1. Cargo workspace conflicts: Vendored dependencies were treated as
   workspace members, causing "current package believes it's in a
   workspace when it's not" errors. Fixed by adding vendor directory
   exclusions to root Cargo.toml.

2. Missing Go vendoring: Script only searched for Cargo.lock files,
   never processing go.mod files despite having a case statement for
   them. Fixed by adding go.mod to the find command with '-o -name go.mod'.

3. Wrong tar execution directory: Script ran tar from release/ directory
   but vendor_dir_list contained paths relative to repo root (./vendor,
   ./src/agent/vendor, etc.), causing "Cannot stat" errors. Fixed by
   moving tar command before final popd.

4. Relative tarball path: Since tar now runs from repo root, converted
   tarball path to absolute to ensure it's created in the release
   directory.

5. Vendored go.mod pollution: Added '-path ./vendor -prune' to find
   command to exclude vendor directory, preventing the script from
   finding go.mod files inside vendored Rust dependencies.

The fixes are simple enough they can be squashed into a single
commit.
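
Items 2 and 5 above (search for both Cargo.lock and go.mod, but prune the top-level vendor directory) can be mirrored in a short Python sketch. The script itself uses `find`; `find_manifests` is a hypothetical name.

```python
import os


def find_manifests(root):
    """List Cargo.lock and go.mod files under root, skipping root's vendor dir."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Equivalent of '-path ./vendor -prune': do not descend into it.
            dirnames[:] = [d for d in dirnames if d != "vendor"]
        for name in filenames:
            if name in ("Cargo.lock", "go.mod"):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)
```

A go.mod shipped inside a vendored Rust dependency under ./vendor is therefore never picked up.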

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-06-12 10:06:53 +02:00
Manuel Huber
70d8f1bf3d runtime: remove file_mem_backend config option
Remove the Go runtime file_mem_backend and valid_file_mem_backends
config knobs, along with the corresponding sandbox annotation handling.

The runtime still enables file-backed shared memory automatically for
virtio-fs by using /dev/shm as the backing directory. This only removes
the user-selectable backend path.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-12 00:07:16 +00:00
Fabiano Fidêncio
46add95802 versions: Bump QEMU to 11.0.1
Bump QEMU to its latest release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-11 22:01:26 +02:00
stevenhorsman
1d854ad7af ci: Update required tests
publish-kata-deploy-payload got renamed in #13107, which broke the CI.

Now, instead of tracking all those intermediate steps, let's make sure
we only track the tests themselves.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-11 19:02:23 +02:00
Fabiano Fidêncio
5731d30554 helm: add optional kata-monitor deployment to kata-deploy
Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart,
including image/configuration values so operators can enable monitor shipping as
part of the same deployment workflow when needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
0d6234e7be ci: share kata image publishing workflows
Unify kata-deploy and kata-monitor image publishing behind a single
reusable workflow, and rename workflow files to generic kata-images
names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
e04a4326ec tools: build kata-monitor image from shim-v2-go tarball
Build kata-monitor images by extracting the binary from the
shim-v2-go tarball and shipping it on top of
gcr.io/distroless/static-debian13.

Because the binary is built inside an Ubuntu (glibc) toolchain it
cannot run on a pure musl/alpine base — users hit __fprintf_chk /
__vfprintf_chk relocation errors. To get a small, distroless
runtime image we use the same pattern as
tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries
the binary needs (plus the dynamic linker) via ldd from a glibc
base image.

In order to do so, we also added a helper script to build and
publish architecture-specific monitor images from tarball
artifacts.

Reported-by: Steve Linde <stevenlinde@google.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
ac2221a6a5 Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3
versions: Bump containerd to 2.3
2026-06-09 08:21:58 +02:00
Fabiano Fidêncio
48ebbbec3a kata-deploy: honor debug mode with CLI log-level
Make the chart pass --log-level debug automatically when debug=true so
CI and troubleshooting runs emit full rendered config dumps without
requiring a separate log-level override.
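
As a rough Python sketch of the resulting behaviour: the chart logic lives in Helm templates, `cli_args` is a hypothetical name, and letting an explicit log level take precedence over debug=true is an assumption.

```python
def cli_args(debug, log_level=None):
    """Return the log-level arguments passed to the kata-deploy binary."""
    # Assumption: an explicit override wins; otherwise debug implies "debug".
    level = log_level or ("debug" if debug else None)
    return ["--log-level", level] if level else []
```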

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:25:48 +02:00
Fabiano Fidêncio
b63494345d kata-deploy: add configurable verbosity for full CRI config dumps
Allow operators to force kata-deploy log verbosity and emit the fully
rendered containerd/CRI-O config and drop-in files in debug mode so
install troubleshooting can rely on exact effective configuration.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:25:48 +02:00
Fabiano Fidêncio
fc08218f55 gatekeeper: rename required tests to minimum/latest
The containerd_version matrix values were renamed from lts/active to
minimum/latest, which changes the generated CI job names.  Update the
required-tests list so the gatekeeper waits on the checks that are
actually produced.

The amd64 run-containerd-stability, run-nydus, run-cri-containerd and
free-runner run-k8s-tests jobs map lts -> minimum and active -> latest.
The s390x cri-containerd job maps active -> latest, matching its
updated matrix.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
b119b051cb kata-deploy: support drop-in configs for default runtimes
Allow operators to provide per-shim drop-in TOML for built-in runtimes
and reconcile stale override files so upgrades and migrations remain
safe when drop-ins are added or removed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex
2026-06-08 13:31:03 +02:00
Fabiano Fidêncio
1ca7129581 Merge pull request #13176 from Amulyam24/kata-deploy-fix
kata-deploy: add the imports directive explicitly if expected but not found
2026-06-05 22:24:16 +02:00
Fabiano Fidêncio
f6ff9578d4 Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner
ci: remove Mariner annotations and use new config
2026-06-05 20:22:58 +02:00
Fabiano Fidêncio
e9ee97f751 kata-deploy: inherit custom RuntimeClass overhead from baseConfig
Default custom runtime RuntimeClass overhead.podFixed to the selected
baseConfig values, so equivalent runtimes behave consistently without
repeating boilerplate.

In case the user wants to enforce that no overhead is set on the custom
RuntimeClass, disable inheritance with inheritBaseOverhead=false.
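
The inheritance rule can be sketched in Python as follows. The chart implements it in Helm templates; `runtimeclass_overhead`, the dictionary layout, and the precedence of an explicitly set overhead are assumptions.

```python
def runtimeclass_overhead(custom, base_overheads):
    """Resolve overhead.podFixed for a custom runtime.

    base_overheads maps baseConfig name -> podFixed values of the built-in runtime.
    """
    if custom.get("overhead") is not None:
        # Assumption: an explicitly configured overhead always wins.
        return custom["overhead"]
    if not custom.get("inheritBaseOverhead", True):
        # Inheritance disabled: no overhead is set on the RuntimeClass.
        return None
    return base_overheads.get(custom["baseConfig"])
```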

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-05 17:22:25 +02:00
Amulyam24
b15a5fbe36 kata-deploy: add the imports directive explicitly if expected but not found
For containerd v2.2+, the flow assumes that the imports directive is already present.
It is safer to check for it and add it if it is missing.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-06-05 18:47:07 +05:30
Steve Horsman
1624ebe362 Merge pull request #13135 from kata-containers/dependabot/cargo/tar-0.4.46
build(deps): bump tar from 0.4.45 to 0.4.46
2026-06-05 09:44:46 +01:00
Fabiano Fidêncio
743b0a4839 Merge pull request #13165 from stevenhorsman/bump-go-to-1.25.11
versions: bump golang to 1.25.11
2026-06-04 20:24:57 +02:00
stevenhorsman
81c7dde0ae ci: Remove kata-monitor test from required
The kata-monitor test is currently failing and runs a very EoL
version of cri-o. This area is being actively reworked in #13107,
so remove it for now; once the kata-monitor tests are stable we
can re-add the new versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-04 14:40:17 +01:00
dependabot[bot]
4ab63d0a5d build(deps): bump tar from 0.4.45 to 0.4.46
Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46.
- [Release notes](https://github.com/composefs/tar-rs/releases)
- [Commits](https://github.com/composefs/tar-rs/compare/0.4.45...0.4.46)

---
updated-dependencies:
- dependency-name: tar
  dependency-version: 0.4.46
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-04 07:52:44 +00:00
stevenhorsman
879912be25 versions: bump golang to 1.25.11
Bump the go version to resolve CVEs:
- GO-2026-5037
- GO-2026-5038
- GO-2026-5039

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-06-04 08:49:17 +01:00
Aurélien Bombo
de5333f275 ci: remove Mariner annotations and use new config
This is a follow-up to #13126 where we forgot to remove this now-unused code.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-06-03 09:25:12 -05:00
stevenhorsman
51eee428f4 testing/webhook: bump golang.org/x dependencies
Bump golang.org/x/net from v0.53.0 to v0.55.0 and golang.org/x/sys
from v0.43.0 to v0.44.0 to resolve CVEs:
- GO-2026-5024
- GO-2026-5025
- GO-2026-5026
- GO-2026-5027
- GO-2026-5028
- GO-2026-5029
- GO-2026-5030

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-03 09:56:54 +01:00
Fabiano Fidêncio
230e01b04e Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs
runtime/runtime-rs: introduce Azure specific configs
2026-06-02 09:17:09 +02:00
Fabiano Fidêncio
57de50f43c Merge pull request #13141 from fidencio/topic/kata-deploy-fix-stale-containerd-import
kata-deploy: scrub stale containerd import on conf.d migration
2026-06-01 18:13:08 +02:00
Greg Kurz
8a49ecb159 Merge pull request #13097 from BbolroC/fix-shim-components-for-s390x
ci: Refactor boot-image-se build and update shim components
2026-06-01 11:43:42 +02:00
Fabiano Fidêncio
f788997253 kata-deploy: scrub stale containerd import on conf.d migration
Since the conf.d migration (containerd >= 2.2.0), kata-deploy writes its
drop-in to the auto-imported /etc/containerd/conf.d/ and no longer manages
the main config's `imports` array. A node upgraded from a pre-conf.d
kata-deploy keeps the legacy `{dest_dir}/containerd/config.d/kata-deploy.toml`
entry in `imports`, since the new code neither adds nor removes it.

On uninstall, remove_artifacts() deletes the artifacts dir (including the
file that import still points at) and then restarts containerd, which fails
to load the now-dangling import and wedges the node: pods get stuck
Terminating and new pods cannot start. This broke the lifecycle-manager E2E
tests (TC-02..TC-07) which repeatedly upgrade then reinstall across the
3.30.0 -> latest version boundary.

Defensively scrub the legacy import from the main containerd config in both
configure_containerd (at conf.d migration time) and cleanup_containerd
(before artifacts are removed and containerd is restarted). The helper is a
no-op when the config is absent, has no `imports` array, or does not contain
the legacy entry.
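
The no-op conditions of the scrub helper can be illustrated with a Python sketch that works on an already-parsed config. The real helper is Rust and edits the TOML file; `scrub_legacy_import` and the example path in the usage note are hypothetical.

```python
def scrub_legacy_import(config, legacy_entry):
    """Remove a legacy entry from the `imports` array of a parsed config.

    Returns (config, changed). `config` is None when the file is absent.
    """
    if config is None:
        return config, False
    imports = config.get("imports")
    if not isinstance(imports, list) or legacy_entry not in imports:
        # No `imports` array, or the legacy entry is not listed: nothing to do.
        return config, False
    cleaned = dict(config)
    cleaned["imports"] = [entry for entry in imports if entry != legacy_entry]
    return cleaned, True
```

Called with an entry such as `/opt/kata/containerd/config.d/kata-deploy.toml` (assuming a dest_dir of /opt/kata), it drops only that import and leaves any other imports in place.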

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-01 11:07:13 +02:00
Fabiano Fidêncio
02fd572195 Merge pull request #13134 from jojimt/rc-version
kata-deploy: Add a version annotation to runtimeclass
2026-06-01 08:21:30 +02:00
manuelh-dev
953b306ff3 Merge pull request #12979 from manuelh-dev/mahuber/erofs-tmpfs-mount
runtime-rs/agent: support EROFS snapshots without a rwlayer
2026-05-29 13:50:27 -07:00
Fabiano Fidêncio
f349d19bf4 Merge pull request #12956 from zvonkok/nvgpu-tarball-chart
build: add kata-deploy-publish target
2026-05-29 21:22:44 +02:00
Joji Mekkattuparamban
8549d71c6f kata-deploy: Add a version annotation to runtimeclass
Enables automation to determine the version with simple read RBAC
on the runtime class. Helpful when versions need to match with other
tools (e.g. genpolicy) or when simple version determination is needed
for other reasons.

Fixes #13123

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-05-29 10:50:19 -07:00
Zvonko Kaiser
7f906ec95d build: add kata-deploy-publish target
Mirror the CI payload publish flow in local builds, including image and
helm chart publishing, while reusing the same chart upload helper in
payload-after-push to avoid duplicated chart packaging logic.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-29 16:22:12 +02:00
Zvonko Kaiser
fb73ccc352 build: include kata-deploy static artifacts in nvgpu bundle
Build and package kata-deploy binary and nydus snapshotter component
tarballs as part of nvgpu-tarball so local publish can consume a single
kata-static.tar.zst without rebuilding extra artifacts.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-29 16:22:12 +02:00
Fabiano Fidêncio
9729ed9993 kernel: enable InfiniBand/RoCE support in mlx5 kernel config fragment
Add the kernel configuration options required for RDMA / RoCE operation
with Mellanox ConnectX / BlueField VFs:

  - CONFIG_INFINIBAND: IB subsystem core
  - CONFIG_INFINIBAND_ADDR_TRANS: RoCEv2 GID table management
  - CONFIG_INFINIBAND_USER_ACCESS: userspace verbs (/dev/infiniband/uverbs*)
  - CONFIG_INFINIBAND_USER_MAD: userspace MAD interface
  - CONFIG_MLX5_INFINIBAND: mlx5_ib ConnectX IB/RoCE driver
  - CONFIG_CGROUP_RDMA: RDMA cgroup controller (required by mlx5_ib)

Bump kata_config_version to 196 to trigger a kernel rebuild.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-29 13:07:45 +02:00
Hyounggyu Choi
640fa488a5 ci: Refactor boot-image-se build and update shim components
- Add FAKE_SE_IMAGE mode support in SE image build scripts for CI without real SE setup
- Simplify workflow by removing build-asset-boot-image-se job
- Integrate fake-boot-image-se into build matrix instead of separate job
- Skip attestation for fake-boot-image-se builds
- Update qemu-se and qemu-se-runtime-rs shim components to use:
  - rootfs-initrd-confidential instead of rootfs-image-confidential
  - boot-image-se component

This change streamlines the s390x SE build process and makes it easier
to test without requiring actual Secure Execution infrastructure.
This fixes deployment issues on non-TEE systems where TEE-specific artifacts
(like boot-image-se for IBM SEL) are not included in the kata-deploy image,
while ensuring TEE systems still get all required components.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-05-29 11:35:40 +02:00