kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	5959549645	release: do not publish a kata-monitor-job-dispatcher manifest The shared _publish_multiarch_manifest() helper always derived a "-job-dispatcher" registry from the registries it was given. However, the dispatcher is a kata-deploy-specific sidecar image, so when the helper was reused to publish the kata-monitor multi-arch manifest it wrongly tried to push a non-existent kata-monitor-job-dispatcher image. Let's gate the dispatcher derivation behind KATA_DEPLOY_PUBLISH_JOB_DISPATCHER (defaulting to true so the kata-deploy path is unchanged) and opt out of it when publishing the kata-monitor manifest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-15 16:29:29 +02:00
Hyounggyu Choi	59fd29fb33	Merge pull request #13225 from BbolroC/use-tar-in-exporting-kate-deploy-files packaging: Optimize kata-deploy build export using tar output	2026-06-15 14:47:01 +02:00
Hyounggyu Choi	46b8e9f027	packaging: Optimize kata-deploy build export using tar output Replace `type=local` with `type=tar` in kata-deploy build to reduce export time and avoid build hangs during the export-to-client-directory phase. Update callers to extract binaries directly from the tar archive instead of copying from an intermediate directory. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-06-15 11:52:33 +02:00
Steve Horsman	ea999aa033	Merge pull request #13221 from manuelh-dev/mahuber/nydus-root-export kata-deploy: export nydus snapshotter root	2026-06-15 08:55:08 +01:00
Manuel Huber	639420e7f5	kata-deploy: export nydus snapshotter root containerd uses the proxy plugin root export when reporting CRI image filesystem paths. Without this export, the CRI plugin falls back to /var/lib/containerd/io.containerd.snapshotter.v1.<snapshotter>. For nydus-for-kata-tee this fallback does not match the actual snapshotter root under /var/lib/nydus-for-kata-tee. Kubelet/cAdvisor then fails stats collection when it tries to inspect the nonexistent fallback path. Export the nydus proxy snapshotter root so containerd reports the real filesystem path for resource accounting. When using trusted ephemeral storage or a new ephemeral storage wip feature for providing plain disks, resource accounting would not kick in and pods which exhausted their emptyDir sizeLimits would not get evicted. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-06-12 19:06:01 +02:00
Fabiano Fidêncio	aa27490801	kata-deploy: track distroless static base by tag, not digest The kata-deploy main image pinned its gcr.io/distroless/static-debian13 base by sha256 digest. distroless does not publish versioned tags, so a pinned digest just goes stale with no clear upgrade path. Track the rolling tag instead (guarded with a hadolint DL3007 ignore plus a comment explaining why), matching the kata-deploy-job-dispatcher image base. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	c23fe11529	kata-deploy: make verification Job aware of job deployment mode The verification Job assumed the DaemonSet model: it waited for the DaemonSet to exist, for its pods, and for `rollout status daemonset/...`, then required every node in the cluster to be labeled. None of that holds for deploymentMode: job, where install happens via the dispatcher and the per-node Jobs it fans out, and only the targeted (worker) nodes get labeled. Make the hook mode-aware: - Hook weight: in job mode the install dispatcher runs as a post-install hook at weight 5, so verification now runs at weight 10 (after it); daemonset mode keeps weight 0 (the DaemonSet is a normal resource). - Readiness wait: in job mode, wait for the install dispatcher Job to complete and then for the per-node install Jobs (kata-deploy/stage=install) to finish (with the same CRI-restart retry logic) instead of a DaemonSet rollout. - Label check: in job mode, verify exactly the nodes the dispatcher targeted are labeled, rather than comparing the labeled count against all nodes in the cluster. - Grant the verification ClusterRole read access to batch/jobs (used by the job-mode waits; harmless in daemonset mode). The daemonset code path is unchanged and the default render (no verification.pod) is byte-for-byte identical. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	3d732986d2	kata-deploy: add per-node staged cleanup for job mode Add the uninstall counterpart to the install dispatcher for deploymentMode: job. On `helm uninstall`, a single pre-delete hook Job runs the kata-deploy-job-dispatcher, which enumerates the targeted nodes live and fans out one node-pinned cleanup Job per node that runs the install pipeline in reverse and exits: unlabel -> revert-cri (initContainers, run sequentially) remove-artifacts (main container) Running as a pre-delete hook means the dispatcher ServiceAccount/RBAC and the kata-deploy host-mutation RBAC still exist while the Jobs run, so the unlabel stage retains node get/patch access. revert-cri and remove-artifacts are host-only operations (privileged nsenter / host mount) and need no extra cluster RBAC. Ordering mirrors install in reverse: unlabel first so the scheduler stops placing kata workloads here, then revert the CRI config + restart the runtime, then remove the on-host artifacts. Each stage is idempotent and skips when already undone, so partially-installed nodes and re-runs are safe. Uninstall node selection is deliberately SEPARATE from install (a dedicated job.cleanup.* block) and defaults to every node carrying the katacontainers.io/kata-runtime label (set by the install label stage) rather than re-evaluating the install selector. Because the cleanup dispatcher resolves nodes live when it runs, this stays robust to install-time selector drift (relabeled nodes, etc.) while remaining fully overridable via job.cleanup.nodes / job.cleanup.nodeSelector / job.cleanup.nodeSelectorExpressions. The default (daemonset) mode is unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	54878fa373	kata-deploy: add job deployment mode driven by the job-dispatcher Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in `deploymentMode: job` that installs Kata via short-lived, per-node install Jobs instead of the long-running DaemonSet. The DaemonSet remains the default and is now gated behind `deploymentMode == daemonset`. Rather than render one Job per node into the Helm release (which grows the release secret O(nodes) and offers no rollout pacing), job mode ships a single tiny post-install/post-upgrade hook Job that runs the kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes LIVE from the API server and stamps out one node-pinned install Job per node from a constant-size ConfigMap of Job templates, keeping at most `job.parallelism` in flight and refilling as they finish. This guarantees per-node coverage with a paced rollout while the Helm release stays O(1) regardless of fleet size. New nodes are picked up by re-running `helm upgrade`; there is no always-on component. Each per-node Job runs the staged install pipeline as ordered initContainers and exits: host-check -> artifacts -> cri (initContainers, run sequentially) label (main container) The privilege split is explicit: the dispatcher pod is a pure control-plane client (lists nodes, manages Jobs in its own namespace) and runs fully unprivileged under a dedicated, least-privilege ServiceAccount (kata-rbac.yaml); only the per-node Jobs it creates carry the privileged kata-deploy host-mutation rights. Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob): - job.nodes: explicit node-name list passed to the dispatcher, and - job.nodeSelector (equality map) ANDed with - job.nodeSelectorExpressions (k8s label-selector requirements: In / NotIn / Exists / DoesNotExist), compiled into a single label-selector string the dispatcher resolves live. The default expressions target worker (non-control-plane) nodes, so no custom node labeling is required; set the expressions to [] to target all discovered nodes. Reuses the commonEnv/commonVolume* helpers and adds the stageContainer, serviceAccountName, dispatcherServiceAccountName, dispatcherImage and perNodeJob helpers shared by the dispatcher and the staged Jobs. The default (daemonset) render is unchanged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	28fce44b70	kata-deploy: extract shared pod env/volumes into helm helpers Pull the kata-deploy container's environment block and host volume/volumeMount definitions out of the DaemonSet template into reusable named templates in _helpers.tpl: - kata-deploy.commonEnv - kata-deploy.commonVolumeMounts - kata-deploy.commonVolumes These are derived purely from chart values and are independent of the deployment model, so they can be shared verbatim by upcoming per-node install/cleanup Jobs without duplicating the (large) env wiring. Pure refactor: the rendered DaemonSet is byte-for-byte identical to before (verified via normalized `helm template` diff across default and multiInstallSuffix/userDropIn/customRuntimes permutations). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	225ff2209e	kata-deploy: split install/cleanup into staged actions Phase 1 of migrating kata-deploy from a DaemonSet to a staged JobSet workflow: refactor the binary's install/cleanup flows into discrete, independently invocable stages while keeping the existing DaemonSet path fully working. Add new staged subcommands that each run one step and exit, so a JobSet can drive them as ordered initContainers/Jobs per node: install: host-check -> artifacts -> cri -> label cleanup (reverse): unlabel -> revert-cri -> remove-artifacts `install` becomes a compatibility wrapper composing the install stages in the canonical order, so the DaemonSet deployment model is unchanged. The DaemonSet `cleanup` (with its DaemonSet-presence gating) is left intact; the staged cleanup actions are added alongside it and skip that gating since the JobSet workflow only schedules them on a real uninstall. Each stage has an idempotent skip check so reruns are safe: - install label / cleanup unlabel: short-circuit via the node label - cleanup remove-artifacts: skip when the install dir is already gone - cleanup revert-cri: skip the disruptive runtime restart when the CRI drop-ins are already absent (new cri_drop_in_present helper) Introduce a shared KATA_RUNTIME_LABEL constant and add rstest-based tests covering the subcommand-name -> Action mapping, rejection of unknown actions, and the visible/hidden help semantics. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	d4205c7fcc	kata-deploy: build and publish the kata-deploy-job-dispatcher image Package and ship the dispatcher built in the previous commit so the job-mode Helm chart has an image to run. - Dockerfile.components: build kata-deploy and kata-deploy-job-dispatcher from the same rust-builder stage (one compile), and run fmt/clippy/ test for both crates. - job-dispatcher/Dockerfile: a minimal distroless/static image containing only the dispatcher binary and CA certs - it is an API client, so it needs nothing from the host. - local-build: kata-deploy-job-dispatcher becomes its own build component with its own static tarball (kata-deploy-static-kata-deploy-job-dispatcher.tar.zst); the shared rust-builder output is reused so the two components do not recompile the workspace locally. The payload script builds and pushes a separate "<kata-deploy registry>-job-dispatcher" image with the same tag scheme, and release.sh publishes its multi-arch manifest symmetrically. - CI: add kata-deploy-job-dispatcher to the build-kata-deploy-components matrices (its tarball is picked up by the existing kata-artifacts-* glob), and gate it in the kata-deploy rust static checks. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	87d27e0cc8	kata-deploy-job-dispatcher: add generic per-node Job dispatcher Add a small, deployment-agnostic dispatcher binary that runs exactly one Kubernetes Job per selected node and paces the rollout, so callers get guaranteed per-node coverage without encoding the fan-out in Helm. Motivation: templating one Job per node into a Helm release does not scale (the release Secret hits etcd's 1 MiB limit and hooks run sequentially), and a single Indexed Job cannot guarantee per-node coverage when paced - the scheduler ignores completed pods when evaluating topology spread, so nodes get uneven numbers of pods. A tiny dispatcher that enumerates nodes live and creates node-pinned Jobs itself sidesteps both problems and keeps the Helm release O(1) in fleet size. The dispatcher: - enumerates target nodes live (explicit --nodes list or --node-selector label selector), paginating the API; - stamps out one Job per node from a YAML template, pinning it with nodeName and an owner label for server-side filtering; - keeps at most --parallelism Jobs in flight, refilling as they finish, and sets an OwnerReference to the owner Job so the per-node Jobs are garbage-collected with it; - is a plain API client (kube): it never touches the host, so it can run fully unprivileged. Node membership is resolved live on each run, not frozen at Helm template-render time: re-running the dispatcher (e.g. via `helm upgrade`) picks up nodes added since the last run and skips ones already done, as the per-node stages are idempotent. The dispatcher is one-shot, however - it does not watch the API, so nodes added while it is not running are only covered by the next run. job.rs holds the pure helpers (node-name sanitization, deterministic Job naming, template instantiation, status interpretation) with rstest unit tests; main.rs wires up the CLI and the fan-out loop. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	56da8097c2	Merge pull request #13204 from fidencio/topic/versions-bump-qemu versions: Bump QEMU to 11.0.1	2026-06-12 17:14:57 +02:00
Fabiano Fidêncio	110843d6e1	Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal runtime(-rs): remove file_mem_backend config option	2026-06-12 17:13:04 +02:00
Greg Kurz	eac5dd2907	generate_vendor: Fix heavily broken logic While checking the content of the vendor tarball artifact in the 3.31.0 release page, I realized that it is lacking most of the rust code and all the go code. It turns out that the script is badly broken in many ways : 1. Cargo workspace conflicts: Vendored dependencies were treated as workspace members, causing "current package believes it's in a workspace when it's not" errors. Fixed by adding vendor directory exclusions to root Cargo.toml. 2. Missing Go vendoring: Script only searched for Cargo.lock files, never processing go.mod files despite having a case statement for them. Fixed by adding go.mod to the find command with '-o -name go.mod'. 3. Wrong tar execution directory: Script ran tar from release/ directory but vendor_dir_list contained paths relative to repo root (./vendor, ./src/agent/vendor, etc.), causing "Cannot stat" errors. Fixed by moving tar command before final popd. 4. Relative tarball path: Since tar now runs from repo root, converted tarball path to absolute to ensure it's created in the release directory. 5. Vendored go.mod pollution: Added '-path ./vendor -prune' to find command to exclude vendor directory, preventing the script from finding go.mod files inside vendored Rust dependencies. The fixes are simple enough they can be squashed into a single commit. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2026-06-12 10:06:53 +02:00
Manuel Huber	70d8f1bf3d	runtime: remove file_mem_backend config option Remove the Go runtime file_mem_backend and valid_file_mem_backends config knobs, along with the corresponding sandbox annotation handling. The runtime still enables file-backed shared memory automatically for virtio-fs by using /dev/shm as the backing directory. This only removes the user-selectable backend path. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-12 00:07:16 +00:00
Fabiano Fidêncio	46add95802	versions: Bump QEMU to 11.0.1 Bump QEMU to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 22:01:26 +02:00
stevenhorsman	1d854ad7af	ci: Update required tests publish-kata-deploy-payload got renamed in #13107, which broke the CI. Now, instead of tracking all those intermediate steps, let's make sure we only track the tests themselves. Signed-off-by: stevenhorsman <steven@uk.ibm.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 19:02:23 +02:00
Fabiano Fidêncio	5731d30554	helm: add optional kata-monitor deployment to kata-deploy Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart, including image/configuration values so operators can enable monitor shipping as part of the same deployment workflow when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	0d6234e7be	ci: share kata image publishing workflows Unify kata-deploy and kata-monitor image publishing behind a single reusable workflow, and rename workflow files to generic kata-images names. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	e04a4326ec	tools: build kata-monitor image from shim-v2-go tarball Build kata-monitor images by extracting the binary from the shim-v2-go tarball and shipping it on top of gcr.io/distroless/static-debian13. Because the binary is built inside an Ubuntu (glibc) toolchain it cannot run on a pure musl/alpine base — users hit __fprintf_chk / __vfprintf_chk relocation errors. To get a small, distroless runtime image we use the same pattern as tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries the binary needs (plus the dynamic linker) via ldd from a glibc base image. In order to do so, we also added a helper script to build and publish architecture-specific monitor images from tarball artifacts. Reported-by: Steve Linde <stevenlinde@google.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	ac2221a6a5	Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3 versions: Bump containerd to 2.3	2026-06-09 08:21:58 +02:00
Fabiano Fidêncio	48ebbbec3a	kata-deploy: honor debug mode with CLI log-level Make the chart pass --log-level debug automatically when debug=true so CI and troubleshooting runs emit full rendered config dumps without requiring a separate log-level override. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	b63494345d	kata-deploy: add configurable verbosity for full CRI config dumps Allow operators to force kata-deploy log verbosity and emit the fully rendered containerd/CRI-O config and drop-in files in debug mode so install troubleshooting can rely on exact effective configuration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	fc08218f55	gatekeeper: rename required tests to minimum/latest The containerd_version matrix values were renamed from lts/active to minimum/latest, which changes the generated CI job names. Update the required-tests list so the gatekeeper waits on the checks that are actually produced. The amd64 run-containerd-stability, run-nydus, run-cri-containerd and free-runner run-k8s-tests jobs map lts -> minimum and active -> latest. The s390x cri-containerd job maps active -> latest, matching its updated matrix. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	b119b051cb	kata-deploy: support drop-in configs for default runtimes Allow operators to provide per-shim drop-in TOML for built-in runtimes and reconcile stale override files so upgrades and migrations remain safe when drop-ins are added or removed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex	2026-06-08 13:31:03 +02:00
Fabiano Fidêncio	1ca7129581	Merge pull request #13176 from Amulyam24/kata-deploy-fix kata-deploy: add the imports directive explicitly if expected but not found	2026-06-05 22:24:16 +02:00
Fabiano Fidêncio	f6ff9578d4	Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner ci: remove Mariner annotations and use new config	2026-06-05 20:22:58 +02:00
Fabiano Fidêncio	e9ee97f751	kata-deploy: inherit custom RuntimeClass overhead from baseConfig Default custom runtime RuntimeClass overhead.podFixed to the selected baseConfig values, so equivalent runtimes behave consistently without repeating boilerplate. In case the user wants to enforce that no overhead is set on the custom RuntimeClass, disable inheritance with inheritBaseOverhead=false. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-05 17:22:25 +02:00
Amulyam24	b15a5fbe36	kata-deploy: add the imports directive explicitly if expected but not found For containerd v2.2+, the flow assumes that the imports directive would be present. It is better to check it and add if it doesn't exist. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-06-05 18:47:07 +05:30
Steve Horsman	1624ebe362	Merge pull request #13135 from kata-containers/dependabot/cargo/tar-0.4.46 build(deps): bump tar from 0.4.45 to 0.4.46	2026-06-05 09:44:46 +01:00
Fabiano Fidêncio	743b0a4839	Merge pull request #13165 from stevenhorsman/bump-go-to-1.25.11 versions: bump golang to 1.25.11	2026-06-04 20:24:57 +02:00
stevenhorsman	81c7dde0ae	ci: Remove kata-monitor test from required The kata-monitor test is currently failing and is running a very EoL version of cri-o. This area is being actively reworked in #13107, so remove this and then once kata-monitor tests are stable we can re-add the new versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-04 14:40:17 +01:00
dependabot[bot]	4ab63d0a5d	build(deps): bump tar from 0.4.45 to 0.4.46 Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46. - [Release notes](https://github.com/composefs/tar-rs/releases) - [Commits](https://github.com/composefs/tar-rs/compare/0.4.45...0.4.46) --- updated-dependencies: - dependency-name: tar dependency-version: 0.4.46 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-04 07:52:44 +00:00
stevenhorsman	879912be25	versions: bump golang to 1.25.11 Bump the go version to resolve CVEs: - GO-2026-5037 - GO-2026-5038 - GO-2026-5039 Signed-off-by: stevenhorsman <steven@uk.ibm.com> Generated-By: IBM Bob	2026-06-04 08:49:17 +01:00
Aurélien Bombo	de5333f275	ci: remove Mariner annotations and use new config This is a follow-up to #13126 where we forgot to remove this now-unused code. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-03 09:25:12 -05:00
stevenhorsman	51eee428f4	testing/webhook: bump golang.org/x dependencies Bump golang.org/x/net from v0.53.0 to v0.55.0 and golang.org/x/sys from v0.43.0 to v0.44.0 to resolve CVEs: - GO-2026-5024 - GO-2026-5025 - GO-2026-5026 - GO-2026-5027 - GO-2026-5028 - GO-2026-5029 - GO-2026-5030 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-03 09:56:54 +01:00
Fabiano Fidêncio	230e01b04e	Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs runtime/runtime-rs: introduce Azure specific configs	2026-06-02 09:17:09 +02:00
Fabiano Fidêncio	57de50f43c	Merge pull request #13141 from fidencio/topic/kata-deploy-fix-stale-containerd-import kata-deploy: scrub stale containerd import on conf.d migration	2026-06-01 18:13:08 +02:00
Greg Kurz	8a49ecb159	Merge pull request #13097 from BbolroC/fix-shim-components-for-s390x ci: Refactor boot-image-se build and update shim components	2026-06-01 11:43:42 +02:00
Fabiano Fidêncio	f788997253	kata-deploy: scrub stale containerd import on conf.d migration Since the conf.d migration (containerd >= 2.2.0), kata-deploy writes its drop-in to the auto-imported /etc/containerd/conf.d/ and no longer manages the main config's `imports` array. A node upgraded from a pre-conf.d kata-deploy keeps the legacy `{dest_dir}/containerd/config.d/kata-deploy.toml` entry in `imports`, since the new code neither adds nor removes it. On uninstall, remove_artifacts() deletes the artifacts dir (including the file that import still points at) and then restarts containerd, which fails to load the now-dangling import and wedges the node: pods get stuck Terminating and new pods cannot start. This broke the lifecycle-manager E2E tests (TC-02..TC-07) which repeatedly upgrade then reinstall across the 3.30.0 -> latest version boundary. Defensively scrub the legacy import from the main containerd config in both configure_containerd (at conf.d migration time) and cleanup_containerd (before artifacts are removed and containerd is restarted). The helper is a no-op when the config is absent, has no `imports` array, or does not contain the legacy entry. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-01 11:07:13 +02:00
Fabiano Fidêncio	02fd572195	Merge pull request #13134 from jojimt/rc-version kata-deploy: Add a version annotation to runtimeclass	2026-06-01 08:21:30 +02:00
manuelh-dev	953b306ff3	Merge pull request #12979 from manuelh-dev/mahuber/erofs-tmpfs-mount runtime-rs/agent: support EROFS snapshots without a rwlayer	2026-05-29 13:50:27 -07:00
Fabiano Fidêncio	f349d19bf4	Merge pull request #12956 from zvonkok/nvgpu-tarball-chart build: add kata-deploy-publish target	2026-05-29 21:22:44 +02:00
Joji Mekkattuparamban	8549d71c6f	kata-deploy: Add a version annotation to runtimeclass Enables automations to determine version with a simple read RBAC on the runtime class. Helpful when versions need to match with other tools (e.g. genpolicy) or when simple version determination is needed for other reasons. Fixes #13123 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-05-29 10:50:19 -07:00
Zvonko Kaiser	7f906ec95d	build: add kata-deploy-publish target Mirror the CI payload publish flow in local builds, including image and helm chart publishing, while reusing the same chart upload helper in payload-after-push to avoid duplicated chart packaging logic. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-29 16:22:12 +02:00
Zvonko Kaiser	fb73ccc352	build: include kata-deploy static artifacts in nvgpu bundle Build and package kata-deploy binary and nydus snapshotter component tarballs as part of nvgpu-tarball so local publish can consume a single kata-static.tar.zst without rebuilding extra artifacts. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-29 16:22:12 +02:00
Fabiano Fidêncio	9729ed9993	kernel: enable InfiniBand/RoCE support in mlx5 kernel config fragment Add the kernel configuration options required for RDMA / RoCE operation with Mellanox ConnectX / BlueField VFs: - CONFIG_INFINIBAND: IB subsystem core - CONFIG_INFINIBAND_ADDR_TRANS: RoCEv2 GID table management - CONFIG_INFINIBAND_USER_ACCESS: userspace verbs (/dev/infiniband/uverbs*) - CONFIG_INFINIBAND_USER_MAD: userspace MAD interface - CONFIG_MLX5_INFINIBAND: mlx5_ib ConnectX IB/RoCE driver - CONFIG_CGROUP_RDMA: RDMA cgroup controller (required by mlx5_ib) Bump kata_config_version to 196 to trigger a kernel rebuild. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-29 13:07:45 +02:00
Hyounggyu Choi	640fa488a5	ci: Refactor boot-image-se build and update shim components - Add FAKE_SE_IMAGE mode support in SE image build scripts for CI without real SE setup - Simplify workflow by removing build-asset-boot-image-se job - Integrate fake-boot-image-se into build matrix instead of separate job - Skip attestation for fake-boot-image-se builds - Update qemu-se and qemu-se-runtime-rs shim components to use: - rootfs-initrd-confidential instead of rootfs-image-confidential - boot-image-se component This change streamlines the s390x SE build process and makes it easier to test without requiring actual Secure Execution infrastructure. This fixes deployment issues on non-TEE systems where TEE-specific artifacts (like boot-image-se for IBM SEL) are not included in the kata-deploy image, while ensuring TEE systems still get all required components. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-05-29 11:35:40 +02:00
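
Note on the paced fan-out described in commit 87d27e0cc8: the message says the dispatcher creates one Job per node, keeps at most --parallelism Jobs in flight, and refills as they finish. The sketch below illustrates only that pacing idea. It is not the dispatcher's code (which, per the commit, is a Rust binary using kube), it does not talk to Kubernetes, and the names `dispatch`, `create_job` and `wait_for_any` are hypothetical, chosen for this illustration.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def dispatch(
    nodes: Iterable[str],
    parallelism: int,
    create_job: Callable[[str], None],
    wait_for_any: Callable[[set[str]], str],
) -> int:
    """Run one job per node with at most `parallelism` jobs in flight.

    `create_job` and `wait_for_any` are hypothetical callbacks standing in
    for real API calls. Returns the peak number of concurrent jobs.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")

    # De-duplicate node names while keeping their original order.
    pending = deque(dict.fromkeys(nodes))
    in_flight: set[str] = set()
    peak = 0

    while pending or in_flight:
        # Refill: start jobs until the window is full or no nodes remain.
        while pending and len(in_flight) < parallelism:
            node = pending.popleft()
            create_job(node)
            in_flight.add(node)
        peak = max(peak, len(in_flight))

        # Wait for one in-flight job to finish, then free its slot.
        finished = wait_for_any(in_flight)
        in_flight.remove(finished)

    return peak


# Stand-in for a cluster: the job that was started first finishes first.
started: list[str] = []
peak = dispatch(
    ["node-a", "node-b", "node-c", "node-d", "node-e"],
    2,
    started.append,
    lambda running: next(n for n in started if n in running),
)
print(started, peak)
```

Because the node list is an argument evaluated on each call, re-running `dispatch` with a fresh list mirrors the "resolved live on each run" behaviour the commit describes; skipping nodes that are already done would rely on the per-node work being idempotent, which this sketch does not model.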

2429 Commits