kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 07:02:16 +00:00

Author	SHA1	Message	Date
Alex Lyn	e77795f573	ci: Update libs required-test names for libdevmapper dependency Update the two affected entries in required-tests.yaml accordingly so the gatekeeper keeps matching them instead of blocking subsequent PRs after this one merges. Co-authored-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:52:47 +08:00
Alex Lyn	adcbef0c53	kata-deploy: Configure containerd erofs for dm-verity integrity mode The deploy will read EROFS_SNAPSHOTTER_MODE and EROFS_DMVERITY from the environment to enable dmverity_mode and enable_dmverity in the containerd erofs snapshotter/differ config. Add validation for the mode value and use an explicit 300s timeout for node-readiness checks during kata-deploy in GitHub CI. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	562c9acdb2	packaging: Add libdevmapper-dev and GNU target to agent Dockerfile Install libdevmapper-dev and pkg-config in the agent build container so devicemapper-sys can link against libdevmapper. Add the GNU libc rustup target alongside musl since USE_DEVMAPPER forces LIBC=gnu. Forward USE_DEVMAPPER through build.sh and build-static-agent.sh. To build kata-agent with device-mapper support, run `make LIBC=gnu USE_DEVMAPPER=yes`. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	b084c0df36	kata-deploy: Forward USE_DEVMAPPER in local build scripts Pass USE_DEVMAPPER through the Docker environment in local build scripts. Extract the OCI tag sanitization logic into a public helper, sanitize_tag_component, to keep push and pull paths consistent. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	2dd9426029	kata-deploy: Add erofsSnapshotterMode helm value and integrity mode Expose erofsSnapshotterMode in the helm chart values and render it as the EROFS_SNAPSHOTTER_MODE environment variable in the kata-deploy pod. Update gha-run-k8s-common.sh to load dm-mod/dm-verity kernel modules and configure the erofs default size when the mode is "integrity". Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Fabiano Fidêncio	a664595084	kata-deploy: bump qemu RuntimeClass overhead for the aarch64 VMM With sandbox_cgroup_only the shim, QEMU and virtiofsd run inside the pod's memory cgroup, whose limit is the workload limit plus the RuntimeClass pod overhead. On aarch64 the VMM host footprint is much larger than on x86 (QEMU's own anon RSS is ~160Mi+ before any guest RAM, on top of the shmem-backed guest memory), so the 160Mi overhead is too small: small-memory-limit pods get their qemu-system process OOM-killed by the pod cgroup (CONSTRAINT_MEMCG), and the agent vsock never comes up (ENODEV), so the sandbox fails to start. Raise the pod overhead to 320Mi for the qemu shims that run on aarch64 (qemu, qemu-runtime-rs, qemu-coco-dev-runtime-rs). The value is applied on all architectures for simplicity; x86 is over-provisioned by ~160Mi, which is acceptable. The TEE/GPU shims already carry far larger overhead and amd64-only shims (clh*, dragonball, fc) are unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Aurélien Bombo	1217dd1584	Merge pull request #12373 from kata-containers/disable-guest-empty-dir runtime: Set `disable_guest_empty_dir = true` by default	2026-06-24 20:09:46 -05:00
Aurélien Bombo	10cf6816aa	kernel: Fix FUSE crash with host emptyDir This patch was submitted by Miklos Szeredi: https://lore.kernel.org/fuse-devel/20260528142306.1792392-1-mszeredi@redhat.com/ It fixes a FUSE oops with the k8s-shared-volume.bats test. Fixes: #12589 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
LandonTClipp	85e828cc9b	docs: Add AI agent skill for doc contributions This skill will inform AI agents how to properly write and format docs in the new docs system. There is nothing too fancy, just reminding agents to use mkdocs-materialx features instead of treating the markdown like the legacy Github-based format. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-23 08:57:37 +01:00
Fabiano Fidêncio	9761ea2235	Merge pull request #13164 from manuelh-dev/mahuber/remove-resource-requests tests: use limits for Kata workload resources	2026-06-22 20:01:33 +02:00
Fabiano Fidêncio	f1ebefcdfb	Merge pull request #13222 from fidencio/topic/nvidia-switch-to-kata-deploy-jobs kata-deploy: nvidia: Default to the Job-based deployment mode	2026-06-22 12:55:10 +02:00
Fabiano Fidêncio	dc70b93573	release: Bump version to 3.32.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-22 01:15:24 +02:00
Fabiano Fidêncio	374a867774	Merge pull request #13196 from microsoft/cameronbaird/upstream/runtime-go-clh-templating runtime: Enable VM Templating Support for CLH	2026-06-21 16:31:19 +02:00
Cameron Baird	65a5f272f8	ci: Introduce tests for VM template factory Add k8s-vm-templating-test.bats which exercises pod create with the factory initialized on the target node. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2026-06-19 18:00:02 +00:00
Manuel Huber	aafd16515c	tests: use limits for Kata workload manifests Kata sizes VM CPU and memory from OCI limits, not Kubernetes resource requests. Requests are consumed by the Kubernetes control plane, but they do not drive Kata VM or sandbox sizing today. Convert the straightforward Kata workload manifests and kata-deploy examples from resource requests to limits so the declared resources match the values Kata uses for VM provisioning. Keep requests where the fixture intentionally validates Kubernetes request/limit behavior. Update fixture expectations affected by the conversion. The LimitRange fixture is limit-only at 500m. Raise the policy deployment limits to 500m and 800Mi. These tests boot CoCo/runtime-rs sandboxes with policy/initdata, and the former 100m/100Mi values became real runtime limits after the conversion, which is too constrained for the CI environments. Leave PVC storage requests, explicit request/limit validation fixtures, the env resourceFieldRef request, and non-Kata workload examples unchanged where requests are handled outside the Kata shim resource sizing path. If Kata later grows request-aware sandbox sizing, for example through Sandbox API based resource plumbing, these requests can be reintroduced where they carry the intended semantics. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-19 09:38:15 -07:00
Greg Kurz	c3d98fe323	osbuilder: Simplify version fetching `tools/osbuilder/VERSION` points to the root `VERSION` file, just like the code does. Use that file. Signed-off-by: Greg Kurz <groug@kaod.org>	2026-06-17 10:08:23 +02:00
Fabiano Fidêncio	5959549645	release: do not publish a kata-monitor-job-dispatcher manifest The shared _publish_multiarch_manifest() helper always derived a "-job-dispatcher" registry from the registries it was given. However, the dispatcher is a kata-deploy-specific sidecar image, so when the helper was reused to publish the kata-monitor multi-arch manifest it wrongly tried to push a non-existent kata-monitor-job-dispatcher image. Let's gate the dispatcher derivation behind KATA_DEPLOY_PUBLISH_JOB_DISPATCHER (defaulting to true so the kata-deploy path is unchanged) and opt out of it when publishing the kata-monitor manifest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-15 16:29:29 +02:00
Hyounggyu Choi	59fd29fb33	Merge pull request #13225 from BbolroC/use-tar-in-exporting-kate-deploy-files packaging: Optimize kata-deploy build export using tar output	2026-06-15 14:47:01 +02:00
Hyounggyu Choi	46b8e9f027	packaging: Optimize kata-deploy build export using tar output Replace `type=local` with `type=tar` in kata-deploy build to reduce export time and avoid build hangs during the export-to-client-directory phase. Update callers to extract binaries directly from the tar archive instead of copying from an intermediate directory. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-06-15 11:52:33 +02:00
Steve Horsman	ea999aa033	Merge pull request #13221 from manuelh-dev/mahuber/nydus-root-export kata-deploy: export nydus snapshotter root	2026-06-15 08:55:08 +01:00
Fabiano Fidêncio	fefc0b75ab	kata-deploy: nvidia: Default to the Job-based deployment mode Switch the NVIDIA GPU example values file to install Kata via the Job-based deployment mode (deploymentMode: job) instead of the always-on, privileged DaemonSet, so that nothing keeps running on the node once the install completes. To exercise this in our CI, make the helm_helper aware of the deployment mode coming from the (base) values file: - In "job" mode, clear job.nodeSelectorExpressions so the dispatcher targets every discovered node. Our CI clusters are typically single-node, where the only node carries the control-plane label, and the default selector excludes control-plane/master nodes. - There is no always-on DaemonSet to wait on in "job" mode. The dispatcher runs as a blocking post-install hook and the final per-node stage labels the node, so wait until at least one node carries the katacontainers.io/kata-runtime label as the "install complete" signal (dumping Job/pod logs on timeout). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 22:55:11 +02:00
Manuel Huber	639420e7f5	kata-deploy: export nydus snapshotter root containerd uses the proxy plugin root export when reporting CRI image filesystem paths. Without this export, the CRI plugin falls back to /var/lib/containerd/io.containerd.snapshotter.v1.<snapshotter>. For nydus-for-kata-tee this fallback does not match the actual snapshotter root under /var/lib/nydus-for-kata-tee. Kubelet/cAdvisor then fails stats collection when it tries to inspect the nonexistent fallback path. Export the nydus proxy snapshotter root so containerd reports the real filesystem path for resource accounting. When using trusted ephemeral storage or a new ephemeral storage wip feature for providing plain disks, resource accounting would not kick in and pods which exhausted their emptyDir sizeLimits would not get evicted. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-06-12 19:06:01 +02:00
Fabiano Fidêncio	aa27490801	kata-deploy: track distroless static base by tag, not digest The kata-deploy main image pinned its gcr.io/distroless/static-debian13 base by sha256 digest. distroless does not publish versioned tags, so a pinned digest just goes stale with no clear upgrade path. Track the rolling tag instead (guarded with a hadolint DL3007 ignore plus a comment explaining why), matching the kata-deploy-job-dispatcher image base. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	c23fe11529	kata-deploy: make verification Job aware of job deployment mode The verification Job assumed the DaemonSet model: it waited for the DaemonSet to exist, for its pods, and for `rollout status daemonset/...`, then required every node in the cluster to be labeled. None of that holds for deploymentMode: job, where install happens via the dispatcher and the per-node Jobs it fans out, and only the targeted (worker) nodes get labeled. Make the hook mode-aware: - Hook weight: in job mode the install dispatcher runs as a post-install hook at weight 5, so verification now runs at weight 10 (after it); daemonset mode keeps weight 0 (the DaemonSet is a normal resource). - Readiness wait: in job mode, wait for the install dispatcher Job to complete and then for the per-node install Jobs (kata-deploy/stage=install) to finish (with the same CRI-restart retry logic) instead of a DaemonSet rollout. - Label check: in job mode, verify exactly the nodes the dispatcher targeted are labeled, rather than comparing the labeled count against all nodes in the cluster. - Grant the verification ClusterRole read access to batch/jobs (used by the job-mode waits; harmless in daemonset mode). The daemonset code path is unchanged and the default render (no verification.pod) is byte-for-byte identical. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	3d732986d2	kata-deploy: add per-node staged cleanup for job mode Add the uninstall counterpart to the install dispatcher for deploymentMode: job. On `helm uninstall`, a single pre-delete hook Job runs the kata-deploy-job-dispatcher, which enumerates the targeted nodes live and fans out one node-pinned cleanup Job per node that runs the install pipeline in reverse and exits: unlabel -> revert-cri (initContainers, run sequentially) remove-artifacts (main container) Running as a pre-delete hook means the dispatcher ServiceAccount/RBAC and the kata-deploy host-mutation RBAC still exist while the Jobs run, so the unlabel stage retains node get/patch access. revert-cri and remove-artifacts are host-only operations (privileged nsenter / host mount) and need no extra cluster RBAC. Ordering mirrors install in reverse: unlabel first so the scheduler stops placing kata workloads here, then revert the CRI config + restart the runtime, then remove the on-host artifacts. Each stage is idempotent and skips when already undone, so partially-installed nodes and re-runs are safe. Uninstall node selection is deliberately SEPARATE from install (a dedicated job.cleanup.* block) and defaults to every node carrying the katacontainers.io/kata-runtime label (set by the install label stage) rather than re-evaluating the install selector. Because the cleanup dispatcher resolves nodes live when it runs, this stays robust to install-time selector drift (relabeled nodes, etc.) while remaining fully overridable via job.cleanup.nodes / job.cleanup.nodeSelector / job.cleanup.nodeSelectorExpressions. The default (daemonset) mode is unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	54878fa373	kata-deploy: add job deployment mode driven by the job-dispatcher Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in `deploymentMode: job` that installs Kata via short-lived, per-node install Jobs instead of the long-running DaemonSet. The DaemonSet remains the default and is now gated behind `deploymentMode == daemonset`. Rather than render one Job per node into the Helm release (which grows the release secret O(nodes) and offers no rollout pacing), job mode ships a single tiny post-install/post-upgrade hook Job that runs the kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes LIVE from the API server and stamps out one node-pinned install Job per node from a constant-size ConfigMap of Job templates, keeping at most `job.parallelism` in flight and refilling as they finish. This guarantees per-node coverage with a paced rollout while the Helm release stays O(1) regardless of fleet size. New nodes are picked up by re-running `helm upgrade`; there is no always-on component. Each per-node Job runs the staged install pipeline as ordered initContainers and exits: host-check -> artifacts -> cri (initContainers, run sequentially) label (main container) The privilege split is explicit: the dispatcher pod is a pure control-plane client (lists nodes, manages Jobs in its own namespace) and runs fully unprivileged under a dedicated, least-privilege ServiceAccount (kata-rbac.yaml); only the per-node Jobs it creates carry the privileged kata-deploy host-mutation rights. Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob): - job.nodes: explicit node-name list passed to the dispatcher, and - job.nodeSelector (equality map) ANDed with - job.nodeSelectorExpressions (k8s label-selector requirements: In / NotIn / Exists / DoesNotExist), compiled into a single label-selector string the dispatcher resolves live. 
The default expressions target worker (non-control-plane) nodes, so no custom node labeling is required; set the expressions to [] to target all discovered nodes. Reuses the commonEnv/commonVolume* helpers and adds the stageContainer, serviceAccountName, dispatcherServiceAccountName, dispatcherImage and perNodeJob helpers shared by the dispatcher and the staged Jobs. The default (daemonset) render is unchanged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	28fce44b70	kata-deploy: extract shared pod env/volumes into helm helpers Pull the kata-deploy container's environment block and host volume/volumeMount definitions out of the DaemonSet template into reusable named templates in _helpers.tpl: - kata-deploy.commonEnv - kata-deploy.commonVolumeMounts - kata-deploy.commonVolumes These are derived purely from chart values and are independent of the deployment model, so they can be shared verbatim by upcoming per-node install/cleanup Jobs without duplicating the (large) env wiring. Pure refactor: the rendered DaemonSet is byte-for-byte identical to before (verified via normalized `helm template` diff across default and multiInstallSuffix/userDropIn/customRuntimes permutations). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	225ff2209e	kata-deploy: split install/cleanup into staged actions Phase 1 of migrating kata-deploy from a DaemonSet to a staged JobSet workflow: refactor the binary's install/cleanup flows into discrete, independently invocable stages while keeping the existing DaemonSet path fully working. Add new staged subcommands that each run one step and exit, so a JobSet can drive them as ordered initContainers/Jobs per node: install: host-check -> artifacts -> cri -> label cleanup (reverse): unlabel -> revert-cri -> remove-artifacts `install` becomes a compatibility wrapper composing the install stages in the canonical order, so the DaemonSet deployment model is unchanged. The DaemonSet `cleanup` (with its DaemonSet-presence gating) is left intact; the staged cleanup actions are added alongside it and skip that gating since the JobSet workflow only schedules them on a real uninstall. Each stage has an idempotent skip check so reruns are safe: - install label / cleanup unlabel: short-circuit via the node label - cleanup remove-artifacts: skip when the install dir is already gone - cleanup revert-cri: skip the disruptive runtime restart when the CRI drop-ins are already absent (new cri_drop_in_present helper) Introduce a shared KATA_RUNTIME_LABEL constant and add rstest-based tests covering the subcommand-name -> Action mapping, rejection of unknown actions, and the visible/hidden help semantics. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	d4205c7fcc	kata-deploy: build and publish the kata-deploy-job-dispatcher image Package and ship the dispatcher built in the previous commit so the job-mode Helm chart has an image to run. - Dockerfile.components: build kata-deploy and kata-deploy-job-dispatcher from the same rust-builder stage (one compile), and run fmt/clippy/ test for both crates. - job-dispatcher/Dockerfile: a minimal distroless/static image containing only the dispatcher binary and CA certs - it is an API client, so it needs nothing from the host. - local-build: kata-deploy-job-dispatcher becomes its own build component with its own static tarball (kata-deploy-static-kata-deploy-job-dispatcher.tar.zst); the shared rust-builder output is reused so the two components do not recompile the workspace locally. The payload script builds and pushes a separate "<kata-deploy registry>-job-dispatcher" image with the same tag scheme, and release.sh publishes its multi-arch manifest symmetrically. - CI: add kata-deploy-job-dispatcher to the build-kata-deploy-components matrices (its tarball is picked up by the existing kata-artifacts-* glob), and gate it in the kata-deploy rust static checks. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	87d27e0cc8	kata-deploy-job-dispatcher: add generic per-node Job dispatcher Add a small, deployment-agnostic dispatcher binary that runs exactly one Kubernetes Job per selected node and paces the rollout, so callers get guaranteed per-node coverage without encoding the fan-out in Helm. Motivation: templating one Job per node into a Helm release does not scale (the release Secret hits etcd's 1 MiB limit and hooks run sequentially), and a single Indexed Job cannot guarantee per-node coverage when paced - the scheduler ignores completed pods when evaluating topology spread, so nodes get uneven numbers of pods. A tiny dispatcher that enumerates nodes live and creates node-pinned Jobs itself sidesteps both problems and keeps the Helm release O(1) in fleet size. The dispatcher: - enumerates target nodes live (explicit --nodes list or --node-selector label selector), paginating the API; - stamps out one Job per node from a YAML template, pinning it with nodeName and an owner label for server-side filtering; - keeps at most --parallelism Jobs in flight, refilling as they finish, and sets an OwnerReference to the owner Job so the per-node Jobs are garbage-collected with it; - is a plain API client (kube): it never touches the host, so it can run fully unprivileged. Node membership is resolved live on each run, not frozen at Helm template-render time: re-running the dispatcher (e.g. via `helm upgrade`) picks up nodes added since the last run and skips ones already done, as the per-node stages are idempotent. The dispatcher is one-shot, however - it does not watch the API, so nodes added while it is not running are only covered by the next run. job.rs holds the pure helpers (node-name sanitization, deterministic Job naming, template instantiation, status interpretation) with rstest unit tests; main.rs wires up the CLI and the fan-out loop. 
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	56da8097c2	Merge pull request #13204 from fidencio/topic/versions-bump-qemu versions: Bump QEMU to 11.0.1	2026-06-12 17:14:57 +02:00
Fabiano Fidêncio	110843d6e1	Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal runtime(-rs): remove file_mem_backend config option	2026-06-12 17:13:04 +02:00
Greg Kurz	eac5dd2907	generate_vendor: Fix heavily broken logic While checking the content of the vendor tarball artifact in the 3.31.0 release page, I realized that it is lacking most of the rust code and all the go code. It turns out that the script is badly broken in many ways: 1. Cargo workspace conflicts: Vendored dependencies were treated as workspace members, causing "current package believes it's in a workspace when it's not" errors. Fixed by adding vendor directory exclusions to root Cargo.toml. 2. Missing Go vendoring: Script only searched for Cargo.lock files, never processing go.mod files despite having a case statement for them. Fixed by adding go.mod to the find command with '-o -name go.mod'. 3. Wrong tar execution directory: Script ran tar from release/ directory but vendor_dir_list contained paths relative to repo root (./vendor, ./src/agent/vendor, etc.), causing "Cannot stat" errors. Fixed by moving tar command before final popd. 4. Relative tarball path: Since tar now runs from repo root, converted tarball path to absolute to ensure it's created in the release directory. 5. Vendored go.mod pollution: Added '-path ./vendor -prune' to find command to exclude vendor directory, preventing the script from finding go.mod files inside vendored Rust dependencies. The fixes are simple enough that they can be squashed into a single commit. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2026-06-12 10:06:53 +02:00
Manuel Huber	70d8f1bf3d	runtime: remove file_mem_backend config option Remove the Go runtime file_mem_backend and valid_file_mem_backends config knobs, along with the corresponding sandbox annotation handling. The runtime still enables file-backed shared memory automatically for virtio-fs by using /dev/shm as the backing directory. This only removes the user-selectable backend path. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-12 00:07:16 +00:00
Fabiano Fidêncio	46add95802	versions: Bump QEMU to 11.0.1 Bump QEMU to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 22:01:26 +02:00
stevenhorsman	1d854ad7af	ci: Update required tests publish-kata-deploy-payload got renamed in #13107, which broke the CI. Now, instead of tracking all those intermediate steps, let's make sure we only track the tests themselves. Signed-off-by: stevenhorsman <steven@uk.ibm.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 19:02:23 +02:00
Fabiano Fidêncio	5731d30554	helm: add optional kata-monitor deployment to kata-deploy Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart, including image/configuration values so operators can enable monitor shipping as part of the same deployment workflow when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	0d6234e7be	ci: share kata image publishing workflows Unify kata-deploy and kata-monitor image publishing behind a single reusable workflow, and rename workflow files to generic kata-images names. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	e04a4326ec	tools: build kata-monitor image from shim-v2-go tarball Build kata-monitor images by extracting the binary from the shim-v2-go tarball and shipping it on top of gcr.io/distroless/static-debian13. Because the binary is built inside an Ubuntu (glibc) toolchain it cannot run on a pure musl/alpine base — users hit __fprintf_chk / __vfprintf_chk relocation errors. To get a small, distroless runtime image we use the same pattern as tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries the binary needs (plus the dynamic linker) via ldd from a glibc base image. In order to do so, we also added a helper script to build and publish architecture-specific monitor images from tarball artifacts. Reported-by: Steve Linde <stevenlinde@google.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	ac2221a6a5	Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3 versions: Bump containerd to 2.3	2026-06-09 08:21:58 +02:00
Fabiano Fidêncio	48ebbbec3a	kata-deploy: honor debug mode with CLI log-level Make the chart pass --log-level debug automatically when debug=true so CI and troubleshooting runs emit full rendered config dumps without requiring a separate log-level override. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	b63494345d	kata-deploy: add configurable verbosity for full CRI config dumps Allow operators to force kata-deploy log verbosity and emit the fully rendered containerd/CRI-O config and drop-in files in debug mode so install troubleshooting can rely on exact effective configuration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	fc08218f55	gatekeeper: rename required tests to minimum/latest The containerd_version matrix values were renamed from lts/active to minimum/latest, which changes the generated CI job names. Update the required-tests list so the gatekeeper waits on the checks that are actually produced. The amd64 run-containerd-stability, run-nydus, run-cri-containerd and free-runner run-k8s-tests jobs map lts -> minimum and active -> latest. The s390x cri-containerd job maps active -> latest, matching its updated matrix. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	b119b051cb	kata-deploy: support drop-in configs for default runtimes Allow operators to provide per-shim drop-in TOML for built-in runtimes and reconcile stale override files so upgrades and migrations remain safe when drop-ins are added or removed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex	2026-06-08 13:31:03 +02:00
Fabiano Fidêncio	1ca7129581	Merge pull request #13176 from Amulyam24/kata-deploy-fix kata-deploy: add the imports directive explicitly if expected but not found	2026-06-05 22:24:16 +02:00
Fabiano Fidêncio	f6ff9578d4	Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner ci: remove Mariner annotations and use new config	2026-06-05 20:22:58 +02:00
Fabiano Fidêncio	e9ee97f751	kata-deploy: inherit custom RuntimeClass overhead from baseConfig Default custom runtime RuntimeClass overhead.podFixed to the selected baseConfig values, so equivalent runtimes behave consistently without repeating boilerplate. In case the user wants to enforce that no overhead is set on the custom RuntimeClass, disable inheritance with inheritBaseOverhead=false. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-05 17:22:25 +02:00
Amulyam24	b15a5fbe36	kata-deploy: add the imports directive explicitly if expected but not found For containerd v2.2+, the flow assumes that the imports directive would be present. It is better to check it and add if it doesn't exist. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-06-05 18:47:07 +05:30
Steve Horsman	1624ebe362	Merge pull request #13135 from kata-containers/dependabot/cargo/tar-0.4.46 build(deps): bump tar from 0.4.45 to 0.4.46	2026-06-05 09:44:46 +01:00
Fabiano Fidêncio	743b0a4839	Merge pull request #13165 from stevenhorsman/bump-go-to-1.25.11 versions: bump golang to 1.25.11	2026-06-04 20:24:57 +02:00

2446 Commits