kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	3e98f925cf	Merge pull request #13142 from davidweisse/dav/genpolicy-pod-resources genpolicy: support pod-level resources	2026-06-16 15:31:50 +02:00
davidweisse	ac56ea21d8	genpolicy: support pod-level resources Add support for resource requests and limits in the PodSpec. Fixes #12816 Signed-off-by: davidweisse <98460960+davidweisse@users.noreply.github.com>	2026-06-16 15:30:22 +02:00
Fabiano Fidêncio	774e698aeb	Merge pull request #12293 from Apokleos/graceful-errors runtime-rs: make OOM watcher and signal handling lifecycle-aware	2026-06-16 15:02:54 +02:00
Fabiano Fidêncio	c76c82ce1c	Merge pull request #13229 from hgowda-amd/skip-qos-tests-snp-tdx-runtime-rs tests: skip Guaranteed QoS test for SNP/TDX runtime-rs	2026-06-16 14:02:51 +02:00
Fabiano Fidêncio	492d604daf	Merge pull request #13214 from fidencio/topic/block-volume-readonly-propagation runtime(-rs): Propagate host block device read-only flag to the VMM	2026-06-16 13:39:23 +02:00
Alex Lyn	8fc1a16225	runtime-rs: Make signal_process idempotent for exited init processes Address the issue where signal_process returns an INTERNAL error when the container's init process has already exited, and ensure teardown is never aborted by signal failures. Introduce is_no_such_process_error() to detect "no such process" conditions (ESRCH/ENOENT codes or equivalent messages). When the init process is already gone, treat it as success with an info log instead of an error. In stop_process(), never propagate signal failures. During sandbox shutdown the agent connection is often already closed, causing AgentConnectionClosed errors that bypass is_no_such_process_error(). If stop_process() aborts on such errors, cleanup_container() is skipped and leftover mounts cause "Resource busy" failures in sandbox cleanup. Restore "always proceed to cleanup" semantics: log the failure as a warning, but never skip resource cleanup. Resource cleanup must be best-effort and idempotent regardless of kill outcome. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-16 15:12:28 +08:00
Alex Lyn	44dd2b1f34	runtime-rs: Refine OOM watcher error reporting for sandbox teardown This commit refines the error handling within the OOM watcher to distinguish between genuine failures and errors that occur as a natural consequence of sandbox shutdown via the helper is_normal_shutdown_error. Previously, various connection-related errors during teardown were logged as warnings, contributing to noisy logs. It aims to improve OOM error handling, distinguish error types: The logic now differentiates between "normal shutdown" errors (e.g., Connection reset by peer, broken pipe) and actual OOM watcher failures. This enhancement makes OOM event logs more informative and less prone to clutter during normal sandbox termination. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 15:12:24 +08:00
Alex Lyn	3095bd379b	runtime-rs: Introduce cancellation for OOM watcher during teardown This commit introduces an explicit cancellation mechanism for the OOM watcher loop within VirtSandbox. This addresses the issue where the watcher continues to poll for OOM events even when the sandbox is being stopped, leading to spurious "Connection reset by peer" errors. Key changes: (1) A CancellationToken is added to VirtSandbox to signal the watcher loop when the sandbox is undergoing teardown. (2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a tokio::select! statement. This allows it to concurrently listen for two events: - cancel_token.cancelled(): Triggered when the sandbox/VM is stopping. - agent.get_oom_event(): The regular OOM event polling. (3) In the sandbox stop/teardown path, cancel_token.cancel() is called before stopping the VM. This ensures the OOM watcher loop exits cleanly via the cancellation token, preventing the occurrence of ECONNRESET/EOF errors on a closed channel. This change improves the robustness of OOM event handling during sandbox lifecycle management. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Alex Lyn	0ffdc576d3	runtime-rs: Introduce a helper to check if process/container exists Returns `true` if the error indicates that the target process/container no longer exists. This is used to determine if an operation, like signaling a process, failed because the target is no longer available. The function checks for standard OS error codes (`ESRCH`, `ENOENT`) and common error message patterns. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Alex Lyn	59677688ee	runtime-rs: Introduce a helper to check normal oom shutdown errors It is mainly for checking whether an error is a normal OOM shutdown error caused by network disconnection. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Fabiano Fidêncio	cfab6f496b	runtime-rs: Propagate block device read-only flag to the VMM Block volumes and block-mode device nodes were attached to the guest read-write regardless of the volume's read-only intent, so the guest-visible virtio-blk device was always writable. This matters beyond simple write protection: filesystems such as XFS inspect the block device read-only state to decide whether to attempt journal/log recovery. When the device is writable, XFS tries to replay the log even on a read-only mount, which fails badly. Mounting with "-o ro" inside the guest is not sufficient; the device itself must advertise read-only (VIRTIO_BLK_F_RO), which only happens when the VMM opens the backing image read-only. Set is_readonly on the block device config from two signals, combined with OR so either one marks the device read-only: - the read-only intent from the OCI spec: * bind-mounted block volumes and direct-assigned (raw block) volumes derive it from the "ro" mount option, and * block-mode volumes (e.g. Kubernetes volumeDevices) arrive as device nodes in spec.Linux.Devices with no mount option; their intent is expressed only via the cgroup device access in spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for read-only; "rwm" for read-write). handler_devices() derives the flag from the matching cgroup allow rule, and - the host block device's own read-only flag (queried via the BLKROGET ioctl). Both the volume path (block_volume/rawblock_volume) and the device-node path (handler_devices, resolving the host node via get_host_path) honor it, so a device that is physically read-only on the host is exposed read-only to the guest even when the intent is not encoded in the OCI spec. All in-tree hypervisors (qemu, cloud-hypervisor, dragonball) already honor BlockConfig.is_readonly, so no hypervisor changes are required. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor	2026-06-15 23:18:36 +02:00
Fabiano Fidêncio	6203e28bef	runtime: Propagate block-mode device read-only flag to the VMM Block-mode volumes (e.g. Kubernetes volumeDevices) are passed to the container as device nodes in spec.Linux.Devices and carry no mount "ro" option. Their read-only intent is expressed only via the cgroup device access in spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for read-only; "rwm" for read-write). The device path ignored that signal: newLinuxDeviceInfo() built the DeviceInfo without ever setting ReadOnly (it only consumed FileMode, the node permission bits, not the read/write access), so the device was always attached read-write. This is problematic for filesystems such as XFS, which inspect the block device read-only state to decide whether to attempt journal/log recovery. When the guest device is writable, XFS tries to replay the log even for a read-only mount, which fails badly. Mounting "-o ro" in the guest is not enough; the device itself must advertise read-only, which only happens when the VMM opens the backing device read-only (DeviceInfo.ReadOnly -> BlockDrive.ReadOnly -> qemu read-only=on / clh Readonly). Derive the read-only flag from two independent signals, combined with OR so either one marks the device read-only: - the cgroup device access rule that exactly matches the device, so a block-mode volume marked read-only by the orchestrator (e.g. a pod volume with persistentVolumeClaim.readOnly: true) is honored, and - the host block device's own read-only flag (queried via the BLKROGET ioctl). Block-mode volumes frequently carry no read-only signal in the OCI spec at all, so the device flag is often the only reliable source. The BLKROGET probe is shared (pkg/device/config.BlockDeviceIsReadOnly, Linux-only with a stub on other platforms) between the device-node path (newLinuxDeviceInfo, probing /dev/block/<major>:<minor>) and the bind-mounted/filesystem block path (createDeviceInfo). None of this relies on external host tooling such as "blockdev --setro". Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor	2026-06-15 23:18:36 +02:00
Harshitha Gowda	6588014b54	tests: skip Guaranteed QoS test for SNP/TDX runtime-rs The Guaranteed QoS test is currently failing for SNP and TDX runtime-rs due to a podOverhead configuration issue. The test requests 600Mi of memory which, combined with the 2048Mi podOverhead, exceeds 2GiB and triggers memory management issues in confidential guests. This is a temporary skip until the podOverhead fix is merged. Related: https://github.com/kata-containers/kata-containers/pull/13228 Signed-off-by: Harshitha Gowda <hgowda@amd.com>	2026-06-15 20:04:22 +00:00
Fabiano Fidêncio	1af9456e26	Merge pull request #13223 from manuelh-dev/mahuber/kata-config-dropin-helpers tests: add runtime config drop-in helpers	2026-06-15 19:53:28 +02:00
Fabiano Fidêncio	a8a2a705ed	Merge pull request #13226 from fidencio/topic/fix-image-publishing-for-release release: do not publish a kata-monitor-job-dispatcher manifest	2026-06-15 17:48:23 +02:00
Fabiano Fidêncio	5959549645	release: do not publish a kata-monitor-job-dispatcher manifest The shared _publish_multiarch_manifest() helper always derived a "-job-dispatcher" registry from the registries it was given. However, the dispatcher is a kata-deploy-specific sidecar image, so when the helper was reused to publish the kata-monitor multi-arch manifest it wrongly tried to push a non-existent kata-monitor-job-dispatcher image. Let's gate the dispatcher derivation behind KATA_DEPLOY_PUBLISH_JOB_DISPATCHER (defaulting to true so the kata-deploy path is unchanged) and opt out of it when publishing the kata-monitor manifest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-15 16:29:29 +02:00
Hyounggyu Choi	59fd29fb33	Merge pull request #13225 from BbolroC/use-tar-in-exporting-kate-deploy-files packaging: Optimize kata-deploy build export using tar output	2026-06-15 14:47:01 +02:00
Hyounggyu Choi	46b8e9f027	packaging: Optimize kata-deploy build export using tar output Replace `type=local` with `type=tar` in kata-deploy build to reduce export time and avoid build hangs during the export-to-client-directory phase. Update callers to extract binaries directly from the tar archive instead of copying from an intermediate directory. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-06-15 11:52:33 +02:00
Fabiano Fidêncio	50d68d1f5f	Merge pull request #13116 from thejasn/thn/fix-oom-event-deadlock agent: fix get_oom_event deadlock after connection restart	2026-06-15 11:05:46 +02:00
Steve Horsman	ea999aa033	Merge pull request #13221 from manuelh-dev/mahuber/nydus-root-export kata-deploy: export nydus snapshotter root	2026-06-15 08:55:08 +01:00
Thejas N	7807aa3d62	agent: fix get_oom_event deadlock after connection restart When the agent-protocol-forwarder's inbound connection restarts (e.g. during a Cloud API Adaptor restart in peer pod environments), the shim re-sends a GetOOMEvent request through the new connection. Since the forwarder→agent Unix socket survives the restart, the old handler from the previous connection remains alive, holding the event_rx lock while blocked in recv().await. The new handler acquires the sandbox lock, then attempts to acquire the event_rx lock — which is held by the old handler. Because the sandbox lock is still held during this wait, every subsequent RPC (ExecProcess, WaitProcess, StatsContainer, SignalProcess, etc.) blocks on the sandbox lock, rendering the pod completely unresponsive. The root cause is a lock ordering violation: get_oom_event held the sandbox lock while acquiring the event_rx lock. Fix this by scoping the sandbox lock acquisition so it is dropped before the event_rx lock is acquired. The sandbox lock is only needed to clone the Arc<Mutex<Receiver>> — once cloned, it can be released immediately. Assisted-by: Claude Code <noreply@anthropic.com> Signed-off-by: Thejas N <thn@redhat.com>	2026-06-15 07:47:18 +02:00
Fabiano Fidêncio	37c4a0b6a2	Merge pull request #13128 from nikolasgkou/fix/guest-protection-fallback runtime-rs: don't fail VM start when guest protection detection errors	2026-06-13 08:56:56 +02:00
Manuel Huber	9ffdb1219d	tests: add runtime config drop-in helpers Add common Kubernetes test helpers for locating the active per-shim Kata runtime config directory and copying/removing TOML fragments under config.d. Update the NVIDIA NUMA test to install its temporary numa_mapping override through those helpers. This gives follow-up tests a shared pattern for temporary runtime config overrides. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-12 21:43:06 +00:00
Fabiano Fidêncio	5efc761002	Merge pull request #13211 from glingy/patch-1 runtime-rs: Fix queue_size of zero in block_rootfs	2026-06-12 22:37:18 +02:00
Fabiano Fidêncio	1b60563a34	Merge pull request #13120 from LandonTClipp/runtime-config chore(docs): Clarify dropIn runtime configuration	2026-06-12 22:34:58 +02:00
LandonTClipp	6005f8a499	chore(docs): Add cspell makefile target for local testing This makes it easier to check the spellchecker is happy before submitting it as a PR. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-12 22:24:18 +02:00
LandonTClipp	03c283edec	chore(docs): Clarify dropIn runtime configuration Clean the runtime configuration section by focusing first on the helm configuration. Then, pivot into a further explanation on how the runtime can be directly configured. Link to where these config parameters are explained more in-depth. Add open-in-new-tab (already downloaded in requirements.txt) in the mkdocs plugin config so that links don't open in the same tab. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-12 22:24:18 +02:00
nikolasgkou	80b8f592a0	runtime-rs: skip guest protection detection for non-confidential guests prepare_protection_device_config() called available_guest_protection() unconditionally and propagated any error before the "confidential_guest is not set" case was handled. On AMD hosts where the kvm_amd `sev` module parameter is "Y" but the CPU does not expose the SEV-SNP CPUID bit (8000_001f EAX[4]) -- e.g. consumer Ryzen -- available_guest_protection() returns Err("SEV not supported"), which blocked every non-confidential VM from booting even though no protection was requested. When confidential_guest is not set there is no reason to probe the host, so return Ok(None) before calling available_guest_protection(). Detection (and any error it produces) now runs only when a confidential guest is actually requested. Signed-off-by: nikolasgkou <nikolasgkou@disroot.org>	2026-06-12 22:20:13 +02:00
Fabiano Fidêncio	47b327ea35	Merge pull request #13155 from fidencio/topic/kata-deploy-no-daemonset kata-deploy: add a Job-based deployment mode (alternative to the privileged DaemonSet)	2026-06-12 21:55:11 +02:00
Manuel Huber	639420e7f5	kata-deploy: export nydus snapshotter root containerd uses the proxy plugin root export when reporting CRI image filesystem paths. Without this export, the CRI plugin falls back to /var/lib/containerd/io.containerd.snapshotter.v1.<snapshotter>. For nydus-for-kata-tee this fallback does not match the actual snapshotter root under /var/lib/nydus-for-kata-tee. Kubelet/cAdvisor then fails stats collection when it tries to inspect the nonexistent fallback path. Export the nydus proxy snapshotter root so containerd reports the real filesystem path for resource accounting. When using trusted ephemeral storage or a new ephemeral storage wip feature for providing plain disks, resource accounting would not kick in and pods which exhausted their emptyDir sizeLimits would not get evicted. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-06-12 19:06:01 +02:00
Fabiano Fidêncio	c5d5fc6ee8	Merge pull request #13213 from burgerdev/grpc-probes genpolicy: add missing probe fields	2026-06-12 19:04:53 +02:00
Fabiano Fidêncio	aa27490801	kata-deploy: track distroless static base by tag, not digest The kata-deploy main image pinned its gcr.io/distroless/static-debian13 base by sha256 digest. distroless does not publish versioned tags, so a pinned digest just goes stale with no clear upgrade path. Track the rolling tag instead (guarded with a hadolint DL3007 ignore plus a comment explaining why), matching the kata-deploy-job-dispatcher image base. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	aebadb1ab2	docs: document kata-deploy job deployment mode Document the new opt-in deploymentMode: job alongside the default DaemonSet model in the maintained docs (not just the chart README): - helm-configuration.md: add a "Deployment Modes (DaemonSet vs Job)" section covering the dispatcher-driven staged install/cleanup pipelines, why a dispatcher is used instead of Helm-rendered per-node Jobs (O(1) release, guaranteed coverage, paced rollout, explicit privilege split), the "re-run helm upgrade to cover newly added nodes" model (no always-on reconcile component), and the node-selection precedence (job.nodes > job.nodeSelector + job.nodeSelectorExpressions) that defaults to worker nodes. - installation.md: note that the DaemonSet is the default but no longer the only model, linking to the section above. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	c23fe11529	kata-deploy: make verification Job aware of job deployment mode The verification Job assumed the DaemonSet model: it waited for the DaemonSet to exist, for its pods, and for `rollout status daemonset/...`, then required every node in the cluster to be labeled. None of that holds for deploymentMode: job, where install happens via the dispatcher and the per-node Jobs it fans out, and only the targeted (worker) nodes get labeled. Make the hook mode-aware: - Hook weight: in job mode the install dispatcher runs as a post-install hook at weight 5, so verification now runs at weight 10 (after it); daemonset mode keeps weight 0 (the DaemonSet is a normal resource). - Readiness wait: in job mode, wait for the install dispatcher Job to complete and then for the per-node install Jobs (kata-deploy/stage=install) to finish (with the same CRI-restart retry logic) instead of a DaemonSet rollout. - Label check: in job mode, verify exactly the nodes the dispatcher targeted are labeled, rather than comparing the labeled count against all nodes in the cluster. - Grant the verification ClusterRole read access to batch/jobs (used by the job-mode waits; harmless in daemonset mode). The daemonset code path is unchanged and the default render (no verification.pod) is byte-for-byte identical. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	3d732986d2	kata-deploy: add per-node staged cleanup for job mode Add the uninstall counterpart to the install dispatcher for deploymentMode: job. On `helm uninstall`, a single pre-delete hook Job runs the kata-deploy-job-dispatcher, which enumerates the targeted nodes live and fans out one node-pinned cleanup Job per node that runs the install pipeline in reverse and exits: unlabel -> revert-cri (initContainers, run sequentially) remove-artifacts (main container) Running as a pre-delete hook means the dispatcher ServiceAccount/RBAC and the kata-deploy host-mutation RBAC still exist while the Jobs run, so the unlabel stage retains node get/patch access. revert-cri and remove-artifacts are host-only operations (privileged nsenter / host mount) and need no extra cluster RBAC. Ordering mirrors install in reverse: unlabel first so the scheduler stops placing kata workloads here, then revert the CRI config + restart the runtime, then remove the on-host artifacts. Each stage is idempotent and skips when already undone, so partially-installed nodes and re-runs are safe. Uninstall node selection is deliberately SEPARATE from install (a dedicated job.cleanup.* block) and defaults to every node carrying the katacontainers.io/kata-runtime label (set by the install label stage) rather than re-evaluating the install selector. Because the cleanup dispatcher resolves nodes live when it runs, this stays robust to install-time selector drift (relabeled nodes, etc.) while remaining fully overridable via job.cleanup.nodes / job.cleanup.nodeSelector / job.cleanup.nodeSelectorExpressions. The default (daemonset) mode is unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	54878fa373	kata-deploy: add job deployment mode driven by the job-dispatcher Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in `deploymentMode: job` that installs Kata via short-lived, per-node install Jobs instead of the long-running DaemonSet. The DaemonSet remains the default and is now gated behind `deploymentMode == daemonset`. Rather than render one Job per node into the Helm release (which grows the release secret O(nodes) and offers no rollout pacing), job mode ships a single tiny post-install/post-upgrade hook Job that runs the kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes LIVE from the API server and stamps out one node-pinned install Job per node from a constant-size ConfigMap of Job templates, keeping at most `job.parallelism` in flight and refilling as they finish. This guarantees per-node coverage with a paced rollout while the Helm release stays O(1) regardless of fleet size. New nodes are picked up by re-running `helm upgrade`; there is no always-on component. Each per-node Job runs the staged install pipeline as ordered initContainers and exits: host-check -> artifacts -> cri (initContainers, run sequentially) label (main container) The privilege split is explicit: the dispatcher pod is a pure control-plane client (lists nodes, manages Jobs in its own namespace) and runs fully unprivileged under a dedicated, least-privilege ServiceAccount (kata-rbac.yaml); only the per-node Jobs it creates carry the privileged kata-deploy host-mutation rights. Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob): - job.nodes: explicit node-name list passed to the dispatcher, and - job.nodeSelector (equality map) ANDed with - job.nodeSelectorExpressions (k8s label-selector requirements: In / NotIn / Exists / DoesNotExist), compiled into a single label-selector string the dispatcher resolves live. The default expressions target worker (non-control-plane) nodes, so no custom node labeling is required; set the expressions to [] to target all discovered nodes. Reuses the commonEnv/commonVolume* helpers and adds the stageContainer, serviceAccountName, dispatcherServiceAccountName, dispatcherImage and perNodeJob helpers shared by the dispatcher and the staged Jobs. The default (daemonset) render is unchanged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	28fce44b70	kata-deploy: extract shared pod env/volumes into helm helpers Pull the kata-deploy container's environment block and host volume/volumeMount definitions out of the DaemonSet template into reusable named templates in _helpers.tpl: - kata-deploy.commonEnv - kata-deploy.commonVolumeMounts - kata-deploy.commonVolumes These are derived purely from chart values and are independent of the deployment model, so they can be shared verbatim by upcoming per-node install/cleanup Jobs without duplicating the (large) env wiring. Pure refactor: the rendered DaemonSet is byte-for-byte identical to before (verified via normalized `helm template` diff across default and multiInstallSuffix/userDropIn/customRuntimes permutations). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	225ff2209e	kata-deploy: split install/cleanup into staged actions Phase 1 of migrating kata-deploy from a DaemonSet to a staged JobSet workflow: refactor the binary's install/cleanup flows into discrete, independently invocable stages while keeping the existing DaemonSet path fully working. Add new staged subcommands that each run one step and exit, so a JobSet can drive them as ordered initContainers/Jobs per node: install: host-check -> artifacts -> cri -> label cleanup (reverse): unlabel -> revert-cri -> remove-artifacts `install` becomes a compatibility wrapper composing the install stages in the canonical order, so the DaemonSet deployment model is unchanged. The DaemonSet `cleanup` (with its DaemonSet-presence gating) is left intact; the staged cleanup actions are added alongside it and skip that gating since the JobSet workflow only schedules them on a real uninstall. Each stage has an idempotent skip check so reruns are safe: - install label / cleanup unlabel: short-circuit via the node label - cleanup remove-artifacts: skip when the install dir is already gone - cleanup revert-cri: skip the disruptive runtime restart when the CRI drop-ins are already absent (new cri_drop_in_present helper) Introduce a shared KATA_RUNTIME_LABEL constant and add rstest-based tests covering the subcommand-name -> Action mapping, rejection of unknown actions, and the visible/hidden help semantics. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	d4205c7fcc	kata-deploy: build and publish the kata-deploy-job-dispatcher image Package and ship the dispatcher built in the previous commit so the job-mode Helm chart has an image to run. - Dockerfile.components: build kata-deploy and kata-deploy-job-dispatcher from the same rust-builder stage (one compile), and run fmt/clippy/ test for both crates. - job-dispatcher/Dockerfile: a minimal distroless/static image containing only the dispatcher binary and CA certs - it is an API client, so it needs nothing from the host. - local-build: kata-deploy-job-dispatcher becomes its own build component with its own static tarball (kata-deploy-static-kata-deploy-job-dispatcher.tar.zst); the shared rust-builder output is reused so the two components do not recompile the workspace locally. The payload script builds and pushes a separate "<kata-deploy registry>-job-dispatcher" image with the same tag scheme, and release.sh publishes its multi-arch manifest symmetrically. - CI: add kata-deploy-job-dispatcher to the build-kata-deploy-components matrices (its tarball is picked up by the existing kata-artifacts-* glob), and gate it in the kata-deploy rust static checks. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	87d27e0cc8	kata-deploy-job-dispatcher: add generic per-node Job dispatcher Add a small, deployment-agnostic dispatcher binary that runs exactly one Kubernetes Job per selected node and paces the rollout, so callers get guaranteed per-node coverage without encoding the fan-out in Helm. Motivation: templating one Job per node into a Helm release does not scale (the release Secret hits etcd's 1 MiB limit and hooks run sequentially), and a single Indexed Job cannot guarantee per-node coverage when paced - the scheduler ignores completed pods when evaluating topology spread, so nodes get uneven numbers of pods. A tiny dispatcher that enumerates nodes live and creates node-pinned Jobs itself sidesteps both problems and keeps the Helm release O(1) in fleet size. The dispatcher: - enumerates target nodes live (explicit --nodes list or --node-selector label selector), paginating the API; - stamps out one Job per node from a YAML template, pinning it with nodeName and an owner label for server-side filtering; - keeps at most --parallelism Jobs in flight, refilling as they finish, and sets an OwnerReference to the owner Job so the per-node Jobs are garbage-collected with it; - is a plain API client (kube): it never touches the host, so it can run fully unprivileged. Node membership is resolved live on each run, not frozen at Helm template-render time: re-running the dispatcher (e.g. via `helm upgrade`) picks up nodes added since the last run and skips ones already done, as the per-node stages are idempotent. The dispatcher is one-shot, however - it does not watch the API, so nodes added while it is not running are only covered by the next run. job.rs holds the pure helpers (node-name sanitization, deterministic Job naming, template instantiation, status interpretation) with rstest unit tests; main.rs wires up the CLI and the fan-out loop. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Gregory Ling	d90178c179	runtime-rs: Fix queue_size of zero in block_rootfs Fix BlockRootfs to save the queue_size, num_queues, logical_sector_size, and physical_sector_size of the hypervisor's block device info in the BlockConfig passed to the VM. Fixes #13210 Signed-off-by: Gregory Ling <17791817+glingy@users.noreply.github.com>	2026-06-12 18:24:50 +02:00
Zvonko Kaiser	a2ad9b458e	Merge pull request #13215 from stevenhorsman/docs/python-cve-fixes-12th-june-2026 fix: pin idna and pymdown-extensions to remediate CVEs	2026-06-12 12:18:03 -04:00
Fabiano Fidêncio	b2376f849c	Merge pull request #13203 from fidencio/topic/versions-bump-kernel versions: Bump kernel to 6.18.35	2026-06-12 17:37:55 +02:00
Fabiano Fidêncio	56da8097c2	Merge pull request #13204 from fidencio/topic/versions-bump-qemu versions: Bump QEMU to 11.0.1	2026-06-12 17:14:57 +02:00
Fabiano Fidêncio	110843d6e1	Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal runtime(-rs): remove file_mem_backend config option	2026-06-12 17:13:04 +02:00
stevenhorsman	3c3f754f3f	fix: pin idna and pymdown-extensions to remediate CVEs Pin idna to 3.15 and pymdown-extensions to 10.21.3 to address security vulnerabilities: - GHSA-65pc-fj4g-8rjx (idna, severity 6.9) - GHSA-62q4-447f-wv8h (pymdown-extensions, severity 4.3) - GHSA-r6h4-mm7h-8pmq (pymdown-extensions, severity 2.7) These dependencies were previously transitive and vulnerable. They are now explicitly pinned to secure versions. Generated-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-12 13:28:58 +01:00
Markus Rudy	2e8f61a575	genpolicy: add missing probe fields This commit adds fields for readiness/liveness/startup probes that were missing so far, and adds probes to the ignored_fields test to ensure these stay supported. None of these fields has an influence on the generated policy, they just allow parsing valid k8s yaml. Co-authored-by: Spyros Seimenis <sse@edgeless.systems> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-06-12 13:20:16 +02:00
Hyounggyu Choi	edead9e97b	Merge pull request #13189 from stevenhorsman/osv-scanner-refactor workflows: refactor osv-scanner workflows	2026-06-12 12:04:12 +02:00
Fabiano Fidêncio	e758f4b280	Merge pull request #13202 from gkurz/fix-generate-vendor generate_vendor: Fix heavily broken logic	2026-06-12 11:48:50 +02:00
Fabiano Fidêncio	a016fd0485	Merge pull request #13198 from fidencio/topic/fix-ci-tee-static-sizing-overhead tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI	2026-06-12 11:46:56 +02:00
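
Note on 7807aa3d62 (agent: fix get_oom_event deadlock after connection restart): the commit message describes the fix as scoping the sandbox lock so that it is dropped before the event_rx lock is acquired. The sketch below illustrates that lock-scoping pattern only. It is not the agent's code: it assumes std synchronous primitives in place of the agent's async ones, and the `Sandbox` struct, the `String` event type, and the function signature are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the agent's sandbox state.
struct Sandbox {
    event_rx: Arc<Mutex<Receiver<String>>>,
}

// Sketch of the corrected lock ordering: the sandbox lock is held only
// long enough to clone the Arc, never while waiting on event_rx.
fn get_oom_event(sandbox: &Arc<Mutex<Sandbox>>) -> Option<String> {
    let event_rx = {
        let guard = sandbox.lock().unwrap();
        Arc::clone(&guard.event_rx)
    }; // sandbox lock is released here

    // Waiting on the event_rx lock (or on recv) no longer blocks other
    // users of the sandbox lock.
    let rx = event_rx.lock().unwrap();
    rx.recv().ok()
}

fn main() {
    let (tx, rx) = channel();
    let sandbox = Arc::new(Mutex::new(Sandbox {
        event_rx: Arc::new(Mutex::new(rx)),
    }));

    tx.send("container-1".to_string()).unwrap();
    let event = get_oom_event(&sandbox);

    // The sandbox lock is free again once the call returns.
    assert!(sandbox.try_lock().is_ok());
    println!("{:?}", event);
}
```

With this ordering, a handler stuck waiting for an event holds only the event_rx lock, so other callers that need the sandbox lock are not blocked behind it.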

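
Note on cfab6f496b and 6203e28bef (block device read-only propagation): both commit messages describe deriving the read-only flag from two signals combined with OR, the cgroup device access string ("rm" for read-only, "rwm" for read-write) and the host block device's own read-only flag. The sketch below shows only that decision logic. The function names are hypothetical, the host flag is passed in as a plain boolean rather than queried via the BLKROGET ioctl, and treating a missing or empty access string as "no read-only signal" is an assumption.

```rust
// Hypothetical helper: true when the cgroup device access string grants
// no write access, e.g. "rm" (read + mknod) as opposed to "rwm".
// Assumption: an empty string carries no read-only signal.
fn access_is_read_only(access: &str) -> bool {
    !access.is_empty() && !access.contains('w')
}

// Hypothetical helper: combine the OCI-spec signal and the host device
// flag with OR, so either one marks the device read-only.
fn device_is_read_only(cgroup_access: Option<&str>, host_read_only: bool) -> bool {
    let spec_read_only = cgroup_access.map(access_is_read_only).unwrap_or(false);
    spec_read_only || host_read_only
}

fn main() {
    // Read-only intent expressed only through the cgroup rule.
    println!("{}", device_is_read_only(Some("rm"), false));
    // No signal in the spec, but the host device itself is read-only.
    println!("{}", device_is_read_only(None, true));
    // Read-write rule on a writable host device.
    println!("{}", device_is_read_only(Some("rwm"), false));
}
```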

19386 Commits