kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Steve Horsman	ea999aa033	Merge pull request #13221 from manuelh-dev/mahuber/nydus-root-export kata-deploy: export nydus snapshotter root	2026-06-15 08:55:08 +01:00
Fabiano Fidêncio	37c4a0b6a2	Merge pull request #13128 from nikolasgkou/fix/guest-protection-fallback runtime-rs: don't fail VM start when guest protection detection errors	2026-06-13 08:56:56 +02:00
Fabiano Fidêncio	5efc761002	Merge pull request #13211 from glingy/patch-1 runtime-rs: Fix queue_size of zero in block_rootfs	2026-06-12 22:37:18 +02:00
Fabiano Fidêncio	1b60563a34	Merge pull request #13120 from LandonTClipp/runtime-config chore(docs): Clarify dropIn runtime configuration	2026-06-12 22:34:58 +02:00
LandonTClipp	6005f8a499	chore(docs): Add cspell makefile target for local testing This makes it easier to check the spellchecker is happy before submitting it as a PR. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-12 22:24:18 +02:00
LandonTClipp	03c283edec	chore(docs): Clarify dropIn runtime configuration Clean the runtime configuration section by focusing first on the helm configuration. Then, pivot into a further explanation on how the runtime can be directly configured. Link to where these config parameters are explained more in-depth. Add open-in-new-tab (already downloaded in requirements.txt) in the mkdocs plugin config so that links don't open in the same tab. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-12 22:24:18 +02:00
nikolasgkou	80b8f592a0	runtime-rs: skip guest protection detection for non-confidential guests prepare_protection_device_config() called available_guest_protection() unconditionally and propagated any error before the "confidential_guest is not set" case was handled. On AMD hosts where the kvm_amd `sev` module parameter is "Y" but the CPU does not expose the SEV-SNP CPUID bit (8000_001f EAX[4]) -- e.g. consumer Ryzen -- available_guest_protection() returns Err("SEV not supported"), which blocked every non-confidential VM from booting even though no protection was requested. When confidential_guest is not set there is no reason to probe the host, so return Ok(None) before calling available_guest_protection(). Detection (and any error it produces) now runs only when a confidential guest is actually requested. Signed-off-by: nikolasgkou <nikolasgkou@disroot.org>	2026-06-12 22:20:13 +02:00
Fabiano Fidêncio	47b327ea35	Merge pull request #13155 from fidencio/topic/kata-deploy-no-daemonset kata-deploy: add a Job-based deployment mode (alternative to the privileged DaemonSet)	2026-06-12 21:55:11 +02:00
Manuel Huber	639420e7f5	kata-deploy: export nydus snapshotter root containerd uses the proxy plugin root export when reporting CRI image filesystem paths. Without this export, the CRI plugin falls back to /var/lib/containerd/io.containerd.snapshotter.v1.<snapshotter>. For nydus-for-kata-tee this fallback does not match the actual snapshotter root under /var/lib/nydus-for-kata-tee. Kubelet/cAdvisor then fails stats collection when it tries to inspect the nonexistent fallback path. Export the nydus proxy snapshotter root so containerd reports the real filesystem path for resource accounting. When using trusted ephemeral storage or a new ephemeral storage wip feature for providing plain disks, resource accounting would not kick in and pods which exhausted their emptyDir sizeLimits would not get evicted. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-06-12 19:06:01 +02:00
Fabiano Fidêncio	c5d5fc6ee8	Merge pull request #13213 from burgerdev/grpc-probes genpolicy: add missing probe fields	2026-06-12 19:04:53 +02:00
Fabiano Fidêncio	aa27490801	kata-deploy: track distroless static base by tag, not digest The kata-deploy main image pinned its gcr.io/distroless/static-debian13 base by sha256 digest. distroless does not publish versioned tags, so a pinned digest just goes stale with no clear upgrade path. Track the rolling tag instead (guarded with a hadolint DL3007 ignore plus a comment explaining why), matching the kata-deploy-job-dispatcher image base. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	aebadb1ab2	docs: document kata-deploy job deployment mode Document the new opt-in deploymentMode: job alongside the default DaemonSet model in the maintained docs (not just the chart README): - helm-configuration.md: add a "Deployment Modes (DaemonSet vs Job)" section covering the dispatcher-driven staged install/cleanup pipelines, why a dispatcher is used instead of Helm-rendered per-node Jobs (O(1) release, guaranteed coverage, paced rollout, explicit privilege split), the "re-run helm upgrade to cover newly added nodes" model (no always-on reconcile component), and the node-selection precedence (job.nodes > job.nodeSelector + job.nodeSelectorExpressions) that defaults to worker nodes. - installation.md: note that the DaemonSet is the default but no longer the only model, linking to the section above. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	c23fe11529	kata-deploy: make verification Job aware of job deployment mode The verification Job assumed the DaemonSet model: it waited for the DaemonSet to exist, for its pods, and for `rollout status daemonset/...`, then required every node in the cluster to be labeled. None of that holds for deploymentMode: job, where install happens via the dispatcher and the per-node Jobs it fans out, and only the targeted (worker) nodes get labeled. Make the hook mode-aware: - Hook weight: in job mode the install dispatcher runs as a post-install hook at weight 5, so verification now runs at weight 10 (after it); daemonset mode keeps weight 0 (the DaemonSet is a normal resource). - Readiness wait: in job mode, wait for the install dispatcher Job to complete and then for the per-node install Jobs (kata-deploy/stage=install) to finish (with the same CRI-restart retry logic) instead of a DaemonSet rollout. - Label check: in job mode, verify exactly the nodes the dispatcher targeted are labeled, rather than comparing the labeled count against all nodes in the cluster. - Grant the verification ClusterRole read access to batch/jobs (used by the job-mode waits; harmless in daemonset mode). The daemonset code path is unchanged and the default render (no verification.pod) is byte-for-byte identical. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	3d732986d2	kata-deploy: add per-node staged cleanup for job mode Add the uninstall counterpart to the install dispatcher for deploymentMode: job. On `helm uninstall`, a single pre-delete hook Job runs the kata-deploy-job-dispatcher, which enumerates the targeted nodes live and fans out one node-pinned cleanup Job per node that runs the install pipeline in reverse and exits: unlabel -> revert-cri (initContainers, run sequentially) remove-artifacts (main container) Running as a pre-delete hook means the dispatcher ServiceAccount/RBAC and the kata-deploy host-mutation RBAC still exist while the Jobs run, so the unlabel stage retains node get/patch access. revert-cri and remove-artifacts are host-only operations (privileged nsenter / host mount) and need no extra cluster RBAC. Ordering mirrors install in reverse: unlabel first so the scheduler stops placing kata workloads here, then revert the CRI config + restart the runtime, then remove the on-host artifacts. Each stage is idempotent and skips when already undone, so partially-installed nodes and re-runs are safe. Uninstall node selection is deliberately SEPARATE from install (a dedicated job.cleanup.* block) and defaults to every node carrying the katacontainers.io/kata-runtime label (set by the install label stage) rather than re-evaluating the install selector. Because the cleanup dispatcher resolves nodes live when it runs, this stays robust to install-time selector drift (relabeled nodes, etc.) while remaining fully overridable via job.cleanup.nodes / job.cleanup.nodeSelector / job.cleanup.nodeSelectorExpressions. The default (daemonset) mode is unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	54878fa373	kata-deploy: add job deployment mode driven by the job-dispatcher Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in `deploymentMode: job` that installs Kata via short-lived, per-node install Jobs instead of the long-running DaemonSet. The DaemonSet remains the default and is now gated behind `deploymentMode == daemonset`. Rather than render one Job per node into the Helm release (which grows the release secret O(nodes) and offers no rollout pacing), job mode ships a single tiny post-install/post-upgrade hook Job that runs the kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes LIVE from the API server and stamps out one node-pinned install Job per node from a constant-size ConfigMap of Job templates, keeping at most `job.parallelism` in flight and refilling as they finish. This guarantees per-node coverage with a paced rollout while the Helm release stays O(1) regardless of fleet size. New nodes are picked up by re-running `helm upgrade`; there is no always-on component. Each per-node Job runs the staged install pipeline as ordered initContainers and exits: host-check -> artifacts -> cri (initContainers, run sequentially) label (main container) The privilege split is explicit: the dispatcher pod is a pure control-plane client (lists nodes, manages Jobs in its own namespace) and runs fully unprivileged under a dedicated, least-privilege ServiceAccount (kata-rbac.yaml); only the per-node Jobs it creates carry the privileged kata-deploy host-mutation rights. Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob): - job.nodes: explicit node-name list passed to the dispatcher, and - job.nodeSelector (equality map) ANDed with - job.nodeSelectorExpressions (k8s label-selector requirements: In / NotIn / Exists / DoesNotExist), compiled into a single label-selector string the dispatcher resolves live. 
The default expressions target worker (non-control-plane) nodes, so no custom node labeling is required; set the expressions to [] to target all discovered nodes. Reuses the commonEnv/commonVolume* helpers and adds the stageContainer, serviceAccountName, dispatcherServiceAccountName, dispatcherImage and perNodeJob helpers shared by the dispatcher and the staged Jobs. The default (daemonset) render is unchanged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	28fce44b70	kata-deploy: extract shared pod env/volumes into helm helpers Pull the kata-deploy container's environment block and host volume/volumeMount definitions out of the DaemonSet template into reusable named templates in _helpers.tpl: - kata-deploy.commonEnv - kata-deploy.commonVolumeMounts - kata-deploy.commonVolumes These are derived purely from chart values and are independent of the deployment model, so they can be shared verbatim by upcoming per-node install/cleanup Jobs without duplicating the (large) env wiring. Pure refactor: the rendered DaemonSet is byte-for-byte identical to before (verified via normalized `helm template` diff across default and multiInstallSuffix/userDropIn/customRuntimes permutations). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	225ff2209e	kata-deploy: split install/cleanup into staged actions Phase 1 of migrating kata-deploy from a DaemonSet to a staged JobSet workflow: refactor the binary's install/cleanup flows into discrete, independently invocable stages while keeping the existing DaemonSet path fully working. Add new staged subcommands that each run one step and exit, so a JobSet can drive them as ordered initContainers/Jobs per node: install: host-check -> artifacts -> cri -> label cleanup (reverse): unlabel -> revert-cri -> remove-artifacts `install` becomes a compatibility wrapper composing the install stages in the canonical order, so the DaemonSet deployment model is unchanged. The DaemonSet `cleanup` (with its DaemonSet-presence gating) is left intact; the staged cleanup actions are added alongside it and skip that gating since the JobSet workflow only schedules them on a real uninstall. Each stage has an idempotent skip check so reruns are safe: - install label / cleanup unlabel: short-circuit via the node label - cleanup remove-artifacts: skip when the install dir is already gone - cleanup revert-cri: skip the disruptive runtime restart when the CRI drop-ins are already absent (new cri_drop_in_present helper) Introduce a shared KATA_RUNTIME_LABEL constant and add rstest-based tests covering the subcommand-name -> Action mapping, rejection of unknown actions, and the visible/hidden help semantics. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	d4205c7fcc	kata-deploy: build and publish the kata-deploy-job-dispatcher image Package and ship the dispatcher built in the previous commit so the job-mode Helm chart has an image to run. - Dockerfile.components: build kata-deploy and kata-deploy-job-dispatcher from the same rust-builder stage (one compile), and run fmt/clippy/ test for both crates. - job-dispatcher/Dockerfile: a minimal distroless/static image containing only the dispatcher binary and CA certs - it is an API client, so it needs nothing from the host. - local-build: kata-deploy-job-dispatcher becomes its own build component with its own static tarball (kata-deploy-static-kata-deploy-job-dispatcher.tar.zst); the shared rust-builder output is reused so the two components do not recompile the workspace locally. The payload script builds and pushes a separate "<kata-deploy registry>-job-dispatcher" image with the same tag scheme, and release.sh publishes its multi-arch manifest symmetrically. - CI: add kata-deploy-job-dispatcher to the build-kata-deploy-components matrices (its tarball is picked up by the existing kata-artifacts-* glob), and gate it in the kata-deploy rust static checks. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Fabiano Fidêncio	87d27e0cc8	kata-deploy-job-dispatcher: add generic per-node Job dispatcher Add a small, deployment-agnostic dispatcher binary that runs exactly one Kubernetes Job per selected node and paces the rollout, so callers get guaranteed per-node coverage without encoding the fan-out in Helm. Motivation: templating one Job per node into a Helm release does not scale (the release Secret hits etcd's 1 MiB limit and hooks run sequentially), and a single Indexed Job cannot guarantee per-node coverage when paced - the scheduler ignores completed pods when evaluating topology spread, so nodes get uneven numbers of pods. A tiny dispatcher that enumerates nodes live and creates node-pinned Jobs itself sidesteps both problems and keeps the Helm release O(1) in fleet size. The dispatcher: - enumerates target nodes live (explicit --nodes list or --node-selector label selector), paginating the API; - stamps out one Job per node from a YAML template, pinning it with nodeName and an owner label for server-side filtering; - keeps at most --parallelism Jobs in flight, refilling as they finish, and sets an OwnerReference to the owner Job so the per-node Jobs are garbage-collected with it; - is a plain API client (kube): it never touches the host, so it can run fully unprivileged. Node membership is resolved live on each run, not frozen at Helm template-render time: re-running the dispatcher (e.g. via `helm upgrade`) picks up nodes added since the last run and skips ones already done, as the per-node stages are idempotent. The dispatcher is one-shot, however - it does not watch the API, so nodes added while it is not running are only covered by the next run. job.rs holds the pure helpers (node-name sanitization, deterministic Job naming, template instantiation, status interpretation) with rstest unit tests; main.rs wires up the CLI and the fan-out loop. 
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Gregory Ling	d90178c179	runtime-rs: Fix queue_size of zero in block_rootfs Fix BlockRootfs to save the queue_size, num_queues, logical_sector_size, and physical_sector_size of the hypervisor's block device info in the BlockConfig passed to the VM. Fixes #13210 Signed-off-by: Gregory Ling <17791817+glingy@users.noreply.github.com>	2026-06-12 18:24:50 +02:00
Zvonko Kaiser	a2ad9b458e	Merge pull request #13215 from stevenhorsman/docs/python-cve-fixes-12th-june-2026 fix: pin idna and pymdown-extensions to remediate CVEs	2026-06-12 12:18:03 -04:00
Fabiano Fidêncio	b2376f849c	Merge pull request #13203 from fidencio/topic/versions-bump-kernel versions: Bump kernel to 6.18.35	2026-06-12 17:37:55 +02:00
Fabiano Fidêncio	56da8097c2	Merge pull request #13204 from fidencio/topic/versions-bump-qemu versions: Bump QEMU to 11.0.1	2026-06-12 17:14:57 +02:00
Fabiano Fidêncio	110843d6e1	Merge pull request #13138 from manuelh-dev/mahuber/runt-rs-mem-file-removal runtime(-rs): remove file_mem_backend config option	2026-06-12 17:13:04 +02:00
stevenhorsman	3c3f754f3f	fix: pin idna and pymdown-extensions to remediate CVEs Pin idna to 3.15 and pymdown-extensions to 10.21.3 to address security vulnerabilities: - GHSA-65pc-fj4g-8rjx (idna, severity 6.9) - GHSA-62q4-447f-wv8h (pymdown-extensions, severity 4.3) - GHSA-r6h4-mm7h-8pmq (pymdown-extensions, severity 2.7) These dependencies were previously transitive and vulnerable. They are now explicitly pinned to secure versions. Generated-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-12 13:28:58 +01:00
Markus Rudy	2e8f61a575	genpolicy: add missing probe fields This commit adds fields for readiness/liveness/startup probes that were missing so far, and adds probes to the ignored_fields test to ensure these stay supported. None of these fields has an influence on the generated policy, they just allow parsing valid k8s yaml. Co-authored-by: Spyros Seimenis <sse@edgeless.systems> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-06-12 13:20:16 +02:00
Hyounggyu Choi	edead9e97b	Merge pull request #13189 from stevenhorsman/osv-scanner-refactor workflows: refactor osv-scanner workflows	2026-06-12 12:04:12 +02:00
Fabiano Fidêncio	e758f4b280	Merge pull request #13202 from gkurz/fix-generate-vendor generate_vendor: Fix heavily broken logic	2026-06-12 11:48:50 +02:00
Fabiano Fidêncio	a016fd0485	Merge pull request #13198 from fidencio/topic/fix-ci-tee-static-sizing-overhead tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI	2026-06-12 11:46:56 +02:00
Fabiano Fidêncio	723f74e782	Merge pull request #13209 from fidencio/topic/fix-kata-monitor-runc-pod-runtime tests: launch kata-monitor runc workload with explicit runtime	2026-06-12 11:40:19 +02:00
Fabiano Fidêncio	54bb736ab4	Merge pull request #13193 from BbolroC/set-nightly-for-qemu-coco-dev-runtime-rs-on-s390x GHA: Set nightly/dev builds for qemu-coco-dev-runtime-rs on s390x	2026-06-12 11:21:33 +02:00
Greg Kurz	eac5dd2907	generate_vendor: Fix heavily broken logic While checking the content of the vendor tarball artifact in the 3.31.0 release page, I realized that it is lacking most of the rust code and all the go code. It turns out that the script is badly broken in many ways : 1. Cargo workspace conflicts: Vendored dependencies were treated as workspace members, causing "current package believes it's in a workspace when it's not" errors. Fixed by adding vendor directory exclusions to root Cargo.toml. 2. Missing Go vendoring: Script only searched for Cargo.lock files, never processing go.mod files despite having a case statement for them. Fixed by adding go.mod to the find command with '-o -name go.mod'. 3. Wrong tar execution directory: Script ran tar from release/ directory but vendor_dir_list contained paths relative to repo root (./vendor, ./src/agent/vendor, etc.), causing "Cannot stat" errors. Fixed by moving tar command before final popd. 4. Relative tarball path: Since tar now runs from repo root, converted tarball path to absolute to ensure it's created in the release directory. 5. Vendored go.mod pollution: Added '-path ./vendor -prune' to find command to exclude vendor directory, preventing the script from finding go.mod files inside vendored Rust dependencies. The fixes are simple enough they can be squashed into a single commit. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2026-06-12 10:06:53 +02:00
Fupan Li	9553614f32	Merge pull request #12772 from Apokleos/nydus-standalone runtime-rs: Nydus standalone mode support in runtime-rs	2026-06-12 10:36:17 +08:00
Manuel Huber	70d8f1bf3d	runtime: remove file_mem_backend config option Remove the Go runtime file_mem_backend and valid_file_mem_backends config knobs, along with the corresponding sandbox annotation handling. The runtime still enables file-backed shared memory automatically for virtio-fs by using /dev/shm as the backing directory. This only removes the user-selectable backend path. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-12 00:07:16 +00:00
Manuel Huber	86fd65271c	runtime-rs: remove file_mem_backend config option Although the config knob is parsed, it is unused in the rust shim, which renders it useless. Remove the file_mem_backend config option as there are no current users of it. As this option is usable in the go shim, we leave it intact there. For the rust shim, /dev/shm is still used in a similar way to the go shim when filesystem sharing is enabled (virtio-fs). Future use cases that rely on other file_mem_backends are currently planned to define these backends in a similar manner: based on the configuration/platform, determine the proper file memory backend, but do not let end users determine the file memory backend. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-06-12 00:07:16 +00:00
Fabiano Fidêncio	b323697f37	Merge pull request #13111 from Apokleos/monitor-disk-usage Metrics: Add support for monitoring disk usage via statfs	2026-06-12 00:41:31 +02:00
Fabiano Fidêncio	780c242bfd	Merge pull request #12832 from Apokleos/indep-iothreads runtime-rs: Add support Independent iothreads	2026-06-12 00:24:41 +02:00
Fabiano Fidêncio	cda6c8c6e0	tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI Increase memory request/limit values used by k8s memory and QoS integration workloads so SNP/TDX static-sized sandboxes boot reliably under the new sizing defaults. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 22:03:36 +02:00
Fabiano Fidêncio	17b9cdec1c	versions: Bump kernel to 6.18.35 Bump to the latest LTS. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 22:02:12 +02:00
Fabiano Fidêncio	46add95802	versions: Bump QEMU to 11.0.1 Bump QEMU to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 22:01:26 +02:00
Fabiano Fidêncio	9e597d33f2	tests: launch kata-monitor runc workload with explicit runtime The kata-monitor negative test creates a non-kata pod and asserts it does not appear in the kata-monitor cache (built from /run/vc/sbs, where only kata sandboxes register). However, the workload was started without a runtime handler, so it used containerd's default runtime, which in the CI containerd config is set to kata, so the "runc" pod was actually launched as a kata sandbox, registered under /run/vc/sbs, and tripped the assertion ("cache: got runc pod ..."). Start the workload with an explicit runc handler (configurable via RUNC_RUNTIME) so it is a genuine runc sandbox that never touches /run/vc/sbs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 21:59:53 +02:00
Alex Lyn	1034d7fc46	tests: Add support nydus tests for qemu-runtime-rs and clh-runtime-rs Enable the nydus tests for qemu-runtime-rs and clh-runtime-rs and make them compatible with both runtimes. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	e21621140f	ci: Add qemu-runtime-rs and clh-runtime-rs test with nydus It aims to enable nydus tests for qemu-runtime-rs and clh-runtime-rs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	4eb7512e7b	docs: Update how-to guide for virtio-fs-nydus with runtime-rs Add comprehensive documentation for using virtio-fs-nydus shared filesystem with Kata Containers. This guide covers: (1) Clarify configuration options for virtio-fs-nydus and nydus image preparation and usage. (2) Update daemon configuration and lifecycle management and introduce standalone, inline nydus architecture. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	fa84eecd2d	runtime-rs: Implement ShareVirtioFsNydus for standalone mode Introduce `ShareVirtioFsNydus` to enable standalone Nydus rootfs support. This implementation acts as the bridge between runtime-rs and the external `nydusd` daemon. Key Capabilities: (1) Trait Implementation: Implements `ShareFs` (for VM device/storage) and `NydusShareFs` (for RAFS lifecycle) traits. (2) Daemon Lifecycle Management: Handles `nydusd` spawning, supervision, and graceful shutdown. (3) Native Overlay Support: Configures `nydusd` with `passthrough_fs` backend to provide native overlay (upperdir/workdir) support. (4) API Integration: Utilizes `NydusClient` for granular control over RAFS mount/umount operations. (5) QEMU Integration: Enables `virtio-fs-nydus` device support, facilitating standalone mode execution. This implementation allows Kata containers to utilize an external `nydusd` process for Nydus rootfs management, providing a cleaner separation between the runtime and the Nydus daemon lifecycle. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	edfe9ea403	runtime-rs: refine ShareFs abstraction with lifecycle and Nydus traits Refactor the `ShareFs` trait to improve modularity and support standalone Nydus mode: (1) Added `stop()` method to manage daemon teardown. (2) Introduced a dedicated trait for Nydus-specific data-plane operations. This refactoring cleans up the `ShareFs` trait by consolidating daemon lifecycle handling and isolating Nydus-specific extensions, paving the way for cleaner standalone Nydus implementation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	720a8688b4	runtime-rs: Add daemon manager for nydusd process lifecycle Implement Nydusd to manage nydusd daemon process: (1) start: spawn process, validate paths, wait for API ready, setup passthrough fs. (2) stop: kill process, cleanup socket files. (3) mount_rafs/mount_rafs_with_overlay: high-level filesystem mount operations. (4) build_args: construct virtiofs mode command line arguments. This provides process lifecycle management with an internal NydusClient. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	c1ebf269f7	runtime-rs: Add nydus client for nydusd API communication via HTTP Implement NydusClient to interact with nydusd daemon via Unix socket: (1) check_status: query daemon state via GET /api/v1/daemon. (2) mount/umount: manage filesystem mounts via POST/DELETE /api/v1/mount. (3) wait_until_ready: poll daemon until RUNNING state. This provides a lightweight, stateless HTTP client layer for nydusd API. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	4c63b8e3de	agent: handle ENOSYS in overlayfs storage handler In standalone nydusd mode with virtio-fs passthrough, the guest-side mkdir may fail with ENOSYS. Update the overlayfs storage handler to skip directory creation when the directory already exists, logging a warning instead of failing. This ensures container rootfs setup succeeds when nydusd's native overlay manages the directory structure. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	8eb564dfb8	kata-sys-util: handle ENOSYS gracefully in mount destination creation When using virtio-fs with nydusd's passthrough_fs, mkdir operations may return ENOSYS on certain filesystem configurations. This causes mount destination creation to fail unexpectedly. Handle ENOSYS errors gracefully alongside AlreadyExists by verifying the directory exists after the failed mkdir attempt, allowing the mount to proceed if the directory is already present. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
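
The ENOSYS handling described in 8eb564dfb8 (treat a failed mkdir as success when the directory turns out to exist) can be sketched roughly as below. This is a minimal illustration, not the kata-sys-util code: the function name `ensure_mount_dir` and the hard-coded `ENOSYS` constant are assumptions made for this example, and real code would more likely take the errno value from the `libc` crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed errno value for "function not implemented" on most Linux
/// architectures; illustrative only (prefer `libc::ENOSYS` in real code).
const ENOSYS: i32 = 38;

/// Hypothetical helper: create a mount destination directory.
/// AlreadyExists and ENOSYS are tolerated, but only if the directory
/// is actually present after the failed mkdir attempt.
fn ensure_mount_dir(path: &Path) -> io::Result<()> {
    match fs::create_dir_all(path) {
        Ok(()) => Ok(()),
        Err(e)
            if e.kind() == io::ErrorKind::AlreadyExists
                || e.raw_os_error() == Some(ENOSYS) =>
        {
            // Verify instead of trusting the error: proceed only when the
            // destination really is a directory, otherwise surface the error.
            if path.is_dir() {
                Ok(())
            } else {
                Err(e)
            }
        }
        Err(e) => Err(e),
    }
}
```

Checking `path.is_dir()` after the failure keeps the fallback narrow: an ENOSYS on a path that does not exist, or that is a regular file, is still reported as an error.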


19365 Commits