kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 07:02:16 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	5d3e1e6396	kata-deploy: verify kata-runtime label remains stable on rke2/k3s The retry loop added in `efd468df3f` still allows the install to declare success while inside the kubelet's post-restart re-register window. On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]` every 2 s and returns on the first `True` observation it sees. By default the kubelet only publishes node status every ~10 s, so that first `True` is almost always the stale value from before the restart — the kubelet hasn't actually finished restarting yet. `label_node_with_retry` then applies the label, sleeps 1 s, reads back "true" (still stale, kubelet still down), and returns Ok. Install completes, `/readyz` flips to 200, helm releases its `--wait`, and the bats test starts — and only then does the kubelet finish coming up, re-register the node, and clobber the label with its cached set. The lifecycle test sees an empty `katacontainers.io/kata-runtime` and fails: # Node label katacontainers.io/kata-runtime: not ok 1 Kata artifacts are present on host after install A single-shot verification can't distinguish "still stale true" from "truly stable true after kubelet re-register". Replace it with a stability window: after (re)applying the label, require it to remain at the expected value for STABILITY_CHECKS=6 consecutive observations spaced CHECK_INTERVAL=2 s apart (≈ 12 s — comfortably more than the kubelet's status-update period). If the value ever drifts inside the window, re-apply and restart the stability counter. Bounded by MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s to install. Also add a short polling loop to the test's own label assertion as belt-and-suspenders for any leftover transient race, matching the existing retry pattern used for the container-runtime version check. 
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-22 11:53:18 +02:00
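The stability window from 5d3e1e6396 can be illustrated with a small, self-contained sketch. It is not the kata-deploy implementation: `label_until_stable`, the `read`/`apply` closures, and the `interval` parameter are hypothetical names chosen for illustration. Only the three constants come from the commit message.

```rust
use std::cell::{Cell, RefCell};
use std::time::Duration;

// Values quoted in the commit message.
const STABILITY_CHECKS: u32 = 6;
const MAX_APPLY_ATTEMPTS: u32 = 12;
const CHECK_INTERVAL: Duration = Duration::from_secs(2);

/// Hypothetical helper: (re)apply the label, then require STABILITY_CHECKS
/// consecutive matching reads. Any drift restarts the window with a new apply.
/// Returns the number of apply attempts that were needed.
fn label_until_stable<R, A>(
    expected: &str,
    mut read: R,
    mut apply: A,
    interval: Duration,
) -> Result<u32, String>
where
    R: FnMut() -> Option<String>,
    A: FnMut(&str),
{
    for attempt in 1..=MAX_APPLY_ATTEMPTS {
        apply(expected);
        let mut stable = 0;
        while stable < STABILITY_CHECKS {
            std::thread::sleep(interval);
            if read().as_deref() == Some(expected) {
                stable += 1;
            } else {
                break; // value drifted inside the window: re-apply
            }
        }
        if stable == STABILITY_CHECKS {
            return Ok(attempt);
        }
    }
    Err(format!("label not stable after {MAX_APPLY_ATTEMPTS} attempts"))
}

fn main() {
    // Simulated node label; the third read models the kubelet clobbering it.
    let label: RefCell<Option<String>> = RefCell::new(None);
    let reads = Cell::new(0u32);
    let read = || {
        reads.set(reads.get() + 1);
        if reads.get() == 3 {
            *label.borrow_mut() = None;
        }
        label.borrow().clone()
    };
    let apply = |v: &str| *label.borrow_mut() = Some(v.to_string());

    // A real caller would pass CHECK_INTERVAL; zero keeps the demo instant.
    let attempts = label_until_stable("true", read, apply, Duration::ZERO).unwrap();
    println!("window = {:?}", CHECK_INTERVAL * STABILITY_CHECKS);
    println!("stable after {attempts} apply attempt(s)");
}
```

In the simulated run the label is clobbered once inside the first window, so the sketch needs a second apply before the six consecutive matching reads succeed.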
Fabiano Fidêncio	efd468df3f	kata-deploy: retry node labeling after CRI restart On rke2/k3s a CRI restart also restarts the kubelet, which may briefly re-register the node with its cached label set and clobber the kata-runtime label that was just applied via the API. Replace the single label_node call with a retry loop that verifies the label value after setting it. If the label is missing or has the wrong value, it is re-applied (up to 10 attempts with 2 s back-off). This fixes a race condition that became more visible after the switch to individual tarball extraction, which made install take slightly longer and shifted the kubelet re-registration timing window. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
Fabiano Fidêncio	291e4d37be	kata-deploy: implement selective tarball extraction in installer Add zstd and tar as Rust dependencies and rewrite the artifact installation logic to extract only the component tarballs required by the enabled runtime classes. extract_component_tarballs reads shim-components.json to determine which kata-static-<name>.tar.zst files are needed for the selected shims and current architecture. Shared components (e.g. kernel, shim-v2-go) are listed by multiple shims and must only be unpacked once per install run. Deduplication is handled with an in-memory set passed through the call, avoiding any risk of stale on-disk state surviving across pod restarts. Within each tarball, opt/kata path prefixes are stripped and absolute symlink / hard-link targets are rewritten to point at the resolved installation directory, correctly handling MULTI_INSTALL_SUFFIX. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
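The once-per-run deduplication described in 291e4d37be can be sketched with an in-memory set. This is only an illustration: the real schema of shim-components.json is not shown in the commit message, so the shim-to-components map below, the function name `tarballs_to_extract`, and the sample shim and component names in `main` are assumptions.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical helper: list the kata-static-<name>.tar.zst files needed for
/// the enabled shims, skipping components already recorded in `seen`.
fn tarballs_to_extract(
    components_by_shim: &BTreeMap<&str, Vec<&str>>,
    enabled_shims: &[&str],
    seen: &mut HashSet<String>,
) -> Vec<String> {
    let mut out = Vec::new();
    for shim in enabled_shims {
        // Unknown shims are skipped here; a real installer might error out.
        let Some(components) = components_by_shim.get(shim) else {
            continue;
        };
        for name in components {
            // `insert` returns false when the component was already handled.
            if seen.insert(name.to_string()) {
                out.push(format!("kata-static-{name}.tar.zst"));
            }
        }
    }
    out
}

fn main() {
    // Made-up mapping; "kernel" and "shim-v2-go" are shared by both shims.
    let mut map = BTreeMap::new();
    map.insert("shim-a", vec!["shim-v2-go", "kernel", "hypervisor-a"]);
    map.insert("shim-b", vec!["shim-v2-go", "kernel", "hypervisor-b"]);

    let mut seen = HashSet::new();
    for tarball in tarballs_to_extract(&map, &["shim-a", "shim-b"], &mut seen) {
        println!("{tarball}");
    }
}
```

Because the set lives only in memory and is passed through the call, nothing persists across pod restarts, which is the property the commit message calls out.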
Fabiano Fidêncio	341a0d366c	kata-deploy: Fix containerd debug level path for config schema v4 Containerd 2.3 (config schema v4) uses the top-level [debug] table for log level configuration, not plugins."io.containerd.server.v1.debug" as was the case in the RC builds. Update containerd_debug_level_toml_path() to use .debug.level for all schema versions, matching the released containerd behavior. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-10 12:02:24 +02:00
Fabiano Fidêncio	9e99b21ec5	kata-deploy: re-exec into a tiny post-install waiter After install completes the kata-deploy DaemonSet pod has nothing else to do for the rest of its lifetime — it just blocks on SIGTERM and then runs cleanup. Up to here, the install path has built up substantial peak heap (kube clients, deserialised Node/RuntimeClass objects, hyper + rustls TLS pools, parsed JSON / YAML), and on musl essentially none of that is ever returned to the kernel. Idling in the same process therefore pins the pod's RSS at the install peak indefinitely. Re-exec the binary into a hidden `internal-post-install-wait` action the moment install succeeds. execve(2) discards the entire address space, so the waiter starts up holding only the working set it actually needs (a config struct, the SIGTERM handler, and the health server). To avoid a probe-availability gap during the handover the install process clears FD_CLOEXEC on the health listener and passes the raw FD to the child via KATA_DEPLOY_HEALTH_FD. The child reattaches the FD as a tokio TcpListener and resumes serving /healthz and /readyz without ever closing the socket — the kubelet sees no failure. The detected container runtime is similarly threaded through KATA_DEPLOY_DETECTED_RUNTIME so the waiter doesn't have to re-query the apiserver. The new action is tagged `#[clap(hide = true)]` so `--help` doesn't expose it; users should never invoke it directly. Add the FD-inheritance helpers in health.rs: - prepare_listener_for_exec(): clears FD_CLOEXEC on a listener and returns its raw fd number. - listener_from_inherited_fd(): wraps an inherited fd back into a tokio::net::TcpListener (and re-sets FD_CLOEXEC so future host shellouts don't leak the socket). Fixes: https://github.com/kata-containers/kata-containers/discussions/12976 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-07 13:40:55 +02:00
Fabiano Fidêncio	af03ab2228	kata-deploy: replace JSONPath node lookups with typed accessors The two pieces of node metadata kata-deploy actually reads are .status.nodeInfo.containerRuntimeVersion and a single label, both of which were being fetched through a homegrown JSONPath walker: - get_node_field() serialised the entire Node object back into a serde_json::Value tree on every call, - split_jsonpath() / get_jsonpath_value() then walked that tree by string key. Both the deep clone and the helpers themselves are unnecessary — kube's Node type is already strongly typed. Replace get_node_field() with two purpose-built accessors that read straight off the Node struct: - get_container_runtime_version(): pulls status.node_info.container_runtime_version with a clear error if the field isn't populated. - get_node_label(key): returns Option<String> directly from metadata.labels. Drop split_jsonpath, get_jsonpath_value, and their unit tests (which existed only to cover the JSONPath walker we no longer have). Update the three callers (config.rs, runtime/manager.rs, runtime/containerd.rs) to use the typed accessors. This removes the entire serde_json::Value clone-and-walk path from the hot read path and meaningfully cuts allocator churn during install. Fixes: https://github.com/kata-containers/kata-containers/discussions/12976 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-07 13:40:55 +02:00
Fabiano Fidêncio	6cd842494c	kata-deploy: cap the tokio worker pool to 2 threads The default #[tokio::main] expands with flavor = "multi_thread" and worker_threads = num_cpus::get(). On a typical NVIDIA GPU node (200+ vCPUs) that allocates 200+ worker threads with ~2 MiB stacks each, which is the single largest contributor to the DaemonSet pod's VmData reservation — hundreds of MiB of address space mapped but never touched, easily reproducing the "kata-deploy is using ~400 MB" reports on any monitoring layer that surfaces VSZ / committed virtual memory. Switch to a fixed two-worker multi-thread runtime instead: #[tokio::main(flavor = "multi_thread", worker_threads = 2)] Two workers is exactly the right number for kata-deploy: - the install path is overwhelmingly I/O-bound and runs serially; one worker is enough to drive the install future itself, - install does shell out to `nsenter --target 1 systemctl restart containerd` (and friends) via the synchronous std::process::Command::output(), which wedges the worker thread it runs on for tens of seconds; the second worker keeps the spawned health-server task able to answer kubelet probes inside timeoutSeconds while the first is blocked. flavor = "current_thread" would be tighter still on stacks (~4 MiB saved) but is fundamentally unsafe here: with a single runtime thread, any blocking host_systemctl call freezes the health server too, the kubelet fails the readiness probe, and the pod is restarted long before install completes. The CI lifecycle test reliably reproduces this as a 15-minute timeout waiting for the kata-deploy DaemonSet pod to become Ready. Net result vs. upstream's num_cpus()-driven pool on a 200-vCPU node: ~200 fewer worker threads, ~400 MiB less VmData reservation, while keeping kubelet probes responsive across the entire install path. Add the "sync" tokio feature here too so subsequent commits in the series can use tokio::sync primitives (OnceCell) without another features bump. 
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-07 13:40:55 +02:00
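The figures quoted in 6cd842494c follow from simple arithmetic, sketched below. The function name `worker_stack_reservation_mib` is hypothetical, and the ~2 MiB per-worker stack size is the approximate value stated in the commit message, not something measured here.

```rust
/// Approximate per-worker stack size quoted in the commit message.
const STACK_MIB_PER_WORKER: u64 = 2;

/// Hypothetical helper: address space reserved for worker stacks, in MiB.
fn worker_stack_reservation_mib(workers: u64) -> u64 {
    workers * STACK_MIB_PER_WORKER
}

fn main() {
    // Default pool on a 200-vCPU node: one worker per CPU.
    let default_pool = worker_stack_reservation_mib(200);
    // Fixed pool, as in:
    //   #[tokio::main(flavor = "multi_thread", worker_threads = 2)]
    let capped_pool = worker_stack_reservation_mib(2);

    println!("default pool: ~{default_pool} MiB");
    println!("capped pool:  ~{capped_pool} MiB");
    println!("saved:        ~{} MiB", default_pool - capped_pool);
}
```

The capped pool's ~4 MiB is also the amount the commit message says a `current_thread` runtime would save on top, which is why the two-worker choice costs so little.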
Fabiano Fidêncio	346119108e	kata-deploy: drop unused kube features The binary doesn't use kube::runtime (controllers, watchers, reflectors) or kube::derive (the CustomResource macro). Pulling them in only added transitive deps (kube-runtime, kube-derive, backon, educe, ahash, async-broadcast, ...) and inflated the binary's static data segment for no functional gain. Set default-features = false and select only what the binary actually calls into: the kube-client surface plus the rustls-tls backend that hyper-rustls already pulled in transitively. Behaviour is unchanged. Fixes: https://github.com/kata-containers/kata-containers/discussions/12976 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-07 13:40:55 +02:00
Fabiano Fidêncio	1682b73e38	kata-deploy: Add qemu-nvidia-gpu-tdx-runtime-rs shim Register the new qemu-nvidia-gpu-tdx-runtime-rs shim across the kata-deploy stack so it is built, installed, and exposed as a RuntimeClass. This adds the shim to the Rust binary's RUST_SHIMS list (so it uses the runtime-rs binary), SHIMS list, the qemu-tdx-experimental share name mapping, and the x86_64 default shim set. The Helm chart gets the new shim entry in values.yaml, try-kata-nvidia-gpu.values.yaml, and the RuntimeClass overhead definition in runtimeclasses.yaml. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-07 10:33:26 +02:00
Fabiano Fidêncio	2280620cb9	kata-deploy: Add qemu-nvidia-gpu-snp-runtime-rs shim Register the new qemu-nvidia-gpu-snp-runtime-rs shim across the kata-deploy stack so it is built, installed, and exposed as a RuntimeClass. This adds the shim to the Rust binary's RUST_SHIMS list (so it uses the runtime-rs binary), SHIMS list, the qemu-snp-experimental share name mapping, and the x86_64 default shim set. The Helm chart gets the new shim entry in values.yaml, try-kata-nvidia-gpu.values.yaml, and the RuntimeClass overhead definition in runtimeclasses.yaml. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-07 10:33:26 +02:00
Fabiano Fidêncio	92a8cd56d1	kata-deploy: Add qemu-nvidia-gpu-runtime-rs shim Register the Rust NVIDIA GPU runtime as a kata-deploy shim so it gets installed and configured alongside the existing Go-based qemu-nvidia-gpu shim. Add qemu-nvidia-gpu-runtime-rs to the RUST_SHIMS list and the default enabled shims, create its RuntimeClass entry in the Helm chart, and include it in the try-kata-nvidia-gpu values overlay. The kata-deploy installer will now copy the runtime-rs configuration and create the containerd runtime entry for it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-07 10:33:26 +02:00
Fabiano Fidêncio	27c3dfbb8c	Merge pull request #12943 from fidencio/topic/kata-deploy-add-http-health-probes kata-deploy: add HTTP health probes (healthz/readyz)	2026-05-05 09:30:17 +02:00
Fabiano Fidêncio	49396b7991	kata-deploy: add HTTP health probes (healthz/readyz) The kata-deploy DaemonSet pod had no Kubernetes health probes, so the kubelet could not distinguish between "still installing" and "crashed", and rolling updates would proceed to the next node before install actually finished. Add a lightweight HTTP health server (built on raw tokio TcpListener, no new crate dependencies) that starts immediately in the install path: /healthz — liveness: returns 200 as soon as the server binds /readyz — readiness: returns 503 while installing, 200 after install completes (artifacts extracted, CRI restarted, node labeled) Wire the Helm chart with startup, liveness, and readiness probes (all individually toggleable). The startup probe allows up to 10 minutes for install to complete before the liveness probe takes over. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-03 22:09:08 +02:00
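The probe semantics introduced in 49396b7991 can be sketched as a pure routing function. This is a simplified illustration using only the standard library: the actual server is built on a tokio TcpListener, and the names `status_for` and `http_response`, the 404 for unknown paths, and the exact response headers are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical router: /healthz is always 200 once the server is up,
/// /readyz is 503 until install has completed.
fn status_for(path: &str, ready: &AtomicBool) -> u16 {
    match path {
        "/healthz" => 200,
        "/readyz" if ready.load(Ordering::SeqCst) => 200,
        "/readyz" => 503,
        _ => 404, // assumption: unknown paths are rejected
    }
}

/// Builds a minimal HTTP/1.1 response for a request line such as
/// "GET /readyz HTTP/1.1".
fn http_response(request_line: &str, ready: &AtomicBool) -> String {
    let path = request_line.split_whitespace().nth(1).unwrap_or("/");
    let status = status_for(path, ready);
    let reason = match status {
        200 => "OK",
        503 => "Service Unavailable",
        _ => "Not Found",
    };
    format!("HTTP/1.1 {status} {reason}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
}

fn main() {
    let ready = AtomicBool::new(false);
    print!("{}", http_response("GET /readyz HTTP/1.1", &ready));

    // Install finished: artifacts extracted, CRI restarted, node labeled.
    ready.store(true, Ordering::SeqCst);
    print!("{}", http_response("GET /readyz HTTP/1.1", &ready));
}
```

Keeping the readiness flag in an atomic lets the install path flip it from another task without locking, which fits a server that has to keep answering probes while install is still running.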
Saul Paredes	cbb06545f7	kata-deploy: configure_mariner: also apply test config to runtime-rs Apply same test configs we use in runtime-go config to runtime-rs config. These are: - runtime.static_sandbox_resource_mgmt = true - hypervisor.clh.valid_hypervisor_paths includes cloud-hypervisor-glibc - hypervisor.clh.path = cloud-hypervisor-glibc Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-05-01 08:15:52 -07:00
Saul Paredes	564d381b79	kata-deploy: configure_mariner: correctly set static_sandbox_resource_mgmt static_sandbox_resource_mgmt is under the runtime config, not the hypervisor one. See `31f7438ecd/src/runtime/config/configuration-clh.toml.in (L439)` Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-05-01 08:15:52 -07:00
Fabiano Fidêncio	96b68e77a7	kata-deploy: support containerd config schema version 4 and newer Containerd 2.3.0 introduces config schema version 4 (see upstream RELEASES.md and the version-4 server-plugin documentation). The default file still uses the same split-CRI layout as version 3 (plugins under io.containerd.cri.v1.runtime and io.containerd.cri.v1.images). Schema v4 mainly moves gRPC, TTRPC, debug, and metrics listener settings under io.containerd.server.v1.*; kata-deploy does not edit those server tables except for containerd log verbosity when DEBUG=true. Fixes: #12936 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-30 16:23:43 +02:00
Aurélien Bombo	dc0f1795de	kata-deploy: remove useless unit tests These essentially merely test format!(), which is not our job. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-28 10:58:01 -05:00
Aurélien Bombo	cf6a91a104	runtime-rs/config: rename cloud-hypervisor to clh This aligns on the previous commit and runtime-go. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-28 10:58:01 -05:00
Aurélien Bombo	e4fbddb91a	ci: rename cloud-hypervisor to clh-runtime-rs This aligns on qemu-runtime-rs and makes more sense. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-28 10:58:01 -05:00
stevenhorsman	9d2bb4518f	kata-deploy: Update MSRV to match workspace Update the kata-deploy Cargo.toml to use the workspace wide MSRV, so it's easy to track and bump as and when necessary. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-25 11:27:39 +01:00
stevenhorsman	9fbdf513ca	kata-deploy: Delete Cargo.lock In #12776 kata-deploy's binary was moved to the main cargo workspace, but the Cargo.lock wasn't deleted. As it now shares the main Cargo.lock, tidy this up. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-20 17:09:21 +01:00
Fabiano Fidêncio	d6f0b15578	ci: erofs: restrict to runtime-rs only The erofs snapshotter configuration is node-wide (a single containerd drop-in) and cannot be split per runtime handler. The Go runtime does not support fsmerged EROFS — it rejects fsmeta.erofs mount sources with "unsupported mount source" — so erofs is only usable with runtime-rs. Drop qemu-coco-dev (Go) from the erofs CI matrix and add a check in kata-deploy's configure_erofs_snapshotter() that inspects the SNAPSHOTTER_HANDLER_MAPPING: if any Go shim is explicitly mapped to erofs, emit a prominent warning and bail out with a clear error telling the operator to fix the mapping. Since all shims are now guaranteed to be runtime-rs when erofs is active, remove the conditional is_rust_shim gating and always emit the full erofs configuration (differ options, default_size, max_unmerged_layers=1). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-19 13:24:31 +02:00
Alex Lyn	7f7cca16fa	kata-deploy: Complete containerd config for erofs snapshotter Add missing containerd configuration items for erofs snapshotter to enable fsmerged erofs feature: Add snapshotter plugin configuration: - default_size: "10G" # can be customized - max_unmerged_layers: 1 # Fixed with 1 These configurations align with the documentation in docs/how-to/how-to-use-fsmerged-erofs-with-kata.md Step 2, ensuring the CI workflow run-k8s-tests-coco-nontee-with-erofs-snapshotter can properly configure containerd for erofs fsmerged rootfs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-19 13:24:31 +02:00
Fabiano Fidêncio	588a67a3fb	kata-deploy: add arm64 support for qemu-coco-dev shims Add aarch64/arm64 to the list of supported architectures for qemu-coco-dev and qemu-coco-dev-runtime-rs shims across kata-deploy configuration, Helm chart values, and test helper scripts. Note that guest-components and the related build dependencies are not yet wired for arm64 in these configurations; those will be addressed separately. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-18 00:48:13 +02:00
Fabiano Fidêncio	64c139208f	agent: add GetDiagnosticData RPC with termination log support Add a new extensible GetDiagnosticData RPC that retrieves diagnostic information from the guest VM. The request carries a log_type string field to specify what kind of data is requested, and a container_id field to identify the target container. The first supported log_type is "termination_log", which reads the Kubernetes termination message file from inside the guest. This is needed for shared_fs=none configurations where the host cannot directly access the guest filesystem. On the Go runtime side, the container stop() path now calls GetDiagnosticData to copy the termination message to the host when running with NoSharedFS and the terminationMessagePolicy annotation is set to "File". The call is best-effort: failures are logged as warnings rather than blocking container teardown. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2026-04-17 13:01:13 +02:00
Fabiano Fidêncio	df1d02d3cf	kata-deploy: Allow overriding containerd config path and file name Add two new Helm values under `containerd`: - `configDir`: overrides the host directory where the containerd config lives, taking precedence over the k8sDistribution-based auto-detection. - `configFileName`: overrides the containerd config file name, propagated to the kata-deploy binary via the new CONTAINERD_CONFIG_FILE_NAME environment variable. These are useful for non-standard containerd setups that don't match any of the built-in k8sDistribution presets (k8s, k3s, rke2, k0s, microk8s). The config file name override only affects the default runtime branch in get_containerd_paths(). The k0s/microk8s/k3s/rke2 branches are left untouched since those runtimes have mandatory file naming conventions. Also fixes a spurious leading space in the k3s containerdConfPath branch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-13 22:31:55 +02:00
Fabiano Fidêncio	72fb41d33b	kata-deploy: Symlink original config to per-shim runtime copy Users were confused about which configuration file to edit because kata-deploy copied the base config into a per-shim runtime directory (runtimes/<shim>/) for config.d support, leaving the original file in place untouched. This made it look like the original was the authoritative config, when in reality the runtime was loading the copy from the per-shim directory. Replace the original config file with a symlink pointing to the per-shim runtime copy after the copy is made. The runtime's ResolvePath / EvalSymlinks follows the symlink and lands in the per-shim directory, where it naturally finds config.d/ with all drop-in fragments. This makes it immediately obvious that the real configuration lives in the per-shim directory and removes the ambiguity about which file to inspect or modify. During cleanup, the symlink at the original location is explicitly removed before the runtime directory is deleted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 17:16:40 +02:00
Fabiano Fidêncio	21466eb4e5	kata-deploy: Fix clippy warnings across crate Fix all clippy warnings triggered by -D warnings: - install.rs: remove useless .into() conversions on PathBuf values and replace vec! with an array literal where a Vec is not needed - utils/toml.rs: replace while-let-on-iterator with a for loop and drop the now-unnecessary mut on the iterator binding - main.rs: replace match-with-single-pattern with if-let in two places dealing with experimental_setup_snapshotter - utils/yaml.rs: extract repeated serde_yaml::Value::String key into a local variable, removing needless borrows on temporary values Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 20:47:59 +02:00
Fabiano Fidêncio	f27def1a5b	kata-deploy: Skip snapshotter install/uninstall on CRI-O Snapshotters (nydus, erofs) are containerd-specific. The validation code already warned that EXPERIMENTAL_SETUP_SNAPSHOTTER would be ignored on CRI-O, but the actual install/configure and uninstall loops still ran unconditionally, attempting containerd-specific operations on CRI-O nodes. Guard both the install and cleanup snapshotter loops with a `runtime != "crio"` check so the binary itself skips snapshotter work when it detects CRI-O as the container runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-08 14:41:49 +02:00
Alex Lyn	c00f895338	kata-deploy: Fix noise caused by unformatted code Running cargo fmt --all reformats some files that were committed unformatted. This commit only applies that formatting. For example: ``` // Generate the common drop-in files (shared with standard // runtimes) - write_common_drop_ins(config, &runtime.base_config, &config_d_dir, container_runtime)?; + write_common_drop_ins( + config, + &runtime.base_config, + &config_d_dir, + container_runtime, + )?; ``` Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-08 14:39:57 +02:00
Fabiano Fidêncio	9e1f595160	kata-deploy: add Rust binary to root workspace Add tools/packaging/kata-deploy/binary as a workspace member, inherit shared dependency versions from the root manifest, and refresh Cargo.lock. Build the kata-deploy image from the repository root: copy the workspace layout into the rust-builder stage, run cargo test/build with -p kata-deploy, and adjust artifact and static asset COPY paths. Update the payload build script to invoke docker buildx with -f .../Dockerfile from the repo root. Add a repo-root .dockerignore to keep the Docker build context smaller. Document running unit tests with cargo test -p kata-deploy from the root. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-07 10:07:06 +08:00
Fabiano Fidêncio	b4b62417ed	kata-deploy: skip cleanup on pod restart to avoid crashing kata pods When a kata-deploy DaemonSet pod is restarted (e.g. due to a label change or rolling update), the SIGTERM handler runs cleanup which unconditionally removes kata artifacts and restarts containerd. This causes containerd to lose the kata shim binary, crashing all running kata pods on the node. Fix this by implementing a three-stage cleanup decision: 1. If this pod's owning DaemonSet still exists (exact name match via DAEMONSET_NAME env var), this is a pod restart — skip all cleanup. The replacement pod will re-run install, which is idempotent. 2. If this DaemonSet is gone but other kata-deploy DaemonSets still exist (multi-install scenario), perform instance-specific cleanup only (snapshotters, CRI config, artifacts) but skip shared resources (node label removal, CRI restart) to avoid disrupting the other instances. 3. If no kata-deploy DaemonSets remain, perform full cleanup including node label removal and CRI restart. The Helm chart injects a DAEMONSET_NAME environment variable with the exact DaemonSet name (including any multi-install suffix), ensuring instance-aware lookup rather than broadly matching any DaemonSet containing "kata-deploy". Fixes: #12761 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 15:20:52 +02:00
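The three-stage cleanup decision from b4b62417ed maps naturally onto an enum. The sketch below is illustrative only: `CleanupMode`, `decide_cleanup`, and the sample DaemonSet names are hypothetical, and it assumes the caller has already listed the kata-deploy DaemonSets that still exist.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CleanupMode {
    /// Our DaemonSet still exists: this is a pod restart, do nothing.
    Skip,
    /// Ours is gone but other instances remain: instance-specific cleanup only.
    InstanceOnly,
    /// No kata-deploy DaemonSets remain: also remove the label and restart CRI.
    Full,
}

/// `own` is the exact name injected via the DAEMONSET_NAME env var;
/// `existing` is assumed to contain only kata-deploy DaemonSets.
fn decide_cleanup(own: &str, existing: &[&str]) -> CleanupMode {
    if existing.iter().any(|name| *name == own) {
        CleanupMode::Skip
    } else if !existing.is_empty() {
        CleanupMode::InstanceOnly
    } else {
        CleanupMode::Full
    }
}

fn main() {
    let own = "kata-deploy-a"; // made-up name with a multi-install suffix
    println!("{:?}", decide_cleanup(own, &["kata-deploy-a", "kata-deploy-b"]));
    println!("{:?}", decide_cleanup(own, &["kata-deploy-b"]));
    println!("{:?}", decide_cleanup(own, &[]));
}
```

The exact-name comparison is the important part: a substring match on "kata-deploy" would treat another instance's DaemonSet as this pod's owner and wrongly skip cleanup.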
Fabiano Fidêncio	28414a614e	kata-deploy: detect k3s/rke2 via systemd services instead of version string Newer k3s releases (v1.34+) no longer include "k3s" in the containerd version string at all (e.g. "containerd://2.2.2-bd1.34" instead of "containerd://2.1.5-k3s1"). This caused kata-deploy to fall through to the default "containerd" runtime, configuring and restarting the system containerd service instead of k3s's embedded containerd — leaving the kata runtime invisible to k3s. Fix by detecting k3s/rke2 via their systemd service names (k3s, k3s-agent, rke2-server, rke2-agent) rather than parsing the containerd version string. This is more robust and works regardless of how k3s formats its containerd version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 14:24:55 +02:00
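Service-name detection as described in 28414a614e might look like the sketch below. The four service names come from the commit message. The function name `detect_runtime`, the returned identifiers, the order of the checks, and the idea of passing in a pre-collected list of active services are assumptions for illustration.

```rust
/// Hypothetical detector: decide which containerd flavour to configure from
/// the systemd services that are active on the host.
fn detect_runtime(active_services: &[&str]) -> &'static str {
    const K3S_SERVICES: [&str; 2] = ["k3s", "k3s-agent"];
    const RKE2_SERVICES: [&str; 2] = ["rke2-server", "rke2-agent"];

    let is_active = |candidates: &[&str]| {
        candidates.iter().any(|svc| active_services.contains(svc))
    };

    if is_active(&RKE2_SERVICES) {
        "rke2"
    } else if is_active(&K3S_SERVICES) {
        "k3s"
    } else {
        // Fall through to the system containerd service.
        "containerd"
    }
}

fn main() {
    // The containerd version string no longer needs to contain "k3s".
    println!("{}", detect_runtime(&["sshd", "k3s-agent"]));
    println!("{}", detect_runtime(&["rke2-server"]));
    println!("{}", detect_runtime(&["containerd", "kubelet"]));
}
```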
Fabiano Fidêncio	4fad88499c	kata-deploy: rename nydus-snapshotter to nydus-for-kata-tee Rename all host-visible names of the nydus-snapshotter instance managed by kata-deploy from the generic "nydus-snapshotter" to "nydus-for-kata-tee". This covers the systemd service name, the containerd proxy plugin key, the runtime class snapshotter field, the data directory (/var/lib/nydus-for-kata-tee), the socket path (/run/nydus-for-kata-tee/), and the host install subdirectory. The rename makes it immediately clear that this nydus-snapshotter instance is the one deployed and managed by kata-deploy specifically for Kata TEE use cases, rather than any general-purpose nydus-snapshotter that might be present on the host. Because the old code operated under a completely separate set of paths (nydus-snapshotter.*), any previously deployed installation continues to run without interference during the transition to this new naming. CI pipelines and operators can upgrade kata-deploy on their own schedule without having to coordinate an atomic cutover: the old service keeps serving its existing workloads until it is explicitly replaced, and the new deployment lands cleanly alongside it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-27 11:14:54 +01:00
Fabiano Fidêncio	fb5482f647	kata-deploy: nydus: never remove the data directory Removing /var/lib/nydus-snapshotter during install or uninstall creates a split-brain state: the nydus backend starts empty while containerd's BoltDB (meta.db) still holds snapshot records from the previous run. Any subsequent image pull then fails with: "unable to prepare extraction snapshot: target snapshot \"sha256:...\": already exists" An earlier attempt cleaned up containerd's BoltDB via `ctr snapshots rm` before wiping the directory, but that cleanup is inherently fragile: - It requires the nydus gRPC service to be reachable at cleanup time. If the service is stopped, crashed, or not yet running, every `ctr` call silently fails and the stale records remain. - Any workload still actively using a snapshot blocks the entire cleanup, making it impossible to guarantee a clean state. The correct invariant is that meta.db and the nydus backend always agree. Preserving the data directory unconditionally guarantees this: - Fresh install: data directory does not exist, nydus starts empty. - Reinstall: existing snapshots and nydus.db are preserved, meta.db and backend remain in sync, new binary starts cleanly. - After uninstall: containerd is reconfigured without the nydus proxy_plugins entry and restarted, so the snapshot records in meta.db are completely dormant — nothing will use them. If nydus is reinstalled later, the data directory is still present and both sides remain in sync, so no split-brain can occur. Any stale snapshots from previous workloads are garbage-collected by containerd once the images referencing them are removed. This also removes the cleanup_containerd_nydus_snapshots, cleanup_nydus_snapshots, and cleanup_nydus_containers helpers that were introduced by the earlier (fragile) attempt. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-03-25 07:06:41 +01:00
Fabiano Fidêncio	fd583d833b	kata-deploy: nydus: clean containerd metadata before wiping backend When /var/lib/nydus-snapshotter is removed, containerd's BoltDB (meta.db at /var/lib/containerd/) still holds snapshot records for the nydus snapshotter. On the next install these stale records cause image pulls to fail with: "unable to prepare extraction snapshot: target snapshot \"sha256:...\": already exists" The failure path in core/unpack/unpacker.go: 1. sn.Prepare() → metadata layer finds the target chainID in BoltDB → returns AlreadyExists without touching the nydus backend. 2. sn.Stat() → metadata layer finds the BoltDB record, then calls s.Snapshotter.Stat(bkey) on the nydus gRPC backend → NotFound (backend was wiped). 3. The unpacker treats NotFound as a transient key-collision race and retries 3 times; all 3 attempts hit the same dead end, and the pull is aborted. The commit message of `62ad0814c` ("nydus: Always start from a clean state") assumed "containerd will re-pull/re-unpack when it finds non-existent snapshots", but that is not what happens: the metadata layer intercepts the Prepare call in BoltDB before the backend is ever consulted. Fix: call cleanup_containerd_nydus_snapshots() before stopping the nydus service (and thus before wiping its data directory) in both install_nydus_snapshotter and uninstall_nydus_snapshotter. The cleanup must run while the service is still up because ctr snapshots rm goes through the metadata layer which calls the nydus gRPC backend to physically remove the snapshot; if the service is already stopped the backend call fails and the BoltDB record remains. The cleanup: - Discovers all containerd namespaces via `ctr namespaces ls -q` (falls back to k8s.io if that fails). - Removes containers whose Snapshotter field matches the nydus plugin name; these become dangling references once snapshots are gone and can confuse container reconciliation after an aborted CI run. 
- Removes snapshots round by round (leaf-first) until either the list is empty or no progress can be made (see below). Note: containerd's GC cannot substitute for this explicit cleanup. The image record (a GC root) references content blobs which reference the snapshots via gc.ref labels, keeping the entire chain alive in the GC graph even after the nydus backend is wiped. Snapshot removal rounds ----------------------- Snapshot chains are linear: an image with N layers produces a chain of N snapshots, each parented on the previous. Only the current leaf can be removed each round, so N layers require exactly N rounds. There is no fixed round cap — the loop terminates when either the list reaches zero (success) or a round removes nothing at all (all remaining snapshots are actively in use by running workloads). Active workload safety ---------------------- If active workloads still hold nydus snapshots (e.g. during a live upgrade), no progress is made in a round and cleanup_nydus_snapshots returns false. Both install_nydus_snapshotter and uninstall_nydus_snapshotter gate the fs::remove_dir_all on that return value: - true → proceed as before: stop service, wipe data dir. - false → stop service, skip data dir removal, log a warning. The new nydus instance starts on the existing backend state; running containers are left intact. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-03-24 16:44:25 +01:00
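The leaf-first removal rounds described in fd583d833b can be modelled in memory as below. This is a simulation, not the real cleanup: the actual code drives `ctr snapshots rm`, whereas here the snapshot table, the `in_use` set, and the name `remove_snapshots_leaf_first` are hypothetical stand-ins.

```rust
use std::collections::{HashMap, HashSet};

/// Simulated cleanup. `snaps` maps snapshot name -> parent name.
/// Each round removes every snapshot that has no children and is not in use.
/// Returns (all_removed, rounds_that_made_progress).
fn remove_snapshots_leaf_first(
    snaps: &mut HashMap<String, Option<String>>,
    in_use: &HashSet<String>,
) -> (bool, usize) {
    let mut rounds = 0;
    loop {
        if snaps.is_empty() {
            return (true, rounds); // success: safe to wipe the data dir
        }
        // Anything referenced as a parent still has a child and is not a leaf.
        let parents: HashSet<String> = snaps.values().flatten().cloned().collect();
        let removable: Vec<String> = snaps
            .keys()
            .filter(|name| !parents.contains(*name) && !in_use.contains(*name))
            .cloned()
            .collect();
        if removable.is_empty() {
            return (false, rounds); // no progress: keep the data dir
        }
        for name in removable {
            snaps.remove(&name);
        }
        rounds += 1;
    }
}

fn main() {
    // A three-layer image: layer1 <- layer2 <- layer3 (the leaf).
    let mut snaps = HashMap::new();
    snaps.insert("layer1".to_string(), None);
    snaps.insert("layer2".to_string(), Some("layer1".to_string()));
    snaps.insert("layer3".to_string(), Some("layer2".to_string()));

    let (done, rounds) = remove_snapshots_leaf_first(&mut snaps, &HashSet::new());
    println!("all removed: {done}, rounds: {rounds}");
}
```

With a linear three-layer chain the simulation takes three rounds, matching the commit's statement that N layers need N rounds. If the leaf is marked in use, the first round removes nothing and the function reports failure, which is the condition that gates the data directory removal.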
dependabot[bot]	8df9cf35df	build(deps): bump rustls-webpki in /tools/packaging/kata-deploy/binary Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.103.8 to 0.103.10. - [Release notes](https://github.com/rustls/webpki/releases) - [Commits](https://github.com/rustls/webpki/compare/v/0.103.8...v/0.103.10) --- updated-dependencies: - dependency-name: rustls-webpki dependency-version: 0.103.10 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2026-03-23 10:34:27 +00:00
Fabiano Fidêncio	7a08ef2f8d	kata-deploy: run cleanup on SIGTERM instead of preStop hook Move the cleanup logic from a preStop lifecycle hook (separate exec) into the main process's SIGTERM handler. This simplifies the architecture: the install process now handles its own teardown when the pod is terminated. The SIGTERM handler is registered before install begins, and tokio::select! races install against SIGTERM so cleanup always runs even if SIGTERM arrives mid-install (e.g. helm uninstall while the container is restarting after a failed install attempt). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	01895bf87e	kata-deploy: use k3s/rke2 drop-in Check the rendered containerd config for the versioned drop-in dir import (config.toml.d or config-v3.toml.d) and bail with a clear error if it is missing. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:08:26 +01:00
Steve Horsman	b147cb1319	Merge pull request #12587 from fidencio/topic/runtime-add-configurable-kubelet-root-dir runtimes: add configurable kubelet root dir	2026-02-28 19:06:14 +00:00
Fabiano Fidêncio	330bfff4be	kata-deploy: Fix nydus snapshotter config (on v3 config version) On containerd v3 config, disable_snapshot_annotations must be set under the images plugin, not the runtime plugin. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-27 18:20:30 +01:00
Fabiano Fidêncio	0a73638744	runtime: add configurable kubelet root dir Different kubernetes distributions, such as k0s, use a different kubelet root dir location instead of the default /var/lib/kubelet, so ConfigMap and Secret volume propagation were failing. This adds a kubelet_root_dir config option that the go runtime uses when matching volume paths and kata-deploy now sets it automatically for k0s via a drop-in file. runtime-rs does not need this option: it identifies ConfigMap/Secret, projected, and downward-api volumes by volume-type path segment (kubernetes.io~configmap, etc.), not by kubelet root prefix. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-27 14:10:57 +01:00
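The segment-based matching that 0a73638744 attributes to runtime-rs might look roughly like the sketch below. It is not the runtime-rs code: the function name `has_k8s_volume_segment` is hypothetical, and only `kubernetes.io~configmap` is named in the commit message; the other three segment names are assumptions based on the usual kubelet volume plugin directories.

```rust
use std::path::Path;

/// Volume-type path segments. Only the configmap entry appears in the commit
/// message; the remaining names are assumed kubelet plugin directory names.
const K8S_VOLUME_SEGMENTS: [&str; 4] = [
    "kubernetes.io~configmap",
    "kubernetes.io~secret",
    "kubernetes.io~projected",
    "kubernetes.io~downward-api",
];

/// Returns true if any path component is one of the known volume-type
/// segments, regardless of which kubelet root dir the path starts with.
fn has_k8s_volume_segment(path: &Path) -> bool {
    path.components()
        .filter_map(|c| c.as_os_str().to_str())
        .any(|seg| K8S_VOLUME_SEGMENTS.contains(&seg))
}
```

Because the check ignores the prefix, a path under a non-default kubelet root (as on k0s) matches just as one under /var/lib/kubelet does, which is why no kubelet_root_dir option is needed with this approach.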
stevenhorsman	82c27181d8	kata-deploy: Remove unused crates cargo machete has identified `serde` and `thiserror` as being unused, so remove them from Cargo.toml Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-26 09:38:35 +00:00
Fabiano Fidêncio	34336f87c7	kata-deploy: convert install.rs get_hypervisor_name tests to rstest Use rstest parameterized tests for QEMU variants, other hypervisors, and unknown/empty shim cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:41:55 +01:00
Fabiano Fidêncio	bb11bf0403	kata-deploy: preserve symlinks when installing artifacts When copying artifacts from the container to the host, detect source entries that are symlinks and recreate them as symlinks at the destination instead of copying the target file. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:29:14 +01:00
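One way to implement the symlink handling described in bb11bf0403 is sketched below, using only the standard library on Unix. The function name `copy_preserving_symlink` is hypothetical, and removing a pre-existing destination entry is an assumption of this sketch, not something the commit message states.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Copies one entry, recreating symlinks instead of following them (Unix only).
fn copy_preserving_symlink(src: &Path, dst: &Path) -> io::Result<()> {
    // symlink_metadata inspects the link itself; fs::metadata would follow it.
    let meta = fs::symlink_metadata(src)?;
    if meta.file_type().is_symlink() {
        let target = fs::read_link(src)?;
        // Assumption: a stale destination entry is replaced, because
        // symlink() fails if the destination already exists.
        if fs::symlink_metadata(dst).is_ok() {
            fs::remove_file(dst)?;
        }
        symlink(target, dst)
    } else {
        // Regular file: plain copy of the contents.
        fs::copy(src, dst).map(|_| ())
    }
}
```

The link target is written back exactly as read, so a relative link stays relative at the destination.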
Fabiano Fidêncio	f0a0425617	kata-deploy: convert a few toml.rs tests to rstest Turn test_toml_value_types into a parameterized test with one case per type (string, bool, int). Merge the two invalid-TOML tests (get and set) into one rstest with two cases, and the two "not an array" tests into one rstest with two cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	899005859c	kata-deploy: avoid leading/blank lines in written TOML config When writing containerd drop-in or other TOML (e.g. initially empty file), the serialized document could start with many newlines. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
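A minimal sketch of the kind of post-processing 899005859c describes, assuming the fix amounts to dropping blank lines at the start of the serialized document before writing it. The helper name `strip_leading_blank_lines` is hypothetical, and the actual change may work differently.

```rust
/// Drops blank (empty or whitespace-only) lines from the start of a
/// serialized document; everything after the first non-blank line is kept.
fn strip_leading_blank_lines(serialized: &str) -> &str {
    let mut rest = serialized;
    while let Some((first, tail)) = rest.split_once('\n') {
        if first.trim().is_empty() {
            // Skip this blank line and keep scanning.
            rest = tail;
        } else {
            break;
        }
    }
    rest
}
```

Blank lines between tables are left alone, so only the leading run of newlines is affected.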
Fabiano Fidêncio	cfa8188cad	kata-deploy: convert containerd version support tests to rstest Replace multiple #[test] functions for snapshotter and erofs version checks with parameterized #[rstest] #[case] tests for consistency and easier extension. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cadac7a960	kata-deploy: runtime_platform -> runtime_platforms Fix runtime_platforms typo. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	d8acc403c8	kata-deploy: set CRI images runtime_platform snapshotter for containerd v3 In containerd config v3 the CRI plugin is split into runtime and images, and setting the snapshotter only on the runtime plugin is not enough for image pull/prepare. The images plugin must have runtime_platform.<runtime>.snapshotter so it uses the correct snapshotter per runtime (e.g. nydus, erofs). A PR on the containerd side is open so we can rely on the runtime plugin snapshotter alone: https://github.com/containerd/containerd/pull/12836 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 22:15:02 +01:00


82 Commits