kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Alex Lyn	fa84eecd2d	runtime-rs: Implement ShareVirtioFsNydus for standalone mode Introduce `ShareVirtioFsNydus` to enable standalone Nydus rootfs support. This implementation acts as the bridge between runtime-rs and the external `nydusd` daemon. Key Capabilities: (1) Trait Implementation: Implements `ShareFs` (for VM device/storage) and `NydusShareFs` (for RAFS lifecycle) traits. (2) Daemon Lifecycle Management: Handles `nydusd` spawning, supervision, and graceful shutdown. (3) Native Overlay Support: Configures `nydusd` with `passthrough_fs` backend to provide native overlay (upperdir/workdir) support. (4) API Integration: Utilizes `NydusClient` for granular control over RAFS mount/umount operations. (5) QEMU Integration: Enables `virtio-fs-nydus` device support, facilitating standalone mode execution. This implementation allows Kata containers to utilize an external `nydusd` process for Nydus rootfs management, providing a cleaner separation between the runtime and the Nydus daemon lifecycle. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	edfe9ea403	runtime-rs: refine ShareFs abstraction with lifecycle and Nydus traits Refactor the `ShareFs` trait to improve modularity and support standalone Nydus mode: (1) Added `stop()` method to manage daemon teardown. (2) Introduced a dedicated trait for Nydus-specific data-plane operations. This refactoring cleans up the `ShareFs` trait by consolidating daemon lifecycle handling and isolating Nydus-specific extensions, paving the way for cleaner standalone Nydus implementation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	720a8688b4	runtime-rs: Add daemon manager for nydusd process lifecycle Implement Nydusd to manage nydusd daemon process: (1) start: spawn process, validate paths, wait for API ready, set up passthrough fs. (2) stop: kill process, clean up socket files. (3) mount_rafs/mount_rafs_with_overlay: high-level filesystem mount operations. (4) build_args: construct virtiofs mode command line arguments. This provides process lifecycle management with internal NydusClient. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	c1ebf269f7	runtime-rs: Add nydus client for nydusd API communication via HTTP Implement NydusClient to interact with nydusd daemon via Unix socket: (1) check_status: query daemon state via GET /api/v1/daemon. (2) mount/umount: manage filesystem mounts via POST/DELETE /api/v1/mount. (3) wait_until_ready: poll daemon until RUNNING state. This provides a lightweight, stateless HTTP client layer for nydusd API. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	4c63b8e3de	agent: handle ENOSYS in overlayfs storage handler In standalone nydusd mode with virtio-fs passthrough, the guest-side mkdir may fail with ENOSYS. Update the overlayfs storage handler to skip directory creation when the directory already exists, logging a warning instead of failing. This ensures container rootfs setup succeeds when nydusd's native overlay manages the directory structure. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	8eb564dfb8	kata-sys-util: handle ENOSYS gracefully in mount destination creation When using virtio-fs with nydusd's passthrough_fs, mkdir operations may return ENOSYS on certain filesystem configurations. This causes mount destination creation to fail unexpectedly. Handle ENOSYS errors gracefully alongside AlreadyExists by verifying the directory exists after the failed mkdir attempt, allowing the mount to proceed if the directory is already present. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Alex Lyn	b50f803a4e	kata-types: add virtio-fs-nydus shared fs configuration support Add "virtio-fs-nydus" as a recognized shared filesystem type in the hypervisor configuration. This enables the standalone nydusd mode where nydusd runs as a separate process alongside virtiofsd. The key changes: (1) Add VIRTIO_FS_NYDUS constant for the new shared fs type. (2) Register virtio-fs-nydus in adjust() and validate() paths, reusing the same virtio-fs validation logic since both use the vhost-user protocol. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:25:18 +02:00
Fabiano Fidêncio	8f5b898e6d	Merge pull request #13206 from stevenhorsman/fix-required-payload-name ci: Update required tests	2026-06-11 20:46:37 +02:00
stevenhorsman	fb4600d66a	runtime-rs: Fix test breakage In #13147, for some reason a test block was added in the middle of code and the code was stale when merged, which meant that a second `mod test` section was added, breaking our tests. Merge the two to fix this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-06-11 19:03:33 +02:00
stevenhorsman	1d854ad7af	ci: Update required tests publish-kata-deploy-payload got renamed in #13107, which broke the CI. Now, instead of tracking all those intermediate steps, let's make sure we only track the tests themselves. Signed-off-by: stevenhorsman <steven@uk.ibm.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 19:02:23 +02:00
Fabiano Fidêncio	21657b9cd9	Merge pull request #13147 from manuelh-dev/mahuber/debug-go-rust runtime-rs: Honor enable_debug for logs and adjust debugging documentation	2026-06-11 08:57:36 +02:00
Fabiano Fidêncio	38416f78ec	Merge pull request #13190 from manuelh-dev/mahuber/fix-num-cpus-bats tests: fix k8s-number-cpus expectation	2026-06-10 21:59:21 +02:00
Steve Horsman	150c7648cf	Merge pull request #13197 from fidencio/topic/kata-monitor-tests-fixups tests: align kata-monitor containerd version selector	2026-06-10 15:12:43 +01:00
Fabiano Fidêncio	4935bf8bc6	tests: align kata-monitor containerd version selector Switch kata-monitor workflows from the deprecated "active" key to "latest" so CI resolves containerd versions from versions.yaml correctly after the key rename. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-09 21:25:45 +02:00
Hyounggyu Choi	6d2066b692	Merge pull request #13188 from BbolroC/set-static-resource-mgmt-properly-for-ibm-sel runtime*: use static_sandbox_resource_mgmt defaults for qemu-se	2026-06-09 18:38:09 +02:00
Fabiano Fidêncio	6b06bf4ba5	Merge pull request #13107 from kata-containers/topic/kata-monitor-ship-image-as-part-of-the-release kata-monitor: ship as a standalone multi-arch image starting with 3.32.0	2026-06-09 17:14:09 +02:00
Hyounggyu Choi	7cc6767fa2	runtime*: use static_sandbox_resource_mgmt defaults for qemu-se Switch qemu-se config templates to use the TEE/CoCo-specific static_sandbox_resource_mgmt defaults instead of the generic QEMU defaults. qemu-se-runtime-rs config now uses DEFSTATICRESOURCEMGMT_COCO while runtime qemu-se config now uses DEFSTATICRESOURCEMGMT_TEE. This aligns static sandbox resource management behavior with confidential container expectations for qemu-se variants. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-06-09 14:45:50 +02:00
Fabiano Fidêncio	620d641458	ci: rename kata-deploy publish jobs These jobs build and push the kata-deploy OCI image, so call them publish-kata-deploy-image-* instead of -payload-, matching the kata-monitor image jobs and making the workflow easier to read. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	92a9691470	tests: add kata-monitor helm chart k8s test Add a single-job k8s test that installs the kata-deploy helm chart with monitor.enabled=true, pointed at the per-PR kata-monitor image built earlier in the same run, and exercises both the rollout and the user-visible behaviour: * the kata-monitor DaemonSet rolls out and the pod stays up without container restarts; * a real kata-runtime probe pod is scheduled, then /metrics and /sandboxes are scraped through the apiserver pod-proxy to prove kata-monitor sees the sandbox (non-zero running-shim count plus at least one per-sandbox kata_shim_* metric); * after the probe pod is deleted, /metrics drops back to a zero running-shim count. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	285d5daa23	tests: install latest cri-tools dynamically Resolve the cri-tools release at install time instead of pinning a version in versions.yaml: install_cri_tools now queries the GitHub releases API for the absolute latest stable tag, and the kata-monitor, cri-containerd and nydus jobs call it directly. Also write /etc/crictl.yaml during containerd setup so crictl stops emitting deprecation warnings about the legacy default endpoints. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	63fec205fe	tests: run kata-monitor functional tests against the dedicated image Exercise the published kata-monitor container image (the one built by publish-kata-monitor-payload-amd64) rather than the on-disk binary, so integration regressions like the recent glibc/musl mismatch surface at PR time. The kata-monitor-tests.sh script keeps the binary fallback for ad-hoc local runs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	d5bc1177c0	tests: focus kata-monitor CI on containerd active Drop the stale CRI-O matrix entry (its cri-tools pin was several releases behind) along with the exclude that hid the containerd job, and pin the remaining job to containerd's "active" track (currently v2.2) via CONTAINERD_VERSION. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	5731d30554	helm: add optional kata-monitor deployment to kata-deploy Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart, including image/configuration values so operators can enable monitor shipping as part of the same deployment workflow when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	2b6efda67d	docs: document the standalone kata-monitor image kata-monitor is published as a standalone container image starting with 3.32.0; point users at it from the metrics design doc and the Prometheus-on-Kubernetes how-to, and switch the DaemonSet manifest to the dedicated image (keeping the runtime endpoint/listen settings and hostPath cleanups). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	0d6234e7be	ci: share kata image publishing workflows Unify kata-deploy and kata-monitor image publishing behind a single reusable workflow, and rename workflow files to generic kata-images names. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	e04a4326ec	tools: build kata-monitor image from shim-v2-go tarball Build kata-monitor images by extracting the binary from the shim-v2-go tarball and shipping it on top of gcr.io/distroless/static-debian13. Because the binary is built inside an Ubuntu (glibc) toolchain it cannot run on a pure musl/alpine base — users hit __fprintf_chk / __vfprintf_chk relocation errors. To get a small, distroless runtime image we use the same pattern as tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries the binary needs (plus the dynamic linker) via ldd from a glibc base image. In order to do so, we also added a helper script to build and publish architecture-specific monitor images from tarball artifacts. Reported-by: Steve Linde <stevenlinde@google.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	ddaf450086	Merge pull request #13192 from fidencio/topic/fix-cri-containerd-tests-with-containerd-1.7 cri-containerd: fix v1 sanity-check config generation	2026-06-09 14:32:35 +02:00
Fabiano Fidêncio	5000000883	tests: restore SystemdCgroup in installed containerd Set runc SystemdCgroup=true when generating /etc/containerd/config.toml during containerd installation, restoring behavior that was mistakenly dropped. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-09 10:46:38 +02:00
Fabiano Fidêncio	3ca9eb94b9	cri-containerd: fix v1 sanity-check config generation Avoid emitting unsupported plugin keys and empty runtime options in the v1.x config path so containerd 1.7 can load the generated TOML during runc sanity checks. While here, let's also dump the temporary cri-integration config on failure to speed diagnosis. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-09 10:46:38 +02:00
Fabiano Fidêncio	ac2221a6a5	Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3 versions: Bump containerd to 2.3	2026-06-09 08:21:58 +02:00
Fabiano Fidêncio	c5ac9982aa	Merge pull request #13178 from fidencio/topic/kata-deploy-log-level kata-deploy: log more info when debug is enabled	2026-06-09 07:18:06 +02:00
Alex Lyn	6500e018c0	Merge pull request #13093 from RainaYL/rainax/tdx_boot_pr dragonball: Add steps to boot TDX VM	2026-06-09 10:13:57 +08:00
Manuel Huber	f37fb18b8c	tests: fix k8s-number-cpus expectation As pointed out in kata-containers/kata-containers#12961, the k8s-number-cpus retry loop could fail all retried assertions and still pass. k8s-number-cpus retried until the guest reported three CPUs, but the post-loop result was never checked. Bash suppresses errexit for the equality test before && break, so the test could exhaust retries and still pass. The current kata-qemu handler sizes vCPUs from fractional container quotas: two 500m limits produce one workload vCPU, then the default vCPU is added and rounded once. Expect two CPUs and assert the final retry result so the test fails if the count never converges. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-08 22:50:02 +00:00
Fabiano Fidêncio	48ebbbec3a	kata-deploy: honor debug mode with CLI log-level Make the chart pass --log-level debug automatically when debug=true so CI and troubleshooting runs emit full rendered config dumps without requiring a separate log-level override. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	b63494345d	kata-deploy: add configurable verbosity for full CRI config dumps Allow operators to force kata-deploy log verbosity and emit the fully rendered containerd/CRI-O config and drop-in files in debug mode so install troubleshooting can rely on exact effective configuration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	fc08218f55	gatekeeper: rename required tests to minimum/latest The containerd_version matrix values were renamed from lts/active to minimum/latest, which changes the generated CI job names. Update the required-tests list so the gatekeeper waits on the checks that are actually produced. The amd64 run-containerd-stability, run-nydus, run-cri-containerd and free-runner run-k8s-tests jobs map lts -> minimum and active -> latest. The s390x cri-containerd job maps active -> latest, matching its updated matrix. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	95b8e8bea9	tests: update remaining containerd callers for containerd 2.x tests/functional/vfio-ap/run.sh: - Source tests/common.bash so the schema helpers are available. - configure_containerd_for_runtime_rs: write kata-qemu-runtime-rs configuration via a conf.d drop-in. Schema >= 3 uses io.containerd.cri.v1.runtime; schema 2 uses io.containerd.grpc.v1.cri. The sandboxer field is emitted only for schema >= 3. tests/integration/nerdctl/gha-run.sh: - Fix "containerd config default" pipe: propagate PATH so the newly installed binary is found, suppress stdout, and call ensure_containerd_conf_d_rootful_api_sockets. tests/integration/kubernetes/gha-run.sh: - Fix jq filter for devmapper snapshotter (.version // 0 >= 3). - Add ensure_containerd_conf_d_rootful_api_sockets after config setup. tests/gha-run-k8s-common.sh: - Remove the redundant "containerd config default \| sed" override; overwrite_containerd_config (called via check_containerd_config_for_kata) now handles SystemdCgroup and all other containerd config setup. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	1caacda174	tests/cri-containerd: update integration tests for containerd 2.x Adapt create_containerd_config to work with containerd 2.x while keeping compatibility with v1.x for completeness: - Drop the direct config.toml patching in favour of conf.d fragments: use containerd_render_config_default_with_imports to generate the base config, then write separate drop-ins for API socket overrides, debug settings, and the Kata runtime. - Use CONTAINERD_SYSTEM_FRAGMENT_PREFIX directly (no PREFIX= indirection). - Detect cfg_schema via _containerd_blob_schema_version to select the right plugin table: schema >= 3 -> io.containerd.cri.v1.runtime schema 2 -> io.containerd.grpc.v1.cri and to emit the sandboxer field only on schema >= 3. - Pass GOTOOLCHAIN via "sudo -E make clean" so the environment variable set by export_go_toolchain_for_containerd_source_builds is preserved during the containerd source build. The require_containerd_binary_default_schema_v3_plus call is kept: the test explicitly clones and builds containerd 2.x from source, so a schema v2 binary should never appear here. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	7428832c86	tests/nydus: make containerd config schema-aware Configure containerd for nydus differently depending on the active config schema, because conf.d drop-in fragments are only honoured the same way by containerd 2.x. config_containerd now delegates to _containerd_resolved_schema_version (from common.bash) to detect the active schema and passes it to config_containerd_core, which emits schema-appropriate config: schema >= 3 (containerd v2.x): Keep the base config and add a conf.d drop-in fragment using the io.containerd.cri.v1.runtime plugin (sandboxer = 'podsandbox') and io.containerd.cri.v1.images to select nydus as the snapshotter. schema 2 (containerd v1.x): conf.d is not honoured the same way, so replace config.toml wholesale with a complete, self-contained file using the io.containerd.grpc.v1.cri plugin with nydus as the snapshotter and no sandboxer field. The [proxy_plugins] block is written in both cases as it is schema-version agnostic. Teardown restores the whole config.toml (schema v2 path) or removes the drop-in fragment (schema v3+ path) as appropriate. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	1bb43d0a19	tests/common: make overwrite_containerd_config schema-aware Rewrite overwrite_containerd_config so that it works with containerd v1.x (schema v2) as well as containerd v2.x (schema v3+): - Always regenerate /etc/containerd/config.toml from the installed binary via "sudo containerd config default". - Call ensure_containerd_conf_d_rootful_api_sockets after regenerating the base config. - Detect the effective schema via _containerd_resolved_schema_version. - Schema >= 3 (containerd v2.x): write io.containerd.cri.v1.runtime plugin path with sandboxer = podsandbox into a conf.d drop-in. - Schema 2 (containerd v1.x): write io.containerd.grpc.v1.cri plugin path without sandboxer into the drop-in. check_containerd_config_for_kata no longer appends a schema guard; the function supports both schema generations intentionally. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	18fbf4cd5d	tests/common: fix install_cri_containerd for containerd 2.x Three issues prevented containerd 2.x from working correctly after installation: 1. Socket uid/gid mismatch: "containerd config default" was run as the unprivileged user, which produced uid = <runner-uid> in the API socket stanza instead of uid = 0. Run it under sudo so the default output is owned by root. 2. Stale systemd unit: the CI runner ships a pre-installed containerd whose unit file is left in place after the binary is replaced by the test installer. The old unit causes "MigrateConfigTo: index out of range" panics when the new binary tries to load a schema v4 config. Always overwrite the unit file from the template so the running binary and the unit file stay in sync. 3. Schema guard removed: install_cri_containerd installs whatever version was requested (v1.7 or v2.3) and must not abort on a valid schema v2 binary. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	fbf133ce3a	tests/common: add containerd config schema helpers Introduce helper functions used by later commits to make containerd configuration schema-aware. _containerd_blob_schema_version(): Parse the version = <n> line from a containerd config blob and echo the integer. _containerd_resolved_schema_version(): Run "containerd config default" and return the schema version of the active binary. Drives conditional logic in overwrite_containerd_config and other helpers. containerd_emit_rootful_api_socket_overrides(): Emit the TOML fragment that fixes uid/gid on the grpc/ttrpc sockets. Schema v3 uses top-level [grpc]/[ttrpc]; schema v4+ uses plugin-scoped tables. require_containerd_config_schema_v3_plus() / require_containerd_binary_default_schema_v3_plus(): Guard helpers that abort with a clear message when the installed containerd is older than v2.x. Used only in test paths that explicitly build containerd 2.x from source. containerd_render_config_default_with_imports(): Write a fresh "containerd config default" to a file and ensure the conf.d import glob is present, ready for drop-in fragments. export_go_toolchain_for_containerd_source_builds(): Set GOTOOLCHAIN=auto so "go build" of containerd 2.x downloads the exact toolchain in its go.mod without changing the global Go version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	8ffe4e6c02	tests: add journalctl diagnostics on containerd restart failure When restart_systemd_service_with_no_burst_limit fails or times out waiting for the containerd socket, emit "journalctl -xeu containerd.service" output so the failure reason is visible in CI logs without requiring a separate log-collection step. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	e122d7ffb0	versions: bump containerd to 2.3 and define minimum/latest test matrix Bump the containerd version used by CI from v1.7.25 to v2.3.0. Rename the version-range fields in versions.yaml and throughout the GitHub Actions workflows from lts/active/version/sandbox_api to minimum/latest to make their meaning self-evident: minimum: "v1.7" # oldest containerd branch under test latest: "v2.3" # newest containerd branch under test Drop the bare version field (superseded by the matrix) and the sandbox_api alias (covered by latest). Update all containerd_version matrix entries in the workflow files accordingly, and update gha-run-k8s-common.sh to resolve the new key names. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	a4138794ea	Merge pull request #13183 from fidencio/topic/kata-deploy-custom-kata-drop-in-for-default-runtimes kata-deploy: support drop-in configs for default runtimes	2026-06-08 18:44:33 +02:00
Fabiano Fidêncio	d6e1b45ce7	Merge pull request #13171 from fidencio/topic/runtime-rs-enforce-sandbox_cgroup_only-and-static_sandbox_resource_mgmt runtime-rs: default static sizing-related config flags to true	2026-06-08 17:43:37 +02:00
Fabiano Fidêncio	b119b051cb	kata-deploy: support drop-in configs for default runtimes Allow operators to provide per-shim drop-in TOML for built-in runtimes and reconcile stale override files so upgrades and migrations remain safe when drop-ins are added or removed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex	2026-06-08 13:31:03 +02:00
Fabiano Fidêncio	4dc288401e	runtime-rs: make sandbox cgroup runtime attach idempotent The dragonball nerdctl CI job can race when creating and attaching the runtime process to the sandbox cgroup, surfacing an os error 17 (AlreadyExists) during shim task creation. Let's retry add_proc once on this pre-existing cgroup condition so startup remains robust. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex <codex@openai.com>	2026-06-08 13:11:34 +02:00
Fabiano Fidêncio	4d569c22b4	runtime-rs: enforce a minimum vsock reconnect window Low-CPU sandboxes can take longer than a few seconds to complete guest boot and start the agent. Let's clamp the reconnect timeout to a safe minimum so sandbox startup does not fail early with transient vsock ECONNRESET. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex <codex@openai.com>	2026-06-08 13:11:34 +02:00
Fabiano Fidêncio	ed34d7811d	runtime-rs: supplement static sizing from sandbox annotations When static sandbox resource management is enabled, CRI CPU/memory sizing may live only in sandbox annotations and be missing from the OCI spec. Let's fill missing sizing fields from annotations before applying static VM sizing so runtime-rs follows the expected Kubernetes behavior for constrained pods. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex <codex@openai.com>	2026-06-08 13:11:34 +02:00
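
The retry-loop pitfall described in f37fb18b8c (tests: fix k8s-number-cpus expectation) can be reproduced in a few lines of Bash. This is a minimal sketch, not the actual test code: `get_cpus`, `retry_unchecked` and `retry_checked` are hypothetical names, and the CPU count is hard-coded to 2 for illustration.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for reading the guest CPU count; always reports 2.
get_cpus() { echo 2; }

# Pitfall: errexit is ignored for a command on the left of `&&`, so a loop
# that never matches simply falls through to whatever follows it.
retry_unchecked() {
    local expected="$1" cpus="" attempt
    for attempt in 1 2 3; do
        cpus="$(get_cpus)"
        [ "${cpus}" -eq "${expected}" ] && break
    done
    echo "finished with ${cpus} CPUs"   # reached even if no attempt matched
}

# Fix: assert the final result once the loop is over.
retry_checked() {
    local expected="$1" cpus="" attempt
    for attempt in 1 2 3; do
        cpus="$(get_cpus)"
        [ "${cpus}" -eq "${expected}" ] && break
    done
    [ "${cpus}" -eq "${expected}" ]     # non-zero status if it never converged
}
```

Under these assumptions, `retry_unchecked 3` returns success even though the count never reaches 3, while `retry_checked 3` returns a non-zero status and `retry_checked 2` succeeds.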
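
The schema-aware containerd configuration introduced in fbf133ce3a and used by 1bb43d0a19 and 1caacda174 comes down to two steps: read the `version = <n>` line from a config blob, then pick the CRI plugin ID that matches the schema. The sketch below illustrates that idea only; `blob_schema_version` and `cri_plugin_for_schema` are hypothetical names and do not reproduce the real helpers in tests/common.bash.

```bash
#!/usr/bin/env bash

# Echo the integer from the first `version = <n>` line of a config blob.
# Hypothetical helper; prints nothing if no such line exists.
blob_schema_version() {
    local blob="$1"
    sed -n 's/^[[:space:]]*version[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' \
        <<< "${blob}" | head -n 1
}

# Map a schema version to the CRI plugin ID named in the commit messages:
# schema >= 3 (containerd v2.x) vs. schema 2 (containerd v1.x).
cri_plugin_for_schema() {
    local schema="$1"
    if [ "${schema}" -ge 3 ]; then
        echo "io.containerd.cri.v1.runtime"
    else
        echo "io.containerd.grpc.v1.cri"
    fi
}
```

For example, `cri_plugin_for_schema "$(blob_schema_version "$(containerd config default)")"` would select the plugin ID for the installed binary, assuming `containerd` is on the PATH.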


19309 Commits