kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 14:38:33 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	8d2ecaabb5	versions: Bump QEMU to v11.0.0 For more details see QEMU's release notes: https://www.qemu.org/2026/04/22/qemu-11-0-0/ GPU experimental variants are also using v11.0.0 plus one patch to solve issues related to NUMA mapping. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	cbcdd999e4	Merge pull request #12957 from Apokleos/fix-sb-api runtime-rs: Fix sandbox-api lifecycle and CRI status handling	2026-05-23 09:26:14 +02:00
Fabiano Fidêncio	a7aa2576c6	Merge pull request #13089 from fidencio/topic/kata-deploy-fix-label-set-on-rke2 kata-deploy: verify kata-runtime label remains stable on rke2/k3s	2026-05-23 08:52:27 +02:00
Fabiano Fidêncio	7faeb9b727	Merge pull request #13091 from kata-containers/dependabot/go_modules/src/runtime/github.com/containerd/containerd-1.7.32 build(deps): bump github.com/containerd/containerd from 1.7.29 to 1.7.32 in /src/runtime	2026-05-23 08:51:36 +02:00
Fabiano Fidêncio	5d3e1e6396	kata-deploy: verify kata-runtime label remains stable on rke2/k3s The retry loop added in `efd468df3f` still allows the install to declare success while inside the kubelet's post-restart re-register window. On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]` every 2 s and returns on the first `True` observation it sees. By default the kubelet only publishes node status every ~10 s, so that first `True` is almost always the stale value from before the restart; the kubelet hasn't actually finished restarting yet. `label_node_with_retry` then applies the label, sleeps 1 s, reads back "true" (still stale, kubelet still down), and returns Ok. Install completes, `/readyz` flips to 200, helm releases its `--wait`, and the bats test starts, and only then does the kubelet finish coming up, re-register the node, and clobber the label with its cached set. The lifecycle test sees an empty `katacontainers.io/kata-runtime` and fails with "# Node label katacontainers.io/kata-runtime:" followed by "not ok 1 Kata artifacts are present on host after install". A single-shot verification can't distinguish "still stale true" from "truly stable true after kubelet re-register". Replace it with a stability window: after (re)applying the label, require it to remain at the expected value for STABILITY_CHECKS=6 consecutive observations spaced CHECK_INTERVAL=2 s apart (≈ 12 s, comfortably more than the kubelet's status-update period). If the value ever drifts inside the window, re-apply and restart the stability counter. Bounded by MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s to install. Also add a short polling loop to the test's own label assertion as belt-and-suspenders for any leftover transient race, matching the existing retry pattern used for the container-runtime version check. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-22 11:53:18 +02:00
Alex Lyn	adf6d43e24	test: skip TestContainerMemoryUpdate for sandbox api Temporarily skip the `TestContainerMemoryUpdate` test case for sandbox api. This test case is currently skipped in other VMMs (e.g., QEMU, Cloud-Hypervisor) due to known issues and environmental stability concerns. To maintain consistency across the project, we are skipping it for sandbox as well. A follow-up PR will be dedicated to addressing these issues and properly enabling/refining this test case for all VMMs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:46:44 +08:00
Alex Lyn	b5349f4d78	versions: bump containerd to 2.3 for sandbox API tests containerd 2.3 requires Go 1.26.3, but Kata still pins Go 1.25.10. Use Go 1.26.3 for the sandbox-api job so that make cri-integration can build containerd from source. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:46:16 +08:00
Alex Lyn	9f78dc687f	tests: exclude TestContainerRestart from the cri-containerd test list Creating a new container in the same sandbox VM after the previous container has exited and been removed has never been supported by kata-containers (neither with the go-based nor the rust-based runtime). When the last container is removed the kata VM shuts down, so any attempt to start a new container in the same sandbox fails. This test exercises a use-case kata does not currently support, and it has never been part of the passing list for good reason. Mark it explicitly excluded with a comment so it is clear this is a deliberate omission rather than an oversight. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:45:50 +08:00
Alex Lyn	328fccfbbd	ci: Re-enable run-containerd-sandboxapi job The job was disabled because TestImageLoad was failing when using the shim sandboxer with runc due to a containerd bug (config.json not being written to the bundle directory). Now that check_daemon_setup uses podsandbox for the runc sanity check, the root cause of the failure is worked around on our side and the job can be re-enabled. Also update the runner to ubuntu-24.04. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:45:26 +08:00
Alex Lyn	a7739579d6	tests: Use podsandbox sandboxer for the runc sanity check The check_daemon_setup function verifies that containerd + runc are functional before the real kata tests run. Using the shim sandboxer for this runc check hits a known containerd bug where the OCI spec is not populated before NewBundle is called, so config.json is never written and containerd-shim-runc-v2 fails at startup. See containerd/containerd#11640 The sandboxer choice is irrelevant for this sanity check, so use podsandbox which works correctly with runc. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:44:38 +08:00
Alex Lyn	486f5f9412	runtime-rs: Align sandbox status with CRI expectations Update the sandbox status reporting to align with containerd/CRI requirements. This commit addresses the `State Mapping` issue. Previously, internal state strings were returned, which containerd could not recognize, causing running sandboxes to be misinterpreted as SANDBOX_NOTREADY. This change maps internal states to CRI constants: - Running -> SANDBOX_READY - Init | Stopped -> SANDBOX_NOTREADY. These changes ensure the sandbox status is both accurately interpreted and fully compliant with the expected interface. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:42:43 +08:00
Alex Lyn	3f42929e2b	runtime-rs: Update sandbox status to include created_at field Ensure the `created_at` timestamp is correctly propagated in the sandbox status. Although `created_at` is present in the `SandboxStatus` and `SandboxStatusResponse` data structures, it was previously omitted during the status transition. This commit completes the implementation by passing the value recorded during sandbox initialization. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:42:43 +08:00
Alex Lyn	3358c7634b	runtime-rs: Avoid shutting down sandbox on container exit Prevent the sandbox from being prematurely shut down when a standard workload container exits. Previously, the shutdown logic incorrectly triggered a sandbox shutdown whenever the container list became empty. This resulted in unintended lifecycle termination for non-transient sandboxes. This change refines the `need_shutdown_sandbox()` criteria in `virt_container/src/container_manager/manager.rs` to only initiate a shutdown under specific conditions: - The shutdown request is explicit (`req.is_now`). - The request targets the sandbox itself (`req.container_id == self.sid`). By removing the implicit dependency on the empty container list, we ensure the sandbox remains active as expected after workload containers finish execution. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:42:43 +08:00
Alex Lyn	2b980b3a34	runtime-rs: Block WaitSandbox until sandbox exits Rework sandbox waiting so the WaitSandbox path blocks on sandbox lifetime rather than directly borrowing the hypervisor wait call. Once stop has been observed, the cached exit result is returned to later waiters. While the sandbox is still alive, waiters subscribe to the internal stop notifier and sleep until shutdown or VM exit records the final result. Together with the preceding support commits, this keeps the overall behaviour identical to the original WaitSandbox fix while making the dependency chain explicit. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:42:43 +08:00
Alex Lyn	ac2d39fc34	runtime-rs: Add sandbox exit notifier in VirtSandbox Add an internal exit_notify_tx channel to VirtSandbox and initialise it in both the regular and restore constructors. The later WaitSandbox rework needs a way to block until sandbox stop has been observed without polling runtime state. This commit only wires in the notifier so the follow-on behaviour change can subscribe to a dedicated stop signal. No WaitSandbox behaviour changes are made here yet. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:42:43 +08:00
Alex Lyn	116ae66025	runtime-rs: Introduce cached sandbox exit information Introduce an exit_info field in SandboxInner so sandbox teardown can store a stable exit result in runtime state. The follow-on WaitSandbox rework needs a place to keep the final SandboxExitInfo after the sandbox has already stopped. Without that cached result, later waiters would have no consistent value to return once the original stop event has passed. This change only adds the state holder. Behaviour changes follow in later commits. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-22 10:42:43 +08:00
dependabot[bot]	ac77c5fdff	build(deps): bump github.com/containerd/containerd in /src/runtime Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.29 to 1.7.32. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.29...v1.7.32) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-version: 1.7.32 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-05-21 21:56:06 +00:00
Fabiano Fidêncio	7536f2c616	Merge pull request #13055 from kata-containers/topic/kata-deploy-only-install-what-will-be-used kata-deploy: only install what will actually be used	2026-05-21 17:53:09 +02:00
Fabiano Fidêncio	90799f570d	Merge pull request #13082 from fidencio/topic/fix-docker-time-namespace runtime: drop host time namespace from OCI spec	2026-05-21 17:03:20 +02:00
Fabiano Fidêncio	05f2bfcb0b	runtime-rs: drop unused std::env import in initdata_block tests The tests module imports std::env but never references it, which trips the unused_imports warning during CI builds. Remove the dead import to silence the warning. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-21 13:56:45 +02:00
Fabiano Fidêncio	f9eafb3341	runtime: drop host time namespace from OCI spec Docker 29.5+ adds a private time namespace to container bundles by default, but the kata agent only supports the classic namespace set and fails with "invalid namespace type". Let's strip time namespaces in both the Go and Rust runtimes before the spec reaches the agent, matching how network and cgroup namespaces are handled. Fixes: #13080 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-21 13:56:45 +02:00
Steve Horsman	bef049d07e	Merge pull request #13081 from stevenhorsman/cache-tag-updates kata-deploy: always add HEAD commit SHA tag to all builds	2026-05-21 11:15:23 +01:00
Alex Lyn	c919aea448	Merge pull request #13066 from RainaYL/rainax/guest_memfd_pr dragonball: Add implementation for KVM-managed guest memfd	2026-05-21 17:12:44 +08:00
Alex Lyn	0283097e91	Merge pull request #13063 from RainaYL/rainax/acpi_pr dragonball: Add basic ACPI implementation for TDX boot	2026-05-21 17:04:59 +08:00
Fabiano Fidêncio	efd468df3f	kata-deploy: retry node labeling after CRI restart On rke2/k3s a CRI restart also restarts the kubelet, which may briefly re-register the node with its cached label set and clobber the kata-runtime label that was just applied via the API. Replace the single label_node call with a retry loop that verifies the label value after setting it. If the label is missing or has the wrong value, it is re-applied (up to 10 attempts with 2 s back-off). This fixes a race condition that became more visible after the switch to individual tarball extraction, which made install take slightly longer and shifted the kubelet re-registration timing window. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
Fabiano Fidêncio	291e4d37be	kata-deploy: implement selective tarball extraction in installer Add zstd and tar as Rust dependencies and rewrite the artifact installation logic to extract only the component tarballs required by the enabled runtime classes. extract_component_tarballs reads shim-components.json to determine which kata-static-<name>.tar.zst files are needed for the selected shims and current architecture. Shared components (e.g. kernel, shim-v2-go) are listed by multiple shims and must only be unpacked once per install run. Deduplication is handled with an in-memory set passed through the call, avoiding any risk of stale on-disk state surviving across pod restarts. Within each tarball, opt/kata path prefixes are stripped and absolute symlink / hard-link targets are rewritten to point at the resolved installation directory, correctly handling MULTI_INSTALL_SUFFIX. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
Fabiano Fidêncio	9a0acc6c4c	kata-deploy: ship individual component tarballs; drop merged tarball Update the Dockerfile to copy each kata-static-<name>.tar.zst directly into the image alongside shim-components.json, replacing the old artifact-extractor stage that unpacked a single merged tarball. Update the publish-kata-deploy-payload and release CI workflows to download individual per-component artifacts instead of waiting for a merged tarball, and simplify kata-deploy-build-and-upload-payload.sh accordingly. The kata-deploy image build is no longer blocked on the merge step. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
Fabiano Fidêncio	87e55be4a3	kata-deploy: add shim-components.json component manifest Introduces the human-maintained shim-components.json that maps each runtime class to the list of kata-static-<name>.tar.zst component tarballs it needs per architecture. This is the source of truth read by the installer at deploy time to decide which tarballs to extract. Key design choices encoded here: - shim-v2-go vs shim-v2-rust: explicit per-shim, so a node running only Rust shims never extracts the Go shim binary. - virtiofsd and nydus are both listed for hypervisors that support configurable shared_fs (we cannot know which the user will choose). - fc/firecracker: no virtiofsd or nydus (devmapper only). - remote: only the shim binary (no local hypervisor artifacts). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
Fabiano Fidêncio	c87e327876	kata-deploy: split shim-v2 into shim-v2-go and shim-v2-rust Split the monolithic shim-v2 build target into separate shim-v2-go and shim-v2-rust targets in kata-deploy-binaries.sh, the local-build Makefile, and the four architecture CI workflows. The Go and Rust shims now each produce their own kata-static-<name>.tar.zst artifact, allowing downstream consumers to select only the shim variant they need. MEASURED_ROOTFS is set per-arch for the Rust job in CI. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
stevenhorsman	3f27052184	kata-deploy: always add HEAD commit SHA tag to all builds Previously, the commit SHA tag was only added for specific components (agent, agent-ctl) by setting artefact_tag in individual install functions. This was inconsistent and error-prone. Now, the HEAD commit SHA is always added as a tag for all builds in the central tagging logic. This ensures: - All components get tagged with the commit SHA - The correct HEAD commit is used (not the last commit that modified a specific path) - Simpler, more maintainable code The git command uses `git -C` to change to the repo directory before running git log, which correctly returns the HEAD commit SHA regardless of which files were modified in recent commits. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-20 17:42:09 +01:00
Xiaofan Xxf	62af158842	dragonball: Add implementation for KVM-managed guest memfd A TDX VM requires that guest memfd is managed by KVM, so that KVM is able to toggle the memory attribute for the region to shared/private. Therefore, only anonymous guest memory is allowed for TDX VM, and the KVM-managed memfd should be created by KVM_CREATE_GUEST_MEMFD ioctl, instead of issuing memfd_create system call. Also, in order to bind this memfd with corresponding memory region, KVM_SET_USER_MEMORY_REGION2 should be invoked, instead of KVM_SET_USER_MEMORY_REGION. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-05-20 15:02:03 +08:00
Xiaofan Xxf	2506b24c66	dragonball: Add basic ACPI implementation for TDX boot Add a basic implementation of a few ACPI tables (MADT, FADT and DSDT). Td-shim does not support mptable and requires the VMM to pass ACPI table contents to the virtual firmware via the HOB list. Note that this PR contains only the minimal implementation needed to boot a TDX VM. More comprehensive ACPI support may require future updates. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-05-20 14:01:47 +08:00
Fabiano Fidêncio	cec98e0d97	Merge pull request #13076 from stevenhorsman/generate-vendor-fix release: correct .cargo/config.toml reference in generate_vendor.sh 3.31.0	2026-05-19 22:10:48 +02:00
stevenhorsman	76fc847c78	release: correct .cargo/config.toml reference in generate_vendor.sh The script was creating .cargo/config.toml but referencing .cargo/config in the vendor_dir_list, causing tar to fail with 'Cannot stat' error. Signed-off-by: stevenhorsman <steven@uk.ibm.com> Generated-By: IBM Bob	2026-05-19 18:23:53 +01:00
Fabiano Fidêncio	ddb8a5de89	Merge pull request #13065 from stevenhorsman/release/3.31 release: Bump version to 3.31.0	2026-05-19 17:47:09 +02:00
stevenhorsman	a4cfe32157	release: Bump version to 3.31.0 Bump VERSION and helm-charts versions. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-19 15:32:50 +02:00
Fabiano Fidêncio	035b64a981	Merge pull request #13073 from stevenhorsman/agent-ctl-clh-virtio-fs-queue-size-fix agent-ctl: CLH virtio fs queue size fix	2026-05-19 15:32:24 +02:00
stevenhorsman	6ee43475c3	agent-ctl: Fix CLH virtio-fs queue size configuration After commit `e2240b694a` ("runtime-rs: ch: source virtio-fs queue size from toml"), Cloud Hypervisor no longer provides fallback defaults for virtio-fs queue configuration. When queue_size or queue_num are 0, CH now uses those values directly instead of substituting defaults, which causes a panic in the device manager. The agent-ctl tool was hardcoding queue_size=0 and queue_num=0 in share_fs_utils.rs, relying on CH's fallback behavior. This broke the agent-api tests for Cloud Hypervisor while QEMU tests continued to pass. Fix by reading virtio_fs_queue_size from the hypervisor config and falling back to sensible defaults (1024 queue size, 1 queue) when not configured, matching the previous CH default behavior. Generated-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-19 12:05:52 +01:00
stevenhorsman	f47d1c0d69	tests/agent-ctl: Add debug The agent-ctl tests are failing in the CI, but there is no log reporting, so debugging is not possible. Add some debug output to help. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-19 12:00:47 +01:00
Fabiano Fidêncio	ffa59ce3aa	Merge commit from fork runtime: disable virtiofsd extra-args annotation by default	2026-05-19 08:22:12 +02:00
Alex Lyn	8dca734008	Merge pull request #12959 from DataDog/mayeul/fix-race-condition-when-adding-qdisc shim: Add backoff retry to ingress qdisc creation to avoid potential race condition	2026-05-19 14:06:37 +08:00
Fabiano Fidêncio	8d7187677e	Merge pull request #12967 from kata-containers/sprt/rs-virtiofs-queue-size-fixes runtime-rs/virtiofsd: read queue size from config	2026-05-19 07:36:44 +02:00
Aurélien Bombo	e2240b694a	runtime-rs: ch: source virtio-fs queue size from toml Now that `prepare_virtiofs` populates `ShareFsConfig` from `SharedFsInfo.virtio_fs_queue_size`, the CH-side fallback that substitutes `DEFAULT_FS_QUEUE_SIZE` (1024) when the incoming `queue_num`/`queue_size` are zero is no longer needed. Drop it from both `handle_share_fs_device` and `TryFrom<ShareFsSettings> for FsConfig` and use the values straight from the config. Drop the now unused `DEFAULT_FS_QUEUES` and `DEFAULT_FS_QUEUE_SIZE` constants. This also removes a latent bug in both call sites: the previous code gated `queue_size` on `queue_num > 0`, so a user setting only the queue size and not the (currently unconfigurable) queue count would have had their `queue_size` silently overwritten by the default. The CH config template (`configuration-clh-runtime-rs.toml.in`) did not ship the `virtio_fs_queue_size` key (unlike the qemu-runtime-rs templates), so without an explicit override the field would have deserialized to 0 and the fallback would have been the only thing keeping CH working. Add the key to the template, defaulted to `@DEFVIRTIOFSQUEUESIZE@` (1024), matching the qemu-runtime-rs templates. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-05-19 06:14:24 +02:00
Aurélien Bombo	0d5bde2181	runtime-rs: virtio-fs: plumb virtio_fs_queue_size to qemu/CH The shared filesystem device builder in `prepare_virtiofs` was hardcoding `queue_size = 0` and `queue_num = 0` on the `ShareFsConfig` it hands to the hypervisor, ignoring `SharedFsInfo.virtio_fs_queue_size` parsed from `configuration.toml` entirely. For qemu, this is silently broken: the cmdline generator's `DeviceVhostUserFs::set_queue_size` treats 0 as "not set" and skips the `queue-size=` argument when emitting the `vhost-user-fs-pci` device, so QEMU falls back to its built-in default of 128, regardless of what the user configured. For Cloud Hypervisor it happens to work in practice today, but only because `ch::handle_share_fs_device` and `TryFrom<ShareFsSettings> for FsConfig` substitute a hardcoded 1024 when the incoming `queue_num`/`queue_size` are zero. That fallback masks the real bug; the toml value still never reaches the VMM. Add a `get_shared_fs_info` accessor on `DeviceManager` mirroring the existing `get_block_device_info` helper, and use it in `prepare_virtiofs` to populate `ShareFsConfig.queue_size` from `SharedFsInfo.virtio_fs_queue_size`. Use a single virtqueue (`queue_num = 1`), matching what runtime-go hardcodes for both qemu (govmm `QemuFSParams` does not emit `num-queues=`) and CH (`numQueues := int32(1)` in `clh.go`). The CH-side fallback and the CH config template are addressed in a follow-up commit. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-05-19 06:14:24 +02:00
Alex Lyn	e5a7f5b120	Merge pull request #13009 from sebwolf-de/swolf/kata-fc-jailer-pid-leak Fix #13008: runtime/fc track real firecracker PID instead of jailer PID	2026-05-19 11:59:24 +08:00
Alex Lyn	bbef0a755c	Merge pull request #13005 from stevenhorsman/remove-osbuilder-tests osbuilder: Remove tests	2026-05-19 11:58:27 +08:00
Alex Lyn	357921df62	Merge pull request #12437 from Apokleos/fix-katactl-exec kata-ctl: Fix failures when kata-ctl exec with short id	2026-05-19 09:13:17 +08:00
Aurélien Bombo	83e20877d8	Merge pull request #12882 from stevenhorsman/runtime-rs/cdh_api_timeout runtime-rs: Add cdh_api_timeout configuration parameter	2026-05-18 15:38:27 -05:00
Sebastian Wolf	26746c9ce8	runtime/fc: track real firecracker PID instead of jailer PID When the jailer is in use (the default for kata-fc), cmd.Process.Pid in fcInit() is the jailer's PID, not firecracker's. The jailer forks + execs firecracker as a separate child and exits. fc.info.PID was therefore stored as the (soon-to-be-dead) jailer PID. At sandbox shutdown, fcEnd() calls WaitLocalProcess(fc.info.PID, SIGTERM, ...). syscall.Kill on the dead jailer PID returns ESRCH, WaitLocalProcess returns nil immediately, and the real firecracker microVM never receives a signal. It gets reparented to init and stays alive indefinitely, holding open resources from the host. Over many container lifecycles this becomes a serious resource leak. Read the real PID from <jailerRoot>/firecracker.pid, which firecracker itself writes after the exec. Update fc.info.PID with that value so all downstream code (fcEnd, Save/Load, kill-0 alive checks, NewProc) operates on the actual firecracker process. Also fix a small adjacent bug in Sandbox.Stop where the per-container teardown loop ignored the force flag, causing any container.stop error to short-circuit Stop before stopVM ran. Signed-off-by: Sebastian Wolf <swolf@nvidia.com>	2026-05-18 21:09:51 +02:00
Fabiano Fidêncio	7c971f0c4c	Merge pull request #13069 from fidencio/topic/kata-deploy-prevent-eviction helm-chart: add priorityClassName to prevent kata-deploy eviction	2026-05-18 21:08:45 +02:00
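The stability window from commit 5d3e1e6396 can be illustrated with a minimal Rust sketch. This is not the kata-deploy code. `apply_until_stable`, `LabelOutcome` and the injected `apply`/`read`/`sleep` closures are hypothetical. A real installer would talk to the Kubernetes API and sleep CHECK_INTERVAL (2 s) between reads. STABILITY_CHECKS=6 and MAX_APPLY_ATTEMPTS=12 correspond to the last two arguments.

```rust
/// Result of trying to make a node label stick (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum LabelOutcome {
    /// The label held for a full window; `apply_attempts` counts (re)applications.
    Stable { apply_attempts: u32 },
    /// The label kept drifting until the attempt budget ran out.
    GaveUp,
}

/// Applies a label, then requires `stability_checks` consecutive reads that
/// return `expected`. Any drift re-applies the label and restarts the window.
/// Side effects are injected so the loop can run without a cluster; in a real
/// installer `sleep` would wait CHECK_INTERVAL between observations.
fn apply_until_stable(
    mut apply: impl FnMut(),
    mut read: impl FnMut() -> Option<String>,
    mut sleep: impl FnMut(),
    expected: &str,
    stability_checks: u32,
    max_apply_attempts: u32,
) -> LabelOutcome {
    for attempt in 1..=max_apply_attempts {
        apply();
        let mut consecutive = 0;
        while consecutive < stability_checks {
            sleep();
            match read() {
                Some(value) if value == expected => consecutive += 1,
                // Missing or wrong value: leave the loop, re-apply, restart the counter.
                _ => break,
            }
        }
        if consecutive == stability_checks {
            return LabelOutcome::Stable { apply_attempts: attempt };
        }
    }
    LabelOutcome::GaveUp
}
```

A single drifting read resets the counter. Only an unbroken run of matching reads counts as success, and that is what separates a settled label from a stale `true` that the kubelet is about to clobber.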
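The state mapping in commit 486f5f9412 amounts to a total function from internal states to CRI `PodSandboxState` names. The sketch below assumes a simplified three-variant enum; the real runtime-rs types and their names differ.

```rust
/// Simplified stand-in for runtime-rs' internal sandbox state (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SandboxState {
    Init,
    Running,
    Stopped,
}

/// Maps an internal state to the CRI `PodSandboxState` name that containerd expects.
/// The match is exhaustive, so adding a new internal state forces an explicit
/// mapping instead of leaking an unrecognized string to containerd.
fn to_cri_state(state: SandboxState) -> &'static str {
    match state {
        SandboxState::Running => "SANDBOX_READY",
        SandboxState::Init | SandboxState::Stopped => "SANDBOX_NOTREADY",
    }
}
```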
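Commits 116ae66025, ac2d39fc34 and 2b980b3a34 combine a cached exit result with a stop notification. The sketch below shows the same pattern with std `Mutex` and `Condvar` in place of the async channel the commits describe (`exit_notify_tx`). `SandboxExit`, `record_exit`, `wait` and the `exit_status` field are hypothetical.

```rust
use std::sync::{Condvar, Mutex};

/// Final exit result of a sandbox (the field is hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SandboxExitInfo {
    exit_status: u32,
}

/// Cached exit result plus a stop notification, shared by the shutdown path
/// and any number of WaitSandbox callers.
struct SandboxExit {
    exit_info: Mutex<Option<SandboxExitInfo>>,
    stopped: Condvar,
}

impl SandboxExit {
    fn new() -> Self {
        Self {
            exit_info: Mutex::new(None),
            stopped: Condvar::new(),
        }
    }

    /// Called from shutdown or VM-exit handling. Only the first result is
    /// kept, so every waiter observes the same value.
    fn record_exit(&self, info: SandboxExitInfo) {
        let mut guard = self.exit_info.lock().unwrap();
        if guard.is_none() {
            *guard = Some(info);
            self.stopped.notify_all();
        }
    }

    /// Blocks while the sandbox is alive, then returns the cached result
    /// (immediately, for callers that arrive after stop was observed).
    fn wait(&self) -> SandboxExitInfo {
        let guard = self.exit_info.lock().unwrap();
        let guard = self
            .stopped
            .wait_while(guard, |info| info.is_none())
            .unwrap();
        (*guard).clone().expect("wait_while only returns once exit_info is set")
    }
}
```

The result is stored under the same lock that waiters check. A late waiter therefore returns the cached value at once, and an early waiter sleeps until `record_exit` runs.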
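Stripping the time namespace (commit f9eafb3341) reduces to filtering the spec's `linux.namespaces` list. Here is a minimal sketch with a hypothetical `LinuxNamespace` struct that holds only the fields needed. The actual fix operates on the OCI spec types in both the Go and Rust runtimes.

```rust
/// Minimal stand-in for an OCI `linux.namespaces` entry (hypothetical, reduced fields).
#[derive(Debug, Clone, PartialEq)]
struct LinuxNamespace {
    ns_type: String,
    path: Option<String>,
}

/// Removes time namespaces before the spec is handed to the agent,
/// leaving every other entry untouched and in its original order.
fn drop_time_namespaces(namespaces: &mut Vec<LinuxNamespace>) {
    namespaces.retain(|ns| ns.ns_type != "time");
}
```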
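The deduplicated tarball selection described in commit 291e4d37be can be sketched as follows. The manifest shape is an assumption: a map from shim name to component names, already filtered for the current architecture. The component names in the usage are also assumptions, and the real shim-components.json layout may differ. As in the commit, deduplication uses an in-memory set rather than on-disk state.

```rust
use std::collections::{BTreeMap, HashSet};

/// Returns the kata-static-<name>.tar.zst files needed by `enabled_shims`,
/// each listed once even when several shims share a component.
/// `manifest` maps shim name -> component names for the current arch (assumed shape).
fn tarballs_to_extract(manifest: &BTreeMap<&str, Vec<&str>>, enabled_shims: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for shim in enabled_shims {
        // Shims missing from the manifest contribute nothing.
        if let Some(components) = manifest.get(*shim) {
            for component in components {
                // `insert` returns false for components already scheduled.
                if seen.insert(*component) {
                    out.push(format!("kata-static-{component}.tar.zst"));
                }
            }
        }
    }
    out
}
```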
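The fallback described in commit 6ee43475c3 uses the configured `virtio_fs_queue_size` when one is set, and otherwise 1024 entries with a single queue. It could look like the sketch below. The function and constant names are illustrative; this is not the agent-ctl code.

```rust
/// Defaults taken from the commit message: 1024 entries per queue, one queue.
const DEFAULT_VIRTIO_FS_QUEUE_SIZE: u32 = 1024;
const DEFAULT_VIRTIO_FS_QUEUE_NUM: u32 = 1;

/// Returns (queue_size, queue_num) for a virtio-fs device. A missing or zero
/// size means "not configured" and falls back to the default, so the VMM never
/// receives a zero queue size. The queue count is fixed at one.
fn virtio_fs_queue_params(configured_queue_size: Option<u32>) -> (u32, u32) {
    let size = match configured_queue_size {
        Some(size) if size > 0 => size,
        _ => DEFAULT_VIRTIO_FS_QUEUE_SIZE,
    };
    (size, DEFAULT_VIRTIO_FS_QUEUE_NUM)
}
```

The size is decided independently of the queue count. This avoids the pattern called out in e2240b694a, where gating `queue_size` on `queue_num > 0` let the default silently overwrite a user-set size.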
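Commit 26746c9ce8 replaces the jailer PID with the value that firecracker writes to `<jailerRoot>/firecracker.pid`. That fix lives in the Go runtime. The following Rust sketch, written in the same language as the other examples, shows only the read-and-parse step, using a hypothetical `read_firecracker_pid` helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the PID that firecracker itself writes to `<jailer_root>/firecracker.pid`
/// after the jailer execs it. Surrounding whitespace (e.g. a trailing newline) is ignored.
fn read_firecracker_pid(jailer_root: &Path) -> io::Result<i32> {
    let contents = fs::read_to_string(jailer_root.join("firecracker.pid"))?;
    contents
        .trim()
        .parse::<i32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```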


19101 Commits