kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Alex Lyn	a08267faaf	runtime-rs: Track GPT partition padding files for cleanup When a GPT-partitioned VMDK is split into individual partition images, padding files may be generated between partitions to maintain correct byte offsets. These were not tracked for cleanup, leading to stale temporary files after container removal. Iterate over the partition layout and check for pad-{idx}.img files alongside the head image; add any that exist to gpt_metadata_paths so they are removed during teardown. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
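The padding-file scan described in a08267faaf can be pictured roughly as follows. This is a minimal sketch, not the runtime-rs code: the function name `collect_pad_files`, the `partition_count` parameter, and the zero-based index range are assumptions made for illustration. Only the `pad-{idx}.img` naming and the "alongside the head image" location come from the commit message.

```rust
use std::path::{Path, PathBuf};

/// Return every `pad-{idx}.img` file that exists next to the head image.
/// Hypothetical helper: the real code appends such paths to
/// `gpt_metadata_paths`; here they are simply returned to the caller.
pub fn collect_pad_files(head_image: &Path, partition_count: usize) -> Vec<PathBuf> {
    // Padding files are assumed to live in the same directory as the head image.
    let dir = head_image.parent().unwrap_or_else(|| Path::new("."));

    (0..partition_count)
        .map(|idx| dir.join(format!("pad-{}.img", idx)))
        // Padding is only generated where the layout needs it, so keep
        // just the files that are actually present on disk.
        .filter(|path| path.exists())
        .collect()
}
```

A caller would extend its cleanup list with the returned paths so that teardown removes them together with the other GPT metadata files.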
Alex Lyn	51e8310ef3	kata-agent: Integrate dm-verity into multi-layer EROFS mount path Wire the dm-verity helpers into the layer mount flow so that GPT partitions carrying verity metadata are mounted through a verified device-mapper target instead of the raw partition. Refactor wait_and_mount_layer to resolve partition path and verity device as separate steps: create a dm-verity device when X-kata.dmverity-enabled=true is set, fall back to direct partition mount otherwise, and return the verity device path for cleanup tracking. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	963ba6c6cd	kata-agent: Add dm-verity device cleanup for GPT-partitioned layers Add per-container verity_devices tracking in Sandbox and wire the teardown path: destroy_partition_dmverity_device removes the device-mapper target via deferred-remove ioctl and deletes the mknod node, cleanup_dmverity_devices iterates all devices in reverse order. Wire into remove_container_resources (rpc.rs) so verity devices are torn down after unmount, and record verity device paths in add_storages (storage/mod.rs) for tracking. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	dce409bc35	kata-agent: Add dm-verity device creation for GPT-partitioned layers GPT-partitioned EROFS layers can carry dm-verity hashes appended after the filesystem data within the same partition. The host runtime passes the root hash and parameters as X-kata.dmverity.* storage options; the agent must set up the kernel dm-verity target before mounting so that every read is integrity-checked against the Merkle tree. Implement dm-verity device creation: option parsing from storage options, device name generation, and create helper via devicemapper ioctls with hash_start_block calculation (accounting for v1 superblock presence). Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	e900eae388	kata-agent: Add no-udev DmOptions builders and mknod device node helpers The kata guest VM runs without udev, so device-mapper nodes under /dev/mapper are never created automatically. Add the foundational helpers that subsequent dm-verity integration will rely on, focusing on two key points: (1) DmOptions builders that disable all udev synchronization flags, with read-only and deferred-remove variants. (2) mknod-based device node creation/removal under /dev/mapper, since devtmpfs nodes are not auto-created without udev. Also add the devicemapper crate dependency (default-features = false). Note that this commit depends on no-udev support in devicemapper, provided by the PR: https://github.com/stratis-storage/devicemapper-rs/pull/1036 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	c471644477	runtime-rs: Add dm-verity annotation extraction to GPT+VMDK integration Extract dm-verity metadata from containerd mount annotations and pass them through to kata-agent as X-kata.dmverity.* storage options. This enables the agent to create dm-verity devices for integrity-verified EROFS partitions. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	3051b8d11a	runtime-rs: Add dm-verity utility functions to gpt_disk module When containerd creates dm-verity-protected EROFS layers, it stores the root hash and parameters as OCI annotations — but the format does not directly map to the kernel dm-verity table that the guest agent needs to construct. Bridge this gap with functions that parse containerd's dm-verity annotation JSON, detect whether a v1 superblock is embedded at the hash offset (to extract the salt automatically rather than relying on containerd's hardcoded default), and produce the X-kata.dmverity.* storage options the agent expects. This keeps all dm-verity metadata translation on the host side, so the agent can consume a flat list of options without understanding the containerd annotation schema. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Alex Lyn	499fefd972	kata-types: Extend DmVerityInfo with salt, hash_type, no_superblock fields Add fields to DmVerityInfo needed for dm-verity device creation: (1) salt: Optional salt value for the hash computation (2) hash_type: dm-verity version (3) no_superblock: whether to skip the superblock at hash offset Uses serde defaults for backward compatibility with existing serialized data that lacks these fields. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Fabiano Fidêncio	79cf2aed66	Merge pull request #13282 from fidencio/topic/revert-qos-test-skip Revert "tests: skip Guaranteed QoS test for SNP/TDX runtime-rs"	2026-06-25 22:18:21 +02:00
Fabiano Fidêncio	850b385f6b	Revert "tests: skip Guaranteed QoS test for SNP/TDX runtime-rs" This reverts commit `6588014b54`, as the needed PR[0] was merged this morning, allowing us to just revert the image. [0]: https://github.com/kata-containers/kata-containers/pull/13173 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-25 18:18:17 +02:00
Fabiano Fidêncio	31d349f999	Merge pull request #13173 from fidencio/topic/fixed-sandbox-sizing runtime-rs: size sandboxes with fixed overheads	2026-06-25 15:50:00 +02:00
Fabiano Fidêncio	a664595084	kata-deploy: bump qemu RuntimeClass overhead for the aarch64 VMM With sandbox_cgroup_only the shim, QEMU and virtiofsd run inside the pod's memory cgroup, whose limit is the workload limit plus the RuntimeClass pod overhead. On aarch64 the VMM host footprint is much larger than on x86 (QEMU's own anon RSS is ~160Mi+ before any guest RAM, on top of the shmem-backed guest memory), so the 160Mi overhead is too small: small-memory-limit pods get their qemu-system process OOM-killed by the pod cgroup (CONSTRAINT_MEMCG), and the agent vsock never comes up (ENODEV), so the sandbox fails to start. Raise the pod overhead to 320Mi for the qemu shims that run on aarch64 (qemu, qemu-runtime-rs, qemu-coco-dev-runtime-rs). The value is applied on all architectures for simplicity; x86 is over-provisioned by ~160Mi, which is acceptable. The TEE/GPU shims already carry far larger overhead and amd64-only shims (clh*, dragonball, fc) are unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	b2f7314d31	tests: harden sandbox sizing manifests for k8s cpu workloads Route runtime-rs tests to dedicated manifests/templates and ensure the CPU allocation workloads always carry explicit memory limits, avoiding Dragonball sandbox startup failures from InvalidMemorySize(0). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	346a3be9ad	docs: document runtime-rs sandbox overhead sizing Add a how-to describing how runtime-rs sizes static sandboxes from overhead plus requested CPU/memory, including that fractional vCPU results are rounded up for VMM-visible vCPU counts, and link it from the how-to README. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	a34c74a2d4	runtime-rs: size static sandboxes with overhead values When static sandbox sizing is enabled, keep configured defaults when workloads do not specify CPU or memory limits. When limits are present, size the VM as requested resources plus overhead_vcpus/overhead_memory values derived from runtime-rs profile defaults. Limit-driven vCPU sizing is clamped to a minimum of one vCPU so a 0.0 result never yields an unbootable VM, and sandbox setup fails early with a clear, actionable error when the computed memory is 0 MiB (pointing at memory limits or non-zero default/overhead memory settings). This keeps static VM sizing predictable across runtime-rs profiles, including NVIDIA ones. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
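The sizing rules from a34c74a2d4 (together with the round-up noted in 346a3be9ad) can be sketched as below. This is an illustration under assumptions, not the runtime-rs implementation: the `SizingConfig` struct, its field names, the function names, and the choice to treat CPU and memory limits independently are all hypothetical. Only the rules themselves (defaults without limits, limit plus overhead otherwise, fractional vCPUs rounded up, a one-vCPU floor, an early error on 0 MiB) come from the commit messages.

```rust
/// Hypothetical configuration holding the defaults and overheads.
pub struct SizingConfig {
    pub default_vcpus: u32,
    pub default_memory_mib: u64,
    pub overhead_vcpus: f64,
    pub overhead_memory_mib: u64,
}

/// vCPUs for the VM: the default when no limit is set, otherwise
/// limit + overhead, rounded up and clamped to at least one vCPU.
pub fn size_vcpus(cfg: &SizingConfig, cpu_limit: Option<f64>) -> u32 {
    match cpu_limit {
        None => cfg.default_vcpus,
        Some(limit) => ((limit + cfg.overhead_vcpus).ceil() as u32).max(1),
    }
}

/// Memory for the VM in MiB: the default when no limit is set, otherwise
/// limit + overhead. A result of 0 MiB is rejected early.
pub fn size_memory_mib(cfg: &SizingConfig, mem_limit_mib: Option<u64>) -> Result<u64, String> {
    let mib = match mem_limit_mib {
        None => cfg.default_memory_mib,
        Some(limit) => limit + cfg.overhead_memory_mib,
    };
    if mib == 0 {
        return Err(
            "computed sandbox memory is 0 MiB: set a memory limit or a non-zero default/overhead memory"
                .to_string(),
        );
    }
    Ok(mib)
}
```

For instance, with `overhead_vcpus = 0.25` a workload limited to 1.5 CPUs would get a 2-vCPU VM under this sketch, and a 0.0 limit with zero overhead would still boot with one vCPU.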
Fabiano Fidêncio	65a266f532	Merge pull request #13272 from cayoub-oai/codex/upstream-cgroupfs-init-subcgroup agent: Apply init subcgroup in cgroupfs manager	2026-06-25 13:54:19 +02:00
Aurélien Bombo	1217dd1584	Merge pull request #12373 from kata-containers/disable-guest-empty-dir runtime: Set `disable_guest_empty_dir = true` by default	2026-06-24 20:09:46 -05:00
Chris Ayoub	4e3d257dc0	agent: Apply init subcgroup in cgroupfs manager When cgroup v2 is enabled, exec can fail with EBUSY while writing the process to cgroup.procs if the container process has been delegated to an init subcgroup. PR #10845 fixed this behavior for the systemd/D-Bus cgroup manager path, which was related to #10733. The cgroupfs manager still writes the process directly to the container cgroup, so apply the same init subcgroup handling there. Also fix the cgroupfs init-subcgroup existence check for absolute OCI cgroup paths by joining the trimmed cgroup path under the cgroup root. Fixes: #9701 Signed-off-by: Chris Ayoub <cayoub@openai.com> Generated-By: OpenAI Codex	2026-06-24 21:25:49 +00:00
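The absolute-path part of 4e3d257dc0 touches a well-known property of Rust's `Path::join`: joining an absolute path replaces the base instead of appending to it. The sketch below shows why trimming the leading slash matters. The function name and the `init` subcgroup name are assumptions for illustration, and whether the agent hit exactly this behavior is not stated in the commit message.

```rust
use std::path::{Path, PathBuf};

/// Build the path of the init subcgroup under the cgroup root.
/// Hypothetical helper; the subcgroup name "init" is an assumption.
pub fn init_subcgroup_path(cgroup_root: &Path, oci_cgroup_path: &str) -> PathBuf {
    // OCI cgroup paths may be absolute ("/kubepods/..."). Joining such a
    // path directly would discard `cgroup_root`, so strip leading slashes.
    let trimmed = oci_cgroup_path.trim_start_matches('/');
    cgroup_root.join(trimmed).join("init")
}

/// The naive variant, kept only to show the pitfall.
pub fn naive_join(cgroup_root: &Path, oci_cgroup_path: &str) -> PathBuf {
    cgroup_root.join(oci_cgroup_path)
}
```

With a root of `/sys/fs/cgroup` and an OCI path of `/kubepods/pod1/c1`, the naive join yields `/kubepods/pod1/c1`, which is outside the cgroup hierarchy, while the trimmed variant stays under the root.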
Aurélien Bombo	10cf6816aa	kernel: Fix FUSE crash with host emptyDir This patch was submitted by Miklos Szeredi: https://lore.kernel.org/fuse-devel/20260528142306.1792392-1-mszeredi@redhat.com/ It fixes a FUSE oops with the k8s-shared-volume.bats test. Fixes: #12589 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	77c3e36cf7	tests: Support GENPOLICY_SETTINGS_DIR with drop-in-examples Follow-up to `3dd77bf576`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	3acb618f6b	genpolicy: Assume `disable_guest_empty_dir = true` This option should be removed for 4.0, so we don't handle `false`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	e191c5b716	runtime-go/rs: Reconcile hugepage emptyDirs and disable_guest_empty_dir This addresses an issue where the disable_guest_empty_dir=true code paths did not take into account that hugepage-backed emptyDirs should always be recreated in the guest (using guest hugepages). Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	a3e91d9ed2	runtime-go/rs: Set `disable_guest_empty_dir = true` by default This makes the runtime share the host Kubelet emptyDir folder with the guest instead of the agent creating an empty folder in the container rootfs. Doing so enables the Kubelet to track emptyDir usage and evict greedy pods. In other words, with virtio-fs the container rootfs uses host storage whether this is true or false, however with true, Kata uses the k8s emptyDir folder so the sizeLimit is properly enforced by k8s. Addresses the ephemeral storage part of #12203. History: * Initially, emptyDirs are slow because they are shared from the host with 9p. https://github.com/kata-containers/runtime/issues/1472 * To address above, emptyDirs are hardcoded to be created by the agent in the pause container's rootfs, potentially leveraging devicemapper and improving perf. https://github.com/kata-containers/runtime/pull/1485 * The previous PR regressed an (interesting?) use case where emptyDirs were used to share data from the host to the guest, so the behavior was made configurable and `disable_guest_empty_dir = false` is introduced, defaulting to the behavior of the previous PR. https://github.com/kata-containers/kata-containers/pull/2056 * Another resource accounting regression remains which is addressed in this PR. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:21:53 -05:00
Fabiano Fidêncio	6528e7a72f	Merge pull request #13228 from fidencio/topic/dont-set-slots-maxmem-for-confidential-guests runtime-rs: qemu: don't set slots/maxmem for confidential guests	2026-06-24 17:27:28 +02:00
Greg Kurz	13b3020c34	Merge pull request #13261 from c3d/bug/13260-Info-log-level runtime: Change default log level from Warn to Info	2026-06-24 08:57:13 +02:00
Fabiano Fidêncio	392b802f61	Merge pull request #12878 from Apokleos/fix-configs runtime-rs: Fix configs differences between runtime-rs and runtime-go	2026-06-23 13:53:16 +02:00
Steve Horsman	811914a372	Merge pull request #13246 from Apokleos/copyfile-with-gid-uid runtime-rs: correct uid/gid for K8s secret/configmap copy_file	2026-06-23 10:43:03 +01:00
Steve Horsman	3e429a8afb	Merge pull request #13234 from LandonTClipp/docs-skill docs: Add AI agent skill for doc contributions	2026-06-23 09:59:15 +01:00
Christophe de Dinechin	631fd96715	runtime: Change default log level from Warn to Info When the kata configuration does not set log_level to debug, the containerd-shim-v2 defaults to WarnLevel, which suppresses important diagnostic information logged at Info level. Key Info-level logs that are currently hidden: - QEMU command line (qemu.go:3566) - critical for debugging VM issues - VM lifecycle events (creation, start, stop) - Device hotplug operations (VFIO, network, volumes) - Resource configuration (NUMA, memory) - QMP socket details Info level provides significantly better diagnostic data without flooding logs with excessive detail (which would occur at Debug level). This change improves troubleshooting capabilities for production deployments where debug mode is not enabled. Note: runtime-rs already defaults to Info level (see src/runtime-rs/crates/shim/src/logger.rs:13,30), so this change only affects the Go runtime. Fixes: #13260 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2026-06-23 10:29:33 +02:00
LandonTClipp	85e828cc9b	docs: Add AI agent skill for doc contributions This skill will inform AI agents how to properly write and format docs in the new docs system. There is nothing too fancy, just reminding agents to use mkdocs-materialx features instead of treating the markdown like the legacy Github-based format. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-23 08:57:37 +01:00
Fabiano Fidêncio	bbe714ae03	Merge pull request #13227 from fidencio/topic/rfc-composable-vm-images-update docs: detail composable image runtime contracts in proposal	2026-06-22 21:07:04 +02:00
Fabiano Fidêncio	84db260d9a	docs: detail composable image runtime contracts in proposal Update the composable-vm-images proposal with the design decisions we only arrived at after experimenting with the implementation: * Replace the hardcoded agent path-resolution table with the data-driven components.toml manifest (process levels, args/optional_args, env, wait_socket, ${...} substitution, and select/variants), keeping the agent generic. * Document the attester-variant contract: NVRC exports KATA_ATTESTER_VARIANT and the manifest selects the stock vs NVIDIA attestation-agent. * Document the runtime dependency requirements found during bring-up: the nvidia attester's LD_LIBRARY_PATH (libnvat closure in the coco addon + NVML in the gpu addon) and the NVML-init failure mode, plus CDH secure_mount tooling placement -- plain storage (mke2fs/mkfs.ext4/dd) in the base vs encrypted storage (cryptsetup) in the coco addon, the CDH PATH, and the base/addon ABI lockstep. * Reflect the storage tooling and bundled libraries in the base/coco-addon build sections, and mark the GPU addon as implemented. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-22 20:04:25 +02:00
Fabiano Fidêncio	9761ea2235	Merge pull request #13164 from manuelh-dev/mahuber/remove-resource-requests tests: use limits for Kata workload resources	2026-06-22 20:01:33 +02:00
Fabiano Fidêncio	d406a747a9	Merge pull request #13258 from fidencio/topic/fix-publish-payload-after-merge ci: do not publish a kata-monitor job-dispatcher manifest	2026-06-22 18:35:41 +02:00
Fabiano Fidêncio	a3f160bd40	ci: do not publish a kata-monitor job-dispatcher manifest The kata-monitor image has no job-dispatcher sidecar, so opt out of the kata-deploy-specific dispatcher manifest derivation in the payload-after-push workflow by setting KATA_DEPLOY_PUBLISH_JOB_DISPATCHER=false, mirroring the same fix already applied to the release workflows. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-22 13:49:57 +02:00
Fabiano Fidêncio	f1ebefcdfb	Merge pull request #13222 from fidencio/topic/nvidia-switch-to-kata-deploy-jobs kata-deploy: nvidia: Default to the Job-based deployment mode	2026-06-22 12:55:10 +02:00
Steve Horsman	a87d71763e	Merge pull request #13255 from kata-containers/dependabot/go_modules/src/runtime/github.com/containerd/containerd-1.7.33 build(deps): bump github.com/containerd/containerd from 1.7.32 to 1.7.33 in /src/runtime	2026-06-22 11:17:54 +01:00
Steve Horsman	20bcff185f	Merge pull request #13254 from kata-containers/dependabot/go_modules/src/runtime/go.mongodb.org/mongo-driver-1.17.7 build(deps): bump go.mongodb.org/mongo-driver from 1.14.0 to 1.17.7 in /src/runtime	2026-06-22 11:17:29 +01:00
Fabiano Fidêncio	f9682356ce	Merge pull request #13216 from Apokleos/hotunplug-blk runtime-rs: Add support for hot-unplugging block devices	2026-06-22 12:14:30 +02:00
Fabiano Fidêncio	337b600268	Merge pull request #13256 from fidencio/release/3.32.0 release: Bump version to 3.32.0 3.32.0	2026-06-22 10:33:25 +02:00
Alex Lyn	9550a323ac	Merge pull request #13245 from kata-containers/unify-nix-version Unify nix version	2026-06-22 15:25:10 +08:00
Alex Lyn	8ae08e7fb0	runtime-rs: Add dan_conf to allow network devices in host netns for qemu Network devices for VM-based containers are allowed to be placed in the host netns to eliminate as many hops as possible, with the aim of achieving near-native networking performance. This commit introduces the `dan_conf` field to the configuration file. This allows the runtime to specify the configuration path for Direct Attached Network (DAN) devices, enabling interfaces to remain in the host network namespace while being utilized by the VM-based (qemu) containers. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-22 14:16:37 +08:00
Alex Lyn	b068f73543	runtime-rs: add experimental features documentation The experimental configuration allows enabling features not yet stable for production. These features may break compatibility and are prepared for major version bumps. Add documentation with force_guest_pull example across all runtime-rs configuration files. This feature enables guest-side image pulling in CoCo (Confidential Computing) scenarios. Example usage: experimental = ["force_guest_pull"] Fixes inconsistent documentation across configuration files Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-22 14:14:06 +08:00
Alex Lyn	71f3f783a4	runtime-rs: Remove mem_agent configuration for kata coco dev scenarios The memory agent is useless in kata-coco-dev scenarios, so this commit removes its configuration items. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-22 14:14:06 +08:00
Alex Lyn	7aaa4e63d1	Merge pull request #13241 from PiotrProkop/exit-code agent: report 128+signal as exit code for signal-terminated processes	2026-06-22 09:13:24 +08:00
Fabiano Fidêncio	dc70b93573	release: Bump version to 3.32.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-22 01:15:24 +02:00
PiotrProkop	c2d737c9d7	agent: report 128+signal as exit code for signal-terminated processes When a container process is terminated by a signal, the agent's SIGCHLD reaper stored the raw signal number as the process exit code. As a result a process killed by SIGKILL(9) reported exit code 9 instead of the conventional 137 (128+9). Apply the standard shell convention of 128+signal_number so that signal-terminated processes report the expected exit codes, e.g. SIGKILL(9) -> 137, SIGTERM(15) -> 143, SIGINT(2) -> 130. This mimics runc, which encodes wait-status exit codes the same way: https://github.com/opencontainers/runc/blob/v1.4.3/libcontainer/utils/utils.go#L19 Both runc and this new Kata behaviour follow the conventional exit code semantics documented at https://tldp.org/LDP/abs/html/exitcodes.html. The conversion is factored into a small helper and covered by a unit test. The runtime and shim already pass the exit code through unchanged, so no further changes are needed for the corrected value to surface. Fixes: signal-terminated containers reporting raw signal numbers Signed-off-by: PiotrProkop <pprokop@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 16:34:17 +02:00
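The 128+signal convention from c2d737c9d7 is small enough to show in full. This is a minimal sketch, assuming a hypothetical `WaitOutcome` enum in place of whatever wait-status type the agent's reaper uses; the helper name `exit_code` is likewise illustrative.

```rust
/// How a child process ended, as seen by a reaper.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitOutcome {
    /// The process called exit() with this status.
    Exited(i32),
    /// The process was terminated by this signal number.
    Signaled(i32),
}

/// Map a wait outcome to the exit code reported to the runtime.
/// Signal-terminated processes report 128 + signal number.
pub fn exit_code(outcome: WaitOutcome) -> i32 {
    match outcome {
        WaitOutcome::Exited(code) => code,
        WaitOutcome::Signaled(signo) => 128 + signo,
    }
}
```

Under this mapping `Signaled(9)` gives 137, `Signaled(15)` gives 143, and `Signaled(2)` gives 130, matching the values listed in the commit message, while a normal exit status passes through unchanged.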
dependabot[bot]	9c6cccb483	build(deps): bump github.com/containerd/containerd in /src/runtime Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.32 to 1.7.33. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.32...v1.7.33) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-version: 1.7.33 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-21 14:32:17 +00:00
Fabiano Fidêncio	374a867774	Merge pull request #13196 from microsoft/cameronbaird/upstream/runtime-go-clh-templating runtime: Enable VM Templating Support for CLH	2026-06-21 16:31:19 +02:00
Alex Lyn	0a63aebea9	runtime-rs: Implement remove_device for block device hot removal Replace the "Not yet implemented" stub in QemuInner::remove_device() with a working implementation that calls hotunplug_device() to perform the QMP-level device removal, then cleans up the internal devices list via retain() to remove stale coldplug entries. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-20 22:08:57 +08:00


19472 Commits