kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 14:38:33 +00:00

Author	SHA1	Message	Date
Alex Lyn	499fefd972	kata-types: Extend DmVerityInfo with salt, hash_type, no_superblock fields Add fields to DmVerityInfo needed for dm-verity device creation: (1) salt: Optional salt value for the hash computation (2) hash_type: dm-verity version (3) no_superblock: whether to skip the superblock at hash offset Uses serde defaults for backward compatibility with existing serialized data that lacks these fields. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-26 09:51:05 +08:00
Fabiano Fidêncio	79cf2aed66	Merge pull request #13282 from fidencio/topic/revert-qos-test-skip Revert "tests: skip Guaranteed QoS test for SNP/TDX runtime-rs"	2026-06-25 22:18:21 +02:00
Fabiano Fidêncio	850b385f6b	Revert "tests: skip Guaranteed QoS test for SNP/TDX runtime-rs" This reverts commit `6588014b54`, as the needed PR[0] was merged this morning, allowing us to just revert the image. [0]: https://github.com/kata-containers/kata-containers/pull/13173 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-25 18:18:17 +02:00
Fabiano Fidêncio	31d349f999	Merge pull request #13173 from fidencio/topic/fixed-sandbox-sizing runtime-rs: size sandboxes with fixed overheads	2026-06-25 15:50:00 +02:00
Fabiano Fidêncio	a664595084	kata-deploy: bump qemu RuntimeClass overhead for the aarch64 VMM With sandbox_cgroup_only the shim, QEMU and virtiofsd run inside the pod's memory cgroup, whose limit is the workload limit plus the RuntimeClass pod overhead. On aarch64 the VMM host footprint is much larger than on x86 (QEMU's own anon RSS is ~160Mi+ before any guest RAM, on top of the shmem-backed guest memory), so the 160Mi overhead is too small: small-memory-limit pods get their qemu-system process OOM-killed by the pod cgroup (CONSTRAINT_MEMCG), and the agent vsock never comes up (ENODEV), so the sandbox fails to start. Raise the pod overhead to 320Mi for the qemu shims that run on aarch64 (qemu, qemu-runtime-rs, qemu-coco-dev-runtime-rs). The value is applied on all architectures for simplicity; x86 is over-provisioned by ~160Mi, which is acceptable. The TEE/GPU shims already carry far larger overhead and amd64-only shims (clh*, dragonball, fc) are unaffected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	b2f7314d31	tests: harden sandbox sizing manifests for k8s cpu workloads Route runtime-rs tests to dedicated manifests/templates and ensure the CPU allocation workloads always carry explicit memory limits, avoiding Dragonball sandbox startup failures from InvalidMemorySize(0). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	346a3be9ad	docs: document runtime-rs sandbox overhead sizing Add a how-to describing how runtime-rs sizes static sandboxes from overhead plus requested CPU/memory, including that fractional vCPU results are rounded up for VMM-visible vCPU counts, and link it from the how-to README. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	a34c74a2d4	runtime-rs: size static sandboxes with overhead values When static sandbox sizing is enabled, keep configured defaults when workloads do not specify CPU or memory limits. When limits are present, size the VM as requested resources plus overhead_vcpus/overhead_memory values derived from runtime-rs profile defaults. Limit-driven vCPU sizing is clamped to a minimum of one vCPU so a 0.0 result never yields an unbootable VM, and sandbox setup fails early with a clear, actionable error when the computed memory is 0 MiB (pointing at memory limits or non-zero default/overhead memory settings). This keeps static VM sizing predictable across runtime-rs profiles, including NVIDIA ones. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	65a266f532	Merge pull request #13272 from cayoub-oai/codex/upstream-cgroupfs-init-subcgroup agent: Apply init subcgroup in cgroupfs manager	2026-06-25 13:54:19 +02:00
Aurélien Bombo	1217dd1584	Merge pull request #12373 from kata-containers/disable-guest-empty-dir runtime: Set `disable_guest_empty_dir = true` by default	2026-06-24 20:09:46 -05:00
Chris Ayoub	4e3d257dc0	agent: Apply init subcgroup in cgroupfs manager When cgroup v2 is enabled, exec can fail with EBUSY while writing the process to cgroup.procs if the container process has been delegated to an init subcgroup. PR #10845 fixed this behavior for the systemd/D-Bus cgroup manager path, which was related to #10733. The cgroupfs manager still writes the process directly to the container cgroup, so apply the same init subcgroup handling there. Also fix the cgroupfs init-subcgroup existence check for absolute OCI cgroup paths by joining the trimmed cgroup path under the cgroup root. Fixes: #9701 Signed-off-by: Chris Ayoub <cayoub@openai.com> Generated-By: OpenAI Codex	2026-06-24 21:25:49 +00:00
Aurélien Bombo	10cf6816aa	kernel: Fix FUSE crash with host emptyDir This patch was submitted by Miklos Szeredi: https://lore.kernel.org/fuse-devel/20260528142306.1792392-1-mszeredi@redhat.com/ It fixes a FUSE oops with the k8s-shared-volume.bats test. Fixes: #12589 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	77c3e36cf7	tests: Support GENPOLICY_SETTINGS_DIR with drop-in-examples Follow-up to `3dd77bf576`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	3acb618f6b	genpolicy: Assume `disable_guest_empty_dir = true` This option should be removed for 4.0, so we don't handle `false`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	e191c5b716	runtime-go/rs: Reconcile hugepage emptyDirs and disable_guest_empty_dir This addresses an issue where the disable_guest_empty_dir=true code paths did not take into account that hugepage-backed emptyDirs should always be recreated in the guest (using guest hugepages). Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Aurélien Bombo	a3e91d9ed2	runtime-go/rs: Set `disable_guest_empty_dir = true` by default This makes the runtime share the host Kubelet emptyDir folder with the guest instead of the agent creating an empty folder in the container rootfs. Doing so enables the Kubelet to track emptyDir usage and evict greedy pods. In other words, with virtio-fs the container rootfs uses host storage whether this is true or false; however, with true, Kata uses the k8s emptyDir folder, so the sizeLimit is properly enforced by k8s. Addresses the ephemeral storage part of #12203. History: * Initially, emptyDirs are slow because they are shared from the host with 9p. https://github.com/kata-containers/runtime/issues/1472 * To address the above, emptyDirs are hardcoded to be created by the agent in the pause container's rootfs, potentially leveraging devicemapper and improving perf. https://github.com/kata-containers/runtime/pull/1485 * The previous PR regressed an (interesting?) use case where emptyDirs were used to share data from the host to the guest, so the behavior was made configurable and `disable_guest_empty_dir = false` is introduced, defaulting to the behavior of the previous PR. https://github.com/kata-containers/kata-containers/pull/2056 * Another resource accounting regression remains which is addressed in this PR. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:21:53 -05:00
Fabiano Fidêncio	6528e7a72f	Merge pull request #13228 from fidencio/topic/dont-set-slots-maxmem-for-confidential-guests runtime-rs: qemu: don't set slots/maxmem for confidential guests	2026-06-24 17:27:28 +02:00
Greg Kurz	13b3020c34	Merge pull request #13261 from c3d/bug/13260-Info-log-level runtime: Change default log level from Warn to Info	2026-06-24 08:57:13 +02:00
Fabiano Fidêncio	392b802f61	Merge pull request #12878 from Apokleos/fix-configs runtime-rs: Fix configs differences between runtime-rs and runtime-go	2026-06-23 13:53:16 +02:00
Steve Horsman	811914a372	Merge pull request #13246 from Apokleos/copyfile-with-gid-uid runtime-rs: correct uid/gid for K8s secret/configmap copy_file	2026-06-23 10:43:03 +01:00
Steve Horsman	3e429a8afb	Merge pull request #13234 from LandonTClipp/docs-skill docs: Add AI agent skill for doc contributions	2026-06-23 09:59:15 +01:00
Christophe de Dinechin	631fd96715	runtime: Change default log level from Warn to Info When the kata configuration does not set log_level to debug, the containerd-shim-v2 defaults to WarnLevel, which suppresses important diagnostic information logged at Info level. Key Info-level logs that are currently hidden: - QEMU command line (qemu.go:3566) - critical for debugging VM issues - VM lifecycle events (creation, start, stop) - Device hotplug operations (VFIO, network, volumes) - Resource configuration (NUMA, memory) - QMP socket details Info level provides significantly better diagnostic data without flooding logs with excessive detail (which would occur at Debug level). This change improves troubleshooting capabilities for production deployments where debug mode is not enabled. Note: runtime-rs already defaults to Info level (see src/runtime-rs/crates/shim/src/logger.rs:13,30), so this change only affects the Go runtime. Fixes: #13260 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2026-06-23 10:29:33 +02:00
LandonTClipp	85e828cc9b	docs: Add AI agent skill for doc contributions This skill will inform AI agents how to properly write and format docs in the new docs system. There is nothing too fancy, just reminding agents to use mkdocs-materialx features instead of treating the markdown like the legacy GitHub-based format. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-23 08:57:37 +01:00
Fabiano Fidêncio	bbe714ae03	Merge pull request #13227 from fidencio/topic/rfc-composable-vm-images-update docs: detail composable image runtime contracts in proposal	2026-06-22 21:07:04 +02:00
Fabiano Fidêncio	84db260d9a	docs: detail composable image runtime contracts in proposal Update the composable-vm-images proposal with the design decisions we only arrived at after experimenting with the implementation: * Replace the hardcoded agent path-resolution table with the data-driven components.toml manifest (process levels, args/optional_args, env, wait_socket, ${...} substitution, and select/variants), keeping the agent generic. * Document the attester-variant contract: NVRC exports KATA_ATTESTER_VARIANT and the manifest selects the stock vs NVIDIA attestation-agent. * Document the runtime dependency requirements found during bring-up: the nvidia attester's LD_LIBRARY_PATH (libnvat closure in the coco addon + NVML in the gpu addon) and the NVML-init failure mode, plus CDH secure_mount tooling placement -- plain storage (mke2fs/mkfs.ext4/dd) in the base vs encrypted storage (cryptsetup) in the coco addon, the CDH PATH, and the base/addon ABI lockstep. * Reflect the storage tooling and bundled libraries in the base/coco-addon build sections, and mark the GPU addon as implemented. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-22 20:04:25 +02:00
Fabiano Fidêncio	9761ea2235	Merge pull request #13164 from manuelh-dev/mahuber/remove-resource-requests tests: use limits for Kata workload resources	2026-06-22 20:01:33 +02:00
Fabiano Fidêncio	d406a747a9	Merge pull request #13258 from fidencio/topic/fix-publish-payload-after-merge ci: do not publish a kata-monitor job-dispatcher manifest	2026-06-22 18:35:41 +02:00
Fabiano Fidêncio	a3f160bd40	ci: do not publish a kata-monitor job-dispatcher manifest The kata-monitor image has no job-dispatcher sidecar, so opt out of the kata-deploy-specific dispatcher manifest derivation in the payload-after-push workflow by setting KATA_DEPLOY_PUBLISH_JOB_DISPATCHER=false, mirroring the same fix already applied to the release workflows. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-22 13:49:57 +02:00
Fabiano Fidêncio	f1ebefcdfb	Merge pull request #13222 from fidencio/topic/nvidia-switch-to-kata-deploy-jobs kata-deploy: nvidia: Default to the Job-based deployment mode	2026-06-22 12:55:10 +02:00
Steve Horsman	a87d71763e	Merge pull request #13255 from kata-containers/dependabot/go_modules/src/runtime/github.com/containerd/containerd-1.7.33 build(deps): bump github.com/containerd/containerd from 1.7.32 to 1.7.33 in /src/runtime	2026-06-22 11:17:54 +01:00
Steve Horsman	20bcff185f	Merge pull request #13254 from kata-containers/dependabot/go_modules/src/runtime/go.mongodb.org/mongo-driver-1.17.7 build(deps): bump go.mongodb.org/mongo-driver from 1.14.0 to 1.17.7 in /src/runtime	2026-06-22 11:17:29 +01:00
Fabiano Fidêncio	f9682356ce	Merge pull request #13216 from Apokleos/hotunplug-blk runtime-rs: Add support for hot-unplugging block devices	2026-06-22 12:14:30 +02:00
Fabiano Fidêncio	337b600268	Merge pull request #13256 from fidencio/release/3.32.0 release: Bump version to 3.32.0	2026-06-22 10:33:25 +02:00
Alex Lyn	9550a323ac	Merge pull request #13245 from kata-containers/unify-nix-version Unify nix version	2026-06-22 15:25:10 +08:00
Alex Lyn	8ae08e7fb0	runtime-rs: Add dan_conf to allow network devices in host netns for qemu Network devices for VM-based containers may be placed in the host netns to eliminate as many network hops as possible, with the goal of near-native networking performance. This commit introduces the `dan_conf` field to the configuration file. This allows the runtime to specify the configuration path for Direct Attached Network (DAN) devices, enabling interfaces to remain in the host network namespace while being used by VM-based (QEMU) containers. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-22 14:16:37 +08:00
Alex Lyn	b068f73543	runtime-rs: add experimental features documentation The experimental configuration allows enabling features not yet stable for production. These features may break compatibility and are prepared for major version bumps. Add documentation with force_guest_pull example across all runtime-rs configuration files. This feature enables guest-side image pulling in CoCo (Confidential Computing) scenarios. Example usage: experimental = ["force_guest_pull"] Fixes inconsistent documentation across configuration files Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-22 14:14:06 +08:00
Alex Lyn	71f3f783a4	runtime-rs: Remove mem_agent configuration for kata coco dev scenarios As the memory agent is not useful in kata-coco-dev scenarios, this commit removes these items. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-22 14:14:06 +08:00
Alex Lyn	7aaa4e63d1	Merge pull request #13241 from PiotrProkop/exit-code agent: report 128+signal as exit code for signal-terminated processes	2026-06-22 09:13:24 +08:00
Fabiano Fidêncio	dc70b93573	release: Bump version to 3.32.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-22 01:15:24 +02:00
PiotrProkop	c2d737c9d7	agent: report 128+signal as exit code for signal-terminated processes When a container process is terminated by a signal, the agent's SIGCHLD reaper stored the raw signal number as the process exit code. As a result a process killed by SIGKILL(9) reported exit code 9 instead of the conventional 137 (128+9). Apply the standard shell convention of 128+signal_number so that signal-terminated processes report the expected exit codes, e.g. SIGKILL(9) -> 137, SIGTERM(15) -> 143, SIGINT(2) -> 130. This mimics runc, which encodes wait-status exit codes the same way: https://github.com/opencontainers/runc/blob/v1.4.3/libcontainer/utils/utils.go#L19 Both runc and this new Kata behaviour follow the conventional exit code semantics documented at https://tldp.org/LDP/abs/html/exitcodes.html. The conversion is factored into a small helper and covered by a unit test. The runtime and shim already pass the exit code through unchanged, so no further changes are needed for the corrected value to surface. Fixes: signal-terminated containers reporting raw signal numbers Signed-off-by: PiotrProkop <pprokop@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 16:34:17 +02:00
dependabot[bot]	9c6cccb483	build(deps): bump github.com/containerd/containerd in /src/runtime Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.32 to 1.7.33. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.32...v1.7.33) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-version: 1.7.33 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-21 14:32:17 +00:00
Fabiano Fidêncio	374a867774	Merge pull request #13196 from microsoft/cameronbaird/upstream/runtime-go-clh-templating runtime: Enable VM Templating Support for CLH	2026-06-21 16:31:19 +02:00
Alex Lyn	0a63aebea9	runtime-rs: Implement remove_device for block device hot removal Replace the "Not yet implemented" stub in QemuInner::remove_device() with a working implementation that calls hotunplug_device() to perform the QMP-level device removal, then cleans up the internal devices list via retain() to remove stale coldplug entries. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-20 22:08:57 +08:00
Alex Lyn	d4212bcb74	runtime-rs: Add hotunplug_device dispatcher for device type routing Introduce hotunplug_device() as the device-type dispatcher that routes hot removal requests to the appropriate QMP method. Currently supports Block and BlockModern device types, which are forwarded to Qmp::hotunplug_block_device(). All other device types return an explicit "unsupported" error. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-20 22:08:57 +08:00
Alex Lyn	281b6aa61a	runtime-rs: Add hotunplug_block_device for block device hot removal Implement QMP-level block device hot-unplug by issuing device_del to remove the frontend device and blockdev_del to remove the backend blockdev node. For virtio-blk-ccw on s390x, the CCW subchannel slot is also released. Since QMP device_del is asynchronous and only initiates the removal request, introduce wait_for_device_deleted() to poll for the DEVICE_DELETED event before tearing down the backend. This prevents blockdev_del from failing with "Node is still in use". If blockdev_del fails, the error is logged but CCW cleanup still proceeds before the error is propagated, ensuring consistent subchannel state. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-20 22:08:57 +08:00
Alex Lyn	431720025c	runtime-rs: Enhance hotplug_block_device error handling and rollback Improve the reliability of block device hotplug by ensuring that blockdev-add nodes are properly cleaned up when subsequent device_add operations fail. To achieve this, a new method, device_add_with_rollback, is introduced to perform device_add and clean up the blockdev node when it fails. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-20 22:08:57 +08:00
dependabot[bot]	399c863cd2	build(deps): bump go.mongodb.org/mongo-driver in /src/runtime Bumps [go.mongodb.org/mongo-driver](https://github.com/mongodb/mongo-go-driver) from 1.14.0 to 1.17.7. - [Release notes](https://github.com/mongodb/mongo-go-driver/releases) - [Commits](https://github.com/mongodb/mongo-go-driver/compare/v1.14.0...v1.17.7) --- updated-dependencies: - dependency-name: go.mongodb.org/mongo-driver dependency-version: 1.17.7 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-20 10:22:56 +00:00
Cameron Baird	730307f32c	factory: Default to normal sandbox boot path when factory init not done The behavior we had before was that, for a starting k8s pod, it sees enable_template=true and therefore: 1. Tries NewFactory with fetchOnly=true 2. When that fails (because template.Fetch fails to find the artifacts), we retry with fetchOnly=false. This creates a direct factory which creates the template from scratch (hence we pay a full pod sandbox boot time here) and then restores from that. Hence the boot times are strictly worse on this path. Now, even when enable_template=true, we don't try to force a direct factory. Instead we just revert to the standard sandbox boot path. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2026-06-19 18:00:02 +00:00
Cameron Baird	65a5f272f8	ci: Introduce tests for VM template factory Add k8s-vm-templating-test.bats which exercises pod create with the factory initialized on the target node. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2026-06-19 18:00:02 +00:00
Cameron Baird	c0f9744225	runtime: Implement support for VM Template factory in clh Add support for VM Template factory on the clh path. In order to support snapshot/restore-based VM templating, the following changes were needed: 1. For clh.go, implement SaveVM, PauseVM, restoreVM, ResumeVM 2. Remove initrd config check for VM Templating path. The root disk image (when using image mode) is created in memory and therefore captured in the VM snapshot. 3. Truncate the memory file to the size of the VM at factory VM create time. This allows CLH to use the memory file as the backing for the template VM memory, allowing O(1) snapshot times. 4. CLH uses memory zones as backing for its memory on the template paths 5. Update StartVM in CLH to use the restore path when template is configured and available Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2026-06-19 18:00:02 +00:00
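
Commit c2d737c9d7 above changes the agent to report 128 + signal number for signal-terminated processes instead of the raw signal number. The following is a minimal, Unix-only sketch of that convention, not the agent's actual code: the helper names `signal_to_exit_code` and `exit_code_from_status` are hypothetical, and the sketch builds the wait status through Rust's standard `ExitStatusExt::from_raw` rather than the agent's own SIGCHLD reaper. The `-1` fallback for a status that is neither an exit nor a signal death is also an illustrative assumption.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Hypothetical helper: map a signal number to the conventional shell
/// exit code (128 + signal), e.g. SIGKILL(9) -> 137, SIGTERM(15) -> 143.
fn signal_to_exit_code(signal: i32) -> i32 {
    128 + signal
}

/// Hypothetical helper: derive a container exit code from a wait status.
/// A normal exit keeps its own code; a signal death becomes 128 + signal.
fn exit_code_from_status(status: ExitStatus) -> i32 {
    if let Some(code) = status.code() {
        code
    } else if let Some(sig) = status.signal() {
        signal_to_exit_code(sig)
    } else {
        // Neither exited nor signalled (e.g. a stopped status); this
        // fallback value is an assumption made for the sketch.
        -1
    }
}

fn main() {
    // A raw wait status whose low 7 bits hold a signal number means
    // "terminated by that signal"; (code << 8) means "exited with code".
    let killed = ExitStatus::from_raw(9); // SIGKILL
    let exited = ExitStatus::from_raw(3 << 8); // exit(3)
    println!("{}", exit_code_from_status(killed)); // prints 137
    println!("{}", exit_code_from_status(exited)); // prints 3
}
```

Normal exits pass through untouched, so only the signal path changes. This matches the commit's note that the runtime and shim forward the value unchanged.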


19465 Commits
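
Commits a34c74a2d4 and 346a3be9ad in the log above describe how runtime-rs sizes static sandboxes. With no limits, the configured defaults are kept. With limits, the VM gets the requested resources plus the overhead values, and fractional vCPU counts are rounded up. Limit-driven vCPU sizing is clamped to at least one vCPU, and setup fails early if the computed memory is 0 MiB. The sketch below restates those rules in Rust under stated assumptions. The struct, field, and function names are hypothetical, not the runtime-rs configuration keys or API. CPU and memory are treated independently, which is an interpretation of the commit text. All numeric values are illustrative.

```rust
/// Hypothetical sizing inputs; names are illustrative only.
struct SizingInput {
    cpu_limit: Option<f64>,        // summed workload CPU limit, in vCPUs
    memory_limit_mib: Option<u64>, // summed workload memory limit, in MiB
    default_vcpus: u32,
    default_memory_mib: u64,
    overhead_vcpus: f64,
    overhead_memory_mib: u64,
}

#[derive(Debug, PartialEq)]
struct VmSize {
    vcpus: u32,
    memory_mib: u64,
}

/// Hypothetical function applying the sizing rules from the commit messages.
fn size_static_sandbox(input: &SizingInput) -> Result<VmSize, String> {
    let vcpus = match input.cpu_limit {
        // Limit present: requested + overhead, fractional result rounded up,
        // clamped so a 0.0 result still yields one bootable vCPU.
        Some(limit) => ((limit + input.overhead_vcpus).ceil() as u32).max(1),
        // No CPU limit: keep the configured default.
        None => input.default_vcpus,
    };
    let memory_mib = match input.memory_limit_mib {
        Some(limit) => limit + input.overhead_memory_mib,
        None => input.default_memory_mib,
    };
    if memory_mib == 0 {
        // Fail early with an actionable message instead of booting a VM
        // that has no memory.
        return Err("computed sandbox memory is 0 MiB: set a memory limit or a \
                    non-zero default/overhead memory"
            .to_string());
    }
    Ok(VmSize { vcpus, memory_mib })
}

fn main() {
    let input = SizingInput {
        cpu_limit: Some(1.5),
        memory_limit_mib: Some(512),
        default_vcpus: 1,
        default_memory_mib: 2048,
        overhead_vcpus: 0.25,
        overhead_memory_mib: 256,
    };
    // 1.5 + 0.25 = 1.75 -> 2 vCPUs; 512 + 256 = 768 MiB
    println!("{:?}", size_static_sandbox(&input)); // Ok(VmSize { vcpus: 2, memory_mib: 768 })
}
```

Returning an error for 0 MiB, rather than silently substituting a default, mirrors the commit's stated goal of failing early with a clear message.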