kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 07:02:16 +00:00

Author	SHA1	Message	Date
Aurélien Bombo	0d5bde2181	runtime-rs: virtio-fs: plumb virtio_fs_queue_size to qemu/CH The shared filesystem device builder in `prepare_virtiofs` was hardcoding `queue_size = 0` and `queue_num = 0` on the `ShareFsConfig` it hands to the hypervisor, ignoring `SharedFsInfo.virtio_fs_queue_size` parsed from `configuration.toml` entirely. For qemu, this is silently broken: the cmdline generator's `DeviceVhostUserFs::set_queue_size` treats 0 as "not set" and skips the `queue-size=` argument when emitting the `vhost-user-fs-pci` device, so QEMU falls back to its built-in default of 128, regardless of what the user configured. For Cloud Hypervisor it happens to work in practice today, but only because `ch::handle_share_fs_device` and `TryFrom<ShareFsSettings> for FsConfig` substitute a hardcoded 1024 when the incoming `queue_num`/`queue_size` are zero. That fallback masks the real bug; the toml value still never reaches the VMM. Add a `get_shared_fs_info` accessor on `DeviceManager` mirroring the existing `get_block_device_info` helper, and use it in `prepare_virtiofs` to populate `ShareFsConfig.queue_size` from `SharedFsInfo.virtio_fs_queue_size`. Use a single virtqueue (`queue_num = 1`), matching what runtime-go hardcodes for both qemu (govmm `QemuFSParams` does not emit `num-queues=`) and CH (`numQueues := int32(1)` in `clh.go`). The CH-side fallback and the CH config template are addressed in a follow-up commit. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-05-19 06:14:24 +02:00
Alex Lyn	e5a7f5b120	Merge pull request #13009 from sebwolf-de/swolf/kata-fc-jailer-pid-leak Fix #13008: runtime/fc track real firecracker PID instead of jailer PID	2026-05-19 11:59:24 +08:00
Alex Lyn	357921df62	Merge pull request #12437 from Apokleos/fix-katactl-exec kata-ctl: Fix failures when kata-ctl exec with short id	2026-05-19 09:13:17 +08:00
Aurélien Bombo	83e20877d8	Merge pull request #12882 from stevenhorsman/runtime-rs/cdh_api_timeout runtime-rs: Add cdh_api_timeout configuration parameter	2026-05-18 15:38:27 -05:00
Sebastian Wolf	26746c9ce8	runtime/fc: track real firecracker PID instead of jailer PID When the jailer is in use (the default for kata-fc), cmd.Process.Pid in fcInit() is the jailer's PID, not firecracker's. The jailer forks + execs firecracker as a separate child and exits. fc.info.PID was therefore stored as the (soon-to-be-dead) jailer PID. At sandbox shutdown, fcEnd() calls WaitLocalProcess(fc.info.PID, SIGTERM, ...). syscall.Kill on the dead jailer PID returns ESRCH, WaitLocalProcess returns nil immediately, and the real firecracker microVM never receives a signal. It gets reparented to init and stays alive indefinitely, holding open resources from the host. Over many container lifecycles this becomes a serious resource leak. Read the real PID from <jailerRoot>/firecracker.pid, which firecracker itself writes after the exec. Update fc.info.PID with that value so all downstream code (fcEnd, Save/Load, kill-0 alive checks, NewProc) operates on the actual firecracker process. Also fix a small adjacent bug in Sandbox.Stop where the per-container teardown loop ignored the force flag, causing any container.stop error to short-circuit Stop before stopVM ran. Signed-off-by: Sebastian Wolf <swolf@nvidia.com>	2026-05-18 21:09:51 +02:00
Fabiano Fidêncio	9044ee22d2	Merge pull request #13024 from SAY-5/fix-typo-occured dragonball: fix typo in VsockEpollListener doc comment	2026-05-18 20:39:33 +02:00
Fabiano Fidêncio	6c2202a380	Merge pull request #13050 from burgerdev/mask-networkd-socket runtime-rs: mask systemd-networkd.socket	2026-05-18 20:34:26 +02:00
Hyounggyu Choi	c41cc4e27a	Merge pull request #13070 from BbolroC/refactor-block-dev-handling-runtime-rs runtime-rs: Extract block device storage source info logic	2026-05-18 20:24:13 +02:00
Fabiano Fidêncio	53e8fa8cbd	Merge pull request #12939 from stevenhorsman/agent-ctl/move-into-root-workspace agent-ctl: Move into root workspace	2026-05-18 18:12:51 +02:00
Hyounggyu Choi	b4d22be469	runtime-rs: Extract block device storage source info logic The two code blocks of extracting a block device storage source information for DeviceType::BlockModern/Block are essentially identical except the async lock operation. Extract the common logic into a helper function. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-05-18 16:05:38 +02:00
Steve Horsman	afcd995166	Merge pull request #13059 from fidencio/topic/runtime-rs-fix-trusted-ephemeral-storage-for-s390x runtime-rs: preserve ccw address for modern block devices	2026-05-18 09:49:43 +01:00
stevenhorsman	3466f888db	agent-ctl: Move into root workspace - Add agent-ctl to be a workspace member to simplify the dependency management. - Also add a test target as we've been running it in static-checks without it doing anything Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-18 09:47:15 +01:00
SAY-5	99a8b7d8b4	dragonball: fix typo in VsockEpollListener doc Fixes the spelling "one ore more events have occured" to "one or more events have occurred" in the doc comment for the VsockEpollListener::notify trait method. Signed-off-by: SAY-5 <say.apm35@gmail.com>	2026-05-18 16:32:21 +08:00
Alex Lyn	aef3ab8f32	libs: Fix shim-interface tests after removing create_dir_all Two tests relied on the side-effect of create_dir_all (removed in the previous commit) to pass: (1) test_get_uds_with_sid_ok: use a directory name that actually starts with the search prefix so prefix matching works without creating dirs. (2) test_get_uds_with_sid_with_zero: assert Err on zero matches instead of Ok, matching the corrected lookup behavior. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-18 15:46:01 +08:00
Alex Lyn	4764e31d00	kata-ctl: Fix failures when kata-ctl exec with short id When running kata-ctl exec <short-id>, kata-ctl may fail with: "more than one sandbox exists with the provided prefix "ed07", please provide a unique prefix". At the same time, a new subdirectory named <short-id> is incorrectly created under /run/kata/. This is wrong behavior: a short ID should be used only to match an existing sandbox by prefix, and must not trigger creation of a new sandbox directory when lookup fails or is ambiguous. Update the exec path to perform prefix matching and return an error on no match or non-unique matches, without creating any new directories. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-18 15:46:01 +08:00
Alex Lyn	67e3bc754d	runtime-rs: Move KATA_PATH creation from sb_storage_path() to MgmtServer sb_storage_path() is a path accessor shared by both server (shim) and client (kata-ctl). Having it call create_dir_all(KATA_PATH) on every invocation is incorrect: the client side should never create directories — if /run/kata/ does not exist, no shim is running. Move the directory creation to MgmtServer::new(), which is the server- side component that manages the shim management socket under KATA_PATH. Make sb_storage_path() a pure accessor returning &'static str directly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-18 15:45:56 +08:00
Markus Rudy	7df5907c71	runtime-rs: mask systemd-networkd.socket We are already masking systemd-networkd.service, which causes systemd to log an error about the socket still being enabled. In runtime-go, we're masking the socket, so mask it in runtime-rs, too. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-18 09:28:16 +02:00
Markus Rudy	5474f68168	Merge pull request #12970 from burgerdev/genpolicy-build-test-binaries genpolicy: include test binaries in make target build	2026-05-18 09:22:11 +02:00
Alex Lyn	34dc055da3	Merge pull request #12932 from RainaYL/rainax/tdshim_pr dragonball: Allow guest VM to load tdshim firmware for booting	2026-05-18 10:43:22 +08:00
Alex Lyn	3345a370d2	Merge pull request #13051 from burgerdev/dont-modify-initdata runtime-rs: don't modify initdata from annotation	2026-05-18 09:41:47 +08:00
Markus Rudy	38948f31a7	genpolicy: include test binaries in make target build genpolicy supports building and testing on Darwin, both for Kata developers as well as for users of the tool. In CI, we're currently only testing the binary build on darwin, the test is only executed on Linux. Since we aim to support development on darwin, including test execution, we need to prevent regressions such as [1]. This commit adds the test binaries to the `make build` target, such that they are covered by `ci/darwin-tests.sh`. In order to avoid unnecessary recompilation between the build and test target, we align the `--release` handling between the two. [1]: `639ff3578d` Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-16 20:47:12 +02:00
Markus Rudy	4d0f32ce41	runtime-rs: use proper temp dirs in initdata tests The test currently uses a static directory at `/tmp/initimg_test`. This introduces non-determinism into the unit test: * Files that already exist in that dir might alter test results. * If the directory is owned by root, the test will fail due to permissions. Switch to using the tempfile crate instead. Fixes: #13053 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-16 20:39:13 +02:00
Markus Rudy	4971445f67	runtime-rs: don't modify initdata from annotation The initdata is currently being decoded, and then re-encoded with the to_string function. This will usually not preserve the original initdata document, and thus the initdata hash will differ between the annotation and the block device. This commit changes the logic to only decode the base64, but keep the initdata document intact. Since the error message is now nested, adjust the tests to look for the expected error in the chain. Fixes: #12951 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-16 20:26:22 +02:00
Fabiano Fidêncio	1a4074ab2e	agent: handle encrypted ephemeral storage for CCW block devices VirtioBlkCcwHandler::create_device was calling common_storage_handler directly, bypassing the handle_block_storage function that checks for the encryption_key=ephemeral driver option. This meant that encrypted emptyDir volumes on s390x would attempt a plain mount of the raw block device instead of setting up dm-crypt via the CDH, resulting in an EINVAL mount error. Route CCW block devices through handle_block_storage, matching the pattern used by VirtioBlkPciHandler. Fixes: failed to mount /dev/vda to .../storage/..., EINVAL Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-16 12:07:12 +02:00
Fabiano Fidêncio	10b9ab38ab	runtime-rs: preserve ccw address for modern block devices Store the hotplugged CCW address in BlockModern configs and use it when building storage sources so s390x encrypted emptyDir paths no longer fall back to /dev/vda. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-16 11:16:20 +02:00
Fabiano Fidêncio	8e1d73a4b5	Merge pull request #13052 from burgerdev/abort-later agent: wait for logs before aborting	2026-05-15 23:58:26 +02:00
Dan Mihai	0f3df5d1e4	Merge pull request #13025 from manuelh-dev/mahuber/img-pull-policy tests: generate guest-pull image pull agent security policies	2026-05-15 14:09:00 -07:00
Fabiano Fidêncio	ae1f67a4f3	Merge pull request #13040 from fidencio/topic/runtime-rs-ephemeral-storage runtime-rs: ephemeral storage port	2026-05-15 18:24:27 +02:00
Markus Rudy	32f2c5c2e4	agent: wait for logs before aborting If the policy loading encounters an error, we `abort(3)` the agent for safety. Since abort causes the process to stop immediately, the async logs might not be flushed yet, and thus won't make it to the runtime, hiding the reason for the abort. Wait a bit before aborting so that the logs are fully written. Fixes: #13031 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-05-15 12:36:29 +02:00
Fabiano Fidêncio	33de5a6c22	runtime-rs: refactor handler_volumes to use VolumeContext Group the shared-context parameters (share_fs, device_manager, sid, agent, emptydir_mode) into a VolumeContext struct so handler_volumes stays within clippy's argument count limit and avoids -D warnings breakage in CI. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Fabiano Fidêncio	54aaa1ea2a	tests: enable trusted ephemeral storage for runtime-rs Remove the runtime-rs skip from the trusted ephemeral data storage test now that runtime-rs implements block-encrypted emptyDir volumes. Also remove the genpolicy drop-in that disabled encrypted_emptydir for runtime-rs and the corresponding copy logic in tests_common.sh. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Fabiano Fidêncio	aa7392b1b9	runtime-rs: add emptydir_mode to config templates Add the emptydir_mode configuration option to all runtime-rs config template files. CoCo configs (snp, tdx, se, coco-dev, nvidia-gpu-snp, nvidia-gpu-tdx) default to block-encrypted via @DEFEMPTYDIRMODE_COCO@, while non-CoCo configs (qemu, nvidia-gpu, fc) default to shared-fs via @DEFEMPTYDIRMODE@. Also add DEFEMPTYDIRMODE and DEFEMPTYDIRMODE_COCO variables to the runtime-rs Makefile for template substitution. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Fabiano Fidêncio	5e2ca6d6ee	runtime-rs: skip local type conversion for block-encrypted emptyDirs When emptydir_mode is "block-encrypted", host emptyDir paths must remain as "bind" mounts so the EncryptedEmptyDirVolume handler can intercept them in the volume dispatch chain. Previously, update_ephemeral_storage_type() would unconditionally convert them to "local" type, causing them to be handled as plain local volumes instead. Add the emptydir_mode parameter to update_ephemeral_storage_type() and its call chain (amend_spec in container.rs) and skip the host-emptyDir-to-local conversion when the mode is block-encrypted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Fabiano Fidêncio	d3a9669be5	runtime-rs: implement EncryptedEmptyDirVolume Add the core volume handler for block-encrypted emptyDir support in runtime-rs, bringing it to parity with the Go runtime (PR #10559). When emptydir_mode is set to "block-encrypted", host emptyDir bind mounts are intercepted and handled as follows: 1. A sparse disk image (disk.img) is created inside the emptyDir folder, sized to match the host filesystem capacity. 2. A mountInfo.json is written under the kata direct-volume root with volume_type "blk", fs_type "ext4", and metadata encryptionKey=ephemeral. 3. The disk image is plugged into the guest VM as a virtio-blk device via the hypervisor device manager. 4. An agent::Storage is built with driver_options containing encryption_key=ephemeral and shared=true, so the kata-agent delegates formatting and encryption to CDH using LUKS2. The volume is registered in the dispatch chain before the regular block-volume check, and ephemeral disk metadata is tracked for sandbox-level cleanup at teardown. Also re-exports EMPTYDIR_MODE_* constants from kata-types::config so downstream crates can reference them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Fabiano Fidêncio	0b1e103886	runtime-rs: agent: add shared field to Storage struct The proto Storage message already has a "shared" field (field 8), but the runtime-rs agent crate's internal Storage struct was missing it, so it was never forwarded to the kata-agent. Add the field to the Rust struct and its From<Storage> translation, and update all explicit struct initialisers across the resource crate to include shared: false so the build stays clean. This is needed for trusted ephemeral data storage, where the agent uses the shared flag to avoid premature cleanup of volumes that are shared across containers in a pod. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 15:42:20 +02:00
Fabiano Fidêncio	00d4ee2344	kata-types: add direct-volume write/remove helpers Add add_volume_mount_info(), is_volume_mounted(), and remove_volume_path() to the mount module. These mirror the Go helpers (AddMountInfo, IsVolumeMounted, Remove) in src/runtime/pkg/direct-volume/utils.go and are needed by the upcoming EncryptedEmptyDirVolume to write and clean up mountInfo.json metadata for block-encrypted emptyDir volumes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 15:42:20 +02:00
Steve Horsman	557fb5187b	Merge pull request #12853 from kata-containers/dependabot/go_modules/src/runtime/github.com/sirupsen/logrus-1.9.4 build(deps): bump github.com/sirupsen/logrus from 1.9.3 to 1.9.4 in /src/runtime	2026-05-14 13:56:10 +01:00
Fabiano Fidêncio	b4a9d3256b	kata-types: add emptydir_mode configuration option Add the emptydir_mode field to the Runtime configuration struct, allowing runtime-rs to read the emptyDir handling mode from the TOML config file. This is groundwork for trusted ephemeral data storage support in runtime-rs (parity with the Go runtime). Two modes are supported: - shared-fs (default): share emptyDir via virtio-fs/9p. - block-encrypted: plug a block device encrypted in-guest via CDH/LUKS2. Empty values default to "shared-fs"; unknown values are rejected during validation. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 11:29:40 +02:00
Fabiano Fidêncio	c8f6f17269	Merge pull request #13027 from PiotrProkop/fix-loop-blockfile-sandbox-cgroup runtime: allow loopback devices when sandbox_cgroup_only is enabled	2026-05-14 11:18:45 +02:00
Fabiano Fidêncio	44b356c654	Merge pull request #13033 from microsoft/saul/static_maxvcpus runtime-rs: static resources: always set maxvcpus equal to vcpus	2026-05-14 11:16:35 +02:00
Xiaofan Xxf	88d892a77f	dragonball: Allow guest VM to load tdshim firmware for booting Add a firmware module to the dbs_boot crate so that a guest VM can load tdshim into memory, which is a prerequisite for booting a TDX VM. The other sections (including the kernel payload and cmdline) are also loaded at the correct guest physical addresses according to the tdshim layout. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-05-14 10:04:39 +08:00
Saul Paredes	d930fc42b8	runtime-rs: static resources: always set maxvcpus equal to vcpus based on current runtime-go behaviour introduced in https://github.com/kata-containers/kata-containers/pull/9195 When using static resources, always set maxvcpus value equal to the vcpus value. This is because the static resources case does not support dynamic CPU hotplugging, and therefore the maximum number of vCPUs should be limited to the number of vCPUs. Booting with a high number of max vCPUs is a bit slower compared to a lower number. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-05-13 13:21:56 -07:00
dependabot[bot]	408e15641c	build(deps): bump github.com/sirupsen/logrus in /src/runtime Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.3 to 1.9.4. - [Release notes](https://github.com/sirupsen/logrus/releases) - [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md) - [Commits](https://github.com/sirupsen/logrus/compare/v1.9.3...v1.9.4) --- updated-dependencies: - dependency-name: github.com/sirupsen/logrus dependency-version: 1.9.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2026-05-13 06:11:21 +00:00
Greg Kurz	d2dc0a923c	Merge pull request #13030 from stevenhorsman/go-1.25.10-bump Go 1.25.10 bump	2026-05-13 08:09:51 +02:00
Aurélien Bombo	dcafae9645	Merge pull request #13032 from kata-containers/sprt/fix-virtiofsd-args runtime-rs: align virtiofsd args on runtime-go	2026-05-12 19:55:54 -05:00
Dan Mihai	3799473041	Merge pull request #13010 from microsoft/danmihai1/label-references genpolicy: support env variable values sourced from metadata.labels values	2026-05-12 15:41:11 -07:00
Manuel Huber	93e93f36ea	genpolicy: model ignored Pod node affinity fields Add Kubernetes nodeAffinity structures so genpolicy can parse Pod YAMLs that carry scheduling constraints ignored by policy. Cover the shape in the ignored-fields fixture alongside the existing Pod affinity and anti-affinity data. Assisted-by: OpenAI Codex <codex@openai.com> Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-05-12 15:03:14 -07:00
Aurélien Bombo	555b7738fe	runtime-rs: align virtiofsd args on runtime-go Runtime-go doesn't hardcode --sandbox none --seccomp none [1], so mirror that in runtime-rs. [1]: `733ccb3254/src/runtime/virtcontainers/virtiofsd.go (L183)` Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-05-12 12:51:32 -05:00
Greg Kurz	733ccb3254	Merge pull request #12996 from stevenhorsman/swap-agent-ctl-to-skopeo&umoci agent-ctl: Swap rootfs bundle pull implementation	2026-05-12 19:12:27 +02:00
PiotrProkop	5065058d4a	runtime: fix device allowlist detection comparing pointers Because intptr() returns a fresh pointer on every call, those comparisons compared addresses, never values, so every check evaluated to false. As a result /dev/null, /dev/urandom, /dev/ptmx, /dev/loop-control and /dev/loop* were appended to devices allowlist for sandbox_cgroup even when the runtime spec already listed them, producing duplicate entries. Switch to nil-safe value comparisons via a type switch on the cgroup device type and dereferenced d.Major / d.Minor, keeping the same detection semantics but actually matching existing entries. Assisted-By: Claude 4.7 Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-05-12 18:52:53 +02:00
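The pointer-comparison bug described in commit 5065058d4a can be reproduced in a few lines of Go. The sketch below is illustrative only and is not the runtime's code: the `device` struct, `matchesByPointer`, and `matchesByValue` are hypothetical names, and `intptr` is assumed to behave as the commit message describes (it returns a fresh pointer on every call). The commit's type switch on the cgroup device type is left out for brevity.

```go
package main

import "fmt"

// device is a hypothetical stand-in for a cgroup device rule whose
// major/minor numbers are optional and therefore stored as pointers.
type device struct {
	Major *int64
	Minor *int64
}

// intptr returns a pointer to a fresh copy of v on every call.
func intptr(v int64) *int64 { return &v }

// matchesByPointer reproduces the buggy pattern: it compares addresses,
// and a freshly allocated pointer never equals an existing one.
func matchesByPointer(d device, major, minor int64) bool {
	return d.Major == intptr(major) && d.Minor == intptr(minor)
}

// matchesByValue is the nil-safe alternative: it dereferences the
// pointers and compares the numbers they point to.
func matchesByValue(d device, major, minor int64) bool {
	return d.Major != nil && d.Minor != nil &&
		*d.Major == major && *d.Minor == minor
}

func main() {
	// /dev/null is character device 1:3 on Linux.
	devNull := device{Major: intptr(1), Minor: intptr(3)}

	fmt.Println(matchesByPointer(devNull, 1, 3)) // false: addresses differ
	fmt.Println(matchesByValue(devNull, 1, 3))   // true: values match
	fmt.Println(matchesByValue(device{}, 1, 3))  // false: nil fields never match
}
```

Because the address comparison is always false, a check written this way never detects an entry that is already present. That is consistent with the duplicate allowlist entries the commit message reports.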


6423 Commits