kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-04 02:53:45 +00:00

Author	SHA1	Message	Date
RuoqingHe	cfc1836a31	Merge pull request #12672 from stevenhorsman/agent-security-fixes agent: Bump tracing-subscriber	2026-03-20 17:37:16 +08:00
Steve Horsman	7ab6e11e10	Merge pull request #12678 from kata-containers/dependabot/go_modules/src/runtime/google.golang.org/grpc-1.79.3 build(deps): bump google.golang.org/grpc from 1.72.0 to 1.79.3 in /src/runtime	2026-03-20 08:49:35 +00:00
Steve Horsman	e475fb2116	Merge pull request #12680 from kata-containers/dependabot/go_modules/src/tools/csi-kata-directvolume/google.golang.org/grpc-1.79.3 build(deps): bump google.golang.org/grpc from 1.63.2 to 1.79.3 in /src/tools/csi-kata-directvolume	2026-03-20 08:49:27 +00:00
stevenhorsman	38a655487f	vsock-exporter: Switch bincode for serde_json bincode is not maintained, so switch to serde_json to resolve RUSTSEC-2025-0141 Assisted-By: Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:45:17 +00:00
stevenhorsman	e1d7d5bef8	agent: Remove async-std It's a dev-dependency that doesn't seem to be used, so remove it and resolve RUSTSEC-2025-0052 Assisted-By: Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:45:17 +00:00
stevenhorsman	e4eda5e1d8	agent: Bump tracing-subscriber - Bump tracing-subscriber to 0.3.20 to resolve RUSTSEC-2025-0055 - Switch deprecated `slog_info!` for `slog::info!` Generated-By: Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:45:17 +00:00
stevenhorsman	d06dadd8ef	docs: Spelling updates Either fixing typos, or including program/repo name in backticks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:22:54 +00:00
dependabot[bot]	2f5415d8f5	build(deps): bump google.golang.org/grpc Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.63.2 to 1.79.3. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.63.2...v1.79.3) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.79.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-03-19 10:03:45 +00:00
dependabot[bot]	3876a80208	build(deps): bump google.golang.org/grpc in /src/runtime Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.0 to 1.79.3. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.72.0...v1.79.3) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.79.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-03-19 10:03:30 +00:00
stevenhorsman	2a4227e02e	kata-ctl: Try fixing unused_assignments error `allow(unused_assignments)` isn't working as it's in macro-generated code, so reference the command in the error message so that it is used. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-17 16:04:58 +00:00
stevenhorsman	ca7cdcd732	kata-ctl: Rewrite path_join test This test was failing clippy by calling .unwrap() after an .is_ok(), but after I looked at it, it seemed a bit messy, so I split it up and tried rewriting it to make it more readable IMHO. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-17 16:04:58 +00:00
stevenhorsman	501578cc5a	agent: Remove non-idiomatic unwrap Calling .unwrap() after an .is_some() check is considered non-idiomatic as it performs redundant work and makes the code more verbose. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-17 16:04:58 +00:00
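The two commits above remove the same pattern. Here is a minimal illustration of it; it is not the actual agent or kata-ctl code. `describe` and `describe_parse` are hypothetical helpers.

```rust
// clippy::unnecessary_unwrap flags code shaped like:
//     if opt.is_some() { let v = opt.unwrap(); /* ... */ }
// `if let` performs the check and the extraction in one step.
pub fn describe(opt: Option<&str>) -> String {
    if let Some(v) = opt {
        format!("got {v}")
    } else {
        String::from("nothing")
    }
}

// The same applies to `Result`: match on it rather than calling
// `is_ok()` and then `unwrap()`.
pub fn describe_parse(s: &str) -> String {
    match s.parse::<i64>() {
        Ok(n) => format!("parsed {n}"),
        Err(e) => format!("error: {e}"),
    }
}
```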
Alex Lyn	833b72470c	Merge pull request #12647 from sprt/gp-improve genpolicy: Improve emptyDir storage options and mount point validation	2026-03-17 13:56:42 +08:00
Manuel Huber	660e3bb653	gpu: Obsolete the NVIDIA initrd build As the NVIDIA stack has shifted to using an image for both the confidential and non-confidential variants, we retire the initrd build. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-03-16 21:29:58 -04:00
Manuel Huber	169f92ff09	agent: cdh: Update CDH and API With the new CDH version, the secure_mount API changes. Further, the new CDH version no longer uses the luks-encrypt-storage script but utilizes libcryptsetup as well as mkfs.ext4 and dd. Hence, adapt some of the CDH and Kata component build steps. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-03-16 09:43:17 -07:00
Zvonko Kaiser	6a853a9684	gpu: Bump NVRC We have a new release; add it to the next Kata release. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-03-15 09:53:32 -07:00
Zvonko Kaiser	8ff5d164c6	runtime: make CDI annotation vendor-agnostic with lookup table Replace hardcoded NVIDIA vendor ID (0x10de) and class (0x030) checks with a vendor-agnostic lookup table (cdiDeviceKind) that maps PCI vendor/class pairs to CDI device kinds. This makes it straightforward to add support for new device types by adding entries to the table. Refactor siblingAnnotation to resolve device BDFs once upfront and reuse them for both CDI type detection and sibling matching, eliminating redundant sysfs reads. Devices not in the lookup table (e.g. NVSwitches) are skipped with errNoSiblingFound, while known device types that fail to match a sibling produce a hard error. Consolidate the hot-plug and cold-plug device loops into a single loop over extracted container paths, removing duplicated filtering logic. Export GetPCIDeviceProperty from the device drivers package to allow vendor/class lookup from sysfs in the container annotation path. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-03-15 09:53:32 -07:00
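Below is a minimal Rust sketch of the kind of vendor/class lookup table this commit describes. The actual change is in the Go runtime. Only the NVIDIA vendor ID `0x10de` and the class prefix `0x030` come from the commit. The `CdiDeviceKind` struct, the `cdi_device_kind` function and the `nvidia.com/gpu` kind string are assumptions for illustration. The inputs are taken to be the raw contents of the sysfs `vendor` and `class` files.

```rust
/// One row of a hypothetical vendor/class -> CDI kind table.
struct CdiDeviceKind {
    vendor: &'static str,       // sysfs `vendor` file contents, e.g. "0x10de"
    class_prefix: &'static str, // prefix of the sysfs `class` file, e.g. "0x030"
    kind: &'static str,         // CDI device kind (this value is an assumption)
}

// Supporting a new device type means adding a row here.
const CDI_DEVICE_KINDS: &[CdiDeviceKind] = &[CdiDeviceKind {
    vendor: "0x10de",
    class_prefix: "0x030",
    kind: "nvidia.com/gpu",
}];

/// Returns the CDI kind for a device, or `None` when the vendor/class pair is
/// not in the table (the commit says such devices, e.g. NVSwitches, are skipped).
pub fn cdi_device_kind(vendor: &str, class: &str) -> Option<&'static str> {
    let vendor = vendor.trim(); // sysfs values end with a newline
    let class = class.trim();
    CDI_DEVICE_KINDS
        .iter()
        .find(|e| e.vendor.eq_ignore_ascii_case(vendor) && class.starts_with(e.class_prefix))
        .map(|e| e.kind)
}
```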
Zvonko Kaiser	d4c21f50b5	gpu: Bump default memory to 8G for GPU runtimes We need enough initial memory to prepare more complex platforms like HGX H100 or HGX B200 systems. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-03-15 09:53:32 -07:00
Zvonko Kaiser	5c9683f006	gpu: Remove devtmpfs.mount=0 With the newest NVRC release this is solved and no longer needs to be overridden. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-03-15 09:53:32 -07:00
Zvonko Kaiser	d22c314e91	gpu: Increase dial_timeout=1200 For cold-plug, when running with nerdctl, the timeouts in the config are used, so increase the dial_timeout (e.g. for CreateSandbox) to match create_container_timeout. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-03-15 09:53:32 -07:00
stevenhorsman	f25fa6ab25	runtime: bump go.mod version Update the runtime's go.mod go version to 1.25.8 to keep in sync with versions.yaml Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-12 08:53:40 +00:00
dependabot[bot]	d366d103cc	build(deps): bump quinn-proto in /src/tools/agent-ctl Bumps [quinn-proto](https://github.com/quinn-rs/quinn) from 0.11.8 to 0.11.14. - [Release notes](https://github.com/quinn-rs/quinn/releases) - [Commits](https://github.com/quinn-rs/quinn/compare/quinn-proto-0.11.8...quinn-proto-0.11.14) --- updated-dependencies: - dependency-name: quinn-proto dependency-version: 0.11.14 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2026-03-11 16:04:34 +00:00
Dan Mihai	04f180434e	Merge pull request #12640 from burgerdev/genpolicy-workspace genpolicy: add to Cargo workspace	2026-03-11 09:02:39 -07:00
Steve Horsman	ba0f5b98fe	Merge pull request #12643 from stevenhorsman/bump-golang-to-1.25.8 versions: bump golang to 1.25.8	2026-03-11 08:53:21 +00:00
Markus Rudy	6643b258bb	genpolicy: update oci-client to v0.16.1 The older version we used transitively depends on an unmaintained crate. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-03-11 09:30:48 +01:00
Markus Rudy	8dfeeea924	genpolicy: add to Cargo workspace This commit adds the genpolicy utility to the root workspace. For now, only dependencies that are already in the root workspace are consumed from there, the genpolicy-specific ones should be added later. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-03-11 09:30:46 +01:00
Markus Rudy	fc4eaf8b66	runtime-rs: specify the subpackage to build Before this change, `make test` for runtime-rs used to test all crates in the root workspace (due to the `--all` flag). This was not intended but happened to mostly work. However, genpolicy needs additional steps before it can build, so this behavior blocks adding genpolicy to the root workspace. The solution here is to only build the intended packages. For the build and run commands, this is the runtime-rs crate itself. For testing, we need to include the sub-crates, too, which needs a bit of cargo metadata scraping. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-03-11 09:28:24 +01:00
Aurélien Bombo	2a15cfc5ec	genpolicy: Improve emptyDir storage options and mount point validation These are two changes following a Copilot review on #10559: 1. Restore the p_storage.driver != "blk" check in allow_storage_options(): - An early version of #10599 hardcoded p_storage.driver to "blk". - Hence that check needed to be removed to validate "blk" storage options. - The final version of #10599 hardcodes p_storage.driver to "" to account for both "blk" and "scsi", and checks storage options in allow_block_storage(). - Hence that check should be restored to preserve the original behavior. https://github.com/kata-containers/kata-containers/pull/10559#discussion_r2907646552 2. Don't use a regex to validate emptyDir storage mount points: - It's risky to use a regex to validate a path that has base64-encoded components. - We can infer the exact path anyway so the regex is redundant. https://github.com/kata-containers/kata-containers/pull/10559#discussion_r2907646582 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-10 11:22:10 -05:00
Dan Mihai	f9a8eb6ecc	genpolicy: allow_mount improvements for emptyDir 1. Reduce the complexity of the new allow_mount rules for emptyDir. 2. Reverse the order of the two allow_mount versions, as a hint to the rego engine that the first version more often matches the input. 3. Remove `p_mount.source != ""` from mount_source_allows, because: - Policy rules typically test the values from input, not values read from Policy. - mount_source_allows is no longer called for emptyDir mounts after these changes, so p_mount.source is not empty. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-03-09 14:52:17 -05:00
Aurélien Bombo	a98e328359	tests: Add test for trusted ephemeral data storage This tests the feature on CoCo machines. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-09 14:52:17 -05:00
Aurélien Bombo	9fe03fb170	genpolicy: Support trusted ephemeral data storage * Introduces a new cluster_config setting encrypted_emptydir defaulting to true. * Adapts genpolicy for encrypted emptyDirs. Crucially, the rules.rego change checks that the mount and the storage are well-formed together: * i_storage.source matches a known regex. * i_storage.mount_point == $(spath)/BASE64(i_storage.source) * i_storage.mount_point == p_storage.mount_point * i_storage.mount_point == i_mount.source Note that policy enforcement is necessary to prevent rogue device injection. E.g. the agent could not blindly encrypt all block devices as some use cases only need dm-verity. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-09 14:52:17 -05:00
Aurélien Bombo	eaa711617e	agent: Support trusted ephemeral data storage Handles block-based emptyDirs plugged via virtio-blk and virtio-scsi by encrypting and formatting them. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-09 14:52:17 -05:00
Aurélien Bombo	a4fd32a29a	runtime: Support trusted ephemeral data storage * Introduces the `emptydir_mode` config flag to allow instructing the runtime to create a block device for emptyDir volumes. * The block device is created in the original emptyDir folder on the host so that Kubelet can monitor its disk usage and evict the pod if it exceeds its sizeLimit. This matches runc and virtio-fs. * The block device's disk image file is sparse to minimize host disk footprint. Fixes: #10560 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-09 14:52:17 -05:00
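Creating a sparse backing file can be sketched with the standard library alone. This sketch assumes a Unix host, where `File::set_len` maps to `ftruncate(2)`. `create_sparse_image` is a hypothetical name. The runtime itself is written in Go, so this only illustrates the idea.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Creates (or truncates) `path` and sets its logical size to `size_bytes`
/// without writing any data. On filesystems with sparse-file support
/// (e.g. ext4, xfs) no data blocks are allocated until something writes to them.
pub fn create_sparse_image(path: &Path, size_bytes: u64) -> io::Result<()> {
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    // Extends the file length only; the new range is a hole, not written zeros.
    file.set_len(size_bytes)
}
```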
Alex Lyn	fb743a304c	runtime: Support plugging a disk as an image file Some VMMs support plugging a disk as an image file instead of a block device, so we adapt the runtime to support that. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Aurélien Bombo <abombo@microsoft.com> Co-authored-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-09 14:52:17 -05:00
stevenhorsman	8ae0e36737	versions: bump golang to 1.25.8 Bump the builder image and versions to resolve CVEs: - GO-2026-4601 - GO-2026-4602 - GO-2026-4603 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-09 09:10:01 +00:00
Alex Lyn	62b0f63e37	dragonball: Generate unique TAP names to avoid conflicts The vhost-kern net unit test used a fixed TAP interface name ("test_vhosttap"). When tests run in parallel or a previous run leaves the interface behind, TAP creation can fail with EBUSY ("Resource busy"), making CI flaky. Introduce a unique_tap_name() helper in the tests and use it to generate a per-test TAP name (based on pid/thread/counter), avoiding name collisions and stabilizing CI. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 17:33:40 +08:00
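Here is a hypothetical version of such a helper. It assumes the name must fit Linux's 15-character interface-name limit (IFNAMSIZ is 16 including the terminating NUL). Rather than also hashing the thread id, this sketch combines the process id with a process-wide atomic counter; within one process the counter alone already separates parallel tests. Keep the prefix to 7 ASCII characters or fewer so truncation never cuts into the unique suffix.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Linux interface names must fit in IFNAMSIZ (16) bytes including the NUL.
const MAX_IFNAME_LEN: usize = 15;

static TAP_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Hypothetical helper: builds a per-call TAP name from the process id and a
/// process-wide counter, so parallel tests and leftovers from earlier runs
/// are unlikely to collide on a fixed name like "test_vhosttap".
pub fn unique_tap_name(prefix: &str) -> String {
    let pid = std::process::id() & 0xffff;
    let n = TAP_COUNTER.fetch_add(1, Ordering::Relaxed) & 0xffff;
    let mut name = format!("{prefix}{pid:04x}{n:04x}");
    // Assumes an ASCII prefix, so truncating at a byte index is safe.
    name.truncate(MAX_IFNAME_LEN);
    name
}
```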
Alex Lyn	1c8c0089da	dragonball: fix flaky signal_handler test using libc::raise The signal_handler test was intermittently failing because it used kill(pid, sig), which sends signals asynchronously to the process. This created a race condition where the child thread could exit and be joined before the signal was delivered or processed. This fix includes: 1. Replaces `kill` with `libc::raise` to ensure signals are delivered synchronously to the calling thread. 2. Reorders triggers to verify standard signals before installing seccomp filters. 3. Guarantees that metrics are incremented before the child thread terminates and is joined by the main thread. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
Alex Lyn	d0718f6001	dragonball: Fix unnecessary parentheses around type warning: unnecessary parentheses around type --> src/dragonball/dbs_legacy_devices/src/serial.rs:245:39 | 245 | let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send + 'static)>>>> = | ^ ^ | = note: `#[warn(unused_parens)]` (part of `#[warn(unused)]`) on by default help: remove these parentheses Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
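With the parentheses dropped, the type from the warning reads as below. `SharedSink` and `write_line` are hypothetical names used only so the snippet compiles on its own; they are not taken from `serial.rs`.

```rust
use std::io::{self, Write};
use std::sync::{Arc, Mutex};

// The type from the warning, written without the redundant parentheses.
// `Box<(dyn Write + Send + 'static)>` is equivalent but trips `unused_parens`.
pub type SharedSink = Arc<Mutex<Option<Box<dyn Write + Send + 'static>>>>;

/// Writes one line to the sink if one is installed; does nothing otherwise.
pub fn write_line(sink: &SharedSink, msg: &str) -> io::Result<()> {
    let mut guard = sink.lock().unwrap();
    if let Some(w) = guard.as_mut() {
        writeln!(w, "{msg}")?;
    }
    Ok(())
}
```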
Alex Lyn	b4161198ee	dragonball: Remove unused imports in dbs_pci Fix the following unused-import warnings: ``` warning: unused imports: `DEVICE_ACKNOWLEDGE`, `DEVICE_DRIVER_OK`, `DEVICE_DRIVER`, `DEVICE_FEATURES_OK`, and `DEVICE_INIT` --> src/dragonball/dbs_pci/src/virtio_pci.rs:1177:9 | 1177 | DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK, DEVICE_FEATURES_OK, DEVICE_INIT, | ^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` (part of `#[warn(unused)]`) on by default ``` Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
Alex Lyn	ca4e14086f	runtime-rs: Fix warnings of unformatted code Fix warnings from unformatted code. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
Alex Lyn	ce800b7c37	dragonball: Fix flaky test_vhost_user_net_virtio_device_activate hang The vhost-user-net tests could hang in CI because VhostUserNet::new_server() blocks indefinitely on listener.accept() when the slave fails to connect in time (e.g. due to scheduler delays or flaky socket paths). This also caused panics when connect_slave() returned None and the test unwrapped it. Fix the tests by: - using a unique, absolute unix socket path under `/tmp` per test run - retrying the slave connect with a deadline - running new_server() in a separate thread and waiting via recv_timeout() to ensure the test never blocks indefinitely Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
Alex Lyn	a988b10440	dragonball: Fix flaky test_vhost_user_net_virtio_device_normal hang It aims to fix flaky test hang by implementing thread timeouts. The `test_vhost_user_net_virtio_device_normal` was hanging in CI when master/slave threads drifted. This commit stabilizes the test by: - Using `tempfile` and unique paths to ensure socket isolation. - Adding a 5s deadline for slave connections to handle CI jitter. - Running `new_server` in a separate thread with a `recv_timeout` to prevent the CI pipeline from deadlocking. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
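Both vhost-user-net fixes above rely on two bounded-wait patterns. One retries a connect until a deadline. The other runs a blocking call such as `new_server()` on a helper thread while the test waits with `recv_timeout()`. Below is a generic sketch of each, with hypothetical helper names. On timeout the helper thread is simply detached, which is acceptable in a test process but would leak a thread in long-running code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Retries `attempt` every `interval` until it succeeds or `timeout` elapses,
/// returning the last error on expiry (e.g. a 5s deadline for a slave connect).
pub fn retry_until<T, E>(
    timeout: Duration,
    interval: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match attempt() {
            Ok(v) => return Ok(v),
            Err(e) if Instant::now() >= deadline => return Err(e),
            Err(_) => thread::sleep(interval),
        }
    }
}

/// Runs a potentially blocking call (e.g. a server's `accept()`) on a helper
/// thread and waits at most `timeout` for its result; `None` means timeout.
pub fn run_with_timeout<T, F>(f: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(timeout).ok()
}
```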
Alex Lyn	f36218d566	dragonball: Fix flaky test_inner_stream_timeout in inner backend The `test_inner_stream_timeout` test case was prone to failure due to a race condition between the main thread and the background handler. The test relied on hardcoded `thread::sleep` durations, which could cause the second read operation to time out (150ms window) before the main thread performed its write (after a 300ms sleep) under high system load. This commit stabilizes the test by: 1. Replacing fixed sleep durations with a `Condvar` and a `stage` variable to implement a deterministic state machine. 2. Synchronizing the threads so that the main thread only writes data after the background handler has confirmed it is ready or has completed its previous phase. 3. Ensuring the read timeout is explicitly managed between different validation stages to prevent accidental `TimedOut` errors. This change eliminates the flakiness and ensures the test passes consistently across different CI environments. Fixes #12618 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
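The stage-based hand-off can be sketched with `std::sync::Condvar`. The `Stage` type below is hypothetical and only shows the mechanism. Each thread waits for a stage number instead of sleeping for a fixed time. `Condvar::wait_while` re-checks the condition after every wake-up, so spurious wake-ups are harmless.

```rust
use std::sync::{Condvar, Mutex};

/// A shared stage counter that threads advance and wait on, replacing
/// fixed `thread::sleep` calls with explicit hand-offs.
#[derive(Default)]
pub struct Stage {
    value: Mutex<u32>,
    cv: Condvar,
}

impl Stage {
    /// Moves to stage `next` and wakes every waiter.
    pub fn advance(&self, next: u32) {
        *self.value.lock().unwrap() = next;
        self.cv.notify_all();
    }

    /// Blocks until the stage is at least `target`.
    pub fn wait_for(&self, target: u32) {
        let guard = self.value.lock().unwrap();
        let _guard = self.cv.wait_while(guard, |v| *v < target).unwrap();
    }
}
```

For example, the background handler can call `wait_for(1)` before reading and `advance(2)` afterwards. Meanwhile, the main thread calls `advance(1)` after writing and then `wait_for(2)`.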
Alex Lyn	c8a39ad28d	dragonball: Fix flaky test_epoll_manager by improving synchronization This commit addresses an infinite loop in the epoll_manager tests and improves stability. The root causes were: 1. Using `handle_events(-1)` caused the worker thread to block forever if an event was missed or if the internal `kick()` signal was not accounted for correctly. 2. Relying on event counts was unreliable because internal signals could change the total count, causing it to enter an infinite loop. 3. An EventFd registered for `EventSet::OUT` is often continuously ready, leading to non-deterministic trigger behavior. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-06 09:28:56 +08:00
Fabiano Fidêncio	2fff33cfa4	Merge pull request #12628 from stevenhorsman/agent-ctl-bump-aws-lc-rs agent-ctl: Update aws-lc-rs	2026-03-05 20:52:03 +01:00
Fabiano Fidêncio	079fac1309	Merge pull request #12591 from fidencio/topic/kernel-add-mmio-back-to-the-unified-kernels kernel: include mmio fragment in unified build for firecracker	2026-03-05 13:45:41 +01:00
stevenhorsman	c57f2be18e	agent-ctl: Update aws-lc-rs aws-lc has multiple high severity CVEs: - GHSA-vw5v-4f2q-w9xf - GHSA-65p9-r9h6-22vj - GHSA-hfpc-8r3f-gw53 so try to bump to the latest `aws-lc-rs` crate to pull in the available fixed versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-05 10:02:22 +00:00
Fabiano Fidêncio	8f35c31b30	Merge pull request #12542 from fidencio/topic/genpolicy-distribute-different-settings-rather-than-patching-for-ci genpolicy: settings.d drop-ins and scenario example drop-ins	2026-03-05 07:37:30 +01:00
Fabiano Fidêncio	83dd7dcc75	runtimes: reject virtio-blk-mmio when confidential_guest is true Virtio-mmio transport is not hardened for confidential computing (unlike virtio-pci). Reject config that would use virtio-blk-mmio for rootfs/block when confidential_guest is set, so CoCo guests only use virtio-blk-pci. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 21:41:27 +01:00
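The check amounts to a simple configuration guard. The struct and field names below are hypothetical stand-ins for the real Go and Rust runtime configuration types. Only the rule itself comes from the commit: no virtio-blk-mmio when `confidential_guest` is true.

```rust
/// Hypothetical, simplified view of the relevant runtime settings.
pub struct HypervisorConfig<'a> {
    pub block_device_driver: &'a str,
    pub confidential_guest: bool,
}

/// Rejects configurations that would attach block devices over virtio-mmio
/// to a confidential guest; only virtio-pci is treated as hardened.
pub fn validate_block_driver(cfg: &HypervisorConfig<'_>) -> Result<(), String> {
    if cfg.confidential_guest && cfg.block_device_driver == "virtio-blk-mmio" {
        return Err(format!(
            "block driver {:?} is not allowed with confidential_guest = true; use virtio-blk-pci",
            cfg.block_device_driver
        ));
    }
    Ok(())
}
```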
Fabiano Fidêncio	d40afe592c	genpolicy: add settings drop-in directory and RFC 6902 JSON Patch support Allow genpolicy -j to accept a directory instead of a single file. When given a directory, genpolicy loads genpolicy-settings.json from it and applies all genpolicy-settings.d/*.json files (sorted by name) as RFC 6902 JSON Patches. This gives precise control over settings with explicit operations (add, remove, replace, move, copy, test), including array index manipulation and assertions. Ship composable drop-in examples in drop-in-examples/: - 10-* files set platform base settings (non-CoCo, AKS, CBL-Mariner) - 20-* files overlay specific adjustments (OCI version, guest pull) Users copy the combination they need into genpolicy-settings.d/. Replace the old adapt_common_policy_settings_* jq-patching functions in tests_common.sh with install_genpolicy_drop_ins(), which copies the right combination of 10-* and 20-* drop-ins for the CI scenario. Tests still generate 99-test-overrides.json on the fly for per-test request/exec overrides. Packaging installs 10-* and 20-* drop-ins from drop-in-examples/ into the tarball; the default genpolicy-settings.d/ is left empty. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 20:13:21 +01:00
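The discovery-and-ordering part of the drop-in mechanism can be sketched as follows. Applying each file as an RFC 6902 JSON Patch is left out because it needs a JSON library. `drop_in_patches` is a hypothetical name, and the file names in the test are only examples.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists `<settings_dir>/genpolicy-settings.d/*.json`, sorted by file name,
/// i.e. the order in which the drop-in patches would be applied.
pub fn drop_in_patches(settings_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let drop_in_dir = settings_dir.join("genpolicy-settings.d");
    if !drop_in_dir.is_dir() {
        return Ok(Vec::new()); // no drop-in directory means no patches
    }
    let mut files = Vec::new();
    for entry in fs::read_dir(&drop_in_dir)? {
        let path = entry?.path();
        let is_json = path.extension().and_then(|e| e.to_str()) == Some("json");
        if path.is_file() && is_json {
            files.push(path);
        }
    }
    // All entries share a parent, so this sorts by file name:
    // "10-..." before "20-..." before "99-...".
    files.sort();
    Ok(files)
}
```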

6101 Commits