Make trace-forwarder a workspace member to simplify dependency
management.
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
During the zizmor refactoring I renamed two jobs to make all the
architectures match. I forgot to update required_tests, and since
this was a workflow-only change the PR didn't check it, so update
them now.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
`volume::hugepage::tests::test_get_huge_page_size` was hard-coded to
exercise the round-trip through `get_huge_page_option` /
`get_page_size` for two hugetlbfs page sizes:
let format_sizes = ["1Gi", "2Mi"];
These are the sizes x86_64 Ubuntu kernels expose by default
(`/sys/kernel/mm/hugepages/hugepages-{1048576,2048}kB`), but other
architectures use different sizes:
* s390x: typically `hugepages-1048576kB` only (1 GiB; no 2 MiB pool)
-- the kernel returns `EINVAL` for the missing 2 MiB iteration:
thread 'volume::hugepage::tests::test_get_huge_page_size'
panicked at .../resource/src/volume/hugepage.rs:242:14:
called `Result::unwrap()` on an `Err` value: EINVAL
* ppc64le: page sizes vary by kernel build (e.g. 16M/16G with 64K
base pages, 2M/1G with 4K base pages), and may not match
`["1Gi", "2Mi"]` exactly. Same EINVAL on the iteration whose
size isn't a registered hstate.
The reason this never bit before is the same as the SELinux test
in the previous-but-one commit: the runtime-rs `Makefile` wrapped
`test` in an `ifeq UNSUPPORTED_ARCHS` block that turned it into
`echo ...; exit 0` on s390x/ppc64le/riscv64gc, so the test was
only ever exercised on x86_64 (and aarch64, which happens to have
the same default hugetlb page sizes). Dropping that gate is what
exposed the latent assumption.
Replace the hard-coded list with a small helper that lists the
hugetlbfs page sizes the running kernel actually exposes via
`/sys/kernel/mm/hugepages/hugepages-NkB`, rendered as binary-unit
strings (e.g. "2Mi", "1Gi") that are accepted both by the kernel's
`pagesize=...` mount option and by `byte_unit::Byte::parse_str(s,
/*allow_binary=*/ true)`. If `/sys/kernel/mm/hugepages` doesn't
exist or the directory is empty (e.g. hugetlbfs is unconfigured in
the test environment) the test simply returns -- there's nothing
meaningful to round-trip.
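The core of the helper reduces to mapping sysfs directory names to
binary-unit strings. A minimal sketch, assuming only the sysfs naming
convention (`hugepages-<N>kB`); the function names are illustrative,
not the actual helper:

```rust
// Render a hugetlbfs sysfs directory name (e.g. "hugepages-2048kB")
// as the binary-unit string (e.g. "2Mi") that byte_unit's parser and
// the test's round-trip both accept.
fn kb_to_binary_unit(kb: u64) -> String {
    let bytes = kb * 1024;
    const KI: u64 = 1024;
    const MI: u64 = KI * 1024;
    const GI: u64 = MI * 1024;
    if bytes % GI == 0 {
        format!("{}Gi", bytes / GI)
    } else if bytes % MI == 0 {
        format!("{}Mi", bytes / MI)
    } else {
        format!("{}Ki", bytes / KI)
    }
}

// Parse one directory entry name; returns None for anything that
// doesn't match the hugepages-<N>kB pattern.
fn page_size_from_dirname(name: &str) -> Option<String> {
    let kb: u64 = name
        .strip_prefix("hugepages-")?
        .strip_suffix("kB")?
        .parse()
        .ok()?;
    Some(kb_to_binary_unit(kb))
}
```

This also covers the ppc64le cases: `hugepages-16384kB` renders as
"16Mi" without any hard-coded list.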
On x86_64 the discovered list comes out as `["1Gi", "2Mi"]` (the
same coverage as before). On s390x it becomes `["1Gi"]`; on ppc64le
it is whatever that kernel build supports.
Sysfs alone, however, is a necessary-but-not-sufficient signal: it
tells us the kernel registered the page size, not whether *this
process* is allowed to mount hugetlbfs. The ubuntu-24.04-s390x GHA
runner demonstrates the gap -- it exposes `hugepages-1048576kB`
via /sys but runs the build inside a user/mount namespace where
mount(2) of hugetlbfs returns EPERM even when the test is invoked
through sudo:
thread 'volume::hugepage::tests::test_get_huge_page_size'
panicked at .../resource/src/volume/hugepage.rs:292:14:
called `Result::unwrap()` on an `Err` value: EPERM
There's no portable capability bit we can sniff for that, so probe
once with the first discovered size before iterating; if the probe
mount fails, skip the test (rather than panic on something it
can't control). A real regression on a host where mount() *does*
work will still surface inside the loop below, since the per-size
mount calls there continue to assert via `.unwrap()`.
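The probe-then-iterate control flow can be sketched generically;
`try_mount` here is a stand-in closure so the shape can be exercised
without privileges, not the real hugetlbfs mount call:

```rust
// Probe once with the first discovered size; if even that mount
// fails (e.g. EPERM inside a user/mount namespace), skip instead of
// panicking. On hosts where mount() works, real regressions still
// surface via the per-size unwrap() in the loop.
fn round_trip_sizes<F>(sizes: &[&str], try_mount: F) -> Vec<String>
where
    F: Fn(&str) -> Result<(), i32>,
{
    let Some(first) = sizes.first() else {
        return Vec::new(); // no hugetlbfs page sizes: nothing to test
    };
    if try_mount(*first).is_err() {
        return Vec::new(); // environment can't mount hugetlbfs: skip
    }
    sizes
        .iter()
        .map(|s| {
            try_mount(*s).unwrap(); // real failures still assert here
            s.to_string()
        })
        .collect()
}
```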
While here, feed the kernel-native shorthand (e.g. "2M", "1G")
rather than the IEC form ("2Mi", "1Gi") to mount(2). hugetlbfs
parses `pagesize=` via `memparse()`, which understands K/M/G but
not the IEC `Ki/Mi/Gi`; today the kernel happens to silently drop
the trailing `i` (memparse just stops scanning), but that leniency
is incidental. /proc/mounts in turn always renders the option back
as `pagesize=<N>{K,M,G}`, which is exactly the form
`get_page_size()` already expects -- it strips `pagesize=` and
unconditionally appends `i` before handing the result to byte_unit.
Stripping the `i` for the mount option keeps the test's input
aligned with the kernel's canonical syntax, while leaving the IEC
form intact for the `Byte::parse_str(..., /*allow_binary=*/ true)`
comparison.
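The conversion itself is a one-liner; a sketch of the shape (the
function name is illustrative):

```rust
// Convert the IEC form kept for byte_unit (e.g. "2Mi") into the
// kernel-native shorthand that memparse() documents (e.g. "2M").
// Strings already in kernel form pass through unchanged.
fn mount_pagesize(iec: &str) -> &str {
    iec.strip_suffix('i').unwrap_or(iec)
}
```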
Also drop the unused `Ok` re-export from
`use anyhow::{anyhow, Context, Ok, Result}`. Every existing
`Ok(...)` site in this module is the variant-constructor form, for
which the prelude's `Result::Ok` already works fine in
`anyhow::Result<T>` context (same enum, with `E = anyhow::Error`
inferred from the surrounding return type), so nothing actually
needed `anyhow::Ok` to begin with. Removing the import lets the
new helper use plain `let Ok(entries) = ... else` /
`let Ok(name) = ... else` patterns directly instead of funneling
everything through `.ok()` + `if let Some(...)` to dodge the
shadowing.
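A self-contained illustration of the pattern this unlocks (names are
illustrative, not the actual helper):

```rust
use std::fs;

// Without `use anyhow::Ok` shadowing the prelude, Result::Ok is a
// pattern again, so let-else works directly instead of funneling
// through .ok() + if let Some(...).
fn list_dir_names(base: &str) -> Vec<String> {
    let Ok(entries) = fs::read_dir(base) else {
        return Vec::new(); // directory missing: nothing to list
    };
    entries
        .filter_map(|e| e.ok())
        .filter_map(|e| e.file_name().into_string().ok())
        .collect()
}
```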
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
`selinux::tests::test_set_exec_label` had two branches: when SELinux
is enabled it asserts that `set_exec_label` succeeds and round-trips
the label through `/proc/thread-self/attr/exec`, and when SELinux is
NOT enabled it asserted that `set_exec_label` returns `Err`. The
second assertion is wrong -- it's a claim about the kernel/LSM
interface, not about `set_exec_label` itself.
`/proc/thread-self/attr/exec` is a generic LSM interface, not
SELinux-specific. When no LSM owns the slot, kernel behaviour is
arch/distro/build dependent: some kernels return `EINVAL` (observed
on x86_64 Ubuntu CI runners, where the test was originally written
and was passing), others silently accept the write (observed on
ppc64le Ubuntu CI runners, which is what made this surface):
thread 'selinux::tests::test_set_exec_label' panicked at
src/runtime-rs/crates/hypervisor/src/selinux.rs:62:13:
Expecting error, Got Ok(())
The reason this never blew up before is that the previous-but-one
commit's `ifeq UNSUPPORTED_ARCHS ... exit 0` block in the runtime-rs
`Makefile` made `make test` a no-op on s390x/ppc64le/riscv64gc.
Dropping that gate (so `make test` actually runs on every arch
that runtime-rs builds on) is what surfaced the latent bug.
Drop the `else { assert!(ret.is_err(), ...); }` branch and replace
it with a comment explaining why we deliberately don't assert on
`ret` in that path. The "SELinux is enabled" branch is the only
side that exercises anything we own; the no-SELinux path is a
kernel detail that's not ours to normalize.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The `runtime-rs` component of `build-checks.yaml` declared `rust`
as its only dependency, but the runtime-rs build pulls in
`prost-build v0.8.0` (via `ttrpc-codegen` -> `containerd-shim-protos`,
and via the in-tree `hypervisor` crate), and `prost-build`'s build
script needs a `protoc` binary at compile time.
This worked on x86_64 and aarch64 only because `prost-build v0.8.0`
ships bundled `protoc` binaries for those targets. On s390x (and
ppc64le, when the matrix gets there) there is no bundled binary,
so the build fails with:
Failed to find the protoc binary. The PROTOC environment variable
is not set, there is no bundled protoc for this platform, and
protoc is not in the PATH
The reason this didn't show up in CI before is that `make test`
and `make check` for runtime-rs were wrapped in arch-specific
`ifeq` blocks in `src/runtime-rs/Makefile` that turned them into
no-ops on s390x/ppc64le/riscv64gc. The previous commit dropped
those gates so `make {test,check}` now actually run on every arch,
which exposes this latent CI gap.
Match what `agent`, `libs`, `agent-ctl`, `kata-ctl` and `genpolicy`
already declare and add `protobuf-compiler` to runtime-rs's needs.
The existing `Install protobuf-compiler` step in this workflow
already runs `sudo apt-get -y install protobuf-compiler`, which
the s390x/ppc64le runners support (those other components have
been using it on s390x for some time).
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The Makefile pretended to reject s390x, powerpc64le and riscv64gc
by wrapping `default`, `test` and `install` in `ifeq UNSUPPORTED_ARCHS`,
and `check` in `ifeq ($(ARCH),x86_64)`. In reality `default` and
`install` were byte-for-byte identical in both branches, so only
`test` and `check` were ever skipped. The user-visible "$(ARCH) is
not currently supported" message and the bare `exit 0` made it look
like the build was a no-op when in fact builds and installs were
proceeding -- which has burned at least one maintainer trying to
debug a downstream packaging failure (issue #12914).
The original reasons those targets were skipped were:
* `test` (commit 389ae9702, 2022): `cargo test` would pull in the
dragonball crate, which only builds on x86_64/aarch64.
* `check`: delegates to `standard_rust_check` in utils.mk, which
runs `cargo clippy --all-targets --all-features`. `--all-features`
unconditionally turns on the `dragonball` (and `cloud-hypervisor`)
feature regardless of arch, breaking the build wherever those
crates can't compile.
Both are now obsolete. The preceding commit arch-gated the
dragonball and firecracker drivers (and their dependencies) at the
Cargo and Rust source level, so on s390x/ppc64le/riscv64gc:
* the `dragonball` cargo feature is a safe no-op -- enabling it
just doesn't pull in the dep,
* the `cloud-hypervisor` cargo feature still pulls in `ch-config`
(which is portable Rust), but the `ch` driver module that uses
it remains arch-gated at the source level,
* `dbs-utils` and `hyperlocal` are not built at all.
That means `cargo clippy --all-targets --all-features` -- exactly
what `standard_rust_check` runs -- is safe on every architecture,
and no runtime-rs-local override of `check` is needed. Drop both
`ifeq` blocks and let `test` and `check` run on every arch the way
`default` and `install` already did. Net result: `make {default,
test,check,install}` now Just Work everywhere, with no arch-specific
code paths in this Makefile and no misleading "not currently
supported" messages.
Fixes: #12914
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Two of the in-tree hypervisor drivers, dragonball and firecracker,
along with three of their transitive dependencies (the dragonball
crate itself, dbs-utils, hyperlocal), are built unconditionally on
every architecture even though both upstream projects only support
x86_64 and aarch64:
* dragonball: the dragonball VMM crate is x86_64+aarch64 only.
The runtime-rs `dragonball` cargo feature is already gated via
`USE_BUILTIN_DB` -> `ARCH_SUPPORT_DB` in the Makefile, so the
default `make` flow does the right thing today. But anything
that bypasses that gate -- a contributor running `cargo clippy
--all-features`, a CI matrix that forces the feature on, etc.
-- fails to build on s390x/ppc64le/riscv64gc, because the
optional `dragonball` dependency is declared without a target
predicate and Rust source sites reference it under a feature
gate alone.
* firecracker: firecracker upstream only releases for x86_64 and
aarch64
(https://github.com/firecracker-microvm/firecracker/releases/tag/v1.15.1).
The Makefile already reflects this -- `FCCMD` is only defined
in the x86_64/aarch64 arch options files -- but the in-tree
`firecracker` driver module compiles unconditionally, so on
s390x/ppc64le/riscv64gc we still ship a runtime that thinks it
can drive a hypervisor binary that doesn't exist on the platform.
Decouple both at the Cargo and Rust source level, mirroring the
existing cloud-hypervisor pattern.
* Cargo.toml: move the optional `dragonball` dependency, plus
`dbs-utils` and `hyperlocal` (whose only consumers are the
dragonball and firecracker driver modules), into a target-
specific dependency block:
[target.'cfg(any(target_arch = "x86_64",
target_arch = "aarch64"))'.dependencies]
dbs-utils = { workspace = true }
hyperlocal = { workspace = true }
dragonball = { workspace = true, features = [ ... ],
optional = true }
On x86_64/aarch64 the resolved dep graph is unchanged. On
s390x/ppc64le/riscv64gc enabling the `dragonball` feature
becomes a safe no-op, and the dep graph for the `hypervisor`
crate is completely free of any dragonball or firecracker
artifacts. This also makes the gating self-policing: any
future `use dbs_utils::...` or `use hyperlocal::...` outside
an arch-gated module will fail to build on non-x86 instead of
silently shipping dead code.
* Rust modules: combine the existing `feature = "dragonball"`
gate with `target_arch = "x86_64"|"aarch64"` on
`pub mod dragonball;` and the dragonball-only constants
(`DEV_HUGEPAGES`, `SHMEM`, `HUGE_SHMEM`) in
`crates/hypervisor/src/lib.rs`. Add the same target_arch gate
to `pub mod firecracker;` (matching the existing gate on
`pub mod ch;`) and to every site in
`crates/runtimes/virt_container/src/{lib,sandbox}.rs` that
names a now-gated type (`Dragonball`, `Firecracker`,
`DragonballConfig`, `FirecrackerConfig`).
* `pub(crate) enum VmmState` in `crates/hypervisor/src/lib.rs`
gets the same target_arch gate -- its only consumers are the
`ch`, `dragonball` and `firecracker` modules, all of which
are gated to x86_64+aarch64. Without it, `cargo clippy
--all-features -- -D warnings` (i.e. what `make check` runs
via `standard_rust_check`) would fail on non-x86 with
"enum `VmmState` is never used".
The plain `HYPERVISOR_DRAGONBALL` and `HYPERVISOR_FIRECRACKER`
string constants stay ungated, and the persist-side match arms in
`sandbox.rs` that only compare against those strings also stay
ungated, mirroring how `HYPERVISOR_NAME_CH` is already handled.
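The shape of the source-level gate, with an inline module standing in
for the real `pub mod firecracker;` declaration so the sketch is
self-contained (the real gate on dragonball additionally ANDs in
`feature = "dragonball"`):

```rust
// Arch-gated driver module: compiled only where the hypervisor can
// actually run.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
mod firecracker {
    pub const NAME: &str = "firecracker";
}

// Plain string constant stays ungated so persist-side match arms
// that only compare strings keep compiling on every arch.
const HYPERVISOR_FIRECRACKER: &str = "firecracker";

#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
fn driver_name() -> &'static str {
    firecracker::NAME
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn driver_name() -> &'static str {
    // No driver module on this arch; only the string constant exists.
    HYPERVISOR_FIRECRACKER
}
```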
Verified with `cargo tree --target=<triple> --features dragonball
-p hypervisor` for x86_64/aarch64/s390x/powerpc64le/riscv64gc:
* x86_64/aarch64: full dragonball stack (dbs_address_space,
dbs_allocator, dbs_arch, dbs_boot, dbs_device, dbs_interrupt,
dbs_legacy_devices, dbs_pci, dbs_upcall, dbs-utils,
hyperlocal, ...) is pulled in, as before.
* s390x/ppc64le/riscv64gc: the dep graph for the `hypervisor`
crate is completely free of any dragonball or firecracker
artifacts, even with `--features dragonball` explicitly
enabled.
`cargo clippy --target=s390x-unknown-linux-gnu --all-targets
--all-features --release --locked -- -D warnings` is also clean,
and `make check` on x86_64 with the default `USE_BUILTIN_DB=true`
still passes.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Some of our workflow-level concurrency rules appear to clash with
the job-level ones and cancel jobs, so remove the problematic
workflow-level rules.
Co-authored-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump zizmor to the 1.22 version to pick up new rule updates.
Later bumps to follow once this has proven stable
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Recently I've seen a couple of occasions where jobs have seemed to
run indefinitely. Add timeouts for these jobs to stop this from
happening if things get into a bad state.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It is good practice to add concurrency limits to automatically
cancel jobs that have been superseded, and to reduce the risk of
race conditions when we fetch artifacts by workflow and job ID
rather than run ID.
See https://docs.zizmor.sh/audits/#concurrency-limits
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Published artifacts are consumed as security-critical runtime inputs, so
they need verifiable provenance that binds each binary back to the exact
source and build context.
Without provenance, downstream users cannot reliably distinguish trusted
CI outputs from repackaged or substituted artifacts.
Recording provenance in Sigstore's immutable transparency infrastructure
provides auditable evidence that survives mirror/registry movement and
strengthens supply-chain forensics and policy enforcement.
This also aligns artifact publication with a zero-trust verification
model expected by confidential-computing consumers and automated
admission controls.
Remove workflow-level attestation gating so published artifacts are
consistently accompanied by build provenance.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The MLX5 Ethernet driver is useful well beyond the DPU/SmartNIC use case
(any guest sitting on top of a Mellanox/ConnectX NIC benefits from it),
yet the existing config fragment lived under dpu/ and was only pulled in
when the kernel was built with `-D nvidia`.
Promote it to a first-class common fragment so every Kata kernel gets
MLX5 Ethernet built in.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that we're adding support for the rust runtime, let's also update
the docs.
We may also need to update the docs again once we start testing with
different VMMs, but that's not in the scope for this PR.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add qemu-runtime-rs to the Docker test matrix on amd64 and s390x
so that the runtime-rs shim is exercised with Docker + QEMU
networking in CI.
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Docker 26+ configures the container's veth pair between the Create
and Start RPCs by bind-mounting `/proc/<vmm_pid>/ns/net`. The Rust
shim's network scan during sandbox creation finds no interfaces
because they don't exist yet.
The Go shim (commit f7878cc) solves this with `detectHypervisorNetns`
inside `addAllEndpoints`: when the placeholder netns is empty, it
switches to the hypervisor's network namespace and rescans there.
Port this approach to the Rust shim:
- Add `rescan_network()` to the `Sandbox` trait
- Implement it on `VirtSandbox`: build a rescan config that always
targets the hypervisor's netns (`/proc/<vmm_pid>/ns/net`),
bypassing the placeholder netns and the `network_created` flag
- Call `sandbox.rescan_network()` synchronously in the `StartProcess`
handler, before `cm.start_process()`, so interfaces are wired
before the container process runs
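The trait plumbing can be sketched with simplified stand-ins (the
real `Sandbox` trait is async and carries far more state; names here
mirror the description above but are illustrative):

```rust
// Simplified stand-in for the Sandbox trait.
trait Sandbox {
    fn rescan_network(&self) -> Result<(), String>;
}

struct VirtSandbox {
    vmm_pid: u32,
}

impl Sandbox for VirtSandbox {
    fn rescan_network(&self) -> Result<(), String> {
        // Always target the hypervisor's netns, bypassing the
        // placeholder netns and the network_created flag.
        let netns = format!("/proc/{}/ns/net", self.vmm_pid);
        if self.vmm_pid == 0 {
            return Err(format!("invalid netns path: {netns}"));
        }
        // ...rescan interfaces inside `netns` and push the result
        // to the guest agent...
        Ok(())
    }
}
```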
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Docker 26+ configures veth pairs in the hypervisor's network
namespace between the Create and Start RPCs. The initial network
scan during sandbox creation finds no interfaces because they do
not exist yet.
Add `rescan_network_if_unconfigured` which polls the network
namespace (50ms intervals, 5s timeout) until interfaces appear,
then pushes the configuration to the guest agent. This mirrors
the Go runtime's `RescanNetwork` (commit f7878cc).
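The polling loop reduces to this shape (std-based here to keep the
sketch self-contained; the real code awaits on tokio's timer):

```rust
use std::time::{Duration, Instant};

// Poll `ready` every `interval` until it returns true or `timeout`
// elapses; returns whether the condition was met in time.
fn wait_for<F: FnMut() -> bool>(
    mut ready: F,
    interval: Duration,
    timeout: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

In the real helper `ready` is "the netns contains at least one
non-loopback interface", with the 50ms/5s parameters from above.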
Supporting changes:
- Derive `Clone` on `NetworkWithNetNsConfig` so it can be reused
across poll iterations
- Add `tokio/time` feature to the resource crate
- Add `apply_network_to_agent` helper to push interfaces, routes,
and neighbors to the guest
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Docker 26+ with `runtimeType` may not publish the network namespace
in `linux.namespaces` at create time. Instead, the netns path can
be discovered from `libnetwork-setkey` hook arguments.
Additionally, filter out the invalid `/proc/0/ns/net` placeholder
that appears when the task PID is not yet known.
This mirrors the Go runtime's `DockerNetnsPath` fallback logic.
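The placeholder filter is the simple part; a sketch of its shape
(the real check may validate the path more thoroughly, this is
illustrative):

```rust
// Reject the "/proc/0/ns/net" placeholder that appears while the
// task PID is still unknown; accept other /proc/<pid>/ns/net paths.
fn valid_netns_path(path: &str) -> bool {
    if path == "/proc/0/ns/net" {
        return false;
    }
    path.starts_with("/proc/") && path.ends_with("/ns/net")
}
```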
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Docker and containerd use the PID returned by the shim to construct
`/proc/<pid>/ns/net` for network namespace operations. The Rust shim
was returning the shim's own PID instead of the hypervisor's PID,
which meant Docker would look at the wrong network namespace.
Update `create_container`, `start_process`, `state_process`, `pid`,
and `connect_container` to return the VMM master thread/process ID
(`vmm_master_tid`) instead of `self.pid`. For QEMU this is the QEMU
process PID; for Dragonball this is the VMM thread ID -- both are
valid for `/proc/<id>/ns/net` on Linux.
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The containerd runtime v2 `shimTask.Create()` discards the
`CreateTaskResponse.Pid` and instead retrieves the task PID by
calling the shim's Connect RPC, reading `ConnectResponse.task_pid`.
The Rust shim only set `shim_pid` in the ConnectResponse, leaving
`task_pid` at its default zero value. This caused Docker to call
`sb.SetKey("/proc/0/ns/net", ...)` which fails with "no such file
or directory".
Set `shim_pid` to the actual shim process ID and `task_pid` to the
hypervisor PID (vmm_master_tid), matching the Go shim's Connect
handler behavior.
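The fix is a two-field change; a stand-in struct sketch (field names
follow the ConnectResponse message, the struct itself is
illustrative, not the generated ttrpc type):

```rust
// Stand-in for the generated ttrpc ConnectResponse message.
struct ConnectResponse {
    shim_pid: u32,
    task_pid: u32,
}

// Before the fix only shim_pid was set, leaving task_pid at the
// proto default 0 -- which Docker turned into "/proc/0/ns/net".
fn connect_response(shim_pid: u32, vmm_master_tid: u32) -> ConnectResponse {
    ConnectResponse {
        shim_pid,
        task_pid: vmm_master_tid,
    }
}
```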
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Docker 26+ with `runtimeType` shims may not include a network
namespace in the OCI spec's `linux.namespaces` and instead uses
`libnetwork-setkey` hooks to communicate the sandbox ID. Add helpers
to detect Docker containers and resolve the netns path from hook
arguments, matching the Go runtime's `DockerNetnsPath` and
`IsDockerContainer` utilities.
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
As of k8s 1.36.0, a pod's events are no longer included in the
"kubectl describe pod" output when describing a deployment.
Describe using the "app" label instead.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
libc::S_IF* are u16 on Darwin/BSD and u32 on Linux. The match in
FileType::from and its tests mix both widths and don't compile on
Darwin. Cast everything to u32; on Linux that's a no-op, hence the
clippy::unnecessary_cast allow (rust-lang/rust-clippy#6466).
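The fix reduces to casting both sides of the match to u32; local
constants stand in for `libc::S_IF*` here so the sketch is
stdlib-only:

```rust
// On Linux the libc S_IF* constants are u32; on Darwin/BSD they are
// u16. Casting both the mode and the constants to u32 makes the
// match compile everywhere; on Linux the cast is a no-op, hence the
// allow (rust-lang/rust-clippy#6466).
#[allow(clippy::unnecessary_cast)]
fn file_type(mode: u32) -> &'static str {
    const S_IFMT: u32 = 0o170000; // stands in for libc::S_IFMT
    const S_IFDIR: u32 = 0o040000; // libc::S_IFDIR
    const S_IFREG: u32 = 0o100000; // libc::S_IFREG
    match mode & (S_IFMT as u32) {
        m if m == S_IFDIR as u32 => "directory",
        m if m == S_IFREG as u32 => "regular file",
        _ => "other",
    }
}
```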
Fixes: #12916
Signed-off-by: Spyros Seimenis <sse@edgeless.systems>
regorus 0.9.0 introduced a hard, per-engine ceiling on parsed-policy
size (1024 columns / 1 MiB / 20 000 lines, see lexer.rs:30 in
microsoft/regorus). The 1024-column cap rejects realistic policies
emitted by `genpolicy`: the `NVIDIA_REQUIRE_CUDA` environment variable
on `nvcr.io/nvidia/k8s/cuda-sample` is roughly 1.3 KiB on a single line,
so the agent's `set_policy()` returns an error, the agent (PID 1) exits,
the guest kernel reboots, and the runtime eventually times out
connecting to the agent's vsock.
regorus PR #624 ("feat: make policy length limits configurable per
engine") adds `Engine::set_policy_length_config`, but it has not been
released yet -- the latest published version is still 0.9.1, which
predates that change.
Pin `regorus` to the upstream commit that includes #624 and call the
new setter from `AgentPolicy::new_engine()` with values that comfortably
fit any policy we expect to evaluate (64 KiB per line, 16 MiB per file,
200 000 lines) while still rejecting pathological/minified input. Once
a regorus release > 0.9.1 ships with #624, the dependency can be moved
back to crates.io.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The version we used before was released in 2024; it's about time
to use a newer one. The new version of the crate comes with a
license, which addresses a `cargo deny` finding.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
No need to deviate from how other CoCo targets use Trustee, and
this enables us to add more tests (e.g., RVPS) that the ITA Trustee
implementation does not support.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>