kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-02 07:02:16 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	f36c383b4f	runtime: generate dedicated CLH Azure config variants Create configuration-clh-azure{,-runtime-rs}.toml from the base CLH configs during build. This keeps Mariner-specific defaults in explicit config artifacts instead of ad-hoc runtime mutation. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-28 23:32:37 +02:00
Zvonko Kaiser	aeadb1af35	Merge pull request #12948 from fidencio/topic/numa runtime (go): agent: Add NUMA support for QEMU	2026-05-25 15:33:14 +02:00
Fabiano Fidêncio	1cbe930fc9	runtime: Add pxb-pcie NUMA-aware PCIe topology for VFIO devices When NUMA placement is active and VFIO devices are cold-plugged, create a pxb-pcie (PCIe Expander Bridge) per NUMA node that has devices. Each pxb-pcie carries a numa_node property that gives the guest kernel correct NUMA affinity for all PCI devices beneath it. Root ports are created on each pxb-pcie bus instead of pcie.0, and VFIODevice.Attach() assigns each device to the root port on its host NUMA node's pxb bridge. Non-VFIO devices remain on pcie.0. NUMA placement is "active" when there is more than one guest NUMA node OR a single guest node mapped to a specific host node (the latter happens when maybeRightSizeAutoNUMA() collapses a multi-node sandbox to the GPU's host NUMA node). In both cases buildNUMATopology() also emits the matching memory-backend-ram,host-nodes=,policy=bind entries so guest memory is sourced from the right host node. So pxb-pcie can never capture a leaf virtio-pci device as the default bus, every virtio-pci device emitter (NetDevice, VSOCK, vhost-user-{net,scsi,blk,fs}) now appends bus=pcie.0 explicitly when the machine actually exposes a pcie.0 root. Detection is done via a new hasPCIeRoot() helper that returns true only for q35/virt machine types — ppc64le's pseries (pci.0), s390x's s390-ccw-virtio (CCW transport) and microvm (no PCI) intentionally skip the pin to avoid "Bus 'pcie.0' not found" at startup. This is the only QEMU mechanism that works for both regular and confidential (TDX/SNP) guests, as it operates through the PCI bus hierarchy rather than ACPI table injection. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	15292da217	config: Enable NUMA by default for nvidia-gpu configurations Enable enable_numa=true in the three nvidia-gpu QEMU configuration templates (base, SNP, TDX). On single-NUMA hosts this is a no-op since buildNUMATopology() returns nil when there is only one node. On multi-NUMA hosts it ensures GPU memory accesses are NUMA-local. Add documentation to all QEMU config templates explaining the VFIO device NUMA placement validation that occurs when NUMA is enabled. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	f53f427859	runtime: Fix vCPU pinning race for Go runtime QEMU may not have spawned all vCPU threads when pinning starts, so query_cpus_fast can return an incomplete list and leave some vCPUs unpinned. To fix it, let's add exponential backoff retries before pinning and fall back to available threads if retries are exhausted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	b688619314	runtime: oci: Fix sandbox CPU sizing with cpuManagerPolicy=static When cpuManagerPolicy=static is configured, kubelet sets the sandbox CPU quota to -1 (unconstrained) because it uses cpuset pinning instead of CFS quota. This causes CalculateSandboxSizing to compute 0 workload CPUs, resulting in the VM starting with only default_vcpus. Fall back to deriving the CPU count from sandbox CPU shares (1024 shares per CPU) when the quota-based calculation yields 0. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	12e5985dbd	runtime: Add NUMA-aware vCPU pinning and cpuset.mems forwarding Make checkVCPUsPinning() NUMA-aware: when GuestNUMANodes are configured, vCPU threads are pinned to host CPUs belonging to the same NUMA node as the vCPU's guest NUMA node assignment via checkVCPUsPinningNUMA(), preserving memory locality. vCPUs are distributed proportionally across NUMA nodes, matching the distribution in buildNUMATopology(). Stop unconditionally stripping cpuset.mems in constrainGRPCSpec() and container update(). When multi-NUMA is configured, translate host NUMA node IDs to guest NUMA node IDs using translateHostMemsToGuest() before forwarding to the agent. This allows the agent to enforce NUMA-aware memory placement for containers. Filter guest NUMA nodes at VM creation time: before calling CreateVM(), prune GuestNUMANodes to only those whose HostCPUs intersect the sandbox cpuset. This avoids exposing fake NUMA topology to the guest when Kubernetes allocates CPUs from fewer nodes than the host has (e.g. all CPUs from node 0 on a 2-node host), improving memory locality and avoiding unnecessary cross-node memory traffic. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	d0d7deb262	runtime: Add host NUMA distance discovery and build guest NUMA topology Add sysfs-based host NUMA distance reading (GetHostNUMADistances) that parses /sys/devices/system/node/nodeN/distance to mirror the host NUMA distance matrix into the guest via -numa dist entries. Implement buildNUMATopology() which translates the GuestNUMANodes configuration into govmm NUMANode and NUMADist slices. Each guest NUMA node gets a floor-divided share of vCPUs and memory, with the last node absorbing any remainder. This handles the common Kata case of +1 VMM overhead vCPU gracefully. Memory backends are selected based on hugepages/virtio-fs/file-backed-mem configuration. Guard multi-NUMA topology generation to amd64 and arm64 only, since other architectures (s390x, riscv64) do not support QEMU NUMA/DIMM. Wire buildNUMATopology() into CreateVM so the QEMU config includes NUMA nodes and distances. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	447e2a3faf	runtime: Add VFIO device NUMA node detection and placement validation Add PCISysFsDevicesNUMANode property and GetPCIDeviceNUMANode() helper to read /sys/bus/pci/devices/<BDF>/numa_node when discovering VFIO devices. Store the result in the new NUMANode field on VFIODev (-1 for unknown/no affinity). Wire NUMA node detection into both GetAllVFIODevicesFromIOMMUGroup() (legacy VFIO path) and GetDeviceFromVFIODev() (IOMMUFD path) so every discovered VFIO device carries its host NUMA node. Add validateVFIODeviceNUMAPlacement() which runs at the end of buildNUMATopology(). It checks every cold-plugged VFIO device's host NUMA node against the guest NUMA topology and logs a warning if a device is on a host NUMA node not covered by any guest NUMA node (indicating potential cross-NUMA memory access overhead), or an info message confirming correct placement. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	1ee8bb5740	runtime: Add NUMA-aware SMP topology Make cpuTopology() NUMA-aware by accepting a numNUMANodes parameter. When multiple NUMA nodes are configured, restructure the SMP topology so that Sockets=numNUMA and Cores=ceil(maxvcpus/numNUMA), grouping vCPUs by socket per NUMA node. Use ceiling division so that uneven vCPU counts (e.g. the +1 VMM overhead vCPU that Kata adds) produce a QEMU-valid SMP topology where MaxCPUs == Sockets * Cores * Threads. When numNUMANodes <= 1, the existing flat topology (Sockets=maxvcpus, Cores=1) is preserved. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Fabiano Fidêncio	1e9da61d48	govmm: Add multi-NUMA memory backend and distance matrix support Introduce NUMANode and NUMADist types, add NUMANodes/NUMADists fields to Config, and implement appendMultiNUMAMemoryKnobs() to generate per-node memory-backend objects with host-nodes/policy=bind, -numa node entries with cpus= ranges, and -numa dist entries for the distance matrix. Gate the multi-NUMA path in appendMemoryKnobs() behind isDimmSupported() to ensure architectures without DIMM support (s390x, riscv64) fall back to the single-node path. Drop 386 from isDimmSupported since 32-bit x86 is not a supported Kata target. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-05-24 22:00:46 +02:00
Florian Vichot	554e8f91b1	kata-monitor: use full URI for connecting to containerd Without the protocol in the URI, grpc-go defaults to the DNS resolver, which results in an error for unix sockets (`name resolver error: produced zero addresses`). We also remove the `getAddressAndDialer(...)` and `dial(...)` functions, as they are no longer necessary, grpc-go supports connecting to unix sockets directly. This also removes the matching tests. This also adds a `Makefile` and tweaks the Dockerfile to simplify building the Docker image. Fixes #12398 Signed-off-by: Florian Vichot <florian.vichot@gmail.com>	2026-05-23 16:47:46 +02:00
dependabot[bot]	ac77c5fdff	build(deps): bump github.com/containerd/containerd in /src/runtime Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.29 to 1.7.32. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.29...v1.7.32) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-version: 1.7.32 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-05-21 21:56:06 +00:00
Fabiano Fidêncio	f9eafb3341	runtime: drop host time namespace from OCI spec Docker 29.5+ adds a private time namespace to container bundles by default, but the Kata agent only supports the classic namespace set and fails with "invalid namespace type". Strip time namespaces in both the Go and Rust runtimes before the spec reaches the agent, matching how network and cgroup namespaces are handled. Fixes: #13080 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-21 13:56:45 +02:00
Fabiano Fidêncio	ffa59ce3aa	Merge commit from fork runtime: disable virtiofsd extra-args annotation by default	2026-05-19 08:22:12 +02:00
Alex Lyn	8dca734008	Merge pull request #12959 from DataDog/mayeul/fix-race-condition-when-adding-qdisc shim: Add backoff retry to ingress qdisc creation to avoid potential race condition	2026-05-19 14:06:37 +08:00
Sebastian Wolf	26746c9ce8	runtime/fc: track real firecracker PID instead of jailer PID When the jailer is in use (the default for kata-fc), cmd.Process.Pid in fcInit() is the jailer's PID, not firecracker's. The jailer forks + execs firecracker as a separate child and exits. fc.info.PID was therefore stored as the (soon-to-be-dead) jailer PID. At sandbox shutdown, fcEnd() calls WaitLocalProcess(fc.info.PID, SIGTERM, ...). syscall.Kill on the dead jailer PID returns ESRCH, WaitLocalProcess returns nil immediately, and the real firecracker microVM never receives a signal. It gets reparented to init and stays alive indefinitely, holding open resources from the host. Over many container lifecycles this becomes a serious resource leak. Read the real PID from <jailerRoot>/firecracker.pid, which firecracker itself writes after the exec. Update fc.info.PID with that value so all downstream code (fcEnd, Save/Load, kill-0 alive checks, NewProc) operates on the actual firecracker process. Also fix a small adjacent bug in Sandbox.Stop where the per-container teardown loop ignored the force flag, causing any container.stop error to short-circuit Stop before stopVM ran. Signed-off-by: Sebastian Wolf <swolf@nvidia.com>	2026-05-18 21:09:51 +02:00
Mayeul Blanzat	26f60ddd9b	shim: Add backoff retry to ingress qdisc creation to avoid race condition We sometimes get this error when creating the pod sandbox: failed to create shim task: Failed to add qdisc for network index 2 : device or resource busy. Add a linear backoff retry when adding the qdisc to mitigate the issue at the source and avoid the cascading error. Signed-off-by: Mayeul Blanzat <mayeul.blanzat@datadoghq.com>	2026-05-18 17:46:50 +02:00
Steve Horsman	557fb5187b	Merge pull request #12853 from kata-containers/dependabot/go_modules/src/runtime/github.com/sirupsen/logrus-1.9.4 build(deps): bump github.com/sirupsen/logrus from 1.9.3 to 1.9.4 in /src/runtime	2026-05-14 13:56:10 +01:00
Fabiano Fidêncio	c8f6f17269	Merge pull request #13027 from PiotrProkop/fix-loop-blockfile-sandbox-cgroup runtime: allow loopback devices when sandbox_cgroup_only is enabled	2026-05-14 11:18:45 +02:00
dependabot[bot]	408e15641c	build(deps): bump github.com/sirupsen/logrus in /src/runtime Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.3 to 1.9.4. - [Release notes](https://github.com/sirupsen/logrus/releases) - [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md) - [Commits](https://github.com/sirupsen/logrus/compare/v1.9.3...v1.9.4) --- updated-dependencies: - dependency-name: github.com/sirupsen/logrus dependency-version: 1.9.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2026-05-13 06:11:21 +00:00
Greg Kurz	d2dc0a923c	Merge pull request #13030 from stevenhorsman/go-1.25.10-bump Go 1.25.10 bump	2026-05-13 08:09:51 +02:00
PiotrProkop	5065058d4a	runtime: fix device allowlist detection comparing pointers Because intptr() returns a fresh pointer on every call, those comparisons compared addresses, never values, so every check evaluated to false. As a result /dev/null, /dev/urandom, /dev/ptmx, /dev/loop-control and /dev/loop* were appended to devices allowlist for sandbox_cgroup even when the runtime spec already listed them, producing duplicate entries. Switch to nil-safe value comparisons via a type switch on the cgroup device type and dereferenced d.Major / d.Minor, keeping the same detection semantics but actually matching existing entries. Assisted-By: Claude 4.7 Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-05-12 18:52:53 +02:00
PiotrProkop	5cd187619e	runtime: allow loopback devices for sandbox cgroup only When sandbox_cgroup_only is enabled, the kata shim threads inherit the sandbox device cgroup. For container rootfs whose mount source is a regular file backed by a loop device (notably the blockfile snapshotter), containerd's mount package opens /dev/loop-control to allocate a free /dev/loopN and then opens that block node to attach the backing file. Neither device is on the sandbox cgroup allowlist, so both opens fail with EPERM. This change adds /dev/loop-control (char 10:237) and the /dev/loopN block nodes (block major 7, any minor) to the sandbox device cgroup allowlist when sandbox_cgroup_only is true, mirroring the existing treatment of /dev/null, /dev/urandom and /dev/ptmx. The additions are gated on SandboxCgroupOnly because that is the only mode in which the shim itself is constrained by this cgroup. Assisted-By: Claude 4.7 Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-05-12 18:48:58 +02:00
stevenhorsman	7cc72b933d	versions: bump golang.org/x/net to v0.53.0 Bump golang.org/x/net to resolve CVE: - GO-2026-4918 Signed-off-by: stevenhorsman <steven@uk.ibm.com> Assisted-by: IBM Bob	2026-05-12 11:56:26 +01:00
stevenhorsman	4a65aca9cf	versions: bump golang to 1.25.10 Bump the go version to resolve CVEs: - GO-2026-4918 - GO-2026-4971 - GO-2026-4976 - GO-2026-4977 - GO-2026-4980 - GO-2026-4981 - GO-2026-4982 - GO-2026-4986 Signed-off-by: stevenhorsman <steven@uk.ibm.com> Assisted-by: IBM Bob	2026-05-12 11:56:13 +01:00
Fabiano Fidêncio	6b802a4e30	nvidia: switch GPU rootfs images to erofs Switch the NVIDIA GPU rootfs images (both standard and confidential) from ext4 to erofs (Enhanced Read-Only File System). Unlike ext4, which is a read-write filesystem mounted read-only by convention, erofs is structurally read-only -- no journal, no write metadata, no superblock write path. This eliminates accidental mutation and reduces the attack surface inside the guest VM, which is particularly important for confidential workloads using dm-verity. Introduce a DEFROOTFSTYPE_NV Makefile variable (set to erofs) for both Go and Rust runtimes, keeping the global DEFROOTFSTYPE as ext4 so non-NVIDIA configurations are unaffected. Update all six NVIDIA GPU configuration templates (base, SNP, TDX for both runtimes) to use @DEFROOTFSTYPE_NV@ instead of the global @DEFROOTFSTYPE@. Export FS_TYPE=erofs in install_image_nvidia_gpu() and install_image_nvidia_gpu_confidential() so the build pipeline produces erofs images via the image builder. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-10 17:18:05 +02:00
Fabiano Fidêncio	c945d2701c	runtime: disable virtiofsd extra-args annotation by default Keep virtio_fs_extra_args support in code, but remove it from default enable_annotations and add explicit security warnings in Makefiles and docs. Release-note note: mirror this hardening in release notes so operators know this remains opt-in and carries host-side risk when enabled. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-09 13:21:39 +02:00
Greg Kurz	bb933f65e4	vendor: Remove `make vendor` across the repo `make vendor` isn't required anymore. People who need vendored code should use the `tools/packaging/release/generate_vendor.sh` script instead. Assisted-by: Claude AI Signed-off-by: Greg Kurz <groug@kaod.org>	2026-05-06 09:49:52 +02:00
Greg Kurz	b44e56d3db	runtime: Remove vendor directory Now shipped in the vendored code tarball. Drop the git tree status check since it isn't needed anymore. Also stop building with `-mod=vendor`. This requires exposing GOMODCACHE, as suggested by Fabiano Fidêncio. Signed-off-by: Greg Kurz <groug@kaod.org>	2026-05-06 09:47:30 +02:00
Fabiano Fidêncio	6436922f5b	runtime: network: handle "device" type interfaces (mlx5 SFs) Interfaces whose drivers do not register a specific netlink kind (e.g. mlx5 Scalable Functions) are reported with the generic type "device". The endpoint creation code did not handle this type, causing sandbox creation to fail with: "Unsupported network interface: device" This is particularly visible on arm64 with Mellanox ConnectX NICs using Scalable Functions, where the ethtool BusInfo returns a non-PCI identifier (e.g. "mlx5_core.sf.4") so isPhysicalIface() cannot classify the interface as physical either. Handle "device" type interfaces the same way as veth endpoints, connecting them through a TAP + TC-filter bridge. Additionally, relax getLinkForEndpoint() for VethEndpoint so it accepts the concrete link type returned by the kernel instead of asserting netlink.Veth. A "device" type interface wrapped in a VethEndpoint returns netlink.Device from LinkByName(), which would fail the strict type assertion. All callers only need link.Attrs(), so accepting any link type is safe. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-25 12:26:20 +02:00
Fabiano Fidêncio	77e558deb0	runtime: Fix shellcheck issues in git_push.sh Fix shellcheck warnings and notes identified by running shellcheck --severity=style. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-24 08:14:07 +02:00
Fabiano Fidêncio	4c490579d5	runtime: Fix shellcheck issues in update-generated-runtime-proto.sh Fix shellcheck warnings and notes identified by running shellcheck --severity=style. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-24 08:14:07 +02:00
Fabiano Fidêncio	71e5e67b07	runtime: Fix shellcheck issues in update-generated-hypervisor-proto.sh Fix shellcheck warnings and notes identified by running shellcheck --severity=style. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-24 08:14:07 +02:00
Fabiano Fidêncio	01fb3bdd1f	runtime: Fix shellcheck issues in tree_status.sh Fix shellcheck warnings and notes identified by running shellcheck --severity=style. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-24 08:14:07 +02:00
Fabiano Fidêncio	5ef09c222b	runtime: Fix shellcheck issues in go-test.sh Fix shellcheck warnings and notes identified by running shellcheck --severity=style. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-24 08:14:07 +02:00
Fabiano Fidêncio	c7e3f95883	tests: remove disabled tracing tests and CI job The run-tracing job in basic-ci-amd64.yaml has been disabled (if: false) due to issue #9763, with no path to re-enablement. Remove the job definition and the backing tests/functional/tracing/ directory. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-23 08:46:12 +02:00
Saul Paredes	83bbfedc08	network: preseed default-gateway neighbor This change mirrors host networking into the guest as before, but now also includes the default gateway neighbor entry for each interface. Pods using overlay/synthetic gateways (e.g., 169.254.1.1) can hit a first-connect race while the guest performs the initial ARP. Preseeding the gateway neighbor removes that latency and makes early connections (e.g., to the API Service) deterministic. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-04-20 10:00:19 -07:00
Fabiano Fidêncio	64c139208f	agent: add GetDiagnosticData RPC with termination log support Add a new extensible GetDiagnosticData RPC that retrieves diagnostic information from the guest VM. The request carries a log_type string field to specify what kind of data is requested, and a container_id field to identify the target container. The first supported log_type is "termination_log", which reads the Kubernetes termination message file from inside the guest. This is needed for shared_fs=none configurations where the host cannot directly access the guest filesystem. On the Go runtime side, the container stop() path now calls GetDiagnosticData to copy the termination message to the host when running with NoSharedFS and the terminationMessagePolicy annotation is set to "File". The call is best-effort: failures are logged as warnings rather than blocking container teardown. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2026-04-17 13:01:13 +02:00
Fabiano Fidêncio	661cfd7efa	Merge pull request #12800 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.43.0 build(deps): bump go.opentelemetry.io/otel/sdk from 1.40.0 to 1.43.0 in /src/runtime	2026-04-14 17:22:47 +02:00
Fabiano Fidêncio	b17dd2a902	runtime: Fix concurrent map read/write panic in Wait() Wait() was releasing s.mu immediately after getContainer(), then calling getExec() — which reads c.execs — without holding any lock. Concurrent Exec() or Delete() calls that write to c.execs under s.mu triggered a "concurrent map read and map write" fatal panic. Add a dedicated sync.RWMutex to the container struct that protects the execs map. getExec() now acquires a read lock internally, and all writes go through new setExec()/deleteExec() helpers that acquire the write lock. This keeps the locking concern local to the map and avoids complicating the s.mu usage in Wait(). Add a regression test (TestConcurrentExecAccess) that exercises concurrent getExec reads against setExec/deleteExec writes; this reliably reproduces the panic under the race detector without the fix. Fixes: #12825 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-13 21:14:28 +02:00
dependabot[bot]	b303600283	build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.43.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.43.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.43.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-13 10:36:44 +00:00
Fabiano Fidêncio	6f3c11aec4	Merge pull request #12808 from fidencio/topic/agent-allow-configuring-launch-process-timeout agent: Make launch_process_timeout configurable	2026-04-11 00:36:01 +02:00
Fabiano Fidêncio	7244389ad4	runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs So we can have a better performance by default. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 16:41:34 +02:00
Fabiano Fidêncio	e8f34a2b26	agent: Update protocol This is not related to this PR, but rather to #12734, which ended up not running the `make src/agent generate-protocols`. While here, let's also fix it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 14:47:01 +02:00
Fabiano Fidêncio	36a2d8e7f2	agent: Make launch_process_timeout configurable The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata agent is insufficient for environments with NVIDIA GPUs and NVSwitches, where the attestation-agent needs significantly more time to collect evidence during initialization (e.g. ~2 seconds per NVSwitch). When the timeout expires, the agent (PID 1) exits with an error, causing the guest kernel to perform an orderly shutdown before the attestation-agent has finished starting. Make this timeout configurable via the kernel parameter agent.launch_process_timeout (in seconds), preserving the 6-second default for backward compatibility. The Go runtime is wired up to pass this value from the TOML config's [agent.kata] section through to the kernel command line. The NVIDIA GPU configs set the new default to 15 seconds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-10 14:47:01 +02:00
stevenhorsman	31f9a5461b	versions: bump golang to 1.25.9 Bump the go version to resolve CVEs: - GO-2026-4947 - GO-2026-4946 - GO-2026-4870 - GO-2026-4869 - GO-2026-4865 - GO-2026-4864 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-09 08:59:40 +01:00
Hyounggyu Choi	f15f7f49f1	Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt runtime: qemu: Enable static sandbox resource management on ARM & s390x	2026-04-09 09:18:11 +02:00
Amanda Liem	79f844d057	runtime: SNP img-based rootfs with dm-verity Follow-on to kata-containers/kata-containers#12396 Switch SNP config from initrd-based to image-based rootfs with dm-verity. The runtime assembles the dm-mod.create kernel cmdline from kernel_verity_params, and with kernel-hashes=on the root hash is included in the SNP launch measurement. Also add qemu-snp to the measured rootfs integration test. Signed-off-by: Amanda Liem <aliem@amd.com>	2026-04-08 16:46:32 +00:00
Fabiano Fidêncio	ffab9b7eee	runtime: qemu: Enable static sandbox resource management on ARM runtime-rs lacks several features needed for CPU hotplug on ARM: pflash/UEFI firmware passthrough, SMP topology in -smp, nr_cpus kernel parameter, and QMP vCPU add handling for the virt machine type (which requires core-id only placement with socket/thread/die set to -1). Without static sandbox resource management, these gaps cause failures in tests like k8s-memory.bats where the VM is not correctly sized for the workload. Enable static_sandbox_resource_mgmt for aarch64 in the QEMU runtime-rs configuration so the VM is pre-sized at creation time, sidestepping the need for hotplug entirely. Together with this we're aligning the go runtime to the very same behaviour. Fixes: #10928 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
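
The NUMA sizing arithmetic described in commits 1ee8bb5740 and d0d7deb262 above can be sketched in a few lines of Go. This is only an illustration of the stated rules (Sockets=numNUMA, Cores=ceil(maxvcpus/numNUMA), floor-divided per-node shares with the last node absorbing the remainder), not the runtime's actual code. The type `smpTopology` and the functions `numaAwareTopology` and `splitAcrossNodes` are hypothetical names, and fixing Threads at 1 and rounding MaxCPUs up to Sockets * Cores * Threads are assumptions made here.

```go
package main

import "fmt"

// smpTopology is a hypothetical stand-in for the Sockets/Cores/Threads
// values named in the commit message; it is not the runtime's own type.
type smpTopology struct {
	Sockets, Cores, Threads, MaxCPUs uint32
}

// numaAwareTopology applies the rule from the commit message: with more
// than one NUMA node, use one socket per node and ceil(maxVCPUs/nodes)
// cores per socket; otherwise keep the flat topology (Sockets=maxVCPUs,
// Cores=1). Threads=1 is an assumption of this sketch.
func numaAwareTopology(maxVCPUs, numNUMANodes uint32) smpTopology {
	if numNUMANodes <= 1 {
		return smpTopology{Sockets: maxVCPUs, Cores: 1, Threads: 1, MaxCPUs: maxVCPUs}
	}
	// Ceiling division so uneven counts (e.g. the +1 overhead vCPU) still
	// satisfy MaxCPUs == Sockets * Cores * Threads.
	cores := (maxVCPUs + numNUMANodes - 1) / numNUMANodes
	return smpTopology{
		Sockets: numNUMANodes,
		Cores:   cores,
		Threads: 1,
		MaxCPUs: numNUMANodes * cores,
	}
}

// splitAcrossNodes gives every node a floor-divided share of total and
// lets the last node absorb the remainder.
func splitAcrossNodes(total, nodes uint32) []uint32 {
	if nodes == 0 {
		return nil
	}
	shares := make([]uint32, nodes)
	base := total / nodes
	for i := range shares {
		shares[i] = base
	}
	shares[nodes-1] += total % nodes
	return shares
}

func main() {
	// 8 workload vCPUs plus 1 overhead vCPU on a 2-node guest.
	fmt.Printf("%+v\n", numaAwareTopology(9, 2))
	fmt.Println(splitAcrossNodes(9, 2))
}
```

With 9 vCPUs on 2 nodes, the topology comes out as 2 sockets of 5 cores (MaxCPUs 10), while the per-node split is 4 and 5 vCPUs.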

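The pointer-comparison bug described in commit 5065058d4a above is a common Go pitfall, and a reduced example may help when reading that fix. The sketch below is not the runtime's code: `deviceRule`, `buggyMatch` and `valueMatch` are hypothetical names, the struct is a simplified stand-in for a cgroup device entry, and the type switch mentioned in the commit message is omitted. Only the `intptr()` helper name and the 10:237 numbers for /dev/loop-control come from the log.

```go
package main

import "fmt"

// deviceRule is a hypothetical, simplified cgroup device entry with
// optional (pointer) major and minor numbers.
type deviceRule struct {
	Type  string
	Major *int64
	Minor *int64
}

// intptr returns a pointer to a fresh copy of v on every call.
func intptr(v int64) *int64 { return &v }

// buggyMatch compares pointer addresses. Because intptr allocates a new
// variable each time, the comparison can never be true.
func buggyMatch(d deviceRule, major, minor int64) bool {
	return d.Major == intptr(major) && d.Minor == intptr(minor)
}

// valueMatch is the nil-safe variant: it dereferences the pointers and
// compares the numbers they hold.
func valueMatch(d deviceRule, major, minor int64) bool {
	return d.Major != nil && d.Minor != nil &&
		*d.Major == major && *d.Minor == minor
}

func main() {
	// /dev/loop-control is char 10:237 according to the commit log.
	loopControl := deviceRule{Type: "c", Major: intptr(10), Minor: intptr(237)}

	fmt.Println("pointer comparison:", buggyMatch(loopControl, 10, 237))
	fmt.Println("value comparison:  ", valueMatch(loopControl, 10, 237))
}
```

The first check reports false even though the entry is present, which is how duplicate allowlist entries could arise; the second reports true.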

2299 Commits