kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Alex Lyn	3095bd379b	runtime-rs: Introduce cancellation for OOM watcher during teardown This commit introduces an explicit cancellation mechanism for the OOM watcher loop within VirtSandbox. This addresses the issue where the watcher continues to poll for OOM events even when the sandbox is being stopped, leading to spurious "Connection reset by peer" errors. Key changes: (1) A CancellationToken is added to VirtSandbox to signal the watcher loop when the sandbox is undergoing teardown. (2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a tokio::select! statement. This allows it to concurrently listen for two events: - cancel_token.cancelled(): Triggered when the sandbox/VM is stopping. - agent.get_oom_event(): The regular OOM event polling. (3) In the sandbox stop/teardown path, cancel_token.cancel() is called before stopping the VM. This ensures the OOM watcher loop exits cleanly via the cancellation token, preventing the occurrence of ECONNRESET/EOF errors on a closed channel. This change improves the robustness of OOM event handling during sandbox lifecycle management. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-16 12:56:54 +08:00
Fabiano Fidêncio	87d27e0cc8	kata-deploy-job-dispatcher: add generic per-node Job dispatcher Add a small, deployment-agnostic dispatcher binary that runs exactly one Kubernetes Job per selected node and paces the rollout, so callers get guaranteed per-node coverage without encoding the fan-out in Helm. Motivation: templating one Job per node into a Helm release does not scale (the release Secret hits etcd's 1 MiB limit and hooks run sequentially), and a single Indexed Job cannot guarantee per-node coverage when paced - the scheduler ignores completed pods when evaluating topology spread, so nodes get uneven numbers of pods. A tiny dispatcher that enumerates nodes live and creates node-pinned Jobs itself sidesteps both problems and keeps the Helm release O(1) in fleet size. The dispatcher: - enumerates target nodes live (explicit --nodes list or --node-selector label selector), paginating the API; - stamps out one Job per node from a YAML template, pinning it with nodeName and an owner label for server-side filtering; - keeps at most --parallelism Jobs in flight, refilling as they finish, and sets an OwnerReference to the owner Job so the per-node Jobs are garbage-collected with it; - is a plain API client (kube): it never touches the host, so it can run fully unprivileged. Node membership is resolved live on each run, not frozen at Helm template-render time: re-running the dispatcher (e.g. via `helm upgrade`) picks up nodes added since the last run and skips ones already done, as the per-node stages are idempotent. The dispatcher is one-shot, however - it does not watch the API, so nodes added while it is not running are only covered by the next run. job.rs holds the pure helpers (node-name sanitization, deterministic Job naming, template instantiation, status interpretation) with rstest unit tests; main.rs wires up the CLI and the fan-out loop. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
Alex Lyn	c1ebf269f7	runtime-rs: Add nydus client for nydusd API communication via HTTP Implement NydusClient to interact with nydusd daemon via Unix socket: (1) check_status: query daemon state via GET /api/v1/daemon. (2) mount/umount: manage filesystem mounts via POST/DELETE /api/v1/mount. (3) wait_until_ready: poll daemon until RUNNING state. This provides a lightweight, stateless HTTP client layer for nydusd API. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	6500e018c0	Merge pull request #13093 from RainaYL/rainax/tdx_boot_pr dragonball: Add steps to boot TDX VM	2026-06-09 10:13:57 +08:00
Steve Horsman	2ac6bb173b	Merge pull request #13036 from stevenhorsman/jaeger-to-otlp-tracing-switch trace-forwarder: migrate from Jaeger to OTLP exporter	2026-06-05 14:30:26 +01:00
Steve Horsman	1624ebe362	Merge pull request #13135 from kata-containers/dependabot/cargo/tar-0.4.46 build(deps): bump tar from 0.4.45 to 0.4.46	2026-06-05 09:44:46 +01:00
stevenhorsman	b737ae48bf	trace-forwarder: migrate from Jaeger to OTLP exporter Migrate trace-forwarder from the deprecated opentelemetry-jaeger exporter to the modern opentelemetry-otlp exporter. This change remediates GHSA-2f9f-gq7v-9h6m (CVE-2026-43868), a medium-severity vulnerability in Apache Thrift. The opentelemetry-jaeger crate is no longer maintained and depends on vulnerable thrift versions (0.13.0 and 0.16.0). The opentelemetry-otlp exporter does not use thrift and is actively maintained. Changes: - Replace opentelemetry-jaeger with opentelemetry-otlp in Cargo.toml - Update tracer.rs to use OTLP exporter instead of Jaeger exporter - Replace --jaeger-host/--jaeger-port flags with --otlp-endpoint flag - Update server.rs to use TracerProvider instead of SpanExporter - Update documentation to reflect OTLP migration - Add examples for common OTLP-compatible collectors Breaking change: Users must update their trace-forwarder invocations to use --otlp-endpoint instead of --jaeger-host and --jaeger-port. Default endpoint: http://localhost:4317 (OTLP gRPC) Generated-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-06-04 19:39:47 +01:00
dependabot[bot]	4ab63d0a5d	build(deps): bump tar from 0.4.45 to 0.4.46 Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46. - [Release notes](https://github.com/composefs/tar-rs/releases) - [Commits](https://github.com/composefs/tar-rs/compare/0.4.45...0.4.46) --- updated-dependencies: - dependency-name: tar dependency-version: 0.4.46 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-04 07:52:44 +00:00
dependabot[bot]	d155f1a4ab	build(deps): bump openssl from 0.10.79 to 0.10.80 Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.79 to 0.10.80. - [Release notes](https://github.com/rust-openssl/rust-openssl/releases) - [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.79...openssl-v0.10.80) --- updated-dependencies: - dependency-name: openssl dependency-version: 0.10.80 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-04 07:51:50 +00:00
Fabiano Fidêncio	67843220f8	runtime-rs: set VF admin MAC before vfio-pci rebind for IB/RoCE support Without an admin MAC the guest mlx5_core inherits whatever firmware- default MAC the VF was created with. This MAC differs from the IB port HCA MAC, so mlx5_ib's GID cache refuses to populate /sys/class/infiniband/mlx5_/ports/N/gids/. RoCE appears active but every verb needing a GID fails. Before bind_device_to_vfio(), push the CNI-assigned MAC down to the VF as an "admin MAC" via the parent PF using RTM_SETLINK with IFLA_VFINFO_LIST — the netlink equivalent of ip link set <PF> vf <N> mac <MAC> The operation runs in a spawn_blocking closure that enters the host network namespace (via NetnsGuard("/proc/1/ns/net")), since attach() is called while the thread is inside the pod netns. Best-effort: failures are logged at warn and the existing agent-side MAC reconciliation (update_interface in rpc.rs) remains as a fallback for L2/L3 connectivity. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-29 13:07:45 +02:00
Xiaofan Xxf	4f2e893bdb	dragonball: Add steps to boot TDX VM A few ioctls should be invoked before booting a TDX VM. Major changes: - While calling KVM_CREATE_VM, use KVM_X86_TDX_VM as vm_type argument, instead of 0. - Call KVM_TDX_CAPABILITIES and save the capability info - Call KVM_TDX_INIT_VM before initializing vcpu manager, because TDX module might allow for a different max vcpu number from the KVM context, and only after calling KVM_TDX_INIT_VM, the correct value would be set and can be retrieved via KVM_CHECK_EXTENSION, so that the max vcpu info saved in vcpu manager would be properly initialized. - Call KVM_TDX_INIT_VCPU after creating vcpus and parsing TDVF, because this ioctl requires HOB address as parameter, which is saved in TDVF metadata. - Call KVM_TDX_INIT_MEM_REGION after loading TDVF data, linux kernel, cmdline and HOB list into VM memory. - Call KVM_TDX_FINALIZE_VM after all previous TDX ioctls. Also deleted dbs-tdx crate, because we are now using virtee's tdx crate, instead of maintaining our own utility module. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-05-26 10:35:45 +08:00
Alex Lyn	c3b06af4c7	kata-types: Add gpt_disk module for GPT metadata generation Introduce gpt_disk.rs to compute GPT partition layouts and generate metadata files for multi-layer EROFS rootfs. The module creates GPT head metadata that are combined with EROFS layer images via VMDK descriptors, presenting a single GPT-partitioned virtual disk to the guest VM — each EROFS layer mapped to its own partition. The layout engine calculates LBA positions for an arbitrary number of EROFS layers, then writes a full protective-MBR + GPT image and extracts the head (MBR + primary GPT table) segments as standalone files for VMDK extent assembly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-25 19:08:31 +08:00
Fabiano Fidêncio	291e4d37be	kata-deploy: implement selective tarball extraction in installer Add zstd and tar as Rust dependencies and rewrite the artifact installation logic to extract only the component tarballs required by the enabled runtime classes. extract_component_tarballs reads shim-components.json to determine which kata-static-<name>.tar.zst files are needed for the selected shims and current architecture. Shared components (e.g. kernel, shim-v2-go) are listed by multiple shims and must only be unpacked once per install run. Deduplication is handled with an in-memory set passed through the call, avoiding any risk of stale on-disk state surviving across pod restarts. Within each tarball, opt/kata path prefixes are stripped and absolute symlink / hard-link targets are rewritten to point at the resolved installation directory, correctly handling MULTI_INSTALL_SUFFIX. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-20 20:52:36 +02:00
stevenhorsman	3466f888db	agent-ctl: Move into root workspace - Add agent-ctl to be a workspace member to simplify the dependency management. - Also add a test target as we've been running it in static-checks without it doing anything Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-05-18 09:47:15 +01:00
Alex Lyn	34dc055da3	Merge pull request #12932 from RainaYL/rainax/tdshim_pr dragonball: Allow guest VM to load tdshim firmware for booting	2026-05-18 10:43:22 +08:00
Fabiano Fidêncio	d3a9669be5	runtime-rs: implement EncryptedEmptyDirVolume Add the core volume handler for block-encrypted emptyDir support in runtime-rs, bringing it to parity with the Go runtime (PR #10559). When emptydir_mode is set to "block-encrypted", host emptyDir bind mounts are intercepted and handled as follows: 1. A sparse disk image (disk.img) is created inside the emptyDir folder, sized to match the host filesystem capacity. 2. A mountInfo.json is written under the kata direct-volume root with volume_type "blk", fs_type "ext4", and metadata encryptionKey=ephemeral. 3. The disk image is plugged into the guest VM as a virtio-blk device via the hypervisor device manager. 4. An agent::Storage is built with driver_options containing encryption_key=ephemeral and shared=true, so the kata-agent delegates formatting and encryption to CDH using LUKS2. The volume is registered in the dispatch chain before the regular block-volume check, and ephemeral disk metadata is tracked for sandbox-level cleanup at teardown. Also re-exports EMPTYDIR_MODE_* constants from kata-types::config so downstream crates can reference them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-14 22:56:11 +02:00
Xiaofan Xxf	88d892a77f	dragonball: Allow guest VM to load tdshim firmware for booting Added a firmware module to dbs_boot crate, and guest VM is allowed to load tdshim into memory, which serves as a prerequisite for booting TDX VM. And other sections (including kernel payload and cmdline) are also loaded into correct guest physical addresses according to the design of tdshim layout. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-05-14 10:04:39 +08:00
Fabiano Fidêncio	346119108e	kata-deploy: drop unused kube features The binary doesn't use kube::runtime (controllers, watchers, reflectors) or kube::derive (the CustomResource macro). Pulling them in only added transitive deps (kube-runtime, kube-derive, backon, educe, ahash, async-broadcast, ...) and inflated the binary's static data segment for no functional gain. Set default-features = false and select only what the binary actually calls into: the kube-client surface plus the rustls-tls backend that hyper-rustls already pulled in transitively. Behaviour is unchanged. Fixes: https://github.com/kata-containers/kata-containers/discussions/12976 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-05-07 13:40:55 +02:00
Fabiano Fidêncio	8a33007806	runtime-rs: Add configuration-qemu-nvidia-gpu-tdx-runtime-rs.toml.in Add a new runtime-rs configuration template that combines the NVIDIA GPU cold-plug stack with Intel TDX confidential guest support. This is the runtime-rs counterpart of the Go runtime's configuration-qemu-nvidia-gpu-tdx template. The template merges the GPU NV settings (VFIO cold-plug, Pod Resources API, NV-specific kernel/image/firmware, extended timeouts) with TDX confidential guest settings (confidential_guest, OVMF.inteltdx.fd firmware, TDX Quote Generation Service socket, confidential NV kernel and image). The Makefile is updated with the new config file registration and the FIRMWARETDVFPATH_NV variable pointing to OVMF.inteltdx.fd. Also removes a stray tdx_quote_generation_service_socket_port setting from the SNP GPU template where it did not belong. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-05-07 10:33:26 +02:00
Alex Lyn	4f618d09d5	runtime-rs: Add Pod Resources CDI discovery in sandbox Query the kubelet Pod Resources API during sandbox setup to discover which GPU devices have been allocated to the pod. When cold_plug_vfio is enabled, the sandbox resolves CDI device specs, extracts host PCI addresses and IOMMU groups from sysfs, and creates VfioModernCfg device entries that get passed to the hypervisor for cold-plug. Add pod-resources and cdi crate dependencies to the runtimes and virt_container workspace members. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-07 10:33:26 +02:00
Alex Lyn	e72ed1c12e	runtime-rs: Add VFIO modern device driver Add the VfioDeviceModern driver for VFIO device passthrough in runtime-rs. The driver handles device discovery through sysfs, detects whether the host uses iommufd cdev or legacy VFIO group interfaces, resolves PCI BDF addresses and IOMMU groups, and implements the Device and PCIeDevice traits for hypervisor integration. The module is structured as: - core.rs: sysfs discovery, BDF parsing, IOMMU group resolution, device-node path logic for both iommufd cdev and legacy group paths - device.rs: VfioDeviceModern/VfioDeviceModernHandle types, Device and PCIeDevice trait implementations - mod.rs: host capability detection (iommufd vs legacy), backend selection logic The DeviceType::VfioModern enum variant and stub PCIeTopology methods (reserve_bus_for_device, release_bus_for_device) are added so the driver compiles; full topology wiring follows in a subsequent commit. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-07 10:33:26 +02:00
Alex Lyn	b4768cfc61	dragonball: Adapt VFIO DMA calls to vfio-ioctls 0.6 API The vfio-ioctls 0.6.0 crate changed the vfio_dma_map signature: the host address parameter is now a raw pointer (*mut u8) instead of u64, and the size parameter is usize instead of u64. Since the kernel uses the host address to set up DMA mappings to physical memory — and the caller must guarantee the memory behind that pointer remains valid for the lifetime of the mapping — upstream marked vfio_dma_map as unsafe fn. Wrap vfio_dma_map calls in unsafe blocks and adjust the type casts accordingly. vfio_dma_unmap only needed the usize cast for the size parameter (it does not take a host address, so it remains safe). Bump workspace dependencies: - vfio-bindings 0.6.1 -> 0.6.2 - vfio-ioctls 0.5.0 -> 0.6.0 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-07 10:33:26 +02:00
Alex Lyn	0bb9b66815	kata-sys-util: Add PCI helpers for VFIO cold-plug paths The VFIO cold-plug path needs to resolve a PCI device's sysfs address from its /dev/vfio/ group or iommufd cdev node. Extend the PCI helpers in kata-sys-util to support this: add a function that walks /sys/bus/pci/devices to find a device by its IOMMU group, and expose the guest BDF that the QEMU command line will reference. These helpers are consumed by the runtime-rs hypervisor crate when building VFIO device descriptors for the QEMU command line. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-07 10:33:26 +02:00
Alex Lyn	1e96e75bf3	pod-resources-rs: Add kubelet Pod Resources API client Add a gRPC client crate that speaks the kubelet PodResourcesLister service (v1). The runtime-rs VFIO cold-plug path needs this to discover which GPU devices the kubelet has assigned to a pod so they can be passed through to the guest before the VM boots. The crate is intentionally kept minimal: it wraps the upstream pod_resources.proto, exposes a Unix-domain-socket client, and re-exports the generated types. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-07 10:33:26 +02:00
dependabot[bot]	8cc9325fee	build(deps): bump openssl from 0.10.78 to 0.10.79 Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.78 to 0.10.79. - [Release notes](https://github.com/rust-openssl/rust-openssl/releases) - [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.78...openssl-v0.10.79) --- updated-dependencies: - dependency-name: openssl dependency-version: 0.10.79 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-05-06 10:19:15 +00:00
Fabiano Fidêncio	210ad5de98	runtime-rs: Bump netlinks for Linux 6.17+ IPv6 dev conf RTNetlink Upgrade netlink-packet-route and rtnetlink so IFLA_INET6_CONF matches the kernel's 240-byte layout (DEVCONF_FORCE_FORWARDING). Adapt to API changes: NeighbourAttribute::LinkLayerAddress and bool MulticastSnooping. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-05-05 13:56:44 +02:00
stevenhorsman	efe62c9280	kata-ctl: Move into root workspace Add kata-ctl to be a workspace member to simplify the dependency management. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-30 08:45:27 +01:00
stevenhorsman	7664ebda7e	trace-forwarder: Move into root workspace Add trace-forwarder to be a workspace member to simplify the dependency management. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-29 12:11:04 +01:00
Fabiano Fidêncio	cbd71f534e	kata-sys-util: add oci_docker module for Docker netns detection Docker 26+ with `runtimeType` shims may not include a network namespace in the OCI spec's `linux.namespaces` and instead uses `libnetwork-setkey` hooks to communicate the sandbox ID. Add helpers to detect Docker containers and resolve the netns path from hook arguments, matching the Go runtime's `DockerNetnsPath` and `IsDockerContainer` utilities. Fixes: #9340 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-28 10:20:18 +02:00
Fabiano Fidêncio	74d9d043f0	agent: raise regorus policy length limits regorus 0.9.0 introduced a hard, per-engine ceiling on parsed-policy size (1024 columns / 1 MiB / 20 000 lines, see lexer.rs:30 in microsoft/regorus). The 1024-column cap rejects realistic policies emitted by `genpolicy`: the `NVIDIA_REQUIRE_CUDA` environment variable on `nvcr.io/nvidia/k8s/cuda-sample` is roughly 1.3 KiB on a single line, so the agent's `set_policy()` returns an error, the agent (PID 1) exits, the guest kernel reboots, and the runtime eventually times out connecting to the agent's vsock. regorus PR #624 ("feat: make policy length limits configurable per engine") adds `Engine::set_policy_length_config`, but it has not been released yet -- the latest published version is still 0.9.1, which predates that change. Pin `regorus` to the upstream commit that includes #624 and call the new setter from `AgentPolicy::new_engine()` with values that comfortably fit any policy we expect to evaluate (64 KiB per line, 16 MiB per file, 200 000 lines) while still rejecting pathological/minified input. Once a regorus release > 0.9.1 ships with #624, the dependency can be moved back to crates.io. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-26 10:18:26 +02:00
Markus Rudy	c8fe6a60d0	genpolicy: update regorus to 0.9.1 The version we used before was released in 2024, it's about time to use a newer version. The new version of the crate comes with a license, which addresses a `cargo deny` finding. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-04-26 10:18:26 +02:00
Steve Horsman	fc359d2140	Merge pull request #12901 from kata-containers/dependabot/cargo/openssl-0.10.78 build(deps): bump openssl from 0.10.76 to 0.10.78	2026-04-25 20:59:51 +01:00
dependabot[bot]	151a797fc0	build(deps): bump openssl from 0.10.76 to 0.10.78 Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.76 to 0.10.78. - [Release notes](https://github.com/rust-openssl/rust-openssl/releases) - [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.76...openssl-v0.10.78) --- updated-dependencies: - dependency-name: openssl dependency-version: 0.10.78 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-25 10:28:48 +00:00
stevenhorsman	d6df75853b	versions: Update rustls-webpki to 0.103.13 Simple bump to fix CVE GHSA-82j2-j2ch-gfr8: Denial of service via panic on malformed CRL BIT STRING Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-25 11:27:02 +01:00
Fabiano Fidêncio	e0927e0e0c	Merge pull request #12846 from RainaYL/rainax/split_irqchip_pr dragonball: Implement userspace IOAPIC to enable split irqchip	2026-04-24 19:07:45 +02:00
Anjana A R K	d2e0e277cc	kata-agent: Bump serde-enum-str to v0.5.0 Upgraded the serde-enum-str to v0.5.0 which bumps serde-attributes to 0.3.0 version Signed-off-by: Anjana A R K <anjana.a.r.k1@ibm.com>	2026-04-24 15:57:59 +05:30
Xiaofan Xxf	fd39117a21	dragonball: Implement userspace IOAPIC to enable split irqchip From Linux 6.14, creating a TDX VM requires that split irqchip is enabled. Under this circumstance, device IOAPIC would be managed in userspace, instead of KVM, so a manager is needed to handle MMIO read/write to emulated IOAPIC registers. Also, with split irqchip, irqfd is no longer able to trigger an interrupt after device IO is completed. Instead, KVM_SIGNAL_MSI is used for interrupt triggering. Note that only legacy irq with edge-triggered interrupt is implemented here. And split irqchip feature is only enabled when confidential VM type is set to TDX. Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>	2026-04-24 10:33:05 +08:00
Fupan Li	18378145d2	Merge pull request #12821 from fidencio/topic/runtime-rs-cpu-pinning runtime-rs: Add vCPU thread pinning support	2026-04-23 16:49:18 +08:00
Markus Rudy	639ff3578d	genpolicy: restrict symlinks in CopyFile Allowing arbitrary symlinks in the shared directory is unsafe for confidential VM use cases. In order to make CopyFile safe both for the VM as well for the consuming containers, we implement the following rules for symlinks (in addition to the existing rules for other files): 1. Symlinks may not be placed directly into the shared directory. 2. Symlinks must not point 'upwards', i.e. contain `..` as a path element. 3. Symlinks must be relative. These rules ensure that all writes initiated by CopyFile are restricted to the shared directory (protecting the VM), and that symlinks can't point outside their mount points (protecting the container). These new restrictions mean that we can't support arbitrary mount sources (which might not follow these rules), but the usual k8s suspects (ConfigMap, Secret, ServiceAccountToken) should still pass. In order to aid writing the policy, we convert the CopyFileRequest to a structure that does not contain binary data, but well-defined strings and types. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-04-22 15:46:12 +02:00
Fabiano Fidêncio	48669a894e	runtime-rs: Add vCPU thread pinning support Port the Go runtime's enable_vcpus_pinning feature to runtime-rs. The Go runtime already lets users pin each vCPU thread to a specific host CPU when the vCPU count matches the sandbox cpuset size, using sched_setaffinity. This is useful for latency-sensitive workloads that benefit from eliminating cross-CPU migration of vCPU threads. The approach mirrors the Go implementation: After VM start and on every container add/update/delete, we fetch the vCPU thread IDs (via QMP query-cpus-fast for QEMU), compute the union of all containers' OCI cpusets, and if the two counts match, pin vCPU i to cpuset[i]. If they diverge (hotplug, container removal, etc.) we reset all threads back to the full cpuset so nothing gets stuck on a single core. The pinning check lives in CgroupsResourceInner::update_sandbox_cgroups, which already runs at exactly the right points in the lifecycle. The enable_vcpus_pinning flag flows from the TOML config through CgroupConfig into the cgroup resource layer, and can also be overridden per-pod via the io.katacontainers.config.runtime.enable_vcpus_pinning annotation. The QEMU config templates default to false. The NV GPU configs will get their own default (true) in a follow-up once those templates are added. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-21 12:45:56 +02:00
stevenhorsman	a59afa3154	versions: Update rustls-webpki to 0.103.12 Simple bump to fix CVEs: - RUSTSEC-2026-0098 - RUSTSEC-2026-0099 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-20 16:24:20 +01:00
stevenhorsman	35be1a938d	versions: Bump rand crate where possible Update all versions of rand that are controlled by us to remediate GHSA-cq8v-f236-94qc. Note: There are still some usages of rand 0.8.5 that are from transitive dependencies which we can't currently update: - fail - phf_generator - opentelemetry due to them being archived, or our usage being 17 versions out of date Also update the rand API breakages e.g. : - rand::thread_rng() → rand::rng() (function renamed) - rand::distributions::Alphanumeric → rand::distr::Alphanumeric (module renamed) - rng.gen_range() → rng.random_range() (function renamed) Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-17 15:58:58 +01:00
Fabiano Fidêncio	9e1f595160	kata-deploy: add Rust binary to root workspace Add tools/packaging/kata-deploy/binary as a workspace member, inherit shared dependency versions from the root manifest, and refresh Cargo.lock. Build the kata-deploy image from the repository root: copy the workspace layout into the rust-builder stage, run cargo test/build with -p kata-deploy, and adjust artifact and static asset COPY paths. Update the payload build script to invoke docker buildx with -f .../Dockerfile from the repo root. Add a repo-root .dockerignore to keep the Docker build context smaller. Document running unit tests with cargo test -p kata-deploy from the root. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-07 10:07:06 +08:00
Ruoqing He	2a024f55d0	libs: Move libs into root workspace Remove libs from exclude list, and move them explicitly into root workspace to make sure our core components are in a consistent state. This is a follow up of #12413. Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>	2026-04-06 11:03:38 +02:00
Jiahao Wang	29e5d5d951	build: Move agent to root workspace This commit adds kata agent to the root workspace, as a follow up work of #12413. Remove agent from exclude list, and make it as a member of root workspace. Signed-off-by: Jiahao Wang <jiahao.wang@lingcage.com>	2026-03-29 06:35:38 +00:00
stevenhorsman	9871256771	versions: Bump cloud-hypervisor to v51 In v51 the license was added, so try bumping to this version to solve the cargo deny issue Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-23 10:34:28 +00:00
dependabot[bot]	ef32923461	build(deps): bump tar from 0.4.44 to 0.4.45 Bumps [tar](https://github.com/alexcrichton/tar-rs) from 0.4.44 to 0.4.45. - [Commits](https://github.com/alexcrichton/tar-rs/compare/0.4.44...0.4.45) --- updated-dependencies: - dependency-name: tar dependency-version: 0.4.45 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-03-23 10:34:27 +00:00
stevenhorsman	85e17c2e77	deps: Bump rustls-webpki Bump rustls-webpki to 0.103.10 to remediate RUSTSEC-2026-0049 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-23 10:34:27 +00:00
stevenhorsman	c3868f8e60	deps: Bump aws-lc-rs to 1.16.2 Bump aws-lc-rs, so that aws-lc-sys updates to 0.39.0 to remediate RUSTSEC-2026-0044 and https://osv.dev/vulnerability/RUSTSEC-2026-0048 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-23 10:34:27 +00:00
Fupan Li	608f378bff	dragonball: make sure the nydus's worker thread access network The dragonball vmm thread has joined the pod's netns, which cannot access the network, so make sure the nydus worker thread joins the netns of runD's main thread, which can access the network. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2026-03-22 22:44:24 +08:00
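
The three symlink rules listed in commit 639ff3578d (genpolicy: restrict symlinks in CopyFile) can be illustrated with a small standalone check. This is only a sketch of the stated rules, not the policy code from the repository: the function name `symlink_allowed` and its parameters are hypothetical, and it assumes the symlink location is given as an already-normalized path relative to the shared directory.

```rust
use std::path::{Component, Path};

/// Hypothetical helper (not from the kata-containers source): returns true
/// if a symlink satisfies the three rules from the commit message.
/// `link_path` is the symlink's location relative to the shared directory;
/// `target` is the path the symlink points to.
fn symlink_allowed(link_path: &Path, target: &Path) -> bool {
    // Assumption: the link location must be a plain relative path made of
    // normal components only (no root, no `.` and no `..`).
    if !link_path
        .components()
        .all(|c| matches!(c, Component::Normal(_)))
    {
        return false;
    }
    // Rule 1: the symlink may not be placed directly into the shared
    // directory, so it needs at least one parent directory component.
    if link_path.components().count() < 2 {
        return false;
    }
    // Rule 3: the target must be relative.
    if target.is_absolute() || target.has_root() {
        return false;
    }
    // Rule 2: the target must not contain `..` as a path element.
    !target
        .components()
        .any(|c| matches!(c, Component::ParentDir))
}
```

Under these assumptions, a ConfigMap-style link such as `vol/key` pointing at `..data/key` passes, because `..data` is an ordinary file name and not a `..` path element, while `vol/link` pointing at `../other/file` or at `/etc/passwd` is rejected.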


73 Commits
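
The pinning decision described in commit 48669a894e (runtime-rs: Add vCPU thread pinning support) can be sketched as a pure function: take the union of the containers' cpusets, pin vCPU i to cpuset[i] when the counts match, and otherwise let every vCPU thread run on the full cpuset. The names `union_cpusets` and `plan_vcpu_affinity` are hypothetical and the code below is not taken from runtime-rs; it only computes the plan and leaves the actual sched_setaffinity calls out.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: sorted, de-duplicated union of all containers' cpusets.
fn union_cpusets(cpusets: &[Vec<usize>]) -> Vec<usize> {
    let merged: BTreeSet<usize> = cpusets.iter().flatten().copied().collect();
    merged.into_iter().collect()
}

/// Hypothetical helper: for each vCPU thread ID, the host CPUs it may run on.
/// If the vCPU count equals the cpuset size, vCPU i is pinned to cpuset[i];
/// otherwise every thread is reset to the full cpuset.
fn plan_vcpu_affinity(vcpu_tids: &[u32], cpuset: &[usize]) -> Vec<(u32, Vec<usize>)> {
    if vcpu_tids.len() == cpuset.len() {
        // Counts match: one-to-one pinning.
        vcpu_tids
            .iter()
            .zip(cpuset)
            .map(|(&tid, &cpu)| (tid, vec![cpu]))
            .collect()
    } else {
        // Counts diverge (hotplug, container removal, ...): no pinning.
        vcpu_tids
            .iter()
            .map(|&tid| (tid, cpuset.to_vec()))
            .collect()
    }
}
```

Keeping the decision separate from the system call makes the fallback case easy to test: with a cpuset of three CPUs and only two vCPU threads, both threads receive the whole cpuset instead of being pinned.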