79 Commits

Author SHA1 Message Date
stevenhorsman
d09d1959c2 libs: Update mem-agent to use nix workspace version
Now that the workspace version has been updated,
switch the mem-agent to pick up the new workspace version

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-19 03:49:16 -07:00
stevenhorsman
531877f28f deps: Upgrade nix crate from 0.26.4 to 0.31.3
Upgrade the nix crate across the workspace to version 0.30.1 to address
security vulnerabilities and adopt safer file descriptor handling patterns.

### Breaking Changes in nix 0.28.0

1. **File Descriptor Type Changes**
   - Functions now return `OwnedFd` instead of `RawFd` (i32)
   - Functions requiring file descriptors now expect types implementing `AsFd` trait
   - This provides RAII-based automatic cleanup and prevents fd leaks

2. **API Signature Changes**
   - `pipe()`, `pipe2()`, `openpty()` now return `OwnedFd` tuples
   - `socket()` returns `OwnedFd` instead of `RawFd`
   - `open()`, `memfd_create()` return `OwnedFd`
   - `setns()`, `write()`, `fcntl()` require `AsFd` trait
   - `madvise()` requires `NonNull<c_void>` instead of raw pointer
   - `bind()`, `listen()`, `connect()` require `AsFd` and `Backlog` type

3. **Module Feature Flags**
   - Modules now require explicit feature flags (mman, reboot, etc.)

### Additional Breaking Changes in nix 0.30.1

1. **symlinkat() API Change**
   - `dirfd` parameter now requires `AsFd` trait instead of `Option<RawFd>`
   - Use `BorrowedFd::borrow_raw(libc::AT_FDCWD)` for current directory

2. **Type Alias Deprecation**
   - `MemFdCreateFlag` renamed to `MFdFlags` for consistency

### Changes Made

**Workspace Configuration (Cargo.toml)**
- Updated nix to 0.30.1 with features: fs, mount, sched, process, ioctl,
  signal, socket, feature, user, hostname, term, event, mman, reboot

**File Descriptor Handling Patterns**
- Use `BorrowedFd::borrow_raw(raw_fd)` to wrap RawFd for AsFd requirements
- Use `.as_fd().as_raw_fd()` to extract raw fd without ownership transfer
- Use `.into_raw_fd()` only when ownership transfer is needed
- Use `NonNull::new().unwrap()` for madvise pointer conversion
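
The patterns above can be illustrated with the standard library alone, since `OwnedFd`, `BorrowedFd` and `AsFd` live in `std::os::fd`. This is a minimal sketch and not code from this commit: `raw_number`, `checked_ptr` and `demo` are hypothetical helpers, and `/dev/null` only stands in for a descriptor that a nix call would hand back.

```rust
use std::ffi::c_void;
use std::fs::File;
use std::io;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
use std::ptr::NonNull;

/// Hypothetical stand-in for an API that requires `AsFd`: it only borrows
/// the descriptor, so the caller keeps ownership and the fd stays open.
pub fn raw_number<Fd: AsFd>(fd: Fd) -> RawFd {
    fd.as_fd().as_raw_fd()
}

/// Pointer conversion in the style needed for `madvise`: a null pointer is
/// rejected up front instead of being passed on.
pub fn checked_ptr(ptr: *mut c_void) -> Option<NonNull<c_void>> {
    NonNull::new(ptr)
}

/// Walks one descriptor through the three patterns and returns the raw
/// number seen at each step (all three refer to the same descriptor).
pub fn demo() -> io::Result<(RawFd, RawFd, RawFd)> {
    // Assumption: /dev/null stands in for any call that yields an OwnedFd.
    let owned: OwnedFd = File::open("/dev/null")?.into();

    // 1. Borrow without transferring ownership.
    let borrowed_number = raw_number(&owned);

    // 2. Wrap a legacy RawFd for an AsFd API. This is only sound while the
    //    descriptor is open, which `owned` guarantees here.
    let legacy: RawFd = owned.as_raw_fd();
    let wrapped = unsafe { BorrowedFd::borrow_raw(legacy) };
    let wrapped_number = raw_number(wrapped);

    // 3. Transfer ownership out: from here on nothing closes the fd for us.
    let released: RawFd = owned.into_raw_fd();

    // Re-adopt it so that this sketch does not leak the descriptor.
    drop(unsafe { OwnedFd::from_raw_fd(released) });

    Ok((borrowed_number, wrapped_number, released))
}
```

Preferring `.as_fd()` over `.into_raw_fd()` keeps the RAII cleanup intact; the latter is only appropriate when another owner takes over closing the descriptor.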

**Deprecated API Replacements**
- `eventfd()` → `EventFd::from_value_and_flags()`
- `Errno::from_i32()` → `Errno::from_raw()`
- `listen(fd, backlog)` → `listen(&fd, Backlog::new(backlog).unwrap())`
- `MemFdCreateFlag` → `MFdFlags`

Generated-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-19 03:49:16 -07:00
stevenhorsman
2b8b09469d dragonball: Use workspace nix version
See if we can sync to use the workspace version for easier
dependency management

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-19 03:49:16 -07:00
Charlotte Hartmann Paludo
b4be5fdcca runtime-rs: change safe-path dependency from crates.io to workspace
`safe-path` is resolved from the local workspace in all other workspace
member crates. This commit changes the dependency to a local one for
runtime-rs as well.

Signed-off-by: Charlotte Hartmann Paludo <git@charlotteharludo.com>
Co-authored-by: Markus Rudy <mr@edgeless.systems>
2026-06-18 06:32:06 +02:00
Fabiano Fidêncio
3ca5742338 Merge pull request #13129 from pmores/fix-default_memory_annontation
runtime-rs: fix default_memory annotation processing
2026-06-16 18:11:19 +02:00
Pavel Mores
9b31e06c20 runtime-rs: bump the byte-unit dependency version
The unit tests added by the previous commit exposed a malfunction of the
byte-unit crate on big-endian systems(*), causing s390x CI to fail.
Bump the dependency's version to include a fix.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-06-16 13:15:23 +02:00
Alex Lyn
3095bd379b runtime-rs: Introduce cancellation for OOM watcher during teardown
This commit introduces an explicit cancellation mechanism for the OOM
watcher loop within VirtSandbox. This addresses the issue where the
watcher continues to poll for OOM events even when the sandbox is being
stopped, leading to spurious "Connection reset by peer" errors.

Key changes:
(1) A CancellationToken is added to VirtSandbox to signal the watcher
loop when the sandbox is undergoing teardown.
(2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a
tokio::select! statement. This allows it to concurrently listen for
two events:
- cancel_token.cancelled(): Triggered when the sandbox/VM is stopping.
- agent.get_oom_event(): The regular OOM event polling.
(3) In the sandbox stop/teardown path, cancel_token.cancel() is called
before stopping the VM. This ensures the OOM watcher loop exits cleanly
via the cancellation token, preventing the occurrence of ECONNRESET/EOF
errors on a closed channel.

This change improves the robustness of OOM event handling during sandbox
lifecycle management.
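
The ordering argument in (3) can be sketched without tokio. The following is a standard-library analogue, not the code of this commit: an `AtomicBool` plays the role of the `CancellationToken`, `recv_timeout` on an `mpsc` channel stands in for `agent.get_oom_event()`, and `watch_oom` and `WatcherExit` are hypothetical names. The point it tries to show is that a cancellation observed before the event source goes away lets the loop exit cleanly instead of reporting a closed channel.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Why the watcher loop stopped.
#[derive(Debug, PartialEq, Eq)]
pub enum WatcherExit {
    /// Teardown was signalled first: the clean path.
    Cancelled,
    /// The event source vanished underneath us: the spurious-error path.
    ChannelClosed,
}

/// Polls for OOM events until cancelled or until the channel closes.
/// Container IDs of received events are appended to `seen`.
pub fn watch_oom(
    events: &Receiver<String>,
    cancelled: &Arc<AtomicBool>,
    seen: &mut Vec<String>,
) -> WatcherExit {
    loop {
        // Check the cancellation flag before touching the event source.
        if cancelled.load(Ordering::Acquire) {
            return WatcherExit::Cancelled;
        }
        // Assumption: a short timeout keeps the flag check responsive.
        match events.recv_timeout(Duration::from_millis(10)) {
            Ok(container_id) => seen.push(container_id),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return WatcherExit::ChannelClosed,
        }
    }
}
```

With `tokio::select!` both branches are awaited concurrently, so no polling interval is needed; the flag-and-timeout loop here only approximates that behaviour.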

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
Fabiano Fidêncio
87d27e0cc8 kata-deploy-job-dispatcher: add generic per-node Job dispatcher
Add a small, deployment-agnostic dispatcher binary that runs exactly one
Kubernetes Job per selected node and paces the rollout, so callers get
guaranteed per-node coverage without encoding the fan-out in Helm.

Motivation: templating one Job per node into a Helm release does not
scale (the release Secret hits etcd's 1 MiB limit and hooks run
sequentially), and a single Indexed Job cannot guarantee per-node
coverage when paced - the scheduler ignores completed pods when
evaluating topology spread, so nodes get uneven numbers of pods. A tiny
dispatcher that enumerates nodes live and creates node-pinned Jobs itself
sidesteps both problems and keeps the Helm release O(1) in fleet size.

The dispatcher:
  - enumerates target nodes live (explicit --nodes list or
    --node-selector label selector), paginating the API;
  - stamps out one Job per node from a YAML template, pinning it with
    nodeName and an owner label for server-side filtering;
  - keeps at most --parallelism Jobs in flight, refilling as they finish,
    and sets an OwnerReference to the owner Job so the per-node Jobs are
    garbage-collected with it;
  - is a plain API client (kube): it never touches the host, so it can
    run fully unprivileged.

Node membership is resolved live on each run, not frozen at Helm
template-render time: re-running the dispatcher (e.g. via `helm upgrade`)
picks up nodes added since the last run and skips ones already done, as
the per-node stages are idempotent. The dispatcher is one-shot, however
- it does not watch the API, so nodes added while it is not running are
only covered by the next run.

job.rs holds the pure helpers (node-name sanitization, deterministic Job
naming, template instantiation, status interpretation) with rstest unit
tests; main.rs wires up the CLI and the fan-out loop.
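
As an illustration of what pure helpers of this kind might look like, here is a minimal sketch of node-name sanitization and deterministic Job naming. It is not the code in job.rs: the function names, the FNV-1a suffix and the 63-character budget (the DNS-1123 label limit) are assumptions chosen for the example.

```rust
/// DNS-1123 labels are limited to 63 characters.
const MAX_LABEL_LEN: usize = 63;

/// Lowercases the node name and collapses every run of characters outside
/// [a-z0-9] into a single '-', trimming dashes at both ends.
pub fn sanitize_node_name(node: &str) -> String {
    let mut out = String::with_capacity(node.len());
    for c in node.chars() {
        let c = c.to_ascii_lowercase();
        if c.is_ascii_lowercase() || c.is_ascii_digit() {
            out.push(c);
        } else if !out.ends_with('-') {
            out.push('-');
        }
    }
    out.trim_matches('-').to_string()
}

/// 32-bit FNV-1a. Assumption: any stable hash would do; this one is used
/// because it needs no dependencies and does not change between runs.
fn fnv1a32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for byte in data {
        hash ^= u32::from(*byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Builds "<prefix>-<sanitized node>-<hash of raw node name>", truncating
/// the middle part so that the result fits into one DNS-1123 label.
pub fn job_name(prefix: &str, node: &str) -> String {
    let suffix = format!("{:08x}", fnv1a32(node.as_bytes()));
    // Two separators plus prefix and suffix are reserved up front.
    let budget = MAX_LABEL_LEN.saturating_sub(prefix.len() + suffix.len() + 2);
    let mut body = sanitize_node_name(node);
    body.truncate(budget); // safe: the sanitized name is pure ASCII
    let body = body.trim_end_matches('-');
    format!("{prefix}-{body}-{suffix}")
}
```

Hashing the raw node name rather than the sanitized one keeps two nodes apart even when they sanitize to the same string, and the same node always maps to the same Job name, which is what makes a re-run able to recognise work that is already done.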

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Alex Lyn
c1ebf269f7 runtime-rs: Add nydus client for nydusd API communication via HTTP
Implement NydusClient to interact with nydusd daemon via Unix
socket:
(1) check_status: query daemon state via GET /api/v1/daemon.
(2) mount/umount: manage filesystem mounts via POST/DELETE
  /api/v1/mount.
(3) wait_until_ready: poll daemon until RUNNING state.

This provides a lightweight, stateless HTTP client layer for nydusd
API.
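
The polling behaviour of wait_until_ready can be sketched independently of the HTTP transport. The snippet below is an assumption-laden illustration, not the NydusClient code: `DaemonState` is a hypothetical enum, the status check is passed in as a closure instead of a real GET /api/v1/daemon call, and the interval and timeout parameters are invented for the example.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical subset of daemon states; only RUNNING matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DaemonState {
    Init,
    Ready,
    Running,
}

/// Calls `check_status` until it reports `Running` or `timeout` elapses.
/// Returns the number of attempts that were needed.
pub fn wait_until_ready<F>(
    mut check_status: F,
    interval: Duration,
    timeout: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Result<DaemonState, String>,
{
    let deadline = Instant::now() + timeout;
    let mut attempts: u32 = 0;
    loop {
        attempts += 1;
        // Errors are treated like "not ready yet": the socket may simply
        // not exist while the daemon is still starting.
        if let Ok(DaemonState::Running) = check_status() {
            return Ok(attempts);
        }
        if Instant::now() >= deadline {
            return Err(format!("daemon not RUNNING after {attempts} attempts"));
        }
        thread::sleep(interval);
    }
}
```

Keeping the status check behind a closure means the loop holds no state of its own, in line with the stateless client layer described above.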

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Alex Lyn
6500e018c0 Merge pull request #13093 from RainaYL/rainax/tdx_boot_pr
dragonball: Add steps to boot TDX VM
2026-06-09 10:13:57 +08:00
Steve Horsman
2ac6bb173b Merge pull request #13036 from stevenhorsman/jaeger-to-otlp-tracing-switch
trace-forwarder: migrate from Jaeger to OTLP exporter
2026-06-05 14:30:26 +01:00
Steve Horsman
1624ebe362 Merge pull request #13135 from kata-containers/dependabot/cargo/tar-0.4.46
build(deps): bump tar from 0.4.45 to 0.4.46
2026-06-05 09:44:46 +01:00
stevenhorsman
b737ae48bf trace-forwarder: migrate from Jaeger to OTLP exporter
Migrate trace-forwarder from the deprecated opentelemetry-jaeger
exporter to the modern opentelemetry-otlp exporter.

This change remediates GHSA-2f9f-gq7v-9h6m (CVE-2026-43868), a
medium-severity vulnerability in Apache Thrift. The opentelemetry-jaeger
crate is no longer maintained and depends on vulnerable thrift versions
(0.13.0 and 0.16.0). The opentelemetry-otlp exporter does not use thrift
and is actively maintained.

Changes:
- Replace opentelemetry-jaeger with opentelemetry-otlp in Cargo.toml
- Update tracer.rs to use OTLP exporter instead of Jaeger exporter
- Replace --jaeger-host/--jaeger-port flags with --otlp-endpoint flag
- Update server.rs to use TracerProvider instead of SpanExporter
- Update documentation to reflect OTLP migration
- Add examples for common OTLP-compatible collectors

Breaking change: Users must update their trace-forwarder invocations
to use --otlp-endpoint instead of --jaeger-host and --jaeger-port.

Default endpoint: http://localhost:4317 (OTLP gRPC)

Generated-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-06-04 19:39:47 +01:00
dependabot[bot]
4ab63d0a5d build(deps): bump tar from 0.4.45 to 0.4.46
Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46.
- [Release notes](https://github.com/composefs/tar-rs/releases)
- [Commits](https://github.com/composefs/tar-rs/compare/0.4.45...0.4.46)

---
updated-dependencies:
- dependency-name: tar
  dependency-version: 0.4.46
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-04 07:52:44 +00:00
dependabot[bot]
d155f1a4ab build(deps): bump openssl from 0.10.79 to 0.10.80
Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.79 to 0.10.80.
- [Release notes](https://github.com/rust-openssl/rust-openssl/releases)
- [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.79...openssl-v0.10.80)

---
updated-dependencies:
- dependency-name: openssl
  dependency-version: 0.10.80
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-04 07:51:50 +00:00
Fabiano Fidêncio
67843220f8 runtime-rs: set VF admin MAC before vfio-pci rebind for IB/RoCE support
Without an admin MAC the guest mlx5_core inherits whatever firmware-
default MAC the VF was created with. This MAC differs from the IB port
HCA MAC, so mlx5_ib's GID cache refuses to populate
/sys/class/infiniband/mlx5_*/ports/N/gids/*. RoCE appears active but
every verb needing a GID fails.

Before bind_device_to_vfio(), push the CNI-assigned MAC down to the VF
as an "admin MAC" via the parent PF using RTM_SETLINK with
IFLA_VFINFO_LIST — the netlink equivalent of
  ip link set <PF> vf <N> mac <MAC>

The operation runs in a spawn_blocking closure that enters the host
network namespace (via NetnsGuard("/proc/1/ns/net")), since attach() is
called while the thread is inside the pod netns.

Best-effort: failures are logged at warn and the existing agent-side MAC
reconciliation (update_interface in rpc.rs) remains as a fallback for
L2/L3 connectivity.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-29 13:07:45 +02:00
Xiaofan Xxf
4f2e893bdb dragonball: Add steps to boot TDX VM
A few ioctls should be invoked before booting a TDX VM.

Major changes:
- While calling KVM_CREATE_VM, use KVM_X86_TDX_VM as vm_type
argument, instead of 0.
- Call KVM_TDX_CAPABILITIES and save the capability info
- Call KVM_TDX_INIT_VM before initializing vcpu manager, because
TDX module might allow for a different max vcpu number from the
KVM context, and only after calling KVM_TDX_INIT_VM, the correct
value would be set and can be retrieved via KVM_CHECK_EXTENSION,
so that the max vcpu info saved in vcpu manager would be properly
initialized.
- Call KVM_TDX_INIT_VCPU after creating vcpus and parsing TDVF,
because this ioctl requires HOB address as parameter, which is
saved in TDVF metadata.
- Call KVM_TDX_INIT_MEM_REGION after loading TDVF data, linux
kernel, cmdline and HOB list into VM memory.
- Call KVM_TDX_FINALIZE_VM after all previous TDX ioctls.

Also deleted dbs-tdx crate, because we are now using virtee's
tdx crate, instead of maintaining our own utility module.

Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
2026-05-26 10:35:45 +08:00
Alex Lyn
c3b06af4c7 kata-types: Add gpt_disk module for GPT metadata generation
Introduce gpt_disk.rs to compute GPT partition layouts and generate
metadata files for multi-layer EROFS rootfs. The module creates GPT
head metadata that are combined with EROFS layer images via VMDK
descriptors, presenting a single GPT-partitioned virtual disk to the
guest VM — each EROFS layer mapped to its own partition.

The layout engine calculates LBA positions for an arbitrary number of
EROFS layers, then writes a full protective-MBR + GPT image and extracts
the head (MBR + primary GPT table) segments as standalone files for
VMDK extent assembly.
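
To make the LBA calculation concrete, here is a minimal sketch of a layout function. It is not the gpt_disk.rs implementation: the names are hypothetical, and the 512-byte sector size and 1 MiB partition alignment are assumptions. The one fixed fact it relies on is the standard primary GPT layout with 512-byte sectors: LBA 0 holds the protective MBR, LBA 1 the GPT header, and LBAs 2 to 33 the 128-entry partition array, so the first usable LBA is 34.

```rust
/// Assumption: 512-byte logical sectors.
const SECTOR_SIZE: u64 = 512;
/// First LBA after protective MBR, GPT header and 128 x 128-byte entries.
const FIRST_USABLE_LBA: u64 = 34;
/// Assumption: partitions start on 1 MiB boundaries (2048 sectors).
const ALIGN_SECTORS: u64 = 2048;

/// Inclusive LBA range of one partition.
#[derive(Debug, PartialEq, Eq)]
pub struct PartitionRange {
    pub first_lba: u64,
    pub last_lba: u64,
}

/// Rounds `value` up to the next multiple of `multiple`.
fn round_up(value: u64, multiple: u64) -> u64 {
    (value + multiple - 1) / multiple * multiple
}

/// Assigns one aligned partition per layer, in order, given the layer
/// image sizes in bytes.
pub fn layout(layer_sizes: &[u64]) -> Vec<PartitionRange> {
    let mut next_free = round_up(FIRST_USABLE_LBA, ALIGN_SECTORS);
    let mut partitions = Vec::with_capacity(layer_sizes.len());
    for &bytes in layer_sizes {
        // A partition covers whole sectors and is never empty.
        let sectors = (round_up(bytes, SECTOR_SIZE) / SECTOR_SIZE).max(1);
        let first_lba = next_free;
        let last_lba = first_lba + sectors - 1;
        partitions.push(PartitionRange { first_lba, last_lba });
        next_free = round_up(last_lba + 1, ALIGN_SECTORS);
    }
    partitions
}
```

For layers of 1 MiB and 1.5 MiB this yields the ranges 2048 to 4095 and 4096 to 7167; a backup GPT at the end of the disk is left out of the sketch.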

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
Fabiano Fidêncio
291e4d37be kata-deploy: implement selective tarball extraction in installer
Add zstd and tar as Rust dependencies and rewrite the artifact
installation logic to extract only the component tarballs required by
the enabled runtime classes.

extract_component_tarballs reads shim-components.json to determine which
kata-static-<name>.tar.zst files are needed for the selected shims and
current architecture.  Shared components (e.g. kernel, shim-v2-go) are
listed by multiple shims and must only be unpacked once per install run.
Deduplication is handled with an in-memory set passed through the call,
avoiding any risk of stale on-disk state surviving across pod restarts.

Within each tarball, opt/kata path prefixes are stripped and absolute
symlink / hard-link targets are rewritten to point at the resolved
installation directory, correctly handling MULTI_INSTALL_SUFFIX.
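
The deduplication and path rewriting described above can be sketched as two small pure functions. This is an illustration under assumptions, not the installer code: the function names, the map shape standing in for shim-components.json, and the example install directory are hypothetical.

```rust
use std::collections::{BTreeMap, HashSet};
use std::path::{Path, PathBuf};

/// Returns the components that still need unpacking for the selected shims,
/// in first-seen order. `seen` lives only in memory for one install run, so
/// a shared component is returned exactly once.
pub fn pending_components(
    selected_shims: &[&str],
    shim_components: &BTreeMap<&str, Vec<&str>>,
    seen: &mut HashSet<String>,
) -> Vec<String> {
    let mut pending = Vec::new();
    for shim in selected_shims {
        for component in shim_components.get(shim).into_iter().flatten() {
            // `insert` returns false when the component was already handled.
            if seen.insert(component.to_string()) {
                pending.push(component.to_string());
            }
        }
    }
    pending
}

/// Strips a leading "/" and the "opt/kata" prefix from a tarball entry (or
/// from an absolute link target) and re-roots it under `install_dir`.
pub fn relocate(entry: &Path, install_dir: &Path) -> PathBuf {
    let rel = entry.strip_prefix("/").unwrap_or(entry);
    // strip_prefix works on whole components, so "opt/kata-foo" is kept.
    let rel = rel.strip_prefix("opt/kata").unwrap_or(rel);
    install_dir.join(rel)
}
```

Passing the set through the call, rather than recording progress on disk, is what avoids stale state after a pod restart: every run starts from an empty set.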

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-20 20:52:36 +02:00
stevenhorsman
3466f888db agent-ctl: Move into root workspace
- Add agent-ctl to be a workspace member to simplify the
dependency management.
- Also add a test target as we've been running it in static-checks
without it doing anything

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-18 09:47:15 +01:00
Alex Lyn
34dc055da3 Merge pull request #12932 from RainaYL/rainax/tdshim_pr
dragonball: Allow guest VM to load tdshim firmware for booting
2026-05-18 10:43:22 +08:00
Fabiano Fidêncio
d3a9669be5 runtime-rs: implement EncryptedEmptyDirVolume
Add the core volume handler for block-encrypted emptyDir support
in runtime-rs, bringing it to parity with the Go runtime (PR #10559).

When emptydir_mode is set to "block-encrypted", host emptyDir bind
mounts are intercepted and handled as follows:

  1. A sparse disk image (disk.img) is created inside the emptyDir
     folder, sized to match the host filesystem capacity.
  2. A mountInfo.json is written under the kata direct-volume root
     with volume_type "blk", fs_type "ext4", and metadata
     encryptionKey=ephemeral.
  3. The disk image is plugged into the guest VM as a virtio-blk
     device via the hypervisor device manager.
  4. An agent::Storage is built with driver_options containing
     encryption_key=ephemeral and shared=true, so the kata-agent
     delegates formatting and encryption to CDH using LUKS2.
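
Step 1 can be sketched with the standard library. This is a minimal illustration, assuming the caller has already determined the capacity; `create_sparse_image` is a hypothetical name and not the function used in this commit.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::{Path, PathBuf};

/// Creates `disk.img` inside `empty_dir` with a logical size of
/// `size_bytes`. No data is written: `set_len` only extends the file, so on
/// filesystems that support holes the image occupies almost no disk space
/// until the guest writes to it.
pub fn create_sparse_image(empty_dir: &Path, size_bytes: u64) -> io::Result<PathBuf> {
    let path = empty_dir.join("disk.img");
    let file = OpenOptions::new()
        .write(true)
        // Refuse to reuse an existing image (assumption made for the sketch).
        .create_new(true)
        .open(&path)?;
    file.set_len(size_bytes)?;
    Ok(path)
}
```

Because the image is sparse, sizing it to the full capacity of the host filesystem does not reserve that capacity up front.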

The volume is registered in the dispatch chain before the regular
block-volume check, and ephemeral disk metadata is tracked for
sandbox-level cleanup at teardown.

Also re-exports EMPTYDIR_MODE_* constants from kata-types::config
so downstream crates can reference them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-14 22:56:11 +02:00
Xiaofan Xxf
88d892a77f dragonball: Allow guest VM to load tdshim firmware for booting
Added a firmware module to the dbs_boot crate so that the guest VM can
load tdshim into memory, which is a prerequisite for booting a TDX VM.
The other sections (including the kernel payload and cmdline) are also
loaded at the correct guest physical addresses according to the tdshim
layout.

Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
2026-05-14 10:04:39 +08:00
Fabiano Fidêncio
346119108e kata-deploy: drop unused kube features
The binary doesn't use kube::runtime (controllers, watchers, reflectors)
or kube::derive (the CustomResource macro). Pulling them in only added
transitive deps (kube-runtime, kube-derive, backon, educe, ahash,
async-broadcast, ...) and inflated the binary's static data segment for
no functional gain.

Set default-features = false and select only what the binary actually
calls into: the kube-client surface plus the rustls-tls backend that
hyper-rustls already pulled in transitively. Behaviour is unchanged.

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
8a33007806 runtime-rs: Add configuration-qemu-nvidia-gpu-tdx-runtime-rs.toml.in
Add a new runtime-rs configuration template that combines the NVIDIA GPU
cold-plug stack with Intel TDX confidential guest support. This is the
runtime-rs counterpart of the Go runtime's configuration-qemu-nvidia-gpu-tdx
template.

The template merges the GPU NV settings (VFIO cold-plug, Pod Resources API,
NV-specific kernel/image/firmware, extended timeouts) with TDX confidential
guest settings (confidential_guest, OVMF.inteltdx.fd firmware, TDX Quote
Generation Service socket, confidential NV kernel and image).

The Makefile is updated with the new config file registration and the
FIRMWARETDVFPATH_NV variable pointing to OVMF.inteltdx.fd.

Also removes a stray tdx_quote_generation_service_socket_port setting
from the SNP GPU template where it did not belong.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-07 10:33:26 +02:00
Alex Lyn
4f618d09d5 runtime-rs: Add Pod Resources CDI discovery in sandbox
Query the kubelet Pod Resources API during sandbox setup to discover
which GPU devices have been allocated to the pod. When cold_plug_vfio
is enabled, the sandbox resolves CDI device specs, extracts host PCI
addresses and IOMMU groups from sysfs, and creates VfioModernCfg
device entries that get passed to the hypervisor for cold-plug.

Add pod-resources and cdi crate dependencies to the runtimes and
virt_container workspace members.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
Alex Lyn
e72ed1c12e runtime-rs: Add VFIO modern device driver
Add the VfioDeviceModern driver for VFIO device passthrough in
runtime-rs. The driver handles device discovery through sysfs, detects
whether the host uses iommufd cdev or legacy VFIO group interfaces,
resolves PCI BDF addresses and IOMMU groups, and implements the Device
and PCIeDevice traits for hypervisor integration.

The module is structured as:
- core.rs: sysfs discovery, BDF parsing, IOMMU group resolution,
  device-node path logic for both iommufd cdev and legacy group paths
- device.rs: VfioDeviceModern/VfioDeviceModernHandle types, Device
  and PCIeDevice trait implementations
- mod.rs: host capability detection (iommufd vs legacy), backend
  selection logic

The DeviceType::VfioModern enum variant and stub PCIeTopology methods
(reserve_bus_for_device, release_bus_for_device) are added so the
driver compiles; full topology wiring follows in a subsequent commit.
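
As an illustration of the BDF parsing that core.rs is described as handling, here is a minimal sketch. It is not the driver's code: the `Bdf` type and `parse_bdf` are hypothetical, and accepting the short `bus:device.function` form with an implied domain of 0 is an assumption.

```rust
/// A PCI address in domain:bus:device.function form.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Bdf {
    pub domain: u16,
    pub bus: u8,
    pub device: u8,
    pub function: u8,
}

/// Parses "dddd:bb:dd.f" or "bb:dd.f" (all fields hexadecimal).
/// Returns `None` for malformed input or out-of-range fields.
pub fn parse_bdf(s: &str) -> Option<Bdf> {
    let (rest, function) = s.rsplit_once('.')?;
    let mut parts = rest.split(':');
    let (domain, bus, device) = match (parts.next(), parts.next(), parts.next(), parts.next()) {
        // Full form with an explicit domain.
        (Some(d), Some(b), Some(v), None) => (u16::from_str_radix(d, 16).ok()?, b, v),
        // Short form: assume domain 0.
        (Some(b), Some(v), None, None) => (0, b, v),
        _ => return None,
    };
    let bus = u8::from_str_radix(bus, 16).ok()?;
    let device = u8::from_str_radix(device, 16).ok()?;
    let function = u8::from_str_radix(function, 16).ok()?;
    // PCI allows 32 devices per bus and 8 functions per device.
    if device > 0x1f || function > 7 {
        return None;
    }
    Some(Bdf { domain, bus, device, function })
}
```

For example, `parse_bdf("0000:3b:00.0")` yields domain 0, bus 0x3b, device 0 and function 0, while a device number of 0x20 is rejected.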

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
Alex Lyn
b4768cfc61 dragonball: Adapt VFIO DMA calls to vfio-ioctls 0.6 API
The vfio-ioctls 0.6.0 crate changed the vfio_dma_map signature: the
host address parameter is now a raw pointer (*mut u8) instead of u64,
and the size parameter is usize instead of u64. Since the kernel uses
the host address to set up DMA mappings to physical memory — and the
caller must guarantee the memory behind that pointer remains valid for
the lifetime of the mapping — upstream marked vfio_dma_map as unsafe fn.

Wrap vfio_dma_map calls in unsafe blocks and adjust the type casts
accordingly. vfio_dma_unmap only needed the usize cast for the size
parameter (it does not take a host address, so it remains safe).

Bump workspace dependencies:
- vfio-bindings 0.6.1 -> 0.6.2
- vfio-ioctls 0.5.0 -> 0.6.0

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
Alex Lyn
0bb9b66815 kata-sys-util: Add PCI helpers for VFIO cold-plug paths
The VFIO cold-plug path needs to resolve a PCI device's sysfs address
from its /dev/vfio/ group or iommufd cdev node. Extend the PCI helpers
in kata-sys-util to support this: add a function that walks
/sys/bus/pci/devices to find a device by its IOMMU group, and expose the
guest BDF that the QEMU command line will reference.

These helpers are consumed by the runtime-rs hypervisor crate when
building VFIO device descriptors for the QEMU command line.
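
The sysfs walk can be sketched as follows. This is a minimal illustration rather than the helper added here: `devices_in_iommu_group` is a hypothetical name, and the directory is passed in as a parameter (instead of hard-coding /sys/bus/pci/devices) so the function can be exercised against a fake tree. It relies on each device directory containing an `iommu_group` symlink whose final path component is the group number.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the names (PCI addresses) of all entries under `pci_devices`
/// whose `iommu_group` symlink points at group `group`, sorted by name.
pub fn devices_in_iommu_group(pci_devices: &Path, group: u32) -> io::Result<Vec<String>> {
    let wanted = group.to_string();
    let mut found = Vec::new();
    for entry in fs::read_dir(pci_devices)? {
        let entry = entry?;
        let link = entry.path().join("iommu_group");
        // Devices without an IOMMU group have no such link: skip them.
        let Ok(target) = fs::read_link(&link) else {
            continue;
        };
        // The last component of the link target is the group number.
        if target.file_name().and_then(|name| name.to_str()) == Some(wanted.as_str()) {
            found.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    found.sort();
    Ok(found)
}
```

A legacy /dev/vfio/<group> node maps directly onto the `group` argument; resolving an iommufd cdev node to a device would need a different lookup, which this sketch does not cover.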

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
Alex Lyn
1e96e75bf3 pod-resources-rs: Add kubelet Pod Resources API client
Add a gRPC client crate that speaks the kubelet PodResourcesLister
service (v1). The runtime-rs VFIO cold-plug path needs this to discover
which GPU devices the kubelet has assigned to a pod so they can be
passed through to the guest before the VM boots.

The crate is intentionally kept minimal: it wraps the upstream
pod_resources.proto, exposes a Unix-domain-socket client, and
re-exports the generated types.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
dependabot[bot]
8cc9325fee build(deps): bump openssl from 0.10.78 to 0.10.79
Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.78 to 0.10.79.
- [Release notes](https://github.com/rust-openssl/rust-openssl/releases)
- [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.78...openssl-v0.10.79)

---
updated-dependencies:
- dependency-name: openssl
  dependency-version: 0.10.79
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-06 10:19:15 +00:00
Fabiano Fidêncio
210ad5de98 runtime-rs: Bump netlinks for Linux 6.17+ IPv6 dev conf RTNetlink
Upgrade netlink-packet-route and rtnetlink so IFLA_INET6_CONF matches the
kernel's 240-byte layout (DEVCONF_FORCE_FORWARDING). Adapt to API changes:
NeighbourAttribute::LinkLayerAddress and bool MulticastSnooping.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-05 13:56:44 +02:00
stevenhorsman
efe62c9280 kata-ctl: Move into root workspace
Add kata-ctl to be a workspace member to simplify the
dependency management.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-30 08:45:27 +01:00
stevenhorsman
7664ebda7e trace-forwarder: Move into root workspace
Add trace-forwarder to be a workspace member to simplify the
dependency management.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-29 12:11:04 +01:00
Fabiano Fidêncio
cbd71f534e kata-sys-util: add oci_docker module for Docker netns detection
Docker 26+ with `runtimeType` shims may not include a network
namespace in the OCI spec's `linux.namespaces` and instead uses
`libnetwork-setkey` hooks to communicate the sandbox ID.  Add helpers
to detect Docker containers and resolve the netns path from hook
arguments, matching the Go runtime's `DockerNetnsPath` and
`IsDockerContainer` utilities.

Fixes: #9340

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-28 10:20:18 +02:00
Fabiano Fidêncio
74d9d043f0 agent: raise regorus policy length limits
regorus 0.9.0 introduced a hard, per-engine ceiling on parsed-policy
size (1024 columns / 1 MiB / 20 000 lines, see lexer.rs:30 in
microsoft/regorus). The 1024-column cap rejects realistic policies
emitted by `genpolicy`: the `NVIDIA_REQUIRE_CUDA` environment variable
on `nvcr.io/nvidia/k8s/cuda-sample` is roughly 1.3 KiB on a single line,
so the agent's `set_policy()` returns an error, the agent (PID 1) exits,
the guest kernel reboots, and the runtime eventually times out
connecting to the agent's vsock.

regorus PR #624 ("feat: make policy length limits configurable per
engine") adds `Engine::set_policy_length_config`, but it has not been
released yet -- the latest published version is still 0.9.1, which
predates that change.

Pin `regorus` to the upstream commit that includes #624 and call the
new setter from `AgentPolicy::new_engine()` with values that comfortably
fit any policy we expect to evaluate (64 KiB per line, 16 MiB per file,
200 000 lines) while still rejecting pathological/minified input. Once
a regorus release > 0.9.1 ships with #624, the dependency can be moved
back to crates.io.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-26 10:18:26 +02:00
Markus Rudy
c8fe6a60d0 genpolicy: update regorus to 0.9.1
The version we used before was released in 2024; it's about time to use
a newer version. The new version of the crate comes with a license,
which addresses a `cargo deny` finding.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-26 10:18:26 +02:00
Steve Horsman
fc359d2140 Merge pull request #12901 from kata-containers/dependabot/cargo/openssl-0.10.78
build(deps): bump openssl from 0.10.76 to 0.10.78
2026-04-25 20:59:51 +01:00
dependabot[bot]
151a797fc0 build(deps): bump openssl from 0.10.76 to 0.10.78
Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.76 to 0.10.78.
- [Release notes](https://github.com/rust-openssl/rust-openssl/releases)
- [Commits](https://github.com/rust-openssl/rust-openssl/compare/openssl-v0.10.76...openssl-v0.10.78)

---
updated-dependencies:
- dependency-name: openssl
  dependency-version: 0.10.78
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-25 10:28:48 +00:00
stevenhorsman
d6df75853b versions: Update rustls-webpki to 0.103.13
Simple bump to fix CVE GHSA-82j2-j2ch-gfr8:
Denial of service via panic on malformed CRL BIT STRING

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-25 11:27:02 +01:00
Fabiano Fidêncio
e0927e0e0c Merge pull request #12846 from RainaYL/rainax/split_irqchip_pr
dragonball: Implement userspace IOAPIC to enable split irqchip
2026-04-24 19:07:45 +02:00
Anjana A R K
d2e0e277cc kata-agent: Bump serde-enum-str to v0.5.0
Upgrade serde-enum-str to v0.5.0, which bumps serde-attributes to version 0.3.0.

Signed-off-by: Anjana A R K <anjana.a.r.k1@ibm.com>
2026-04-24 15:57:59 +05:30
Xiaofan Xxf
fd39117a21 dragonball: Implement userspace IOAPIC to enable split irqchip
From Linux 6.14, creating a TDX VM requires that split irqchip is
enabled. Under this circumstance, device IOAPIC would be managed
in userspace, instead of KVM, so a manager is needed to handle
MMIO read/write to emulated IOAPIC registers.
Also, with split irqchip, irqfd is no longer able to trigger an
interrupt after device IO is completed. Instead, KVM_SIGNAL_MSI
is used for interrupt triggering.

Note that only legacy irq with edge-triggered interrupt is
implemented here. And split irqchip feature is only enabled
when confidential VM type is set to TDX.

Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
2026-04-24 10:33:05 +08:00
Fupan Li
18378145d2 Merge pull request #12821 from fidencio/topic/runtime-rs-cpu-pinning
runtime-rs: Add vCPU thread pinning support
2026-04-23 16:49:18 +08:00
Markus Rudy
639ff3578d genpolicy: restrict symlinks in CopyFile
Allowing arbitrary symlinks in the shared directory is unsafe for
confidential VM use cases. In order to make CopyFile safe both for the
VM and for the consuming containers, we implement the following
rules for symlinks (in addition to the existing rules for other files):

1. Symlinks may not be placed directly into the shared directory.
2. Symlinks must not point 'upwards', i.e. contain `..` as a path
   element.
3. Symlinks must be relative.

These rules ensure that all writes initiated by CopyFile are restricted
to the shared directory (protecting the VM), and that symlinks can't
point outside their mount points (protecting the container).
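
The three rules can be expressed as two small predicates. The sketch below is written in Rust purely for illustration; it is not the policy text, and the function names and the way the shared directory is passed in are assumptions.

```rust
use std::path::{Component, Path};

/// Rules 2 and 3: the symlink target must be relative and must not contain
/// `..` as a path element.
pub fn target_allowed(target: &str) -> bool {
    let path = Path::new(target);
    !target.is_empty()
        && path.is_relative()
        && !path
            .components()
            .any(|component| matches!(component, Component::ParentDir))
}

/// Rule 1: the symlink itself must live at least one directory below the
/// shared directory, never directly inside it.
pub fn location_allowed(shared_dir: &Path, link: &Path) -> bool {
    match link.strip_prefix(shared_dir) {
        Ok(rel) => {
            // Only plain names are accepted below the shared directory.
            rel.components().all(|c| matches!(c, Component::Normal(_)))
                && rel.components().count() >= 2
        }
        Err(_) => false,
    }
}
```

The check is made per path element, so a name that merely begins with two dots, such as `..data/token`, passes, while `../outside` and `a/../../b` are rejected.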

These new restrictions mean that we can't support arbitrary mount
sources (which might not follow these rules), but the usual k8s suspects
(ConfigMap, Secret, ServiceAccountToken) should still pass.

In order to aid writing the policy, we convert the CopyFileRequest to a
structure that does not contain binary data, but well-defined strings
and types.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
Fabiano Fidêncio
48669a894e runtime-rs: Add vCPU thread pinning support
Port the Go runtime's enable_vcpus_pinning feature to runtime-rs.

The Go runtime already lets users pin each vCPU thread to a specific
host CPU when the vCPU count matches the sandbox cpuset size, using
sched_setaffinity. This is useful for latency-sensitive workloads that
benefit from eliminating cross-CPU migration of vCPU threads.

The approach mirrors the Go implementation:

After VM start and on every container add/update/delete, we fetch the
vCPU thread IDs (via QMP query-cpus-fast for QEMU), compute the union of
all containers' OCI cpusets, and if the two counts match, pin vCPU i to
cpuset[i]. If they diverge (hotplug, container removal, etc.) we reset
all threads back to the full cpuset so nothing gets stuck on a single
core.
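
The decision described above can be sketched as a pure planning step, separate from the actual sched_setaffinity calls. This is an illustration under assumptions, not the code in update_sandbox_cgroups: the function names and the representation of thread IDs and CPU lists are hypothetical.

```rust
use std::collections::BTreeSet;

/// Union of all containers' cpusets, sorted and without duplicates.
pub fn union_cpusets(container_cpusets: &[Vec<usize>]) -> Vec<usize> {
    let merged: BTreeSet<usize> = container_cpusets.iter().flatten().copied().collect();
    merged.into_iter().collect()
}

/// Returns, for each vCPU thread ID, the host CPUs it may run on.
/// If the counts match, vCPU i is pinned to cpuset[i]; otherwise every
/// thread is reset to the full cpuset.
pub fn plan_affinity(vcpu_tids: &[u32], cpuset: &[usize]) -> Vec<(u32, Vec<usize>)> {
    if vcpu_tids.len() == cpuset.len() {
        vcpu_tids
            .iter()
            .zip(cpuset)
            .map(|(&tid, &cpu)| (tid, vec![cpu]))
            .collect()
    } else {
        vcpu_tids
            .iter()
            .map(|&tid| (tid, cpuset.to_vec()))
            .collect()
    }
}
```

With three vCPU threads and a cpuset of {2, 3, 5} each thread gets exactly one CPU; with two threads and the same cpuset both threads are allowed on all three CPUs, which is the reset case.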

The pinning check lives in CgroupsResourceInner::update_sandbox_cgroups,
which already runs at exactly the right points in the lifecycle. The
enable_vcpus_pinning flag flows from the TOML config through
CgroupConfig into the cgroup resource layer, and can also be overridden
per-pod via the io.katacontainers.config.runtime.enable_vcpus_pinning
annotation.

The QEMU config templates default to false. The NV GPU configs will get
their own default (true) in a follow-up once those templates are added.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-21 12:45:56 +02:00
stevenhorsman
a59afa3154 versions: Update rustls-webpki to 0.103.12
Simple bump to fix CVEs:
- RUSTSEC-2026-0098
- RUSTSEC-2026-0099

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 16:24:20 +01:00
stevenhorsman
35be1a938d versions: Bump rand crate where possible
Update all versions of rand that are controlled by us to remediate
GHSA-cq8v-f236-94qc.

Note: There are still some usages of rand 0.8.5 that are from
transitive dependencies which we can't currently update:
- fail
- phf_generator
- opentelemetry
due to them being archived, or our usage being 17 versions out of date

Also adapt to the rand API breakages, e.g.:
- rand::thread_rng() → rand::rng() (function renamed)
- rand::distributions::Alphanumeric → rand::distr::Alphanumeric (module renamed)
- rng.gen_range() → rng.random_range() (function renamed)

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-17 15:58:58 +01:00
Fabiano Fidêncio
9e1f595160 kata-deploy: add Rust binary to root workspace
Add tools/packaging/kata-deploy/binary as a workspace member, inherit shared
dependency versions from the root manifest, and refresh Cargo.lock.

Build the kata-deploy image from the repository root: copy the workspace
layout into the rust-builder stage, run cargo test/build with -p kata-deploy,
and adjust artifact and static asset COPY paths. Update the payload build
script to invoke docker buildx with -f .../Dockerfile from the repo root.

Add a repo-root .dockerignore to keep the Docker build context smaller.
Document running unit tests with cargo test -p kata-deploy from the root.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 10:07:06 +08:00
Ruoqing He
2a024f55d0 libs: Move libs into root workspace
Remove libs from exclude list, and move them explicitly into root
workspace to make sure our core components are in a consistent state.

This is a follow up of #12413.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-06 11:03:38 +02:00