32 Commits

Author SHA1 Message Date
stevenhorsman
531877f28f deps: Upgrade nix crate from 0.26.4 to 0.31.3
Upgrade the nix crate across the workspace to version 0.30.1 to address
security vulnerabilities and adopt safer file descriptor handling patterns.

### Breaking Changes in nix 0.28.0

1. **File Descriptor Type Changes**
   - Functions now return `OwnedFd` instead of `RawFd` (i32)
   - Functions requiring file descriptors now expect types implementing `AsFd` trait
   - This provides RAII-based automatic cleanup and prevents fd leaks

2. **API Signature Changes**
   - `pipe()`, `pipe2()`, `openpty()` now return `OwnedFd` tuples
   - `socket()` returns `OwnedFd` instead of `RawFd`
   - `open()`, `memfd_create()` return `OwnedFd`
   - `setns()`, `write()`, `fcntl()` require `AsFd` trait
   - `madvise()` requires `NonNull<c_void>` instead of raw pointer
   - `bind()`, `listen()`, `connect()` require `AsFd` and `Backlog` type

3. **Module Feature Flags**
   - Modules now require explicit feature flags (mman, reboot, etc.)

### Additional Breaking Changes in nix 0.30.1

1. **symlinkat() API Change**
   - `dirfd` parameter now requires `AsFd` trait instead of `Option<RawFd>`
   - Use `BorrowedFd::borrow_raw(libc::AT_FDCWD)` for current directory

2. **Type Alias Deprecation**
   - `MemFdCreateFlag` renamed to `MFdFlags` for consistency

### Changes Made

**Workspace Configuration (Cargo.toml)**
- Updated nix to 0.30.1 with features: fs, mount, sched, process, ioctl,
  signal, socket, feature, user, hostname, term, event, mman, reboot

**File Descriptor Handling Patterns**
- Use `BorrowedFd::borrow_raw(raw_fd)` to wrap RawFd for AsFd requirements
- Use `.as_fd().as_raw_fd()` to extract raw fd without ownership transfer
- Use `.into_raw_fd()` only when ownership transfer is needed
- Use `NonNull::new().unwrap()` for madvise pointer conversion

**Deprecated API Replacements**
- `eventfd()` → `EventFd::from_value_and_flags()`
- `Errno::from_i32()` → `Errno::from_raw()`
- `listen(fd, backlog)` → `listen(&fd, Backlog::new(backlog).unwrap())`
- `MemFdCreateFlag` → `MFdFlags`
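
The file descriptor handling patterns listed above can be sketched with the standard library alone, since `OwnedFd`, `BorrowedFd` and `AsFd` live in `std::os::fd`. This is a minimal illustration, not code from this commit: `fd_number`, `call_with_raw`, `open_dev_null` and `release` are hypothetical helpers, and `fd_number` merely stands in for a nix function that accepts `impl AsFd`.

```rust
use std::fs::File;
use std::io;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, IntoRawFd, OwnedFd, RawFd};

/// Hypothetical stand-in for an API that, like nix 0.28+, accepts `impl AsFd`.
fn fd_number(fd: impl AsFd) -> RawFd {
    // Extract the raw descriptor without transferring ownership.
    fd.as_fd().as_raw_fd()
}

/// Adapt a legacy `RawFd` to an `AsFd`-taking call without taking ownership.
fn call_with_raw(raw: RawFd) -> RawFd {
    // SAFETY: the caller guarantees `raw` stays open for the duration of the call.
    let borrowed = unsafe { BorrowedFd::borrow_raw(raw) };
    fd_number(borrowed)
}

/// Open a file and return it as an `OwnedFd`, which closes itself on drop.
fn open_dev_null() -> io::Result<OwnedFd> {
    Ok(OwnedFd::from(File::open("/dev/null")?))
}

/// Give up ownership: the caller becomes responsible for closing the result.
fn release(fd: OwnedFd) -> RawFd {
    fd.into_raw_fd()
}
```

Passing `&owned` to `fd_number` borrows the descriptor, so `owned` still closes it on drop. Only `release` moves the cleanup responsibility to the caller.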

Generated by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-19 03:49:16 -07:00
Alex Lyn
3095bd379b runtime-rs: Introduce cancellation for OOM watcher during teardown
This commit introduces an explicit cancellation mechanism for the OOM
watcher loop within VirtSandbox. This addresses the issue where the
watcher continues to poll for OOM events even when the sandbox is being
stopped, leading to spurious "Connection reset by peer" errors.

Key changes:
(1) A CancellationToken is added to VirtSandbox to signal the watcher
loop when the sandbox is undergoing teardown.
(2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a
tokio::select! statement. This allows it to concurrently listen for
two events:
- cancel_token.cancelled(): Triggered when the sandbox/VM is stopping.
- agent.get_oom_event(): The regular OOM event polling.
(3) In the sandbox stop/teardown path, cancel_token.cancel() is called
before stopping the VM. This ensures the OOM watcher loop exits cleanly
via the cancellation token, preventing the occurrence of ECONNRESET/EOF
errors on a closed channel.

This change improves the robustness of OOM event handling during sandbox
lifecycle management.
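
The cancel-before-stop ordering can be illustrated with a std-only analogue. This sketch deliberately swaps the techniques: the commit uses tokio::select! with a CancellationToken, while the code below uses an `AtomicBool` flag and `recv_timeout` on a channel. `watch_oom` and `WatcherExit` are hypothetical names, and the channel stands in for agent.get_oom_event().

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Why the watcher loop stopped.
#[derive(Debug, PartialEq)]
enum WatcherExit {
    /// Teardown was signalled; this is the clean exit path.
    Cancelled,
    /// The event source went away without a cancellation signal.
    ChannelClosed,
}

/// Poll for OOM events until cancelled or until the event source closes.
/// Returns the exit reason and the container IDs seen so far.
fn watch_oom(events: Receiver<String>, cancel: Arc<AtomicBool>) -> (WatcherExit, Vec<String>) {
    let mut seen = Vec::new();
    loop {
        // Check for teardown before touching the event source.
        if cancel.load(Ordering::SeqCst) {
            return (WatcherExit::Cancelled, seen);
        }
        match events.recv_timeout(Duration::from_millis(10)) {
            Ok(container_id) => seen.push(container_id),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => {
                // A closed channel during teardown is expected, not an error.
                let exit = if cancel.load(Ordering::SeqCst) {
                    WatcherExit::Cancelled
                } else {
                    WatcherExit::ChannelClosed
                };
                return (exit, seen);
            }
        }
    }
}
```

Because the flag is set before the sender is dropped, a disconnect seen during teardown is reported as `Cancelled` rather than surfacing as a connection error.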

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
Fabiano Fidêncio
87d27e0cc8 kata-deploy-job-dispatcher: add generic per-node Job dispatcher
Add a small, deployment-agnostic dispatcher binary that runs exactly one
Kubernetes Job per selected node and paces the rollout, so callers get
guaranteed per-node coverage without encoding the fan-out in Helm.

Motivation: templating one Job per node into a Helm release does not
scale (the release Secret hits etcd's 1 MiB limit and hooks run
sequentially), and a single Indexed Job cannot guarantee per-node
coverage when paced - the scheduler ignores completed pods when
evaluating topology spread, so nodes get uneven numbers of pods. A tiny
dispatcher that enumerates nodes live and creates node-pinned Jobs itself
sidesteps both problems and keeps the Helm release O(1) in fleet size.

The dispatcher:
  - enumerates target nodes live (explicit --nodes list or
    --node-selector label selector), paginating the API;
  - stamps out one Job per node from a YAML template, pinning it with
    nodeName and an owner label for server-side filtering;
  - keeps at most --parallelism Jobs in flight, refilling as they finish,
    and sets an OwnerReference to the owner Job so the per-node Jobs are
    garbage-collected with it;
  - is a plain API client (kube): it never touches the host, so it can
    run fully unprivileged.

Node membership is resolved live on each run, not frozen at Helm
template-render time: re-running the dispatcher (e.g. via `helm upgrade`)
picks up nodes added since the last run and skips ones already done, as
the per-node stages are idempotent. The dispatcher is one-shot, however
- it does not watch the API, so nodes added while it is not running are
only covered by the next run.

job.rs holds the pure helpers (node-name sanitization, deterministic Job
naming, template instantiation, status interpretation) with rstest unit
tests; main.rs wires up the CLI and the fan-out loop.
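
As an illustration of what pure helpers for node-name sanitization and deterministic Job naming might look like, here is a minimal sketch. It is not the code in job.rs: the function names, the 63-character cap (the DNS-1123 label limit, assumed here as the limit for Job names), and the FNV-1a suffix are all assumptions made for the example.

```rust
/// DNS-1123 label length limit, assumed here as the cap for Job names.
const MAX_NAME_LEN: usize = 63;

/// Lowercase the node name and collapse every run of characters outside
/// [a-z0-9] into a single '-', with no leading or trailing '-'.
fn sanitize_node_name(node: &str) -> String {
    let mut out = String::with_capacity(node.len());
    for c in node.chars() {
        let c = c.to_ascii_lowercase();
        if c.is_ascii_lowercase() || c.is_ascii_digit() {
            out.push(c);
        } else if !out.is_empty() && !out.ends_with('-') {
            out.push('-');
        }
    }
    while out.ends_with('-') {
        out.pop();
    }
    out
}

/// 32-bit FNV-1a, used because its output is stable across runs and toolchains.
fn fnv1a(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for byte in data {
        hash ^= u32::from(*byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Build a deterministic Job name: "<prefix>-<sanitized node>-<hash of raw node>".
/// The hash keeps names distinct when two nodes sanitize to the same string.
fn job_name(prefix: &str, node: &str) -> String {
    let suffix = format!("-{:08x}", fnv1a(node.as_bytes()));
    let keep = MAX_NAME_LEN - suffix.len();
    let mut base: String = format!("{}-{}", prefix, sanitize_node_name(node))
        .chars()
        .take(keep)
        .collect();
    while base.ends_with('-') {
        base.pop();
    }
    base + &suffix
}
```

Deterministic naming is what makes re-runs cheap: the same node always maps to the same Job name, so an existing Job can be detected instead of being created twice.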

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Greg Kurz
eac5dd2907 generate_vendor: Fix heavily broken logic
While checking the content of the vendor tarball artifact on the 3.31.0
release page, I realized that it lacks most of the Rust code and
all of the Go code. It turns out that the script is badly broken in many
ways:

1. Cargo workspace conflicts: Vendored dependencies were treated as
   workspace members, causing "current package believes it's in a
   workspace when it's not" errors. Fixed by adding vendor directory
   exclusions to root Cargo.toml.

2. Missing Go vendoring: Script only searched for Cargo.lock files,
   never processing go.mod files despite having a case statement for
   them. Fixed by adding go.mod to the find command with '-o -name go.mod'.

3. Wrong tar execution directory: Script ran tar from release/ directory
   but vendor_dir_list contained paths relative to repo root (./vendor,
   ./src/agent/vendor, etc.), causing "Cannot stat" errors. Fixed by
   moving tar command before final popd.

4. Relative tarball path: Since tar now runs from repo root, converted
   tarball path to absolute to ensure it's created in the release
   directory.

5. Vendored go.mod pollution: Added '-path ./vendor -prune' to find
   command to exclude vendor directory, preventing the script from
   finding go.mod files inside vendored Rust dependencies.

The fixes are simple enough they can be squashed into a single
commit.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-06-12 10:06:53 +02:00
Alex Lyn
6500e018c0 Merge pull request #13093 from RainaYL/rainax/tdx_boot_pr
dragonball: Add steps to boot TDX VM
2026-06-09 10:13:57 +08:00
stevenhorsman
9625bf8056 versions: Update MSRV to 1.94
With the bump to 1.94, we are now relying on some 1.94+
APIs, so update the MSRV to reflect this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-06-01 17:02:20 +01:00
Xiaofan Xxf
4f2e893bdb dragonball: Add steps to boot TDX VM
A few ioctls should be invoked before booting a TDX VM.

Major changes:
- While calling KVM_CREATE_VM, use KVM_X86_TDX_VM as vm_type
argument, instead of 0.
- Call KVM_TDX_CAPABILITIES and save the capability info
- Call KVM_TDX_INIT_VM before initializing the vcpu manager, because
the TDX module might allow a different max vcpu number than the
KVM context. Only after KVM_TDX_INIT_VM has been called is the correct
value set and retrievable via KVM_CHECK_EXTENSION, so that the max
vcpu info saved in the vcpu manager is properly initialized.
- Call KVM_TDX_INIT_VCPU after creating vcpus and parsing TDVF,
because this ioctl requires the HOB address as a parameter, which is
saved in the TDVF metadata.
- Call KVM_TDX_INIT_MEM_REGION after loading TDVF data, linux
kernel, cmdline and HOB list into VM memory.
- Call KVM_TDX_FINALIZE_VM after all previous TDX ioctls.
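
The ordering constraints above can be captured as a small checker. This is a sketch under assumptions, not dragonball code: `TdxStep` and `first_out_of_order` are hypothetical, and the enum encodes one linearization that satisfies the constraints in the list. Where the list does not fix the relative order of two steps, the order chosen here is an assumption.

```rust
/// Boot steps in one order consistent with the constraints listed above.
/// The derived `Ord` follows declaration order.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum TdxStep {
    CreateVm,                // KVM_CREATE_VM with KVM_X86_TDX_VM as vm_type
    Capabilities,            // KVM_TDX_CAPABILITIES
    InitVm,                  // KVM_TDX_INIT_VM
    InitVcpuManager,         // reads the max vcpu count via KVM_CHECK_EXTENSION
    CreateVcpusAndParseTdvf, // yields the HOB address from the TDVF metadata
    InitVcpu,                // KVM_TDX_INIT_VCPU
    LoadGuestMemory,         // TDVF data, kernel, cmdline and HOB list
    InitMemRegion,           // KVM_TDX_INIT_MEM_REGION
    FinalizeVm,              // KVM_TDX_FINALIZE_VM
}

/// Return the first adjacent pair that is repeated or out of order, if any.
fn first_out_of_order(steps: &[TdxStep]) -> Option<(TdxStep, TdxStep)> {
    steps
        .windows(2)
        .find(|pair| pair[0] >= pair[1])
        .map(|pair| (pair[0], pair[1]))
}
```

A check like this could run in a unit test against a recorded call sequence, which is cheaper than discovering a misordered ioctl on TDX hardware.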

Also delete the dbs-tdx crate, because we now use virtee's
tdx crate instead of maintaining our own utility module.

Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
2026-05-26 10:35:45 +08:00
Alex Lyn
c3b06af4c7 kata-types: Add gpt_disk module for GPT metadata generation
Introduce gpt_disk.rs to compute GPT partition layouts and generate
metadata files for multi-layer EROFS rootfs. The module creates GPT
head metadata that is combined with EROFS layer images via VMDK
descriptors, presenting a single GPT-partitioned virtual disk to the
guest VM — each EROFS layer mapped to its own partition.

The layout engine calculates LBA positions for an arbitrary number of
EROFS layers, then writes a full protective-MBR + GPT image and extracts
the head (MBR + primary GPT table) segments as standalone files for
VMDK extent assembly.
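
To make the LBA calculation concrete, here is a minimal sketch of a layout computation for an arbitrary number of layers. It is not the gpt_disk.rs implementation. It assumes 512-byte sectors, the conventional 128-entry partition table (so the first usable LBA is 34 and the backup GPT takes 33 sectors), and 1 MiB partition alignment. `GptLayout` and `compute_layout` are hypothetical names.

```rust
const SECTOR_SIZE: u64 = 512;
/// LBA 0 is the protective MBR, LBA 1 the GPT header, LBA 2..=33 the entries.
const FIRST_USABLE_LBA: u64 = 34;
/// Backup partition entries (32 sectors) plus the backup header (1 sector).
const BACKUP_GPT_SECTORS: u64 = 33;
/// Assumed alignment: 2048 sectors, i.e. 1 MiB.
const ALIGN_SECTORS: u64 = 2048;

#[derive(Debug, PartialEq)]
struct GptLayout {
    /// Inclusive (first_lba, last_lba) for each layer, in input order.
    partitions: Vec<(u64, u64)>,
    /// Size of the whole virtual disk in sectors.
    total_sectors: u64,
}

/// Round `value` up to the next multiple of `align`.
fn align_up(value: u64, align: u64) -> u64 {
    (value + align - 1) / align * align
}

/// Place one partition per layer, each starting on an aligned LBA.
fn compute_layout(layer_sizes_bytes: &[u64]) -> GptLayout {
    let mut partitions = Vec::with_capacity(layer_sizes_bytes.len());
    let mut next_free = FIRST_USABLE_LBA;
    for size in layer_sizes_bytes {
        // Round the layer size up to whole sectors; never emit an empty partition.
        let sectors = ((size + SECTOR_SIZE - 1) / SECTOR_SIZE).max(1);
        let first = align_up(next_free, ALIGN_SECTORS);
        let last = first + sectors - 1;
        partitions.push((first, last));
        next_free = last + 1;
    }
    GptLayout {
        partitions,
        total_sectors: next_free + BACKUP_GPT_SECTORS,
    }
}
```

For example, a 1 MiB layer followed by a layer of 3 MiB plus one byte yields partitions at LBAs 2048..=4095 and 4096..=10240.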

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
stevenhorsman
3466f888db agent-ctl: Move into root workspace
- Add agent-ctl to be a workspace member to simplify the
dependency management.
- Also add a test target as we've been running it in static-checks
without it doing anything

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-18 09:47:15 +01:00
Fabiano Fidêncio
d3a9669be5 runtime-rs: implement EncryptedEmptyDirVolume
Add the core volume handler for block-encrypted emptyDir support
in runtime-rs, bringing it to parity with the Go runtime (PR #10559).

When emptydir_mode is set to "block-encrypted", host emptyDir bind
mounts are intercepted and handled as follows:

  1. A sparse disk image (disk.img) is created inside the emptyDir
     folder, sized to match the host filesystem capacity.
  2. A mountInfo.json is written under the kata direct-volume root
     with volume_type "blk", fs_type "ext4", and metadata
     encryptionKey=ephemeral.
  3. The disk image is plugged into the guest VM as a virtio-blk
     device via the hypervisor device manager.
  4. An agent::Storage is built with driver_options containing
     encryption_key=ephemeral and shared=true, so the kata-agent
     delegates formatting and encryption to CDH using LUKS2.
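
Step 1 can be sketched with the standard library: extending an empty file with `set_len` produces a sparse file on filesystems that support holes. This is an illustration only. `create_sparse_image` is a hypothetical helper, and how the actual handler derives the size from the host filesystem capacity is not shown.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::{Path, PathBuf};

/// Create `disk.img` inside `empty_dir` with the given apparent size.
/// No data blocks are written, so the file is sparse where the filesystem allows it.
fn create_sparse_image(empty_dir: &Path, size_bytes: u64) -> io::Result<PathBuf> {
    let image = empty_dir.join("disk.img");
    // `create_new` refuses to clobber an image left over from an earlier run.
    let file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&image)?;
    // Extending an empty file only updates its length.
    file.set_len(size_bytes)?;
    Ok(image)
}
```

Calling it twice on the same directory returns an error the second time, which makes an accidental reuse of a stale image visible.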

The volume is registered in the dispatch chain before the regular
block-volume check, and ephemeral disk metadata is tracked for
sandbox-level cleanup at teardown.

Also re-exports EMPTYDIR_MODE_* constants from kata-types::config
so downstream crates can reference them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-14 22:56:11 +02:00
Fabiano Fidêncio
52e6a19253 kata-deploy: size-optimise the release profile
Apply per-package release-profile overrides for the kata-deploy crate
only:

  opt-level = "z"     # optimise for size, not speed
  codegen-units = 1   # let LLVM see the whole crate when inlining

The binary is throwaway: it runs once at DaemonSet pod start, finishes
the install in seconds, and then sits idle waiting for SIGTERM. There
is no hot path to optimise for speed, so trading a bit of compile time
and a few percent of CPU for a meaningfully smaller text segment is the
right call here.

These overrides live at the workspace root and are scoped via
[profile.release.package."kata-deploy"], so they do not affect the
agent, runtime-rs, dragonball, or any of the libs / tools crates.

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Alex Lyn
b4768cfc61 dragonball: Adapt VFIO DMA calls to vfio-ioctls 0.6 API
The vfio-ioctls 0.6.0 crate changed the vfio_dma_map signature: the
host address parameter is now a raw pointer (*mut u8) instead of u64,
and the size parameter is usize instead of u64. Since the kernel uses
the host address to set up DMA mappings to physical memory — and the
caller must guarantee the memory behind that pointer remains valid for
the lifetime of the mapping — upstream marked vfio_dma_map as unsafe fn.

Wrap vfio_dma_map calls in unsafe blocks and adjust the type casts
accordingly. vfio_dma_unmap only needed the usize cast for the size
parameter (it does not take a host address, so it remains safe).
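
The shape of the adaptation can be sketched without the real crate. In the code below, `dma_map_stub` is a hypothetical stand-in that only mirrors the description above (an unsafe function taking a `*mut u8` host pointer and a `usize` size). It is not the vfio-ioctls function, and `Mapping`, `map_buffer` and `checked_len` are likewise invented for the example.

```rust
use std::num::TryFromIntError;

/// What the stand-in reports back; a real mapping would program the IOMMU.
#[derive(Debug, PartialEq)]
struct Mapping {
    iova: u64,
    len: usize,
    first_byte: u8,
}

/// Hypothetical stand-in for an unsafe DMA-map call.
///
/// # Safety
/// `host` must be valid for reads of `len` bytes. For a real DMA mapping the
/// memory must stay valid for as long as the mapping exists.
unsafe fn dma_map_stub(iova: u64, len: usize, host: *mut u8) -> Mapping {
    // Reading through the pointer shows why its validity matters.
    let first_byte = if len == 0 { 0 } else { unsafe { host.read() } };
    Mapping { iova, len, first_byte }
}

/// Safe wrapper: the borrow proves the pointer is valid during the call.
fn map_buffer(iova: u64, buf: &mut [u8]) -> Mapping {
    // SAFETY: `buf` is a live exclusive borrow of `buf.len()` bytes.
    unsafe { dma_map_stub(iova, buf.len(), buf.as_mut_ptr()) }
}

/// Convert a u64 size to usize with a checked conversion instead of `as`.
fn checked_len(size: u64) -> Result<usize, TryFromIntError> {
    usize::try_from(size)
}
```

Keeping the `unsafe` block inside a small wrapper with a SAFETY comment confines the validity argument to one place.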

Bump workspace dependencies:
- vfio-bindings 0.6.1 -> 0.6.2
- vfio-ioctls 0.5.0 -> 0.6.0

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
Alex Lyn
1e96e75bf3 pod-resources-rs: Add kubelet Pod Resources API client
Add a gRPC client crate that speaks the kubelet PodResourcesLister
service (v1). The runtime-rs VFIO cold-plug path needs this to discover
which GPU devices the kubelet has assigned to a pod so they can be
passed through to the guest before the VM boots.

The crate is intentionally kept minimal: it wraps the upstream
pod_resources.proto, exposes a Unix-domain-socket client, and
re-exports the generated types.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-07 10:33:26 +02:00
stevenhorsman
efe62c9280 kata-ctl: Move into root workspace
Add kata-ctl to be a workspace member to simplify the
dependency management.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-30 08:45:27 +01:00
stevenhorsman
7664ebda7e trace-forwarder: Move into root workspace
Add trace-forwarder to be a workspace member to simplify the
dependency management.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-29 12:11:04 +01:00
Xiaofan Xxf
fd39117a21 dragonball: Implement userspace IOAPIC to enable split irqchip
From Linux 6.14, creating a TDX VM requires split irqchip to be
enabled. In that case, the IOAPIC device is managed
in userspace instead of in KVM, so a manager is needed to handle
MMIO reads and writes to the emulated IOAPIC registers.
Also, with split irqchip, irqfd can no longer trigger an
interrupt after device IO completes. Instead, KVM_SIGNAL_MSI
is used to trigger interrupts.

Note that only legacy IRQs with edge-triggered interrupts are
implemented here, and the split irqchip feature is only enabled
when the confidential VM type is set to TDX.
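
The MMIO handling that a userspace IOAPIC needs can be sketched as follows. This is a minimal model of the standard indirect register interface (IOREGSEL at offset 0x00, IOWIN at offset 0x10, 24 redirection entries), not the dragonball implementation: the `IoApic` type and its methods are hypothetical, and interrupt delivery through KVM_SIGNAL_MSI is left out.

```rust
const IOREGSEL: u64 = 0x00; // MMIO offset of the register-select window
const IOWIN: u64 = 0x10; // MMIO offset of the data window
const NUM_PINS: usize = 24;
const REG_ID: u32 = 0x00;
const REG_VER: u32 = 0x01;
const REG_REDTBL_BASE: u32 = 0x10;

struct IoApic {
    ioregsel: u32,
    id: u32,
    /// 64-bit redirection entries; bit 16 is the mask bit, bits 0..=7 the vector.
    redir: [u64; NUM_PINS],
}

impl IoApic {
    fn new() -> Self {
        // All pins start masked.
        IoApic { ioregsel: 0, id: 0, redir: [1 << 16; NUM_PINS] }
    }

    /// Map an indirect register index to a redirection-table pin, if any.
    fn pin_of(reg: u32) -> Option<usize> {
        let idx = reg.checked_sub(REG_REDTBL_BASE)? as usize / 2;
        (idx < NUM_PINS).then_some(idx)
    }

    fn mmio_read(&self, offset: u64) -> u32 {
        match offset {
            IOREGSEL => self.ioregsel,
            IOWIN => match self.ioregsel {
                REG_ID => self.id,
                // Version 0x11, max redirection entry index in bits 16..=23.
                REG_VER => 0x11 | ((NUM_PINS as u32 - 1) << 16),
                reg => match Self::pin_of(reg) {
                    Some(pin) if reg & 1 == 0 => self.redir[pin] as u32,
                    Some(pin) => (self.redir[pin] >> 32) as u32,
                    None => 0,
                },
            },
            _ => 0,
        }
    }

    fn mmio_write(&mut self, offset: u64, value: u32) {
        match offset {
            IOREGSEL => self.ioregsel = value & 0xff,
            IOWIN => {
                let reg = self.ioregsel;
                if reg == REG_ID {
                    self.id = value & 0x0f00_0000;
                } else if let Some(pin) = Self::pin_of(reg) {
                    let entry = &mut self.redir[pin];
                    if reg & 1 == 0 {
                        // Even index: low 32 bits of the entry.
                        *entry = (*entry & 0xffff_ffff_0000_0000) | u64::from(value);
                    } else {
                        // Odd index: high 32 bits of the entry.
                        *entry = (*entry & 0x0000_0000_ffff_ffff) | (u64::from(value) << 32);
                    }
                }
            }
            _ => {}
        }
    }

    /// Interrupt vector currently programmed for `pin`.
    fn vector(&self, pin: usize) -> u8 {
        self.redir[pin] as u8
    }
}
```

A guest programs a pin by writing the register index to IOREGSEL and then the value to IOWIN, so every redirection entry takes two such pairs of writes.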

Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
2026-04-24 10:33:05 +08:00
stevenhorsman
35be1a938d versions: Bump rand crate where possible
Update all versions of rand that are controlled by us to remediate
GHSA-cq8v-f236-94qc.

Note: There are still some usages of rand 0.8.5 that come from
transitive dependencies which we can't currently update:
- fail
- phf_generator
- opentelemetry
because they are archived, or because our usage is 17 versions out of date.

Also update for the rand API breakages, e.g.:
- rand::thread_rng() → rand::rng() (function renamed)
- rand::distributions::Alphanumeric → rand::distr::Alphanumeric (module renamed)
- rng.gen_range() → rng.random_range() (function renamed)

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-17 15:58:58 +01:00
dependabot[bot]
bbb037e025 build(deps): bump the tracing group across 1 directory with 1 update
Bumps the tracing group with 1 update in the /src/tools/kata-ctl directory: [tracing](https://github.com/tokio-rs/tracing).


Updates `tracing` from 0.1.41 to 0.1.44
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.41...tracing-0.1.44)

---
updated-dependencies:
- dependency-name: tracing
  dependency-version: 0.1.44
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: tracing
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 15:06:48 +00:00
Fabiano Fidêncio
9e1f595160 kata-deploy: add Rust binary to root workspace
Add tools/packaging/kata-deploy/binary as a workspace member, inherit shared
dependency versions from the root manifest, and refresh Cargo.lock.

Build the kata-deploy image from the repository root: copy the workspace
layout into the rust-builder stage, run cargo test/build with -p kata-deploy,
and adjust artifact and static asset COPY paths. Update the payload build
script to invoke docker buildx with -f .../Dockerfile from the repo root.

Add a repo-root .dockerignore to keep the Docker build context smaller.
Document running unit tests with cargo test -p kata-deploy from the root.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-07 10:07:06 +08:00
Ruoqing He
2a024f55d0 libs: Move libs into root workspace
Remove libs from the exclude list, and move them explicitly into the root
workspace to make sure our core components are in a consistent state.

This is a follow up of #12413.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-06 11:03:38 +02:00
Jiahao Wang
29e5d5d951 build: Move agent to root workspace
This commit adds the kata agent to the root workspace, as a follow-up to
#12413.

Remove agent from the exclude list, and make it a member of the root
workspace.

Signed-off-by: Jiahao Wang <jiahao.wang@lingcage.com>
2026-03-29 06:35:38 +00:00
Fupan Li
d0f0dc2008 dragonball: fix the dbs-virtio-devices compile errors
Update dbs-virtio-devices to compile with:
- virtio-bindings 0.2.x: VIRTIO_F_VERSION_1, VIRTIO_F_NOTIFY_ON_EMPTY,
  VIRTIO_F_RING_PACKED moved from virtio_blk/virtio_net/virtio_ring to
  virtio_config module.
- virtio-queue 0.17.0: Descriptor no longer exported at top level, use
  desc::split::Descriptor instead.
- vhost 0.15.0: Master->Frontend, VhostUserMaster->VhostUserFrontend,
  MasterReqHandler->FrontendReqHandler,
  VhostUserMasterReqHandler->VhostUserFrontendReqHandler,
  SLAVE_REQ->BACKEND_REQ, SLAVE_SEND_FD->BACKEND_SEND_FD,
  set_slave_request_fd->set_backend_request_fd.
  FS slave messages (VhostUserFSSlaveMsg etc.) removed from vhost crate;
  SlaveReqHandler now implements VhostUserFrontendReqHandler with
  handle_config_change only.
- fuse-backend-rs 0.14.0: Handle CachePolicy::Metadata variant,
  fix get_rootfs() returning tuple, use buffer-based I/O for Ufile
  since ReadVolatile/WriteVolatile are not implemented for Box<dyn Ufile>.
- vm-memory 0.17.1: GuestRegionMmap::new returns Option instead of
  Result, mmap::Error removed.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2026-03-12 10:58:03 +00:00
Fupan Li
8d09a0e7e7 runtime-rs: Bump the rust-vmm related crates
vm-memory 0.10.0 → =0.17.1
vmm-sys-util 0.11.0 → 0.15.0
kvm-bindings 0.6.0 → 0.14.0
kvm-ioctls =0.12.1 → 0.24.0
virtio-queue 0.7.0 → 0.17.0
virtio-bindings 0.1.0 → 0.2.0
fuse-backend-rs 0.10.5 → 0.14.0

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2026-03-12 10:58:03 +00:00
Markus Rudy
8dfeeea924 genpolicy: add to Cargo workspace
This commit adds the genpolicy utility to the root workspace. For now,
only dependencies that are already in the root workspace are consumed
from there, the genpolicy-specific ones should be added later.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:46 +01:00
stevenhorsman
c456b84537 versions: Bump sha2 crate version
sha2 0.9.3 includes the use of cpuid-bool, which was renamed to cpufeatures
around 5 years ago. Try moving to a workspace dependency of sha2
and bumping to the latest version to remediate RUSTSEC-2021-0064

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-06 15:41:34 +00:00
stevenhorsman
1d139a7c92 versions: Bump rust to 1.88
In prep for the bump to rust 1.90, try bumping
to 1.88 first to see if the CI is successful here

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
Fabiano Fidêncio
9d88c6b1d7 kata-deploy: Oxidize the script
The kata-deploy shell script is not THAT bad and, to be honest, it's quite
handy for quick hacks and quick changes.  However, it has become
increasingly hard to maintain as its scope has grown from a
testing tool to the project's proper front door, lacking unit tests, and
with an abundance of complex regular expressions and bashisms needed
to properly parse the environment variables it consumes.

Moreover, the fact that it is a Frankenstein's monster glued together from
Python packages, Golang binaries, and a distro-dependent container makes
it VERY HARD to use from a distroless container (thus,
avoiding security issues), preventing further integration with
components that require a higher standard of security than we've been
requiring.

With everything said, with the help of Cursor (mostly for generating the
test cases), here comes the oxidized version of the script, which runs
from a distroless container image.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 09:57:02 +01:00
Ruoqing He
beb0cac0d1 build: Move runtime-rs to root workspace
This is a follow-up of 3fbe693.

Remove runtime-rs from the exclude list, and make it a member of the root
workspace.

Specify shim and shim-ctl as the binaries of the runtime-rs package, and move
runtime-rs and all its members into the root workspace.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
54bfbf5687 build: Exclude tools from root workspace
Rust packages are cloned and built inside the
tools/packaging/kata-deploy/local-build/build folder, which may mislead
those packages into thinking they are part of the kata root workspace.
Exclude the directory to avoid that.

Reported-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-19 15:49:25 +01:00
Ruoqing He
e6b24cd789 build: Exclude crates with no workspace setup
Crates with no workspace setup would assume they are in the root
workspace, which is not ready for them yet. Exclude
them for now.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-18 01:39:48 +00:00
Ruoqing He
6068242bf1 build: Move dragonball to root workspace
Move dragonball and all members of its workspace into the root
workspace.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-18 01:39:48 +00:00
Ruoqing He
3fbe693658 build: Introduce root workspace for rust components
Add Cargo.toml at the repo root, and use this root workspace for as many
Rust components of Kata Containers as possible. This enables us to
share a common Cargo.lock file, and reduces the noise from dependabot.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-18 01:39:48 +00:00