Commit Graph

1425 Commits

Author SHA1 Message Date
Markus Rudy
639ff3578d genpolicy: restrict symlinks in CopyFile
Allowing arbitrary symlinks in the shared directory is unsafe for
confidential VM use cases. In order to make CopyFile safe both for the
VM as well for the consuming containers, we implement the following
rules for symlinks (in addition to the existing rules for other files):

1. Symlinks may not be placed directly into the shared directory.
2. Symlinks must not point 'upwards', i.e. contain `..` as a path
   element.
3. Symlinks must be relative.

These rules ensure that all writes initiated by CopyFile are restricted
to the shared directory (protecting the VM), and that symlinks can't
point outside their mount points (protecting the container).

These new restrictions mean that we can't support arbitrary mount
sources (which might not follow these rules), but the usual k8s suspects
(ConfigMap, Secret, ServiceAccountToken) should still pass.

In order to aid writing the policy, we convert the CopyFileRequest to a
structure that does not contain binary data, but well-defined strings
and types.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
Markus Rudy
d6bd666b3f agent: fix naming for symlinks in CopyFile
The agent referred to the `data` field of an incoming CopyFileRequest
as the 'src'. This is misleading, because 'source' is not mentioned
in the specification (where links are just a path with attached
bytes), and because the documentation for the `ln` utility calls the
path LINK_NAME and the data TARGET. This commit fixes the glitch and
calls the first argument to `symlinkat` the target.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
Markus Rudy
5c362adcff agent: add required features for standalone build
Building the kata-agent-policy crate only succeeded when its parents
(agent and genpolicy) pulled in the required features. This commit adds
the required features to the crate itself, such that it can be built
standalone and IDEs don't show errors while browsing it.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
Alex Lyn
ce3473d272 agent: Kill processes before removing container directory in destroy()
When using multi-layer EROFS snapshotter, the destroy() method fails to
kill container processes, causing process leaks in shared PID namespace
scenarios.

Problem Background:
1. Multi-layer EROFS creates temporary mount points under the container's
  root directory:
  - /run/kata-containers/<cid>/multi-layer/upper (ext4, writable)
  - /run/kata-containers/<cid>/multi-layer/lower-0 (EROFS, read-only)
2. The original destroy() method executed in this order:
  (1) umount rootfs
  (2) fs::remove_dir_all(&self.root) <- FAILS with "Read-only file system"
  (3) cgroup cleanup and process killing <- NEVER EXECUTED
3. When remove_dir_all() encounters the read-only EROFS mount point, it
  returns EROFS error (os error 30), causing destroy() to exit early
  without killing processes.

Why This Fix:
1. The test case k8s-kill-all-process-in-container.bats creates an init
  container with a background process (tail -f /dev/null), expecting it
  to be killed when the init container is destroyed.
2. With shared PID namespace (shareProcessNamespace: true), the orphaned
  process continues running, causing the test to fail.

Solution:
1. Reorder the destroy() method to kill processes BEFORE attempting to
  remove the container directory:
  (1) Get PIDs from cgroup and send SIGKILL
  (2) Destroy cgroup
  (3) umount rootfs
  (4) fs::remove_dir_all(&self.root)
2. This ensures processes are always killed regardless of filesystem
  cleanup status, matching the behavior of overlayfs snapshotter.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
c745d18e00 agent: Add virtio-scsi for multilayer erofs storage handler
It aims to suppport virtio-scsi driver for handling vmdk and rwlayer
storage in kata-agent.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
37a542c20f agent: Refactor multi-layer EROFS handling with unified flow
Refactor the multi-layer EROFS storage handling to improve code
maintainability and reduce duplication.

Key changes:
(1) Extract update_storage_device() to unify device state management
  for both multi-layer and standard storages
(2) Simplify handle_multi_layer_storage() to focus on device creation,
  returning MultiLayerProcessResult struct instead of managing state
(3) Unify the processing flow in add_storages() with clear separation:
(4) Support multiple EROFS lower layers with dynamic lower-N mount paths
(5) Improve mkdir directive handling with deferred {{ mount 1 }}
  resolution

This reduces code duplication, improves readability, and makes the
storage handling logic more consistent across different storage types.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
27c59f15a0 agent: Register MultiLayerErofsHandler and process multiple EROFS
Introduce MultiLayerErofsHandler and method of
handle_multi_layer_storage for multi-layer storage:
(1) Register MultiLayerErofsHandler to STORAGE_HANDLERS to handle
multi-layer EROFS storage with driver type 'multi-layer-erofs'.
(2) Add handle_multi_layer_erofs function to process multiple EROFS
storages with X-kata.multi-layer marker together in guest.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
6ce9180333 agent: Add support for EROFS rootfs handling in kata-agent
Add multi_layer_erofs.rs implementing guest-side processing logics
of multi-layer EROFS rootfs with overlay mount support.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Fabiano Fidêncio
64c139208f agent: add GetDiagnosticData RPC with termination log support
Add a new extensible GetDiagnosticData RPC that retrieves diagnostic
information from the guest VM. The request carries a log_type string
field to specify what kind of data is requested, and a container_id
field to identify the target container.

The first supported log_type is "termination_log", which reads the
Kubernetes termination message file from inside the guest. This is
needed for shared_fs=none configurations where the host cannot
directly access the guest filesystem.

On the Go runtime side, the container stop() path now calls
GetDiagnosticData to copy the termination message to the host
when running with NoSharedFS and the terminationMessagePolicy
annotation is set to "File". The call is best-effort: failures
are logged as warnings rather than blocking container teardown.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2026-04-17 13:01:13 +02:00
Fabiano Fidêncio
36a2d8e7f2 agent: Make launch_process_timeout configurable
The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata
agent is insufficient for environments with NVIDIA GPUs and NVSwitches,
where the attestation-agent needs significantly more time to collect
evidence during initialization (e.g. ~2 seconds per NVSwitch).

When the timeout expires, the agent (PID 1) exits with an error, causing
the guest kernel to perform an orderly shutdown before the
attestation-agent has finished starting.

Make this timeout configurable via the kernel parameter
agent.launch_process_timeout (in seconds), preserving the 6-second
default for backward compatibility. The Go runtime is wired up to pass
this value from the TOML config's [agent.kata] section through to the
kernel command line.

The NVIDIA GPU configs set the new default to 15 seconds.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-10 14:47:01 +02:00
Alex Lyn
2bac201364 agent: Remove virtio-9p storage handler
Remove the Virtio9pHandler implementation and its registration
from the storage handler manager:
(1) Remove Virtio9pHandler struct and StorageHandler implementation.
(2) Remove DRIVER_9P_TYPE and Virtio9pHandler from STORAGE_HANDLERS
  registration.
(3) Update watcher.rs comments to remove 9p references.
This completes the removal of virtio-9p support in the agent.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-07 23:15:39 +02:00
stevenhorsman
5390e470d3 agent: Remove Cargo.lock
Following on from #12690, the agent is part of the repo workspace,
so no longer needs a lock file.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-01 09:11:28 +01:00
Jiahao Wang
1163b6581f agent: Change TARGET_PATH to root workspace
After agent was moved to root workspace, the products are now under the
repo root. Change the TARGET_PATH accordingly to tell Makefile where to
lookup output.

Signed-off-by: Jiahao Wang <jiahao.wang@lingcage.com>
2026-03-29 06:35:54 +00:00
Jiahao Wang
29e5d5d951 build: Move agent to root workspace
This commit adds kata agent to the root workspace, as a follow up work
of #12413.

Remove agent from exclude list, and make it as a member of root
workspace.

Signed-off-by: Jiahao Wang <jiahao.wang@lingcage.com>
2026-03-29 06:35:38 +00:00
Fabiano Fidêncio
aa6890eae1 Merge pull request #12675 from manuelh-dev/mahuber/cdh-storage-options
agent: add mkfs_opts parameter to cdh_secure_mount
2026-03-23 15:18:38 +01:00
stevenhorsman
38a655487f vsock-exporter: Switch bincode for serde_json
bincode is not maintained, so switch to serde_json to
resolve RUSTSEC-2025-0141

Assisted-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:45:17 +00:00
stevenhorsman
e1d7d5bef8 agent: Remove async-std
It's a dev-dependency that doesn't seem to be used, so
remove it and resolve RUSTSEC-2025-0052

Assisted-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:45:17 +00:00
stevenhorsman
e4eda5e1d8 agent: Bump tracing-subscriber
- Bump tracing-subscriber to 0.3.20 to resolve RUSTSEC-2025-0055
- Switch deprecated `slog_info!` for `slog::info!`

Generated-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:45:17 +00:00
Manuel Huber
62d74bb1fd agent: add mkfs_opts parameter to cdh_secure_mount
Add an mkfs_opts parameter to cdh_secure_mount so that its users
can parametrize these options depending on their needs. For now,
there is two users providing explicit values (container image
layer storage and container data storage features).

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-18 15:37:32 -07:00
stevenhorsman
501578cc5a agent: Remove non-idiomatic unwrap
Calling .unwrap() after an .is_some() check is considered non-idiomatic in
as it performs redundant work and makes the code more verbose.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-17 16:04:58 +00:00
Manuel Huber
169f92ff09 agent: cdh: Update CDH and API
With the new CDH version, the secure_mount API changes.
Further, the new CDH version no longer uses the luks-encrypt-storage
script but utilizes libcryptsetup as well as mkfs.ext4 and dd. Hence, adapt
some of the CDH and Kata components build steps

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Aurélien Bombo
9fe03fb170 genpolicy: Support trusted ephemeral data storage
* Introduces a new cluster_config setting encrypted_emptydir defaulting to true.
 * Adapts genpolicy for encrypted emptyDirs.

Crucially, the rules.rego change checks that the mount and the storage are
well-formed together:

 * i_storage.source matches a known regex.
 * i_storage.mount_point == $(spath)/BASE64(i_storage.source)
 * i_storage.mount_point == p_storage.mount_point
 * i_storage.mount_point == i_mount.source

Note that policy enforcement is necessary to prevent rogue device injection.
E.g. the agent could not blindly encrypt all block devices as some use cases
only need dm-verity.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
eaa711617e agent: Support trusted ephemeral data storage
Handles block-based emptyDirs plugged via virtio-blk and virtio-scsi by
encrypting and formatting them.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
stevenhorsman
8177a440ca libs: Remove unused crates
Remove unused crates to reduce our size and the work needed
to do updates

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:46 +00:00
Alex Lyn
d298df7014 kata-types: Add cross-platform host_memory_mib() helper for host memory
Introduce host_memory_mib() with OS-specific implementations
(Linux/Android via nix::sysinfo,
macOS via sysctl) selected at compile time. This improves
portability and allows consistent host memory sizing/validation
across different platforms.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:26 +08:00
Alex Lyn
b3d60698af runtime-rs: move host memory adjustment into MemoryInfo using nix sysinfo
As the memory related information has been serialized at the sandbox
initalization specially at the moment of parsing configuration toml.

This commit aims to refactor MemoryInfo initialization logics:

(1) Remove memory sizing/host-memory adjustment logic from QEMU cmdline
  Memory::new()
(2) Initialize/adjust memory values via kata-types MemoryInfo (single
  source of truth)
(3) Replace sysinfo::System::new_with_specifics with
  nix::sys::sysinfo::sysinfo() to get host RAM

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 19:32:44 +08:00
Jacek Tomasiak
8025fa0457 agent: Don't pass empty options to mount
With some older kernels some fs implementations don't handle empty
options strings well. This leads to failures in "setup rootfs" step.
E.g. `cgroup: cgroup2: unknown option ""`.
This is fixed by mapping empty string to `None` before passing to
`nix::mount`.

Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com>
Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>
2026-02-16 14:55:59 +01:00
stevenhorsman
90dbd3f562 agent: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
ffcb10b6a3 agent: Bump time crate to 0.3.47
Update time to resolve CVE-2026-25727.
Note: this involved bumping the versions of slog-term and slog-json
and bumping the MSRV to 1.88.0 which time 0.3.47 requires.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
e49a61eea2 agent: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
Alex Lyn
ffb8a6a9c3 agent: fix misleading tokio::select! biased comment in do_read_stream
The previous comment incorrectly implied that `biased` prevents data
loss and the exit notifier would never be polled before all buffered
data is read. And the detailed info can be seen from the document:
https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67

Tokio's `biased` only makes polling order deterministic(top-to-bottom)
when multiple branches are ready in the same poll, and it makes fairness
the caller's responsibility. Output can still be truncated if the exit
notification becomes ready while `read_stream` is pending.

This change updates the comment to reflect the actual semantics and
caveats. No functional behavior change.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
1080f6d87e agent: Introduce drain after exit mechanism to address truncation race
Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode
occasionally lose the last segments of their output.

The root cause is a race condition where the `term_exit_notifier`
triggers before the pipe buffers are fully drained. In the previous
implementation, once the exit notification was received, the agent
immediately returned an EOF, causing the runtime's `run_io_copy` to
terminate and drop any residual data in the pipe.

This patch introduces a "drain after exit" mechanism:
- Upon receiving an exit notification, the agent enters a 500ms window
  for polling `read_streaim` to flush remaining data from the buffer.
- A true EOF is only returned if the stream is confirmed empty or the
  timeout is reached.

This ensures reliable output delivery for transient exec tasks under
high concurrency.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
700bddeecc agent: treat EOF as normal for read_stdout/stderr stream
Legacy IO uses shim polling via read_stdout/read_stderr. The agent
previously mapped pipe EOF (read() == 0) and term_exit_notifier to
errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures.
This caused runtime IO copy to abort early, leading to lost
stdout/stderr for short-lived exec (e.g."echo") and spurious failures.

Normalize EOF semantics: read_stream now returns Ok(empty) on EOF
instead of Err("read meet eof").

This makes legacy IO behave like a proper stream: data until EOF, no
INTERNAL errors for normal termination.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Zvonko Kaiser
7af306de13 agent: Update aarch64 create_pci_root_bus_path
aarch64 is also a supported architecture for NUMA.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 10:19:41 +01:00
Zvonko Kaiser
8185c015ad gpu: Add Agent NUMA Support 1 of N
We're introducing a root_complex to assign each
and every device to a NUMA node or to the default
root_complex="00" aka pcie.0. This patch introduces
the proper handling of the current  qom path being
bus/device == "00/02" with NUMAA we need to extend it
with the root_complex/bus/device == "10/00/02".

We're defaulting to root_complex="00".

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 10:19:41 +01:00
Qingyuan Hou
ca43a8cbb8 agent: remove redundant func comment
This comment was first introduced in e111093 with secure_join()
but then we forgot to remove it when we switched to the safe-path
lib in c0ceaf6

Signed-off-by: Qingyuan Hou <lenohou@gmail.com>
2026-01-27 03:07:57 +00:00
stevenhorsman
78824e0181 agent: Remove unnecessary unwrap
Switch `is_some()` and then `unwrap()` for `if let` which is
"more idiomatic"

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:53:40 +00:00
Greg Kurz
cf3441bd2c agent: Refresh Cargo.lock
Downstream builders at Red Hat complain that `Cargo.lock` doesn't match
`Cargo.toml`.

Run `cargo check` to refresh `Cargo.lock`.

`git bisect` shows that 7cfb97d41b is the first commit where
`cargo check` has an effect in `src/agent`.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-01-20 14:44:47 +01:00
Manuel Huber
183507beeb agent: change secure_storage_integrity default
Change the secure_storage_integrity option's default value to true.
With this, integrity protection for encrypted block device contents
will be requested from the confidential data hub by default, see the
agent's cdh_handler_trusted_storage function in rpc.rs.
This behavior can be disabled by explicitly setting the
agent.secure_storage_integrity parameter to 0 or false via kernel
command line parameters.

This will affect the trusted storage implementation for the guest-pull
mechanism, and it will affect future implementations using this code
path, such as implementations for ephemeral secure storage.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-10 16:54:03 +01:00
stevenhorsman
b07899f8dc agent: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:17 +00:00
stevenhorsman
2af88dbb48 agent: bump cdi-rs
In #12151 the version was bumped in cargo.toml, but the update not
done, so run `cargo update -p container-device-interface` to apply it

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-20 10:08:45 +00:00
stevenhorsman
0027f6cae0 agent: Fix dead_code warning
VirtioBlkCcwDeviceHandler and VirtioBlkCcwHandler
are only constructed on s390x, so add #[cfg(target_arch = "s390x")]
to all the code

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
9ec7109712 agent: Fix clippy::io_other_error issue
We can use the new Error::other options rather than
Error:new(Error:Kind:Other

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
34d299ae44 vsock-exporter: Fix clippy::io_other_error issue
We can use the new Error::other options rather than
Error:new(Error:Kind:Other

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
Fabiano Fidêncio
fb326b53df agent: Ensure MS_REMOUNT is respected
When updating ephemeral storages, MS_REMOUNT is explicitly passed as,
for instance, `/dev/shm` should be remounted after memory is hotplugged.

Till now Kata Containers has been explicitly ignoring such updates,
leading to the containers' `/dev/shm` having the size of "half of the
memory allocated, during the startup time", which goes against the
expected behaviour.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-12-16 15:11:34 +01:00
Adeet Phanse
db09912808 agent: add SandboxError enum for typed error handling
- Replace generic errors in sandbox operations with typed SandboxError variants (InvalidContainerId, InitProcessNotFound, InvalidExecId).
- This enables the kata shim to handle specific failure cases differently.

Fixes #12120

Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>
2025-12-12 12:33:18 -05:00
Zvonko Kaiser
9dfa6df2cb agent: Bump CDI-rs to latest
Latest version of container-device-interface is v0.1.1

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 22:57:50 +01:00
shwetha-s-poojary
4510e6b49e agent: fix the list_routes failure
relax list_routes tests so not every route requires a device

Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2025-11-25 20:25:46 -08:00
Dan Mihai
22d60a36c0 agent: allow disabling detect_initdata_device
Allow users to build the Kata Agent using INIT_DATA=no to disable the
detect_initdata_device() code loop and associated debug log output.

Future additional improvements related to Init Data are tracked by #11532.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-25 02:44:28 +00:00
dependabot[bot]
ede5ac9c2d build(deps): bump the bit-vec group across 2 directories with 1 update
Bumps the bit-vec group with 1 update in the /src/agent directory: [bit-vec](https://github.com/contain-rs/bit-vec).
Bumps the bit-vec group with 1 update in the /src/tools/agent-ctl directory: [bit-vec](https://github.com/contain-rs/bit-vec).


Updates `bit-vec` from 0.6.3 to 0.8.0
- [Changelog](https://github.com/contain-rs/bit-vec/blob/master/RELEASES.md)
- [Commits](https://github.com/contain-rs/bit-vec/commits)

Updates `bit-vec` from 0.6.3 to 0.8.0
- [Changelog](https://github.com/contain-rs/bit-vec/blob/master/RELEASES.md)
- [Commits](https://github.com/contain-rs/bit-vec/commits)

---
updated-dependencies:
- dependency-name: bit-vec
  dependency-version: 0.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bit-vec
- dependency-name: bit-vec
  dependency-version: 0.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bit-vec
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-19 10:43:25 +01:00