
18526 Commits

Author SHA1 Message Date
Alex Lyn
c745d18e00 agent: Add virtio-scsi for multilayer erofs storage handler
It aims to support the virtio-scsi driver for handling VMDK and rwlayer
storage in kata-agent.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
37a542c20f agent: Refactor multi-layer EROFS handling with unified flow
Refactor the multi-layer EROFS storage handling to improve code
maintainability and reduce duplication.

Key changes:
(1) Extract update_storage_device() to unify device state management
  for both multi-layer and standard storages
(2) Simplify handle_multi_layer_storage() to focus on device creation,
  returning MultiLayerProcessResult struct instead of managing state
(3) Unify the processing flow in add_storages() with clearer separation of concerns
(4) Support multiple EROFS lower layers with dynamic lower-N mount paths
(5) Improve mkdir directive handling with deferred {{ mount 1 }}
  resolution

This reduces code duplication, improves readability, and makes the
storage handling logic more consistent across different storage types.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
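The commit message does not spell out the mount layout. The minimal Rust sketch below illustrates item (4), assuming each EROFS lower layer is mounted at a `lower-N` directory under a common base and that layers are listed base layer first. `lower_mount_paths` and `overlay_options` are hypothetical helpers, not kata-agent functions. Because overlayfs treats the leftmost `lowerdir` entry as the topmost layer, the list is reversed before joining.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: one mount point per EROFS lower layer,
/// named `lower-0`, `lower-1`, ... under `base`.
fn lower_mount_paths(base: &Path, layers: usize) -> Vec<PathBuf> {
    (0..layers).map(|i| base.join(format!("lower-{}", i))).collect()
}

/// Hypothetical helper: build overlayfs mount data.
/// Assumes `lowers[0]` is the bottom (base) layer. Overlayfs stacks
/// `lowerdir` entries with the leftmost on top, so the order is reversed.
fn overlay_options(lowers: &[PathBuf], upper: &Path, work: &Path) -> String {
    let lowerdir = lowers
        .iter()
        .rev()
        .map(|p| p.display().to_string())
        .collect::<Vec<_>>()
        .join(":");
    format!(
        "lowerdir={},upperdir={},workdir={}",
        lowerdir,
        upper.display(),
        work.display()
    )
}
```

Deriving the mount points from the layer index keeps the number of layers dynamic, so a storage list with any count of EROFS layers maps onto a single overlay mount.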
Alex Lyn
27c59f15a0 agent: Register MultiLayerErofsHandler and process multiple EROFS
Introduce MultiLayerErofsHandler and the handle_multi_layer_storage
method for multi-layer storage:
(1) Register MultiLayerErofsHandler to STORAGE_HANDLERS to handle
multi-layer EROFS storage with driver type 'multi-layer-erofs'.
(2) Add the handle_multi_layer_erofs function to process multiple EROFS
storages carrying the X-kata.multi-layer marker together in the guest.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
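A simplified sketch of registering a handler by driver type, assuming a small trait and a plain `HashMap` registry. `StorageHandler`, `build_registry` and the `handle` signature are hypothetical stand-ins; the agent keeps its handlers in `STORAGE_HANDLERS`, whose exact types are not shown in this log.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced stand-in for the agent's storage handler trait.
trait StorageHandler {
    /// The storage `driver` string this handler is registered under.
    fn driver_type(&self) -> &'static str;
    /// Process all sources of one storage group; returns a summary here.
    fn handle(&self, sources: &[&str]) -> Result<String, String>;
}

struct MultiLayerErofsHandler;

impl StorageHandler for MultiLayerErofsHandler {
    fn driver_type(&self) -> &'static str {
        "multi-layer-erofs"
    }

    fn handle(&self, sources: &[&str]) -> Result<String, String> {
        if sources.is_empty() {
            return Err("no EROFS layers to process".to_string());
        }
        // A real handler would mount each layer and build the overlay here.
        Ok(format!("processed {} EROFS layers together", sources.len()))
    }
}

/// Hypothetical registry keyed by driver type.
fn build_registry() -> HashMap<&'static str, Box<dyn StorageHandler>> {
    let handlers: Vec<Box<dyn StorageHandler>> = vec![Box::new(MultiLayerErofsHandler)];
    handlers.into_iter().map(|h| (h.driver_type(), h)).collect()
}
```

Dispatching on the driver string keeps the multi-layer path separate from existing handlers: a storage whose driver is not `multi-layer-erofs` simply never reaches this handler.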
Alex Lyn
6ce9180333 agent: Add support for EROFS rootfs handling in kata-agent
Add multi_layer_erofs.rs implementing the guest-side processing logic
for multi-layer EROFS rootfs with overlay mount support.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
d8db044c63 runtime-rs: Add erofs rootfs handling logic in handler_rootfs
Add handling for multi-layer EROFS rootfs in the RootFsResource
handler_rootfs method so that multi-layer EROFS rootfs is handled
correctly.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
Alex Lyn
8d7051436a runtime-rs: Add support for erofs rootfs with multi-layer
Add erofs_rootfs.rs implementing ErofsMultiLayerRootfs for
multi-layer EROFS rootfs with VMDK descriptor generation.

This is the core implementation of EROFS rootfs support in the runtime.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-19 13:24:31 +02:00
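The commit does not show the descriptor it generates. As a rough sketch, a VMDK descriptor can expose several raw EROFS images as consecutive FLAT extents of one virtual disk, with each extent size given in 512-byte sectors. `Layer` and `vmdk_descriptor` are hypothetical, and the `createType` and `CID` values are assumptions; production descriptors commonly also carry `ddb.*` entries.

```rust
/// VMDK extent sizes are expressed in 512-byte sectors.
const SECTOR_SIZE: u64 = 512;

/// Hypothetical description of one EROFS layer image on the host.
struct Layer<'a> {
    path: &'a str,
    size_bytes: u64,
}

/// Build a minimal descriptor that concatenates the layers as FLAT extents.
fn vmdk_descriptor(layers: &[Layer<'_>]) -> Result<String, String> {
    if layers.is_empty() {
        return Err("at least one layer is required".to_string());
    }
    let mut extents = String::new();
    for layer in layers {
        // Each extent must cover a whole number of sectors.
        if layer.size_bytes == 0 || layer.size_bytes % SECTOR_SIZE != 0 {
            return Err(format!(
                "{}: size {} is not a positive multiple of {}",
                layer.path, layer.size_bytes, SECTOR_SIZE
            ));
        }
        // Format: <access> <size in sectors> FLAT "<file>" <offset in file>
        extents.push_str(&format!(
            "RW {} FLAT \"{}\" 0\n",
            layer.size_bytes / SECTOR_SIZE,
            layer.path
        ));
    }
    // createType and CID below are assumed values for illustration.
    let header = concat!(
        "# Disk DescriptorFile\n",
        "version=1\n",
        "CID=fffffffe\n",
        "parentCID=ffffffff\n",
        "createType=\"twoGbMaxExtentFlat\"\n",
        "\n",
        "# Extent description\n"
    );
    Ok(format!("{}{}", header, extents))
}
```

Because the extents are laid out back to back, the guest sees a single block device whose byte ranges correspond to the individual layer images.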
Alex Lyn
cb706219ae runtime-rs: Change Rootfs::get_storage return type
Change Rootfs::get_storage to return Option<Vec<Storage>>
to support multi-layer rootfs with multiple storages.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-18 22:46:33 +02:00
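A hedged sketch of what the wider return type allows: a single-device rootfs returns a one-element vector, a multi-layer rootfs returns one storage per layer, and callers flatten the results. `Storage` is reduced to two string fields here, and the `BlockRootfs`/`MultiLayerRootfs` types, driver strings and `collect_storages` helper are hypothetical.

```rust
/// Hypothetical, heavily reduced stand-in for the agent Storage type.
#[derive(Debug, Clone, PartialEq)]
struct Storage {
    driver: String,
    source: String,
}

trait Rootfs {
    /// `None` when no guest storage is needed; otherwise one or more storages.
    fn get_storage(&self) -> Option<Vec<Storage>>;
}

struct BlockRootfs {
    dev: String,
}

struct MultiLayerRootfs {
    layers: Vec<String>,
}

impl Rootfs for BlockRootfs {
    fn get_storage(&self) -> Option<Vec<Storage>> {
        // A single-device rootfs simply wraps its one storage in a Vec.
        Some(vec![Storage { driver: "blk".into(), source: self.dev.clone() }])
    }
}

impl Rootfs for MultiLayerRootfs {
    fn get_storage(&self) -> Option<Vec<Storage>> {
        if self.layers.is_empty() {
            return None;
        }
        // One storage per EROFS layer.
        Some(
            self.layers
                .iter()
                .map(|l| Storage { driver: "multi-layer-erofs".into(), source: l.clone() })
                .collect(),
        )
    }
}

/// Callers flatten the optional vectors into one storage list.
fn collect_storages(rootfs: &[&dyn Rootfs]) -> Vec<Storage> {
    rootfs.iter().filter_map(|r| r.get_storage()).flatten().collect()
}
```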
Alex Lyn
c06bc388c2 runtime-rs: Add format argument to hotplug_block_device method
Add a format argument to hotplug_block_device so that different block
formats can be specified flexibly.
With this, several formats can be supported: currently raw and vmdk, with
other formats to be supported in the future.

Besides the formats themselves, corresponding handling logic is also
required to set the options each format needs in QMP blockdev-add.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-18 22:46:33 +02:00
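To illustrate why each format needs its own handling in QMP `blockdev-add`, the sketch below builds the command arguments with the format driver layered over a `file` protocol node. The function name, node name and path are hypothetical; only the `raw` and `vmdk` formats named above are accepted, and the path is assumed to need no JSON escaping.

```rust
/// Hypothetical helper: JSON arguments for QMP `blockdev-add`, stacking a
/// format node (`raw` or `vmdk`) on top of a `file` protocol node.
/// Assumes `path` contains no characters that need JSON escaping.
fn blockdev_add_args(
    node_name: &str,
    path: &str,
    fmt_driver: &str,
    read_only: bool,
) -> Result<String, String> {
    // Only the formats mentioned in the commit message are accepted here.
    if !matches!(fmt_driver, "raw" | "vmdk") {
        return Err(format!("unsupported block format: {}", fmt_driver));
    }
    Ok(format!(
        r#"{{"driver":"{drv}","node-name":"{node}","read-only":{ro},"file":{{"driver":"file","filename":"{path}"}}}}"#,
        drv = fmt_driver,
        node = node_name,
        ro = read_only,
        path = path
    ))
}
```

For `vmdk`, the `file` node points at the descriptor, and QEMU resolves the extent files it references; format-specific options would be added next to `driver` in the outer object.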
Alex Lyn
15740439eb runtime-rs: Add BlockDeviceFormat enum to support more block formats
In practice, we need more kinds of block formats than just `Raw`.

This commit adds a BlockDeviceFormat enum to support different block device
formats, such as RAW and VMDK. It also makes the follow-up changes needed for
this to work, including a format field in BlockConfig.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-18 19:00:44 +02:00
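A minimal sketch of such an enum with string parsing and a `BlockConfig` field. Defaulting to `Raw` is an assumption (it keeps the previous behavior for configs that set no format); `path_on_host` and the parsing rules are hypothetical and not necessarily those of runtime-rs.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum BlockDeviceFormat {
    /// Assumed default so existing raw-only configs behave as before.
    #[default]
    Raw,
    Vmdk,
}

impl FromStr for BlockDeviceFormat {
    type Err = String;

    /// Case-insensitive parsing of the format name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "raw" => Ok(Self::Raw),
            "vmdk" => Ok(Self::Vmdk),
            other => Err(format!("unknown block format: {}", other)),
        }
    }
}

impl fmt::Display for BlockDeviceFormat {
    /// Lower-case names, matching QEMU's block driver names.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Self::Raw => "raw",
            Self::Vmdk => "vmdk",
        };
        f.write_str(name)
    }
}

/// Hypothetical, reduced BlockConfig carrying the new format field.
#[derive(Debug, Default)]
struct BlockConfig {
    path_on_host: String,
    format: BlockDeviceFormat,
}
```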
Alex Lyn
8ed4fa1406 runtime-rs: Add RUNTIME_ALLOW_MOUNTS to RuntimeInfo
Add RUNTIME_ALLOW_MOUNTS annotation to RuntimeInfo to specify
custom mount types allowed by the runtime.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-18 19:00:44 +02:00
Fabiano Fidêncio
614cd0618e Merge pull request #12841 from kata-containers/topic/arm-add-qemu-coco-dev
runtime-rs: arm64: ci: Enable qemu-coco-dev tests
2026-04-18 12:22:58 +02:00
Fabiano Fidêncio
edfaeec316 tests: arm64: Skip tests which do not have a multi-arch image
The image used has some special (as in weird) properties that are being
taken advantage of to implement policy-related tests.

Changing the image is a no-go at this point, otherwise we break the
tests ... so let's just skip those for now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
d04bb98e09 runtime-rs: Increase reconnect_timeout_ms for confidential VMs
The Go runtime's CoCo dev config uses dial_timeout = 45s, but all
runtime-rs confidential VM configs had reconnect_timeout_ms set to
3000ms (3s) or 5000ms (SE). This is too short for confidential VMs,
especially on arm64 where UEFI firmware (AAVMF) adds significant
boot time on top of the measured boot process, causing ECONNRESET
errors on the vsock connection before the agent is ready.

Bump reconnect_timeout_ms to 45000ms across all confidential VM
configs (coco-dev, SNP, TDX, SE) to match the Go runtime.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
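The effect of `reconnect_timeout_ms` can be pictured as a retry loop bounded by a total deadline: a slow-booting confidential guest simply gets more attempts before the runtime gives up. This is a blocking sketch with a hypothetical `connect_with_deadline` helper; runtime-rs itself is asynchronous and its actual retry logic may differ.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `connect` until it succeeds or `timeout`
/// elapses, sleeping `interval` between attempts. On timeout, the last
/// error is returned.
fn connect_with_deadline<T, E>(
    timeout: Duration,
    interval: Duration,
    mut connect: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            Err(err) => {
                let now = Instant::now();
                if now >= deadline {
                    return Err(err);
                }
                // Never sleep past the deadline.
                thread::sleep(interval.min(deadline - now));
            }
        }
    }
}
```

With the new setting, a caller would pass `Duration::from_millis(45_000)` as the timeout, so transient ECONNRESET errors during a long measured boot are retried instead of failing the sandbox.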
Fabiano Fidêncio
35e48fdfd1 ci: run qemu-coco-dev-runtime-rs tests on arm64
Add qemu-coco-dev-runtime-rs to the arm64 k8s test matrix so that the
CoCo non-TEE configuration is exercised on aarch64 runners.

Also enable auto-generated policy for qemu-coco-dev on aarch64 (matching
the existing x86_64 behavior) and register the new job as a required
gatekeeper check.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
588a67a3fb kata-deploy: add arm64 support for qemu-coco-dev shims
Add aarch64/arm64 to the list of supported architectures for
qemu-coco-dev and qemu-coco-dev-runtime-rs shims across kata-deploy
configuration, Helm chart values, and test helper scripts.

Note that guest-components and the related build dependencies are not
yet wired for arm64 in these configurations; those will be addressed
separately.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
861f15cdc4 build: add arm64 coco-dev build dependencies
Build coco-guest-components, pause-image, and rootfs-image-confidential
for arm64, which are required by qemu-coco-dev-runtime-rs.

Enable MEASURED_ROOTFS on the arm64 shim-v2 build, add the aarch64 case
to install_kernel() so the default kernel is built as a unified kernel
(with confidential guest support, like x86_64), and adjust the kernel
install naming so only CCA builds get the -confidential suffix.

Also wire rootfs-image-confidential-tarball into the aarch64 local-build
Makefile.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
e1f8b8e8b4 build: add arm64 tools build (genpolicy only)
The arm64 build workflow was missing the tools build entirely.
Add build-tools-asset and create-kata-tools-tarball jobs mirroring
the amd64 workflow so that genpolicy and the other tools are
available for coco-dev tests that need auto-generated policy.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:02 +02:00
Fabiano Fidêncio
0ee556a40a Merge pull request #12874 from fidencio/topic/nydus-update-to-v0.15.15
versions: Update nydus-snapshotter to v0.15.15
2026-04-17 22:21:34 +02:00
Saul Paredes
6f6e45522e Merge pull request #11562 from Apokleos/clh-initdata
runtime-rs: Add CoCo/protected device for initdata within runtime-rs/Cloud Hypervisor
2026-04-17 11:09:19 -07:00
Fabiano Fidêncio
690f5a2b62 Merge pull request #12862 from fidencio/topic/runtime-rs-enable-measured-rootfs-tests
runtime-rs: enable measured rootfs for qemu-coco-dev-runtime-rs
2026-04-17 18:48:47 +02:00
Fabiano Fidêncio
3512241cbb versions: Update nydus-snapshotter to v0.15.15
The release brings in CVE and security fixes for nydus-snapshotter dependencies.
See: https://github.com/containerd/nydus-snapshotter/releases/tag/v0.15.15

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-17 18:04:59 +02:00
Fabiano Fidêncio
1ec0e344e5 runtime-rs: enable measured rootfs for qemu-coco-dev-runtime-rs
Add kernel_verity_params to the qemu-coco-dev-runtime-rs configuration
so the runtime can assemble dm-verity kernel parameters, and remove the
test skip that was disabling measured rootfs tests for this hypervisor.

Fixes: #12851

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-17 15:22:17 +02:00
Fabiano Fidêncio
fd8973d1c0 Merge pull request #11826 from squarti/termination-logs
agent: termination logs for share_fs=none
2026-04-17 15:16:14 +02:00
Fabiano Fidêncio
7205fd8579 tests: add integration tests for termination log via GetDiagnosticData
Add BATS tests for the GetDiagnosticData termination log feature on
CoCo platforms where shared_fs=none.

Three test cases cover:
- Successful exit (exit 0): termination message is propagated when
  GetDiagnosticDataRequest is allowed by policy.
- Failed exit (exit 1): termination message is propagated when
  GetDiagnosticDataRequest is allowed by policy.
- Policy denied: with default CoCo policy (GetDiagnosticDataRequest
  is false), the container stops cleanly but no termination message
  is propagated (best-effort behavior).

Tests are skipped on non-CoCo platforms where shared_fs is not "none".

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-17 13:16:25 +02:00
Fabiano Fidêncio
eda3bc6190 runtime-rs: wire GetDiagnosticData for termination logs
Add runtime-rs support for the GetDiagnosticData RPC. This extends
the Agent trait, types, and protocol translation layer with the new
request/response types.

During container stop, when shared_fs is "none" and the
terminationMessagePolicy annotation is "File", the runtime copies
the termination log from the guest via GetDiagnosticData. The call
is best-effort to avoid blocking container teardown.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-17 13:16:25 +02:00
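A sketch of the best-effort condition described above, with hypothetical names: the fetch happens only when `shared_fs` is `none` and the policy is `File`, and any error is recorded as a warning instead of being returned, so teardown proceeds either way. Collecting warnings in a `Vec<String>` stands in for real logging.

```rust
/// Hypothetical inputs available on the container stop path.
struct StopContext<'a> {
    shared_fs: &'a str,
    termination_message_policy: Option<&'a str>,
}

/// True when the termination log must be fetched from the guest.
fn should_fetch_termination_log(ctx: &StopContext<'_>) -> bool {
    ctx.shared_fs == "none" && ctx.termination_message_policy == Some("File")
}

/// Best-effort copy: `fetch` stands in for the GetDiagnosticData call.
/// Errors become warnings and are never propagated to the caller.
fn copy_termination_log<F>(
    ctx: &StopContext<'_>,
    fetch: F,
    warnings: &mut Vec<String>,
) -> Option<Vec<u8>>
where
    F: FnOnce() -> Result<Vec<u8>, String>,
{
    if !should_fetch_termination_log(ctx) {
        return None;
    }
    match fetch() {
        Ok(data) => Some(data),
        Err(e) => {
            warnings.push(format!("GetDiagnosticData failed: {}", e));
            None
        }
    }
}
```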
Fabiano Fidêncio
411f8cf583 genpolicy: policy-gate GetDiagnosticDataRequest
Add policy rules for the new GetDiagnosticDataRequest RPC.
The request is denied by default in genpolicy-generated policies,
ensuring CoCo workloads do not expose diagnostic data unless
explicitly opted in via policy_data.request_defaults.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2026-04-17 13:16:25 +02:00
Fabiano Fidêncio
64c139208f agent: add GetDiagnosticData RPC with termination log support
Add a new extensible GetDiagnosticData RPC that retrieves diagnostic
information from the guest VM. The request carries a log_type string
field to specify what kind of data is requested, and a container_id
field to identify the target container.

The first supported log_type is "termination_log", which reads the
Kubernetes termination message file from inside the guest. This is
needed for shared_fs=none configurations where the host cannot
directly access the guest filesystem.

On the Go runtime side, the container stop() path now calls
GetDiagnosticData to copy the termination message to the host
when running with NoSharedFS and the terminationMessagePolicy
annotation is set to "File". The call is best-effort: failures
are logged as warnings rather than blocking container teardown.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2026-04-17 13:01:13 +02:00
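A sketch of the agent-side dispatch on `log_type`, assuming the container's termination message path has already been resolved from `container_id` (Kubernetes defaults this path to `/dev/termination-log`). `get_diagnostic_data` is a hypothetical plain function, not the agent's ttrpc handler; unknown log types are rejected, so new ones can be added later as further match arms.

```rust
use std::fs;
use std::path::Path;

/// The only log type named in the commit message.
const TERMINATION_LOG: &str = "termination_log";

/// Hypothetical dispatcher: `termination_path` is the container's
/// termination message file inside the guest.
fn get_diagnostic_data(log_type: &str, termination_path: &Path) -> Result<Vec<u8>, String> {
    match log_type {
        TERMINATION_LOG => fs::read(termination_path)
            .map_err(|e| format!("reading {}: {}", termination_path.display(), e)),
        other => Err(format!("unsupported log_type: {}", other)),
    }
}
```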
Steve Horsman
1db12f8ccf Merge pull request #12812 from stevenhorsman/tee-test-refactor
ci: Refactor confidential TEE support
2026-04-17 11:12:13 +01:00
Steve Horsman
e4b3ba56dd Merge pull request #12855 from stevenhorsman/increase-stale-issues-frequency
ci: increase stale issues workflow frequency
2026-04-17 08:37:20 +01:00
stevenhorsman
1dc57c6cef ci: increase stale issues workflow frequency
Update the stale issues workflow to run more frequently:
- Weekdays: Every 4 hours (6x per day) at 00:00, 06:00, 12:00, 18:00 UTC
- Weekends: Every hour (24x per day)

It previously ran once daily at midnight UTC. This change reduces the time
it will take us to get through our backlog. The increase is largest at the
weekend, when less other CI should be running that the workflow could
affect through GitHub API rate limiting.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 20:50:38 +01:00
Fabiano Fidêncio
d9128a58d9 Merge pull request #11611 from Xynnn007/docs-typo
docs: fix nerdctl guest image command
2026-04-16 15:36:37 +02:00
Fabiano Fidêncio
57ce3a1347 Merge pull request #11364 from kata-containers/dependabot/github_actions/tim-actions/wip-check-1.1.0
build(deps): bump tim-actions/w.i.p.-check from 1.0.0 to 1.1.0
2026-04-16 14:11:12 +02:00
Fabiano Fidêncio
78a8133112 Merge pull request #12242 from stevenhorsman/msrv-current-thoughts
doc: Add MSRV comments to toolchain guidance
2026-04-16 14:09:30 +02:00
Fabiano Fidêncio
88ce64819d Merge pull request #12726 from LandonTClipp/doc_annotations
docs: Add annotation config to doc site
2026-04-16 13:07:53 +02:00
stevenhorsman
05430d5690 doc: Add MSRV comments to toolchain guidance
Add some extra clarification about our current position on
MSRV.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 12:06:46 +01:00
Fabiano Fidêncio
beb06573fa Merge pull request #12790 from kata-containers/dependabot/cargo/src/tools/kata-ctl/tracing-0d2b5df27c
build(deps): bump tracing from 0.1.41 to 0.1.44 in /src/tools/kata-ctl in the tracing group across 1 directory
2026-04-16 12:52:05 +02:00
dependabot[bot]
c044403409 build(deps): bump tim-actions/wip-check from 1.0.0 to 1.1.0
Bumps [tim-actions/wip-check](https://github.com/tim-actions/wip-check) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/tim-actions/wip-check/releases)
- [Commits](1c2a1ca6c1...8c84f59872)

---
updated-dependencies:
- dependency-name: tim-actions/wip-check
  dependency-version: 1.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-16 10:48:41 +00:00
Xynnn007
1d806e0cfa docs: fix nerdctl guest image command
The image name is delivered via an annotation rather than a label in
nerdctl >= 2.0.

See the release note
https://github.com/containerd/nerdctl/releases/tag/v2.0.0

and PR
https://github.com/containerd/nerdctl/pull/2906

With an older version of nerdctl (< 2.0), --label still works.

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2026-04-16 11:34:03 +02:00
stevenhorsman
ff246f9538 ci: Remove deploy_snapshotter
Snapshotter deployment is a no-op now that
kata-deploy handles this, so clean up this code.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00
stevenhorsman
fce6415865 tests: Use hypervisor helpers
Utilise the new hypervisor helpers in our CI and test
code to add clarity and reduce duplication.

Note: `kubernetes_dir` is declared as readonly in
tests/integration/kubernetes/setup.sh, which is sourced
by tests_common.sh, so we update it to be set only if it
is unset.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00
stevenhorsman
2f3fec9727 tests: Add new hypervisor helper script
Add a pure shell script that the CI and integration tests can
use to check for different categories of runtime.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00
Alex Lyn
c546b3c585 Merge pull request #12843 from microsoft/saul/build-opt
runtime-rs: add build optimization flags
2026-04-16 09:05:20 +08:00
Dan Mihai
c967b45996 Merge pull request #12838 from kata-containers/sprt/new-az-region
ci: Change Azure region to eastus2
2026-04-15 16:08:21 -07:00
Aurélien Bombo
1602e04b2d ci: Change Azure region to eastus2
I'm doing some bookkeeping in the Azure subscription that requires we move
from eastus to eastus2. This should have no user-facing impact.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-15 14:37:13 -05:00
Fabiano Fidêncio
19441e5515 Merge pull request #12844 from Apokleos/fix-warning
runtime-rs: Fix unformatted code in runtime-rs
2026-04-15 17:35:03 +02:00
Fabiano Fidêncio
d2fb22edbe Merge pull request #12847 from fidencio/topic/ci-adjust-timeout-for-k8s-tests
ci: k8s: Adjust timeout on free runners
2026-04-15 17:30:51 +02:00
Fabiano Fidêncio
8d6f1d6f34 ci: k8s: Adjust timeout on free runners
I've seen several cases of the CLH tests being killed by the 60-minute
timeout. Let's bump it to 75 minutes and see how it goes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-15 17:09:30 +02:00
dependabot[bot]
bbb037e025 build(deps): bump the tracing group across 1 directory with 1 update
Bumps the tracing group with 1 update in the /src/tools/kata-ctl directory: [tracing](https://github.com/tokio-rs/tracing).


Updates `tracing` from 0.1.41 to 0.1.44
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.41...tracing-0.1.44)

---
updated-dependencies:
- dependency-name: tracing
  dependency-version: 0.1.44
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: tracing
- dependency-name: tracing
  dependency-version: 0.1.44
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: tracing
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 15:06:48 +00:00
LandonTClipp
fd896e4e76 ci: Add kata-dictionary.txt to required_tests.yaml
With this change, edits to the kata-dictionary.txt file trigger only the
static checks.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-04-15 14:48:01 +01:00
LandonTClipp
56cdfa831f docs: Add annotation config to doc site
Add the pod annotation config to the doc site. A symlink is created
at docs/pod-annotations.md that points to
how-to/how-to-set-sandbox-config-kata.md so that this file is served
at the `/pod-annotations` URL. Also add brief contributing guidelines and
how-tos for running the documentation site locally for previews.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-04-15 14:48:01 +01:00