The erofs snapshotter configuration is node-wide (a single containerd
drop-in) and cannot be split per runtime handler. The Go runtime does
not support fsmerged EROFS — it rejects fsmeta.erofs mount sources with
"unsupported mount source" — so erofs is only usable with runtime-rs.
Drop qemu-coco-dev (Go) from the erofs CI matrix and add a check in
kata-deploy's configure_erofs_snapshotter() that inspects the
SNAPSHOTTER_HANDLER_MAPPING: if any Go shim is explicitly mapped to
erofs, emit a prominent warning and bail out with a clear error telling
the operator to fix the mapping.
Since all shims are now guaranteed to be runtime-rs when erofs is
active, remove the conditional is_rust_shim gating and always emit the
full erofs configuration (differ options, default_size,
max_unmerged_layers=1).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
To ensure we're using the latest released version of the project, as I
think we're missing patches on v2.2.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add qemu-coco-dev-runtime-rs to the arm64 k8s test matrix so that the
CoCo non-TEE configuration is exercised on aarch64 runners.
Also enable auto-generated policy for qemu-coco-dev on aarch64 (matching
the existing x86_64 behavior) and register the new job as a required
gatekeeper check.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Build coco-guest-components, pause-image, and rootfs-image-confidential
for arm64, which are required by qemu-coco-dev-runtime-rs.
Enable MEASURED_ROOTFS on the arm64 shim-v2 build, add the aarch64 case
to install_kernel() so the default kernel is built as a unified kernel
(with confidential guest support, like x86_64), and adjust the kernel
install naming so only CCA builds get the -confidential suffix.
Also wire rootfs-image-confidential-tarball into the aarch64 local-build
Makefile.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The arm64 build workflow was missing the tools build entirely.
Add build-tools-asset and create-kata-tools-tarball jobs mirroring
the amd64 workflow so that genpolicy and the other tools are
available for coco-dev tests that need auto-generated policy.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Update the stale issues workflow to run more frequently:
- Weekdays: Every 4 hours (6x per day) at 00:00, 06:00, 12:00, 18:00 UTC
- Weekends: Every hour (24x per day)
Previously ran once daily at midnight UTC. This change reduces the time
it will take for us to get through our backlog, particularly increasing
the runs at the weekend, when we should have less other CI running,
which it could impact due to GH API rate limiting.
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I've seen several cases of the CLH tests just being killed due to the 60
minutes timeout. Let's bump it to 75 and see how it goes.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
skopeo copy with --override-arch fails with "authentication required"
during blob existence checks at the destination, regardless of how
credentials are provided (--dest-creds, --authfile, REGISTRY_AUTH_FILE).
This is a known issue with skopeo 1.13.x when copying from manifest
list sources.
Replace the skopeo/buildah approach with docker/build-push-action,
which is already proven in this repo (build-kubectl-image.yaml) and
handles multi-arch builds and Quay pushes reliably. The workflow now
builds a trivial FROM busybox image using buildx with QEMU emulation.
Fixes: b0abe5999 ("workflows: Add workflow to create auth registry test image")
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
At a rate of default 30 per run, with over 1.5k issues, it will take
us over 50 days to do a pass of the issues we have, so increase
operations-per-run as suggested in the workflow by github to
reduce this. Based on the stats of the latest run, we are not too
close to hitting the API rate limit:
```
Github API rate used: 32
Github API rate remaining: 3693; reset at: Thu Apr 09 2026 10:23:31 GMT+0000 (Coordinated Universal Time)
```
so I think this should be okay.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
v9 is based on Node.js 20 which is deprecated, so update to the
latest to pick up a Node.js 24 version before Github removes Node 20
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add a manually-triggered workflow that builds and pushes a multi-arch
busybox-based image to quay.io/kata-containers/confidential-containers-auth
for use as an authenticated container image in CI tests.
The workflow uses skopeo to copy per-arch images and buildah to create
and push the multi-arch manifest.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full
test suite on every PR consumes these resources unnecessarily, since
most tests exercises what is already exercised by the -coco-dev CIs.
Introduce a `tee-test-scope` workflow input (small/full) and a new
`baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests
that are TEE-relevant: attestation tests (encrypted/authenticated/
signed image pull, confidential attestation) plus policy and trusted
ephemeral data storage tests.
PR runs default to "small" (12 tests), nightly runs use "full" (59
tests), and manual dispatch offers a dropdown to choose.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The cargo deny generated action doesn't seem to work
and seems unnecessarily complex, so try using
EmbarkStudios/cargo-deny-action instead
Fixes: #11218
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As we're in the process to stabilise runtime-rs for the coming 4.0.0
release, we better start running as many tests as possible with that.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Resolve externals.nydus-snapshotter version and url in the Docker image build
with yq from the repo-root versions.yaml instead of Dockerfile ARG defaults.
Drop the redundant workflow that only enforced parity between those two sources.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).
Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.
Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
field index, fixing parsing with optional mount tags (shared:,
master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start
Fixes: #9340
Signed-off-by: llink5 <llink5@users.noreply.github.com>
The standard stale/action is intended to be run regularly with
a date offset, but we want to have one we can run against a specific
date in order to run the stale bot against issues created since a particular
release milestone, so calculate the offset in one step and use it in the next.
At the moment we want to run this to stale issues before 9th October 2022 when Kata 3.0 was release, so default to this.
Note the stale action only processes a few issues at a time to avoid rate limiting, so why we want a cron job to it can get through
the backlog, but also to stale/unstale issues that are commented on.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update the action to resolve the following warning in GHA:
> Node.js 20 actions are deprecated. The following actions are running
> on Node.js 20 and may not work as expected:
> actions/checkout@11bd71901b.
> Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fixed a bug with the debug kernel build where common/ was repeated
after the common path variable, resulting in the debug
confs never being picked up.
This exposed a subsequent bug where the debug conf
was included in other builds, this is also fixed by creating a
separate directory for debug confs with one file at the moment,
debug.conf that contains debug configurations and bpf specific
configs.
To enable kernel builds (specifically for bpf) the dwarves package was added
to the kernel dockerfile for the pahole package.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Add the debug kernel to the kata tarball alongside the other kernels.
Also update the kernel README documentation to describe the new debug
kernel build process.
Signed-off-by: Agam Dua <agam_dua@apple.com>
This supersedes https://github.com/kata-containers/kata-containers/pull/12622.
I replaced Zensical with mkdocs-materialx. Materialx is a fork of mkdocs-material
created after mkdocs-material was put into maintenance mode. We'll use this
platform until Zensical is more feature complete.
Added a few of the existing docs into the site to make a more user-friendly flow.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
For kata tarballs we eventually release to GitHub, check their
size against the GitHub size limit. With this, we fail in case of
an ongoing release process in 'CI | Publish Kata Containers payload'
instead of only later on in the 'Release Kata Containers' action,
and we fail during PR builds, avoiding this situation at all.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The old hunspell based spell-check was causing contributors
challenges and proving a barrier to doc updates. We've replaced
it with a cspell based-solution, so clean up the old approach.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add a separate spellcheck workflow, so we can replace
the complex hunspell approach embedded in static-checks
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As the NVIDIA stack has shifted to using an image for both the
confidential and non-confidential variants, we retire the initrd
build.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>