Commit Graph

1133 Commits

Author SHA1 Message Date
Fabiano Fidêncio
19c194aa94 ci: Add runtime-rs GPU shims to NVIDIA GPU CI workflow
Add qemu-nvidia-gpu-runtime-rs and qemu-nvidia-gpu-snp-runtime-rs to
the NVIDIA GPU test matrix so CI covers the new runtime-rs shims.

Introduce a `coco` boolean field in each matrix entry and use it for
all CoCo-related conditionals (KBS, snapshotter, KBS deploy/cleanup
steps). This replaces fragile name-string comparisons that were already
broken for the runtime-rs variants: `nvidia-gpu (runtime-rs)` was
incorrectly getting KBS steps, and `nvidia-gpu-snp (runtime-rs)` was
not getting the right env vars.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-07 10:33:26 +02:00
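To illustrate why a per-entry boolean holds up better than matching on display names, here is a minimal Python sketch. The two `(runtime-rs)` names come from the commit message above; the other two entry names, the `coco` values, and the `needs_kbs_by_name` heuristic are assumptions made for illustration, since the real workflow expressions are not shown in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixEntry:
    name: str
    coco: bool  # explicit flag, as introduced by the commit


# Hypothetical matrix; the coco values are assumed for illustration.
MATRIX = [
    MatrixEntry("nvidia-gpu", coco=False),
    MatrixEntry("nvidia-gpu-snp", coco=True),
    MatrixEntry("nvidia-gpu (runtime-rs)", coco=False),
    MatrixEntry("nvidia-gpu-snp (runtime-rs)", coco=True),
]


def needs_kbs_by_name(entry: MatrixEntry) -> bool:
    # Hypothetical fragile check: everything except the one known
    # non-CoCo name is treated as CoCo.
    return entry.name != "nvidia-gpu"


def needs_kbs(entry: MatrixEntry) -> bool:
    # Robust check: read the flag instead of parsing the name.
    return entry.coco


def misclassified(matrix: list[MatrixEntry]) -> list[str]:
    # Entries where the name-based heuristic disagrees with the flag.
    return [e.name for e in matrix if needs_kbs_by_name(e) != needs_kbs(e)]
```

With this assumed heuristic, `misclassified(MATRIX)` reports the non-CoCo runtime-rs entry, which is the kind of breakage a new matrix name can introduce silently.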
Fabiano Fidêncio
acfb9f9762 Merge pull request #12954 from zvonkok/modular-makefile
build: remove gha-adjust-to-use-prebuilt-components.sh
2026-05-07 10:32:32 +02:00
Greg Kurz
c18932b5ab build-checks: Remove make vendor
The `generate_vendor.sh` script already knows how to create a tarball
with all the rust and go vendored code within the repo. It is used by
the release workflow to provide vendored code to downstream consumers
that might need it.

There isn't any vendored code in the repo anymore.

It thus doesn't seem very useful to run `make vendor` in CI.

Stop doing it.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:49:50 +02:00
Greg Kurz
1c1945f997 ci: Add go mod tidy check to static checks
Ensures go.mod and go.sum files are kept up-to-date on PRs that modify
Go code, go modules, or the Go version in versions.yaml.

The workflow can also be run directly from the GitHub UI, in order
to check the tidiness of the target branch.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:31:57 +02:00
Fabiano Fidêncio
6033f25e0e Merge pull request #12965 from kata-containers/dependabot/github_actions/actions/checkout-6.0.2
build(deps): bump actions/checkout from 4.2.2 to 6.0.2
2026-05-04 19:37:54 +02:00
Fabiano Fidêncio
a3d6829ed4 Merge pull request #12964 from kata-containers/dependabot/github_actions/editorconfig-checker/action-editorconfig-checker-2.2.0
build(deps): bump editorconfig-checker/action-editorconfig-checker from 2.1.0 to 2.2.0
2026-05-04 19:37:42 +02:00
Fabiano Fidêncio
7c61c55011 Merge pull request #12966 from kata-containers/dependabot/github_actions/streetsidesoftware/cspell-action-8.4.0
build(deps): bump streetsidesoftware/cspell-action from 8.3.0 to 8.4.0
2026-05-04 19:37:28 +02:00
dependabot[bot]
b2931c6759 build(deps): bump actions/checkout from 4.2.2 to 6.0.2
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2 to 6.0.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.2.2...de0fac2e4500dabe0009e67214ff5f5447ce83dd)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-04 17:05:59 +00:00
Fabiano Fidêncio
3d43259463 Merge pull request #12974 from fidencio/topic/ci-tdx-nightly-run-with-runtime-rs
ci: tdx: Remove ITA key usage and run qemu-tdx-runtime-rs on nightly
2026-05-04 19:04:03 +02:00
Fabiano Fidêncio
8655d87892 ci: nvidia: Disable NVRC trace logging on nightly runs
On nightly CI, run the NVIDIA GPU tests without setting nvrc.log=trace.
This gives us end-to-end test coverage that more closely matches how
users would actually run Kata Containers with NVIDIA GPUs, since trace
logging is not enabled by default in production.

NVRC trace logging remains enabled for PR runs, where the extra
verbosity is useful for debugging failures.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 18:13:07 +02:00
Fabiano Fidêncio
51d5f2ea7b ci: Run runtime-rs tests for TDX on nightly
As we're in the process of stabilising runtime-rs for the coming 4.0.0
release, we had better start running as many tests as possible with it.

The TDX runtime-rs job is gated to nightly runs only (pr-number ==
"nightly") since we only have a single TDX machine and cannot afford
to run both qemu-tdx and qemu-tdx-runtime-rs on every PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 18:05:58 +02:00
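The nightly gating described above can be sketched in Python as follows. This is only a model of the condition (`pr-number == "nightly"`); the `NIGHTLY_ONLY` set and the `should_run` helper are hypothetical names, not part of the workflow.

```python
# Jobs that only run when the workflow is invoked for the nightly build.
# The set contents mirror the commit message; the name is hypothetical.
NIGHTLY_ONLY = {"qemu-tdx-runtime-rs"}


def should_run(job: str, pr_number: str) -> bool:
    # Nightly-only jobs are skipped for regular PR numbers; every other
    # job runs regardless of how the workflow was triggered.
    if job in NIGHTLY_ONLY:
        return pr_number == "nightly"
    return True
```

Under this model only one of the two TDX jobs competes for the single machine on a PR run, and both run on nightly.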
Fabiano Fidêncio
8c3c7aa871 ci: Drop ITA_KEY usage from CI workflows
The ITA_KEY secret was conditionally passed to TDX jobs for Intel
Trust Authority attestation, but it is no longer needed. Remove it
from all workflow files and the test helper export.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 18:05:51 +02:00
stevenhorsman
9715a7cca2 release: fix release workflow concurrency deadlock
Architecture-specific release workflows were using the same concurrency
group when called from release.yaml, causing GitHub Actions to detect
a deadlock and cancel the builds.

Fix by appending architecture suffix to each workflow's concurrency
group, allowing parallel execution without conflicts.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-02 20:13:17 +01:00
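A small Python sketch of the collision, assuming each architecture-specific workflow derives its concurrency group from values it shares with the caller (workflow name and ref). The architecture list and the group format are illustrative assumptions, not the actual expressions used in the workflows.

```python
from collections import Counter

# Illustrative architecture list (assumption).
ARCHES = ["amd64", "arm64", "s390x", "ppc64le"]


def group_without_suffix(workflow: str, ref: str) -> str:
    # Every called workflow computes the same string from shared values.
    return f"{workflow}-{ref}"


def group_with_suffix(workflow: str, ref: str, arch: str) -> str:
    # Appending the architecture makes each group unique.
    return f"{workflow}-{ref}-{arch}"


def colliding_groups(groups: list[str]) -> list[str]:
    # Group names claimed by more than one workflow at the same time.
    return sorted(g for g, n in Counter(groups).items() if n > 1)


before = [group_without_suffix("release", "refs/heads/main") for _ in ARCHES]
after = [group_with_suffix("release", "refs/heads/main", a) for a in ARCHES]
```

In this model `colliding_groups(before)` contains the single shared group, while `colliding_groups(after)` is empty, so the per-architecture builds can proceed in parallel.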
dependabot[bot]
7a1fa7842d build(deps): bump streetsidesoftware/cspell-action from 8.3.0 to 8.4.0
Bumps [streetsidesoftware/cspell-action](https://github.com/streetsidesoftware/cspell-action) from 8.3.0 to 8.4.0.
- [Release notes](https://github.com/streetsidesoftware/cspell-action/releases)
- [Changelog](https://github.com/streetsidesoftware/cspell-action/blob/main/CHANGELOG.md)
- [Commits](9cd41bb518...de2a73e963)

---
updated-dependencies:
- dependency-name: streetsidesoftware/cspell-action
  dependency-version: 8.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-01 18:06:22 +00:00
dependabot[bot]
883edd798f build(deps): bump editorconfig-checker/action-editorconfig-checker
Bumps [editorconfig-checker/action-editorconfig-checker](https://github.com/editorconfig-checker/action-editorconfig-checker) from 2.1.0 to 2.2.0.
- [Release notes](https://github.com/editorconfig-checker/action-editorconfig-checker/releases)
- [Commits](4b6cd6190d...840e866d93)

---
updated-dependencies:
- dependency-name: editorconfig-checker/action-editorconfig-checker
  dependency-version: 2.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-01 18:05:35 +00:00
Zvonko Kaiser
35dfb11fe4 build: replace prebuilt-components sed hack with DEPS=
Mutating the Makefile in-place to strip prereqs was fragile and
limited to one target per invocation. DEPS= skips deps declaratively
and propagates through recursive make, so multi-target builds can
opt out in one shot.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-04-30 00:48:46 +00:00
Fabiano Fidêncio
ef15324b04 Revert "ci: Only run arm64 k8s tests on nightly builds"
This reverts commit c5b159c556, as now we
have 3 runners plugged into the CI.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-29 07:38:12 +02:00
Steve Horsman
2435970fe8 Merge pull request #12933 from fidencio/topic/runtime-rs-decouple-dragonball-from-non-x86-checks
runtime-rs: drop misleading unsupported arches gating
2026-04-28 18:36:16 +01:00
Aurélien Bombo
e4fbddb91a ci: rename cloud-hypervisor to clh-runtime-rs
This aligns on qemu-runtime-rs and makes more sense.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-28 10:58:01 -05:00
Fabiano Fidêncio
8ab97a60f3 ci: install protobuf-compiler for runtime-rs build-checks
The `runtime-rs` component of `build-checks.yaml` declared `rust`
as its only dependency, but the runtime-rs build pulls in
`prost-build v0.8.0` (via `ttrpc-codegen` -> `containerd-shim-protos`,
and via the in-tree `hypervisor` crate), and `prost-build`'s build
script needs a `protoc` binary at compile time.

This worked on x86_64 and aarch64 only because `prost-build v0.8.0`
ships bundled `protoc` binaries for those targets. On s390x (and
ppc64le, when the matrix gets there) there is no bundled binary,
so the build fails with:

  Failed to find the protoc binary. The PROTOC environment variable
  is not set, there is no bundled protoc for this platform, and
  protoc is not in the PATH

The reason this didn't show up in CI before is that `make test`
and `make check` for runtime-rs were wrapped in arch-specific
`ifeq` blocks in `src/runtime-rs/Makefile` that turned them into
no-ops on s390x/ppc64le/riscv64gc. The previous commit dropped
those gates so `make {test,check}` now actually run on every arch,
which exposes this latent CI gap.

Match what `agent`, `libs`, `agent-ctl`, `kata-ctl` and `genpolicy`
already declare and add `protobuf-compiler` to runtime-rs's needs.
The existing `Install protobuf-compiler` step in this workflow
already runs `sudo apt-get -y install protobuf-compiler`, which
the s390x/ppc64le runners support (those other components have
been using it on s390x for some time).

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-28 16:25:31 +02:00
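The lookup order implied by the error message above (the `PROTOC` variable, then a bundled binary, then `PATH`) can be modelled in Python as below. This is a sketch of that order only, not `prost-build`'s implementation; `find_protoc` and its `bundled` parameter are hypothetical.

```python
import os
import shutil
from typing import Mapping, Optional


def find_protoc(env: Optional[Mapping[str, str]] = None,
                bundled: Optional[str] = None) -> Optional[str]:
    """Return a protoc path, or None if no source provides one."""
    env = os.environ if env is None else env

    # 1. An explicit PROTOC environment variable wins.
    explicit = env.get("PROTOC")
    if explicit:
        return explicit

    # 2. A bundled binary, when one exists for the platform
    #    (passed in by the caller in this sketch).
    if bundled:
        return bundled

    # 3. Fall back to searching PATH.
    return shutil.which("protoc", path=env.get("PATH", ""))
```

On a platform without a bundled binary, steps 1 and 3 are the only options, which is why installing `protobuf-compiler` on the runner resolves the failure.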
stevenhorsman
09ac10e8df workflows: Remove workflow concurrency
It seems like some of our workflow concurrency rules are clashing
with the job-level ones for some reason and cancelling jobs, so
remove these problematic workflow rules.

Co-authored-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 14:56:07 +01:00
stevenhorsman
d5411e00f6 workflows: Fix version on pinned action
docker/build-push-action@bcafcacb16
seemed to be given the wrong version in the comment, so
update this to be correct

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 13:10:36 +01:00
stevenhorsman
063a13ccd0 workflows: Bump zizmor to 1.22
Bump zizmor to the 1.22 version to pick up new rule updates.
Later bumps to follow once this has proven stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 13:10:36 +01:00
stevenhorsman
92ded7ff98 workflows: Add timeouts
Recently I've seen a couple of occasions where
jobs have seemed to run infinitely. Add timeouts
for these jobs to stop this from happening if things
get into a bad state.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 13:10:36 +01:00
stevenhorsman
af4ced32f4 workflows: Add concurrency limits
It is good practice to add concurrency limits to automatically
cancel jobs that have been superseded and potentially stop
race conditions if we try to get artifacts by workflow and job id
rather than run id.

See https://docs.zizmor.sh/audits/#concurrency-limits

Assisted-by: IBM Bob

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 13:10:36 +01:00
Fabiano Fidêncio
5eefbbafb3 Merge pull request #12899 from kata-containers/topic/runtime-rs-docker-qemu
runtime-rs: qemu: Add docker support and tests
2026-04-28 13:23:22 +02:00
Xynnn007
f4a9847877 ci: enforce SLSA provenance for published artifacts
Published artifacts are consumed as security-critical runtime inputs, so
they need verifiable provenance that binds each binary back to the exact
source and build context.

Without provenance, downstream users cannot reliably distinguish trusted
CI outputs from repackaged or substituted artifacts.

Recording provenance in Sigstore's immutable transparency infrastructure
provides auditable evidence that survives mirror/registry movement and
strengthens supply-chain forensics and policy enforcement.

This also aligns artifact publication with a zero-trust verification
model expected by confidential-computing consumers and automated
admission controls.

Remove workflow-level attestation gating so published artifacts are
consistently accompanied by build provenance.

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2026-04-28 11:40:15 +02:00
Fabiano Fidêncio
58a2cc0baf ci: enable Docker smoke tests for runtime-rs (qemu-runtime-rs)
Add qemu-runtime-rs to the Docker test matrix on amd64 and s390x
so that the runtime-rs shim is exercised with Docker + QEMU
networking in CI.

Fixes: #9340

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-28 10:20:18 +02:00
Greg Kurz
de91eda11b Merge pull request #12890 from fidencio/topic/shell-check
shell check: Let the bot fix those issues
2026-04-24 12:41:33 +02:00
Fabiano Fidêncio
c5b159c556 ci: Only run arm64 k8s tests on nightly builds
The arm64 k8s tests are expensive and consume self-hosted runner
resources. Restrict both run-k8s-tests-on-arm64 and
run-kata-coco-tests-on-arm64 to nightly CI runs by gating on
inputs.pr-number == 'nightly'.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 09:38:13 +02:00
Fabiano Fidêncio
ea974bea59 ci: Remove redundant shellcheck.yaml workflow
The shellcheck_required.yaml workflow now covers everything this
workflow did and more, running at severity=style instead of the
default severity.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-24 08:14:08 +02:00
Fabiano Fidêncio
d532cd06f8 ci: Bump shellcheck severity to style
Raise the shellcheck gate from severity=error to severity=style now
that all scripts in the repo have been cleaned up. Ignore paths that
are being removed by other efforts.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-24 08:14:08 +02:00
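What raising the gate from `severity=error` to `severity=style` means in practice can be sketched in Python. The four severity names are shellcheck's; the findings list and the `reported` helper are hypothetical.

```python
# shellcheck severity levels, ordered from least to most severe.
SEVERITIES = ["style", "info", "warning", "error"]


def reported(findings: list[tuple[str, str]], minimum: str) -> list[str]:
    # Keep findings whose severity is at or above the configured minimum.
    threshold = SEVERITIES.index(minimum)
    return [name for name, sev in findings
            if SEVERITIES.index(sev) >= threshold]


# Hypothetical findings as (label, severity) pairs.
FINDINGS = [
    ("finding-a", "error"),
    ("finding-b", "warning"),
    ("finding-c", "info"),
    ("finding-d", "style"),
]
```

With `minimum="error"` only `finding-a` would fail the check; with `minimum="style"` all four would, which is why the scripts had to be cleaned up first.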
Fabiano Fidêncio
68cc7f8e70 ci: remove unmaintained CoCo stability test workflows
The ci-coco-stability.yaml workflow has its weekly schedule
commented out with a note that the workload is not maintained.
Remove the entire chain: ci-coco-stability.yaml, ci-weekly.yaml,
run-kata-coco-stability-tests.yaml, and the kubernetes stability
test scripts that were only used through this path.

The local containerd stability tests (tests/stability/gha-run.sh)
remain as they are actively used by basic-ci workflows.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
e0d98fafe3 ci: remove disabled run-cri-containerd-tests-arm64 job
This job in ci.yaml has been unconditionally disabled (if: false)
with no tracking issue or path to re-enablement.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
c7e3f95883 tests: remove disabled tracing tests and CI job
The run-tracing job in basic-ci-amd64.yaml has been disabled
(if: false) due to issue #9763, with no path to re-enablement.
Remove the job definition and the backing
tests/functional/tracing/ directory.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
8a93cf8f17 tests: remove disabled VFIO tests and CI job
The run-vfio job in basic-ci-amd64.yaml has been disabled
(if: false) due to issues #9764, #9851, and #9940, with no
path to re-enablement. Remove the job definition and the
backing tests/functional/vfio/ directory.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
8e685f22c6 ci: remove orphan run-kata-deploy-tests-on-aks.yaml workflow
This reusable workflow (workflow_call) has no caller anywhere in
the repository, making it dead code.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
b74f2c0a9c tests: remove metrics tests and workflow
The run-metrics.yaml workflow is a reusable workflow_call with no
caller in the repository, making it effectively dead code. Remove
the workflow, the entire tests/metrics/ directory (~586 files
including vendored Go for checkmetrics), and the "metrics"
self-hosted runner label from actionlint.yaml.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Saul Paredes
baf0f16804 ci: k8s-tests: test mariner and runtime-rs
Disable policy tests when using mariner and runtime-rs. These are not supported yet.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-21 14:08:21 -07:00
Aurélien Bombo
d64fce3998 Revert "ci: k8s: Adjust timeout on free runners"
This reverts commit 8d6f1d6f34.
2026-04-20 15:36:35 -05:00
Fabiano Fidêncio
d6f0b15578 ci: erofs: restrict to runtime-rs only
The erofs snapshotter configuration is node-wide (a single containerd
drop-in) and cannot be split per runtime handler.  The Go runtime does
not support fsmerged EROFS — it rejects fsmeta.erofs mount sources with
"unsupported mount source" — so erofs is only usable with runtime-rs.

Drop qemu-coco-dev (Go) from the erofs CI matrix and add a check in
kata-deploy's configure_erofs_snapshotter() that inspects the
SNAPSHOTTER_HANDLER_MAPPING: if any Go shim is explicitly mapped to
erofs, emit a prominent warning and bail out with a clear error telling
the operator to fix the mapping.

Since all shims are now guaranteed to be runtime-rs when erofs is
active, remove the conditional is_rust_shim gating and always emit the
full erofs configuration (differ options, default_size,
max_unmerged_layers=1).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
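The mapping check described above can be sketched in Python. The actual check lives in kata-deploy's `configure_erofs_snapshotter()` and is not shown here; the `shim:snapshotter` comma-separated format and the "name ends with `-runtime-rs`" rule for recognising Rust shims are both assumptions made for this example.

```python
def parse_mapping(raw: str) -> dict[str, str]:
    # Assumed format: "shim-a:snapshotter-x,shim-b:snapshotter-y".
    mapping = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        shim, _, snapshotter = item.partition(":")
        mapping[shim.strip()] = snapshotter.strip()
    return mapping


def is_rust_shim(shim: str) -> bool:
    # Assumption: runtime-rs shims are recognisable by their suffix.
    return shim.endswith("-runtime-rs")


def go_shims_on_erofs(raw: str) -> list[str]:
    # Go shims explicitly mapped to erofs, which the Go runtime rejects.
    return sorted(shim for shim, snap in parse_mapping(raw).items()
                  if snap == "erofs" and not is_rust_shim(shim))


def check_erofs_mapping(raw: str) -> None:
    # Bail out with a clear error so the operator can fix the mapping.
    offenders = go_shims_on_erofs(raw)
    if offenders:
        raise ValueError(
            "erofs is only usable with runtime-rs; fix the mapping for: "
            + ", ".join(offenders)
        )
```

For example, `go_shims_on_erofs("qemu-coco-dev:erofs,qemu-coco-dev-runtime-rs:erofs")` flags only `qemu-coco-dev`.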
Fabiano Fidêncio
9c803d86a6 ci: erofs: Bump containerd to v2.3
To ensure we're using the latest released version of the project, as I
think we're missing patches on v2.2.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
Fabiano Fidêncio
cdd09c3c65 ci: enable erofs tests with runtime-rs
Now that erofs snapshotter support has been added, let's make sure this is tested.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
Fabiano Fidêncio
35e48fdfd1 ci: run qemu-coco-dev-runtime-rs tests on arm64
Add qemu-coco-dev-runtime-rs to the arm64 k8s test matrix so that the
CoCo non-TEE configuration is exercised on aarch64 runners.

Also enable auto-generated policy for qemu-coco-dev on aarch64 (matching
the existing x86_64 behavior) and register the new job as a required
gatekeeper check.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
861f15cdc4 build: add arm64 coco-dev build dependencies
Build coco-guest-components, pause-image, and rootfs-image-confidential
for arm64, which are required by qemu-coco-dev-runtime-rs.

Enable MEASURED_ROOTFS on the arm64 shim-v2 build, add the aarch64 case
to install_kernel() so the default kernel is built as a unified kernel
(with confidential guest support, like x86_64), and adjust the kernel
install naming so only CCA builds get the -confidential suffix.

Also wire rootfs-image-confidential-tarball into the aarch64 local-build
Makefile.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:13 +02:00
Fabiano Fidêncio
e1f8b8e8b4 build: add arm64 tools build (genpolicy only)
The arm64 build workflow was missing the tools build entirely.
Add build-tools-asset and create-kata-tools-tarball jobs mirroring
the amd64 workflow so that genpolicy and the other tools are
available for coco-dev tests that need auto-generated policy.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-18 00:48:02 +02:00
Steve Horsman
1db12f8ccf Merge pull request #12812 from stevenhorsman/tee-test-refactor
ci: Refactor confidential TEE support
2026-04-17 11:12:13 +01:00
stevenhorsman
1dc57c6cef ci: increase stale issues workflow frequency
Update the stale issues workflow to run more frequently:
- Weekdays: Every 4 hours (6x per day) at 00:00, 06:00, 12:00, 18:00 UTC
- Weekends: Every hour (24x per day)

Previously ran once daily at midnight UTC. This change reduces the time
it will take for us to get through our backlog, particularly increasing
the runs at the weekend, when we should have less other CI running,
which it could impact due to GH API rate limiting.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 20:50:38 +01:00
dependabot[bot]
c044403409 build(deps): bump tim-actions/wip-check from 1.0.0 to 1.1.0
Bumps [tim-actions/wip-check](https://github.com/tim-actions/wip-check) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/tim-actions/wip-check/releases)
- [Commits](1c2a1ca6c1...8c84f59872)

---
updated-dependencies:
- dependency-name: tim-actions/wip-check
  dependency-version: 1.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-16 10:48:41 +00:00
stevenhorsman
ff246f9538 ci: Remove deploy_snapshotter
Snapshotter deployment is a no-op now that
kata-deploy handles this, so clean up this code.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-16 09:21:04 +01:00