containerd 2.3 requires Go 1.26.3, but Kata still pins Go 1.25.10.
Use Go 1.26.3 for the sandbox-api job so that make cri-integration
can build containerd from source.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The job was disabled because TestImageLoad was failing when using the
shim sandboxer with runc due to a containerd bug (config.json not
being written to the bundle directory).
Now that check_daemon_setup uses podsandbox for the runc sanity check,
the root cause of the failure is worked around on our side and the job
can be re-enabled.
Also update the runner to ubuntu-24.04.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Update the Dockerfile to copy each kata-static-<name>.tar.zst directly
into the image alongside shim-components.json, replacing the old
artifact-extractor stage that unpacked a single merged tarball.
Update the publish-kata-deploy-payload and release CI workflows to
download individual per-component artifacts instead of waiting for a
merged tarball, and simplify kata-deploy-build-and-upload-payload.sh
accordingly. The kata-deploy image build is no longer blocked on the
merge step.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Split the monolithic shim-v2 build target into separate shim-v2-go and
shim-v2-rust targets in kata-deploy-binaries.sh, the local-build
Makefile, and the four architecture CI workflows.
The Go and Rust shims now each produce their own kata-static-<name>.tar.zst
artifact, allowing downstream consumers to select only the shim variant
they need. MEASURED_ROOTFS is set per-arch for the Rust job in CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The docs URL alive check workflow has been disabled for months
and never passed since we moved to GHA., so removes the workflow
and all associated code.
Note: Although the static-checks.sh --doc code was wider scope than
URL check, it wasn't being called anywhere else, so it was removed too.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-by: IBM Bob
The stale issues workflow was using shell syntax ${AGE} instead of
GitHub Actions syntax ${{ env.AGE }} for the days-before-issue-stale
parameter. This prevented the workflow from correctly reading the
calculated AGE value.
Also added days-before-stale: -1 to disable default stale behavior
and ensure only issue-specific settings apply.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-By: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add qemu-nvidia-gpu-runtime-rs and qemu-nvidia-gpu-snp-runtime-rs to
the NVIDIA GPU test matrix so CI covers the new runtime-rs shims.
Introduce a `coco` boolean field in each matrix entry and use it for
all CoCo-related conditionals (KBS, snapshotter, KBS deploy/cleanup
steps). This replaces fragile name-string comparisons that were already
broken for the runtime-rs variants: `nvidia-gpu (runtime-rs)` was
incorrectly getting KBS steps, and `nvidia-gpu-snp (runtime-rs)` was
not getting the right env vars.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The `generate_vendor.sh` script already knows how to create a tarball
with all the rust and go vendored code within the repo. It is used by
the release workflow to provide vendored code to downstream consummers
that might need it.
There isn't any vendored code in the repo anymore.
It thus doesn't seem quite useful to run `make vendor` in CI.
Stop doing it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Ensures go.mod and go.sum files are kept up-to-date on PRs that modify
Go code, go modules, or the Go version in versions.yaml.
The workflow can also be run directly from the GitHub UI, in order
to check the tidyness of the target branch.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
On nightly CI, run the NVIDIA GPU tests without setting nvrc.log=trace.
This gives us end-to-end test coverage that more closely matches how
users would actually run Kata Containers with NVIDIA GPUs, since trace
logging is not enabled by default in production.
NVRC trace logging remains enabled for PR runs, where the extra
verbosity is useful for debugging failures.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're in the process to stabilise runtime-rs for the coming 4.0.0
release, we better start running as many tests as possible with that.
The TDX runtime-rs job is gated to nightly runs only (pr-number ==
"nightly") since we only have a single TDX machine and cannot afford
to run both qemu-tdx and qemu-tdx-runtime-rs on every PR.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The ITA_KEY secret was conditionally passed to TDX jobs for Intel
Trust Authority attestation, but it is no longer needed. Remove it
from all workflow files and the test helper export.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Architecture-specific release workflows were using the same concurrency
group when called from release.yaml, causing GitHub Actions to detect
a deadlock and cancel the builds.
Fix by appending architecture suffix to each workflow's concurrency
group, allowing parallel execution without conflicts.
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Mutating the Makefile in-place to strip prereqs was fragile and
limited to one target per invocation. DEPS= skips deps declaratively
and propagates through recursive make, so multi-target builds can
opt out in one shot.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The `runtime-rs` component of `build-checks.yaml` declared `rust`
as its only dependency, but the runtime-rs build pulls in
`prost-build v0.8.0` (via `ttrpc-codegen` -> `containerd-shim-protos`,
and via the in-tree `hypervisor` crate), and `prost-build`'s build
script needs a `protoc` binary at compile time.
This worked on x86_64 and aarch64 only because `prost-build v0.8.0`
ships bundled `protoc` binaries for those targets. On s390x (and
ppc64le, when the matrix gets there) there is no bundled binary,
so the build fails with:
Failed to find the protoc binary. The PROTOC environment variable
is not set, there is no bundled protoc for this platform, and
protoc is not in the PATH
The reason this didn't show up in CI before is that `make test`
and `make check` for runtime-rs were wrapped in arch-specific
`ifeq` blocks in `src/runtime-rs/Makefile` that turned them into
no-ops on s390x/ppc64le/riscv64gc. The previous commit dropped
those gates so `make {test,check}` now actually run on every arch,
which exposes this latent CI gap.
Match what `agent`, `libs`, `agent-ctl`, `kata-ctl` and `genpolicy`
already declare and add `protobuf-compiler` to runtime-rs's needs.
The existing `Install protobuf-compiler` step in this workflow
already runs `sudo apt-get -y install protobuf-compiler`, which
the s390x/ppc64le runners support (those other components have
been using it on s390x for some time).
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
It seems like some of our workflow concurrency rules are clashing
with the job-level ones for some reason and cancelling jobs, so
remove these problematic workflow rules.
Co-authored-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump zizmor to the 1.22 version to pick up new rule updates.
Later bumps to follow once this has proven stable
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Recently I've seen a couple of occasions where
jobs have seemed to run infinitely. Add timeouts
for these jobs to stop this from happening if things
get into a bad state.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It is good practice to add concurrency limits to automatically
cancel jobs that have been superceded and potentially stop
race conditions if we try and get artifacts by workflows and job id
rather than run id.
See https://docs.zizmor.sh/audits/#concurrency-limits
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Published artifacts are consumed as security-critical runtime inputs, so
they need verifiable provenance that binds each binary back to the exact
source and build context.
Without provenance, downstream users cannot reliably distinguish trusted
CI outputs from repackaged or substituted artifacts.
Recording provenance in Sigstore's immutable transparency infrastructure
provides auditable evidence that survives mirror/registry movement and
strengthens supply-chain forensics and policy enforcement.
This also aligns artifact publication with a zero-trust verification
model expected by confidential-computing consumers and automated
admission controls.
Remove workflow-level attestation gating so published artifacts are
consistently accompanied by build provenance.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Add qemu-runtime-rs to the Docker test matrix on amd64 and s390x
so that the runtime-rs shim is exercised with Docker + QEMU
networking in CI.
Fixes: #9340
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The arm64 k8s tests are expensive and consume self-hosted runner
resources. Restrict both run-k8s-tests-on-arm64 and
run-kata-coco-tests-on-arm64 to nightly CI runs by gating on
inputs.pr-number == 'nightly'.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The shellcheck_required.yaml workflow now covers everything this
workflow did and more, running at severity=style instead of the
default severity.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Raise the shellcheck gate from severity=error to severity=style now
that all scripts in the repo have been cleaned up. Ignore paths that
are being removed by other efforts.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The ci-coco-stability.yaml workflow has its weekly schedule
commented out with a note that the workload is not maintained.
Remove the entire chain: ci-coco-stability.yaml, ci-weekly.yaml,
run-kata-coco-stability-tests.yaml, and the kubernetes stability
test scripts that were only used through this path.
The local containerd stability tests (tests/stability/gha-run.sh)
remain as they are actively used by basic-ci workflows.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This job in ci.yaml has been unconditionally disabled (if: false)
with no tracking issue or path to re-enablement.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The run-tracing job in basic-ci-amd64.yaml has been disabled
(if: false) due to issue #9763, with no path to re-enablement.
Remove the job definition and the backing
tests/functional/tracing/ directory.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The run-vfio job in basic-ci-amd64.yaml has been disabled
(if: false) due to issues #9764, #9851, and #9940, with no
path to re-enablement. Remove the job definition and the
backing tests/functional/vfio/ directory.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This reusable workflow (workflow_call) has no caller anywhere in
the repository, making it dead code.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The run-metrics.yaml workflow is a reusable workflow_call with no
caller in the repository, making it effectively dead code. Remove
the workflow, the entire tests/metrics/ directory (~586 files
including vendored Go for checkmetrics), and the "metrics"
self-hosted runner label from actionlint.yaml.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The erofs snapshotter configuration is node-wide (a single containerd
drop-in) and cannot be split per runtime handler. The Go runtime does
not support fsmerged EROFS — it rejects fsmeta.erofs mount sources with
"unsupported mount source" — so erofs is only usable with runtime-rs.
Drop qemu-coco-dev (Go) from the erofs CI matrix and add a check in
kata-deploy's configure_erofs_snapshotter() that inspects the
SNAPSHOTTER_HANDLER_MAPPING: if any Go shim is explicitly mapped to
erofs, emit a prominent warning and bail out with a clear error telling
the operator to fix the mapping.
Since all shims are now guaranteed to be runtime-rs when erofs is
active, remove the conditional is_rust_shim gating and always emit the
full erofs configuration (differ options, default_size,
max_unmerged_layers=1).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>