Enable devicemapper (dm-verity) support in CI by forwarding the
USE_DEVMAPPER environment variable to the agent build step across
all architectures (amd64, arm64, s390x).
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add k8s-erofs-dmverity.bats integration test that verifies dm-verity
protected EROFS layers work end-to-end, and register the integrity
mode in the CoCo test matrix.
This commit introduces two new files to enable it:
- k8s-erofs-dmverity.bats
- pod-erofs-dmverity-probe.yaml
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Introduce a new `dmverity` module in kata-types that provides dm-verity
device creation, destruction and lifecycle management via devicemapper
ioctls. The module is conditionally compiled behind the `devicemapper`
feature flag, which also pulls in tokio for async device-node polling.
The workspace devicemapper dependency is pinned to a specific git
revision for reproducible builds.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The kata-monitor image has no job-dispatcher sidecar, so opt out of the
kata-deploy-specific dispatcher manifest derivation in the
payload-after-push workflow by setting
KATA_DEPLOY_PUBLISH_JOB_DISPATCHER=false, mirroring the same fix already
applied to the release workflows.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add k8s-vm-templating-test.bats which exercises pod create
with the factory initialized on the target node.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Let's use the ActionsPZ runners for the following jobs:
- publish-kata-deploy-image-s390x
- publish-kata-monitor-image-s390x
to improve CI experiences.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The shared _publish_multiarch_manifest() helper always derived a
"-job-dispatcher" registry from the registries it was given. However, the
dispatcher is a kata-deploy-specific sidecar image, so when the helper
was reused to publish the kata-monitor multi-arch manifest it wrongly
tried to push a non-existent kata-monitor-job-dispatcher image.
Let's gate the dispatcher derivation behind
KATA_DEPLOY_PUBLISH_JOB_DISPATCHER (defaulting to true so the
kata-deploy path is unchanged) and opt out of it when publishing the
kata-monitor manifest.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Package and ship the dispatcher built in the previous commit so the
job-mode Helm chart has an image to run.
- Dockerfile.components: build kata-deploy and kata-deploy-job-dispatcher
from the same rust-builder stage (one compile), and run fmt/clippy/
test for both crates.
- job-dispatcher/Dockerfile: a minimal distroless/static image containing
only the dispatcher binary and CA certs - it is an API client, so it
needs nothing from the host.
- local-build: kata-deploy-job-dispatcher becomes its own build component
with its own static tarball
(kata-deploy-static-kata-deploy-job-dispatcher.tar.zst); the shared
rust-builder output is reused so the two components do not recompile
the workspace locally. The payload script builds and pushes a separate
"<kata-deploy registry>-job-dispatcher" image with the same tag scheme,
and release.sh publishes its multi-arch manifest symmetrically.
- CI: add kata-deploy-job-dispatcher to the build-kata-deploy-components
matrices (its tarball is picked up by the existing kata-artifacts-*
glob), and gate it in the kata-deploy rust static checks.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Run qemu-coco-dev-runtime-rs k8s test workflow on the zVSI
only during nightly builds.
Changes:
- Modified run-k8s-tests-on-zvsi.yaml to accept vmm as workflow
inputs instead of hardcoded matrix values
- run-k8s-tests-on-zvsi passes a conditional vmm value; 4 vmms
for nightly/dev builds and 3 vmms for all other PRs.
This ensures qemu-coco-dev-runtime-rs is only tested with nydus
snapshotter during nightly CI runs, reducing PR test time while
maintaining comprehensive nightly coverage.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add qemu-coco-dev-runtime-rs to the VMM matrix in the zVSI K8S
test workflow, configured to run only with the nydus snapshotter.
Changes:
- Add qemu-coco-dev-runtime-rs to the vmm matrix
- Exclude overlayfs + qemu-coco-dev-runtime-rs combination
- Exclude devmapper + qemu-coco-dev-runtime-rs combination
- Update CoCo-related conditional steps to include the new VMM:
* KBS environment variable setup
* kbs-client uninstall/install steps
* CoCo KBS deployment
This ensures qemu-coco-dev-runtime-rs is only tested with nydus
snapshotter, while maintaining existing test configurations for
other VMMs.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Switch kata-monitor workflows from the deprecated "active" key to
"latest" so CI resolves containerd versions from versions.yaml correctly
after the key rename.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When I implemented the OSC scanner I followed the
guidance on the the action repo to use a single workflow for
both PR and main tests and rely on a re-usable workflow.
Since then I've realised some negatives of this approach:
- Unlike actions, dependabot needs custom logic to bump
workflow pins, so we are more likely to be out of date
- A lack of transparency/notification of when updates
are needed, due to bugs/ security fixes
- The dual workflow results in skipped jobs that
clutter the UI
- No ability to customise the pre-steps, or config
As such let's take the hit of managing two workflows,
in order to give us better flexibility.
Also add the `--call-analysis=none` option as we run govulncheck
separately, so don't want to have to compile and have a slow build
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
These jobs build and push the kata-deploy OCI image, so call them
publish-kata-deploy-image-* instead of *-payload-*, matching the
kata-monitor image jobs and making the workflow easier to read.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a single-job k8s test that installs the kata-deploy helm chart
with monitor.enabled=true, pointed at the per-PR kata-monitor image
built earlier in the same run, and exercises both the rollout and the
user-visible behaviour:
* the kata-monitor DaemonSet rolls out and the pod stays up without
container restarts;
* a real kata-runtime probe pod is scheduled, then /metrics and
/sandboxes are scraped through the apiserver pod-proxy to prove
kata-monitor sees the sandbox (non-zero running-shim count plus at
least one per-sandbox kata_shim_* metric);
* after the probe pod is deleted, /metrics drops back to a zero
running-shim count.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Exercise the published kata-monitor container image (the one built by
publish-kata-monitor-payload-amd64) rather than the on-disk binary, so
integration regressions like the recent glibc/musl mismatch surface at
PR time. The kata-monitor-tests.sh script keeps the binary fallback for
ad-hoc local runs.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Drop the stale CRI-O matrix entry (its cri-tools pin was several
releases behind) along with the exclude that hid the containerd job,
and pin the remaining job to containerd's "active" track (currently
v2.2) via CONTAINERD_VERSION.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Bump the containerd version used by CI from v1.7.25 to v2.3.0.
Rename the version-range fields in versions.yaml and throughout the
GitHub Actions workflows from lts/active/version/sandbox_api to
minimum/latest to make their meaning self-evident:
minimum: "v1.7" # oldest containerd branch under test
latest: "v2.3" # newest containerd branch under test
Drop the bare version field (superseded by the matrix) and the
sandbox_api alias (covered by latest). Update all containerd_version
matrix entries in the workflow files accordingly, and update
gha-run-k8s-common.sh to resolve the new key names.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Mirror the CI payload publish flow in local builds, including image and
helm chart publishing, while reusing the same chart upload helper in
payload-after-push to avoid duplicated chart packaging logic.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
As the boot-image-se builds a fake image, the secret
CI_HKD_PATH is not necessary anymore.
Remove it from the workflows.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- Add FAKE_SE_IMAGE mode support in SE image build scripts for CI without real SE setup
- Simplify workflow by removing build-asset-boot-image-se job
- Integrate fake-boot-image-se into build matrix instead of separate job
- Skip attestation for fake-boot-image-se builds
- Update qemu-se and qemu-se-runtime-rs shim components to use:
- rootfs-initrd-confidential instead of rootfs-image-confidential
- boot-image-se component
This change streamlines the s390x SE build process and makes it easier
to test without requiring actual Secure Execution infrastructure.
This fixes deployment issues on non-TEE systems where TEE-specific artifacts
(like boot-image-se for IBM SEL) are not included in the kata-deploy image,
while ensuring TEE systems still get all required components.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
kata-deploy currently hard-codes the EROFS snapshotter
default_size to "10G", so the CoCo EROFS CI lane only
exercises the path where the snapshotter provides an rwlayer.
Use the generic containerd.userDropIn support for the EROFS
default_size and thread it through the Kubernetes CI helpers.
Keep the kata-deploy default at "10G" to preserve current
behavior, but allow the workflow to set "0" for the runtime-rs
no-rwlayer path.
Expand the existing EROFS snapshotter job to run both values.
The override is written to containerd as a TOML string so "0"
is not parsed as an integer.
Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Drop cloud-hypervisor-glibc from local and CI kata-deploy build targets
now that Azure CLH uses the standard cloud-hypervisor artifact set.
This removes obsolete build matrix entries and installer target
handling.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Switch AKS Mariner matrix entries to clh-azure handlers and remove the
temporary host-OS based helm value overrides.
Update integration test wiring and required test labels so CI tracks the
new runtime names.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Build and publish the kata-deploy binary and CoCo guest-pull nydus
snapshotter as dedicated per-arch artifacts, then consume those tarballs
when assembling the kata-deploy image.
This avoids rebuilding those components in the payload image (which
would happen in serial) path and reduces overall CI build time.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
containerd 2.3 requires Go 1.26.3, but Kata still pins Go 1.25.10.
Use Go 1.26.3 for the sandbox-api job so that make cri-integration
can build containerd from source.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The job was disabled because TestImageLoad was failing when using the
shim sandboxer with runc due to a containerd bug (config.json not
being written to the bundle directory).
Now that check_daemon_setup uses podsandbox for the runc sanity check,
the root cause of the failure is worked around on our side and the job
can be re-enabled.
Also update the runner to ubuntu-24.04.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Update the Dockerfile to copy each kata-static-<name>.tar.zst directly
into the image alongside shim-components.json, replacing the old
artifact-extractor stage that unpacked a single merged tarball.
Update the publish-kata-deploy-payload and release CI workflows to
download individual per-component artifacts instead of waiting for a
merged tarball, and simplify kata-deploy-build-and-upload-payload.sh
accordingly. The kata-deploy image build is no longer blocked on the
merge step.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Split the monolithic shim-v2 build target into separate shim-v2-go and
shim-v2-rust targets in kata-deploy-binaries.sh, the local-build
Makefile, and the four architecture CI workflows.
The Go and Rust shims now each produce their own kata-static-<name>.tar.zst
artifact, allowing downstream consumers to select only the shim variant
they need. MEASURED_ROOTFS is set per-arch for the Rust job in CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The docs URL alive check workflow has been disabled for months
and never passed since we moved to GHA., so removes the workflow
and all associated code.
Note: Although the static-checks.sh --doc code was wider scope than
URL check, it wasn't being called anywhere else, so it was removed too.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-by: IBM Bob
The stale issues workflow was using shell syntax ${AGE} instead of
GitHub Actions syntax ${{ env.AGE }} for the days-before-issue-stale
parameter. This prevented the workflow from correctly reading the
calculated AGE value.
Also added days-before-stale: -1 to disable default stale behavior
and ensure only issue-specific settings apply.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-By: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add qemu-nvidia-gpu-runtime-rs and qemu-nvidia-gpu-snp-runtime-rs to
the NVIDIA GPU test matrix so CI covers the new runtime-rs shims.
Introduce a `coco` boolean field in each matrix entry and use it for
all CoCo-related conditionals (KBS, snapshotter, KBS deploy/cleanup
steps). This replaces fragile name-string comparisons that were already
broken for the runtime-rs variants: `nvidia-gpu (runtime-rs)` was
incorrectly getting KBS steps, and `nvidia-gpu-snp (runtime-rs)` was
not getting the right env vars.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The `generate_vendor.sh` script already knows how to create a tarball
with all the rust and go vendored code within the repo. It is used by
the release workflow to provide vendored code to downstream consummers
that might need it.
There isn't any vendored code in the repo anymore.
It thus doesn't seem quite useful to run `make vendor` in CI.
Stop doing it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Ensures go.mod and go.sum files are kept up-to-date on PRs that modify
Go code, go modules, or the Go version in versions.yaml.
The workflow can also be run directly from the GitHub UI, in order
to check the tidyness of the target branch.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Greg Kurz <groug@kaod.org>