As the NVIDIA stack has shifted to using an image for both the
confidential and non-confidential variants, we retire the initrd
build.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This reverts commit ab25592533, as we're now
deploying k3s/rke2 in a way that lets us properly test them.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When I added this I had in mind the period when we still relied on the
SEV module being generated, which we haven't done for quite a long time.
This wrong assumption caused the cache to **ALWAYS** fail, increasing
our build time considerably for no reason.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We depend on a k3s commit in order to properly test it; otherwise we'd
need to change our CI quite a bit to deploy a full template with those
imports in place. For now, let's just skip the testing on k3s/rke2 and
we'll address it in a different PR.
ref:
b51167a996
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
These tests run only on nightly and when triggering the dev CI manually.
They cover the nydus snapshotter with both guest-pull and experimental-force-guest-pull,
using qemu-coco-dev and qemu-coco-dev-runtime-rs, and are included in the
run-kata-coco-tests workflow behind the extensive-matrix-autogenerated-policy flag.
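For illustration, the matrix entries might look roughly like this (the
key names are a sketch; the values come from the text above):
    vmm: [qemu-coco-dev, qemu-coco-dev-runtime-rs]
    snapshotter: [nydus]
    pull-type: [guest-pull, experimental-force-guest-pull]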
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
By default, setup-go installs the ppc64 binary instead of ppc64le,
resulting in an exec format error. Pass the arch explicitly to fix this.
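A sketch of the fix, assuming the actions/setup-go `architecture` input
(action version and step values illustrative):
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
        architecture: ppc64le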
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
The actionlint gh extension is outdated, and the wrapping seems
unnecessary when there is a GitHub action that appears to be
maintained, so let's update to use that.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Unfortunately, due to golang/go#75031, there is an issue
that results in `go: no such tool "covdata"`
with an automatically installed 1.25 toolchain, so
the approach of skipping the install_go.sh script (which causes
double-install problems) didn't work. Try the alternative approach
of using the setup-go action, which should do a more comprehensive job.
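For illustration, the step might look like this (a sketch; the real step
may differ), letting the action pick the toolchain pinned in go.mod:
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod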
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some of our static checks are hitting issues with duplicate
go versions installed. Given that in go.mod we set the
version to match our required toolchain, if go is already installed
we can let go handle the toolchain version management instead
of installing a second version.
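As a sketch, the relevant go.mod directives (versions illustrative):
    go 1.25.0
    toolchain go1.25.3
If the installed go is older than the pinned toolchain, go downloads and
runs that toolchain on its own, so no second manual install is needed.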
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Later on we may even think about merging those, but for now let's at
least make sure the envs used are the same / declared in a similar place
for each job.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Run all CoCo non-TEE variants in a single job on the free runner with an
explicit environment matrix (vmm, snapshotter, pull_type, kbs,
containerd_version).
Here we're testing CoCo only with the "active" version of containerd.
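An illustrative shape of that matrix (key names and values are a sketch;
the dimensions are the ones listed above):
    strategy:
      matrix:
        vmm: [qemu-coco-dev]
        snapshotter: [nydus]
        pull-type: [guest-pull]
        kbs: [true, false]
        containerd-version: [active]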
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
We were running most of the k8s integration tests on AKS. The ones that
don't actually depend on AKS's environment now run on normal
ubuntu-24.04 GitHub runners instead: we bring up a kubeadm cluster
there, test with both containerd lts and active, and skip attestation
tests since those runtimes don't need them. AKS is left only for the
jobs that do depend on it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Add push-oras-tarball-cache workflow that runs on push to main when
versions.yaml changes (and on workflow_dispatch). It populates the
ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml.
Remove the push_to_cache call from download-with-oras-cache.sh since
it was never triggered in CI. Cache population is now done solely by the
new workflow and by populate-oras-tarball-cache.sh when run manually.
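The triggers, roughly (a sketch of the workflow's `on:` block):
    on:
      push:
        branches: [main]
        paths: [versions.yaml]
      workflow_dispatch: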
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Previously zizmor only mandated pinning of third-party actions,
but it now recommends rolling this out to all actions.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In a refactor we removed the `matrix` section of this strategy, so
the whole section isn't needed any more; clean this up.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
*.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
checks project code
- Leave generated and binary assets unchanged (excluded from checker)
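A minimal sketch of the checker configuration (editorconfig-checker reads
an `Exclude` array of regexes; the patterns here are abbreviated):
    {
      "Exclude": [
        "vendor/",
        "\\.patch$",
        "\\.img$",
        "\\.dtb$",
        "\\.drawio$",
        "\\.svg$",
        "pkg/cloud-hypervisor/client"
      ]
    }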
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
This adds a basic configuration for editorconfig checker. The
supplied configuration checks for trailing whitespace and
issues with newlines.
Example:
| tools/packaging/kernel/configs/fragments/x86_64/numa.conf:
| Wrong line endings or no final newline
| tools/packaging/release/generate_vendor.sh:
| 44: Trailing whitespace
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Build a single kernel for both kernel and kernel-confidential on x86_64
and s390x. The kernel is built with TEE support (-x) on those arches only.
This helps to simplify the code and its maintenance, and having a
single kernel has been the plan since the beginning.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential,
simplifying and reducing code maintenance.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove the initramfs folder and its build steps, and use the
kernel-based dm-verity enforcement for the handlers that used the
initramfs mode. Also, remove the initramfs verity mode
capability from the shims and their configs.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This change introduces the kernelinit dm-verity mode, allowing
initramfs-less dm-verity enforcement against the rootfs image.
For this, the change introduces a new variable with dm-verity
information. This variable will be picked up by shim
configurations in subsequent commits.
This will allow the shims to build the kernel command line
with dm-verity information based on the existing
kernel_parameters configuration knob and a new
kernel_verity_params configuration knob. The latter
specifically provides the relevant dm-verity information.
This new configuration knob avoids merging the verity
parameters into the kernel_params field. This way, no
cumbersome escape logic is required, as we do not need to pass the
dm-mod.create="..." parameter directly in the kernel_parameters,
but only the relevant dm-verity parameters in a semi-structured
manner (see above). The only place where the final command line is
assembled is in the shims. Further, this line is easy to comment
out for developers to disable dm-verity enforcement (or for CI
tasks).
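As a sketch (knob names from this change, values purely illustrative),
a shim TOML configuration would keep the two knobs separate:
    kernel_params = "agent.log=debug"
    # Only the dm-verity information; the shim assembles the final
    # dm-mod.create=... command line from it. Comment this out to
    # disable dm-verity enforcement (e.g. for CI tasks).
    kernel_verity_params = "..."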
This change produces the new kernelinit dm-verity parameters for
the NVIDIA runtime handlers, and modifies the format of how
these parameters are prepared for all handlers. With this, the
parameters are currently no longer provided to the
kernel_params configuration knob for any runtime handler.
This change alone should thus not be used, as dm-verity
information will no longer be picked up by the shims.
systemd-analyze on the coco-dev handler shows that, using the
kernelinit mode on a local machine, less time is spent in the
kernel phase, slightly speeding up pod start-up. On that machine,
the average kernel phase duration dropped from 172.5ms to 141ms
(4 measurements, each with a basic pod manifest), i.e., an
improvement of about 18 percent.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add the necessary configuration and code changes to support QEMU
on arm64 architecture in runtime-rs.
Changes:
- Set MACHINETYPE to "virt" for arm64
- Add machine accelerators "usb=off,gic-version=host" required for
proper arm64 virtualization
- Add arm64-specific kernel parameter "iommu.passthrough=0"
- Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported
These changes align runtime-rs with the Go runtime's arm64 QEMU support.
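Expressed as the resulting configuration knobs (a TOML sketch mirroring
the list above):
    [hypervisor.qemu]
    machine_type = "virt"
    machine_accelerators = "usb=off,gic-version=host"
    kernel_params = "iommu.passthrough=0"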
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
It was observed that some kata-deploy cleanup steps could hang,
causing the workflow to never finish properly. In these cases,
a QEMU process was not cleaned up and kept printing debug logs
to the journal. Over time, this maxed out the runner’s disk
usage and caused the runner service to stop.
Set timeouts for the relevant cleanup steps to avoid this.
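For example, a cleanup step now carries a timeout (step name, value, and
command illustrative):
    - name: Cleanup
      if: always()
      timeout-minutes: 10
      run: bash gha-run.sh cleanup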
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
When looking more into the stale bot for issues, I realised that our existing
stale job would need permissions to work. Unfortunately, without these
permissions the actions' behaviour is to log, but still finish as successful.
This meant it was hard to spot that we had an issue.
Add the required permissions to get this working again and improve the message.
Also add a concurrency rule to make zizmor happy.
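A sketch of the additions (the scopes are the ones actions/stale documents
as required; the concurrency group is illustrative):
    permissions:
      issues: write
      pull-requests: write
    concurrency:
      group: ${{ github.workflow }}
      cancel-in-progress: true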
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add run_bats_tests() function to common.bash that provides consistent
test execution and reporting across all test suites (k8s, nvidia,
kata-deploy).
This removes duplicated test runner code from run_kubernetes_tests.sh,
run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh.
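A minimal sketch of the shared helper (the function name comes from this
change; the body is illustrative):
    run_bats_tests() {
        local tests="${1}"  # whitespace-separated list of .bats files
        local failed=()
        for t in ${tests}; do
            echo "Running ${t}"
            bats "${t}" || failed+=("${t}")
        done
        if [ "${#failed[@]}" -gt 0 ]; then
            echo "Failed tests: ${failed[*]}"
            return 1
        fi
    }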
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The runk tool hasn't been supported for a few years, with no maintainers
since ManaSugi stopped being involved in the project, and its CI was
disabled in 2024.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This reverts commit 6130d7330f, as we're
officially switching to the rust version of kata-deploy.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
To unlock the release, move the job that publishes the kata payload after push to an alternate (IBM-owned) runner for ppc64le.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
We don't need to store the kernel headers anymore. We do need to store
the kernel modules, instead.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This commit adds a GitHub workflow for building a GitHub Pages site for the markdown
files in the docs/ directory. Zensical is a new markdown-based static site generation
framework built by the creators of Material for MkDocs. https://zensical.org/
This commit does not clean up the doc structure, so site navigation is initially going to
be messy.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
This is a suggestion from Choi, so we can easily test with a specific
kubectl version and also easily understand which kubectl version is
being used in case of failure.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This image will be used by our helm charts to verify that a
kata-containers deployment is correct.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
OVMF build for Intel TDX (aka "TDVF") was disabled in favor of Ubuntu/
CentOS pre-upstream releases of Intel TDX.
See 4292c4c3b1.
It's time to re-enable the build and move runtime configurations to
use it (the latter will be done in a later commit).
This is a partial revert of 4292c4c3b with the following changes:
- Stop calling OVMF for Intel TDX "TDVF" and follow the naming distros
use for the TDX-enabled build: OVMF.inteltdx.fd.
- Single binary OVMF.inteltdx.fd is supported using -bios QEMU param.
- Secure Boot infrastructure is disabled since Kata does not support it.
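For illustration, the single binary is handed to QEMU as stateless
firmware (a sketch; the full TDX invocation is elided):
    qemu-system-x86_64 -bios OVMF.inteltdx.fd ...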
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Otherwise we may hit a `no space left on device` error when building the rust
kata-deploy binary.
This happens mostly because of the multi-stage build used to generate a
distroless final container.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's shamelessly duplicate the nightly job so we have at least nightly
runs using the rust implementation of kata-deploy.
The reason for doing this is to be pragmatic, as pragmatic as possible,
and avoid switching away from the scripts before the 3.24.0 release,
while still testing both ways until the switch happens.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We're only releasing those for amd64 as that's the only architecture
we've been building the packages for.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>