Commit Graph

1835 Commits

Author SHA1 Message Date
Fabiano Fidêncio
fb43d3419f build: Fix nvidia kernel breakage
On commit 9602ba6ccc, from February this
year, we've introduced a check to ensure that the files needed for
signing the kernel build are present. However, we've noticed last week
that there were a reasonable amount of wrong assumptions with the
workflow. :-)

Zvonko fixed the majority of those, but this bit was left and it'd cause
breakages when using kernel that was cached ... although passing when
building new kernels.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-13 19:28:40 +02:00
Zvonko Kaiser
b00013c717 kernel: Add KBUILD_SIGN_PIN pass through
This is needed to the kernel setup picks up the correct
config values from our fragments directories.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-10 15:45:34 -04:00
Zvonko Kaiser
37bd5e3c9d gpu: Add kernel CONFIG check
We need to make sure that the kernel we're using has the
correct configs set, otherwise the module signing will not work.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-10 15:45:34 -04:00
Fabiano Fidêncio
496e255ea2 build: Fix KBUILD_SIGN_PIN usage
What was done in the past, trying to set the env var on the same step
it'd be used, simply does not work.

Instead, we need to properly set it through the `env` set up, as done
now.

We're also bumping the kata_config_version to ensure we retrigger the
kernel builds.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-10 15:25:10 +02:00
Fabiano Fidêncio
a1f90fe350 tests: k8s: Unify k8s TEE tests
There's no reason to have the code duplication between the SNP / TDX
tests for CoCo, as those are basically using the same configuration
nowadays.

Note that for the TEEs case, as the nydus-snapshotter is deployed by the
admin, once, instead of deploying it on every run ... I'm actually
removing the nydus-snapshotter steps so we make it clear that those
steps are not performed by the CI.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-10 09:51:59 +02:00
Zvonko Kaiser
91739d4425 gpu: PPCIE support DGX like systems
For DGX like systems we need additional binaries and libraries,
enable the Kata AND CoCo use-case.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

Update tools/osbuilder/rootfs-builder/nvidia/nvidia_rootfs.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 00:00:12 +00:00
Aurélien Bombo
07645cf58b ci: actionlint: Address issues and set as required
Address issues just introduced and set actionlint as a required by removing
the path filter.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-10-08 16:55:27 -05:00
Aurélien Bombo
b3a551d438 ci: zizmor: Reestablish as required test
We can re-require this now that we've addressed all the issues.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-10-08 16:55:27 -05:00
Fabiano Fidêncio
dbb1eb959c kata-deploy: Allow users to set experimental_force_guest_pull
For those who are not willing to use the nydus-snapshotter for pulling
the image inside the guest, let's allow them setting the
experimetal_force_guest_pull, introduced by Edgeless, as part of our
helm-chart.

This option can be set as:
_experimentalForceGuestPull: "qemu-tdx,qemu-coco-dev"

Which would them ensure that the configuration for `qemu-tdx` and
`qemu-coco-dev` would have the option enabled.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-08 17:43:09 +02:00
Fabiano Fidêncio
8c4bad68a8 kata-deploy: Remove kustomize yamls, rely on helm-chart only
As the kata-deploy helm chart has been the only way we've been testing
kata-containers deployment as part of our CI, it's time to finally get
rid of the kustomize yamls and avoid us having to maintain two different
methods (with one of those not being tested).

Here I removed:
* kata-deploy yamls and kustomize yamls
* kata-cleanup yamls and kustomize yamls
* kata-rbac yals and kustomize yamls
* README.md for the kustomize yamls was removed

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-08 16:54:19 +02:00
Zvonko Kaiser
59b4e3d3f8 gpu: Add CONFIG_FW_LOADER to the kernel
We need it for the newer CC kernel

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-08 10:01:27 +02:00
Zvonko Kaiser
7061f64db5 gpu: Fix confidential build
NVRC introduced the confidential feature flag and we
haven't updated the rootfs build to accomodate.
If rootfs_type==confidential user --feature=confidential

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-08 10:01:27 +02:00
Zvonko Kaiser
2260f66339 gpu: Some fixes regarding the rootfs v580
With the 580 driver version we need new dependencies
in the rootfs.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-08 10:01:27 +02:00
Szymon Klimek
8dc6b24e7d kata-deploy: accept 25.10 as supported distro for TDX
Canonical TDX release is not needed for vanilla Ubuntu 25.10 but
GRUB_CMDLINE_LINUX_DEFAULT needs to contain `nohibernate` and
`kvm_intel.tdx=1`

Signed-off-by: Szymon Klimek <szymon.klimek@intel.com>
2025-10-07 23:41:52 +02:00
Fabiano Fidêncio
000c9cce23 kata-deploy: chart: Add _experimentalSetupSnapshotter
Let's expose the EXPERIMENTAL_SETUP_SNAPSHOTTER script environment
variable to our chart, allowing then users of our helm chart to take
advantage of this experimental feature.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-07 10:32:46 +02:00
Fabiano Fidêncio
d6a1881b8b kata-deploy: scripts: Allow setting up multiple snapshotters
We may deploy in scenarios where we want to have both snapshotters set
up, sometimes even for simple test on which one behaves better.

With this in mind, let's allow EXTERNAL_SETUP_SNAPSHOTTER to receive a
comma separated list of snapshotters, such as:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs,nydus"
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-07 10:32:46 +02:00
Fabiano Fidêncio
445af6c09b kata-deploy: scripts: Allow deploying erofs-snapshotters
Similarly to what's been done for the nydus-snapshotter, let's allow
users to have erofs-snapshotter set up by simply passing:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs".
```

Mind that erofs, although a built-in containerd snapshotter, has system
depdencies that we will *NOT* install and it's up to the admin to do so.
These dependencies are:
* erofs-utils
* fsverity
* erofs module loaded

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-07 10:32:46 +02:00
Fabiano Fidêncio
2e0ce2f39f kata-deploy: scripts: Allow deploying nydus-snapshotter
Let's introduce a new EXPERIMENTAL_SETUP_SNAPSHOTTER environemnt
variable that, when set, allows kata-deploy to put the nydus snapshotter
in the correct place, and configure containerd accordingly.

Mind, this is a stop gap till the nydus-snapshotter helm chart is ready
to be used and behaving well enough to become a weak dependency of our
helm chart.  When that happens this code can be deleted entirely.

Users can have nydus-snapshotter deployed and configured for the
guest-pull use case by simply passing:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="nydus"
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-07 10:32:46 +02:00
Fabiano Fidêncio
1e2c86c068 kata-deploy: scripts: Only add conf file to the imports once
Otherwise we'd end up adding a the file several times, which could lead
to problems when removing the entry, leading to containerd not being
able to start due to an import file not being present.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-07 10:32:46 +02:00
Fabiano Fidêncio
300f7e686e build: Fix initramfs build
We have noticed in the CI that the `gen_init_cpio ...` was returning 255
and breaking the build. Why? I am not sure.

When chatting with Steve, he suggested to split the command, so it'd be
easier to see what's actually breaking. But guess what? There's no
breakage when we split the command.

So, let's try it out and see whether the CI passes after it.

If someone is willing to educate us on this one, please, that would be
helpful! :-)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-02 20:58:22 +02:00
Zvonko Kaiser
2693daf503 gpu: Install dcgm export from the CUDA repo
Do not use the repo to install the exporter,
we rely on the version tested with Ubuntu <version>

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-02 18:05:13 +02:00
Zvonko Kaiser
56c6512781 gpu: Bump to noble and rearrange repos
Moving the CUDA repo to the top for all essential packages
and adding a repo priority favouring NVIDIA based repos.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-02 18:05:13 +02:00
Manuel Huber
e36f788570 kernel: add required configs for openvpn support
Currently, use of openvpn clients/servers is not possible in Kata UVMs.
Following error message can be expected:
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such device (errno=19)

To support opevpn scenarios using bridging and TAP, we enable various
kernel networking config options.

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
2025-10-02 11:40:49 +02:00
Zvonko Kaiser
3743eb4cea gpu: Add ligcc for RUST libc=gnul builds
Since we cannot build all components with libc=musl and
static RUSTFLAG we still need to ship libcc for AA or other guest
components.

Without this change the guest components do not work and we see

/usr/local/bin/attestation-agent: error while loading shared
libraries: libgcc_s.so.1: cannot open shared object file: No such file or directory

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-09-26 15:08:58 -04:00
Aurélien Bombo
0b40ad066a gha: Set Zizmor check as non-required
As a consequence of moving away from Advanced Security for Zizmor, it now
checks the entire codebase and will error out on this PR and future.

To be reverted once we address all Zizmor findings in a future PR.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-09-25 10:50:49 -05:00
stevenhorsman
c2b0650491 release: Bump version to 3.21.0
Bump VERSION and helm-chart versions

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-09-23 20:59:00 +02:00
Steve Horsman
3e67f92e34 Merge commit from fork
Fix malicious host can circumvent initdata verification on TDX
2025-09-23 13:31:29 +01:00
Mikko Ylinen
5cb1332348 build: enable nvidia-attester for coco-guest-components
coco-guest-components tarball is used as is for both vanilla coco
rootfs and the nvidia enabled rootfs. nvidia-attester can be built
without nvml so make it globally enabled for coco-guest-components.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-09-23 12:38:32 +03:00
Zvonko Kaiser
e6f12d8f86 gpu: Add latest driver per default
Lets make sure that we use latest driver for CI and release.
There was a sort step missing.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-09-20 23:50:35 +00:00
Fabiano Fidêncio
54e8081222 qemu: Fix submodules location change
The submodule change led to a breakage on our build of QEMU.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-09-20 22:12:27 +02:00
Fabiano Fidêncio
d056fb20fe initramfs: Enforce --panic-on-corruption for veritysetup
Let's enforce an error on veritysetup in case there's any tampering with
the rootfs.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-09-16 21:35:00 +02:00
Mikko Ylinen
86fe419774 versions: update kernel-confidential to Linux v6.16.7
update to the latest available v6.16 stable series kernel for CoCo.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-09-15 20:29:22 +02:00
Fabiano Fidêncio
d741544fa6 kata-deploy: Don't fail if the runtimeclass is already deleted
I've hit this when using a machine with slow internet connection, which
took ages to download the kata-cleanup image, and then helm timed out in
the middle of the cleanup, leading to the cleanup job being restarted
and then bailing with an error as the runtimeclasses that kata-deploy
tries to delete were already deleted.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-09-15 15:27:54 +02:00
Alex Tibbles
66a3d4b4a2 versions: bump kernel to 6.12.47
Update LTS kernel to latest.

Signed-off-by: Alex Tibbles <alex@bleg.org>
2025-09-15 14:19:48 +02:00
Alex Tibbles
710c117a24 version: Bump QEMU to v10.1.0
A minor release of QEMU is out, so update to it for fixes and features.

QEMU changelog: https://wiki.qemu.org/ChangeLog/10.1

Notes:
* AVX support is not an option to be enabled / disabled anymore.
* Passt requires Glibc 2.40.+, which means a dependency on Ubuntu 25.04
  or newer, thus we're disabling it.

Signed-off-by: Alex Tibbles <alex@bleg.org>
2025-09-15 14:19:25 +02:00
Dan Mihai
5d59341f7f Merge pull request #11780 from ryansavino/snp-guest-kernel-upgrade-issue
packaging: add required modules for confidential guest kernel
2025-09-10 18:21:26 -07:00
Ruoqing He
6a2d813196 ci: gatekeeper: Mark make test libs not required
There are still some issues to be address before we can mark `make test`
for `libs` as required. Mark this case as not required temporarily.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-09-10 03:52:20 +00:00
Ryan Savino
85779a6f1a packaging: add required modules for confidential guest kernel
SNP launch was failing after the confidential guest kernel was upgraded to 6.16.1.
Added required module CONFIG_MTRR enabled.
Added required module CONFIG_X86_PAT enabled.

Fixes: #11779

Signed-off-by: Ryan Savino <ryan.savino@amd.com>
2025-09-09 21:58:15 -05:00
Cameron Baird
d16026f7b9 kernel: add required configs for ip6tables support
Currently, the UVM kernel fails for istio deployments (at least with the
version we tested, 1.27.0). This is because the istio sidecar container
uses ip6tables and the required kernel configs are not built-in:

```
iptables binary ip6tables has no loaded kernel support and cannot be used, err: exit status 3 out: ip6tables v1.8.10 (legacy):
can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?)
Perhaps ip6tables or your kernel needs to be upgraded.
```

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-09-04 07:18:45 +02:00
Fabiano Fidêncio
e396a460bc Revert "local-build: Enforce USE_CACHE=no"
This reverts commit cb5f143b1b, as the
cached packages have been regenerated after the switch to using zstd.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-08-22 14:03:36 +02:00
Steve Horsman
23d2dfaedc Merge pull request #11707 from fidencio/topic/switch-to-use-zstd-when-possible
kata-deploy: local-build: Use zstd instead of xz
2025-08-22 10:06:00 +01:00
stevenhorsman
381da9e603 versions: Bump golang to 1.24.6
golang 1.25 has been released, so 1.23 is EoL,
so we should update to ensure we don't end up with security issues

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-08-22 10:44:15 +02:00
Fabiano Fidêncio
cb5f143b1b local-build: Enforce USE_CACHE=no
We need that to regenerate the tarballs that are already cached in the
zstd format.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-08-21 21:00:20 +02:00
Fabiano Fidêncio
f8d7ff40b4 local-build: Fix shim-v2 no cache build with measured rootfs
We need to get the root_hash.txt file from the image build, otherwise
there's no way to build the shim using those values for the
configuration files.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-08-21 19:56:01 +02:00
Fabiano Fidêncio
ad240a39e6 kata-deploy: tools: tests: Use zstd instead of xz
Although the compress ratio is not as optimal as using xz, it's way
faster to compress / uncompress, and it's "good enough".

This change is not small, but it's still self-contained, and has to get
in at once, in order to help bisects in the future.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-08-21 19:53:55 +02:00
Fabiano Fidêncio
9cc97ad35c kata-deploy: Bump image to use alpine 3.22
As 3.18 is already EOL.

We need to add `--break-system-packages` to enforce the install of the
installation of the yq version that we rely on.  The tests have shown
that no breakage actually happens, fortunately.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-08-21 19:53:55 +02:00
Fabiano Fidêncio
c32fc409ec rootfs-builder: Bump alpine to 3.22
As we were using a very old non-supported version.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-08-21 19:53:55 +02:00
Zvonko Kaiser
c980b6e191 release: Bump version to 3.20.0
Bump VERSION and helm-chart versions

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-08-20 18:18:05 +02:00
Fabiano Fidêncio
dd1752ac1c Merge pull request #11634 from mythi/coco-kernel-v6.16
versions: update kernel-confidential to Linux v6.16.1
2025-08-20 13:01:05 +02:00
Fupan Li
29ab8df881 Merge pull request #11514 from Apokleos/ci-for-libs
CI: Introduce CI for libs to Improve code quality and reduce noises
2025-08-20 18:59:27 +08:00