Commit Graph

13785 Commits

Author SHA1 Message Date
Fabiano Fidêncio
4cd048444d build: nvidia-gpu: Fix cache usage of the headers tarball
Whenever we count on having the headers tarball, we must unpack the
cached content into the expected directory, otherwise we'd simply fail,
as we've been failing in our CI, at the end of the process where we
generate the tarball from the cached components.

It's weird to me, sincerely, that the headers tarball end up in such
weird place (build/kernel-nvidia-gpu/builddir/), but I'll leave that to
Zvonko to figure out whether something better can be done, as the intuit
of this PR is simply unblock Kata Containers CI.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-11 17:59:53 +02:00
Zvonko Kaiser
693e307f72 deploy: Add artefact repository
New env var so everyone can test the PUSH_TO_REGISTRY feature

export PUSH_TO_REGISTRY=yes
export ARTEFACT_REGISTRY=quay.io
export ARTEFACT_REPOSITORY=my-fancy-kata-containers
export ARTEFACT_REGISTRY_USERNAME=zvonkok
export ARTEFACT_REGISTRY_PASSWORD=<super-secret>

make ...-tarball

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-10 16:41:52 +00:00
Zvonko Kaiser
4dea73b433
Merge pull request #9616 from zvonkok/nv-kernel-hotfix
deploy: Fix wrong pushing of artifacts
2024-05-10 18:38:09 +02:00
Zvonko Kaiser
4d0f42a145 deploy: Fix wrong pushing of artifacts
Added explicit case statements for nvidia-gpu and
nvidia-gpu-confidential

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-10 14:08:32 +00:00
Zvonko Kaiser
85374f55d2 gpu: Add build targets for GPU rootfs initrd/image
Preparation for complete GPU rootfs build step #1/#N

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-10 09:47:21 +00:00
Zvonko Kaiser
8ec2cc9c0d threat-model: Add VFIO, ACPI and KVM/VMM threat-model descriptions
We're missing several topics in the current threat model lets update.

Fixes: #8943

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-10 07:18:44 +00:00
Fabiano Fidêncio
20515fed70
Merge pull request #9484 from zvonkok/nvidia-runtimeclasses
deploy: Add runtimeClasses relating to the NVIDIA GPU
2024-05-10 03:52:12 +02:00
Gabriela Cervantes
80e551ea74 metrics: Update launch times script
This PR updates the launch times scripts by improving the variable
definition as well as trying to use the same format across all the script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-05-09 21:29:32 +00:00
Emanuel Lima
59c1567f80 runtime-rs: Fix constructing the RTC struct
RTC was being built in a wrong fashion on commit #2bc5e3c6e2ab0145fa9e8be95df0d5086c07a517

RTC was being constructed inside the QemuCmdLine struct,
but it should've been built inside the devices vector.

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-05-09 15:00:47 -03:00
Fabiano Fidêncio
2f686b1179
Merge pull request #9608 from fidencio/topic/tdx-depend-on-distro-host-stack-part-II
tdx: Adapt kata-deploy to use QEMU / OVMF from the distros
2024-05-09 10:25:19 +02:00
Zvonko Kaiser
da7e6a0f07
deploy: Add runtimeClasses relating to the NVIDIA GPU
Fixes: #9483

For the added configurations we need to provide runtimeClasses.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 10:00:59 +02:00
Fabiano Fidêncio
96a100f910
Merge pull request #9482 from zvonkok/kernel-headers-tarball
kernel: Add caching of kernel-headers
2024-05-09 09:58:30 +02:00
Fabiano Fidêncio
aba56a8adb
tests: measured-rootfs: Skip policy addition
Let's skip the policy addition for now, in order to get the TDX CI back
up and running, and then we can re-enable it as soon as we get
https://github.com/kata-containers/kata-containers/issues/9612 fixed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
77f457c0e1
runtime: tdx: Drop sept-ve-disable=on
This was needed when we were using an old (and not maintained anymore)
host stack.  Considering what we have as part of the distros, Today,
this can simply be dropped, as I cannot find any reference of this one
being needed in any up-to-date documentation.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
416d00228c
Revert "qemu: tdx: Adapt command line" (partially)
This reverts commit b7cccfa019.

The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using.  With that in mind, let's
revert its usage in the code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
1c3037fd25
Revert "govmm: tdx: Expose the private=on|off knob"
This reverts commit 582b5b6b19.

The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using.  With that in mind, let's
revert its addition, and later on its usage in the code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
a9720495de
kata-deploy: Ensure the distro QEMU and OVMF are used for TDX
Here we're checking the distro's `/etc/os-release` or
`/usr/lib/os-release` in order to get which distro we're deploying the
Kata Containers artefacts to, and then to properly adjust the QEMU and
OVMF with TDX support that's been shipped with the distros.

Together with that, we're also printing the instructions provided by the
distro on how to enable and use TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
f48450b360
runtime: config: tdx: Add QEMU / OVMF placeholder var
Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT
variables instead of actually setting a path, so we can easily replace
those as part of our deployment scripts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
84b94dc2b1
kata-deploy: Expose /host to the daemon-set
We'll need to have access to the host os-release file (either under
`/etc/os-release` or under `/usr/lib/os-release`), and the simplest
approach that comes to my mind to do is doing what a debug pod would do,
mounting `/` as `/host` and then allowing us to have access to those
files, and then corectly set the TDX specific QEMU and OVMF (TDVF) paths
for the tdx available configurations.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
f2d40da8e4
versions: build: Remove unused td-shim entry
We haven't been using nor testing with td-shim, as Cloud Hypervisor does
not officially support TDX yet, and TDVF is supposed to be used with
QEMU, instead of td-shim.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
ea82740b19
versions: build: Remove TDX specific QEMU
Let's remove everything related to the TDX specific QEMU building /
shipping from our repo, as we'll be relying on the one coming from the
distros.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
4292c4c3b1
versions: build: Remove TDX specific OVMF (TDVF)
Let's remove everything related to the TDVF building / shipping from our
repo, as we'll be relying on the one coming from the distro.

Later on, we may need to re-add TDVF logic, as we're already using
upstream edk2 repo / content, but when that's needed we'll simply revert
this commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Alex Lyn
946f0bdfff
Merge pull request #9609 from fidencio/topic/skip-pull-image-tests-on-tees
tests: pull-image: Don't run on TEEs
2024-05-09 08:22:55 +08:00
GabyCT
3b8a910393
Merge pull request #9596 from lifupan/main
db: fix the issue of failed to init pci root bus
2024-05-08 13:14:20 -06:00
Gabriela Cervantes
2fb406ed3a metrics: Fix random write value for FIO
This PR fixes the random write value for FIO for qemu by decreasing it
to avoid the random failures of the GHA CI.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-05-08 18:54:41 +00:00
Fabiano Fidêncio
142342012c
tests: pull-image: Don't run on TEEs
Let's skip those tests on TEEs as we've been facing a reasonable amount
of issues, most likely on the containerd side, related to pulling the
image on the guest.

Once we're able to fix the issues on containerd, we can get back and
re-enable those by reverting this commit.

The decision of disabling the tests for TEEs is because the machines may
end up in a state where human intervention is necessary to get them back
to a functional state, and that's really not optimal for our CI.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-08 18:40:22 +02:00
Fabiano Fidêncio
c0bf9e9bc6
Merge pull request #9607 from fidencio/topic/tdx-depend-on-distro-host-stack-part-I
ci: Stop building TDX specific QEMU and OVMF
2024-05-08 15:53:15 +02:00
Zvonko Kaiser
fb0b821771 kernel: Add caching of kernel-headers
Fixes: #9481

We need to cache the kernel-headers for the NVIDIA GPU initrd/image build.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-08 11:30:39 +00:00
Fabiano Fidêncio
12dc9f83df
ci: Stop building TDX specific QEMU and OVMF
This is the first step of the work to start relying on the artefacts
coming from the distros (CentOS 9 Stream, and Ubuntu) themselves.

Let's have this first one merged, as this will not run the CI due to the
changes being on the yaml itself, and then follow-up with the changes
needed on other parts of the project (kata-deploy, runtime, etc).

Fixes: #9590 -- part I

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-08 11:39:32 +02:00
Alex Lyn
875e6e3815
Merge pull request #9601 from cncal/fix_redundant_log
qemu: the error is logged only when it occurs
2024-05-08 08:59:01 +08:00
GabyCT
22087f9db9
Merge pull request #9598 from lifupan/main_shim
runtime-rs: fix the issue of the leak of dead shim
2024-05-07 10:14:11 -06:00
GabyCT
a564422b7b
Merge pull request #9582 from cncal/main
build: fix the confusing build message if yq doesn't exist in GOPATH/bin
2024-05-07 09:34:27 -06:00
Fabiano Fidêncio
cd84414c63
Merge pull request #9600 from GabyCT/topic/deleteoci
versions: Remove oci information from versions file
2024-05-07 13:15:35 +02:00
Fabiano Fidêncio
ddf6b367c7
Merge pull request #9568 from kata-containers/dependabot/go_modules/src/runtime/go_modules-22ef55fa20
build(deps): bump the go_modules group across 5 directories with 8 updates
2024-05-07 13:14:48 +02:00
Steve Horsman
e967db60ab
Merge pull request #9592 from sprt/mariner-before-ch39
tests: adapt Mariner CI to unblock CH v39 upgrade
2024-05-07 11:52:55 +01:00
cncal
15d511af97 qemu: the error is logged only when it occurs
Everytime I create contianer on arm64 machine, containerd/kata logs a redundant warning
as follows:
``` shell
time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2
pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor
```
I added an error statement so that the error would be logged when it occurs.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-07 14:28:04 +08:00
Gabriela Cervantes
aecede11fc versions: Remove oci information from versions file
This PR removes oci information from versions file as this is not
longer being used in kata containers repository.

Fixes #9599

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-05-06 20:14:00 +00:00
Gabriela Cervantes
b54dc26073 gha: Enable uninstall kbs client function for coco gha workflow
This PR enables the uninstall kbs client function for coco gha tdx
workflow.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-05-06 15:55:24 +00:00
Gabriela Cervantes
aaf9b54d97 gha: Add support to install KBS to k8s TDX GHA workflow
This PR adds support to install KBS to k8s TDX GHA workflow in
order to run confidential attestation tests.

Fixes #9451

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-05-06 15:42:17 +00:00
Gabriela Cervantes
506e17a60d tests: Add k8s negative policy test
This PR adds a k8s negative policy test to the confidential attestation
bats test.

Fixes #9437

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-05-06 15:28:54 +00:00
Fupan Li
3694f3d9fe runtime-rs: fix the issue of the leak of dead shim
We should init and asign the runtime instance to runtime
handler, otherwise, if the pause container failed to start,
which means the runtime instance failed to start, then the
following delete & shutdown request wouldn't be run, thus
the dead shim would be left.

Fixes: #9597

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-05-06 17:31:31 +08:00
Fupan Li
26bee78e8d db: fix the issue of failed to init pci root bus
dragonball reserves 2048G of mmio space for the pci root bus by default
on physical addresses greater than 4G. However, for some machines with
smaller physical address widths, such as 39-bit wide physical addresses,
dragonball reserves the mmio space when initializing the memory. It is
less than 2048G, so this commit dynamically calculates and allocates the
mmio size of each pci root bus.

Fixes: #9509

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-05-06 11:34:18 +08:00
Aurélien Bombo
0cc2b07a8c tests: adapt Mariner CI to unblock CH v39 upgrade
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.

Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.

This change has been tested together with CH v39 in #9588.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-05-03 16:29:12 +00:00
cncal
48d873b52b build: fix the confusing build message if yq doesn't exist in GOPATH/bin
The build message shows that yq was not found when I tried to build
runtime binaries, but I've actually installed yq by yum install.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:34:45 +08:00
cncal
9caa7beb1f runtime: make kata-runtime check error more understandable
If device /dev/kvm does not exist, kata-runtime check would fail with
an ambiguous error messae 'no such file or directory'. I added a little
more details to make it understandable and it will belike:

```
ERRO[0000] cannot open kvm device: no such file or directory  arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime
ERRO[0000] no such file or directory                          arch=arm64 name=kata-runtime pid=2849085 source=runtime
no such file or directory
```

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:29:08 +08:00
Zvonko Kaiser
e5e0983b56
Merge pull request #9476 from zvonkok/nvidia-config-tomls
config: Add NVIDIA GPU SNP, TDX configuration files
2024-05-02 10:27:10 +02:00
Fabiano Fidêncio
f04a7a55ed
Merge pull request #9563 from fidencio/topic/agent-use-policy-by-default
build: Build the shipped agent with policy enabled
2024-05-01 12:22:05 +02:00
Fabiano Fidêncio
33a8701904
Merge pull request #9573 from littlejawa/kata_deploy_crio_conf
kata-deploy: configure debugging for crio
2024-05-01 12:19:10 +02:00
Julien Ropé
c2aed995b7 kata-deploy: configure debugging for crio
Fix the configuration for crio's log_level

Fixes: #9556

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-04-30 17:48:43 +02:00
stevenhorsman
3c2232d898 runtime: fix testVersionString logic
- The testVersionString logic use regex to check that the ociVersion is
displayed correctly, but with the new go module that version has a
`+` in, so we need to quote this to escape special characters

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-30 10:54:49 +01:00