Commit Graph

13813 Commits

Author SHA1 Message Date
Fabiano Fidêncio
cffeb0ffb8
Merge pull request #9673 from fidencio/topic/revert-aks-workaround
Revert "ci: azure: Workaround azure cli installation script"
2024-05-20 16:16:55 +02:00
stevenhorsman
f271983aeb gha: release: Set inherit secrets on tarball builds
Now we have updated the release builds to push
artefacts to
our registry for the release, so we can cache the images, we need to
set `secrets: inherit` for all architecture's tarball builds
so that we can log into quay.io and ghcr in those steps

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-20 14:19:17 +01:00
Fabiano Fidêncio
25c9cf32ff
Revert "ci: azure: Workaround azure cli installation script"
This reverts commit 5ff53e4d1c, as the
script was fixed by MSFT, at least according to:
https://github.com/Azure/azure-cli/issues/28984

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-20 14:38:46 +02:00
vac (Brendan)
d812007b99 kata-deploy: Fix unbound VERSION_ID
VERSION_ID is not guaranteed to be specified in os-release, this
makes kaka-deploy breaks in rolling distros like arch linux and void
linux.

Note that operating system vendors may choose not to provide
version information, for example to accommodate for rolling releases.
In this case, VERSION and VERSION_ID may be unset.
Applications should not rely on these fields to be set.

Signed-off-by: vac <dot.fun@protonmail.com>
2024-05-20 19:48:31 +08:00
Fabiano Fidêncio
e8ebe18868
tests: k8s: tdx: Skip liveness probe test
This test doesn't fail with the guest image pulling, but it for sure
should. :-)

We can see in the bats logs, something like:
```
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  31s               default-scheduler  Successfully assigned kata-containers-k8s-tests/liveness-exec to 984fee00bd70.jf.intel.com
  Normal   Pulled     23s               kubelet            Successfully pulled image "quay.io/prometheus/busybox:latest" in 345ms (345ms including waiting)
  Normal   Started    21s               kubelet            Started container liveness
  Warning  Unhealthy  7s (x3 over 13s)  kubelet            Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    7s                kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Pulled     7s                kubelet            Successfully pulled image "quay.io/prometheus/busybox:latest" in 389ms (389ms including waiting)
  Warning  Failed     5s                kubelet            Error: failed to create containerd task: failed to create shim task: the file /bin/sh was not found: unknown
  Normal   Pulling    5s (x3 over 23s)  kubelet            Pulling image "quay.io/prometheus/busybox:latest"
  Normal   Pulled     4s                kubelet            Successfully pulled image "quay.io/prometheus/busybox:latest" in 342ms (342ms including waiting)
  Normal   Created    4s (x3 over 23s)  kubelet            Created container liveness
  Warning  Failed     3s                kubelet            Error: failed to create containerd task: failed to create shim task: failed to mount /run/kata-containers/f0ec86fb156a578964007f7773a3ccbdaf60023106634fe030f039e2e154cd11/rootfs to /run/kata-containers/liveness/rootfs, with error: ENOENT: No such file or directory: unknown
  Warning  BackOff    1s (x3 over 3s)   kubelet            Back-off restarting failed container liveness in pod liveness-exec_kata-containers-k8s-tests(b1a980bf-a5b3-479d-97c2-ebdb45773eff)
```

Let's skip it for now as we have an issue opened to track it down:
https://github.com/kata-containers/kata-containers/issues/9665

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 21:59:29 +02:00
Fabiano Fidêncio
a2c70222a8
tests: k8s: tdx: Skip initContainerd shared vol test
This is another one that is related to initContainers not being properly
handled with the guest image pulling.

Let's skip it for now as we have
https://github.com/kata-containers/kata-containers/issues/9668 to track
it down.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 20:58:45 +02:00
Fabiano Fidêncio
9d56145499
tests: k8s: tdx: Skip volume related tests
Similarly to firecracker, which doesn't have support for virtio-fs /
virtio-9p, TDX used with `shared_fs=none` will face the very same
limitations.

The tests affected are:
* k8s-credentials-secrets.bats
* k8s-file-volume.bats
* k8s-inotify.bats
* k8s-nested-configmap-secret.bats
* k8s-projected-volume.bats
* k8s-volume.bats

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 19:38:49 +02:00
Fabiano Fidêncio
606a62a0a7
tests: k8s: tdx: Skip "Setting sysctl" test
This test fails when using `shared_fs=none` with the nydus-snapshotter,
and we're tracking the issue here:
https://github.com/kata-containers/kata-containers/issues/9666

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 19:38:38 +02:00
Fabiano Fidêncio
937b2d5806
tests: k8s: tdx: Skip "Kill all processes in container" test
This test fails when using `shared_fs=none` with the nydus snapshotter,
and we're tracking the issue here:
https://github.com/kata-containers/kata-containers/issues/9664

For now, let's have it skipped.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:51:14 +02:00
Fabiano Fidêncio
03ce41b743
tests: k8s: tdx: Skip "Check custom dns" test
The test has been failing on TDX for a while, and an issue has been
created to track it down, see:
https://github.com/kata-containers/kata-containers/issues/9663

For now, let's have it skipped.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:51:14 +02:00
Fabiano Fidêncio
1a8a4d046d
tests: k8s: setup: Improve / Fix logs
Let's make sure the logs will print the correct annotation and its
value, instead of always mentioning "kernel" and "initrd".

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:51:14 +02:00
Fabiano Fidêncio
3f38309c39
tests: k8s: tdx: Stop running k8s-guest-pull-image.bats
We're doing that as all tests are going to be running with
`shared_fs=none`, meaning that we don't need any specific test for this
case anymore.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:51:00 +02:00
Fabiano Fidêncio
e84619d54b
tests: k8s: tdx: Add add_runtime_handler_annotations function
This function will set the needed annotation for enforcing that the
image pull will be handled by the snapshotter set for the runtime
handler, instead of using the default one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:49:07 +02:00
Fabiano Fidêncio
f2de259387
runtime: tdx: Use shared_fs=none
We shouldn't be using 9p, at all, with TEEs, as off right now we have no
way to ensure the channels are encrypted.  The way to work this around
for now is using guest pull, either with containerd + nydus snapshotter
or with CRI-O; or even tardev snapshotter for pulling on the host (which
is the approach used by MSFT).

This is only done for TDX for now, leaving the generic, AMD, and IBM
related stuff for the folks working on those to switch and debug
possible issues on their environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:47:09 +02:00
Fabiano Fidêncio
5b257685d9
Merge pull request #9662 from dborquez/fix_launchtimes_timestamp_generation
Fix launch times timestamp generation.
2024-05-18 21:11:09 +02:00
Fabiano Fidêncio
94786dc939
Merge pull request #9659 from stevenhorsman/remove-non-printable-tag-characters
ci: cache: Filter out non-printable characters from tag
2024-05-18 14:47:07 +02:00
Fabiano Fidêncio
874cda0e51
Merge pull request #9655 from BbolroC/add-arch-to-initramfs
CI: Append arch type to initramfs-cryptsetup image
2024-05-18 14:31:57 +02:00
Malte Poll
babdab9078 genpolicy: detect empty string in ns as default
In Kubernetes, the following values for namespace are equivalent and all refer to the default namespace:

- ` ` (namespace field missing)
- `namespace: ""` (namespace field is the empty string)
- `namespace: "default"`(namespace field has the explicit value `default`)

Genpolicy currently does not handle the empty string case correctly.

Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
2024-05-18 12:44:59 +02:00
Fabiano Fidêncio
cbfdc70a55
Merge pull request #9613 from fidencio/topic/skip-pull-image-tests-on-tees-part-II
tests: pull-image: Only skip tests for TEEs
2024-05-18 03:31:38 +02:00
Archana Shinde
0e28e904e0 kata-manager: Install cni for containerd
When just containerd is installed without installing nerdctl,
cni plugins are missing from the installation.
containerd tarball does not include cni plugin files.
Hence install cni plugins separately for containerd.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-05-18 00:19:57 +00:00
Archana Shinde
d23d58a484 kata-manager: Copy cni files under /opt/cni
nerdctl requires cni plugins to be installed in /opt/cni/bin
Without bridge plugin installed, it is not possible to run a
container with nerdctl.
The downloaded nerdctl tarball contains cni plugin files, but are
extracted under /usr/local/libexec.
Copy extracted tarball cni files under /usr/local/libexec
to /opt/cni/bin

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-05-18 00:16:48 +00:00
David Esparza
938d3dc430
metrics: fix timestamps generation from launch times test.
Use `eval` to process the `date` command along with its parameters,
thus avoiding misinterpreting the parameters as commands.

Fixes: #9661

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-05-17 14:44:41 -06:00
David Esparza
bae377b42a
metrics: determine the realpath of kata-shim component.
Determine the realpath of kata-shim avoiding the check fails
in case the kata-shim is not a symlink, as was happening prior
to this commit.

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-05-17 14:40:02 -06:00
Fabiano Fidêncio
5ff53e4d1c
ci: azure: Workaround azure cli installation script
This is done in order to work around
https://github.com/Azure/azure-cli/issues/28984, following a suggestion
on the very same issue.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-17 20:28:24 +02:00
stevenhorsman
42fddb5530 ci: cache: Filter out non-printable characters from tag
- The tags have a trailing non-printable character, which results
in our cache tags having a trailing underscore e.g. `ghcr.io/kata-containers/cached-artefacts/agent:ce24e9835_`
For ease of use of these cached components, we should strip off the trailing underscore.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-17 14:16:40 +01:00
Hyounggyu Choi
961735a181 CI: Migrate vfio-ap test files from tests repo
An e2e test for `vfio-ap` has been conducted internally in IBM
due to the lack of publicly available test machines equipped
with a required crypto device.
The test is performed by the `tests` repository:
(i.e. 772105b560/Makefile (L144))

The community is working to integrate all tests into the `kata-containers`
repository, so the `vfio-ap` test should be part of that effort.

This commit moves a test script and Dockerfile for a test image from
the `tests` repository. We do not rename the script to `gha-run.sh`
because it is not executed by Github Actions' workflow.

You can check the test results from the s390x nightly test with the migrated files here:
https://github.com/kata-containers/kata-containers/actions/runs/9123170010/job/25100026025

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-05-17 14:59:16 +02:00
stevenhorsman
a92defdffe
tests: pull-image: Remove skips
Given that we think the containerd -> snapshotter image cache
problems have been resolved by bumping to nydus-snapshotter v0.3.13
we can try removing the skips to test this out

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-17 12:39:57 +02:00
stevenhorsman
7ac302e2d8
tests: Slacken guest pull rootfs count assert
- We previously have an expectation for the pause rootfs
to be pull on the host when we did a guest pull. We weren't
really clear why, but it is plausible related to the issues we had
with containerd and nydus caching. Now that is fixed we can begin
to address this with setting shared_fs=none, but let's start with
updating the rootfs host check to be not higher than expected

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-17 12:39:56 +02:00
Fabiano Fidêncio
67ff58251d
tests: confidential_common: Remove unneeded ensure_yq call
This test is called from `tests/integration/run_kuberentes_tests.sh`,
which already ensures that yq is installed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-17 12:39:56 +02:00
Fabiano Fidêncio
cc874ad5e1
tests: confidential: Ensure those only run on TEEs
Running those with the non-TEE runtime classes will simply fail.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-17 12:39:56 +02:00
Fabiano Fidêncio
2bc5b1bba2
tests: pull-image: Only skip tests for TEEs
On 1423420, I've mistakenly disabled the tests entirely, for both
non-TEEs and TEEs.

This happened as I didn't realise that `confidential_setup` would take
non-TEEs into consideration. :-/

Now, let me follow-up on that and make sure that the tests will be
running on non-TEEs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-17 12:39:56 +02:00
Fabiano Fidêncio
d875f89fa2
tests: Add is_confidential_hardware()
This function is a helper to check whether the KATA_HYPERVISOR being
used is a confidential hardware (TEE) or not, and we can use it to
skip or only run tests on those platforms when needed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-17 12:39:56 +02:00
Fabiano Fidêncio
4a04a1f2ae
tests: Re-work confidential_setup()
Let's rename it to `is_confidential_runtime_class`, and adapt all the
places where it's called.

The new name provides a better description, leading to a better
understanding of what the function really does.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-17 12:39:56 +02:00
Pavel Mores
b9febc4458 runtime-rs: document architecture & implementation conventions in qemu-rs
Implementation of QemuCmdLine has a fairly uniform and repetitive structure
that's guided by a set of conventions.  These conventions have however been
mostly implicit so far, leading to a superfluous and annoying
request/force-push churn during qemu-rs PR reviews.

This commit aims to make things explicit so that contributors can take them
into account before an initial PR submission.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-05-17 12:21:44 +02:00
Hyounggyu Choi
3917930a76 CI: Append arch type to initramfs-cryptsetup image
This commit is to append an arch type to the initramfs-cryptsetup image
to prevent a wrong arch image from being pulled on a different arch host.

Fixes: #9654

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-05-17 11:42:49 +02:00
Steve Horsman
9a6d8d8330
Merge pull request #9650 from stevenhorsman/caching-tagging-update-partIII
Caching tagging update part iii
2024-05-17 09:09:15 +01:00
stevenhorsman
ce24e98358 ci: cache: Add tag character filtering
- Container image tags can only contain alphanumeric, period,
hyphen and underscore characters, so convert characters outside
of these to be underscores, to avoid having invalid tag failures

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-16 21:38:07 +01:00
stevenhorsman
a98b1e3afb ci: cache: Integrate tagging updates with recent changes
Recently the extra gpu caching was added, unfortunately when I
rebased I ended up with both the new tagging logic and old logic.
Let's try and integrate them properly to avoid doing the push twice.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-16 21:38:07 +01:00
Lukáš Doktor
f994f79078
ci.ocp: Add steps to reproduce/bisect CI runs
in case the upstream CI fails it's useful to pin-point the PR that
caused the regression. Currently openshift-ci does not allow doing that
from their setup but we can mimic the setup on our infrastructure and
use the available kata-deploy-ci images to find the first failing one.
To help with that add a few helper scripts and a howto.

Fixes: #9228

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-05-16 20:20:05 +02:00
Lukáš Doktor
a556ad7e01
ci.ocp: Document how to run openshift-tests with kata
document the ocp pipeline.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-05-16 20:15:32 +02:00
Lukáš Doktor
ea081bd882
ci.ocp: Add webhook cleanup
cleanup the webhook resources as well.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-05-16 20:15:31 +02:00
David Esparza
029a6de52b
Merge pull request #9615 from GabyCT/topic/fixlaunchtime
metrics: Update launch times script
2024-05-16 11:28:44 -06:00
Steve Horsman
33e6b241ba
Merge pull request #9647 from stevenhorsman/fix-artefact-tags-unbound-variable
ci: cache: Fix unbound variable
2024-05-16 16:22:47 +01:00
stevenhorsman
9d9487b17f ci: cache: Fix unbound variable
Now we have the workflow updated and can test the changes in caching
we've hit an error:
```
line 1180: artefact_tag: unbound variable
```
so we need to fix that up. Sorry for missing this before.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-16 14:30:32 +01:00
Steve Horsman
03c08583c3
Merge pull request #9644 from stevenhorsman/fix-broken-workflow
workflow: Remove if from env conditional
2024-05-16 14:13:25 +01:00
stevenhorsman
f7fd2f9a5d workflow: Fix problems with build-asset workflows
- It appears like the `if` isn't required when setting env as a
conditional
- `inputs.stage` over input.stage
- Swap matrix.component to matrix.asset

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-16 11:51:46 +01:00
Steve Horsman
d8468cb178
Merge pull request #9550 from stevenhorsman/tag-component-caches
Tag component caches
2024-05-16 11:05:18 +01:00
Steve Horsman
b31ff09b8d
Merge pull request #9617 from zvonkok/artefact-repository
deploy: Add artefact repository
2024-05-16 10:41:23 +01:00
Fabiano Fidêncio
4d073c837d
Merge pull request #9636 from ChengyuZhu6/snapshotter
version: Bump nydus snapshotter to v0.13.13
2024-05-16 02:54:53 +02:00
GabyCT
05cc8fae5e
Merge pull request #9610 from GabyCT/topic/fixrwfio
metrics: Fix random write value for FIO
2024-05-15 17:44:41 -06:00