Commit Graph

650 Commits

Author SHA1 Message Date
Steve Horsman
256ab50f1a Merge pull request #9959 from sprt/fix-ci-cleanup
ci: cleanup: Ignore nonexisting resources
2024-07-18 19:23:48 +01:00
Wainer dos Santos Moschetta
66c600f8d8 gha: delint the s390x workflow
Made run-k8s-tests-on-zvsi.yaml free of warnings by removing:

SC2086:info:1:1: Double quote to prevent globbing and word splitting ...
SC2086:info:2:1: Double quote to prevent globbing and word splitting ...

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-07-16 15:20:46 -03:00
Wainer dos Santos Moschetta
a98985fab8 gha: export user/password for auth registry tests on s390x
Counterpart of commit d8961cbd4a for run-k8s-tests-on-zvsi workflow

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-07-16 15:18:40 -03:00
Fupan Li
a7179be31d Merge pull request #9534 from Tim-Zhang/fix-stdin-stuck
Fix ctr exec stuck problem
2024-07-15 13:19:19 +08:00
Gabriela Cervantes
b6b8524ab7 gha: Increase timeout to run CoCo TDX tests
This PR increases the timeout to run the CoCo TDX tests in order
to avoid the random failures on TDX saying that
The action 'Run tests' has timed out after 30 minutes and making
the GHA job fail.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-07-10 16:06:07 +00:00
Steve Horsman
78bbc51ff0 Merge pull request #9806 from niteeshkd/nd_snp_certs
runtime: pass certificates to get extended attestation report for SNP coco
2024-07-10 08:57:45 +01:00
Steve Horsman
29413021e5 Merge pull request #9981 from stevenhorsman/run-k8s-tests-on-zvsi-inherit-secrets
gha: make run-k8s-tests-on-zvsi inherit secrets
2024-07-10 08:49:11 +01:00
Niteesh Dubey
e7b4e5e386 gha: add SNP attestation test
This tests the attestation of SNP guest.

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-07-09 17:14:26 +00:00
stevenhorsman
c7cf26fa32 gha: make run-k8s-tests-on-zvsi inherit secrets
run-k8s-tests-on-zvsi runs the coco tests and we've added new
secrets to provide credentials for the authenticated image testing,
so we need to let the zvsi job inherit these from the caller workflow
like the rest of the coco tests

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-09 15:29:48 +01:00
Tim Zhang
704da86e9b CI: Add tests for stdio
Add tests for stdio

Signed-off-by: Tim Zhang <tim@hyper.sh>
2024-07-09 11:44:40 +08:00
Aurélien Bombo
f18e35014f ci: Move run-nerdctl-tests to free runner
See #9940.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-03 14:58:11 +00:00
Aurélien Bombo
c0919d6f45 ci: Move run-docker-tests to free runner
Removed the Docker installation step as that's preinstalled in free
runners.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-03 14:57:59 +00:00
Aurélien Bombo
743a765525 ci: Move run-runk to free runner
See #9940.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-03 14:57:48 +00:00
Aurélien Bombo
09cce86cc7 ci: Move run-nydus to free runner
See #9940.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-03 14:57:42 +00:00
Aurélien Bombo
9e1b6064dc ci: Move run-containerd-stability to free runner
Removes the Docker installation step as that's preinstalled on the free
runner:

https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2204-Readme.md#tools

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-03 14:57:37 +00:00
Aurélien Bombo
6a0e403acf ci: Move run-cri-containerd to free runner
See #9940.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-03 14:57:29 +00:00
Aurélien Bombo
eda5d2c623 ci: cleanup: Run every 24 hours instead of 6 hours
Resources don't fail to get deleted as often to need to run every 6
hours.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-02 22:27:58 +00:00
Steve Horsman
078a1147a6 Merge pull request #9909 from kata-containers/sprt/gha-cleanup-pt2
ci: Add scheduled job to cleanup resources, pt. II
2024-07-02 17:12:03 +01:00
Fabiano Fidêncio
ef3f6515cf Merge pull request #9941 from sprt/temp-disable-test
ci: Temporarily disable kata-deploy and GARM tests
2024-07-02 14:13:46 +02:00
stevenhorsman
16130e473c gha: ci: Remove incorrect secrets line
The CI is failing with:
```
Invalid workflow file: .github/workflows/cleanup-resources.yaml#L10
The workflow is not valid. .github/workflows/cleanup-resources.yaml (Line: 10, Col: 5): Unexpected value 'secrets'
```
I think this is because `secrets: inherit` is only applicable
when re-using a workflow, not for a standalone job like
we have here.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-01 09:32:58 +01:00
Aurélien Bombo
c605fff4c1 ci: Temporarily disable kata-deploy and GARM tests
Per the decision taken in the 6/27 AC meeting, this PR temporarily
disables kata-deploy and GARM tests until we secure further Azure CI
funding.

In the meantime, I'll transition the GARM tests to free runners and
reenable them to regain that coverage without affecting spending (see
#9940). If it turns out the free runners are too slow, we'll switch back
to GARM.

After funding is secured, we'll reenable the kata-deploy tests (see
#9939).

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-06-28 20:23:07 +00:00
Hyounggyu Choi
dd23beeb05 CI: Eliminating dependency on clone_tests_repo()
As part of archiving the tests repo, we are eliminating the dependency on
`clone_tests_repo()`. The scripts using the function is as follows:

- `ci/install_rust.sh`.
- `ci/setup.sh`
- `ci/lib.sh`

This commit removes or replaces the files, and makes an adjustment accordingly.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-06-28 14:52:02 +02:00
Aurélien Bombo
2c89828749 ci: Add scheduled job to cleanup resources, pt. II
Follow-up to #9898 and final PR of this set. This implements the actual
deletion logic.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-06-26 17:36:47 +00:00
Steve Horsman
c9df743dab Merge pull request #9898 from sprt/gha-cleanup-job
ci: Add scheduled job to cleanup resources, pt. I
2024-06-25 19:11:30 +01:00
Aurélien Bombo
d60b548d61 ci: Add scheduled job to cleanup resources
This is the first part of adding a job to clean up potentially dangling
Azure resources. This will be based on Jeremi's tool from
https://github.com/jepio/kata-azure-automation.

At first, we'll only clean up AKS clusters, as this is what has been
causing us problems lately, but this could very well be extended to
cleaning up entire resource groups, which is why I left the different
names pretty generic (i.e. "resources" instead of "clusters").

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-06-25 16:33:03 +00:00
stevenhorsman
d8961cbd4a workflow: coco: Add auth registry secret
- Add the `AUTHENTICATED_IMAGE_USER` and
`AUTHENTICATED_IMAGE_PASSWORD` repository secrets as env vars
to the coco tests, so we can use them to pull an images from
and authenticated registry for testing

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-25 11:11:02 +01:00
Wainer Moschetta
bcdc4fde10 Merge pull request #9857 from wainersm/disable_failing_jobs-part2
CI: disable jobs that failed >= 50% on nightly CI recently - part 2
2024-06-24 10:11:05 -03:00
Fabiano Fidêncio
3adf9e250f Merge pull request #9875 from zvonkok/gha-no-sudo-arm64
ci: gha no sudo arm64
2024-06-21 15:28:54 +02:00
Fabiano Fidêncio
2d552800f2 Merge pull request #9876 from zvonkok/gha-no-sudo-s390x
ci: remove sudo from s390x build
2024-06-21 15:00:31 +02:00
GabyCT
9320c2e484 Merge pull request #9845 from GabyCT/topic/fixartifacts
gha: Do not fail when collecting artifacts
2024-06-20 10:15:53 -06:00
Fabiano Fidêncio
2bab0f31d7 ci: tdx: Re-enable TDX CI
Now, using vanilla kubernetes, let's re-enable the TDX CI and hope it
becomes more stable than it used to be.

The cleanup-snapshotter is now taking ~4 minutes, and that matches with
the other platforms, mainly considering there's a sum of 210 seconds
sleep in the process.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-06-19 20:08:28 +02:00
Fabiano Fidêncio
7127178acc ci: tdx: Use vanilla k8s instead of k3s
We've noticed a bunch of issues related to deploying and deleting the
nydus-snapshotter.  As we don't see the same issues on other machines
using vanilla kubernetes, let's avoid using k3s for now follow the flow.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-06-19 16:56:15 +02:00
Zvonko Kaiser
d783ddaf03 ci: Remove not needed chown for ppc64
Now that all artifacts are owned by $USER no extra step needed
to adjust ownership

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-19 07:56:45 +00:00
Zvonko Kaiser
5bc37e39d5 ci: remove sudo from ppc64 build
We can now do the same for ppc64 that we did for amd64 and remove
the sudo cp.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-19 07:55:45 +00:00
Zvonko Kaiser
c341234c0b ci: remove sudo from s390x build
We can now do the same for s390x that we did for amd64 and remove
the sudo cp.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-19 07:53:33 +00:00
Zvonko Kaiser
3beb460a97 ci: Remove not needed chown for arm64
Now that all artifacts are owned by $USER no extra step needed
to adjust ownership

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-19 07:48:00 +00:00
Zvonko Kaiser
445b389b16 ci: remove sudo from arm64 build
We can now do the same for arm64 that we did for amd64 and remove
the sudo cp.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-19 07:46:51 +00:00
Gabriela Cervantes
eeb467bdc2 gha: Do not fail when collecting artifacts
This PR will avoid the failures when collecting artifacts for the gha.
This will ensure that we collect and archive system's data for the
purpose of debugging.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-06-18 16:05:23 +00:00
Fabiano Fidêncio
587f4d45de ci: tdx: Disable TDX CI
TDX CI has been having some issues with the Nydus snapshotter cleanup,
which has been stuck for hours depending every now and then.

With this in mind, let's disable the TDX CI, so we avoid it blocking the
progress of Kata Containers project, and we re-enable it as soon as we
have it solved on Intel's side.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-06-18 10:30:40 +02:00
Wainer Moschetta
7df221a8f9 Merge pull request #9833 from wainersm/qemu-rs_tests
tests/k8s: run for qemu-runtime-rs on AKS
2024-06-17 16:59:46 -03:00
Wainer dos Santos Moschetta
08eaa60b59 CI: disable all run-kata-deploy-tests-on-garm jobs
The following jobs have failed more than 50% on nightly CI.

run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s)
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2)
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s)

Instead of removing only those jobs, let's skip the kata-deploy-tests
on GARM completely so we can try to fix all the issues (or maybe
drop the jobs altogether).

Issue: #9854
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-17 14:39:38 -03:00
Zvonko Kaiser
5c2f3f34a8 CI: remove sudo from GHA
Now that all artifacts are owned by $USER we can start
to remove sudo from our GHA

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-17 11:06:56 +00:00
Wainer dos Santos Moschetta
d4f664b73b CI: disable run-kata-monitor-tests / run-monitor (containerd, lts) job
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.

Issue: #9853
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-14 16:27:04 -03:00
Wainer dos Santos Moschetta
cbf0b7ca7b CI: disable run-basic-amd64-tests / run-nerdctl-tests (clh) job
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.

Issue: #9852
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-14 16:17:26 -03:00
Wainer dos Santos Moschetta
562820449e CI: disable run-basic-amd64-tests / run-vfio (qemu) job
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.

The clh variation was disabled on commit 5f5274e699 so this change will
actually result on all the VFIO jobs disabled. Instead of delete the entire
entry from this workflow yaml (or comment the entry), I preferred to use
`if: false` which will make the jobs appear on the UI as skipped.

Issue: 9851
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-14 16:09:59 -03:00
Wainer dos Santos Moschetta
73ab5942fb tests/k8s: run for qemu-runtime-rs on AKS
The following tests are disabled because they fail (alike with dragonball):

- k8s-cpu-ns.bats
- k8s-number-cpus.bats
- k8s-sandbox-vcpus-allocation.bats

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-13 16:20:59 -03:00
Wainer dos Santos Moschetta
be9990144a workflow: run kata-deploy tests to qemu-runtime-rs on AKS
Start testing the ability of kata-deploy to install and configure
the qemu-runtime-rs runtimeClass.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-11 12:58:47 -03:00
Hyounggyu Choi
869f89c338 Merge pull request #9773 from BbolroC/use-qemu-coco-dev-s390x
GHA: Use qemu-coco-dev for k8s nydus test on s390x
2024-06-04 17:49:38 +02:00
Wainer Moschetta
2b8cdd9ff2 Merge pull request #9765 from wainersm/disable_failing_jobs
CI: disable jobs that failed > 50% on nightly CI recently - part 1
2024-06-04 12:05:36 -03:00
Hyounggyu Choi
246ee83768 GHA: Use qemu-coco-dev for k8s nydus test on s390x
In line with the changes for x86_64, the k8s nydus test for s390x should
also use `qemu-coco-dev` for `KATA_HYPERVISOR`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-06-04 15:49:23 +02:00