This PR increases the timeout for running the CoCo tests to avoid random failures.
These failures occur when the action `Run tests` times out after 30 minutes, causing the CI to fail.
Fixes: #10062
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Made run-k8s-tests-on-zvsi.yaml free of warnings by removing:
SC2086:info:1:1: Double quote to prevent globbing and word splitting ...
SC2086:info:2:1: Double quote to prevent globbing and word splitting ...
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR increases the timeout to run the CoCo TDX tests in order
to avoid the random failures on TDX saying that
The action 'Run tests' has timed out after 30 minutes and making
the GHA job fail.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
run-k8s-tests-on-zvsi runs the coco tests and we've added new
secrets to provide credentials for the authenticated image testing,
so we need to let the zvsi job inherit these from the caller workflow
like the rest of the coco tests
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The CI is failing with:
```
Invalid workflow file: .github/workflows/cleanup-resources.yaml#L10
The workflow is not valid. .github/workflows/cleanup-resources.yaml (Line: 10, Col: 5): Unexpected value 'secrets'
```
I think this is because `secrets: inherit` is only applicable
when re-using a workflow, not for a standalone job like
we have here.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Per the decision taken in the 6/27 AC meeting, this PR temporarily
disables kata-deploy and GARM tests until we secure further Azure CI
funding.
In the meantime, I'll transition the GARM tests to free runners and
reenable them to regain that coverage without affecting spending (see
#9940). If it turns out the free runners are too slow, we'll switch back
to GARM.
After funding is secured, we'll reenable the kata-deploy tests (see
#9939).
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
As part of archiving the tests repo, we are eliminating the dependency on
`clone_tests_repo()`. The scripts using the function is as follows:
- `ci/install_rust.sh`.
- `ci/setup.sh`
- `ci/lib.sh`
This commit removes or replaces the files, and makes an adjustment accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This is the first part of adding a job to clean up potentially dangling
Azure resources. This will be based on Jeremi's tool from
https://github.com/jepio/kata-azure-automation.
At first, we'll only clean up AKS clusters, as this is what has been
causing us problems lately, but this could very well be extended to
cleaning up entire resource groups, which is why I left the different
names pretty generic (i.e. "resources" instead of "clusters").
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
- Add the `AUTHENTICATED_IMAGE_USER` and
`AUTHENTICATED_IMAGE_PASSWORD` repository secrets as env vars
to the coco tests, so we can use them to pull an images from
and authenticated registry for testing
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Now, using vanilla kubernetes, let's re-enable the TDX CI and hope it
becomes more stable than it used to be.
The cleanup-snapshotter is now taking ~4 minutes, and that matches with
the other platforms, mainly considering there's a sum of 210 seconds
sleep in the process.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've noticed a bunch of issues related to deploying and deleting the
nydus-snapshotter. As we don't see the same issues on other machines
using vanilla kubernetes, let's avoid using k3s for now follow the flow.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR will avoid the failures when collecting artifacts for the gha.
This will ensure that we collect and archive system's data for the
purpose of debugging.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
TDX CI has been having some issues with the Nydus snapshotter cleanup,
which has been stuck for hours depending every now and then.
With this in mind, let's disable the TDX CI, so we avoid it blocking the
progress of Kata Containers project, and we re-enable it as soon as we
have it solved on Intel's side.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The following jobs have failed more than 50% on nightly CI.
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s)
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2)
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s)
Instead of removing only those jobs, let's skip the kata-deploy-tests
on GARM completely so we can try to fix all the issues (or maybe
drop the jobs altogether).
Issue: #9854
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: #9853
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: #9852
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
The clh variation was disabled on commit 5f5274e699 so this change will
actually result on all the VFIO jobs disabled. Instead of delete the entire
entry from this workflow yaml (or comment the entry), I preferred to use
`if: false` which will make the jobs appear on the UI as skipped.
Issue: 9851
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The following tests are disabled because they fail (alike with dragonball):
- k8s-cpu-ns.bats
- k8s-number-cpus.bats
- k8s-sandbox-vcpus-allocation.bats
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Start testing the ability of kata-deploy to install and configure
the qemu-runtime-rs runtimeClass.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>