This addresses an internal AKS issue that intermittently prevents
clusters from getting created. The fix has been rolled out to eastus but
not yet eastus2, so we unblock the CI by switching. No downsides in
general.
This supersedes #8990.
Fixes: #8989
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR adds workflow for running kubernetes test suite on ppc64le.
It uses scripts to create and delete the cluster using kubeadm as none of the current cluster creation tools are supported on Power.
Fixes: #7950
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
This PR re-arranged the nerdctl tests to avoid random failures.
In this PR first will run the tests with RunC and then with the kata hypervisor.
This PR tries to avoid the random failures that is happening with cloud-hypervisor
and clh.
Fixes#8963
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Due to the changes done in the CI, we need to set the correct
subscription to be used with the account from now on, otherwise we'd end
up using CoCo subscription.
Fixes: #8946
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The existing confidential basic test titled `Test unencrypted
confidential container launch success and verify that we are
running in a secure enclave` has been updated to incorporate
IBM Secure Execution (`qemu-se`).
Previously, a secure image was absent from kata-deploy, hindering
the inclusion of IBM SE in the test.
Thanks to the #6755 update, it is now possible to test the TEE.
This modification extends the existing test by introducing
`qemu-se`. The specific changes are outlined below:
- Add an additional test `cc-se-e2e-tests` to s390x nightly
- Expansion of `REMOTE_COMMAND_PER_HYPERVISOR` for `qemu-se`
- Temporary exclusion of two test cases currently incompatible with IBM SE
(`cpu-ns` is a common issue across all TEEs, while `inotify`
will be addressed in a subsequent pull request).
Fixes: #8913
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add --show-output-of-passing-tests to the k8s integration tests. The
output of a passing test can be helpful when investigating a failure
of the same test.
Fixes: #8885
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the packages necessary to build the ResNet50 fp32
Dockerfile to run properly the benchmark.
Fixes#8875
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The changes to install and test genpolicy must come later, after CI
picks up these gha changes.
Fixes: #8856
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes the references to virtiofs from memory average
calculation when the container uses a shared file system other than
virtiofs.
Fixes: #8807
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes the iperf3 server protocol as this server definition is
also used for the UDP iperf3 benchmarks to avoid duplication of the
same yaml files.
Fixes#8829
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses a specific python version to run tensorflow benchmark
as it needs python 3.8 to run correctly and avoid failures.
Fixes#8791
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Containerd runtime options with wrong setting cause it failed.
Correct it as below:
...
[plugins.cri.containerd.runtimes.${runtime}.options]
ConfigPath= "${KATA_CONFIG_PATH}"
...
Fixes: #8746
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This PR removes the check images function from stressng test as now
it will part of the install dependencies function from gha-run script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
To avoid random failures while trying to build and install the stressng image,
this PR moves that step as part of the install dependencies in order to move
the stability tests and avoid timeouts.
Fixes#8787
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Log the list of the current pods between tests because these pods
might be related to cluster nodes occasionally running out of memory.
Fixes: #8769
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the qemu-experimental hypervisor in the function to
kill kata components.
Fixes#8775
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As the StratoVirt VMM has been added, we can update the docs
and make some intoduction to StratoVirt, thus users can know more
about the hypervisor choices.
Fixes: #8645
Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com>
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
This PR improves the iperf3 cleanup to ensure all the components are
being deleted properly to avoid the random failures of leaving
the iperf3 clients on the kata metrics CI.
Fixes#8765
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The default network backend of runtime-rs with Dragonball is vhost-net
after #8609 merged. The tests might be failed if vhost modules are not
loaded.
Fixes: #8717
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
After kata-deploy has installed, check that the worker nodes
are still in Ready state and don't have a containerd://Unknown
container runtime versions, identicating that container isn't working
to ensure that we didn't corrupt the containerd config during kata-deploy's edits
Fixes: #8678
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
To become more resilient against these kinds of errors:
deployment.apps/confidential-unencrypted created
pod/confidential-unencrypted-c5fdd6964-rrb6q condition met
ssh: connect to host 10.42.0.109 port 22: Connection refused
Fixes: #8687
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR fixes the indentation of the confidential common
script for kubernetes tests.
Fixes#8698
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Update the Speed & Density metric tests baseline for StratoVirt
and re-enable them, and skip other metric tests temporarily.
Fixes: #8656
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
This PR improves the latency network cleanup by removing the pods
even if the test fails.
Fixes#8658
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Some copyright dates were not updated with the most recent changes to
code; update them.
Fixes#8595
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Change directory for running make due to local errors when building with
make -C.
Fixes#8595
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This PR updates the TensorFlow ResNet50 Int8 Dockerfile to use the
proper python version for kata metrics.
Fixes#8643
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Fix paths for yqdir (where the install_yq.sh script currently is) so
that static checks can run without error.
Fixes#8595
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>