This PR updates the python version for the TensorFlow ResNet FP32
dockerfile so the benchmark can run without issues.
Fixes #8593
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This will make life easier for dragonball developers to properly
enable the tests once the tests are ready.
Fixes: #8569
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will make life easier for dragonball developers to properly
enable the tests once the tests are ready.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will make life easier for dragonball developers to properly
enable the tests once the tests are ready.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the iperf parallel bandwidth limit for the kata
metrics CI.
Fixes #8530
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Print more information useful for debugging. Also, use a separate YAML
file for this test, instead of reusing someone else's file.
Fixes: #8270
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
As the configuration for the runtime-rs based drivers is now placed in
a different location than the golang ones, we should adapt this script
accordingly.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`kata-ctl` is the tool for runtime-rs, and it should be used instead of
`kata-runtime`.
`kata-ctl` requires sudo, and that's the reason it's also been added as
part of the calls.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`kata-runtime env` is an alias for `kata-runtime kata-env`, and calling
it with the `env` parameter allows us to easily extend the scripts to
use `kata-ctl` instead of `kata-runtime` when dealing with runtime-rs.
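A minimal sketch of how a script could pick the tool, assuming a
hypothetical helper named `kata_env_cmd` and assuming that `dragonball`
stands in for the runtime-rs based drivers (neither is taken from the
actual scripts):

```sh
# Hypothetical helper: print the command used to query the Kata environment.
# Assumption: "dragonball" represents the runtime-rs based drivers here.
kata_env_cmd() {
    hypervisor="${1:-qemu}"
    case "${hypervisor}" in
        dragonball)
            # kata-ctl is the runtime-rs tool and needs sudo
            echo "sudo kata-ctl env"
            ;;
        *)
            # golang runtime: `env` is an alias for `kata-env`
            echo "kata-runtime env"
            ;;
    esac
}
```

Because both tools accept `env`, only the binary name (and sudo) differs
between the two branches.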
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we've done some changes in the runtime-rs based drivers to install
their configuration into a different location, this should also be
reflected as part of this test.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Updates to scripts for static-checks.sh functionality, including common
functions location, the move of several common functions to the existing
common.bash, adding hadolint and xurls to the versions file, and changes
to static checks for running in the main kata containers repo.
The changes to the vendor check include searching for existing go.mod
files but no other changes to expand the test.
Fixes #8187
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Move tool as part of static checks migration.
Fixes #8187
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Derek Lee <derlee@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Move tool as part of static checks migration.
Fixes #8187
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Dan Middleton <dan.middleton@intel.com>
Signed-off-by: Derek Lee <derlee@redhat.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Signed-off-by: Hui Zhu <teawater@antfin.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Jimmy Xu <xjmmyshcn@gmail.com>
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Move the tool as a dependency for static checks migration.
Fixes #8187
Signed-off-by: Bin Liu <bin@hyper.sh>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Julio Montes <julio.montes@intel.com>
This PR updates the iperf3 network documentation to include
the parallel bandwidth.
Fixes #8523
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Implements the following test case:
Scenario: Check incorrect hash fails
**Given** I have a version of kata installed that has a kernel with the
initramfs built and config with rootfs_verity.scheme=dm-verity
rootfs_verity.hash=<incorrect hash of rootfs> set in the kernel_params
**When** I try to create a basic pod
**Then** The pod doesn't run
**And** Ideally we'd get a helpful message to indicate why
Currently on CI only qemu-tdx is built with measured
rootfs support in the kernel, so the test is restricted to that
runtimeclass.
Fixes #7415
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Bring the setup_common() from CCv0 branch test's
integration/kubernetes/confidential/tests_common.sh. It should be used
to reduce boilerplate in the setup() of the tests.
Unlike the original code, this won't export the `test_start_time` variable,
as it wouldn't be accurate for grabbing logs from the worker nodes due to
the date/time mismatch between the machine running the tests and the worker
node. The function exports the `node` variable, which holds the name of
a random node that has kata installed. Apart from that, it exports
`node_start_time`, which captures the date/time when the test started,
relative to the `node`.
Tests that should inspect the logs can schedule pods/resources to the `node`
and use `node_start_time` as the value reference to grep the logs.
Fixes #7590
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Bring the assert_logs_contain() from CCv0 branch tests'
integration/kubernetes/confidential/lib.sh.
Introduced print_node_journal(), which uses `kubectl debug` to print
the systemd journal of a k8s node.
Fixes #7590
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This new function allows adding annotations to the metadata section of a
YAML configuration file.
Co-authored-by: Ryan Savino <ryan.savino@amd.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Copied the new_pod_config() and pod-config.yaml.in from CCv0 branch
tests' integration/kubernetes/confidential/tests_common.sh and fixtures.
Unlike the original version, new_pod_config() now gets the runtimeclass
by parameter as the RUNTIMECLASS environment variable seems not broadly
used on main branch's CI.
The pod-config.yaml.in was changed as the diff shows below. In
particular the imagePullSecrets was removed to avoid it throwing a
warning on the pod's log.
```
--- a/tests/integration/kubernetes/runtimeclass_workloads/pod-config.yaml.in
+++ b/tests/integration/kubernetes/runtimeclass_workloads/pod-config.yaml.in
@@ -5,12 +5,10 @@
 apiVersion: v1
 kind: Pod
 metadata:
-  name: busybox-cc
+  name: test-e2e
 spec:
   runtimeClassName: $RUNTIMECLASS
   containers:
-  - name: nginx
+  - name: test_container
     image: $IMAGE
-    imagePullPolicy: Always
-  imagePullSecrets:
-    - name: cococred
\ No newline at end of file
+    imagePullPolicy: Always
\ No newline at end of file
```
Co-authored-by: Georgina Kinge <georgina.kinge@ibm.com>
Co-authored-by: Megan Wright <Megan.Wright@ibm.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The following functions were copied from CCv0's branch test's
integration/kubernetes/confidential/lib.sh. I only did small
refactorings (shortened their names and delinted shellcheck warnings):
- k8s_delete_all_pods_if_any_exists()
- k8s_wait_pod_be_ready()
- k8s_create_pod()
- assert_pod_fail()
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Co-authored-by: Georgina Kinge <georgina.kinge@ibm.com>
Co-authored-by: Jordan Jackson <jordan.jackson@ibm.com>
Co-authored-by: Megan Wright <Megan.Wright@ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
This PR fixes the result finding for the general throughput for
the tensorflow benchmark.
Fixes #8466
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As part of the CI migration, this PR is to add workflows for containerd and k8s for s390x.
Fixes: #7930
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit enables StratoVirt hypervisor to be tested in kata GHA,
including k8s, metrics, cri-containerd, nydus and so on.
Meanwhile, add some unit tests for StratoVirt to make sure it works.
Fixes: #7794
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
This PR adds the iperf udp information to the network README
for the kata metrics CI.
Fixes #8452
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As we've done some changes in the VMM vcpu allocation, let's introduce
basic tests to make sure that we're getting the expected behaviour.
The test consists in checking 3 scenarios:
* default_vcpus = 0 | no limits set
* this should allocate 1 vcpu
* default_vcpus = 0.75 | limits set to 0.25
* this should allocate 1 vcpu
* default_vcpus = 0.75 | limits set to 1.2
* this should allocate 2 vcpus
The tests are very basic, but they do ensure we're rounding things up to
what the new logic is supposed to do.
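The three scenarios are consistent with a simple rounding rule. As a
sketch only, assuming the rule is `max(1, ceil(default_vcpus + limit))`
(this formula is inferred from the scenarios above, not taken from the VMM
code, and `expected_vcpus` is a hypothetical helper):

```sh
# Assumed rule for this sketch: vcpus = max(1, ceil(default_vcpus + limit)).
expected_vcpus() {
    awk -v d="$1" -v l="$2" 'BEGIN {
        sum = d + l
        n = int(sum)
        if (n < sum) n++    # round up any fractional part
        if (n < 1) n = 1    # never go below one vcpu
        print n
    }'
}
```

For example, `expected_vcpus 0.75 1.2` sums to 1.95 and rounds up to 2.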
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The current kata-deploy code has been doing a `sed` to add allowed
hypervisor annotations, so CBL mariner can be tested with their own
kernel and initrd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no reason to escape the first + on the +k3s[0-9]\+ regex, as
shown here:
```sh
ubuntu@k3s:~$ /usr/local/bin/k3s kubectl version --short 2>/dev/null | \
grep "Client Version" | \
sed \
-e 's/Client Version: //' \
-e 's/+k3s[0-9]\+//'
v1.27.7
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems that with the new k3s release, they've bumped their kubectl
version from x.y.z+k3s1 to x.y.z+k3s2.
Let's ensure our regexp is more generic and future proof for such
changes.
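A small sketch of the more generic expression, wrapped in a hypothetical
`strip_k3s_suffix` helper. It relies on `\+`, which is a GNU sed extension
in basic regular expressions:

```sh
# Strip the "+k3sN" suffix from a kubectl version string read on stdin.
# Note: \+ (one or more) is a GNU sed extension in basic regexes.
strip_k3s_suffix() {
    sed -e 's/+k3s[0-9]\+//'
}
```

With this, both `v1.27.7+k3s1` and `v1.27.7+k3s2` reduce to `v1.27.7`.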
Fixes: #8410
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`TestDeviceCgroup` is added to cri-containerd's integration tests. The test
launches two containers. Each container has a block device. It checks the
validity of device cgroup.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Add test to verify kata supports ipvlan networks.
This test can be a bit tricky as it requires knowledge about host interfaces
to be used as a master for the ipvlan network.
However, with github actions, we can assume interface called eth0 to be
present on the host and functioning.
Fixes: #8366
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This PR switches to using the SIGKILL signal instead
of SIGTERM to force stop each kata component before
running any metric test.
Fixes: #8336
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes double quotes in jq output to return raw strings
as input of checkmetrics tool.
Fixes: #8331
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR increases the number of attempts to stop kata components
when it is required, usually before starting a metrics test.
Fixes: #8307
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR enables the new FIO test based on the containerd client
which is used to track the I/O metrics in the kata-ci environment.
Additionally this PR fixes the parsing of results.
Fixes: #8199
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR updates the values of the fio parameters for iodepth
requests and for the number of jobs, in order to increase the
number of sequential operations.
Additionally, it adds the list of packages needed to parse the
results.
Fixes: #8198
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
On bare-metal machines the git tree might be left in an unstable state by a
previous rebase that stopped halfway. So let's attempt to abort any rebase first.
Fixes #8318
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds the iperf udp benchmark for bandwidth measurement
for network metrics.
Fixes #8246
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Encode policy file during test - easier to understand than hard-coding
the encoded file contents.
Fixes: #8214
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR corrects the init_env() helper function so that
systemctl always returns true when enumerating masked services,
preventing the test from failing.
Fixes: #8242
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
To avoid errors when initializing the test environment, the
kill_processes_before_start() helper function needs to verify that
docker is installed before attempting to stop it.
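A minimal sketch of such a guard, assuming hypothetical helper names
(`is_installed`, `stop_docker_if_installed`) and assuming docker is managed
through the `docker.service` and `docker.socket` systemd units:

```sh
# Return success if the given command is available in PATH.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical variant of the check: only try to stop docker when it exists.
# Assumption: docker runs as the docker.service / docker.socket units.
stop_docker_if_installed() {
    if is_installed docker; then
        sudo systemctl stop docker.service docker.socket
    else
        echo "docker is not installed, nothing to stop"
    fi
}
```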
Fixes: #8218
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes trailing commas so that the json results
file is valid.
This PR also changes the way data results are collected by
iterating through the array of memory values to calculate
their average.
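The averaging step can be sketched as follows. The `average` helper is
hypothetical, and integer arithmetic is a simplification made for this
sketch; the real script may well keep decimals:

```sh
# Print the integer average of the values passed as arguments.
# Integer arithmetic is a simplification made for this sketch.
average() {
    total=0
    count=0
    for value in "$@"; do
        total=$((total + value))
        count=$((count + 1))
    done
    # Avoid dividing by zero when no values were given
    [ "${count}" -gt 0 ] || return 1
    echo $((total / count))
}
```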
Fixes: #8204
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes the reference in the documentation to the DAX
subtest of the FIO benchmark, because this metric is currently
WIP.
Fixes: #8159
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds a trap that runs whenever the script exits: it deletes the iperf
k8s deployment and k8s services, and deletes the kata components.
This way, when the script finishes, it verifies that there are
indeed no kata components still running.
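The trap pattern can be sketched like this. Both functions are
hypothetical, and `cleanup` only prints a message here where the real one
would delete the deployment, the services and the kata components:

```sh
# Hypothetical cleanup routine; a placeholder for the real teardown steps.
cleanup() {
    echo "cleanup done"
}

# Run a command in a subshell with the cleanup registered on EXIT, so it
# runs whether the command succeeds or fails.
run_with_cleanup() (
    trap cleanup EXIT
    "$@"
    status=$?
    exit "${status}"
)
```

The exit status of the wrapped command is preserved, so callers can still
detect a failing test run.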
Fixes: #8126
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The KUBERNETES variable is mostly used by kata-deploy to decide whether to
apply k3s specific deployments or not. It is used to select the type of
kubernetes to be installed (k3s, k0s, rancher...etc) and it is always
set on CI. Running the script locally we want to set a value by default
to avoid `KUBERNETES: unbound variable` errors.
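A sketch of the idea under `set -u`. The helper name and the `vanilla`
fallback are placeholders chosen for illustration; the default actually
used by the script is not stated here:

```sh
# Print the Kubernetes flavour, falling back to a default when unset.
# Assumption: "vanilla" is a placeholder default for this sketch.
kubernetes_flavour() (
    set -u
    # Without the ":-" fallback, referencing an unset KUBERNETES under
    # `set -u` would abort with an "unbound variable" style error.
    KUBERNETES="${KUBERNETES:-vanilla}"
    echo "${KUBERNETES}"
)
```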
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test can give a false positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test can give a false positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The exec_host() simply fails on multi-node clusters because
`kubectl get node -o name` will return a list of names. Moreover, it will
return control node names, which usually don't have kata installed.
Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The introduced get_one_kata_node() returns the first node that
has the kata-runtime=true label, i.e., supposedly a node with
kata installed.
This is useful for tests that should run on a specific worker
node on a multi-node cluster.
Fixes #7619
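A sketch of the "pick the first node" step, using a hypothetical
`first_node_name` helper that works on `kubectl get nodes -o name` style
output. It is not the actual get_one_kata_node() implementation:

```sh
# Read `kubectl get nodes -o name` style output (one "node/<name>" per
# line) on stdin and print the name of the first node only.
first_node_name() {
    head -n 1 | sed -e 's|^node/||'
}

# Possible use against a live cluster (not run here). The label selector is
# written as in the message above; the full label key may differ:
#   kubectl get nodes -l kata-runtime=true -o name | first_node_name
```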
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Let KATA_HYPERVISOR be qemu by default in gha-run.sh as this variable
is required to tweak some configurations of kata-deploy.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The deploy-kata() of gha-run.sh will wait for 10 minutes for the
kata-deploy installation to finish. This allows users of the script to
override that value by exporting the KATA_DEPLOY_WAIT_TIMEOUT environment
variable.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fixed a couple of warnings shellcheck emitted and disabled others:
* SC2154 (var is referenced but not assigned)
* SC2086 (Double quote to prevent globbing and word splitting)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The only difference to the other platforms is that it needs to
export KUBECONFIG.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The cleanup-kcli() behaves like other deploy kata for
bare-metal (e.g. sev, tdx...etc) except that KUBECONFIG
should be exported.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The cleanup-kcli() behaves like other clean up for bare-metal (e.g. sev,
tdx...etc) except that KUBECONFIG should be exported.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
On CI workflows the variables DOCKER_REGISTRY, DOCKER_REPO and
DOCKER_TAG are exported to match the built image. However, when running
the script outside of CI context, a developer might just use the latest
image which in this case will be
`quay.io/kata-containers/kata-deploy-ci:kata-containers-latest`.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.
Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.
Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
I'm basically moving the runk tests from the tests repo to this one, and
I'm adding the "Signed-off-by:" of every single contributor to the tests.
Fixes: #8116
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
I'm basically moving the tracing tests from the tests repo to this one,
and I'm adding the "Signed-off-by:" of every single contributor to the
tests.
Fixes: #8114
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
This PR enables the use of jq pretty-print feature to
improve the formatting of metric results json files.
Fixes: #8081
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
FIO benchmark is enabled to measure IO in Kata
at different latencies using containerd client,
in order to complement the CI metrics testing set.
This PR also deprecates the previous FIO benchmark
based on k8s.
Fixes: #8080
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```
With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leaving us with an empty string, which then
causes the error in the CI.
Fixes: #8105
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The k8s.gcr.io registry has been deprecated for a while now and is redirected to
registry.k8s.io. However on some bare-metal machines in our testing
pools that redirection is not working, so let's just replace the
registries.
Fixes #8098
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit b2c3bca558c38deff2117d5909d9071c23c05590)
Let's move, adapt, and use the kata-monitor tests from the tests repo.
In this PR I'm keeping the SoB of every single contributor who
touched those tests in the past.
Fixes: #8074
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This will serve us quite well in the upcoming tests addition, which will
also have to be executed using CRI-O.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure we have runc running with `SystemdCgroups = false`,
otherwise we'll face failures when running tests depending on runc on
Ubuntu 22.04, with LTS containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the latency yamls path for the latency test for
kata metrics.
Fixes #8055
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We've faced this as part of the CI, only happening with the CRI-O tests:
```
not ok 1 Test readonly volume for pods
# (from function `exec_host' in file tests_common.sh, line 51,
# in test file k8s-file-volume.bats, line 25)
# `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
# [bats-exec-test:38] INFO: k8s configured to use runtimeclass
# bash: line 1: $'\r': command not found
#
# Error from server (NotFound): pods "test-file-volume" not found
```
I must say I didn't dig into figuring out why this is happening, but we
may be safe enough to just trim the '\r', as long as all the tests keep
passing on containerd.
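One way to drop the carriage return, shown as a sketch with a hypothetical
`strip_cr` filter. The actual change may trim it differently:

```sh
# Remove carriage returns from whatever is piped through stdin.
strip_cr() {
    tr -d '\r'
}
```

Piping the output of the host command through such a filter turns
`value\r` into `value` before it is used any further.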
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order to have tests accessing the host to pass.
A huge thanks to Greg Kurz for spotting this and suggesting the fix.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR fixes the network metrics section at the README by leaving
the current tests that we have in our kata metrics.
Fixes #8017
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR enables the latency test for gha run script for kata metrics.
Fixes #8037
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is based on the official CRI-O documentation[0] and right now we're
making this specific to Ubuntu as that's what we have as runners.
We may want to expand this in the future, but we're good for now.
[0]:
https://github.com/cri-o/cri-o/blob/main/install.md#apt-based-operating-systems
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targeting.
This was also done as part of fa62a4c01b,
for the k8s tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise only the first test will be executed
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum amount of characters allowed for the cluster
name. With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.
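A sketch of taking only the first letter in a POSIX-compatible way. The
helper names, the base name and the host type values used in the example
are assumptions made for illustration:

```sh
# Print the first character of a string (POSIX printf precision).
first_letter() {
    printf '%.1s\n' "$1"
}

# Hypothetical naming scheme: append the host-type initial to a base name.
cluster_name() {
    echo "$1-$(first_letter "$2")"
}
```

For instance, `cluster_name kata-ci normal` would give `kata-ci-n`.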
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.
This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.
The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.
Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.
Fixes: #7982
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR enables the parallel bandwidth iperf limit for kata metrics.
Fixes #7989
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.
The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
/rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
error: Failed to Mount backend
...
Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will ensure we're testing with the correct runtime, instead of
using the `default` one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
And with this we finally enable the nydus tests to run as part of our
GHA CI.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.
Now, let's install kata, as expected. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we've added install_nydus() and install_nydus_snapshotter(), which do
conform with the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.
Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"
getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we cannot properly start the nydus snapshotter, nor properly
kill it after it's been started.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing other dependencies from
GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing other dependencies from GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
clippy is used as part of our tests, so it's useful to have it installed
while we're already installing rust.
In case of developers, they also better be using it. :-)
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll use it as part of the refactoring we're doing in the static check
tests.
I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.
Now, about the commit itself, as we're on the run to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that require the normal Azure instance (D4s_v5)
With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.
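The selection can be pictured roughly as below. The host type values
(`small`, `normal`, `baremetal`) and the group names are placeholders for
this sketch, not necessarily what the scripts use:

```sh
# Hypothetical selection logic: map the host type to the test groups to run.
# The host type values and group names are placeholders.
test_groups_for_host() {
    case "$1" in
        small)     echo "small" ;;
        normal)    echo "normal" ;;
        baremetal) echo "small normal" ;;
        *)
            echo "unknown host type: $1" >&2
            return 1
            ;;
    esac
}
```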
Fixes: #7972
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI
We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
cloud hypervisor does not emulate pcie switches or pci bridges, so we need to
accept a lonely device.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly badly when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.
Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.
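The group syntax boils down to printing `::group::<title>` before a section
and `::endgroup::` after it. A small sketch with a hypothetical
`run_in_group` wrapper:

```sh
# Run a command inside a collapsible GitHub Actions log group.
run_in_group() {
    title="$1"
    shift
    echo "::group::${title}"
    "$@"
    echo "::endgroup::"
}
```

Each debug command can then be wrapped, for example
`run_in_group "kernel log" dmesg`, so its output folds under its own title.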
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This imports the vfio test scripts from github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.
This is mostly needed because the nerdctl-full tarball doesn't provide a
containerd configuration, just the binary, as containerd does not
actually require a configuration file to run with the default config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TIL that the Azure VMs we use are created without explicit outbound
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
the internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provides nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TIL that the Azure VMs we use are created without explicit outbound
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
the internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provides nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fix kernel and initrd annotations in the k8s tests on Mariner. These
annotations must be applied to the spec.template for Deployment, Job
and ReplicationController resources.
Fixes: #7764
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR ensures that docker is running as part of the init_env function
in kata metrics, to avoid failures such as "docker is not running" that
make the kata metrics CI fail.
Fixes: #7898
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.
Fixes: #7920
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Let's add a very basic sanity test to check that we can spawn a
container using nerdctl + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7911
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a very basic sanity test to check that we can spawn a
container using docker + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.
Fixes: #7894
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We're changing what's been done as part of ac939c458c, as we've
noticed issues using `github.event.pull_request.merge_commit_sha`.
Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/
In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.
We're only touching the files here that were touched in the previous
commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc` for the runtime class)
instead of `firecracker`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So we have better control over which flavour of kubernetes kata-deploy
is expected to be targeting.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR changes the order of operations so that the FIO test first
cleans the environment and then checks if the environment
is indeed clean.
Fixes: #7869
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
As we were using `tee` without the `-a` (or `--append`) option, the
containerd config would be overwritten, leading to a NotReady state of
the Node.
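A minimal sketch of the difference, run against a scratch file rather than the real containerd config. The file contents are made up for illustration.
```bash
#!/usr/bin/env bash
# Demonstrate on a scratch file instead of the real containerd config.
config="$(mktemp)"
echo 'version = 2' > "${config}"

# Without -a, tee truncates the file: the original line is lost.
echo '[debug]' | tee "${config}" > /dev/null
overwritten="$(cat "${config}")"

# Start over, this time appending.
echo 'version = 2' > "${config}"
echo '[debug]' | tee -a "${config}" > /dev/null
appended="$(cat "${config}")"

rm -f "${config}"
```
After the first `tee` the file holds only `[debug]`; with `-a` it holds both lines.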
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```
The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.
Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.
As --write-config-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`wait` waits for a job to complete, not a number of seconds. Not sure
how I got that wrong in the first place, but it is what it is.
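To illustrate the distinction, here is a small sketch. The background job and its exit code are made up for the example.
```bash
#!/usr/bin/env bash
# Start a background job that exits with status 3 after one second.
( sleep 1; exit 3 ) &
job_pid=$!

# `wait <pid>` blocks until that job exits and returns its exit status.
# It is not a timer.
job_status=0
wait "${job_pid}" || job_status=$?
echo "job exited with status ${job_status}"

# Pausing for a fixed amount of time is what `sleep` is for.
sleep 1
```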
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR replaces the ubuntu image for one which has TensorFlow optimized
for kata metrics.
Fixes: #7866
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This function right now is completely based on what's part of the tests
repo[0], and that's the reason I'm keeping the `Signed-off-by` of all
the contributors to that file.
This is not perfect, though, as it changes the default snapshotter to
devmapper, instead of only doing so for the Kata Containers specific
runtime handlers. OTOH, this is exactly what we've always been doing as
part of the tests.
We'll improve it, soon enough, when we get to also add a way for
kata-deploy to set up different snapshotters for different handlers.
But, for now, this is as good (or as bad) as it's always been.
It's important to note that the devmapper setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targeting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
One can use different kubernetes flavours for getting a kubernetes
cluster up and running.
As part of our CI, though, I really would like to avoid contributors
spending time maintaining and updating kubernetes dependencies, as done
with the tests repo, and which has proven to be really good at getting
things rotten.
With this in mind, I'm taking the bullet and using "k3s" as the way to
deploy kubernetes for the devmapper related tests, and that's the reason
I'm adding a function to do so, and this will be used later on as part
of this series.
It's important to note that the k3s setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targeting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds the write 95 percentile for FIO for qemu for
checkmetrics for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the write 95 percentile FIO value for checkmetrics
for kata metrics.
Fixes: #7842
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR enables the iperf benchmark to run on the gha for kata metrics.
Fixes: #7575
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX
Also, add an example of policy rejecting ExecProcessRequest.
Fixes: #7667
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.
Fixes: #7812
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the memory inside limit for clh for kata metrics, due
to the recent changes in the script which impacted the performance
measurement.
Fixes: #7786
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's expand the confidential test to also support TDX.
The main difference on the test, though, is that we're not grepping for
a string in the `dmesg` output, but rather relying on `cpuid` to detect
a TDX guest.
Fixes: #7184
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a test case for the launch of unencrypted confidential
container, verifying that we are running inside a TEE.
Right now the test only works with SEV, but it'll be expanded in the
coming commits, as part of this very same series.
Fixes: #7184
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes an issue in the results parsing stage,
by collecting just the n-results from the n-running
containers, discarding irrelevant data.
Fixes: #7774
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.
Fixes: #7753
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.
This PR also enables FIO metrics test in the kata metrics-ci.
Fixes: #7665
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR fixes the check results for the tensorflow benchmark now
that we have changed the name of the test.
Fixes: #7684
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This test, at least for now, only checks whether the runtimeclasses
have been properly created.
This is just a migration from a test we had as part of the k8s suite.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.
Right now it doesn't have any tests, and the command to actually run the
tests is commented out; this is just a placeholder that will be
populated sooner rather than later.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In a follow-up series, we'll add a whole suite for the kata-deploy
tests. With this in mind, let's already get rid of this one and avoid
having more kata-deploy tests land here.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.
Fixes: #7663
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR fixes MobileNet help me description in the
tensorflow script.
Fixes: #7661
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the spelling of TensorFlow so that it is uniform across
the whole document.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As part of the runners, we're hitting a timeout that I cannot reproduce,
at all, when allocating the same instance and running the tests
manually.
The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.
It's fairly important to mention that in the first tests I used a
timeout of 10s, and that helped but we still hit issues every now and
then.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR renames the tensorflow scripts to include the data format
that is being used as we will have multiple tests with different
data and model formats for tensorflow so this will help us to
distinguish them.
Fixes: #7645
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's split a good portion of `tests/integration/kubernetes/gha-run.sh`
out, and put it in a place where it can be used by the soon-to-come
kata-deploy specific tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixed the loop that stops the kata-shim and the
hypervisors used in metrics checks.
Fixes: #7628
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the check that containers are running, which is defined
in the common script, to the tensorflow mobilenet test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the "check containers are up" function from the common
script to the tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the "check containers are running" function to the common
metrics script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the "check containers are up" check from the common script
to the tensorflow mobilenet script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the check containers are up from the common script
in the tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the collect results function defined in common for
the tensorflow mobilenet test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the collect results function from tensorflow script
as it is going to be referenced in the common metrics script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the collect results function to the common metrics
script.
Fixes: #7617
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR computes average results for TF bench.
Additionally, it improves the data parsing from
all running containers.
Fixes: #7603
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
With these 2 simple checks we can ensure that we do not regress on the
behaviour of allowing the runtime classes / default runtime class to be
created by the kata-deploy payload.
Fixes: #7491
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add here the image we'll be using for unencrypted confidential
tests. Later on, we'll make sure to build and use this image as part of
our CI.
The image can easily be built as a multi-arch image, and has `cpuid`
installed in case of `x86_64` build, so it can be used to detect whether
we're running on a TEE guest without having to rely on `dmesg | grep
...`.
Fixes: #7595
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't have to do any sed to replace the runtimeclass being used
from the moment we start taking advantage of the `DEFAULT_SHIM`
environment variable exposed in the previous commits.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of using the package manager to install bats, build it from
source. This gives us an updated version of bats which supports
functions such as setup_file and teardown_file.
We can use these functions in our current tests.
Fixes: #7597
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
This PR changes the metrics workflow in order to just install
kata once, and run the checks for multiple hypervisor variations.
In this way we save time avoiding installing kata for each
hypervisor to be tested.
Fixes: #7578
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR renames the mobilenet tensorflow test to have a more specific
tensorflow name mainly because tensorflow has different configurations
and we will add more tensorflow tests so we want to distinguish each
tensorflow test.
Fixes: #7571
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This commit provides a new way to name the containers used
in the launch-times-test in this form:
'kata_launch_times_RANDOM_NUMBER', where RANDOM_NUMBER is
in the 0-1000 range.
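A minimal sketch of such a name, assuming bash's builtin `RANDOM` is the source of the number. The exact expression and bounds used by the launch-times test are not shown here, so treat this as an illustration only.
```bash
#!/usr/bin/env bash
# RANDOM is a bash builtin yielding 0..32767; the modulo keeps the suffix below 1000.
container_name="kata_launch_times_$(( RANDOM % 1000 ))"
echo "${container_name}"
```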
Fixes: #7529
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Otherwise the VM deletion may not delete, leaving us with several
machines behind.
Fixes: #7509
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR rounds the axelnet and resnet results in order to properly
extract the result.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR avoids the strconv.Atoi parsing error when we
are retrieving the results from the json.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR moves the checkmetrics to the gha-run script to gather
tensorflow information.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The amount of info we've added seemed unnecessary, and ends up making
our lives even harder when trying to find errors.
Let's just rely on the kata-debug container to collect the needed info
for us.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's been proven to not be useful, and ends up making things more
confusing due to the amount of logs printed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we can debug kata-deploy in case something goes wrong
during its execution.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This can be easily done as there was no official release with the
previous values.
The reason we're doing so is because when using `yq` to replace the
value, even when forcing `--tag '!!str' "yes"`, the content is placed
without quotes, causing errors in our CI.
While here, we're also removing the fallback value for DEBUG, as it is
**always** set in the kata-deploy.yaml file.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This imports the k8s-file-volume test from the tests repo and modifies
it slightly to set up the host volume on the AKS host.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This imports the k8s-volume test from the tests repo and modifies it
slightly to set up the host volume on the AKS host.
Fixes: #6566
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This deletes node debugger pods after execution since their presence may
affect tests that assume only test workloads pods are present.
For example, in `k8s-job` we wait for *any* pod to be in the `Succeeded`
state before proceeding, which causes failures.
Fixes: #7452
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This simply allows setting a custom resource group when debugging
locally, so as to prevent name collisions and not pollute the namespace.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Makes it so that `setup.sh` doesn't make changes in
`runtimeclass_workloads/` directly. Instead we treat that as a template
directory and we use the new directory `runtimeclass_workloads_work/` as
a work dir.
This has two advantages:
* Allows rerunning tests without the assumption that `setup.sh` must be
idempotent. E.g. the `set_runtime_class()` step would break.
* Doesn't pollute your git environment with a bunch of changes when
developing.
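The template / work dir split can be sketched as below. The `reset_work_dir` helper and the `pod.yaml` content are hypothetical; only the two directory names come from the description above.
```bash
#!/usr/bin/env bash
# Hypothetical helper: refresh the work dir from the untouched template dir.
reset_work_dir() {
    local template_dir="$1"
    local work_dir="$2"
    rm -rf "${work_dir}"
    mkdir -p "${work_dir}"
    cp -R "${template_dir}/." "${work_dir}/"
}

# Demo on scratch directories.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/runtimeclass_workloads"
echo 'runtimeClassName: kata' > "${scratch}/runtimeclass_workloads/pod.yaml"

reset_work_dir "${scratch}/runtimeclass_workloads" "${scratch}/runtimeclass_workloads_work"
# Edits only ever touch the work copy, never the template.
echo 'runtimeClassName: kata-qemu' > "${scratch}/runtimeclass_workloads_work/pod.yaml"
```
Because every run starts from a fresh copy, steps that rewrite the YAML files do not need to be idempotent.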
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This splits deploying Kata and running the tests into separate commands
to make it possible to rerun tests locally without having to redeploy
Kata each time.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR adds the FIO benchmark scripts and resources for the metrics
tests section.
Fixes: #7441
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This job will run on a nested virt capable Azure VM (improving test
concurrency). This is just a placeholder while we adapt the test to GHA.
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This PR adds general improvements like putting function before function
name and consistency in how we declare variables and so on to have
uniformity across the metrics scripts.
Fixes: #7429
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We should source from `nydus_dir`, instead of `cri_containerd_dir`, and
that was a leftover from fb4f7a002c.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This newly added GHA does nothing, is not even triggered, and it's just
a placeholder that we'll grow in the next commits / PRs, so we can
actually start running the nydus tests as part of our CI.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The test is currently failing with GHA, and I don't think it makes sense
to block all the other tests to get merged while it's happening.
For now, let's disable it and re-enable it as soon as we have it
passing.
Reference: https://github.com/kata-containers/kata-containers/issues/7410
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise crictl will fail to remove them with:
```
getting sandbox status of pod "$pod": metadata.Name, metadata.Namespace
or metadata.Uid is not in metadata "..."
```
A huge shout out to Steven Horsman for helping to debug this one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For this set of tests, we'll always be using podman in order to avoid
having containerd pulled in by docker.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need the env var, we just need to restrict the test according
to the KATA_HYPERVISOR used, as right now it's very specific to QEMU.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We only have shim-v2 as the runtime type, so we always need to run tests
using it. :-)
We had to adjust the script in order to properly run the tests with the
current logic.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move the `integration/containerd/cri/integration-tests.sh` file
from the tests repo to this one.
The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's install all the dependencies needed for running the
`cri-containerd` tests.
The list of dependencies we have are:
* From the system
- build-essential
- jq
- podman-docker
* From our own repo
- yq
- go
* From GitHub projects
- containerd
- cri-tools
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will simply clone containerd repo, specifically on a tag
we want to use to test.
This can be expanded for different projects, and it will be the case as
soon as we grow the tests. But, for now, let's keep it simple.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will install cri-tools in the host, and soon enough (as
part of this PR) we'll be using it to install cri-tools as part of the
cri-containerd tests.
I've decided to have this as part of the `common.bash` as other tests
that will be added in the future will require cri-tools to be installed
as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will install cri-containerd in the host, and soon enough
(as part of this PR) we'll be using it to install cri-containerd as part
of the cri-containerd tests.
I've decided to have this as part of the `common.bash` as other tests
that will be added in the future will require cri-containerd to be
installed as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will help us to get the tarball, from a github project,
that we're going to use as part of our tests.
Right now this is not used anywhere, but it'll soon enough (as part of
this series) be used to download the cri-containerd / cri-tools / cni
tarballs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will help us to get the latest patch release from a
GitHub project.
The idea behind this function is that we don't have to keep updating
versions.yaml that frequently (or worse, have it outdated as it
currently is), and always test against the latest patch release of a
given project's version that we care about.
Although right now this is not used anywhere, this will be used with the
coming cri-containerd tests, which will be part of this series.
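As an illustration of the idea only: the sketch below picks the newest patch release from a list of tags read on stdin. The `latest_patch_release` name, the stdin interface and the sample tags are assumptions; the real function queries GitHub, which is left out here.
```bash
#!/usr/bin/env bash
# Hypothetical helper: read tags on stdin, print the newest patch of a "vX.Y" series.
latest_patch_release() {
    local series="$1"
    # Keep only final "<series>.<number>" tags, then pick the highest by version order.
    awk -v prefix="${series}." \
        'index($0, prefix) == 1 && substr($0, length(prefix) + 1) ~ /^[0-9]+$/' \
        | sort -V | tail -n 1
}

# Sample tags, made up for the example.
printf '%s\n' v1.6.8 v1.7.2 v1.7.10 v1.7.9 v1.7.11-rc.1 | latest_patch_release v1.7
# prints v1.7.10
```
Version sort matters here: a plain lexical sort would rank `v1.7.9` above `v1.7.10`.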
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's adjust paths for what we source and the scripts we call, after
moving from the tests repo to this one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move `.ci/install_go.sh` file from the tests repo to this one.
The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Right now we'd need to import lib.sh just in order to get cross-build
information for rust, and it seems a little bit premature to do so at
this stage and only for rust.
Let's skip it and keep this transition simple.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move `.ci/kata-arch.sh` file from the tests repo to this one.
The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
First of all, I'm 100% aware that I'm duplicating this function here as
I've copied it from the packaging stuff, and I'm not exactly proud of
that.
However, right now it seems a little bit premature to combine that set
of scripts with this set of scripts in a single one and make them used
by both pieces of our project.
Anyways, this function helps to get information from the
`versions.yaml` file, and it'll be used as part of the cri-containerd
tests and a few others in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is already declared as part of the `common.bash` file, so let's
just make sure we use it from there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Considering that someone may want to run the tests locally, we shouldn't
rely on having GITHUB_WORKSPACE exported, and fallback to $HOME/go if
needed.
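One way to express such a fallback is bash default-value expansion. The `workspace_dir` helper is hypothetical and only illustrates the pattern, not the variable actually set by the script.
```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the base directory for checkouts.
workspace_dir() {
    # ${VAR:-default} expands to the default when VAR is unset or empty.
    echo "${GITHUB_WORKSPACE:-${HOME}/go}"
}

workspace_dir
```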
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When the glob star is inside quotes, there is only one iteration of the loop
and b holds all matches at once. Move the glob out of the quotes so that we
actually iterate over matched paths.
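A small reproduction of the behaviour, using scratch files with made-up names:
```bash
#!/usr/bin/env bash
scratch="$(mktemp -d)"
touch "${scratch}/a.tar" "${scratch}/b.tar"

# Quoted glob: no expansion in the loop header, so the body runs once and
# b holds the pattern (expanding to every match if later used unquoted).
quoted_iterations=0
for b in "${scratch}/*.tar"; do
    quoted_iterations=$(( quoted_iterations + 1 ))
done

# Glob outside the quotes: one iteration per matching path.
unquoted_iterations=0
for b in "${scratch}"/*.tar; do
    unquoted_iterations=$(( unquoted_iterations + 1 ))
done

rm -rf "${scratch}"
echo "quoted=${quoted_iterations} unquoted=${unquoted_iterations}"
# prints quoted=1 unquoted=2
```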
Fixes: #6543
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The `install_kata` function was moved from the metrics' `gha-run.sh`
file to the `common.bash` in the commit 3ffd48bc16, but I didn't notice
that it brought with it a call to `install_check_metrics`, which is
totally unrelated to installing Kata Containers.
Let's remove the call so the function is a little bit less specific, and
move the call to install_check_metrics to the metrics `gha-run.sh` file.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will help us on two fronts:
* catching possible issues related to kata-deploy cleanup
* do more (like, in the future, collect logs) after the tests run
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds the tensorflow function in gha-run script in order to
be triggered in the gha.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds function before function of the variables at the memory
inside container script in order to have uniformity across the script.
Fixes: #7386
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR kills the hypervisor and the kata shim in the
init_env stage prior to launching any metrics test.
Additionally this PR adds info messages in the main blocks
of the blogbench test to help in debugging.
Fixes: #7366
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds C-Ray performance test in order to be part of the kata
metrics CI.
Fixes: #7375
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the machine learning documentation related to the
Tensorflow and Pytorch benchmarks.
Fixes: #7359
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the Tensorflow mobilenet documentation to the machine
learning README.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR makes sure kata-env is called only after some metrics have
completed their workload. This fixes a bug that occurred when
kata-env was called before kata was installed on the
testing platform.
Fixes: #7348
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR uses square brackets in a jq expression to access
key values corresponding to metric results in json format.
The values are the data inputs into the checkmetrics tool.
Fixes: #7319
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This will help us to gather more information about Kata Containers in
case of failure.
Fixes: #7343
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In order to run kata metrics we need to check that the containerd
config file is properly set. When this is not the case, we
need to remove that file, and generate a valid one.
This PR runs rm -f in order to ignore errors in case the
file to delete does not exist.
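As a minimal sketch of why `-f` matters here, assuming the script runs under `set -e`: the path below is a hypothetical stand-in created in a temporary directory, not the real containerd config location, which this message does not state.

```shell
#!/bin/sh
set -e

# Hypothetical path used only for this illustration.
workdir="$(mktemp -d)"
config_file="${workdir}/config.toml"

# Without -f, removing a missing file returns non-zero, which would abort a
# script running under `set -e`. Here the status is captured instead.
plain_status=0
rm "${config_file}" 2>/dev/null || plain_status=$?

# With -f, a missing file is not an error, so the script can carry on and
# generate a fresh config afterwards.
force_status=0
rm -f "${config_file}" || force_status=$?

echo "rm status: ${plain_status}, rm -f status: ${force_status}"
rmdir "${workdir}"
```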
Fixes: #7336
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the TensorFlow MobileNet performance test for
kata metrics.
Fixes #7334
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the storage metrics documentation for blogbench for kata
metrics.
Fixes #7329
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the `function` keyword before the function names in the
common.bash script in order to have uniformity across the whole script.
Fixes #7327
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's make sure we install the needed dependencies for running the
`cri-containerd` tests.
Right now this commit is basically adding a placeholder, and later on,
when we'll actually be able to test the job, we'll add the logic of
installing the needed dependencies.
The obvious dependencies we've spotted so far are:
* From the OS
* jq
* curl (already present)
* From our repo
* yq (using the install_yq script)
* From GitHub
* cri-containerd
* cri-tools
* cni plugins
We may need a few more packages, but we will only figure this out as
part of the actual work.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Use bc tool to perform math operations even when variables contain
values with leading zero.
Fixes: #7317
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds double quotes to variables in the blogbench script to
have uniformity across all the tests.
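Beyond uniformity, quoting also guards against word splitting. The sketch below illustrates this with a hypothetical file name containing a space; neither the name nor the `count_args` helper comes from the blogbench script.

```shell
#!/bin/sh

# Helper for this illustration only: prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical results file name that contains a space.
result_file="blogbench results.json"

# Unquoted: the shell splits the value on whitespace into two words.
unquoted="$(count_args ${result_file})"

# Quoted: the value is passed through as a single argument.
quoted="$(count_args "${result_file}")"

echo "unquoted: ${unquoted} argument(s), quoted: ${quoted} argument(s)"
```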
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR enables the blogbench performance test for the kata metrics CI.
Fixes #7281
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR builds the foundation for us to start migrating the
cri-containerd tests from Jenkins to GitHub Actions.
Right now the test does nothing and should always finish successfully.
The coming PRs will actually introduce logic to the `gha-run.sh` script
where we'll be able to run the tests and make sure those pass before
having them actually merged.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Those functions were originally introduced as part of the
`metrics/gha-run.sh` file, but they will be very handy once we
start adding more tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is a very simple addition, which should be expanded by
https://github.com/kata-containers/kata-containers/pull/7185, and it
aims to gather more info that will help us debug CI failures.
Fixes: #7296
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR updates the memory usage script to call clean_env_ctr in main,
in order to avoid failures caused by leftover processes.
Fixes #7302
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently a mixture of cbl-mariner and mariner is used when creating the
mariner initrd. The kata-static tarball has mariner in the name, but the
jenkins url uses cbl-mariner. This breaks cache usage.
Use mariner as the target name throughout the build, so that caching works.
Fixes: #7292
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
We've noticed this caused regressions with the k8s-oom tests, and then
decided to take a step back and do this in the same way it was done
before 67972ec48a.
Moreover, this step back is also more reasonable in terms of the
controlling logic.
And by doing this we can re-enable the k8s-oom.bats tests, which is done
as part of this PR.
Fixes: #7271
Depends-on: github.com/kata-containers/tests#5705
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Let's skip the k8s-oom, as the test is currently failing.
We've an issue opened for that, and we'll be working on re-enabling it
as soon as possible.
Reference:
https://github.com/kata-containers/kata-containers/issues/7271
Fixes: #7253
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's skip the k8s-number-cpus, as the test is currently failing.
We've an issue opened for that, and we'll be working on re-enabling it
as soon as possible.
Reference:
https://github.com/kata-containers/kata-containers/issues/7270
Fixes: #7253
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR will enable the memory inside container metrics for the Kata CI.
Fixes #7254
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the call to the check_metrics function, as KATA_HYPERVISOR
does not need to be passed.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's make sure we run our tests in a specific namespace, as in case of
any kind of issue, we will just get rid of the namespace itself, which
will take care of cleaning up any leftover from failing tests.
One important thing to mention is why we can get rid of the `namespace:
${namespace}` on the tests that are already using it, and let's do it in
parts:
* namespace: default
We can easily get rid of this as that's the default namespace where
pods are created, so it was a no-op so far.
* namespace: test-quota-ns
My understanding is that we'd need this in order to get a clean
namespace where we'd be setting a quota for. Doing this in the
namespace that's only used for tests should **not** cause any
side-effect on the tests, as we're running those in serial and there
are no other pods running in the `kata-containers-k8s-tests` namespace.
Last but not least, we're not dynamically creating namespaces as the
tests **never** run in parallel: neither in the case of having 2 tests
being run at the same time, nor in the case of having 2 jobs being
scheduled to the same machine.
Fixes: #6864
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Use the 'function' keyword to prevent bash aliases from colliding
with function names.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR enables storing metrics workflow artifacts in two
separate flavours: clh and qemu.
Fixes: #7239
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the `function` keyword before the function names to have
uniformity across all the tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds blogbench and webtooling metrics checks to this repo.
The function running the test intentionally returns zero, so
the test will be enabled in another PR once the workflow is
green.
Fixes: #7069
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR uses double quotes in all the variables as well as general fixes
to the memory usage script in order to have uniformity.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Non-AKS k8s tests (SEV/SNP/TDX) don't currently set KATA_HOST_OS, so provide a
default empty value for the variable so that those tests can run.
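A minimal sketch of the defaulting pattern, assuming the scripts run with `set -u` (nounset), under which reading an unset variable aborts the script. The `cbl-mariner` comparison value and the `host_setup` variable are illustrative assumptions, not taken from this commit.

```shell
#!/bin/sh
set -u

# Simulate a non-AKS job, which does not export KATA_HOST_OS.
unset KATA_HOST_OS

# Provide an empty default so that later reads do not trip `set -u`.
KATA_HOST_OS="${KATA_HOST_OS:-}"

# "cbl-mariner" is an assumed example value for this illustration.
if [ "${KATA_HOST_OS}" = "cbl-mariner" ]; then
    host_setup="mariner"
else
    host_setup="generic"
fi

echo "host setup: ${host_setup}"
```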
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
as OSSKU value, to get rid of this warning when creating the AKS cluster:
WARNING: The osSKU "AzureLinux" should be used going forward instead of
"CBLMariner" or "Mariner". The osSKUs "CBLMariner" and "Mariner" will
eventually be deprecated.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
We only need to install in run_tests() so that the yq install is picked up by
kubernetes/setup.sh as well. We also need to either use (sudo &&
INSTALL_IN_GOPATH=false) || (INSTALL_IN_GOPATH=true).
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This PR adds the checkmetrics ci worker file for cloud hypervisor in
order to check the boot times limit.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds double quotes to all variables to have uniformity across
the whole gha-run.sh script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the checkmetrics installation to gha-run.sh in order to compare
results against their limits as part of the metrics CI.
Fixes #7198
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Mariner ships a bleeding-edge kernel that might be ahead of upstream, so
we use that to guarantee compatibility with the host.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
* Adds a new `rootfs-initrd-mariner` build target.
* Sets the custom initrd path via annotation in `setup.sh` at test
time.
* Adapts versions.yaml to specify a `cbl-mariner` initrd variant.
* Introduces env variable `HOST_OS` at deploy time to enable using a
custom initrd.
* Refactors the image builder so that its caller specifies the desired
guest OS.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR adds memory footprint metrics to the tests/metrics/density
folder.
Intentionally, each test exits with zero in all test cases to ensure
that tests would be green when added, and will be enabled in a
subsequent PR.
A workflow matrix was added to define hypervisor variation on
each job, in order to run them sequentially.
The launch-times test was updated to make use of the matrix
environment variables.
Fixes: #7066
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the word function before the function names in order to have
uniformity across the script as some are using this and some are not.
Fixes #7196
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds links to the unreferenced docs in the cmd path to make
them more discoverable.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the checkmetrics Makefile, which is used to process the
metrics JSON results files.
Fixes #7172
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds time tests documentation reference in the general README
for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds boot time metrics documentation for kata metrics tests.
Fixes #7170
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the metrics documentation as a general reference in the
main README for kata containers.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the checkmetrics scripts that will be used for the kata metrics CI.
Fixes #7160
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds a general metrics introduction documentation for the kata CI.
Fixes #7157
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR installs the kata static tarball on the metrics runner
and runs the launch-times tests.
Fixes: #7049
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The common.sh script includes helper functions used in
our metrics tests, so we are gradually adding more
metrics used in kata.
Fixes: #7108
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This test measures the duration of a workload that starts and then
immediately stops the container. It also measures the workload period,
the time to quit period, and the time to kernel period.
Fixes: #7049
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the JSON script, which allows us to save the metrics results
into a JSON file that will be used in the kata containers metrics.
Fixes #7128
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the test lib common script that is going to be used
for kata containers metrics.
Fixes #7113
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This gh-workflow prints a simple msg, but is the base for future
PRs that will gradually add the jobs corresponding to the kata
metrics test.
Fixes: #7100
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
GitHub Actions reads and runs workflow files from the main branch,
rather than from the PR branch. This means that PRs that modify workflow
files aren't being tested with the updated workflows coming from the PR,
but rather with the old workflows from the main branch. AFAIK, this
behavior isn't avoidable for workflow files (but is for other scripts).
This makes it very hard to reliably test workflow changes before they're
actually merged into main and leads to issues that we have to hotfix
(see #6983, #6995).
This PR aims to mitigate that by extracting the commands used in
workflows to a separate script file. The way our CI is set up, those
script files are read from the PR branch and thus changes would be
reflected in the CI checks.
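The approach described above can be sketched as a small dispatcher script that the workflow calls with a single action argument, so the logic lives in the PR branch. The action names and helper functions below are hypothetical placeholders, not the contents of the real script.

```shell
#!/bin/sh

# Hypothetical helpers standing in for the commands that used to live
# inline in the workflow YAML.
install_deps() { echo "installing dependencies"; }
run_tests() { echo "running tests"; }

# Dispatch on the action name passed by the workflow step.
main() {
    action="${1:-}"
    case "${action}" in
        install-deps) install_deps ;;
        run-tests) run_tests ;;
        *)
            echo "Unknown action: ${action}" >&2
            return 1
            ;;
    esac
}

# Example invocation; a real script would more likely end with: main "$@"
main run-tests
```

With this layout, the workflow file only needs a stable one-line call per step, and changes to what each step does can be exercised by the CI of the PR that introduces them.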
Fixes: #6971
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The current testing setup only supports running Kata on top of an Ubuntu
host. This adds Mariner to the matrix of testable hosts for k8s
tests, with Cloud Hypervisor as a VMM.
As preparation for the upcoming PR that will change only the actual test
code (rather than workflow YAMLs), this also introduces a new file
`setup.sh` that will be used to set host-specific parameters at test
run-time.
Fixes: #6961
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Currently Kata does not support memory / CPU hotplug for SEV or
SEV-SNP so we need to skip tests that rely on it.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Now that SEV artifacts are built by GHA, remove
conditional that skips tests when using qemu-sev.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Now that we have SNP artifacts in place and they are built via gha,
remove the condition that skips the tests for SNP.
Fixes: #6809
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
With the changes proposed as part of this PR, a qemu-snp cluster
will be created but no tests will be performed.
GitHub Actions will only run the tests using the workflows that are
part of the **target** branch, instead of using the ones coming
from the PR. No way to work around this for now.
After this commit is merged, the tests (not the yaml files for the
actions) will be altered in order for the checkout action to help in
this case.
Fixes: #6722
Signed-off-by: Ryan Savino <ryan.savino@amd.com>
With the changes proposed as part of this PR, a qemu-sev cluster will
be created but no tests will be performed.
GitHub Actions will only run the tests using the workflows that are
part of the **target** branch, instead of using the ones coming
from the PR. No way to work around this for now.
After this commit is merged, the tests (not the yaml files for the
actions) will be altered in order for the checkout action to help in this
case.
Fixes: #6711
Signed-off-by: Ryan Savino <ryan.savino@amd.com>
Now that the infra for running dragonball tests has been enabled, let's
actually make sure to have them running on each PR.
The tests skipped are:
* `k8s-cpu-ns.bats`, as CPU resize doesn't seem to be yet properly
supported on runtime-rs
* https://github.com/kata-containers/kata-containers/issues/6621
Fixes: #6605
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With the changes proposed as part of this PR, an AKS cluster will be
created but no tests will be performed.
The reason we have to do this is because GitHub Actions will only run
the tests using the workflows that are part of the **target** branch,
instead of using the ones coming from the PR, and we haven't yet found
a way to work around this.
Once this commit is in, we'll actually change the tests themselves (not
the yaml files for the actions), as those will be the ones we want, as
the checkout action helps us in this case.
Fixes: #6583
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The first part of simplifying things to have all our tests using GitHub
actions is moving the k8s tests to this repo, as those will be the first
vict^W targets to be migrated to GitHub actions.
Those tests have been slightly adapted, mainly related to what they load
/ import, so they are more self-contained and do not require us bringing
a lot of scripts from the tests repo here.
A few scripts were also dropped along the way, as we no longer plan to
deploy kubernetes as part of every single run, but rather assume there
will always be k8s running whenever we land to run those tests.
It's important to mention that a few tests were not added here:
* k8s-block-volume:
* k8s-file-volume:
* k8s-volume:
* k8s-ro-volume:
These tests depend on some sort of volume being created on the
kubernetes node where the test will run, and this won't fly as the
tests will run from a GitHub runner, targeting a different machine
where kubernetes will be running.
* https://github.com/kata-containers/kata-containers/issues/6566
* k8s-hugepages: This test depends a whole lot on the host where it
lands and right now we cannot assume anything about that anymore, as
the tests will run from a GitHub runner, targeting a different
machine where kubernetes will be running.
* https://github.com/kata-containers/kata-containers/issues/6567
* k8s-expose-ip: This is simply hanging when running on AKS and has to
be debugged in order to figure out the root cause of that, and then
adapted to also work on AKS.
* https://github.com/kata-containers/kata-containers/issues/6578
Till those issues are solved, we'll keep running a jenkins job with
those tests to avoid any possible regression.
Last but not least, I've decided to **not** keep the history when
bringing those tests here, otherwise we'd end up heavily polluting the
history of this repo, without any clear benefit in doing so.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>