The directory is a host path mount and cannot be removed from within the
container. What we actually want to remove is whatever is inside that
directory.
Trying to remove the directory itself raises errors like:
```
rm: cannot remove '/opt/kata/': Device or resource busy
```
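As a minimal sketch of the idea, emptying the directory while leaving the
mount point in place could look like the following. The helper name
`clear_dir_contents` is hypothetical, and `-mindepth` / `-delete` assume
GNU find, BSD find, or a busybox find built with those options.

```shell
# Hypothetical helper: remove everything inside a directory but keep the
# directory itself, which may be a mount point that cannot be unlinked.
clear_dir_contents() {
    # Refuse to run on an empty argument or a path that is not a directory.
    [ -n "$1" ] && [ -d "$1" ] || return 1
    # -mindepth 1 skips the directory itself; -delete works depth-first,
    # so nested files are removed before their parent directories.
    find "$1" -mindepth 1 -delete
}

# Usage (the path is taken from the error message above):
#   clear_dir_contents /opt/kata/
```

Hidden files are removed as well, which a plain `rm -rf dir/*` glob would
skip.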
Fixes: #7746
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d8f3ce6497)
The vfio test requires nested-nested virtualization:
L0 Azure host
-> L1 Ubuntu VM
-> L2 Fedora VM
-> L3 Kata
This hits a kernel bug on v5.15 but works quite nicely on the v6.2 kernel
included in Ubuntu 23.04. We can switch back to Ubuntu 22.04 when they roll out
v6.2.
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 936e8091a7)
docker install now creates a group with gid 999 which happens to match what we
need to get docker-in-docker to work. Remove the group first as we don't need
it.
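A rough sketch of such a removal, assuming the group is listed in
`/etc/group` and that `groupdel` is available. The helper names and the
`GROUP_FILE` override are hypothetical and not taken from the actual
change.

```shell
# Hypothetical helper: print the name of the first group with the given gid.
# GROUP_FILE is an assumed override (defaults to /etc/group) to ease testing.
group_name_for_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1; exit }' "${GROUP_FILE:-/etc/group}"
}

# Hypothetical helper: delete the group holding the gid, if there is one.
remove_group_with_gid() {
    name="$(group_name_for_gid "$1")"
    if [ -n "$name" ]; then
        groupdel "$name"
    fi
}

# Usage (needs root):
#   remove_group_with_gid 999
```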
Fixes: #7726
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 3b881fbc0e)
This PR adds the TensorFlow ResNet50 fp32 Dockerfile for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 959ca49447)
This PR adds the TensorFlow ResNet50 FP32 benchmark for kata metrics.
Fixes: #7735
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 4b7d72c4a8)
We can simply use `rm -f` all over the place and avoid the container
returning any error.
Fixes: #7733
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 5cba38c175)
This PR adds the disk link to the README documentation for kata metrics.
Fixes: #7721
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 8afd158cef)
This PR fixes the FIO path for the FIO files.
Fixes: #7711
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit eee2ee6eeb)
This PR uses a common function in the pytorch script.
Fixes: #7709
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 39bc3488f5)
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.
This PR also enables FIO metrics test in the kata metrics-ci.
Fixes: #7665
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit fb571f8be9)
This PR updates the tensorflow name in the gha run script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 85c02828e1)
This PR fixes the check results for the tensorflow benchmark now
that we have changed the name of the test.
Fixes: #7684
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit e8a5119343)
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2d896ad12f)
This test, at least for now, only checks whether the runtimeclasses
have been properly created.
This is just a migration from a test we had as part of the k8s suite.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4ffc2c86f3)
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 285e616b5e)
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.
For now it doesn't have any tests, and the command to actually run the
tests is commented out. This is just a placeholder that will be
populated sooner rather than later.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ce6adecd0a)
In a follow-up series, we'll add a whole suite for the kata-deploy
tests. With this in mind, let's get rid of this one now and prevent
more kata-deploy tests from landing here.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit cfc29c11a3)
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.
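For illustration, a sketch of generating the default runtime class with
the hypervisor-specific handler. The function name and the `qemu`
fallback are assumptions; only `KATA_HYPERVISOR` and the
`kata-$KATA_HYPERVISOR` handler come from the description above.

```shell
# Hypothetical helper: print a RuntimeClass manifest named "kata" whose
# handler points at the hypervisor under test.
default_runtimeclass_yaml() {
    # Assumed fallback: use qemu when KATA_HYPERVISOR is unset or empty.
    hypervisor="${KATA_HYPERVISOR:-qemu}"
    cat <<EOF
---
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
  name: kata
handler: kata-${hypervisor}
EOF
}

# Usage (needs kubectl and a cluster):
#   default_runtimeclass_yaml | kubectl apply -f -
```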
Fixes: #7663
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
(cherry picked from commit 339569b69c)
This PR fixes the MobileNet help description in the
tensorflow script.
Fixes: #7661
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 2a491e9b1f)
Let's add the tests as part of the ci.yaml, so they can be triggered as
part of each PR.
For this PR those tests won't be triggered, courtesy of the
`pull_request_target` event we rely on.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d19a75e80c)
This PR makes the spelling of TensorFlow uniform across the whole
document.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit bade6a5c3b)
This PR adds the dockerfile for ResNet50 int8 benchmark.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 24baededc0)
This PR adds the TensorFlow ResNet50 int8 script for kata metrics.
Fixes: #7652
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 6d971ba8df)
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b3592ab25c)
On the runners, we're hitting a timeout that I cannot reproduce at all
when allocating the same instance and running the tests manually.
The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.
It's fairly important to mention that in the first tests I used a
timeout of 10s, and that helped but we still hit issues every now and
then.
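A minimal sketch of raising the timeout through a wrapper. `--timeout`
is crictl's global connection-timeout option; the wrapper name and the
`CRICTL_BIN` and `CRICTL_TIMEOUT` variables are hypothetical and not
part of the actual change.

```shell
# Hypothetical wrapper: call crictl with a longer connection timeout.
# CRICTL_BIN and CRICTL_TIMEOUT are assumed names used for illustration.
crictl_with_timeout() {
    # CRICTL_BIN is left unquoted on purpose so it may hold "sudo crictl".
    ${CRICTL_BIN:-crictl} --timeout "${CRICTL_TIMEOUT:-20s}" "$@"
}

# Usage (needs crictl and a running CRI runtime):
#   CRICTL_BIN="sudo crictl" crictl_with_timeout pods
```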
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 84dd02e0f9)
It'll help us to debug failures with the pod stop / pod delete.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b29782984a)
We need this to fully understand what issues we're facing.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ae0930824a)
This improves readability in case of failures by a lot.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6c8b2ffa60)
This PR renames the tensorflow scripts to include the data format
being used. We will have multiple tests with different data and model
formats for tensorflow, so this will help us distinguish them.
Fixes: #7645
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 18a7fd8e4e)
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
on top of it in an upcoming series.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e55fa93db9)
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
on top of it in an upcoming series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d9ee17aaec)
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 831e73ff91)
Let's split a good portion of `tests/integration/kubernetes/gha-run.sh`
out, and put it in a place where it can be used by the soon-to-come
kata-deploy specific tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit af1b46bbf2)
This PR fixes the loop that stops the kata-shim and the
hypervisors used in metrics checks.
Fixes: #7628
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 767434d50a)
This PR adds cassandra statefulset yaml for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 5d0f0d43c7)
This PR adds the cassandra service yaml for the benchmark.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit c1dcc1396f)
This PR adds the block loop pv yaml for cassandra test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit e3d511946f)
This PR adds the block loop pvc for cassandra test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 9890271594)
This PR adds Cassandra Kubernetes benchmark for kata metrics tests.
Fixes: #7625
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 349b89969a)
The GHA runners are not exactly powerful, which makes the static-checks
take way too long (almost an hour).
Let's give it a try and move those to the same size of Azure instances
used as part of our CI, which should reduce this time.
Fixes: #7446
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c52d090522)
This PR adds the check that containers are running, defined in the
common script, to the tensorflow mobilenet script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit fdcd52ff78)
This PR uses the common function that checks that containers are up
in the tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 36337ee146)