Commit Graph

356 Commits

Author SHA1 Message Date
stevenhorsman
1022d8d260 metrics: Update range for clh tests
In ef0e8669fb we
had been seeing some significantly lower minvalues in
the jitter.Result test, so I lowered the mid-value rather
than having a very high minpercent, but it appears that the
variability of this result is very high, so we are still getting
the occasional high value, so reset the midval and just
have a bigger ranges on both sides, to try and keep the test
stable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-14 14:54:30 +00:00
stevenhorsman
d77008b817 metrics: Further reduce repeats for boot time tests on qemu
I've seen failures on the third run, so reduce it further to
just run twice on qemu

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-14 14:53:26 +00:00
stevenhorsman
97151cce4e metrics: Improve iperf timeout
The kubectl wait has a built in timeout of 30s, so
wrapping it in waitForProcess, means we have
180/2 * 30 delay, which is much longer than intended,
so just set the timeout directly.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-14 14:53:26 +00:00
Zvonko Kaiser
4bb0eb4590
Merge pull request #10954 from kata-containers/topic/metrics-kata-deploy
Rework and fix metrics issues
2025-03-04 20:22:53 -05:00
stevenhorsman
b220cca253 shellcheck: Fix shellcheck SC2066
> Since you double-quoted this, it will not word split, and the loop will only run once.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
c5ff513e0b shellcheck: Fix shellcheck SC2068
> Double quote array expansions to avoid re-splitting elements

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
58672068ff shellcheck: Fix shellcheck SC2145
> Argument mixes string and array. Use * or separate argument.

- Swap echos for printfs and improve formatting
- Replace $@ with $*
- Split arrays into separate arguments

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
c69509be1c metrics: Reduce repeats for boot time tests on qemu
On qemu the run seems to error after ~4-7 runs, so try
a cut down version of repetitions to see if this helps us
get results in a stable way.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:42:00 +00:00
stevenhorsman
0962cd95bc metrics: Increase minpercent range for qemu iperf test
We have a new metrics machine and environment
and the iperf jitter result failed as it finished too quickly,
so increase the minpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:32:26 +00:00
stevenhorsman
ef0e8669fb metrics: Increase minpercent range for clh tests
We have a new metrics machine and environment
and the fio write.bw and iperf3 parallel.Results
tests failed for clh, as below
the minimum range, so increase the
minpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:32:26 +00:00
stevenhorsman
f81c85e73d metrics: Increase maxpercent range for clh boot times
We have a new metrics machine and environment
and the boot time test failed for clh, so increase the
maxpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
435ee86fdd metrics: Update iperf affinity
The iperf deployment is quite a lot out of date
and uses `master` for it's affinity and toleration,
so update this to control-plane, so it can run on
newer Kubernetes clusters

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
85bbc0e969 metrics: Increase wait time
The new metrics runner seems slower, so we are
seeing errors like:
The iperf3 tests are failing with:
```
pod rejected: RuntimeClass "kata" not found
```
so give more time for it to succeed

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
4ce94c2d1b Revert "metrics: Add init_env function to latency test"
This reverts commit 9ac29b8d38.
to remove the duplicate `init_env` call

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
658a5e032b metrics: Increase containerd start timeout
- Move `kill_kata_components` from common.bash
into the metrics code base as the only user of it
- Increase the timeout on the start of containerd as
the last 10 nightlies metric tests have failed with:
```
223478 Killed                  sudo timeout -s SIGKILL "${TIMEOUT}" systemctl start containerd
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
3fab7944a3 workflows: Improve metrics jobs
- As the metrics tests are largely independent
then allow subsequent tests to run even if previous
ones failed. The results might not be perfect if
clean-up is required, but we can work on that later.
- Move the test results check out of the latency
test that seems arbitrary and into it's own job step
- Add timeouts to steps that might fail/hang if there
are containerd/K8s issues

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
6f918d71f5 workflows: Update metrics jobs
Currently the run-metrics job runs a manual install
and does this in a separate job before the metrics
tests run. This doesn't make sense as if we have multiple
CI runs in parallel (like we often do), there is a high chance
that the setup for another PR runs between the metrics
setup and the runs, meaning it's not testing the correct
version of code. We want to remove this from happening,
so install (and delete to cleanup) kata as part of the metrics
test jobs.

Also switch to kata-deploy rather than manual install for
simplicity and in order to test what we recommend to users.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
Balint Tobik
1943a1c96d tests: replace egrep with grep -E to avoid deprecation warning
https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html

Signed-off-by: Balint Tobik <btobik@redhat.com>
2025-01-29 11:26:27 +01:00
stevenhorsman
d031e479ab metrics: Increase minval range for blogbench test
In the last couple of days I've seen the blogbench
metrics write latency test on clh fail a few times because
the latency was too low, so adjust the minimum range
to tolerate quicker finishes.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-23 15:58:31 +00:00
stevenhorsman
aaae5b6d0f metrics: clh: Increase network-iperf3 range
We hit a failure with:
```
time="2025-01-09T09:55:58Z" level=warning msg="Failed Minval (0.017600 > 0.015000) for [network-iperf3]"
```
The range is very big, but in the last 3 test runs I reviewed we have got a minimum value of 0.015s
and a max value of 0.052, so there is a ~350% difference possible
so I think we need to have a wide range to make this stable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-09 11:25:57 +00:00
stevenhorsman
e946d9d5d3 metrics: qemu: Increase latency test range
After the kernel version bump, in the latest nightly run
https://github.com/kata-containers/kata-containers/actions/runs/12681309963/job/35345228400
The sequential read throughput result was 79.7% of the expected (so failed)
and the sequential write was 84% of the expected, so was fairly close,
so increase their minimum ranges to make them more robust.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-09 11:25:50 +00:00
stevenhorsman
dc069d83b5 metrics: Increase latency test range
The bump to kernel 6.12 seems to have reduced the latency in
the metrics test, so increase the ranges for the minimal value,
to account for this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-08 15:11:49 +00:00
stevenhorsman
b87b4b6756 metrics: Increase ranges range for qemu failing tests
We've also seen the qemu metrics tests are failing due to the results
being slightly outside the max range for network-iperf3 parallel and minimum for network-iperf3 jitter tests on PRs that have no code changes,
so we've increase the bounds to not see false negatives.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-11-29 10:52:16 +00:00
stevenhorsman
4011071526 metrics: Increase minval range for failing tests
We've seen a couple of instances recently where the metrics
tests are failing due to the results being below the minimum
value by ~2%.
For tests like latency I'm not sure why values being too low would
be an issue, but I've updated the minpercent range of the failing tests
to try and get them passing.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-11-29 10:50:02 +00:00
Gabriela Cervantes
52ef092489 metrics: Update fast footprint script to use grep
This PR updates the fast footprint script to remove the use
of egrep as this command has been deprecated and change it
to use grep command.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-30 17:43:08 +00:00
Gabriela Cervantes
fdaf12d16c metrics: Remove unused remove img var in common script
This PR removes the remove_img variable in the metrics common script
as it is not being used.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-11 17:45:18 +00:00
Gabriela Cervantes
fcc35dd3a7 metrics: Update openVINO and oneDNN tests references
This PR updates the machine learning tests references or urls for the
openVINO and oneDNN scripts as currently they are refering to a different
performance benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-05 15:39:21 +00:00
Gabriela Cervantes
5b0ab7f17c metrics: Remove metrics report for Kata Containers
This PR removes the metrics report which is not longer being used
in Kata Containers.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-03 16:11:07 +00:00
Gabriela Cervantes
aa8635727d metrics: Remove unused variable in oneDNN benchmark
This PR removes an unused variable in oneDNN metrics benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-29 15:52:47 +00:00
Gabriela Cervantes
3affde5b28 docs: Add oneDNN benchmark information to metrics README
This PR adds the oneDNN benchmark information to the machine
learning metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-27 16:32:50 +00:00
Gabriela Cervantes
2fa8e85439 metrics: Add OpenVINO general information into README
This PR adds the OpenVINO benchmark general information into the
machine learning README metrics information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-22 16:08:06 +00:00
Gabriela Cervantes
59e31baaee metrics: Remove unused variable in openvino script
This PR removes an unused variable in the openvino script for kata
metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-21 16:05:55 +00:00
David Esparza
dcd0c0b269
metrics: Remove duplicated headers from results file.
This PR removes duplicated entries (vcpus count, and available memory),
from onednn and openvino results files.

Fixes: #10119

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-08-01 18:11:06 -06:00
Gabriela Cervantes
7454908690 metrics: Update memory tests to use grep -F
This PR updates the memory tests like fast footprint to use grep -F
instead of fgrep as this command has been deprecated.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-01 17:20:57 +00:00
Gabriela Cervantes
3d17a7038a metrics: Update launch times to use grep -F
This PR updates the metrics launch times to use grep -F instead of
fgrep as this command has been deprecated.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-07-23 17:13:52 +00:00
David Esparza
60f52a4b93
metrics: update avg reference values for blogbench.
This PR updates the Blogbench reference values for
read and write operations used in the CI check metrics
job.

This is due to the update to version 1.2 of blobench.

Fixes: #10039

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-07-18 15:47:14 -06:00
David Esparza
1fdc5c1183
Merge pull request #10028 from amshinde/upgrade-blogbench-1.2
metric: Upgrade blogbench to 1.2
2024-07-18 11:30:17 -06:00
Archana Shinde
30e5e88ff1 metric: Upgrade blogbench to 1.2
Move to blogbench 1.2 version from 1.1.
This version includes an important fix for the read_score test
which was reported to be broken in the previous version.
It essentially fixes this issue here:
https://github.com/jedisct1/Blogbench/issues/4

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 11:32:09 -07:00
GabyCT
2056eda5f0
Merge pull request #9922 from GabyCT/topic/updateblogname
metrics: Update container name in blogbench test
2024-07-11 10:05:35 -06:00
Aurélien Bombo
25e0e2fb35 ci: fix run-nydus tests
GH-9973 introduced:

 * New function get_kata_memory_and_vcpus() in
   tests/metrics/lib/common.bash.
 * A call to get_kata_memory_and_vcpus() from extract_kata_env(), which
   is defined in tests/common.bash.

Because the nydus test only sources tests/common.bash, it can't find
get_kata_memory_and_vcpus() and errors out.

We fix this by moving the get_kata_memory_and_vcpus() call from
tests/common.bash to tests/metrics/lib/json.bash so that it doesn't
impact the nydus test.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-07-10 17:19:08 +00:00
Alex Lyn
e4997760f1
Merge pull request #9987 from kata-containers/remove_double_process_check_from_memory_usage_test
metrics: Remove duplicate check of processes from memory test.
2024-07-10 10:12:18 +08:00
David Esparza
e77d44614b
metrics: Remove duplicate check of processes from memory test.
This PR removes the common_init function call from the memory
usage script to eliminate duplicate checking that is also done
from the init_env function.

It also eliminates duplicaction of nested conditionals.

Fixes: #9984

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-07-09 12:34:51 -06:00
David Esparza
04df85a44f
metrics: Add num_vcpus and free_mem to metrics results template.
This PR retrieves the free memory and the vcpus count from
a kata container and includes them to the json results file of
any metric.

Additionally this PR parses the requested vcpus quantity and the
requested amount memory from kata configuration file and includes
this pair of values into the json results file of any metric.

Finally, the file system defined in the kata configuration file
is included in the results template.

Fixes: #9972

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-07-09 10:29:29 -06:00
David Esparza
a554541495
metrics: Improvement to the description of certain functions.
This PR rephrased the description and usage of certain functions
as such as:
- set_kata_configuration_performance
- set_kata_config_file
- get_current_kata_config_file
- check_if_root
- check_ctr_images

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-07-09 10:29:29 -06:00
Gabriela Cervantes
b7da1291ea metrics: Remove variable in sysbench that is not being used
This PR removes the CI_JOB variable which previously was used but
not longer being supported of the metrics sysbench test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-07-02 15:29:50 +00:00
Gabriela Cervantes
e3318a04f7 metrics: Update container name in blogbench test
This PR updates the container name to put a random name instead
of using a hard coded name. This PR is a general improvement
to avoid random bug failures specially when we are running on
baremetal environments.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-07-01 19:28:16 +00:00
Chelsea Mafrica
0b83c8549a tests: Update help section in openvino test
Test reports that it is a onednn test when it is openvino; update
description.

Fixes: #9948

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-07-01 14:24:50 +00:00
Gabriela Cervantes
671d9af456 metrics: Improve variable definition in memory inside containers script
This PR improves the variable definition in memory inside
the container script for metrics. This change declares and assigns
the variables separately to avoid masking return values.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-06-18 16:56:12 +00:00
Gabriela Cervantes
a96ff49060 metrics: Use function definition to have uniformity
This PR uses the function definition to have uniformity across
all the launch times script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-06-11 17:36:08 +00:00
GabyCT
6d58fce4a9
Merge pull request #9677 from GabyCT/topic/memoryusags
metrics: Improve variable definition in memory usage script
2024-05-29 10:16:56 -06:00