Commit Graph

370 Commits

Author SHA1 Message Date
stevenhorsman
d06dadd8ef docs: Spelling updates
Either fixing typos, or including program/repo name in
backticks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
stevenhorsman
8ae0e36737 versions: bump golang to 1.25.8
Bump the builder image and versions to resolve CVEs:
- GO-2026-4601
- GO-2026-4602
- GO-2026-4603

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-09 09:10:01 +00:00
stevenhorsman
993a4846c8 versions: Bump go to 1.25.7
Now that go 1.26 is out, 1.24 is not supported, so bump to
1.25 as per our policy.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-02 16:33:47 +01:00
stevenhorsman
9b307a5fa6 metrics: Uncapitalise error strings
Fix `ST1005: error strings should not be capitalized (staticcheck)`
This is to comply with Go conventions: errors are normally appended to other
text, so there would be a spurious capitalisation in the middle of the message.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
15d6a681ed doc: Fix spelling issues
Put things in backticks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
5c0269881e tests: Make editorconfig-checker happy
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
  *.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
  checks project code
- Leave generated and binary assets unchanged (excluded from checker)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 21:58:28 +01:00
stevenhorsman
b29312289f versions: Bump go to 1.24.13
Bump go to 1.24.13 to fix CVE GO-2026-4337

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 14:49:31 +01:00
Steve Horsman
4d1095e653 Merge pull request #12350 from manuelh-dev/mahuber/term-grace-period
tests: Remove terminationGracePeriod in manifests
2026-01-29 15:17:17 +00:00
Fabiano Fidêncio
500146bfee versions: Bump Go to 1.24.12
Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities
in the standard library:

- GO-2026-4342: Excessive CPU consumption in archive/zip
- GO-2026-4341: Memory exhaustion in net/url query parsing
- GO-2026-4340: TLS handshake encryption level issue in crypto/tls

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-29 00:23:26 +01:00
Manuel Huber
6438fe7f2d tests: Remove terminationGracePeriod in manifests
Do not kill containers immediately, instead use Kubernetes'
default termination grace period.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-23 16:18:44 -08:00
stevenhorsman
403de2161f version: Update golang to 1.24.11
Needed to fix:
```
Vulnerability #1: GO-2025-4155
    Excessive resource consumption when printing error string for host
    certificate validation in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4155
  Standard library
    Found in: crypto/x509@go1.24.9
    Fixed in: crypto/x509@go1.24.11
    Vulnerable symbols found:
      #1: x509.HostnameError.Error
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-04 22:50:07 +01:00
Fabiano Fidêncio
14039c9089 golang: Update to 1.24.9
In order to fix:
```

=== Running govulncheck on containerd-shim-kata-v2 ===
 Vulnerabilities found in containerd-shim-kata-v2:
=== Symbol Results ===

Vulnerability #1: GO-2025-4015
    Excessive CPU consumption in Reader.ReadResponse in net/textproto
  More info: https://pkg.go.dev/vuln/GO-2025-4015
  Standard library
    Found in: net/textproto@go1.24.6
    Fixed in: net/textproto@go1.24.8
    Vulnerable symbols found:
      #1: textproto.Reader.ReadResponse

Vulnerability #2: GO-2025-4014
    Unbounded allocation when parsing GNU sparse map in archive/tar
  More info: https://pkg.go.dev/vuln/GO-2025-4014
  Standard library
    Found in: archive/tar@go1.24.6
    Fixed in: archive/tar@go1.24.8
    Vulnerable symbols found:
      #1: tar.Reader.Next

Vulnerability #3: GO-2025-4013
    Panic when validating certificates with DSA public keys in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4013
  Standard library
    Found in: crypto/x509@go1.24.6
    Fixed in: crypto/x509@go1.24.8
    Vulnerable symbols found:
      #1: x509.Certificate.Verify
      #2: x509.Certificate.Verify

Vulnerability #4: GO-2025-4012
    Lack of limit when parsing cookies can cause memory exhaustion in net/http
  More info: https://pkg.go.dev/vuln/GO-2025-4012
  Standard library
    Found in: net/http@go1.24.6
    Fixed in: net/http@go1.24.8
    Vulnerable symbols found:
      #1: http.Client.Do
      #2: http.Client.Get
      #3: http.Client.Head
      #4: http.Client.Post
      #5: http.Client.PostForm
      Use '-show traces' to see the other 9 found symbols

Vulnerability #5: GO-2025-4011
    Parsing DER payload can cause memory exhaustion in encoding/asn1
  More info: https://pkg.go.dev/vuln/GO-2025-4011
  Standard library
    Found in: encoding/asn1@go1.24.6
    Fixed in: encoding/asn1@go1.24.8
    Vulnerable symbols found:
      #1: asn1.Unmarshal
      #2: asn1.UnmarshalWithParams

Vulnerability #6: GO-2025-4010
    Insufficient validation of bracketed IPv6 hostnames in net/url
  More info: https://pkg.go.dev/vuln/GO-2025-4010
  Standard library
    Found in: net/url@go1.24.6
    Fixed in: net/url@go1.24.8
    Vulnerable symbols found:
      #1: url.JoinPath
      #2: url.Parse
      #3: url.ParseRequestURI
      #4: url.URL.Parse
      #5: url.URL.UnmarshalBinary

Vulnerability #7: GO-2025-4009
    Quadratic complexity when parsing some invalid inputs in encoding/pem
  More info: https://pkg.go.dev/vuln/GO-2025-4009
  Standard library
    Found in: encoding/pem@go1.24.6
    Fixed in: encoding/pem@go1.24.8
    Vulnerable symbols found:
      #1: pem.Decode

Vulnerability #8: GO-2025-4008
    ALPN negotiation error contains attacker controlled information in
    crypto/tls
  More info: https://pkg.go.dev/vuln/GO-2025-4008
  Standard library
    Found in: crypto/tls@go1.24.6
    Fixed in: crypto/tls@go1.24.8
    Vulnerable symbols found:
      #1: tls.Conn.Handshake
      #2: tls.Conn.HandshakeContext
      #3: tls.Conn.Read
      #4: tls.Conn.Write
      #5: tls.Dial
      Use '-show traces' to see the other 4 found symbols

Vulnerability #9: GO-2025-4007
    Quadratic complexity when checking name constraints in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4007
  Standard library
    Found in: crypto/x509@go1.24.6
    Fixed in: crypto/x509@go1.24.9
    Vulnerable symbols found:
      #1: x509.CertPool.AppendCertsFromPEM
      #2: x509.Certificate.CheckCRLSignature
      #3: x509.Certificate.CheckSignature
      #4: x509.Certificate.CheckSignatureFrom
      #5: x509.Certificate.CreateCRL
      Use '-show traces' to see the other 27 found symbols

Vulnerability #10: GO-2025-4006
    Excessive CPU consumption in ParseAddress in net/mail
  More info: https://pkg.go.dev/vuln/GO-2025-4006
  Standard library
    Found in: net/mail@go1.24.6
    Fixed in: net/mail@go1.24.8
    Vulnerable symbols found:
      #1: mail.AddressParser.Parse
      #2: mail.AddressParser.ParseList
      #3: mail.Header.AddressList
      #4: mail.ParseAddress
      #5: mail.ParseAddressList
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-03 16:57:22 +01:00
stevenhorsman
87356269d8 versions: Tidy up go.mod versions
Update go 1.23 references to go 1.24.6 to match
versions.yaml

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-09-08 14:03:47 +01:00
stevenhorsman
c37840ce80 versions: Bump golang version
Bump golang version to the latest minor 1.23.x release
now that 1.24 has been released and 1.22.x is no longer
stable and receiving security fixes

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-23 12:37:48 +01:00
stevenhorsman
1022d8d260 metrics: Update range for clh tests
In ef0e8669fb we had been seeing some significantly lower minimum values in
the jitter.Result test, so I lowered the midval rather than setting a very
high minpercent. However, the variability of this result appears to be very
high and we are still getting the occasional high value, so reset the midval
and widen the ranges on both sides to try and keep the test stable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-14 14:54:30 +00:00
stevenhorsman
d77008b817 metrics: Further reduce repeats for boot time tests on qemu
I've seen failures on the third run, so reduce it further to
just run twice on qemu

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-14 14:53:26 +00:00
stevenhorsman
97151cce4e metrics: Improve iperf timeout
The kubectl wait has a built-in timeout of 30s, so
wrapping it in waitForProcess means we have a
180/2 * 30 delay, which is much longer than intended,
so just set the timeout directly.
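
The arithmetic above can be sketched as follows. This is a minimal illustration, assuming that "180/2" means a 180s budget polled every 2s and that each attempt can block for the full 30s; the function name, the `--timeout` value and the pod name are hypothetical.

```bash
# Worst-case wait when a command with its own timeout is retried in a loop.
# Arguments: total wait budget (s), sleep between attempts (s), and the
# timeout of each attempt (s). Sleep time itself is ignored, as in the text.
worst_case_wait() {
    echo $(( $1 / $2 * $3 ))
}

worst_case_wait 180 2 30   # prints 2700, i.e. 45 minutes rather than 3

# Setting the timeout on the command itself avoids the multiplication, e.g.
# (value and pod name are placeholders):
#   kubectl wait --for=condition=Ready --timeout=180s pod/<iperf-pod>
```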

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-14 14:53:26 +00:00
Zvonko Kaiser
4bb0eb4590 Merge pull request #10954 from kata-containers/topic/metrics-kata-deploy
Rework and fix metrics issues
2025-03-04 20:22:53 -05:00
stevenhorsman
b220cca253 shellcheck: Fix shellcheck SC2066
> Since you double-quoted this, it will not word split, and the loop will only run once.
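
A minimal sketch of the pattern behind this warning, using a made-up list and hypothetical function names rather than the scripts touched by this commit:

```bash
# Hypothetical space-separated list, used only for illustration.
hypervisors="qemu clh"

# SC2066 pattern: the quoted expansion is a single word, so the body runs once.
count_quoted() (
    n=0
    for h in "${hypervisors}"; do
        n=$((n + 1))
    done
    echo "${n}"
)

# One possible fix: pass the items as separate arguments and loop over "$@".
count_separate() (
    n=0
    for h in "$@"; do
        n=$((n + 1))
    done
    echo "${n}"
)

count_quoted               # prints 1
count_separate qemu clh    # prints 2
```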

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
c5ff513e0b shellcheck: Fix shellcheck SC2068
> Double quote array expansions to avoid re-splitting elements
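
A minimal sketch of the re-splitting this warning refers to, assuming the default `IFS`; the function names are hypothetical and not taken from the scripts in this commit:

```bash
# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted $@ (what SC2068 flags): "first item" is re-split into two words.
forward_unquoted() { count_args $@; }

# Quoted "$@": each original argument is passed through unchanged.
forward_quoted() { count_args "$@"; }

forward_unquoted "first item" second   # prints 3
forward_quoted "first item" second     # prints 2
```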

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
58672068ff shellcheck: Fix shellcheck SC2145
> Argument mixes string and array. Use * or separate argument.

- Swap echos for printfs and improve formatting
- Replace $@ with $*
- Split arrays into separate arguments
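
A minimal sketch of why `$@` inside a string is flagged and how `$*` with `printf` behaves instead; the function names and messages are hypothetical:

```bash
# "$@" inside a longer string (what SC2145 flags): the prefix attaches to the
# first argument only and the remaining arguments stay separate words, so
# printf reuses its format for each of them.
info_mixed() { printf '[%s]\n' "INFO: $@"; }

# "$*" joins all arguments into one string (using the first character of IFS).
info_joined() { printf '[%s]\n' "INFO: $*"; }

info_mixed starting containerd    # prints [INFO: starting] then [containerd]
info_joined starting containerd   # prints [INFO: starting containerd]
```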

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
c69509be1c metrics: Reduce repeats for boot time tests on qemu
On qemu the run seems to error after ~4-7 runs, so try
a cut down version of repetitions to see if this helps us
get results in a stable way.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:42:00 +00:00
stevenhorsman
0962cd95bc metrics: Increase minpercent range for qemu iperf test
We have a new metrics machine and environment
and the iperf jitter result failed as it finished too quickly,
so increase the minpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:32:26 +00:00
stevenhorsman
ef0e8669fb metrics: Increase minpercent range for clh tests
We have a new metrics machine and environment
and the fio write.bw and iperf3 parallel.Results
tests failed for clh because the results were below
the minimum range, so increase the
minpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:32:26 +00:00
stevenhorsman
f81c85e73d metrics: Increase maxpercent range for clh boot times
We have a new metrics machine and environment
and the boot time test failed for clh, so increase the
maxpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
435ee86fdd metrics: Update iperf affinity
The iperf deployment is quite out of date
and uses `master` for its affinity and toleration,
so update this to control-plane, so it can run on
newer Kubernetes clusters

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
85bbc0e969 metrics: Increase wait time
The new metrics runner seems slower, so the
iperf3 tests are failing with:
```
pod rejected: RuntimeClass "kata" not found
```
so give more time for it to succeed

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
4ce94c2d1b Revert "metrics: Add init_env function to latency test"
This reverts commit 9ac29b8d38
to remove the duplicate `init_env` call.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
658a5e032b metrics: Increase containerd start timeout
- Move `kill_kata_components` from common.bash
  into the metrics code base, as it is the only user of it
- Increase the timeout on the start of containerd, as
  the last 10 nightly metrics tests have failed with:
```
223478 Killed                  sudo timeout -s SIGKILL "${TIMEOUT}" systemctl start containerd
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
3fab7944a3 workflows: Improve metrics jobs
- As the metrics tests are largely independent,
  allow subsequent tests to run even if previous
  ones failed. The results might not be perfect if
  clean-up is required, but we can work on that later.
- Move the test results check out of the latency
  test, where it seems arbitrary, and into its own job step
- Add timeouts to steps that might fail/hang if there
  are containerd/K8s issues

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
6f918d71f5 workflows: Update metrics jobs
Currently the run-metrics job runs a manual install
and does this in a separate job before the metrics
tests run. This doesn't make sense: if we have multiple
CI runs in parallel (like we often do), there is a high chance
that the setup for another PR runs between the metrics
setup and the runs, meaning it's not testing the correct
version of code. We want to prevent this from happening,
so install (and delete to clean up) kata as part of the metrics
test jobs.

Also switch to kata-deploy rather than manual install for
simplicity and in order to test what we recommend to users.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
Balint Tobik
1943a1c96d tests: replace egrep with grep -E to avoid deprecation warning
https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html
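
A minimal sketch of the replacement, using made-up input rather than the test scripts changed here; `list_matches` is a hypothetical helper name:

```bash
# grep -E enables extended regular expressions, which is what egrep did.
# The input lines below are illustrative only.
list_matches() {
    printf 'qemu\nclh\nfirecracker\n' | grep -E 'qemu|clh'
}

list_matches   # prints qemu and clh on separate lines
```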

Signed-off-by: Balint Tobik <btobik@redhat.com>
2025-01-29 11:26:27 +01:00
stevenhorsman
d031e479ab metrics: Increase minval range for blogbench test
In the last couple of days I've seen the blogbench
metrics write latency test on clh fail a few times because
the latency was too low, so adjust the minimum range
to tolerate quicker finishes.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-23 15:58:31 +00:00
stevenhorsman
aaae5b6d0f metrics: clh: Increase network-iperf3 range
We hit a failure with:
```
time="2025-01-09T09:55:58Z" level=warning msg="Failed Minval (0.017600 > 0.015000) for [network-iperf3]"
```
The range is very big, but in the last 3 test runs I reviewed we got a minimum value of 0.015s
and a maximum value of 0.052s, so a ~350% difference is possible.
I think we need a wide range to make this stable.
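
As a rough illustration of how such a range might be derived, the sketch below assumes the bounds are computed from a midval and min/max percentages (the terms used in the other metrics commits). The formula, the function name and the sample numbers are assumptions, not the checker's confirmed implementation.

```bash
# Assumed bound calculation: midval minus/plus a percentage of midval.
# Arguments: midval, minpercent, maxpercent (all values below are made up).
bounds() {
    awk -v mid="$1" -v minp="$2" -v maxp="$3" \
        'BEGIN { printf "%.4f %.4f\n", mid * (1 - minp / 100), mid * (1 + maxp / 100) }'
}

bounds 100 20 30   # prints 80.0000 130.0000

# Ratio between the extremes quoted above: 0.052 / 0.015 is about 3.47,
# which is where the "~350%" figure comes from.
spread() {
    awk 'BEGIN { printf "%.2f\n", 0.052 / 0.015 }'
}

spread   # prints 3.47
```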

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-09 11:25:57 +00:00
stevenhorsman
e946d9d5d3 metrics: qemu: Increase latency test range
After the kernel version bump, in the latest nightly run
https://github.com/kata-containers/kata-containers/actions/runs/12681309963/job/35345228400
the sequential read throughput result was 79.7% of the expected value (so failed)
and the sequential write was 84% of the expected value, so was fairly close.
Increase their minimum ranges to make them more robust.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-09 11:25:50 +00:00
stevenhorsman
dc069d83b5 metrics: Increase latency test range
The bump to kernel 6.12 seems to have reduced the latency in
the metrics test, so increase the ranges for the minimal value,
to account for this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-08 15:11:49 +00:00
stevenhorsman
b87b4b6756 metrics: Increase ranges for failing qemu tests
We've also seen the qemu metrics tests failing due to the results
being slightly outside the max range for the network-iperf3 parallel test and
below the minimum for the network-iperf3 jitter test on PRs that have no code changes,
so we've increased the bounds to avoid false negatives.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-11-29 10:52:16 +00:00
stevenhorsman
4011071526 metrics: Increase minval range for failing tests
We've seen a couple of instances recently where the metrics
tests are failing due to the results being below the minimum
value by ~2%.
For tests like latency I'm not sure why values being too low would
be an issue, but I've updated the minpercent range of the failing tests
to try and get them passing.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-11-29 10:50:02 +00:00
Gabriela Cervantes
52ef092489 metrics: Update fast footprint script to use grep
This PR updates the fast footprint script to remove the use
of egrep, as this command has been deprecated, and changes it
to use the grep command.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-30 17:43:08 +00:00
Gabriela Cervantes
fdaf12d16c metrics: Remove unused remove img var in common script
This PR removes the remove_img variable in the metrics common script
as it is not being used.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-11 17:45:18 +00:00
Gabriela Cervantes
fcc35dd3a7 metrics: Update openVINO and oneDNN tests references
This PR updates the machine learning test references or URLs for the
openVINO and oneDNN scripts, as currently they are referring to a different
performance benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-05 15:39:21 +00:00
Gabriela Cervantes
5b0ab7f17c metrics: Remove metrics report for Kata Containers
This PR removes the metrics report which is no longer being used
in Kata Containers.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-03 16:11:07 +00:00
Gabriela Cervantes
aa8635727d metrics: Remove unused variable in oneDNN benchmark
This PR removes an unused variable in oneDNN metrics benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-29 15:52:47 +00:00
Gabriela Cervantes
3affde5b28 docs: Add oneDNN benchmark information to metrics README
This PR adds the oneDNN benchmark information to the machine
learning metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-27 16:32:50 +00:00
Gabriela Cervantes
2fa8e85439 metrics: Add OpenVINO general information into README
This PR adds the OpenVINO benchmark general information into the
machine learning README metrics information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-22 16:08:06 +00:00
Gabriela Cervantes
59e31baaee metrics: Remove unused variable in openvino script
This PR removes an unused variable in the openvino script for kata
metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-21 16:05:55 +00:00
David Esparza
dcd0c0b269 metrics: Remove duplicated headers from results file.
This PR removes duplicated entries (vcpus count, and available memory),
from onednn and openvino results files.

Fixes: #10119

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-08-01 18:11:06 -06:00
Gabriela Cervantes
7454908690 metrics: Update memory tests to use grep -F
This PR updates the memory tests like fast footprint to use grep -F
instead of fgrep as this command has been deprecated.
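
A minimal sketch of what `-F` changes, using made-up input rather than the memory test scripts; the helper names are hypothetical:

```bash
# With -F the pattern is a literal string, so "." matches only a dot.
match_fixed() {
    printf '1.2\n1x2\n' | grep -F '1.2'
}

# Without -F the pattern is a regular expression and "." matches any character.
match_regex() {
    printf '1.2\n1x2\n' | grep '1.2'
}

match_fixed   # prints 1.2
match_regex   # prints 1.2 and 1x2 on separate lines
```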

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-01 17:20:57 +00:00
Gabriela Cervantes
3d17a7038a metrics: Update launch times to use grep -F
This PR updates the metrics launch times to use grep -F instead of
fgrep as this command has been deprecated.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-07-23 17:13:52 +00:00
David Esparza
60f52a4b93 metrics: update avg reference values for blogbench.
This PR updates the Blogbench reference values for
read and write operations used in the CI check metrics
job.

This is due to the update to version 1.2 of Blogbench.

Fixes: #10039

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-07-18 15:47:14 -06:00