Commit Graph

497 Commits

Author SHA1 Message Date
Gabriela Cervantes
e95d3b1be5 tests: Add stressor CPU test for stability tests
This PR adds the stressor CPU test for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 6493aa309e)
2023-10-06 15:18:32 +02:00
Gabriela Cervantes
4393f553e9 metrics: Add stability test for kata CI
This PR adds the stability test for kata containers repository.

Fixes #8084

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit ef68a3a36b)
2023-10-06 15:18:26 +02:00
Gabriela Cervantes
c22fdb46e3 metrics: Increase qemu jitter value
This PR increases qemu jitter value.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 8d66ef5185)
2023-10-06 15:17:07 +02:00
Gabriela Cervantes
8a1af8689b metrics: Increase jitter value for clh
This PR increases jitter value for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 5600e28b54)
2023-10-06 15:16:59 +02:00
Gabriela Cervantes
f3fcf6cbf9 metrics: Add checkmetrics for latency test
This PR adds the checkmetrics for latency test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 8cb7df1bed)
2023-10-06 15:16:53 +02:00
Gabriela Cervantes
ce03e9f97a metrics: Add qemu latency value limit
This PR adds the qemu latency value limit for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit e90440ae24)
2023-10-06 15:16:47 +02:00
Gabriela Cervantes
cd82a351bd metrics: Add latency value limits for kata CI
This PR adds latency value limits for kata CI.

Fixes #8067

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit a74a8f8a9d)
2023-10-06 15:16:39 +02:00
Fabiano Fidêncio
1709f99975 ci: kata-monitor: Move tests over
Let's move, adapt, and use the kata-monitor tests from the tests repo.
In this PR I'm keeping the SoB from every single contributor who
touched those tests in the past.

Fixes: #8074

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
(cherry picked from commit 489caf1ad0)
2023-10-06 15:16:25 +02:00
Fabiano Fidêncio
c42d19619d ci: Make install_kata aware of container engines
This will help us when running tests using CRI-O.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 57cb4ce204)
2023-10-06 15:16:11 +02:00
Fabiano Fidêncio
5017435734 ci: Create a generic install_crio function
This will serve us quite well in the upcoming test additions, which will
also have to be executed using CRI-O.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit de1eeee334)
2023-10-06 15:16:05 +02:00
Fabiano Fidêncio
98e9434be4 ci: Add install_cni_plugins helper
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 64a2000859)
2023-10-06 15:15:58 +02:00
Fabiano Fidêncio
c61b488b66 ci: Modify containerd default config
Let's ensure we have runc running with `SystemdCgroups = false`,
otherwise we'll face failures when running tests depending on runc on
Ubuntu 22.04, with LTS containerd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8132fe15c9)
2023-10-06 15:15:51 +02:00
Gabriela Cervantes
7c4617cfac metrics: Add init_env function to latency test
This PR adds the init_env function to latency test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 9ac29b8d38)
2023-10-06 15:15:42 +02:00
Gabriela Cervantes
e106ecd1e4 metrics: Fix latency yamls path
This PR fixes the latency yamls path for the latency test for
kata metrics.

Fixes #8055

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 81c8babca9)
2023-10-06 15:15:34 +02:00
Gabriela Cervantes
665805c81c metrics: Fix spelling warnings
This PR fixes general spelling warnings detected by the spelling check.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 97e73b2234)
2023-10-06 15:15:27 +02:00
Gabriela Cervantes
b0c9b4254b metrics: Fix metrics README
This PR fixes the network metrics section at the README by leaving
the current tests that we have in our kata metrics.

Fixes #8017

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 36c8cd6f1f)
2023-10-06 15:15:19 +02:00
Gabriela Cervantes
c28a0a03f0 metrics: Fix C-Ray documentation
This PR fixes the C-Ray documentation for kata metrics.

Fixes #8052

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 4815736820)
2023-10-06 15:15:08 +02:00
Fabiano Fidêncio
48a9b4ab13 ci: crio: Trail '\r' from exec_host() output
We've faced this as part of the CI, only happening with the CRI-O tests:
```
 not ok 1 Test readonly volume for pods
 # (from function `exec_host' in file tests_common.sh, line 51,
 #  in test file k8s-file-volume.bats, line 25)
 #   `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
 # [bats-exec-test:38] INFO: k8s configured to use runtimeclass
 # bash: line 1: $'\r': command not found
 #
 # Error from server (NotFound): pods "test-file-volume" not found
```

I must say I didn't dig into figuring out why this is happening, but we
may be safe enough to just strip the '\r', as long as all the tests keep
passing on containerd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ef63d67c41)
2023-10-06 15:14:58 +02:00
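The `$'\r': command not found` error above is typical of command output that ends in a carriage return. A minimal sketch of removing it, assuming a hypothetical `strip_cr` helper (the actual change to `exec_host()` may be written differently):

```shell
# Hypothetical helper, not the project's exec_host() implementation:
# remove every carriage return from stdin.
strip_cr() {
    tr -d '\r'
}

# Simulate output that ends in CRLF, similar to what the CRI-O runs produced.
output="$(printf 'hello\r\n' | strip_cr)"
printf '%s\n' "${output}"
```

Filtering the stream, rather than editing each caller, keeps the fix in one place.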
Fabiano Fidêncio
2de1c8bac2 ci: crio: Enable default capabilities
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order for the tests that access the host to pass.

A huge thanks to Greg Kurz for spotting this and suggesting the fix.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 74c12b2927)
2023-10-06 15:14:52 +02:00
Fabiano Fidêncio
468a3218f5 ci: crio: Pass -y to apt
That was something overlooked during my tests. :-/

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ebaa4fa4c1)
2023-10-06 15:12:41 +02:00
Gabriela Cervantes
73a084a7d4 metrics: Enable latency test in gha run script
This PR enables the latency test for gha run script for kata metrics.

Fixes #8037

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 6776b55d7e)
2023-10-06 15:12:23 +02:00
Fabiano Fidêncio
f2953e6448 ci: k8s: rke2: Use sudo to call systemd
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 07a6e63a6b)
2023-09-21 14:17:21 +02:00
Fabiano Fidêncio
b41fa6d946 ci: k8s: Add a method to install CRI-O
This is based on the official CRI-O documentation[0] and right now we're
making this specific to Ubuntu as that's what we have as runners.

We may want to expand this in the future, but we're good for now.

[0]:
https://github.com/cri-o/cri-o/blob/main/install.md#apt-based-operating-systems

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d7105cf7a4)
2023-09-21 14:17:13 +02:00
Fabiano Fidêncio
67fef9d5c6 ci: k8s: k0s: Allow passing parameters to the k0s installer
We'll need this in order to set up k0s with a different container engine.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 54c0a471b1)
2023-09-21 14:17:09 +02:00
Fabiano Fidêncio
7fc2f7d003 ci: kata-deploy: Add the ability to deploy rke2
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2c908b598c)
2023-09-21 14:16:44 +02:00
Fabiano Fidêncio
59a4b00d29 ci: kata-deploy: Add the ability to deploy k0s
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit eaf6164916)
2023-09-21 14:16:40 +02:00
Fabiano Fidêncio
1a605c33ad ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0015257636)
2023-09-21 14:16:35 +02:00
Fabiano Fidêncio
19ee6c9fd7 ci: kata-deploy: Expand tests to run on k0s / rke2
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit bf2cb02283)
2023-09-21 14:16:32 +02:00
Fabiano Fidêncio
f09c255766 ci: kata-deploy: Export KUBERNETES env var
So we have better control over which flavour of Kubernetes kata-deploy
is expected to be targeting.

This was also done as part of fa62a4c01b,
for the k8s tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 9e1fb8a966)
2023-09-21 14:16:17 +02:00
Fabiano Fidêncio
abe9dc9904 ci: Move deploy_k8s() to gha-run-k8s-common.sh
This will allow us to re-use the function in the kata-deploy tests,
which will come soon.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 09cc0ed438)
2023-09-21 14:16:12 +02:00
Fabiano Fidêncio
ea6489653e ci: Properly set K8S_TEST_UNION
Otherwise only the first test will be executed

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 486fe14c99)
2023-09-21 14:16:03 +02:00
Aurélien Bombo
7892e04dd1 ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum number of characters allowed for the cluster
name.  With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
(cherry picked from commit d9ef1352af)
2023-09-21 14:15:59 +02:00
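Taking only the first letter of a variable can be sketched as follows. The function name, the `kataCI-example-` prefix, and the host type value `normal` are assumptions for illustration; the real resource group name is built from more parts.

```shell
# Hypothetical helper: print the first character of its argument.
# printf '%.1s' prints at most one character of the string.
host_type_initial() {
    printf '%.1s' "$1"
}

# Assumed naming layout, only to show where the letter would end up.
resource_group="kataCI-example-$(host_type_initial "normal")"
printf '%s\n' "${resource_group}"
```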
Aurélien Bombo
882d7d7d89 ci: Create clusters in individual resource groups
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.

This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.

The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.

Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.

Fixes: #7982

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
(cherry picked from commit 68267a3996)
2023-09-21 14:15:54 +02:00
Gabriela Cervantes
b09a3f8f8e metrics: Add parallel bandwidth limit for qemu
This PR adds the parallel bandwidth limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 9aa8d1c917)
2023-09-21 14:15:45 +02:00
Gabriela Cervantes
63e8c38a7a metrics: Enable parallel bandwidth iperf limit
This PR enables the parallel bandwidth iperf limit for kata metrics.

Fixes #7989

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit af59d4bf4a)
2023-09-21 14:15:39 +02:00
Fabiano Fidêncio
f3c42ff5fe nydus: Temporarily skip tests on dragonball
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.

The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
   /rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
   error: Failed to Mount backend
...

Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit aba36ab188)
2023-09-21 14:15:34 +02:00
Fabiano Fidêncio
49c1a37330 nydus: Use kata-${KATA_HYPERVISOR} instead of kata
This will ensure we're testing with the correct runtime, instead of
using the `default` one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b8a8dfcd15)
2023-09-21 14:15:29 +02:00
ChengyuZhu6
65e5bfe9eb tests: nydus: Update nydus tests
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
(cherry picked from commit 2f9c9e2e63)
2023-09-21 14:15:18 +02:00
Fabiano Fidêncio
d9e910702b gha: nydus: Populate run()
And with this we finally enable the nydus tests to run as part of our
GHA CI.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b73bde320d)
2023-09-21 14:15:08 +02:00
Fabiano Fidêncio
33a4427845 gha: nydus: Populate install_dependencies()
Let's have all the dependencies needed for running the nydus tests
installed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b3904a1a30)
2023-09-21 14:15:03 +02:00
Fabiano Fidêncio
70c1c7d868 gha: nydus: Actually install kata when install-kata is called
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.

Now, let's install kata, as expected. :-)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d2b3b67f5d)
2023-09-21 14:14:57 +02:00
Fabiano Fidêncio
30efa3e563 gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
As we've added install_nydus() and install_nydus_snapshotter(), which do
conform with the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.

Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0ec00ad42e)
2023-09-21 14:14:51 +02:00
Fabiano Fidêncio
9ad6000676 tests: nydus: Add timeout to the crictl calls
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 568439c77b)
2023-09-21 14:14:47 +02:00
Fabiano Fidêncio
6d9b8e2437 tests: nydus: Add uid / namespace to the nydus container / sandbox
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"

getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 5ac3b76eb1)
2023-09-21 14:14:41 +02:00
Fabiano Fidêncio
fd5935da9d tests: nydus: Decorate some calls with sudo
Otherwise we cannot properly start the nydus snapshotter, nor properly
kill it after it's been started.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 376574a16c)
2023-09-21 14:14:31 +02:00
Fabiano Fidêncio
4b58777eec tests: nydus: Adapt "source ..." to GHA
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4290fd4b67)
2023-09-21 14:14:24 +02:00
Fabiano Fidêncio
82c531978f tests: nydus: Adapt check to "clh" instead "cloud-hypervisor"
As that's what we've been using as part of the GHA.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit a84efa3e87)
2023-09-21 14:14:20 +02:00
Fabiano Fidêncio
4915605b20 tests: common: Add install_nydus_snapshotter()
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing other dependencies from
GitHub.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 56a14b3950)
2023-09-21 14:14:16 +02:00
Fabiano Fidêncio
8e4180f697 tests: common: Add install_nydus()
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing other dependencies from GitHub.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b6563783e2)
2023-09-21 14:14:11 +02:00
Fabiano Fidêncio
cd6ab3cf07 tests: install_rust: Also install clippy
clippy is used as part of our tests, so it's useful to have it installed
while we're already installing rust.

In case of developers, they also better be using it. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e125775863)
2023-09-21 14:12:50 +02:00
Fabiano Fidêncio
755057c9ed tests: Move install_rust.sh from the tests repo
We'll use it as part of the refactoring we're doing in the static check
tests.

I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6794d4c843)
2023-09-21 14:12:42 +02:00
Fabiano Fidêncio
d3a04b7b8f tests: install_go: Remove tests repo dependency
We can rely on the functions that are now part of the common.bash.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e64508c308)
2023-09-21 14:12:39 +02:00
Fabiano Fidêncio
c18c412db7 tests: Move functions from kata_arch script here
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 11dff731b7)
2023-09-21 14:12:34 +02:00
Fabiano Fidêncio
7c4a0f7fac ci: Use variable size of VMs depending on the tests running
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.

Now, about the commit itself, as we're on the run to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that require the normal Azure instance (D4s_v5)

With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.

Fixes: #7972

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c69a1e33bd)
2023-09-21 14:12:22 +02:00
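A rough sketch of selecting tests by host type is shown below. The host type values (`small`, `normal`, `baremetal`), the variable names, and the placement of `k8s-file-volume.bats` in the small group are assumptions; the three tests in the normal group are the ones this log lists as needing a bigger VM.

```shell
# Hypothetical grouping; not the lists used by the real CI scripts.
small_host_tests="k8s-file-volume.bats"
normal_host_tests="k8s-number-cpus.bats k8s-parallel.bats k8s-scale-nginx.bats"

# Print the tests to run for the given host type.
select_tests() {
    case "$1" in
        small)     printf '%s\n' "${small_host_tests}" ;;
        normal)    printf '%s\n' "${normal_host_tests}" ;;
        # A bare-metal system can run both groups.
        baremetal) printf '%s\n' "${small_host_tests} ${normal_host_tests}" ;;
        *)         printf 'unknown host type: %s\n' "$1" >&2; return 1 ;;
    esac
}

select_tests baremetal
```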
Fabiano Fidêncio
2f280659b1 ci: k8s: Temporarily disable tests that require a bigger VM instance
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI

We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 094b6b2cf8)
2023-09-21 14:11:48 +02:00
Fabiano Fidêncio
fa9dd46041 ci: k8s: Don't set cpu limit request for k8s-inotify test
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 92fff129fd)
2023-09-21 14:10:57 +02:00
Fabiano Fidêncio
767ccb117f ci: Reduce the size of the AKS VMs
We do **not** need a very powerful machine for our tests, as we're not
building anything there.

The instance we switched to (Standard_D2s_v5) still has nested virt
available, as shown here[0], but has half of the amount of vCPUs /
Memory, which should be fine only for running the tests, costing us
basically half of the price[1].

[0]:
https://learn.microsoft.com/en-us/azure/virtual-machines/dv5-dsv5-series
[1]:
https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/#pricing

Fixes: #7955

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit faf98c0623)
2023-09-21 14:10:52 +02:00
Gabriela Cervantes
c78f740854 metrics: Add iperf cpu utilization limit for qemu
This PR adds the iperf cpu utilization limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit cd4fd1292a)
2023-09-21 13:58:28 +02:00
Gabriela Cervantes
73e989c4b1 metrics: Add iperf value for cpu utilization
This PR adds the iperf value for cpu utilization for kata metrics.

Fixes #7936

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit df5cd10ea0)
2023-09-21 13:58:13 +02:00
Jeremi Piotrowski
1c32b31589 tests: Apply timeout to 'ctr t kill'
This task has been observed to hang at times.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit a96050a7ad)
2023-09-21 13:57:02 +02:00
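Bounding a command that may hang can be sketched with the coreutils `timeout` utility. The wrapper name, the 1 second limit, and the `sleep` stand-in for `ctr t kill` are assumptions, not values taken from the commit.

```shell
# Illustrative wrapper around coreutils `timeout`.
run_with_timeout() {
    limit="$1"
    shift
    timeout "${limit}" "$@"
}

# `sleep 5` stands in for a command that hangs; capture the exit status
# without aborting the script.
status=0
run_with_timeout 1 sleep 5 || status=$?
printf 'exit status: %s\n' "${status}"
```

With GNU coreutils, `timeout` exits with status 124 when the limit is reached, which lets a caller tell a hang apart from an ordinary failure.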
Jeremi Piotrowski
1d78871713 tests/vfio: Bump VM image to Fedora 38
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 9d93036783)
2023-09-21 13:56:58 +02:00
Jeremi Piotrowski
b40a42699d tests/vfio: Accept single device in vfio group for CLH
Cloud Hypervisor does not emulate PCIe switches or PCI bridges, so we need to
accept a lonely device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit faee59b520)
2023-09-21 13:56:54 +02:00
Jeremi Piotrowski
82a0225159 tests/vfio: Get rid of sync's
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit df3dc1105c)
2023-09-21 13:56:49 +02:00
Jeremi Piotrowski
a0f59829b2 tests/vfio: Give commands 30s to execute
This is to catch the case of the guest getting stuck.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 9f1a42c6cc)
2023-09-21 13:55:56 +02:00
Jeremi Piotrowski
65943d5b77 tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
This shouldn't be hiding behind only a qemu check, we need this for clh as
well.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit b46b0ecf8b)
2023-09-21 13:55:51 +02:00
Jeremi Piotrowski
01265fb217 tests/vfio: Gather debug info and disable tdp_mmu
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly badly when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.

Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 5f6475a28a)
2023-09-21 13:55:25 +02:00
Jeremi Piotrowski
44f37f689a tests/vfio: Capture journal from vm
For debugging (though this doesn't get exposed yet).

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 8fffdc81c5)
2023-09-21 13:55:19 +02:00
Jeremi Piotrowski
a69d0d1772 tests/vfio: Change to get the test working in GHA
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit df815087e7)
2023-09-21 13:55:14 +02:00
Jeremi Piotrowski
e90027f38c tests/vfio: Move dependency installation to gha-run.sh
To match the flow of other github actions workflows.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit a92ddeea15)
2023-09-21 13:55:08 +02:00
Jeremi Piotrowski
62804d637c gha: vfio: Import jobs scripts from tests repo
This imports the vfio test scripts from github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 5a551a85b1)
2023-09-21 13:50:38 +02:00
Gabriela Cervantes
97283b18b4 metrics: Increase jitter value for qemu
This PR increases the jitter value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 49e2fa189c)
2023-09-21 13:50:31 +02:00
Gabriela Cervantes
3c5bd8c44d metrics: Increase value limit for jitter in clh
This PR increases the value limit for jitter in clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 49234433a7)
2023-09-21 13:50:25 +02:00
Fabiano Fidêncio
6abf513f06 ci: docker: nerdctl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
This will ensure that we're calling the correct binary for the
hypervisor.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 813bfdec01)
2023-09-21 13:50:15 +02:00
Fabiano Fidêncio
9a664ea8bb ci: nerdctl: Create the containerd config
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.

This is mostly needed because the nerdctl-full tarball doesn't provide a
containerd configuration, just the binary, as containerd does not
actually require a configuration file to run with the default config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 46bc0b1c01)
2023-09-21 13:50:10 +02:00
Fabiano Fidêncio
5734c4cbca ci: nerdctl: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without an explicit outbound
connectivity defined.

This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments, after all this is
internet. :-)

Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping.  With this in mind, I'm switching the image we use for the
test and using one that provides nping as a possible entry point, and
from now on (this part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 13968aa7f6)
2023-09-21 13:50:00 +02:00
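A TCP "ping" with nping can be sketched as below. The helper only prints the command line so the example runs without network access; the helper name, the host `example.com`, and the probe count are placeholders, while `--tcp`, `-p` and `-c` are nping options for TCP probe mode, destination port and probe count.

```shell
# Hypothetical helper: print (not run) an nping command that sends TCP
# probes to port 80 of the given host.
tcp_ping_command() {
    host="$1"
    count="${2:-2}"
    printf 'nping --tcp -p 80 -c %s %s\n' "${count}" "${host}"
}

tcp_ping_command example.com
```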
Fabiano Fidêncio
55c8a47a40 ci: docker: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without an explicit outbound
connectivity defined.

This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments, after all this is
internet. :-)

Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping.  With this in mind, I'm switching the image we use for the
test and using one that provides nping as a possible entry point, and
from now on (this part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e0c811678b)
2023-09-21 13:49:54 +02:00
Gabriela Cervantes
31c3d9bd80 metrics: Add iperf bandwidth value for qemu
This PR adds the iperf bandwidth value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 0aa073967d)
2023-09-21 13:49:45 +02:00
Gabriela Cervantes
40ae855f0e metrics: Add iperf bandwidth value for kata metrics
This PR adds the iperf bandwidth value for kata metrics.

Fixes #7924

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 615c1cbf19)
2023-09-21 13:49:35 +02:00
Gabriela Cervantes
deadacd58f metrics: Ensure docker is running in init_env
This PR ensures that docker is running as part of the init_env function
in kata metrics, to avoid failures caused by docker not running, which
make the kata metrics CI fail.

Fixes #7898

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit d53eb73eec)
2023-09-21 13:49:28 +02:00
Gabriela Cervantes
31c33f9c1c metrics: Add Cassandra Metrics documentation
This PR adds the Cassandra Metrics documentation for kata metrics.

Fixes #7922

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit ad08321b83)
2023-09-21 13:49:21 +02:00
David Esparza
0968bf1eb9 metrics: this PR skips the FIO test temporarily to fix issues
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.

Fixes: #7920

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit a58ea66592)
2023-09-21 13:48:24 +02:00
Fabiano Fidêncio
e5e3951398 ci: docker: Also run the smoke test with runc
This will help us to make sure that the failure is actually related to
Kata Containers.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f536ef5ce1)
2023-09-21 13:46:53 +02:00
Fabiano Fidêncio
33430ad60c ci: Add a very basic nerdctl sanity test
Let's add a very basic sanity test to check that we can spawn a
container using nerdctl + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7911

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 12d833d07d)
2023-09-21 13:46:37 +02:00
Fabiano Fidêncio
69dd11f459 ci: Add a very basic docker sanity test
Let's add a very basic sanity test to check that we can spawn a
container using docker + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 348b8644d6)
2023-09-21 13:46:28 +02:00
Gabriela Cervantes
9f510d059b metrics: Remove warning from metrics documentation
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.

Fixes #7894

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 060499dcae)
2023-09-21 13:45:31 +02:00
Fabiano Fidêncio
61b1a99fca gha: Manually rebase PR atop of the target branch before testing
We're changing what's been done as part of ac939c458c, as we've
noticed issues using `github.event.pull_request.merge_commit_sha`.

Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/

In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.
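
The rebase step described above can be sketched locally as follows. This is
only an illustration in a throwaway repository, not the actual workflow
change: the branch names `main` and `pr`, the file names, and the commit
identity are all hypothetical. Full history is available here because the
repository is local, which is what `fetch-depth: 0` provides in CI.

```sh
#!/bin/sh
set -eu

# Hypothetical throwaway repository standing in for the real one.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# One commit on the target branch.
echo base > base.txt
git add base.txt
git commit -q -m "base"

# The PR branch forks here and adds its own commit.
git checkout -q -b pr
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"
pr_sha="$(git rev-parse HEAD)"

# Meanwhile, the target branch moves on.
git checkout -q main
echo newer > newer.txt
git add newer.txt
git commit -q -m "newer"

# Check out the exact PR commit by hash, then rebase it atop the target.
git checkout -q --detach "$pr_sha"
git rebase -q main

# The tested tree now holds the PR change plus the latest target changes.
ls
```

Checking out by commit hash rather than by a merge reference means the tested
code always matches what was last pushed to the PR.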

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit bd24afcf73)
2023-09-21 13:43:49 +02:00
Fabiano Fidêncio
bb5dbfbbce k8s: ci: Skip "Pod quota" test with firecracker
The test is failing, and an issue has been opened to track it.
For now, let's skip it.

Issue:
https://github.com/kata-containers/kata-containers/issues/7873

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 9d74b7ccc9)

 Conflicts:
	tests/integration/kubernetes/k8s-pod-quota.bats
2023-09-21 13:43:16 +02:00
Fabiano Fidêncio
263ed4afd1 ci: k8s: Remove useless skip statement from tests
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.

We're only touching the files here that were touched in the previous
commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f6cd3930c5)
2023-09-21 13:42:24 +02:00
Fabiano Fidêncio
7e135294a7 ci: k8s: Also check for "fc" (for firecracker)
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc` for the runtime class)
instead of `firecracker`.
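
A minimal sketch of accepting both spellings, assuming the hypervisor name is
passed in as a plain string. The helper name `is_firecracker` is hypothetical
and is not taken from the test suite.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: succeeds for either name used for Firecracker.
is_firecracker() {
	case "$1" in
		fc|firecracker) return 0 ;;
		*) return 1 ;;
	esac
}

# Show which hypervisor names would trigger a skip.
for hypervisor in fc firecracker qemu; do
	if is_firecracker "$hypervisor"; then
		echo "$hypervisor: skip"
	else
		echo "$hypervisor: run"
	fi
done
```

Once only `fc` is in use, the `firecracker` alternative can be dropped from
the `case` pattern.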

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3cc20b47a6)
2023-09-21 13:42:17 +02:00
Fabiano Fidêncio
8892d9a7b2 ci: k8s: Add clean-up-garm argument for gha-run.sh
The tests are failing to finish as the argument is invalid.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b5bad3cb0f)
2023-09-21 13:42:06 +02:00
Fabiano Fidêncio
aee6f36c86 ci: k8s: Add a kata-deploy-garm target
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 27fa7d828d)
2023-09-21 13:41:52 +02:00
Fabiano Fidêncio
5bb77b628d ci: k8s: Export KUBERNETES env var
So we have better control over which flavour of kubernetes kata-deploy
is expected to be targeting.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit fa62a4c01b)
2023-09-21 13:41:45 +02:00
Fabiano Fidêncio
9fb291d88a ci: k8s: Wait some time after restarting k3s
Let's put a 1 minute sleep, just to make sure everything is back up
again.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3de23034f8)
2023-09-21 13:41:31 +02:00
David Esparza
053308eefc metrics: fix FIO test initialization
This PR changes the order of operations so that the FIO test first
cleans the environment and then checks that the environment
is indeed clean.

Fixes: #7869

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit adfea55b8f)
2023-09-21 13:41:12 +02:00
Fabiano Fidêncio
89345b6731 ci: k8s: Append, instead of overwrite, the devmapper config
As we were using `tee` without the `-a` (or `--append`) option, the
containerd config would be overwritten, leading to a NotReady state of
the Node.
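
The difference can be reproduced with a temporary file standing in for the
containerd config. The file contents below are made up for illustration.

```sh
#!/bin/sh
set -eu

# Hypothetical stand-in for the containerd config file.
config="$(mktemp)"
printf 'existing = true\n' > "$config"

# Without -a, tee truncates the file before writing.
printf 'devmapper = true\n' | tee "$config" > /dev/null
overwritten="$(cat "$config")"

# Restore the original content, then append with -a.
printf 'existing = true\n' > "$config"
printf 'devmapper = true\n' | tee -a "$config" > /dev/null
appended="$(cat "$config")"

printf 'without -a:\n%s\n\nwith -a:\n%s\n' "$overwritten" "$appended"
rm -f "$config"
```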

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2df183fd99)
2023-09-21 13:41:01 +02:00
Fabiano Fidêncio
bb675f8101 ci: k8s: Decrease k3s sleep from 4 to 2 minutes
It should be plenty, and worked well in local tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 369a8af8f7)
2023-09-21 13:40:56 +02:00
Fabiano Fidêncio
695c7162ef ci: k8s: Use vanilla kubectl with k3s
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```

The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.

Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ada65b988a)
2023-09-21 13:40:51 +02:00
Fabiano Fidêncio
7f865be398 ci: k8s: Ensure k3s is deployed with --write-kubeconfig-mode=644
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.

As --write-kubeconfig-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ad45ab5d33)
2023-09-21 13:40:45 +02:00
Fabiano Fidêncio
7a96d0a589 ci: k8s: Use the proper command for sleep
`wait` waits for a job to complete, not a number of seconds.  Not sure
how I got that wrong in the first place, but it is what it is.
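
A small sketch of the difference, assuming a shell where `date +%s` is
available. `wait` takes process or job IDs as operands, so a number passed
to it is not treated as a duration. With no background jobs it returns
immediately.

```sh
#!/bin/sh
set -eu

# With no background jobs, wait has nothing to wait for.
start="$(date +%s)"
wait
after_wait="$(date +%s)"

# sleep pauses for the given number of seconds.
sleep 2
after_sleep="$(date +%s)"

wait_elapsed=$((after_wait - start))
sleep_elapsed=$((after_sleep - after_wait))
echo "wait took ${wait_elapsed}s, sleep took ${sleep_elapsed}s"
```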

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 028a97e0d5)
2023-09-21 13:40:38 +02:00
Gabriela Cervantes
92fdaf9719 metrics: Use TensorFlow optimized image
This PR replaces the ubuntu image with one that has TensorFlow optimized
for kata metrics.

Fixes #7866

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 3a427795ea)
2023-09-21 13:39:40 +02:00