Commit Graph

11454 Commits

Author SHA1 Message Date
Fabiano Fidêncio
594fcdce56 ci: cri-containerd: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense.  With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 32841827b8)
2023-09-21 14:11:03 +02:00
Fabiano Fidêncio
fa9dd46041 ci: k8s: Don't set cpu limit request for k8s-inotofy test
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 92fff129fd)
2023-09-21 14:10:57 +02:00
Fabiano Fidêncio
767ccb117f ci: Reduce the size of the AKS VMs
We do **not** need a very powerful machine for our tests, as we're not
building anything there.

The instance we switched to (Standard_D2s_v5) still has nested virt
available, as shown here[0], but has half of the amount of vCPUs /
Memory, which should be fine only for running the tests, costing us
basically half of the price[1].

[0]:
https://learn.microsoft.com/en-us/azure/virtual-machines/dv5-dsv5-series
[1]:
https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/#pricing

Fixes: #7955

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit faf98c0623)
2023-09-21 14:10:52 +02:00
Fabiano Fidêncio
054895fcdd ci: cache: For consistency, read all used env vars
Instead of having some of them only being considered if explicitly
passed to the script.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit adc18ecdb1)
2023-09-21 14:10:46 +02:00
Fabiano Fidêncio
5e22a3085b ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
As the environment variables are now being passed down from the GitHub
Actions, let's make sure they're exposed to the container used to build
the kata-deploy binaries, and during the build process we'll be able to
use those to log in and push the artefacts to the OCI registry, using
ORAS.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c7a851efd7)
2023-09-21 14:10:39 +02:00
Fabiano Fidêncio
bda0354491 ci: cache: Export env vars needed to use ORAS
We do the build of our artefacts inside a container image, and we need
to expose some env vars to the container so ORAS can be used there to
push the artefacts we want to cache to ghcr.io.

The env vars we're exposing are:
* ARTEFACT_REGISTRY: The registry where we're going to save the
  artefacts.
* ARTEFACT_REGISTRY_USERNAME: The username to log in to the registry, as
  ORAS does not use the same json file used by docker.
* ARTEFACT_REGISTRY_PASSWORD: The pasword to log in to the the registry,
  as the ORAS does not use the same json file used by docker.
* TARGET_BRANCH: The target branch, which will be part of the tag of the
  artefact, as we may end up caching the artefacts for both main and
  stable branches.

Fixes: #7834 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6bd15a85d5)
2023-09-21 13:58:47 +02:00
Gabriela Cervantes
c78f740854 metrics: Add iperf cpu utilization limit for qemu
This PR adds the iperf cpu utilization limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit cd4fd1292a)
2023-09-21 13:58:28 +02:00
Gabriela Cervantes
73e989c4b1 metrics: Add iperf value for cpu utilization
This PR adds the iperf value for cpu utilization for kata metrics.

Fixes #7936

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit df5cd10ea0)
2023-09-21 13:58:13 +02:00
Jeremi Piotrowski
1c32b31589 tests: Apply timeout to 'ctr t kill'
This task has been observed to hang at times.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit a96050a7ad)
2023-09-21 13:57:02 +02:00
Jeremi Piotrowski
1d78871713 tests/vfio: Bump VM image to Fedora 38
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 9d93036783)
2023-09-21 13:56:58 +02:00
Jeremi Piotrowski
b40a42699d tests/vfio: Accept single device in vfio group for CLH
cloud hypervisor does not emulate pcie switches or pci bridges, so we need to
accept a lonely device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit faee59b520)
2023-09-21 13:56:54 +02:00
Jeremi Piotrowski
82a0225159 tests/vfio: Get rid of sync's
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit df3dc1105c)
2023-09-21 13:56:49 +02:00
Jeremi Piotrowski
a1aed0c78e gha: vfio: Set test timeout to 15m
Sometimes the test gets stuck running commands in the container - need to
investigate why later.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 7211c3dccc)
2023-09-21 13:56:45 +02:00
Jeremi Piotrowski
32be55aa8a packaging: kernel: Enable VIRTIO_IOMMU on x86_64
Cloud Hypervisor exposes a VIRTIO_IOMMU device to the VM when IOMMU support is
enabled. We need to add it to the whitelist because dragonball uses kernel
v5.10 which restricted VIRTIO_IOMMU to ARM64 only.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 1b02f89e4f)

 Conflicts:
	tools/packaging/kernel/kata_config_version
2023-09-21 13:56:34 +02:00
Jeremi Piotrowski
3b5c5bcfa4 runtime: clh: Support enabling iommu
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.

Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.

CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 3a1db7a86b)
2023-09-21 13:56:01 +02:00
Jeremi Piotrowski
a0f59829b2 tests/vfio: Give commands 30s to execute
This is a to catch the case of the guest getting stuck.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 9f1a42c6cc)
2023-09-21 13:55:56 +02:00
Jeremi Piotrowski
65943d5b77 tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
This shouldn't be hiding behind only a qemu check, we need this for clh as
well.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit b46b0ecf8b)
2023-09-21 13:55:51 +02:00
Jeremi Piotrowski
18a8b8df03 runtime: Remove redundant check in checkPCIeConfig
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit bfc93927fb)
2023-09-21 13:55:47 +02:00
Jeremi Piotrowski
d86af5923f runtime: Add test cases for checkPCIeConfig
These test cases shows which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 7c4e73b609)
2023-09-21 13:55:43 +02:00
Jeremi Piotrowski
0a918d0d20 runtime: Check config for supported CLH (cold|hot)_plug_vfio values
The only supported options are hot_plug_vfio=root-port or no-port.
cold_plug_vfio not supported yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit fc51e4b9eb)
2023-09-21 13:55:38 +02:00
Jeremi Piotrowski
86201ace5a runtime: clh: Add hot_plug_vfio entry to config
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 509771e6f5)
2023-09-21 13:55:31 +02:00
Jeremi Piotrowski
01265fb217 tests/vfio: Gather debug info and disable tdp_mmu
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly bad when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.

Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 5f6475a28a)
2023-09-21 13:55:25 +02:00
Jeremi Piotrowski
44f37f689a tests/vfio: Capture journal from vm
For debugging (though this doesn't get exposed yet).

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 8fffdc81c5)
2023-09-21 13:55:19 +02:00
Jeremi Piotrowski
a69d0d1772 tests/vfio: Change to get the test working in GHA
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit df815087e7)
2023-09-21 13:55:14 +02:00
Jeremi Piotrowski
e90027f38c tests/vfio: Move dependency installation to gha-run.sh
To match the flow of other github actions workflows.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit a92ddeea15)
2023-09-21 13:55:08 +02:00
Jeremi Piotrowski
62804d637c gha: vfio: Import jobs scripts from tests repo
This imports the vfio test scripts github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 5a551a85b1)
2023-09-21 13:50:38 +02:00
Gabriela Cervantes
97283b18b4 metrics: Increase jitter value for qemu
This PR increases the jitter value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 49e2fa189c)
2023-09-21 13:50:31 +02:00
Gabriela Cervantes
3c5bd8c44d metrics: Increase value limit for jitter in clh
This PR increases the value limit for jitter in clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 49234433a7)
2023-09-21 13:50:25 +02:00
Fabiano Fidêncio
6abf513f06 ci: docker: nerdtl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
This will ensure that we're calling the correct binary for the
hypervisor.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 813bfdec01)
2023-09-21 13:50:15 +02:00
Fabiano Fidêncio
9a664ea8bb ci: nerdctl: Create the containerd config
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.

This is mostly needed because the nerdctl-full tarball doesn't provide a
contaienrd configuration, just the binary, as contaienrd does not
actually require a configuration file to run with the default config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 46bc0b1c01)
2023-09-21 13:50:10 +02:00
Fabiano Fidêncio
5734c4cbca ci: nerdctl: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.

This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments, after all this is
internet. :-)

Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping.  With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 13968aa7f6)
2023-09-21 13:50:00 +02:00
Fabiano Fidêncio
55c8a47a40 ci: docker: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.

This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments, after all this is
internet. :-)

Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping.  With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e0c811678b)
2023-09-21 13:49:54 +02:00
Gabriela Cervantes
31c3d9bd80 metrics: Add iperf bandwidth value for qemu
This PR adds the iperf bandwidth value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 0aa073967d)
2023-09-21 13:49:45 +02:00
Gabriela Cervantes
40ae855f0e metrics: Add iperf bandwidth value for kata metrics
This PR adds the iperf bandwidth value for kata metrics.

Fixes #7924

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 615c1cbf19)
2023-09-21 13:49:35 +02:00
Gabriela Cervantes
deadacd58f metrics: Ensure docker is running in init_env
This PR ensures that docker is running as part of the init_env function
in kata metrics to avoid failures like docker is not running and making
the kata metrics CI to fail.

Fixes #7898

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit d53eb73eec)
2023-09-21 13:49:28 +02:00
Gabriela Cervantes
31c33f9c1c metrics: Add Cassandra Metrics documentation
This PR adds the Cassandra Metrics documentation for kata metrics.

Fixes #7922

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit ad08321b83)
2023-09-21 13:49:21 +02:00
David Esparza
0968bf1eb9 metrics: this PR skips the FIO test temprarily to fix issues
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.

Fixes: #7920

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit a58ea66592)
2023-09-21 13:48:24 +02:00
Fabiano Fidêncio
e5e3951398 ci: docker: Also run the smoke test with runc
This will help us to make sure that the failure is actually related to
Kata Containers.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f536ef5ce1)
2023-09-21 13:46:53 +02:00
Fabiano Fidêncio
c7147dabce ci: docker: Run the tests after the kata-static is created
There's no reason to wait till the payload is created to run the tests,
as we rely on the tarball, not on the kata-deploy payload.

That was a mistake on my side, and that's already fixed for the nerdctl
tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c83f167c59)
2023-09-21 13:46:46 +02:00
Fabiano Fidêncio
33430ad60c ci: Add a very basic nerdctl sanity test
Let's add a very basic sanity test to check that we can spawn a
containers using nerdctl + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7911

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 12d833d07d)
2023-09-21 13:46:37 +02:00
Fabiano Fidêncio
69dd11f459 ci: Add a very basic docker sanity test
Let's add a very basic sanity test to check that we can spawn a
containers using docker + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 348b8644d6)
2023-09-21 13:46:28 +02:00
Fabiano Fidêncio
fcfa6c6e1a ci: use github.ref_name instead of $GITHUB_REF_NAME
As, regardless of what's mentioned in the documentation, it seems that
$GITHUB_REF_NAME is passed down as a literal string.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f811b064ca)
2023-09-21 13:46:00 +02:00
Fabiano Fidêncio
19d9fd9eb1 ci: Add more target-branch related fixes
The ones for the payload-after-push.yamland ci-nightly.yaml are not that
much important right now, but they're needed for when we start running
those on stable branches as well.

The other ones were missed during
bd24afcf73.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6d795c089e)
2023-09-21 13:45:52 +02:00
Fabiano Fidêncio
fe4247a90c ci: Fix target-branch usage
We missed those one as part of bd24afcf73.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8509c31870)
2023-09-21 13:45:43 +02:00
Gabriela Cervantes
9f510d059b metrics: Remove warning from metrics documentation
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.

Fixes #7894

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 060499dcae)
2023-09-21 13:45:31 +02:00
Fabiano Fidêncio
400418bce0 kata-deploy: Remove curl after it's used
There's no need to keep curl there after the kubectl binary has already
been downloaded.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8b4a0b368f)
2023-09-21 13:45:07 +02:00
Fabiano Fidêncio
1df997c38c kata-deploy: Fix aarch64 image build
Similarly to what's been done for x86_64 -> amd64, we need to do a
aarch64 -> arm64 change in order to be able to download the kubectl
binary.

Fixes: #7861

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 139c7f03ab)
2023-09-21 13:44:54 +02:00
Fabiano Fidêncio
61b1a99fca gha: Manually rebase PR atop of the target branch before testing
We're changing what's been done as part of ac939c458c, as we've
notcied issues using `github.event.pull_request.merge_commit_sha`.

Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/

In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit bd24afcf73)
2023-09-21 13:43:49 +02:00
Fabiano Fidêncio
db563709e3 kata-deploy: Switch to an alpine image
This will make our image smaller, and still ensure it's multi-arch
support.

Fixes: #7861

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 670a8e9c73)
2023-09-21 13:43:35 +02:00
Fabiano Fidêncio
bb5dbfbbce k8s: ci: Skip "Pod quota" test with firecracker
The test is failing, and an issue has been opened to track it.
For now, let's skip it.

Issue:
https://github.com/kata-containers/kata-containers/issues/7873

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 9d74b7ccc9)

 Conflicts:
	tests/integration/kubernetes/k8s-pod-quota.bats
2023-09-21 13:43:16 +02:00