Commit Graph

45 Commits

Author SHA1 Message Date
Lukáš Doktor
67ee9f3425
ci.ocp: Improve logging of extra new resources
this script relies on temporary subscriptions and won't cleanup any
resources. Let's improve the logging to better describe what resources
were created and how to clean them, if the user needs to do so.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-05-21 11:02:36 +02:00
Lukáš Doktor
32dbc5d2a9
ci.ocp: Use SCRIPT_DIR to allow execution from any folder
We used hardcoded "ci/openshift-ci/cluster" location which expects this
script to be only executed from the root. Let's use SCRIPT_DIR instead
to allow execution from elsewhere eg. by user bisecting a failed CI run.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-05-21 10:30:03 +02:00
Lukáš Doktor
0e4fb62bb4
ci.ocp: Retry first az command as login takes time to propagate
In CI we hit problem where just after `az login` the first `az
network vnet list` command fails due to permission. We see
"insufficient permissions" or "pending permissions", suggesting we should
retry later. Manual tests and successful runs indicate we do have the
permissions, but not immediately after login.

Azure docs suggest using extra `az account set` but still the
propagation might take some time. Add a loop retrying
the first command a few times before declaring failure.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-05-21 10:28:01 +02:00
Lukáš Doktor
c203d7eba6
ci.ocp: Set peer-pods-azure license
We forgot to add the license header when introducing this test.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-05-20 17:03:48 +02:00
Lukáš Doktor
b97b20295b
ci.ocp: Make peer-pods setup executable
set permissions of the peer-pods-azure.sh script to executable

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-05-20 17:03:48 +02:00
Steve Horsman
a6d1dc7df3
Merge pull request #10940 from ldoktor/peer-pods
ci.ocp: Add peer-pods setup script
2025-04-29 15:57:30 +01:00
Lukáš Doktor
bfdf4e7a6a
ci.ocp: Add peer-pods setup script
this script will be used in a new OCP integration pipeline to monitor
basic workflows of OCP+peer-pods.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-04-15 12:13:22 +02:00
Lukáš Doktor
009aa6257b
ci.ocp: Override default runtimeclass CPU resources
some of the e2e tests spawn a lot of workers which are mainly idle, but
the scheduler fails to schedule them due to cpu resource overcommit. For
our testing we are more focused on having actual pods running than the
speed of the scheduled pods so let's increase the amount of schedulable
pods by decreasing the default cpu requests.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-04-02 10:30:40 +02:00
Lukáš Doktor
d708866b2a
ci.ocp: shellcheck various fixes
various manual fixes.

Related to: #10951

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-03-19 12:26:28 +01:00
Lukáš Doktor
02deb1d782
ci: shellcheck SC2248
SC2248 (style): Prefer double quoting even when variables don't contain
special characters, might result in arguments difference, shouldn't in
our cases.

Related to: #10951

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-03-19 12:26:16 +01:00
Lukáš Doktor
6552ac41e0
ci: shellcheck SC2086
SC2086 Double quote to prevent globbing and word splitting, might break
places where we deliberately use word splitting, but we are not using it
here.

Related to: #10951

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-03-19 12:26:08 +01:00
Lukáš Doktor
154a4ddc00
ci: shellcheck SC2292
SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh. This might
result in different handling of globs and some ops which we don't use.

Related to: #10951

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-03-19 12:26:03 +01:00
Lukáš Doktor
667e26036c
ci: shellcheck SC2250
Treat the SC2250 require-variable-braces in CI. There are no functional
changes.

Related to: #10951

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-03-19 12:25:44 +01:00
Wainer Moschetta
8c2d1b374c
Merge pull request #10892 from ldoktor/webhook
ci: Change the way we modify runtimeclass in webhook
2025-03-12 12:32:45 -03:00
stevenhorsman
2df3e5937a ci/openshift-ci: Fix script error
The space was missing before `]`, so fix this and also
swtich to double square brackets and variable braces

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
67bfd4793e shellcheck: Fix shellcheck SC2242
> Can only exit with status 0-255. Other data should be written to stdout/stderr.

Switch exit -1 to exit 1

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:01 +00:00
stevenhorsman
c5ff513e0b shellcheck: Fix shellcheck SC2068
> Double quote array expansions to avoid re-splitting elements

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
58672068ff shellcheck: Fix shellcheck SC2145
> Argument mixes string and array. Use * or separate argument.

- Swap echos for printfs and improve formatting
- Replace $@ with $*
- Split arrays into separate arguments

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
Lukáš Doktor
d0ef78d3a4
ci: Change the way we modify runtimeclass in webhook
previously we used to deploy the webhook and then modified the cm from
our ci/openshift-ci/ script to the desired value, but sometimes it
happens that the webhook pod starts before we modify the cm and keeps
using the default value.

Let's change the approach and modify the deployments in-place. The only
cons is it leaves the git dirty, but since this script is only supposed
to be used in ci it should be safe.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-02-18 11:39:22 +01:00
Lukáš Doktor
2f7d34417a
ci.ocp: Use the official python:3 container for sanity
Fedora F40 removed python3 from the base container, to avoid such issues
let's rely on the latest and greates official python container.

Fixes: #10497

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-11-08 07:16:30 +01:00
Lukáš Doktor
820e000f1c
ci.ocp: Sort images according to git
The quay.io registry returns the tags sorted alphabetically and doesn't
seem to provide a way to sort it by age. Let's use "git log" to get all
changes between the commits and print all tags that were actually
pushed.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-10-01 16:08:00 +02:00
Lukáš Doktor
8355eee9f5
ci: Reorder webhook deployment
in b9d88f74ed the `runtime_class` CM was
added which overrides the one we previously set. Let's reorder our logic
to first deploy webhook and then override the default CM in order to use
the one we really want.

Since we need to change dirs we also have to use realpath to ensure the
files are located well.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-09-24 17:01:28 +02:00
Lukáš Doktor
b08c019003
ci.ocp: Ensure we smoke-test with the right runtime class
we do encourage people to set the KATA_RUNTIME, but it is only used by
the webhook. Let's define it in the main `test.sh` and use it in the
smoke test to ensure the user-defined runtime is smoke-tested rather
than hard-coded kata-qemu one.

Related to: #9804

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-06-20 11:15:02 +02:00
Lukáš Doktor
699376c535
ci.ocp: Switch base to centos-9
Centos8 is EOL and repos are not available anymore. Centos9 contains the
same packages and should do well as a base for testing.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-06-06 09:03:17 +02:00
Lukáš Doktor
f994f79078
ci.ocp: Add steps to reproduce/bisect CI runs
in case the upstream CI fails it's useful to pin-point the PR that
caused the regression. Currently openshift-ci does not allow doing that
from their setup but we can mimic the setup on our infrastructure and
use the available kata-deploy-ci images to find the first failing one.
To help with that add a few helper scripts and a howto.

Fixes: #9228

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-05-16 20:20:05 +02:00
Lukáš Doktor
a556ad7e01
ci.ocp: Document how to run openshift-tests with kata
document the ocp pipeline.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-05-16 20:15:32 +02:00
Lukáš Doktor
ea081bd882
ci.ocp: Add webhook cleanup
cleanup the webhook resources as well.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-05-16 20:15:31 +02:00
Lukáš Doktor
b8382cea88
ci.ocp: Increase the MCP update time
updating the machine config takes even longer than 1200s, use 60m to be
sure everything is updated.

Fixes: #9338

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-04-03 15:01:29 +02:00
Lukáš Doktor
46e62eecb1
ci.ocp: Log the full grepped line rather than the expected msg
we are grepping for an expected message but it might contain extra bits
of information fruitful for later debugging. Let's include it in the
output and the full log in case of an error.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 17:03:46 +01:00
Lukáš Doktor
7ff2eb508e
ci.ocp: Increase the mcp update timeout
we're hitting this timeout quite often, looks like newer OCP takes
longer to reconfigure. Increase the timeout to 1200.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:38:04 +01:00
Lukáš Doktor
cc02329fd1
ci.ocp: Add a cleanup script
This script doesn't serve as a complete cleanup, but it can be used as a
best-effort cleaner between deploying different versions of
kata-containers on the same OCP cluster.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:38:04 +01:00
Lukáš Doktor
b811ee0650
ci.ocp: Allow to override the kata-deploy image
sometimes we want to test a different than the latest image (eg. when
verifying a PR via ghcr images or when bisecting a failure over older
builds). Let's add a KATA_DEPLOY_IMAGE variable for that while keeping
the latest image by default.

Fixes: #9228

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:38:04 +01:00
Lukáš Doktor
2936503b24
ci.ocp: Always replace the kata-deploy image in OCP pipeline
previously we only replaced the image when the previously defined one
matched the "old_img". This is good to avoid modifying developers custom
changes, but it might lead to hard-to-debug issues when the image stays
different. Let's ensure we always replace the image with the one we
asked for.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:38:04 +01:00
Lukáš Doktor
6525c94065
ci.ocp: Add a workaround to optionally enable skip_mount_home
the latest upstream kata-containers requires the skip_mount_home to be
enabled, which is default on OCP 4.14+ but disabled on OCP 4.13-. Let's
use a "WORKAROUND_9206_CRIO" (called by kata-containers GH issue)
variable to allow users to enable this treatement when needed.

Related to: #9206

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:38:04 +01:00
Lukáš Doktor
739d627b4e
ci.ocp: Turn selinux relabel failures into warnings
Instead of failing the pipeline let's proceed with an error message that
selinux setup failed so, in case of a later failure, we know what might
have caused it while keeping the coverage in case of a false setup
issue.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:38:04 +01:00
Lukáš Doktor
76c452d4e0
ci.ocp: Wait for all pods to finish the work
previously we only waited for a random pod to finish the selinux
relabel, which could be error-prone. Let's wait for all of the podst to
contain the expected message.

Increase the timeout to 120s as some pods might take a little bit longer
to finish.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:34:56 +01:00
Lukáš Doktor
f7febd07a0
ci.ocp: Allow to re-apply the selinux workaround
in case we re-apply the selinux workaround or if user had already
existing similar rule the relabel_selinux was failing. Let's allow it to
modify the existing rules as well to avoid such issues.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:02:21 +01:00
Lukáš Doktor
fbbea68f1f
ci.ocp: Ignore selinux setup on non-selinux cluster
improve our selinux workaround to work well on non-selinux clusters.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-12 16:02:20 +01:00
Lukáš Doktor
6fffbaa190
ci.ocp: Backport service-up detection fixes
This backports the:

9060e930caf2d20f413df07778d3ab497493161c

    ci.ocp: Add debug output on HTTP service failure

    these logs are vital to analyze a setup failure.

a10a1e2c9cbc21afc1e80f22b0fb8634d27cbd8d

    ci.ocp: Improve the service-up detection

    waiting for the first response is not sufficient as OCP returns html
    page without error even when the route is not yet established describing
    the issue (why it doesn't reply with 500?). Waiting for the correct
    output should do better.

commits from the kata-containers/tests repo.

Fixes: #8653

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-01 12:04:20 +01:00
Lukáš Doktor
b0b7748f30
ci/openshift-ci: Correct the lib location
correct the lib file locations after the move from
tests->kata-containers repo and add a minimized version of the
".ci/lib.sh" library into the "ci/openshift-ci" as we don't really
utilize all of the features.

Fixes: #8653

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-01-30 19:05:56 +01:00
Lukáš Doktor
4c58478536
ci/openshift-ci: Move openshift-ci from the tests repo
Move the f15be37d9bef58a0128bcba006f8abb3ea13e8da version of scripts
required for openshift-ci from "kata-containers/tests/.ci/openshift-ci"
into "kata-containers/kata-containers/ci/openshift-ci" and required
webhook+libs into "kata-containers/kata-containers/tools/testing" as is
to simplify verification, the different location handling will be added
in following commit.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-01-30 19:05:55 +01:00
Wainer dos Santos Moschetta
a9bebb3169 openshift-ci: switch to CentOS Stream
The build root container is switched from CentOS 8 to Stream 8 as
the former reached EOL.

Fixes #3605
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2022-02-02 19:50:01 -03:00
Wainer dos Santos Moschetta
3669e1b6d9 ci/openshift-ci: delint dockerfiles
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
  - "DL3008 warning: Pin versions in apt get install"
  - "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
  - "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
  - "DL3048 style: Invalid label key"
  - "DL3003 warning: Use WORKDIR to switch to a directory"
  - "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
  - "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"

Fixes #3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-12-21 09:54:44 -05:00
Wainer dos Santos Moschetta
8594f80c0a ci/openshift-ci: Pull centos from registry.centos.org
In order to avoid hit the pull requests limit of docker.io, this changed the
openshift-ci/images/Dockerfile.buildroot dockerfile to pull the centos image
from registry.centos.org.

Fixes #1636

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-08-23 15:21:10 -03:00
Wainer dos Santos Moschetta
9281e56705 ci/openshift-ci: Add build root dockerfile
This adds the dockerfile which is used by the OpenShift CI operator to build
the build root image. It is installed git as it is required by the operator
to clone repositories. The sudo package is also installed because many scripts
relies on the command but it is not installed by tests/.ci/setup_env_centos.sh.

Fixes #1636

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-04-06 13:44:02 -04:00