diff --git a/ci/openshift-ci/README.md b/ci/openshift-ci/README.md
index bc44a3ed59..2d01276bad 100644
--- a/ci/openshift-ci/README.md
+++ b/ci/openshift-ci/README.md
@@ -8,3 +8,142 @@ There are 2 pipelines, history and logs can be accessed here:
 
 * [main - currently supported OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-e2e-tests)
 * [next - currently under development OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-next-e2e-tests)
+
+
+Running openshift-tests on OCP with kata-containers manually
+============================================================
+
+To run openshift-tests (or other suites) with kata-containers one can use
+the kata-webhook. To deploy everything you can mimic the CI pipeline by running:
+
+```bash
+#!/bin/bash -e
+# Set up your kubectl and check that the cluster is accessible
+kubectl get nodes
+# Deploy kata (set KATA_DEPLOY_IMAGE to override the default kata-deploy-ci:latest image)
+./test.sh
+# Deploy the webhook
+KATA_RUNTIME=kata-qemu cluster/deploy_webhook.sh
+```
+
+This should ensure kata-containers as well as kata-webhook are installed and
+working. Before running the openshift-tests it's (currently) recommended to
+relax some security restrictions by running:
+
+```bash
+#!/bin/bash -e
+oc adm policy add-scc-to-group privileged system:authenticated system:serviceaccounts
+oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccounts
+oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
+```
+
+Now you should be ready to run the openshift-tests. Our CI only uses a subset
+of tests; to get the current ``TEST_SKIPS`` see
+[the pipeline config](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers).
+The following steps require the [openshift tests](https://github.com/openshift/origin)
+to be cloned and built in the current directory:
+
+```bash
+#!/bin/bash -e
+# Define tests to be skipped (see the pipeline config for the current version)
+TEST_SKIPS="\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-node\] Variable Expansion should allow substituting values in a volume subpath\|\[k8s.io\] Probing container should be restarted with a docker exec liveness probe with timeout\|\[sig-node\] Pods Extended Pod Container lifecycle evicted pods should be terminal\|\[sig-node\] PodOSRejection \[NodeConformance\] Kubelet should reject pod when the node OS doesn't match pod's OS\|\[sig-network\].*for evicted pods\|\[sig-network\].*HAProxy router should override the route\|\[sig-network\].*HAProxy router should serve a route\|\[sig-network\].*HAProxy router should serve the correct\|\[sig-network\].*HAProxy router should run\|\[sig-network\].*when FIPS.*the HAProxy router\|\[sig-network\].*bond\|\[sig-network\].*all sysctl on whitelist\|\[sig-network\].*sysctls should not affect\|\[sig-network\] pods should successfully create sandboxes by adding pod to network"
+# Get the list of tests to be executed (TEST_PROVIDER and TEST_SUITE must be set)
+TESTS="$(./openshift-tests run --dry-run --provider "${TEST_PROVIDER}" "${TEST_SUITE}")"
+# Store the list of tests in /tmp/tsts file
+echo "${TESTS}" | grep -v "$TEST_SKIPS" > /tmp/tsts
+# Remove pre-existing temporary files as well as previous results
+OUT=RESULTS/tmp
+rm -Rf /tmp/*test* /tmp/e2e-*
+rm -Rf "$OUT"
+mkdir -p "$OUT"
+# Run the tests ignoring the monitor health checks
+./openshift-tests run --provider azure -o "$OUT/job.log" --junit-dir "$OUT" --file /tmp/tsts --max-parallel-tests 5 --cluster-stability Disruptive --run '^\[sig-node\].*|^\[sig-network\]'
+```
+
+> [!NOTE]
+> Note we are ignoring the cluster stability checks because our public cloud is
+> not that stable and running with VMs instead of containers results in minor
+> stability issues. Some of the old monitor stability tests do not honor
+> the ``--cluster-stability`` setting; one should simply ignore these. If you
+> get a message like ``invariant was violated`` or ``error: failed due to a
+> MonitorTest failure``, it's usually an indication that only those kinds of
+> tests failed but the real tests passed. See
+> [wrapped-openshift-tests.sh](https://github.com/openshift/release/blob/master/ci-operator/config/kata-containers/kata-containers/wrapped-openshift-tests.sh)
+> for details on how our pipeline deals with that.
+
+> [!TIP]
+> To compare multiple results locally one can use the
+> [junit2html](https://github.com/inorton/junit2html) tool.
+
+
+Best-effort kata-containers cleanup
+===================================
+
+If you need to clean up the cluster after testing, you can use the
+``cleanup.sh`` script from the current directory. It tries to delete all
+resources created by ``test.sh`` as well as ``cluster/deploy_webhook.sh``
+ignoring all failures. The primary purpose of this script is to allow
+soft-cleanup after deployment to test different versions without
+re-provisioning everything.
+
+> [!WARNING]
+> **Do not rely on this script in production, return codes are not checked!**
+
+
+Bisecting e2e tests failures
+============================
+
+Let's say the OCP pipeline passed running with
+``quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64``
+but failed running with
+``quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``
+and you'd like to know which PR caused the regression. You can either run with
+all the 60 tags in between or you can utilize the [bisecter](https://github.com/ldoktor/bisecter)
+to optimize the number of steps in between.
+
+Before running the bisection you need a reproducer script. A sample one called
+``sample-test-reproducer.sh`` is provided in this directory but you might
+want to copy and modify it, especially:
+
+* ``OCP_DIR`` - directory where your openshift/release is located (can be exported)
+* ``E2E_TEST`` - openshift-test(s) to be executed (can be exported)
+* behaviour of SETUP (returning 125 skips the current image tag, returning
+  >=128 interrupts the execution, everything else reports the tag as a failure)
+* what should be executed (perhaps running the setup is enough for you or
+  you might want to be looking for specific failures...)
+* use ``timeout`` to interrupt execution in case you know things should be faster
+
+Executing that script with the GOOD commit should pass
+``./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64``
+and fail when executed with the BAD commit
+``./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``.
+
+To get the list of all tags in between those two PRs you can use the
+``bisect-range.sh`` script:
+
+```bash
+./bisect-range.sh d7afd31fd40e37a675b25c53618904ab57e74ccd 9f512c016e75599a4a921bd84ea47559fe610057
+```
+
+> [!NOTE]
+> The tagged images are only built per PR, not for individual commits. See
+> [kata-deploy-ci](https://quay.io/kata-containers/kata-deploy-ci) to see the
+> available images.
+
+To find out which PR caused this regression, you can either manually try the
+individual tags or you can simply execute:
+
+```bash
+bisecter start "$(./bisect-range.sh d7afd31fd40 9f512c016)"
+OCP_DIR=/path/to/openshift/release bisecter run ./sample-test-reproducer.sh
+```
+
+> [!NOTE]
+> If you use ``KATA_WITH_SYSTEM_QEMU=yes`` you might want to deploy once with
+> it and skip it for the cleanup. That way you might (in most cases) test
+> all images with a single MCP update instead of a per-image MCP update.
+
+> [!TIP]
+> You can check the bisection progress during/after execution by running
+> ``bisecter log`` from the current directory. Before starting a new
+> bisection you need to execute ``bisecter reset``.
diff --git a/ci/openshift-ci/bisect-range.sh b/ci/openshift-ci/bisect-range.sh
new file mode 100755
index 0000000000..c894506f1d
--- /dev/null
+++ b/ci/openshift-ci/bisect-range.sh
@@ -0,0 +1,24 @@
+#!/bin/bash
+# Copyright (c) 2024 Red Hat, Inc.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+if [ "$#" -gt 2 ] || [ "$#" -lt 1 ] ; then
+    echo "Usage: $0 GOOD [BAD]"
+    echo "Prints list of available kata-deploy-ci tags between GOOD and BAD commits (by default BAD is the latest available tag)"
+    exit 255
+fi
+GOOD="$1"
+[ -n "$2" ] && BAD="$2"
+ARCH=amd64
+REPO="quay.io/kata-containers/kata-deploy-ci"
+
+TAGS=$(skopeo list-tags "docker://$REPO")
+# Only amd64
+TAGS=$(echo "$TAGS" | jq '.Tags' | jq "map(select(endswith(\"$ARCH\")))" | jq -r '.[]')
+# Tags since $GOOD (the escaped "\$" is sed's last-line address, not the shell PID)
+TAGS=$(echo "$TAGS" | sed -n -e "/$GOOD/,\$p")
+# Tags up to $BAD
+[ -n "$BAD" ] && TAGS=$(echo "$TAGS" | sed "/$BAD/q")
+# Comma separated tags with repo
+echo "$TAGS" | sed -e "s@^@$REPO:@" | paste -s -d, -
diff --git a/ci/openshift-ci/sample-test-reproducer.sh b/ci/openshift-ci/sample-test-reproducer.sh
new file mode 100755
index 0000000000..3d080d80a1
--- /dev/null
+++ b/ci/openshift-ci/sample-test-reproducer.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+# Copyright (c) 2024 Red Hat, Inc.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+# A sample script to deploy and configure kata-containers on an OCP cluster,
+# run E2E_TEST and soft-clean the cluster afterwards; primarily created for use
+# with https://github.com/ldoktor/bisecter
+
+[ "$#" -ne 1 ] && echo "Provide image as the first and only argument" && exit 255
+export KATA_DEPLOY_IMAGE="$1"
+OCP_DIR="${OCP_DIR:-/path/to/your/openshift/release/}"
+E2E_TEST="${E2E_TEST:-'"[sig-node] Container Runtime blackbox test on terminated container should report termination message as empty when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"'}"
+KATA_CI_DIR="${KATA_CI_DIR:-$(pwd)}"
+export KATA_RUNTIME="${KATA_RUNTIME:-kata-qemu}"
+
+## SETUP
+# Deploy kata
+SETUP=0
+pushd "$KATA_CI_DIR" || { echo "Failed to cd to '$KATA_CI_DIR'"; exit 255; }
+./test.sh || SETUP=125
+cluster/deploy_webhook.sh || SETUP=125
+if [ "$SETUP" -ne 0 ]; then
+    ./cleanup.sh
+    exit "$SETUP"
+fi
+popd || true
+# Disable security
+oc adm policy add-scc-to-group privileged system:authenticated system:serviceaccounts
+oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccounts
+oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
+
+## TEST EXECUTION
+# Run the testing
+pushd "$OCP_DIR" || { echo "Failed to cd to '$OCP_DIR'"; exit 255; }
+echo "$E2E_TEST" > /tmp/tsts
+# Remove pre-existing temporary files as well as previous results
+OUT=RESULTS/tmp
+rm -Rf /tmp/*test* /tmp/e2e-*
+rm -Rf "$OUT"
+mkdir -p "$OUT"
+# Run the tests ignoring the monitor health checks
+./openshift-tests run --provider azure -o "$OUT/job.log" --junit-dir "$OUT" --file /tmp/tsts --max-parallel-tests 5 --cluster-stability Disruptive
+RET=$?
+popd || true
+
+## CLEANUP
+"$KATA_CI_DIR/cleanup.sh"
+exit "$RET"
+
diff --git a/tests/cmd/check-spelling/data/acronyms.txt b/tests/cmd/check-spelling/data/acronyms.txt
index b86a95e024..20e2f3de73 100644
--- a/tests/cmd/check-spelling/data/acronyms.txt
+++ b/tests/cmd/check-spelling/data/acronyms.txt
@@ -11,6 +11,7 @@ AUFS # Another Union FS
 AWS/AB
 BDF/AB
 CFS/AB
+ci/AB
 CLI/AB
 CNI/AB
 CNM/AB
@@ -33,6 +34,7 @@ gRPC/AB
 GSC/AB
 GVT/AB
 IaaS/B # Infrastructure as a Service
+io/B
 IOMMU/AB
 IoT/AB # Internet of Things
 IOV/AB
diff --git a/tests/cmd/check-spelling/data/main.txt b/tests/cmd/check-spelling/data/main.txt
index 330bbd1629..e5dbffd1a6 100644
--- a/tests/cmd/check-spelling/data/main.txt
+++ b/tests/cmd/check-spelling/data/main.txt
@@ -67,6 +67,7 @@ metadata
 microcontroller/AB
 miniOS
 mmap/AB
+MonitorTest/A
 nack/AB
 namespace/ABCD
 netlink
diff --git a/tests/cmd/check-spelling/data/projects.txt b/tests/cmd/check-spelling/data/projects.txt
index 211fe07de5..997ce6dc54 100644
--- a/tests/cmd/check-spelling/data/projects.txt
+++ b/tests/cmd/check-spelling/data/projects.txt
@@ -6,6 +6,7 @@
 
 Ansible/B
 AppArmor/B
+bisecter/B
 blogbench/B
 BusyBox/B
 Cassandra/B
@@ -62,6 +63,7 @@ Netlify/B
 Nginx/B
 OpenCensus/B
 OpenPGP/B
+openshift/B # lower-case used for some sub-projects
 OpenShift/B
 OpenSSL/B
 OpenStack/B
diff --git a/tests/cmd/check-spelling/kata-dictionary.dic b/tests/cmd/check-spelling/kata-dictionary.dic
index 16c504c59d..bd40d1aa4e 100644
--- a/tests/cmd/check-spelling/kata-dictionary.dic
+++ b/tests/cmd/check-spelling/kata-dictionary.dic
@@ -1,4 +1,4 @@
-386
+387
 ACPI/AB
 ACS/AB
 API/AB
@@ -90,6 +90,7 @@ MITRE/B
 MacOS/B
 Mellanox/B
 Minikube/B
+MonitorTest/A
 NEMU/AB
 NIC/AB
 NVDIMM/AB
@@ -197,6 +198,7 @@ backend
 backport/ACD
 backtick/AB
 backtrace
+bisecter/B
 blogbench/B
 bootloader/AB
 ccloudvm/B
@@ -204,6 +206,7 @@ centric/B
 cgroup/AB
 checkbox/A
 chipset/AB
+ci/AB
 cnn/B
 codebase
 codecov/B
@@ -255,6 +258,7 @@ init/AB
 initramfs/AB
 initrd/AB
 intel
+io/B
 ioctl/A
 iodepth/A
 ioengine/A
@@ -295,6 +299,7 @@ netns/AB
 nvidia/A
 onwards
 openSUSE/B
+openshift/B
 osbuilder/B
 packagecloud/B
 parallelize/AC