mirror of https://github.com/kata-containers/kata-containers.git (synced 2025-07-12 22:58:58 +00:00)
ci.ocp: Add steps to reproduce/bisect CI runs

In case the upstream CI fails, it's useful to pinpoint the PR that caused the
regression. Currently openshift-ci does not allow doing that from their setup,
but we can mimic the setup on our infrastructure and use the available
kata-deploy-ci images to find the first failing one. To help with that, add a
few helper scripts and a howto.

Fixes: #9228

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
parent a556ad7e01
commit f994f79078
@@ -8,3 +8,142 @@ There are 2 pipelines, history and logs can be accessed here:

* [main - currently supported OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-e2e-tests)
* [next - currently under development OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-next-e2e-tests)


Running openshift-tests on OCP with kata-containers manually
============================================================

To run openshift-tests (or other suites) with kata-containers one can use
the kata-webhook. To deploy everything you can mimic the CI pipeline by:

```bash
#!/bin/bash -e
# Set up your kubectl and check that the cluster is accessible by
kubectl get nodes
# Deploy kata (set KATA_DEPLOY_IMAGE to override the default kata-deploy-ci:latest image)
./test.sh
# Deploy the webhook
KATA_RUNTIME=kata-qemu cluster/deploy_webhook.sh
```

This should ensure kata-containers as well as kata-webhook are installed and
working. Before running the openshift-tests it's (currently) recommended to
relax some security features by:

```bash
#!/bin/bash -e
oc adm policy add-scc-to-group privileged system:authenticated system:serviceaccounts
oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccounts
oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
```

Now you should be ready to run the openshift-tests. Our CI only uses a subset
of tests; to get the current ``TEST_SKIPS`` see
[the pipeline config](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers).
The following steps require the [openshift tests](https://github.com/openshift/origin)
to be cloned and built in the current directory:

```bash
#!/bin/bash -e
# Define tests to be skipped (see the pipeline config for the current version)
TEST_SKIPS="\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-node\] Variable Expansion should allow substituting values in a volume subpath\|\[k8s.io\] Probing container should be restarted with a docker exec liveness probe with timeout\|\[sig-node\] Pods Extended Pod Container lifecycle evicted pods should be terminal\|\[sig-node\] PodOSRejection \[NodeConformance\] Kubelet should reject pod when the node OS doesn't match pod's OS\|\[sig-network\].*for evicted pods\|\[sig-network\].*HAProxy router should override the route\|\[sig-network\].*HAProxy router should serve a route\|\[sig-network\].*HAProxy router should serve the correct\|\[sig-network\].*HAProxy router should run\|\[sig-network\].*when FIPS.*the HAProxy router\|\[sig-network\].*bond\|\[sig-network\].*all sysctl on whitelist\|\[sig-network\].*sysctls should not affect\|\[sig-network\] pods should successfully create sandboxes by adding pod to network"
# Get the list of tests to be executed (TEST_PROVIDER and TEST_SUITE must be set beforehand)
TESTS="$(./openshift-tests run --dry-run --provider "${TEST_PROVIDER}" "${TEST_SUITE}")"
# Store the list of tests in the /tmp/tsts file
echo "${TESTS}" | grep -v "$TEST_SKIPS" > /tmp/tsts
# Remove pre-existing temporary files as well as previous results
OUT=RESULTS/tmp
rm -Rf /tmp/*test* /tmp/e2e-*
rm -Rf "$OUT"
mkdir -p "$OUT"
# Run the tests ignoring the monitor health checks
./openshift-tests run --provider azure -o "$OUT/job.log" --junit-dir "$OUT" --file /tmp/tsts --max-parallel-tests 5 --cluster-stability Disruptive --run '^\[sig-node\].*|^\[sig-network\]'
```
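
The skip filter above relies on GNU grep treating ``\|`` as alternation in a basic regular expression, so any test whose name matches one of the patterns is dropped from the list. A minimal sketch of that behaviour follows; the ``filter_skipped`` helper and the test names are made up for illustration and are not part of the pipeline:

```bash
#!/bin/bash
# filter_skipped is a hypothetical helper, not part of the CI scripts.
# It reads test names on stdin and drops those matching the skip expression
# given as $1 (GNU grep basic regex, "\|" separates the alternatives).
filter_skipped() {
    grep -v "$1"
}

# Made-up test names used only for illustration
SAMPLE_TESTS='"[sig-node] Pods should be submitted"
"[sig-network] bond interface works"
"[sig-node] Security Context should support seccomp runtime/default"'

# Two alternatives in the same style as TEST_SKIPS
SKIPS='\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-network\].*bond'

# Only the first sample test survives the filter
echo "$SAMPLE_TESTS" | filter_skipped "$SKIPS"
```
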

[!NOTE]
We ignore the cluster stability checks because our public cloud is
not that stable and running with VMs instead of containers results in minor
stability issues. Some of the old monitor stability tests do not respect
the ``--cluster-stability`` setting; one should simply ignore these. If you
get a message like ``invariant was violated`` or ``error: failed due to a
MonitorTest failure``, it usually indicates that only those kinds of
tests failed but the real tests passed. See
[wrapped-openshift-tests.sh](https://github.com/openshift/release/blob/master/ci-operator/config/kata-containers/kata-containers/wrapped-openshift-tests.sh)
for details on how our pipeline deals with that.

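
One way to spot such runs locally is to look for those messages in the job log. The sketch below is only an illustration: the ``has_monitor_failure`` helper and the default log path are assumptions, and the logic actually used by the pipeline lives in ``wrapped-openshift-tests.sh``.

```bash
#!/bin/bash
# has_monitor_failure is a hypothetical helper (not the pipeline's logic).
# Returns 0 when the given log contains one of the monitor-related messages
# quoted above, non-zero otherwise.
has_monitor_failure() {
    grep -q -e 'invariant was violated' \
            -e 'failed due to a MonitorTest failure' "$1"
}

# Inspect the job log written by the run above (the path is an assumption)
LOG="${LOG:-RESULTS/tmp/job.log}"
if [ -f "$LOG" ] && has_monitor_failure "$LOG"; then
    echo "Monitor-related failure reported, check the junit results of the real tests"
fi
```
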

[!TIP]
To compare multiple results locally one can use the
[junit2html](https://github.com/inorton/junit2html) tool.


Best-effort kata-containers cleanup
===================================

If you need to clean up the cluster after testing, you can use the
``cleanup.sh`` script from the current directory. It tries to delete all
resources created by ``test.sh`` as well as ``cluster/deploy_webhook.sh``,
ignoring all failures. The primary purpose of this script is to allow
soft-cleanup after deployment to test different versions without
re-provisioning everything.

[!WARNING]
Do not rely on this script in production; return codes are not checked!


Bisecting e2e test failures
===========================

Let's say the OCP pipeline passed running with
``quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64``
but failed running with
``quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``
and you'd like to know which PR caused the regression. You can either test
all 60 tags in between or you can use the [bisecter](https://github.com/ldoktor/bisecter)
to reduce the number of steps needed.

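
The image references above follow a regular pattern, so a reference for a given merge commit can be composed as in this sketch. The ``kata_deploy_image`` helper is hypothetical, and the tag layout is inferred only from the two examples above:

```bash
#!/bin/bash
# kata_deploy_image is a hypothetical helper; the tag layout
# "kata-containers-<commit>-<arch>" is inferred from the examples above.
kata_deploy_image() {
    commit="$1"
    arch="${2:-amd64}"
    echo "quay.io/kata-containers/kata-deploy-ci:kata-containers-${commit}-${arch}"
}

# Prints the reference of the passing image mentioned above
kata_deploy_image d7afd31fd40e37a675b25c53618904ab57e74ccd
```
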

Before running the bisection you need a reproducer script. A sample one called
``sample-test-reproducer.sh`` is provided in this directory but you might
want to copy and modify it, especially:

* ``OCP_DIR`` - directory where your openshift/release is located (can be exported)
* ``E2E_TEST`` - openshift-test(s) to be executed (can be exported)
* behaviour of SETUP (returning 125 skips the current image tag, returning
  >=128 interrupts the execution, any other non-zero value reports the tag as a failure)
* what should be executed (perhaps running the setup is enough for you or
  you might want to be looking for specific failures...)
* use ``timeout`` to interrupt execution in case you know things should be faster

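
The return-code convention from the list above can be summarised by the following sketch. The ``classify_rc`` helper is hypothetical and only mirrors the rules described here; it is not part of bisecter. Treating ``0`` as a passing tag is an assumption based on the reproducer being expected to pass for a good image.

```bash
#!/bin/bash
# classify_rc is a hypothetical helper mirroring the convention above:
#   0      -> tag is good (assumption: a passing reproducer exits with 0)
#   125    -> tag is skipped
#   >=128  -> the bisection is interrupted
#   other  -> tag is bad
classify_rc() {
    if [ "$1" -eq 0 ]; then
        echo good
    elif [ "$1" -eq 125 ]; then
        echo skip
    elif [ "$1" -ge 128 ]; then
        echo abort
    else
        echo bad
    fi
}

# "timeout" (GNU coreutils) exits with 124 when the time limit is hit,
# which this convention reports as a failure of the tested tag.
rc=0
timeout 1 sleep 5 || rc=$?
classify_rc "$rc"
```
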

Executing that script with the GOOD commit should pass
``./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64``
and fail when executed with the BAD commit
``./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``.

To get the list of all tags in between those two PRs you can use the
``bisect-range.sh`` script:

```bash
./bisect-range.sh d7afd31fd40e37a675b25c53618904ab57e74ccd 9f512c016e75599a4a921bd84ea47559fe610057
```
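
To illustrate what such a range looks like, the sketch below selects everything between two commits from an ordered list of tags and prints it as a comma-separated list of image references. The ``tags_between`` helper and the short sample tags are made up for illustration; the real script queries the registry instead of using a fixed list.

```bash
#!/bin/bash
REPO="quay.io/kata-containers/kata-deploy-ci"

# tags_between is a hypothetical helper: it reads an ordered list of tags on
# stdin and prints those from the first match of GOOD ($1) up to the next
# match of BAD ($2), prefixed with the repo and joined by commas.
tags_between() {
    sed -n -e "/$1/,/$2/p" | sed -e "s@^@$REPO:@" | paste -s -d, -
}

# Made-up tags (real tags contain full commit hashes)
printf '%s\n' \
    kata-containers-aaa111-amd64 \
    kata-containers-bbb222-amd64 \
    kata-containers-ccc333-amd64 \
    kata-containers-ddd444-amd64 |
    tags_between bbb222 ccc333
```
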

[!NOTE]
The tagged images are only built per PR, not for individual commits. See
[kata-deploy-ci](https://quay.io/kata-containers/kata-deploy-ci) to see the
available images.

To find out which PR caused this regression, you can either manually try the
individual images or you can simply execute:

```bash
bisecter start "$(./bisect-range.sh d7afd31fd40 9f512c016)"
OCP_DIR=/path/to/openshift/release bisecter run ./sample-test-reproducer.sh
```
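
The idea behind the bisection can be sketched as a binary search over the ordered list of images, assuming the first image is known to be good, the last one is known to be bad, and a single change introduced the failure. This is only an illustration of the principle with a hypothetical ``first_bad`` helper and a stand-in test command; it is not how bisecter is implemented and it ignores skipped or interrupted steps.

```bash
#!/bin/bash
# first_bad is a hypothetical helper illustrating the principle only.
# $1 is a test command, $2 a comma-separated ordered list of candidates;
# the first candidate is assumed good and the last one bad.
first_bad() {
    check="$1"
    candidates="$(echo "$2" | tr ',' '\n')"
    lo=1                                        # last known good index
    hi=$(( $(echo "$candidates" | wc -l) ))     # first known bad index
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "$(echo "$candidates" | sed -n "${mid}p")"; then
            lo="$mid"
        else
            hi="$mid"
        fi
    done
    echo "$candidates" | sed -n "${hi}p"
}

# Stand-in for the reproducer: candidates numbered 6 and above "fail"
fake_reproducer() {
    [ "$1" -lt 6 ]
}

# Prints 6 after testing only candidates 5, 7 and 6
first_bad fake_reproducer 1,2,3,4,5,6,7,8,9,10
```
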

[!NOTE]
If you use ``KATA_WITH_SYSTEM_QEMU=yes`` you might want to deploy once with
it and skip it for the cleanup. That way you might (in most cases) test
all images with a single MCP update instead of a per-image MCP update.

[!TIP]
You can check the bisection progress during/after execution by running
``bisecter log`` from the current directory. Before starting a new
bisection you need to execute ``bisecter reset``.

24
ci/openshift-ci/bisect-range.sh
Executable file
@@ -0,0 +1,24 @@
#!/bin/bash
# Copyright (c) 2024 Red Hat, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
if [ "$#" -gt 2 ] || [ "$#" -lt 1 ] ; then
    echo "Usage: $0 GOOD [BAD]"
    echo "Prints list of available kata-deploy-ci tags between GOOD and BAD commits (by default BAD is the latest available tag)"
    exit 255
fi
GOOD="$1"
[ -n "$2" ] && BAD="$2"
ARCH=amd64
REPO="quay.io/kata-containers/kata-deploy-ci"

TAGS=$(skopeo list-tags "docker://$REPO")
# Only amd64
TAGS=$(echo "$TAGS" | jq '.Tags' | jq "map(select(endswith(\"$ARCH\")))" | jq -r '.[]')
# Tags since $GOOD (the "\$" keeps the shell from expanding "$$" to its PID)
TAGS=$(echo "$TAGS" | sed -n -e "/$GOOD/,\$p")
# Tags up to $BAD
[ -n "$BAD" ] && TAGS=$(echo "$TAGS" | sed "/$BAD/q")
# Comma separated tags with repo
echo "$TAGS" | sed -e "s@^@$REPO:@" | paste -s -d, -
50
ci/openshift-ci/sample-test-reproducer.sh
Executable file
@@ -0,0 +1,50 @@
#!/bin/bash
# Copyright (c) 2024 Red Hat, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# A sample script to deploy and configure kata-containers on an OCP cluster,
# run E2E_TEST and soft-cleanup afterwards; primarily created for use
# with https://github.com/ldoktor/bisecter

[ "$#" -ne 1 ] && echo "Provide image as the first and only argument" && exit 255
export KATA_DEPLOY_IMAGE="$1"
OCP_DIR="${OCP_DIR:-/path/to/your/openshift/release/}"
E2E_TEST="${E2E_TEST:-'"[sig-node] Container Runtime blackbox test on terminated container should report termination message as empty when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"'}"
KATA_CI_DIR="${KATA_CI_DIR:-$(pwd)}"
export KATA_RUNTIME="${KATA_RUNTIME:-kata-qemu}"

## SETUP
# Deploy kata
SETUP=0
pushd "$KATA_CI_DIR" || { echo "Failed to cd to '$KATA_CI_DIR'"; exit 255; }
./test.sh || SETUP=125
cluster/deploy_webhook.sh || SETUP=125
if [ $SETUP != 0 ]; then
    ./cleanup.sh
    exit "$SETUP"
fi
popd || true
# Disable security
oc adm policy add-scc-to-group privileged system:authenticated system:serviceaccounts
oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccounts
oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline

## TEST EXECUTION
# Run the testing
pushd "$OCP_DIR" || { echo "Failed to cd to '$OCP_DIR'"; exit 255; }
echo "$E2E_TEST" > /tmp/tsts
# Remove pre-existing temporary files as well as previous results
OUT=RESULTS/tmp
rm -Rf /tmp/*test* /tmp/e2e-*
rm -Rf "$OUT"
mkdir -p "$OUT"
# Run the tests ignoring the monitor health checks
./openshift-tests run --provider azure -o "$OUT/job.log" --junit-dir "$OUT" --file /tmp/tsts --max-parallel-tests 5 --cluster-stability Disruptive
RET=$?
popd || true

## CLEANUP
./cleanup.sh
exit "$RET"

@@ -11,6 +11,7 @@ AUFS # Another Union FS
 AWS/AB
 BDF/AB
 CFS/AB
+ci/AB
 CLI/AB
 CNI/AB
 CNM/AB
@@ -33,6 +34,7 @@ gRPC/AB
 GSC/AB
 GVT/AB
 IaaS/B # Infrastructure as a Service
+io/B
 IOMMU/AB
 IoT/AB # Internet of Things
 IOV/AB
@@ -67,6 +67,7 @@ metadata
 microcontroller/AB
 miniOS
 mmap/AB
+MonitorTest/A
 nack/AB
 namespace/ABCD
 netlink
@@ -6,6 +6,7 @@

 Ansible/B
 AppArmor/B
+bisecter/B
 blogbench/B
 BusyBox/B
 Cassandra/B
@@ -62,6 +63,7 @@ Netlify/B
 Nginx/B
 OpenCensus/B
 OpenPGP/B
+openshift/B # lower-case used for some sub-projects
 OpenShift/B
 OpenSSL/B
 OpenStack/B
@@ -1,4 +1,4 @@
-386
+387
 ACPI/AB
 ACS/AB
 API/AB
@@ -90,6 +90,7 @@ MITRE/B
 MacOS/B
 Mellanox/B
 Minikube/B
+MonitorTest/A
 NEMU/AB
 NIC/AB
 NVDIMM/AB
@@ -197,6 +198,7 @@ backend
 backport/ACD
 backtick/AB
 backtrace
+bisecter/B
 blogbench/B
 bootloader/AB
 ccloudvm/B
@@ -204,6 +206,7 @@ centric/B
 cgroup/AB
 checkbox/A
 chipset/AB
+ci/AB
 cnn/B
 codebase
 codecov/B
@@ -255,6 +258,7 @@ init/AB
 initramfs/AB
 initrd/AB
 intel
+io/B
 ioctl/A
 iodepth/A
 ioengine/A
@@ -295,6 +299,7 @@ netns/AB
 nvidia/A
 onwards
 openSUSE/B
+openshift/B
 osbuilder/B
 packagecloud/B
 parallelize/AC