The previous approach was based on the observation that some Prow jobs use the
--report-dir parameter instead of the E2E_REPORT_DIR env variable. Parsing the
command line was necessary to use the --json-report and --junit-report
parameters.
But that is complex and can be avoided by triggering the creation of complete
reports in the E2E test suite. The paths are hard-coded and relative to the
report directory to keep the code simple.
There was a report that k8s-triage started processing more data after
6db4b741dd was merged. It's unclear whether
that was because of the new <report-dir>/ginkgo_report.xml file. To avoid
this potential problem, the reports are now in a "ginkgo" sub-directory.
While at it, error checking gets enhanced:
- Create directories at the start of
the suite and bail out early if that fails.
- *All* e2e suites using the framework do this, not just test/e2e.
- Added missing error checking of truncated JUnit report writing.
From the warning message that ginkgo now emits:
--slow-spec-threshold is deprecated --slow-spec-threshold has been deprecated
and will be removed in a future version of Ginkgo. This feature has proved
to be more noisy than useful. You can use --poll-progress-after, instead, to
get more actionable feedback about potentially slow specs and understand
where they might be getting stuck.
We already use --poll-progress-after.
This is a fix for 104aab81a4: because
the default was not set for E2E_TEST_DEBUG_TOOL, all parameters were always
also passed to the E2E suite.
That wasn't wrong for the parameters so far, but breaks when using something
like --output-dir which is only understood by the CLI.
If the script was called with no arguments, it passed "${@:-}" to the suite,
which expands to one empty argument. That's not right, "${@}" should be used
instead because it expands to nothing when empty.
Most parameters can be passed to both the CLI and the suite, but some
(for example, --ginkgo.slow-spec-threshold) had no effect when only
passed to the suite.
Adding the ability to ignore no schedule flags in testing.
Specifically node.cloudprovider.kubernetes.io/uninitialized:NoSchedule
Fix shellcheck complaint.
The new support in ginkgo for progress reports while a test runs dumps
information about where a test is stuck when it runs too long. This can provide
additional insights into what the test is waiting for.
For the Kubernetes jobs using ginkgo-e2e.sh, such dumps are now enabled after
300 seconds and then get repeated every 20 seconds. The initial delay is
intentionally the same as for warning about a slow test. The rationale is that
such test runtimes are unexpected and may need further information to diagnose
why they are slow.
With -ginkgo.source-root, Ginkgo is able to locate the Kubernetes source code
and display small source code snippets for functions that are related to the
test, determined through a heuristic that assumes that all files under the test
suite are for the tests in it.
Set intercept mode to none will help to reveal more information when the
test hangs,
- https://github.com/onsi/ginkgo/issues/970
or circumvent cases where the code grabbing the stdout/stderr pipe
is not under the framework control and may cause hangs,
- https://github.com/onsi/ginkgo/issues/851
The flag `output-interceptor-mode` is set to `none` as we were trying to
figure out of the rootcase of the test flaky, it's only intended for debugging.
- https://github.com/kubernetes/kubernetes/issues/111086
But this set also has some side effect, since it will turn off stdout/stderr
capture completely, any output to stdout/stderr will be lost.
Now that the root cause is not caused by Ginkgo bump nor how the intercept
mode was set, we'd better to follow the default value.
Signed-off-by: Dave Chen <dave.chen@arm.com>
This applies to all jobs using hack/ginkgo-e2e.sh. This is done because
Spyglass does not render the escape sequences, making test output harder to
read.
It is done here because then we don't need to set GINKGO_NO_COLOR in all the
different Prow job configs.
Default timeout setting has been reduced from `24h` down to `1h` in
Ginkgo V2, but for some long running test this is too short.
How long to abort the test was controlled by the the linux command `timeout`
in V1. e.g. `'timeout -k 30s 150m ...`, and is configured in the file
like `sig-network-misc.yaml`.
Set the timeout manually for Ginkgo V2 to avoid the early aborting.
Signed-off-by: Dave Chen <dave.chen@arm.com>
Some tests have a short timeout for starting the pods (1 minute), but if
those tests happen to be the first ones to run, and the images have to be
pulled, then the test could timeout, especially with larger images. This
commit will allow us to prepull commonly used E2E test images, so this issue
can be avoided.
The image "gcr.io/authenticated-image-pulling/windows-nanoserver:v1" is not a
manifest list, and it is only useful for Windows Server 1809, which means that the
test "should be able to pull from private registry with secret" will fail for
environments with Windows Server 1903, 1909, or any other future version we might
want to test.
This commit adds the the ability to have an alternative private image to pull by
using a configurable docker config file which contains the necessary credentials
needed to pull the image.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Deprecate InfluxDB cluster monitoring
InfluxDB cluster monitoring addon will no longer be supported and will be removed in k8s 1.12.
Default monitoring solution will be changed to `standalone`.
Heapster will still be deployed for backward compatibility of `kubectl top`
```release-note
Stop using InfluxDB as default cluster monitoring
InfluxDB cluster monitoring is deprecated and will be removed in v1.12
```
cc @piosz
In scalability testing influxdb was recently disabled, but we still
trying to execute corresponidng test, as a result it fails all the time.
Skip test if influxdb is disabled.
Transitional part of kubernetes/test-infra#3307, should be eliminated
by kubernetes/test-infra#3330: Allow NODE_INSTANCE_GROUP to be set
before we get here, which eliminates any cluster/gke use if
KUBERNETES_CONFORMANCE_PROVIDER is set to "gke".