WaitForPods is now a generic function which lists pods and then checks the pods
that it found against some provided condition. A parameter determines how many
pods must be found resp. match the condition for the check to succeed.
The code becomes simpler (78 insertions, 91 deletions), easier to read (all
code entirely inside WaitForPodsRunningReady, no need to declare and later
overwrite variables) and possibly more correct (if all API calls failed,
the resulting error was ignored when allowedNotReadyPods > 0).
None of the users of the functions passed anything other than nil or an empty
map and the implementation ignore the parameter - it seems like a candidate for
simplification.
When a Gomega failure is converted to an error, the stack at the time when the
failure occurs may be useful: error wrapping provides some bread crumbs that
can be followed to determine where the failure really occurred, but error
wrapping may be missing or ambiguous.
To provide the additional information, a FailureError now includes a full stack
backtrace. The backtrace intentionally makes no attempt to exclude framework
functions besides the gomega support itself because helpers like
e2e/framework/pod may be relevant.
That backtrace is not included in the failure message for the sake of
brevity. Instead, it gets logged as part of the test's output.
gomega.Eventually provides better progress reports: instead of filling up the
log with rather useless one-line messages that are not enough to to understand
the current state, it integrates with Gingko's progress reporting (SIGUSR1,
--poll-progress-after) and then dumps the same complete failure message as
after a timeout. That makes it possible to understand why progress isn't
getting made without having to wait for the timeout.
The other advantage is that the failure message for some unexpected pod state
becomes more readable: instead of encapsulating it as "observed object" inside
an error, it directly gets rendered by gomega.
Calling gomega.Expect/Eventually/Consistently deep inside a helper call chain
has several challenges:
- the stack offset must be tracked correctly, otherwise the callstack
for the failure starts at some helper code, which is often not informative
- augmenting the failure message with additional information from each
caller implies that each caller must pass down a string and/or format
string plus arguments
Both challenges can be solved by returning errors:
- the stacktrace is taken at that level where the error is
treated as a failure instead of passing back an error, i.e.
inside the It callback
- traditional error wrapping can add additional information, if
desirable
What was missing was some easy way to generate an error via a gomega
assertion. The new infrastructure achieves that by mirroring the
Gomega/Assertion/AsyncAssertion interfaces with errors as return values instead
of calling a fail handler.
It is intentionally less flexible than the gomega APIs:
- A context must be passed to Eventually/Consistently as first
parameter because that is needed for proper timeout handling.
- No additional text can be added to the failure through this
API because error wrapping is meant to be used for this.
- No need to adjust the callstack offset because no backtrace
is recorded when a failure occurs.
To avoid the useless "unexpected error" log message when passing back a gomega
failure, ExpectNoError gets extended to recognize such errors and then skips
the logging.
There are two runtime class tests which required the container runtime
config to include explicit configuration for `test-handler`. The current
logic skips these tests in non GCE environments. This skip is too strict
since the test is skipped in node e2e environments and in other
environments such as kind, which support running the test and also
configure `test-handler`.
Instead of skipping based on provider, add a new function
`NodeSupportsPreconfiguredRuntimeClassHandler` which examines the
underlying container runtime config and checks if the config includes
`test-handler`. The check is a bit brittle since it assumes container
runtime config paths, but it is a net improvement over skipping the test
entirely on non GCE environments.
This results in the test working in the common test environments, namely
GCE kube-up, node e2e, and kind.
Signed-off-by: David Porter <david@porter.me>
These helper functions can be used in combination with
omega.Eventually/Consistently to implement polling of objects that is aware of
Kubernetes apiserver conventions:
- retry on certain errors instead of giving up,
with "not found" handling decided by the caller (may or may not
be fatal, depending on the test)
- sleep if requested by apiserver
The E2E framework contains several functions which only differ in how they get
name and namespace: from an API object (WaitForPodRunningInNamespace) or as
separate parameters (WaitTimeoutForPodRunningInNamespace).
NamespacedName and the NamedObject interface enable writing helper functions
that can be called with both an API object (like *v1.Pod, which implements
Object and thus NamedObject) and name+namespace string (via
NamespacedName).
The other advantage of NamespacedName is that the order of name and namespace
parameter cannot be mixed up.
NamespacedName was derived from k8s.io/apimachinery/pkg/types.NamespacedName. A
separate type in the framework package was chosen a) to avoid additional
imports in test code and b) because the interface might not be suitable for
k8s.io/apimachinery.
Primarily this protects against accidentally polling with the default interval
of 10ms. Setting these defaults may also make some tests simpler because they
don't need to override the defaults.
Various different tests all have their own poll intervals. As a start towards
consolidating that, the interval from test/e2e/framework/pod (as one of the
most common cases for polling) is moved into the framework.
Changing other helper packages and tests needs to follow.
This consolidates timeout handling. In the future, configuration of all
timeouts via a configuration file might get added. For now, the same three
legacy command line flags for the timeouts that get moved continue to be
supported.
If we were to add new fields in TimeoutContext, the current users of
NewFrameworkWithCustomTimeouts might run into failures unless they get modified
to also set those new fields. This is error-prone.
A better approach is to let users of NewFrameworkWithCustomTimeouts override
fields by setting just those and use the normal defaults for the others.
It doesn't make sense for the E2E framework to have command line options that
don't do anything because then all test suites built with the framework inherit
those options.
For -list-images and -list-conformance-tests the solution is to move the
implementation into the framework (-list-images) respectively move the flag
into test/e2e (-list-conformance-tests).
The placement was decided based on the observation that image patching is
common functionality while conformance testing is specific to one test suite.
The old tests were no longer passing with Ginkgo v2.5.0. Instead of keeping the
old approach of checking recorded spec results, now the tests actually cover
what we care about most: the results recorded in JUnit.
This also gets rid of having to repeat the stack backtrace twice (once as part
of the output, once for the separate backtrace field).
All information that we want will be written into the failure XML element's
data. We don't need the message tag and don't want it because our
tools (kettle, testgrid, spyglass) would then just concatenate the two strings.
This gets implemented for us by Ginkgo. However, truncating the failure message
is not supported there at the moment. It's unclear how important that is,
therefore this (recently added feature) gets removed.
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
The Gingo v2 time suffix is hh:mm:ss without the .xyz sub-second details if the
time stamp happens to land exactly on a second.
This change fixes test flakes like the following:
-STEP: Building a namespace api object, basename test-namespace
+STEP: Building a namespace api object, basename test-namespace 12/13/22 11:43:53
--- FAIL: TestCleanup (36.79s)