From c8bd3b886b9054665c2ca5fe7b12175eb1ec463c Mon Sep 17 00:00:00 2001
From: Quinton Hoole
Date: Fri, 22 Jan 2016 13:52:53 -0800
Subject: [PATCH] Add document on writing good e2e tests.

---
 docs/devel/e2e-tests.md              |   2 +
 docs/devel/writing-good-e2e-tests.md | 264 +++++++++++++++++++++++++++
 2 files changed, 266 insertions(+)
 create mode 100644 docs/devel/writing-good-e2e-tests.md

diff --git a/docs/devel/e2e-tests.md b/docs/devel/e2e-tests.md
index 0c75af705c2..5e44f07d402 100644
--- a/docs/devel/e2e-tests.md
+++ b/docs/devel/e2e-tests.md
@@ -42,6 +42,8 @@ The e2e tests in kubernetes are built atop of [Ginkgo](http://onsi.github.io/gin
 
 The purpose of *this* document is to serve as a primer for developers who are looking to execute or add tests using a local development environment.
 
+Before writing new tests or making substantive changes to existing tests, you should also read [Writing Good e2e Tests](writing-good-e2e-tests.md)
+
 ## Building and Running the Tests
 
 There are a variety of ways to run e2e tests, but we aim to decrease the number of ways to run e2e tests to a canonical way: `hack/e2e.go`.

diff --git a/docs/devel/writing-good-e2e-tests.md b/docs/devel/writing-good-e2e-tests.md
new file mode 100644
index 00000000000..f00b55dc7e0
--- /dev/null
+++ b/docs/devel/writing-good-e2e-tests.md
@@ -0,0 +1,264 @@

PLEASE NOTE: This document applies to the HEAD of the source tree

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version. Documentation for other
releases can be found at [releases.k8s.io](http://releases.k8s.io).

# Writing good e2e tests for Kubernetes #

## Patterns and Anti-Patterns ##

### Goals of e2e tests ###

Beyond the obvious goal of providing end-to-end system test coverage,
there are a few less obvious goals that you should bear in mind when
designing, writing and debugging your end-to-end tests. In
particular, "flaky" tests, which pass most of the time but fail
intermittently for difficult-to-diagnose reasons, are extremely costly
in terms of blurring our regression signals and slowing down our
automated merge queue. Up-front time and effort designing your test
to be reliable is very well spent. Bear in mind that we have hundreds
of tests, each running in dozens of different environments, and if any
test in any test environment fails, we have to assume that we
potentially have some sort of regression. So if a significant number
of tests each fail even 1% of the time, basic statistics dictate that
we will almost never have a "green" regression indicator. Stated
another way, writing a test that is only 99% reliable is just about
useless in the harsh reality of a CI environment. In fact it's worse
than useless, because not only does it fail to provide a reliable
regression indicator, it also costs a lot of subsequent debugging
time and delays merges.

#### Debuggability ####

If your test fails, it should report the reasons for the failure in
its output, in as much detail as possible. "Timeout" is not a useful
error message. "Timed out after 60 seconds waiting for pod xxx to enter
running state, still in pending state" is much more useful to someone
trying to figure out why your test failed and what to do about it.
Specifically,
[assertion](https://onsi.github.io/gomega/#making-assertions) code
like the following generates rather useless errors:

```
Expect(err).NotTo(HaveOccurred())
```

Rather, [annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this:

```
Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated)
```

On the other hand, overly verbose logging, particularly of non-error
conditions, can make it unnecessarily difficult to figure out whether
a test failed and, if so, why. So don't log lots of irrelevant detail
either.

#### Ability to run in non-dedicated test clusters ####

To reduce end-to-end delay and improve resource utilization when
running e2e tests, we try, where possible, to run large numbers of
tests in parallel against the same test cluster. This means that:

1. You should avoid making any assumption (implicit or explicit) that
your test is the only thing running against the cluster. For example,
assuming that your test can run a pod on every node in a cluster is
not safe, as some other tests, running at the same time as yours,
might have saturated one or more nodes in the cluster. Similarly,
running a pod in the system namespace and assuming that this will
increase the count of pods in the system namespace by one is not safe,
as some other test might be creating or deleting pods in the system
namespace at the same time as your test.
If you do legitimately need to write a test like that, make sure to
label it ["\[Serial\]"](e2e-tests.md#kinds_of_tests) so that it is easy
to identify and is not run in parallel with any other tests.
1. You should avoid doing things to the cluster that make it difficult
for other tests to reliably do what they are trying to do at the same
time. For example, rebooting nodes, disconnecting network interfaces,
or upgrading cluster software as part of your test is likely to
violate the assumptions that other tests might have made about a
reasonably stable cluster environment. If you need to write such
tests, please label them as
["\[Disruptive\]"](e2e-tests.md#kinds_of_tests) so that they are easy
to identify and are not run in parallel with other tests.
1. You should avoid making assumptions about the Kubernetes API that
are not part of the API specification, as your tests will break as
soon as these assumptions become invalid. For example, relying on
specific Events, Event reasons or Event messages will make your tests
very brittle.

#### Speed of execution ####

We have hundreds of e2e tests, some of which we run in serial, one
after the other. If each test takes just a few minutes to run, that
very quickly adds up to many, many hours of total execution time. We
try to keep such total execution time down to a few tens of minutes
at most. Therefore, try (very hard) to keep the execution time of
your individual tests below 2 minutes, ideally shorter than that.
Concretely, adding inappropriately long 'sleep' statements or other
gratuitous waits to tests is a killer. If under normal circumstances
your pod enters the running state within 10 seconds, and 99.9% of the
time within 30 seconds, it would be gratuitous to wait 5 minutes for
this to happen. Rather, just fail after 30 seconds, with a clear error
message as to why your test failed (e.g. "Pod x failed to become ready
after 30 seconds; it usually takes 10 seconds"). If you do have a
truly legitimate reason for waiting longer than that, or for writing a
test which takes longer than 2 minutes to run, comment very clearly in
the code why this is necessary, and label the test as
["\[Slow\]"](e2e-tests.md#kinds_of_tests), so that it is easy to
identify and avoid in test runs that are required to complete
timeously (for example those that are run against every code
submission before it is allowed to be merged).
Note that completing within, say, 2 minutes only when the test
passes is not generally good enough. Your test should also fail in a
reasonable time. We have seen tests that, for example, wait up to 10
minutes for each of several pods to become ready. Under good
conditions these tests might pass within a few seconds, but if the
pods never become ready (e.g. due to a system regression) they take a
very long time to fail and typically cause the entire test run to time
out, so that no results are produced. Again, this is a lot less
useful than a test that fails reliably within a minute or two when the
system is not working correctly.
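
To make this concrete, here is a minimal, standard-library-only sketch
of such a bounded wait: poll frequently, give up as soon as the normal
startup time has clearly been exceeded, and fail with a message that
explains what was being waited for and what was last observed. The
`getPodPhase` parameter is a hypothetical stand-in used only to keep
the sketch self-contained; in real tests, prefer the ready-made wait
helpers and shared timeout constants in the e2e utils library
described below.

```go
package e2e

import (
	"fmt"
	"time"
)

// waitForPodRunning is a sketch of a bounded wait: poll frequently,
// stop well before the global test timeout, and return an error that
// says what was being waited for and what was last observed.
//
// getPodPhase is a hypothetical stand-in for a real client call,
// passed in so that the sketch compiles on its own.
func waitForPodRunning(podName string, getPodPhase func(string) (string, error)) error {
	const (
		pollInterval = 2 * time.Second
		timeout      = 30 * time.Second // pods normally start in ~10s; don't wait 5 minutes
	)
	deadline := time.Now().Add(timeout)
	lastPhase := "unknown"
	for time.Now().Before(deadline) {
		phase, err := getPodPhase(podName)
		if err == nil && phase == "Running" {
			return nil
		}
		if err == nil {
			lastPhase = phase
		}
		time.Sleep(pollInterval)
	}
	return fmt.Errorf("pod %s failed to become Running within %v (last observed phase: %s; normally takes ~10s)",
		podName, timeout, lastPhase)
}
```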

#### Resilience to relatively rare, temporary infrastructure glitches or delays ####

Remember that your test will be run many thousands of times, at
different times of day and night, probably on different cloud
providers, under different load conditions. And often the underlying
state of these systems is stored in eventually consistent data stores.
So, for example, if a resource creation request is theoretically
asynchronous, even if you observe it to be practically synchronous
most of the time, write your test to assume that it is asynchronous
(e.g. make the "create" call, and poll or watch the resource until it
is in the correct state before proceeding). Similarly, don't assume
that API endpoints are 100% available. They're not. Under high load
conditions, API calls might temporarily fail or time out. In such
cases it is appropriate to back off and retry a few times before
failing your test completely (in which case make the error message
very clear about what happened, e.g. "Retried http://... 3 times - all
failed with xxx"). Use the standard retry mechanisms provided in the
libraries detailed below.
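
As a rough illustration of that back-off-and-retry pattern, here is a
standard-library-only sketch (the function name and error format are
illustrative, not an existing helper); in practice you should reach
for the standard retry mechanisms in the e2e utils library rather than
rolling your own:

```go
package e2e

import (
	"fmt"
	"strings"
	"time"
)

// retryWithBackoff is a sketch of retrying a flaky operation (e.g. an
// API call under heavy load) a small, bounded number of times with
// increasing delays, and failing with an error that records every
// attempt so the eventual test failure is easy to debug.
func retryWithBackoff(description string, attempts int, op func() error) error {
	backoff := 1 * time.Second
	var failures []string
	for i := 0; i < attempts; i++ {
		err := op()
		if err == nil {
			return nil
		}
		failures = append(failures, fmt.Sprintf("attempt %d: %v", i+1, err))
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2 // simple exponential backoff between attempts
		}
	}
	return fmt.Errorf("%s: retried %d times - all failed: %s",
		description, attempts, strings.Join(failures, "; "))
}
```

Bounding the number of attempts keeps the test's failure time short,
and recording every attempt in the returned error preserves the
debuggability discussed earlier.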

### Some concrete tools at your disposal ###

Obviously most of the above goals apply to many tests, not just yours.
So we've developed a set of reusable test infrastructure, libraries
and best practices to help you to do the right thing, or at least do
the same thing as other tests, so that if that turns out to be the
wrong thing, it can be fixed in one place, not hundreds, to be the
right thing.

Here are a few pointers:

+ [E2e Framework](../../test/e2e/framework.go):
  Familiarise yourself with this test framework and how to use it.
  Amongst other things, it automatically creates uniquely named
  namespaces within which your tests can run to avoid name clashes,
  and reliably automates cleaning up the mess after your test has
  completed (it just deletes everything in the namespace). This helps
  to ensure that tests do not leak resources. Note that deleting a
  namespace (and by implication everything in it) is currently an
  expensive operation. So the fewer resources you create, the less
  cleaning up the framework needs to do, and the faster your test (and
  other tests running concurrently with yours) will complete. Your
  tests should always use this framework. Trying other home-grown
  approaches to avoiding name clashes and resource leaks has proven
  to be a very bad idea.
+ [E2e utils library](../../test/e2e/util.go):
  This handy library provides tons of reusable code for a host of
  commonly needed test functionality, including waiting for resources
  to enter specified states, safely and consistently retrying failed
  operations, usefully reporting errors, and much more. Make sure
  that you're familiar with what's available there, and use it.
  Likewise, if you come across a generally useful mechanism that's
  not yet implemented there, add it so that others can benefit from
  your brilliance. In particular, pay attention to the variety of
  timeout and retry related constants at the top of that file. Always
  try to reuse these constants rather than dreaming up your own
  values. Even if the values there are not precisely what you would
  like to use (timeout periods, retry counts etc.), the benefit of
  having them be consistent and centrally configurable across our
  entire test suite typically outweighs your personal preferences.
+ **Follow the examples of stable, well-written tests:** Some of our
  existing end-to-end tests are better written and more reliable than
  others. A few examples of well-written tests include:
  [Replication Controllers](../../test/e2e/rc.go),
  [Services](../../test/e2e/service.go),
  [Reboot](../../test/e2e/reboot.go).
+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the
  test library and runner upon which our e2e tests are built. Before
  you write or refactor a test, read the docs and make sure that you
  understand how it works. In particular, be aware that every test is
  uniquely identified and described (e.g. in test reports) by the
  concatenation of its `Describe` clause and nested `It` clauses (a
  short sketch at the end of this document shows how these strings
  compose). So, for example, `Describe("Pods", ...)... It("should be
  scheduled with cpu and memory limits")` produces a sane test
  identifier and descriptor, `Pods should be scheduled with cpu and
  memory limits`, which makes it clear what's being tested, and hence
  what's not working if it fails. Other good examples include:

```
CAdvisor should be healthy on every node
```

and

```
Daemon set should run and stop complex daemon
```

By contrast (these are real examples), the following are less useful
test descriptors:

```
KubeProxy should test kube-proxy
```

and

```
Nodes [Disruptive] Network when a node becomes unreachable
[replication controller] recreates pods scheduled on the
unreachable node AND allows scheduling of pods on a node after
it rejoins the cluster
```

An improvement might be:

```
Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive]
```

Note that opening issues for specific better tooling is welcome, and
code implementing that tooling is even more welcome :-).
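
For reference, here is the short Ginkgo sketch referred to above,
showing how the `Describe` and `It` strings are concatenated into the
test descriptor that appears in reports. The spec body is elided, and
labels such as `[Slow]`, `[Serial]` or `[Disruptive]` are simply
included in these strings when needed:

```go
package e2e

import (
	. "github.com/onsi/ginkgo"
)

// Reported in test output and reports as:
//   Pods should be scheduled with cpu and memory limits
var _ = Describe("Pods", func() {
	It("should be scheduled with cpu and memory limits", func() {
		// Create the pod, then wait for it with a bounded timeout,
		// failing with a clear, annotated error if it never becomes ready.
	})
})
```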