CI testing guidelines redux

This commit is contained in:
Isaac Hollander McCreery 2016-01-29 16:20:53 -08:00
parent 63b7ba0414
commit b6cb8be2bd
2 changed files with 40 additions and 71 deletions

View File

@ -112,19 +112,54 @@ We are working on implementing clearer partitioning of our e2e tests to make run
- `[Disruptive]`: If a test restarts components that might cause other tests to fail or break the cluster completely, it is labeled `[Disruptive]`. Any `[Disruptive]` test is also assumed to qualify for the `[Serial]` label, but need not be labeled as both. These tests are not run against soak clusters to avoid restarting components.
- `[Flaky]`: If a test is found to be flaky, it receives the `[Flaky]` label until it is fixed. A `[Flaky]` label should be accompanied with a reference to the issue for de-flaking the test, because while a test remains labeled `[Flaky]`, it is not monitored closely in CI. `[Flaky]` tests are by default not run, unless a `focus` or `skip` argument is explicitly given.
- `[Skipped]`: `[Skipped]` is a legacy label that we're phasing out. If a test is marked `[Skipped]`, there should be an issue open to label it properly. `[Skipped]` tests are by default not run, unless a `focus` or `skip` argument is explicitly given.
- `[Feature:...]`: If a test has non-default requirements to run or targets some non-core functionality, and thus should not be run as part of the standard suite, it receives a `[Feature:...]` label, e.g. `[Feature:Performance]` or `[Feature:Ingress]`. `[Feature:...]` tests are not run in our core suites, instead running in custom suites. There are a few use-cases for `[Feature:...]` tests:
- If a feature is experimental or alpha and is not enabled by default due to being incomplete or potentially subject to breaking changes, it should *not* block the merge-queue, and thus should run in some separate test suites owned by the feature owner(s).
- If a feature is in beta or GA, it *should* block the merge-queue. In moving from experimental to beta or GA, tests that are expected to pass by default should simply remove the `[Feature:...]` label, and will be incorporated into our core suites. If tests are not expected to pass by default, (e.g. they require a special environment such as added quota,) they should remain with the `[Feature:...]` label, and the suites that run them should be incorporated into our merge-queue, owned by the Build Cop.
- `[Feature:.+]`: If a test has non-default requirements to run or targets some non-core functionality, and thus should not be run as part of the standard suite, it receives a `[Feature:.+]` label, e.g. `[Feature:Performance]` or `[Feature:Ingress]`. `[Feature:.+]` tests are not run in our core suites, instead running in custom suites. There are a few use-cases for `[Feature:.+]` tests:
- If a feature is experimental or alpha and is not enabled by default due to being incomplete or potentially subject to breaking changes, it does *not* block the merge-queue, and thus should run in some separate test suites owned by the feature owner(s) (see #continuous_integration below).
Finally, `[Conformance]` tests are tests we expect to pass on **any** Kubernetes cluster. The `[Conformance]` label does not supersede any other labels. `[Conformance]` test policies are a work-in-progress; see #18162.
## Adding a New Test
## Continuous Integration
A quick overview of how we run e2e CI on Kubernetes.
### What is CI?
We run a battery of `e2e` tests against `HEAD` of the master branch on a continuous basis, and block merges via the [submit queue](http://submit-queue.k8s.io/) on a subset of those tests if they fail (the subset is defined in the [munger config](https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/submit-queue.go) via the `jenkins-jobs` flag; note we also block on `kubernetes-build` and `kubernetes-test-go` jobs for build and unit and integration tests).
CI results can be found at [ci-test.k8s.io](ci-test.k8s.io), e.g. [ci-test.k8s.io/kubernetes-e2e-gce/10594](ci-test.k8s.io/kubernetes-e2e-gce/10594).
### What runs in CI?
We run all default tests (those that aren't marked `[Flaky]` or `[Feature:.+]`) against GCE and GKE. To minimize the time from regression-to-green-run, we partition tests across different jobs:
- `kubernetes-<provider>` runs all non-`[Slow]`, non-`[Serial]`, non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel.
- `kubernetes-<provider>-slow` runs all `[Slow]`, non-`[Serial]`, non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel.
- `kubernetes-<provider>-serial` runs all `[Serial]` and `[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in serial.
We also run non-default tests if the tests exercise general-availability ("GA") features that require a special environment to run in, e.g. `kubernetes-e2e-gce-scalability` and `kubernetes-kubemark-gce`, which test for Kubernetes performance.
#### Non-default tests
Many `[Feature:.+]` tests we don't run in CI. These tests are for features that are experimental (often in the `experimental` API), and aren't enabled by default.
### Adding a test to CI
As mentioned above, prior to adding a new test, it is a good idea to perform a `-ginkgo.dryRun=true` on the system, in order to see if a behavior is already being tested, or to determine if it may be possible to augment an existing set of tests for a specific use case.
If a behavior does not currently have coverage and a developer wishes to add a new e2e test, navigate to the ./test/e2e directory and create a new test using the existing suite as a guide.
**TODO:** Create a self-documented example which has been disabled, but can be copied to create new tests and outlines the capabilities and libraries used.
TODO(#20357): Create a self-documented example which has been disabled, but can be copied to create new tests and outlines the capabilities and libraries used.
When writing a test, consult #kinds_of_tests above to determine how your test should be marked, (e.g. `[Slow]`, `[Serial]`; remember, by default we assume a test can run in parallel with other tests!).
When first adding a test it should *not* go straight into CI, because failures block ordinary development. A test should only be added to CI after is has been running in some non-CI suite long enough to establish a track record showing that the test does not fail when run against *working* software.
Generally, a feature starts as `experimental`, and will be run in some suite owned by the team developing the feature. If a feature is in beta or GA, it *should* block the merge-queue. In moving from experimental to beta or GA, tests that are expected to pass by default should simply remove the `[Feature:.+]` label, and will be incorporated into our core suites. If tests are not expected to pass by default, (e.g. they require a special environment such as added quota,) they should remain with the `[Feature:.+]` label, and the suites that run them should be incorporated into the [munger config](https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/submit-queue.go) via the `jenkins-jobs` flag.
Occasionally, we'll want to add tests to better exercise features that are already GA. These tests also shouldn't go straight to CI. They should begin by being marked as `[Flaky]` to be run outside of CI, and once a track-record for them is established, they may be promoted out of `[Flaky]`.
### Moving a test out of CI
TODO(ihmccreery) do we want to keep the `[Flaky]` label at all?
## Performance Evaluation

View File

@ -1,66 +0,0 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
<strong>
The latest release of this document can be found
[here](http://releases.k8s.io/release-1.1/docs/devel/testing.md).
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
Kubernetes Commit Queue Testing
===============================
A quick overview of how we add, remove and recycle tests from CI.
## What is CI?
Throughout this document we will refer to CI as any suite of e2e tests that can potentially hold up the submit queue, this means Kubernetes PRs must pass these tests prior to getting merged.
## Adding a test to CI
When first adding a test it should *not* go straight into CI, because failures block ordinary development. A test should only be added to CI after is has been running in some non-CI suite long enought to establish a track record showing that the test does not fail when run against *working* software. A suite named `flaky` exists, and can be overloaded to mean `experimental` and used for this reason (can it really?). In addition to this track record, consider the following as requirements:
* The test must be short (20m?)
* Failures must indicate that the product is unfit (TODO: establish a firmer bar, what I'm trying to say is testing random controller X in CI doesn't help anyone, but maybe it does if controller X is a cluster addon, or our largest customer wants X, or what?)
* Failures must reliably indicate a bug in the product, not a bug in the test
(TODO: is there a parallelism requirement here?)
## Moving a test out of CI
Do *not* move a test to flaky as soon as it starts failing just to clear up the submit queue, this risks introducing more bugs and compounding the problem even further (TODO: or do this? but why). Build cop can use their better judgement to call a test `flaky`. This means it fails for presumably random reasons, once in X runs (TODO: is X == 0?). Move flaky tests out of CI, create a P0/1 bug and try to triage it along to the right person. Adding the `kind/flake` label on github will grab the attention of the grumpy CI shamer bot, which will include the bug in its daily report.
If your test got moved to flaky, it must demonstrate the run of greens required for getting added to CI once again (or nah?).
## Non CI channels for testing
If you want to test against Kubernetes but your test doesn't meet the following requirements, peruse [this list](../../hack/jenkins/e2e.sh) and add your test in whatever makes sense (TODO: What about release-lists, shouldn't they mirror other lists?).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/testing.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->