doc: Add required jobs info

Add information about what required jobs are, and our initial guidelines for how jobs become eligible to be made required or non-required.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-28 03:42:09 +00:00 · 2024-12-20 14:12:13 +00:00 · 2024-12-20 14:12:13 +00:00 · 7612839640
commit 7612839640
parent 1f728eb906
1 changed file with 66 additions and 9 deletions
--- a/ci/README.md
+++ b/ci/README.md
@@ -41,7 +41,7 @@ responsible for ensuring that:

 ### Jobs that require a maintainer's approval to run

-These are the required tests, and our so-called "CI".  These require a
+These are some of the tests that make up our so-called "CI".  These require a
 maintainer's approval to run as parts of those jobs will be running on "paid
 runners", which are currently using Azure infrastructure.

@@ -77,11 +77,11 @@ them to merely debug issues.

 In the previous section we've mentioned using different runners, now in this section we'll go through each type of runner used.

-- Cost free runners:  Those are the runners provided by Github itself, and
-  those are fairly small machines with virtualization capabilities enabled. 
+- Cost free runners:  Those are the runners provided by GitHub itself, and
+  those are fairly small machines with virtualization capabilities enabled.
 - Azure small instances: Those are runners which have virtualization
  capabilities enabled, 2 CPUs, and 8GB of RAM.  These runners have a "-smaller"
-  suffix to their name. 
+  suffix to their name.
 - Azure normal instances: Those are runners which have virtualization
  capabilities enabled, 4 CPUs, and 16GB of RAM.  These runners are usually
  `garm` ones with no "-smaller" suffix.
@@ -91,7 +91,7 @@ In the previous section we've mentioned using different runners, now in this sec
  runners which will be actually performing the tests must have virtualization
  capabilities and a reasonable amount for CPU and RAM available (at least
  matching the Azure normal instances).
-  
+
 ## Adding new tests

 Before someone decides to add a new test, we strongly recommend them to go
@@ -138,6 +138,63 @@ Following those examples, the community advice during the review, and even
 asking the community directly on Slack are the best ways to get your test
 accepted.

+## Required tests
+
+In our CI we have two categories of jobs, required and non-required:
+- Required jobs must all pass for a PR to be merged normally, and
+  should cover all the core features of Kata Containers that we want
+  to protect from regressions.
+- Non-required jobs are for unstable tests, or for features that are
+  experimental and not fully supported. Ideally these tests would also
+  pass on all PRs, but they don't block merging if they fail, as a
+  failure doesn't necessarily mean the PR's code caused a regression.
+
+### Transitioning between required and non-required status
+
+Required jobs that fail block merging of PRs, so we want to ensure that
+jobs are stable and maintained before we make them required.
+
+The [Kata Containers CI Dashboard](https://kata-containers.github.io/)
+is a useful resource for collecting evidence of job stability.
+At the time of writing, it reports the last ten days of Kata CI nightly
+test results for each job. This isn't perfect, as it doesn't currently
+capture results from PR runs, but it is a good guide to stability.
+
+> [!NOTE]
+> Below are general guidelines about jobs being marked as
+> required/non-required, but they are subject to change and the Kata
+> Architecture Committee may overrule these guidelines at their
+> discretion.
+
+#### Initial marking as required
+
+For new jobs, or jobs that haven't been marked as required recently,
+the criterion for being initially marked as required is ten days of
+passing tests, with no relevant PR failures reported in that time.
+Required jobs also need one or more nominated maintainers who are
+responsible for the stability of their jobs.
+
+> [!NOTE]
+> We don't currently have a good place to record the job maintainers, but
+> once we have one, the intention is to show them on the CI Dashboard so
+> people can easily find who to contact.
+
+#### Expectation of required job maintainers
+
+Because the Kata Containers community has contributors spread around
+the world, a required job that is blocked by infrastructure or test
+issues can have a big impact on everyone's work. As such, the
+expectation is that when a problem with a required job is noticed or
+reported, the maintainers have one working day to acknowledge the issue
+and perform an initial investigation, and then either fix it or get the
+job marked as non-required whilst the investigation and/or fix is done.
+
+#### Re-marking of required status
+
+Once a job has been removed from the required list, it needs two
+consecutive successful nightly test runs before it can be made required
+again.
+
 ## Running tests

 ### Running the tests as part of the CI
@@ -247,7 +304,7 @@ $ git remote add upstream https://github.com/kata-containers/kata-containers
 $ git remote update
 $ git config --global user.email "you@example.com"
 $ git config --global user.name "Your Name"
-$ git rebase upstream/main 
+$ git rebase upstream/main
 ```

 Now copy the `kata-static.tar.xz` into your `kata-containers/kata-artifacts` directory
@@ -261,7 +318,7 @@ $ cp ../kata-static.tar.xz kata-artifacts/
 > If you downloaded the .zip from GitHub you need to uncompress first to see `kata-static.tar.xz`

 And finally run the tests following what's in the yaml file for the test you're
-debugging. 
+debugging.

 In our case, the `run-nerdctl-tests-on-garm.yaml`.

@@ -284,7 +341,7 @@ $ bash tests/integration/nerdctl/gha-run.sh run

 And with this you should've been able to reproduce exactly the same issue found
 in the CI, and from now on you can build your own code, use your own binaries,
-and have fun debugging and hacking! 
+and have fun debugging and hacking!

 ### Debugging a Kubernetes test

@@ -332,7 +389,7 @@ If you want to remove a current self-hosted runner:

 - For each runner there's a "..." menu, where you can just click and the
  "Remove runner" option will show up
-  
+
 ## Known limitations

 As the GitHub actions are structured right now we cannot: Test the addition of a