# Benchmarking logging

Any major change to the logging code, whether in Kubernetes or in klog, must be benchmarked before and after the change.

The recent regression https://github.com/kubernetes/kubernetes/issues/107033 shows that we need a way to automatically measure different logging configurations (structured text, JSON with and without split streams) under realistic conditions (time stamping, caller identification). System calls may affect performance, so writing into actual files is useful. A temp dir under /tmp (usually a tmpfs) is used, so the actual IO bandwidth shouldn't affect the outcome. When actual files are available that can be set as `os.Stderr` and `os.Stdout`, the normal `json.Factory` code is used to construct the JSON logger, which keeps the setup as realistic as possible. When the output is discarded instead of written, the focus is more on the rest of the pipeline, and changes there can be investigated more reliably. The benchmarks automatically gather "log entries per second" and "bytes per second", which is useful when considering requirements like the ones from https://github.com/kubernetes/kubernetes/issues/107029.
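
For orientation, the following is a minimal, hypothetical sketch of such a benchmark. It is not the code in this directory: it wires up a JSON logger directly via zap/zapr instead of going through `json.Factory`, and the benchmark name, message, and key/value pairs are made up. It only illustrates the general approach of writing JSON entries with time stamps and caller information into a file under a temp dir and reporting entries and bytes per second in addition to the usual ns/op:

```go
package benchmark_test

import (
	"os"
	"path/filepath"
	"testing"

	"github.com/go-logr/zapr"
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// BenchmarkJSONToFile is a made-up example, not one of the benchmarks in
// this directory. It writes JSON log entries into a file under the
// per-benchmark temp dir and reports throughput metrics.
func BenchmarkJSONToFile(b *testing.B) {
	f, err := os.Create(filepath.Join(b.TempDir(), "output.log"))
	if err != nil {
		b.Fatal(err)
	}
	defer f.Close()

	// JSON encoding with production defaults (time stamp, level, message)
	// plus caller identification, roughly comparable to JSON log output.
	core := zapcore.NewCore(
		zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig()),
		zapcore.AddSync(f),
		zapcore.InfoLevel,
	)
	logger := zapr.NewLogger(zap.New(core, zap.AddCaller()))

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		logger.Info("example message", "pod", "kube-system/coredns", "attempt", i)
	}
	b.StopTimer()

	// Derive "log entries per second" and "bytes per second" from the
	// number of iterations and the size of the generated file.
	info, err := f.Stat()
	if err != nil {
		b.Fatal(err)
	}
	elapsed := b.Elapsed().Seconds()
	b.ReportMetric(float64(b.N)/elapsed, "entries/s")
	b.ReportMetric(float64(info.Size())/elapsed, "bytes/s")
}
```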

## Running the benchmark

```
$ go test -bench=. -benchmem .
```

## Real log data

The files under `data` define test cases for specific aspects of formatting. To
test with a log file that represents output under some kind of real load, copy
the log file into `data/<file name>.log` and run the benchmark as described
above. `-bench=BenchmarkLogging/<file name without .log suffix>` can be used to
benchmark just the new file.

When using `data/v<some number>/<file name>.log`, formatting will be done at
that log level. Symlinks can be created to simulate writing the same log data
at different levels.
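
As a rough illustration of why the directory level matters, a hypothetical replay helper might emit each parsed entry through logr's verbosity API; the `replay` name and signature below are made up, not taken from the benchmark code:

```go
package benchmark_test

import (
	"k8s.io/klog/v2"
)

// replay is a hypothetical helper: it emits one previously parsed log entry
// at the verbosity implied by the data directory (for example, data/v5 -> 5).
// If the logger's -v threshold is below that verbosity, the call returns
// without formatting anything, which is the difference that replaying the
// same data at different levels makes visible.
func replay(logger klog.Logger, verbosity int, msg string, keysAndValues ...interface{}) {
	logger.V(verbosity).Info(msg, keysAndValues...)
}
```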

No such real data is included in the Kubernetes repo because of its size. It can be found in the "artifacts" of the https://testgrid.kubernetes.io/sig-instrumentation-tests#kind-json-logging-master Prow job, under:

- `artifacts/logs/kind-control-plane/containers`
- `artifacts/logs/kind-*/kubelet.log`

With sufficient credentials, gsutil can be used to download everything for a job with:

```
gsutil -m cp -R gs://kubernetes-jenkins/logs/ci-kubernetes-kind-e2e-json-logging/<job ID> .
```

## Analyzing log data

While loading a file, some statistics about it are collected. Those are shown when running with:

```
$ go test -v -bench=. -benchmem .
```