metrics: Remove metrics report for Kata Containers

This PR removes the metrics report, which is no longer used
in Kata Containers.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This commit is contained in:
Gabriela Cervantes 2024-09-03 16:11:07 +00:00
parent 057612f18f
commit 5b0ab7f17c
16 changed files with 0 additions and 1796 deletions


@@ -209,7 +209,3 @@ set if necessary.
## `checkmetrics`
`checkmetrics` is a CLI tool to check a metrics CI results file. For further reference see the [`checkmetrics`](cmd/checkmetrics).
## Report generator
See the [report generator](report) documentation.


@@ -1,64 +0,0 @@
# Kata Containers metrics report generator
The files within this directory can be used to generate a "metrics report" for Kata Containers. The
primary workflow consists of two stages:
1) Run the provided report metrics data gathering scripts on the system(s) you wish to analyze.
2) Run the provided report generation script to analyze the data and generate a report file.
## Data gathering
Data gathering is provided by the `grabdata.sh` script. When run, this script executes a set of
tests from the `tests/metrics` directory. The JSON results files will be placed into the
`tests/metrics/results` directory. Once the results are generated, create a suitably named
subdirectory of `tests/metrics/results`, and move the JSON files into it. Repeat this process if
you want to compare multiple sets of results. Note, the report generation scripts process all
subdirectories of `tests/metrics/results` when generating the report.
> **Note:** By default, the `grabdata.sh` script tries to launch some moderately large containers
> (i.e. 8Gbyte RAM) and may fail to produce some results on a memory constrained system.
You can restrict the subset of tests run by `grabdata.sh` via its command line parameters:
| Option | Description |
| ------ | ------------------------|
| -a | Run all tests (default) |
| -d | Run the density tests |
| -h | Print this help |
| -s | Run the storage tests |
| -t | Run the time tests |
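
A typical two-run gathering workflow might look like the following sketch (directory and file names are illustrative; a temporary directory stands in for `tests/metrics`, and the `grabdata.sh` invocation is shown commented out):

```
# Illustrative data-gathering workflow; names are hypothetical.
cd "$(mktemp -d)"               # stand-in for the tests/metrics directory
mkdir -p results
# ./report/grabdata.sh -t       # real run: gather only the 'time' tests
touch results/boot-times.json   # pretend grabdata.sh produced this file
mkdir -p results/kata-run1      # a suitably named subdirectory
mv results/*.json results/kata-run1/
ls results/kata-run1
```

Repeating this with a second subdirectory (e.g. `results/runc-run1`) gives the report generator two data sets to compare.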
## Report generation
Report generation is provided by the `makereport.sh` script. By default this script processes all
subdirectories of the `tests/metrics/results` directory to generate the report. To run in the
default mode, execute the following:
```
$ ./makereport.sh
```
The report generation tool uses [`Rmarkdown`](https://github.com/rstudio/rmarkdown), [R](https://www.r-project.org/about.html) and
[Pandoc](https://pandoc.org/) to produce a PDF report. To avoid the need for all users to set up a
working environment with all the necessary tooling, the `makereport.sh` script utilises a
`Dockerfile` with the environment pre-defined in order to produce the report. Thus, you need to
have Docker installed on your system in order to run the report generation. The resulting
`metrics_report.pdf` is generated into the `output` subdirectory of the `report` directory.
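
Internally, `makereport.sh` tells the R scripts which result subdirectories to process by writing an `Env.R` file into the results directory. A rough sketch of that generation, using hypothetical subdirectory names `kata/` and `runc/`:

```
# Sketch of the Env.R generation done by makereport.sh (names hypothetical).
cd "$(mktemp -d)"
resultdirs="kata/ runc/"        # normally derived via: (cd results; ls -dx */)
resultdirslist=$(echo ${resultdirs} | sed 's/ \+/", "/g')
echo 'inputdir="/inputdir/"'   > Env.R
echo "resultdirs=c("          >> Env.R
echo " \"${resultdirslist}\"" >> Env.R
echo ")"                      >> Env.R
cat Env.R
```

The R scripts then `source` this file inside the container to locate the data directories.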
## Debugging and development
To aid in script development and debugging, the `makereport.sh` script offers a debug facility via
the `-d` command line option. Using this option places you into a `bash` shell inside the
running container used to generate the report, whilst also mapping your host-side `R`
scripts from the `report_dockerfile` subdirectory into the container, thus facilitating a "live"
edit/reload/run development cycle. From there you can examine the container environment, and
execute the generation scripts. E.g., to test the `lifecycle-time.R` script, you can execute:
```
$ makereport.sh -d
# R
> source('/inputdir/Env.R')
> source('/scripts/lifecycle-time.R')
```
You can then edit the `report_dockerfile/lifecycle-time.R` file on the host, and re-run
the `source('/scripts/lifecycle-time.R')` command inside the still running `R` container.


@@ -1,168 +0,0 @@
#!/bin/bash
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Run a set of the metrics tests to gather data to be used with the report
# generator. The general idea is to have the tests configured to generate
# useful, meaningful and repeatable (stable, with minimised variance) results.
# If the tests have to be run more times or for longer to achieve that, then generally
# that is fine - this test is not intended to be quick, it is intended to
# be repeatable.
# Note - no 'set -e' in this file - if one of the metrics tests fails
# then we wish to continue to try the rest.
# Finally, at the end, we explicitly exit with a failure code if necessary.
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_DIR}/../lib/common.bash"
RESULTS_DIR=${SCRIPT_DIR}/../results
# By default we run all the tests
RUN_ALL=1
help() {
usage=$(cat << EOF
Usage: $0 [-h] [options]
Description:
This script gathers a number of metrics for use in the
report generation script. Which tests are run can be
configured on the commandline. Specifically enabling
individual tests will disable the 'all' option, unless
'all' is also specified last.
Options:
-a, Run all tests (default).
-d, Run the density tests.
-h, Print this help.
-s, Run the storage tests.
-t, Run the time tests.
EOF
)
echo "$usage"
}
# Set up the initial state
init() {
metrics_onetime_init
local OPTIND
while getopts "adhst" opt;do
case ${opt} in
a)
RUN_ALL=1
;;
d)
RUN_DENSITY=1
RUN_ALL=
;;
h)
help
exit 0;
;;
s)
RUN_STORAGE=1
RUN_ALL=
;;
t)
RUN_TIME=1
RUN_ALL=
;;
?)
# parse failure
help
die "Failed to parse arguments"
;;
esac
done
shift $((OPTIND-1))
}
run_density_ksm() {
echo "Running KSM density tests"
# Run the memory footprint test - the main test that
# KSM affects. Run for a sufficient number of containers
# (that gives us a fair view of how memory gets shared across
# containers), and a large enough timeout for KSM to settle.
# If KSM has not settled down by then, just take the measurement.
# 'auto' mode should detect when KSM has settled automatically.
bash density/memory_usage.sh 20 300 auto
# Get a measure for the overhead we take from the container memory
bash density/memory_usage_inside_container.sh
}
run_density() {
echo "Running non-KSM density tests"
	# Run the density tests - no KSM, so no need to wait for it to settle.
	# Set a suitably short timeout, and use enough containers to get a
	# good average measurement.
bash density/memory_usage.sh 20 5
}
run_time() {
echo "Running time tests"
# Run the time tests - take time measures for an ubuntu image, over
# 100 'first and only container' launches.
# NOTE - whichever container you test here must support a full 'date'
# command - busybox based containers (including Alpine) will not work.
bash time/launch_times.sh -i public.ecr.aws/ubuntu/ubuntu:latest -n 100
}
run_storage() {
echo "Running storage tests"
bash storage/blogbench.sh
}
# Execute metrics scripts
run() {
pushd "$SCRIPT_DIR/.."
# If KSM is available on this platform, let's run any tests that are
# affected by having KSM on/off first, and then turn it off for the
# rest of the tests, as KSM may introduce some extra noise in the
# results by stealing CPU time for instance.
if [[ -f ${KSM_ENABLE_FILE} ]]; then
# No point enabling and disabling KSM if we have nothing to test.
if [ -n "$RUN_ALL" ] || [ -n "$RUN_DENSITY" ]; then
save_ksm_settings
			trap restore_ksm_settings EXIT QUIT	# note: SIGKILL cannot be trapped
set_ksm_aggressive
run_density_ksm
# And now ensure KSM is turned off for the rest of the tests
disable_ksm
fi
else
echo "No KSM control file, skipping KSM tests"
fi
if [ -n "$RUN_ALL" ] || [ -n "$RUN_TIME" ]; then
run_time
fi
if [ -n "$RUN_ALL" ] || [ -n "$RUN_DENSITY" ]; then
run_density
fi
if [ -n "$RUN_ALL" ] || [ -n "$RUN_STORAGE" ]; then
run_storage
fi
popd
}
finish() {
echo "Now please create a suitably descriptively named subdirectory in"
echo "$RESULTS_DIR and copy the .json results files into it before running"
echo "this script again."
}
init "$@"
run
finish


@@ -1,97 +0,0 @@
#!/bin/bash
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Take the data found in subdirectories of the metrics 'results' directory,
# and turn them into a PDF report. Use a Dockerfile containing all the tooling
# and scripts we need to do that.
set -e
SCRIPT_PATH=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_PATH}/../lib/common.bash"
IMAGE="${IMAGE:-metrics-report}"
DOCKERFILE="${SCRIPT_PATH}/report_dockerfile/Dockerfile"
HOST_INPUT_DIR="${SCRIPT_PATH}/../results"
R_ENV_FILE="${HOST_INPUT_DIR}/Env.R"
HOST_OUTPUT_DIR="${SCRIPT_PATH}/output"
GUEST_INPUT_DIR="/inputdir/"
GUEST_OUTPUT_DIR="/outputdir/"
# If in debugging mode, we also map in the scripts dir so you can
# dynamically edit and re-load them at the R prompt
HOST_SCRIPT_DIR="${SCRIPT_PATH}/report_dockerfile"
GUEST_SCRIPT_DIR="/scripts/"
setup() {
echo "Checking subdirectories"
check_subdir="$(ls -dx ${HOST_INPUT_DIR}/*/ 2> /dev/null | wc -l)"
	if [ "$check_subdir" -eq 0 ]; then
die "No subdirs in [${HOST_INPUT_DIR}] to read results from."
fi
echo "Checking Dockerfile"
check_dockerfiles_images "$IMAGE" "$DOCKERFILE"
	mkdir -p "$HOST_OUTPUT_DIR"
echo "inputdir=\"${GUEST_INPUT_DIR}\"" > ${R_ENV_FILE}
echo "outputdir=\"${GUEST_OUTPUT_DIR}\"" >> ${R_ENV_FILE}
	# A bit of a hack to get an R-syntax list of dirs to process.
	# Also, we need short relative names, not host-side dir paths.
resultdirs="$(cd ${HOST_INPUT_DIR}; ls -dx */)"
resultdirslist=$(echo ${resultdirs} | sed 's/ \+/", "/g')
echo "resultdirs=c(" >> ${R_ENV_FILE}
echo " \"${resultdirslist}\"" >> ${R_ENV_FILE}
echo ")" >> ${R_ENV_FILE}
}
run() {
docker run -ti --rm -v ${HOST_INPUT_DIR}:${GUEST_INPUT_DIR} -v ${HOST_OUTPUT_DIR}:${GUEST_OUTPUT_DIR} ${extra_volumes} ${IMAGE} ${extra_command}
ls -la ${HOST_OUTPUT_DIR}/*
}
help() {
usage=$(cat << EOF
Usage: $0 [-h] [options]
Description:
This script generates a metrics report document
from the results directory one level up in the
directory tree (../results).
Options:
-d, Run in debug (interactive) mode
-h, Print this help
EOF
)
echo "$usage"
}
main() {
local OPTIND
while getopts "d" opt;do
case ${opt} in
d)
# In debug mode, run a shell instead of the default report generation
extra_command="bash"
extra_volumes="-v ${HOST_SCRIPT_DIR}:${GUEST_SCRIPT_DIR}"
;;
?)
# parse failure
help
die "Failed to parse arguments"
;;
esac
done
shift $((OPTIND-1))
setup
run
}
main "$@"


@@ -1,44 +0,0 @@
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Set up an Ubuntu image with the components needed to generate a
# metrics report. That includes:
# - R
# - The R 'tidyverse'
# - pandoc
# - The report generation R files and helper scripts
# Start with the base rocker tidyverse image.
# We would have used the 'verse' base, which already has some of the docs
# processing tooling installed, but I could not figure out how to add the
# extra bits we needed to the lite TeX version it uses.
# Here we pin a tag for the base image instead of using 'latest', to protect
# the build from changes in the latest base image.
FROM rocker/tidyverse:3.6.0
# Version of the Dockerfile
LABEL DOCKERFILE_VERSION="1.2"
# Without this, some of the package installs stop and try to ask interactive questions...
ENV DEBIAN_FRONTEND=noninteractive
# Install the extra doc processing parts we need for our Rmarkdown PDF flow.
RUN apt-get update -qq && \
apt-get install -y --no-install-recommends \
texlive-latex-base \
texlive-fonts-recommended \
latex-xcolor && \
apt-get clean && \
rm -rf /var/lib/apt/lists
# Install the extra R packages we need.
RUN install2.r --error --deps TRUE \
gridExtra \
ggpubr
# Pull in our actual worker scripts
COPY . /scripts
# By default generate the report
CMD ["/scripts/genreport.sh"]


@@ -1,122 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Display details for the 'Device Under Test', for all data sets being processed.
suppressMessages(suppressWarnings(library(tidyr))) # for gather().
library(tibble)
suppressMessages(suppressWarnings(library(plyr))) # rbind.fill
# So we can plot multiple graphs
library(gridExtra) # together.
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable.
suppressMessages(library(jsonlite)) # to load the data.
# A list of all the known results files we might find the information inside.
resultsfiles=c(
"boot-times.json",
"memory-footprint.json",
"memory-footprint-ksm.json",
"memory-footprint-inside-container.json"
)
data=c()
stats=c()
stats_names=c()
# For each set of results
for (currentdir in resultdirs) {
count=1
dirstats=c()
for (resultsfile in resultsfiles) {
fname=paste(inputdir, currentdir, resultsfile, sep="/")
if ( !file.exists(fname)) {
#warning(paste("Skipping non-existent file: ", fname))
next
}
# Derive the name from the test result dirname
datasetname=basename(currentdir)
# Import the data
fdata=fromJSON(fname)
if (length(fdata$'kata-env') != 0 ) {
# We have kata-runtime data
dirstats=tibble("Run Ver"=as.character(fdata$'kata-env'$Runtime$Version$Semver))
dirstats=cbind(dirstats, "Run SHA"=as.character(fdata$'kata-env'$Runtime$Version$Commit))
pver=as.character(fdata$'kata-env'$Proxy$Version)
pver=sub("^[[:alpha:][:blank:]-]*", "", pver)
# uncomment if you want to drop the commit sha as well
#pver=sub("([[:digit:].]*).*", "\\1", pver)
dirstats=cbind(dirstats, "Proxy Ver"=pver)
# Trim the shim string
sver=as.character(fdata$'kata-env'$Shim$Version)
sver=sub("^[[:alpha:][:blank:]-]*", "", sver)
# uncomment if you want to drop the commit sha as well
#sver=sub("([[:digit:].]*).*", "\\1", sver)
dirstats=cbind(dirstats, "Shim Ver"=sver)
# Default QEMU ver string is far too long and noisy - trim.
hver=as.character(fdata$'kata-env'$Hypervisor$Version)
hver=sub("^[[:alpha:][:blank:]]*", "", hver)
hver=sub("([[:digit:].]*).*", "\\1", hver)
dirstats=cbind(dirstats, "Hyper Ver"=hver)
iver=as.character(fdata$'kata-env'$Image$Path)
iver=sub("^[[:alpha:]/-]*", "", iver)
dirstats=cbind(dirstats, "Image Ver"=iver)
kver=as.character(fdata$'kata-env'$Kernel$Path)
kver=sub("^[[:alpha:]/-]*", "", kver)
dirstats=cbind(dirstats, "Guest Krnl"=kver)
dirstats=cbind(dirstats, "Host arch"=as.character(fdata$'kata-env'$Host$Architecture))
dirstats=cbind(dirstats, "Host Distro"=as.character(fdata$'kata-env'$Host$Distro$Name))
dirstats=cbind(dirstats, "Host DistVer"=as.character(fdata$'kata-env'$Host$Distro$Version))
dirstats=cbind(dirstats, "Host Model"=as.character(fdata$'kata-env'$Host$CPU$Model))
dirstats=cbind(dirstats, "Host Krnl"=as.character(fdata$'kata-env'$Host$Kernel))
dirstats=cbind(dirstats, "runtime"=as.character(fdata$test$runtime))
break
} else {
if (length(fdata$'runc-env') != 0 ) {
dirstats=tibble("Run Ver"=as.character(fdata$'runc-env'$Version$Semver))
dirstats=cbind(dirstats, "Run SHA"=as.character(fdata$'runc-env'$Version$Commit))
dirstats=cbind(dirstats, "runtime"=as.character(fdata$test$runtime))
} else {
dirstats=tibble("runtime"="Unknown")
}
break
}
}
if ( length(dirstats) == 0 ) {
warning(paste("No valid data found for directory ", currentdir))
}
# use plyr rbind.fill so we can combine disparate version info frames
stats=rbind.fill(stats, dirstats)
stats_names=rbind(stats_names, datasetname)
}
rownames(stats) = stats_names
# Rotate the tibble so we get data dirs as the columns
spun_stats = as_tibble(cbind(What=names(stats), t(stats)))
# Build us a text table of numerical results
stats_plot = suppressWarnings(ggtexttable(data.frame(spun_stats, check.names=FALSE),
theme=ttheme(base_size=6),
rows=NULL
))
# It may seem odd doing a grid of 1x1, but it should ensure we get a uniform format and
# layout to match the other charts and tables in the report.
master_plot = grid.arrange(
stats_plot,
nrow=1,
ncol=1 )


@@ -1,269 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Display details for `fio` random read storage IO tests.
library(ggplot2) # ability to plot nicely
library(gridExtra) # So we can plot multiple graphs together
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable
suppressMessages(library(jsonlite)) # to load the data
suppressMessages(suppressWarnings(library(tidyr))) # for gather
library(tibble)
testnames=c(
"fio-randread-128",
"fio-randread-256",
"fio-randread-512",
"fio-randread-1k",
"fio-randread-2k",
"fio-randread-4k",
"fio-randread-8k",
"fio-randread-16k",
"fio-randread-32k",
"fio-randread-64k"
)
data2=c()
all_ldata=c()
all_ldata2=c()
stats=c()
rstats=c()
rstats_names=c()
# Where to store up the stats for the tables
read_bw_stats=c()
read_iops_stats=c()
read_lat95_stats=c()
read_lat99_stats=c()
# For each set of results
for (currentdir in resultdirs) {
bw_dirstats=c()
iops_dirstats=c()
lat95_dirstats=c()
lat99_dirstats=c()
# Derive the name from the test result dirname
datasetname=basename(currentdir)
for (testname in testnames) {
fname=paste(inputdir, currentdir, testname, '.json', sep="")
if ( !file.exists(fname)) {
#warning(paste("Skipping non-existent file: ", fname))
next
}
# Import the data
fdata=fromJSON(fname)
# De-ref the test named unique data
fdata=fdata[[testname]]
blocksize=fdata$Raw$'global options'$bs
# Extract the latency data - it comes as a table of percentiles, so
# we have to do a little work...
clat=data.frame(clat_ns=fdata$Raw$jobs[[1]]$read$clat_ns$percentile)
# Generate a clat data set with 'clean' percentile numbers so
# we can sensibly plot it later on.
clat2=clat
colnames(clat2)<-sub("clat_ns.", "", colnames(clat2))
colnames(clat2)<-sub("0000", "", colnames(clat2))
ldata2=gather(clat2)
colnames(ldata2)[colnames(ldata2)=="key"] <- "percentile"
colnames(ldata2)[colnames(ldata2)=="value"] <- "ms"
ldata2$ms=ldata2$ms/1000000 #ns->ms
ldata2=cbind(ldata2, runtime=rep(datasetname, length(ldata2$percentile)))
ldata2=cbind(ldata2, blocksize=rep(blocksize, length(ldata2$percentile)))
# Pull the 95 and 99 percentile numbers for the boxplot
# Plotting all values for all runtimes and blocksizes is just way too
        # noisy to make a meaningful picture, so we use this subset.
# Our values fall more in the range of ms...
pc95data=tibble(percentile=clat$clat_ns.95.000000/1000000)
pc95data=cbind(pc95data, runtime=rep(paste(datasetname, "95pc", sep="-"), length(pc95data$percentile)))
pc99data=tibble(percentile=clat$clat_ns.99.000000/1000000)
pc99data=cbind(pc99data, runtime=rep(paste(datasetname, "99pc", sep="-"), length(pc95data$percentile)))
ldata=rbind(pc95data, pc99data)
ldata=cbind(ldata, blocksize=rep(blocksize, length(ldata$percentile)))
# We want total bandwidth, so that is the sum of the bandwidths
# from all the read 'jobs'.
mdata=data.frame(read_bw_mps=as.numeric(sum(fdata$Raw$jobs[[1]]$read$bw)/1024))
mdata=cbind(mdata, iops_tot=as.numeric(sum(fdata$Raw$jobs[[1]]$read$iops)))
mdata=cbind(mdata, runtime=rep(datasetname, length(mdata[, "read_bw_mps"]) ))
mdata=cbind(mdata, blocksize=rep(blocksize, length(mdata[, "read_bw_mps"]) ))
# Extract the stats tables
bw_dirstats=rbind(bw_dirstats, round(mdata$read_bw_mps, digits=1))
# Rowname hack to get the blocksize recorded
rownames(bw_dirstats)[nrow(bw_dirstats)]=blocksize
iops_dirstats=rbind(iops_dirstats, round(mdata$iops_tot, digits=1))
rownames(iops_dirstats)[nrow(iops_dirstats)]=blocksize
# And do the 95 and 99 percentiles as tables as well
lat95_dirstats=rbind(lat95_dirstats, round(mean(clat$clat_ns.95.000000)/1000000, digits=1))
rownames(lat95_dirstats)[nrow(lat95_dirstats)]=blocksize
lat99_dirstats=rbind(lat99_dirstats, round(mean(clat$clat_ns.99.000000)/1000000, digits=1))
rownames(lat99_dirstats)[nrow(lat99_dirstats)]=blocksize
# Collect up as sets across all files and runtimes.
data2=rbind(data2, mdata)
all_ldata=rbind(all_ldata, ldata)
all_ldata2=rbind(all_ldata2, ldata2)
}
# Collect up for each dir we process into a column
read_bw_stats=cbind(read_bw_stats, bw_dirstats)
colnames(read_bw_stats)[ncol(read_bw_stats)]=datasetname
read_iops_stats=cbind(read_iops_stats, iops_dirstats)
colnames(read_iops_stats)[ncol(read_iops_stats)]=datasetname
read_lat95_stats=cbind(read_lat95_stats, lat95_dirstats)
colnames(read_lat95_stats)[ncol(read_lat95_stats)]=datasetname
read_lat99_stats=cbind(read_lat99_stats, lat99_dirstats)
colnames(read_lat99_stats)[ncol(read_lat99_stats)]=datasetname
}
# To get a nice looking table, we need to extract the rownames into their
# own column
read_bw_stats=cbind(Bandwidth=rownames(read_bw_stats), read_bw_stats)
read_bw_stats=cbind(read_bw_stats, Units=rep("MB/s", nrow(read_bw_stats)))
read_iops_stats=cbind(IOPS=rownames(read_iops_stats), read_iops_stats)
read_iops_stats=cbind(read_iops_stats, Units=rep("IOP/s", nrow(read_iops_stats)))
read_lat95_stats=cbind('lat 95pc'=rownames(read_lat95_stats), read_lat95_stats)
read_lat95_stats=cbind(read_lat95_stats, Units=rep("ms", nrow(read_lat95_stats)))
read_lat99_stats=cbind('lat 99pc'=rownames(read_lat99_stats), read_lat99_stats)
read_lat99_stats=cbind(read_lat99_stats, Units=rep("ms", nrow(read_lat99_stats)))
# Bandwidth line plot
read_bw_line_plot <- ggplot() +
geom_line( data=data2, aes(blocksize, read_bw_mps, group=runtime, color=runtime)) +
ylim(0, NA) +
ggtitle("Random Read total bandwidth") +
xlab("Blocksize") +
ylab("Bandwidth (MiB/s)") +
theme(
axis.text.x=element_text(angle=90),
legend.position=c(0.35,0.8),
legend.title=element_text(size=5),
legend.text=element_text(size=5),
legend.background = element_rect(fill=alpha('blue', 0.2))
)
# IOPS line plot
read_iops_line_plot <- ggplot() +
geom_line( data=data2, aes(blocksize, iops_tot, group=runtime, color=runtime)) +
ylim(0, NA) +
ggtitle("Random Read total IOPS") +
xlab("Blocksize") +
ylab("IOPS") +
theme(
axis.text.x=element_text(angle=90),
legend.position=c(0.35,0.8),
legend.title=element_text(size=5),
legend.text=element_text(size=5),
legend.background = element_rect(fill=alpha('blue', 0.2))
)
# 95 and 99 percentile box plot
read_clat_box_plot <- ggplot() +
geom_boxplot( data=all_ldata, aes(blocksize, percentile, color=runtime)) +
stat_summary( data=all_ldata, aes(blocksize, percentile, group=runtime, color=runtime), fun.y=mean, geom="line") +
ylim(0, NA) +
ggtitle("Random Read completion latency", subtitle="95&99 percentiles, boxplot over jobs") +
xlab("Blocksize") +
ylab("Latency (ms)") +
theme(axis.text.x=element_text(angle=90)) +
# Use the 'paired' colour matrix as we are setting these up as pairs of
# 95 and 99 percentiles, and it is much easier to visually group those to
# each runtime if we use this colourmap.
scale_colour_brewer(palette="Paired")
# it would be nice to use the same legend theme as the other plots on this
# page, but because of the number of entries it tends to flow off the picture.
# theme(
# axis.text.x=element_text(angle=90),
# legend.position=c(0.35,0.8),
# legend.title=element_text(size=5),
# legend.text=element_text(size=5),
# legend.background = element_rect(fill=alpha('blue', 0.2))
# )
# As the boxplot is actually quite hard to interpret, also show a linegraph
# of all the percentiles for a single blocksize.
which_blocksize='4k'
clat_line_subtitle=paste("For blocksize", which_blocksize, sep=" ")
single_blocksize=subset(all_ldata2, blocksize==which_blocksize)
clat_line=aggregate(
single_blocksize$ms,
by=list(
percentile=single_blocksize$percentile,
blocksize=single_blocksize$blocksize,
runtime=single_blocksize$runtime
),
FUN=mean
)
clat_line$percentile=as.numeric(clat_line$percentile)
read_clat_line_plot <- ggplot() +
geom_line( data=clat_line, aes(percentile, x, group=runtime, color=runtime)) +
ylim(0, NA) +
ggtitle("Random Read completion latency percentiles", subtitle=clat_line_subtitle) +
xlab("Percentile") +
ylab("Time (ms)") +
theme(
axis.text.x=element_text(angle=90),
legend.position=c(0.35,0.8),
legend.title=element_text(size=5),
legend.text=element_text(size=5),
legend.background = element_rect(fill=alpha('blue', 0.2))
)
# Output the pretty pictures
graphics_plot = grid.arrange(
read_bw_line_plot,
read_iops_line_plot,
read_clat_box_plot,
read_clat_line_plot,
nrow=2,
ncol=2 )
# A bit of an odd tweak to force a pagebreak between the pictures and
# the tables. This only works because we have a `results='asis'` in the Rmd
# R fragment.
cat("\n\n\\pagebreak\n")
read_bw_stats_plot = suppressWarnings(ggtexttable(read_bw_stats,
theme=ttheme(base_size=10),
rows=NULL
))
read_iops_stats_plot = suppressWarnings(ggtexttable(read_iops_stats,
theme=ttheme(base_size=10),
rows=NULL
))
read_lat95_stats_plot = suppressWarnings(ggtexttable(read_lat95_stats,
theme=ttheme(base_size=10),
rows=NULL
))
read_lat99_stats_plot = suppressWarnings(ggtexttable(read_lat99_stats,
theme=ttheme(base_size=10),
rows=NULL
))
# and then the statistics tables
stats_plot = grid.arrange(
read_bw_stats_plot,
read_iops_stats_plot,
read_lat95_stats_plot,
read_lat99_stats_plot,
nrow=4,
ncol=1 )


@@ -1,260 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Display details for 'fio' random writes storage IO tests.
library(ggplot2) # ability to plot nicely
library(gridExtra) # So we can plot multiple graphs together
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable
suppressMessages(library(jsonlite)) # to load the data
suppressMessages(suppressWarnings(library(tidyr))) # for gather
library(tibble)
testnames=c(
"fio-randwrite-128",
"fio-randwrite-256",
"fio-randwrite-512",
"fio-randwrite-1k",
"fio-randwrite-2k",
"fio-randwrite-4k",
"fio-randwrite-8k",
"fio-randwrite-16k",
"fio-randwrite-32k",
"fio-randwrite-64k"
)
data2=c()
all_ldata=c()
all_ldata2=c()
stats=c()
rstats=c()
rstats_names=c()
# Where to store up the stats for the tables
write_bw_stats=c()
write_iops_stats=c()
write_lat95_stats=c()
write_lat99_stats=c()
# For each set of results
for (currentdir in resultdirs) {
bw_dirstats=c()
iops_dirstats=c()
lat95_dirstats=c()
lat99_dirstats=c()
# Derive the name from the test result dirname
datasetname=basename(currentdir)
for (testname in testnames) {
fname=paste(inputdir, currentdir, testname, '.json', sep="")
if ( !file.exists(fname)) {
#warning(paste("Skipping non-existent file: ", fname))
next
}
# Import the data
fdata=fromJSON(fname)
# De-nest the test specific named data
fdata=fdata[[testname]]
blocksize=fdata$Raw$'global options'$bs
# Extract the latency data - it comes as a table of percentiles, so
# we have to do a little work...
clat=data.frame(clat_ns=fdata$Raw$jobs[[1]]$write$clat_ns$percentile)
# Generate a clat data set with 'clean' percentile numbers so
# we can sensibly plot it later on.
clat2=clat
colnames(clat2)<-sub("clat_ns.", "", colnames(clat2))
colnames(clat2)<-sub("0000", "", colnames(clat2))
ldata2=gather(clat2)
colnames(ldata2)[colnames(ldata2)=="key"] <- "percentile"
colnames(ldata2)[colnames(ldata2)=="value"] <- "ms"
ldata2$ms=ldata2$ms/1000000 #ns->ms
ldata2=cbind(ldata2, runtime=rep(datasetname, length(ldata2$percentile)))
ldata2=cbind(ldata2, blocksize=rep(blocksize, length(ldata2$percentile)))
# Pull the 95 and 99 percentiles for the boxplot diagram.
# Our values fall more in the range of ms...
pc95data=tibble(percentile=clat$clat_ns.95.000000/1000000)
pc95data=cbind(pc95data, runtime=rep(paste(datasetname, "95pc", sep="-"), length(pc95data$percentile)))
pc99data=tibble(percentile=clat$clat_ns.99.000000/1000000)
pc99data=cbind(pc99data, runtime=rep(paste(datasetname, "99pc", sep="-"), length(pc95data$percentile)))
ldata=rbind(pc95data, pc99data)
ldata=cbind(ldata, blocksize=rep(blocksize, length(ldata$percentile)))
# We want total bandwidth, so that is the sum of the bandwidths
# from all the write 'jobs'.
mdata=data.frame(write_bw_mps=as.numeric(sum(fdata$Raw$jobs[[1]]$write$bw)/1024))
mdata=cbind(mdata, iops_tot=as.numeric(sum(fdata$Raw$jobs[[1]]$write$iops)))
mdata=cbind(mdata, runtime=rep(datasetname, length(mdata[, "write_bw_mps"]) ))
mdata=cbind(mdata, blocksize=rep(blocksize, length(mdata[, "write_bw_mps"]) ))
# Extract the stats tables
bw_dirstats=rbind(bw_dirstats, round(mdata$write_bw_mps, digits=1))
# Rowname hack to get the blocksize recorded
rownames(bw_dirstats)[nrow(bw_dirstats)]=blocksize
iops_dirstats=rbind(iops_dirstats, round(mdata$iops_tot, digits=1))
rownames(iops_dirstats)[nrow(iops_dirstats)]=blocksize
# And do the 95 and 99 percentiles as tables as well
lat95_dirstats=rbind(lat95_dirstats, round(mean(clat$clat_ns.95.000000)/1000000, digits=1))
rownames(lat95_dirstats)[nrow(lat95_dirstats)]=blocksize
lat99_dirstats=rbind(lat99_dirstats, round(mean(clat$clat_ns.99.000000)/1000000, digits=1))
rownames(lat99_dirstats)[nrow(lat99_dirstats)]=blocksize
# Store away as single sets
data2=rbind(data2, mdata)
all_ldata=rbind(all_ldata, ldata)
all_ldata2=rbind(all_ldata2, ldata2)
}
# Collect up for each dir we process into a column
write_bw_stats=cbind(write_bw_stats, bw_dirstats)
colnames(write_bw_stats)[ncol(write_bw_stats)]=datasetname
write_iops_stats=cbind(write_iops_stats, iops_dirstats)
colnames(write_iops_stats)[ncol(write_iops_stats)]=datasetname
write_lat95_stats=cbind(write_lat95_stats, lat95_dirstats)
colnames(write_lat95_stats)[ncol(write_lat95_stats)]=datasetname
write_lat99_stats=cbind(write_lat99_stats, lat99_dirstats)
colnames(write_lat99_stats)[ncol(write_lat99_stats)]=datasetname
}
# To get a nice looking table, we need to extract the rownames into their
# own column
write_bw_stats=cbind(Bandwidth=rownames(write_bw_stats), write_bw_stats)
write_bw_stats=cbind(write_bw_stats, Units=rep("MB/s", nrow(write_bw_stats)))
write_iops_stats=cbind(IOPS=rownames(write_iops_stats), write_iops_stats)
write_iops_stats=cbind(write_iops_stats, Units=rep("IOP/s", nrow(write_iops_stats)))
write_lat95_stats=cbind('lat 95pc'=rownames(write_lat95_stats), write_lat95_stats)
write_lat95_stats=cbind(write_lat95_stats, Units=rep("ms", nrow(write_lat95_stats)))
write_lat99_stats=cbind('lat 99pc'=rownames(write_lat99_stats), write_lat99_stats)
write_lat99_stats=cbind(write_lat99_stats, Units=rep("ms", nrow(write_lat99_stats)))
# lineplot of total bandwidth across blocksizes.
write_bw_line_plot <- ggplot() +
geom_line( data=data2, aes(blocksize, write_bw_mps, group=runtime, color=runtime)) +
ylim(0, NA) +
ggtitle("Random Write total bandwidth") +
xlab("Blocksize") +
ylab("Bandwidth (MiB/s)") +
theme(
axis.text.x=element_text(angle=90),
legend.position=c(0.35,0.8),
legend.title=element_text(size=5),
legend.text=element_text(size=5),
legend.background = element_rect(fill=alpha('blue', 0.2))
)
# lineplot of IOPS across blocksizes
write_iops_line_plot <- ggplot() +
geom_line( data=data2, aes(blocksize, iops_tot, group=runtime, color=runtime)) +
ylim(0, NA) +
ggtitle("Random Write total IOPS") +
xlab("Blocksize") +
ylab("IOPS") +
theme(
axis.text.x=element_text(angle=90),
legend.position=c(0.35,0.8),
legend.title=element_text(size=5),
legend.text=element_text(size=5),
legend.background = element_rect(fill=alpha('blue', 0.2))
)
# boxplot of 95 and 99 percentiles covering the parallel jobs, shown across
# the blocksizes.
write_clat_box_plot <- ggplot() +
geom_boxplot( data=all_ldata, aes(blocksize, percentile, color=runtime)) +
stat_summary( data=all_ldata, aes(blocksize, percentile, group=runtime, color=runtime), fun.y=mean, geom="line") +
ylim(0, NA) +
ggtitle("Random Write completion latency", subtitle="95&99 Percentiles, boxplot across jobs") +
xlab("Blocksize") +
ylab("Latency (ms)") +
theme(axis.text.x=element_text(angle=90)) +
# Use the 'paired' colour matrix as we are setting these up as pairs of
# 95 and 99 percentiles, and it is much easier to visually group those to
# each runtime if we use this colourmap.
scale_colour_brewer(palette="Paired")
# completion latency line plot across the percentiles, for a specific blocksize only
# as otherwise the graph would be far too noisy.
which_blocksize='4k'
clat_line_subtitle=paste("For blocksize", which_blocksize, sep=" ")
single_blocksize=subset(all_ldata2, blocksize==which_blocksize)
clat_line=aggregate(
single_blocksize$ms,
by=list(
percentile=single_blocksize$percentile,
blocksize=single_blocksize$blocksize,
runtime=single_blocksize$runtime
),
FUN=mean
)
clat_line$percentile=as.numeric(clat_line$percentile)
write_clat_line_plot <- ggplot() +
geom_line( data=clat_line, aes(percentile, x, group=runtime, color=runtime)) +
ylim(0, NA) +
ggtitle("Random Write completion latency percentiles", subtitle=clat_line_subtitle) +
xlab("Percentile") +
ylab("Time (ms)") +
theme(
axis.text.x=element_text(angle=90),
legend.position=c(0.35,0.8),
legend.title=element_text(size=5),
legend.text=element_text(size=5),
legend.background = element_rect(fill=alpha('blue', 0.2))
)
master_plot = grid.arrange(
write_bw_line_plot,
write_iops_line_plot,
write_clat_box_plot,
write_clat_line_plot,
nrow=2,
ncol=2 )
# A bit of an odd tweak to force a pagebreak between the pictures and
# the tables. This only works because we have a `results='asis'` in the Rmd
# R fragment.
cat("\n\n\\pagebreak\n")
write_bw_stats_plot = suppressWarnings(ggtexttable(write_bw_stats,
theme=ttheme(base_size=10),
rows=NULL
))
write_iops_stats_plot = suppressWarnings(ggtexttable(write_iops_stats,
theme=ttheme(base_size=10),
rows=NULL
))
write_lat95_stats_plot = suppressWarnings(ggtexttable(write_lat95_stats,
theme=ttheme(base_size=10),
rows=NULL
))
write_lat99_stats_plot = suppressWarnings(ggtexttable(write_lat99_stats,
theme=ttheme(base_size=10),
rows=NULL
))
# and then the statistics tables
stats_plot = grid.arrange(
write_bw_stats_plot,
write_iops_stats_plot,
write_lat95_stats_plot,
write_lat99_stats_plot,
nrow=4,
ncol=1 )


@ -1,111 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Show system memory reduction, and hence container 'density', by analysing the
# scaling footprint data results and the 'system free' memory.
library(ggplot2) # ability to plot nicely.
# So we can plot multiple graphs
library(gridExtra) # together.
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable.
suppressMessages(library(jsonlite)) # to load the data.
testnames=c(
paste("footprint-busybox.*", test_name_extra, sep=""),
paste("footprint-mysql.*", test_name_extra, sep=""),
paste("footprint-elasticsearch.*", test_name_extra, sep="")
)
data=c()
stats=c()
rstats=c()
rstats_names=c()
for (currentdir in resultdirs) {
count=1
dirstats=c()
for (testname in testnames) {
matchdir=paste(inputdir, currentdir, sep="")
matchfile=paste(testname, '\\.json', sep="")
files=list.files(matchdir, pattern=matchfile)
if ( length(files) == 0 ) {
#warning(paste("Pattern [", matchdir, "/", matchfile, "] matched nothing"))
}
for (ffound in files) {
fname=paste(inputdir, currentdir, ffound, sep="")
if ( !file.exists(fname)) {
warning(paste("Skipping non-existent file: ", fname))
next
}
# Derive the name from the test result dirname
datasetname=basename(currentdir)
# Import the data
fdata=fromJSON(fname)
# De-nest the test name specific data
shortname=substr(ffound, 1, nchar(ffound)-nchar(".json"))
fdata=fdata[[shortname]]
payload=fdata$Config$payload
testname=paste(datasetname, payload)
cdata=data.frame(avail_mb=as.numeric(fdata$Results$system$avail)/(1024*1024))
cdata=cbind(cdata, avail_decr=as.numeric(fdata$Results$system$avail_decr))
cdata=cbind(cdata, count=seq_len(length(cdata[, "avail_mb"])))
cdata=cbind(cdata, testname=rep(testname, length(cdata[, "avail_mb"]) ))
cdata=cbind(cdata, payload=rep(payload, length(cdata[, "avail_mb"]) ))
cdata=cbind(cdata, dataset=rep(datasetname, length(cdata[, "avail_mb"]) ))
# Gather our statistics
sdata=data.frame(num_containers=length(cdata[, "avail_mb"]))
# Pick out the last avail_decr value - which in theory should be
# the most we have consumed...
sdata=cbind(sdata, mem_consumed=cdata[, "avail_decr"][length(cdata[, "avail_decr"])])
sdata=cbind(sdata, avg_bytes_per_c=sdata$mem_consumed / sdata$num_containers)
sdata=cbind(sdata, runtime=testname)
# Store away as a single set
data=rbind(data, cdata)
stats=rbind(stats, sdata)
s = c(
"Test"=testname,
"n"=sdata$num_containers,
"size"=(sdata$mem_consumed) / 1024,
"kb/n"=round((sdata$mem_consumed / sdata$num_containers) / 1024, digits=1),
"n/Gb"= round((1*1024*1024*1024) / (sdata$mem_consumed / sdata$num_containers), digits=1)
)
rstats=rbind(rstats, s)
count = count + 1
}
}
}
# Set up the text table headers
colnames(rstats)=c("Test", "n", "Tot_Kb", "avg_Kb", "n_per_Gb")
# Build us a text table of numerical results
stats_plot = suppressWarnings(ggtexttable(data.frame(rstats),
theme=ttheme(base_size=10),
rows=NULL
))
# plot how samples varied over 'time'
line_plot <- ggplot() +
geom_point( data=data, aes(count, avail_mb, group=testname, color=payload, shape=dataset)) +
geom_line( data=data, aes(count, avail_mb, group=testname, color=payload)) +
xlab("Containers") +
ylab("System Avail (Mb)") +
ggtitle("System Memory free") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
master_plot = grid.arrange(
line_plot,
stats_plot,
nrow=2,
ncol=1 )


@ -1,19 +0,0 @@
#!/bin/bash
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
REPORTNAME="metrics_report.pdf"
cd scripts
Rscript --slave -e "library(knitr);knit('pdf.Rmd')"
Rscript --slave -e "library(knitr);pandoc('pdf.md', format='latex')"
Rscript --slave -e "library(knitr);knit('html.Rmd')"
Rscript --slave -e "library(knitr);pandoc('html.md', format='html')"
cp /scripts/pdf.pdf /outputdir/${REPORTNAME}
cp /scripts/figure/*.png /outputdir/
echo "PNGs of graphs and tables can be found in the output directory."
echo "The report, named ${REPORTNAME}, can be found in the output directory"


@ -1,23 +0,0 @@
---
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
title: "Kata Containers metrics report"
author: "Auto generated"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
urlcolor: blue
---
```{r setup, include=FALSE}
#Set these opts to get pdf images which fit into beamer slides better
opts_chunk$set(dev = 'png')
# Pick up any env set by the invoking script, such as the root dir of the
# results data tree
source("/inputdir/Env.R")
```
```{r child = 'metrics_report.Rmd'}
```


@ -1,157 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Display how long the various phases of a container lifecycle (run, execute,
# die, etc.) take.
library(ggplot2) # ability to plot nicely.
suppressMessages(suppressWarnings(library(tidyr))) # for gather().
# So we can plot multiple graphs
library(gridExtra) # together.
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable.
suppressMessages(library(jsonlite)) # to load the data.
testnames=c(
"boot-times"
)
data=c()
stats=c()
rstats=c()
rstats_names=c()
# For each set of results
for (currentdir in resultdirs) {
count=1
dirstats=c()
for (testname in testnames) {
fname=paste(inputdir, currentdir, testname, '.json', sep="")
if ( !file.exists(fname)) {
warning(paste("Skipping non-existent file: ", fname))
next
}
# Derive the name from the test result dirname
datasetname=basename(currentdir)
# Import the data
fdata=fromJSON(fname)
# De-nest the test specific name
fdata=fdata[[testname]]
cdata=data.frame(workload=as.numeric(fdata$Results$'to-workload'$Result))
cdata=cbind(cdata, quit=as.numeric(fdata$Results$'to-quit'$Result))
cdata=cbind(cdata, tokernel=as.numeric(fdata$Results$'to-kernel'$Result))
cdata=cbind(cdata, inkernel=as.numeric(fdata$Results$'in-kernel'$Result))
cdata=cbind(cdata, total=as.numeric(fdata$Results$'total'$Result))
cdata=cbind(cdata, count=seq_len(length(cdata[,"workload"])))
cdata=cbind(cdata, runtime=rep(datasetname, length(cdata[, "workload"]) ))
# Calculate some stats for total time
sdata=data.frame(workload_mean=mean(cdata$workload))
sdata=cbind(sdata, workload_min=min(cdata$workload))
sdata=cbind(sdata, workload_max=max(cdata$workload))
sdata=cbind(sdata, workload_sd=sd(cdata$workload))
sdata=cbind(sdata, workload_cov=((sdata$workload_sd / sdata$workload_mean) * 100))
sdata=cbind(sdata, runtime=datasetname)
sdata=cbind(sdata, quit_mean = mean(cdata$quit))
sdata=cbind(sdata, quit_min = min(cdata$quit))
sdata=cbind(sdata, quit_max = max(cdata$quit))
sdata=cbind(sdata, quit_sd = sd(cdata$quit))
sdata=cbind(sdata, quit_cov = (sdata$quit_sd / sdata$quit_mean) * 100)
sdata=cbind(sdata, tokernel_mean = mean(cdata$tokernel))
sdata=cbind(sdata, inkernel_mean = mean(cdata$inkernel))
sdata=cbind(sdata, total_mean = mean(cdata$total))
# Store away as a single set
data=rbind(data, cdata)
stats=rbind(stats, sdata)
# Store away some stats for the text table
dirstats[count]=round(sdata$tokernel_mean, digits=2)
count = count + 1
dirstats[count]=round(sdata$inkernel_mean, digits=2)
count = count + 1
dirstats[count]=round(sdata$workload_mean, digits=2)
count = count + 1
dirstats[count]=round(sdata$quit_mean, digits=2)
count = count + 1
dirstats[count]=round(sdata$total_mean, digits=2)
count = count + 1
}
rstats=rbind(rstats, dirstats)
rstats_names=rbind(rstats_names, datasetname)
}
unts=c("s", "s", "s", "s", "s")
rstats=rbind(rstats, unts)
rstats_names=rbind(rstats_names, "Units")
# If we have only 2 sets of results, then we can do some more
# stats math for the text table
if (length(resultdirs) == 2) {
# This is a touch hard wired - but we *know* we only have two
# datasets...
diff=c()
for( i in 1:5) {
difference = as.double(rstats[2,i]) - as.double(rstats[1,i])
val = 100 * (difference/as.double(rstats[1,i]))
diff[i] = paste(round(val, digits=2), "%", sep=" ")
}
rstats=rbind(rstats, diff)
rstats_names=rbind(rstats_names, "Diff")
}
rstats=cbind(rstats_names, rstats)
# Set up the text table headers
colnames(rstats)=c("Results", "2k", "ik", "2w", "2q", "tot")
# Build us a text table of numerical results
stats_plot = suppressWarnings(ggtexttable(data.frame(rstats, check.names=FALSE),
theme=ttheme(base_size=8),
rows=NULL
))
# plot how samples varied over 'time'
line_plot <- ggplot() +
geom_line( data=data, aes(count, workload, color=runtime)) +
geom_smooth( data=data, aes(count, workload, color=runtime), se=FALSE, method="loess") +
xlab("Iteration") +
ylab("Time (s)") +
ggtitle("Boot to workload", subtitle="First container") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
boot_boxplot <- ggplot() +
geom_boxplot( data=data, aes(runtime, workload, color=runtime), show.legend=FALSE) +
ylim(0, NA) +
ylab("Time (s)")
# convert the stats to a long format so we can more easily do a side-by-side barplot
longstats <- gather(stats, measure, value, workload_mean, quit_mean, inkernel_mean, tokernel_mean, total_mean)
bar_plot <- ggplot() +
geom_bar( data=longstats, aes(measure, value, fill=runtime), stat="identity", position="dodge", show.legend=FALSE) +
xlab("Phase") +
ylab("Time (s)") +
ggtitle("Lifecycle phase times", subtitle="Mean") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
master_plot = grid.arrange(
bar_plot,
line_plot,
stats_plot,
boot_boxplot,
nrow=2,
ncol=2 )


@ -1,142 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Analyse the runtime component memory footprint data.
library(ggplot2) # ability to plot nicely.
# So we can plot multiple graphs
library(gridExtra) # together.
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable.
suppressMessages(library(jsonlite)) # to load the data.
testnames=c(
"memory-footprint-inside-container"
)
data=c()
rstats=c()
rstats_rows=c()
rstats_cols=c()
# For each set of results
for (currentdir in resultdirs) {
dirstats=c()
# For each memory footprint measure
for (testname in testnames) {
# Build the path without double slashes '//', which R does not handle reliably
fname=paste(inputdir, currentdir, testname, '.json', sep="")
if ( !file.exists(fname)) {
warning(paste("Skipping non-existent file: ", fname))
next
}
# Derive the name from the test result dirname
datasetname=basename(currentdir)
# Import the data
fdata=fromJSON(fname)
fdata=fdata[[testname]]
# Copy the results into shorter, more accessible names
fdata$requested=fdata$Results$memrequest$Result
fdata$total=fdata$Results$memtotal$Result
fdata$free=fdata$Results$memfree$Result
fdata$avail=fdata$Results$memavailable$Result
# And lets work out what % we have 'lost' between the amount requested
# and the total the container actually sees.
fdata$lost=fdata$requested - fdata$total
fdata$pctotal= 100 * (fdata$lost/ fdata$requested)
fdata$Runtime=rep(datasetname, length(fdata$Result) )
# Store away the bits we need
data=rbind(data, data.frame(
Result=fdata$requested,
Type="requested",
Runtime=fdata$Runtime ))
data=rbind(data, data.frame(
Result=fdata$total,
Type="total",
Runtime=fdata$Runtime ))
data=rbind(data, data.frame(
Result=fdata$free,
Type="free",
Runtime=fdata$Runtime ))
data=rbind(data, data.frame(
Result=fdata$avail,
Type="avail",
Runtime=fdata$Runtime ))
data=rbind(data, data.frame(
Result=fdata$lost,
Type="lost",
Runtime=fdata$Runtime ))
data=rbind(data, data.frame(
Result=fdata$pctotal,
Type="% consumed",
Runtime=fdata$Runtime ))
# Store away some stats for the text table
dirstats=rbind(dirstats, round(fdata$requested, digits=2) )
dirstats=rbind(dirstats, round(fdata$total, digits=2) )
dirstats=rbind(dirstats, round(fdata$free, digits=2) )
dirstats=rbind(dirstats, round(fdata$avail, digits=2) )
dirstats=rbind(dirstats, round(fdata$lost, digits=2) )
dirstats=rbind(dirstats, round(fdata$pctotal, digits=2) )
}
rstats=cbind(rstats, dirstats)
rstats_cols=append(rstats_cols, datasetname)
}
rstats_rows=c("Requested", "Total", "Free", "Avail", "Consumed", "% Consumed")
unts=c("Kb", "Kb", "Kb", "Kb", "Kb", "%")
rstats=cbind(rstats, unts)
rstats_cols=append(rstats_cols, "Units")
# If we have only 2 sets of results, then we can do some more
# stats math for the text table
if (length(resultdirs) == 2) {
# This is a touch hard wired - but we *know* we only have two
# datasets...
diff=c()
# Just the first three entries - meaningless for the pctotal entry
for (n in 1:5) {
difference = (as.double(rstats[n,2]) - as.double(rstats[n,1]))
val = 100 * (difference/as.double(rstats[n,1]))
diff=rbind(diff, round(val, digits=2))
}
# Add a blank entry for the other entries
diff=rbind(diff, "")
rstats=cbind(rstats, diff)
rstats_cols=append(rstats_cols, "Diff %")
}
# Build us a text table of numerical results
stats_plot = suppressWarnings(ggtexttable(data.frame(rstats),
theme=ttheme(base_size=10),
rows=rstats_rows, cols=rstats_cols
))
bardata <- subset(data, Type %in% c("requested", "total", "free", "avail"))
# plot how samples varied over 'time'
barplot <- ggplot() +
geom_bar(data=bardata, aes(Type, Result, fill=Runtime), stat="identity", position="dodge") +
xlab("Measure") +
ylab("Size (Kb)") +
ggtitle("In-container memory statistics") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
master_plot = grid.arrange(
barplot,
stats_plot,
nrow=2,
ncol=1 )


@ -1,121 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Analyse the runtime component memory footprint data.
library(ggplot2) # ability to plot nicely.
# So we can plot multiple graphs
library(gridExtra) # together.
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable.
suppressMessages(library(jsonlite)) # to load the data.
testnames=c(
"memory-footprint",
"memory-footprint-ksm"
)
resultsfilesshort=c(
"noKSM",
"KSM"
)
data=c()
rstats=c()
rstats_names=c()
# For each set of results
for (currentdir in resultdirs) {
count=1
dirstats=c()
# For the two different types of memory footprint measures
for (testname in testnames) {
# Build the path without double slashes '//', which R does not handle reliably
fname=paste(inputdir, currentdir, testname, '.json', sep="")
if ( !file.exists(fname)) {
warning(paste("Skipping non-existent file: ", fname))
next
}
# Derive the name from the test result dirname
datasetname=basename(currentdir)
datasetvariant=resultsfilesshort[count]
# Import the data
fdata=fromJSON(fname)
fdata=fdata[[testname]]
# Copy the average result into a shorter, more accessible name
fdata$Result=fdata$Results$average$Result
fdata$variant=rep(datasetvariant, length(fdata$Result) )
fdata$Runtime=rep(datasetname, length(fdata$Result) )
fdata$Count=seq_len(length(fdata$Result))
# Calculate some stats
fdata.mean = mean(fdata$Result)
fdata.min = min(fdata$Result)
fdata.max = max(fdata$Result)
fdata.sd = sd(fdata$Result)
fdata.cov = (fdata.sd / fdata.mean) * 100
# Store away the bits we need
data=rbind(data, data.frame(
Result=fdata$Result,
Count=fdata$Count,
Runtime=fdata$Runtime,
variant=fdata$variant ) )
# Store away some stats for the text table
dirstats[count]=round(fdata.mean, digits=2)
count = count + 1
}
rstats=rbind(rstats, dirstats)
rstats_names=rbind(rstats_names, datasetname)
}
rstats=cbind(rstats_names, rstats)
unts=rep("Kb", length(resultdirs))
# If we have only 2 sets of results, then we can do some more
# stats math for the text table
if (length(resultdirs) == 2) {
# This is a touch hard wired - but we *know* we only have two
# datasets...
diff=c("diff")
difference = (as.double(rstats[2,2]) - as.double(rstats[1,2]))
val = 100 * (difference/as.double(rstats[1,2]))
diff[2] = round(val, digits=2)
difference = (as.double(rstats[2,3]) - as.double(rstats[1,3]))
val = 100 * (difference/as.double(rstats[1,3]))
diff[3] = round(val, digits=2)
rstats=rbind(rstats, diff)
unts[3]="%"
}
rstats=cbind(rstats, unts)
# Set up the text table headers
colnames(rstats)=c("Results", resultsfilesshort, "Units")
# Build us a text table of numerical results
stats_plot = suppressWarnings(ggtexttable(data.frame(rstats),
theme=ttheme(base_size=10),
rows=NULL
))
# plot how samples varied over 'time'
point_plot <- ggplot() +
geom_point( data=data, aes(Runtime, Result, color=variant), position=position_dodge(0.1)) +
xlab("Dataset") +
ylab("Size (Kb)") +
ggtitle("Average PSS footprint", subtitle="per container") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
master_plot = grid.arrange(
point_plot,
stats_plot,
nrow=1,
ncol=2 )


@ -1,63 +0,0 @@
---
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
---
\pagebreak
# Introduction
This report compares the metrics between multiple sets of data generated from
the [Kata Containers report generation scripts](https://github.com/kata-containers/kata-containers/tree/main/metrics/report/README.md).
This report was generated using the data from the **`r resultdirs`** results directories.
\pagebreak
# Container scaling system footprint
This [test](https://github.com/kata-containers/kata-containers/blob/main/metrics/density/footprint_data.sh)
measures the system memory footprint impact whilst running an increasing number
of containers. For this test, [KSM](https://en.wikipedia.org/wiki/Kernel_same-page_merging)
is enabled. The results show how system memory is consumed for different sized
containers; the average system memory footprint cost per container and the density
(how many containers fit per Gb) are calculated.
```{r footprint-density, echo=FALSE, fig.cap="System Memory density"}
test_name_extra="-ksm"
source('footprint-density.R')
rm(test_name_extra)
```
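The density figures in the stats table reduce to simple arithmetic. As a minimal sketch (the consumed-memory and container-count values below are assumed for illustration, not measured results), the per-container cost and containers-per-Gb columns are derived like this:

```r
# Minimal sketch of the density arithmetic reported in the stats table.
# The consumed-memory and container-count values are assumed for illustration.
mem_consumed <- 2147483648   # total bytes consumed (assumed: 2 GiB)
n            <- 20           # number of containers launched (assumed)

avg_kb_per_c <- round((mem_consumed / n) / 1024, digits = 1)          # avg Kb per container
n_per_gb     <- round((1 * 1024 * 1024 * 1024) / (mem_consumed / n),
                      digits = 1)                                     # containers per Gb

cat("avg Kb/container:", avg_kb_per_c, "\n")   # 104857.6
cat("containers/Gb:", n_per_gb, "\n")          # 10
```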
\pagebreak
# Memory used inside container
This [test](https://github.com/kata-containers/kata-containers/blob/main/metrics/density/memory_usage_inside_container.sh)
measures the memory inside a container taken by the container runtime. It shows the difference between the amount of memory requested for the container, and the amount the container can actually 'see'.
The *% Consumed* is the key row in the table, which compares the *Requested* against *Total* values.
```{r mem-in-cont, echo=FALSE, fig.cap="System Memory density"}
source('mem-in-cont.R')
```
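The *% Consumed* row is straightforward to reproduce by hand. A minimal sketch, using assumed (not measured) values:

```r
# Minimal sketch of the '% Consumed' derivation: the gap between the memory
# requested for the container and the total it can see, as a percentage.
# Values are assumed for illustration.
requested <- 2097152   # Kb requested for the container (assumed: 2 Gb)
total     <- 2040512   # Kb the container actually sees (assumed)

lost    <- requested - total        # memory consumed by the runtime stack
pctotal <- 100 * (lost / requested) # the '% Consumed' figure

round(pctotal, digits = 2)   # 2.7
```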
\pagebreak
# Container boot lifecycle times
This [test](https://github.com/kata-containers/kata-containers/blob/main/metrics/time/launch_times.sh)
uses the `date` command on the host and in the container, as well as data from the container
kernel `dmesg`, to ascertain how long different phases of the create/boot/run/delete
Docker container lifecycle take for the first launched container.
To decode the stats table, the prefixes are 'to(`2`)' and '`i`n'. The suffixes are '`k`ernel', '`w`orkload' and '`q`uit'. 'tot' is the total time for a complete container start-to-finished cycle.
```{r lifecycle-time, echo=FALSE, fig.cap="Execution lifecycle times"}
source('lifecycle-time.R')
```
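When exactly two result sets are compared, the table gains a *Diff* row: the percentage change of the second set relative to the first. A minimal sketch, with assumed timings:

```r
# Minimal sketch of the 'Diff' row arithmetic: percentage change of the
# second result set relative to the first. Timings are assumed values.
base  <- 1.25   # total boot time (s), first result set (assumed)
other <- 1.00   # total boot time (s), second result set (assumed)

difference <- other - base
diff_pct   <- 100 * (difference / base)

paste(round(diff_pct, digits = 2), "%", sep = " ")   # "-20 %"
```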
\pagebreak
# Test setup details
This table describes the test system details, as derived from the information contained
in the test results files.
```{r dut, echo=FALSE, fig.cap="System configuration details"}
source('dut-details.R')
```

@ -1,132 +0,0 @@
#!/usr/bin/env Rscript
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Analyse the runtime component memory footprint data.
library(ggplot2) # ability to plot nicely.
# So we can plot multiple graphs
library(gridExtra) # together.
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable.
suppressMessages(library(jsonlite)) # to load the data.
testnames=c(
"cpu-information"
)
resultsfilesshort=c(
"CPU"
)
data=c()
rstats=c()
rstats_rows=c()
rstats_cols=c()
Gdenom = (1000.0 * 1000.0 * 1000.0)
# For each set of results
for (currentdir in resultdirs) {
dirstats=c()
# For each of the CPU information measures
for (testname in testnames) {
# Build the path without double slashes '//', which R does not handle reliably
fname=paste(inputdir, currentdir, testname, '.json', sep="")
if ( !file.exists(fname)) {
warning(paste("Skipping non-existent file: ", fname))
next
}
# Derive the name from the test result dirname
datasetname=basename(currentdir)
# Only one results file type here; avoid relying on 'count', which this
# script never defines
datasetvariant=resultsfilesshort[1]
# Import the data
fdata=fromJSON(fname)
fdata=fdata[[testname]]
# Copy the results into shorter, more accessible names
fdata$ips=fdata$Results$"instructions per cycle"$Result
fdata$Gcycles=fdata$Results$cycles$Result / Gdenom
fdata$Ginstructions=fdata$Results$instructions$Result / Gdenom
fdata$variant=rep(datasetvariant, length(fdata$Result) )
fdata$Runtime=rep(datasetname, length(fdata$Result) )
# Store away the bits we need
data=rbind(data, data.frame(
Result=fdata$ips,
Type="ips",
Runtime=fdata$Runtime,
variant=fdata$variant ) )
data=rbind(data, data.frame(
Result=fdata$Gcycles,
Type="Gcycles",
Runtime=fdata$Runtime,
variant=fdata$variant ) )
data=rbind(data, data.frame(
Result=fdata$Ginstructions,
Type="Ginstr",
Runtime=fdata$Runtime,
variant=fdata$variant ) )
# Store away some stats for the text table
dirstats=rbind(dirstats, round(fdata$ips, digits=2) )
dirstats=rbind(dirstats, round(fdata$Gcycles, digits=2) )
dirstats=rbind(dirstats, round(fdata$Ginstructions, digits=2) )
}
rstats=cbind(rstats, dirstats)
rstats_cols=append(rstats_cols, datasetname)
}
rstats_rows=c("IPS", "GCycles", "GInstr")
unts=c("Ins/Cyc", "G", "G")
rstats=cbind(rstats, unts)
rstats_cols=append(rstats_cols, "Units")
# If we have only 2 sets of results, then we can do some more
# stats math for the text table
if (length(resultdirs) == 2) {
# This is a touch hard wired - but we *know* we only have two
# datasets...
diff=c()
for (n in 1:3) {
difference = (as.double(rstats[n,2]) - as.double(rstats[n,1]))
val = 100 * (difference/as.double(rstats[n,1]))
diff=rbind(diff, round(val, digits=2))
}
rstats=cbind(rstats, diff)
rstats_cols=append(rstats_cols, "Diff %")
}
# Build us a text table of numerical results
stats_plot = suppressWarnings(ggtexttable(data.frame(rstats),
theme=ttheme(base_size=10),
rows=rstats_rows, cols=rstats_cols
))
# plot how samples varied over 'time'
ipsdata <- subset(data, Type %in% c("ips"))
ips_plot <- ggplot() +
geom_bar(data=ipsdata, aes(Type, Result, fill=Runtime), stat="identity", position="dodge") +
xlab("Measure") +
ylab("IPS") +
ggtitle("Instructions Per Cycle") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
cycdata <- subset(data, Type %in% c("Gcycles", "Ginstr"))
cycles_plot <- ggplot() +
geom_bar(data=cycdata, aes(Type, Result, fill=Runtime), stat="identity", position="dodge", show.legend=FALSE) +
xlab("Measure") +
ylab("Count (G)") +
ggtitle("Cycles and Instructions") +
ylim(0, NA) +
theme(axis.text.x=element_text(angle=90))
master_plot = grid.arrange(
ips_plot,
cycles_plot,
stats_plot,
nrow=2,
ncol=2 )