🔥 Delete performance_analysis directory (#1252)

M. Mert Yildiran 2022-11-24 18:36:03 -08:00 committed by GitHub
parent 8778e5770c
commit 9aeb1fadea
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 0 additions and 391 deletions

@ -1,107 +0,0 @@
# Performance analysis
This directory contains tools for analyzing tapper performance.
# Periodic tapper logs
Tapper logs contain periodic lines that show the tapper's internal state and consumed resources.
Internal state example (formatted and commented):
```
stats - {
"processedBytes":468940592, // how many bytes we read from pcap
"packetsCount":174883, // how many packets we read from pcap
"tcpPacketsCount":174883, // how many tcp packets we read from pcap
"reassembledTcpPayloadsCount":66893, // how many chunks sent to tcp stream
"matchedPairs":24821, // how many request response pairs found
"droppedTcpStreams":2 // how many tcp streams remained stale and dropped
}
```
Consumed resources example (formatted and commented):
```
mem: 24441240, // golang heap size
goroutines: 29, // how many goroutines are running
cpu: 91.208791, // how much cpu the tapper process consumes (in percent per core)
cores: 16, // how many cores there are on the machine
rss: 87052288 // how many bytes are held by the tapper process (resident set size)
```
# Plot tapper logs
In order to plot one or more tapper logs into a graph, use the `plot_from_tapper_logs.py` util.
It takes a list of tapper logs as parameters and outputs an image with a nice graph.
The log files should be named in the format `XX_DESCRIPTION.log`, where `XX` is a number determining the color of the output graph and `DESCRIPTION` is the name of the series. This allows for easy comparison between various modes.
Example run:
```
cd $KUBESHARK_HOME/performance_analysis
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python plot_from_tapper_logs.py 00_tapper.log
```
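The `XX_DESCRIPTION.log` naming convention drives the coloring: the plotting util extracts the leading number from each file name to form color groups. A minimal sketch of that grouping (the file names here are hypothetical):

```python
import pathlib
import re

# Hypothetical file names following the XX_DESCRIPTION.log convention.
filenames = ['01_no_pcap_00.log', '01_no_pcap_01.log', '99_normal_00.log']

# plot_from_tapper_logs.py groups files by their leading number (pattern
# r'^\d+'), so runs of the same mode share one color in the graph.
groups = [int(re.findall(r'^\d+', pathlib.Path(n).name)[0]) for n in filenames]
print(groups)  # [1, 1, 99]
```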
# Tapper Modes
Every packet seen by the tapper is processed in a pipeline that contains various stages.
* Pcap - Read the packet from libpcap
* Assembler - Assemble the packet into a TcpStream
* TcpStream - Hold stream information and TcpReaders
* Dissectors - Read from the TcpReader, recognize the packet's content and protocol
* Emit - Marshal the request response pair into JSON
* Send - Send the JSON to the Api Server
The tapper can be run in various debug modes:
* No Pcap - Start the tapper process, but don't read any packets from pcap
* No Assembler - Read packets from pcap, but don't assemble them
* No TcpStream - Assemble the packets, but don't create a TcpStream for them
* No Dissectors - Create a TcpStream for the packets, but don't dissect their content
* No Emit - Dissect the TcpStream, but don't emit the matched request response pairs
* No Send - Emit the request response pairs, but don't send them to the Api Server
* Regular mode
![Tapper Modes](https://github.com/kubeshark/kubeshark/blob/debug/profile-tapper-benchmark/performance_analysis/tapper-modes.png)
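Each debug mode is selected through a `KUBESHARK_DEBUG_DISABLE_*` environment variable that the tapper reads at startup (see `run_tapper_benchmark.sh` below). A minimal sketch of the mapping; the `enable_mode` helper is hypothetical:

```python
import os

# Mode -> environment variable, as toggled by run_tapper_benchmark.sh.
DEBUG_MODES = {
    'no_pcap': 'KUBESHARK_DEBUG_DISABLE_PCAP',
    'no_assembler': 'KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY',
    'no_tcp_stream': 'KUBESHARK_DEBUG_DISABLE_TCP_STREAM',
    'no_dissectors': 'KUBESHARK_DEBUG_DISABLE_DISSECTORS',
    'no_emit': 'KUBESHARK_DEBUG_DISABLE_EMITTING',
    'no_send': 'KUBESHARK_DEBUG_DISABLE_SENDING',
}

def enable_mode(mode: str) -> None:
    # Disable exactly one pipeline stage; all others stay enabled.
    for name, var in DEBUG_MODES.items():
        os.environ[var] = 'true' if name == mode else 'false'

enable_mode('no_pcap')
```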
# Run benchmark with various tapper modes
## Prerequisite
In order to run the benchmark you probably want:
1. An up and running Api Server
2. An up and running Basenine
3. An up and running UI (optional)
4. An up and running test server, like nginx, that can return a known payload at a known endpoint.
5. Set the KUBESHARK_HOME environment variable to point to the kubeshark directory
6. Install the `hey` tool
## Running the benchmark
In order to run a benchmark use the `run_tapper_benchmark.sh` script.
Example run:
```
cd $KUBESHARK_HOME/performance_analysis
source venv/bin/activate # Assuming you already ran plot_from_tapper_logs.py
./run_tapper_benchmark.sh
```
Running it without params uses the default values; use the following environment variables for customization:
```
export KUBESHARK_BENCHMARK_OUTPUT_DIR=/path/to/dir # Set the output directory for tapper logs and graph
export KUBESHARK_BENCHMARK_CLIENT_PERIOD=1m # How long each test runs
export KUBESHARK_BENCHMARK_URL=http://server:port/path # The URL to use for the benchmarking process (the test server endpoint)
export KUBESHARK_BENCHMARK_RUN_COUNT=3 # How many times each tapper mode should run
export KUBESHARK_BENCHMARK_QPS=250 # How many queries per second each client should send to the test server
export KUBESHARK_BENCHMARK_CLIENTS_COUNT=5 # How many clients should run in parallel during the benchmark
```
# Example output graph
An example output graph from a 15 min run with a 15K payload and 1000 QPS looks like this:
![Example Graph](https://github.com/kubeshark/kubeshark/blob/debug/profile-tapper-benchmark/performance_analysis/example-graph.png)


@ -1,182 +0,0 @@
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pathlib
import re
import sys
import typing

COLORMAP = plt.get_cmap('turbo')

# Extract cpu and rss samples from log files and plot them
# Input: List of log files
#
# example:
# python plot_from_tapper_logs.py 01_no_pcap_01.log 99_normal_00.log
#
# The script assumes that the log file names start with a number (pattern '\d+')
# and groups based on this number. Files that start with the same number will be plotted with the same color.
# Change group_pattern to an empty string to disable this, or change it to a regex of your liking.
def get_sample(name: str, line: str, default_value: float):
    pattern = name + r': ?(\d+(\.\d+)?)'
    maybe_sample = re.findall(pattern, line)
    if len(maybe_sample) == 0:
        return default_value
    sample = float(maybe_sample[0][0])
    return sample


def append_sample(name: str, line: str, samples: typing.List[float]):
    sample = get_sample(name, line, -1)
    if sample == -1:
        return
    samples.append(sample)
def extract_samples(f: typing.IO) -> typing.Tuple[pd.Series, pd.Series, pd.Series, pd.Series, pd.Series, pd.Series, pd.Series, pd.Series]:
    cpu_samples = []
    rss_samples = []
    count_samples = []
    matched_samples = []
    live_samples = []
    processed_samples = []
    heap_samples = []
    goroutines_samples = []
    for line in f:
        append_sample('cpu', line, cpu_samples)
        append_sample('rss', line, rss_samples)
        ignored_packets_count = get_sample('"ignoredPacketsCount"', line, -1)
        packets_count = get_sample('"packetsCount"', line, -1)
        if ignored_packets_count != -1 and packets_count != -1:
            count_samples.append(packets_count - ignored_packets_count)
        append_sample('"matchedPairs"', line, matched_samples)
        append_sample('"liveTcpStreams"', line, live_samples)
        append_sample('"processedBytes"', line, processed_samples)
        append_sample('heap-alloc', line, heap_samples)
        append_sample('goroutines', line, goroutines_samples)
    cpu_samples = pd.Series(cpu_samples)
    rss_samples = pd.Series(rss_samples)
    count_samples = pd.Series(count_samples)
    matched_samples = pd.Series(matched_samples)
    live_samples = pd.Series(live_samples)
    processed_samples = pd.Series(processed_samples)
    heap_samples = pd.Series(heap_samples)
    goroutines_samples = pd.Series(goroutines_samples)
    return cpu_samples, rss_samples, count_samples, matched_samples, live_samples, processed_samples, heap_samples, goroutines_samples
def plot(ax, df: pd.DataFrame, title: str, xlabel: str, ylabel: str, group_pattern: typing.Optional[str]):
    if group_pattern:
        color = get_group_color(df.columns, group_pattern)
        df.plot(color=color, ax=ax)
    else:
        df.plot(cmap=COLORMAP, ax=ax)
    ax.ticklabel_format(style='plain')
    plt.title(title)
    plt.legend()
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)


def get_group_color(names, pattern):
    props = [int(re.findall(pattern, pathlib.Path(name).name)[0]) for name in names]
    key = dict(zip(sorted(list(set(props))), range(len(set(props)))))
    n_colors = len(key)
    color_options = plt.get_cmap('jet')(np.linspace(0, 1, n_colors))
    groups = [key[prop] for prop in props]
    color = color_options[groups]  # type: ignore
    return color
if __name__ == '__main__':
    filenames = sys.argv[1:]
    cpu_samples_all_files = []
    rss_samples_all_files = []
    count_samples_all_files = []
    matched_samples_all_files = []
    live_samples_all_files = []
    processed_samples_all_files = []
    heap_samples_all_files = []
    goroutines_samples_all_files = []

    for ii, filename in enumerate(filenames):
        print("Analyzing {}".format(filename))
        with open(filename, 'r') as f:
            cpu_samples, rss_samples, count_samples, matched_samples, live_samples, processed_samples, heap_samples, goroutines_samples = extract_samples(f)
        cpu_samples.name = pathlib.Path(filename).name
        rss_samples.name = pathlib.Path(filename).name
        count_samples.name = pathlib.Path(filename).name
        matched_samples.name = pathlib.Path(filename).name
        live_samples.name = pathlib.Path(filename).name
        processed_samples.name = pathlib.Path(filename).name
        heap_samples.name = pathlib.Path(filename).name
        goroutines_samples.name = pathlib.Path(filename).name
        cpu_samples_all_files.append(cpu_samples)
        rss_samples_all_files.append(rss_samples)
        count_samples_all_files.append(count_samples)
        matched_samples_all_files.append(matched_samples)
        live_samples_all_files.append(live_samples)
        processed_samples_all_files.append(processed_samples)
        heap_samples_all_files.append(heap_samples)
        goroutines_samples_all_files.append(goroutines_samples)

    cpu_samples_df = pd.concat(cpu_samples_all_files, axis=1)
    rss_samples_df = pd.concat(rss_samples_all_files, axis=1)
    count_samples_df = pd.concat(count_samples_all_files, axis=1)
    matched_samples_df = pd.concat(matched_samples_all_files, axis=1)
    live_samples_df = pd.concat(live_samples_all_files, axis=1)
    processed_samples_df = pd.concat(processed_samples_all_files, axis=1)
    heap_samples_df = pd.concat(heap_samples_all_files, axis=1)
    goroutines_samples_df = pd.concat(goroutines_samples_all_files, axis=1)

    group_pattern = r'^\d+'

    cpu_plot = plt.subplot(8, 2, 1)
    plot(cpu_plot, cpu_samples_df, 'cpu', '', 'cpu (%)', group_pattern)
    cpu_plot.legend().remove()

    mem_plot = plt.subplot(8, 2, 2)
    plot(mem_plot, (rss_samples_df / 1024 / 1024), 'rss', '', 'mem (mega)', group_pattern)
    mem_plot.legend(loc='center left', bbox_to_anchor=(1, 0.5))

    packets_plot = plt.subplot(8, 2, 3)
    plot(packets_plot, count_samples_df, 'packetsCount', '', 'packetsCount', group_pattern)
    packets_plot.legend().remove()

    matched_plot = plt.subplot(8, 2, 4)
    plot(matched_plot, matched_samples_df, 'matchedCount', '', 'matchedCount', group_pattern)
    matched_plot.legend().remove()

    live_plot = plt.subplot(8, 2, 5)
    plot(live_plot, live_samples_df, 'liveStreamsCount', '', 'liveStreamsCount', group_pattern)
    live_plot.legend().remove()

    processed_plot = plt.subplot(8, 2, 6)
    plot(processed_plot, (processed_samples_df / 1024 / 1024), 'processedBytes', '', 'bytes (mega)', group_pattern)
    processed_plot.legend().remove()

    heap_plot = plt.subplot(8, 2, 7)
    plot(heap_plot, (heap_samples_df / 1024 / 1024), 'heap', '', 'heap (mega)', group_pattern)
    heap_plot.legend().remove()

    goroutines_plot = plt.subplot(8, 2, 8)
    plot(goroutines_plot, goroutines_samples_df, 'goroutines', '', 'goroutines', group_pattern)
    goroutines_plot.legend().remove()

    fig = plt.gcf()
    fig.set_size_inches(20, 18)
    print('Saving graph to graph.png')
    plt.savefig('graph.png', bbox_inches='tight')

@ -1,2 +0,0 @@
matplotlib
pandas

@ -1,100 +0,0 @@
#!/bin/bash
[ -z "$KUBESHARK_HOME" ] && { echo "KUBESHARK_HOME is missing"; exit 1; }
[ -z "$KUBESHARK_BENCHMARK_OUTPUT_DIR" ] && export KUBESHARK_BENCHMARK_OUTPUT_DIR="/tmp/kubeshark-benchmark-results-$(date +%d-%m-%H-%M)"
[ -z "$KUBESHARK_BENCHMARK_CLIENT_PERIOD" ] && export KUBESHARK_BENCHMARK_CLIENT_PERIOD="1m"
[ -z "$KUBESHARK_BENCHMARK_URL" ] && export KUBESHARK_BENCHMARK_URL="http://localhost:8081/data/b.1000.json"
[ -z "$KUBESHARK_BENCHMARK_RUN_COUNT" ] && export KUBESHARK_BENCHMARK_RUN_COUNT="3"
[ -z "$KUBESHARK_BENCHMARK_QPS" ] && export KUBESHARK_BENCHMARK_QPS="500"
[ -z "$KUBESHARK_BENCHMARK_CLIENTS_COUNT" ] && export KUBESHARK_BENCHMARK_CLIENTS_COUNT="5"
function log() {
    local message=$@
    printf "[%s] %s\n" "$(date "+%d-%m %H:%M:%S")" "$message"
}

function run_single_bench() {
    local mode_num=$1
    local mode_str=$2
    log "Starting ${mode_num}_${mode_str} (runs: $KUBESHARK_BENCHMARK_RUN_COUNT) (period: $KUBESHARK_BENCHMARK_CLIENT_PERIOD)"
    for ((i=0;i<"$KUBESHARK_BENCHMARK_RUN_COUNT";i++)); do
        log "  $i: Running tapper"
        rm -f tapper.log
        tapper_args=("--tap" "--api-server-address" "ws://localhost:8899/wsTapper" "-stats" "10" "-ignore-ports" "8899,9099")
        if [[ $(uname) == "Darwin" ]]; then
            tapper_args+=("-i" "lo0" "-decoder" "Loopback")
        else
            tapper_args+=("-i" "lo")
        fi
        nohup ./agent/build/kubesharkagent "${tapper_args[@]}" > tapper.log 2>&1 &
        log "  $i: Running client (hey)"
        hey -z "$KUBESHARK_BENCHMARK_CLIENT_PERIOD" -c "$KUBESHARK_BENCHMARK_CLIENTS_COUNT" -q "$KUBESHARK_BENCHMARK_QPS" "$KUBESHARK_BENCHMARK_URL" > /dev/null || return 1
        log "  $i: Killing tapper"
        kill -9 $(ps -ef | grep agent/build/kubesharkagent | grep tap | grep -v grep | awk '{ print $2 }') > /dev/null 2>&1
        local output_file=$KUBESHARK_BENCHMARK_OUTPUT_DIR/${mode_num}_${mode_str}_${i}.log
        log "  $i: Moving output to $output_file"
        mv tapper.log "$output_file" || return 1
    done
}

function generate_bench_graph() {
    cd performance_analysis/ || return 1
    source venv/bin/activate
    python plot_from_tapper_logs.py $KUBESHARK_BENCHMARK_OUTPUT_DIR/*.log || return 1
    mv graph.png "$KUBESHARK_BENCHMARK_OUTPUT_DIR" || return 1
}
mkdir -p $KUBESHARK_BENCHMARK_OUTPUT_DIR
rm -f $KUBESHARK_BENCHMARK_OUTPUT_DIR/*
log "Writing output to $KUBESHARK_BENCHMARK_OUTPUT_DIR"
cd $KUBESHARK_HOME || exit 1
export HOST_MODE=0
export SENSITIVE_DATA_FILTERING_OPTIONS='{}'
export KUBESHARK_DEBUG_DISABLE_PCAP=false
export KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY=false
export KUBESHARK_DEBUG_DISABLE_TCP_STREAM=false
export KUBESHARK_DEBUG_DISABLE_NON_HTTP_EXTENSSION=false
export KUBESHARK_DEBUG_DISABLE_DISSECTORS=false
export KUBESHARK_DEBUG_DISABLE_EMITTING=false
export KUBESHARK_DEBUG_DISABLE_SENDING=false
export KUBESHARK_DEBUG_DISABLE_PCAP=true
run_single_bench "01" "no_pcap" || exit 1
export KUBESHARK_DEBUG_DISABLE_PCAP=false
export KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY=true
run_single_bench "02" "no_assembler" || exit 1
export KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY=false
export KUBESHARK_DEBUG_DISABLE_TCP_STREAM=true
run_single_bench "03" "no_tcp_stream" || exit 1
export KUBESHARK_DEBUG_DISABLE_TCP_STREAM=false
export KUBESHARK_DEBUG_DISABLE_NON_HTTP_EXTENSSION=true
run_single_bench "04" "only_http" || exit 1
export KUBESHARK_DEBUG_DISABLE_NON_HTTP_EXTENSSION=false
export KUBESHARK_DEBUG_DISABLE_DISSECTORS=true
run_single_bench "05" "no_dissectors" || exit 1
export KUBESHARK_DEBUG_DISABLE_DISSECTORS=false
export KUBESHARK_DEBUG_DISABLE_EMITTING=true
run_single_bench "06" "no_emit" || exit 1
export KUBESHARK_DEBUG_DISABLE_EMITTING=false
export KUBESHARK_DEBUG_DISABLE_SENDING=true
run_single_bench "07" "no_send" || exit 1
export KUBESHARK_DEBUG_DISABLE_SENDING=false
run_single_bench "08" "normal" || exit 1
generate_bench_graph || exit 1
log "Output written to $KUBESHARK_BENCHMARK_OUTPUT_DIR"
