🔥 Delete performance_analysis directory (#1252)

M. Mert Yildiran 2022-11-24 18:36:03 -08:00 committed by GitHub
parent 8778e5770c
commit 9aeb1fadea
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 0 additions and 391 deletions

@ -1,107 +0,0 @@
# Performance analysis
This directory contains tools for analyzing tapper performance.
# Periodic tapper logs
Tapper logs contain periodic lines that show the tapper's internal state and consumed resources.
Internal state example (formatted and commented):
```
stats - {
"processedBytes":468940592, // how many bytes we read from pcap
"packetsCount":174883, // how many packets we read from pcap
"tcpPacketsCount":174883, // how many tcp packets we read from pcap
"reassembledTcpPayloadsCount":66893, // how many chunks sent to tcp stream
"matchedPairs":24821, // how many request response pairs found
"droppedTcpStreams":2 // how many tcp streams remained stale and dropped
}
```
Consumed resources example (formatted and commented):
```
mem: 24441240, // golang heap size
goroutines: 29, // how many goroutines are running
cpu: 91.208791, // how much cpu the tapper process consumes (in percent per core)
cores: 16, // how many cores there are on the machine
rss: 87052288 // how many bytes are held by the tapper process (resident set size)
```
# Plot tapper logs
In order to plot one or more tapper logs into a graph, use the `plot_from_tapper_logs.py` util.
It takes a list of tapper logs as parameters and outputs an image with a nice graph.
The log files should be named in the format `XX_DESCRIPTION.log`, where `XX` is a number determining the color of the output graph and `DESCRIPTION` is the name of the series. This allows for easy comparison between various modes.
Example run:
```
cd $KUBESHARK_HOME/performance_analysis
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python plot_from_tapper_logs.py 00_tapper.log
```
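The `XX_DESCRIPTION.log` naming convention drives the coloring: the plotting util extracts the leading number from each file name to form color groups. A minimal sketch of that grouping (the file names here are hypothetical):

```python
import pathlib
import re

# Hypothetical file names following the XX_DESCRIPTION.log convention.
filenames = ['01_no_pcap_00.log', '01_no_pcap_01.log', '99_normal_00.log']

# plot_from_tapper_logs.py groups files by their leading number (pattern
# r'^\d+'), so runs of the same mode share one color in the graph.
groups = [int(re.findall(r'^\d+', pathlib.Path(n).name)[0]) for n in filenames]
print(groups)  # [1, 1, 99]
```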
# Tapper Modes
Every packet seen by the tapper is processed in a pipeline that contains various stages.
* Pcap - Read the packet from libpcap
* Assembler - Assemble the packet into a TcpStream
* TcpStream - Hold stream information and TcpReaders
* Dissectors - Read from the TcpReader, recognize the packet's content and protocol
* Emit - Marshal the request response pair into JSON
* Send - Send the JSON to the Api Server
The tapper can be run in various debug modes:
* No Pcap - Start the tapper process, but don't read any packets from pcap
* No Assembler - Read packets from pcap, but don't assemble them
* No TcpStream - Assemble the packets, but don't create a TcpStream for them
* No Dissectors - Create a TcpStream for the packets, but don't dissect their content
* No Emit - Dissect the TcpStream, but don't emit the matched request response pairs
* No Send - Emit the request response pairs, but don't send them to the Api Server
* Regular mode
![Tapper Modes](https://github.com/kubeshark/kubeshark/blob/debug/profile-tapper-benchmark/performance_analysis/tapper-modes.png)
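Each debug mode is selected through a `KUBESHARK_DEBUG_DISABLE_*` environment variable that the tapper reads at startup (see `run_tapper_benchmark.sh` below). A minimal sketch of the mapping; the `enable_mode` helper is hypothetical:

```python
import os

# Mode -> environment variable, as toggled by run_tapper_benchmark.sh.
DEBUG_MODES = {
    'no_pcap': 'KUBESHARK_DEBUG_DISABLE_PCAP',
    'no_assembler': 'KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY',
    'no_tcp_stream': 'KUBESHARK_DEBUG_DISABLE_TCP_STREAM',
    'no_dissectors': 'KUBESHARK_DEBUG_DISABLE_DISSECTORS',
    'no_emit': 'KUBESHARK_DEBUG_DISABLE_EMITTING',
    'no_send': 'KUBESHARK_DEBUG_DISABLE_SENDING',
}

def enable_mode(mode: str) -> None:
    # Disable exactly one pipeline stage; all others stay enabled.
    for name, var in DEBUG_MODES.items():
        os.environ[var] = 'true' if name == mode else 'false'

enable_mode('no_pcap')
```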
# Run benchmark with various tapper modes
## Prerequisite
In order to run the benchmark you probably want:
1. An up and running Api Server
2. An up and running Basenine
3. An up and running UI (optional)
4. An up and running test server, like nginx, that can return a known payload at a known endpoint.
5. Set the KUBESHARK_HOME environment variable to point to the kubeshark directory
6. Install the `hey` tool
## Running the benchmark
In order to run a benchmark use the `run_tapper_benchmark.sh` script.
Example run:
```
cd $KUBESHARK_HOME/performance_analysis
source venv/bin/activate # Assuming you already ran plot_from_tapper_logs.py
./run_tapper_benchmark.sh
```
Running it without params uses the default values; use the following environment variables for customization:
```
export KUBESHARK_BENCHMARK_OUTPUT_DIR=/path/to/dir # Set the output directory for tapper logs and graph
export KUBESHARK_BENCHMARK_CLIENT_PERIOD=1m # How long each test runs
export KUBESHARK_BENCHMARK_URL=http://server:port/path # The URL to use for the benchmarking process (the test server endpoint)
export KUBESHARK_BENCHMARK_RUN_COUNT=3 # How many times each tapper mode should run
export KUBESHARK_BENCHMARK_QPS=250 # How many queries per second each client should send to the test server
export KUBESHARK_BENCHMARK_CLIENTS_COUNT=5 # How many clients should run in parallel during the benchmark
```
# Example output graph
An example output graph from a 15 min run with a 15K payload and 1000 QPS looks like this:
![Example Graph](https://github.com/kubeshark/kubeshark/blob/debug/profile-tapper-benchmark/performance_analysis/example-graph.png)


@ -1,182 +0,0 @@
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pathlib
import re
import sys
import typing

COLORMAP = plt.get_cmap('turbo')

# Extract cpu and rss samples from log files and plot them
# Input: List of log files
#
# example:
# python plot_from_tapper_logs.py 01_no_pcap_01.log 99_normal_00.log
#
# The script assumes that the log file names start with a number (pattern '\d+')
# and groups based on this number. Files that start with the same number will be plotted with the same color.
# Change group_pattern to an empty string to disable this, or change it to a regex of your liking.
def get_sample(name: str, line: str, default_value: float):
    pattern = name + r': ?(\d+(\.\d+)?)'
    maybe_sample = re.findall(pattern, line)
    if len(maybe_sample) == 0:
        return default_value
    sample = float(maybe_sample[0][0])
    return sample


def append_sample(name: str, line: str, samples: typing.List[float]):
    sample = get_sample(name, line, -1)
    if sample == -1:
        return
    samples.append(sample)
def extract_samples(f: typing.IO) -> typing.Tuple[pd.Series, pd.Series, pd.Series, pd.Series, pd.Series, pd.Series, pd.Series, pd.Series]:
    cpu_samples = []
    rss_samples = []
    count_samples = []
    matched_samples = []
    live_samples = []
    processed_samples = []
    heap_samples = []
    goroutines_samples = []
    for line in f:
        append_sample('cpu', line, cpu_samples)
        append_sample('rss', line, rss_samples)
        ignored_packets_count = get_sample('"ignoredPacketsCount"', line, -1)
        packets_count = get_sample('"packetsCount"', line, -1)
        if ignored_packets_count != -1 and packets_count != -1:
            count_samples.append(packets_count - ignored_packets_count)
        append_sample('"matchedPairs"', line, matched_samples)
        append_sample('"liveTcpStreams"', line, live_samples)
        append_sample('"processedBytes"', line, processed_samples)
        append_sample('heap-alloc', line, heap_samples)
        append_sample('goroutines', line, goroutines_samples)
    cpu_samples = pd.Series(cpu_samples)
    rss_samples = pd.Series(rss_samples)
    count_samples = pd.Series(count_samples)
    matched_samples = pd.Series(matched_samples)
    live_samples = pd.Series(live_samples)
    processed_samples = pd.Series(processed_samples)
    heap_samples = pd.Series(heap_samples)
    goroutines_samples = pd.Series(goroutines_samples)
    return cpu_samples, rss_samples, count_samples, matched_samples, live_samples, processed_samples, heap_samples, goroutines_samples
def plot(ax, df: pd.DataFrame, title: str, xlabel: str, ylabel: str, group_pattern: typing.Optional[str]):
    if group_pattern:
        color = get_group_color(df.columns, group_pattern)
        df.plot(color=color, ax=ax)
    else:
        df.plot(cmap=COLORMAP, ax=ax)
    ax.ticklabel_format(style='plain')
    plt.title(title)
    plt.legend()
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)


def get_group_color(names, pattern):
    props = [int(re.findall(pattern, pathlib.Path(name).name)[0]) for name in names]
    key = dict(zip(sorted(list(set(props))), range(len(set(props)))))
    n_colors = len(key)
    color_options = plt.get_cmap('jet')(np.linspace(0, 1, n_colors))
    groups = [key[prop] for prop in props]
    color = color_options[groups]  # type: ignore
    return color
if __name__ == '__main__':
    filenames = sys.argv[1:]
    cpu_samples_all_files = []
    rss_samples_all_files = []
    count_samples_all_files = []
    matched_samples_all_files = []
    live_samples_all_files = []
    processed_samples_all_files = []
    heap_samples_all_files = []
    goroutines_samples_all_files = []

    for ii, filename in enumerate(filenames):
        print("Analyzing {}".format(filename))
        with open(filename, 'r') as f:
            cpu_samples, rss_samples, count_samples, matched_samples, live_samples, processed_samples, heap_samples, goroutines_samples = extract_samples(f)
        cpu_samples.name = pathlib.Path(filename).name
        rss_samples.name = pathlib.Path(filename).name
        count_samples.name = pathlib.Path(filename).name
        matched_samples.name = pathlib.Path(filename).name
        live_samples.name = pathlib.Path(filename).name
        processed_samples.name = pathlib.Path(filename).name
        heap_samples.name = pathlib.Path(filename).name
        goroutines_samples.name = pathlib.Path(filename).name
        cpu_samples_all_files.append(cpu_samples)
        rss_samples_all_files.append(rss_samples)
        count_samples_all_files.append(count_samples)
        matched_samples_all_files.append(matched_samples)
        live_samples_all_files.append(live_samples)
        processed_samples_all_files.append(processed_samples)
        heap_samples_all_files.append(heap_samples)
        goroutines_samples_all_files.append(goroutines_samples)

    cpu_samples_df = pd.concat(cpu_samples_all_files, axis=1)
    rss_samples_df = pd.concat(rss_samples_all_files, axis=1)
    count_samples_df = pd.concat(count_samples_all_files, axis=1)
    matched_samples_df = pd.concat(matched_samples_all_files, axis=1)
    live_samples_df = pd.concat(live_samples_all_files, axis=1)
    processed_samples_df = pd.concat(processed_samples_all_files, axis=1)
    heap_samples_df = pd.concat(heap_samples_all_files, axis=1)
    goroutines_samples_df = pd.concat(goroutines_samples_all_files, axis=1)

    group_pattern = r'^\d+'

    cpu_plot = plt.subplot(8, 2, 1)
    plot(cpu_plot, cpu_samples_df, 'cpu', '', 'cpu (%)', group_pattern)
    cpu_plot.legend().remove()

    mem_plot = plt.subplot(8, 2, 2)
    plot(mem_plot, (rss_samples_df / 1024 / 1024), 'rss', '', 'mem (mega)', group_pattern)
    mem_plot.legend(loc='center left', bbox_to_anchor=(1, 0.5))

    packets_plot = plt.subplot(8, 2, 3)
    plot(packets_plot, count_samples_df, 'packetsCount', '', 'packetsCount', group_pattern)
    packets_plot.legend().remove()

    matched_plot = plt.subplot(8, 2, 4)
    plot(matched_plot, matched_samples_df, 'matchedCount', '', 'matchedCount', group_pattern)
    matched_plot.legend().remove()

    live_plot = plt.subplot(8, 2, 5)
    plot(live_plot, live_samples_df, 'liveStreamsCount', '', 'liveStreamsCount', group_pattern)
    live_plot.legend().remove()

    processed_plot = plt.subplot(8, 2, 6)
    plot(processed_plot, (processed_samples_df / 1024 / 1024), 'processedBytes', '', 'bytes (mega)', group_pattern)
    processed_plot.legend().remove()

    heap_plot = plt.subplot(8, 2, 7)
    plot(heap_plot, (heap_samples_df / 1024 / 1024), 'heap', '', 'heap (mega)', group_pattern)
    heap_plot.legend().remove()

    goroutines_plot = plt.subplot(8, 2, 8)
    plot(goroutines_plot, goroutines_samples_df, 'goroutines', '', 'goroutines', group_pattern)
    goroutines_plot.legend().remove()

    fig = plt.gcf()
    fig.set_size_inches(20, 18)
    print('Saving graph to graph.png')
    plt.savefig('graph.png', bbox_inches='tight')

@ -1,2 +0,0 @@
matplotlib
pandas

@ -1,100 +0,0 @@
#!/bin/bash
[ -z "$KUBESHARK_HOME" ] && { echo "KUBESHARK_HOME is missing"; exit 1; }
[ -z "$KUBESHARK_BENCHMARK_OUTPUT_DIR" ] && export KUBESHARK_BENCHMARK_OUTPUT_DIR="/tmp/kubeshark-benchmark-results-$(date +%d-%m-%H-%M)"
[ -z "$KUBESHARK_BENCHMARK_CLIENT_PERIOD" ] && export KUBESHARK_BENCHMARK_CLIENT_PERIOD="1m"
[ -z "$KUBESHARK_BENCHMARK_URL" ] && export KUBESHARK_BENCHMARK_URL="http://localhost:8081/data/b.1000.json"
[ -z "$KUBESHARK_BENCHMARK_RUN_COUNT" ] && export KUBESHARK_BENCHMARK_RUN_COUNT="3"
[ -z "$KUBESHARK_BENCHMARK_QPS" ] && export KUBESHARK_BENCHMARK_QPS="500"
[ -z "$KUBESHARK_BENCHMARK_CLIENTS_COUNT" ] && export KUBESHARK_BENCHMARK_CLIENTS_COUNT="5"
function log() {
    local message=$@
    printf "[%s] %s\n" "$(date "+%d-%m %H:%M:%S")" "$message"
}

function run_single_bench() {
    local mode_num=$1
    local mode_str=$2
    log "Starting ${mode_num}_${mode_str} (runs: $KUBESHARK_BENCHMARK_RUN_COUNT) (period: $KUBESHARK_BENCHMARK_CLIENT_PERIOD)"
    for ((i=0;i<"$KUBESHARK_BENCHMARK_RUN_COUNT";i++)); do
        log "  $i: Running tapper"
        rm -f tapper.log
        tapper_args=("--tap" "--api-server-address" "ws://localhost:8899/wsTapper" "-stats" "10" "-ignore-ports" "8899,9099")
        if [[ $(uname) == "Darwin" ]]; then
            tapper_args+=("-i" "lo0" "-decoder" "Loopback")
        else
            tapper_args+=("-i" "lo")
        fi
        nohup ./agent/build/kubesharkagent "${tapper_args[@]}" > tapper.log 2>&1 &
        log "  $i: Running client (hey)"
        hey -z "$KUBESHARK_BENCHMARK_CLIENT_PERIOD" -c "$KUBESHARK_BENCHMARK_CLIENTS_COUNT" -q "$KUBESHARK_BENCHMARK_QPS" "$KUBESHARK_BENCHMARK_URL" > /dev/null || return 1
        log "  $i: Killing tapper"
        kill -9 $(ps -ef | grep agent/build/kubesharkagent | grep tap | grep -v grep | awk '{ print $2 }') > /dev/null 2>&1
        local output_file=$KUBESHARK_BENCHMARK_OUTPUT_DIR/${mode_num}_${mode_str}_${i}.log
        log "  $i: Moving output to $output_file"
        mv tapper.log "$output_file" || return 1
    done
}

function generate_bench_graph() {
    cd performance_analysis/ || return 1
    source venv/bin/activate
    python plot_from_tapper_logs.py $KUBESHARK_BENCHMARK_OUTPUT_DIR/*.log || return 1
    mv graph.png "$KUBESHARK_BENCHMARK_OUTPUT_DIR" || return 1
}
mkdir -p $KUBESHARK_BENCHMARK_OUTPUT_DIR
rm -f $KUBESHARK_BENCHMARK_OUTPUT_DIR/*
log "Writing output to $KUBESHARK_BENCHMARK_OUTPUT_DIR"
cd $KUBESHARK_HOME || exit 1
export HOST_MODE=0
export SENSITIVE_DATA_FILTERING_OPTIONS='{}'
export KUBESHARK_DEBUG_DISABLE_PCAP=false
export KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY=false
export KUBESHARK_DEBUG_DISABLE_TCP_STREAM=false
export KUBESHARK_DEBUG_DISABLE_NON_HTTP_EXTENSSION=false
export KUBESHARK_DEBUG_DISABLE_DISSECTORS=false
export KUBESHARK_DEBUG_DISABLE_EMITTING=false
export KUBESHARK_DEBUG_DISABLE_SENDING=false
export KUBESHARK_DEBUG_DISABLE_PCAP=true
run_single_bench "01" "no_pcap" || exit 1
export KUBESHARK_DEBUG_DISABLE_PCAP=false
export KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY=true
run_single_bench "02" "no_assembler" || exit 1
export KUBESHARK_DEBUG_DISABLE_TCP_REASSEMBLY=false
export KUBESHARK_DEBUG_DISABLE_TCP_STREAM=true
run_single_bench "03" "no_tcp_stream" || exit 1
export KUBESHARK_DEBUG_DISABLE_TCP_STREAM=false
export KUBESHARK_DEBUG_DISABLE_NON_HTTP_EXTENSSION=true
run_single_bench "04" "only_http" || exit 1
export KUBESHARK_DEBUG_DISABLE_NON_HTTP_EXTENSSION=false
export KUBESHARK_DEBUG_DISABLE_DISSECTORS=true
run_single_bench "05" "no_dissectors" || exit 1
export KUBESHARK_DEBUG_DISABLE_DISSECTORS=false
export KUBESHARK_DEBUG_DISABLE_EMITTING=true
run_single_bench "06" "no_emit" || exit 1
export KUBESHARK_DEBUG_DISABLE_EMITTING=false
export KUBESHARK_DEBUG_DISABLE_SENDING=true
run_single_bench "07" "no_send" || exit 1
export KUBESHARK_DEBUG_DISABLE_SENDING=false
run_single_bench "08" "normal" || exit 1
generate_bench_graph || exit 1
log "Output written to $KUBESHARK_BENCHMARK_OUTPUT_DIR"
