From 5fd69db1b172e5d927ecfd427f582adda6d9fe39 Mon Sep 17 00:00:00 2001
From: Henri DF
Date: Thu, 28 Apr 2016 14:04:46 -0700
Subject: [PATCH] Destroyed Profiling (markdown)

---
 Profiling.md | 147 ---------------------------------------------------
 1 file changed, 147 deletions(-)
 delete mode 100644 Profiling.md

diff --git a/Profiling.md b/Profiling.md
deleted file mode 100644
index 47301d0..0000000
--- a/Profiling.md
+++ /dev/null
@@ -1,147 +0,0 @@

This page documents some basic profiling of digwatch that I did in March 2016.

## Methodology

I used the Phoronix Test Suite to run some benchmarks. For a given benchmark, I did three runs:
- Baseline: only the test suite running
- Sysdig: `sysdig -N proc.pid=0` running while the test suite runs
- Digwatch: `digwatch` running with a reasonably sized ruleset. Most events do *not* match the ruleset, which means that most of the digwatch-compiled filter is evaluated for most events.

The full ruleset is [here](https://github.com/draios/digwatch/blob/972c84707fd7a786214cc0fbb4a8b753a0488ebb/rules/base.txt). It comprises 18 rules. Many of these rules have just a few expressions, but some are very long. In particular, one rule uses `system_binaries`, which expands to something like `proc.name in (truncate, sha1sum, numfmt, fmt, fold, uniq, ...)`, enumerating over one hundred names. The full rule is `fd.sockfamily = ip and system_binaries`, which means that for every network read/write event, the process name is compared against this list of 100+ names.

## Summary of results

For three benchmarks (apache, nginx, sqlite), digwatch shows little or no performance degradation compared to sysdig. However, two of these (apache and nginx) show a significant degradation for both sysdig and digwatch compared to baseline.

For the redis benchmark, digwatch _does_ show a significant degradation compared to sysdig. My guess was that the culprit might be the `system_binaries` rule (which has a chain of 100+ tests like `proc.name = truncate OR proc.name = ls OR ...`), but that turned out not to be the case: running the same ruleset with the `fd.sockfamily = ip and system_binaries` rule disabled did not change the result. (As is so often the case with performance, the obvious explanation was not the right one.)

So I tried running digwatch in a variety of ways with subsets of the rules, and found a roughly linear improvement as I removed more and more rules; no single rule explained the drop-off. Removing the first chunk of rules (the first two-thirds) made no difference, but after that each removed rule improved performance by on the order of 5-10%, until sysdig-level performance was reached. This corresponds to digwatch CPU usage going from a 100% flatline down to the 30-50% level.

Profile of system calls for nginx (about 1-1.5M events per second per CPU, judging from the kernel log with `insmod verbose=1`):

```
# Calls    Syscall
--------------------------------------------------------------------------------
2935528    gettimeofday
2879033    epoll_ctl
1223131    close
1217542    write
1120296    recvfrom
 909506    epoll_wait
 906429    read
 803527    fcntl
 799420    connect
```
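As a reference point, a per-syscall call-count profile like the one above can also be gathered with sysdig's `topscalls` chisel while the benchmark is running. This is just a sketch of one way to collect such numbers, not necessarily how the table above was produced:

```
# One way to collect a per-syscall call-count profile while pts/nginx runs.
# (Sketch only -- not necessarily how the table above was generated.)
sudo sysdig -c topscalls

# Optionally restrict the profile to the nginx processes:
sudo sysdig -c topscalls "proc.name=nginx"
```

Stop the capture with Ctrl-C once the benchmark run finishes.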
## Environment

I ran all these tests on an m3.large EC2 instance (2 vCPU, 6.5 ECU, 7.5 GB RAM).

## Test results

### pts/nginx

Description: This is a test of ab, which is the Apache benchmark program. This test profile measures how many requests per second a given system can sustain when carrying out 500,000 requests with 100 requests being carried out concurrently.

#### baseline

```
NGINX Benchmark 1.0.11:
    pts/nginx-1.1.0
    Test 1 of 1
    Estimated Trial Run Count: 3
    Estimated Time To Completion: 5 Minutes
    Running Pre-Test Script @ 19:27:18
    Started Run 1 @ 19:27:23
    Started Run 2 @ 19:28:00
    Started Run 3 @ 19:28:37  [Std. Dev: 0.40%]
    Running Post-Test Script @ 19:29:11

    Test Results:
        14543.39
        14579.73
        14658.75

    Average: 14593.96 Requests Per Second
```

#### scap-open

```
NGINX Benchmark 1.0.11:
    pts/nginx-1.1.0
    Test 1 of 1
    Estimated Trial Run Count: 3
    Estimated Time To Completion: 5 Minutes
    Running Pre-Test Script @ 19:05:52
    Started Run 1 @ 19:05:57
    Started Run 2 @ 19:06:43
    Started Run 3 @ 19:07:30  [Std. Dev: 0.38%]
    Running Post-Test Script @ 19:08:14

    Test Results:
        11429.92
        11405.57
        11490.17

    Average: 11441.89 Requests Per Second
```

#### sysdig -N

```
NGINX Benchmark 1.0.11:
    pts/nginx-1.1.0
    Test 1 of 1
    Estimated Trial Run Count: 3
    Estimated Time To Completion: 5 Minutes
    Running Pre-Test Script @ 18:43:42
    Started Run 1 @ 18:43:47
    Started Run 2 @ 18:44:38
    Started Run 3 @ 18:45:30  [Std. Dev: 0.12%]
    Running Post-Test Script @ 18:46:19

    Test Results:
        10155.7
        10180.53
        10170.71
```

#### digwatch

```
NGINX Benchmark 1.0.11:
    pts/nginx-1.1.0
    Test 1 of 1
    Estimated Trial Run Count: 3
    Estimated Time To Completion: 4 Minutes
    Running Pre-Test Script @ 03:35:39
    Started Run 1 @ 03:35:44
    Started Run 2 @ 03:36:58
    Started Run 3 @ 03:38:12  [Std. Dev: 0.64%]
    Running Post-Test Script @ 03:39:25

    Test Results:
        6977.67
        6986.85
        6905.94

    Average: 6956.82 Requests Per Second
```
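For completeness, here is a rough sketch of how the three measurement configurations can be driven, assuming the Phoronix Test Suite, sysdig, and a digwatch build are installed. The digwatch invocation is only a placeholder (check `digwatch --help` for how your build loads a rules file such as `rules/base.txt`); the sysdig filter is the one described in the methodology above.

```
# Run as root (or with passwordless sudo) so the backgrounded tools can start.

# 1. Baseline: only the benchmark running.
phoronix-test-suite benchmark pts/nginx

# 2. Sysdig: capture with a filter that matches no events (proc.pid=0),
#    so only the instrumentation overhead is measured.
sudo sysdig -N proc.pid=0 &
SYSDIG_PID=$!
phoronix-test-suite benchmark pts/nginx
sudo kill "$SYSDIG_PID"

# 3. Digwatch: evaluate the ruleset while the benchmark runs.
#    (Placeholder invocation -- the exact way to point digwatch at
#    rules/base.txt depends on the build.)
sudo digwatch rules/base.txt &
DIGWATCH_PID=$!
phoronix-test-suite benchmark pts/nginx
sudo kill "$DIGWATCH_PID"
```

Each configuration should be run back to back on an otherwise idle machine, as in the m3.large runs above.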