This page documents some basic profiling of digwatch that I did in March 2016.

## Methodology

I used the Phoronix Test Suite to run some benchmarks. For a given benchmark, I did three runs (the nginx results below also include a `scap-open` run for comparison):

- Baseline: only the test suite running
- Sysdig: `sysdig -N proc.pid=0` while the test suite runs
- Digwatch: `digwatch` running with a reasonably sized ruleset while the test suite runs. Most events do *not* match the ruleset, which means that most of the digwatch-compiled filter is evaluated for most events. A sketch of these invocations follows the list.
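As a rough sketch, the three runs look like this (the digwatch command line is illustrative; I don't reproduce the exact flags here):

```
# Baseline: just the benchmark.
phoronix-test-suite benchmark pts/nginx

# Sysdig: capture and filter every event, but match nothing
# (proc.pid=0 never matches), so no events are printed.
sudo sysdig -N proc.pid=0 &
phoronix-test-suite benchmark pts/nginx

# Digwatch: evaluate the full ruleset against every event
# (exact command-line syntax illustrative).
sudo digwatch rules/base.txt &
phoronix-test-suite benchmark pts/nginx
```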
The full ruleset is [here](https://github.com/draios/digwatch/blob/972c84707fd7a786214cc0fbb4a8b753a0488ebb/rules/base.txt). It comprises 18 rules. Many of these rules have just a few expressions, but some are very long. In particular, one rule uses `system_binaries`, which expands to something like `proc.name in (truncate, sha1sum, numfmt, fmt, fold, uniq, ...)`, enumerating over one hundred names. The full rule is `fd.sockfamily = ip and system_binaries`, which means that for every network read/write event, the process name is compared against this list of 100+ names.
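Putting those two quoted fragments together, the condition evaluated for every network read/write event expands to roughly the following (name list abbreviated):

```
fd.sockfamily = ip and proc.name in (truncate, sha1sum, numfmt, fmt, fold, uniq, ...)
```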
## Summary of results

For three benchmarks (apache, nginx, sqlite), digwatch shows little or no performance degradation compared to sysdig. However, two of these (apache and nginx) show a significant degradation of both sysdig and digwatch compared to baseline.
For the redis benchmark, digwatch _does_ show a significant degradation compared to sysdig. My guess was that the culprit might be the `system_binaries` rule (which has a chain of 100+ tests like `proc.name = truncate or proc.name = ls or ...`), but that turned out not to be the case: running the same ruleset with the `fd.sockfamily = ip and system_binaries` rule disabled did not change the result. (As so often with performance, the obvious explanation was not the right one.)
So I tried running digwatch in a variety of ways with a subset of rules, and found an essentially linear improvement as I removed more and more rules; no single rule explained the dropoff. Removing the first chunk of rules (the first two-thirds) made no difference. After that, each removed rule improved performance on the order of 5-10%, until sysdig's performance was reached. This corresponds to digwatch's CPU usage going from a 100% flatline down to the 30-50% level.
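The bisection was done by hand, but in script form it would look something like this (hypothetical sketch: `pts/redis` stands in for the affected benchmark, the digwatch command line is illustrative, and `head -n` assumes one rule per line in the ruleset):

```
# Benchmark with only the first N rules enabled, shrinking the set each pass.
for n in 18 15 12 9 6 3; do
  head -n "$n" rules/base.txt > /tmp/rules-subset.txt
  sudo digwatch /tmp/rules-subset.txt &   # illustrative invocation
  dw_pid=$!
  phoronix-test-suite benchmark pts/redis
  sudo kill "$dw_pid"
done
```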
Profile of system calls for nginx (about 1-1.5M events per second per CPU, going by the kernel logs when the driver is loaded with `insmod verbose=1`):

```
# Calls   Syscall
--------------------------------------------------------------------------------
2935528   gettimeofday
2879033   epoll_ctl
1223131   close
1217542   write
1120296   recvfrom
 909506   epoll_wait
 906429   read
 803527   fcntl
 799420   connect
```
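For reference, a per-syscall profile like the one above can be collected with sysdig's `topscalls` chisel while the benchmark runs, and the event rate can be read from the kernel log when the driver is loaded with verbose output (a sketch; module path and log format vary by install):

```
# Count syscalls during the benchmark; Ctrl-C prints the table.
sudo sysdig -c topscalls

# Event-rate figure: load the driver with verbose logging and watch dmesg.
sudo insmod sysdig-probe.ko verbose=1
dmesg | tail
```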
## Environment

I ran all these tests on an m3.large EC2 instance (2 vCPU, 6.5 ECU, 7.5 GB RAM).
## Test results

### pts/nginx

Description: This is a test of ab, which is the Apache benchmark program. This test profile measures how many requests per second a given system can sustain when carrying out 500,000 requests with 100 requests being carried out concurrently.
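With the Phoronix Test Suite installed, this profile can be reproduced directly (`benchmark` installs the test if needed, then runs it):

```
phoronix-test-suite benchmark pts/nginx
```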
#### baseline

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 5 Minutes
Running Pre-Test Script @ 19:27:18
Started Run 1 @ 19:27:23
Started Run 2 @ 19:28:00
Started Run 3 @ 19:28:37 [Std. Dev: 0.40%]
Running Post-Test Script @ 19:29:11

Test Results:
14543.39
14579.73
14658.75

Average: 14593.96 Requests Per Second
```
#### scap-open

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 5 Minutes
Running Pre-Test Script @ 19:05:52
Started Run 1 @ 19:05:57
Started Run 2 @ 19:06:43
Started Run 3 @ 19:07:30 [Std. Dev: 0.38%]
Running Post-Test Script @ 19:08:14

Test Results:
11429.92
11405.57
11490.17

Average: 11441.89 Requests Per Second
```
#### sysdig -N

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 5 Minutes
Running Pre-Test Script @ 18:43:42
Started Run 1 @ 18:43:47
Started Run 2 @ 18:44:38
Started Run 3 @ 18:45:30 [Std. Dev: 0.12%]
Running Post-Test Script @ 18:46:19

Test Results:
10155.7
10180.53
10170.71
```
#### digwatch

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 4 Minutes
Running Pre-Test Script @ 03:35:39
Started Run 1 @ 03:35:44
Started Run 2 @ 03:36:58
Started Run 3 @ 03:38:12 [Std. Dev: 0.64%]
Running Post-Test Script @ 03:39:25

Test Results:
6977.67
6986.85
6905.94

Average: 6956.82 Requests Per Second
```