This page documents some basic profiling of digwatch that I did in March 2016.

## Methodology

I used the Phoronix Test Suite to run some benchmarks. For a given benchmark, I did three runs (the nginx results below also include a `scap-open` run for comparison):

- Baseline: only the test suite running
- Sysdig: `sysdig -N proc.pid=0` while the test suite runs
- Digwatch: `digwatch` running with a reasonably sized ruleset while the test suite runs. Most events do *not* match the ruleset, which means that most of the digwatch-compiled filter is evaluated for most events. A sketch of these invocations follows the list.
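As a rough sketch, the three runs look like this (the digwatch command line is illustrative; I don't reproduce the exact flags here):

```
# Baseline: just the benchmark.
phoronix-test-suite benchmark pts/nginx

# Sysdig: capture and filter every event, but match nothing
# (proc.pid=0 never matches), so no events are printed.
sudo sysdig -N proc.pid=0 &
phoronix-test-suite benchmark pts/nginx

# Digwatch: evaluate the full ruleset against every event
# (exact command-line syntax illustrative).
sudo digwatch rules/base.txt &
phoronix-test-suite benchmark pts/nginx
```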
The full ruleset is [here](https://github.com/draios/digwatch/blob/972c84707fd7a786214cc0fbb4a8b753a0488ebb/rules/base.txt). It comprises 18 rules. Many of these rules have just a few expressions, but some are very long. In particular, one rule uses `system_binaries`, which expands to something like `proc.name in (truncate, sha1sum, numfmt, fmt, fold, uniq, ...)`, enumerating over one hundred names. The full rule is `fd.sockfamily = ip and system_binaries`, which means that for every network read/write event, the process name is compared against this list of 100+ names.
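Putting those two quoted fragments together, the condition evaluated for every network read/write event expands to roughly the following (name list abbreviated):

```
fd.sockfamily = ip and proc.name in (truncate, sha1sum, numfmt, fmt, fold, uniq, ...)
```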
## Summary of results

For three benchmarks (apache, nginx, sqlite), digwatch shows little or no performance degradation compared to sysdig. However, two of these (apache and nginx) show a significant degradation of both sysdig and digwatch compared to baseline.
For the redis benchmark, digwatch _does_ show a significant degradation compared to sysdig. My guess was that the culprit might be the `system_binaries` rule (which has a chain of 100+ tests like `proc.name = truncate or proc.name = ls or ...`), but that turned out not to be the case: running the same ruleset with the `fd.sockfamily = ip and system_binaries` rule disabled did not change the result. (As so often with performance, the obvious explanation was not the right one.)
So I tried running digwatch in a variety of ways with a subset of rules, and found an essentially linear improvement as I removed more and more rules; no single rule explained the dropoff. Removing the first chunk of rules (the first two-thirds) made no difference. After that, each removed rule improved performance on the order of 5-10%, until sysdig's performance was reached. This corresponds to digwatch's CPU usage going from a 100% flatline down to the 30-50% level.
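The bisection was done by hand, but in script form it would look something like this (hypothetical sketch: `pts/redis` stands in for the affected benchmark, the digwatch command line is illustrative, and `head -n` assumes one rule per line in the ruleset):

```
# Benchmark with only the first N rules enabled, shrinking the set each pass.
for n in 18 15 12 9 6 3; do
  head -n "$n" rules/base.txt > /tmp/rules-subset.txt
  sudo digwatch /tmp/rules-subset.txt &   # illustrative invocation
  dw_pid=$!
  phoronix-test-suite benchmark pts/redis
  sudo kill "$dw_pid"
done
```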
Profile of system calls for nginx (about 1-1.5M events per second per CPU, going by the kernel logs when the driver is loaded with `insmod verbose=1`):

```
# Calls   Syscall
--------------------------------------------------------------------------------
2935528   gettimeofday
2879033   epoll_ctl
1223131   close
1217542   write
1120296   recvfrom
 909506   epoll_wait
 906429   read
 803527   fcntl
 799420   connect
```
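For reference, a per-syscall profile like the one above can be collected with sysdig's `topscalls` chisel while the benchmark runs, and the event rate can be read from the kernel log when the driver is loaded with verbose output (a sketch; module path and log format vary by install):

```
# Count syscalls during the benchmark; Ctrl-C prints the table.
sudo sysdig -c topscalls

# Event-rate figure: load the driver with verbose logging and watch dmesg.
sudo insmod sysdig-probe.ko verbose=1
dmesg | tail
```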
## Environment

I ran all these tests on an m3.large EC2 instance (2 vCPU, 6.5 ECU, 7.5 GB RAM).
## Test results

### pts/nginx

Description: This is a test of ab, which is the Apache benchmark program. This test profile measures how many requests per second a given system can sustain when carrying out 500,000 requests with 100 requests being carried out concurrently.
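With the Phoronix Test Suite installed, this profile can be reproduced directly (`benchmark` installs the test if needed, then runs it):

```
phoronix-test-suite benchmark pts/nginx
```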
#### baseline

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 5 Minutes
Running Pre-Test Script @ 19:27:18
Started Run 1 @ 19:27:23
Started Run 2 @ 19:28:00
Started Run 3 @ 19:28:37 [Std. Dev: 0.40%]
Running Post-Test Script @ 19:29:11

Test Results:
14543.39
14579.73
14658.75

Average: 14593.96 Requests Per Second
```
#### scap-open

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 5 Minutes
Running Pre-Test Script @ 19:05:52
Started Run 1 @ 19:05:57
Started Run 2 @ 19:06:43
Started Run 3 @ 19:07:30 [Std. Dev: 0.38%]
Running Post-Test Script @ 19:08:14

Test Results:
11429.92
11405.57
11490.17

Average: 11441.89 Requests Per Second
```
#### sysdig -N

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 5 Minutes
Running Pre-Test Script @ 18:43:42
Started Run 1 @ 18:43:47
Started Run 2 @ 18:44:38
Started Run 3 @ 18:45:30 [Std. Dev: 0.12%]
Running Post-Test Script @ 18:46:19

Test Results:
10155.7
10180.53
10170.71
```
#### digwatch

```
NGINX Benchmark 1.0.11:
pts/nginx-1.1.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 4 Minutes
Running Pre-Test Script @ 03:35:39
Started Run 1 @ 03:35:44
Started Run 2 @ 03:36:58
Started Run 3 @ 03:38:12 [Std. Dev: 0.64%]
Running Post-Test Script @ 03:39:25

Test Results:
6977.67
6986.85
6905.94

Average: 6956.82 Requests Per Second
```