Add test that cover reading from multiple sets of rule files and
disabling rules. Specific changes:
- Modify falco to allow multiple -r arguments to read from multiple
files.
- In the test multiplex file, add a disabled_rules attribute,
containing a sequence of rules to disable. Result in -D arguments
when running falco.
- In the test multiplex file, 'rules_file' can be a sequence. It
results in multiple -r arguments when running falco.
- In the test multiplex file, 'detect_level' can be a squence of
multiple severity levels. All levels will be checked for in the
output.
- Move all test rules files to a rules subdirectory and all trace files
to a traces subdirectory.
- Add a small trace file for a simple cat of /dev/null. Used by the
new tests.
- Add the following new tests:
- Reading from multiple files, with the first file being
empty. Ensure that the rules from the second file are properly
loaded.
- Reading from multiple files with the last being empty. Ensures
that the empty file doesn't overwrite anything from the first
file.
- Reading from multiple files with varying severity levels for each
rule. Ensures that both files are properly read.
- Disabling rules from a rules file, both with full rule names
and regexes. Will result in not detecting anything.
Create standalone classes falco_engine/falco_outputs that can be
embedded in other programs. falco_engine is responsible for matching
events against rules, and falco_output is responsible for formatting an
alert string given an event and writing the alert string to all
configured outputs.
falco_engine's main interfaces are:
- load_rules/load_rules_file: Given a path to a rules file or a string
containing a set of rules, load the rules. Also loads needed lua code.
- process_event(): check the event against the set of rules and return
the results of a match, if any.
- describe_rule(): print details on a specific rule or all rules.
- print_stats(): print stats on the rules that matched.
- enable_rule(): enable/disable any rules matching a pattern. New falco
command line option -D allows you to disable one or more rules on the
command line.
falco_output's main interfaces are:
- init(): load needed lua code.
- add_output(): add an output channel for alert notifications.
- handle_event(): given an event that matches one or more rules, format
an alert message and send it to any output channels.
Each of falco_engine/falco_output maintains a separate lua state and
loads separate sets of lua files. The code to create and initialize the
lua state is in a base class falco_common.
falco_engine no longer logs anything. In the case of errors, it throws
exceptions. falco_logger is now only used as a logging mechanism for
falco itself and as an output method for alert messages. (This should
really probably be split, but it's ok for now).
falco_engine contains an sinsp_evttype_filter object containing the set
of eventtype filters. Instead of calling
m_inspector->add_evttype_filter() to add a filter created by the
compiler, call falco_engine::add_evttype_filter() instead. This means
that the inspector runs with a NULL filter and all events are returned
from do_inspect. This depends on
https://github.com/draios/sysdig/pull/633 which has a wrapper around a
set of eventtype filters.
Some additional changes along with creating these classes:
- Some cleanups of unnecessary header files, cmake include_directory()s,
etc to only include necessary includes and only include them in header
files when required.
- Try to avoid 'using namespace std' in header files, or assuming
someone else has done that. Generally add 'using namespace std' to all
source files.
- Instead of using sinsp_exception for all errors, define a
falco_engine_exception class for exceptions coming from the falco
engine and use it instead. For falco program code, switch to general
exceptions under std::exception and catch + display an error for all
exceptions, not just sinsp_exceptions.
- Remove fields.{cpp,h}. This was dead code.
- Start tracking counts of rules by priority string (i.e. what's in the
falco rules file) as compared to priority level (i.e. roughtly
corresponding to a syslog level). This keeps the rule processing and
rule output halves separate. This led to some test changes. The regex
used in the test is now case insensitive to be a bit more flexible.
- Now that https://github.com/draios/sysdig/pull/632 is merged, we can
delete the rules object (and its lua_parser) safely.
- Move loading the initial lua script to the constructor. Otherwise,
calling load_rules() twice re-loads the lua script and throws away any
state like the mapping from rule index to rule.
- Allow an empty rules file.
Finally, fix most memory leaks found by valgrind:
- falco_configuration wasn't deleting the allocated m_config yaml
config.
- several ifstreams were being created simply to test which falco
config file to use.
- In the lua output methods, an event formatter was being created using
falco.formatter() but there was no corresponding free_formatter().
This depends on changes in https://github.com/draios/sysdig/pull/640.
When the root directory contains the name 'agent', assume we're running
an agent and provide appropriate configuration and run the agent using
dragent.
You can make autodrop or falco configurable within the agent via
--agent-autodrop and --falco-agent.
Also include some other small changes like timestamping the json points.
Add tests that verify that the event type identification functionality
is working. Notable changes:
- Modify falco_test.py to additionally check for warnings when loading
any set of rules and verify that the event types for each rule match
expected values. This is controlled by the new multiplex fields
"rules_warning" and "rules_events".
- Instead of starting with an empty falco_tests.yaml from scratch from
the downloaded trace files, use a checked-in version which defines
two tests:
- Loading the checked-in falco_rules.yaml and verify that no rules
have warnings.
- A sample falco_rules_warnings.yaml that has ~30 different
mutations of rule filtering expressions. The test verifies for each
rule whether or not the rule should result in a warning and what the
extracted event types are.
The generated tests from the trace files are appended to this file.
- Add an empty .scap file to use with the above tests.
Add shell scripts to make it easier to collect performance results from
traces, live tests, and phoronix tests.
With run_performance_tests.sh you specify the following:
- a subject program to run, using --root
- a name to give to this set of results, using --variant
- a test to run, using --test
- a file to write the results to, using --results.
For tests that start with "trace", the script runs falco/sysdig on the
trace file and measures the time taken to read the file. For other
tests, he script handles starting falco/sysdig, starting a cpu
measurement script (a wrapper around top, just to provide identical
values to what you would see using top) to measure the cpu usage of
falco/sysdig, and running a live test.
The measurement interval for cpu usage depends on the test being run--10
seconds for most tests, 2 seconds for shorter tests.
The output is written as json to the file specified in --results.
Also add R scripts to easily display the results from the shell
script. plot-live.r shows a linechart of the cpu usage for the provided
variants over time. plot-traces.r shows grouped barcharts showing
user/system/total time taken for the provided variants and traces.
One bug--you have to make the results file actual json by adding
leading/trailing []s.
Pass the travis branch to run_regression_tests.sh. When downloading
trace files, first look for a file traces-XXX-$BRANCH and if found
download it. This allows testing out a set of changes with a trace file
specifically for that branch, that can be moved to the normal file once
the PR is merged.
Also increase the timeout for the spawned falco process from 1 to 3
minutes. In debug mode, the kubernetes demo was taking slightly over 1
minute.
Modify falco_test.py to look for a boolean multiplex attribute
'json_output'. If true, examine the lines of the output and for any line
that begins with '{', parse it as json and ensure it has the 4
attributes we expect.
Modify run_regression_tests to have a utility function
prepare_multiplex_fileset that does the work of looping over files in a
directory, along with detect, level, and json output arguments. The
appropriate multiplex attributes are added for each file.
Use that utility function to test json output for the positive and
informational directories along with non-json output. The negative
directory is only tested once.
Add additional rules related to using pipe installers within a fbash
session:
- Modify write_etc to only trigger if *not* in a fbash session. There's
a new rule write_etc_installer which has the same conditions when in
a fbash session, logging at INFO severity.
- A new rule write_rpm_database warns if any non package management
program tries to write below /var/lib/rpm.
- Add a new warning if any program below a fbash session tries to open
an outbound network connection on ports other than http(s) and dns.
- Add INFO level messages when programs in a fbash session try to run
package management binaries (rpm,yum,etc) or service
management (systemctl,chkconfig,etc) binaries.
In order to test these new INFO level rules, make up a third class of
trace files traces-info.zip containing trace files that should result in
info-level messages.
To differentiate warning and info level detection, add an attribute to
the multiplex file "detect_level", which is "Warning" for the files in
traces-positive and "Info" for the files in traces-info. Modify
falco_test.py to look specifically for a non-zero count for the given
detect_level.
Doing this exposed a bug in the way the level-specific counts were being
recorded--they were keeping counts by level name, not number. Fix that.
Do another round of rule cleanups now that we have a larger set of
positive and negative trace files to work with. Outside of this commit,
there are now trace files for all the positive rules, a docker-compose
startup and teardown, and some trace files from the sysdig cloud staging
environment.
Also add a script that runs sysdig with a filter that removes all the
syscalls not handled by falco as well as a few other high-volume,
low-information syscalls. This script was used to create the staging
environment trace files.
Notable rule changes:
- The direction for write_binary_dir/write_etc needs to be exit instead
of enter, as the bin_dir clause works on the file descriptor returned
by the open/openat call.
- Add login as a trusted binary that can read sensitive files (occurs
for direct console logins).
- sshd can read sensitive files well after startup, so exclude it from
the set of binaries that can trigger
read_sensitive_file_trusted_after_startup.
- limit run_shell_untrusted to non-containers.
- Disable the ssh_error_syslog rule for now. With the current
restriction on system calls (no read/write/sendto/recvfrom/etc), you
won't see the ssh error messages. Nevertheless, add a string to look
for to indicate ssh errors and add systemd's true location for the
syslog device.
- Sshd attemps to setuid even when it's not running as root, so exclude
it from the set of binaries to monitor for now.
- Let programs that are direct decendants of systemd spawn user
management tasks for now.
- Temporarily disable the EACCESS rule. This rule is exposing a bug in
sysdig in debug mode, https://github.com/draios/sysdig/issues/598. The
rule is also pretty noisy so I'll keep it disabled until the sysdig bug
is fixed.
- The etc_dir and bin_dir macros both have the problem that they match
pathnames with /etc/, /bin/, etc in the middle of the path, as sysdig
doesn't have a "begins with" comparison. Add notes for that.
- Change spawn_process to spawned_process to indicate that it's for the
exit side of the execve. Also use it in a few places that were
looking for the same conditions without any macro.
- Get rid of adduser_binaries and fold any programs not already present
into shadowutils_binaries.
- Add new groups sysdigcloud_binaries and sysdigcloud_binaries_parent
and add them as exceptions for write_etc/write_binary_dir.
- Add yum as a package management binary and add it as an exception to
write_etc/write_binary_dir.
- Change how db_program_spawned_process works. Since all of the useful
information is on the exit side of the event, you can't really add a
condition based on the process being new. Isntead, have the rule
check for a non-database-related program being spawned by a
database-related program.
- Allow dragent to run shells.
- Add sendmail, sendmail-msp as a program that attempts to setuid.
- Some of the *_binaries macros that were based on dpkg -L accidentally
contained directories in addition to end files. Trim those.
- Add systemd-logind as a login_binary.
- Add unix_chkpwd as a shadowutils_binary.
- Add parentheses around any macros that group items using or. I found
this necessary when the macro is used in the middle of a list of and
conditions.
- Break out system_binaries into a new subset user_mgmt_binaries
containing login_, passwd_, and shadowutils_ binaries. That way you
don't have to pull in all of system_binaries when looking for
sensisitive files or user management activity.
- Rename fs-bash to fbash, thinking ahead to its more likely name.
Start using the Avocado framework for automated regression
testing. Create a test FalcoTest in falco_test.py which can run on a
collection of trace files. The script test/run_regression_tests.sh is
responsible for pulling zip files containing the positive (falco should
detect) and negative (falco should not detect) trace files, creating a
Avocado multiplex file that defines all the tests (one for each trace
file), running avocado on all the trace files, and showing full logs for
any test that didn't pass.
The old regression script, which simply ran falco, has been removed.
Modify falco's stats output to show the total number of events detected
for use in the tests.
In travis.yml, pull a known stable version of avocado and build it,
including installing any dependencies, as a part of the build process.