Do another round of rule cleanups now that we have a larger set of
positive and negative trace files to work with. Outside of this commit,
there are now trace files for all the positive rules, a docker-compose
startup and teardown, and some trace files from the sysdig cloud staging
environment.
Also add a script that runs sysdig with a filter that removes all the
syscalls not handled by falco as well as a few other high-volume,
low-information syscalls. This script was used to create the staging
environment trace files.
Notable rule changes:
- The direction for write_binary_dir/write_etc needs to be exit instead
of enter, as the bin_dir clause works on the file descriptor returned
by the open/openat call.
- Add login as a trusted binary that can read sensitive files (occurs
for direct console logins).
- sshd can read sensitive files well after startup, so exclude it from
the set of binaries that can trigger
read_sensitive_file_trusted_after_startup.
- limit run_shell_untrusted to non-containers.
- Disable the ssh_error_syslog rule for now. With the current
restriction on system calls (no read/write/sendto/recvfrom/etc), you
won't see the ssh error messages. Nevertheless, add a string to look
for to indicate ssh errors and add systemd's true location for the
syslog device.
- Sshd attemps to setuid even when it's not running as root, so exclude
it from the set of binaries to monitor for now.
- Let programs that are direct decendants of systemd spawn user
management tasks for now.
- Temporarily disable the EACCESS rule. This rule is exposing a bug in
sysdig in debug mode, https://github.com/draios/sysdig/issues/598. The
rule is also pretty noisy so I'll keep it disabled until the sysdig bug
is fixed.
- The etc_dir and bin_dir macros both have the problem that they match
pathnames with /etc/, /bin/, etc in the middle of the path, as sysdig
doesn't have a "begins with" comparison. Add notes for that.
- Change spawn_process to spawned_process to indicate that it's for the
exit side of the execve. Also use it in a few places that were
looking for the same conditions without any macro.
- Get rid of adduser_binaries and fold any programs not already present
into shadowutils_binaries.
- Add new groups sysdigcloud_binaries and sysdigcloud_binaries_parent
and add them as exceptions for write_etc/write_binary_dir.
- Add yum as a package management binary and add it as an exception to
write_etc/write_binary_dir.
- Change how db_program_spawned_process works. Since all of the useful
information is on the exit side of the event, you can't really add a
condition based on the process being new. Isntead, have the rule
check for a non-database-related program being spawned by a
database-related program.
- Allow dragent to run shells.
- Add sendmail, sendmail-msp as a program that attempts to setuid.
- Some of the *_binaries macros that were based on dpkg -L accidentally
contained directories in addition to end files. Trim those.
- Add systemd-logind as a login_binary.
- Add unix_chkpwd as a shadowutils_binary.
- Add parentheses around any macros that group items using or. I found
this necessary when the macro is used in the middle of a list of and
conditions.
- Break out system_binaries into a new subset user_mgmt_binaries
containing login_, passwd_, and shadowutils_ binaries. That way you
don't have to pull in all of system_binaries when looking for
sensisitive files or user management activity.
- Rename fs-bash to fbash, thinking ahead to its more likely name.
Start using the Avocado framework for automated regression
testing. Create a test FalcoTest in falco_test.py which can run on a
collection of trace files. The script test/run_regression_tests.sh is
responsible for pulling zip files containing the positive (falco should
detect) and negative (falco should not detect) trace files, creating a
Avocado multiplex file that defines all the tests (one for each trace
file), running avocado on all the trace files, and showing full logs for
any test that didn't pass.
The old regression script, which simply ran falco, has been removed.
Modify falco's stats output to show the total number of events detected
for use in the tests.
In travis.yml, pull a known stable version of avocado and build it,
including installing any dependencies, as a part of the build process.
At shutdown, print stats on the number of rules triggered by severity
and rule name. This is done by a lua function print_stats and the
associated table rule_output_counts.
When passing rules to outputs, update the counts in rule_output_counts.
Add signal handlers for SIGINT/SIGTERM that set a shutdown
flag. Initialize the live inspector with a timeout so the main loop can
watch the flag set by the signal handlers.
We found during testing that rules without syscall/event conditions are
slower than other rules, so take a pass over the existing set of rules
ensuring that whenever possible they have a condition. The changes are:
- Only process executions by interactive users are monitored
- Only look at connect/listen/etc for system binaries performing
network activity
- Only monitor process executions when monitoring user management
programs.
Also comment out all application rules by default so users can opt-in
for the applications they use instead of getting a lot of application
monitoring they may not need. Add a note stating they're all disabled by
default and can be re-enabled as needed.
Finally, remove some less common applications where we haven't done live
testing.
These 3 changes, along with those in
https://github.com/draios/sysdig/pull/592, result in a significant
performance increase on busy servers.
For rules where evt.args had useful information but too much
information, add back specific values that have just the useful argument
from the event:
- spawned shells contain the commandline--it's the exit half of the
exec event so the current commandline is what was exec()d to.
- setuid contains the uid being switched to.
While I was testing these, I had a couple of other fixes:
- In the spawn shells rule, only track execve events so you don't catch
clone() events that precede an exec.
- in spawn_process only consider the exit half of the exec event.
We'll probably want a more formal set of documentation soon, but at
least they're mentioned now.
Also remove socket from the list of discarded events, thinking ahead to
when https://github.com/draios/sysdig/pull/591 will be merged.
A new macro package_mgmt_binaries includes dpkg and rpm. Those programs
are allowed to create directories and modify files below binary
directories. I'm not adding them to other trusted sets for now, though.
Try to clean up the language of the existing rule set, expanding the
output when possible, removing %evt.dir in most cases.
There is one substantive change: the mkdir half of modify_binary_dirs
was split out into its own rule mkdir_binary_dirs.
When run with -l <rule>, falco will print the name/description of the
single rule <rule> and exit. With -L, falco will print the
name/description of all rules.
All the work is done in lua in the rule loader. A new lua function
describe_rule calls the local function describe_single_rule once or
multiple times depending on -l/-L. describe_single_rule prints the rule
name and a wrapped version of the rule description.
Add name and description fields to all rules. The name field is actually
a field called 'rule', which corresponds to the 'macro' field for
macros.
Within the rule loader, the state changes slightly. There are two
indices into the set of rules 'rules_by_name' and
'rules_by_idx' (formerly 'outputs'). They both now contain the original
table from the yaml parse. One field 'level' is added which is the
priority mapped to a number.
Get rid of the notion of default priority or output. Every rule must now
provide both.
Go through all current rules and add names and descriptions.
Update rules to reduce FPs after running against some real-world
environments with and without containers. Summary of changes:
- Too many processes read /etc/passwd--it's world-readable and a
side-effect of getpwent. Switch to /etc/shadow instead.
- Add a mail_binaries group. This wasn't directly used, but it may be
handy for other rules and goes along with the changes in #54.
- not_cron was the only macro expressing a negative, so switch it to be
a positive 'cron'. Also add crond as a cron process.
- add dragent to the set of programs that can call setns.
- For the shell detection rules, change them to only look for the
specific exec/clone event rather than all follow-on activity. Also
allow docker to spawn shell scripts--this is required for entrypoints
that use the shell instead of a direct exec. Also add a few
additional programs that can spawn shells.
- In containers, shells are allowed as long as the parent process is
docker or bash. Like the outside of container case, only the initial
clone/exec is detected.
- Fix a typo Sytem -> System.
- Change the chmod rule to only protect imporant/sensitive files. I saw
lots of "regular" files being chmod()ed.
- Change the setuid test to allow root to setuid to anything, rather
than listing a bunch of programs run as root that drop privileges.
- Allow running su/sudo in containers. Some containers add users from a
base linux distribution before running.
It isn't being used yet, for now we're using the corresponding script
from the sysdig repo. Removing it to avoid confusion, we can later
re-add as necessary.
Instead of running bash as the sysdig container does, run falco. This
makes sense as falco doesn't have a general purpose use like sysdig
does.
To make it easier to run both in docker and as a daemon using the
default command line, enable both syslog and stdout/stderr output by
default. Now that falco dups stdout/stderr to /dev/null when
daemonizing, the stdout/stderr is just thrown away. And when running in
docker, the syslog output will just be discarded unless someone plumbs
the container's syslog output.
Update README.md to reflect that specifying the falco command is not
necessary.
This will detect the result of some sql injection attacks where the
injected query tries to spawn a process.
We don't include web servers in this list for now due to things like
mod_perl, mod_php, etc. Maybe we can add it once we make exceptions for
those modules.
Add back detection for mysql and sensitive files that was removed in the
previous commit. A new macro proc_is_new adds a condition on how long a
process has been running.
A new rule triggers if the process is not new and tries to open a
sensitive file. This handles cases like mysql, where it *does* read
/etc/passwd on startup but shouldn't really open it afterward.
Add some new groups of binary programs as macros and start using them in
the set of rules:
- docker_binaries: docker and exe (which is a temporary process name
for processes like docker-proxy)
- http_server_binaries: httpd, nginx, and similar
- db_server_binaries: mysql for now, we'll add more later
- server_binaries: all of the above
- userexec_binaries: sudo and su.
Start using these groups in the rules. Most of the time, changing from
the inline lists of processes to macros was a no-op. There are some
actual changes, though:
- docker and exe are now allowed to read 'sensitive' files. They may
not actually do so, but it's not really harmful.
- lighttpd is now allowed to read 'sensitive' files, via inclusion in
http_server_binaries.
- su, lighttpd, and docker can now setuid.
- http-foreground is included as a http server wrt non-port 80/443 ports.
I'm going to use these macros in some of the following rules.
This actually prevents detection of mysql reading sensitive files, which
is one of the demo scenarios (sql injection). I plan on adding this
detection back in the next commit.