Make changes to rules to improve performance and reduce FPs:
- Rely on https://github.com/draios/sysdig/pull/610 that allows
specifying an open/openat for reading/writing without having to search
through all the flags individually.
- For a two-item list (open, openat), and thinking ahead to
https://github.com/draios/sysdig/pull/624, check the event type
individually instead of as a set membership test, which is a bit
faster.
- Switch to consistently using evt.type instead of syscall.type.
- Move positive tests like etc_dir, bin_dir, sensitive_files,
proc.sname, etc., which are most likely to not succeed, to the
beginning of rules, so they have a greater chance to cause the rest of
the rule to be skipped, which saves time.
- Using exim as a mail program--exim also can suid to root.
- add a new macro for ssl management binaries and allow them to write
below /etc and read sensitive files.
- add a new macro for dhcp client binaries and allow them to write below
/etc.
- Add exe (docker-related program) as a program that can set a namespace
using setns.
- Don't count /dev/tty as an important file under /dev.
https://github.com/draios/sysdig/pull/623 adds support for a startswith
operator to allow for string prefix matching. Modify the parser to
recognize that operator, and use that operator for rules that really
want to check the beginning of a pathname, directory, etc. to make them
faster and avoid FPs.
Once sysdig adds support for handling "in (...)" filter expressions as
set membership tests, it will be advantageous to combine lists of items
together into a single list so they can all be checked in a single set
membership test.
This commit adds support for a new yaml item type "list" containing a
field "name" and field "items" containing a list of items. These are
represented as a yaml list, which allows yaml to handle some of the
initial parsing with the list items maintained natively in lua.
Allow lists to contain list references by expanding any references to
the items in the list, before storing the list items in
state.lists.
When parsing macro or rule conditions, replace all references to a list
name with the list items as a comma separated string.
Modify the falco rules to switch to lists whenever possible. The
new convention is to use the suffix _binaries for lists of program names
and _procs for macros that define a filter expression using the list.
Adding docker-compose based example of man-in-the-middle attack against
installation scripts and how it can be detected using sysdig falco.
The docker-compose environment starts a good web server, compromised
nginx installation, evil web server, and a copy of sysdig falco. The
README walks through the process of compromising a client by using curl
http://localhost/get-software.sh | bash and detecting the compromise
using ./fbash.
The fbash program included in this example fixes https://github.com/draios/falco/issues/46.
Add additional rules related to using pipe installers within a fbash
session:
- Modify write_etc to only trigger if *not* in a fbash session. There's
a new rule write_etc_installer which has the same conditions when in
a fbash session, logging at INFO severity.
- A new rule write_rpm_database warns if any non package management
program tries to write below /var/lib/rpm.
- Add a new warning if any program below a fbash session tries to open
an outbound network connection on ports other than http(s) and dns.
- Add INFO level messages when programs in a fbash session try to run
package management binaries (rpm,yum,etc) or service
management (systemctl,chkconfig,etc) binaries.
In order to test these new INFO level rules, make up a third class of
trace files traces-info.zip containing trace files that should result in
info-level messages.
To differentiate warning and info level detection, add an attribute to
the multiplex file "detect_level", which is "Warning" for the files in
traces-positive and "Info" for the files in traces-info. Modify
falco_test.py to look specifically for a non-zero count for the given
detect_level.
Doing this exposed a bug in the way the level-specific counts were being
recorded--they were keeping counts by level name, not number. Fix that.
Update fbash rules to use proc.sname instead of proc.aname and to rely
on sessions instead of process ancestors.
I also wanted to add details on the address/port being listened to but
that's blocked on https://github.com/draios/falco/issues/86.
Along with this change, there are new positive trace files
installer-bash-starts-network-server.scap and
installer-bash-starts-session.scap that test these updated rules.
Do another round of rule cleanups now that we have a larger set of
positive and negative trace files to work with. Outside of this commit,
there are now trace files for all the positive rules, a docker-compose
startup and teardown, and some trace files from the sysdig cloud staging
environment.
Also add a script that runs sysdig with a filter that removes all the
syscalls not handled by falco as well as a few other high-volume,
low-information syscalls. This script was used to create the staging
environment trace files.
Notable rule changes:
- The direction for write_binary_dir/write_etc needs to be exit instead
of enter, as the bin_dir clause works on the file descriptor returned
by the open/openat call.
- Add login as a trusted binary that can read sensitive files (occurs
for direct console logins).
- sshd can read sensitive files well after startup, so exclude it from
the set of binaries that can trigger
read_sensitive_file_trusted_after_startup.
- limit run_shell_untrusted to non-containers.
- Disable the ssh_error_syslog rule for now. With the current
restriction on system calls (no read/write/sendto/recvfrom/etc), you
won't see the ssh error messages. Nevertheless, add a string to look
for to indicate ssh errors and add systemd's true location for the
syslog device.
- Sshd attemps to setuid even when it's not running as root, so exclude
it from the set of binaries to monitor for now.
- Let programs that are direct decendants of systemd spawn user
management tasks for now.
- Temporarily disable the EACCESS rule. This rule is exposing a bug in
sysdig in debug mode, https://github.com/draios/sysdig/issues/598. The
rule is also pretty noisy so I'll keep it disabled until the sysdig bug
is fixed.
- The etc_dir and bin_dir macros both have the problem that they match
pathnames with /etc/, /bin/, etc in the middle of the path, as sysdig
doesn't have a "begins with" comparison. Add notes for that.
- Change spawn_process to spawned_process to indicate that it's for the
exit side of the execve. Also use it in a few places that were
looking for the same conditions without any macro.
- Get rid of adduser_binaries and fold any programs not already present
into shadowutils_binaries.
- Add new groups sysdigcloud_binaries and sysdigcloud_binaries_parent
and add them as exceptions for write_etc/write_binary_dir.
- Add yum as a package management binary and add it as an exception to
write_etc/write_binary_dir.
- Change how db_program_spawned_process works. Since all of the useful
information is on the exit side of the event, you can't really add a
condition based on the process being new. Isntead, have the rule
check for a non-database-related program being spawned by a
database-related program.
- Allow dragent to run shells.
- Add sendmail, sendmail-msp as a program that attempts to setuid.
- Some of the *_binaries macros that were based on dpkg -L accidentally
contained directories in addition to end files. Trim those.
- Add systemd-logind as a login_binary.
- Add unix_chkpwd as a shadowutils_binary.
- Add parentheses around any macros that group items using or. I found
this necessary when the macro is used in the middle of a list of and
conditions.
- Break out system_binaries into a new subset user_mgmt_binaries
containing login_, passwd_, and shadowutils_ binaries. That way you
don't have to pull in all of system_binaries when looking for
sensisitive files or user management activity.
- Rename fs-bash to fbash, thinking ahead to its more likely name.
We found during testing that rules without syscall/event conditions are
slower than other rules, so take a pass over the existing set of rules
ensuring that whenever possible they have a condition. The changes are:
- Only process executions by interactive users are monitored
- Only look at connect/listen/etc for system binaries performing
network activity
- Only monitor process executions when monitoring user management
programs.
Also comment out all application rules by default so users can opt-in
for the applications they use instead of getting a lot of application
monitoring they may not need. Add a note stating they're all disabled by
default and can be re-enabled as needed.
Finally, remove some less common applications where we haven't done live
testing.
These 3 changes, along with those in
https://github.com/draios/sysdig/pull/592, result in a significant
performance increase on busy servers.
For rules where evt.args had useful information but too much
information, add back specific values that have just the useful argument
from the event:
- spawned shells contain the commandline--it's the exit half of the
exec event so the current commandline is what was exec()d to.
- setuid contains the uid being switched to.
While I was testing these, I had a couple of other fixes:
- In the spawn shells rule, only track execve events so you don't catch
clone() events that precede an exec.
- in spawn_process only consider the exit half of the exec event.
A new macro package_mgmt_binaries includes dpkg and rpm. Those programs
are allowed to create directories and modify files below binary
directories. I'm not adding them to other trusted sets for now, though.
Try to clean up the language of the existing rule set, expanding the
output when possible, removing %evt.dir in most cases.
There is one substantive change: the mkdir half of modify_binary_dirs
was split out into its own rule mkdir_binary_dirs.
Add name and description fields to all rules. The name field is actually
a field called 'rule', which corresponds to the 'macro' field for
macros.
Within the rule loader, the state changes slightly. There are two
indices into the set of rules 'rules_by_name' and
'rules_by_idx' (formerly 'outputs'). They both now contain the original
table from the yaml parse. One field 'level' is added which is the
priority mapped to a number.
Get rid of the notion of default priority or output. Every rule must now
provide both.
Go through all current rules and add names and descriptions.
Update rules to reduce FPs after running against some real-world
environments with and without containers. Summary of changes:
- Too many processes read /etc/passwd--it's world-readable and a
side-effect of getpwent. Switch to /etc/shadow instead.
- Add a mail_binaries group. This wasn't directly used, but it may be
handy for other rules and goes along with the changes in #54.
- not_cron was the only macro expressing a negative, so switch it to be
a positive 'cron'. Also add crond as a cron process.
- add dragent to the set of programs that can call setns.
- For the shell detection rules, change them to only look for the
specific exec/clone event rather than all follow-on activity. Also
allow docker to spawn shell scripts--this is required for entrypoints
that use the shell instead of a direct exec. Also add a few
additional programs that can spawn shells.
- In containers, shells are allowed as long as the parent process is
docker or bash. Like the outside of container case, only the initial
clone/exec is detected.
- Fix a typo Sytem -> System.
- Change the chmod rule to only protect imporant/sensitive files. I saw
lots of "regular" files being chmod()ed.
- Change the setuid test to allow root to setuid to anything, rather
than listing a bunch of programs run as root that drop privileges.
- Allow running su/sudo in containers. Some containers add users from a
base linux distribution before running.
This will detect the result of some sql injection attacks where the
injected query tries to spawn a process.
We don't include web servers in this list for now due to things like
mod_perl, mod_php, etc. Maybe we can add it once we make exceptions for
those modules.
Add back detection for mysql and sensitive files that was removed in the
previous commit. A new macro proc_is_new adds a condition on how long a
process has been running.
A new rule triggers if the process is not new and tries to open a
sensitive file. This handles cases like mysql, where it *does* read
/etc/passwd on startup but shouldn't really open it afterward.
Add some new groups of binary programs as macros and start using them in
the set of rules:
- docker_binaries: docker and exe (which is a temporary process name
for processes like docker-proxy)
- http_server_binaries: httpd, nginx, and similar
- db_server_binaries: mysql for now, we'll add more later
- server_binaries: all of the above
- userexec_binaries: sudo and su.
Start using these groups in the rules. Most of the time, changing from
the inline lists of processes to macros was a no-op. There are some
actual changes, though:
- docker and exe are now allowed to read 'sensitive' files. They may
not actually do so, but it's not really harmful.
- lighttpd is now allowed to read 'sensitive' files, via inclusion in
http_server_binaries.
- su, lighttpd, and docker can now setuid.
- http-foreground is included as a http server wrt non-port 80/443 ports.
I'm going to use these macros in some of the following rules.
This actually prevents detection of mysql reading sensitive files, which
is one of the demo scenarios (sql injection). I plan on adding this
detection back in the next commit.
Make changes to falco_rules.yaml to make sure they work on the demo
scenarios without too many false positives. The specific changes are:
- Add /etc/ld.so.cache as an allowed shared library to open.
- Comment out the shared library check for now--there are lots of
locations below /usr/lib for things like python, perl, etc and I want
to get a fuller categorization first.
- Add a few additional parent processes that can spawn shells, write
sensitive files, and call setuid. Also allow bash shells with no
parent to spawn shells. We may want to disallow this but I suspect a
better place to detect is the parent-less bash shell becoming a
session leader.
- Add rules for fs-bash (falco-safe bash), which is used in the curl
<url> | bash installer demo. The idea is that fs-bash has restrictions
on what it and child proceses can do.
- Add trailing '/' characters to path names in bin_dir_* so paths like
/tmp/binary don't accidentally match '/bin'
Note that as process names are truncated to 15 characters, long process
names like 'httpd-foregroun' are intentionally truncated.
The ignored syscalls in macros were:
- write: renamed to open_write to make its weaker resolution more
apparent. Checks for open with any flag that could change a file.
- read: renamed to open_read. Checks for open with any read flag.
- sendto: I couldn't think of any way to replace this, so I simply
removed it with a comment.
I kept the original read/write macros commented out with a note that
they use ignored syscalls.
I have not tested these changes yet other than verifying that falco
starts properly.
As pointed out by Loris, timestamping output messages should be a
responsibility of the output/collection system.
So as a first step towards this, add timestamps automatically for output
formats, and remove them from rules.