Clean up the handling of priority levels within rules. It used to be a
mix of strings handled in various places. Now, in falco_common.h there's
a consistent type for priority-as-number as well as a list of
priority-as-string values. Priorities are passed around as numbers
instead of strings. It's still permissive about capitalization.
Also add the ability to load rules by severity. New falco
config option "priority=<val>"/-o priority=<val> specifies the minimum
priority level of rules that will be loaded.
Add unit tests for same. The test suppresses INFO notifications for a
rule/trace file combination that would otherwise generate them.
Add the ability to append to rules/macros, like we already do with
lists. For rules/macros, if the object has an append: true key, the
condition value is appended to the condition of an existing rule/macro
with the same name.
Like lists, it's an error to specify append: true without there being an
existing rule/macro.
Also add tests that test the same kind of things we did for lists:
- That append: true really does append
- That append: false overwrites the rule/macro
- That it's an error to append with a prior rule/macro existing.
Add new unit tests to check that list substitution is working as
expected, with test cases for the list substitution occurring at the
beginning, middle, and end of a condition.
Also add tests that verify that overrides on list/macro/rule names
always occur in order.
* Updates from beta customers.
- add anacron as a cron program
* Reorganize package management binaries
Split package_management_binaries into two separate lists rpm_binaries
and deb_binaries. unattended-upgr is common to both worlds so it's still
in package_management_binaries.
Also change Write below rpm database to use rpm_binaries instead of its
own list.
Also add 75-system-updat (truncated) as a shell spawner.
* Add rules for jenkins
Add rules that allow jenkins to spawn shells, both in containers and
directly on the host.
Also handle jenkins slaves that run /tmp/slave.jar.
* Allow npm to run shells.
Not yet allowing node to run shells itself, although we want to add
something to reduce node-related FPs.
* Allow urlgrabber/git-remote to access /etc
urlgrabber and git-remote both try to access the RHEL nss database,
containing shared certificates. I may change this in a more general way
by changing open_read/open_write to only look for successful opens.
* Only look for successful open_read/open_writes
Change the macros open_read/open_write to only trigger on successful
opens (when fd.num > 0). This is a pretty big change to behavior, but
is more intuitive.
This required a small update to the open counts for a couple of unit
tests, but otherwise they still all passed with this change.
* Allow rename_device to write below /dev
Part of udev.
* Allow cloud-init to spawn shells.
Part of https://cloud-init.io/
* Allow python to run a shell that runs sdchecks
sdchecks is a part of the sysdig monitor agent.
* Allow dev creation binaries to write below etc.
Specifically this includes blkid and /etc/blkid/blkid.tab.
* Allow git binaries to spawn shells.
They were already allowed to run shells in a container.
* Add /dev/kmsg as an allowed /dev file
Allows userspace programs to write to kernel log.
* Allow other make programs to spawn shells.
Also allow gmake/cmake to spawn shells and put them in their own list
make_binaries.
* Add better mesos support.
Mesos slaves appear to be in a container due to their cgroup and can run
programs mesos-health-check/mesos-docker-exec to monitor the containers
on the slave, so allow them to run shells.
Add mesos-agent, mesos-logrotate, mesos-fetch as shell spawners both in
and out of containers.
Add gen_resolvconf. (short for gen_resolvconf.py) as a program that can
write to /etc.
Add toybox (used by mesos, part of http://landley.net/toybox/about.html)
as a shell spawner.
* systemd can listen on network ports.
Systemd can listen on network ports to launch daemons on demand, so
allow it to perform network activity.
* Let docker binaries setuid.
Let docker binaries setuid and add docker-entrypoi (truncation
intentional) to the set of docker binaries.
* Change cis-related rules to be less noisy
Change the two cis-related falco rules "File Open by Privileged
Container" and "Sensitive Mount by Container" to be less noisy. We found
in practice that tracking every open still results in too many falco
notifications.
For now, change the rules to only track the initial process start in the
container by looking for vpid=1. This should result in only triggering
when a privileged/sensitive mount container is started. This is slightly
less coverage but is far less noisy.
* Add quay.io/sysdig as trusted containers
These are used for sysdig cloud onpremise deployments.
* Add gitlab-runner-b(uild) as a gitlab binary.
Add gitlab-runner-b (truncated gitlab-runner-build) as a gitlab binary.
* Add ceph as a shell spawner.
Also allow ceph to spawn shells in a container.
* Allow some shells by command line.
For some mesos containers, where the container doesn't have an image and
is just a tarball in a cgroup/namespace, we don't have any image to work
with. In those cases, allow specific command lines.
* Allow user 'nobody' to setuid.
Allow the user nobody to setuid. This depends on the user nobody being
set up in the first place to have no access, but that should be an ok
assumption.
* Additional allowed shell commandlines
* Add additional shells.
* Allow multiple users to become themself.
Add rule somebody_becoming_themself that handles cases of nobody and
www-data trying to setuid to themself. The sysdig filter language
doesn't support template/variable values to allow "user.name=X and
evt.arg.uid=X for a given X", so we have to enumerate the users.
* More known spawn command lines
* Let make binaries be run in containers.
Some CI/CD pipelines build in containers.
* Add additional shell spawning command lines
* Add additional apt program apt-listchanges.
* Add gitlab-ce as shell spawning container.
* Allow PM2 to spawn shells in containers.
Was already in the general list, seen in some customers, so adding to
the in containers list.
* Clean up pass to fix long lines.
Take a pass through the rules making sure each line is < 120 characters.
* Change tests for privileged container rules.
Change unit tests to reflect the new privileged/sensitive mount
container rules that only detect container launch.
The default falco ruleset now has a wider variety of priorities, so
adjust the automated tests to match:
- Instead of creating a generic test yaml entry for every trace file in
traces-{positive,negative,info} with assumptions about detect levels,
add a new falco_traces.yaml.in multiplex file that has specific
information about the detect priorities and rule detect counts for each
trace file.
- If a given trace file doesn't have a corresponding entry in
falco_traces.yaml.in, a generic entry is added with a simple
detect: (True|False) value and level. That way you can get specific
detect levels/counts for existing trace files, but if you forget to
add a trace to falco_traces.yaml.in, you'll still get some coverage.
- falco_tests.yaml.in isn't added to any longer, so rename it to
falco_tests.yaml.
- Avocado is now run twice--once on each yaml file. The final test
passes if both avocado runs pass.
Add automated tests for running falco from a package and container. As a
result, this will also test building the kernel module as well as
runnning falco-probe-loader as a backup.
In travis.yml, switch to the docker-enabled vm and install dkms. This
changed the environment slightly, so change how avocado's python
dependencies are installed. After building falco, copy the .deb package
to docker/local and build a local docker image based on that package.
Add the following new tests:
- docker_package: this uses "docker run" to run the image created in
travis.yml. This includes using dkms to build the kernel module and
load it. In addition, the conf directory is mounted to /host/conf, the
rules directory is mounted to /host/rules, and the traces directory is
mounted to /host/traces.
- docker_package_local_driver: this disables dkms via a volume mount
that maps /dev/null to /usr/sbin/dkms and copies the kernel module by
hand into the container to /root/.sysdig/falco-probe-....ko. As a
result, falco-probe-loader will use the local kernel module instead
of building one itself.
- debian_package: this installs the .deb package and runs the installed
version of falco.
Ideally, there'd also be a test for downloading the driver, but since
the driver depends on the kernel as well as the falco version string,
you can't put a single driver on download.draios.com that will work
long-term.
These tests depend on the following new test attributes:
- package: if present, this points to the docker image/debian package
to install.
- addl_docker_run_args: if present, will be added to the docker run
command.
- copy_local_driver: if present, will copy the built kernel module to
~/.sysdig. ~/.sysdig/* is always cleared out before each test.
- run_duration: maps to falco's -M <secs> flag
- trace_file is now optional.
Also add some misc general test changes:
- Clean up our use of process.run. By default it will fail a test if the
run program returns non-zero, so we don't have to grab the exit
status. In addition, get rid of sudo in the command lines and use the
sudo attribute instead.
- Fix some tests that were writing to files below /tmp/falco_outputs
by creating the directory first. Useful when running avocado directly.
Start packaging (and building when necessary) a falco-specific kernel
module in falco releases. Previously, falco would depend on sysdig and
use its kernel module instead.
The kernel module was already templated to some degree in various
places, so we just had to change the templated name from
sysdig/sysdig-probe to falco/falco-probe.
In containers, run falco-probe-loader instead of
sysdig-probe-loader. This is actually a script in the sysdig repository
which is modified in https://github.com/draios/sysdig/pull/789, and uses
the filename to indicate what kernel module to build and/or load.
For the falco package itself, don't depend on sysdig any longer but instead
depend on dkms and its dependencies, using sysdig as a guide on the set
of required packages.
Additionally, for the package pre-install/post-install scripts start
running falco-probe-loader.
Finally, add a --version argument to falco so it can pass the desired
version string to falco-probe-loader.
- Instead of having a possibly null string pointer as the argument to
enable_* and process_event, have wrapper versions that assume a
default falco ruleset. The default ruleset name is a static member of
the falco_engine class, and the default ruleset id is created/found
in the constructor.
- This makes the whole mechanism simple enough that it doesn't require
seprarate testing, so remove the capability within falco to read a
ruleset from the environment and remove automated tests that specify
a ruleset.
- Make pattern/tags/ruleset arguments to enable_* functions const.
(I'll squash this down before I commit)
Add automated tests that verify the ability to tag sets of rules,
disable them with -T, and run them with -t, works:
- New test option disable_tags adds -T <tag> arguments to the falco
command line, and run_tags adds -t <tag> arguments to the falco command
line.
- A new trace file open-multiple-files.scap opens 13 different files,
and a new rules file has 13 different rules with all combinations of
the tags a, b, c (both forward and backward), a rule with an empty
list of tags, a rule with no tags field, and a rule with a completely
different tag d.
Using the above, add tests for:
- Both disabling all combations of a, b, c using disable_tags as well as
run all combinations of a, b, c, using run_tags.
- Specifying both disabled (-T/-D) and enabled (-t) rules. Not allowed.
- Specifying a ruleset while having tagged rules enabled, rules based
on a name disabled, and no particular rules enabled or disabled.
A new trace file falco-event-generator.scap contains the result of
running the falco event generator in docker, via:
docker run --security-opt seccomp=unconfined sysdig/falco-event-generator:latest /usr/local/bin/event_generator --once
Make sure this trace file detects the exact set of events we expect for
each rule. This required adding a new verification method
check_detections_by_rule that finds the per-rule counts and compares
them to the expected counts, which are included in the test description
under the key "detect_counts".
This is the first time a trace file for a test is actually in one of the
downloaded zip files. This means it will be tested twice (one for simple
detect-or-not, once for actual counts).
Adding this test showed a problem with Run shell in container
rule--since sysdig/falco-event-generator startswith sysdig/falco, it was
being treated as a trusted container. Modify the macro
trusted_containers to not allow falco-event-generator to be trusted.
Add a test that specifically tests truncated outputs. A rule contains an
output field %fd.cport which has no value for an open event. Ensure that
the rule's output has <NA> for the cport and the remainder of the rule's
output is filled in.
New tests that test every possible override:
- Overriding a rule with one that doesn't match
- Overriding a macro to one that doesn't match
- Overriding a top level list to a binary that doesn't match
- Overriding an embedded list to one that doesn't match
In each case, the override results in no longer matching an open by the
program "cat".
New argument --metric, which can be cpu|drops, controls whether to graph
cpu usage or event drop percentage. Titles/axis labels/etc. change
appropriately.
When run via scripts like run_performance_tests.sh, it's useful to
include extra info like the test being run and the specific program
variant to the stats file. So support that via the
environment. Environment keys starting with FALCO_STATS_EXTRA_XXX will
have the XXX and environment value added to the stats file.
It's undocumented as I doubt other programs will need this functionality
and it keeps the docs simpler.
Add the ability to check falco's return code with exit_status and to
generally match stderr with stderr_contains in a test.
Use those to create a test that has an invalid output expression using
%not_a_real_field. It expects falco to exit with 1 and the output to
contain a message about the invalid output.
Make necessary changes to allow run_performance_tests to invoke the
'test_mm' program we use internally.
Also add ability to run with a build directory separate from the source
directory and to specify an alternate rules file.
Finally, set up the kubernetes demo using sudo, a result of recent changes.
- In the regression tests, make the config file configurable in the
multiplex file via 'conf_file'.
- A new multiplex file item 'outputs' containing a list of <filename>:
<regex> tuples. For each item, the test reads the file and matches
each line against the regex. A match must be found for the test to
pass.
- Add 2 new tests that test file output and program output. They write
to files below /tmp/falco_outputs/ and the contents are checked to
ensure that alerts are written.
Add test that cover reading from multiple sets of rule files and
disabling rules. Specific changes:
- Modify falco to allow multiple -r arguments to read from multiple
files.
- In the test multiplex file, add a disabled_rules attribute,
containing a sequence of rules to disable. Result in -D arguments
when running falco.
- In the test multiplex file, 'rules_file' can be a sequence. It
results in multiple -r arguments when running falco.
- In the test multiplex file, 'detect_level' can be a squence of
multiple severity levels. All levels will be checked for in the
output.
- Move all test rules files to a rules subdirectory and all trace files
to a traces subdirectory.
- Add a small trace file for a simple cat of /dev/null. Used by the
new tests.
- Add the following new tests:
- Reading from multiple files, with the first file being
empty. Ensure that the rules from the second file are properly
loaded.
- Reading from multiple files with the last being empty. Ensures
that the empty file doesn't overwrite anything from the first
file.
- Reading from multiple files with varying severity levels for each
rule. Ensures that both files are properly read.
- Disabling rules from a rules file, both with full rule names
and regexes. Will result in not detecting anything.
Create standalone classes falco_engine/falco_outputs that can be
embedded in other programs. falco_engine is responsible for matching
events against rules, and falco_output is responsible for formatting an
alert string given an event and writing the alert string to all
configured outputs.
falco_engine's main interfaces are:
- load_rules/load_rules_file: Given a path to a rules file or a string
containing a set of rules, load the rules. Also loads needed lua code.
- process_event(): check the event against the set of rules and return
the results of a match, if any.
- describe_rule(): print details on a specific rule or all rules.
- print_stats(): print stats on the rules that matched.
- enable_rule(): enable/disable any rules matching a pattern. New falco
command line option -D allows you to disable one or more rules on the
command line.
falco_output's main interfaces are:
- init(): load needed lua code.
- add_output(): add an output channel for alert notifications.
- handle_event(): given an event that matches one or more rules, format
an alert message and send it to any output channels.
Each of falco_engine/falco_output maintains a separate lua state and
loads separate sets of lua files. The code to create and initialize the
lua state is in a base class falco_common.
falco_engine no longer logs anything. In the case of errors, it throws
exceptions. falco_logger is now only used as a logging mechanism for
falco itself and as an output method for alert messages. (This should
really probably be split, but it's ok for now).
falco_engine contains an sinsp_evttype_filter object containing the set
of eventtype filters. Instead of calling
m_inspector->add_evttype_filter() to add a filter created by the
compiler, call falco_engine::add_evttype_filter() instead. This means
that the inspector runs with a NULL filter and all events are returned
from do_inspect. This depends on
https://github.com/draios/sysdig/pull/633 which has a wrapper around a
set of eventtype filters.
Some additional changes along with creating these classes:
- Some cleanups of unnecessary header files, cmake include_directory()s,
etc to only include necessary includes and only include them in header
files when required.
- Try to avoid 'using namespace std' in header files, or assuming
someone else has done that. Generally add 'using namespace std' to all
source files.
- Instead of using sinsp_exception for all errors, define a
falco_engine_exception class for exceptions coming from the falco
engine and use it instead. For falco program code, switch to general
exceptions under std::exception and catch + display an error for all
exceptions, not just sinsp_exceptions.
- Remove fields.{cpp,h}. This was dead code.
- Start tracking counts of rules by priority string (i.e. what's in the
falco rules file) as compared to priority level (i.e. roughtly
corresponding to a syslog level). This keeps the rule processing and
rule output halves separate. This led to some test changes. The regex
used in the test is now case insensitive to be a bit more flexible.
- Now that https://github.com/draios/sysdig/pull/632 is merged, we can
delete the rules object (and its lua_parser) safely.
- Move loading the initial lua script to the constructor. Otherwise,
calling load_rules() twice re-loads the lua script and throws away any
state like the mapping from rule index to rule.
- Allow an empty rules file.
Finally, fix most memory leaks found by valgrind:
- falco_configuration wasn't deleting the allocated m_config yaml
config.
- several ifstreams were being created simply to test which falco
config file to use.
- In the lua output methods, an event formatter was being created using
falco.formatter() but there was no corresponding free_formatter().
This depends on changes in https://github.com/draios/sysdig/pull/640.
When the root directory contains the name 'agent', assume we're running
an agent and provide appropriate configuration and run the agent using
dragent.
You can make autodrop or falco configurable within the agent via
--agent-autodrop and --falco-agent.
Also include some other small changes like timestamping the json points.
Add tests that verify that the event type identification functionality
is working. Notable changes:
- Modify falco_test.py to additionally check for warnings when loading
any set of rules and verify that the event types for each rule match
expected values. This is controlled by the new multiplex fields
"rules_warning" and "rules_events".
- Instead of starting with an empty falco_tests.yaml from scratch from
the downloaded trace files, use a checked-in version which defines
two tests:
- Loading the checked-in falco_rules.yaml and verify that no rules
have warnings.
- A sample falco_rules_warnings.yaml that has ~30 different
mutations of rule filtering expressions. The test verifies for each
rule whether or not the rule should result in a warning and what the
extracted event types are.
The generated tests from the trace files are appended to this file.
- Add an empty .scap file to use with the above tests.
Add shell scripts to make it easier to collect performance results from
traces, live tests, and phoronix tests.
With run_performance_tests.sh you specify the following:
- a subject program to run, using --root
- a name to give to this set of results, using --variant
- a test to run, using --test
- a file to write the results to, using --results.
For tests that start with "trace", the script runs falco/sysdig on the
trace file and measures the time taken to read the file. For other
tests, he script handles starting falco/sysdig, starting a cpu
measurement script (a wrapper around top, just to provide identical
values to what you would see using top) to measure the cpu usage of
falco/sysdig, and running a live test.
The measurement interval for cpu usage depends on the test being run--10
seconds for most tests, 2 seconds for shorter tests.
The output is written as json to the file specified in --results.
Also add R scripts to easily display the results from the shell
script. plot-live.r shows a linechart of the cpu usage for the provided
variants over time. plot-traces.r shows grouped barcharts showing
user/system/total time taken for the provided variants and traces.
One bug--you have to make the results file actual json by adding
leading/trailing []s.
Pass the travis branch to run_regression_tests.sh. When downloading
trace files, first look for a file traces-XXX-$BRANCH and if found
download it. This allows testing out a set of changes with a trace file
specifically for that branch, that can be moved to the normal file once
the PR is merged.
Also increase the timeout for the spawned falco process from 1 to 3
minutes. In debug mode, the kubernetes demo was taking slightly over 1
minute.
Modify falco_test.py to look for a boolean multiplex attribute
'json_output'. If true, examine the lines of the output and for any line
that begins with '{', parse it as json and ensure it has the 4
attributes we expect.
Modify run_regression_tests to have a utility function
prepare_multiplex_fileset that does the work of looping over files in a
directory, along with detect, level, and json output arguments. The
appropriate multiplex attributes are added for each file.
Use that utility function to test json output for the positive and
informational directories along with non-json output. The negative
directory is only tested once.
Add additional rules related to using pipe installers within a fbash
session:
- Modify write_etc to only trigger if *not* in a fbash session. There's
a new rule write_etc_installer which has the same conditions when in
a fbash session, logging at INFO severity.
- A new rule write_rpm_database warns if any non package management
program tries to write below /var/lib/rpm.
- Add a new warning if any program below a fbash session tries to open
an outbound network connection on ports other than http(s) and dns.
- Add INFO level messages when programs in a fbash session try to run
package management binaries (rpm,yum,etc) or service
management (systemctl,chkconfig,etc) binaries.
In order to test these new INFO level rules, make up a third class of
trace files traces-info.zip containing trace files that should result in
info-level messages.
To differentiate warning and info level detection, add an attribute to
the multiplex file "detect_level", which is "Warning" for the files in
traces-positive and "Info" for the files in traces-info. Modify
falco_test.py to look specifically for a non-zero count for the given
detect_level.
Doing this exposed a bug in the way the level-specific counts were being
recorded--they were keeping counts by level name, not number. Fix that.
Do another round of rule cleanups now that we have a larger set of
positive and negative trace files to work with. Outside of this commit,
there are now trace files for all the positive rules, a docker-compose
startup and teardown, and some trace files from the sysdig cloud staging
environment.
Also add a script that runs sysdig with a filter that removes all the
syscalls not handled by falco as well as a few other high-volume,
low-information syscalls. This script was used to create the staging
environment trace files.
Notable rule changes:
- The direction for write_binary_dir/write_etc needs to be exit instead
of enter, as the bin_dir clause works on the file descriptor returned
by the open/openat call.
- Add login as a trusted binary that can read sensitive files (occurs
for direct console logins).
- sshd can read sensitive files well after startup, so exclude it from
the set of binaries that can trigger
read_sensitive_file_trusted_after_startup.
- limit run_shell_untrusted to non-containers.
- Disable the ssh_error_syslog rule for now. With the current
restriction on system calls (no read/write/sendto/recvfrom/etc), you
won't see the ssh error messages. Nevertheless, add a string to look
for to indicate ssh errors and add systemd's true location for the
syslog device.
- Sshd attemps to setuid even when it's not running as root, so exclude
it from the set of binaries to monitor for now.
- Let programs that are direct decendants of systemd spawn user
management tasks for now.
- Temporarily disable the EACCESS rule. This rule is exposing a bug in
sysdig in debug mode, https://github.com/draios/sysdig/issues/598. The
rule is also pretty noisy so I'll keep it disabled until the sysdig bug
is fixed.
- The etc_dir and bin_dir macros both have the problem that they match
pathnames with /etc/, /bin/, etc in the middle of the path, as sysdig
doesn't have a "begins with" comparison. Add notes for that.
- Change spawn_process to spawned_process to indicate that it's for the
exit side of the execve. Also use it in a few places that were
looking for the same conditions without any macro.
- Get rid of adduser_binaries and fold any programs not already present
into shadowutils_binaries.
- Add new groups sysdigcloud_binaries and sysdigcloud_binaries_parent
and add them as exceptions for write_etc/write_binary_dir.
- Add yum as a package management binary and add it as an exception to
write_etc/write_binary_dir.
- Change how db_program_spawned_process works. Since all of the useful
information is on the exit side of the event, you can't really add a
condition based on the process being new. Isntead, have the rule
check for a non-database-related program being spawned by a
database-related program.
- Allow dragent to run shells.
- Add sendmail, sendmail-msp as a program that attempts to setuid.
- Some of the *_binaries macros that were based on dpkg -L accidentally
contained directories in addition to end files. Trim those.
- Add systemd-logind as a login_binary.
- Add unix_chkpwd as a shadowutils_binary.
- Add parentheses around any macros that group items using or. I found
this necessary when the macro is used in the middle of a list of and
conditions.
- Break out system_binaries into a new subset user_mgmt_binaries
containing login_, passwd_, and shadowutils_ binaries. That way you
don't have to pull in all of system_binaries when looking for
sensisitive files or user management activity.
- Rename fs-bash to fbash, thinking ahead to its more likely name.
Start using the Avocado framework for automated regression
testing. Create a test FalcoTest in falco_test.py which can run on a
collection of trace files. The script test/run_regression_tests.sh is
responsible for pulling zip files containing the positive (falco should
detect) and negative (falco should not detect) trace files, creating a
Avocado multiplex file that defines all the tests (one for each trace
file), running avocado on all the trace files, and showing full logs for
any test that didn't pass.
The old regression script, which simply ran falco, has been removed.
Modify falco's stats output to show the total number of events detected
for use in the tests.
In travis.yml, pull a known stable version of avocado and build it,
including installing any dependencies, as a part of the build process.