agent: set init process non-dumpable

On old kernels (like v4.9), kernel applies CLOECEC in wrong order w.r.t.
dumpable task flags. As a result, we might leak guest file descriptor to
containers. This is a former runc CVE-2016-9962 and still applies to
kata agent. Although Kata container is still valid at protecting the
host, we should not leak extra resources to user containers.

This sets the init processes that join and setup the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other ) namespace.

This settings is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.

This prevents parent processes, the pid 1 of the container, to ptrace
the init process before it drops caps and other sets LSMs.

The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.

Refs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=613cc2b6f272c1a8ad33aefa21cad77af23139f7
https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
opencontainers/runc@50a19c6
https://nvd.nist.gov/vuln/detail/CVE-2016-9962

Fixes: #890
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This commit is contained in:
Peng Tao 2020-10-09 15:09:16 +08:00
parent 3f8e619c2f
commit 15b7156348

View File

@ -456,6 +456,24 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
setrlimit(rl)?;
}
//
// Make the process non-dumpable, to avoid various race conditions that
// could cause processes in namespaces we're joining to access host
// resources (or potentially execute code).
//
// However, if the number of namespaces we are joining is 0, we are not
// going to be switching to a different security context. Thus setting
// ourselves to be non-dumpable only breaks things (like rootless
// containers), which is the recommendation from the kernel folks.
//
// Ref: https://github.com/opencontainers/runc/commit/50a19c6ff828c58e5dab13830bd3dacde268afe5
//
if !nses.is_empty() {
if let Err(e) = prctl::set_dumpable(false) {
return Err(anyhow!(e).context("set process non-dumpable failed"));
};
}
if userns {
log_child!(cfd_log, "enter new user namespace");
sched::unshare(CloneFlags::CLONE_NEWUSER)?;