mirror of
https://github.com/kata-containers/kata-containers.git
synced 2025-04-29 04:04:45 +00:00
agent: set init process non-dumpable
On old kernels (like v4.9), the kernel applies O_CLOEXEC in the wrong order w.r.t. the dumpable task flag. As a result, we might leak guest file descriptors to containers. This is the former runc CVE-2016-9962, and it still applies to the kata agent. Although a Kata container still protects the host, we should not leak extra resources to user containers.

This sets the init processes that join and set up the container's namespaces as non-dumpable before they setns to the container's pid (or any other) namespace.

This setting is automatically reset to the default after the exec in the container, so it does not change functionality for the applications running inside, only for our init processes.

This prevents parent processes, the pid 1 of the container, from ptracing the init process before it drops caps and sets other LSMs. The order during the exec syscall is that the process is set back to dumpable before O_CLOEXEC file descriptors are processed.

Refs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=613cc2b6f272c1a8ad33aefa21cad77af23139f7
https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
opencontainers/runc@50a19c6
https://nvd.nist.gov/vuln/detail/CVE-2016-9962

Fixes: #890

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
parent 3f8e619c2f
commit 15b7156348
@@ -456,6 +456,24 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
         setrlimit(rl)?;
     }
 
+    //
+    // Make the process non-dumpable, to avoid various race conditions that
+    // could cause processes in namespaces we're joining to access host
+    // resources (or potentially execute code).
+    //
+    // However, if the number of namespaces we are joining is 0, we are not
+    // going to be switching to a different security context. Thus setting
+    // ourselves to be non-dumpable only breaks things (like rootless
+    // containers), which is the recommendation from the kernel folks.
+    //
+    // Ref: https://github.com/opencontainers/runc/commit/50a19c6ff828c58e5dab13830bd3dacde268afe5
+    //
+    if !nses.is_empty() {
+        if let Err(e) = prctl::set_dumpable(false) {
+            return Err(anyhow!(e).context("set process non-dumpable failed"));
+        };
+    }
+
     if userns {
         log_child!(cfd_log, "enter new user namespace");
         sched::unshare(CloneFlags::CLONE_NEWUSER)?;
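
The guard added in this hunk can be tried in isolation. The following is a minimal sketch, not the agent's code: it assumes Linux and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or later), and it calls prctl(2) directly through libc instead of the `prctl` crate used in the patch. The helpers `set_dumpable`, `is_dumpable` and `harden_before_setns` are hypothetical names introduced only for illustration.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

// Option values from <linux/prctl.h>.
const PR_GET_DUMPABLE: c_int = 3;
const PR_SET_DUMPABLE: c_int = 4;

unsafe extern "C" {
    // prctl(2) is declared as a variadic function by the C library.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Hypothetical helper: set the dumpable flag of the calling process.
fn set_dumpable(dumpable: bool) -> io::Result<()> {
    let zero: c_ulong = 0;
    let ret = unsafe { prctl(PR_SET_DUMPABLE, dumpable as c_ulong, zero, zero, zero) };
    if ret == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}

/// Hypothetical helper: read the dumpable flag of the calling process.
fn is_dumpable() -> io::Result<bool> {
    let zero: c_ulong = 0;
    let ret = unsafe { prctl(PR_GET_DUMPABLE, zero, zero, zero, zero) };
    if ret == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret != 0)
    }
}

/// Hypothetical stand-in for the guard in the patch: become non-dumpable
/// only when at least one namespace is about to be joined.
fn harden_before_setns(namespaces_to_join: &[&str]) -> io::Result<()> {
    if !namespaces_to_join.is_empty() {
        set_dumpable(false)?;
    }
    Ok(())
}
```

Calling `harden_before_setns(&[])` leaves the flag untouched, which matches the comment in the patch about not breaking the case where no namespace is joined. Calling it with a non-empty slice such as `&["pid"]` clears the flag, so `is_dumpable()` then reports `false` until an exec or an explicit `set_dumpable(true)` restores it.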