agent: Run container workload in its own cgroup namespace

When cgroup v2 is in use, a container should only see its part of the
unified hierarchy in `/sys/fs/cgroup`, not the full hierarchy created
at the OS level. Similarly, `/proc/self/cgroup` inside the container
should display `0::/` rather than a full host path such as:

0::/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podde291f58_8f20_4d44_aa89_c9e538613d85.slice/crio-9e1823d09627f3c2d42f30d76f0d2933abdbc033a630aab732339c90334fbc5f.scope

What is needed here is isolation from the host OS. Achieve it by
running the container in its own cgroup namespace, which matches what
runc and other non-VM-based runtimes do.
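
The expected result can be checked from inside the container by reading
`/proc/self/cgroup`. The following is a minimal sketch of such a check, not
code from the agent: the helper names `unified_cgroup_path` and
`is_namespace_root` are hypothetical, and the check is only a heuristic,
since a process that really sits in the host's root cgroup also reports
`0::/`.

```rust
/// Returns the cgroup v2 path from the contents of `/proc/<pid>/cgroup`.
/// The unified hierarchy entry has the form `0::<path>`.
/// (Hypothetical helper, for illustration only.)
fn unified_cgroup_path(contents: &str) -> Option<&str> {
    contents.lines().find_map(|line| line.strip_prefix("0::"))
}

/// True when the process sees itself at the root of its cgroup namespace,
/// i.e. the unified entry is exactly `0::/`.
/// (Hypothetical helper, for illustration only.)
fn is_namespace_root(contents: &str) -> bool {
    unified_cgroup_path(contents) == Some("/")
}

fn main() {
    // The file only exists on Linux; report the error instead of panicking.
    match std::fs::read_to_string("/proc/self/cgroup") {
        Ok(contents) => println!("sees namespace root: {}", is_namespace_root(&contents)),
        Err(err) => eprintln!("cannot read /proc/self/cgroup: {err}"),
    }
}
```

With this change, the container workload should report `0::/`. Without it,
the workload reports a host path like the one quoted above.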

Fixes #9124

Signed-off-by: Greg Kurz <groug@kaod.org>
Greg Kurz
2024-02-07 12:48:32 +01:00
parent 14886c7b32
commit 600b951afd


@@ -556,6 +556,10 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
sched::unshare(to_new & !CloneFlags::CLONE_NEWUSER)?;
if cgroups::hierarchies::is_cgroup2_unified_mode() {
sched::unshare(CloneFlags::CLONE_NEWCGROUP)?;
}
if userns {
bind_device = true;
}