moby: device path "all" adds a device class to the cgroup whitelist

After the runc security advisory[1], the default cgroup device
whitelist was changed.

In previous versions every container had "rwm" (read, write, mknod)
access to every device ("a" for all). Container engines like Docker
typically overrode this, but in LinuxKit we kept the permissive
default.

In recent `runc` versions the default allow-all rule was removed,
so a container can only access a device if it is explicitly
granted access, which LinuxKit does via a `devices:` entry.
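
As a rough illustration of the change described above, the sketch below
models a deny-by-default whitelist. The `rule` type and the `allowed`
function are hypothetical names invented for this example. They simplify
the kernel's matching logic and are not taken from runc or LinuxKit.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical, simplified model of one device whitelist entry.
// A nil major or minor matches any number; devType "a" matches any type.
type rule struct {
	devType string // "a" (all), "b" (block) or "c" (character)
	major   *int64
	minor   *int64
	access  string // any combination of "r", "w" and "m"
}

// allowed reports whether some rule grants the requested access.
// An empty rule list denies everything, like the new default.
func allowed(rules []rule, devType string, major, minor int64, access rune) bool {
	for _, r := range rules {
		if r.devType != "a" && r.devType != devType {
			continue
		}
		if r.major != nil && *r.major != major {
			continue
		}
		if r.minor != nil && *r.minor != minor {
			continue
		}
		if strings.ContainsRune(r.access, access) {
			return true
		}
	}
	return false
}

func main() {
	major, minor := int64(5), int64(1) // /dev/console
	legacy := []rule{{devType: "a", access: "rwm"}}
	explicit := []rule{{devType: "c", major: &major, minor: &minor, access: "rwm"}}

	fmt.Println(allowed(legacy, "b", 8, 0, 'w'))   // true: old allow-all default
	fmt.Println(allowed(nil, "b", 8, 0, 'w'))      // false: nothing granted
	fmt.Println(allowed(explicit, "c", 5, 1, 'r')) // true: explicitly listed
	fmt.Println(allowed(explicit, "b", 8, 0, 'w')) // false: not listed
}
```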

However, it is inconvenient for pkg/format, pkg/mount and pkg/swap
to list every possible block device up-front. Therefore we add the
ability to grant access to an entire class of device with a single
rule:

```
- path: all
  type: b
```

A paranoid user can still override this by listing specific
major/minor numbers in a `devices:` rule.
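
To make the difference between the two kinds of rule concrete, here is
a minimal sketch of the cgroup entries they could produce in the OCI
config. The `deviceCgroup` struct is a local copy of the fields of
`specs.LinuxDeviceCgroup` so that the example has no dependencies. The
helper names and the 8:0 device numbers are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deviceCgroup copies the fields of specs.LinuxDeviceCgroup from the
// OCI runtime spec (an assumption made to keep this sketch standalone).
type deviceCgroup struct {
	Allow  bool   `json:"allow"`
	Type   string `json:"type,omitempty"`
	Major  *int64 `json:"major,omitempty"`
	Minor  *int64 `json:"minor,omitempty"`
	Access string `json:"access,omitempty"`
}

// classRule grants rwm on every device of the given type ("b", "c" or "a").
func classRule(devType string) deviceCgroup {
	return deviceCgroup{Allow: true, Type: devType, Access: "rwm"}
}

// specificRule grants rwm on a single device identified by major/minor.
func specificRule(devType string, major, minor int64) deviceCgroup {
	return deviceCgroup{Allow: true, Type: devType, Major: &major, Minor: &minor, Access: "rwm"}
}

// render returns the JSON form of a rule.
func render(r deviceCgroup) string {
	out, err := json.Marshal(r)
	if err != nil {
		panic(err) // not expected for this plain struct
	}
	return string(out)
}

func main() {
	fmt.Println(render(classRule("b")))
	// {"allow":true,"type":"b","access":"rwm"}
	fmt.Println(render(specificRule("b", 8, 0)))
	// {"allow":true,"type":"b","major":8,"minor":0,"access":"rwm"}
}
```

Leaving major and minor unset is what turns the rule into a wildcard
for the whole device class.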

[1] https://github.com/opencontainers/runc/security/advisories/GHSA-g54h-m393-cpwq

Signed-off-by: David Scott <dave@recoil.org>
David Scott 2021-10-14 14:20:51 +01:00
parent 24db42dd68
commit 46ea02f65b
6 changed files with 37 additions and 3 deletions

@@ -245,8 +245,18 @@ devices:
mode: 0666
```
See the [getty package](../pkg/getty/build.yml) for a more complete example
and see [runc](https://github.com/opencontainers/runc/commit/60e21ec26e15945259d4b1e790e8fd119ee86467) for context.
To grant access to all block devices use:
```
devices:
  - path: all
    type: b
```
See the [format package](../pkg/format/build.yml) for an example.
### Mount Options
When mounting filesystem paths into a container - whether as part of `onboot` or `services` - there are several options of which you need to be aware. Using them correctly is necessary for your containers to function properly.

@@ -2,6 +2,10 @@ image: format
config:
  binds:
    - /dev:/dev
  devices:
    # all block devices
    - path: all
      type: b
  capabilities:
    - CAP_SYS_ADMIN
    - CAP_MKNOD

@@ -15,6 +15,7 @@ config:
    - /dev:/dev
    - /sys:/sys
  devices:
    # individual console / tty character devices
    - path: "/dev/console"
      type: c
      major: 5

@@ -4,6 +4,10 @@ config:
    - /dev:/dev
    - /var:/var:rshared,rbind
    - /:/hostroot
  devices:
    # all block devices
    - path: all
      type: b
  capabilities:
    - CAP_SYS_ADMIN
  rootfsPropagation: shared

@@ -3,6 +3,10 @@ config:
  binds:
    - /dev:/dev
    - /var:/var
  devices:
    # all devices (/dev/mapper is a character device)
    - path: all
      type: a
  capabilities:
    - CAP_SYS_ADMIN
    - CAP_MKNOD

@@ -1046,6 +1046,15 @@ func ConfigToOCI(yaml *Image, config imagespec.ImageConfig, idMap map[string]uin
	devices := assignDevices(label.Devices, yaml.Devices)
	var linuxDevices []specs.LinuxDevice
	for _, device := range devices {
		if device.Path == "all" {
			// add a category of devices to the device whitelist cgroup controller
			resources.Devices = append(resources.Devices, specs.LinuxDeviceCgroup{
				Allow:  true,
				Type:   device.Type,
				Access: "rwm", // read, write, mknod
			})
			continue
		}
		mode, err := strconv.ParseInt(device.Mode, 8, 32)
		if err != nil {
			return oci, runtime, fmt.Errorf("Cannot parse device mode as octal value: %v", err)
@@ -1059,6 +1068,8 @@ func ConfigToOCI(yaml *Image, config imagespec.ImageConfig, idMap map[string]uin
			FileMode: &fileMode,
		}
		linuxDevices = append(linuxDevices, linuxDevice)
		// to access the device it must be added to the device whitelist cgroup controller
		// see https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/devices.html
		resources.Devices = append(resources.Devices, deviceCgroup(linuxDevice))
	}
@@ -1089,6 +1100,6 @@ func deviceCgroup(device specs.LinuxDevice) specs.LinuxDeviceCgroup {
		Type:   device.Type,
		Major:  &device.Major,
		Minor:  &device.Minor,
		Access: "rwm", // read, write, mknod
	}
}