Merge pull request #5154 from Yuan-Zhuo/main

agent: support systemd cgroup for kata agent.
This commit is contained in:
Bin Liu
2022-11-28 18:40:10 +08:00
committed by GitHub
31 changed files with 4542 additions and 145 deletions

View File

@@ -9,6 +9,7 @@ Kata Containers design documents:
- [VCPU handling](vcpu-handling.md)
- [VCPU threads pinning](vcpu-threads-pinning.md)
- [Host cgroups](host-cgroups.md)
- [Agent systemd cgroup](agent-systemd-cgroup.md)
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)

View File

@@ -0,0 +1,84 @@
# Systemd Cgroup for Agent
As we know, we can interact with cgroups in two ways, **`cgroupfs`** and **`systemd`**. The former is achieved by reading and writing cgroup `tmpfs` files under `/sys/fs/cgroup` while the latter is done by configuring a transient unit by requesting systemd. Kata agent uses **`cgroupfs`** by default, unless you pass the parameter `--systemd-cgroup`.
## usage
For systemd, kata agent configures cgroups according to the following `linux.cgroupsPath` format standard provided by `runc` (`[slice]:[prefix]:[name]`). If you don't provide a valid `linux.cgroupsPath`, kata agent will treat it as `"system.slice:kata_agent:<container-id>"`.
> Here slice is a systemd slice under which the container is placed. If empty, it defaults to system.slice, except when cgroup v2 is used and rootless container is created, in which case it defaults to user.slice.
>
> Note that slice can contain dashes to denote a sub-slice (e.g. user-1000.slice is a correct notation, meaning a `subslice` of user.slice), but it must not contain slashes (e.g. user.slice/user-1000.slice is invalid).
>
> A slice of `-` represents a root slice.
>
> Next, prefix and name are used to compose the unit name, which is `<prefix>-<name>.scope`, unless name has `.slice` suffix, in which case prefix is ignored and the name is used as is.
## supported properties
The kata agent will translate the parameters in the `linux.resources` of `config.json` into systemd unit properties, and send it to systemd for configuration. Since systemd supports limited properties, only the following parameters in `linux.resources` will be applied. We will simply treat hybrid mode as legacy mode by the way.
- CPU
- v1
| runtime spec resource | systemd property name |
| --------------------- | --------------------- |
| `cpu.shares` | `CPUShares` |
- v2
| runtime spec resource | systemd property name |
| -------------------------- | -------------------------- |
| `cpu.shares` | `CPUShares` |
| `cpu.period` | `CPUQuotaPeriodUSec`(v242) |
| `cpu.period` & `cpu.quota` | `CPUQuotaPerSecUSec` |
- MEMORY
- v1
| runtime spec resource | systemd property name |
| --------------------- | --------------------- |
| `memory.limit` | `MemoryLimit` |
- v2
| runtime spec resource | systemd property name |
| ------------------------------ | --------------------- |
| `memory.low` | `MemoryLow` |
| `memory.max` | `MemoryMax` |
| `memory.swap` & `memory.limit` | `MemorySwapMax` |
- PIDS
| runtime spec resource | systemd property name |
| --------------------- | --------------------- |
| `pids.limit ` | `TasksMax` |
- CPUSET
| runtime spec resource | systemd property name |
| --------------------- | -------------------------- |
| `cpuset.cpus` | `AllowedCPUs`(v244) |
| `cpuset.mems` | `AllowedMemoryNodes`(v244) |
## Systemd Interface
`session.rs` and `system.rs` in `src/agent/rustjail/src/cgroups/systemd/interface` are automatically generated by `zbus-xmlgen`, which is is an accompanying tool provided by `zbus` to generate Rust code from `D-Bus XML interface descriptions`. The specific commands to generate these two files are as follows:
```shell
// system.rs
zbus-xmlgen --system org.freedesktop.systemd1 /org/freedesktop/systemd1
// session.rs
zbus-xmlgen --session org.freedesktop.systemd1 /org/freedesktop/systemd1
```
The current implementation of `cgroups/systemd` uses `system.rs` while `session.rs` could be used to build rootless containers in the future.
## references
- [runc - systemd cgroup driver](https://github.com/opencontainers/runc/blob/main/docs/systemd.md)
- [systemd.resource-control — Resource control unit settings](https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html)