docs: Host cgroups documentation update

Update according to the new sandbox/overhead cgroup split.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>

The OCI [runtime specification][linux-config] provides guidance on where the container cgroups should be placed:

> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container

Cgroups are hierarchical, and this can be seen with the following pod example (a sketch of the resulting on-disk layout follows the list):

- Pod 1: `cgroupsPath=/kubepods/pod1`
  - Container 1: `cgroupsPath=/kubepods/pod1/container1`
  - Container 2: `cgroupsPath=/kubepods/pod1/container2`

- Pod 2: `cgroupsPath=/kubepods/pod2`
  - Container 1: `cgroupsPath=/kubepods/pod2/container1`
  - Container 2: `cgroupsPath=/kubepods/pod2/container2`

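On a cgroup v1 host, that hierarchy materializes as nested directories under each controller's mount point. A minimal sketch, assuming the memory controller is mounted at the conventional `/sys/fs/cgroup/memory` and the pod names used above:

```
$ tree -d /sys/fs/cgroup/memory/kubepods
/sys/fs/cgroup/memory/kubepods
├── pod1
│   ├── container1
│   └── container2
└── pod2
    ├── container1
    └── container2
```
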
Depending on the upper-level orchestration layers, the cgroup under which the pod is placed may or may not be
managed by the orchestrator. In the case of Kubernetes, the pod cgroup is created by the Kubelet,
while the container cgroups are to be handled by the runtime.
The Kubelet will size the pod cgroup based on the container resource requirements, to which it may add
a configured set of [pod resource overheads](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/).

Kata Containers introduces a non-negligible resource overhead for running a sandbox (pod). Typically, the Kata shim,
through its underlying VMM invocation, will create many additional threads compared to process-based container runtimes:
the para-virtualized I/O back-ends, the VMM instance and even the Kata shim process itself are all host processes that
consume memory and CPU time not directly tied to the container workload, and together they introduce a sandbox
resource overhead.

In order for a Kata workload to run without significant performance degradation, its sandbox overhead must be
provisioned accordingly. Two scenarios are possible:

1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod cgroup.
   For example, the Kubernetes [`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
   feature lets the orchestrator add a configured sandbox overhead to the sum of all of its containers' resources
   (a minimal `RuntimeClass` sketch follows this list). In that case, the pod cgroup is properly sized and all
   Kata created processes will run under the constraints and limits defined on the pod cgroup.

2) The upper-layer orchestrator does **not** take the sandbox overhead into account and the pod cgroup is not
   sized to properly run all Kata created processes. In that scenario, attaching all the Kata processes to the
   sandbox cgroup may lead to non-negligible workload performance degradations. As a consequence, Kata Containers
   will move all processes but the vCPU threads into a dedicated overhead cgroup under `/kata_overhead`. The Kata
   runtime will not apply any constraints or limits to that cgroup; it is up to the infrastructure owner to
   optionally set it up.

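To illustrate the first scenario, a Kubernetes `RuntimeClass` can declare the sandbox overhead that the Kubelet will add when sizing the pod cgroup. A minimal sketch; the handler name `kata` and the overhead values are illustrative and must match the actual deployment:

```
$ cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
overhead:
  podFixed:
    memory: "160Mi"
    cpu: "250m"
EOF
```
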
Those two scenarios are not dynamically detected by the Kata Containers runtime implementation, and thus the
infrastructure owner must configure the runtime according to how the upper-layer orchestrator creates and sizes the
pod cgroup. That configuration selection is done through the `sandbox_cgroup_only` flag within the Kata Containers
[configuration](../../src/runtime/README.md#configuration) file.

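For example, assuming the runtime configuration lives at `/etc/kata-containers/configuration.toml` (the path varies across installations; `/usr/share/defaults/kata-containers/configuration.toml` is another common location), the flag can be inspected and flipped with:

```
$ grep sandbox_cgroup_only /etc/kata-containers/configuration.toml
sandbox_cgroup_only=false
$ sudo sed -i 's/^sandbox_cgroup_only *=.*/sandbox_cgroup_only=true/' \
    /etc/kata-containers/configuration.toml
```
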
## `sandbox_cgroup_only = true`

Setting `sandbox_cgroup_only` to `true` in the Kata Containers configuration file means that the pod cgroup is
properly sized and takes the pod overhead into account. This is ideal, as all the applicable Kata Containers processes
can simply be placed within the given cgroup path.

In the context of Kubernetes, the Kubelet can size the pod cgroup to take the overhead of running a Kata-based sandbox
into account. This has been supported since the 1.16 Kubernetes release, through the
[`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) feature.

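A pod opts into that overhead accounting by referencing the Kata runtime class. A minimal sketch, assuming a `RuntimeClass` named `kata` exists on the cluster:

```
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: kata-example
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: nginx
EOF
```
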
```
┌─────────────────────────────────────────┐
│                                         │
│  ┌───────────────────────────────────┐  │
│  │                                   │  │
│  │  ┌───────────────────────────┐    │  │
│  │  │                           │    │  │
│  │  │  ┌─────────────────────┐  │    │  │
│  │  │  │ vCPU threads        │  │    │  │
│  │  │  │ I/O threads         │  │    │  │
│  │  │  │ VMM                 │  │    │  │
│  │  │  │ Kata Shim           │  │    │  │
│  │  │  │                     │  │    │  │
│  │  │  │ /kata_<sandbox_id>  │  │    │  │
│  │  │  └─────────────────────┘  │    │  │
│  │  │ Pod 1                     │    │  │
│  │  └───────────────────────────┘    │  │
│  │                                   │  │
│  │  ┌───────────────────────────┐    │  │
│  │  │                           │    │  │
│  │  │  ┌─────────────────────┐  │    │  │
│  │  │  │ vCPU threads        │  │    │  │
│  │  │  │ I/O threads         │  │    │  │
│  │  │  │ VMM                 │  │    │  │
│  │  │  │ Kata Shim           │  │    │  │
│  │  │  │                     │  │    │  │
│  │  │  │ /kata_<sandbox_id>  │  │    │  │
│  │  │  └─────────────────────┘  │    │  │
│  │  │ Pod 2                     │    │  │
│  │  └───────────────────────────┘    │  │
│  │                                   │  │
│  │ /kubepods                         │  │
│  └───────────────────────────────────┘  │
│                                         │
│ Node                                    │
└─────────────────────────────────────────┘
```

### Implementation details

When `sandbox_cgroup_only` is enabled, the Kata shim will create a per-pod
sub-cgroup under the pod's dedicated cgroup. For example, in the Kubernetes context,
it will create a `/kata_<PodSandboxID>` under the `/kubepods` cgroup hierarchy.
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, the memory cgroup
subsystem for a pod with sandbox ID `12345678` would live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678`.

In most cases, the newly created `/kata_<PodSandboxID>` cgroup is unrestricted and inherits and shares all
constraints and limits from the parent cgroup (`/kubepods` in the Kubernetes case). The exceptions are
the `cpuset` and `devices` cgroup subsystems, which are managed by the Kata shim.

After creating the `/kata_<PodSandboxID>` cgroup, the Kata Containers shim will move itself into it, **before** starting
the virtual machine. As a consequence, all processes subsequently created by the Kata Containers shim (the VMM itself, and
all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>` cgroup.

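The move itself boils down to a `cgroup.procs` write, which the shim performs for every cgroup subsystem it manages. A shell sketch of the equivalent operation for the memory controller, with an illustrative sandbox ID:

```
# Create the sandbox cgroup, then move the current process into it.
# Children forked afterwards (VMM, vCPU and I/O threads) start inside it.
$ sudo mkdir -p /sys/fs/cgroup/memory/kubepods/kata_12345678
$ echo $$ | sudo tee /sys/fs/cgroup/memory/kubepods/kata_12345678/cgroup.procs
```
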
### Why create a kata-cgroup under the parent cgroup?

And why not add the per-sandbox shim directly to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?

The Kata Containers shim implementation creates a per-sandbox cgroup
(`/kata_<PodSandboxID>`) to support the `Docker` use case. Although `Docker` does not
have a notion of pods, Kata Containers still creates a sandbox to support the pod-less,
single container use case that `Docker` implements. Since `Docker` does not create any
cgroup hierarchy to place a container into, it would be very complex for Kata to map
a particular container to its sandbox without placing it under a `/kata_<containerID>`
sub-cgroup first.

### Advantages

Keeping all Kata Containers processes under a properly sized pod cgroup is ideal
and makes for a simpler Kata Containers implementation. It also helps with gathering
accurate statistics and preventing Kata workloads from being noisy neighbors.

#### Pod resources statistics

If the Kata caller wants to know the resource usage on the host, it can get
statistics from the pod cgroup. All cgroup stats in the hierarchy will include
the Kata overhead, which makes it possible to gather usage statistics at both the
pod level and the container level.

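For example, pod-level accounting can be read straight from the cgroup v1 filesystem (pod cgroup path and values illustrative):

```
# Total memory used by the pod, Kata overhead included:
$ cat /sys/fs/cgroup/memory/kubepods/pod1/memory.usage_in_bytes
268435456
# Cumulative CPU time consumed by the pod, in nanoseconds:
$ cat /sys/fs/cgroup/cpuacct/kubepods/pod1/cpuacct.usage
4200000000
```
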
#### Better host resource isolation

Because the Kata runtime will place all the Kata processes in the pod cgroup,
the resource limits that the caller applies to the pod cgroup will affect all
processes that belong to the Kata sandbox on the host. This improves the
isolation on the host, preventing Kata from becoming a noisy neighbor.

## `sandbox_cgroup_only = false` (default setting)

If the cgroup provided to Kata is not sized appropriately, Kata components will
consume resources that the actual container workloads expect to see and use.
This can cause instability and performance degradations.

To avoid that situation, Kata Containers creates an unconstrained overhead
cgroup and moves all non-workload related processes (anything but the virtual CPU
threads) to it. The name of this overhead cgroup is `/kata_overhead` and a per-sandbox
sub-cgroup will be created under it for each sandbox Kata Containers creates.

Kata Containers does not add any constraints or limitations on the overhead cgroup. It is up to the infrastructure
owner to either:

- Provision nodes with a pre-sized `/kata_overhead` cgroup, as sketched below. Kata Containers will
  load that existing cgroup and move all non-workload related processes to it.
- Let Kata Containers create the `/kata_overhead` cgroup, and leave it
  unconstrained or resize it after the fact.

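As an illustration of the first option, a node could pre-size the overhead cgroup at provisioning time. A cgroup v1 sketch with purely illustrative limits:

```
# Pre-create /kata_overhead on the relevant controllers:
$ sudo mkdir -p /sys/fs/cgroup/memory/kata_overhead /sys/fs/cgroup/cpu/kata_overhead
# Cap the aggregate Kata overhead at 512 MiB:
$ echo $((512*1024*1024)) | sudo tee /sys/fs/cgroup/memory/kata_overhead/memory.limit_in_bytes
# De-prioritize it relative to workload cgroups (default cpu.shares is 1024):
$ echo 512 | sudo tee /sys/fs/cgroup/cpu/kata_overhead/cpu.shares
```
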
```
┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│  ┌───────────────────────────┐    ┌───────────────────────────┐  │
│  │                           │    │                           │  │
│  │ ┌─────────────────────────┼────┼─────────────────────────┐ │  │
│  │ │                         │    │                         │ │  │
│  │ │ ┌─────────────────────┐ │    │ ┌─────────────────────┐ │ │  │
│  │ │ │ vCPU threads        │ │    │ │ VMM                 │ │ │  │
│  │ │ │                     │ │    │ │ I/O threads         │ │ │  │
│  │ │ │                     │ │    │ │ Kata Shim           │ │ │  │
│  │ │ │                     │ │    │ │                     │ │ │  │
│  │ │ │ /kata_<sandbox_id>  │ │    │ │ /<sandbox_id>       │ │ │  │
│  │ │ └─────────────────────┘ │    │ └─────────────────────┘ │ │  │
│  │ │                         │    │                         │ │  │
│  │ │ Pod 1                   │    │                         │ │  │
│  │ └─────────────────────────┼────┼─────────────────────────┘ │  │
│  │                           │    │                           │  │
│  │ ┌─────────────────────────┼────┼─────────────────────────┐ │  │
│  │ │                         │    │                         │ │  │
│  │ │ ┌─────────────────────┐ │    │ ┌─────────────────────┐ │ │  │
│  │ │ │ vCPU threads        │ │    │ │ VMM                 │ │ │  │
│  │ │ │                     │ │    │ │ I/O threads         │ │ │  │
│  │ │ │                     │ │    │ │ Kata Shim           │ │ │  │
│  │ │ │                     │ │    │ │                     │ │ │  │
│  │ │ │ /kata_<sandbox_id>  │ │    │ │ /<sandbox_id>       │ │ │  │
│  │ │ └─────────────────────┘ │    │ └─────────────────────┘ │ │  │
│  │ │                         │    │                         │ │  │
│  │ │ Pod 2                   │    │                         │ │  │
│  │ └─────────────────────────┼────┼─────────────────────────┘ │  │
│  │                           │    │                           │  │
│  │ /kubepods                 │    │ /kata_overhead            │  │
│  └───────────────────────────┘    └───────────────────────────┘  │
│                                                                  │
│ Node                                                             │
└──────────────────────────────────────────────────────────────────┘
```

### Implementation details

When `sandbox_cgroup_only` is disabled, the Kata Containers shim will create a per-pod
sub-cgroup under the pod's dedicated cgroup, and another one under the overhead cgroup.
For example, in the Kubernetes context, it will create a `/kata_<PodSandboxID>` under
the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overhead` one.

On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod whose sandbox
ID is `12345678`, created with `sandbox_cgroup_only` disabled, the two memory cgroups
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.

Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.

With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate the actual container workload processes. For Kata, this maps
to the VMM-created virtual CPU threads, so they are the only ones running under the pod
cgroup. This mitigates the risk of the VMM, the Kata shim and the I/O threads going through
a catastrophic out of memory scenario (`OOM`).

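That split can be observed on a live host: the sandbox cgroup's `tasks` file only lists the vCPU thread IDs, while the shim, the VMM main thread and the I/O threads appear under the overhead cgroup (sandbox ID and thread IDs illustrative):

```
$ cat /sys/fs/cgroup/cpu/kubepods/kata_12345678/tasks
4321
4322
$ cat /sys/fs/cgroup/cpu/kata_overhead/12345678/tasks
4300
4301
4310
```
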
#### Pros and cons

Running all non-vCPU threads under an unconstrained overhead cgroup could lead to workloads
potentially consuming a large amount of host resources.

On the other hand, running all non-vCPU threads under a dedicated overhead cgroup can provide
accurate metrics on the actual Kata Containers pod overhead, allowing for tuning the overhead
cgroup size and constraints accordingly.

[linux-config]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#cgroups-path

# Supported cgroups

Kata Containers currently only supports cgroups `v1`.

In the following sections each cgroup version is described briefly.

## Cgroups V1

A process can join a cgroup by writing its process id (`pid`) to the `cgroup.procs` file,
or join a cgroup partially by writing a task (thread) id (`tid`) to the `tasks` file.

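For instance, with a hypothetical `mygroup` cgroup under the `cpu` controller:

```
# Move a whole process (pid 1234) into the cgroup:
$ echo 1234 | sudo tee /sys/fs/cgroup/cpu/mygroup/cgroup.procs
# Move a single thread (tid 1235) only:
$ echo 1235 | sudo tee /sys/fs/cgroup/cpu/mygroup/tasks
```
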
Kata Containers only supports `v1`.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].

## Cgroups V2

Same as `cgroups v1`, a process can join the cgroup by writing its process id (`pid`) to the
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to the
`cgroup.threads` file.

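The v2 equivalent, again with a hypothetical `mygroup` under the unified hierarchy; note that per-thread placement requires the cgroup to be switched to `threaded` mode first:

```
# Move a whole process into the cgroup:
$ echo 1234 | sudo tee /sys/fs/cgroup/mygroup/cgroup.procs
# Enable threaded mode, then move a single thread:
$ echo threaded | sudo tee /sys/fs/cgroup/mygroup/cgroup.type
$ echo 1235 | sudo tee /sys/fs/cgroup/mygroup/cgroup.threads
```
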
Kata Containers does not support cgroups `v2` on the host.

### Distro Support

Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.
For more information about the status of this feature, see [issue #2494][4].

[1]: http://man7.org/linux/man-pages/man5/tmpfs.5.html
[2]: http://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_1