Merge pull request #387 from devimc/topic/cpuCgroups

constraints: add cpu cgroups documentation
2025-08-15 06:34:03 +00:00 · 2019-02-28 11:48:32 -06:00 · 2019-02-28 11:48:32 -06:00 · b702f5028d
commit b702f5028d
parent 3fa6a83d4d 1e1a735796
1 changed files with 91 additions and 0 deletions
--- a/constraints/cpu.md
+++ b/constraints/cpu.md
@ -1,3 +1,16 @@
+* [CPU constraints in Kata Containers](#cpu-constraints-in-kata-containers)
+    * [Default number of virtual CPUs](#default-number-of-virtual-cpus)
+    * [Virtual CPUs and Kubernetes pods](#virtual-cpus-and-kubernetes-pods)
+    * [Container lifecycle](#container-lifecycle)
+    * [Container without CPU constraint](#container-without-cpu-constraint)
+    * [Container with CPU constraint](#container-with-cpu-constraint)
+    * [Do not waste resources](#do-not-waste-resources)
+    * [CPU cgroups](#cpu-cgroups)
+    * [cgroups in the guest](#cgroups-in-the-guest)
+        * [CPU pinning](#cpu-pinning)
+    * [cgroups in the host](#cgroups-in-the-host)
+
+
 # CPU constraints in Kata Containers

 ## Default number of virtual CPUs
@ -157,6 +170,84 @@ docker run --cpus 4 -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cp
 400000  # cfs quota
 ```

+
+## CPU cgroups
+
+Kata Containers runs over two layers of cgroups, the first layer is in the guest where
+only the workload is placed, the second layer is in the host that is more complex and
+might contain more than one process and task (thread) depending of the number of
+containers per POD and vCPUs per container. The following diagram represents a nginx container
+created with `docker` with the default number of vcpus.
+
+
+```
+$ docker run -dt --runtime=kata-runtime nginx
+
+
+       .-------.
+       | nginx |
+    .--'-------'---.  .------------.
+    | Guest Cgroup |  | Kata agent |
+  .-'--------------'--'------------'.    .-----------.
+  |  Thread: Hypervisor's vCPU 0    |    | Kata Shim |
+ .'---------------------------------'.  .'-----------'.
+ |             Tasks                 |  |  Processes  |
+.'-----------------------------------'--'-------------'.
+|                    Host Cgroup                       |
+'------------------------------------------------------'
+```
+
+The next sections explain the difference between processes and tasks and why only hypervisor
+vCPUs are constrained.
+
+### cgroups in the guest
+
+Only the workload process including all its threads are placed into cpu cgroups, this means
+that `kata-agent` and `systemd` run without constraints in the guest.
+
+#### CPU pinning
+
+Kata Containers tries to apply and honor the cgroups but sometimes that is not possible.
+An example of this occurs with cpu cgroups when the number of virtual CPUs (in the guest)
+does not match the actual number of physical host CPUs.
+In Kata Containers to have a good performance and small memory footprint, the resources are
+hot added when they are needed, therefore the number of virtual resources is not the same
+as the number of physical resources. The problem with this approach is that it's not possible
+to pin a process on a specific resource that is not present in the guest. To deal with this
+limitation and to not fail when the container is being created, Kata Containers does not apply
+the constraint in the first layer (guest) if the resource does not exist in the guest, but it
+is applied in the second layer (host) where the hypervisor is running. The constraint is applied
+in both layers when the resource is available in the guest and host. The next sections provide
+further details on what parts of the hypervisor are constrained.
+
+### cgroups in the host
+
+In Kata Containers the workloads run in a virtual machine that is managed and represented by a
+hypervisor running in the host. Like other processes the hypervisor might use threads to realize
+several tasks, for example IO and Network operations. One of the most important uses for the
+threads is as vCPUs. The processes running in the guest see these vCPUs as physical CPUs, while
+in the host those vCPU are just threads that are part of a process. This is the key to ensure
+workloads consumes only the amount of CPU resources that were assigned to it without impacting
+other operations. From user perspective the easier approach to implement it would be to take the
+whole hypervisor including its threads and move them into the cgroup, unfortunately this will
+impact negatively the performance, since vCPUs, IO and Network threads will be fighting for
+resources. The following table shows a random read performance comparison between a Kata Container
+with all its hypervisor threads in the cgroup and other with only its hypervisor vCPU threads
+constrained, the difference is huge.
+
+
+| Bandwidth     | All threads   | vCPU threads | Units |
+|:-------------:|:-------------:|:------------:|:-----:|
+| 4k            | 136.2         | 294.7        | MB/s  |
+| 8k            | 166.6         | 579.4        | MB/s  |
+| 16k           | 178.3         | 1093.3       | MB/s  |
+| 32k           | 179.9         | 1931.5       | MB/s  |
+| 64k           | 213.6         | 3994.2       | MB/s  |
+
+
+To have the best performance in Kata Containers only the vCPU threads are constrained.
+
+
 [1]: https://docs.docker.com/config/containers/resource_constraints/#cpu
 [2]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
 [3]: https://kubernetes.io/docs/concepts/workloads/pods/pod/