diff --git a/docs/design/README.md b/docs/design/README.md
index ad20cd7204..adcffd0196 100644
--- a/docs/design/README.md
+++ b/docs/design/README.md
@@ -12,7 +12,7 @@ Kata Containers design documents:
 - [Metrics(Kata 2.0)](kata-2-0-metrics.md)
 - [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
 - [Design for direct-assigned volume](direct-blk-device-assignment.md)
-
+- [Design for core-scheduling](core-scheduling.md)
 ---
 
 - [Design proposals](proposals)
diff --git a/docs/design/core-scheduling.md b/docs/design/core-scheduling.md
new file mode 100644
index 0000000000..7602e21cfe
--- /dev/null
+++ b/docs/design/core-scheduling.md
@@ -0,0 +1,12 @@
+# Core scheduling
+
+Core scheduling is a Linux kernel feature that allows only trusted tasks to run concurrently on
+CPUs sharing compute resources (for example, hyper-threads on a core).
+
+Containerd versions >= 1.6.4 leverage this to treat all of the processes associated with a
+given pod or container to be a single group of trusted tasks. To indicate this should be carried
+out, containerd sets the `SCHED_CORE` environment variable for each shim it spawns. When this is
+set, the Kata Containers shim implementation uses the `prctl` syscall to create a new core scheduling
+domain for the shim process itself as well as future VMM processes it will start.
+
+For more details on the core scheduling feature, see the [Linux documentation](https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html).
diff --git a/src/runtime/pkg/utils/schedcore.go b/src/runtime/pkg/utils/schedcore.go
index e5084bfd9a..c35fecef4a 100644
--- a/src/runtime/pkg/utils/schedcore.go
+++ b/src/runtime/pkg/utils/schedcore.go
@@ -18,11 +18,11 @@ const (
 	pidTypeProcessGroupId = 2
 
 	// Pid affects the current pid
-	Pid PidType = pidtypePid
+	Pid PidType = pidTypePid
 	// ThreadGroup affects all threads in the group
-	ThreadGroup PidType = pidtypeTgid
+	ThreadGroup PidType = pidTypeThreadGroupId
 	// ProcessGroup affects all processes in the group
-	ProcessGroup PidType = pidtypePgid
+	ProcessGroup PidType = pidTypeProcessGroupId
 )
 
 // Create a new sched core domain
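
For reviewers unfamiliar with the mechanism, the flow described in `core-scheduling.md` can be sketched roughly as follows. This is a minimal, Linux-only illustration, not the code in `schedcore.go`: the helper names `schedCoreRequested` and `createSchedCore` are hypothetical, the check treats any non-empty `SCHED_CORE` value as "set" (an assumption), and the `ThreadGroup` scope is chosen only for the example. The numeric values for `PR_SCHED_CORE` and `PR_SCHED_CORE_CREATE` come from the kernel's `linux/prctl.h`.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// prctl option and sub-command values from the Linux UAPI header <linux/prctl.h>.
const (
	prSchedCore       = 62 // PR_SCHED_CORE
	prSchedCoreCreate = 1  // PR_SCHED_CORE_CREATE
)

// PidType selects which tasks a core scheduling request applies to.
type PidType int

const (
	Pid          PidType = 0 // a single task
	ThreadGroup  PidType = 1 // all threads of a process
	ProcessGroup PidType = 2 // all processes in a process group
)

// schedCoreRequested is a hypothetical helper. It assumes that any
// non-empty SCHED_CORE value means core scheduling was requested.
func schedCoreRequested(getenv func(string) string) bool {
	return getenv("SCHED_CORE") != ""
}

// createSchedCore is a hypothetical helper that asks the kernel for a new
// core scheduling cookie. A pid argument of 0 refers to the calling task;
// pidType widens the request to its thread group or process group.
func createSchedCore(pidType PidType) error {
	_, _, errno := syscall.Syscall6(syscall.SYS_PRCTL,
		prSchedCore, prSchedCoreCreate, 0, uintptr(pidType), 0, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	if !schedCoreRequested(os.Getenv) {
		fmt.Println("SCHED_CORE not set; leaving scheduling unchanged")
		return
	}
	// ThreadGroup is an illustrative choice: a Go program runs on several
	// OS threads, so a single-task scope would not cover the whole process.
	if err := createSchedCore(ThreadGroup); err != nil {
		fmt.Fprintf(os.Stderr, "core scheduling unavailable: %v\n", err)
		return
	}
	fmt.Println("created a new core scheduling domain")
}
```

Processes forked after a successful call inherit the cookie, which is why creating the domain in the shim before it starts the VMM is enough to cover both. On kernels built without core scheduling support the `prctl` call returns an error, so the sketch reports it and carries on rather than exiting.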