mirror of
https://github.com/kata-containers/kata-containers.git
synced 2025-08-02 00:02:01 +00:00
Merge pull request #398 from amshinde/sysctl-docs
sysctsl: Add how-to doc for setting sysctls.
This commit is contained in:
commit
4d65fb4ec4
@ -20,6 +20,7 @@ For details of the other Kata Containers repositories, see the
|
||||
* HOWTO: [Kata Containers with k8s and cri-containerd](./how-to/how-to-use-k8s-with-cri-containerd-and-kata.md)
|
||||
* HOWTO: [OpenStack Zun with Kata Containers](zun/zun_kata.md)
|
||||
* HOWTO: [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
|
||||
* HOWTO: [Sysctls with Kata Containers](./how-to/how-to-use-sysctls-with-kata.md)
|
||||
|
||||
## Developer Guide
|
||||
|
||||
|
143
how-to/how-to-use-sysctls-with-kata.md
Normal file
143
how-to/how-to-use-sysctls-with-kata.md
Normal file
@ -0,0 +1,143 @@
|
||||
# Setting Sysctls with Kata
|
||||
|
||||
## Sysctls
|
||||
In Linux, the sysctl interface allows an administrator to modify kernel
|
||||
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
|
||||
process file system.
|
||||
|
||||
The parameters include the following subsystems among others:
|
||||
- `fs` (file systems)
|
||||
- `kernel` (kernel)
|
||||
- `net` (networking)
|
||||
- `vm` (virtual memory)
|
||||
|
||||
To get a complete list of kernel parameters, run:
|
||||
```
|
||||
$ sudo sysctl -a
|
||||
```
|
||||
|
||||
Both Docker and Kubernetes provide mechanisms for setting namespaced sysctls.
|
||||
Namespaced sysctls can be set per pod in the case of Kubernetes or per container
|
||||
in case of Docker.
|
||||
The following sysctls are known to be namespaced and can be set with
|
||||
Docker and Kubernetes:
|
||||
|
||||
- `kernel.shm*`
|
||||
- `kernel.msg*`
|
||||
- `kernel.sem`
|
||||
- `fs.mqueue.*`
|
||||
- `net.*`
|
||||
|
||||
### Namespaced Sysctls:
|
||||
|
||||
Kata Containers supports setting namespaced sysctls with Docker and Kubernetes.
|
||||
All namespaced sysctls can be set in the same way as regular Linux based
|
||||
containers, the difference being, in the case of Kata they are set inside the guest.
|
||||
|
||||
#### Setting Namespaced Sysctls with Docker:
|
||||
|
||||
```
|
||||
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/fs/mqueue/queues_max
|
||||
256
|
||||
$ sudo docker run --runtime=kata-runtime --sysctl fs.mqueue.queues_max=512 -it alpine cat /proc/sys/fs/mqueue/queues_max
|
||||
512
|
||||
```
|
||||
|
||||
... and:
|
||||
|
||||
```
|
||||
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/kernel/shmmax
|
||||
18446744073692774399
|
||||
$ sudo docker run --runtime=kata-runtime --sysctl kernel.shmmax=1024 -it alpine cat /proc/sys/kernel/shmmax
|
||||
1024
|
||||
```
|
||||
|
||||
For additional documentation on setting sysctls with Docker please refer to [Docker-sysctl-doc](https://docs.docker.com/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime).
|
||||
|
||||
|
||||
#### Setting Namespaced Sysctls with Kubernetes:
|
||||
|
||||
Kubernetes considers certain sysctls as safe and others as unsafe. For detailed
|
||||
information about what sysctls are considered unsafe, please refer to the [Kubernetes sysctl docs](https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/).
|
||||
For using unsafe systcls, the cluster admin would need to allow these as:
|
||||
|
||||
```
|
||||
$ kubelet --allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
|
||||
```
|
||||
|
||||
or using the declarative approach as:
|
||||
|
||||
```
|
||||
$ cat kubeadm.yaml
|
||||
apiVersion: kubeadm.k8s.io/v1alpha3
|
||||
kind: InitConfiguration
|
||||
nodeRegistration:
|
||||
kubeletExtraArgs:
|
||||
allowed-unsafe-sysctls: "kernel.msg*,kernel.shm.*,net.*"
|
||||
...
|
||||
```
|
||||
|
||||
The above yaml can then be passed to `kubeadm init` as:
|
||||
```
|
||||
$ sudo -E kubeadm init --config=kubeadm.yaml
|
||||
```
|
||||
|
||||
Both safe and unsafe sysctls can be enabled in the same way in the Pod yaml:
|
||||
|
||||
```
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: sysctl-example
|
||||
spec:
|
||||
securityContext:
|
||||
sysctls:
|
||||
- name: kernel.shm_rmid_forced
|
||||
value: "0"
|
||||
- name: net.ipv4.route.min_pmtu
|
||||
value: "1024"
|
||||
```
|
||||
|
||||
### Non-Namespaced Sysctls:
|
||||
|
||||
Docker and Kubernetes disallow sysctls without a namespace.
|
||||
The recommendation is to set them directly on the host or use a privileged
|
||||
container in the case of Kubernetes.
|
||||
|
||||
In the case of Kata, the approach of setting sysctls on the host does not
|
||||
work since the host sysctls have no effect on a Kata Container running
|
||||
inside a guest. Kata gives you the ability to set non-namespaced sysctls using a privileged container.
|
||||
This has the advantage that the non-namespaced sysctls are set inside the guest
|
||||
without having any effect on the `/proc/sys` values of any other pod or the
|
||||
host itself.
|
||||
|
||||
The recommended approach to do this would be to set the sysctl value in a
|
||||
privileged init container. In this way, the application containers do not need any elevated
|
||||
privileges.
|
||||
|
||||
```
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: busybox-kata
|
||||
spec:
|
||||
runtimeClassName: kata-qemu
|
||||
securityContext:
|
||||
sysctls:
|
||||
- name: kernel.shm_rmid_forced
|
||||
value: "0"
|
||||
containers:
|
||||
- name: busybox-container
|
||||
securityContext:
|
||||
privileged: true
|
||||
image: debian
|
||||
command:
|
||||
- sleep
|
||||
- "3000"
|
||||
initContainers:
|
||||
- name: init-sys
|
||||
securityContext:
|
||||
privileged: true
|
||||
image: busybox
|
||||
command: ['sh', '-c', 'echo "64000" > /proc/sys/vm/max_map_count']
|
||||
```
|
Loading…
Reference in New Issue
Block a user