Merge pull request #398 from amshinde/sysctl-docs

sysctsl: Add how-to doc for setting sysctls.
This commit is contained in:
Sebastien Boeuf 2019-03-15 10:18:28 -07:00 committed by GitHub
commit 4d65fb4ec4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 144 additions and 0 deletions

View File

@ -20,6 +20,7 @@ For details of the other Kata Containers repositories, see the
* HOWTO: [Kata Containers with k8s and cri-containerd](./how-to/how-to-use-k8s-with-cri-containerd-and-kata.md)
* HOWTO: [OpenStack Zun with Kata Containers](zun/zun_kata.md)
* HOWTO: [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
* HOWTO: [Sysctls with Kata Containers](./how-to/how-to-use-sysctls-with-kata.md)
## Developer Guide

View File

@ -0,0 +1,143 @@
# Setting Sysctls with Kata
## Sysctls
In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
process file system.
The parameters include the following subsystems among others:
- `fs` (file systems)
- `kernel` (kernel)
- `net` (networking)
- `vm` (virtual memory)
To get a complete list of kernel parameters, run:
```
$ sudo sysctl -a
```
Both Docker and Kubernetes provide mechanisms for setting namespaced sysctls.
Namespaced sysctls can be set per pod in the case of Kubernetes or per container
in case of Docker.
The following sysctls are known to be namespaced and can be set with
Docker and Kubernetes:
- `kernel.shm*`
- `kernel.msg*`
- `kernel.sem`
- `fs.mqueue.*`
- `net.*`
### Namespaced Sysctls:
Kata Containers supports setting namespaced sysctls with Docker and Kubernetes.
All namespaced sysctls can be set in the same way as regular Linux based
containers, the difference being, in the case of Kata they are set inside the guest.
#### Setting Namespaced Sysctls with Docker:
```
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/fs/mqueue/queues_max
256
$ sudo docker run --runtime=kata-runtime --sysctl fs.mqueue.queues_max=512 -it alpine cat /proc/sys/fs/mqueue/queues_max
512
```
... and:
```
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/kernel/shmmax
18446744073692774399
$ sudo docker run --runtime=kata-runtime --sysctl kernel.shmmax=1024 -it alpine cat /proc/sys/kernel/shmmax
1024
```
For additional documentation on setting sysctls with Docker please refer to [Docker-sysctl-doc](https://docs.docker.com/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime).
#### Setting Namespaced Sysctls with Kubernetes:
Kubernetes considers certain sysctls as safe and others as unsafe. For detailed
information about what sysctls are considered unsafe, please refer to the [Kubernetes sysctl docs](https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/).
For using unsafe systcls, the cluster admin would need to allow these as:
```
$ kubelet --allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
```
or using the declarative approach as:
```
$ cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
allowed-unsafe-sysctls: "kernel.msg*,kernel.shm.*,net.*"
...
```
The above yaml can then be passed to `kubeadm init` as:
```
$ sudo -E kubeadm init --config=kubeadm.yaml
```
Both safe and unsafe sysctls can be enabled in the same way in the Pod yaml:
```
apiVersion: v1
kind: Pod
metadata:
name: sysctl-example
spec:
securityContext:
sysctls:
- name: kernel.shm_rmid_forced
value: "0"
- name: net.ipv4.route.min_pmtu
value: "1024"
```
### Non-Namespaced Sysctls:
Docker and Kubernetes disallow sysctls without a namespace.
The recommendation is to set them directly on the host or use a privileged
container in the case of Kubernetes.
In the case of Kata, the approach of setting sysctls on the host does not
work since the host sysctls have no effect on a Kata Container running
inside a guest. Kata gives you the ability to set non-namespaced sysctls using a privileged container.
This has the advantage that the non-namespaced sysctls are set inside the guest
without having any effect on the `/proc/sys` values of any other pod or the
host itself.
The recommended approach to do this would be to set the sysctl value in a
privileged init container. In this way, the application containers do not need any elevated
privileges.
```
apiVersion: v1
kind: Pod
metadata:
name: busybox-kata
spec:
runtimeClassName: kata-qemu
securityContext:
sysctls:
- name: kernel.shm_rmid_forced
value: "0"
containers:
- name: busybox-container
securityContext:
privileged: true
image: debian
command:
- sleep
- "3000"
initContainers:
- name: init-sys
securityContext:
privileged: true
image: busybox
command: ['sh', '-c', 'echo "64000" > /proc/sys/vm/max_map_count']
```