docs: merge documentation repository

Generated by
git subtree add --prefix=docs git@github.com:kata-containers/documentation.git master

git-subtree-dir: docs
git-subtree-mainline: ec146a1b39
git-subtree-split: 510287204b

Fixes: #329
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This commit is contained in:
Peng Tao
2020-06-23 21:21:53 -07:00
109 changed files with 10281 additions and 0 deletions

28
docs/how-to/README.md Normal file
View File

@@ -0,0 +1,28 @@
# Howto Guides
* [Howto Guides](#howto-guides)
* [Kubernetes Integration](#kubernetes-integration)
* [Hypervisors Integration](#hypervisors-integration)
* [Advanced Topics](#advanced-topics)
## Kubernetes Integration
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)
## Hypervisors Integration
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with NEMU](how-to-use-kata-containers-with-nemu.md)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)
## Advanced Topics
- [How to use Kata Containers with virtio-fs](how-to-use-virtio-fs-with-kata.md)
- [Setting Sysctls with Kata](how-to-use-sysctls-with-kata.md)
- [What Is VMCache and How To Enable It](what-is-vm-cache-and-how-do-I-use-it.md)
- [What Is VM Templating and How To Enable It](what-is-vm-templating-and-how-do-I-use-it.md)
- [Privileged Kata Containers](privileged.md)
- [How to load kernel modules in Kata Containers](how-to-load-kernel-modules-with-kata.md)
- [How to use Kata Containers with `virtio-mem`](how-to-use-virtio-mem-with-kata.md)
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)

View File

@@ -0,0 +1,368 @@
# How to use Kata Containers and Containerd
- [Concepts](#concepts)
- [Kubernetes `RuntimeClass`](#kubernetes-runtimeclass)
- [Containerd Runtime V2 API: Shim V2 API](#containerd-runtime-v2-api-shim-v2-api)
- [Install](#install)
- [Install Kata Containers](#install-kata-containers)
- [Install containerd with CRI plugin](#install-containerd-with-cri-plugin)
- [Install CNI plugins](#install-cni-plugins)
- [Install `cri-tools`](#install-cri-tools)
- [Configuration](#configuration)
- [Configure containerd to use Kata Containers](#configure-containerd-to-use-kata-containers)
- [Kata Containers as a `RuntimeClass`](#kata-containers-as-a-runtimeclass)
- [Kata Containers as the runtime for untrusted workload](#kata-containers-as-the-runtime-for-untrusted-workload)
- [Kata Containers as the default runtime](#kata-containers-as-the-default-runtime)
- [Configuration for `cri-tools`](#configuration-for-cri-tools)
- [Run](#run)
- [Launch containers with `ctr` command line](#launch-containers-with-ctr-command-line)
- [Launch Pods with `crictl` command line](#launch-pods-with-crictl-command-line)
This document covers the installation and configuration of [containerd](https://containerd.io/)
and [Kata Containers](https://katacontainers.io). The containerd provides not only the `ctr`
command line tool, but also the [CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
interface for [Kubernetes](https://kubernetes.io) and other CRI clients.
This document is primarily written for Kata Containers v1.5.0-rc2 or above, and containerd v1.2.0 or above.
Previous versions are addressed here, but we suggest users upgrade to the newer versions for better support.
## Concepts
### Kubernetes `RuntimeClass`
[`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/) is a Kubernetes feature first
introduced in Kubernetes 1.12 as alpha. It is the feature for selecting the container runtime configuration to
use to run a pods containers. This feature is supported in `containerd` since [v1.2.0](https://github.com/containerd/containerd/releases/tag/v1.2.0).
Before the `RuntimeClass` was introduced, Kubernetes was not aware of the difference of runtimes on the node. `kubelet`
creates Pod sandboxes and containers through CRI implementations, and treats all the Pods equally. However, there
are requirements to run trusted Pods (i.e. Kubernetes plugin) in a native container like runc, and to run untrusted
workloads with isolated sandboxes (i.e. Kata Containers).
As a result, the CRI implementations extended their semantics for the requirements:
- At the beginning, [Frakti](https://github.com/kubernetes/frakti) checks the network configuration of a Pod, and
treat Pod with `host` network as trusted, while others are treated as untrusted.
- The containerd introduced an annotation for untrusted Pods since [v1.0](https://github.com/containerd/cri/blob/v1.0.0-rc.0/docs/config.md):
```yaml
annotations:
io.kubernetes.cri.untrusted-workload: "true"
```
- Similarly, CRI-O introduced the annotation `io.kubernetes.cri-o.TrustedSandbox` for untrusted Pods.
To eliminate the complexity of user configuration introduced by the non-standardized annotations and provide
extensibility, `RuntimeClass` was introduced. This gives users the ability to affect the runtime behavior
through `RuntimeClass` without the knowledge of the CRI daemons. We suggest that users with multiple runtimes
use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](https://github.com/kata-containers/runtime/tree/master/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
![Kubernetes integration with shimv2](../design/arch-images/shimv2.svg)
The shim v2 is introduced in containerd [v1.2.0](https://github.com/containerd/containerd/releases/tag/v1.2.0) and Kata `shimv2`
is implemented in Kata Containers v1.5.0.
## Install
### Install Kata Containers
Follow the instructions to [install Kata Containers](https://github.com/kata-containers/documentation/blob/master/install/README.md).
### Install containerd with CRI plugin
> **Note:** `cri` is a native plugin of containerd 1.1 and above. It is built into containerd and enabled by default.
> You do not need to install `cri` if you have containerd 1.1 or above. Just remove the `cri` plugin from the list of
> `disabled_plugins` in the containerd configuration file (`/etc/containerd/config.toml`).
Follow the instructions from the [CRI installation guide](http://github.com/containerd/cri/blob/master/docs/installation.md).
Then, check if `containerd` is now available:
```bash
$ command -v containerd
```
### Install CNI plugins
> **Note:** You do not need to install CNI plugins if you do not want to use containerd with Kubernetes.
> If you have installed Kubernetes with `kubeadm`, you might have already installed the CNI plugins.
You can manually install CNI plugins as follows:
```bash
$ go get github.com/containernetworking/plugins
$ pushd $GOPATH/src/github.com/containernetworking/plugins
$ ./build_linux.sh
$ sudo mkdir /opt/cni
$ sudo cp -r bin /opt/cni/
$ popd
```
### Install `cri-tools`
> **Note:** `cri-tools` is a set of tools for CRI used for development and testing. Users who only want
> to use containerd with Kubernetes can skip the `cri-tools`.
You can install the `cri-tools` from source code:
```bash
$ go get github.com/kubernetes-incubator/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
$ make
$ sudo -E make install
$ popd
```
## Configuration
### Configure containerd to use Kata Containers
By default, the configuration of containerd is located at `/etc/containerd/config.toml`, and the
`cri` plugins are placed in the following section:
```toml
[plugins]
[plugins.cri]
[plugins.cri.containerd]
[plugins.cri.containerd.default_runtime]
#runtime_type = "io.containerd.runtime.v1.linux"
[plugins.cri.cni]
# conf_dir is the directory in which the admin places a CNI conf.
conf_dir = "/etc/cni/net.d"
```
The following sections outline how to add Kata Containers to the configurations.
#### Kata Containers as a `RuntimeClass`
For
- Kata Containers v1.5.0 or above (including `1.5.0-rc`)
- Containerd v1.2.0 or above
- Kubernetes v1.12.0 or above
The `RuntimeClass` is suggested.
The following configuration includes three runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/master/runtime/v2#binary-naming))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)).
- `plugins.cri.containerd.runtimes.katacli`: the `containerd-shim-runc-v1` calls `kata-runtime`, which is the legacy process.
```toml
[plugins.cri.containerd]
no_pivot = false
[plugins.cri.containerd.runtimes]
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v1"
[plugins.cri.containerd.runtimes.runc.options]
NoPivotRoot = false
NoNewKeyring = false
ShimCgroup = ""
IoUid = 0
IoGid = 0
BinaryName = "runc"
Root = ""
CriuPath = ""
SystemdCgroup = false
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
[plugins.cri.containerd.runtimes.katacli]
runtime_type = "io.containerd.runc.v1"
[plugins.cri.containerd.runtimes.katacli.options]
NoPivotRoot = false
NoNewKeyring = false
ShimCgroup = ""
IoUid = 0
IoGid = 0
BinaryName = "/usr/bin/kata-runtime"
Root = ""
CriuPath = ""
SystemdCgroup = false
```
From Containerd v1.2.4 and Kata v1.6.0, there is a new runtime option supported, which allows you to specify a specific Kata configuration file as follows:
```toml
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
[plugins.cri.containerd.runtimes.kata.options]
ConfigPath = "/etc/kata-containers/config.toml"
```
This `ConfigPath` option is optional. If you do not specify it, shimv2 first tries to get the configuration file from the environment variable `KATA_CONF_FILE`. If neither are set, shimv2 will use the default Kata configuration file paths (`/etc/kata-containers/configuration.toml` and `/usr/share/defaults/kata-containers/configuration.toml`).
If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0 and also want to specify a configuration file, you can use the following workaround, since the shimv2 accepts an environment variable, `KATA_CONF_FILE` for the configuration file path. Then, you can create a
shell script with the following:
```bash
#!/bin/bash
KATA_CONF_FILE=/etc/kata-containers/firecracker.toml containerd-shim-kata-v2 $@
```
Name it as `/usr/local/bin/containerd-shim-katafc-v2` and reference it in the configuration of containerd:
```toml
[plugins.cri.containerd.runtimes.kata-firecracker]
runtime_type = "io.containerd.katafc.v2"
```
#### Kata Containers as the runtime for untrusted workload
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
for an untrusted workload. With the following configuration, you can run trusted workloads with a runtime such as `runc`
and then, run an untrusted workload with Kata Containers:
```toml
[plugins.cri.containerd]
# "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
[plugins.cri.containerd.default_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
[plugins.cri.containerd.untrusted_workload_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.kata.v2"
```
For the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
# "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
[plugins.cri.containerd.default_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
[plugins.cri.containerd.untrusted_workload_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# runtime_engine is the name of the runtime engine used by containerd.
runtime_engine = "/usr/bin/kata-runtime"
```
You can find more information on the [Containerd config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### Kata Containers as the default runtime
If you want to set Kata Containers as the only runtime in the deployment, you can simply configure as follows:
```toml
[plugins.cri.containerd]
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.kata.v2"
```
Alternatively, for the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/bin/kata-runtime"
```
### Configuration for `cri-tools`
> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.
First, add the CNI configuration in the containerd configuration.
The following is the configuration if you installed CNI as the *[Install CNI plugins](#install-cni-plugins)* section outlined.
Put the CNI configuration as `/etc/cni/net.d/10-mynet.conf`:
```json
{
"cniVersion": "0.2.0",
"name": "mynet",
"type": "bridge",
"bridge": "cni0",
"isGateway": true,
"ipMasq": true,
"ipam": {
"type": "host-local",
"subnet": "172.19.0.0/24",
"routes": [
{ "dst": "0.0.0.0/0" }
]
}
}
```
Next, reference the configuration directory through containerd `config.toml`:
```toml
[plugins.cri.cni]
# conf_dir is the directory in which the admin places a CNI conf.
conf_dir = "/etc/cni/net.d"
```
The configuration file of `crictl` command line tool in `cri-tools` locates at `/etc/crictl.yaml`:
```yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: true
```
## Run
### Launch containers with `ctr` command line
To run a container with Kata Containers through the containerd command line, you can run the following:
```bash
$ sudo ctr image pull docker.io/library/busybox:latest
$ sudo ctr run --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
```
This launches a BusyBox container named `hello`, and it will be removed by `--rm` after it quits.
### Launch Pods with `crictl` command line
With the `crictl` command line of `cri-tools`, you can specify runtime class with `-r` or `--runtime` flag.
Use the following to launch Pod with `kata` runtime class with the pod in [the example](https://github.com/kubernetes-sigs/cri-tools/tree/master/docs/examples)
of `cri-tools`:
```bash
$ sudo crictl runp -r kata podsandbox-config.yaml
36e23521e8f89fabd9044924c9aeb34890c60e85e1748e8daca7e2e673f8653e
```
You can add container to the launched Pod with the following:
```bash
$ sudo crictl create 36e23521e8f89 container-config.yaml podsandbox-config.yaml
1aab7585530e62c446734f12f6899f095ce53422dafcf5a80055ba11b95f2da7
```
Now, start it with the following:
```bash
$ sudo crictl start 1aab7585530e6
1aab7585530e6
```
In Kubernetes, you need to create a `RuntimeClass` resource and add the `RuntimeClass` field in the Pod Spec
(see this [document](https://kubernetes.io/docs/concepts/containers/runtime-class/) for more information).
If `RuntimeClass` is not supported, you can use the following annotation in a Kubernetes pod to identify as an untrusted workload:
```yaml
annotations:
io.kubernetes.cri.untrusted-workload: "true"
```

View File

@@ -0,0 +1,463 @@
# Importing Kata Containers logs with Fluentd
* [Introduction](#introduction)
* [Overview](#overview)
* [Test stack](#test-stack)
* [Importing the logs](#importing-the-logs)
* [Direct import `logfmt` from `systemd`](#direct-import-logfmt-from-systemd)
* [Configuring `minikube`](#configuring-minikube)
* [Pull from `systemd`](#pull-from-systemd)
* [Systemd Summary](#systemd-summary)
* [Directly importing JSON](#directly-importing-json)
* [JSON in files](#json-in-files)
* [Prefixing all keys](#prefixing-all-keys)
* [Kata `shimv2`](#kata-shimv2)
* [Caveats](#caveats)
* [Summary](#summary)
# Introduction
This document describes how to import Kata Containers logs into [Fluentd](https://www.fluentd.org/),
typically for importing into an
Elastic/Fluentd/Kibana([EFK](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch#running-efk-stack-in-production))
or Elastic/Logstash/Kibana([ELK](https://www.elastic.co/elastic-stack)) stack.
The majority of this document focusses on CRI-O based (classic) Kata runtime. Much of that information
also applies to the Kata `shimv2` runtime. Differences pertaining to Kata `shimv2` can be found in their
[own section](#kata-shimv2).
> **Note:** This document does not cover any aspect of "log rotation". It is expected that any production
> stack already has a method in place to control node log growth.
# Overview
Kata generates logs. The logs can come from numerous parts of the Kata stack (the runtime, proxy, shim
and even the agent). By default the logs
[go to the system journal](https://github.com/kata-containers/runtime#logging),
but they can also be configured to be stored in files.
The logs default format is in [`logfmt` structured logging](https://brandur.org/logfmt), but can be switched to
be JSON with a command line option.
Provided below are some examples of Kata log import and processing using
[Fluentd](https://www.fluentd.org/).
## Test stack
Some of the testing we can perform locally, but other times we really need a live stack for testing.
We will use a [`minikube`](https://github.com/kubernetes/minikube/) stack with EFK enabled and Kata
installed to do our tests. Some details such as specific paths and versions of components may need
to be adapted to your specific installation.
The [Kata minikube installation guide](../install/minikube-installation-guide.md) was used to install
`minikube` with Kata Containers enabled.
The minikube EFK stack `addon` is then enabled:
```bash
$ minikube addons enable efk
```
> *Note*: Installing and booting EFK can take a little while - check progress with
> `kubectl get pods -n=kube-system` and wait for all the pods to get to the `Running` state.
## Importing the logs
Kata offers us two choices to make when storing the logs:
- Do we store them to the system log, or to separate files?
- Do we store them in `logfmt` format, or `JSON`?
We will start by examining the Kata default setup (`logfmt` stored in the system log), and then look
at other options.
## Direct import `logfmt` from `systemd`
Fluentd contains both a component that can read the `systemd` system journals, and a component
that can parse `logfmt` entries. We will utilise these in two separate steps to evaluate how well
the Kata logs import to the EFK stack.
### Configuring `minikube`
> **Note:** Setting up, configuration and deployment of `minikube` is not covered in exacting
> detail in this guide. It is presumed the user has the abilities and their own Kubernetes/Fluentd
> stack they are able to utilise in order to modify and test as necessary.
Minikube by default
[configures](https://github.com/kubernetes/minikube/blob/master/deploy/iso/minikube-iso/board/coreos/minikube/rootfs-overlay/etc/systemd/journald.conf)
the `systemd-journald` with the
[`Storage=volatile`](https://www.freedesktop.org/software/systemd/man/journald.conf.html) option,
which results in the journal being stored in `/run/log/journal`. Unfortunately, the Minikube EFK
Fluentd install extracts most of its logs in `/var/log`, and therefore does not mount `/run/log`
into the Fluentd pod by default. This prevents us from reading the system journal by default.
This can be worked around by patching the Minikube EFK `addon` YAML to mount `/run/log` into the
Fluentd container:
```patch
diff --git a/deploy/addons/efk/fluentd-es-rc.yaml.tmpl b/deploy/addons/efk/fluentd-es-rc.yaml.tmpl
index 75e386984..83bea48b9 100644
--- a/deploy/addons/efk/fluentd-es-rc.yaml.tmpl
+++ b/deploy/addons/efk/fluentd-es-rc.yaml.tmpl
@@ -44,6 +44,8 @@ spec:
volumeMounts:
- name: varlog
mountPath: /var/log
+ - name: runlog
+ mountPath: /run/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
@@ -57,6 +59,9 @@ spec:
- name: varlog
hostPath:
path: /var/log
+ - name: runlog
+ hostPath:
+ path: /run/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
```
> **Note:** After making this change you will need to build your own `minikube` to encapsulate
> and use this change, or fine another method to (re-)launch the Fluentd containers for the change
> to take effect.
### Pull from `systemd`
We will start with testing Fluentd pulling the Kata logs directly from the system journal with the
Fluentd [systemd plugin](https://github.com/fluent-plugin-systemd/fluent-plugin-systemd).
We modify the Fluentd config file with the following fragment. For reference, the Minikube
YAML can be found
[on GitHub](https://github.com/kubernetes/minikube/blob/master/deploy/addons/efk/fluentd-es-configmap.yaml.tmpl):
> **Note:** The below Fluentd config fragment is in the "older style" to match the Minikube version of
> Fluentd. If using a more up to date version of Fluentd, you may need to update some parameters, such as
> using `matches` rather than `filters` and placing `@` before `type`. Your Fluentd should warn you in its
> logs if such updates are necessary.
```
<source>
type systemd
tag kata-containers
path /run/log/journal
pos_file /run/log/journal/kata-journald.pos
filters [{"SYSLOG_IDENTIFIER": "kata-runtime"}, {"SYSLOG_IDENTIFIER": "kata-proxy"}, {"SYSLOG_IDENTIFIER": "kata-shim"}]
read_from_head true
</source>
```
We then apply the new YAML, and restart the Fluentd pod (by killing it, and letting the `ReplicationController`
start a new instance, which will pick up the new `ConfigurationMap`):
```bash
$ kubectl apply -f new-fluentd-cm.yaml
$ kubectl delete pod -n=kube-system fluentd-es-XXXXX
```
Now open the Kibana UI to the Minikube EFK `addon`, and launch a Kata QEMU based test pod in order to
generate some Kata specific log entries:
```bash
$ minikube addons open efk
$ cd $GOPATH/src/github.com/kata-containers/packaging/kata-deploy
$ kubectl apply -f examples/nginx-deployment-qemu.yaml
```
Looking at the Kibana UI, we can now see that some `kata-runtime` tagged records have appeared:
![Kata tags in EFK](./images/efk_kata_tag.png)
If we now filter on that tag, we can see just the Kata related entries
![Kata tags in EFK](./images/efk_filter_on_tag.png)
If we expand one of those entries, we can see we have imported useful information. You can then
sub-filter on, for instance, the `SYSLOG_IDENTIFIER` to differentiate the Kata components, and
on the `PRIORITY` to filter out critical issues etc.
Kata generates a significant amount of Kata specific information, which can be seen as
[`logfmt`](https://github.com/kata-containers/tests/tree/master/cmd/log-parser#logfile-requirements).
data contained in the `MESSAGE` field. Imported as-is, there is no easy way to filter on that data
in Kibana:
![Kata tags in EFK](./images/efk_syslog_entry_detail.png).
We can however further sub-parse the Kata entries using the
[Fluentd plugins](https://docs.fluentbit.io/manual/parser/logfmt) that will parse
`logfmt` formatted data. We can utilise these to parse the sub-fields using a Fluentd filter
section. At the same time, we will prefix the new fields with `kata_` to make it clear where
they have come from:
```
<filter kata-containers>
@type parser
key_name MESSAGE
format logfmt
reserve_data true
inject_key_prefix kata_
</filter>
```
The Minikube Fluentd version does not come with the `logfmt` parser installed, so we will run a local
test to check the parsing works. The resulting output from Fluentd is:
```
2020-02-21 10:31:27.810781647 +0000 kata-containers:
{"_BOOT_ID":"590edceeef5545a784ec8c6181a10400",
"_MACHINE_ID":"3dd49df65a1b467bac8d51f2eaa17e92",
"_HOSTNAME":"minikube",
"PRIORITY":"6",
"_UID":"0",
"_GID":"0",
"_SYSTEMD_SLICE":"system.slice",
"_SELINUX_CONTEXT":"kernel",
"_CAP_EFFECTIVE":"3fffffffff",
"_TRANSPORT":"syslog",
"_SYSTEMD_CGROUP":"/system.slice/crio.service",
"_SYSTEMD_UNIT":"crio.service",
"_SYSTEMD_INVOCATION_ID":"f2d99c784e6f406c87742f4bca16a4f6",
"SYSLOG_IDENTIFIER":"kata-runtime",
"_COMM":"kata-runtime",
"_EXE":"/opt/kata/bin/kata-runtime",
"SYSLOG_TIMESTAMP":"Feb 21 10:31:27 ",
"_CMDLINE":"/opt/kata/bin/kata-runtime --kata-config /opt/kata/share/defaults/kata-containers/configuration-qemu.toml --root /run/runc state 7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997",
"SYSLOG_PID":"14314",
"_PID":"14314",
"MESSAGE":"time=\"2020-02-21T10:31:27.810781647Z\" level=info msg=\"release sandbox\" arch=amd64 command=state container=7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997 name=kata-runtime pid=14314 sandbox=1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c source=virtcontainers subsystem=sandbox",
"SYSLOG_RAW":"<6>Feb 21 10:31:27 kata-runtime[14314]: time=\"2020-02-21T10:31:27.810781647Z\" level=info msg=\"release sandbox\" arch=amd64 command=state container=7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997 name=kata-runtime pid=14314 sandbox=1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c source=virtcontainers subsystem=sandbox\n",
"_SOURCE_REALTIME_TIMESTAMP":"1582281087810805",
"kata_level":"info",
"kata_msg":"release sandbox",
"kata_arch":"amd64",
"kata_command":"state",
"kata_container":"7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997",
"kata_name":"kata-runtime",
"kata_pid":14314,
"kata_sandbox":"1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c",
"kata_source":"virtcontainers",
"kata_subsystem":"sandbox"}
```
Here we can see that the `MESSAGE` field has been parsed out and pre-pended into the `kata_*` fields,
which contain usefully filterable fields such as `kata_level`, `kata_command` and `kata_subsystem` etc.
### Systemd Summary
We have managed to configure Fluentd to capture the Kata logs entries from the system
journal, and further managed to then parse out the `logfmt` message into JSON to allow further analysis
inside Elastic/Kibana.
## Directly importing JSON
The underlying basic data format used by Fluentd and Elastic is JSON. If we output JSON
directly from Kata, that should make overall import and processing of the log entries more efficient.
There are potentially two things we can do with Kata here:
- Get Kata to [output its logs in `JSON` format](https://github.com/kata-containers/runtime#logging) rather
than `logfmt`.
- Get Kata to log directly into a file, rather than via the system journal. This would allow us to not need
to parse the systemd format files, and capture the Kata log lines directly. It would also avoid Fluentd
having to potentially parse or skip over many non-Kata related systemd journal that it is not at all
interested in.
In theory we could get Kata to post its messages in JSON format to the systemd journal by adding the
`--log-format=json` option to the Kata runtime, and then swapping the `logfmt` parser for the `json`
parser, but we would still need to parse the systemd files. We will skip this setup in this document, and
go directly to a full Kata specific JSON format logfile test.
### JSON in files
Kata runtime has the ability to generate JSON logs directly, rather than its default `logfmt` format. Passing
the `--log-format=json` argument to the Kata runtime enables this. The easiest way to pass in this extra
parameter from a [Kata deploy](https://github.com/kata-containers/packaging/tree/master/kata-deploy) installation
is to edit the `/opt/kata/bin/kata-qemu` shell script (generated by the
[Kata packaging release scripts](https://github.com/kata-containers/packaging/blob/master/release/kata-deploy-binaries.sh)).
At the same time, we will add the `--log=/var/log/kata-runtime.log` argument to store the Kata logs in their
own file (rather than into the system journal).
```bash
#!/bin/bash
/opt/kata/bin/kata-runtime --kata-config "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml" --log-format=json --log=/var/log/kata-runtime.log $@
```
And then we'll add the Fluentd config section to parse that file. Note, we inform the parser that Kata is
generating timestamps in `iso8601` format. Kata places these timestamps into a field called `time`, which
is the default field the Fluentd parser looks for:
```
<source>
type tail
tag kata-containers
path /var/log/kata-runtime.log
pos_file /var/log/kata-runtime.pos
format json
time_format %iso8601
read_from_head true
</source>
```
This imports the `kata-runtime` logs, with the resulting records looking like:
![Kata tags in EFK](./images/efk_direct_from_json.png)
Something to note here is that we seem to have gained an awful lot of fairly identical looking fields in the
elastic database:
![Kata tags in EFK](./images/efk_direct_json_fields.png)
In reality, they are not all identical, but do come out of one of the Kata log entries - from the
`kill` command. A JSON fragment showing an example is below:
```json
{
...
"EndpointProperties": {
"Iface": {
"Index": 4,
"MTU": 1460,
"TxQLen": 0,
"Name": "eth0",
"HardwareAddr": "ClgKAQAL",
"Flags": 19,
"RawFlags": 69699,
"ParentIndex": 15,
"MasterIndex": 0,
"Namespace": null,
"Alias": "",
"Statistics": {
"RxPackets": 1,
"TxPackets": 5,
"RxBytes": 42,
"TxBytes": 426,
"RxErrors": 0,
"TxErrors": 0,
"RxDropped": 0,
"TxDropped": 0,
"Multicast": 0,
"Collisions": 0,
"RxLengthErrors": 0,
"RxOverErrors": 0,
"RxCrcErrors": 0,
"RxFrameErrors": 0,
"RxFifoErrors": 0,
"RxMissedErrors": 0,
"TxAbortedErrors": 0,
"TxCarrierErrors": 0,
"TxFifoErrors": 0,
"TxHeartbeatErrors": 0,
"TxWindowErrors": 0,
"RxCompressed": 0,
"TxCompressed": 0
...
```
If these new fields are not required, then a Fluentd
[`record_transformer` filter](https://docs.fluentd.org/filter/record_transformer#remove_keys)
could be used to delete them before they are injected into Elastic.
#### Prefixing all keys
It may be noted above that all the fields are imported with their base native name, such as
`arch` and `level`. It may be better for data storage and processing if all the fields were
identifiable as having come from Kata, and avoid namespace clashes with other imports.
This can be achieved by prefixing all the keys with, say, `kata_`. It appears `fluend` cannot
do this directly in the input or match phases, but can in the filter/parse phase (as was done
when processing `logfmt` data for instance). To achieve this, we can first input the Kata
JSON data as a single line, and then add the prefix using a JSON filter section:
```
# Pull in as a single line...
<source>
@type tail
path /var/log/kata-runtime.log
pos_file /var/log/kata-runtime.pos
read_from_head true
tag kata-runtime
<parse>
@type none
</parse>
</source>
<filter kata-runtime>
@type parser
key_name message
# drop the original single line `message` entry
reserve_data false
inject_key_prefix kata_
<parse>
@type json
</parse>
</filter>
```
# Kata `shimv2`
When using the Kata `shimv2` runtime with `containerd`, as described in this
[how-to guide](containerd-kata.md#containerd-runtime-v2-api-shim-v2-api), the Kata logs are routed
differently, and some adjustments to the above methods will be necessary to filter them in Fluentd.
The Kata `shimv2` logs are different in two primary ways:
- The Kata logs are directed via `containerd`, and will be captured along with the `containerd` logs,
such as on the containerd stdout or in the system journal.
- In parallel, Kata `shimv2` places its logs into the system journal under the systemd name of `kata`.
Below is an example Fluentd configuration fragment showing one possible method of extracting and separating
the `containerd` and Kata logs from the system journal by filtering on the Kata `SYSLOG_IDENTIFIER` field,
using the [Fluentd v0.12 rewrite_tag_filter](https://docs.fluentd.org/v/0.12/output/rewrite_tag_filter):
```yaml
<source>
type systemd
path /path/to/journal
# capture the containerd logs
filters [{ "_SYSTEMD_UNIT": "containerd.service" }]
pos_file /tmp/systemd-containerd.pos
read_from_head true
# tag those temporarily, as we will filter them and rewrite the tags
tag containerd_tmp_tag
</source>
# filter out and split between kata entries and containerd entries
<match containerd_tmp_tag>
@type rewrite_tag_filter
# Tag Kata entries
<rule>
key SYSLOG_IDENTIFIER
pattern kata
tag kata_tag
</rule>
# Anything that was not matched so far, tag as containerd
<rule>
key MESSAGE
pattern /.+/
tag containerd_tag
</rule>
</match>
```
# Caveats
> **Warning:** You should be aware of the following caveats, which may disrupt or change what and how
> you capture and process the Kata Containers logs.
The following caveats should be noted:
- There is a [known issue](https://github.com/kata-containers/runtime/issues/985) whereby enabling
full debug in Kata, particularly enabling agent kernel log messages, can result in corrupt log lines
being generated by Kata (due to overlapping multiple output streams).
- Presently only the `kata-runtime` can generate JSON logs, and direct them to files. Other components
such as the `proxy` and `shim` can only presently report to the system journal. Hopefully these
components will be extended with extra functionality.
# Summary
We have shown how native Kata logs using the systemd journal and `logfmt` data can be import, and also
how Kata can be instructed to generate JSON logs directly, and import those into Fluentd.
We have detailed a few known caveats, and leave it to the implementer to choose the best method for their
system.

View File

@@ -0,0 +1,106 @@
# Loading kernel modules
A new feature for loading kernel modules was introduced in Kata Containers 1.9.
The list of kernel modules and their parameters can be provided using the
configuration file or OCI annotations. The [Kata runtime][1] gives that
information to the [Kata Agent][2] through gRPC when the sandbox is created.
The [Kata Agent][2] will insert the kernel modules using `modprobe(8)`, hence
modules dependencies are resolved automatically.
The sandbox will not be started when:
* A kernel module is specified and the `modprobe(8)` command is not installed in
the guest or it fails loading the module.
* The module is not available in the guest or it doesn't meet the guest kernel
requirements, like architecture and version.
In the following sections are documented the different ways that exist for
loading kernel modules in Kata Containers.
- [Using Kata Configuration file](#using-kata-configuration-file)
- [Using annotations](#using-annotations)
# Using Kata Configuration file
```
NOTE: Use this method, only if you need to pass the kernel modules to all
containers. Please use annotations described below to set per pod annotations.
```
The list of kernel modules and parameters can be set in the `kernel_modules`
option as a coma separated list, where each entry in the list specifies a kernel
module and its parameters. Each list element comprises one or more space separated
fields. The first field specifies the module name and subsequent fields specify
individual parameters for the module.
The following example specifies two modules to load: `e1000e` and `i915`. Two parameters
are specified for the `e1000` module: `InterruptThrottleRate` (which takes an array
of integer values) and `EEE` (which requires a single integer value).
```toml
kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915"]
```
Not all the container managers allow users provide custom annotations, hence
this is the only way that Kata Containers provide for loading modules when
custom annotations are not supported.
There are some limitations with this approach:
* Write access to the Kata configuration file is required.
* The configuration file must be updated when a new container is created,
otherwise the same list of modules is used, even if they are not needed in the
container.
# Using annotations
As was mentioned above, not all containers need the same modules, therefore using
the configuration file for specifying the list of kernel modules per [POD][3] can
be a pain. Unlike the configuration file, annotations provide a way to specify
custom configurations per POD.
The list of kernel modules and parameters can be set using the annotation
`io.katacontainers.config.agent.kernel_modules` as a semicolon separated
list, where the first word of each element is considered as the module name and
the rest as its parameters.
In the following example two PODs are created, but the kernel modules `e1000e`
and `i915` are inserted only in the POD `pod1`.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: pod1
annotations:
io.katacontainers.config.agent.kernel_modules: "e1000e EEE=1; i915"
spec:
runtimeClassName: kata
containers:
- name: c1
image: busybox
command:
- sh
stdin: true
tty: true
---
apiVersion: v1
kind: Pod
metadata:
name: pod2
spec:
runtimeClassName: kata
containers:
- name: c2
image: busybox
command:
- sh
stdin: true
tty: true
```
[1]: https://github.com/kata-containers/runtime
[2]: https://github.com/kata-containers/agent
[3]: https://kubernetes.io/docs/concepts/workloads/pods/pod/

View File

@@ -0,0 +1,160 @@
# Per-Pod Kata Configurations
Kata Containers gives users freedom to customize at per-pod level, by setting
a wide range of Kata specific annotations in the pod specification.
# Kata Configuration Annotations
There are several kinds of Kata configurations and they are listed below.
## Global Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config_path` | string | Kata config file location that overrides the default config paths |
| `io.katacontainers.pkg.oci.bundle_path` | string | OCI bundle path |
| `io.katacontainers.pkg.oci.container_type`| string | OCI container type. Only accepts `pod_container` and `pod_sandbox` |
## Runtime Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.runtime.experimental` | `boolean` | determines if experimental features enabled |
| `io.katacontainers.config.runtime.disable_guest_seccomp`| `boolean` | determines if `seccomp` should be applied inside guest |
| `io.katacontainers.config.runtime.disable_new_netns` | `boolean` | determines if a new netns is created for the hypervisor process |
| `io.katacontainers.config.runtime.internetworking_model` | string| determines how the VM should be connected to the container network interface. Valid values are `macvtap`, `tcfilter` and `none` |
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in sandbox cgroup |
## Agent Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |
## Hypervisor Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.hypervisor.asset_hash_type` | string | the hash type used for assets verification, default is `sha512` |
| `io.katacontainers.config.hypervisor.block_device_cache_direct` | `boolean` | Denotes whether use of `O_DIRECT` (bypass the host page cache) is enabled |
| `io.katacontainers.config.hypervisor.block_device_cache_noflush` | `boolean` | Denotes whether flush requests for the device are ignored |
| `io.katacontainers.config.hypervisor.block_device_cache_set` | `boolean` | cache-related options will be set to block devices or not |
| `io.katacontainers.config.hypervisor.block_device_driver` | string | the driver to be used for block device, valid values are `virtio-blk`, `virtio-scsi`, `nvdimm`|
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | uint32| the default vCPUs assigned for a VM by the hypervisor |
| `io.katacontainers.config.hypervisor.disable_block_device_use` | `boolean` | disallow a block device from being used |
| `io.katacontainers.config.hypervisor.disable_vhost_net` | `boolean` | specify if `vhost-net` is not available on the host |
| `io.katacontainers.config.hypervisor.enable_hugepages` | `boolean` | if the memory should be `pre-allocated` from huge pages |
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_swap` | `boolean` | enable swap of VM memory |
| `io.katacontainers.config.hypervisor.entropy_source` | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` | string | file based memory backend root directory |
| `io.katacontainers.config.hypervisor.firmware_hash` | string | container firmware SHA-512 hash value |
| `io.katacontainers.config.hypervisor.firmware` | string | the guest firmware that will run the container VM |
| `io.katacontainers.config.hypervisor.guest_hook_path` | string | the path within the VM that will be used for drop in hooks |
| `io.katacontainers.config.hypervisor.hotplug_vfio_on_root_bus` | `boolean` | indicate if devices need to be hotplugged on the root bus instead of a bridge|
| `io.katacontainers.config.hypervisor.hypervisor_hash` | string | container hypervisor binary SHA-512 hash value |
| `io.katacontainers.config.hypervisor.image_hash` | string | container guest image SHA-512 hash value |
| `io.katacontainers.config.hypervisor.image` | string | the guest image that will run in the container VM |
| `io.katacontainers.config.hypervisor.initrd_hash` | string | container guest initrd SHA-512 hash value |
| `io.katacontainers.config.hypervisor.initrd` | string | the guest initrd image that will run in the container VM |
| `io.katacontainers.config.hypervisor.jailer_hash` | string | container jailer SHA-512 hash value |
| `io.katacontainers.config.hypervisor.jailer_path` | string | the jailer that will constrain the container VM |
| `io.katacontainers.config.hypervisor.kernel_hash` | string | container kernel image SHA-512 hash value |
| `io.katacontainers.config.hypervisor.kernel_params` | string | additional guest kernel parameters |
| `io.katacontainers.config.hypervisor.kernel` | string | the kernel used to boot the container VM |
| `io.katacontainers.config.hypervisor.machine_accelerators` | string | machine specific accelerators for the hypervisor |
| `io.katacontainers.config.hypervisor.machine_type` | string | the type of machine being emulated by the hypervisor |
| `io.katacontainers.config.hypervisor.memory_offset` | uint32| the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.memory_slots` | uint32| the memory slots assigned to the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.msize_9p` | uint32 | the `msize` for 9p shares |
| `io.katacontainers.config.hypervisor.path` | string | the hypervisor that will run the container VM |
| `io.katacontainers.config.hypervisor.shared_fs` | string | the shared file system type, either `virtio-9p` or `virtio-fs` |
| `io.katacontainers.config.hypervisor.use_vsock` | `boolean` | specify use of `vsock` for agent communication |
| `io.katacontainers.config.hypervisor.virtio_fs_cache_size` | uint32 | virtio-fs DAX cache size in `MiB` |
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
# CRI Configuration
In case of CRI-O, all annotations specified in the pod spec are passed down to Kata.
For containerd, annotations specified in the pod spec are passed down to Kata
starting with version `1.3.0`. Additionally, extra configuration is needed for containerd,
by providing a `pod_annotations` field in the containerd config file. The `pod_annotations`
field is a list of annotations that can be passed down to Kata as OCI annotations.
It supports golang match patterns. Since annotations supported by Kata follow the pattern
`io.katacontainers.*`, the following configuration would work for passing annotations to
Kata from containerd:
```
$ cat /etc/containerd/config
....
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.runc.v1"
pod_annotations = ["io.katacontainers.*"]
[plugins.cri.containerd.runtimes.kata.options]
BinaryName = "/usr/bin/kata-runtime"
....
```
Additional documentation on the above configuration can be found in the
[containerd docs](https://github.com/containerd/cri/blob/8d5a8355d07783ba2f8f451209f6bdcc7c412346/docs/config.md).
# Example - Using annotations
As mentioned above, not all containers need the same modules, therefore using
the configuration file for specifying the list of kernel modules per POD can
be a pain. Unlike the configuration file, annotations provide a way to specify
custom configurations per POD.
The list of kernel modules and parameters can be set using the annotation
`io.katacontainers.config.agent.kernel_modules` as a semicolon separated
list, where the first word of each element is considered as the module name and
the rest as its parameters.
Also users might want to enable guest `seccomp` to provide better isolation with a
little performance sacrifice. The annotation
`io.katacontainers.config.runtime.disable_guest_seccomp` can used for such purpose.
In the following example two PODs are created, but the kernel modules `e1000e`
and `i915` are inserted only in the POD `pod1`. Also guest `seccomp` is only enabled
in the POD `pod2`.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: pod1
annotations:
io.katacontainers.config.agent.kernel_modules: "e1000e EEE=1; i915"
spec:
runtimeClassName: kata
containers:
- name: c1
image: busybox
command:
- sh
stdin: true
tty: true
---
apiVersion: v1
kind: Pod
metadata:
name: pod2
annotations:
io.katacontainers.config.runtime.disable_guest_seccomp: false
spec:
runtimeClassName: kata
containers:
- name: c2
image: busybox
command:
- sh
stdin: true
tty: true
```

View File

@@ -0,0 +1,220 @@
# How to use Kata Containers and CRI (containerd plugin) with Kubernetes
* [Requirements](#requirements)
* [Install and configure containerd](#install-and-configure-containerd)
* [Install and configure Kubernetes](#install-and-configure-kubernetes)
* [Install Kubernetes](#install-kubernetes)
* [Configure Kubelet to use containerd](#configure-kubelet-to-use-containerd)
* [Configure HTTP proxy - OPTIONAL](#configure-http-proxy---optional)
* [Start Kubernetes](#start-kubernetes)
* [Install a Pod Network](#install-a-pod-network)
* [Allow pods to run in the master node](#allow-pods-to-run-in-the-master-node)
* [Create an untrusted pod using Kata Containers](#create-an-untrusted-pod-using-kata-containers)
* [Delete created pod](#delete-created-pod)
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
The Kubernetes cluster will use the
[CRI containerd plugin](https://github.com/containerd/cri) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
For Kata Containers 1.5.0-rc2 and above, we will use `containerd-shim-kata-v2` (short as `shimv2` in this documentation)
to launch Kata Containers. For the previous version of Kata Containers, the Pods are launched with `kata-runtime`.
## Requirements
- Kubernetes, Kubelet, `kubeadm`
- containerd with `cri` plug-in
- Kata Containers
> **Note:** For information about the supported versions of these components,
> see the Kata Containers
> [`versions.yaml`](https://github.com/kata-containers/runtime/blob/master/versions.yaml)
> file.
## Install and configure containerd
First, follow the [How to use Kata Containers and Containerd](containerd-kata.md) to install and configure containerd.
Then, make sure the containerd works with the [examples in it](containerd-kata.md#run).
## Install and configure Kubernetes
### Install Kubernetes
- Follow the instructions for
[`kubeadm` installation](https://kubernetes.io/docs/setup/independent/install-kubeadm/).
- Check `kubeadm` is now available
```bash
$ command -v kubeadm
```
### Configure Kubelet to use containerd
In order to allow Kubelet to use containerd (using the CRI interface), configure the service to point to the `containerd` socket.
- Configure Kubernetes to use `containerd`
```bash
$ sudo mkdir -p /etc/systemd/system/kubelet.service.d/
$ cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/0-containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
EOF
```
- Inform systemd about the new configuration
```bash
$ sudo systemctl daemon-reload
```
### Configure HTTP proxy - OPTIONAL
If you are behind a proxy, use the following script to configure your proxy for docker, Kubelet, and containerd:
```bash
$ services="
kubelet
containerd
docker
"
$ for service in ${services}; do
service_dir="/etc/systemd/system/${service}.service.d/"
sudo mkdir -p ${service_dir}
cat << EOT | sudo tee "${service_dir}/proxy.conf"
[Service]
Environment="HTTP_PROXY=${http_proxy}"
Environment="HTTPS_PROXY=${https_proxy}"
Environment="NO_PROXY=${no_proxy}"
EOT
done
$ sudo systemctl daemon-reload
```
## Start Kubernetes
- Make sure `containerd` is up and running
```bash
$ sudo systemctl restart containerd
$ sudo systemctl status containerd
```
- Prevent conflicts between `docker` iptables (packet filtering) rules and k8s pod communication
If Docker is installed on the node, it is necessary to modify the rule
below. See https://github.com/kubernetes/kubernetes/issues/40182 for further
details.
```bash
$ sudo iptables -P FORWARD ACCEPT
```
- Start cluster using `kubeadm`
```bash
$ sudo kubeadm init --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
$ export KUBECONFIG=/etc/kubernetes/admin.conf
$ sudo -E kubectl get nodes
$ sudo -E kubectl get pods
```
## Install a Pod Network
A pod network plugin is needed to allow pods to communicate with each other.
- Install the `flannel` plugin by following the
[Using `kubeadm` to Create a Cluster](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#instructions)
guide, starting from the **Installing a pod network** section.
- Create a pod network using flannel
> **Note:** There is no known way to determine programmatically the best version (commit) to use.
> See https://github.com/coreos/flannel/issues/995.
```bash
$ sudo -E kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
- Wait for the pod network to become available
```bash
# number of seconds to wait for pod network to become available
$ timeout_dns=420
$ while [ "$timeout_dns" -gt 0 ]; do
if sudo -E kubectl get pods --all-namespaces | grep dns | grep Running; then
break
fi
sleep 1s
((timeout_dns--))
done
```
- Check the pod network is running
```bash
$ sudo -E kubectl get pods --all-namespaces | grep dns | grep Running && echo "OK" || ( echo "FAIL" && false )
```
## Allow pods to run in the master node
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
```
## Create an untrusted pod using Kata Containers
By default, all pods are created with the default runtime configured in CRI containerd plugin.
If a pod has the `io.kubernetes.cri.untrusted-workload` annotation set to `"true"`, the CRI plugin runs the pod with the
[Kata Containers runtime](https://github.com/kata-containers/runtime/blob/master/README.md).
- Create an untrusted pod configuration
```bash
$ cat << EOT | tee nginx-untrusted.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-untrusted
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
containers:
- name: nginx
image: nginx
EOT
```
- Create an untrusted pod
```bash
$ sudo -E kubectl apply -f nginx-untrusted.yaml
```
- Check pod is running
```bash
$ sudo -E kubectl get pods
```
- Check hypervisor is running
```bash
$ ps aux | grep qemu
```
## Delete created pod
```bash
$ sudo -E kubectl delete -f nginx-untrusted.yaml
```

View File

@@ -0,0 +1,130 @@
# Kata Containers with ACRN
This document provides an overview on how to run Kata containers with ACRN hypervisor and device model.
- [Introduction](#introduction)
- [Pre-requisites](#pre-requisites)
- [Configure Docker](#configure-docker)
- [Configure Kata Containers with ACRN](#configure-kata-containers-with-acrn)
## Introduction
ACRN is a flexible, lightweight Type-1 reference hypervisor built with real-time and safety-criticality in mind. ACRN uses an open source platform making it optimized to streamline embedded development.
Some of the key features being:
- Small footprint - Approx. 25K lines of code (LOC).
- Real Time - Low latency, faster boot time, improves overall responsiveness with hardware.
- Adaptability - Multi-OS support for guest operating systems like Linux, Android, RTOSes.
- Rich I/O mediators - Allows sharing of various I/O devices across VMs.
- Optimized for a variety of IoT (Internet of Things) and embedded device solutions.
Please refer to ACRN [documentation](https://projectacrn.github.io/latest/index.html) for more details on ACRN hypervisor and device model.
## Pre-requisites
This document requires the presence of the ACRN hypervisor and Kata Containers on your system. Install using the instructions available through the following links:
- ACRN supported [Hardware](https://projectacrn.github.io/latest/hardware.html#supported-hardware).
> **Note:** Please make sure to have a minimum of 4 logical processors (HT) or cores.
- ACRN [software](https://projectacrn.github.io/latest/tutorials/kbl-nuc-sdc.html#use-the-script-to-set-up-acrn-automatically) setup.
- For networking, ACRN supports either MACVTAP or TAP. If MACVTAP is not enabled in the Service OS, please follow the below steps to update the kernel:
```sh
$ git clone https://github.com/projectacrn/acrn-kernel.git
$ cd acrn-kernel
$ cp kernel_config_sos .config
$ sed -i "s/# CONFIG_MACVLAN is not set/CONFIG_MACVLAN=y/" .config
$ sed -i '$ i CONFIG_MACVTAP=y' .config
$ make clean && make olddefconfig && make && sudo make modules_install INSTALL_MOD_PATH=out/
```
Login into Service OS and update the kernel with MACVTAP support:
```sh
$ sudo mount /dev/sda1 /mnt
$ sudo scp -r <user name>@<host address>:<your workspace>/acrn-kernel/arch/x86/boot/bzImage /mnt/EFI/org.clearlinux/
$ sudo scp -r <user name>@<host address>:<your workspace>/acrn-kernel/out/lib/modules/* /lib/modules/
$ conf_file=$(sed -n '$ s/default //p' /mnt/loader/loader.conf).conf
$ kernel_img=$(sed -n 2p /mnt/loader/entries/$conf_file | cut -d'/' -f4)
$ sudo sed -i "s/$kernel_img/bzImage/g" /mnt/loader/entries/$conf_file
$ sync && sudo umount /mnt && sudo reboot
```
- Kata Containers installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md) steps.
> **Note:** Create rootfs image and not initrd image.
In order to run Kata with ACRN, your container stack must provide block-based storage, such as device-mapper.
> **Note:** Currently, by design you can only launch one VM from Kata Containers using ACRN hypervisor (SDC scenario). Based on feedback from community we can increase number of VMs.
## Configure Docker
To configure Docker for device-mapper and Kata,
1. Stop Docker daemon if it is already running.
```bash
$ sudo systemctl stop docker
```
2. Set `/etc/docker/daemon.json` with the following contents.
```
{
"storage-driver": "devicemapper"
}
```
3. Restart docker.
```bash
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```
4. Configure [Docker](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#update-the-docker-systemd-unit-file) to use `kata-runtime`.
## Configure Kata Containers with ACRN
To configure Kata Containers with ACRN, copy the generated `configuration-acrn.toml` file when building the `kata-runtime` to either `/etc/kata-containers/configuration.toml` or `/usr/share/defaults/kata-containers/configuration.toml`.
The following command shows full paths to the `configuration.toml` files that the runtime loads. It will use the first path that exists. (Please make sure the kernel and image paths are set correctly in the `configuration.toml` file)
```bash
$ sudo kata-runtime --kata-show-default-config-paths
```
>**Warning:** Please offline CPUs using [this](offline_cpu.sh) script, else VM launches will fail.
```bash
$ sudo ./offline_cpu.sh
```
Start an ACRN based Kata Container,
```bash
$ sudo docker run -ti --runtime=kata-runtime busybox sh
```
You will see ACRN(`acrn-dm`) is now running on your system, as well as a `kata-shim`, `kata-proxy`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
```bash
$ ps -ef | grep -E "kata|acrn"
```
Validate ACRN hypervisor by using `kata-runtime kata-env`,
```sh
$ kata-runtime kata-env | awk -v RS= '/\[Hypervisor\]/'
[Hypervisor]
MachineType = ""
Version = "DM version is: 1.2-unstable-254577a6-dirty (daily tag:acrn-2019w27.4-140000p)
Path = "/usr/bin/acrn-dm"
BlockDeviceDriver = "virtio-blk"
EntropySource = "/dev/urandom"
Msize9p = 0
MemorySlots = 10
Debug = false
UseVSock = false
SharedFS = ""
```

View File

@@ -0,0 +1,115 @@
# Kata Containers with NEMU
* [Introduction](#introduction)
* [Pre-requisites](#pre-requisites)
* [NEMU](#nemu)
* [Download and build](#download-and-build)
* [x86_64](#x86_64)
* [aarch64](#aarch64)
* [Configure Kata Containers](#configure-kata-containers)
Kata Containers relies by default on the QEMU hypervisor in order to spawn the virtual machines running containers. [NEMU](https://github.com/intel/nemu) is a fork of QEMU that:
- Reduces the number of lines of code.
- Removes all legacy devices.
- Reduces the emulation as far as possible.
## Introduction
This document describes how to run Kata Containers with NEMU, first by explaining how to download, build and install it. Then it walks through the steps needed to update your Kata Containers configuration in order to run with NEMU.
## Pre-requisites
This document requires Kata Containers to be [installed](https://github.com/kata-containers/documentation/blob/master/install/README.md) on your system.
Also, it's worth noting that NEMU only supports `x86_64` and `aarch64` architecture.
## NEMU
### Download and build
```bash
$ git clone https://github.com/intel/nemu.git
$ cd nemu
$ git fetch origin
$ git checkout origin/experiment/automatic-removal
```
#### x86_64
```
$ SRCDIR=$PWD ./tools/build_x86_64_virt.sh
```
#### aarch64
```
$ SRCDIR=$PWD ./tools/build_aarch64.sh
```
> **Note:** The branch `experiment/automatic-removal` is a branch published by Jenkins after it has applied the automatic removal script to the `topic/virt-x86` branch. The purpose of this code removal being to reduce the source tree by removing files not being used by NEMU.
After those commands have successfully returned, you will find the NEMU binary at `$HOME/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt` (__x86__), or `$HOME/build-aarch64/aarch64-softmmu/qemu-system-aarch64` (__ARM__).
You also need the `OVMF` firmware in order to boot the virtual machine's kernel. It can currently be found at this [location](https://github.com/intel/ovmf-virt/releases).
```bash
$ sudo mkdir -p /usr/share/nemu
$ OVMF_URL=$(curl -sL https://api.github.com/repos/intel/ovmf-virt/releases/latest | jq -S '.assets[0].browser_download_url')
$ curl -o OVMF.fd -L $(sed -e 's/^"//' -e 's/"$//' <<<"$OVMF_URL")
$ sudo install -o root -g root -m 0640 OVMF.fd /usr/share/nemu/
```
> **Note:** The OVMF firmware will be located at this temporary location until the changes can be pushed upstream.
## Configure Kata Containers
All you need from this section is to modify the configuration file `/usr/share/defaults/kata-containers/configuration.toml` to specify the options related to the hypervisor.
```diff
[hypervisor.qemu]
-path = "/usr/bin/qemu-lite-system-x86_64"
+path = "/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt"
kernel = "/usr/share/kata-containers/vmlinuz.container"
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
image = "/usr/share/kata-containers/kata-containers.img"
-machine_type = "pc"
+machine_type = "virt"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
@@ -31,7 +31,7 @@
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
-firmware = ""
+firmware = "/usr/share/nemu/OVMF.fd"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
```
As you can see from this snippet above, all you need to change is:
- The path to the hypervisor binary, `/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt` in this example.
- The machine type from `pc` to `virt`.
- The path to the firmware binary, `/usr/share/nemu/OVMF.fd` in this example.
Once you have saved those modifications, you can start a new container:
```bash
$ docker run --runtime=kata-runtime -it busybox
```
And you will be able to verify this new container is running with the NEMU hypervisor by looking for the hypervisor path and the machine type from the `qemu` process running on your system:
```bash
$ ps -aux | grep qemu
root ... /home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt
... -machine virt,accel=kvm,kernel_irqchip,nvdimm ...
```
Also relying on `kata-runtime kata-env` is a reliable way to validate you are using the expected hypervisor:
```bash
$ kata-runtime kata-env | awk -v RS= '/\[Hypervisor\]/'
[Hypervisor]
MachineType = "virt"
Version = "NEMU (like QEMU) version 3.0.0 (v3.0.0-179-gaf9a791)\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
Path = "/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt"
BlockDeviceDriver = "virtio-scsi"
EntropySource = "/dev/urandom"
Msize9p = 8192
MemorySlots = 10
Debug = true
UseVSock = false
```

View File

@@ -0,0 +1,143 @@
# Setting Sysctls with Kata
## Sysctls
In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
process file system.
The parameters include the following subsystems among others:
- `fs` (file systems)
- `kernel` (kernel)
- `net` (networking)
- `vm` (virtual memory)
To get a complete list of kernel parameters, run:
```
$ sudo sysctl -a
```
Both Docker and Kubernetes provide mechanisms for setting namespaced sysctls.
Namespaced sysctls can be set per pod in the case of Kubernetes or per container
in case of Docker.
The following sysctls are known to be namespaced and can be set with
Docker and Kubernetes:
- `kernel.shm*`
- `kernel.msg*`
- `kernel.sem`
- `fs.mqueue.*`
- `net.*`
### Namespaced Sysctls:
Kata Containers supports setting namespaced sysctls with Docker and Kubernetes.
All namespaced sysctls can be set in the same way as regular Linux based
containers, the difference being, in the case of Kata they are set inside the guest.
#### Setting Namespaced Sysctls with Docker:
```
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/fs/mqueue/queues_max
256
$ sudo docker run --runtime=kata-runtime --sysctl fs.mqueue.queues_max=512 -it alpine cat /proc/sys/fs/mqueue/queues_max
512
```
... and:
```
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/kernel/shmmax
18446744073692774399
$ sudo docker run --runtime=kata-runtime --sysctl kernel.shmmax=1024 -it alpine cat /proc/sys/kernel/shmmax
1024
```
For additional documentation on setting sysctls with Docker please refer to [Docker-sysctl-doc](https://docs.docker.com/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime).
#### Setting Namespaced Sysctls with Kubernetes:
Kubernetes considers certain sysctls as safe and others as unsafe. For detailed
information about what sysctls are considered unsafe, please refer to the [Kubernetes sysctl docs](https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/).
For using unsafe sysctls, the cluster admin would need to allow these as:
```
$ kubelet --allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
```
or using the declarative approach as:
```
$ cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
allowed-unsafe-sysctls: "kernel.msg*,kernel.shm.*,net.*"
...
```
The above YAML can then be passed to `kubeadm init` as:
```
$ sudo -E kubeadm init --config=kubeadm.yaml
```
Both safe and unsafe sysctls can be enabled in the same way in the Pod YAML:
```
apiVersion: v1
kind: Pod
metadata:
name: sysctl-example
spec:
securityContext:
sysctls:
- name: kernel.shm_rmid_forced
value: "0"
- name: net.ipv4.route.min_pmtu
value: "1024"
```
### Non-Namespaced Sysctls:
Docker and Kubernetes disallow sysctls without a namespace.
The recommendation is to set them directly on the host or use a privileged
container in the case of Kubernetes.
In the case of Kata, the approach of setting sysctls on the host does not
work since the host sysctls have no effect on a Kata Container running
inside a guest. Kata gives you the ability to set non-namespaced sysctls using a privileged container.
This has the advantage that the non-namespaced sysctls are set inside the guest
without having any effect on the `/proc/sys` values of any other pod or the
host itself.
The recommended approach to do this would be to set the sysctl value in a
privileged init container. In this way, the application containers do not need any elevated
privileges.
```
apiVersion: v1
kind: Pod
metadata:
name: busybox-kata
spec:
runtimeClassName: kata-qemu
securityContext:
sysctls:
- name: kernel.shm_rmid_forced
value: "0"
containers:
- name: busybox-container
securityContext:
privileged: true
image: debian
command:
- sleep
- "3000"
initContainers:
- name: init-sys
securityContext:
privileged: true
image: busybox
command: ['sh', '-c', 'echo "64000" > /proc/sys/vm/max_map_count']
```

View File

@@ -0,0 +1,51 @@
# Kata Containers with virtio-fs
- [Introduction](#introduction)
- [Pre-requisites](#pre-requisites)
- [Install Kata Containers with virtio-fs support](#install-kata-containers-with-virtio-fs-support)
- [Run a Kata Container utilizing virtio-fs](#run-a-kata-container-utilizing-virtio-fs)
## Introduction
Container deployments utilize explicit or implicit file sharing between host filesystem and containers. From a trust perspective, avoiding a shared file-system between the trusted host and untrusted container is recommended. This is not always feasible. In Kata Containers, block-based volumes are preferred as they allow usage of either device pass through or `virtio-blk` for access within the virtual machine.
As of the 1.7 release of Kata Containers, [9pfs](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) is the default filesystem sharing mechanism. While this does allow for workload compatibility, it does so with degraded performance and potential for POSIX compliance limitations.
To help address these limitations, [virtio-fs](https://virtio-fs.gitlab.io/) has been developed. virtio-fs is a shared file system that lets virtual machines access a directory tree on the host. In Kata Containers, virtio-fs can be used to share container volumes, secrets, config-maps, configuration files (hostname, hosts, `resolv.conf`) and the container rootfs on the host with the guest. virtio-fs provides significant performance and POSIX compliance improvements compared to 9pfs.
Enabling of virtio-fs requires changes in the guest kernel as well as the VMM. For Kata Containers, experimental virtio-fs support is enabled through the [NEMU VMM](https://github.com/intel/nemu).
**Note: virtio-fs support is experimental in the 1.7 release of Kata Containers. Work is underway to improve stability, performance and upstream integration. This is available for early preview - use at your own risk**
This document describes how to get Kata Containers to work with virtio-fs.
## Pre-requisites
* Before Kata 1.8 this feature required the host to have hugepages support enabled. Enable this with the `sysctl vm.nr_hugepages=1024` command on the host.
## Install Kata Containers with virtio-fs support
The Kata Containers NEMU configuration, the NEMU VMM and the `virtiofs` daemon are available in the [Kata Container release](https://github.com/kata-containers/runtime/releases) artifacts starting with the 1.7 release. While the feature is experimental, distribution packages are not supported, but installation is available through [`kata-deploy`](https://github.com/kata-containers/packaging/tree/master/kata-deploy).
Install the latest release of Kata as follows:
```
docker run --runtime=runc -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install
```
This will place the Kata release artifacts in `/opt/kata`, and update Docker's configuration to include a runtime target, `kata-nemu`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/packaging/tree/master/kata-deploy#kubernetes-quick-start).
## Run a Kata Container utilizing virtio-fs
Once installed, start a new container, utilizing NEMU + `virtiofs`:
```bash
$ docker run --runtime=kata-nemu -it busybox
```
Verify the new container is running with the NEMU hypervisor as well as using `virtiofsd`. To do this look for the hypervisor path and the `virtiofs` daemon process on the host:
```bash
$ ps -aux | grep virtiofs
root ... /home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt
... -machine virt,accel=kvm,kernel_irqchip,nvdimm ...
root ... /home/foo/build-x86_64_virt/virtiofsd-x86_64 ...
```

View File

@@ -0,0 +1,53 @@
# Kata Containers with `virtio-mem`
- [Introduction](#introduction)
- [Requisites](#requisites)
- [Run a Kata Container utilizing `virtio-mem`](#run-a-kata-container-utilizing-virtio-mem)
## Introduction
The basic idea of `virtio-mem` is to provide a flexible, cross-architecture memory hot plug and hot unplug solution that avoids many limitations imposed by existing technologies, architectures, and interfaces.
More details can be found in https://lkml.org/lkml/2019/12/12/681.
Kata Containers with `virtio-mem` supports memory resize.
## Requisites
Kata Containers with `virtio-mem` requires Linux and the QEMU that support `virtio-mem`.
The Linux kernel and QEMU upstream version still not support `virtio-mem`. @davidhildenbrand is working on them.
Please use following unofficial version of the Linux kernel and QEMU that support `virtio-mem` with Kata Containers.
The Linux kernel is at https://github.com/davidhildenbrand/linux/tree/virtio-mem-rfc-v4.
The Linux kernel config that can work with Kata Containers is at https://gist.github.com/teawater/016194ee84748c768745a163d08b0fb9.
The QEMU is at https://github.com/teawater/qemu/tree/kata-virtio-mem. (The original source is at https://github.com/davidhildenbrand/qemu/tree/virtio-mem. Its base version of QEMU cannot work with Kata Containers. So merge the commit of `virtio-mem` to upstream QEMU.)
Set Linux and the QEMU that support `virtio-mem` with following line in the Kata Containers QEMU configuration `configuration-qemu.toml`:
```toml
[hypervisor.qemu]
path = "qemu-dir"
kernel = "vmlinux-dir"
```
Enable `virtio-mem` with following line in the Kata Containers configuration:
```toml
enable_virtio_mem = true
```
## Run a Kata Container utilizing `virtio-mem`
Use following command to enable memory overcommitment of a Linux kernel. Because QEMU `virtio-mem` device need to allocate a lot of memory.
```
$ echo 1 | sudo tee /proc/sys/vm/overcommit_memory
```
Use following command start a Kata Container.
```
$ docker run --rm -it --runtime=kata --name test busybox
```
Use following command set the memory size of test to default_memory + 512m.
```
$ docker update -m 512m --memory-swap -1 test
```

Binary file not shown.

After

Width:  |  Height:  |  Size: 130 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 38 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 136 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 51 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 144 KiB

View File

@@ -0,0 +1,24 @@
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Description: Offline SOS CPUs except BSP before launch UOS
[ $(id -u) -eq 0 ] || { echo >&2 "ERROR: run as root"; exit 1; }
for i in $(ls -d /sys/devices/system/cpu/cpu[1-9]*); do
online=`cat $i/online`
idx=`echo $i | tr -cd "[0-9]"`
echo "INFO:$0: cpu$idx online=$online"
if [ "$online" = "1" ]; then
echo 0 > $i/online
while [ "$online" = "1" ]; do
sleep 1
echo 0 > $i/online
online=`cat $i/online`
done
echo $idx > /sys/class/vhm/acrn_vhm/offline_cpu
fi
done

79
docs/how-to/privileged.md Normal file
View File

@@ -0,0 +1,79 @@
# Privileged Kata Containers
Kata Containers supports creation of containers that are "privileged" (i.e. have additional capabilities and access
that is not normally granted).
* [Warnings](#warnings)
* [Host Devices](#host-devices)
* [Containerd and CRI](#containerd-and-cri)
* [CRI-O](#cri-o)
## Warnings
**Warning:** Whilst this functionality is supported, it can decrease the security of Kata Containers if not configured
correctly.
### Host Devices
By default, when privileged is enabled for a container, all the `/dev/*` block devices from the host are mounted
into the guest. This will allow the privileged container inside the Kata guest to gain access to mount any block device
from the host, a potentially undesirable side-effect that decreases the security of Kata.
The following sections document how to configure this behavior in different container runtimes.
#### Containerd and CRI
The Containerd CRI allows configuring the privileged host devices behavior for each runtime in the CRI config. This is
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
devices into the guest, even when privileged is enabled.
Support for configuring privileged host devices behaviour was added in containerd `1.3.0` version.
See below example config:
```toml
[plugins]
[plugins.cri]
[plugins.cri.containerd]
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v1"
privileged_without_host_devices = false
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
[plugins.cri.containerd.runtimes.kata.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
```
- [Kata Containers with Containerd and CRI documentation](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Containerd CRI config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### CRI-O
Similar to containerd, CRI-O allows configuring the privileged host devices
behavior for each runtime in the CRI config. This is done with the
`privileged_without_host_devices` option. Setting this to `true` will disable
hot plugging of the host devices into the guest, even when privileged is enabled.
Support for configuring privileged host devices behaviour was added in CRI-O `1.16.0` version.
See below example config:
```toml
[crio.runtime.runtimes.runc]
runtime_path = "/usr/local/bin/crio-runc"
runtime_type = "oci"
runtime_root = "/run/runc"
privileged_without_host_devices = false
[crio.runtime.runtimes.kata]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
privileged_without_host_devices = true
[crio.runtime.runtimes.kata-shim2]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_type = "vm"
privileged_without_host_devices = true
```
- [Kata Containers with CRI-O](https://github.com/kata-containers/documentation/blob/master/how-to/run-kata-with-k8s.md#cri-o)

View File

@@ -0,0 +1,204 @@
# Run Kata Containers with Kubernetes
* [Run Kata Containers with Kubernetes](#run-kata-containers-with-kubernetes)
* [Prerequisites](#prerequisites)
* [Install a CRI implementation](#install-a-cri-implementation)
* [CRI-O](#cri-o)
* [Kubernetes Runtime Class (CRI-O v1.12 )](#kubernetes-runtime-class-cri-o-v112)
* [Untrusted annotation (until CRI-O v1.12)](#untrusted-annotation-until-cri-o-v112)
* [Network namespace management](#network-namespace-management)
* [containerd with CRI plugin](#containerd-with-cri-plugin)
* [Install Kubernetes](#install-kubernetes)
* [Configure for CRI-O](#configure-for-cri-o)
* [Configure for containerd](#configure-for-containerd)
* [Run a Kubernetes pod with Kata Containers](#run-a-kubernetes-pod-with-kata-containers)
## Prerequisites
This guide requires Kata Containers available on your system, install-able by following [this guide](https://github.com/kata-containers/documentation/blob/master/install/README.md).
## Install a CRI implementation
Kubernetes CRI (Container Runtime Interface) implementations allow using any
OCI-compatible runtime with Kubernetes, such as the Kata Containers runtime.
Kata Containers support both the [CRI-O](https://github.com/kubernetes-incubator/cri-o) and
[CRI-containerd](https://github.com/containerd/cri) CRI implementations.
After choosing one CRI implementation, you must make the appropriate configuration
to ensure it integrates with Kata Containers.
Kata Containers 1.5 introduced the `shimv2` for containerd 1.2.0, reducing the components
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
An equivalent shim implementation for CRI-O is planned.
### CRI-O
For CRI-O installation instructions, refer to the [CRI-O Tutorial](https://github.com/kubernetes-incubator/cri-o/blob/master/tutorial.md) page.
The following sections show how to set up the CRI-O configuration file (default path: `/etc/crio/crio.conf`) for Kata.
Unless otherwise stated, all the following settings are specific to the `crio.runtime` table:
```toml
# The "crio.runtime" table contains settings pertaining to the OCI
# runtime used and options for how to set up and manage the OCI runtime.
[crio.runtime]
```
A comprehensive documentation of the configuration file can be found [here](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md).
> **Note**: After any change to this file, the CRI-O daemon have to be restarted with:
>````
>$ sudo systemctl restart crio
>````
#### Kubernetes Runtime Class (CRI-O v1.12+)
The [Kubernetes Runtime Class](https://kubernetes.io/docs/concepts/containers/runtime-class/)
is the preferred way of specifying the container runtime configuration to run a Pod's containers.
To use this feature, Kata must added as a runtime handler with:
```toml
[crio.runtime.runtimes.kata-runtime]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
```
You can also add multiple entries to specify alternatives hypervisors, e.g.:
```toml
[crio.runtime.runtimes.kata-qemu]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
[crio.runtime.runtimes.kata-fc]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
```
#### Untrusted annotation (until CRI-O v1.12)
The untrusted annotation is used to specify a runtime for __untrusted__ workloads, i.e.
a runtime to be used when the workload cannot be trusted and a higher level of security
is required. An additional flag can be used to let CRI-O know if a workload
should be considered _trusted_ or _untrusted_ by default.
For further details, see the documentation
[here](https://github.com/kata-containers/documentation/blob/master/design/architecture.md#mixing-vm-based-and-namespace-based-runtimes).
```toml
# runtime is the OCI compatible runtime used for trusted container workloads.
# This is a mandatory setting as this runtime will be the default one
# and will also be used for untrusted container workloads if
# runtime_untrusted_workload is not set.
runtime = "/usr/bin/runc"
# runtime_untrusted_workload is the OCI compatible runtime used for untrusted
# container workloads. This is an optional setting, except if
# default_container_trust is set to "untrusted".
runtime_untrusted_workload = "/usr/bin/kata-runtime"
# default_workload_trust is the default level of trust crio puts in container
# workloads. It can either be "trusted" or "untrusted", and the default
# is "trusted".
# Containers can be run through different container runtimes, depending on
# the trust hints we receive from kubelet:
# - If kubelet tags a container workload as untrusted, crio will try first to
# run it through the untrusted container workload runtime. If it is not set,
# crio will use the trusted runtime.
# - If kubelet does not provide any information about the container workload trust
# level, the selected runtime will depend on the default_container_trust setting.
# If it is set to "untrusted", then all containers except for the host privileged
# ones, will be run by the runtime_untrusted_workload runtime. Host privileged
# containers are by definition trusted and will always use the trusted container
# runtime. If default_container_trust is set to "trusted", crio will use the trusted
# container runtime for all containers.
default_workload_trust = "untrusted"
```
#### Network namespace management
To enable networking for the workloads run by Kata, CRI-O needs to be configured to
manage network namespaces, by setting the following key to `true`.
In CRI-O v1.16:
```toml
manage_network_ns_lifecycle = true
```
In CRI-O v1.17+:
```toml
manage_ns_lifecycle = true
```
### containerd with CRI plugin
If you select containerd with `cri` plugin, follow the "Getting Started for Developers"
instructions [here](https://github.com/containerd/cri#getting-started-for-developers)
to properly install it.
To customize containerd to select Kata Containers runtime, follow our
"Configure containerd to use Kata Containers" internal documentation
[here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers).
## Install Kubernetes
Depending on what your needs are and what you expect to do with Kubernetes,
please refer to the following
[documentation](https://kubernetes.io/docs/setup/) to install it correctly.
Kubernetes talks with CRI implementations through a `container-runtime-endpoint`,
also called CRI socket. This socket path is different depending on which CRI
implementation you chose, and the Kubelet service has to be updated accordingly.
### Configure for CRI-O
`/etc/systemd/system/kubelet.service.d/0-crio.conf`
```
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
```
### Configure for containerd
`/etc/systemd/system/kubelet.service.d/0-cri-containerd.conf`
```
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```
For more information about containerd see the "Configure Kubelet to use containerd"
documentation [here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-kubelet-to-use-containerd).
## Run a Kubernetes pod with Kata Containers
After you update your Kubelet service based on the CRI implementation you
are using, reload and restart Kubelet. Then, start your cluster:
```bash
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
# If using CRI-O
$ sudo kubeadm init --skip-preflight-checks --cri-socket /var/run/crio/crio.sock --pod-network-cidr=10.244.0.0/16
# If using CRI-containerd
$ sudo kubeadm init --skip-preflight-checks --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
$ export KUBECONFIG=/etc/kubernetes/admin.conf
```
You can force Kubelet to use Kata Containers by adding some `untrusted`
annotation to your pod configuration. In our case, this ensures Kata
Containers is the selected runtime to run the described workload.
`nginx-untrusted.yaml`
```yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-untrusted
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
containers:
- name: nginx
image: nginx
```
Next, you run your pod:
```
$ sudo -E kubectl apply -f nginx-untrusted.yaml
```

246
docs/how-to/service-mesh.md Normal file
View File

@@ -0,0 +1,246 @@
# Kata Containers and service mesh for Kubernetes
* [Assumptions](#assumptions)
* [How they work](#how-they-work)
* [Prerequisites](#prerequisites)
* [Kata and Kubernetes](#kata-and-kubernetes)
* [Restrictions](#restrictions)
* [Install and deploy your service mesh](#install-and-deploy-your-service-mesh)
* [Service Mesh Istio](#service-mesh-istio)
* [Service Mesh Linkerd](#service-mesh-linkerd)
* [Inject your services with sidecars](#inject-your-services-with-sidecars)
* [Sidecar Istio](#sidecar-istio)
* [Sidecar Linkerd](#sidecar-linkerd)
* [Run your services with Kata](#run-your-services-with-kata)
* [Lower privileges](#lower-privileges)
* [Add annotations](#add-annotations)
* [Deploy](#deploy)
A service mesh is a way to monitor and control the traffic between
micro-services running in your Kubernetes cluster. It is a powerful
tool that you might want to use in combination with the security
brought by Kata Containers.
## Assumptions
You are expected to be familiar with concepts such as __pods__,
__containers__, __control plane__, __data plane__, and __sidecar__.
## How they work
Istio and Linkerd both rely on the same model, where they run controller
applications in the control plane, and inject a proxy as a sidecar inside
the pod running the service. The proxy registers in the control plane as
a first step, and it constantly sends different sorts of information about
the service running inside the pod. That information comes from the
filtering performed when receiving all the traffic initially intended for
the service. That is how the interaction between the control plane and the
proxy allows the user to apply load balancing and authentication rules to
the incoming and outgoing traffic, inside the cluster, and between multiple
micro-services.
This cannot not happen without a good amount of `iptables` rules ensuring
the packets reach the proxy instead of the expected service. Rules are
setup through an __init__ container because they have to be there as soon
as the proxy starts.
## Prerequisites
### Kata and Kubernetes
Follow the [instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md)
to get Kata Containers properly installed and configured with Kubernetes.
You can choose between CRI-O and CRI-containerd, both are supported
through this document.
For both cases, select the workloads as _trusted_ by default. This way,
your cluster and your service mesh run with `runc`, and only the containers
you choose to annotate run with Kata Containers.
### Restrictions
As documented [here](https://github.com/linkerd/linkerd2/issues/982),
a kernel version between 4.14.22 and 4.14.40 causes a deadlock when
`getsockopt()` gets called with the `SO_ORIGINAL_DST` option. Unfortunately,
both service meshes use this system call with this same option from the
proxy container running inside the VM. This means that you cannot run
this kernel version range as the guest kernel for Kata if you want your
service mesh to work.
As mentioned when explaining the basic functioning of those service meshes,
`iptables` are heavily used, and they need to be properly enabled through
the guest kernel config. If they are not properly enabled, the init container
is not able to perform a proper setup of the rules.
## Install and deploy your service mesh
### Service Mesh Istio
As a reference, you can follow Istio [instructions](https://istio.io/docs/setup/kubernetes/quick-start/#download-and-prepare-for-the-installation).
The following is a summary of what you need to install Istio on your system:
```
$ curl -L https://git.io/getLatestIstio | sh -
$ cd istio-*
$ export PATH=$PWD/bin:$PATH
```
Now deploy Istio in the control plane of your cluster with the following:
```
$ kubectl apply -f install/kubernetes/istio-demo.yaml
```
To verify that the control plane is properly deployed, you can use both of
the following commands:
```
$ kubectl get svc -n istio-system
$ kubectl get pods -n istio-system -o wide
```
### Service Mesh Linkerd
As a reference, follow the Linkerd [instructions](https://linkerd.io/2/getting-started/index.html).
The following is a summary of what you need to install Linkerd on your system:
```
$ curl https://run.linkerd.io/install | sh
$ export PATH=$PATH:$HOME/.linkerd/bin
```
Now deploy Linkerd in the control plane of your cluster with the following:
```
$ linkerd install | kubectl apply -f -
```
To verify that the control plane is properly deployed, you can use both of
the following commands:
```
$ kubectl get svc -n linkerd
$ kubectl get pods -n linkerd -o wide
```
## Inject your services with sidecars
Once the control plane is running, you need a deployment to define a few
services that rely on each other. Then, you inject the YAML file with the
sidecar proxy using the tools provided by each service mesh.
If you do not have such a deployment ready, refer to the samples provided
by each project.
### Sidecar Istio
Istio provides a [`bookinfo`](https://istio.io/docs/examples/bookinfo/)
sample, which you can rely on to inject their `envoy` proxy as a
sidecar.
You need to use their tool called `istioctl kube-inject` to inject
your YAML file. We use their `bookinfo` sample as an example:
```
$ istioctl kube-inject -f samples/bookinfo/kube/bookinfo.yaml -o bookinfo-injected.yaml
```
### Sidecar Linkerd
Linkerd provides an [`emojivoto`](https://linkerd.io/2/getting-started/index.html)
sample, which you can rely on to inject their `linkerd` proxy as a
sidecar.
You need to use their tool called `linkerd inject` to inject your YAML
file. We use their `emojivoto` sample as example:
```
$ wget https://raw.githubusercontent.com/runconduit/conduit-examples/master/emojivoto/emojivoto.yml
$ linkerd inject emojivoto.yml > emojivoto-injected.yaml
```
## Run your services with Kata
Now that your service deployment is injected with the appropriate sidecar
containers, manually edit your deployment to make it work with Kata.
### Lower privileges
In Kubernetes, the __init__ container is often `privileged` as it needs to
setup the environment, which often needs some root privileges. In the case
of those services meshes, all they need is the `NET_ADMIN` capability to
modify the underlying network rules. Linkerd, by default, does not use
`privileged` container, but Istio does.
Because of the previous reason, if you use Istio you need to switch all
containers with `privileged: true` to `privileged: false`.
### Add annotations
There is no difference between Istio and Linkerd in this section. It is
about which CRI implementation you use.
For both CRI-O and CRI-containerd, you have to add an annotation indicating
the workload for this deployment is not _trusted_, which will trigger
`kata-runtime` to be called instead of `runc`.
__CRI-O:__
Add the following annotation for CRI-O
```yaml
io.kubernetes.cri-o.TrustedSandbox: "false"
```
The following is an example of what your YAML can look like:
```yaml
...
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
name: details-v1
spec:
replicas: 1
strategy: {}
template:
metadata:
annotations:
io.kubernetes.cri-o.TrustedSandbox: "false"
sidecar.istio.io/status: '{"version":"55c9e544b52e1d4e45d18a58d0b34ba4b72531e45fb6d1572c77191422556ffc","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
creationTimestamp: null
labels:
app: details
version: v1
...
```
__CRI-containerd:__
Add the following annotation for CRI-containerd
```yaml
io.kubernetes.cri.untrusted-workload: "true"
```
The following is an example of what your YAML can look like:
```yaml
...
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
name: details-v1
spec:
replicas: 1
strategy: {}
template:
metadata:
annotations:
io.kubernetes.cri.untrusted-workload: "true"
sidecar.istio.io/status: '{"version":"55c9e544b52e1d4e45d18a58d0b34ba4b72531e45fb6d1572c77191422556ffc","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
creationTimestamp: null
labels:
app: details
version: v1
...
```
### Deploy
Deploy your application by using the following:
```
$ kubectl apply -f myapp-injected.yaml
```

View File

@@ -0,0 +1,47 @@
# What Is VMCache and How To Enable It
* [What is VMCache](#what-is-vmcache)
* [How is this different to VM templating](#how-is-this-different-to-vm-templating)
* [How to enable VMCache](#how-to-enable-vmcache)
* [Limitations](#limitations)
### What is VMCache
VMCache is a new function that creates VMs as caches before using it.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](https://github.com/kata-containers/runtime/blob/master/protocols/cache/cache.proto).
The VMCache server will create some VMs and cache them by factory cache.
It will convert the VM to gRPC format and transport it when gets
requested from clients.
Factory `grpccache` is the VMCache client. It will request gRPC format
VM and convert it back to a VM. If VMCache function is enabled,
`kata-runtime` will request VM from factory `grpccache` when it creates
a new sandbox.
### How is this different to VM templating
Both [VM templating](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
When VM templating enabled, new VMs are created by cloning from a pre-created template VM, and they will share the same initramfs, kernel and agent memory in readonly mode. So it saves a lot of memory if there are many Kata Containers running on the same host.
VMCache is not vulnerable to [share memory CVE](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md#what-are-the-cons) because each VM doesn't share the memory.
### How to enable VMCache
VMCache can be enabled by changing your Kata Containers config file (`/usr/share/defaults/kata-containers/configuration.toml`,
overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
* `vm_cache_number` specifies the number of caches of VMCache:
* unspecified or == 0
VMCache is disabled
* `> 0`
will be set to the specified number
* `vm_cache_endpoint` specifies the address of the Unix socket.
Then you can create a VM templating for later usage by calling:
```
$ sudo kata-runtime factory init
```
and purge it by `ctrl-c` it.
### Limitations
* Cannot work with VM templating.
* Only supports the QEMU hypervisor.

View File

@@ -0,0 +1,60 @@
# What Is VM Templating and How To Enable It
### What is VM templating
VM templating is a Kata Containers feature that enables new VM
creation using a cloning technique. When enabled, new VMs are created
by cloning from a pre-created template VM, and they will share the
same initramfs, kernel and agent memory in readonly mode. It is very
much like a process fork done by the kernel but here we *fork* VMs.
### How is this different from VMCache
Both [VMCache](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
When VMCache enabled, new VMs are created by the VMCache server. So it is not vulnerable to share memory CVE because each VM doesn't share the memory.
VM templating saves a lot of memory if there are many Kata Containers running on the same host.
### What are the Pros
VM templating helps speed up new container creation and saves a lot
of memory if there are many Kata Containers running on the same host.
If you are running a density workload, or care a lot about container
startup speed, VM templating can be very useful.
In one example, we created 100 Kata Containers each claiming 128MB
guest memory and ended up saving 9GB of memory in total when VM templating
is enabled, which is about 72% of the total guest memory. See [full results
here](https://github.com/kata-containers/runtime/pull/303#issuecomment-395846767).
In another example, we created ten Kata Containers with containerd shimv2
and calculated the average boot up speed for each of them. The result
showed that VM templating speeds up Kata Containers creation by as much as
38.68%. See [full results here](https://gist.github.com/bergwolf/06974a3c5981494a40e2c408681c085d).
### What are the Cons
One drawback of VM templating is that it cannot avoid cross-VM side-channel
attack such as [CVE-2015-2877](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-2877)
that originally targeted at the Linux KSM feature.
It was concluded that "Share-until-written approaches for memory conservation among
mutually untrusting tenants are inherently detectable for information disclosure,
and can be classified as potentially misunderstood behaviors rather than vulnerabilities."
**Warning**: If you care about such attack vector, do not use VM templating or KSM.
### How to enable VM templating
VM templating can be enabled by changing your Kata Containers config file (`/usr/share/defaults/kata-containers/configuration.toml`,
overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
- `qemu-lite` is specified in `hypervisor.qemu`->`path` section
- `enable_template = true`
- `initrd =` is set
- `image =` option is commented out or removed
Then you can create a VM templating for later usage by calling
```
$ sudo kata-runtime factory init
```
and purge it by calling
```
$ sudo kata-runtime factory destroy
```
If you do not want to call `kata-runtime factory init` by hand,
the very first Kata container you create will automatically create a VM templating.