docs: merge documentation repository

Generated by: git subtree add --prefix=docs git@github.com:kata-containers/documentation.git master
git-subtree-dir: docs
git-subtree-mainline: ec146a1b39
git-subtree-split: 510287204b

Fixes: #329
Signed-off-by: Peng Tao <bergwolf@hyper.sh>

docs/how-to/README.md (new file, 28 lines)

# Howto Guides

* [Howto Guides](#howto-guides)
    * [Kubernetes Integration](#kubernetes-integration)
    * [Hypervisors Integration](#hypervisors-integration)
    * [Advanced Topics](#advanced-topics)

## Kubernetes Integration
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)

## Hypervisors Integration
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with NEMU](how-to-use-kata-containers-with-nemu.md)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)

## Advanced Topics
- [How to use Kata Containers with virtio-fs](how-to-use-virtio-fs-with-kata.md)
- [Setting Sysctls with Kata](how-to-use-sysctls-with-kata.md)
- [What Is VMCache and How To Enable It](what-is-vm-cache-and-how-do-I-use-it.md)
- [What Is VM Templating and How To Enable It](what-is-vm-templating-and-how-do-I-use-it.md)
- [Privileged Kata Containers](privileged.md)
- [How to load kernel modules in Kata Containers](how-to-load-kernel-modules-with-kata.md)
- [How to use Kata Containers with `virtio-mem`](how-to-use-virtio-mem-with-kata.md)
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)

docs/how-to/containerd-kata.md (new file, 368 lines)

# How to use Kata Containers and Containerd

- [Concepts](#concepts)
  - [Kubernetes `RuntimeClass`](#kubernetes-runtimeclass)
  - [Containerd Runtime V2 API: Shim V2 API](#containerd-runtime-v2-api-shim-v2-api)
- [Install](#install)
  - [Install Kata Containers](#install-kata-containers)
  - [Install containerd with CRI plugin](#install-containerd-with-cri-plugin)
  - [Install CNI plugins](#install-cni-plugins)
  - [Install `cri-tools`](#install-cri-tools)
- [Configuration](#configuration)
  - [Configure containerd to use Kata Containers](#configure-containerd-to-use-kata-containers)
    - [Kata Containers as a `RuntimeClass`](#kata-containers-as-a-runtimeclass)
    - [Kata Containers as the runtime for untrusted workload](#kata-containers-as-the-runtime-for-untrusted-workload)
    - [Kata Containers as the default runtime](#kata-containers-as-the-default-runtime)
  - [Configuration for `cri-tools`](#configuration-for-cri-tools)
- [Run](#run)
  - [Launch containers with `ctr` command line](#launch-containers-with-ctr-command-line)
  - [Launch Pods with `crictl` command line](#launch-pods-with-crictl-command-line)

This document covers the installation and configuration of [containerd](https://containerd.io/)
and [Kata Containers](https://katacontainers.io). containerd provides not only the `ctr`
command line tool, but also the [CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
interface for [Kubernetes](https://kubernetes.io) and other CRI clients.

This document is primarily written for Kata Containers v1.5.0-rc2 or above, and containerd v1.2.0 or above.
Previous versions are addressed here as well, but we suggest users upgrade to the newer versions for better support.

## Concepts

### Kubernetes `RuntimeClass`

[`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/) is a Kubernetes feature first
introduced in Kubernetes 1.12 as alpha. It is the feature for selecting the container runtime configuration to
use to run a pod's containers. This feature is supported in `containerd` since [v1.2.0](https://github.com/containerd/containerd/releases/tag/v1.2.0).

Before `RuntimeClass` was introduced, Kubernetes was not aware of the differences between runtimes on the node. `kubelet`
creates Pod sandboxes and containers through CRI implementations, and treats all Pods equally. However, there
are requirements to run trusted Pods (e.g. Kubernetes plugins) in a native container like runc, and to run untrusted
workloads with isolated sandboxes (e.g. Kata Containers).

As a result, the CRI implementations extended their semantics for these requirements:

- At the beginning, [Frakti](https://github.com/kubernetes/frakti) checked the network configuration of a Pod, and
  treated Pods with `host` network as trusted, while others were treated as untrusted.
- containerd introduced an annotation for untrusted Pods in [v1.0](https://github.com/containerd/cri/blob/v1.0.0-rc.0/docs/config.md):

  ```yaml
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
  ```
- Similarly, CRI-O introduced the annotation `io.kubernetes.cri-o.TrustedSandbox` for untrusted Pods.

To eliminate the complexity of user configuration introduced by the non-standardized annotations and to provide
extensibility, `RuntimeClass` was introduced. This gives users the ability to affect the runtime behavior
through `RuntimeClass` without needing knowledge of the CRI daemons. We suggest that users with multiple runtimes
use `RuntimeClass` instead of the deprecated annotations.

### Containerd Runtime V2 API: Shim V2 API

The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](https://github.com/kata-containers/runtime/tree/master/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod, instead of the `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container plus the Pod sandbox itself), and without a
standalone `kata-proxy` process, even when VSOCK is not available.



Shim v2 was introduced in containerd [v1.2.0](https://github.com/containerd/containerd/releases/tag/v1.2.0), and the Kata `shimv2`
implementation first shipped in Kata Containers v1.5.0.

## Install

### Install Kata Containers

Follow the instructions to [install Kata Containers](https://github.com/kata-containers/documentation/blob/master/install/README.md).
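
To confirm the installation, you can optionally run the runtime's host self-check. This is a minimal sanity
check; the exact command name and output vary across Kata versions:

```bash
$ kata-runtime --version
$ sudo kata-runtime kata-check
```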

### Install containerd with CRI plugin

> **Note:** `cri` is a native plugin of containerd 1.1 and above. It is built into containerd and enabled by default.
> You do not need to install `cri` if you have containerd 1.1 or above. Just remove the `cri` plugin from the list of
> `disabled_plugins` in the containerd configuration file (`/etc/containerd/config.toml`).

Follow the instructions from the [CRI installation guide](http://github.com/containerd/cri/blob/master/docs/installation.md).

Then, check if `containerd` is now available:

```bash
$ command -v containerd
```
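
You can also verify that the `cri` plugin is enabled in your containerd build. The listing below is a quick,
version-dependent check; plugin names in the output may differ between containerd releases:

```bash
$ sudo ctr plugins ls | grep cri
```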

### Install CNI plugins

> **Note:** You do not need to install CNI plugins if you do not want to use containerd with Kubernetes.
> If you have installed Kubernetes with `kubeadm`, you might have already installed the CNI plugins.

You can manually install CNI plugins as follows:

```bash
$ go get github.com/containernetworking/plugins
$ pushd $GOPATH/src/github.com/containernetworking/plugins
$ ./build_linux.sh
$ sudo mkdir /opt/cni
$ sudo cp -r bin /opt/cni/
$ popd
```

### Install `cri-tools`

> **Note:** `cri-tools` is a set of tools for CRI used for development and testing. Users who only want
> to use containerd with Kubernetes can skip the `cri-tools`.

You can install the `cri-tools` from source code:

```bash
$ go get github.com/kubernetes-incubator/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
$ make
$ sudo -E make install
$ popd
```

## Configuration

### Configure containerd to use Kata Containers

By default, the configuration of containerd is located at `/etc/containerd/config.toml`, and the
`cri` plugin configuration is placed in the following section:

```toml
[plugins]
  [plugins.cri]
    [plugins.cri.containerd]
      [plugins.cri.containerd.default_runtime]
        #runtime_type = "io.containerd.runtime.v1.linux"

    [plugins.cri.cni]
      # conf_dir is the directory in which the admin places a CNI conf.
      conf_dir = "/etc/cni/net.d"
```

The following sections outline how to add Kata Containers to the configuration.

#### Kata Containers as a `RuntimeClass`

`RuntimeClass` is the suggested approach for the following combination:
- Kata Containers v1.5.0 or above (including `1.5.0-rc`)
- containerd v1.2.0 or above
- Kubernetes v1.12.0 or above

The following configuration includes three runtime classes:
- `plugins.cri.containerd.runtimes.runc`: runc, which is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: the Kata `shimv2` runtime. Following containerd's
  [binary naming rules](https://github.com/containerd/containerd/tree/master/runtime/v2#binary-naming), the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
  binary name of the Kata implementation of the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)).
- `plugins.cri.containerd.runtimes.katacli`: `containerd-shim-runc-v1` calling `kata-runtime`, which is the legacy way of launching Kata Containers.

```toml
[plugins.cri.containerd]
  no_pivot = false
  [plugins.cri.containerd.runtimes]
    [plugins.cri.containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v1"
      [plugins.cri.containerd.runtimes.runc.options]
        NoPivotRoot = false
        NoNewKeyring = false
        ShimCgroup = ""
        IoUid = 0
        IoGid = 0
        BinaryName = "runc"
        Root = ""
        CriuPath = ""
        SystemdCgroup = false
    [plugins.cri.containerd.runtimes.kata]
      runtime_type = "io.containerd.kata.v2"
    [plugins.cri.containerd.runtimes.katacli]
      runtime_type = "io.containerd.runc.v1"
      [plugins.cri.containerd.runtimes.katacli.options]
        NoPivotRoot = false
        NoNewKeyring = false
        ShimCgroup = ""
        IoUid = 0
        IoGid = 0
        BinaryName = "/usr/bin/kata-runtime"
        Root = ""
        CriuPath = ""
        SystemdCgroup = false
```

Starting with containerd v1.2.4 and Kata v1.6.0, a new runtime option is supported, which allows you to specify a specific Kata configuration file as follows:

```toml
[plugins.cri.containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  [plugins.cri.containerd.runtimes.kata.options]
    ConfigPath = "/etc/kata-containers/config.toml"
```

The `ConfigPath` option is optional. If you do not specify it, shimv2 first tries to get the configuration file from the environment variable `KATA_CONF_FILE`. If neither is set, shimv2 uses the default Kata configuration file paths (`/etc/kata-containers/configuration.toml` and `/usr/share/defaults/kata-containers/configuration.toml`).

If you use containerd older than v1.2.4 or Kata older than v1.6.0 and still want to specify a configuration file, you can use the following workaround: shimv2 also accepts the configuration file path in the environment variable `KATA_CONF_FILE`, so you can create a
wrapper shell script such as:

```bash
#!/bin/bash
KATA_CONF_FILE=/etc/kata-containers/firecracker.toml containerd-shim-kata-v2 "$@"
```

Name it `/usr/local/bin/containerd-shim-katafc-v2` and reference it in the configuration of containerd:

```toml
[plugins.cri.containerd.runtimes.kata-firecracker]
  runtime_type = "io.containerd.katafc.v2"
```
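
Remember to make the wrapper script executable and ensure it is in a directory on containerd's `PATH`,
otherwise the shim lookup will fail. For example:

```bash
$ sudo chmod +x /usr/local/bin/containerd-shim-katafc-v2
```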

#### Kata Containers as the runtime for untrusted workload

For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
for an untrusted workload. With the following configuration, you can run trusted workloads with a runtime such as `runc`,
and run an untrusted workload with Kata Containers:

```toml
[plugins.cri.containerd]
  # "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
  [plugins.cri.containerd.default_runtime]
    # runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
    runtime_type = "io.containerd.runtime.v1.linux"

  # "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
  [plugins.cri.containerd.untrusted_workload_runtime]
    # runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
    runtime_type = "io.containerd.kata.v2"
```

For earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:

```toml
[plugins.cri.containerd]

  # "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
  [plugins.cri.containerd.default_runtime]
    # runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
    runtime_type = "io.containerd.runtime.v1.linux"

  # "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
  [plugins.cri.containerd.untrusted_workload_runtime]
    # runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
    runtime_type = "io.containerd.runtime.v1.linux"

    # runtime_engine is the name of the runtime engine used by containerd.
    runtime_engine = "/usr/bin/kata-runtime"
```

You can find more information in the [containerd config documentation](https://github.com/containerd/cri/blob/master/docs/config.md).

#### Kata Containers as the default runtime

If you want to set Kata Containers as the only runtime in the deployment, you can simply configure it as follows:

```toml
[plugins.cri.containerd]
  [plugins.cri.containerd.default_runtime]
    runtime_type = "io.containerd.kata.v2"
```

Alternatively, for earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following configuration:

```toml
[plugins.cri.containerd]
  [plugins.cri.containerd.default_runtime]
    runtime_type = "io.containerd.runtime.v1.linux"
    runtime_engine = "/usr/bin/kata-runtime"
```
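
Whichever of the above configurations you choose, containerd only picks up changes to
`/etc/containerd/config.toml` after a restart. Assuming containerd runs as a systemd service:

```bash
$ sudo systemctl restart containerd
```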

### Configuration for `cri-tools`

> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.

First, add the CNI configuration in the containerd configuration.

The following is the configuration if you installed CNI plugins as outlined in the *[Install CNI plugins](#install-cni-plugins)* section.

Put the CNI configuration as `/etc/cni/net.d/10-mynet.conf`:

```json
{
  "cniVersion": "0.2.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "172.19.0.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
```

Next, reference the configuration directory through the containerd `config.toml`:

```toml
[plugins.cri.cni]
  # conf_dir is the directory in which the admin places a CNI conf.
  conf_dir = "/etc/cni/net.d"
```

The configuration file of the `crictl` command line tool in `cri-tools` is located at `/etc/crictl.yaml`:

```yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: true
```
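
With that file in place, a quick connectivity check against containerd can be done with `crictl` itself;
if the endpoint is reachable it reports both the client and the runtime versions:

```bash
$ sudo crictl version
```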

## Run

### Launch containers with `ctr` command line

To run a container with Kata Containers through the containerd command line, you can run the following:

```bash
$ sudo ctr image pull docker.io/library/busybox:latest
$ sudo ctr run --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
```

This launches a BusyBox container named `hello`, which will be removed by `--rm` after it quits.
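
To convince yourself that the container really is running inside a virtual machine, you can look for the
hypervisor process on the host while the container is running. This assumes QEMU is the configured
hypervisor; the process name differs for other hypervisors such as Firecracker:

```bash
$ ps -ef | grep qemu
```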

### Launch Pods with `crictl` command line

With the `crictl` command line of `cri-tools`, you can specify the runtime class with the `-r` or `--runtime` flag.
Use the following to launch a Pod with the `kata` runtime class, using the pod config from [the examples](https://github.com/kubernetes-sigs/cri-tools/tree/master/docs/examples)
of `cri-tools`:

```bash
$ sudo crictl runp -r kata podsandbox-config.yaml
36e23521e8f89fabd9044924c9aeb34890c60e85e1748e8daca7e2e673f8653e
```

You can add a container to the launched Pod with the following:

```bash
$ sudo crictl create 36e23521e8f89 container-config.yaml podsandbox-config.yaml
1aab7585530e62c446734f12f6899f095ce53422dafcf5a80055ba11b95f2da7
```

Now, start it with the following:

```bash
$ sudo crictl start 1aab7585530e6
1aab7585530e6
```

In Kubernetes, you need to create a `RuntimeClass` resource and add the `runtimeClassName` field to the Pod spec
(see this [document](https://kubernetes.io/docs/concepts/containers/runtime-class/) for more information).
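
As a minimal sketch (assuming Kubernetes v1.14 or newer, where `RuntimeClass` is in the
`node.k8s.io/v1beta1` API group, and a containerd runtime named `kata` as configured above), this could look like:

```yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
apiVersion: v1
kind: Pod
metadata:
  name: kata-test
spec:
  runtimeClassName: kata
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
```

On Kubernetes v1.12 and v1.13 the `RuntimeClass` API is still alpha and uses a different schema, so check
the upstream documentation linked above for the exact definition matching your cluster version.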

If `RuntimeClass` is not supported, you can use the following annotation in a Kubernetes pod to mark it as an untrusted workload:

```yaml
annotations:
  io.kubernetes.cri.untrusted-workload: "true"
```

docs/how-to/how-to-import-kata-logs-with-fluentd.md (new file, 463 lines)

# Importing Kata Containers logs with Fluentd

* [Introduction](#introduction)
* [Overview](#overview)
    * [Test stack](#test-stack)
    * [Importing the logs](#importing-the-logs)
    * [Direct import `logfmt` from `systemd`](#direct-import-logfmt-from-systemd)
        * [Configuring `minikube`](#configuring-minikube)
        * [Pull from `systemd`](#pull-from-systemd)
        * [Systemd Summary](#systemd-summary)
    * [Directly importing JSON](#directly-importing-json)
        * [JSON in files](#json-in-files)
            * [Prefixing all keys](#prefixing-all-keys)
* [Kata `shimv2`](#kata-shimv2)
* [Caveats](#caveats)
* [Summary](#summary)

# Introduction

This document describes how to import Kata Containers logs into [Fluentd](https://www.fluentd.org/),
typically for importing into an
Elastic/Fluentd/Kibana ([EFK](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch#running-efk-stack-in-production))
or Elastic/Logstash/Kibana ([ELK](https://www.elastic.co/elastic-stack)) stack.

The majority of this document focuses on the CRI-O based (classic) Kata runtime. Much of that information
also applies to the Kata `shimv2` runtime. Differences pertaining to Kata `shimv2` can be found in their
[own section](#kata-shimv2).

> **Note:** This document does not cover any aspect of "log rotation". It is expected that any production
> stack already has a method in place to control node log growth.

# Overview

Kata generates logs. The logs can come from numerous parts of the Kata stack (the runtime, proxy, shim
and even the agent). By default the logs
[go to the system journal](https://github.com/kata-containers/runtime#logging),
but they can also be configured to be stored in files.

The default log format is [`logfmt` structured logging](https://brandur.org/logfmt), but it can be switched to
JSON with a command line option.

Provided below are some examples of Kata log import and processing using
[Fluentd](https://www.fluentd.org/).

## Test stack

Some of the testing can be performed locally, but at other times we really need a live stack for testing.
We will use a [`minikube`](https://github.com/kubernetes/minikube/) stack with EFK enabled and Kata
installed to do our tests. Some details such as specific paths and versions of components may need
to be adapted to your specific installation.

The [Kata minikube installation guide](../install/minikube-installation-guide.md) was used to install
`minikube` with Kata Containers enabled.

The minikube EFK stack `addon` is then enabled:

```bash
$ minikube addons enable efk
```

> *Note*: Installing and booting EFK can take a little while - check progress with
> `kubectl get pods -n=kube-system` and wait for all the pods to get to the `Running` state.

## Importing the logs

Kata offers us two choices to make when storing the logs:
- Do we store them to the system log, or to separate files?
- Do we store them in `logfmt` format, or `JSON`?

We will start by examining the Kata default setup (`logfmt` stored in the system log), and then look
at other options.

## Direct import `logfmt` from `systemd`

Fluentd contains both a component that can read the `systemd` system journals and a component
that can parse `logfmt` entries. We will utilise these in two separate steps to evaluate how well
the Kata logs import to the EFK stack.

### Configuring `minikube`

> **Note:** Setting up, configuration and deployment of `minikube` is not covered in exacting
> detail in this guide. It is presumed the user has the abilities and their own Kubernetes/Fluentd
> stack they are able to utilise in order to modify and test as necessary.

Minikube by default
[configures](https://github.com/kubernetes/minikube/blob/master/deploy/iso/minikube-iso/board/coreos/minikube/rootfs-overlay/etc/systemd/journald.conf)
`systemd-journald` with the
[`Storage=volatile`](https://www.freedesktop.org/software/systemd/man/journald.conf.html) option,
which results in the journal being stored in `/run/log/journal`. Unfortunately, the Minikube EFK
Fluentd install extracts most of its logs from `/var/log`, and therefore does not mount `/run/log`
into the Fluentd pod by default. This prevents us from reading the system journal by default.

This can be worked around by patching the Minikube EFK `addon` YAML to mount `/run/log` into the
Fluentd container:

```patch
diff --git a/deploy/addons/efk/fluentd-es-rc.yaml.tmpl b/deploy/addons/efk/fluentd-es-rc.yaml.tmpl
index 75e386984..83bea48b9 100644
--- a/deploy/addons/efk/fluentd-es-rc.yaml.tmpl
+++ b/deploy/addons/efk/fluentd-es-rc.yaml.tmpl
@@ -44,6 +44,8 @@ spec:
         volumeMounts:
         - name: varlog
           mountPath: /var/log
+        - name: runlog
+          mountPath: /run/log
         - name: varlibdockercontainers
           mountPath: /var/lib/docker/containers
           readOnly: true
@@ -57,6 +59,9 @@ spec:
       - name: varlog
         hostPath:
           path: /var/log
+      - name: runlog
+        hostPath:
+          path: /run/log
       - name: varlibdockercontainers
         hostPath:
           path: /var/lib/docker/containers
```

> **Note:** After making this change you will need to build your own `minikube` to encapsulate
> and use this change, or find another method to (re-)launch the Fluentd containers for the change
> to take effect.

### Pull from `systemd`

We will start with testing Fluentd pulling the Kata logs directly from the system journal with the
Fluentd [systemd plugin](https://github.com/fluent-plugin-systemd/fluent-plugin-systemd).

We modify the Fluentd config file with the following fragment. For reference, the Minikube
YAML can be found
[on GitHub](https://github.com/kubernetes/minikube/blob/master/deploy/addons/efk/fluentd-es-configmap.yaml.tmpl):

> **Note:** The below Fluentd config fragment is in the "older style" to match the Minikube version of
> Fluentd. If using a more up to date version of Fluentd, you may need to update some parameters, such as
> using `matches` rather than `filters` and placing `@` before `type`. Your Fluentd should warn you in its
> logs if such updates are necessary.

```
<source>
  type systemd
  tag kata-containers
  path /run/log/journal
  pos_file /run/log/journal/kata-journald.pos
  filters [{"SYSLOG_IDENTIFIER": "kata-runtime"}, {"SYSLOG_IDENTIFIER": "kata-proxy"}, {"SYSLOG_IDENTIFIER": "kata-shim"}]
  read_from_head true
</source>
```

We then apply the new YAML, and restart the Fluentd pod (by killing it, and letting the `ReplicationController`
start a new instance, which will pick up the new `ConfigurationMap`):

```bash
$ kubectl apply -f new-fluentd-cm.yaml
$ kubectl delete pod -n=kube-system fluentd-es-XXXXX
```

Now open the Kibana UI to the Minikube EFK `addon`, and launch a Kata QEMU based test pod in order to
generate some Kata specific log entries:

```bash
$ minikube addons open efk
$ cd $GOPATH/src/github.com/kata-containers/packaging/kata-deploy
$ kubectl apply -f examples/nginx-deployment-qemu.yaml
```

Looking at the Kibana UI, we can now see that some `kata-runtime` tagged records have appeared:



If we now filter on that tag, we can see just the Kata related entries:



If we expand one of those entries, we can see we have imported useful information. You can then
sub-filter on, for instance, the `SYSLOG_IDENTIFIER` to differentiate the Kata components, and
on the `PRIORITY` to filter out critical issues etc.

Kata generates a significant amount of Kata specific information, which can be seen as
[`logfmt`](https://github.com/kata-containers/tests/tree/master/cmd/log-parser#logfile-requirements)
data contained in the `MESSAGE` field. Imported as-is, there is no easy way to filter on that data
in Kibana:



We can however further sub-parse the Kata entries using the
[Fluentd plugins](https://docs.fluentbit.io/manual/parser/logfmt) that will parse
`logfmt` formatted data. We can utilise these to parse the sub-fields using a Fluentd filter
section. At the same time, we will prefix the new fields with `kata_` to make it clear where
they have come from:

```
<filter kata-containers>
  @type parser
  key_name MESSAGE
  format logfmt
  reserve_data true
  inject_key_prefix kata_
</filter>
```

The Minikube Fluentd version does not come with the `logfmt` parser installed, so we will run a local
test to check the parsing works. The resulting output from Fluentd is:

```
2020-02-21 10:31:27.810781647 +0000 kata-containers:
{"_BOOT_ID":"590edceeef5545a784ec8c6181a10400",
"_MACHINE_ID":"3dd49df65a1b467bac8d51f2eaa17e92",
"_HOSTNAME":"minikube",
"PRIORITY":"6",
"_UID":"0",
"_GID":"0",
"_SYSTEMD_SLICE":"system.slice",
"_SELINUX_CONTEXT":"kernel",
"_CAP_EFFECTIVE":"3fffffffff",
"_TRANSPORT":"syslog",
"_SYSTEMD_CGROUP":"/system.slice/crio.service",
"_SYSTEMD_UNIT":"crio.service",
"_SYSTEMD_INVOCATION_ID":"f2d99c784e6f406c87742f4bca16a4f6",
"SYSLOG_IDENTIFIER":"kata-runtime",
"_COMM":"kata-runtime",
"_EXE":"/opt/kata/bin/kata-runtime",
"SYSLOG_TIMESTAMP":"Feb 21 10:31:27 ",
"_CMDLINE":"/opt/kata/bin/kata-runtime --kata-config /opt/kata/share/defaults/kata-containers/configuration-qemu.toml --root /run/runc state 7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997",
"SYSLOG_PID":"14314",
"_PID":"14314",
"MESSAGE":"time=\"2020-02-21T10:31:27.810781647Z\" level=info msg=\"release sandbox\" arch=amd64 command=state container=7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997 name=kata-runtime pid=14314 sandbox=1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c source=virtcontainers subsystem=sandbox",
"SYSLOG_RAW":"<6>Feb 21 10:31:27 kata-runtime[14314]: time=\"2020-02-21T10:31:27.810781647Z\" level=info msg=\"release sandbox\" arch=amd64 command=state container=7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997 name=kata-runtime pid=14314 sandbox=1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c source=virtcontainers subsystem=sandbox\n",
"_SOURCE_REALTIME_TIMESTAMP":"1582281087810805",
"kata_level":"info",
"kata_msg":"release sandbox",
"kata_arch":"amd64",
"kata_command":"state",
"kata_container":"7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997",
"kata_name":"kata-runtime",
"kata_pid":14314,
"kata_sandbox":"1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c",
"kata_source":"virtcontainers",
"kata_subsystem":"sandbox"}
```

Here we can see that the `MESSAGE` field has been parsed out into the prefixed `kata_*` fields,
which contain usefully filterable fields such as `kata_level`, `kata_command` and `kata_subsystem` etc.

### Systemd Summary

We have managed to configure Fluentd to capture the Kata log entries from the system
journal, and further managed to then parse out the `logfmt` message into JSON to allow further analysis
inside Elastic/Kibana.

## Directly importing JSON

The underlying basic data format used by Fluentd and Elastic is JSON. If we output JSON
directly from Kata, that should make overall import and processing of the log entries more efficient.

There are potentially two things we can do with Kata here:

- Get Kata to [output its logs in `JSON` format](https://github.com/kata-containers/runtime#logging) rather
  than `logfmt`.
- Get Kata to log directly into a file, rather than via the system journal. This would allow us to not need
  to parse the systemd format files, and capture the Kata log lines directly. It would also avoid Fluentd
  having to potentially parse or skip over many non-Kata related systemd journal entries that it is not at all
  interested in.

In theory we could get Kata to post its messages in JSON format to the systemd journal by adding the
`--log-format=json` option to the Kata runtime, and then swapping the `logfmt` parser for the `json`
parser, but we would still need to parse the systemd files. We will skip this setup in this document, and
go directly to a full Kata specific JSON format logfile test.

### JSON in files

The Kata runtime has the ability to generate JSON logs directly, rather than its default `logfmt` format. Passing
the `--log-format=json` argument to the Kata runtime enables this. The easiest way to pass in this extra
parameter from a [Kata deploy](https://github.com/kata-containers/packaging/tree/master/kata-deploy) installation
is to edit the `/opt/kata/bin/kata-qemu` shell script (generated by the
[Kata packaging release scripts](https://github.com/kata-containers/packaging/blob/master/release/kata-deploy-binaries.sh)).

At the same time, we will add the `--log=/var/log/kata-runtime.log` argument to store the Kata logs in their
own file (rather than into the system journal).

```bash
#!/bin/bash
/opt/kata/bin/kata-runtime --kata-config "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml" --log-format=json --log=/var/log/kata-runtime.log "$@"
```
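
Before wiring this into Fluentd, you can quickly confirm that the runtime is now emitting one JSON record
per line. This check assumes `jq` is available on the node and that at least one Kata container has run
since the change:

```bash
$ sudo tail -n 1 /var/log/kata-runtime.log | jq .
```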

We then add a Fluentd config section to parse that file. Note, we inform the parser that Kata is
generating timestamps in `iso8601` format. Kata places these timestamps into a field called `time`, which
is the default field the Fluentd parser looks for:

```
<source>
  type tail
  tag kata-containers
  path /var/log/kata-runtime.log
  pos_file /var/log/kata-runtime.pos
  format json
  time_format %iso8601
  read_from_head true
</source>
```

This imports the `kata-runtime` logs, with the resulting records looking like:



Something to note here is that we seem to have gained an awful lot of fairly identical looking fields in the
Elastic database:



In reality they are not all identical, but they do all come out of one of the Kata log entries - the one generated by the
`kill` command. A JSON fragment showing an example is below:

```json
{
  ...
  "EndpointProperties": {
    "Iface": {
      "Index": 4,
      "MTU": 1460,
      "TxQLen": 0,
      "Name": "eth0",
      "HardwareAddr": "ClgKAQAL",
      "Flags": 19,
      "RawFlags": 69699,
      "ParentIndex": 15,
      "MasterIndex": 0,
      "Namespace": null,
      "Alias": "",
      "Statistics": {
        "RxPackets": 1,
        "TxPackets": 5,
        "RxBytes": 42,
        "TxBytes": 426,
        "RxErrors": 0,
        "TxErrors": 0,
        "RxDropped": 0,
        "TxDropped": 0,
        "Multicast": 0,
        "Collisions": 0,
        "RxLengthErrors": 0,
        "RxOverErrors": 0,
        "RxCrcErrors": 0,
        "RxFrameErrors": 0,
        "RxFifoErrors": 0,
        "RxMissedErrors": 0,
        "TxAbortedErrors": 0,
        "TxCarrierErrors": 0,
        "TxFifoErrors": 0,
        "TxHeartbeatErrors": 0,
        "TxWindowErrors": 0,
        "RxCompressed": 0,
        "TxCompressed": 0
  ...
```

If these new fields are not required, then a Fluentd
[`record_transformer` filter](https://docs.fluentd.org/filter/record_transformer#remove_keys)
could be used to delete them before they are injected into Elastic.
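
As a sketch of that approach (assuming the nested `EndpointProperties` structure shown above is the data
you want to drop), a filter such as the following could be added to the Fluentd configuration:

```
<filter kata-containers>
  @type record_transformer
  # drop the bulky interface statistics before sending the record to Elastic
  remove_keys EndpointProperties
</filter>
```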

#### Prefixing all keys

It may be noted above that all the fields are imported with their base native name, such as
`arch` and `level`. It may be better for data storage and processing if all the fields were
identifiable as having come from Kata, and to avoid namespace clashes with other imports.
This can be achieved by prefixing all the keys with, say, `kata_`. It appears `fluentd` cannot
do this directly in the input or match phases, but can in the filter/parse phase (as was done
when processing `logfmt` data for instance). To achieve this, we can first input the Kata
JSON data as a single line, and then add the prefix using a JSON filter section:

```
# Pull in as a single line...
<source>
  @type tail
  path /var/log/kata-runtime.log
  pos_file /var/log/kata-runtime.pos
  read_from_head true
  tag kata-runtime
  <parse>
    @type none
  </parse>
</source>

<filter kata-runtime>
  @type parser
  key_name message
  # drop the original single line `message` entry
  reserve_data false
  inject_key_prefix kata_
  <parse>
    @type json
  </parse>
</filter>
```

# Kata `shimv2`

When using the Kata `shimv2` runtime with `containerd`, as described in this
[how-to guide](containerd-kata.md#containerd-runtime-v2-api-shim-v2-api), the Kata logs are routed
differently, and some adjustments to the above methods will be necessary to filter them in Fluentd.

The Kata `shimv2` logs are different in two primary ways:

- The Kata logs are directed via `containerd`, and will be captured along with the `containerd` logs,
  such as on the containerd stdout or in the system journal.
- In parallel, Kata `shimv2` places its logs into the system journal under the systemd name of `kata`.

Below is an example Fluentd configuration fragment showing one possible method of extracting and separating
the `containerd` and Kata logs from the system journal by filtering on the Kata `SYSLOG_IDENTIFIER` field,
using the [Fluentd v0.12 rewrite_tag_filter](https://docs.fluentd.org/v/0.12/output/rewrite_tag_filter):

```
<source>
  type systemd
  path /path/to/journal
  # capture the containerd logs
  filters [{ "_SYSTEMD_UNIT": "containerd.service" }]
  pos_file /tmp/systemd-containerd.pos
  read_from_head true
  # tag those temporarily, as we will filter them and rewrite the tags
  tag containerd_tmp_tag
</source>

# filter out and split between kata entries and containerd entries
<match containerd_tmp_tag>
  @type rewrite_tag_filter
  # Tag Kata entries
  <rule>
    key SYSLOG_IDENTIFIER
    pattern kata
    tag kata_tag
  </rule>
  # Anything that was not matched so far, tag as containerd
  <rule>
    key MESSAGE
    pattern /.+/
    tag containerd_tag
  </rule>
</match>
```

# Caveats

> **Warning:** You should be aware of the following caveats, which may disrupt or change what and how
> you capture and process the Kata Containers logs.

The following caveats should be noted:

- There is a [known issue](https://github.com/kata-containers/runtime/issues/985) whereby enabling
  full debug in Kata, particularly enabling agent kernel log messages, can result in corrupt log lines
  being generated by Kata (due to overlapping multiple output streams).
- Presently only the `kata-runtime` can generate JSON logs and direct them to files. Other components
  such as the `proxy` and `shim` can presently only report to the system journal. Hopefully these
  components will be extended with extra functionality in the future.

# Summary

We have shown how native Kata logs using the systemd journal and `logfmt` data can be imported, and also
how Kata can be instructed to generate JSON logs directly, and how to import those into Fluentd.

We have detailed a few known caveats, and leave it to the implementer to choose the best method for their
system.

docs/how-to/how-to-load-kernel-modules-with-kata.md (new file, 106 lines)

# Loading kernel modules

A new feature for loading kernel modules was introduced in Kata Containers 1.9.
The list of kernel modules and their parameters can be provided using the
configuration file or OCI annotations. The [Kata runtime][1] gives that
information to the [Kata Agent][2] through gRPC when the sandbox is created.
The [Kata Agent][2] inserts the kernel modules using `modprobe(8)`, hence
module dependencies are resolved automatically.

The sandbox will not be started when:

* A kernel module is specified and the `modprobe(8)` command is not installed in
  the guest, or it fails to load the module.
* The module is not available in the guest or it doesn't meet the guest kernel
  requirements, like architecture and version.

The following sections document the different ways of loading kernel modules
in Kata Containers.

- [Using Kata Configuration file](#using-kata-configuration-file)
- [Using annotations](#using-annotations)

# Using Kata Configuration file

```
NOTE: Use this method only if you need to pass the kernel modules to all
containers. To set them per pod, use the annotations described below.
```

The list of kernel modules and parameters can be set in the `kernel_modules`
option as a comma separated list, where each entry in the list specifies a kernel
module and its parameters. Each list element comprises one or more space separated
fields. The first field specifies the module name and subsequent fields specify
individual parameters for the module.

The following example specifies two modules to load: `e1000e` and `i915`. Two parameters
are specified for the `e1000e` module: `InterruptThrottleRate` (which takes an array
of integer values) and `EEE` (which requires a single integer value).

```toml
kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915"]
```

Not all container managers allow users to provide custom annotations, hence
this is the only way that Kata Containers provides for loading modules when
custom annotations are not supported.

There are some limitations with this approach:

* Write access to the Kata configuration file is required.
* The configuration file must be updated when a new container is created,
  otherwise the same list of modules is used, even if they are not needed in the
  container.

# Using annotations

As mentioned above, not all containers need the same modules, therefore using
the configuration file for specifying the list of kernel modules per [POD][3] can
be a pain. Unlike the configuration file, annotations provide a way to specify
custom configurations per POD.

The list of kernel modules and parameters can be set using the annotation
`io.katacontainers.config.agent.kernel_modules` as a semicolon separated
list, where the first word of each element is considered the module name and
the rest its parameters.

In the following example two PODs are created, but the kernel modules `e1000e`
and `i915` are inserted only in the POD `pod1`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  annotations:
    io.katacontainers.config.agent.kernel_modules: "e1000e EEE=1; i915"
spec:
  runtimeClassName: kata
  containers:
  - name: c1
    image: busybox
    command:
      - sh
    stdin: true
    tty: true

---
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  runtimeClassName: kata
  containers:
  - name: c2
    image: busybox
    command:
      - sh
    stdin: true
    tty: true
```
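
Since the containers in a pod share their sandbox's guest kernel, one simple way to check the result
(assuming the pods above are running and the BusyBox image provides `lsmod`) is to list the loaded modules
from inside each pod; `e1000e` should only show up in `pod1`:

```bash
$ kubectl exec pod1 -c c1 -- lsmod | grep e1000e
$ kubectl exec pod2 -c c2 -- lsmod | grep e1000e
```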

[1]: https://github.com/kata-containers/runtime
[2]: https://github.com/kata-containers/agent
[3]: https://kubernetes.io/docs/concepts/workloads/pods/pod/

docs/how-to/how-to-set-sandbox-config-kata.md (new file, 160 lines)

# Per-Pod Kata Configurations

Kata Containers gives users the freedom to customize at a per-pod level, by setting
a wide range of Kata specific annotations in the pod specification.

# Kata Configuration Annotations
There are several kinds of Kata configuration annotations and they are listed below.

## Global Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config_path` | string | Kata config file location that overrides the default config paths |
| `io.katacontainers.pkg.oci.bundle_path` | string | OCI bundle path |
| `io.katacontainers.pkg.oci.container_type`| string | OCI container type. Only accepts `pod_container` and `pod_sandbox` |

## Runtime Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.runtime.experimental` | `boolean` | determines if experimental features are enabled |
| `io.katacontainers.config.runtime.disable_guest_seccomp`| `boolean` | determines if `seccomp` should be applied inside the guest |
| `io.katacontainers.config.runtime.disable_new_netns` | `boolean` | determines if a new netns is created for the hypervisor process |
| `io.katacontainers.config.runtime.internetworking_model` | string| determines how the VM should be connected to the container network interface. Valid values are `macvtap`, `tcfilter` and `none` |
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in the sandbox cgroup |

## Agent Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |

## Hypervisor Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.hypervisor.asset_hash_type` | string | the hash type used for assets verification, default is `sha512` |
| `io.katacontainers.config.hypervisor.block_device_cache_direct` | `boolean` | denotes whether use of `O_DIRECT` (bypass the host page cache) is enabled |
| `io.katacontainers.config.hypervisor.block_device_cache_noflush` | `boolean` | denotes whether flush requests for the device are ignored |
| `io.katacontainers.config.hypervisor.block_device_cache_set` | `boolean` | denotes whether cache-related options will be set for block devices |
| `io.katacontainers.config.hypervisor.block_device_driver` | string | the driver to be used for block device, valid values are `virtio-blk`, `virtio-scsi`, `nvdimm`|
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | uint32| the default vCPUs assigned for a VM by the hypervisor |
| `io.katacontainers.config.hypervisor.disable_block_device_use` | `boolean` | disallow a block device from being used |
| `io.katacontainers.config.hypervisor.disable_vhost_net` | `boolean` | specify if `vhost-net` is not available on the host |
| `io.katacontainers.config.hypervisor.enable_hugepages` | `boolean` | if the memory should be pre-allocated from huge pages |
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for the `virtio-scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | enable pre-allocation of the VM RAM by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_swap` | `boolean` | enable swap of VM memory |
| `io.katacontainers.config.hypervisor.entropy_source` | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` | string | file based memory backend root directory |
| `io.katacontainers.config.hypervisor.firmware_hash` | string | container firmware SHA-512 hash value |
| `io.katacontainers.config.hypervisor.firmware` | string | the guest firmware that will run the container VM |
| `io.katacontainers.config.hypervisor.guest_hook_path` | string | the path within the VM that will be used for drop in hooks |
| `io.katacontainers.config.hypervisor.hotplug_vfio_on_root_bus` | `boolean` | indicate if devices need to be hotplugged on the root bus instead of a bridge|
| `io.katacontainers.config.hypervisor.hypervisor_hash` | string | container hypervisor binary SHA-512 hash value |
| `io.katacontainers.config.hypervisor.image_hash` | string | container guest image SHA-512 hash value |
| `io.katacontainers.config.hypervisor.image` | string | the guest image that will run in the container VM |
| `io.katacontainers.config.hypervisor.initrd_hash` | string | container guest initrd SHA-512 hash value |
| `io.katacontainers.config.hypervisor.initrd` | string | the guest initrd image that will run in the container VM |
| `io.katacontainers.config.hypervisor.jailer_hash` | string | container jailer SHA-512 hash value |
| `io.katacontainers.config.hypervisor.jailer_path` | string | the jailer that will constrain the container VM |
| `io.katacontainers.config.hypervisor.kernel_hash` | string | container kernel image SHA-512 hash value |
| `io.katacontainers.config.hypervisor.kernel_params` | string | additional guest kernel parameters |
| `io.katacontainers.config.hypervisor.kernel` | string | the kernel used to boot the container VM |
| `io.katacontainers.config.hypervisor.machine_accelerators` | string | machine specific accelerators for the hypervisor |
| `io.katacontainers.config.hypervisor.machine_type` | string | the type of machine being emulated by the hypervisor |
| `io.katacontainers.config.hypervisor.memory_offset` | uint32| the memory space used for the `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.memory_slots` | uint32| the memory slots assigned to the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.msize_9p` | uint32 | the `msize` for 9p shares |
| `io.katacontainers.config.hypervisor.path` | string | the hypervisor that will run the container VM |
| `io.katacontainers.config.hypervisor.shared_fs` | string | the shared file system type, either `virtio-9p` or `virtio-fs` |
| `io.katacontainers.config.hypervisor.use_vsock` | `boolean` | specify use of `vsock` for agent communication |
| `io.katacontainers.config.hypervisor.virtio_fs_cache_size` | uint32 | virtio-fs DAX cache size in `MiB` |
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to the `virtiofs` daemon |
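
As an illustrative sketch of how these keys are used, the pod fragment below asks for a larger VM for one pod.
The values are arbitrary examples, and for containerd the annotations must also be allowed through
`pod_annotations` as described in the next section:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: big-vm-pod
  annotations:
    io.katacontainers.config.hypervisor.default_vcpus: "4"
    io.katacontainers.config.hypervisor.default_memory: "4096"
spec:
  runtimeClassName: kata
  containers:
  - name: c1
    image: busybox
    command: ["sleep", "3600"]
```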

# CRI Configuration

In case of CRI-O, all annotations specified in the pod spec are passed down to Kata.

For containerd, annotations specified in the pod spec are passed down to Kata
starting with version `1.3.0`. Additionally, extra configuration is needed for containerd,
by providing a `pod_annotations` field in the containerd config file. The `pod_annotations`
field is a list of annotations that can be passed down to Kata as OCI annotations.
It supports golang match patterns. Since annotations supported by Kata follow the pattern
`io.katacontainers.*`, the following configuration would work for passing annotations to
Kata from containerd:

```
$ cat /etc/containerd/config.toml
....

[plugins.cri.containerd.runtimes.kata]
  runtime_type = "io.containerd.runc.v1"
  pod_annotations = ["io.katacontainers.*"]
  [plugins.cri.containerd.runtimes.kata.options]
    BinaryName = "/usr/bin/kata-runtime"
....

```

Additional documentation on the above configuration can be found in the
[containerd docs](https://github.com/containerd/cri/blob/8d5a8355d07783ba2f8f451209f6bdcc7c412346/docs/config.md).

# Example - Using annotations

As mentioned above, not all containers need the same modules, therefore using
the configuration file for specifying the list of kernel modules per POD can
be a pain. Unlike the configuration file, annotations provide a way to specify
custom configurations per POD.

The list of kernel modules and parameters can be set using the annotation
`io.katacontainers.config.agent.kernel_modules` as a semicolon separated
list, where the first word of each element is considered the module name and
the rest its parameters.

Users might also want to enable guest `seccomp` to provide better isolation with a
small performance sacrifice. The annotation
`io.katacontainers.config.runtime.disable_guest_seccomp` can be used for this purpose.

In the following example two PODs are created, but the kernel modules `e1000e`
and `i915` are inserted only in the POD `pod1`. Also, guest `seccomp` is only enabled
in the POD `pod2`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  annotations:
    io.katacontainers.config.agent.kernel_modules: "e1000e EEE=1; i915"
spec:
  runtimeClassName: kata
  containers:
  - name: c1
    image: busybox
    command:
      - sh
    stdin: true
    tty: true

---
apiVersion: v1
kind: Pod
metadata:
  name: pod2
  annotations:
    io.katacontainers.config.runtime.disable_guest_seccomp: "false"
spec:
  runtimeClassName: kata
  containers:
  - name: c2
    image: busybox
    command:
      - sh
    stdin: true
    tty: true
```

docs/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md (new file, 220 lines)
|
||||
# How to use Kata Containers and CRI (containerd plugin) with Kubernetes
|
||||
|
||||
* [Requirements](#requirements)
|
||||
* [Install and configure containerd](#install-and-configure-containerd)
|
||||
* [Install and configure Kubernetes](#install-and-configure-kubernetes)
|
||||
* [Install Kubernetes](#install-kubernetes)
|
||||
* [Configure Kubelet to use containerd](#configure-kubelet-to-use-containerd)
|
||||
* [Configure HTTP proxy - OPTIONAL](#configure-http-proxy---optional)
|
||||
* [Start Kubernetes](#start-kubernetes)
|
||||
* [Install a Pod Network](#install-a-pod-network)
|
||||
* [Allow pods to run in the master node](#allow-pods-to-run-in-the-master-node)
|
||||
* [Create an untrusted pod using Kata Containers](#create-an-untrusted-pod-using-kata-containers)
|
||||
* [Delete created pod](#delete-created-pod)
|
||||
|
||||
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
|
||||
|
||||
The Kubernetes cluster will use the
|
||||
[CRI containerd plugin](https://github.com/containerd/cri) and
|
||||
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
|
||||
|
||||
For Kata Containers 1.5.0-rc2 and above, we will use `containerd-shim-kata-v2` (`shimv2` for short in this documentation)
to launch Kata Containers. For earlier versions of Kata Containers, the pods are launched with `kata-runtime`.
|
||||
|
||||
## Requirements
|
||||
|
||||
- Kubernetes, Kubelet, `kubeadm`
|
||||
- containerd with `cri` plug-in
|
||||
- Kata Containers
|
||||
|
||||
> **Note:** For information about the supported versions of these components,
|
||||
> see the Kata Containers
|
||||
> [`versions.yaml`](https://github.com/kata-containers/runtime/blob/master/versions.yaml)
|
||||
> file.
|
||||
|
||||
## Install and configure containerd
|
||||
|
||||
First, follow the [How to use Kata Containers and Containerd](containerd-kata.md) to install and configure containerd.
|
||||
Then, make sure containerd works with the [examples in it](containerd-kata.md#run).
|
||||
|
||||
## Install and configure Kubernetes
|
||||
|
||||
### Install Kubernetes
|
||||
|
||||
- Follow the instructions for
|
||||
[`kubeadm` installation](https://kubernetes.io/docs/setup/independent/install-kubeadm/).
|
||||
|
||||
- Check `kubeadm` is now available
|
||||
|
||||
```bash
|
||||
$ command -v kubeadm
|
||||
```
|
||||
|
||||
### Configure Kubelet to use containerd
|
||||
|
||||
In order to allow Kubelet to use containerd (using the CRI interface), configure the service to point to the `containerd` socket.
|
||||
|
||||
- Configure Kubernetes to use `containerd`
|
||||
|
||||
```bash
|
||||
$ sudo mkdir -p /etc/systemd/system/kubelet.service.d/
|
||||
$ cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/0-containerd.conf
|
||||
[Service]
|
||||
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
|
||||
EOF
|
||||
```
|
||||
|
||||
- Inform systemd about the new configuration
|
||||
|
||||
```bash
|
||||
$ sudo systemctl daemon-reload
|
||||
```
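As a quick sanity check (not required), you can confirm that systemd merged the drop-in into the Kubelet unit:

```bash
$ systemctl cat kubelet | grep containerd.sock
```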
|
||||
|
||||
### Configure HTTP proxy - OPTIONAL
|
||||
|
||||
If you are behind a proxy, use the following script to configure your proxy for docker, Kubelet, and containerd:
|
||||
|
||||
```bash
|
||||
$ services="
|
||||
kubelet
|
||||
containerd
|
||||
docker
|
||||
"
|
||||
|
||||
$ for service in ${services}; do
|
||||
|
||||
service_dir="/etc/systemd/system/${service}.service.d/"
|
||||
sudo mkdir -p ${service_dir}
|
||||
|
||||
cat << EOT | sudo tee "${service_dir}/proxy.conf"
|
||||
[Service]
|
||||
Environment="HTTP_PROXY=${http_proxy}"
|
||||
Environment="HTTPS_PROXY=${https_proxy}"
|
||||
Environment="NO_PROXY=${no_proxy}"
|
||||
EOT
|
||||
done
|
||||
|
||||
$ sudo systemctl daemon-reload
|
||||
```
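The proxy drop-ins only take effect after the services are restarted. A short follow-up, reusing the `services` list from the script above:

```bash
$ for service in ${services}; do
    sudo systemctl restart ${service}
  done
```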
|
||||
|
||||
## Start Kubernetes
|
||||
|
||||
- Make sure `containerd` is up and running
|
||||
|
||||
```bash
|
||||
$ sudo systemctl restart containerd
|
||||
$ sudo systemctl status containerd
|
||||
```
|
||||
|
||||
- Prevent conflicts between `docker` iptables (packet filtering) rules and k8s pod communication
|
||||
|
||||
If Docker is installed on the node, it is necessary to modify the rule
|
||||
below. See https://github.com/kubernetes/kubernetes/issues/40182 for further
|
||||
details.
|
||||
|
||||
```bash
|
||||
$ sudo iptables -P FORWARD ACCEPT
|
||||
```
|
||||
|
||||
- Start cluster using `kubeadm`
|
||||
|
||||
```bash
|
||||
$ sudo kubeadm init --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
|
||||
$ export KUBECONFIG=/etc/kubernetes/admin.conf
|
||||
$ sudo -E kubectl get nodes
|
||||
$ sudo -E kubectl get pods
|
||||
```
|
||||
|
||||
## Install a Pod Network
|
||||
|
||||
A pod network plugin is needed to allow pods to communicate with each other.
|
||||
|
||||
- Install the `flannel` plugin by following the
|
||||
[Using `kubeadm` to Create a Cluster](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#instructions)
|
||||
guide, starting from the **Installing a pod network** section.
|
||||
|
||||
- Create a pod network using flannel
|
||||
|
||||
> **Note:** There is no known way to determine programmatically the best version (commit) to use.
|
||||
> See https://github.com/coreos/flannel/issues/995.
|
||||
|
||||
```bash
|
||||
$ sudo -E kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
|
||||
```
|
||||
|
||||
- Wait for the pod network to become available
|
||||
|
||||
```bash
|
||||
# number of seconds to wait for pod network to become available
|
||||
$ timeout_dns=420
|
||||
|
||||
$ while [ "$timeout_dns" -gt 0 ]; do
|
||||
if sudo -E kubectl get pods --all-namespaces | grep dns | grep Running; then
|
||||
break
|
||||
fi
|
||||
|
||||
sleep 1s
|
||||
((timeout_dns--))
|
||||
done
|
||||
```
|
||||
|
||||
- Check the pod network is running
|
||||
|
||||
```bash
|
||||
$ sudo -E kubectl get pods --all-namespaces | grep dns | grep Running && echo "OK" || ( echo "FAIL" && false )
|
||||
```
|
||||
|
||||
## Allow pods to run in the master node
|
||||
|
||||
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
|
||||
|
||||
```bash
|
||||
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
|
||||
```
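You can verify the taint was removed from your node (a hedged check; the node name is assumed to match the host name):

```bash
$ sudo -E kubectl describe node "$(hostname)" | grep -i taints
```

The output should show `Taints: <none>`.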
|
||||
|
||||
## Create an untrusted pod using Kata Containers
|
||||
|
||||
By default, all pods are created with the default runtime configured in CRI containerd plugin.
|
||||
|
||||
If a pod has the `io.kubernetes.cri.untrusted-workload` annotation set to `"true"`, the CRI plugin runs the pod with the
|
||||
[Kata Containers runtime](https://github.com/kata-containers/runtime/blob/master/README.md).
|
||||
|
||||
- Create an untrusted pod configuration
|
||||
|
||||
```bash
|
||||
$ cat << EOT | tee nginx-untrusted.yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: nginx-untrusted
|
||||
annotations:
|
||||
io.kubernetes.cri.untrusted-workload: "true"
|
||||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx
|
||||
|
||||
EOT
|
||||
```
|
||||
|
||||
- Create an untrusted pod
|
||||
```bash
|
||||
$ sudo -E kubectl apply -f nginx-untrusted.yaml
|
||||
```
|
||||
|
||||
- Check pod is running
|
||||
|
||||
```bash
|
||||
$ sudo -E kubectl get pods
|
||||
```
|
||||
|
||||
- Check hypervisor is running
|
||||
```bash
|
||||
$ ps aux | grep qemu
|
||||
```
|
||||
|
||||
## Delete created pod
|
||||
|
||||
```bash
|
||||
$ sudo -E kubectl delete -f nginx-untrusted.yaml
|
||||
```
|
130
docs/how-to/how-to-use-kata-containers-with-acrn.md
Normal file
130
docs/how-to/how-to-use-kata-containers-with-acrn.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# Kata Containers with ACRN
|
||||
|
||||
This document provides an overview on how to run Kata containers with ACRN hypervisor and device model.
|
||||
|
||||
- [Introduction](#introduction)
|
||||
- [Pre-requisites](#pre-requisites)
|
||||
- [Configure Docker](#configure-docker)
|
||||
- [Configure Kata Containers with ACRN](#configure-kata-containers-with-acrn)
|
||||
|
||||
## Introduction
|
||||
|
||||
ACRN is a flexible, lightweight Type-1 reference hypervisor built with real-time and safety-criticality in mind. ACRN uses an open source platform making it optimized to streamline embedded development.
|
||||
|
||||
Some of the key features being:
|
||||
|
||||
- Small footprint - Approx. 25K lines of code (LOC).
|
||||
- Real Time - Low latency, faster boot time, improves overall responsiveness with hardware.
|
||||
- Adaptability - Multi-OS support for guest operating systems like Linux, Android, RTOSes.
|
||||
- Rich I/O mediators - Allows sharing of various I/O devices across VMs.
|
||||
- Optimized for a variety of IoT (Internet of Things) and embedded device solutions.
|
||||
|
||||
Please refer to ACRN [documentation](https://projectacrn.github.io/latest/index.html) for more details on ACRN hypervisor and device model.
|
||||
|
||||
## Pre-requisites
|
||||
|
||||
This document requires the presence of the ACRN hypervisor and Kata Containers on your system. Install using the instructions available through the following links:
|
||||
|
||||
- ACRN supported [Hardware](https://projectacrn.github.io/latest/hardware.html#supported-hardware).
|
||||
> **Note:** Please make sure to have a minimum of 4 logical processors (HT) or cores.
|
||||
- ACRN [software](https://projectacrn.github.io/latest/tutorials/kbl-nuc-sdc.html#use-the-script-to-set-up-acrn-automatically) setup.
|
||||
- For networking, ACRN supports either MACVTAP or TAP. If MACVTAP is not enabled in the Service OS, please follow the steps below to update the kernel:
|
||||
|
||||
```sh
|
||||
$ git clone https://github.com/projectacrn/acrn-kernel.git
|
||||
$ cd acrn-kernel
|
||||
$ cp kernel_config_sos .config
|
||||
$ sed -i "s/# CONFIG_MACVLAN is not set/CONFIG_MACVLAN=y/" .config
|
||||
$ sed -i '$ i CONFIG_MACVTAP=y' .config
|
||||
$ make clean && make olddefconfig && make && sudo make modules_install INSTALL_MOD_PATH=out/
|
||||
```
|
||||
Log in to the Service OS and update the kernel with MACVTAP support:
|
||||
|
||||
```sh
|
||||
$ sudo mount /dev/sda1 /mnt
|
||||
$ sudo scp -r <user name>@<host address>:<your workspace>/acrn-kernel/arch/x86/boot/bzImage /mnt/EFI/org.clearlinux/
|
||||
$ sudo scp -r <user name>@<host address>:<your workspace>/acrn-kernel/out/lib/modules/* /lib/modules/
|
||||
$ conf_file=$(sed -n '$ s/default //p' /mnt/loader/loader.conf).conf
|
||||
$ kernel_img=$(sed -n 2p /mnt/loader/entries/$conf_file | cut -d'/' -f4)
|
||||
$ sudo sed -i "s/$kernel_img/bzImage/g" /mnt/loader/entries/$conf_file
|
||||
$ sync && sudo umount /mnt && sudo reboot
|
||||
```
|
||||
- Kata Containers installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md) steps.
|
||||
|
||||
> **Note:** Create a rootfs image, not an initrd image.
|
||||
|
||||
In order to run Kata with ACRN, your container stack must provide block-based storage, such as device-mapper.
|
||||
|
||||
> **Note:** Currently, by design you can only launch one VM from Kata Containers using the ACRN hypervisor (SDC scenario). Based on community feedback, the number of supported VMs may be increased.
|
||||
|
||||
## Configure Docker
|
||||
|
||||
To configure Docker for device-mapper and Kata,
|
||||
|
||||
1. Stop Docker daemon if it is already running.
|
||||
|
||||
```bash
|
||||
$ sudo systemctl stop docker
|
||||
```
|
||||
|
||||
2. Set `/etc/docker/daemon.json` with the following contents.
|
||||
|
||||
```
|
||||
{
|
||||
"storage-driver": "devicemapper"
|
||||
}
|
||||
```
|
||||
|
||||
3. Restart docker.
|
||||
|
||||
```bash
|
||||
$ sudo systemctl daemon-reload
|
||||
$ sudo systemctl restart docker
|
||||
```
|
||||
|
||||
4. Configure [Docker](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#update-the-docker-systemd-unit-file) to use `kata-runtime`.
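As an alternative to editing the systemd unit file, the runtime can also be registered through the same `daemon.json` used in step 2. This is a sketch only, assuming the default `kata-runtime` install path; it rewrites the file written earlier, so keep the `devicemapper` setting:

```bash
$ cat << EOF | sudo tee /etc/docker/daemon.json
{
  "storage-driver": "devicemapper",
  "runtimes": {
    "kata-runtime": {
      "path": "/usr/bin/kata-runtime"
    }
  }
}
EOF
$ sudo systemctl daemon-reload && sudo systemctl restart docker
```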
|
||||
|
||||
## Configure Kata Containers with ACRN
|
||||
|
||||
To configure Kata Containers with ACRN, copy the generated `configuration-acrn.toml` file when building the `kata-runtime` to either `/etc/kata-containers/configuration.toml` or `/usr/share/defaults/kata-containers/configuration.toml`.
|
||||
|
||||
The following command shows full paths to the `configuration.toml` files that the runtime loads. It will use the first path that exists. (Please make sure the kernel and image paths are set correctly in the `configuration.toml` file)
|
||||
|
||||
```bash
|
||||
$ sudo kata-runtime --kata-show-default-config-paths
|
||||
```
|
||||
|
||||
>**Warning:** Please offline CPUs using [this](offline_cpu.sh) script, otherwise VM launches will fail.
|
||||
|
||||
```bash
|
||||
$ sudo ./offline_cpu.sh
|
||||
```
|
||||
|
||||
Start an ACRN-based Kata Container:
|
||||
|
||||
```bash
|
||||
$ sudo docker run -ti --runtime=kata-runtime busybox sh
|
||||
```
|
||||
|
||||
You will see that ACRN (`acrn-dm`) is now running on your system, along with `kata-shim` and `kata-proxy`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
|
||||
|
||||
```bash
|
||||
$ ps -ef | grep -E "kata|acrn"
|
||||
```
|
||||
|
||||
Validate the ACRN hypervisor by using `kata-runtime kata-env`:
|
||||
|
||||
```sh
|
||||
$ kata-runtime kata-env | awk -v RS= '/\[Hypervisor\]/'
|
||||
[Hypervisor]
|
||||
MachineType = ""
|
||||
Version = "DM version is: 1.2-unstable-254577a6-dirty (daily tag:acrn-2019w27.4-140000p)
|
||||
Path = "/usr/bin/acrn-dm"
|
||||
BlockDeviceDriver = "virtio-blk"
|
||||
EntropySource = "/dev/urandom"
|
||||
Msize9p = 0
|
||||
MemorySlots = 10
|
||||
Debug = false
|
||||
UseVSock = false
|
||||
SharedFS = ""
|
||||
```
|
115
docs/how-to/how-to-use-kata-containers-with-nemu.md
Normal file
115
docs/how-to/how-to-use-kata-containers-with-nemu.md
Normal file
@@ -0,0 +1,115 @@
|
||||
|
||||
# Kata Containers with NEMU
|
||||
|
||||
* [Introduction](#introduction)
|
||||
* [Pre-requisites](#pre-requisites)
|
||||
* [NEMU](#nemu)
|
||||
* [Download and build](#download-and-build)
|
||||
* [x86_64](#x86_64)
|
||||
* [aarch64](#aarch64)
|
||||
* [Configure Kata Containers](#configure-kata-containers)
|
||||
|
||||
Kata Containers relies by default on the QEMU hypervisor in order to spawn the virtual machines running containers. [NEMU](https://github.com/intel/nemu) is a fork of QEMU that:
|
||||
- Reduces the number of lines of code.
|
||||
- Removes all legacy devices.
|
||||
- Reduces the emulation as far as possible.
|
||||
|
||||
## Introduction
|
||||
|
||||
This document describes how to run Kata Containers with NEMU, first by explaining how to download, build and install it. Then it walks through the steps needed to update your Kata Containers configuration in order to run with NEMU.
|
||||
|
||||
## Pre-requisites
|
||||
This document requires Kata Containers to be [installed](https://github.com/kata-containers/documentation/blob/master/install/README.md) on your system.
|
||||
|
||||
Also, it's worth noting that NEMU only supports the `x86_64` and `aarch64` architectures.
|
||||
|
||||
## NEMU
|
||||
|
||||
### Download and build
|
||||
|
||||
```bash
|
||||
$ git clone https://github.com/intel/nemu.git
|
||||
$ cd nemu
|
||||
$ git fetch origin
|
||||
$ git checkout origin/experiment/automatic-removal
|
||||
```
|
||||
#### x86_64
|
||||
```
|
||||
$ SRCDIR=$PWD ./tools/build_x86_64_virt.sh
|
||||
```
|
||||
#### aarch64
|
||||
```
|
||||
$ SRCDIR=$PWD ./tools/build_aarch64.sh
|
||||
```
|
||||
|
||||
> **Note:** The branch `experiment/automatic-removal` is a branch published by Jenkins after it has applied the automatic removal script to the `topic/virt-x86` branch. The purpose of this code removal is to reduce the source tree by removing files not used by NEMU.
|
||||
|
||||
After those commands have successfully returned, you will find the NEMU binary at `$HOME/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt` (__x86__), or `$HOME/build-aarch64/aarch64-softmmu/qemu-system-aarch64` (__ARM__).
|
||||
|
||||
You also need the `OVMF` firmware in order to boot the virtual machine's kernel. It can currently be found at this [location](https://github.com/intel/ovmf-virt/releases).
|
||||
```bash
|
||||
$ sudo mkdir -p /usr/share/nemu
|
||||
$ OVMF_URL=$(curl -sL https://api.github.com/repos/intel/ovmf-virt/releases/latest | jq -S '.assets[0].browser_download_url')
|
||||
$ curl -o OVMF.fd -L $(sed -e 's/^"//' -e 's/"$//' <<<"$OVMF_URL")
|
||||
$ sudo install -o root -g root -m 0640 OVMF.fd /usr/share/nemu/
|
||||
```
|
||||
> **Note:** The OVMF firmware will be located at this temporary location until the changes can be pushed upstream.
|
||||
|
||||
|
||||
## Configure Kata Containers
|
||||
All you need from this section is to modify the configuration file `/usr/share/defaults/kata-containers/configuration.toml` to specify the options related to the hypervisor.
|
||||
|
||||
|
||||
```diff
|
||||
[hypervisor.qemu]
|
||||
-path = "/usr/bin/qemu-lite-system-x86_64"
|
||||
+path = "/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt"
|
||||
kernel = "/usr/share/kata-containers/vmlinuz.container"
|
||||
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
|
||||
image = "/usr/share/kata-containers/kata-containers.img"
|
||||
-machine_type = "pc"
|
||||
+machine_type = "virt"
|
||||
|
||||
# Optional space-separated list of options to pass to the guest kernel.
|
||||
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
|
||||
@@ -31,7 +31,7 @@
|
||||
|
||||
# Path to the firmware.
|
||||
# If you want that qemu uses the default firmware leave this option empty
|
||||
-firmware = ""
|
||||
+firmware = "/usr/share/nemu/OVMF.fd"
|
||||
|
||||
# Machine accelerators
|
||||
# comma-separated list of machine accelerators to pass to the hypervisor.
|
||||
```
|
||||
|
||||
As you can see from this snippet above, all you need to change is:
|
||||
- The path to the hypervisor binary, `/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt` in this example.
|
||||
- The machine type from `pc` to `virt`.
|
||||
- The path to the firmware binary, `/usr/share/nemu/OVMF.fd` in this example.
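If you prefer to script these edits, the following sketch applies the three changes with `sed`. It assumes the default values shown in the diff above and the example NEMU build path; adjust both to your system before running:

```bash
$ conf="/usr/share/defaults/kata-containers/configuration.toml"
$ sudo sed -i \
    -e 's|^path = "/usr/bin/qemu-lite-system-x86_64"|path = "/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt"|' \
    -e 's|^machine_type = "pc"|machine_type = "virt"|' \
    -e 's|^firmware = ""|firmware = "/usr/share/nemu/OVMF.fd"|' \
    "$conf"
```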
|
||||
|
||||
Once you have saved those modifications, you can start a new container:
|
||||
```bash
|
||||
$ docker run --runtime=kata-runtime -it busybox
|
||||
```
|
||||
And you will be able to verify this new container is running with the NEMU hypervisor by looking for the hypervisor path and the machine type from the `qemu` process running on your system:
|
||||
```bash
|
||||
$ ps -aux | grep qemu
|
||||
root ... /home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt
|
||||
... -machine virt,accel=kvm,kernel_irqchip,nvdimm ...
|
||||
```
|
||||
|
||||
Running `kata-runtime kata-env` is also a reliable way to validate that you are using the expected hypervisor:
|
||||
```bash
|
||||
$ kata-runtime kata-env | awk -v RS= '/\[Hypervisor\]/'
|
||||
[Hypervisor]
|
||||
MachineType = "virt"
|
||||
Version = "NEMU (like QEMU) version 3.0.0 (v3.0.0-179-gaf9a791)\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
|
||||
Path = "/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt"
|
||||
BlockDeviceDriver = "virtio-scsi"
|
||||
EntropySource = "/dev/urandom"
|
||||
Msize9p = 8192
|
||||
MemorySlots = 10
|
||||
Debug = true
|
||||
UseVSock = false
|
||||
```
|
143
docs/how-to/how-to-use-sysctls-with-kata.md
Normal file
143
docs/how-to/how-to-use-sysctls-with-kata.md
Normal file
@@ -0,0 +1,143 @@
|
||||
# Setting Sysctls with Kata
|
||||
|
||||
## Sysctls
|
||||
In Linux, the sysctl interface allows an administrator to modify kernel
|
||||
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
|
||||
process file system.
|
||||
|
||||
The parameters include the following subsystems among others:
|
||||
- `fs` (file systems)
|
||||
- `kernel` (kernel)
|
||||
- `net` (networking)
|
||||
- `vm` (virtual memory)
|
||||
|
||||
To get a complete list of kernel parameters, run:
|
||||
```
|
||||
$ sudo sysctl -a
|
||||
```
|
||||
|
||||
Both Docker and Kubernetes provide mechanisms for setting namespaced sysctls.
|
||||
Namespaced sysctls can be set per pod in the case of Kubernetes or per container
|
||||
in case of Docker.
|
||||
The following sysctls are known to be namespaced and can be set with
|
||||
Docker and Kubernetes:
|
||||
|
||||
- `kernel.shm*`
|
||||
- `kernel.msg*`
|
||||
- `kernel.sem`
|
||||
- `fs.mqueue.*`
|
||||
- `net.*`
|
||||
|
||||
### Namespaced Sysctls:
|
||||
|
||||
Kata Containers supports setting namespaced sysctls with Docker and Kubernetes.
|
||||
All namespaced sysctls can be set in the same way as for regular Linux-based
containers; the difference is that, in the case of Kata, they are set inside the guest.
|
||||
|
||||
#### Setting Namespaced Sysctls with Docker:
|
||||
|
||||
```
|
||||
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/fs/mqueue/queues_max
|
||||
256
|
||||
$ sudo docker run --runtime=kata-runtime --sysctl fs.mqueue.queues_max=512 -it alpine cat /proc/sys/fs/mqueue/queues_max
|
||||
512
|
||||
```
|
||||
|
||||
... and:
|
||||
|
||||
```
|
||||
$ sudo docker run --runtime=kata-runtime -it alpine cat /proc/sys/kernel/shmmax
|
||||
18446744073692774399
|
||||
$ sudo docker run --runtime=kata-runtime --sysctl kernel.shmmax=1024 -it alpine cat /proc/sys/kernel/shmmax
|
||||
1024
|
||||
```
|
||||
|
||||
For additional documentation on setting sysctls with Docker please refer to [Docker-sysctl-doc](https://docs.docker.com/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime).
|
||||
|
||||
|
||||
#### Setting Namespaced Sysctls with Kubernetes:
|
||||
|
||||
Kubernetes considers certain sysctls as safe and others as unsafe. For detailed
|
||||
information about what sysctls are considered unsafe, please refer to the [Kubernetes sysctl docs](https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/).
|
||||
To use unsafe sysctls, the cluster admin needs to allow them explicitly, e.g.:
|
||||
|
||||
```
|
||||
$ kubelet --allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
|
||||
```
|
||||
|
||||
or using the declarative approach as:
|
||||
|
||||
```
|
||||
$ cat kubeadm.yaml
|
||||
apiVersion: kubeadm.k8s.io/v1alpha3
|
||||
kind: InitConfiguration
|
||||
nodeRegistration:
|
||||
kubeletExtraArgs:
|
||||
allowed-unsafe-sysctls: "kernel.msg*,kernel.shm.*,net.*"
|
||||
...
|
||||
```
|
||||
|
||||
The above YAML can then be passed to `kubeadm init` as:
|
||||
```
|
||||
$ sudo -E kubeadm init --config=kubeadm.yaml
|
||||
```
|
||||
|
||||
Both safe and unsafe sysctls can be enabled in the same way in the Pod YAML:
|
||||
|
||||
```
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: sysctl-example
|
||||
spec:
|
||||
securityContext:
|
||||
sysctls:
|
||||
- name: kernel.shm_rmid_forced
|
||||
value: "0"
|
||||
- name: net.ipv4.route.min_pmtu
|
||||
value: "1024"
|
||||
```
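Once such a pod is running (the snippet above still needs a `containers` section), you can confirm the values took effect inside the guest. A hedged check, assuming a single container in the pod:

```bash
$ kubectl exec sysctl-example -- cat /proc/sys/net/ipv4/route/min_pmtu
1024
```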
|
||||
|
||||
### Non-Namespaced Sysctls:
|
||||
|
||||
Docker and Kubernetes disallow sysctls without a namespace.
|
||||
The recommendation is to set them directly on the host or use a privileged
|
||||
container in the case of Kubernetes.
|
||||
|
||||
In the case of Kata, the approach of setting sysctls on the host does not
|
||||
work since the host sysctls have no effect on a Kata Container running
|
||||
inside a guest. Kata gives you the ability to set non-namespaced sysctls using a privileged container.
|
||||
This has the advantage that the non-namespaced sysctls are set inside the guest
|
||||
without having any effect on the `/proc/sys` values of any other pod or the
|
||||
host itself.
|
||||
|
||||
The recommended approach to do this would be to set the sysctl value in a
|
||||
privileged init container. In this way, the application containers do not need any elevated
|
||||
privileges.
|
||||
|
||||
```
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: busybox-kata
|
||||
spec:
|
||||
runtimeClassName: kata-qemu
|
||||
securityContext:
|
||||
sysctls:
|
||||
- name: kernel.shm_rmid_forced
|
||||
value: "0"
|
||||
containers:
|
||||
- name: busybox-container
|
||||
securityContext:
|
||||
privileged: true
|
||||
image: debian
|
||||
command:
|
||||
- sleep
|
||||
- "3000"
|
||||
initContainers:
|
||||
- name: init-sys
|
||||
securityContext:
|
||||
privileged: true
|
||||
image: busybox
|
||||
command: ['sh', '-c', 'echo "64000" > /proc/sys/vm/max_map_count']
|
||||
```
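You can verify that the value written by the init container is visible to the application container, since both run inside the same guest:

```bash
$ kubectl exec busybox-kata -c busybox-container -- cat /proc/sys/vm/max_map_count
64000
```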
|
51
docs/how-to/how-to-use-virtio-fs-with-kata.md
Normal file
51
docs/how-to/how-to-use-virtio-fs-with-kata.md
Normal file
@@ -0,0 +1,51 @@
|
||||
# Kata Containers with virtio-fs
|
||||
|
||||
- [Introduction](#introduction)
|
||||
- [Pre-requisites](#pre-requisites)
|
||||
- [Install Kata Containers with virtio-fs support](#install-kata-containers-with-virtio-fs-support)
|
||||
- [Run a Kata Container utilizing virtio-fs](#run-a-kata-container-utilizing-virtio-fs)
|
||||
|
||||
## Introduction
|
||||
|
||||
Container deployments utilize explicit or implicit file sharing between host filesystem and containers. From a trust perspective, avoiding a shared file-system between the trusted host and untrusted container is recommended. This is not always feasible. In Kata Containers, block-based volumes are preferred as they allow usage of either device pass through or `virtio-blk` for access within the virtual machine.
|
||||
|
||||
As of the 1.7 release of Kata Containers, [9pfs](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) is the default filesystem sharing mechanism. While this does allow for workload compatibility, it does so with degraded performance and potential for POSIX compliance limitations.
|
||||
|
||||
To help address these limitations, [virtio-fs](https://virtio-fs.gitlab.io/) has been developed. virtio-fs is a shared file system that lets virtual machines access a directory tree on the host. In Kata Containers, virtio-fs can be used to share container volumes, secrets, config-maps, configuration files (hostname, hosts, `resolv.conf`) and the container rootfs on the host with the guest. virtio-fs provides significant performance and POSIX compliance improvements compared to 9pfs.
|
||||
|
||||
Enabling of virtio-fs requires changes in the guest kernel as well as the VMM. For Kata Containers, experimental virtio-fs support is enabled through the [NEMU VMM](https://github.com/intel/nemu).
|
||||
|
||||
**Note: virtio-fs support is experimental in the 1.7 release of Kata Containers. Work is underway to improve stability, performance and upstream integration. This is available for early preview - use at your own risk**
|
||||
|
||||
This document describes how to get Kata Containers to work with virtio-fs.
|
||||
|
||||
## Pre-requisites
|
||||
|
||||
* Before Kata 1.8 this feature required the host to have hugepages support enabled. Enable this with the `sysctl vm.nr_hugepages=1024` command on the host.
|
||||
|
||||
## Install Kata Containers with virtio-fs support
|
||||
|
||||
The Kata Containers NEMU configuration, the NEMU VMM and the `virtiofs` daemon are available in the [Kata Container release](https://github.com/kata-containers/runtime/releases) artifacts starting with the 1.7 release. While the feature is experimental, distribution packages are not supported, but installation is available through [`kata-deploy`](https://github.com/kata-containers/packaging/tree/master/kata-deploy).
|
||||
|
||||
Install the latest release of Kata as follows:
|
||||
```
|
||||
docker run --runtime=runc -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install
|
||||
```
|
||||
|
||||
This will place the Kata release artifacts in `/opt/kata`, and update Docker's configuration to include a runtime target, `kata-nemu`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/packaging/tree/master/kata-deploy#kubernetes-quick-start).
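A quick way to confirm the new runtime target was registered (output format varies across Docker versions):

```bash
$ sudo docker info 2>/dev/null | grep -i runtimes
```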
|
||||
|
||||
|
||||
## Run a Kata Container utilizing virtio-fs
|
||||
|
||||
Once installed, start a new container, utilizing NEMU + `virtiofs`:
|
||||
```bash
|
||||
$ docker run --runtime=kata-nemu -it busybox
|
||||
```
|
||||
|
||||
Verify the new container is running with the NEMU hypervisor as well as using `virtiofsd`. To do this look for the hypervisor path and the `virtiofs` daemon process on the host:
|
||||
```bash
|
||||
$ ps -aux | grep virtiofs
|
||||
root ... /home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt
|
||||
... -machine virt,accel=kvm,kernel_irqchip,nvdimm ...
|
||||
root ... /home/foo/build-x86_64_virt/virtiofsd-x86_64 ...
|
||||
```
|
53
docs/how-to/how-to-use-virtio-mem-with-kata.md
Normal file
53
docs/how-to/how-to-use-virtio-mem-with-kata.md
Normal file
@@ -0,0 +1,53 @@
|
||||
# Kata Containers with `virtio-mem`
|
||||
|
||||
- [Introduction](#introduction)
|
||||
- [Requisites](#requisites)
|
||||
- [Run a Kata Container utilizing `virtio-mem`](#run-a-kata-container-utilizing-virtio-mem)
|
||||
|
||||
## Introduction
|
||||
|
||||
The basic idea of `virtio-mem` is to provide a flexible, cross-architecture memory hot plug and hot unplug solution that avoids many limitations imposed by existing technologies, architectures, and interfaces.
|
||||
More details can be found in https://lkml.org/lkml/2019/12/12/681.
|
||||
|
||||
Kata Containers with `virtio-mem` supports memory resizing.
|
||||
|
||||
## Requisites
|
||||
|
||||
Kata Containers with `virtio-mem` requires a Linux kernel and a QEMU that support `virtio-mem`.
The upstream Linux kernel and QEMU do not yet support `virtio-mem`; @davidhildenbrand is working on upstreaming it.
Please use the following unofficial versions of the Linux kernel and QEMU, which support `virtio-mem`, with Kata Containers.
|
||||
|
||||
The Linux kernel is at https://github.com/davidhildenbrand/linux/tree/virtio-mem-rfc-v4.
|
||||
The Linux kernel config that can work with Kata Containers is at https://gist.github.com/teawater/016194ee84748c768745a163d08b0fb9.
|
||||
|
||||
The QEMU is at https://github.com/teawater/qemu/tree/kata-virtio-mem. (The original source is at https://github.com/davidhildenbrand/qemu/tree/virtio-mem; its base QEMU version cannot work with Kata Containers, so the `virtio-mem` commits were merged onto upstream QEMU.)
|
||||
|
||||
Point the Kata Containers QEMU configuration (`configuration-qemu.toml`) at the Linux kernel and QEMU that support `virtio-mem`, using the following lines:
|
||||
```toml
|
||||
[hypervisor.qemu]
|
||||
path = "qemu-dir"
|
||||
kernel = "vmlinux-dir"
|
||||
```
|
||||
|
||||
Enable `virtio-mem` with the following line in the Kata Containers configuration:
|
||||
```toml
|
||||
enable_virtio_mem = true
|
||||
```
|
||||
|
||||
## Run a Kata Container utilizing `virtio-mem`
|
||||
|
||||
Use the following command to enable memory overcommitment in the Linux kernel, because the QEMU `virtio-mem` device needs to allocate a lot of memory.
|
||||
```
|
||||
$ echo 1 | sudo tee /proc/sys/vm/overcommit_memory
|
||||
```
|
||||
|
||||
Use the following command to start a Kata Container.
|
||||
```
|
||||
$ docker run --rm -it --runtime=kata --name test busybox
|
||||
```
|
||||
|
||||
Use the following command to set the memory size of `test` to `default_memory` + 512 MiB.
|
||||
```
|
||||
$ docker update -m 512m --memory-swap -1 test
|
||||
```
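To confirm the resize reached the guest, compare the memory reported inside the container before and after the `docker update`; the total should grow by roughly 512 MiB:

```bash
$ docker exec test sh -c 'grep MemTotal /proc/meminfo'
```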
|
||||
|
BIN docs/how-to/images/efk_direct_from_json.png (new file, 130 KiB)
BIN docs/how-to/images/efk_direct_json_fields.png (new file, 38 KiB)
BIN docs/how-to/images/efk_filter_on_tag.png (new file, 136 KiB)
BIN docs/how-to/images/efk_kata_tag.png (new file, 51 KiB)
BIN docs/how-to/images/efk_syslog_entry_detail.png (new file, 144 KiB)
24
docs/how-to/offline_cpu.sh
Normal file
24
docs/how-to/offline_cpu.sh
Normal file
@@ -0,0 +1,24 @@
|
||||
#!/bin/bash
|
||||
# Copyright (c) 2019 Intel Corporation
|
||||
#
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
#
|
||||
# Description: Offline SOS CPUs except BSP before launch UOS
|
||||
|
||||
[ $(id -u) -eq 0 ] || { echo >&2 "ERROR: run as root"; exit 1; }
|
||||
|
||||
for i in $(ls -d /sys/devices/system/cpu/cpu[1-9]*); do
|
||||
online=`cat $i/online`
|
||||
idx=`echo $i | tr -cd "[0-9]"`
|
||||
echo "INFO:$0: cpu$idx online=$online"
|
||||
if [ "$online" = "1" ]; then
|
||||
echo 0 > $i/online
|
||||
while [ "$online" = "1" ]; do
|
||||
sleep 1
|
||||
echo 0 > $i/online
|
||||
online=`cat $i/online`
|
||||
done
|
||||
echo $idx > /sys/class/vhm/acrn_vhm/offline_cpu
|
||||
fi
|
||||
done
|
||||
|
79
docs/how-to/privileged.md
Normal file
79
docs/how-to/privileged.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# Privileged Kata Containers
|
||||
|
||||
Kata Containers supports creation of containers that are "privileged" (i.e. have additional capabilities and access
|
||||
that is not normally granted).
|
||||
|
||||
* [Warnings](#warnings)
|
||||
* [Host Devices](#host-devices)
|
||||
* [Containerd and CRI](#containerd-and-cri)
|
||||
* [CRI-O](#cri-o)
|
||||
|
||||
## Warnings
|
||||
|
||||
**Warning:** Whilst this functionality is supported, it can decrease the security of Kata Containers if not configured
|
||||
correctly.
|
||||
|
||||
### Host Devices
|
||||
|
||||
By default, when privileged is enabled for a container, all the `/dev/*` block devices from the host are mounted
|
||||
into the guest. This will allow the privileged container inside the Kata guest to gain access to mount any block device
|
||||
from the host, a potentially undesirable side-effect that decreases the security of Kata.
|
||||
|
||||
The following sections document how to configure this behavior in different container runtimes.
|
||||
|
||||
#### Containerd and CRI
|
||||
|
||||
The Containerd CRI allows configuring the privileged host devices behavior for each runtime in the CRI config. This is
|
||||
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
|
||||
devices into the guest, even when privileged is enabled.
|
||||
|
||||
Support for configuring privileged host devices behavior was added in containerd version `1.3.0`.
|
||||
|
||||
See the example config below:
|
||||
|
||||
```toml
|
||||
[plugins]
|
||||
[plugins.cri]
|
||||
[plugins.cri.containerd]
|
||||
[plugins.cri.containerd.runtimes.runc]
|
||||
runtime_type = "io.containerd.runc.v1"
|
||||
privileged_without_host_devices = false
|
||||
[plugins.cri.containerd.runtimes.kata]
|
||||
runtime_type = "io.containerd.kata.v2"
|
||||
privileged_without_host_devices = true
|
||||
[plugins.cri.containerd.runtimes.kata.options]
|
||||
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
|
||||
```
|
||||
|
||||
- [Kata Containers with Containerd and CRI documentation](how-to-use-k8s-with-cri-containerd-and-kata.md)
|
||||
- [Containerd CRI config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
|
||||
|
||||
#### CRI-O
|
||||
|
||||
Similar to containerd, CRI-O allows configuring the privileged host devices
|
||||
behavior for each runtime in the CRI config. This is done with the
|
||||
`privileged_without_host_devices` option. Setting this to `true` will disable
|
||||
hot plugging of the host devices into the guest, even when privileged is enabled.
|
||||
|
||||
Support for configuring privileged host devices behavior was added in CRI-O version `1.16.0`.
|
||||
|
||||
See the example config below:
|
||||
|
||||
```toml
|
||||
[crio.runtime.runtimes.runc]
|
||||
runtime_path = "/usr/local/bin/crio-runc"
|
||||
runtime_type = "oci"
|
||||
runtime_root = "/run/runc"
|
||||
privileged_without_host_devices = false
|
||||
[crio.runtime.runtimes.kata]
|
||||
runtime_path = "/usr/bin/kata-runtime"
|
||||
runtime_type = "oci"
|
||||
privileged_without_host_devices = true
|
||||
[crio.runtime.runtimes.kata-shim2]
|
||||
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
|
||||
runtime_type = "vm"
|
||||
privileged_without_host_devices = true
|
||||
```
|
||||
|
||||
- [Kata Containers with CRI-O](https://github.com/kata-containers/documentation/blob/master/how-to/run-kata-with-k8s.md#cri-o)
|
||||
|
204
docs/how-to/run-kata-with-k8s.md
Normal file
204
docs/how-to/run-kata-with-k8s.md
Normal file
@@ -0,0 +1,204 @@
|
||||
# Run Kata Containers with Kubernetes
|
||||
|
||||
* [Run Kata Containers with Kubernetes](#run-kata-containers-with-kubernetes)
|
||||
* [Prerequisites](#prerequisites)
|
||||
* [Install a CRI implementation](#install-a-cri-implementation)
|
||||
* [CRI-O](#cri-o)
|
||||
* [Kubernetes Runtime Class (CRI-O v1.12 )](#kubernetes-runtime-class-cri-o-v112)
|
||||
* [Untrusted annotation (until CRI-O v1.12)](#untrusted-annotation-until-cri-o-v112)
|
||||
* [Network namespace management](#network-namespace-management)
|
||||
* [containerd with CRI plugin](#containerd-with-cri-plugin)
|
||||
* [Install Kubernetes](#install-kubernetes)
|
||||
* [Configure for CRI-O](#configure-for-cri-o)
|
||||
* [Configure for containerd](#configure-for-containerd)
|
||||
* [Run a Kubernetes pod with Kata Containers](#run-a-kubernetes-pod-with-kata-containers)
|
||||
|
||||
## Prerequisites
|
||||
This guide requires Kata Containers to be available on your system, installable by following [this guide](https://github.com/kata-containers/documentation/blob/master/install/README.md).
|
||||
|
||||
## Install a CRI implementation
|
||||
|
||||
Kubernetes CRI (Container Runtime Interface) implementations allow using any
|
||||
OCI-compatible runtime with Kubernetes, such as the Kata Containers runtime.
|
||||
|
||||
Kata Containers support both the [CRI-O](https://github.com/kubernetes-incubator/cri-o) and
|
||||
[CRI-containerd](https://github.com/containerd/cri) CRI implementations.
|
||||
|
||||
After choosing one CRI implementation, you must make the appropriate configuration
|
||||
to ensure it integrates with Kata Containers.
|
||||
|
||||
Kata Containers 1.5 introduced the `shimv2` for containerd 1.2.0, reducing the components
|
||||
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
|
||||
|
||||
An equivalent shim implementation for CRI-O is planned.
|
||||
|
||||
### CRI-O
|
||||
For CRI-O installation instructions, refer to the [CRI-O Tutorial](https://github.com/kubernetes-incubator/cri-o/blob/master/tutorial.md) page.
|
||||
|
||||
The following sections show how to set up the CRI-O configuration file (default path: `/etc/crio/crio.conf`) for Kata.
|
||||
|
||||
Unless otherwise stated, all the following settings are specific to the `crio.runtime` table:
|
||||
```toml
|
||||
# The "crio.runtime" table contains settings pertaining to the OCI
|
||||
# runtime used and options for how to set up and manage the OCI runtime.
|
||||
[crio.runtime]
|
||||
```
|
||||
A comprehensive documentation of the configuration file can be found [here](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md).
|
||||
|
||||
> **Note**: After any change to this file, the CRI-O daemon has to be restarted with:
|
||||
>````
|
||||
>$ sudo systemctl restart crio
|
||||
>````
|
||||
|
||||
#### Kubernetes Runtime Class (CRI-O v1.12+)
|
||||
The [Kubernetes Runtime Class](https://kubernetes.io/docs/concepts/containers/runtime-class/)
|
||||
is the preferred way of specifying the container runtime configuration to run a Pod's containers.
|
||||
To use this feature, Kata must be added as a runtime handler with:
|
||||
|
||||
```toml
|
||||
[crio.runtime.runtimes.kata-runtime]
|
||||
runtime_path = "/usr/bin/kata-runtime"
|
||||
runtime_type = "oci"
|
||||
```
|
||||
|
||||
You can also add multiple entries to specify alternative hypervisor configurations, e.g.:
|
||||
```toml
|
||||
[crio.runtime.runtimes.kata-qemu]
|
||||
runtime_path = "/usr/bin/kata-runtime"
|
||||
runtime_type = "oci"
|
||||
|
||||
[crio.runtime.runtimes.kata-fc]
|
||||
runtime_path = "/usr/bin/kata-runtime"
|
||||
runtime_type = "oci"
|
||||
```
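On the Kubernetes side, each handler is referenced from a `RuntimeClass` object and selected per pod through `runtimeClassName`. The following is a minimal sketch, assuming the `kata-qemu` handler configured above and a cluster where the `node.k8s.io/v1beta1` API is available:

```bash
$ cat << EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata-qemu
handler: kata-qemu
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: nginx
    image: nginx
EOF
```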
|
||||
|
||||
#### Untrusted annotation (until CRI-O v1.12)
|
||||
The untrusted annotation is used to specify a runtime for __untrusted__ workloads, i.e.
|
||||
a runtime to be used when the workload cannot be trusted and a higher level of security
|
||||
is required. An additional flag can be used to let CRI-O know if a workload
|
||||
should be considered _trusted_ or _untrusted_ by default.
|
||||
For further details, see the documentation
|
||||
[here](https://github.com/kata-containers/documentation/blob/master/design/architecture.md#mixing-vm-based-and-namespace-based-runtimes).
|
||||
|
||||
```toml
|
||||
# runtime is the OCI compatible runtime used for trusted container workloads.
|
||||
# This is a mandatory setting as this runtime will be the default one
|
||||
# and will also be used for untrusted container workloads if
|
||||
# runtime_untrusted_workload is not set.
|
||||
runtime = "/usr/bin/runc"
|
||||
|
||||
# runtime_untrusted_workload is the OCI compatible runtime used for untrusted
|
||||
# container workloads. This is an optional setting, except if
|
||||
# default_container_trust is set to "untrusted".
|
||||
runtime_untrusted_workload = "/usr/bin/kata-runtime"
|
||||
|
||||
# default_workload_trust is the default level of trust crio puts in container
|
||||
# workloads. It can either be "trusted" or "untrusted", and the default
|
||||
# is "trusted".
|
||||
# Containers can be run through different container runtimes, depending on
|
||||
# the trust hints we receive from kubelet:
|
||||
# - If kubelet tags a container workload as untrusted, crio will try first to
|
||||
# run it through the untrusted container workload runtime. If it is not set,
|
||||
# crio will use the trusted runtime.
|
||||
# - If kubelet does not provide any information about the container workload trust
|
||||
# level, the selected runtime will depend on the default_container_trust setting.
|
||||
# If it is set to "untrusted", then all containers except for the host privileged
|
||||
# ones, will be run by the runtime_untrusted_workload runtime. Host privileged
|
||||
# containers are by definition trusted and will always use the trusted container
|
||||
# runtime. If default_container_trust is set to "trusted", crio will use the trusted
|
||||
# container runtime for all containers.
|
||||
default_workload_trust = "untrusted"
|
||||
```
|
||||
|
||||
#### Network namespace management
|
||||
To enable networking for the workloads run by Kata, CRI-O needs to be configured to
|
||||
manage network namespaces, by setting the following key to `true`.
|
||||
|
||||
In CRI-O v1.16:
|
||||
```toml
|
||||
manage_network_ns_lifecycle = true
|
||||
```
|
||||
In CRI-O v1.17+:
|
||||
```toml
|
||||
manage_ns_lifecycle = true
|
||||
```
|
||||
|
||||
|
||||
### containerd with CRI plugin
|
||||
|
||||
If you select containerd with `cri` plugin, follow the "Getting Started for Developers"
|
||||
instructions [here](https://github.com/containerd/cri#getting-started-for-developers)
|
||||
to properly install it.
|
||||
|
||||
To customize containerd to select Kata Containers runtime, follow our
|
||||
"Configure containerd to use Kata Containers" internal documentation
|
||||
[here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers).
|
||||
|
||||
## Install Kubernetes
|
||||
|
||||
Depending on what your needs are and what you expect to do with Kubernetes,
|
||||
please refer to the following
|
||||
[documentation](https://kubernetes.io/docs/setup/) to install it correctly.
|
||||
|
||||
Kubernetes talks with CRI implementations through a `container-runtime-endpoint`,
|
||||
also called CRI socket. This socket path is different depending on which CRI
|
||||
implementation you chose, and the Kubelet service has to be updated accordingly.
|
||||
|
||||
### Configure for CRI-O
|
||||
|
||||
`/etc/systemd/system/kubelet.service.d/0-crio.conf`
|
||||
```
|
||||
[Service]
|
||||
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
|
||||
```
|
||||
|
||||
### Configure for containerd
|
||||
|
||||
`/etc/systemd/system/kubelet.service.d/0-cri-containerd.conf`
|
||||
```
|
||||
[Service]
|
||||
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
|
||||
```
|
||||
For more information about containerd see the "Configure Kubelet to use containerd"
|
||||
documentation [here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-kubelet-to-use-containerd).
|
||||
|
||||
## Run a Kubernetes pod with Kata Containers
|
||||
|
||||
After you update your Kubelet service based on the CRI implementation you
|
||||
are using, reload and restart Kubelet. Then, start your cluster:
|
||||
```bash
|
||||
$ sudo systemctl daemon-reload
|
||||
$ sudo systemctl restart kubelet
|
||||
|
||||
# If using CRI-O
|
||||
$ sudo kubeadm init --skip-preflight-checks --cri-socket /var/run/crio/crio.sock --pod-network-cidr=10.244.0.0/16
|
||||
|
||||
# If using CRI-containerd
|
||||
$ sudo kubeadm init --skip-preflight-checks --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
|
||||
|
||||
$ export KUBECONFIG=/etc/kubernetes/admin.conf
|
||||
```
|
||||
|
||||
You can force Kubelet to use Kata Containers by adding the `untrusted`
|
||||
annotation to your pod configuration. In our case, this ensures Kata
|
||||
Containers is the selected runtime to run the described workload.
|
||||
|
||||
`nginx-untrusted.yaml`
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: nginx-untrusted
|
||||
annotations:
|
||||
io.kubernetes.cri.untrusted-workload: "true"
|
||||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx
|
||||
```
|
||||
|
||||
Next, you run your pod:
|
||||
```
|
||||
$ sudo -E kubectl apply -f nginx-untrusted.yaml
|
||||
```
|
||||
|
246
docs/how-to/service-mesh.md
Normal file
246
docs/how-to/service-mesh.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# Kata Containers and service mesh for Kubernetes
|
||||
|
||||
* [Assumptions](#assumptions)
|
||||
* [How they work](#how-they-work)
|
||||
* [Prerequisites](#prerequisites)
|
||||
* [Kata and Kubernetes](#kata-and-kubernetes)
|
||||
* [Restrictions](#restrictions)
|
||||
* [Install and deploy your service mesh](#install-and-deploy-your-service-mesh)
|
||||
* [Service Mesh Istio](#service-mesh-istio)
|
||||
* [Service Mesh Linkerd](#service-mesh-linkerd)
|
||||
* [Inject your services with sidecars](#inject-your-services-with-sidecars)
|
||||
* [Sidecar Istio](#sidecar-istio)
|
||||
* [Sidecar Linkerd](#sidecar-linkerd)
|
||||
* [Run your services with Kata](#run-your-services-with-kata)
|
||||
* [Lower privileges](#lower-privileges)
|
||||
* [Add annotations](#add-annotations)
|
||||
* [Deploy](#deploy)
|
||||
|
||||
A service mesh is a way to monitor and control the traffic between
|
||||
micro-services running in your Kubernetes cluster. It is a powerful
|
||||
tool that you might want to use in combination with the security
|
||||
brought by Kata Containers.
|
||||
|
||||
## Assumptions
|
||||
|
||||
You are expected to be familiar with concepts such as __pods__,
|
||||
__containers__, __control plane__, __data plane__, and __sidecar__.
|
||||
|
||||
## How they work
|
||||
|
||||
Istio and Linkerd both rely on the same model, where they run controller
|
||||
applications in the control plane, and inject a proxy as a sidecar inside
|
||||
the pod running the service. The proxy registers in the control plane as
|
||||
a first step, and it constantly sends different sorts of information about
|
||||
the service running inside the pod. That information comes from the
|
||||
filtering performed when receiving all the traffic initially intended for
|
||||
the service. That is how the interaction between the control plane and the
|
||||
proxy allows the user to apply load balancing and authentication rules to
|
||||
the incoming and outgoing traffic, inside the cluster, and between multiple
|
||||
micro-services.
|
||||
|
||||
This cannot happen without a good number of `iptables` rules ensuring
the packets reach the proxy instead of the expected service. Rules are
set up through an __init__ container because they have to be there as soon
as the proxy starts.
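For illustration only, the rules installed by such an init container boil down to NAT redirects of the pod's traffic towards the proxy port; the port below is the one Istio's `envoy` sidecar listens on, and is only an assumption for other meshes:

```bash
# Illustrative only: send all inbound TCP traffic to the sidecar proxy.
$ sudo iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15001
```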
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Kata and Kubernetes
|
||||
|
||||
Follow the [instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md)
|
||||
to get Kata Containers properly installed and configured with Kubernetes.
|
||||
You can choose between CRI-O and CRI-containerd, both are supported
|
||||
through this document.
|
||||
|
||||
For both cases, select the workloads as _trusted_ by default. This way,
|
||||
your cluster and your service mesh run with `runc`, and only the containers
|
||||
you choose to annotate run with Kata Containers.
|
||||
|
||||
### Restrictions
|
||||
|
||||
As documented [here](https://github.com/linkerd/linkerd2/issues/982),
|
||||
a kernel version between 4.14.22 and 4.14.40 causes a deadlock when
|
||||
`getsockopt()` gets called with the `SO_ORIGINAL_DST` option. Unfortunately,
|
||||
both service meshes use this system call with this same option from the
|
||||
proxy container running inside the VM. This means that you cannot run
|
||||
this kernel version range as the guest kernel for Kata if you want your
|
||||
service mesh to work.
|
||||
|
||||
As mentioned when explaining the basic functioning of those service meshes,
`iptables` is heavily used, and it needs to be properly enabled through
the guest kernel config. If it is not properly enabled, the init container
is not able to set up the rules properly.
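A quick, hedged way to check a guest kernel configuration for the main netfilter options the init container relies on (the config path is an example; use the config your Kata guest kernel was built from):

```bash
$ grep -E "CONFIG_NETFILTER=|CONFIG_IP_NF_IPTABLES=|CONFIG_IP_NF_NAT=" /path/to/guest-kernel/.config
```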
|
||||
|
||||
## Install and deploy your service mesh
|
||||
|
||||
### Service Mesh Istio
|
||||
|
||||
As a reference, you can follow Istio [instructions](https://istio.io/docs/setup/kubernetes/quick-start/#download-and-prepare-for-the-installation).
|
||||
|
||||
The following is a summary of what you need to install Istio on your system:
|
||||
```
|
||||
$ curl -L https://git.io/getLatestIstio | sh -
|
||||
$ cd istio-*
|
||||
$ export PATH=$PWD/bin:$PATH
|
||||
```
|
||||
|
||||
Now deploy Istio in the control plane of your cluster with the following:
|
||||
```
|
||||
$ kubectl apply -f install/kubernetes/istio-demo.yaml
|
||||
```
|
||||
|
||||
To verify that the control plane is properly deployed, you can use both of
|
||||
the following commands:
|
||||
```
|
||||
$ kubectl get svc -n istio-system
|
||||
$ kubectl get pods -n istio-system -o wide
|
||||
```
|
||||
|
||||
### Service Mesh Linkerd
|
||||
|
||||
As a reference, follow the Linkerd [instructions](https://linkerd.io/2/getting-started/index.html).
|
||||
|
||||
The following is a summary of what you need to install Linkerd on your system:
|
||||
```
|
||||
$ curl https://run.linkerd.io/install | sh
|
||||
$ export PATH=$PATH:$HOME/.linkerd/bin
|
||||
```
|
||||
|
||||
Now deploy Linkerd in the control plane of your cluster with the following:
|
||||
```
|
||||
$ linkerd install | kubectl apply -f -
|
||||
```
|
||||
|
||||
To verify that the control plane is properly deployed, you can use both of
|
||||
the following commands:
|
||||
```
|
||||
$ kubectl get svc -n linkerd
|
||||
$ kubectl get pods -n linkerd -o wide
|
||||
```
|
||||
|
||||
## Inject your services with sidecars
|
||||
|
||||
Once the control plane is running, you need a deployment to define a few
|
||||
services that rely on each other. Then, you inject the YAML file with the
|
||||
sidecar proxy using the tools provided by each service mesh.
|
||||
|
||||
If you do not have such a deployment ready, refer to the samples provided
|
||||
by each project.
|
||||
|
||||
### Sidecar Istio
|
||||
|
||||
Istio provides a [`bookinfo`](https://istio.io/docs/examples/bookinfo/)
|
||||
sample, which you can rely on to inject their `envoy` proxy as a
|
||||
sidecar.
|
||||
|
||||
You need to use their tool called `istioctl kube-inject` to inject
|
||||
your YAML file. We use their `bookinfo` sample as an example:
|
||||
```
|
||||
$ istioctl kube-inject -f samples/bookinfo/kube/bookinfo.yaml -o bookinfo-injected.yaml
|
||||
```
|
||||
|
||||
### Sidecar Linkerd
|
||||
|
||||
Linkerd provides an [`emojivoto`](https://linkerd.io/2/getting-started/index.html)
|
||||
sample, which you can rely on to inject their `linkerd` proxy as a
|
||||
sidecar.
|
||||
|
||||
You need to use their tool called `linkerd inject` to inject your YAML
|
||||
file. We use their `emojivoto` sample as an example:
|
||||
```
|
||||
$ wget https://raw.githubusercontent.com/runconduit/conduit-examples/master/emojivoto/emojivoto.yml
|
||||
$ linkerd inject emojivoto.yml > emojivoto-injected.yaml
|
||||
```
|
||||
|
||||
## Run your services with Kata
|
||||
|
||||
Now that your service deployment is injected with the appropriate sidecar
|
||||
containers, manually edit your deployment to make it work with Kata.
|
||||
|
||||
### Lower privileges
|
||||
|
||||
In Kubernetes, the __init__ container is often `privileged` as it needs to
set up the environment, which often requires some root privileges. In the case
of those service meshes, all they need is the `NET_ADMIN` capability to
modify the underlying network rules. Linkerd, by default, does not use a
`privileged` container, but Istio does.
|
||||
|
||||
For this reason, if you use Istio, you need to switch all
containers with `privileged: true` to `privileged: false`.
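One way to do this on the injected Istio sample from the previous section:

```bash
$ sed -i 's/privileged: true/privileged: false/g' bookinfo-injected.yaml
```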
|
||||
|
||||
### Add annotations
|
||||
|
||||
There is no difference between Istio and Linkerd in this section. It is
|
||||
about which CRI implementation you use.
|
||||
|
||||
For both CRI-O and CRI-containerd, you have to add an annotation indicating
|
||||
the workload for this deployment is not _trusted_, which will trigger
|
||||
`kata-runtime` to be called instead of `runc`.
|
||||
|
||||
__CRI-O:__
|
||||
|
||||
Add the following annotation for CRI-O
|
||||
```yaml
|
||||
io.kubernetes.cri-o.TrustedSandbox: "false"
|
||||
```
|
||||
The following is an example of what your YAML can look like:
|
||||
|
||||
```yaml
|
||||
...
|
||||
apiVersion: extensions/v1beta1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
creationTimestamp: null
|
||||
name: details-v1
|
||||
spec:
|
||||
replicas: 1
|
||||
strategy: {}
|
||||
template:
|
||||
metadata:
|
||||
annotations:
|
||||
io.kubernetes.cri-o.TrustedSandbox: "false"
|
||||
sidecar.istio.io/status: '{"version":"55c9e544b52e1d4e45d18a58d0b34ba4b72531e45fb6d1572c77191422556ffc","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
|
||||
creationTimestamp: null
|
||||
labels:
|
||||
app: details
|
||||
version: v1
|
||||
...
|
||||
```
|
||||
|
||||
__CRI-containerd:__
|
||||
|
||||
Add the following annotation for CRI-containerd
|
||||
```yaml
|
||||
io.kubernetes.cri.untrusted-workload: "true"
|
||||
```
|
||||
The following is an example of what your YAML can look like:

```yaml
...
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: details-v1
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      annotations:
        io.kubernetes.cri.untrusted-workload: "true"
        sidecar.istio.io/status: '{"version":"55c9e544b52e1d4e45d18a58d0b34ba4b72531e45fb6d1572c77191422556ffc","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
      creationTimestamp: null
      labels:
        app: details
        version: v1
...
```

### Deploy

Deploy your application by using the following:
```
$ kubectl apply -f myapp-injected.yaml
```
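
To confirm that the pods actually run inside Kata Containers, one quick check
(assuming QEMU as the hypervisor and shell access to the worker node) is to look
for a QEMU process backing the newly created pods:
```
$ kubectl get pods
$ ps aux | grep qemu
```
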
47
docs/how-to/what-is-vm-cache-and-how-do-I-use-it.md
Normal file
@@ -0,0 +1,47 @@
# What Is VMCache and How To Enable It

* [What is VMCache](#what-is-vmcache)
* [How is this different to VM templating](#how-is-this-different-to-vm-templating)
* [How to enable VMCache](#how-to-enable-vmcache)
* [Limitations](#limitations)

### What is VMCache

VMCache is a new function that creates VMs as caches before they are used.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through a Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](https://github.com/kata-containers/runtime/blob/master/protocols/cache/cache.proto).
The VMCache server will create some VMs and cache them in the factory cache.
It will convert a VM to the gRPC format and transport it when requested
by a client.
Factory `grpccache` is the VMCache client. It will request a gRPC-format
VM and convert it back to a VM. If the VMCache function is enabled,
`kata-runtime` will request a VM from factory `grpccache` when it creates
a new sandbox.

### How is this different to VM templating

Both [VM templating](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
When VM templating is enabled, new VMs are created by cloning from a pre-created template VM, and they will share the same initramfs, kernel and agent memory in readonly mode. So it saves a lot of memory if there are many Kata Containers running on the same host.
VMCache is not vulnerable to the [shared memory CVE](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md#what-are-the-cons) because the VMs do not share memory.

### How to enable VMCache

VMCache can be enabled by changing your Kata Containers config file (`/usr/share/defaults/kata-containers/configuration.toml`,
overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
* `vm_cache_number` specifies the number of caches of VMCache:
  * unspecified or `== 0`: VMCache is disabled
  * `> 0`: the cache number is set to the specified value
* `vm_cache_endpoint` specifies the address of the Unix socket (see the example snippet after this list).

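A minimal configuration sketch is shown below. These settings typically live in the
`[factory]` section of `configuration.toml`, and the socket path used here is only
an example:

```toml
[factory]
# Cache three VMs so new sandboxes can be served from the cache.
vm_cache_number = 3
# Unix socket the VMCache server and its clients use to communicate (example path).
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
```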

Then you can start the VMCache server by calling:
```
$ sudo kata-runtime factory init
```
and stop it by pressing `ctrl-c`.

### Limitations
* Cannot work with VM templating.
* Only supports the QEMU hypervisor.
60
docs/how-to/what-is-vm-templating-and-how-do-I-use-it.md
Normal file
@@ -0,0 +1,60 @@
# What Is VM Templating and How To Enable It

### What is VM templating
VM templating is a Kata Containers feature that enables new VM
creation using a cloning technique. When enabled, new VMs are created
by cloning from a pre-created template VM, and they will share the
same initramfs, kernel and agent memory in readonly mode. It is very
much like a process fork done by the kernel, but here we *fork* VMs.

### How is this different from VMCache
Both [VMCache](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
When VMCache is enabled, new VMs are created by the VMCache server. So it is not vulnerable to the shared memory CVE because the VMs do not share memory.
VM templating saves a lot of memory if there are many Kata Containers running on the same host.

### What are the Pros
VM templating helps speed up new container creation and saves a lot
of memory if there are many Kata Containers running on the same host.
If you are running a density workload, or care a lot about container
startup speed, VM templating can be very useful.

In one example, we created 100 Kata Containers each claiming 128MB
of guest memory and ended up saving 9GB of memory in total when VM templating
was enabled, which is about 72% of the total guest memory. See [full results
here](https://github.com/kata-containers/runtime/pull/303#issuecomment-395846767).

In another example, we created ten Kata Containers with containerd shimv2
and calculated the average boot-up speed for each of them. The results
showed that VM templating speeds up Kata Containers creation by as much as
38.68%. See [full results here](https://gist.github.com/bergwolf/06974a3c5981494a40e2c408681c085d).

### What are the Cons
One drawback of VM templating is that it cannot avoid cross-VM side-channel
attacks such as [CVE-2015-2877](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-2877),
which originally targeted the Linux KSM feature.
It was concluded that "Share-until-written approaches for memory conservation among
mutually untrusting tenants are inherently detectable for information disclosure,
and can be classified as potentially misunderstood behaviors rather than vulnerabilities."

**Warning**: If you care about such an attack vector, do not use VM templating or KSM.

### How to enable VM templating
VM templating can be enabled by changing your Kata Containers config file (`/usr/share/defaults/kata-containers/configuration.toml`,
overridden by `/etc/kata-containers/configuration.toml` if provided) such that:

- `qemu-lite` is specified in the `hypervisor.qemu`->`path` section
- `enable_template = true`
- `initrd =` is set
- the `image =` option is commented out or removed (an example excerpt follows this list)

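For illustration, the relevant `configuration.toml` excerpt could look like the sketch
below. The `qemu-lite` and initrd paths are assumptions that depend on your distribution,
and `enable_template` typically lives in the `[factory]` section:

```toml
[hypervisor.qemu]
# Path to a qemu-lite binary (example path, adjust for your distribution).
path = "/usr/bin/qemu-lite-system-x86_64"
# VM templating requires an initrd; the "image =" option is left out on purpose.
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"

[factory]
# Create new VMs by cloning from a pre-created template VM.
enable_template = true
```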

Then you can create a VM template for later use by calling
```
$ sudo kata-runtime factory init
```
and purge it by calling
```
$ sudo kata-runtime factory destroy
```

If you do not want to call `kata-runtime factory init` by hand,
the very first Kata Container you create will automatically create a VM template.