kata-monitor: add a README file

Fixes: #3704

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Francesco Giudici 2022-04-13 17:46:48 +02:00
parent 4443bb68a4
commit 7b2ff02647
3 changed files with 70 additions and 0 deletions

@@ -51,6 +51,7 @@ The `kata-monitor` management agent should be started on each node where the Kat
> **Note**: a *node* running Kata containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.
- Aggregate sandbox metrics running on the node, adding the `sandbox_id` label to them.
- Attach the additional `cri_uid`, `cri_name` and `cri_namespace` labels to the sandbox metrics, tracking the `uid`, `name` and `namespace` Kubernetes pod metadata.
- Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This reduces the number of targets in Prometheus and avoids exposing the shim's metrics by `ip:port`.
Only one `kata-monitor` process runs in each node.

@@ -10,6 +10,7 @@ This repository contains the following components:
|-|-|
| `containerd-shim-kata-v2` | The [shimv2 runtime](../../docs/design/architecture/README.md#runtime) |
| `kata-runtime` | [utility program](../../docs/design/architecture/README.md#utility-program) |
| `kata-monitor` | [metrics collector daemon](cmd/kata-monitor/README.md) |
For details of the other Kata Containers repositories, see the
[repository summary](https://github.com/kata-containers/kata-containers).

@@ -0,0 +1,68 @@
# Kata monitor
## Overview
`kata-monitor` is a daemon that collects and exposes metrics for all the Kata Containers workloads running on the same host.
Once started, it detects all the running Kata Containers runtimes (`containerd-shim-kata-v2`) in the system and exposes a few HTTP endpoints to allow the retrieval of the available data.
The main endpoint is `/metrics`, which aggregates metrics from all the Kata workloads.
Available metrics include:
* Kata runtime metrics
* Kata agent metrics
* Kata guest OS metrics
* Hypervisor metrics
* Firecracker metrics
* Kata monitor metrics
All the provided metrics are in Prometheus format. `kata-monitor` can run as a standalone daemon on any host running Kata Containers workloads, and it can also be used to retrieve profiling data from the running Kata runtimes. Its main expected usage, however, is as a DaemonSet on a Kubernetes cluster, where Prometheus scrapes the metrics from the `kata-monitor` endpoints.
For more information on the Kata Containers metrics architecture and a detailed list of the metrics provided by Kata monitor, see the [Kata 2.0 Metrics Design](../../../../docs/design/kata-2-0-metrics.md) document.
## Usage
Each `kata-monitor` instance detects and monitors the Kata Containers workloads running on the same node.
### Kata monitor arguments
The `kata-monitor` binary accepts the following arguments:
* `--listen-address` _IP:PORT_
* `--runtime-endpoint` _PATH_TO_THE_CONTAINER_MANAGER_CRI_INTERFACE_
* `--log-level` _[ trace | debug | info | warn | error | fatal | panic ]_
The **listen-address** specifies the IP and TCP port where the kata-monitor HTTP endpoints will be exposed. It defaults to `127.0.0.1:8090`.
The **runtime-endpoint** is the path to the CRI interface of a CRI-compliant container manager. It is used to retrieve the CRI `PodSandboxMetadata` (`uid`, `name` and `namespace`), which is attached to the Kata metrics through the labels `cri_uid`, `cri_name` and `cri_namespace`. It defaults to the containerd socket: `/run/containerd/containerd.sock`.
The **log-level** lets you choose how verbose the logs should be. The default is `info`.
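As a minimal sketch, the three arguments can be assembled by a small wrapper that falls back to the defaults listed above. The function name `kata_monitor_args` and the variables `LISTEN_ADDRESS`, `RUNTIME_ENDPOINT` and `LOG_LEVEL` are hypothetical helpers introduced for illustration; they are not part of `kata-monitor`:

```bash
# Hypothetical wrapper: these variable names are illustrative only.
# Each one falls back to the kata-monitor default documented above.
LISTEN_ADDRESS="${LISTEN_ADDRESS:-127.0.0.1:8090}"
RUNTIME_ENDPOINT="${RUNTIME_ENDPOINT:-/run/containerd/containerd.sock}"
LOG_LEVEL="${LOG_LEVEL:-info}"

# Print the kata-monitor argument list on a single line.
kata_monitor_args() {
    printf '%s %s %s %s %s %s\n' \
        --listen-address "$LISTEN_ADDRESS" \
        --runtime-endpoint "$RUNTIME_ENDPOINT" \
        --log-level "$LOG_LEVEL"
}
```

Assuming `kata-monitor` is in `PATH` and none of the values contain spaces, the daemon could then be started with `kata-monitor $(kata_monitor_args)`.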
### Kata monitor HTTP endpoints
`kata-monitor` exposes the following endpoints:
* `/metrics` : Get Kata sandboxes metrics.
* `/sandboxes` : List all the Kata sandboxes running on the host.
* `/agent-url` : Get the agent URL of a Kata sandbox.
* `/debug/vars` : Internal data of the Kata runtime shim.
* `/debug/pprof/` : Golang profiling data of the Kata runtime shim: index page.
* `/debug/pprof/cmdline` : Golang profiling data of the Kata runtime shim: `cmdline` endpoint.
* `/debug/pprof/profile` : Golang profiling data of the Kata runtime shim: `profile` endpoint (CPU profiling).
* `/debug/pprof/symbol` : Golang profiling data of the Kata runtime shim: `symbol` endpoint.
* `/debug/pprof/trace` : Golang profiling data of the Kata runtime shim: `trace` endpoint.
**NOTE: The debug endpoints are available only if the [Kata Containers configuration file](https://github.com/kata-containers/kata-containers/blob/9d5b03a1b70bbd175237ec4b9f821d6ccee0a1f6/src/runtime/config/configuration-qemu.toml.in#L590-L592) includes** `enable_pprof = true` **in the** `[runtime]` **section**.
The `/sandboxes` endpoint lists the _sandbox ID_ of all the detected Kata runtimes. If accessed via a web browser, it provides HTML links to the endpoints available for each sandbox.
To retrieve data for a specific Kata workload, pass the _sandbox ID_ in the query string using the `sandbox` key. The `/agent-url` endpoint and all the `/debug/*` endpoints require the `sandbox` key to be specified in the query string.
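The query string convention can be sketched with two small shell helpers. This is only an illustration: `KATA_MONITOR_ADDR`, `sandbox_url` and `list_agent_urls` are hypothetical names, not part of `kata-monitor`, and the sketch assumes that `/sandboxes` returns one sandbox ID per line when queried with `curl`:

```bash
# Address of kata-monitor; 127.0.0.1:8090 is the default listen address.
# KATA_MONITOR_ADDR is a hypothetical variable used only by this sketch.
KATA_MONITOR_ADDR="${KATA_MONITOR_ADDR:-127.0.0.1:8090}"

# Build the URL of a per-sandbox endpoint.
#   $1: endpoint path, e.g. /agent-url or /debug/pprof/trace
#   $2: sandbox ID, passed in the query string through the "sandbox" key
sandbox_url() {
    printf 'http://%s%s?sandbox=%s\n' "$KATA_MONITOR_ADDR" "$1" "$2"
}

# Print "<sandbox ID> <agent URL>" for every sandbox reported by /sandboxes.
# Needs curl and a running kata-monitor, so it is defined but not called here.
list_agent_urls() {
    curl -s "http://$KATA_MONITOR_ADDR/sandboxes" | while read -r id; do
        if [ -n "$id" ]; then
            printf '%s %s\n' "$id" "$(curl -s "$(sandbox_url /agent-url "$id")")"
        fi
    done
}
```

Quoting the generated URL when passing it to `curl` keeps the shell from treating the `?` as a glob character.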
<br>
#### Examples
Retrieve the IDs of the available sandboxes:
```bash
$ curl 127.0.0.1:8090/sandboxes
```
output:
```
6fcf0a90b01e90d8747177aa466c3462d02e02a878bc393649df83d4c314af0c
df96b24bd49ec437c872c1a758edc084121d607ce1242ff5d2263a0e1b693343
```
Retrieve the `agent-url` of the sandbox with ID _df96b24bd49ec437c872c1a758edc084121d607ce1242ff5d2263a0e1b693343_:
```bash
$ curl "127.0.0.1:8090/agent-url?sandbox=df96b24bd49ec437c872c1a758edc084121d607ce1242ff5d2263a0e1b693343"
```
output:
```
vsock://830455376:1024
```
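Because `/metrics` returns Prometheus text, standard text tools can select the samples that belong to a single pod or namespace through the `cri_*` labels. The helper below is a rough sketch: `metrics_with_label` is a hypothetical name, and it performs a plain substring match rather than parsing the Prometheus format:

```bash
# Keep only the Prometheus samples carrying label $1 with value $2.
# Reads Prometheus text format on stdin and drops comment lines
# (# HELP, # TYPE). This is a plain substring match, so a label
# called "name" would also match "cri_name".
metrics_with_label() {
    grep -v '^#' | grep -F "$1=\"$2\""
}
```

With a running `kata-monitor`, one possible invocation is `curl -s 127.0.0.1:8090/metrics | metrics_with_label cri_namespace default`, where `default` is just an example namespace.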