mirror of
https://github.com/kata-containers/kata-containers.git
synced 2025-04-29 04:04:45 +00:00
docs: Write tracing documentation
Add documentation explaining how to trace the runtime and agent. Fixes: #1892. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This commit is contained in:
parent
9db56ffd85
commit
09a5e03f4a
@ -11,6 +11,10 @@ For details of the other Kata Containers repositories, see the
|
||||
|
||||
* [Installation guides](./install/README.md): Install and run Kata Containers with Docker or Kubernetes
|
||||
|
||||
## Tracing
|
||||
|
||||
See the [tracing documentation](tracing.md).
|
||||
|
||||
## More User Guides
|
||||
|
||||
* [Upgrading](Upgrading.md): how to upgrade from [Clear Containers](https://github.com/clearcontainers) and [runV](https://github.com/hyperhq/runv) to [Kata Containers](https://github.com/kata-containers) and how to upgrade an existing Kata Containers system to the latest version.
|
||||
|
214
docs/tracing.md
Normal file
214
docs/tracing.md
Normal file
@ -0,0 +1,214 @@
|
||||
# Overview
|
||||
|
||||
This document explains how to trace Kata Containers components.
|
||||
|
||||
# Introduction
|
||||
|
||||
The Kata Containers runtime and agent are able to generate
|
||||
[OpenTelemetry][opentelemetry] trace spans, which allow the administrator to
|
||||
observe what those components are doing and how much time they are spending on
|
||||
each operation.
|
||||
|
||||
# OpenTelemetry summary
|
||||
|
||||
An OpenTelemetry-enabled application creates a number of trace "spans". A span
|
||||
contains the following attributes:
|
||||
|
||||
- A name
|
||||
- A pair of timestamps (recording the start time and end time of some operation)
|
||||
- A reference to the span's parent span
|
||||
|
||||
All spans need to be *finished*, or *completed*, to allow the OpenTelemetry
|
||||
framework to generate the final trace information (by effectively closing the
|
||||
transaction encompassing the initial (root) span and all its children).
|
||||
|
||||
For Kata, the root span represents the total amount of time taken to run a
|
||||
particular component from startup to its shutdown (the "run time").
|
||||
|
||||
# Architecture
|
||||
|
||||
## Runtime tracing architecture
|
||||
|
||||
The runtime, which runs in the host environment, has been modified to
|
||||
optionally generate trace spans which are sent to a trace collector on the
|
||||
host.
|
||||
|
||||
## Agent tracing architecture
|
||||
|
||||
An OpenTelemetry system (such as [Jaeger][jaeger-tracing]) uses a collector to
|
||||
gather up trace spans from the application for viewing and processing. For an
|
||||
application to use the collector, it must run in the same context as
|
||||
the collector.
|
||||
|
||||
This poses a problem for tracing the Kata Containers agent since it does not
|
||||
run in the same context as the collector: it runs inside a virtual machine (VM).
|
||||
|
||||
To allow spans from the agent to be sent to the trace collector, Kata provides
|
||||
a [trace forwarder][trace-forwarder] component. This runs in the same context
|
||||
as the collector (generally on the host system) and listens on a
|
||||
[`VSOCK`][vsock] channel for traces generated by the agent, forwarding them on
|
||||
to the trace collector.
|
||||
|
||||
> **Note:**
|
||||
>
|
||||
> This design supports agent tracing without having to make changes to the
|
||||
> image, but also means that [custom images][osbuilder] can also benefit from
|
||||
> agent tracing.
|
||||
|
||||
The following diagram summarises the architecture used to trace the Kata
|
||||
Containers agent:
|
||||
|
||||
```
|
||||
+--------------------------------------------+
|
||||
| Host |
|
||||
| |
|
||||
| +---------------+ |
|
||||
| | OpenTelemetry | |
|
||||
| | Trace | |
|
||||
| | Collector | |
|
||||
| +---------------+ |
|
||||
| ^ +---------------+ |
|
||||
| | spans | Kata VM | |
|
||||
| +-----+-----+ | | |
|
||||
| | Kata | spans o +-------+ | |
|
||||
| | Trace |<-----------------| Kata | | |
|
||||
| | Forwarder | VSOCK o | Agent | | |
|
||||
| +-----------+ Channel | +-------+ | |
|
||||
| +---------------+ |
|
||||
+--------------------------------------------+
|
||||
```
|
||||
|
||||
# Agent tracing prerequisites
|
||||
|
||||
- You must have a trace collector running.
|
||||
|
||||
Although the collector normally runs on the host, it can also be run from
|
||||
inside a Docker image configured to expose the appropriate host ports to the
|
||||
collector.
|
||||
|
||||
The [Jaeger "all-in-one" Docker image][jaeger-all-in-one] method
|
||||
is the quickest and simplest way to run the collector for testing.
|
||||
|
||||
- If you wish to trace the agent, you must start the
|
||||
[trace forwarder][trace-forwarder].
|
||||
|
||||
> **Notes:**
|
||||
>
|
||||
> - If agent tracing is enabled but the forwarder is not running,
|
||||
> the agent will log an error (signalling that it cannot generate trace
|
||||
> spans), but continue to work as normal.
|
||||
>
|
||||
> - The trace forwarder requires a trace collector (such as Jaeger) to be
|
||||
> running before it is started. If a collector is not running, the trace
|
||||
> forwarder will exit with an error.
|
||||
|
||||
# Enable tracing
|
||||
|
||||
By default, tracing is disabled for all components. To enable _any_ form of
|
||||
tracing an `enable_tracing` option must be enabled for at least one component.
|
||||
|
||||
> **Note:**
|
||||
>
|
||||
> Enabling this option will only allow tracing for subsequently
|
||||
> started containers.
|
||||
|
||||
## Enable runtime tracing
|
||||
|
||||
To enable runtime tracing, set the tracing option as shown:
|
||||
|
||||
```toml
|
||||
[runtime]
|
||||
enable_tracing = true
|
||||
```
|
||||
|
||||
## Enable agent tracing
|
||||
|
||||
To enable agent tracing, set the tracing option as shown:
|
||||
|
||||
```toml
|
||||
[agent.kata]
|
||||
enable_tracing = true
|
||||
```
|
||||
|
||||
> **Note:**
|
||||
>
|
||||
> If both agent tracing and runtime tracing are enabled, the resulting trace
|
||||
> spans will be "collated": expanding individual runtime spans in the Jaeger
|
||||
> web UI will show the agent trace spans resulting from the runtime
|
||||
> operation.
|
||||
|
||||
# Appendices
|
||||
|
||||
## Agent tracing requirements
|
||||
|
||||
### Host environment
|
||||
|
||||
- The host kernel must support the VSOCK socket type.
|
||||
|
||||
This will be available if the kernel is built with the
|
||||
`CONFIG_VHOST_VSOCK` configuration option.
|
||||
|
||||
- The VSOCK kernel module must be loaded:
|
||||
|
||||
```
|
||||
$ sudo modprobe vhost_vsock
|
||||
```
|
||||
|
||||
### Guest environment
|
||||
|
||||
- The guest kernel must support the VSOCK socket type:
|
||||
|
||||
This will be available if the kernel is built with the
|
||||
`CONFIG_VIRTIO_VSOCKETS` configuration option.
|
||||
|
||||
> **Note:** The default Kata Containers guest kernel provides this feature.
|
||||
|
||||
## Agent tracing limitations
|
||||
|
||||
- Agent tracing is only "completed" when the workload and the Kata agent
|
||||
process have exited.
|
||||
|
||||
Although trace information *can* be inspected before the workload and agent
|
||||
have exited, it is incomplete. This is shown as `<trace-without-root-span>`
|
||||
in the Jaeger web UI.
|
||||
|
||||
If the workload is still running, the trace transaction -- which spans the entire
|
||||
runtime of the Kata agent -- will not have been completed. To view the complete
|
||||
trace details, wait for the workload to end, or stop the container.
|
||||
|
||||
## Performance impact
|
||||
|
||||
[OpenTelemetry][opentelemetry] is designed for high performance. It combines
|
||||
the best of two previous generation projects (OpenTracing and OpenCensus) and
|
||||
uses a very efficient mechanism to capture trace spans. Further, the trace
|
||||
points inserted into the agent are generated dynamically at compile time. This
|
||||
is advantageous since new versions of the agent will automatically benefit
|
||||
from improvements in the tracing infrastructure. Overall, the impact of
|
||||
enabling runtime and agent tracing should be extremely low.
|
||||
|
||||
## Agent shutdown behaviour
|
||||
|
||||
In normal operation, the Kata runtime manages the VM shutdown and performs
|
||||
certain optimisations to speed up this process. However, if agent tracing is
|
||||
enabled, the agent itself is responsible for shutting down the VM. This it to
|
||||
ensure all agent trace transactions are completed. This means there will be a
|
||||
small performance impact for container shutdown when agent tracing is enabled
|
||||
as the runtime must wait for the VM to shutdown fully.
|
||||
|
||||
## Set up a tracing development environment
|
||||
|
||||
If you want to debug, further develop, or test tracing,
|
||||
[enabling full debug][enable-full-debug]
|
||||
is highly recommended. For working with the agent, you may also wish to
|
||||
[enable a debug console][setup-debug-console]
|
||||
to allow you to access the VM environment.
|
||||
|
||||
[agent-ctl]: https://github.com/kata-containers/kata-containers/blob/main/tools/agent-ctl
|
||||
[enable-full-debug]: https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#enable-full-debug
|
||||
[jaeger-all-in-one]: https://www.jaegertracing.io/docs/getting-started/
|
||||
[jaeger-tracing]: https://www.jaegertracing.io
|
||||
[opentelemetry]: https://opentelemetry.io
|
||||
[osbuilder]: https://github.com/kata-containers/kata-containers/blob/main/tools/osbuilder
|
||||
[setup-debug-console]: https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#set-up-a-debug-console
|
||||
[trace-forwarder]: https://github.com/kata-containers/kata-containers/blob/main/src/trace-forwarder
|
||||
[vsock]: https://wiki.qemu.org/Features/VirtioVsock
|
@ -3,13 +3,24 @@
|
||||
## Overview
|
||||
|
||||
The Kata Containers trace forwarder, `kata-trace-forwarder`, is a component
|
||||
running on the host system which is used to support tracing the agent process
|
||||
which runs inside the virtual machine.
|
||||
running on the host system which is used to support
|
||||
[tracing the agent process][agent-tracing], which runs inside the Kata
|
||||
Containers virtual machine (VM).
|
||||
|
||||
The trace forwarder, which must be started before the agent, listens over
|
||||
VSOCK for trace data sent by the agent running inside the virtual machine. The
|
||||
trace spans are exported to an OpenTelemetry collector (such as Jaeger)
|
||||
running by default on the host.
|
||||
The trace forwarder, which must be started before the container, listens over
|
||||
[`VSOCK`][vsock] for trace data sent by the agent running inside the VM. The
|
||||
trace spans are exported to an [OpenTelemetry][opentelemetry] collector (such
|
||||
as [Jaeger][jaeger-tracing]) running by default on the host.
|
||||
|
||||
> **Notes:**
|
||||
>
|
||||
> - If agent tracing is enabled but the forwarder is not running,
|
||||
> the agent will log an error (signalling that it cannot generate trace
|
||||
> spans), but continue to work as normal.
|
||||
>
|
||||
> - The trace forwarder requires a trace collector (such as Jaeger) to be
|
||||
> running before it is started. If a collector is not running, the trace
|
||||
> forwarder will exit with an error.
|
||||
|
||||
## Quick start
|
||||
|
||||
@ -133,8 +144,13 @@ You can now proceed as normal to create the "foo" Kata container.
|
||||
|
||||
## Full details
|
||||
|
||||
Run:
|
||||
For further information on how to run the trace forwarder, run:
|
||||
|
||||
```bash
|
||||
$ cargo run -- --help
|
||||
```
|
||||
|
||||
[agent-tracing]: ../../docs/tracing.md
|
||||
[jaeger-tracing]: https://www.jaegertracing.io
|
||||
[opentelemetry]: https://opentelemetry.io
|
||||
[vsock]: https://wiki.qemu.org/Features/VirtioVsock
|
||||
|
Loading…
Reference in New Issue
Block a user