mirror of
https://github.com/kata-containers/kata-containers.git
synced 2025-05-09 17:07:33 +00:00
To make the code directory structure more clear: └── src ├── agent ├── libs │ └── logging ├── runtime ├── runtime-rs (to be added) └── tools ├── agent-ctl └── trace-forwarder Fixes: #3204 Signed-off-by: Peng Tao <bergwolf@hyper.sh>
214 lines
7.5 KiB
Markdown
214 lines
7.5 KiB
Markdown
# Overview
|
|
|
|
This document explains how to trace Kata Containers components.
|
|
|
|
# Introduction
|
|
|
|
The Kata Containers runtime and agent are able to generate
|
|
[OpenTelemetry][opentelemetry] trace spans, which allow the administrator to
|
|
observe what those components are doing and how much time they are spending on
|
|
each operation.
|
|
|
|
# OpenTelemetry summary
|
|
|
|
An OpenTelemetry-enabled application creates a number of trace "spans". A span
|
|
contains the following attributes:
|
|
|
|
- A name
|
|
- A pair of timestamps (recording the start time and end time of some operation)
|
|
- A reference to the span's parent span
|
|
|
|
All spans need to be *finished*, or *completed*, to allow the OpenTelemetry
|
|
framework to generate the final trace information (by effectively closing the
|
|
transaction encompassing the initial (root) span and all its children).
|
|
|
|
For Kata, the root span represents the total amount of time taken to run a
|
|
particular component from startup to its shutdown (the "run time").
|
|
|
|
# Architecture
|
|
|
|
## Runtime tracing architecture
|
|
|
|
The runtime, which runs in the host environment, has been modified to
|
|
optionally generate trace spans which are sent to a trace collector on the
|
|
host.
|
|
|
|
## Agent tracing architecture
|
|
|
|
An OpenTelemetry system (such as [Jaeger][jaeger-tracing]) uses a collector to
|
|
gather up trace spans from the application for viewing and processing. For an
|
|
application to use the collector, it must run in the same context as
|
|
the collector.
|
|
|
|
This poses a problem for tracing the Kata Containers agent since it does not
|
|
run in the same context as the collector: it runs inside a virtual machine (VM).
|
|
|
|
To allow spans from the agent to be sent to the trace collector, Kata provides
|
|
a [trace forwarder][trace-forwarder] component. This runs in the same context
|
|
as the collector (generally on the host system) and listens on a
|
|
[`VSOCK`][vsock] channel for traces generated by the agent, forwarding them on
|
|
to the trace collector.
|
|
|
|
> **Note:**
|
|
>
|
|
> This design supports agent tracing without having to make changes to the
|
|
> image, but also means that [custom images][osbuilder] can also benefit from
|
|
> agent tracing.
|
|
|
|
The following diagram summarises the architecture used to trace the Kata
|
|
Containers agent:
|
|
|
|
```
|
|
+--------------------------------------------+
|
|
| Host |
|
|
| |
|
|
| +---------------+ |
|
|
| | OpenTelemetry | |
|
|
| | Trace | |
|
|
| | Collector | |
|
|
| +---------------+ |
|
|
| ^ +---------------+ |
|
|
| | spans | Kata VM | |
|
|
| +-----+-----+ | | |
|
|
| | Kata | spans o +-------+ | |
|
|
| | Trace |<-----------------| Kata | | |
|
|
| | Forwarder | VSOCK o | Agent | | |
|
|
| +-----------+ Channel | +-------+ | |
|
|
| +---------------+ |
|
|
+--------------------------------------------+
|
|
```
|
|
|
|
# Agent tracing prerequisites
|
|
|
|
- You must have a trace collector running.
|
|
|
|
Although the collector normally runs on the host, it can also be run from
|
|
inside a Docker image configured to expose the appropriate host ports to the
|
|
collector.
|
|
|
|
The [Jaeger "all-in-one" Docker image][jaeger-all-in-one] method
|
|
is the quickest and simplest way to run the collector for testing.
|
|
|
|
- If you wish to trace the agent, you must start the
|
|
[trace forwarder][trace-forwarder].
|
|
|
|
> **Notes:**
|
|
>
|
|
> - If agent tracing is enabled but the forwarder is not running,
|
|
> the agent will log an error (signalling that it cannot generate trace
|
|
> spans), but continue to work as normal.
|
|
>
|
|
> - The trace forwarder requires a trace collector (such as Jaeger) to be
|
|
> running before it is started. If a collector is not running, the trace
|
|
> forwarder will exit with an error.
|
|
|
|
# Enable tracing
|
|
|
|
By default, tracing is disabled for all components. To enable _any_ form of
|
|
tracing an `enable_tracing` option must be enabled for at least one component.
|
|
|
|
> **Note:**
|
|
>
|
|
> Enabling this option will only allow tracing for subsequently
|
|
> started containers.
|
|
|
|
## Enable runtime tracing
|
|
|
|
To enable runtime tracing, set the tracing option as shown:
|
|
|
|
```toml
|
|
[runtime]
|
|
enable_tracing = true
|
|
```
|
|
|
|
## Enable agent tracing
|
|
|
|
To enable agent tracing, set the tracing option as shown:
|
|
|
|
```toml
|
|
[agent.kata]
|
|
enable_tracing = true
|
|
```
|
|
|
|
> **Note:**
|
|
>
|
|
> If both agent tracing and runtime tracing are enabled, the resulting trace
|
|
> spans will be "collated": expanding individual runtime spans in the Jaeger
|
|
> web UI will show the agent trace spans resulting from the runtime
|
|
> operation.
|
|
|
|
# Appendices
|
|
|
|
## Agent tracing requirements
|
|
|
|
### Host environment
|
|
|
|
- The host kernel must support the VSOCK socket type.
|
|
|
|
This will be available if the kernel is built with the
|
|
`CONFIG_VHOST_VSOCK` configuration option.
|
|
|
|
- The VSOCK kernel module must be loaded:
|
|
|
|
```
|
|
$ sudo modprobe vhost_vsock
|
|
```
|
|
|
|
### Guest environment
|
|
|
|
- The guest kernel must support the VSOCK socket type:
|
|
|
|
This will be available if the kernel is built with the
|
|
`CONFIG_VIRTIO_VSOCKETS` configuration option.
|
|
|
|
> **Note:** The default Kata Containers guest kernel provides this feature.
|
|
|
|
## Agent tracing limitations
|
|
|
|
- Agent tracing is only "completed" when the workload and the Kata agent
|
|
process have exited.
|
|
|
|
Although trace information *can* be inspected before the workload and agent
|
|
have exited, it is incomplete. This is shown as `<trace-without-root-span>`
|
|
in the Jaeger web UI.
|
|
|
|
If the workload is still running, the trace transaction -- which spans the entire
|
|
runtime of the Kata agent -- will not have been completed. To view the complete
|
|
trace details, wait for the workload to end, or stop the container.
|
|
|
|
## Performance impact
|
|
|
|
[OpenTelemetry][opentelemetry] is designed for high performance. It combines
|
|
the best of two previous generation projects (OpenTracing and OpenCensus) and
|
|
uses a very efficient mechanism to capture trace spans. Further, the trace
|
|
points inserted into the agent are generated dynamically at compile time. This
|
|
is advantageous since new versions of the agent will automatically benefit
|
|
from improvements in the tracing infrastructure. Overall, the impact of
|
|
enabling runtime and agent tracing should be extremely low.
|
|
|
|
## Agent shutdown behaviour
|
|
|
|
In normal operation, the Kata runtime manages the VM shutdown and performs
|
|
certain optimisations to speed up this process. However, if agent tracing is
|
|
enabled, the agent itself is responsible for shutting down the VM. This it to
|
|
ensure all agent trace transactions are completed. This means there will be a
|
|
small performance impact for container shutdown when agent tracing is enabled
|
|
as the runtime must wait for the VM to shutdown fully.
|
|
|
|
## Set up a tracing development environment
|
|
|
|
If you want to debug, further develop, or test tracing,
|
|
[enabling full debug][enable-full-debug]
|
|
is highly recommended. For working with the agent, you may also wish to
|
|
[enable a debug console][setup-debug-console]
|
|
to allow you to access the VM environment.
|
|
|
|
[enable-full-debug]: https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#enable-full-debug
|
|
[jaeger-all-in-one]: https://www.jaegertracing.io/docs/getting-started/
|
|
[jaeger-tracing]: https://www.jaegertracing.io
|
|
[opentelemetry]: https://opentelemetry.io
|
|
[osbuilder]: https://github.com/kata-containers/kata-containers/blob/main/tools/osbuilder
|
|
[setup-debug-console]: https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#set-up-a-debug-console
|
|
[trace-forwarder]: /src/tools/trace-forwarder
|
|
[vsock]: https://wiki.qemu.org/Features/VirtioVsock
|