# Kata Tracing proposals

## Overview

This document summarises a set of proposals triggered by the
[tracing documentation PR][tracing-doc-pr].

## Required context

This section explains some terminology required to understand the proposals.
Further details can be found in the
[tracing documentation PR][tracing-doc-pr].

### Agent trace mode terminology

| Trace mode | Description | Use-case |
|-|-|-|
| Static | Trace agent from startup to shutdown | Entire lifespan |
| Dynamic | Toggle tracing on/off as desired | On-demand "snapshot" |

### Agent trace type terminology

| Trace type | Description | Use-case |
|-|-|-|
| isolated | traces all relate to single component | Observing lifespan |
| collated | traces "grouped" (runtime+agent) | Understanding component interaction |

### Container lifespan

| Lifespan | trace mode | trace type |
|-|-|-|
| short-lived | static | collated if possible, else isolated? |
| long-running | dynamic | collated? (to see interactions) |

## Original plan for agent

- Implement all trace types and trace modes for the agent.

- Why?
  - Maximum flexibility.

    > **Counterargument:**
    >
    > Due to the intrusive nature of adding tracing, we have
    > learnt that landing small incremental changes is simpler and quicker!

  - Compatibility with [Kata 1.x tracing][kata-1x-tracing].

    > **Counterargument:**
    >
    > Agent tracing in Kata 1.x was extremely awkward to set up (to the extent
    > that it's unclear how many users actually used it!)
    >
    > This point, coupled with the new architecture for Kata 2.x, suggests
    > that we may not need to supply the same set of tracing features (in fact
    > they may not make sense).

## Agent tracing proposals

### Agent tracing proposal 1: Don't implement dynamic trace mode

- All tracing will be static.

- Why?
  - Because dynamic tracing will always be "partial".

    > In fact, not only would it be just a "snapshot" of activity, it may not
    > even be possible to create a complete "trace transaction". If this is
    > true, the trace output would be partial and would appear "unstructured".

### Agent tracing proposal 2: Simplify handling of trace type

- Agent tracing will be "isolated" by default.
- Agent tracing will be "collated" if runtime tracing is also enabled.

- Why?
  - Offers a graceful fallback for agent tracing if runtime tracing is disabled.
  - Simpler code!

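The rule in proposal 2 can be read as a small decision function. The following Python sketch is illustrative only: the `TraceType` enum and the `agent_trace_type()` helper are hypothetical names that do not exist in the Kata code base, and the real agent is not written in Python.

```python
from enum import Enum
from typing import Optional


class TraceType(Enum):
    """The two agent trace types described in this document."""
    ISOLATED = "isolated"
    COLLATED = "collated"


def agent_trace_type(agent_tracing: bool, runtime_tracing: bool) -> Optional[TraceType]:
    """Pick the agent trace type (hypothetical helper, for illustration).

    Returns None when agent tracing is disabled. Otherwise the agent is
    "collated" only if the runtime is traced too, and falls back to
    "isolated" when runtime tracing is disabled.
    """
    if not agent_tracing:
        return None
    return TraceType.COLLATED if runtime_tracing else TraceType.ISOLATED
```

Under these assumptions, enabling agent tracing alone yields `TraceType.ISOLATED`, and enabling both agent and runtime tracing yields `TraceType.COLLATED`, so no separate trace type setting is needed.
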
## Questions to ask yourself (part 1)

- Are your containers long-running or short-lived?

- Would you ever need to turn on tracing "briefly"?
  - If "yes", is a "partial trace" useful or useless?

    > Likely to be considered useless as it is a partial snapshot.
    > Alternative tracing methods may be more appropriate than dynamic
    > OpenTelemetry tracing.

## Questions to ask yourself (part 2)

- Are you happy to stop a container to enable tracing?
  If "no", dynamic tracing may be required.

- Would you ever want to trace the agent and the runtime "in isolation" at the
  same time?
  - If "yes", we need to fully implement `trace_type=isolated`.

    > This seems unlikely though.

## Trace collection

The second set of proposals affects the way traces are collected.

### Motivation

Currently:

- The runtime sends trace spans to Jaeger directly.
- The agent sends trace spans to the [`trace-forwarder`][trace-forwarder] component.
- The trace forwarder sends trace spans to Jaeger.

Kata agent tracing overview:

```
+-------------------------------------------+
| Host                                      |
|                                           |
| +-----------+                             |
| | Trace     |                             |
| | Collector |                             |
| +-----+-----+                             |
|       ^                  +--------------+ |
|       | spans            | Kata VM      | |
| +-----+-----+            |              | |
| | Kata      |    spans   |     +-----+  | |
| | Trace     |<-----------------|Kata |  | |
| | Forwarder |    VSOCK   |     |Agent|  | |
| +-----------+    Channel |     +-----+  | |
|                          +--------------+ |
+-------------------------------------------+
```

Currently:

- If agent tracing is enabled but the trace forwarder is not running,
  the agent will error.

- If the trace forwarder is started but Jaeger is not running,
  the trace forwarder will error.

### Goals

- The runtime and agent should:
  - Use the same trace collection implementation.
  - Share as many configuration items as possible.

- Kata should support more trace collection software or `SaaS`
  (for example `Zipkin`, `datadog`).

- Trace collection should not block normal runtime/agent operations
  (for example if `vsock-exporter`/Jaeger is not running, Kata Containers should work normally).

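One way to meet the non-blocking goal is to buffer spans in a bounded queue and drop them when the queue is full or the collector cannot be reached, so that tracing never stalls or fails the caller. The Python sketch below illustrates that idea only. `SpanBuffer`, `submit()`, `drain()` and the `send` callback are hypothetical names, and the assumption that an unreachable collector surfaces as an `OSError` is ours, not something the Kata runtime or agent defines.

```python
import queue
from typing import Any, Callable


class SpanBuffer:
    """Bounded, non-blocking span buffer (hypothetical, for illustration)."""

    def __init__(self, max_pending: int = 1024) -> None:
        self._pending: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # spans discarded instead of blocking or failing

    def submit(self, span: Any) -> bool:
        """Queue a span without ever blocking the caller."""
        try:
            self._pending.put_nowait(span)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, send: Callable[[Any], None]) -> int:
        """Send all queued spans; count a span as dropped if sending fails.

        Assumes that `send` raises OSError (for example
        ConnectionRefusedError) when the collector is not running.
        """
        sent = 0
        while True:
            try:
                span = self._pending.get_nowait()
            except queue.Empty:
                return sent
            try:
                send(span)
                sent += 1
            except OSError:
                self.dropped += 1
```

In this sketch the normal code path only ever calls `submit()`, which returns immediately. A background task would call `drain()` periodically, so a missing collector costs some lost spans rather than an error in the runtime or agent.
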
### Trace collection proposals

#### Trace collection proposal 1: Send all spans to the trace forwarder as a span proxy

The Kata runtime and agent both send spans to the trace forwarder. The trace
forwarder, acting as a tracing proxy, sends all spans to a tracing back-end,
such as Jaeger or `datadog`.

**Pros:**

- Runtime/agent will be simple.
- The trace collection target could be updated while Kata Containers are running.

**Cons:**

- Requires the trace forwarder component to be running (an additional operational burden).

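The proxy arrangement in proposal 1 can be sketched as a single object that receives spans from both components and hands them to a replaceable back-end. This is a toy Python model under stated assumptions, not the trace forwarder's actual design: `SpanProxy`, `set_backend()`, `forward()` and the `(source, span)` callback shape are all hypothetical.

```python
import threading
from typing import Callable, Dict

# A back-end is modelled as any callable taking the span source and the span.
Backend = Callable[[str, Dict], None]


class SpanProxy:
    """Toy stand-in for a trace forwarder acting as a span proxy."""

    def __init__(self, backend: Backend) -> None:
        self._lock = threading.Lock()
        self._backend = backend

    def set_backend(self, backend: Backend) -> None:
        """Switch the collection target without restarting the senders."""
        with self._lock:
            self._backend = backend

    def forward(self, source: str, span: Dict) -> None:
        """Forward one span from the runtime or agent to the current back-end."""
        with self._lock:
            backend = self._backend
        backend(source, span)
```

Because the runtime and agent only know about the proxy in this model, swapping the back-end with `set_backend()` changes where spans go without touching either component, which is the second advantage listed above. The cost is the one listed under cons: the proxy itself has to be running.
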
#### Trace collection proposal 2: Send spans to collector directly from runtime/agent

The runtime and agent send spans directly to the collector. This proposal
requires the collector to be reachable over the network.

**Pros:**

- No additional trace forwarder component needed.

**Cons:**

- Needs more code/configuration to support all trace collectors.

## Future work

- We could add dynamic and fully isolated tracing at a later stage,
  if required.

## Further details

- See the new [GitHub project](https://github.com/orgs/kata-containers/projects/28).
- [kata-containers-tracing-status](https://gist.github.com/jodh-intel/0ee54d41d2a803ba761e166136b42277) gist.
- [tracing documentation PR][tracing-doc-pr].

## Summary

### Timeline

- 2021-07-01: A summary of the discussion was
  [posted to the mailing list](http://lists.katacontainers.io/pipermail/kata-dev/2021-July/001996.html).
- 2021-06-22: These proposals were
  [discussed in the Kata Architecture Committee meeting](https://etherpad.opendev.org/p/Kata_Containers_2021_Architecture_Committee_Mtgs).
- 2021-06-18: These proposals were
  [announced on the mailing list](http://lists.katacontainers.io/pipermail/kata-dev/2021-June/001980.html).

### Outcome

- Nobody opposed the agent proposals, so they are being implemented.
- The trace collection proposals are still being considered.

[kata-1x-tracing]: https://github.com/kata-containers/agent/blob/master/TRACING.md
[trace-forwarder]: /src/tools/trace-forwarder
[tracing-doc-pr]: https://github.com/kata-containers/kata-containers/pull/1937