mirror of
https://github.com/kata-containers/kata-containers.git
synced 2025-04-27 11:31:05 +00:00
debugging: adding a script and instructions for debugging the GO shim
Using a debugger with the kata runtime is complicated, but it can be done and can be very useful. This commit provides a helper script that simplifies it, and updates the developer's documentation to explain how to use it. Signed-off-by: Julien Ropé <jrope@redhat.com>
parent
eae429a39b
commit
e7cfc0865a
185
docs/Debug-shim-guide.md
Normal file
@ -0,0 +1,185 @@
# Using a debugger with the runtime

Setting up a debugger for the runtime is pretty complex: the shim is a server
process that is run by the runtime manager (containerd/CRI-O), and controlled by
sending gRPC requests to it.
Starting the shim yourself under a debugger just gives you a process that waits
for commands on its socket; since the runtime manager did not start it, the
manager will not send any requests to it.

A first method is to attach a debugger to the process that was started by the
runtime manager.
If the issue you're trying to debug does not occur during container creation,
this is probably the easiest method.

The other method involves a script that is placed in between the runtime manager
and the actual shim binary. This allows starting the shim with a debugger and
waiting for a client debugger connection before execution, so that the kata
runtime can be debugged from the very beginning.

## Prerequisites

At the time of writing, a debugger was used only with the Go shim, but a similar
process should be doable with runtime-rs. This documentation will be enhanced
with Rust-specific instructions later on.

In order to debug the Go runtime, you need to use the [Delve debugger](https://github.com/go-delve/delve).

You will also need to build the shim binary with debug flags, so that compiler
optimizations and inlining do not hide variables and source lines from the
debugger.
Typically, the flags should be: `-gcflags=all=-N -l`

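As an illustration, the build step could look like the sketch below. It assumes that the runtime sources live in the directory used by the helper script in this commit, that the shim's main package is `./cmd/containerd-shim-kata-v2`, and that the output is named `__debug_bin`; `RUNTIME_DIR` and `build_debug_shim` are hypothetical names. Note the quotes around the flag value: without them, the shell would pass `-l` to `go build` as a separate argument.

```shell
# Assumed location of the runtime sources; adjust to your checkout.
RUNTIME_DIR="${RUNTIME_DIR:-/go/src/github.com/kata-containers/kata-containers/src/runtime}"

# build_debug_shim OUTPUT: build the Go shim with optimizations (-N) and
# inlining (-l) disabled for all packages.
build_debug_shim() {
    # The quotes keep "all=-N -l" together as the single value of -gcflags.
    # The package path below is an assumption about the source layout.
    (cd "$RUNTIME_DIR" && go build -gcflags="all=-N -l" -o "$1" ./cmd/containerd-shim-kata-v2)
}
```

For example, `build_debug_shim "$RUNTIME_DIR/__debug_bin"` would produce a binary at the path that `SHIM_BINARY` points to by default in the helper script.
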
## Attach to the running process

To attach the debugger to the running process, all you need is to let the container
start as usual, then use the following command with `dlv`:

`$ dlv attach [pid of your kata shim]`

If you need to use your debugger remotely, you can use the following on your target
machine:

`$ dlv attach [pid of your kata shim] --headless --listen=[IP:port]`

then from your client computer:

`$ dlv connect [IP:port]`

## Make CRI-O/containerd start the shim with the debugger

You can use [this script](../tools/containerd-shim-katadbg-v2) to have the
shim binary run under a debugger, and make the debugger wait for a client
connection before running the shim.
This allows starting your container, connecting your debugger, and controlling the
shim execution from the beginning.

### Adapt the script to your setup

You need to edit the script itself to give it the actual binary
to execute.
Locate the following line in the script, and set the path accordingly.

```bash
SHIM_BINARY=
```

You may also need to edit the `PATH` variable set within the script,
to make sure that the `dlv` binary is accessible.

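Before wiring the script into the runtime manager, it can help to check that both settings are valid. The following is a minimal sketch of such a check; `check_debug_setup` is a hypothetical helper and is not part of the script.

```shell
# check_debug_setup SHIM_BINARY: verify the two prerequisites the wrapper
# script relies on: an executable shim binary and a dlv binary in PATH.
check_debug_setup() {
    if [ ! -x "$1" ]; then
        echo "shim binary missing or not executable: $1" >&2
        return 1
    fi
    if ! command -v dlv >/dev/null 2>&1; then
        echo "dlv not found in PATH ($PATH)" >&2
        return 2
    fi
    echo "ok: $1 will be run under $(command -v dlv)"
}
```

Run it with the same `PATH` and the same binary path that you set in the script, since the runtime manager's environment may differ from your interactive shell.
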
### Configure your runtime manager to use the script

Using either containerd or CRI-O, you will need to have a runtime class that
uses the script in place of the actual runtime binary.
To do that, we will create a separate runtime class dedicated to debugging.

- **For containerd**:
  Make sure that the `containerd-shim-katadbg-v2` script is available to containerd
  (typically by putting it in the same folder as your regular kata shim).
  Then edit the containerd configuration, and add the following runtime configuration:

  ```toml
  [plugins]
    [plugins."io.containerd.grpc.v1.cri"]
      [plugins."io.containerd.grpc.v1.cri".containerd]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.katadbg]
            runtime_type = "io.containerd.katadbg.v2"
  ```

- **For CRI-O**:
  Copy your existing kata runtime configuration from `/etc/crio/crio.conf.d/`, and
  make a new one with the name `katadbg`, and the `runtime_path` set to the location
  of the script.

  E.g.:

  ```toml
  [crio.runtime.runtimes.katadbg]
  runtime_path = "/usr/local/bin/containerd-shim-katadbg-v2"
  runtime_root = "/run/vc"
  runtime_type = "vm"
  privileged_without_host_devices = true
  runtime_config_path = "/usr/share/defaults/kata-containers/configuration.toml"
  ```

NOTE: for CRI-O, the name of the runtime class doesn't need to match the name of the
script. But for consistency, we're using `katadbg` here too.

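The name does matter for containerd, because containerd derives the shim executable name from `runtime_type`: the last two dot-separated fields become `containerd-shim-<name>-<version>`, and containerd must be able to find an executable with that name. The sketch below only illustrates this naming convention; `shim_binary_name` is a hypothetical helper and is not part of containerd.

```shell
# shim_binary_name RUNTIME_TYPE: print the shim executable name that
# corresponds to a containerd runtime_type such as io.containerd.katadbg.v2.
shim_binary_name() {
    version="${1##*.}"    # last dot-separated field, e.g. "v2"
    prefix="${1%.*}"      # everything before it, e.g. "io.containerd.katadbg"
    name="${prefix##*.}"  # second-to-last field, e.g. "katadbg"
    echo "containerd-shim-${name}-${version}"
}
```

With the configuration above, `shim_binary_name io.containerd.katadbg.v2` prints `containerd-shim-katadbg-v2`, which is why the script has to keep that file name when used with containerd.
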
### Start your container and connect to the debugger

Once the above configuration is in place, you can start your container, using
your `katadbg` runtime class.

E.g.: `$ crictl runp --runtime=katadbg sandbox.json`

The command will hang, and you can see that a `dlv` process is started:

```
$ ps aux | grep dlv
root 9137 1.4 6.8 6231104 273980 pts/10 Sl 15:04 0:02 dlv exec /go/src/github.com/kata-containers/kata-containers/src/runtime/__debug_bin --headless --listen=:12345 --accept-multiclient -r stdout:/tmp/shim_output_oMC6Jo -r stderr:/tmp/shim_output_oMC6Jo -- -namespace default -address -publish-binary /usr/local/bin/crio -id 0bc23d2208d4ff8c407a80cd5635610e772cae36c73d512824490ef671be9293 -debug start
```

Then you can use the `dlv` debugger to connect to it:

```
$ dlv connect localhost:12345
Type 'help' for list of commands.
(dlv)
```

Before doing anything else, you need to enable `follow-exec` mode in delve.
This is because the first thing that the shim will do is to daemonize itself,
i.e., start itself as a subprocess, and exit. So you really want the debugger
to attach to the child process.

```
(dlv) target follow-exec -on .*/__debug_bin
```

Note that we are providing a regular expression to filter the name of the binary.
This is to make sure that the debugger attaches to the runtime shim, and not
to other subprocesses (typically the hypervisor).

To ease this process, we recommend the use of an init file containing the above
command.

```
$ cat dlv.ini
target follow-exec -on .*/__debug_bin
$ dlv connect localhost:12345 --init=dlv.ini
Type 'help' for list of commands.
(dlv)
```

Once this is done, you can set breakpoints, and use the `continue` command to
start the execution of the shim.

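If you recreate the init file often, for instance because the name of your debug binary changes, it can be generated with a small helper. This is only a sketch; `write_dlv_init` is a hypothetical name, and the default pattern is the one used in this guide.

```shell
# write_dlv_init FILE [PATTERN]: write a Delve init file that enables
# follow-exec for child processes whose path matches PATTERN.
write_dlv_init() {
    printf 'target follow-exec -on %s\n' "${2:-.*/__debug_bin}" > "$1"
}
```

For example, `write_dlv_init dlv.ini` creates the file shown earlier, which can then be passed to `dlv connect localhost:12345 --init=dlv.ini`.
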
You can also use a different client, like VSCode, to connect to it.
A typical `launch.json` configuration for VSCode would look like:

```json
[...]
{
    "name": "Connect to the debugger",
    "type": "go",
    "request": "attach",
    "mode": "remote",
    "port": 12345,
    "host": "127.0.0.1",
}
[...]
```

NOTE: VSCode's Go extension doesn't seem to support the `follow-exec` mode from
Delve. So if you want to use VSCode, you'll still need to use a command-line
`dlv` client to set the `follow-exec` flag.

## Caveats

Debugging takes time, and there are a lot of timeouts going on in a Kubernetes
environment. It is quite possible that while you're debugging, some processes
will time out and cancel the container execution, possibly breaking your debugging
session.

You can mitigate that by increasing the timeouts in the different components
involved in your environment.
@ -771,6 +771,11 @@ $ sudo su -c 'cd /var/run/vc/vm/${sandbox_id} && socat "stdin,raw,echo=0,escape=
To disconnect from the virtual machine, type `CONTROL+q` (hold down the
`CONTROL` key and press `q`).

## Use a debugger with the runtime

For developers interested in using a debugger with the runtime, please
look at [this document](Debug-shim-guide.md).

## Obtain details of the image

If the image is created using
101
tools/containerd-shim-katadbg-v2
Executable file
@ -0,0 +1,101 @@
#!/bin/bash
#
# Copyright (c) 2024 Red Hat, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# This script allows debugging the Go kata shim using Delve.
# It will start the delve debugger in a way where it runs the program and waits
# for connections from your client interface (dlv commandline, vscode, etc).
#
# You need to configure crio or containerd to use this script in place of the
# regular kata shim binary.
# For cri-o, that would be in the runtime configuration, under
# /etc/crio/crio.conf.d/
#

# Use this for quick-testing the shim binary without a debugger
#NO_DEBUG=1

# Edit this to point to the actual shim binary that needs to be debugged
# Make sure you build it with the following flags:
#     -gcflags=all=-N -l
SHIM_BINARY=/go/src/github.com/kata-containers/kata-containers/src/runtime/__debug_bin

DLV_PORT=12345

# Edit the following to make sure dlv is in the PATH
export PATH=/usr/local/go/bin/:$PATH

# The shim can be called multiple times for the same container.
# If it is already running, subsequent calls just return the socket address that
# crio/containerd need to connect to.
# This is useful for recovery, if crio/containerd is restarted and loses context.
#
# We usually don't want to debug those additional calls while we're already
# debugging the actual server process.
# To avoid running additional debuggers and blocking on them, we use a lock file.
LOCK_FILE=/tmp/shim_debug.lock
if [ -e $LOCK_FILE ]; then
    NO_DEBUG=1
fi

# crio can try to call the shim with the "features" or "--version" parameters
# to get capabilities from the runtime (assuming it's an OCI compatible runtime).
# No need to debug that, so just run the regular shim.
case "$1" in
    "features" | "--version")
        NO_DEBUG=1
        ;;
esac


if [ "$NO_DEBUG" == "1" ]; then
    $SHIM_BINARY "$@"
    exit $?
fi


# dlv commandline
#
# --headless: dlv runs as a server, waiting for a connection
#
# --listen: port to listen to
#
# --accept-multiclient: allow multiple dlv client connections
#                       Allows having both a commandline and a GUI
#
# -r: have the output of the shim redirected to a separate file.
#     This script will retrieve the output and return it to the
#     caller, while letting dlv run in the background for debugging.
#
# -- $@ => give the shim all the parameters this script was given
#

SHIMOUTPUT=$(mktemp /tmp/shim_output_XXXXXX)

cat > $LOCK_FILE << EOF
#!/bin/bash
dlv exec ${SHIM_BINARY} --headless --listen=:$DLV_PORT --accept-multiclient -r stdout:$SHIMOUTPUT -r stderr:$SHIMOUTPUT -- "\$@"
rm $LOCK_FILE
EOF
chmod +x $LOCK_FILE

# We're starting dlv as a background task, so that it continues to run while
# this script returns, letting the caller resume its execution.
#
# We're redirecting the outputs of dlv itself to a separate file so that the
# only output the caller will have is the one from this script, giving it the
# address of the socket to connect to.
#
${LOCK_FILE} "$@" > /tmp/dlv_output 2>&1 &


# wait for the output file of the shim process to be filled with the address.
while [ ! -s $SHIMOUTPUT ]; do
    sleep 1
done

# write the address to stdout
cat $SHIMOUTPUT
exit 0