Merge pull request #9341 from ChengyuZhu6/guest-pull-doc

docs: Add documents for kata guest image management
This commit is contained in:
Chengyu Zhu 2024-03-27 21:20:22 +08:00 committed by GitHub
commit 87fc17d4d2
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
5 changed files with 301 additions and 1 deletions

(binary image files added: guest-image-management-architecture.png, 61 KiB; guest-image-management-details.png, 122 KiB)
@ -0,0 +1,119 @@
# Kata Containers Guest Image Management Design
To safeguard the integrity of container images and prevent tampering from the host side, we propose guest image management. This method, employed for Confidential Containers, ensures container images remain unaltered and secure.
## Introduction to remote snapshot
Containerd 1.7 introduced the `remote snapshotter` feature, which is the foundation for pulling images in the guest for Confidential Containers.
While it's beyond the scope of this document to fully explain how the container rootfs is created to the point it can be executed, a basic grasp of the snapshot concept is essential. Put simply, containerd fetches the image layers from an OCI registry into its local content storage. However, those layers cannot be mounted as-is (e.g. a layer can be tar+gzip compressed), and they must remain immutable so their content can be shared among containers. Thus containerd builds the container's rootfs from snapshots of those layers.
The role of a `remote snapshotter` is to reuse snapshots that are stored in a remotely shared place, enabling containerd to prepare the container's rootfs in a manner similar to that of a local `snapshotter`. The key behavior that makes this the building block of Kata's guest image management for Confidential Containers is that containerd will not pull the image layers from the registry; instead, it assumes that the `remote snapshotter` and/or an external entity will perform that operation on its behalf.
Perhaps the simplest example of a `remote snapshotter` in Confidential Containers is pulling images inside the guest VM. Once the VM is established as part of the Trusted Computing Base (TCB), a chain of delegations involving containerd, the `remote snapshotter`, and the kata-runtime makes it possible for the kata-agent to pull the image directly.
## `Remote snapshotter` implementations
A `remote snapshotter` is a containerd plug-in that implements the [Snapshot Interface](https://pkg.go.dev/github.com/containerd/containerd/v2/snapshots#Snapshotter).
The following `remote snapshotter` is leveraged by Kata Containers:
- `nydus snapshotter`
### `Nydus snapshotter`
This `snapshotter` is implemented as an external containerd proxy plug-in for [`Nydus`](https://nydus.dev/).
Currently it supports several runtime backends, notably `FUSE`, `virtiofs`, and `EROFS`; the `FUSE` backend is the one exercised in the Kata Containers CI.
## `Nydus snapshotter` and containerd-shim-v2 integration
```go
// KataVirtualVolume encapsulates information for extra mount options and direct volumes.
type KataVirtualVolume struct {
VolumeType string `json:"volume_type"`
Source string `json:"source,omitempty"`
FSType string `json:"fs_type,omitempty"`
Options []string `json:"options,omitempty"`
DirectVolume *DirectAssignedVolume `json:"direct_volume,omitempty"`
ImagePull *ImagePullVolume `json:"image_pull,omitempty"` //<-Used for pulling images in the guest
NydusImage *NydusImageVolume `json:"nydus_image,omitempty"`
DmVerity *DmVerityInfo `json:"dm_verity,omitempty"`
}
```
## Guest image management implementations
### Guest pull with `nydus snapshotter`
Pull the container image directly from the guest VM using `nydus snapshotter` backend.
#### General Characteristics
- Container image pulled in the guest
- Pause image should be built in the guest's rootfs
- Confidentiality for image manifest and config: No
- Confidentiality for blob data: Yes
- Use `nydus snapshotter` as `remote snapshotter` configured with the FUSE runtime backend
#### Architecture
The following diagram provides an overview of the architecture for pulling images in the guest, with its key components.
![guest-image-management-architecture](arch-images/guest-image-management-architecture.png)
#### Sequence diagrams
The sequence diagram below offers a detailed overview of the messages/calls exchanged to pull an unencrypted, unsigned image from an unauthenticated container registry. This involves the kata-runtime, the kata-agent, and the guest-components image-rs using the guest pull mechanism.
![guest-image-management-details](arch-images/guest-image-management-details.png)
First and foremost, the guest pull code path is only activated when `nydus snapshotter` requires the handling of a volume whose type is `image_guest_pull`, as can be seen in the message below:
```json
{
  "volume_type": "image_guest_pull",
  "source": "quay.io/kata-containers/confidential-containers:unsigned",
  "fs_type": "overlayfs",
  "options": [
    "containerd.io/snapshot/cri.layer-digest=sha256:24fb2886d6f6c5d16481dd7608b47e78a8e92a13d6e64d87d57cb16d5f766d63",
    "containerd.io/snapshot/nydus-proxy-mode=true"
  ],
  "image_pull": {
    "metadata": {
      "containerd.io/snapshot/cri.layer-digest": "sha256:24fb2886d6f6c5d16481dd7608b47e78a8e92a13d6e64d87d57cb16d5f766d63",
      "containerd.io/snapshot/nydus-proxy-mode": "true"
    }
  }
}
```
In other words, the `VolumeType` field of `KataVirtualVolume` is set to `image_guest_pull`.
Next, `handleImageGuestPullBlockVolume()` is called to build the Storage object that is attached to the message later sent to the kata-agent via the `CreateContainerRequest()` RPC. It is in `handleImageGuestPullBlockVolume()` that handling of the pause image begins, when the request is for a sandbox container type (see more about the pause image below).
Below is an example of storage information packaged in the message sent to the kata-agent:
```json
{
  "driver": "image_guest_pull",
  "driver_options": [
    {
      "image_guest_pull": {
        "metadata": {
          "containerd.io/snapshot/cri.layer-digest": "sha256:24fb2886d6f6c5d16481dd7608b47e78a8e92a13d6e64d87d57cb16d5f766d63",
          "containerd.io/snapshot/nydus-proxy-mode": "true",
          "io.katacontainers.pkg.oci.bundle_path": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/cb0b47276ea66ee9f44cc53afa94d7980b57a52c3f306f68cb034e58d9fbd3c6",
          "io.katacontainers.pkg.oci.container_type": "pod_container",
          "io.kubernetes.cri.container-name": "coco-container",
          "io.kubernetes.cri.container-type": "container",
          "io.kubernetes.cri.image-name": "quay.io/kata-containers/confidential-containers:unsigned",
          "io.kubernetes.cri.sandbox-id": "7a0d058477e280604ae02de6a016959e8a05fcd3165c47af41eabcf205b55517",
          "io.kubernetes.cri.sandbox-name": "coco-pod",
          "io.kubernetes.cri.sandbox-namespace": "default",
          "io.kubernetes.cri.sandbox-uid": "de7c6a0c-79c0-44dc-a099-69bb39f180af"
        }
      }
    }
  ],
  "source": "quay.io/kata-containers/confidential-containers:unsigned",
  "fstype": "overlay",
  "options": [],
  "mount_point": "/run/kata-containers/cb0b47276ea66ee9f44cc53afa94d7980b57a52c3f306f68cb034e58d9fbd3c6/rootfs"
}
```
Next, the kata-agent's RPC module handles the create container request which, among other things, involves adding storages to the sandbox. The storage module contains implementations of the `StorageHandler` interface for various storage types, with `ImagePullHandler` in charge of handling the storage object for the container image (the storage manager instantiates the handler based on the value of the `driver` field).
`ImagePullHandler` delegates the image pulling operation to `ImageService.pull_image()`, which creates the image's bundle directory on the guest filesystem and, in turn, calls image-rs to actually fetch and unpack the image's bundle.
> **Notes:**
> In this flow, `ImageService.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-O case) annotation. If found, it never calls `image-rs.pull_image()`, because the pause image is expected to already be inside the guest's filesystem; instead, `ImageService.unpack_pause_image()` is called.
References:
[1] [[RFC] Image management proposal for hosting sharing and peer pods](https://github.com/confidential-containers/confidential-containers/issues/137)
[2] [Containerd content flow](https://github.com/containerd/containerd/blob/main/docs/content-flow.md)

@ -51,4 +51,5 @@
- [How to run Kata Containers with IBM Secure Execution](how-to-run-kata-containers-with-SE-VMs.md)
- [How to use EROFS to build rootfs in Kata Containers](how-to-use-erofs-build-rootfs.md)
- [How to run Kata Containers with kinds of Block Volumes](how-to-run-kata-containers-with-kinds-of-Block-Volumes.md)
- [How to use the Kata Agent Policy](how-to-use-the-kata-agent-policy.md)
- [How to pull images in the guest](how-to-pull-images-in-guest-with-kata.md)

@ -0,0 +1,180 @@
# Kata Containers with Guest Image Management
Kata Containers 3.3.0 introduces the guest image management feature, which enables the guest VM to pull images directly using `nydus snapshotter`. This feature is designed to protect the integrity of container images and guard against tampering by the host, which is required for Confidential Containers. Please refer to [kata-guest-image-management-design](../design/kata-guest-image-management-design.md) for details.
## Prerequisites
- A k8s cluster with Kata Containers 3.3.0+ is ready to use.
- `yq` is installed on the host and its directory is included in the `PATH` environment variable. (optional, for DaemonSet only)
## Deploy `nydus snapshotter` for guest image management
To pull images in the guest, we need to do the following steps:
1. Delete images used for pulling in the guest (optional, for containerd only)
2. Install `nydus snapshotter`:
1. Install `nydus snapshotter` by k8s DaemonSet (recommended)
2. Install `nydus snapshotter` manually
### Delete images used for pulling in the guest
The `CRI Runtime Specific Snapshotter` is still an [experimental feature](https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features) in containerd, and containerd does not support managing the same image with different `snapshotters` (the default `snapshotter` in containerd is `overlayfs`). To avoid errors caused by this, before configuring `nydus snapshotter` in containerd it is recommended to delete any images in containerd (including the pause image) that will later be pulled in the guest.
### Install `nydus snapshotter`
#### Install `nydus snapshotter` by k8s DaemonSet (recommended)
To use DaemonSet to install `nydus snapshotter`, we need to ensure that `yq` exists in the host.
1. Download `nydus snapshotter` repo
```bash
$ nydus_snapshotter_install_dir="/tmp/nydus-snapshotter"
$ nydus_snapshotter_url=https://github.com/containerd/nydus-snapshotter
$ nydus_snapshotter_version="v0.13.11"
$ git clone -b "${nydus_snapshotter_version}" "${nydus_snapshotter_url}" "${nydus_snapshotter_install_dir}"
```
2. Configure DaemonSet file
```bash
$ pushd "$nydus_snapshotter_install_dir"
$ yq write -i \
> misc/snapshotter/base/nydus-snapshotter.yaml \
> 'data.FS_DRIVER' \
> "proxy" --style=double
# Disable to read snapshotter config from configmap
$ yq write -i \
> misc/snapshotter/base/nydus-snapshotter.yaml \
> 'data.ENABLE_CONFIG_FROM_VOLUME' \
> "false" --style=double
# Enable to run snapshotter as a systemd service
# (skip if you want to run nydus snapshotter as a standalone process)
$ yq write -i \
> misc/snapshotter/base/nydus-snapshotter.yaml \
> 'data.ENABLE_SYSTEMD_SERVICE' \
> "true" --style=double
# Enable "runtime specific snapshotter" feature in containerd when configuring containerd for snapshotter
# (skip if you want to configure nydus snapshotter as a global snapshotter in containerd)
$ yq write -i \
> misc/snapshotter/base/nydus-snapshotter.yaml \
> 'data.ENABLE_RUNTIME_SPECIFIC_SNAPSHOTTER' \
> "true" --style=double
```
3. Install `nydus snapshotter` as a DaemonSet
```bash
$ kubectl create -f "misc/snapshotter/nydus-snapshotter-rbac.yaml"
$ kubectl apply -f "misc/snapshotter/base/nydus-snapshotter.yaml"
```
4. Wait until the DaemonSet is running (up to 5 minutes)
```bash
$ kubectl rollout status DaemonSet nydus-snapshotter -n nydus-system --timeout 5m
```
5. Verify whether `nydus snapshotter` is running as a DaemonSet
```bash
$ pods_name=$(kubectl get pods --selector=app=nydus-snapshotter -n nydus-system -o=jsonpath='{.items[*].metadata.name}')
$ kubectl logs "${pods_name}" -n nydus-system
deploying snapshotter
install nydus snapshotter artifacts
configuring snapshotter
Not found nydus proxy plugin!
running snapshotter as systemd service
Created symlink /etc/systemd/system/multi-user.target.wants/nydus-snapshotter.service → /etc/systemd/system/nydus-snapshotter.service.
```
#### Install `nydus snapshotter` manually
1. Download `nydus snapshotter` binary from release
```bash
$ ARCH=$(uname -m)
$ golang_arch=$(case "$ARCH" in
aarch64) echo "arm64" ;;
ppc64le) echo "ppc64le" ;;
x86_64) echo "amd64" ;;
s390x) echo "s390x" ;;
esac)
$ release_tarball="nydus-snapshotter-${nydus_snapshotter_version}-linux-${golang_arch}.tar.gz"
$ curl -OL ${nydus_snapshotter_url}/releases/download/${nydus_snapshotter_version}/${release_tarball}
$ sudo tar -xzf "${release_tarball}" -C /usr/local/bin --strip-components=1
```
2. Download `nydus snapshotter` configuration file for pulling images in the guest
```bash
$ curl -OL https://raw.githubusercontent.com/containerd/nydus-snapshotter/main/misc/snapshotter/config-proxy.toml
$ sudo install -D -m 644 config-proxy.toml /etc/nydus/config-proxy.toml
```
3. Run `nydus snapshotter` as a standalone process
```bash
$ /usr/local/bin/containerd-nydus-grpc --config /etc/nydus/config-proxy.toml --log-to-stdout
level=info msg="Start nydus-snapshotter. Version: v0.13.11-308-g106a6cb, PID: 1100169, FsDriver: proxy, DaemonMode: none"
level=info msg="Run daemons monitor..."
```
4. Configure containerd for `nydus snapshotter`
Configure `nydus snapshotter` to enable the `CRI Runtime Specific Snapshotter` feature in containerd. This ensures that Kata containers run with `nydus snapshotter`. Below, the steps are illustrated using `kata-qemu` as an example.
```toml
# Modify containerd configuration to ensure that the following lines appear in the containerd configuration
# (Assume that the containerd config is located in /etc/containerd/config.toml)
[plugins."io.containerd.grpc.v1.cri".containerd]
disable_snapshot_annotations = false
discard_unpacked_layers = false
[proxy_plugins.nydus]
type = "snapshot"
address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
snapshotter = "nydus"
```
> **Notes:**
> The `CRI Runtime Specific Snapshotter` feature only works for containerd v1.7.0 and above. For containerd versions below v1.7.0, in addition to the above settings, we need to set the global `snapshotter` to `nydus` in the containerd config. For example:
```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "nydus"
```
5. Restart containerd service
```bash
$ sudo systemctl restart containerd
```
## Verification
To verify pulling images in a guest VM, please refer to the following commands:
1. Run a kata container
```bash
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: busybox
annotations:
io.containerd.cri.runtime-handler: kata-qemu
spec:
runtimeClassName: kata-qemu
containers:
- name: busybox
image: quay.io/prometheus/busybox:latest
imagePullPolicy: Always
EOF
pod/busybox created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 10s
```
> **Notes:**
> The `CRI Runtime Specific Snapshotter` is still an experimental feature. To pull images in the guest under the specific kata runtime (such as `kata-qemu`), we need to add the following annotation in metadata to each pod yaml: `io.containerd.cri.runtime-handler: kata-qemu`. By adding the annotation, we can ensure that the feature works as expected.
2. Verify that the pod's images have been successfully downloaded in the guest.
If images intended for deployment are deleted prior to deploying with `nydus snapshotter`, the root filesystems required for the pod's images (including the pause image and the container image) should not be present on the host.
```bash
$ sandbox_id=$(ps -ef| grep containerd-shim-kata-v2| grep -oP '(?<=-id\s)[a-f0-9]+'| tail -1)
$ rootfs_count=$(find /run/kata-containers/shared/sandboxes/$sandbox_id -name rootfs -type d| grep -o "rootfs" | wc -l)
$ echo $rootfs_count
0
```