docs: add FAQ to the report
1. Provide answers for the questions that will be frequently asked.
2. Format the document.

Fixes: #4193

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>

In cloud-native scenarios, there is an increased demand for container startup speed.

Also, we provide the following designs:

- Turnkey solution with built-in `Dragonball` Sandbox
- Async I/O to reduce resource consumption
- Extensible framework for multiple services, runtimes, and hypervisors
- Lifecycle management for sandbox and container associated resources

![shim](images/shimv2.svg)

As shown in the figure, the runtime and the VMM are separate processes: the runtime process forks the VMM process, and the two interact through inter-process RPC. Process interaction typically consumes more resources than communication within a single process, so it is comparatively inefficient. The cost of resource operation and maintenance must also be considered: when recovering resources after an abnormal exit, for example, the failure of either process must be detected by the other, which then has to trigger the appropriate recovery path. Every additional process makes this recovery harder.

#### How To Support Built-in VMM

We provide the `Dragonball` Sandbox to enable a built-in VMM by integrating the VMM's functionality into a Rust library, so VMM-related operations are performed through library calls. Because the runtime and the VMM live in the same process, message processing is faster and API synchronization is simpler. It also guarantees that the runtime and the VMM share one life cycle, reducing the maintenance burden of resource recovery and exception handling, as shown in the figure:


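
To make the contrast with the forked-VMM model concrete, here is a minimal sketch of the idea, assuming a hypothetical library API (`VmmConfig`, `VmmRequest`, and `spawn_builtin_vmm` are illustrative names, not the actual `Dragonball` interfaces): the VMM runs on a thread inside the runtime process and is driven over an in-process channel rather than inter-process RPC.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical types for this sketch, not the real Dragonball API.
struct VmmConfig {
    vcpus: u8,
    mem_mib: u32,
}

enum VmmRequest {
    Boot,
    Shutdown,
}

// The VMM is a library thread inside the runtime process, reached
// via an in-process channel instead of inter-process RPC.
fn spawn_builtin_vmm(cfg: VmmConfig) -> (mpsc::Sender<VmmRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The VMM event loop lives and dies with the runtime process,
        // so no separate process needs monitoring or recovery.
        while let Ok(req) = rx.recv() {
            match req {
                VmmRequest::Boot => {
                    println!("boot microVM: {} vCPUs, {} MiB", cfg.vcpus, cfg.mem_mib)
                }
                VmmRequest::Shutdown => break,
            }
        }
    });
    (tx, handle)
}

fn main() {
    let (vmm, vmm_thread) = spawn_builtin_vmm(VmmConfig { vcpus: 1, mem_mib: 128 });
    vmm.send(VmmRequest::Boot).unwrap();
    vmm.send(VmmRequest::Shutdown).unwrap();
    vmm_thread.join().unwrap();
}
```

Because the channel and the VMM thread terminate with the process, there is no cross-process failure detection to orchestrate during recovery.
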
### Async Support

#### Why We Need Async

- Each new container adds 3 more I/O threads
- In Sync mode, implementing a timeout mechanism is challenging; for example, in TTRPC API interaction it is difficult to keep timeouts aligned with the Golang implementation

#### How To Support Async
The number of OS threads used by the kata-runtime is controlled by `TOKIO_RUNTIME_WORKER_THREADS`, which defaults to 2. TTRPC and container-related work runs uniformly as `tokio` tasks on these threads, and the related dependencies need to be switched to Async, such as Timer, File, Netlink, etc. With the help of Async, we can easily support non-blocking I/O and timers. Currently, we only utilize Async for the kata-runtime; the built-in VMM keeps its own OS threads because that keeps those threads controllable.
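
As a small illustration of the timer point, with `tokio` a per-request timeout is a single combinator; the request below is a stand-in for an async TTRPC call, not the real kata-runtime code:

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

// Stand-in for an async TTRPC request; the real call is elided.
async fn fake_ttrpc_request() -> String {
    sleep(Duration::from_millis(50)).await;
    "response".to_string()
}

#[tokio::main]
async fn main() {
    // Bound the request with a 3-second timeout; expiry is handled by
    // the tokio timer driver, with no dedicated timer thread.
    match timeout(Duration::from_secs(3), fake_ttrpc_request()).await {
        Ok(resp) => println!("got: {resp}"),
        Err(_) => eprintln!("request timed out"),
    }
}
```
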

**For N tokio worker threads and M containers**

- Sync runtime (every `tokio` task is an OS thread, and there are no `tokio` worker threads): 4 + 12*M OS threads
- Async runtime (`tokio` tasks share the N worker OS threads): 2 + N OS threads

For example, with M = 10 containers the Sync runtime needs 4 + 12*10 = 124 OS threads, while the Async runtime with the default N = 2 needs only 4.

```shell
├─ main(OS thread)
```
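
A minimal sketch of how the worker pool can be sized from `TOKIO_RUNTIME_WORKER_THREADS` (defaulting to 2, hence the 2 + N figure above); this shows the mechanism with the standard `tokio` builder, not the exact kata-runtime code:

```rust
use std::env;
use tokio::runtime::{Builder, Runtime};

fn build_runtime() -> Runtime {
    // Default to 2 worker threads, matching the document's default.
    let workers = env::var("TOKIO_RUNTIME_WORKER_THREADS")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or(2);

    Builder::new_multi_thread()
        .worker_threads(workers)
        .enable_all() // async timers, I/O, etc.
        .build()
        .expect("failed to build tokio runtime")
}

fn main() {
    // TTRPC and container-related tasks would be spawned onto this
    // runtime, giving 2 + N OS threads in total.
    let rt = build_runtime();
    rt.block_on(async { println!("tokio runtime up") });
}
```
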

The Kata 3.x runtime is designed with the extension of service, runtime, and hypervisor.

![framework](images/framework.svg)

### Resource Manager

In our case, there will be a variety of resources, and every resource has several subtypes. Especially for `Virt-Container`, every subtype of resource has different operations, and there may be dependencies between them; for example, the share-fs rootfs and the share-fs volume both rely on the share-fs resource to share files with the VM. Currently, network and share-fs are regarded as sandbox resources, while rootfs, volume, and cgroup are regarded as container resources. We abstract a common interface for each resource and use subclass operations to handle the differences between subtypes, as sketched below.

## Roadmap

- Stage 1 (June): provide basic features (currently delivered)
- Stage 2 (September): support common features
- Stage 3: support full features

| **Class** | **Sub-Class** | **Development Stage** |
| -------------------------- | ------------------- | --------------------- |
| Service | task service | Stage 1 |
| | extend service | Stage 3 |
| | image service | Stage 3 |
| Runtime handler | `Virt-Container` | Stage 1 |
| | `Wasm-Container` | Stage 3 |
| | `Linux-Container` | Stage 3 |
| Endpoint | VETH Endpoint | Stage 1 |
| | Physical Endpoint | Stage 2 |
| | Tap Endpoint | Stage 2 |
| | `Tuntap` Endpoint | Stage 2 |
| | `IPVlan` Endpoint | Stage 3 |
| | `MacVlan` Endpoint | Stage 3 |
| | MACVTAP Endpoint | Stage 3 |
| | `VhostUserEndpoint` | Stage 3 |
| Network Interworking Model | Tc filter | Stage 1 |
| | Route | Stage 1 |
| | `MacVtap` | Stage 3 |
| Storage | Virtio-fs | Stage 1 |
| | `nydus` | Stage 2 |
| Hypervisor | `Dragonball` | Stage 1 |
| | QEMU | Stage 2 |
| | ACRN | Stage 3 |
| | Cloud Hypervisor | Stage 3 |
| | Firecracker | Stage 3 |

## FAQ
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?

  Yes. They are components of the Kata 3.x runtime, and they will be packed into one binary. A sketch of how they fit together follows the list.

  1. The service is an interface responsible for handling multiple services, such as the task service and the image service.
  2. The message dispatcher matches and dispatches requests coming from the service module.
  3. The runtime handler deals with the operations for sandboxes and containers.

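A minimal sketch of that flow, with hypothetical types (none of these names are the actual Kata 3.x ones): the service layer turns incoming calls into messages, the dispatcher matches them, and the runtime handler performs the sandbox/container operation.

```rust
// Hypothetical sketch of service -> dispatcher -> runtime handler.
// None of these names are the actual Kata 3.x types.
enum Request {
    CreateContainer { id: String },
    StartContainer { id: String },
    PullImage { reference: String },
}

struct RuntimeHandler;

impl RuntimeHandler {
    fn create(&self, id: &str) {
        println!("create container {id}");
    }
    fn start(&self, id: &str) {
        println!("start container {id}");
    }
}

// The dispatcher matches each request coming from the service module
// and routes it to the component that owns the operation.
fn dispatch(handler: &RuntimeHandler, req: Request) {
    match req {
        Request::CreateContainer { id } => handler.create(&id),
        Request::StartContainer { id } => handler.start(&id),
        Request::PullImage { reference } => println!("pull image {reference}"),
    }
}

fn main() {
    let handler = RuntimeHandler;
    dispatch(&handler, Request::CreateContainer { id: "c1".into() });
    dispatch(&handler, Request::StartContainer { id: "c1".into() });
}
```
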
- What is the name of the Kata 3.x runtime binary?

  Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcome.

Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?

  Yes. It is designed to follow the functionality of the Go version of Kata, and it implements the containerd shim v2 interface/protocol.

- How will users migrate to the Kata 3.x architecture?

  The migration plan will be provided before Kata 3.x is merged into the main branch.

- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?

  `Dragonball` can work as an external hypervisor; however, stability and performance are more challenging in that case. The built-in VMM reduces container overhead, and it is easier to keep stable.

  `runD` is the `containerd-shim-v2` counterpart of `runC` and can run pods/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of microvm/VMM, we sometimes refer to it as a secure sandbox.

- QEMU, Cloud Hypervisor and Firecracker support are planned, but how would that work? Would they run in separate processes?

  Yes. They would run in separate processes, since they cannot work as built-in VMMs.

- What is `upcall`?
  `Dbs-upcall` is a `vsock-based` direct communication tool between the VMM and the guest. The server side of the `upcall` is a driver in the guest kernel (kernel patches are needed for this feature), and it starts serving requests once the kernel has booted. The client side is in the VMM: a thread that communicates over VSOCK through `uds`. We have implemented device hotplug / hot-unplug directly through `upcall` in order to avoid virtualizing ACPI and so minimize the virtual machine's overhead, and many other uses are possible through this direct communication channel. It's already open source:

https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
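
  For a feel of the client side, here is a sketch of a Firecracker-style hybrid vsock handshake over a Unix domain socket (`CONNECT <port>` then an `OK` reply); the actual `dbs-upcall` wire protocol may differ, and the socket path, port, and request payload below are made up for illustration:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Hypothetical uds path exposed by the VMM for hybrid vsock.
    let mut stream = UnixStream::connect("/run/vmm/vsock.sock")?;

    // Firecracker-style handshake: ask the VMM to forward to the
    // guest vsock port where the upcall server listens (the port
    // number is made up for this sketch).
    stream.write_all(b"CONNECT 1024\n")?;

    let mut reader = BufReader::new(stream.try_clone()?);
    let mut line = String::new();
    reader.read_line(&mut line)?; // expect "OK <hostport>\n" on success
    println!("handshake reply: {}", line.trim());

    // From here the stream is a direct channel to the in-guest driver;
    // the request below is purely illustrative.
    stream.write_all(b"hotplug-vcpu 1\n")?;
    Ok(())
}
```
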
- What is the security boundary for the monolithic / "Built-in VMM" case?
  It has the security boundary of virtualization. More details will be provided in the next stage.