mirror of https://github.com/linuxkit/linuxkit.git, synced 2025-10-31 12:55:59 +00:00

Unikernel System Containers
General Architecture
               |=================|                      |================|
               |       priv      |                      |       calf     |
               |=================|                      |================|
               |                 |                      |                |
<--  eth0 ---> |    BPF rules    | <--- network IO ---> |   type-safe    |
               |                 |      (data path)     | network stack  |
               |                 |                      |                |
               |-----------------|                      |----------------|
               |                 |                      |                |
<-- logs ----- |                 | <------- logs ------ |   type-safe    |
               |                 |                      | protocol logic |
<-- metrics -- |                 | <----- metrics ----- |                |
               |                 |                      |                |
               |-----------------|                      |----------------|
               |                 |                      |                |
<-- audit ---  |  config store   | <----- KV store ---> |  config store  |
   diagnostic  |     daemon      |     (control path)   |     client     |
               |                 |                      |                |
               |_________________|                      |________________|
               |                 |
<-- syscalls - |                 |
               |                 |
               | system handlers |
<-- config --- |                 |
    files      |                 |
               |_________________|
Priv: privileged system service
- runs in a privileged container (but can have limited capabilities + seccomp)
- can read all network traffic
- can set up (e)BPF rules
- exposes an easily auditable KV store for configuration values
- has a set of system handlers that watch for changes in the KV store and perform privileged operations inside moby (syscalls, edits to global config files, etc.)
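The relationship between the KV store and the system handlers can be sketched as a small watch-and-dispatch loop. This is a minimal Python illustration, not the project's implementation: `WatchedStore`, `watch` and `write` are hypothetical names, and the handlers only record the privileged action they would take instead of performing it.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, bytes], None]


class WatchedStore:
    """In-memory KV store that notifies handlers when a watched key changes."""

    def __init__(self) -> None:
        self._values: Dict[str, bytes] = {}
        self._watchers: List[Tuple[str, Handler]] = []

    def watch(self, prefix: str, handler: Handler) -> None:
        # Register a handler for the key `prefix` and every key nested under it.
        self._watchers.append((prefix, handler))

    def write(self, key: str, value: bytes) -> None:
        # Only fire handlers when the stored value actually changes.
        if self._values.get(key) == value:
            return
        self._values[key] = value
        for prefix, handler in self._watchers:
            if key == prefix or key.startswith(prefix.rstrip("/") + "/"):
                handler(key, value)


# Hypothetical handlers: they record the privileged action instead of running it.
planned: List[str] = []
store = WatchedStore()
store.watch("/ip", lambda k, v: planned.append(f"set address {v.decode()}"))
store.watch("/nameserver", lambda k, v: planned.append(f"add DNS server {v.decode()}"))

store.write("/ip", b"192.0.2.10")
store.write("/ip", b"192.0.2.10")  # unchanged value: no handler call
store.write("/nameserver/001", b"192.0.2.53")
```

Keeping the handlers as plain callbacks keyed by path prefix means the set of privileged operations stays small and can be audited in one place.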
Calf: sandboxed system service
- runs in a fully isolated container
- full sandbox (initially a normal Unix process, later on unielf/wasm)
- has a type-safe network stack to handle network IO
- has type-safe business logic to process network IO
- has limited read and write access to the config store, where the result of the business logic is written
DHCP client
Priv
- The privileged system service forwards DHCP traffic in both directions and blocks all other traffic. This is enforced by setting up BPF filters on the network interface.
- The privileged system service initializes the calf by opening the file descriptors for the control and data paths and calling runc.
- The privileged system service exposes a simple KV store to the calf, using the following keys:

      # read-only, set on startup by the priv
      /mac
      # write-only, set by the calf when it gets a lease
      /ip
      /gateway
      /mtu
      /domain
      /search
      /nameserver/001
      ...
      /nameserver/xxx

  The KV store API is defined by the following Cap'n Proto schema:

      @0x9e83562906de8259;

      struct Request {
        id @0 :Int32;
        path @1 :List(Text);
        union {
          write @2 :Data;
          read @3 :Void;
          delete @4 :Void;
        }
      }

      struct Response {
        id @0 :Int32;
        union {
          ok @1 :Data;
          error @2 :Data;
        }
      }
- The privileged system service installs the following system handlers:
  - if /ip changes -> bring up the default interface and set the IP address (done)
  - if /gateway changes -> set up the route (done)
  - if /domain changes -> set the moby domain name (todo)
  - if /search changes -> set the search domain on the moby host (todo)
  - if /nameserver/xxx changes -> set the DNS servers on moby (todo)
- The privileged system service updates configuration files:
  - /etc/resolv.conf (todo)
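To make the key permissions and the Request/Response shape concrete, here is a minimal Python sketch of how the priv side might apply one calf request to its store. It is an illustration under assumptions, not the project's code: the dataclasses only mirror the schema fields, the `op` string stands in for the schema's union, the `handle` and `may_write` helpers are hypothetical, and splitting `/nameserver/001` into the path `("nameserver", "001")` is a guess at how `path` is encoded.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]

# Keys the calf may read or write, following the key list in the text.
# Treating /nameserver/* as a writable two-element path is an assumption.
READ_ONLY = {("mac",)}
WRITABLE = {("ip",), ("gateway",), ("mtu",), ("domain",), ("search",)}


@dataclass
class Request:
    id: int
    path: Path
    op: str                       # "read", "write" or "delete"
    data: Optional[bytes] = None  # payload, only used by "write"


@dataclass
class Response:
    id: int
    ok: Optional[bytes] = None
    error: Optional[bytes] = None


def may_write(path: Path) -> bool:
    return path in WRITABLE or (len(path) == 2 and path[0] == "nameserver")


def handle(store: Dict[Path, bytes], req: Request) -> Response:
    """Apply one calf request to the priv-side store, enforcing the policy."""
    if req.op == "read":
        if req.path not in READ_ONLY:
            return Response(req.id, error=b"read denied")
        if req.path not in store:
            return Response(req.id, error=b"not found")
        return Response(req.id, ok=store[req.path])
    if req.op == "write":
        if not may_write(req.path) or req.data is None:
            return Response(req.id, error=b"write denied")
        store[req.path] = req.data
        return Response(req.id, ok=b"")
    if req.op == "delete":
        if not may_write(req.path):
            return Response(req.id, error=b"delete denied")
        store.pop(req.path, None)
        return Response(req.id, ok=b"")
    return Response(req.id, error=b"unknown operation")


store = {("mac",): b"02:00:00:00:00:01"}  # set by the priv on startup
r1 = handle(store, Request(1, ("mac",), "read"))
r2 = handle(store, Request(2, ("ip",), "write", b"192.0.2.10"))
r3 = handle(store, Request(3, ("mac",), "write", b"spoofed"))
r4 = handle(store, Request(4, ("ip",), "read"))
```

Echoing the request `id` in every response lets the calf match replies to outstanding requests, which is presumably why both structs carry it.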
 
Calf
- The sandboxed system service is a MirageOS unikernel using charrua-core.
- The sandboxed system service reads the DHCP network traffic from an already opened file descriptor.
- The sandboxed system service reads and sets the control state using an already opened file descriptor.
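The "already opened file descriptor" hand-off can be illustrated with a short Python sketch. It assumes a Unix datagram socket pair as the transport, which the text does not specify, and uses `os.dup()` within one process to stand in for the descriptor being inherited by the calf; `calf_read_frame` and `calf_write_frame` are hypothetical helper names.

```python
import os
import socket

# Priv side: create a connected pair of sockets for the data path.
# SOCK_DGRAM keeps message boundaries, so one read returns one frame.
priv_end, calf_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# The calf only ever sees a file descriptor number, not a socket object.
# os.dup() stands in for the descriptor being inherited across exec.
inherited_fd = os.dup(calf_end.fileno())
calf_end.close()


def calf_read_frame(fd: int, max_size: int = 2048) -> bytes:
    """Calf side: read one frame from an already opened descriptor."""
    return os.read(fd, max_size)


def calf_write_frame(fd: int, frame: bytes) -> int:
    """Calf side: send one frame back to the priv over the same descriptor."""
    return os.write(fd, frame)


priv_end.send(b"dhcp-offer-bytes")  # priv forwards a frame it let through
frame = calf_read_frame(inherited_fd)
calf_write_frame(inherited_fd, b"dhcp-request-bytes")
reply = priv_end.recv(2048)

os.close(inherited_fd)
priv_end.close()
```

Because the calf never opens anything itself, it needs no network or filesystem privileges of its own; everything it can reach is decided by which descriptors the priv hands over.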
SDK
What the SDK should enable:
- easily write new calfs, initially in OCaml, then Rust. Probably not very useful on its own.
- easily write a new shim by providing the basic blocks: eBPF scripts, calf runner, KV store, system handlers. Initially this could be a standalone blob, but it should aim for independent and reusable pieces that could run in a container.
- (later) generate shim/calf containers from a single (API?) description.
See ./src/sdk for the current state of the SDK.
Roadmap
First PoC: DHCP client
TODO
- better system handler using language bindings instead of shelling out to ifconfig
- use seccomp to isolate the privileged container
- use mtu, domain, nameservers parameters
- generate resolv.conf
- add metrics aggregation (using prometheus)
- better logging aggregation (using syslog)
- IPv6 support
- tests, tests, tests (especially against servers that are not RFC-compliant)
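For the "generate resolv.conf" item, one possible shape is a pure function from the KV values to the file contents. The Python sketch below is only an illustration: `render_resolv_conf` is a hypothetical name, and storing `/search` as a space-separated string is an assumption. It emits `search` in preference to `domain` because the resolver treats the two keywords as mutually exclusive, and it caps the list at three name servers, the glibc resolver limit.

```python
from typing import Dict


def render_resolv_conf(kv: Dict[str, str]) -> str:
    """Render resolv.conf contents from values stored under the DHCP keys."""
    lines = []
    search = kv.get("/search")
    domain = kv.get("/domain")
    if search:
        lines.append(f"search {search}")
    elif domain:
        lines.append(f"domain {domain}")
    # Sort by key so /nameserver/001 comes before /nameserver/002.
    servers = [kv[k] for k in sorted(kv) if k.startswith("/nameserver/")]
    for server in servers[:3]:
        lines.append(f"nameserver {server}")
    return "".join(line + "\n" for line in lines)


example = render_resolv_conf({
    "/domain": "example.com",
    "/search": "example.com corp.example.com",
    "/nameserver/002": "192.0.2.54",
    "/nameserver/001": "192.0.2.53",
})
```

Keeping the rendering separate from the file write makes it easy to test against odd server responses without touching the host.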
Second iteration: NTP
TODO