mirror of
				https://github.com/linuxkit/linuxkit.git
				synced 2025-10-31 04:34:04 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			279 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			279 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| # Configuration Reference
 | |
| 
 | |
| The `linuxkit build` command assembles a set of containerised components into in image. The simplest
 | |
| type of image is just a `tar` file of the contents (useful for debugging) but more useful
 | |
| outputs add a `Dockerfile` to build a container, or build a full disk image that can be
 | |
| booted as a linuxKit VM. The main use case is to build an assembly that includes
 | |
| `containerd` to run a set of containers, but the tooling is very generic.
 | |
| 
 | |
| The yaml configuration specifies the components used to build up an image . All components
 | |
| are downloaded at build time to create an image. The image is self-contained and immutable,
 | |
| so it can be tested reliably for continuous delivery.
 | |
| 
 | |
| Components are specified as Docker images which are pulled from a registry during build if they
 | |
| are not available locally. The Docker images are optionally verified with Docker Content Trust.
 | |
| For private registries or private repositories on a registry credentials provided via
 | |
| `docker login` are re-used. 
 | |
| 
 | |
| The configuration file is processed in the order `kernel`, `init`, `onboot`, `onshutdown`,
 | |
| `services`, `files`. Each section adds files to the root file system. Sections may be omitted.
 | |
| 
 | |
| Each container that is specified is allocated a unique `uid` and `gid` that it may use if it
 | |
| wishes to run as an isolated user (or user namespace). Anywhere you specify a `uid` or `gid`
 | |
| field you specify either the numeric id, or if you use a name it will refer to the id allocated
 | |
| to the container with that name.
 | |
| 
 | |
| ```
 | |
| services:
 | |
|   - name: redis
 | |
|     image: redis:latest
 | |
|     uid: redis
 | |
|     gid: redis
 | |
|     binds:
 | |
|      - /etc/redis:/etc/redis
 | |
| files:
 | |
|   - path: /etc/redis/redis.conf
 | |
|     contents: "..."
 | |
|     uid: redis
 | |
|     gid: redis
 | |
|     mode: "0600"
 | |
| ```
 | |
| 
 | |
| ## `kernel`
 | |
| 
 | |
| The `kernel` section is only required if booting a VM. The files will be put into the `boot/`
 | |
| directory, where they are used to build bootable images.
 | |
| 
 | |
| The `kernel` section defines the kernel configuration. The `image` field specifies the Docker image,
 | |
| which should contain a `kernel` file that will be booted (eg a `bzImage` for `amd64`) and a file
 | |
| called `kernel.tar` which is a tarball that is unpacked into the root, which should usually
 | |
| contain a kernel modules directory. `cmdline` specifies the kernel command line options if required.
 | |
| 
 | |
| To override the names, you can specify the kernel image name with `binary: bzImage` and the tar image
 | |
| with `tar: kernel.tar` or the empty string or `none` if you do not want to use a tarball at all.
 | |
| 
 | |
| Kernel packages may also contain a cpio archive containing CPU microcode which needs prepending to
 | |
| the initrd. To select this option, recommended when booting on bare metal, add `ucode: intel-ucode.cpio`
 | |
| to the kernel section.
 | |
| 
 | |
| ## `init`
 | |
| 
 | |
| The `init` section is a list of images that are used for the `init` system and are unpacked directly
 | |
| into the root filesystem. This should bring up `containerd`, start the system and daemon containers,
 | |
| and set up basic filesystem mounts. in the case of a LinuxKit system. For ease of
 | |
| modification `runc` and `containerd` images, which just contain these programs are added here
 | |
| rather than bundled into the `init` container.
 | |
| 
 | |
| ## `onboot`
 | |
| 
 | |
| The `onboot` section is a list of images. These images are run before any other
 | |
| images. They are run sequentially and each must exit before the next one is run.
 | |
| These images can be used to configure one shot settings. See [Image
 | |
| specification](#image-specification) for a list of supported fields.
 | |
| 
 | |
| ## `onshutdown`
 | |
| 
 | |
| This is a list of images to run on a clean shutdown. Note that you must not rely on these
 | |
| being run at all, as machines may be be powered off or shut down without having time to run
 | |
| these scripts. If you add anything here you should test both in the case where they are
 | |
| run and when they are not. Most systems are likely to be "crash only" and not have any setup here,
 | |
| but you can attempt to deregister cleanly from a network service here, rather than relying
 | |
| on timeouts, for example.
 | |
| 
 | |
| ## `services`
 | |
| 
 | |
| The `services` section is a list of images for long running services which are
 | |
| run with `containerd`.  Startup order is undefined, so containers should wait
 | |
| on any resources, such as networking, that they need.  See [Image
 | |
| specification](#image-specification) for a list of supported fields.
 | |
| 
 | |
| ## `files`
 | |
| 
 | |
| The files section can be used to add files inline in the config, or from an external file.
 | |
| 
 | |
| ```
 | |
| files:
 | |
|   - path: dir
 | |
|     directory: true
 | |
|     mode: "0777"
 | |
|   - path: dir/name1
 | |
|     source: "/some/path/on/local/filesystem"
 | |
|     mode: "0666"
 | |
|   - path: dir/name2
 | |
|     source: "/some/path/that/it/is/ok/to/omit"
 | |
|     optional: true
 | |
|     mode: "0666"
 | |
|   - path: dir/name3
 | |
|     contents: "orange"
 | |
|     mode: "0644"
 | |
|     uid: 100
 | |
|     gid: 100
 | |
| ```
 | |
| 
 | |
| Specifying the `mode` is optional, and will default to `0600`. Leading directories will be
 | |
| created if not specified. You can use `~/path` in `source` to specify a path in the build
 | |
| user's home directory.
 | |
| 
 | |
| In addition there is a `metadata` option that will generate the file. Currently the only value
 | |
| supported here is `"yaml"` which will output the yaml used to generate the image into the specified
 | |
| file:
 | |
| ```
 | |
|   - path: etc/linuxkit.yml
 | |
|     metadata: yaml
 | |
| ```
 | |
| 
 | |
| Because a `tmpfs` is mounted onto `/var`, `/run`, and `/tmp` by default, the `tmpfs` mounts will shadow anything specified in `files` section for those directories.
 | |
| 
 | |
| ## `trust`
 | |
| 
 | |
| The `trust` section specifies which build components are to be cryptographically verified with
 | |
| [Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/) prior to pulling.
 | |
| Trust is a central concern in any build system, and LinuxKit's is no exception: Docker Content Trust provides authenticity,
 | |
| integrity, and freshness guarantees for the components it verifies.  The LinuxKit maintainers are responsible for signing
 | |
| `linuxkit` components, though collaborators can sign their own images with Docker Content Trust or [Notary](https://github.com/docker/notary).
 | |
| 
 | |
| - `image` lists which individual images to enforce pulling with Docker Content Trust.
 | |
| The image name may include tag or digest, but the matching also succeeds if the base image name is the same.
 | |
| - `org` lists which organizations for which Docker Content Trust is to be enforced across all images,
 | |
| for example `linuxkit` is the org for `linuxkit/kernel`
 | |
| 
 | |
| ## Image specification
 | |
| 
 | |
| Entries in the `onboot` and `services` sections specify an OCI image and
 | |
| options. Default values may be specified using the `org.mobyproject.config` image label.
 | |
| For more details see the [OCI specification](https://github.com/opencontainers/runtime-spec/blob/master/spec.md).
 | |
| 
 | |
| If the `org.mobylinux.config` label is set in the image, that specifies default values for these fields if they
 | |
| are not set in the yaml file. You can override the label by setting the value, or setting it to be empty to remove
 | |
| the specification for that value in the label.
 | |
| 
 | |
| If you need an OCI option that is not specified here please open an issue or pull request as the list is not yet
 | |
| complete.
 | |
| 
 | |
| By default the containers will be run in the host `net`, `ipc` and `uts` namespaces, as that is the usual requirement;
 | |
| in many ways they behave like pods in Kubernetes. Mount points must already exist, as must a file or directory being
 | |
| bind mounted into a container.
 | |
| 
 | |
| - `name` a unique name for the program being executed, used as the `containerd` id.
 | |
| - `image` the Docker image to use for the root filesystem. The default command, path and environment are
 | |
|   extracted from this so they need not be filled in.
 | |
| - `capabilities` the Linux capabilities required, for example `CAP_SYS_ADMIN`. If there is a single
 | |
|   capability `all` then all capabilities are added.
 | |
| - `ambient` the Linux ambient capabilities (capabilities passed to non root users) that are required.
 | |
| - `mounts` is the full form for specifying a mount, which requires `type`, `source`, `destination`
 | |
|   and a list of `options`. If any fields are omitted, sensible defaults are used if possible, for example
 | |
|   if the `type` is `dev` it is assumed you want to mount at `/dev`. The default mounts and their options
 | |
|   can be replaced by specifying a mount with new options here at the same mount point.
 | |
| - `binds` is a simpler interface to specify bind mounts, accepting a string like `/src:/dest:opt1,opt2`
 | |
|   similar to the `-v` option for bind mounts in Docker.
 | |
| - `tmpfs` is a simpler interface to mount a `tmpfs`, like `--tmpfs` in Docker, taking `/dest:opt1,opt2`.
 | |
| - `command` will override the command and entrypoint in the image with a new list of commands.
 | |
| - `env` will override the environment in the image with a new environment list. Specify variables as `VAR=value`.
 | |
| - `cwd` will set the working directory, defaults to `/`.
 | |
| - `net` sets the network namespace, either to a path, or if `none` or `new` is specified it will use a new namespace.
 | |
| - `ipc` sets the ipc namespace, either to a path, or if `new` is specified it will use a new namespace.
 | |
| - `uts` sets the uts namespace, either to a path, or if `new` is specified it will use a new namespace.
 | |
| - `pid` sets the pid namespace, either to a path, or if `host` is specified it will use the host namespace.
 | |
| - `readonly` sets the root filesystem to read only, and changes the other default filesystems to read only.
 | |
| - `maskedPaths` sets paths which should be hidden.
 | |
| - `readonlyPaths` sets paths to read only.
 | |
| - `uid` sets the user id of the process.
 | |
| - `gid` sets the group id of the process.
 | |
| - `additionalGids` sets a list of additional groups for the process.
 | |
| - `noNewPrivileges` is `true` means no additional capabilities can be acquired and `suid` binaries do not work.
 | |
| - `hostname` sets the hostname inside the image.
 | |
| - `oomScoreAdj` changes the OOM score.
 | |
| - `rootfsPropagation` sets the rootfs propagation, eg `shared`, `slave` or (default) `private`.
 | |
| - `cgroupsPath` sets the path for cgroups.
 | |
| - `resources` sets cgroup resource limits as per the OCI spec.
 | |
| - `sysctl` sets a map of `sysctl` key value pairs that are set inside the container namespace.
 | |
| - `rmlimits` sets a list of `rlimit` values in the form `name,soft,hard`, eg `nofile,100,200`. You can use `unlimited` as a value too.
 | |
| - `annotations` sets a map of key value pairs as OCI metadata.
 | |
| 
 | |
| There are experimental `userns`, `uidMappings` and `gidMappings` options for user namespaces but these are not yet supported, and may have
 | |
| permissions issues in use.
 | |
| 
 | |
| In addition to the parts of the specification above used to generate the OCI spec, there is a `runtime` section in the image specification
 | |
| which specifies some actions to take place when the container is being started.
 | |
| - `cgroups` takes a list of cgroups that will be created before the container is run.
 | |
| - `mounts` takes a list of mount specifications (`source`, `destination`, `type`, `options`) and mounts them in the root namespace before the container is created. It will
 | |
|   try to make any missing destination directories.
 | |
| - `mkdir` takes a list of directories to create at runtime, in the root mount namespace. These are created before the container is started, so they can be used to create
 | |
|   directories for bind mounts, for example in `/tmp` or `/run` which would otherwise be empty.
 | |
| - `interface` defines a list of actions to perform on a network interface:
 | |
|   - `name` specifies the name of an interface. An existing interface with this name will be moved into the container's network namespace.
 | |
|   - `add` specifies a type of interface to be created in the containers namespace, with the specified name.
 | |
|   - `createInRoot` is a boolean which specifes that the interface being `add`ed should be created in the root namespace first, then moved. This is needed for `wireguard` interfaces.
 | |
|   - `peer` specifies the name of the other end when creating a `veth` interface. This end will remain in the root namespace, where it can be attached to a bridge. Specifying this implies `add: veth`.
 | |
| - `bindNS` specifies a namespace type and a path where the namespace from the container being created will be bound. This allows a namespace to be set up in an `onboot` container, and then
 | |
|   using `net: path` for a `service` container to use that network namespace later.
 | |
| - `namespace` overrides the LinuxKit default containerd namespace to put the container in; only applicable to services.
 | |
| 
 | |
| An example of using the `runtime` config to configure a network namespace with `wireguard` and then run `nginx` in that namespace is shown below:
 | |
| ```
 | |
| onboot:
 | |
|   - name: dhcpcd
 | |
|     image: linuxkit/dhcpcd:<hash>
 | |
|     command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
 | |
|   - name: wg
 | |
|     image: linuxkit/ip:<hash>
 | |
|     net: new
 | |
|     binds:
 | |
|       - /etc/wireguard:/etc/wireguard
 | |
|     command: ["sh", "-c", "ip link set dev wg0 up; ip address add dev wg0 192.168.2.1 peer 192.168.2.2; wg setconf wg0 /etc/wireguard/wg0.conf; wg show wg0"]
 | |
|     runtime:
 | |
|       interfaces:
 | |
|         - name: wg0
 | |
|           add: wireguard
 | |
|           createInRoot: true
 | |
|       bindNS:
 | |
|         net: /run/netns/wg
 | |
| services:
 | |
|   - name: nginx
 | |
|     image: nginx:alpine
 | |
|     net: /run/netns/wg
 | |
|     capabilities:
 | |
|      - CAP_NET_BIND_SERVICE
 | |
|      - CAP_CHOWN
 | |
|      - CAP_SETUID
 | |
|      - CAP_SETGID
 | |
|      - CAP_DAC_OVERRIDE
 | |
| ```
 | |
| 
 | |
| 
 | |
| ### Mount Options
 | |
| When mounting filesystem paths into a container - whether as part of `onboot` or `services` - there are several options of which you need to be aware. Using them properly is necessary for your containers to function properly.
 | |
| 
 | |
| For most containers - e.g. nginx or even docker - these options are not needed. Simply doing the following will work fine:
 | |
| 
 | |
| ```yml
 | |
| binds:
 | |
|  - /var:/some/var/path
 | |
| ```
 | |
| 
 | |
| Please note that `binds` doesn't **add** the mount points, but **replaces** them.
 | |
| You can examine the `Dockerfile` of the component (in particular, `binds` value of
 | |
| `org.mobyproject.config` label) to get the list of the existing binds.
 | |
| 
 | |
| However, in some circumstances you will need additional options. These options are used primarily if you intend to make changes to mount points _from within your container_ that should be visible from outside the container, e.g., if you intend to mount an external disk from inside the container but have it be visible outside.
 | |
| 
 | |
| In order for new mounts from within a container to be propagated, you must set the following on the container:
 | |
| 
 | |
| 1. `rootfsPropagation: shared`
 | |
| 2. The mount point into the container below which new mounts are to occur must be `rshared,rbind`. In practice, this is `/var` (or some subdir of `/var`), since that is the only true read-write area of the filesystem where you will mount things.
 | |
| 
 | |
| Thus, if you have a regular container that is only reading and writing, go ahead and do:
 | |
| 
 | |
| ```yml
 | |
| binds:
 | |
|  - /var:/some/var/path
 | |
| ```
 | |
| 
 | |
| On the other hand, if you have a container that will make new mounts that you wish to be visible outside the container, do:
 | |
| 
 | |
| ```yml
 | |
| binds:
 | |
|  - /var:/var:rshared,rbind
 | |
| rootfsPropagation: shared
 | |
| ```
 |