Merge moby/tool into LinuxKit

Note these ended up with unrelated histories in the export process. Signed-off-by: Justin Cormack <justin@specialbusservice.com>
2025-09-01 15:08:33 +00:00 · 2018-07-14 10:49:46 +01:00
parent 3e4a5342b2 b807994372
commit 021b5718f8
17 changed files with 4057 additions and 0 deletions
--- a/docs/privateimages.md
+++ b/docs/privateimages.md
@@ -0,0 +1,13 @@
+## Private Images
+When building, `moby` downloads any OCI images referenced in any section and, optionally, checks their Notary signatures.
+
+As of this writing, `moby` does **not** have the ability to download these images from registries that require credentials to access. This is equally true for private images on public registries, like https://hub.docker.com, as for private registries.
+
+We are working on enabling private images with credentials. Until that feature is added, you can follow these steps to build a moby image using OCI images
+that require credentials to access:
+
+1. `docker login` as relevant to authenticate against the desired registry.
+2. `docker pull` to download the images to your local machine where you will run `moby build`.
+3. Run `moby build` (or `linuxkit build`).
+
+Additionally, ensure that you do **not** have trust enabled for those images. See the section on [trust](#trust) in this document. Alternately, you can run `moby build` or `linuxkit build` with `--disable-trust`.
--- a/docs/yaml.md
+++ b/docs/yaml.md
@@ -0,0 +1,273 @@
+# Configuration Reference
+
+The `moby` tool assembles a set of containerised components into an image. The simplest
+type of image is just a `tar` file of the contents (useful for debugging) but more useful
+outputs add a `Dockerfile` to build a container, or build a full disk image that can be
+booted as a LinuxKit VM. The main use case is to build an assembly that includes
+`containerd` to run a set of containers, but the tooling is very generic.
+
+The yaml configuration specifies the components used to build up an image. All components
+are downloaded at build time to create an image. The image is self-contained and immutable,
+so it can be tested reliably for continuous delivery.
+
+The configuration file is processed in the order `kernel`, `init`, `onboot`, `onshutdown`,
+`services`, `files`. Each section adds files to the root file system. Sections may be omitted.
+
+Each container that is specified is allocated a unique `uid` and `gid` that it may use if it
+wishes to run as an isolated user (or user namespace). Anywhere you specify a `uid` or `gid`
+field, you can give either a numeric id or a name; a name refers to the id allocated to the
+container with that name.
+
+```
+services:
+  - name: redis
+    image: redis:latest
+    uid: redis
+    gid: redis
+    binds:
+     - /etc/redis:/etc/redis
+files:
+  - path: /etc/redis/redis.conf
+    contents: "..."
+    uid: redis
+    gid: redis
+    mode: "0600"
+```
+
+## `kernel`
+
+The `kernel` section is only required if booting a VM. The files will be put into the `boot/`
+directory, where they are used to build bootable images.
+
+The `kernel` section defines the kernel configuration. The `image` field specifies the Docker image,
+which should contain a `kernel` file that will be booted (eg a `bzImage` for `amd64`) and a file
+called `kernel.tar` which is a tarball that is unpacked into the root, which should usually
+contain a kernel modules directory. `cmdline` specifies the kernel command line options if required.
+
+To override the names, you can specify the kernel image name with `binary: bzImage` and the tarball
+name with `tar: kernel.tar`. Set `tar` to the empty string or `none` if you do not want to use a tarball at all.
+
+Kernel packages may also contain a cpio archive containing CPU microcode which needs prepending to
+the initrd. To select this option, recommended when booting on bare metal, add `ucode: intel-ucode.cpio`
+to the kernel section.
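+
+As a rough illustration, a `kernel` section combining the fields described above might look like the
+following. The `<version>` tag and the `cmdline` value are placeholders chosen for this sketch, not
+values taken from this document.
+
+```yml
+kernel:
+  # image must contain the kernel file and, by default, kernel.tar
+  image: linuxkit/kernel:<version>
+  # kernel command line (placeholder value)
+  cmdline: "console=ttyS0"
+  # optional: prepend CPU microcode to the initrd (recommended on bare metal)
+  ucode: intel-ucode.cpio
+```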
+
+## `init`
+
+The `init` section is a list of images that are used for the `init` system and are unpacked directly
+into the root filesystem. In the case of a LinuxKit system, this should bring up `containerd`, start
+the system and daemon containers, and set up basic filesystem mounts. For ease of modification, the
+`runc` and `containerd` images, which contain just these programs, are added here rather than bundled
+into the `init` container.
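+
+A minimal sketch of an `init` section for a LinuxKit system is shown below. The image names are
+assumed here for illustration, and `<hash>` is a placeholder for the image tag you actually use.
+
+```yml
+init:
+  # each image is unpacked directly into the root filesystem
+  - linuxkit/init:<hash>
+  - linuxkit/runc:<hash>
+  - linuxkit/containerd:<hash>
+```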
+
+## `onboot`
+
+The `onboot` section is a list of images. These images are run before any other
+images. They are run sequentially and each must exit before the next one is run.
+These images can be used to configure one shot settings. See [Image
+specification](#image-specification) for a list of supported fields.
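+
+For instance, an `onboot` section with two one-shot containers could be sketched as follows. The
+`sysctl` entry is an assumed example image and `<hash>` is a placeholder; the second entry only
+starts once the first has exited.
+
+```yml
+onboot:
+  # runs first and must exit before the next entry starts
+  - name: sysctl
+    image: linuxkit/sysctl:<hash>
+  # runs second; "-1" makes dhcpcd exit after obtaining one lease
+  - name: dhcpcd
+    image: linuxkit/dhcpcd:<hash>
+    command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
+```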
+
+## `onshutdown`
+
+This is a list of images to run on a clean shutdown. Note that you must not rely on these
+being run at all, as machines may be powered off or shut down without having time to run
+these scripts. If you add anything here you should test both in the case where they are
+run and when they are not. Most systems are likely to be "crash only" and not have any setup here,
+but you can attempt to deregister cleanly from a network service here, rather than relying
+on timeouts, for example.
+
+## `services`
+
+The `services` section is a list of images for long running services which are
+run with `containerd`.  Startup order is undefined, so containers should wait
+on any resources, such as networking, that they need.  See [Image
+specification](#image-specification) for a list of supported fields.
+
+## `files`
+
+The `files` section can be used to add files inline in the config, or from an external file.
+
+```
+files:
+  - path: dir
+    directory: true
+    mode: "0777"
+  - path: dir/name1
+    source: "/some/path/on/local/filesystem"
+    mode: "0666"
+  - path: dir/name2
+    source: "/some/path/that/it/is/ok/to/omit"
+    optional: true
+    mode: "0666"
+  - path: dir/name3
+    contents: "orange"
+    mode: "0644"
+    uid: 100
+    gid: 100
+```
+
+Specifying the `mode` is optional, and will default to `0600`. Leading directories will be
+created if not specified. You can use `~/path` in `source` to specify a path in the build
+user's home directory.
+
+In addition there is a `metadata` option that will generate the file. Currently the only value
+supported here is `"yaml"` which will output the yaml used to generate the image into the specified
+file:
+```
+  - path: etc/linuxkit.yml
+    metadata: yaml
+```
+
+Because a `tmpfs` is mounted onto `/var`, `/run`, and `/tmp` by default, the `tmpfs` mounts will shadow anything specified in the `files` section for those directories.
+
+## `trust`
+
+The `trust` section specifies which build components are to be cryptographically verified with
+[Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/) prior to pulling.
+Trust is a central concern in any build system, and LinuxKit's is no exception: Docker Content Trust provides authenticity,
+integrity, and freshness guarantees for the components it verifies.  The LinuxKit maintainers are responsible for signing
+`linuxkit` components, though collaborators can sign their own images with Docker Content Trust or [Notary](https://github.com/docker/notary).
+
+- `image` lists the individual images for which pulling with Docker Content Trust is enforced.
+The image name may include a tag or digest, but the match also succeeds if only the base image name is the same.
+- `org` lists the organizations for which Docker Content Trust is enforced across all images;
+for example, `linuxkit` is the org for `linuxkit/kernel`.
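+
+A `trust` section using both fields might look like the sketch below. The `example/app` image name
+is hypothetical and stands in for any image you want verified individually.
+
+```yml
+trust:
+  # enforce Docker Content Trust for every image in these organizations
+  org:
+    - linuxkit
+  # enforce Docker Content Trust for these individual images (hypothetical name)
+  image:
+    - example/app:latest
+```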
+
+## Image specification
+
+Entries in the `onboot` and `services` sections specify an OCI image and
+options. Default values may be specified using the `org.mobyproject.config` image label.
+For more details see the [OCI specification](https://github.com/opencontainers/runtime-spec/blob/master/spec.md).
+
+If the `org.mobyproject.config` label is set in the image, it specifies default values for these fields if they
+are not set in the yaml file. You can override a value from the label by setting it in the yaml file, or set it to be empty to remove
+the value specified in the label.
+
+If you need an OCI option that is not specified here please open an issue or pull request as the list is not yet
+complete.
+
+By default the containers will be run in the host `net`, `ipc` and `uts` namespaces, as that is the usual requirement;
+in many ways they behave like pods in Kubernetes. Mount points must already exist, as must a file or directory being
+bind mounted into a container.
+
+- `name` a unique name for the program being executed, used as the `containerd` id.
+- `image` the Docker image to use for the root filesystem. The default command, path and environment are
+  extracted from this so they need not be filled in.
+- `capabilities` the Linux capabilities required, for example `CAP_SYS_ADMIN`. If there is a single
+  capability `all` then all capabilities are added.
+- `ambient` the Linux ambient capabilities (capabilities passed to non root users) that are required.
+- `mounts` is the full form for specifying a mount, which requires `type`, `source`, `destination`
+  and a list of `options`. If any fields are omitted, sensible defaults are used if possible, for example
+  if the `type` is `dev` it is assumed you want to mount at `/dev`. The default mounts and their options
+  can be replaced by specifying a mount with new options here at the same mount point.
+- `binds` is a simpler interface to specify bind mounts, accepting a string like `/src:/dest:opt1,opt2`
+  similar to the `-v` option for bind mounts in Docker.
+- `tmpfs` is a simpler interface to mount a `tmpfs`, like `--tmpfs` in Docker, taking `/dest:opt1,opt2`.
+- `command` will override the command and entrypoint in the image with a new list of commands.
+- `env` will override the environment in the image with a new environment list. Specify variables as `VAR=value`.
+- `cwd` will set the working directory, defaults to `/`.
+- `net` sets the network namespace, either to a path, or if `none` or `new` is specified it will use a new namespace.
+- `ipc` sets the ipc namespace, either to a path, or if `new` is specified it will use a new namespace.
+- `uts` sets the uts namespace, either to a path, or if `new` is specified it will use a new namespace.
+- `pid` sets the pid namespace, either to a path, or if `host` is specified it will use the host namespace.
+- `readonly` sets the root filesystem to read only, and changes the other default filesystems to read only.
+- `maskedPaths` sets paths which should be hidden.
+- `readonlyPaths` sets paths to read only.
+- `uid` sets the user id of the process.
+- `gid` sets the group id of the process.
+- `additionalGids` sets a list of additional groups for the process.
+- `noNewPrivileges`, if `true`, means no additional capabilities can be acquired and `suid` binaries do not work.
+- `hostname` sets the hostname inside the image.
+- `oomScoreAdj` changes the OOM score.
+- `rootfsPropagation` sets the rootfs propagation, eg `shared`, `slave` or (default) `private`.
+- `cgroupsPath` sets the path for cgroups.
+- `resources` sets cgroup resource limits as per the OCI spec.
+- `sysctl` sets a map of `sysctl` key value pairs that are set inside the container namespace.
+- `rlimits` sets a list of `rlimit` values in the form `name,soft,hard`, eg `nofile,100,200`. You can use `unlimited` as a value too.
+- `annotations` sets a map of key value pairs as OCI metadata.
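+
+To show how several of these fields combine, here is a sketch of a single `services` entry. The
+image name, paths, environment variable and sysctl value are all hypothetical; only the field names
+come from the list above.
+
+```yml
+services:
+  - name: app
+    image: example/app:latest        # hypothetical image
+    capabilities:
+      - CAP_NET_BIND_SERVICE
+    binds:
+      - /etc/app:/etc/app:ro         # /src:/dest:options
+    tmpfs:
+      - /scratch:rw                  # /dest:options
+    env:
+      - LOG_LEVEL=info               # VAR=value
+    cwd: /srv
+    net: new                         # use a new network namespace
+    noNewPrivileges: true
+    sysctl:
+      net.core.somaxconn: "1024"
+```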
+
+There are experimental `userns`, `uidMappings` and `gidMappings` options for user namespaces but these are not yet supported, and may have
+permissions issues in use.
+
+In addition to the parts of the specification above used to generate the OCI spec, there is a `runtime` section in the image specification
+which specifies some actions to take place when the container is being started.
+- `cgroups` takes a list of cgroups that will be created before the container is run.
+- `mounts` takes a list of mount specifications (`source`, `destination`, `type`, `options`) and mounts them in the root namespace before the container is created. It will
+  try to make any missing destination directories.
+- `mkdir` takes a list of directories to create at runtime, in the root mount namespace. These are created before the container is started, so they can be used to create
+  directories for bind mounts, for example in `/tmp` or `/run` which would otherwise be empty.
+- `interfaces` defines a list of actions to perform on a network interface:
+  - `name` specifies the name of an interface. An existing interface with this name will be moved into the container's network namespace.
+  - `add` specifies a type of interface to be created in the container's namespace, with the specified name.
+  - `createInRoot` is a boolean which specifies that the interface being `add`ed should be created in the root namespace first, then moved. This is needed for `wireguard` interfaces.
+  - `peer` specifies the name of the other end when creating a `veth` interface. This end will remain in the root namespace, where it can be attached to a bridge. Specifying this implies `add: veth`.
+- `bindNS` specifies a namespace type and a path where the namespace from the container being created will be bound. This allows a namespace to be set up in an `onboot` container, so that
+  a `services` container can later use `net: path` to join that network namespace.
+- `namespace` overrides the LinuxKit default containerd namespace to put the container in; only applicable to services.
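+
+As a small sketch of the `mkdir` option, the fragment below creates a directory under `/run` before
+the container starts so that it can be bind mounted. The image name and the `/run/app` path are
+hypothetical.
+
+```yml
+services:
+  - name: app
+    image: example/app:latest   # hypothetical image
+    binds:
+      - /run/app:/run/app       # the source must exist before the container starts
+    runtime:
+      mkdir: ["/run/app"]       # created in the root mount namespace at runtime
+```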
+
+An example of using the `runtime` config to configure a network namespace with `wireguard` and then run `nginx` in that namespace is shown below:
+```
+onboot:
+  - name: dhcpcd
+    image: linuxkit/dhcpcd:<hash>
+    command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
+  - name: wg
+    image: linuxkit/ip:<hash>
+    net: new
+    binds:
+      - /etc/wireguard:/etc/wireguard
+    command: ["sh", "-c", "ip link set dev wg0 up; ip address add dev wg0 192.168.2.1 peer 192.168.2.2; wg setconf wg0 /etc/wireguard/wg0.conf; wg show wg0"]
+    runtime:
+      interfaces:
+        - name: wg0
+          add: wireguard
+          createInRoot: true
+      bindNS:
+        net: /run/netns/wg
+services:
+  - name: nginx
+    image: nginx:alpine
+    net: /run/netns/wg
+    capabilities:
+     - CAP_NET_BIND_SERVICE
+     - CAP_CHOWN
+     - CAP_SETUID
+     - CAP_SETGID
+     - CAP_DAC_OVERRIDE
+```
+
+
+### Mount Options
+When mounting filesystem paths into a container, whether as part of `onboot` or `services`, there are several options of which you need to be aware. Setting them correctly is necessary for your containers to function properly.
+
+For most containers - e.g. nginx or even docker - these options are not needed. Simply doing the following will work fine:
+
+```yml
+binds:
+ - /var:/some/var/path
+```
+
+Please note that `binds` doesn't **add** the mount points, but **replaces** them.
+You can examine the `Dockerfile` of the component (in particular, `binds` value of
+`org.mobyproject.config` label) to get the list of the existing binds.
+
+However, in some circumstances you will need additional options. These options are used primarily if you intend to make changes to mount points _from within your container_ that should be visible from outside the container, e.g., if you intend to mount an external disk from inside the container but have it be visible outside.
+
+In order for new mounts from within a container to be propagated, you must set the following on the container:
+
+1. `rootfsPropagation: shared`
+2. The mount point into the container below which new mounts are to occur must be `rshared,rbind`. In practice, this is `/var` (or some subdir of `/var`), since that is the only true read-write area of the filesystem where you will mount things.
+
+Thus, if you have a regular container that is only reading and writing, go ahead and do:
+
+```yml
+binds:
+ - /var:/some/var/path
+```
+
+On the other hand, if you have a container that will make new mounts that you wish to be visible outside the container, do:
+
+```yml
+binds:
+ - /var:/var:rshared,rbind
+rootfsPropagation: shared
+```