In certain cases the container image is already in the local docker
registry, but with the wrong architecture. In this case just pretend
it is not there and let the caller decide if they want to build it.
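A minimal sketch of that lookup rule, for illustration only. The
`localImage` type and the `findLocal` function are hypothetical names,
not the actual LinuxKit code:

```go
package main

import (
	"errors"
	"fmt"
)

// localImage is a hypothetical record of an image in the local store.
type localImage struct {
	Ref  string // e.g. "linuxkit/init:abc123"
	Arch string // e.g. "amd64"
}

// errNotFound is returned both when the image is missing and when it is
// present only for a different architecture.
var errNotFound = errors.New("image not found locally")

// findLocal returns the image only if both the reference and the
// architecture match; a wrong-architecture image is treated as absent.
func findLocal(store []localImage, ref, arch string) (localImage, error) {
	for _, img := range store {
		if img.Ref == ref && img.Arch == arch {
			return img, nil
		}
	}
	return localImage{}, errNotFound
}

func main() {
	store := []localImage{{Ref: "linuxkit/init:abc123", Arch: "arm64"}}
	if _, err := findLocal(store, "linuxkit/init:abc123", "amd64"); err != nil {
		// The caller decides what to do next, for example build the image.
		fmt.Println("not usable locally:", err)
	}
}
```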
Signed-off-by: Christoph Ostarek <christoph@zededa.com>
according to the documentation the following command is valid:
`linuxkit build equinixmetal.yml equinixmetal.arm64.yml`
(docs/platform-equinixmetal.md)
So, make it valid.
Signed-off-by: Christoph Ostarek <christoph@zededa.com>
Update `ReferenceExpand` to support image references from remote
registries. This fixes local image lookup and pulling with newer
versions of Docker.
Fixes #4045
Signed-off-by: Jameel Al-Aziz <jameel@bastion.io>
cgroups v2 has been out since 2015. Making it the default improves the
user experience: users no longer have to set a kernel parameter when
services in a build require cgroups v2. Making this the default was
discussed back in 2021.
Signed-off-by: Jacob Weinstock <jakobweinstock@gmail.com>
Before this change, a command like
linuxkit cache pull 127.0.0.1:5000/pkgalpine
would try to pull the following image:
docker.io/127.0.0.1:5000/pkgalpine
which is wrong.
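A rough sketch of the intended behaviour, assuming the usual Docker
convention that the first path component is a registry host when it
contains a dot or a port, or is "localhost". The `hasRegistryHost` and
`expand` helpers are hypothetical and are not the code in this change:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultRegistry is prepended only when the reference names no registry.
const defaultRegistry = "docker.io"

// hasRegistryHost reports whether the first path component of ref looks
// like a registry host: it contains a dot or a port, or is "localhost".
func hasRegistryHost(ref string) bool {
	first, _, found := strings.Cut(ref, "/")
	if !found {
		return false // a single component such as "alpine" is never a host
	}
	return strings.ContainsAny(first, ".:") || first == "localhost"
}

// expand adds the default registry only to references without a host.
func expand(ref string) string {
	if hasRegistryHost(ref) {
		return ref
	}
	if !strings.Contains(ref, "/") {
		ref = "library/" + ref // official images live under "library/"
	}
	return defaultRegistry + "/" + ref
}

func main() {
	for _, ref := range []string{"127.0.0.1:5000/pkgalpine", "linuxkit/alpine", "alpine"} {
		fmt.Printf("%s -> %s\n", ref, expand(ref))
	}
}
```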
Signed-off-by: Christoph Ostarek <christoph@zededa.com>
Arguably the long term fix is to introduce a check for links in the
documentation with tools like markdown-link-check.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
* Use latest kernel in linuxkit
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
* Parallelize kernel source compression
This surprisingly saves a lot of time:
M1: from 340 to 90 seconds
Intel: from 527 to 222 seconds (2 cores 4 threads)
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
* Add buildx target
buildx can use remote builders and automatically generate the multiarch manifest.
A properly configured builder is required:
First create a docker context for the remote builders:
$ docker context create node-<arch> --docker "host=ssh://<user>@<host>"
Then create a buildx configuration using the remote builders:
$ docker buildx create --name kernel_builder --platform linux/amd64
$ docker buildx create --name kernel_builder --node node-arm64 --platform linux/arm64 --append
$ docker buildx use kernel_builder
$ docker buildx ls
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
* Add a PLATFORMS variable to declare platforms needed for buildx
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
* Make image name customizable
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
* Do not use the architecture suffix in tags for images built with buildx
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
* Add make kconfigx to upgrade configs using buildx
To update the configuration for 5.10 kernels use:
make -C kernel KERNEL_VERSIONS=5.10.104 kconfigx
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
---------
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
This allows SBOM tools to look at /lib/apk/db/installed to determine
which package versions are included in the container. This should
probably be applied across all of the linuxkit containers.
Signed-off-by: eriknordmark <erik@zededa.com>
Enables support for the C version of virtiofs.
A qemu option allows specifying the virtiofsd path.
config.StatePath is used for storing the virtiofs sockets.
Note that virtiofsd must be started as root.
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
bpfilter is not meant to be used at all at this point. Only the module's
boilerplate is available on upstream kernels.
Signed-off-by: Quentin Deslandes <qde@naccy.de>
* simplify test/pkg/Makefile
Signed-off-by: Avi Deitcher <avi@deitcher.net>
* ensure pkg and test/pkg built before downstream workflows in CI
Signed-off-by: Avi Deitcher <avi@deitcher.net>
Signed-off-by: Avi Deitcher <avi@deitcher.net>
* sync logwrite with memlogd
Signed-off-by: Avi Deitcher <avi@deitcher.net>
* update linuxkit/logwrite and linuxkit/memlogd dependencies
Signed-off-by: Avi Deitcher <avi@deitcher.net>
Signed-off-by: Avi Deitcher <avi@deitcher.net>
Seems buildkit breaks API compatibility with previous OCI implementation
in new RC release, let's update it
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
* Fix return code of rungetty.sh
If INITGETTY is defined, we return exit code 1, which is not
expected.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
* Update getty sha
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
* restore package cache in LinuxKit Build Tests
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
* Update of buildkit to the last version
Commit contains the version of buildkit from output of
`go list -m -json github.com/moby/buildkit@c0ac5e8b9b51603c5a93795fcf1373d6d44d3a85`:
go get -u github.com/moby/buildkit@v0.11.0-rc1.0.20221213132957-c0ac5e8b9b51
go mod tidy
go mod vendor
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
* Fix handling of platform flag
When 'FROM --platform' is defined, I see 'ERROR: no match for
platform in manifest: not found'. The problem was fixed on the buildkit side.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
It seems we should not use our own credential extraction logic, as it
needs to be aligned with the resolver internally to select the correct
credentials for the host we push the manifest to. For example, we may
want to push a manifest to ghcr.io, and in that case we hit errors
because we extract credentials for docker.io instead.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We use a dedicated docker container as the builder, and the only way to
clean the data inside it is to re-create it. Let's add disk usage and
clean commands for the builder.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We cannot build for another arch after building for one arch, because
skipBuild is set to true as soon as one arch is found. In other words,
"linuxkit pkg build --platforms linux/riscv64,linux/amd64 ..." after
"linuxkit pkg build --platforms linux/amd64 ..." will not build for
linux/riscv64, which is not expected.
In general, when we check for available images and find only some of
the platforms, we do not want to rebuild all of them. So this PR adds a
platformsToBuild slice, which we fill with the platforms we still need
to build for.
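The idea can be sketched as a simple set difference. This is only an
illustration: the function signature below is an assumption, and only
the name platformsToBuild comes from this change:

```go
package main

import "fmt"

// platformsToBuild returns the requested platforms that are not yet
// available, preserving the requested order.
func platformsToBuild(requested, available []string) []string {
	have := make(map[string]bool, len(available))
	for _, p := range available {
		have[p] = true
	}
	var missing []string
	for _, p := range requested {
		if !have[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	requested := []string{"linux/riscv64", "linux/amd64"}
	available := []string{"linux/amd64"} // left over from an earlier build
	fmt.Println(platformsToBuild(requested, available))
}
```

With the inputs above only linux/riscv64 is left to build, instead of
the build being skipped entirely.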
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
It is not easy to use cross-platform builds with CGO enabled, so let's
allow building without cgo for darwin and use the virtualization
framework only if we built with CGO.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
If we cannot open a file for some reason, it is better to skip it
instead of exiting. We should also skip symlinks and directories.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We should expand the list of supported arches to be able to build for them if we want. Without this we get stuck sending the tarball during a build for riscv64.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
To be able to identify successive file changes without a commit, we
should use their hash in the tag alongside the dirty flag
(<ls-tree>-dirty-<content hash>).
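A sketch of how such a tag could be derived. The `dirtyTag` helper,
the choice of SHA-256 and the 8-character truncation are assumptions
made for this example, not necessarily what the code does:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// dirtyTag builds "<treeHash>-dirty-<contentHash>" from the git tree hash
// and the current contents of the modified files.
func dirtyTag(treeHash string, modified map[string][]byte) string {
	paths := make([]string, 0, len(modified))
	for p := range modified {
		paths = append(paths, p)
	}
	sort.Strings(paths) // hash in a stable order so the tag is deterministic

	h := sha256.New()
	for _, p := range paths {
		h.Write([]byte(p))
		h.Write([]byte{0}) // separator between path and content
		h.Write(modified[p])
		h.Write([]byte{0})
	}
	// Keep the first 4 bytes (8 hex characters) of the digest.
	return fmt.Sprintf("%s-dirty-%x", treeHash, h.Sum(nil)[:4])
}

func main() {
	first := dirtyTag("abc123", map[string][]byte{"Dockerfile": []byte("FROM a")})
	second := dirtyTag("abc123", map[string][]byte{"Dockerfile": []byte("FROM b")})
	fmt.Println(first)
	fmt.Println(second) // differs from the first although both are "dirty"
}
```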
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We pull all arches for the image which is suboptimal in terms of storage
consumption. Let's pull only required platforms.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We only check for the existence of the builder container and do not
start it if it is not running. We should start it, for example after a
reboot of the node, to be able to build something.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We noticed that we use the host arch when we want to use a previously
built image in oci-layout. Let's use the fix on the buildkit side and
improve the test.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We should check if we have args in "FROM" and replace them:
ARG IMAGE=linuxkit/img
FROM ${IMAGE} as src
will be parsed as
FROM linuxkit/img as src
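A simplified sketch of that substitution, assuming only
"ARG NAME=value" defaults and the ${NAME} or $NAME forms. The
`resolveFrom` helper is hypothetical and ignores other Dockerfile
syntax such as build-time overrides or default-value expansion:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveFrom returns every FROM line with the ARG defaults declared
// before it substituted in. Unknown variables expand to an empty string.
func resolveFrom(dockerfile string) []string {
	args := map[string]string{}
	var froms []string
	for _, line := range strings.Split(dockerfile, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		switch strings.ToUpper(fields[0]) {
		case "ARG":
			if name, value, ok := strings.Cut(fields[1], "="); ok {
				args[name] = value
			}
		case "FROM":
			expanded := os.Expand(line, func(key string) string { return args[key] })
			froms = append(froms, expanded)
		}
	}
	return froms
}

func main() {
	dockerfile := "ARG IMAGE=linuxkit/img\nFROM ${IMAGE} as src\n"
	for _, from := range resolveFrom(dockerfile) {
		fmt.Println(from)
	}
}
```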
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We do not allow loading images that target a platform different from
the current arch into docker, presumably because docker has no manifest
support. But we can keep all images in place by adding an arch suffix
to their tags and using the tag without the arch suffix to point to the
current system arch. This helps when using images for another arch from
docker.
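The tagging scheme can be sketched as follows. The `tagsFor` helper
and the "-<arch>" suffix format are assumptions for illustration:

```go
package main

import (
	"fmt"
	"runtime"
)

// tagsFor returns the tags to apply when loading an image built for arch
// into docker on a host whose architecture is hostArch.
func tagsFor(image, arch, hostArch string) []string {
	tags := []string{image + "-" + arch} // always keep the suffixed tag
	if arch == hostArch {
		tags = append(tags, image) // unsuffixed tag points at the host arch
	}
	return tags
}

func main() {
	for _, arch := range []string{"amd64", "arm64", "riscv64"} {
		fmt.Println(arch, tagsFor("linuxkit/foo:abc123", arch, runtime.GOARCH))
	}
}
```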
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
This option was previously not available and required postprocessing of a `tar-kernel-initrd` output.
Comparison with `iso-efi`:
`iso-efi` only loads the kernel at boot, and the root filesystem is mounted from the actual boot media (eg, a CD-ROM - physical or emulated). This can often cause trouble (it has for us) for multiple reasons:
- the linuxkit kernel might not have the correct drivers built-in for the hardware (see #3154)
- especially with virtual or emulated CD-ROMs, performance can be abysmal: we saw the case where the server IPMI allowed using an ISO stored in AWS S3 over HTTP...you can imagine what happens when you start doing random I/O on the root fs in that case.
- The ISO image has the root device name baked in (ie, `/dev/sr0`) which fails if for some reason the CD-ROM we're running from doesn't end up using that device, so manual tweaking is required (see #2375)
`iso-efi-initrd`, on the other hand, packs the root filesystem as an initramfs (ie similar to what the raw output does, except that in this case we're preparing an ISO image), so both the kernel and the initramfs are loaded in memory by the boot loader and, once running, we don't need to worry about root devices or kernel drivers (and the speed is good, as everything runs in RAM).
Also, the generated ISO can be copied verbatim (eg with `dd`) onto a USB media and it still works.
Finally, the image size is much smaller compared to `iso-efi`.
IMHO, `iso-efi-initrd` could be used almost anywhere `iso-efi` would be used, or might even supersede it. I can't think of a scenario where one might explicitly want to use `iso-efi`.
Points to consider:
- Not tested under aarch64 as I don't have access to that arch. If the automated CI tests also test that, then it should be fine.
- I'm not sure what to put inside `images.yaml` for the `iso-efi-initrd` image. As it is it works of course (my personal image on docker hub), but I guess it'll have to be some more "official" image. However, that cannot be until this PR is merged, so it's kind of a chicken and egg situation. Please advise.
- I can look into adding the corresponding `iso-bios-initrd` builder if there is interest.

Signed-off-by: Davide Brini <waldner@katamail.com>
With Linux kernel 5.15+, changing /proc/sys/net/ipv4/ip_forward requires
CAP_NET_ADMIN (https://github.com/torvalds/linux/commit/8292d7f6). We do
not use ip_forward now, but we should be ready for future changes to the
conf files.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
This allows multiple build flavors for a single codebase, without
sacrificing reproducible builds. The build-args are set in build.yml,
which is typically under the source control (if it is not, then no
reproducible builds are possible anyways). Meaning that mutating
build-args would result in setting "dirty" flag.
Intended use of this commit is to switch between build flavors by
specifying a different yaml file (presumably also under the version
control) by `-build-yml` option.
Because it is impossible to build a final image from packages in
cache, the test for this feature relies on the `RUN echo $build-arg`
output during the `pkg build` process.
Signed-off-by: Yuri Volchkov <yuri@zededa.com>
These are easier to create than cgroupv1 cgroups as they are only a
single mkdir.
Detect which mode we are in by looking for the presence of the
cgroupv2-only cgroup.controllers file.
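A minimal sketch of the detection and of the single-mkdir creation.
The helper names and the /sys/fs/cgroup root are assumptions for this
example:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isCgroupV2 reports whether root is a cgroup v2 hierarchy by checking
// for the cgroup.controllers file, which only cgroup v2 provides.
func isCgroupV2(root string) bool {
	_, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil
}

// createCgroup creates a v2 cgroup, which needs only a single mkdir
// below an existing hierarchy root.
func createCgroup(root, name string) error {
	return os.Mkdir(filepath.Join(root, name), 0o755)
}

func main() {
	const root = "/sys/fs/cgroup" // conventional mount point, assumed here
	fmt.Println("cgroup v2:", isCgroupV2(root))
}
```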
Signed-off-by: David Scott <dave@recoil.org>
The kernel config is derived from the 5.12 kernel
config we used to have
We explicitly enable RANDOMIZE_KSTACK_OFFSET_DEFAULT
which is off by default.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
./scripts/update-component-sha.sh linuxkit/runc:21dbbda709ae138de0af6b0c7e4ae49525db5e88 linuxkit/runc:9f7aad4eb5e4360cc9ed8778a5c501cce6e21601
Signed-off-by: David Scott <dave@recoil.org>
This reverts commit 380f36cc1a.
Now that runc includes a fix for this, this patch can be reverted
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
According to busybox' acpid code, acpid should be allowed to access /dev/input/event*, so we allow all "input" devices (whose major number is 13)
Signed-off-by: Sylvain Prat <sylvain.prat@gmail.com>
Previously when we set `cmd.Stderr = os.Stderr`, the stderr from buildx
would be mixed with the image tar, corrupting it.
Work around this (Windows-specific) problem by adding an explicit
indirection via a io.Pipe()
Signed-off-by: David Scott <dave@recoil.org>
After runc 1.0.0-rc92 mounting /dev with ro will fail to start the
container with an error trying to `mkdir /dev/...` (for example
`/dev/pts`). This can be observed following the runc example
Comparing our `config.json` with the working one generated by
`runc spec`, both have a readonly rootfs (good) but the `runc spec`
one does not set `ro` in the `/dev` mount options.
This patch fixes readonly onboot containers by removing the "ro"
option from `/dev`, to match the `runc spec` example.
Signed-off-by: David Scott <dave@recoil.org>
After the runc security advisory[1] the default cgroup device
whitelist was changed.
In previous versions every container had "rwm" (read, write, mknod)
for every device ("a" for all). Typically this was overridden by
container engines like Docker. In LinuxKit we left the permissive
default.
In recent `runc` versions the default allow-all rule was removed,
so a container can only access a device if it is specifically
granted access, which LinuxKit handles via a device: entry.
However it is inconvenient for pkg/format, pkg/mount, pkg/swap
to list all possible block devices up-front. Therefore we add the
ability to grant access to an entire class of device with a single
rule:
```
- path: all
type: b
```
Obviously a paranoid user can still override this with a specific
major/minor number in a device: rule.
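To illustrate how such a class-wide rule matches devices, here is a
simplified sketch. The `deviceRule` type, the use of -1 as a wildcard
and the "rwm" access string are assumptions made for this example and
do not mirror the runc or LinuxKit data structures:

```go
package main

import "fmt"

// deviceRule is a hypothetical, simplified cgroup device rule.
type deviceRule struct {
	Type   string // "b" (block), "c" (char) or "a" (all)
	Major  int64  // -1 matches any major number
	Minor  int64  // -1 matches any minor number
	Access string // some of "r", "w", "m"
}

// allows reports whether the rule covers the given device.
func (r deviceRule) allows(devType string, major, minor int64) bool {
	typeOK := r.Type == "a" || r.Type == devType
	majorOK := r.Major == -1 || r.Major == major
	minorOK := r.Minor == -1 || r.Minor == minor
	return typeOK && majorOK && minorOK
}

func main() {
	// Rough equivalent of "path: all, type: b": every block device.
	allBlock := deviceRule{Type: "b", Major: -1, Minor: -1, Access: "rwm"}
	fmt.Println(allBlock.allows("b", 8, 0))   // a block device
	fmt.Println(allBlock.allows("c", 13, 64)) // a character device
}
```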
[1] https://github.com/opencontainers/runc/security/advisories/GHSA-g54h-m393-cpwq
Signed-off-by: David Scott <dave@recoil.org>
With 561ce6f4be ("Remove Notary and Content Trust") we
removed support for content trust. No need to have it
in the YAMLs either.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
oprofile kernel support was dropped with 5.12.x with:
f8408264c77a ("drivers: Remove CONFIG_OPROFILE support")
However the commit stated that the userspace oprofile tools
had stopped using the kernel interface for a long time. So
drop the check.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
CONFIG_BPFILTER is aimed to provide a replacement for netfilter.
When CONFIG_BPFILTER is enabled, the kernel tries to contact a user mode helper
for each iptable rule update. However the implementation of this helper has not
been upstreamed yet. The communication thus fails and the kernel then falls back
to netfilter.
As a result, the rule update takes more than ten times the duration of the
netfilter implementation alone.
This has been reported by Docker Desktop users for whom it can take minutes to
start a container sharing a few hundred ports. https://github.com/for-mac/issues/5668
More details on the situation are described in https://lwn.net/Articles/822744/.
Signed-off-by: Frederic Dalleau <frederic.dalleau@docker.com>
The bcc portion of the build had been disabled because it wasn't
building. Now that bcc is building again, add it back to the list of
default targets in the kernel build.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
This moves up to bcc 0.20.0 and builds on the latest 3.13 Alpine base
image. It uses libelf from Alpine, which allows us to drop a number of
the patches we were carrying and reduce the number of steps taken in the
bcc build.
This builds for me on a branch of tip against 5.11.x, 5.10.x,
5.10.x-dbg, and 5.4.x on x86_64. I have not had a chance to attempt
this on other platforms due to lack of hardware.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
Some kernels are only built for some architectures. The
test assumed that all kernels were built for all architectures.
Now, get a list of architectures for which we have a given
kernel image and then make sure the builder images pointed
to by the label and the builder image tagged by convention
exist and point to the same thing.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Declare KERNEL_SOURCE as an environment variable so it
gets picked up in kernel-source-info
Fixes #3653
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
5.4.x is the only kernel left which does not have
WireGuard in tree and people should be using more
recent kernels. Remove the special case for
compiling out-of-tree WireGuard.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Prior to this commit we go build -o bin/foo, archive it, and
expand the archive, leaving the resulting artifact in bin.
This doesn't allow us to easily change the bin directory, or
move parts of the makefile around to make things more modular.
This commit changes the behaviour to:
go build -o foo, archive it, expand to `bin`
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
In alpine version 3.12, the open-vm-tools package got split into new
smaller sub-packages. The implication of this is that features such as
reporting of hostname and ip address to vCenter stopped working.
Signed-off-by: Edvin Eriksson <edvin.erikson@leovegas.com>
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This adds a --skip-platforms flag that can be used with
lkt pkg build to ignore any arch specified in build.yml
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This prevents override of the platform by the user.
lkt pkg build --platform=linux/amd64 pkg/bpftrace should
attempt to build that package for that arch even though
it is not in the build.yml
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit adds the default linuxkit cache directory to the
GitHub Actions cache. This will ensure that we don't pull images
that already exist in the cache, or build them if we've already
done so. It should speed up CI.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Go can be weird about tools having to run in a directory with
go.mod. This commit moves the linuxkit makefile to the same
directory as the code.
It also changes the semantics of the local-build target.
You can now use STATIC=0 for dynamic builds or PIE=1 to
use --buildmode=pie. The binaries we were producing in
local-static weren't actually static so I fixed that too
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Docker Desktop proxies the docker socket at its default location
(/var/run/docker.sock), but allows connecting to the non-proxied
socket through /var/run/docker.sock.raw.
This patch allows the trim-after-delete utility to customize
the docker socket path, so that it can connect to the non-proxied
socket.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The kernel config is derived from 5.6.x by running it through
make oldconfig.
For x86_64 changed manually:
- CONFIG_VIRTIO_MEM=m -> y
- CONFIG_PLDMFW=y -> not set
For aarch64 changed manually:
- CONFIG_SMSC_PHY=m -> not set
- CONFIG_PLDMFW=y -> not set
No adjustment to s390x config
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
- Introduce separate os/arch to the matrix
- Pass os/arch to the local build
- Switch to upload-artifact@v0 and cache@v2
- Fetch linuxkit binary from artefacts rather than using cache
- Add some debug (print file and hashes)
While at it, add some debug for the generated artefacts.
fixes https://github.com/linuxkit/linuxkit/issues/3522
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
`go get -u` will try to update modules dependencies
`go get` (no `-u`) incorrectly resolves dependencies
we should instead advise users to `go install`
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit removes Notary and Content Trust.
Notary v1 is due to be replaced with Notary v2 soon.
There is no clean migration path from one to the other.
For now, this removes all signing from LinuxKit.
We will look to add this back once a new Notary alternative
becomes available.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Dave Scott works on the Docker Desktop team, and maintains
LinuxKit changes internally for that. I think Dave would
make a good addition to the list of maintainers to help
out. :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
From Kubernetes v1.20.0 Release notes:
The label applied to control-plane nodes "node-role.kubernetes.io/master"
is now deprecated and will be removed in a future release after a GA
deprecation period.
Introduce a new label "node-role.kubernetes.io/control-plane" that will
be applied in parallel to "node-role.kubernetes.io/master" until the
removal of the "node-role.kubernetes.io/master" label.
xref: https://kubernetes.io/docs/setup/release/notes/#no-really-you-must-read-this-before-you-upgrade
Signed-off-by: Alex Szakaly <alex.szakaly@gmail.com>
We already run the command after an image delete but
- a container delete
- a volume delete
will also free space on the filesystem.
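A sketch of the resulting event filter. The `freesSpace` helper is
hypothetical, and the type and action strings follow my understanding
of the Docker events API ("delete" for images, "destroy" for
containers and volumes), so treat them as assumptions:

```go
package main

import "fmt"

// freesSpace reports whether a Docker event indicates that disk space
// may have been released, so that running the trim is worthwhile.
func freesSpace(eventType, action string) bool {
	switch eventType {
	case "image":
		return action == "delete"
	case "container", "volume":
		return action == "destroy"
	}
	return false
}

func main() {
	events := [][2]string{
		{"image", "delete"},
		{"container", "destroy"},
		{"volume", "destroy"},
		{"container", "start"},
	}
	for _, e := range events {
		fmt.Printf("%s %s -> trim: %v\n", e[0], e[1], freesSpace(e[0], e[1]))
	}
}
```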
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: David Scott <dave@recoil.org>
The patch we carry for 5.4 and 5.6 does not apply to
5.4.28. Disable the -rt kernel until the version has
been bumped.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
NOTE: This will be a shared mount, because root is turned into a
shared mount with `MS_REC` set: `mount("", "/", "", rec|shared, "")`.
For some reason setting `shared` when mounting `/sys/fs/bpf` doesn't
work at all, perhaps that's just a kernel feature.
Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
* Fix using ams1 as zone
* Allow specifying image size (+ calculate default from ISO size)
* Fix mangling logs when asking for ssh passphrase
* Some minor code and docs cleanups
Signed-off-by: Karol Woźniak <wozniakk@gmail.com>
This was previously built for 5.4 and 4.19. Latest LTS is 5.4 and
latest stable is 5.6. Also skip the s390x build for perf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Reduce the number of packages to build for s390x. Firmware
is only used for physical devices, so disable it for s390x
where we mostly run in virtual machines.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Some drivers offer multiple firmwares with the WHENCE file
defining the default. Use the copy-firmware.sh script to
create a copy of the firmware repository with the defaults
copied in to the right place.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
For some reason, the 'make ARCH=s390 oldconfig' yields
a different config when executing on a real s390x system...
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
A subsequent commit will make the 5.4 kernel the default.
This is primarily to reduce the number of kernels we need
to compile for every upgrade.
Note, we keep the 4.19 config file for arm64 around since the
-rt kernel config needs it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This should allow end-users to gracefully reboot or shut down Kubernetes
nodes (including control planes) running on vSphere Hypervisor.
There are several use cases when cluster administrators are not able to
install extra packages onto the host OS.
Fixes #3462
Signed-off-by: Alex Szakaly <alex.szakaly@gmail.com>
- Disable the devmapper snapshotter. We are not using it
- Cherry-pick an upstream commit to be able to disable
the devmapper integration tests
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This commit uses the GitHub Actions cache to ensure that the `rtf`
binary can be re-used between runs if it hasn't changed.
It also caches the linuxkit binaries for use in future stages.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit adds the GCP test that formerly ran in LinuxKitCI to run
under rtf.
As GitHub Actions doesn't currently support adding secret files, I've
skipped this test for now. Credentials can be passed via environment
variable but as RTF runs with `-x` the contents are viewable in the logs.
I will create an issue to follow up and find either a way of writing the
variable to a file that doesn't compromise it, or perhaps another approach
that is more compatible with GH actions.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit adds a GitHub Actions workflow to replace both CircleCI and
LinuxKit CI.
It will build the Linuxkit binary, run tests and upload artifacts
It replaces the Integration Tests that are run by Linuxkit CI via
the make ci or make ci-pr targets with multiple sets of Integration
Tests that are run in parallel.
It does not yet test GCP. The GCP test in LinuxKit CI could be moved to RTF
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This new snapshot comes from the brand new linux-compat repo, which
follows the recent upstreaming into net-next. When Linux 5.6 lands in
LinuxKit, we'll be able to remove the module entirely.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This adds a new configuration provider that just reads a file.
This is needed for Docker Desktop, where we will run a LinuxKit distro in an isolated namespace within WSL 2.
In this scenario, the config will be accessible through the WSL2 built-in 9p mount of the Windows filesystem.
Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
Allows us to drop some patches we were carrying, since the bugs were
fixed upstream. Gives numerous tooling improvements too.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
Re-enable perf builds for 5.3.x and 4.19.x since they're the latest
stable and LTS, respectively.
Update the bcc build rules to map to these same kernel releases, too.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
The first patch re-adds symbol definitions that were temporarily omitted
from the 4.19 stable branch.
The latter patch corrects the uapi swab.h so that errors about "unknown
type name '__always_inline'" are no longer present in builds. Without
this patch, bcc would build but attempts to compile the internal
programs at runtime would fail.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
There were some mistakes made in the initial code where writes didn't work, this commit fixes that.
Signed-off-by: Simon Fridlund <simon@fridlund.email>
This commit removes the container backend for QEMU.
QEMU and its tools are available on all platforms.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
If the swap disk is larger than 1MiB, then use a 1MiB blocksize in `dd`
On my machine using a large block size speeds up swap file creation:
```
/ # time dd if=/dev/zero of=output bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
real 0m 4.61s
user 0m 0.79s
sys 0m 3.77s
/ # time dd if=/dev/zero of=output bs=1048576 count=1024
1024+0 records in
1024+0 records out
real 0m 1.06s
user 0m 0.00s
sys 0m 1.04s
```
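The block size choice can be sketched as below. The `ddParams` helper
is hypothetical. Falling back to 1 KiB blocks for small files and
rounding the count up are assumptions of this example:

```go
package main

import "fmt"

const (
	kib = 1024
	mib = 1024 * kib
)

// ddParams picks the dd block size and block count for a swap file of
// sizeBytes: 1 MiB blocks above 1 MiB, otherwise 1 KiB blocks.
func ddParams(sizeBytes int64) (bs, count int64) {
	bs = kib
	if sizeBytes > mib {
		bs = mib
	}
	count = (sizeBytes + bs - 1) / bs // round up so the file is never short
	return bs, count
}

func main() {
	bs, count := ddParams(1 << 30) // a 1 GiB swap file, as in the timing above
	fmt.Printf("dd if=/dev/zero of=swapfile bs=%d count=%d\n", bs, count)
}
```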
Signed-off-by: David Scott <dave.scott@docker.com>
KCONFIG_TAG variable can be used to set a custom kconfig tag.
If KCONFIG_TAG is not set, the image is tagged as linuxkit/kconfig:latest
This is useful for projects requiring to build multiple kernels that have
different patches.
When trying to edit an unpatched kernel config after working on a patched
kernel config (same kernel version), one had to rerun make kconfig first
in order to edit the config of an unpatched kernel.
Now it is possible to generate a tagged kconfig image and then get the wanted
config by selecting the corresponding linuxkit/kexec:tag.
Signed-off-by: Gabriel Chabot <gabriel.chabot@qarnot-computing.com>
This commit updates the Scaleway provider to fetch the cloud-init/cloud-config data from the user_data/cloud-init endpoint. It also makes sure the whole public ssh key is fetched, and no longer strips out the `ssh-rsa` part of the keys.
Signed-off-by: Simon Fridlund <simon@fridlund.email>
It's now not needed to send a boot signal when booting an instance on
Scaleway, thus the method is not needed anymore.
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
Update Gophercloud dependencies and also bring in the 'utils'
package. This provides support for configuring access to OpenStack
clouds as detailed in the [official
documentation](https://docs.openstack.org/os-client-config/latest/user/configuration.html).
By relying on this package we can simplify the code required to
interact with OpenStack's APIs. Support is also provided upstream for
self-signed and insecure SSL configurations.
Tested with a public cloud running OpenStack 'Rocky', the latest release.
Signed-off-by: Nick Jones <nick@dischord.org>
The rootfs fs was removed in 5.3.x but was mostly an
irrelevant entry in the filesystems list anyway.
Here is the upstream commit:
commit fd3e007f6c6a0f677e4ee8aca4b9bab8ad6cab9a
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu May 30 17:48:35 2019 -0400
don't bother with registering rootfs
init_mount_tree() can get to rootfs_fs_type directly and that simplifies
a lot of things. We don't need to register it, we don't need to look
it up *and* we don't need to bother with preventing subsequent userland
mounts. That's the way we should've done that from the very beginning.
There is a user-visible change, namely the disappearance of "rootfs"
from /proc/filesystems. Note that it's been unmountable all along
and it didn't show up in /proc/mounts; however, it *is* a user-visible
change and theoretically some script might've been using its presence
in /proc/filesystems to tell 2.4.11+ from earlier kernels.
*IF* any complaints about behaviour change do show up, we could fake
it in /proc/filesystems. I very much doubt we'll have to, though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Short references without domains will now fail parsing on recent versions
of Go as net/url parser is more strict.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Intel microcode download is moved earlier in the Dockerfile, before the
kernel is actually built, so that it's available in the context of a
build and can be referenced in CONFIG_EXTRA_FIRMWARE for people who want
the microcode to be built into the kernel.
It is still copied into the out/ directory so that it is still
available for addition in a 'ucode:' section in linuxkit.yml.
Signed-off-by: Yoann Ricordel <yoann.ricordel@qarnot-computing.com>
Copy firmware files to the correct directory. Instead of
<vendor>/<fw-name>/<fw-name> copy it to <vendor>/<fw-name>.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Vultr provides an API that looks a lot like the AWS api, resulting in
the AWS provider succeeding, but missing certain metadata parts that one
would expect to work out of the box on Vultr, such as SSH PubKey
fetching.
Signed-off-by: Sachi King <nakato@nakato.io>
The Vultr provider currently never calls handleSSH, resulting in it
being impossible to bring up a LinuxKit image in vultr with the SSH
pubkey provided via the Vultr metadata API.
Signed-off-by: Sachi King <nakato@nakato.io>
This skips 0.0.20190531
Changelog for 0.0.20190601
== Changes ==
* compat: don't call xgetbv on cpus with no XSAVE
There was an issue with the backport compat layer in yesterday's snapshot,
causing issues on certain (mostly Atom) Intel chips on kernels older than
4.2, due to the use of xgetbv without checking cpu flags for xsave support.
This manifested itself simply at module load time. Indeed it's somewhat tricky
to support 33 different kernel versions (3.10+), plus weird distro
frankenkernels.
Changelog for 0.0.20190531
== Changes ==
* tools: add wincompat layer to wg(8)
Consistent with a lot of the Windows work we've been doing this last cycle,
wg(8) now supports the WireGuard for Windows app by talking through a named
pipe. You can compile this as `PLATFORM=windows make -C src/tools` with mingw.
Because programming things for Windows is pretty ugly, we've done this via a
separate standalone wincompat layer, so that we don't pollute our pretty *nix
utility.
* compat: udp_tunnel: force cast sk_data_ready
This is a hack to work around broken Android kernel wrapper scripts.
* wg-quick: freebsd: workaround SIOCGIFSTATUS race in FreeBSD kernel
FreeBSD had a number of kernel race conditions, some of which we can vaguely
work around. These are in the process of being fixed upstream, but probably
people won't update for a while.
* wg-quick: make darwin and freebsd path search strict like linux
Correctness.
* socket: set ignore_df=1 on xmit
This was intended from early on but didn't work on IPv6 without the ignore_df
flag. It allows sending fragments over IPv6.
* qemu: use newer iproute2 and kernel
* qemu: build iproute2 with libmnl support
* qemu: do not check for alignment with ubsan
The QEMU build system has been improved to compile newer versions. Linking
against libmnl gives us better error messages. As well, enabling the alignment
check on x86 UBSAN isn't realistic.
* wg-quick: look up existing routes properly
* wg-quick: specify protocol to ip(8), because of inconsistencies
The route inclusion check was wrong prior, and Linux 5.1 made it break
entirely. This makes a better invocation of `ip route show match`.
* netlink: use new strict length types in policy for 5.2
* kbuild: account for recent upstream changes
* zinc: arm64: use cpu_get_elf_hwcap accessor for 5.2
The usual churn of changes required for the upcoming 5.2.
* timers: add jitter on ack failure reinitiation
Correctness tweak in the timer system.
* blake2s,chacha: latency tweak
* blake2s: shorten ssse3 loop
In every odd-numbered round, instead of operating over the state
x00 x01 x02 x03
x05 x06 x07 x04
x10 x11 x08 x09
x15 x12 x13 x14
we operate over the rotated state
x03 x00 x01 x02
x04 x05 x06 x07
x09 x10 x11 x08
x14 x15 x12 x13
The advantage here is that this requires no changes to the 'x04 x05 x06 x07'
row, which is in the critical path. This results in a noticeable latency
improvement of roughly R cycles, for R diagonal rounds in the primitive. As
well, the blake2s AVX implementation is now SSSE3 and considerably shorter.
* tools: allow setting WG_ENDPOINT_RESOLUTION_RETRIES
System integrators can now specify things like
WG_ENDPOINT_RESOLUTION_RETRIES=infinity when building wg(8)-based init
scripts and services, or 0, or any other integer.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Linux has documented but somewhat unusual behavior around
SIGSTOP/SIGCONT and certain syscalls, of which epoll_wait(2) is one. In
this particular case, rngd exited unexpectedly after getting ptrace'd
mid-epoll_wait. Fix this by handling EINTR from this syscall, and
continuing to add entropy and wait.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
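The retry pattern described in the rngd commit above can be sketched as follows. This is only an illustration in Python, not rngd's actual C code: `retry_on_eintr` and `flaky_wait` are hypothetical names, and since Python 3.5 the interpreter already retries most interrupted syscalls on its own, so the wrapper exists purely to show the control flow.

```python
import errno


def retry_on_eintr(call, *args):
    """Call `call(*args)` again whenever it fails with EINTR."""
    while True:
        try:
            return call(*args)
        except InterruptedError:
            # EINTR means the syscall was interrupted (for example by
            # SIGSTOP/SIGCONT or a ptrace attach), not that it failed.
            # Go back and wait again instead of exiting.
            continue


# Hypothetical stand-in for epoll_wait(2): interrupted twice, then succeeds.
attempts = []


def flaky_wait():
    attempts.append(1)
    if len(attempts) < 3:
        raise InterruptedError(errno.EINTR, "Interrupted system call")
    return ["event"]


events = retry_on_eintr(flaky_wait)
```

Any other error still propagates to the caller, so only the interruption case is treated as harmless.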
Update the image tag for the mkimage-rpi3 tool used by the CLI to adopt
the dynamic DTB selection feature.
Signed-off-by: Richard Connon <richard@connon.me.uk>
U-Boot sets the variable fdtfile to the correct file name for the
detected hardware revision. Use this in the boot script to load either
the 3-b or 3-b-plus DTB
Signed-off-by: Richard Connon <richard@connon.me.uk>
Update the u-boot image included in the mkimage-rpi3 image to support
detecting newer hardware versions and setting the fdtfile variable
accordingly
Shallow clone the u-boot repository during docker build to improve build
efficiency
Signed-off-by: Richard Connon <richard@connon.me.uk>
This stops the output from also being copied to logs if the user
has a log driver configured.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Update Raspberry Pi firmware used in mkimage-rpi3 to the latest stable
version to support newer hardware models such as the 3B+
Signed-off-by: Richard Connon <richard@connon.me.uk>
Intel seem to have switched to hosting the microcode on GitHub.
Use this source and update to the 20190514 version.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
== Changes ==
* allowedips: initialize list head when removing intermediate nodes
Fix for an important regression in removing allowed IPs from the last
snapshot. We have new test cases to catch these in the future as well.
* wg-quick: freebsd: rebreak interface loopback, while fixing localhost
* wg-quick: freebsd: export TMPDIR when restoring and don't make empty
Two fixes for FreeBSD which have already been backported into ports.
* tools: genkey: account for short reads of /dev/urandom
* tools: add support for Haiku
The tools now support Haiku! Maybe somebody is working on a WireGuard
implementation for it?
* tools: warn if an AllowedIP has a nonzero host part
If you try to run `wg set wg0 peer ... allowed-ips 192.168.1.82/24`, wg(8)
will now print a warning. Even though we mask this automatically down to
192.168.1.0/24, usually when people specify it like this, it's a mistake.
* wg-quick: add 'strip' subcommand
The new strip subcommand prints the config file to stdout after stripping
it of all wg-quick-specific options. This enables tricks such as:
`wg addconf $DEV <(wg-quick strip $DEV)`.
* tools: avoid unnecessary next_peer assignments in sort_peers()
Small C optimization the compiler was probably already doing.
* peerlookup: rename from hashtables
* allowedips: do not use __always_inline
* device: use skb accessor functions where possible
Suggested tweaks from Dave Miller.
* qemu: set framewarn 1280 for 64bit and 1024 for 32bit
These should indicate to us more clearly when we cross the most strict stack
thresholds expected when using recent compilers with the kernel.
* blake2s: simplify
* blake2s: remove outlen parameter from final
The blake2s implementation has been simplified, since we don't use any of the
fancy tree hashing parameters or the like. We also no longer separate the
output length at initialization time from the output length at finalization
time.
* global: the _bh variety of rcu helpers have been unified
* compat: nf_nat_core.h was removed upstream
* compat: backport skb_mark_not_on_list
The usual assortment of compat fixes for Linux 5.1.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
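The "nonzero host part" check mentioned in the changelog above can be illustrated with a short Python sketch. It uses the standard `ipaddress` module and a hypothetical helper name, `check_allowed_ip`; it is not how wg(8) itself is implemented.

```python
import ipaddress


def check_allowed_ip(cidr):
    """Return (masked_network, has_nonzero_host_part) for a CIDR string."""
    iface = ipaddress.ip_interface(cidr)
    network = iface.network  # the same prefix with the host bits masked off
    return str(network), iface.ip != network.network_address


# The example from the changelog: the host part .82 is masked down to .0.
masked, suspicious = check_allowed_ip("192.168.1.82/24")
```

A caller could print a warning whenever the second value is true and still use the masked network.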
Getting compile errors:
AS [M] /wireguard/crypto/zinc/chacha20/chacha20-x86_64.o
In file included from <command-line>:
/wireguard/compat/compat.h:795:10: fatal error: net/netfilter/nf_nat_core.h: No such file or directory
#include <net/netfilter/nf_nat_core.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
With the current firmware being pulled for the RPi3, recent revisions of
the RPi hardware, such as the 3 B+, will fail to boot.
The symptom is that when an RPi 3 B+ receives power and attempts to
boot, the power LED turns off and the ACT LED flashes 8 times.
According to the elinux.org troubleshooting guide[0], this corresponds to an
SDRAM initialisation error that can be fixed by updating the firmware.
After updating the firmware the power light stays on, and U-Boot can be
seen booting.
[0] - https://elinux.org/R-Pi_Troubleshooting#Green_LED_blinks_in_a_specific_pattern
Signed-off-by: Sachi King <nakato@nakato.io>
Commit d47b283df4 ("kernel: Remove fetch target") removed
the 'fetch' target to simplify the Makefile. This left
dependencies on 'sources' lingering. Remove it.
resolves #3333
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Commit 250b14661b ("kernel: Use elfutils-dev instead
of libelf-dev") switched the kernel build to use
elfutils-dev instead of libelf-dev. This caused the kernel
module tests to fail. The still installed libelf-dev and
the dynamically linked objtool (and friends) from the
kernel source package failed to execute.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
With kernel 5.0.6 we start seeing compile errors such as:
HOSTCXX -fPIC scripts/gcc-plugins/randomize_layout_plugin.o
In file included from <stdin>:1:
/usr/include/libelf/libelf.h:28:5: error: "__LIBELF_INTERNAL__" is not defined, evaluates to 0 [-Werror=undef]
#if __LIBELF_INTERNAL__
^~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
elfutils-dev installs a different version of libelf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
All our 4.x kernels had CFQ enabled. This was removed
in 5.x and replaced with BFQ. Enable it.
resolves #3308
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
To reduce the number of kernels we maintain, for s390x
and arm64 we only support the latest LTS and newer kernels.
v4.19.x has been out for a while, so let's remove support for
v4.14.x.
resolves #3302
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
See https://github.com/moby/moby/issues/38887
for details. Basically 5.x removed support for
CFQ with f382fb0bcef4 ("block: remove legacy IO
schedulers") and the Moby check still requires it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Many places were checking for -ge 4 and some minor version.
This fails for 5.x kernels if their minor version is lower.
Fix it.
While at it, also restructure/simplify the code, make it easier
to run against arbitrary kernel configs, and tidy up some
whitespaces.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
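The version-check pitfall described in the commit above can be shown with a small sketch. The original checks are shell tests; this Python version, with the hypothetical helpers `broken_at_least` and `at_least`, only illustrates why comparing the major and minor numbers independently breaks for 5.x kernels.

```python
def broken_at_least(major, minor, want_major, want_minor):
    # The flawed pattern: each component is compared on its own, so a
    # newer major version with a small minor number is rejected.
    return major >= want_major and minor >= want_minor


def at_least(major, minor, want_major, want_minor):
    # Tuples compare lexicographically, so the minor version only
    # matters when the major versions are equal.
    return (major, minor) >= (want_major, want_minor)


# A 5.0 kernel checked against a "4.8 or newer" requirement.
old_result = broken_at_least(5, 0, 4, 8)
new_result = at_least(5, 0, 4, 8)
```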
This target allowed downloading the kernel source
tarballs locally. We haven't used this for a while, and adding
v5.x kernel support for it would add yet another conditional.
Remove it to keep the Makefile simpler.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The make-gcp script in the mkimage-gcp tool creates a virtual fs of exactly 1GB. If your filesystem needs to be larger, make-gcp fails with a poorly explained error. Simply removing the arg makes the fs the same size as the image used to build it.
Signed-off-by: Daniel Smith <daniel@razorsecure.com>
The build of the perf utility has been quite bothersome,
with different arches and kernel versions failing.
Since we now have the full kernel source in the package,
factor out the actual build into Dockerfile.perf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The compile fails with:
[ 30%] Building CXX object src/ast/CMakeFiles/ast.dir/codegen_llvm.cpp.o
[ 30%] Building CXX object src/ast/CMakeFiles/ast.dir/irbuilderbpf.cpp.o
[ 31%] Building CXX object src/ast/CMakeFiles/ast.dir/printer.cpp.o
[ 31%] Building CXX object src/ast/CMakeFiles/ast.dir/semantic_analyser.cpp.o
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateProbeReadStr(llvm::AllocaInst*, size_t, llvm::Value*)':
/bpftrace/src/ast/irbuilderbpf.cpp:279:16: error: 'BPF_FUNC_probe_read_str' was not declared in this scope
getInt64(BPF_FUNC_probe_read_str),
^~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateProbeReadStr(llvm::Value*, size_t, llvm::Value*)':
/bpftrace/src/ast/irbuilderbpf.cpp:294:16: error: 'BPF_FUNC_probe_read_str' was not declared in this scope
getInt64(BPF_FUNC_probe_read_str),
^~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateGetCurrentCgroupId()':
/bpftrace/src/ast/irbuilderbpf.cpp:422:16: error: 'BPF_FUNC_get_current_cgroup_id' was not declared in this scope
getInt64(BPF_FUNC_get_current_cgroup_id),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateGetCurrentTask()':
/bpftrace/src/ast/irbuilderbpf.cpp:461:16: error: 'BPF_FUNC_get_current_task' was not declared in this scope
getInt64(BPF_FUNC_get_current_task),
^~~~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateGetStackId(llvm::Value*, bool)':
/bpftrace/src/ast/irbuilderbpf.cpp:497:16: error: 'BPF_FUNC_get_stackid' was not declared in this scope
getInt64(BPF_FUNC_get_stackid),
^~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/semantic_analyser.cpp: In member function 'int bpftrace::ast::SemanticAnalyser::create_maps(bool)':
/bpftrace/src/ast/semantic_analyser.cpp:871:68: error: 'BPF_MAP_TYPE_STACK_TRACE' was not declared in this scope
bpftrace_.stackid_map_ = std::make_unique<bpftrace::FakeMap>(BPF_MAP_TYPE_STACK_TRACE);
^~~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/semantic_analyser.cpp:885:64: error: 'BPF_MAP_TYPE_STACK_TRACE' was not declared in this scope
bpftrace_.stackid_map_ = std::make_unique<bpftrace::Map>(BPF_MAP_TYPE_STACK_TRACE);
^~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [src/ast/CMakeFiles/ast.dir/build.make:89: src/ast/CMakeFiles/ast.dir/irbuilderbpf.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [src/ast/CMakeFiles/ast.dir/build.make:115: src/ast/CMakeFiles/ast.dir/semantic_analyser.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:276: src/ast/CMakeFiles/ast.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The wireguard package has some sub-packages which are
now dependencies. Include them in the alpine base.
Also include openresolv, which is required by one
of the wireguard packages.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
* wg-quick: freebsd: allow loopback to work
FreeBSD adds a route for point-to-point destination addresses. We don't
really want to specify any destination address, but unfortunately we
have to. Before we tried to cheat by giving our own address as the
destination, but this had the unfortunate effect of preventing
loopback from working on our local ip address. We work around this with
yet another kludge: we set the destination address to 127.0.0.1. Since
127.0.0.1 is already assigned to an interface, this has the same effect
of not specifying a destination address, and therefore we accomplish the
intended behavior. Note that the bad behavior is still present in Darwin,
where such workaround does not exist.
* tools: remove unused check phony declaration
* highlighter: when subtracting char, cast to unsigned
* chacha20: name enums
* tools: fight compiler slightly harder
* tools: c_acc doesn't need to be initialized
* queueing: more reasonable allocator function convention
Usual nits.
* systemd: wg-quick should depend on nss-lookup.target
Since wg-quick(8) calls wg(8) which does hostname lookups, we should
probably only run this after we're allowed to look up hostnames.
* compat: backport ALIGN_DOWN
* noise: whiten the nanoseconds portion of the timestamp
This mitigates unrelated sidechannel attacks that think they can turn
WireGuard into a useful time oracle.
* hashtables: decouple hashtable allocations from the main device allocation
The hashtable allocations are quite large, and cause the device allocation in
the net framework to stall sometimes while it tries to find a contiguous
region that can fit the device struct. To fix the allocation stalls, decouple
the hashtable allocations from the device allocation and allocate the
hashtables with kvmalloc's implicit __GFP_NORETRY so that the allocations fall
back to vmalloc with little resistance.
* chacha20poly1305: permit unaligned strides on certain platforms
The map allocations required to fix this are mostly slower than unaligned
paths.
* noise: store clamped key instead of raw key
This causes `wg show` to now show the right thing. Useful for doing
comparisons.
* compat: ipv6_stub is sometimes null
On ancient kernels, ipv6_stub is sometimes null in cases where IPv6 has
been disabled with a command line flag or other failures.
* Makefile: don't duplicate code in install and modules-install
* Makefile: make the depmod path configurable
* queueing: net-next has changed signature of skb_probe_transport_header
A 5.1 change. This could change again, but for now it allows us to keep this
snapshot aligned with our upstream submissions.
* netlink: don't remove allowed ips for new peers
* peer: only synchronize_rcu_bh and traverse trie once when removing all peers
* allowedips: maintain per-peer list of allowedips
This is a rather big and important change that makes it much much faster to do
operations involving thousands of peers. Batch peer/allowedip addition and
clearing is several orders of magnitude faster now.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
These tests expect a $TMPDIR which supports user xattrs, which the tmpfs on
/tmp does not. Redirect it to the persistent disk which does.
Signed-off-by: Ian Campbell <ijc@docker.com>
... from the old-skool label scheme.
No semantic change intended. Some keys are in different orders and the "mounts"
entry gained an empty "destination" key, neither of which makes a practical
difference.
Signed-off-by: Ian Campbell <ijc@docker.com>
* tools: curve25519: handle unaligned loads/stores safely
This should fix sporadic crashes with `wg pubkey` on certain architectures.
* netlink: auth socket changes against namespace of socket
In WireGuard, the underlying UDP socket lives in the namespace where the
interface was created and doesn't move if the interface is moved. This
allows one to create the interface in some privileged place that has
Internet access, and then move it into a container namespace that only
has the WireGuard interface for egress. Consider the following
situation:
1. Interface created in namespace A. Socket therefore lives in namespace A.
2. Interface moved to namespace B. Socket remains in namespace A.
3. Namespace B now has access to the interface and changes the listen
port and/or fwmark of socket. Change is reflected in namespace A.
This behavior is arguably _fine_ and perhaps even expected or
acceptable. But there's also an argument to be made that B should have
A's cred to do so. So, this patch adds a simple ns_capable check.
* ratelimiter: build tests with !IPV6
Should reenable building in debug mode for systems without IPv6.
* noise: replace getnstimeofday64 with ktime_get_real_ts64
* ratelimiter: totalram_pages is now a function
* qemu: enable FP on MIPS
Linux 5.0 support.
* keygen-html: bring back pure javascript implementation
Benoît Viguier has proofs that values will stay well within 2^53. We
also have an improved carry function that's much simpler. Probably more
constant time than emscripten's 64-bit integers.
* contrib: introduce simple highlighter library
This is the highlighter library being used in:
- https://twitter.com/EdgeSecurity/status/1085294681003454465
- https://twitter.com/EdgeSecurity/status/1081953278248796165
It's included here as a contrib example, so that others can paste it into
their own GUI clients for having the same strictly validating highlighting.
* netlink: use __kernel_timespec for handshake time
This readies us for Y2038. See https://lwn.net/Articles/776435/ for more info.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This also fixes up test/cases/020_kernel/110_namespace/common.yml
and test/cases/040_packages/032_bcc/test.yml to use the 4.19.x
kernel. I missed these when making the 4.19 kernel the default.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
4.19.x is the new LTS kernel and has been out for a while. Switch
all examples and tests to using it instead of the 4.14.x kernel.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The kernel config was derived from the 4.19.13 kernel config
run through the 'make oldconfig' with all defaults accepted,
except for:
- NET_VENDOR_MICROCHIP (default 'y', set to 'n')
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We already have 4.9.x, 4.14.x, and 4.19.x as LTS releases.
4.9.x also has a longer lifetime than 4.4.x, and fewer security
fixes can be backported to 4.4.x. Remove it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Sort the list of mount points by destination. This makes the list
deterministic for reproducible builds and also ensures that, e.g.,
the mount for /dev happens before the mount for /dev/pts.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
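As a small illustration of the ordering argument in the commit above: a plain string sort on the destination always places a parent path before its children, because the parent is a prefix of the child. The mount entries below are made-up sample data, and the real implementation is Go code that may differ in detail.

```python
# Hypothetical mount entries, deliberately listed out of order.
mounts = [
    {"destination": "/dev/pts", "type": "devpts"},
    {"destination": "/proc", "type": "proc"},
    {"destination": "/dev", "type": "tmpfs"},
]

# Sorting by destination gives a deterministic list and puts /dev
# ahead of /dev/pts.
ordered = sorted(mounts, key=lambda m: m["destination"])
destinations = [m["destination"] for m in ordered]
```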
Currently 'docker export' is used to convert a linuxkit entry
in the YAML file to a tar file of the root filesystem. This
process creates a number of files and directories which have
the timestamp of when the 'docker export' is run. Fix 'em up.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
When creating files for the "intermediate" tar ball,
fix the ModTime. This reduces the difference between
LinuxKit images built from identical inputs.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
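The effect of fixing modification times, as in the two commits above, can be sketched with Python's `tarfile` module. This is an illustration only: `normalize_tar` and `make_tar` are hypothetical helpers, and the choice of 0 as the fixed timestamp is an assumption, not necessarily the value LinuxKit uses.

```python
import io
import tarfile


def normalize_tar(data, mtime=0):
    """Copy a tar archive, forcing every entry's mtime to a fixed value."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(data)) as src, \
            tarfile.open(fileobj=out, mode="w",
                         format=tarfile.PAX_FORMAT) as dst:
        for member in src:
            member.mtime = mtime
            payload = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, payload)
    return out.getvalue()


def make_tar(mtime):
    """Build a one-file archive whose entry carries the given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("etc/hostname")
        info.size = 5
        info.mtime = mtime
        tar.addfile(info, io.BytesIO(b"host\n"))
    return buf.getvalue()


# Same content, created at different times.
first = make_tar(1)
second = make_tar(2)
```

The two inputs differ byte for byte, but after normalization they are identical, which is the property reproducible builds need.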
packet.net will soon have x86 and arm64 machines with NFPs.
Enable the driver for it.
The 4.9 kernel only has support for the NFP VF driver,
so don't enable it there.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Support plain gzip'ed files, as used on arm64, and bzImage with
embedded gzip'ed kernel, as used on x86.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
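One way to handle both cases named in the commit above is to scan for the gzip magic bytes and try to decompress from each candidate offset. The Python sketch below shows that idea under stated assumptions: `extract_gzip_payload` is a hypothetical helper, the sample "bzImage" is fake data, and the actual Go implementation in LinuxKit may locate the embedded kernel differently.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip header followed by the deflate method byte


def extract_gzip_payload(blob):
    """Return the first complete gzip stream found anywhere in `blob`."""
    offset = blob.find(GZIP_MAGIC)
    while offset != -1:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(wbits=31)
        try:
            data = decomp.decompress(blob[offset:])
            if decomp.eof:
                return data
        except zlib.error:
            pass  # false positive: the magic bytes were not a real stream
        offset = blob.find(GZIP_MAGIC, offset + 1)
    raise ValueError("no gzip stream found")


kernel = b"fake vmlinux contents" * 10

# Plain gzip'ed file: the whole file is one gzip stream.
plain = gzip.compress(kernel)

# bzImage-like file: setup code, a stray magic sequence, then the stream.
wrapped = b"\x00setup code\x1f\x8b\x08junk" + plain + b"trailer"
```

Because a plain gzip file starts with the magic bytes at offset 0, the same function covers both layouts.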
Add the '-vmlinux' flag to build and pass it all
the way to the kernel filter.
Note, this commit only adds the flag but does not
yet perform the decompression. This will be added
with the next commit.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Stash the kernel image in a local buffer and
flush it out once done.
This is preparation work for supporting uncompressed
kernels in the next commit.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Please add your use cases here. There are many adopters that I know about but have not
documented here, please fill this in.
I divided this into production users, and also linked a selection of open source projects
that I know about here.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
A previous commit removed support for 4.18.x kernels for
arm64 and s390x but did not remove the config files. Fix it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Needed for containerd v1.2.0 otherwise:
$ ctr run -t docker.io/library/hello-world@sha256:f3b3b28a45160805bb16542c9531888519430e9e6d6ffc09d72261b0d26ff74f test
[ 1311.667587] overlayfs: failed to resolve '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5/fs': -2
ctr: failed to mount /tmp/containerd-mount111658703: no such file or directory
Signed-off-by: Ian Campbell <ijc@docker.com>
On Linux a key in `~/.docker/config.json` indicates if a credentials helper is
in use (and which), if one is then the method is identical to the Darwin case
so refactor to support that.
Signed-off-by: Ian Campbell <ijc@docker.com>
If the YAML does not specify a kernel, kernel commandline
or any containers, don't create empty files. Note, an
initrd file is still created if the kernel image contains
CPU ucode.
This only applies to kernel+initrd and tar-kernel-initrd
output formats.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The logic for perf became too complex. Just build for latest LTS
and latest stable.
Disable for arm64 for now as it is broken for 4.19 due to a header
mismatch:
In file included from /linux/tools/arch/arm64/include/uapi/asm/unistd.h:20:0,
from libbpf.c:36:
/linux/tools/include/uapi/asm-generic/unistd.h:754:0: error: "__NR_fcntl" redefined [-Werror]
In file included from /usr/include/sys/syscall.h:4:0,
from /linux/tools/perf/perf-sys.h:7,
from libbpf.c:35:
/usr/include/bits/syscall.h:26:0: note: this is the location of the previous definition
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The kernel configs were constructed by running the 4.18.x config
through the 4.19 oldconfig process.
The 4.19.x has a new option, RANDOM_TRUST_CPU, which indicates
if the CPU's random instruction is to be trusted. It defaults to
"no" and this default was accepted.
Most of the defaults were accepted, except for:
BLK_CGROUP_IOLATENCY=y
NFT_TUNNEL=y
NFT_OSF=y
NFT_TPROXY=y
NETFILTER_XT_MATCH_SOCKET=y
NET_VENDOR_CADENCE=n
NET_VENDOR_NETERION=n
NET_VENDOR_PACKET_ENGINES=n
We also disallow CIFS for insecure legacy servers:
CIFS_ALLOW_INSECURE_LEGACY=n
For arm64, the following changes were made to the default:
SENSORS_RASPBERRYPI_HWMON=y
CRYPTO_DEV_QCOM_RNG=m
CRYPTO_DEV_HISI_SEC=m
For s390x, the additional changes were made to the default:
KERNEL_BZIP2 (default is gzip)
GCC_PLUGINS=y
GCC_PLUGIN_STRUCTLEAK=y
GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
GCC_PLUGIN_RANDSTRUCT=y
GCC_PLUGIN_RANDSTRUCT_PERFORMANCE=y
Running the 4.18 and 4.19 kernel config through
./scripts/kconfig-split.py yields the following 4.19.x
only config options for x86_64:
The x86_64 kernel difference to 4.18 for
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_BNXT_HWMON=y
CONFIG_BUILD_SALT=""
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_CRASH_CORE=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_MEMCG_KMEM=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_NETFILTER_NETLINK_OSF=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
CONFIG_NFT_OSF=y
CONFIG_NFT_TPROXY=y
CONFIG_NFT_TUNNEL=y
CONFIG_NF_SOCKET_IPV4=y
CONFIG_NF_SOCKET_IPV6=y
CONFIG_XEN_SCRUB_PAGES_DEFAULT=y
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
After 'make oldconfig' we check that the kernel config
is as we expect and error if it isn't. We used to print
the default 'diff' output on a mismatch but a unified diff
is easier to read.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
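To show why a unified diff is easier to read for a config mismatch like the one described above, here is a minimal Python sketch using the standard `difflib` module. The config lines are made-up sample data, and the kernel build itself uses a diff tool rather than Python.

```python
import difflib

# Hypothetical expected and actual config fragments.
expected = ["CONFIG_BPF=y", "CONFIG_CGROUPS=y", "CONFIG_OVERLAY_FS=y"]
actual = ["CONFIG_BPF=y", "CONFIG_CGROUPS=y", "CONFIG_OVERLAY_FS=m"]

# Each changed line is shown with a -/+ prefix next to its unchanged
# neighbours, instead of the terse "3c3" style of a default diff.
diff = list(difflib.unified_diff(
    expected, actual, fromfile="expected", tofile="actual", lineterm=""))
print("\n".join(diff))
```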
Using filepath primitives instead of manipulating file paths manually takes care of platform specific formats.
Signed-off-by: Mathieu Champlon <mathieu.champlon@docker.com>
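The point of the commit above, which concerns Go's filepath package, can be illustrated in Python with `pathlib`, used here as a stand-in. The path components are made-up examples and `manual_join` is a hypothetical helper showing the fragile approach.

```python
from pathlib import PurePosixPath, PureWindowsPath


def manual_join(base, name):
    # Fragile: hard-codes the POSIX separator.
    return base + "/" + name


# Path primitives pick the separator that matches the platform flavour.
posix = PurePosixPath("pkg") / "init" / "build.yml"
windows = PureWindowsPath("pkg") / "init" / "build.yml"
```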
This cherry picks:
- b6fe0440c637 ("bridge: implement missing ndo_uninit()")
- b1b9d366028f ("bridge: move bridge multicast cleanup to ndo_uninit")
The fix is in b1b9d366028f ("bridge: move bridge multicast cleanup
to ndo_uninit") but it requires b6fe0440c637 ("bridge: implement missing
ndo_uninit()"). Furthermore, b1b9d366028f needed some manual resolution
of a cherry-pick conflict because the surrounding code had changed.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We want to compile BCC for the latest LTS and the latest
stable and missed the update to 4.18 when enabling it. Do
it now.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
support SPI in container environment (introduced Linux 4.12 2017-06-02).
Abstraction define interface EP for CAN module in containered environment. This
namespace is available and introduced with Linux Kernel 4.12 by M. Kicherer
and later O. Hartkopp, to allow containers bridging such device.
@see linux-kernel/net/can@fc4c581
Although KSPP did not explicitly note `CAN` as a secure kernel flag, this
would aim to bring such a conclusion. As for security concerns, the CAN protocol has
not yielded any user-land or host-level vulnerabilities since it was introduced as
the SocketCAN module in the Linux kernel. Lower-layer [protocol] standards are not
secured by default since applications are supposed to implement their own
security mechanisms.
This global abstraction currently supports CAN raw, proc and af_can
codes. Does not support GW and BCM. Namespace uses _NEWNET on pseudo-file
system. Allows modprobe to environment, works by recv `pnet` for the given
interface.
Signed-off-by: Halis Duraki <duraki@linuxmail.org>
Note, this update skips 4.18.2/4.17.16/4.14.64/4.9.121/4.4.149
as the change was a single patch, a bug fix.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
In setup_net() there are a few particularly slow subsystems that
contribute more than 140ms of time to the new net namespace creation
path. The docker daemon doesn't depend on these, and won't modprobe
them into the kernel. Convert these to modules to reduce the amount of
time it takes for docker to start a container. This change takes an
additional ~120 ms of time off container start time.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
While investigating performance problems around 'docker run' times, it
was observed that a large amount of time was spent in network namespace
creation. Of that time, a large portion involved waiting for RCU grace
periods to elapse. Increasing HZ causes the periodic timer to check for
quiesced periods more frequently, which consequently reduces the amount
of time RCU callers spend waiting for grace periods and in barrier
waits.
By itself, this change took the amount of time to execute a 'docker run
hello-world' down to 570ms from over 2000ms on 4.14, and down to 390ms
from 1260 on 4.17 and 4.18.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
The kernel config was derived from the 4.17.x kernel config
and then tweaked a little. Specifically:
- Enable XDP_SOCKETS
- Enable NFT_CONNLIMIT
- Enable IP_VS_MH
- Enable BPFILTER (as module)
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The 4.14.63 contains important security fixes in particular
against L1TF (CVE-2018-3615, CVE-2018-3620, CVE-2018-3646) and
userspace-userspace SpectreRSB.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
linuxkit/vsudd:98e554e4f3024c318e42c1f6876b541b654acd9f
linuxkit/host-timesync-daemon:613dc55e67470ec375335a1958650c3711dc4aa6
linuxkit/test-virtsock:57883002c2bc824709efa6cd3818e1ff51a11889
linuxkit/test-ns:a21f996641f391d467a7842e85088a304d24fae5
Signed-off-by: David Scott <dave.scott@docker.com>
In addition to bug fixes, this removes the special protocol used
for `shutdown` needed by old Windows builds < 14393.
Signed-off-by: David Scott <dave.scott@docker.com>
Note: this patch introduces an incompatibility in the
`linuxkit run vbox` arguments.
It wasn't possible to specify more than one network adapter
to the `linuxkit run vbox` command.
This patch allows specifying more than one `-networking` argument to configure
different network adapters.
For instance:
~~~sh
linuxkit run vbox -networking type=nat -networking type=hostonly,adapter=vboxnet0
~~~
will set up the VM with 2 NICs.
It is also possible to get rid of the `type` argument.
Signed-off-by: Brice Figureau <brice@daysofwonder.com>
VirtualBox hardware (like physical hardware) supports only a limited number
of IDE devices on an IDE controller.
Unfortunately, when using an additional drive, it was given the port
value of 2, which doesn't exist in VirtualBox IDE controllers (as
only 0 and 1 are permitted).
This change uses the SATA controller, which can host many
more drives, to attach the additional drives.
Signed-off-by: Brice Figureau <brice@daysofwonder.com>
While processing the content of a tar image, linuxkit's moby tool is
blindly reusing the original tar format.
Moreover it locates the files under a new prefix, so if the original
file was stored as USTAR in the original archive, the filename length
and new prefix could be greater than the USTAR name limit leading
to a fatal error.
The fix is to always enforce PAX format on all copied files from the
original image archive.
Signed-off-by: Brice Figureau <brice-puppet@daysofwonder.com>
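The USTAR name limit behind the failure described above can be reproduced with Python's `tarfile` module. The sketch is only an illustration: the real fix lives in Go code, `write_entry` and `read_names` are hypothetical helpers, and the long path is made up.

```python
import io
import tarfile


def write_entry(name, fmt):
    """Write one empty file entry called `name` using tar format `fmt`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def read_names(data):
    """List the entry names stored in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        return tar.getnames()


# A path component longer than 100 bytes cannot be represented in USTAR,
# no matter how the name is split across the prefix and name fields.
long_name = "containers/services/demo/rootfs/" + "x" * 150

try:
    write_entry(long_name, tarfile.USTAR_FORMAT)
    ustar_ok = True
except ValueError:
    ustar_ok = False

# PAX stores the full path in an extended header, so the same name works.
pax_names = read_names(write_entry(long_name, tarfile.PAX_FORMAT))
```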
When building the build context, symlinks need special
treatment as the link name needs to be added when
building the tar.FileInfoHeader. This code does that.
We may also need to add a special case for hard links
as the moby/moby package 'archive' does, but this
should do for now.
fixes #3142
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This is the final piece. If 'sources' are defined, tar up
the sources and rewrite them accordingly. Pass the result as the
build context to 'docker'.
This allows building from something like this:
├── etc
│   ├── foo
└── foo
    ├── Dockerfile
    ├── build.yml
    └── main.go
With 'build.yml':
image: foo
extra-sources:
- ../etc:etc
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This commit adds the ability to add a build context to
docker for the package build. The build context is passed
on stdin to the docker process.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
If the build.yml specifies 'extra-sources', i.e. sources
outside the package directory, calculate the hash based on
the tree hash of all source directories and the package
directory.
Note, this requires the source directories to be under
git revision control.
Also clean up the src and dst of the path and stash the
result in the Pkg structure.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
In e8786d73bb the logwrite package will
automatically append .log to every log.
In 5201049f2c the init package will send
stderr of a service `s` to a log named `s` and the stdout to `s.out`.
Therefore the files we create on disk are `s.log` and `s.out.log`.
This patch modifies the memlogd `logwrite` command-line wrapper to use
the same convention.
Note there is a confusing name clash between `pkg/logwrite` and `cmd/logwrite`
in `memlogd` modified here.
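
The naming convention above can be sketched as follows. The function `logFiles` is a hypothetical name used only for this illustration; it is not part of `memlogd`.

```go
package main

import "fmt"

// logFiles returns the on-disk file names for a service's logs under the
// convention described above: stderr goes to the log named after the
// service, stdout to "<service>.out", and ".log" is appended to both.
// (Hypothetical helper, for illustration only.)
func logFiles(service string) (stderrFile, stdoutFile string) {
	return service + ".log", service + ".out.log"
}

func main() {
	stderrFile, stdoutFile := logFiles("s")
	fmt.Println(stderrFile, stdoutFile)
}
```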
Signed-off-by: David Scott <dave.scott@docker.com>
This commit adds support for authentication for image pulls for
'linuxkit build'. For each image reference we look up credentials
via the docker CLI configuration and use it if defined for
a given registry server. The code caches credentials to avoid
lookups for every image.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
A subsequent commit will enable support for private repositories.
This requires some functions from 'github.com/docker/cli' which
in turn relies on some newer versions of some of the vendored
packages here.
In this commit, update all packages used here to the versions
used by 'github.com/docker/cli' release 18.06 (the latest stable).
This requires vendoring a bunch of additional packages, such
as prometheus
Also run 'sort' over 'vendor.conf' to keep things in order.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- use the mkimage hashes that we had in LinuxKit as more up to date than tool.
- update docs
- move the code from moby under src/cmd/linuxkit
Signed-off-by: Justin Cormack <justin@specialbusservice.com>
These must have fallen through the cracks during various
kernel updates. Move everything to the latest 4.14.x kernel.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
When logging directly to files (the not-using-memlogd case) the onboot
services must log to /run/log because /var/log might be overmounted
by a persistent disk. Therefore we create a symlink at the end of
the onboot section.
When logging via memlogd, all logs are buffered until a logwrite service
starts, so no symlink is needed.
Signed-off-by: David Scott <dave.scott@docker.com>
Also simplify the code by directly storing the path to
the log file in the LogFile structure.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Update the firmware packages to the latest commit
of the upstream linux-firmware repository.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This process connects to memlogd and streams logs to individual files,
one per log. It keeps track of how many bytes have been written to each
file and rotates when the file size exceeds a defined threshold.
By default the maximum size of each file before rotation is 1MiB and
we keep up to 10 files per log.
Signed-off-by: David Scott <dave.scott@docker.com>
Looks like btrfs-progs v4.17 as shipped with alpine:3.8 requires
a loopback device of 109MB while the containerd tests only
create a 100MB device. This causes the test to fail.
Disable it until https://github.com/containerd/containerd/issues/2447
is fixed.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
This simplifies the example by adding a service which writes to the
log every 1s and a getty for introspection.
To see the logs:
/proc/1/root/usr/bin/logread -F
Signed-off-by: David Scott <dave.scott@docker.com>
Switch to a more formally-specified `kmsg`-style format for reading
the logs.
- update the spec in docs/logging.md
- check for bad names in pkg/memlogd with unit test
Signed-off-by: David Scott <dave.scott@docker.com>
- check writing to the log does not block
- check the log doesn't expand -- it should be finite
- check that client connections don't buffer arbitrary amounts of
data if the client is slow
Signed-off-by: David Scott <dave.scott@docker.com>
Previously we had a per-connection
bytes.Buffer // to be written to the connection
sync.Cond // to allow us to Wait for more data
This had the major disadvantage that the buffer was unbounded and so
a slow client could cause memory exhaustion in the server. This patch
replaces these with a single
chan *logEntry
which is naturally bounded and supports blocking read. We make write
non-blocking using select i.e. we drop messages rather than allocate
more space.
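
The drop-on-full behaviour can be sketched like this. The `logEntry` fields, the helper name `trySend` and the channel capacity are placeholders chosen for the example, not the values used in the patch.

```go
package main

import "fmt"

// logEntry is a stand-in for the real log record type.
type logEntry struct {
	msg string
}

// trySend queues e on ch without blocking. It reports false when the
// channel is full, in which case the entry is dropped rather than
// buffered. (Hypothetical helper, for illustration only.)
func trySend(ch chan *logEntry, e *logEntry) bool {
	select {
	case ch <- e:
		return true
	default:
		return false
	}
}

func main() {
	// Capacity 2 is an arbitrary value chosen for the example.
	ch := make(chan *logEntry, 2)
	for i := 0; i < 3; i++ {
		ok := trySend(ch, &logEntry{msg: fmt.Sprintf("line %d", i)})
		fmt.Println(i, ok)
	}
}
```

A reader on the other side simply receives from the channel, which blocks until an entry is available.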
Signed-off-by: David Scott <dave.scott@docker.com>
This is an example external logging service which can be enabled by
adding it to the `init` section of the .yml, for example:
...
init:
- linuxkit/init:35866bb276c264a5f664bfac7456f4b9eeb87a4d
- linuxkit/runc:v0.4
- linuxkit/containerd:f2bc1bda1ab18146967fa1a149800aaf14bee81b
- linuxkit/ca-certificates:v0.4
- linuxkit/memlogd:cc035e5c9e4011ec1ba97a181a6689fc90965ce9
onboot:
...
Signed-off-by: David Scott <dave.scott@docker.com>
If external logging is enabled, this patch sets the stdout and stderr
of the `runc` invocations to one end of a socketpair and the other end is
sent to the logging service. Otherwise we log to files as before.
Signed-off-by: David Scott <dave.scott@docker.com>
An external logging system exists if the socket
/var/run/linuxkit-external-logging.sock
exists.
If an external logging system is enabled then create FIFOs for
containerd and send the other end of the FIFOs to the logging service.
Otherwise use /var/log files as before.
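
A minimal sketch of the detection step, assuming a plain existence check is sufficient. The function name `externalLoggingEnabled` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// Socket path as given in the text above.
const externalLoggingSocket = "/var/run/linuxkit-external-logging.sock"

// externalLoggingEnabled reports whether something exists at path.
// This sketch only checks for existence; it does not verify that the
// file is actually a socket. (Hypothetical helper.)
func externalLoggingEnabled(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	if externalLoggingEnabled(externalLoggingSocket) {
		fmt.Println("sending logs to the external logging service")
	} else {
		fmt.Println("logging to /var/log files")
	}
}
```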
Signed-off-by: David Scott <dave.scott@docker.com>
Previously memlogd would always run in the foreground. This patch
adds a `-daemonize` option which binds the /var/run sockets, forks
and execs itself and immediately returns. Therefore the program won't
block (important for an init.d script) but guarantees the sockets will
be available for any program started afterwards.
This also removes the alpine base from the memlogd image as `init`
"containers" are treated as simple file overlays.
Signed-off-by: David Scott <dave.scott@docker.com>
We will place the control sockets in the root /var/run and then share
with all services who need access.
Signed-off-by: David Scott <dave.scott@docker.com>
Since I struggled to understand and find information about how to
troubleshoot a running linuxkit instance, I propose to add these two
FAQ entries.
The first one explains why it is possible to not see the `containerd` or
`init` outputs at boot in the console.
The second one gives a few `ctr` examples to list containers, running
containers or how to open a shell in a given container.
Signed-off-by: Brice Figureau <brice@daysofwonder.com>
These were being added to the incorrect directory.
Also move config file to /etc to be more standard.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
The previous commit updated to 4.16.18, which is the last
4.16.x kernel. The 4.16.18 kernel was compiled and pushed
but we may as well now remove it as it has been EOLed.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
When dealing with apk, `uname -m` doesn't always match the architecture
name that apk uses. Instead `apk --print-arch` is used.
Signed-off-by: Alan Raison <alanraison@users.noreply.github.com>
DNS lookups fail in qemu-user when it is built on Alpine: https://bugs.alpinelinux.org/issues/8131
Until this is resolved, we fetch the binaries from Debian and use those instead. The final stage
of the Dockerfile is still based on scratch.
We can revert this once the Alpine issue is fixed.
Signed-off-by: Justin Barrick <jbarrick@cloudflare.com>
For some reason, bind mounting does not always seem to work,
sometimes the filesystem is empty. Mounting a fresh copy seems
a better solution, and simplifies things. The container does
need `CAP_SYS_ADMIN` but only on boot.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This attempts to work around a CI issue where we're running out of disk
space when rebuilding the init package.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
When busybox's reboot processing occurs in init, it runs all SHUTDOWN
actions that are defined in inittab. Once those are complete, it will
trigger either a halt, poweroff, or reboot, depending upon what signal
is received. The mechanism that's used to shell out through inittab
does not allow us to pass through exactly which invocation was
requested.
Due to the way that rc.shutdown works, it invokes the poweroff action
for any and all SHUTDOWN callbacks, whether they're a reboot, poweroff,
or halt. Instead of handling the reboot(2) syscall in rc.shutdown,
return after killing and unmounting and let busybox's init process
decide which reboot(2) action to use.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
With PR #3030 the behaviour of poweroff/halt is changed. This
test relies on on-shutdown containers to be executed to display
the test result (service containers have their stdout redirected).
Use 'poweroff' (note, no '-f') to ensure that:
- the machine actually powers off
- the on-shutdown container is executed
Note, there are subtle differences between 'poweroff' and 'halt'
between hypervisors. With HyperKit, 'halt' actually works, but with
qemu/kvm, with 'halt' the process does not exit.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
Previously name and image were always the same so running two hosts
from one image was not possible!
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
While we can re-create the kernel source code we don't have it
handily available in one place. This commit stashes the kernel
and the WireGuard source as /src/linux.tar.xz and
/src/wireguard.tar.xz in the kernel package.
This increases the size of the hub image by around 100MB.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Trying to keep the number of kernels we compile for these
platforms small and 4.16 is likely to be EOLed soon anyway.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The kernel configs are the 4.16.x configs run through
a 'make defconfig && make oldconfig' cycle.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Originally, the Memorizer kernel was fed inputs to add boot printouts from a debug tool; however, this creates unnecessary output. Remove the kernel boot option parameter.
Signed-off-by: Nathan Dautenhahn <ndd@cis.upenn.edu>
Note, we skip 4.14.45 because 4.14.46 only has 3 patches
in it which unbreak 'perf' compilation.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The default Go tar has restrictions on filename length for example.
PAX is recommended over GNU.
Requires Go 1.10
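
As an illustration, a header can be forced to PAX by setting the `Format` field that `archive/tar` gained in Go 1.10. The helper `roundTrip` and the test file name are made up for this sketch.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"strings"
)

// roundTrip writes one empty file called name into an in-memory tar
// archive using the PAX format and returns the name read back from it.
// (Hypothetical helper, for illustration only.)
func roundTrip(name string) (string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{
		Name:     name,
		Mode:     0644,
		Typeflag: tar.TypeReg,
		Format:   tar.FormatPAX, // field available since Go 1.10
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return "", err
	}
	if err := tw.Close(); err != nil {
		return "", err
	}
	got, err := tar.NewReader(&buf).Next()
	if err != nil {
		return "", err
	}
	return got.Name, nil
}

func main() {
	// A path well beyond what the classic header fields can hold.
	name := strings.Repeat("d/", 150) + "file"
	got, err := roundTrip(name)
	fmt.Println(got == name, err)
}
```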
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
For the initrd we only want to extract kernel, cmdline, and
the ucode CPIO archive. Skip whatever is left in ./boot
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This output produces a kernel and a root filesystem
in squashfs format. squashfs is a read-only, compressed
filesystem.
The 'kernel+squashfs' output can be used in a similar way as
the default 'kernel+initrd' output format with the benefit
that the rootfs does not consume any memory.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
We currently hardcode the Linuxkit/mkimage- images. This has the
unfortunate consequence that, if we update the LinuxKit image used
to generate the output, we have to update the Moby tool and then
vendor it back into the LinuxKit repository.
This commit introduces UpdateOutputImages() which allows a client
of the Moby tools package to selectively overwrite the packages
used to generate the outputs.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
It is quite confusing that from the host or another container that
binds `/containers` you cannot see the bind mounts, you have to enter
the container namespace. I think `rshared` is a better default. You
can always be explicit and add `private` if you want a private bind mount.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This adds a namespace field to override the LinuxKit containerd
default namespace, in case you want to run a container in another
namespace.
Needs a patch in LinuxKit to implement this that I will open soon.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Annotations do not do anything by default but get passed through to the runtime,
which can be useful. I never metadata I didn't like...
Also fix sysctl to be a map in the validation, not an array. I can't see any
examples using this in LinuxKit, but this matches OCI so is correct.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This prepends 'ucode.cpio' to the initrd if present. Padding
should not be necessary as the ucode.cpio should be padded
to the right size.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
For now the backends for the different formats do not yet
use the extracted ucode cpio archive, but '// TODO' are
placed for the backends which should eventually handle it.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This extends the kernel filter to also look for the CPU microcode
file if specified in the YAML. If found, the ucode cpio archive
is placed into the intermediate tar file as '/boot/ucode.cpio'.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This optional option will allow users to specify a CPU
microcode cpio archive to be prepended to the initrd file.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
User specified mounts should be able to rely on the rootfs being mounted, in
particular for a writeable container they should expect the writeable overlay
to already be in place.
Signed-off-by: Ian Campbell <ijc@docker.com>
Since that bumps to gogo protobuf v0.5 too do the same.
Note that there are no actual containerd changes here, although there are some
gogo proto ones.
Signed-off-by: Ian Campbell <ijc@docker.com>
Rather than queueing up into a `bytes.Buffer`.
In my test case (building kube master image) this reduces Maximum RSS (as
measured by time(1)) compared with the previous patch from 2.8G to 110M. The
tar output case goes from 2.1G to 110M also. Overall allocations are ~715M in
both cases.
Signed-off-by: Ian Campbell <ijc@docker.com>
All of the `output*` functions took a `[]byte` and immediately wrapped it in a
`bytes.Buffer` to produce an `io.Reader`. Make them take an `io.Reader` instead
and satisfy this further up the call chain by directing `moby.Build` to output
to a temp file instead of another `bytes.Buffer`.
In my test case (building kube master image) this reduces Maximum RSS (as
measured by time(1)) from 6.7G to 2.8G and overall allocations from 9.7G to
5.3G. When building a tar (output to /dev/null) the Maximum RSS fell slightly
from 2.2G to 2.1G. Overall allocations remained stable at around 5.3G.
Signed-off-by: Ian Campbell <ijc@docker.com>
Following https://golang.org/pkg/runtime/pprof/. When attempting to build
images in https://github.com/linuxkit/kubernetes CI the process is mysteriously
being SIGKILL'd, which I think might be down to OOMing due to the resource
limits placed on the build container.
I haven't done so yet but I'm intending to use these options to investigate and
they seem potentially useful in any case, even if this turns out to be a
red-herring.
Signed-off-by: Ian Campbell <ijc@docker.com>
The syntax used for the yaml definitions is changed by the need to include the
substruct in the struct literal.
For the label switch to `ImageConfig` directly, which is actually more correct
in that it avoids spurious `name` and `image` fields in the label.
Signed-off-by: Ian Campbell <ijc@docker.com>
Where "config-related" here means "ones you might find in the
"org.mobyproject.config" label on an image".
By making this new struct an anonymous member of the existing Image struct the
Go json parser does the right thing (i.e. inlines into the parent) when parsing
a complete image (from a yml assembly) by default. The Go yaml library which we
use requires a tag on the anonymous field to achieve the same.
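
The JSON half of this can be shown with the standard library alone. The two fields in `ImageConfig` below are an illustrative subset with simplified types, and `parseImage` is a hypothetical helper. For the YAML side, libraries such as gopkg.in/yaml.v2 need an explicit `yaml:",inline"` tag on the embedded field; which YAML library is in use here is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageConfig holds the config-related fields (reduced subset).
type ImageConfig struct {
	Capabilities []string `json:"capabilities,omitempty"`
	Readonly     bool     `json:"readonly,omitempty"`
}

// Image embeds ImageConfig anonymously, so encoding/json treats the
// embedded fields as if they were declared directly on Image.
type Image struct {
	Name        string `json:"name"`
	Image       string `json:"image"`
	ImageConfig        // a YAML library would need an inline tag here
}

// parseImage decodes a complete image definition from JSON.
func parseImage(data []byte) (Image, error) {
	var img Image
	err := json.Unmarshal(data, &img)
	return img, err
}

func main() {
	img, err := parseImage([]byte(`{"name":"foo","image":"foo:latest","readonly":true}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(img.Name, img.Readonly)
}
```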
Signed-off-by: Ian Campbell <ijc@docker.com>
It appears that the `$GOPATH` in `working_directory` is being treated as a literal
`GOPATH` at least when processing the `state_artifacts.path`. Inlining it seems
to have worked, at the cost of some duplication.
Signed-off-by: Ian Campbell <ijc@docker.com>
Solv: Updated documentation to point out limits of
files section regarding /var, /run, and /tmp dirs.
Signed-off-by: Tristan Slominski <tristan.slominski@gmail.com>
Looks like a6b89f1137 ("Update linuxkit/mkimage-*") updated to a
non-existing tag.
linuxkit pkg show-tag tools/mkimage-iso-bios
linuxkit/mkimage-iso-bios:165b051322578cb0c2a4f16253b20f7d2797a502
and docker pull of that image works.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
These versions were created by https://github.com/linuxkit/linuxkit/pull/2607
which enables content trust, so drop the sha256 from all of them and ensure
DOCKER_CONTENT_TRUST is unconditionally set when running, since these
references are hardcoded we know they must be signed.
Signed-off-by: Ian Campbell <ijc@docker.com>
AFAICT none of the callers (which all involve one of `linuxkit/mkimage-*`) have
any reason to hit the network.
Signed-off-by: Ian Campbell <ijc@docker.com>
If the YAML file contains:
- path: etc/linuxkit.yml
  metadata: yaml
in the files section and the image was built with content trust,
then the linuxkit.yml file in the image contains fully qualified
image references (including the sha256).
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Instead of passing the image name as a string, use a reference
to a containerd reference.Spec. This allows us, for example,
to update the reference in place when verifying content trust
with more specific information, such as the sha256
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
When constructing a Moby structure from a YAML also
extract a containerd reference.Spec for each image
and the kernel.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
We want to modify some of the content of the Image structure
and thus have to pass them by reference.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This is a tarball of the kernel, initrd and cmdline files, suitable for
sending to the mkimage images that expect this format.
Note you can't currently stream this output format using `-o`; will clean this
up in future commits.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
The next commit will start using some components of containerd
so vendor the latest version.
The latest vndr also removed some un-needed files previously vendored.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
We are going to phase out the LinuxKit build option, in favour of keeping Docker
or a native Linux build option for CI use cases, as it is faster. So the
hyperkit option that only worked in one very limited use case is not needed.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Previously any Runtime specified in yml would completely override anything from
the image label, even if they set distinct fields. This pushes the merging down
to the next layer, and in the case of BindNS down two layers.
Most of the fields involved needed to become pointers to support this, which
required a smattering of other changes to cope. As well as the local test suite
this has been put through the linuxkit test suite (as of cc200d296a).
I also tested in the scenario which caused me to file #152.
Fixes #152.
Signed-off-by: Ian Campbell <ijc@docker.com>
This puts the build side in charge of the runtime layout, which enables
additional optimisations later, like sharing the rootfs if it is
used multiple times.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This could be used in LinuxKit now, as there are some examples, eg
https://github.com/linuxkit/linuxkit/blob/master/blueprints/docker-for-mac/base.yml#L33
which are creating containers to do a mount.
The main reason though is to replace, in future, the ad hoc code that generates
overlay mounts for writeable containers with a runtime config which does
the same thing; this code needs to create both tmpfs and overlay mounts.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This adds a `runtime` section in the config that can be used
to move network interfaces into a container, create directories,
and bind mount container namespaces into the filesystem.
See also https://github.com/linuxkit/linuxkit/pull/2413
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Rather than using an initrd, unpack full filesystem for ISO BIOS.
Stream docker output direct to file rather than via a buffer, to save
memory.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
When we converted these to cpio we were not noticing that they
were invalid as they had incorrect paths as we converted the
path to a symlink anyway. Only the busybox images have hard links
in, the Alpine ones are symlinks anyway, which is why it was
less visible too.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Also do some code cleanup.
Related to #131 we need to read the OCI config to find if the container
is read only, not rely on the yaml, as it may just be set in the label.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
To work with truly immutable filesystems, rather than ones
we sneakily remount `rw`, we are going to use overlay for
writeable containers. To leave the final mount as `rootfs`,
in the writeable case we make a new `lower` path for the read
only filesystem, and leave `rootfs` as a mount point for an
overlay, with the writable layer and workdir mounted as a tmpfs
on `tmp`.
See https://github.com/linuxkit/linuxkit/issues/2288
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Unfortunately there are a lot of issues with resolv.conf as we
cannot actually write it into the image from any docker image, as docker will
always have something bind mounted in.
In addition, normally we expect the filesystem to be read only for images
that moby generates, so the actual etc/resolv.conf is likely not to be writeable.
Previously we were adding in a default resolv.conf into every image pointing at
Google's name servers but that is really a bad idea.
Instead, normal images now get an empty default, while images in the `init`
section will get a symlink, currently hard coded to `/run/resolvconf/resolv.conf`
but you can override this with the `files` section to be static or a different
link.
In future, if we have an easy way to build and extract images with user control
of this, we can drop this.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Some of these are arbitrary and just syncing for the sake of it, however the
image- and runtime-spec are relevant. Interesting changes:
- runtime spec:
  - LinuxRLimit is now POSIXRLimit.
  - Specs.Config is now a pointer.
  - LinuxResources.DisableOOMKiller moved to
    LinuxResources.LinuxMemory.DisableOOMKiller
- image spec:
  - Platform.Features is removed (unused here).
Signed-off-by: Ian Campbell <ijc@docker.com>
This is a list of images to run on a clean shutdown. Note that you must not rely on these
being run at all, as machines may be powered off or shut down without having time to run
these scripts. If you add anything here you should test both in the case where they are
run and when they are not. Most systems are likely to be "crash only" and not have any setup here,
but you can attempt to deregister cleanly from a network service here, rather than relying
on timeouts, for example.
Fix https://github.com/linuxkit/linuxkit/issues/1988
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Currently this supports "yaml" as the only option, which will output
the yaml config (as JSON) into the file specified in the image.
Fix #107
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Previously I was forcing them to be strings, which is horrible. Now you
can either specify a numeric uid or the name of a service to use the
allocated id for that service.
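
One way to resolve such a field is a type switch over the parsed value. The function `resolveID`, the `ids` map and the sample values are hypothetical and only illustrate the two accepted forms.

```go
package main

import "fmt"

// resolveID turns a uid or gid value from the config into a numeric id.
// v is either a number, or the name of a service whose allocated id
// should be used. ids maps service names to their allocated ids.
// (Hypothetical helper, for illustration only.)
func resolveID(v interface{}, ids map[string]uint32) (uint32, error) {
	switch id := v.(type) {
	case int:
		if id < 0 {
			return 0, fmt.Errorf("negative id: %d", id)
		}
		return uint32(id), nil
	case string:
		if allocated, ok := ids[id]; ok {
			return allocated, nil
		}
		return 0, fmt.Errorf("no service named %q", id)
	default:
		return 0, fmt.Errorf("unsupported id type %T", v)
	}
}

func main() {
	// The service name and id are made-up example values.
	ids := map[string]uint32{"dhcpcd": 100}
	fmt.Println(resolveID(0, ids))
	fmt.Println(resolveID("dhcpcd", ids))
}
```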
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This removes the `lint` dependency from building Moby.
I've also added ineffassign to check ineffectual assignments alongside
checks to ensure that both it and golint are installed.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This includes https://github.com/moby/moby/pull/34040 which fixes Windows build
issues.
Note that this pulls in more than 500 (non merge) commits as well as the fix we
are interested in. A couple of new deps are pulled in, versions taken from
vendor/github.com/docker/docker/vendor.conf.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Continue to allow onboot to have duplicates as we do not run simultaneously
so that is ok (and we number them anyway), but services are run together
so we will get a runtime error if duplicated as this is the containerd/runc
id.
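
A build-time check along these lines could look like the following sketch. `checkUniqueServices` is a hypothetical name and the check is reduced to a list of names.

```go
package main

import "fmt"

// checkUniqueServices returns an error for the first service name that
// appears more than once. onboot names are deliberately not checked.
// (Hypothetical helper, for illustration only.)
func checkUniqueServices(names []string) error {
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if seen[n] {
			return fmt.Errorf("duplicate service name: %s", n)
		}
		seen[n] = true
	}
	return nil
}

func main() {
	fmt.Println(checkUniqueServices([]string{"getty", "sshd"}))
	fmt.Println(checkUniqueServices([]string{"getty", "getty"}))
}
```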
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
We were pulling in this whole stack of packages just for `trust.ReleasesRole`.
Just define it locally.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Note that various fields have moved around in the JSON as a result:
* `Platform` has been removed.
* `Process` is now a pointer.
* `OOMScoreAdj` has moved into `Process`, from `Linux.Resources` (resolving a
TODO here).
Also updates golang.org/x/sys which is less critical.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This adds the OCI parts needed into the yaml, but there are still
permissions issues in practice, so it is marked as experimental.
It may just need further documentation to resolve the issues.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
In order to support not running containers as root, allocate
each of them a uid and gid, a bit like traditional Unix system
service IDs. These can be referred to elsewhere by the name of
the container, eg if you wish to create a file owned by a
particular service.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Allow setting ambient capabilities, as a separate option to the standard
ones. If you are running as a non root user you should use these.
Note that unless you add `CAP_DAC_OVERRIDE` and similar permissions you
need to be careful about file ownership. Added support to set ownership
in the `files` section to help out with this.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Rather than build the image and have something weird happen, let's check
that the capabilities specified are actually valid capabilities.
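
A sketch of such a check. The table below lists only a handful of Linux capability names for brevity; a real check would need the full list, and `validateCapabilities` is a hypothetical name.

```go
package main

import "fmt"

// knownCapabilities is a deliberately incomplete subset of the Linux
// capability names, kept short for the example.
var knownCapabilities = map[string]bool{
	"CAP_CHOWN":            true,
	"CAP_DAC_OVERRIDE":     true,
	"CAP_KILL":             true,
	"CAP_NET_ADMIN":        true,
	"CAP_NET_BIND_SERVICE": true,
	"CAP_NET_RAW":          true,
	"CAP_SYS_ADMIN":        true,
	"CAP_SYS_CHROOT":       true,
}

// validateCapabilities rejects the first name that is not in the table,
// so a typo fails the build instead of surfacing at runtime.
func validateCapabilities(caps []string) error {
	for _, c := range caps {
		if !knownCapabilities[c] {
			return fmt.Errorf("unknown capability: %q", c)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateCapabilities([]string{"CAP_NET_ADMIN"}))
	fmt.Println(validateCapabilities([]string{"CAP_NET_ADMN"}))
}
```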
Signed-off-by: Tycho Andersen <tycho@docker.com>
- this is pretty much the smallest change to split this out and it
exposes a few things that can be improved later
- no change to logging yet
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- enable the hyperkit option by default on MacOS
- use it for creating raw disk images
fix #68
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This disables the code in LinuxKit's `/bin/rc.init` which attempts to detect an
unconfigured hostname and generate a unique (ish) version from the MAC address.
Anyone who wants a specific fallback hostname can populate `etc/hostname`
through the `files` stanza in their `yml` file.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
In the WIP code in `moby` we now have a standard base tarball format,
that includes the kernel and cmdline as files in `/boot` so that the
entire output of the yaml file can default to a single tarball. Then
this can be split back up by LinuxKit into initrd, kernel and cmdline
as needed. This will probably become the only output of the `moby build`
stage, with a `moby package` stage dealing with output formats.
We may remove the output format specification from the yaml file as well,
and just have it in the command.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Instead, make a hard link a symlink. This isn't much better, but it allows
some cases (e.g. installing GCC on moby via alpine) to work.
Signed-off-by: Tycho Andersen <tycho@docker.com>
This does not get everything where we want it finally, see #1266
nor the optimal way of building, but it gets it out of top level.
Added instructions to build if you have a Go installation.
Not moving `vendor` yet.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- remove remainder of editions code
- add a new check container to run tests without Docker
- switch over `make test` to use new command to build tests
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Also keep track of directory creation there, so you can explicitly
set directory permissions if required, and to avoid duplicates.
We should really keep track of files created elsewhere in the build
as well as we still might create some extras, but at least you can
set the write permissions.
We can add uid, gid support too if required...
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This will add a Dockerfile which will build the contents into an
image and then call `tinit` to start it.
This is fairly experimental, but is a prototype for other non
LinuxKit outputs. The container will need to run as `privileged`
as `runc` needs quite a few capabilities and `containerd` needs to
mount.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- generally people refer to a plain disk image as `raw`
- `gcp` is shorter and it is the only image type supported
- remove `img-gz` as it is not needed. It does not really save space
as you have to build the full image and compress it anyway. On
many platforms the `raw` image will be a sparse file anyway,
even on the Mac soon.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This is a little ugly in terms of the validation now, but it is a move towards
splitting "build" and "package".
The "tar" output (and soon others) can output direct to a file or to stdout.
Obviously you can only build a single output format like this.
The LinuxKit output formats that build disk images cannot stream as they
have to build whole images. These allow multiple outputs.
In future we will probably change to
```
moby build | moby package
```
or similar, but that is a bit ugly, so currently have a compromise where
there are essentially two output types.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
GCP does not recognise the images, even though they appear identical to those made
by libguestfs and work on qemu fine. Their validation code does not like them for some
reason.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Each section will be appended in order of the CLI, other than
kernel where last specified one wins.
This is useful if you eg want to have a base version for (say)
AWS and GCP and then add your own image on top.
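
The merge rule can be sketched as follows. The `config` struct is a reduced, hypothetical stand-in for the real configuration type, and the image names in `main` are made up.

```go
package main

import "fmt"

// config is a reduced stand-in for the parsed yaml file.
type config struct {
	Kernel   string
	Init     []string
	Onboot   []string
	Services []string
}

// merge combines configs in command-line order: list sections are
// appended, while the last non-empty kernel wins.
func merge(configs ...config) config {
	var out config
	for _, c := range configs {
		if c.Kernel != "" {
			out.Kernel = c.Kernel
		}
		out.Init = append(out.Init, c.Init...)
		out.Onboot = append(out.Onboot, c.Onboot...)
		out.Services = append(out.Services, c.Services...)
	}
	return out
}

func main() {
	base := config{Kernel: "kernel:a", Init: []string{"init"}, Services: []string{"sshd"}}
	extra := config{Kernel: "kernel:b", Services: []string{"myapp"}}
	fmt.Printf("%+v\n", merge(base, extra))
}
```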
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- does not require docker if user has qemu natively, will still fall back to docker
- allow specifying size for fixed size disk images
- add a raw disk output format
- more dogfooding
- marginally slower, but can be improved later
The images used to do the build are cached to make the process quicker.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Default to sharing net, ipc, uts namespaces between containers in config.
This makes most sense, as this is normal other than if we want to specifically
isolate system containers, in which case we will specify in config.
- explicitly support the value "new" if you want to isolate
- support the synonym "root" for "host" as in non LinuxKit setups it may
not actually be the host, it will be the current namespace.
- only support "none" as a synonym for "new" for network namespace where it is
carried over from Docker.
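
The accepted values can be summarised in a small normalisation function. The canonical results "host" and "new", the function name `normalizeNamespace`, and passing any other value through unchanged are assumptions made for this sketch.

```go
package main

import "fmt"

// normalizeNamespace maps a configured namespace value to a canonical
// one. kind is "net", "ipc" or "uts". (Hypothetical helper.)
func normalizeNamespace(kind, value string) (string, error) {
	switch value {
	case "", "host", "root":
		// Default: share the namespace. "root" is a synonym for "host".
		return "host", nil
	case "new":
		return "new", nil
	case "none":
		// Only the network namespace accepts the Docker-style synonym.
		if kind == "net" {
			return "new", nil
		}
		return "", fmt.Errorf("%q is not valid for the %s namespace", value, kind)
	default:
		// Other values are passed through unchanged in this sketch.
		return value, nil
	}
}

func main() {
	fmt.Println(normalizeNamespace("net", ""))
	fmt.Println(normalizeNamespace("net", "none"))
	fmt.Println(normalizeNamespace("ipc", "none"))
}
```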
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This removes outputs from yaml, instead you can do
```
moby build -output tar -output qcow2 file.yaml
```
or alternative syntax
```
moby build -output tar,qcow2 file.yaml
```
In future we may change this to be available in a `moby package`
step, but let's try this for now.
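
Both syntaxes can be supported with a custom `flag.Value`. This is a sketch using the standard `flag` package; the type `outputList` and the helper `parseOutputs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// outputList collects output formats from repeated flags and from
// comma-separated values.
type outputList []string

func (o *outputList) String() string { return strings.Join(*o, ",") }

// Set is called once per occurrence of the flag on the command line.
func (o *outputList) Set(value string) error {
	for _, v := range strings.Split(value, ",") {
		if v != "" {
			*o = append(*o, v)
		}
	}
	return nil
}

// parseOutputs returns the requested formats and the remaining
// arguments (the yaml files).
func parseOutputs(args []string) ([]string, []string, error) {
	var outs outputList
	fs := flag.NewFlagSet("build", flag.ContinueOnError)
	fs.Var(&outs, "output", "output format (repeatable or comma-separated)")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	return outs, fs.Args(), nil
}

func main() {
	outs, files, err := parseOutputs([]string{"-output", "tar,qcow2", "file.yaml"})
	fmt.Println(outs, files, err)
}
```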
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Using the label `org.mobyproject.config` will use that JSON
(or yaml, but it is very hard to get yaml into a label as newlines are
not respected) for parameters that are not explicitly set in the yaml file.
Had to change parameter definitions so override behaves as expected.
fix #16
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This is a fairly generic bootable disk with syslinux. Should
work if you dd it onto a USB stick, and should also work for AWS.
You need to uncompress it of course! Default size is 1G.
Will add cli option to set the size once I split out `moby build`
and `moby package` shortly.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Due to a missing else the tool would previously terminate with an error
message showing that the kernel or init image didn't exist, even if it
was pulled successfully. Invoking the tool again would continue to the
next image.
Signed-off-by: Magnus Skjegstad <magnus@skjegstad.com>
Add a canonical single tarball output format. This
adds kernel and cmdline to `/boot` where LinuxKit output
formats will find them.
Make the other output formats use that as a base.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
`docker create` will not pull an image so we need an additional fallback.
Rework the pull and trust code so it is in one place to facilitate this.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Also do not require `tar` to be in container, use the standard
image export code that we already have and find the files we
want.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This currently only changes the `gcp` target, but is the new
model - the `build` command will only do things locally, then
you need to `push` to an image store such as GCP or other ones
in order to `run` for platforms that cannot boot directly from
a local image.
Fix #1618
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
GCP defines some "standard" environment variables for project and
zone. Use them for 'moby run gcp'. Change the other environment
variables to follow the same pattern.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This uses the Packet.net API and iPXE to boot a Moby host.
There are several enhancements coming soon, such as SSH key
customisation, but this PR is sufficient to boot a host and
then use the web interface to get console access.
The user must currently upload the built artefacts to a public
URL and specify it via --base-url, e.g.:
moby run packet --api-key <key> --project-id <id> \
--base-url http://recoil.org/~avsm/ipxe --hostname test-moby packet
See #1424 and #1245 for related issues.
Signed-off-by: Anil Madhavapeddy <anil@docker.com>
This makes gcp behave in a similar way to the qemu backend.
The minimum size on GCP is 1GB, whereas qemu uses 256MB.
Without this, the LTP tests fail on GCP.
Signed-off-by: Dave Tucker <dt@docker.com>
Adds an "access config" with a type of "ONE_TO_ONE_NAT" that
allows an instance to obtain an ephemeral IP address and access the
internet
Signed-off-by: Dave Tucker <dt@docker.com>
As suggested by @shykes these are clearer
- onboot for things that are run at boot time to completion
- services for persistent services
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This allows overriding the name used for the file in Google storage, the
image name or the instance name. Which of these applies depends on how much
`moby run` is doing, which is governed by whether the positional argument
contains `.img.tar.gz` or not.
For example:
`moby run gcp -img-name test-ea34d1 test` creates an instance called
`test-ea34d1` from the image `test`
`moby run gcp -img-name test-ea34d1 test.img.tar.gz` will upload the
file as `test-ea34d1.tar.gz`, create image `test-ea34d1` and create an
instance called `test-ea34d1`.
The use case for this is for CI to be able to spawn many concurrent test
machines and provide its own name for them.
Signed-off-by: Dave Tucker <dt@docker.com>
- masked paths
- readonly paths
- allow attaching to existing namespaces, eg if bind mounted by a system container
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Pass version and git commit hash from the Makefile
into main.go. Add a 'version' subcommand to print
the information.
While at it also tweak the help output to only print the
command name and not the entire path.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Something like "moby-4.10.yml" did not work when invoked
like "moby build moby-4.10".
While at it, also allow .yaml as an extension.
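A sketch of the lookup this implies. It assumes the original problem was that a name such as `moby-4.10` looks as if it already has an extension (`.10`); `configCandidates` is a hypothetical helper, not the tool's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// configCandidates returns the file names to try for a build argument.
// It checks for the known suffixes explicitly instead of treating anything
// after the last dot as an extension.
func configCandidates(arg string) []string {
	if strings.HasSuffix(arg, ".yml") || strings.HasSuffix(arg, ".yaml") {
		return []string{arg}
	}
	return []string{arg + ".yml", arg + ".yaml"}
}

func main() {
	fmt.Println(configCandidates("moby-4.10"))
	fmt.Println(configCandidates("moby-4.10.yml"))
}
```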
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This refactors the mount handling, without changing any defaults.
Any specification of a mount destination will override the default,
so if you want to make `sysfs` read only you can add
```
mounts:
  - type: sysfs
    options: ["ro"]
```
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This commit implements `moby run gcp` which allows for testing of moby
images on the Google Cloud Platform
This backend attaches (via SSH) to the serial console.
It generates instance-only SSH keys and adds the public key to the
image metadata. These are used by the `moby` tool only.
It will also automatically upload a file and create an image if the prefix
given to `moby run` is a filename
Signed-off-by: Dave Tucker <dt@docker.com>
This commit uses the older GCP API as it supports both compute and
storage. As a result, we can now use either Application Default
Credentials that are generated using the `gcloud` tool or by supplying the
service account credentials in JSON format
Signed-off-by: Dave Tucker <dt@docker.com>
This adds every capability. We had this before the OCI changes as we
passed these values to Docker. Makes fully privileged containers less verbose.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
In the riddler change I changed "command" in the yaml to "args"
but did not change the files. In fact we basically used the
default command everywhere so this did not actually break.
Remove the unnecessary "command" lines to simplify yaml.
Revert the command to args change for now as I think I prefer
command, but it's easier to switch now. Need to think if the
entrypoint/command distinction matters before finalizing.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
We are generally outputting to stdout pipe which the log driver does
not cope with very well; always did this in older builds.
Saves another 5% of build time.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Generated largely from the specified config; small parts taken from `docker image inspect`,
such as the command line.
Renamed some of the yaml keys to match the OCI spec rather than Docker Compose as
we decided they are more readable, no more underscores.
Add some extra functionality
- tmpfs specification
- fully general mount specification
- no new privileges can be specified now
For nostalgic reasons, using engine-api to talk to the docker cli as
we only need an old API version, and it is nice and easy to vendor...
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This is compatible with containerd 8353da59c6ae7e1933aac2228df23541ef8b163f
which was picked up by d2caae4c1a.
This required jiggering with riddler output some more to update to new OCI
config.json format for capabilities.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Add a -data option to the HyperKit "run" backend. This either
adds a string or a file to an ISO which is attached to the VM.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Separating command line option parsing from executing hyperkit
makes the code awkward with many parameters passed between functions.
Having everything in one function makes the code simpler.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This provides a consistent UX between build and run:
moby build foo # build from foo.yml
moby run foo # boot, e.g., foo-bzImage, foo-initrd.img
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Some users seem to have Docker for Mac/hyperkit in a non-standard
path. Allow them to specify the path to the hyperkit executable.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- Move HyperKit code into a separate file. It should be compilable
on all supported OSes now.
- Add a (optional) subcommand to "moby run" to select a backend
i.e., "moby run hyperkit [options] [prefix]"
- On macOS the default is "hyperkit" so that:
"moby run [options] [prefix]"
just works
- Add enough command line parsing to make it easy to add new
backends to the run command
Update help messages.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This formatter strips the prefix from Info() events to
make the default output of "moby build" more readable.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This adds log.Info() to the main steps of the "moby build"
process. By default the Info() output is shown to the user
so it provides some idea of progress and what is happening.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
docker-compose and other utilities use the .yml extension.
For consistency rename all .yaml to .yml
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- this removes the use of riddler to extract the rootfs, use code
we were using for rootfs. riddler now just generates the config,
next stage is to generate this ourselves
- change the naming of the daemons so no longer include number as we
do not guarantee ordering as they start up simultaneously
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Corrected naming from vmware->vmdk and fixed Makefile
Fixed mistake outputting a vhd instead of a vmdk in output.go
Build vmdk image and added to Docker Hub, corrected link in output.go
Modified directories to conform to standard mkimage-<imgType>
Signed-off-by: Dan Finneran <dan@thebsdbox.co.uk>
Removing the left over indirect creates that use the Docker socket
and run in containers not directly.
See #1347
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
'moby run' will use the kernel and initrd image produced
by 'moby build' and, on macOS, will run it inside a
hyperkit VM. This assumes that you have a recent version
of Docker for Mac installed as it re-uses the hyperkit
and VPNKit from it.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This does not get everything where we want it finally, see #1266
nor the optimal way of building, but it gets it out of top level.
Added instructions to build if you have a Go installation.
Not moving `vendor` yet.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Trying to find the relevant yaml file was an issue as we now support
`--name` and it might be in a different directory, so although it is
a bit verbose outputting a whole file at least it is more consistent.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- this needs improvements to make it more "platform native", in
particular GCP supports multiple users and more ssh key management
options.
- at present you can login as root with any platform ssh key
- add support for uts=host and ipc=host
- set the hostname from the metadata as well
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- the `public` option was not previously implemented
- add `replace` only for GCP images which will error otherwise. Only
recommended for use in development, in production use the `--name` option
to provide a different name each time. Note only applies to GCP images,
will document these options properly soon.
- add a `family` option; this allows you to upload many images and the
user can select the latest using the `family` option instead of a specific
image.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This sets the base name of the built images which otherwise
defaults to the basename of your yaml file. This allows
building different versions easily eg adding git sha to the
output names.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This requires switching to the dosfstools from alpine:edge since neither the
busybox nor alpine:3.5 dosfstools supports the -C option (in fact alpine:3.5
only has mkfs.fat and not mkfs.vfat).
The 511k slack seems like a lot to me, but 256k was somehow not enough.
Fixes #1304.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
- the image upload uses the cloud API
- currently auth and image creation need the `gcloud` CLI tool.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- VHD is uncompressed VHD. Currently hard coded at 1GB, which may need to change. Use `format: vhd`
- GCE is the GCE compressed tarred raw image. Use `format: gce-img` - reserving `gce` for actually
uploading the image.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
from:
2017/03/07 09:59:30 Failed to extract kernel image and tarball
to
2017/03/07 10:06:04 Failed to extract kernel image and tarball: Unable to find image 'mobylinux/kernel:7fa748810d7866797fd807a5682d5cb3c9c98111' locally
Signed-off-by: Tycho Andersen <tycho@docker.com>
- remove remainder of editions code
- add a new check container to run tests without Docker
- switch over `make test` to use new command to build tests
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Note that the EFI ISO is not yet automatically sized, and the
kernel command lines are currently hard coded in the builders.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
- split out config processing a bit
- just use `capabilities` not `cap-add` and `cap-drop`
- allow use of CAP_ prefix on capabilities, as this is what `runc` uses
- add nginx to example config
- fix bind mounts
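As an illustration of the `CAP_` handling, a hypothetical normaliser could accept both spellings and always produce the prefixed form that `runc` uses. The function name is made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeCapability returns the capability name with a CAP_ prefix,
// whether or not the config supplied one.
func normalizeCapability(name string) string {
	if strings.HasPrefix(name, "CAP_") {
		return name
	}
	return "CAP_" + name
}

func main() {
	for _, c := range []string{"NET_ADMIN", "CAP_SYS_ADMIN"} {
		fmt.Println(normalizeCapability(c))
	}
}
```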
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
_This list is currently under construction. Please add your use cases to this with a PR. Thanks!_
# Production Users
**_[Docker Desktop](https://www.docker.com/products/docker-desktop)_** - Docker Desktop for Mac and Windows uses LinuxKit to provide an embedded, invisible virtual machine in order to run Linux containers and to run Kubernetes. There are currently millions of active users.
**_[TagHub](https://www.taghub.net)_** - TagHub is a SaaS product for doing asset management. We use LinuxKit to have small and secure linux nodes powering our multi-cloud infrastructure. TagHub is made by [Smart Management](http://www.smartm.no/).
# Projects Using LinuxKit
**_[LinuxKit Nix](https://github.com/nix-community/linuxkit-nix)_** aims to provide a Linux Nix VM for macOS.
**_[cfdev](https://github.com/cloudfoundry-incubator/cfdev)_** A fast and easy local Cloud Foundry experience on native hypervisors, powered by LinuxKit with VPNKit
**_[dm-linuxkit](https://github.com/dotmesh-io/dm-linuxkit)_** A dotmesh controller for LinuxKit persistent storage management.
**_[Linux Foundation Edge EVE](https://github.com/lf-edge/eve)_** Edge Virtualization Engine Operating System
@@ -10,12 +10,13 @@ LinuxKit, a toolkit for building custom minimal, immutable Linux distributions.
- Completely stateless, but persistent storage can be attached
- Easy tooling, with easy iteration
- Built with containers, for running containers
- Designed to create [reproducible builds](./docs/reproducible-builds.md) [WIP]
- Designed for building and running clustered applications, including but not limited to container orchestration such as Docker or Kubernetes
- Designed from the experience of building Docker Editions, but redesigned as a general-purpose toolkit
- Designed to be managed by external tooling, such as [Infrakit](https://github.com/docker/infrakit) or similar tools
- Designed to be managed by external tooling, such as [Infrakit](https://github.com/docker/infrakit) (renamed to [deploykit](https://github.com/docker/deploykit) which has been archived in 2019) or similar tools
- Includes a set of longer-term collaborative projects in various stages of development to innovate on kernel and userspace changes, particularly around security
LinuxKit currently supports the `x86_64`, `arm64`, and `s390x` architectures on a variety of platforms, both as virtual machines and baremetal (see [below](#booting-and-testing) for details.
LinuxKit currently supports the `x86_64`, `arm64`, and `s390x` architectures on a variety of platforms, both as virtual machines and baremetal (see [below](#booting-and-testing) for details).
## Subprojects
@@ -24,6 +25,7 @@ LinuxKit currently supports the `x86_64`, `arm64`, and `s390x` architectures on
- [linux](https://github.com/linuxkit/linux) A copy of the Linux stable tree with branches LinuxKit kernels.
- [virtsock](https://github.com/linuxkit/virtsock) A `go` library and test utilities for `virtio` and Hyper-V sockets.
- [rtf](https://github.com/linuxkit/rtf) A regression test framework used for the LinuxKit CI tests (and other projects).
- [homebrew](https://github.com/linuxkit/homebrew-linuxkit) Homebrew packages for the `linuxkit` tool.
## Getting Started
@@ -34,7 +36,7 @@ LinuxKit uses the `linuxkit` tool for building, pushing and running VM images.
Simple build instructions: use `make` to build. This will build the tool in `bin/`. Add this
to your `PATH` or copy it to somewhere in your `PATH` eg `sudo cp bin/* /usr/local/bin/`. Or you can use `sudo make install`.
If you already have `go` installed you can use `go get -u github.com/linuxkit/linuxkit/src/cmd/linuxkit` to install the `linuxkit` tool.
If you already have `go` installed you can use `go install github.com/linuxkit/linuxkit/src/cmd/linuxkit@latest` to install the `linuxkit` tool.
On MacOS there is a `brew tap` available. Detailed instructions are at [linuxkit/homebrew-linuxkit](https://github.com/linuxkit/homebrew-linuxkit),
the short summary is
@@ -43,11 +45,17 @@ brew tap linuxkit/linuxkit
brew install --HEAD linuxkit
```
Build requirements from source:
Build requirements from source using a container
- GNU `make`
- Docker
- optionally `qemu`
For a local build using `make local`
- `go`
- `make`
- `go get -u golang.org/x/lint/golint`
- `go get -u github.com/gordonklaus/ineffassign`
### Building images
Once you have built the tool, use
@@ -55,10 +63,8 @@ Once you have built the tool, use
```
linuxkit build linuxkit.yml
```
to build the example configuration. You can also specify different output formats, eg `linuxkit build -format raw-bios linuxkit.yml` to
output a raw BIOS bootable disk image, or `linuxkit build -format iso-efi linuxkit.yml` to output an EFI bootable ISO image. See `linuxkit build -help` for more information.
Since `linuxkit build` is built around the [Moby tool](https://github.com/moby/tool) the input yml files are described in the [Moby tool documentation](https://github.com/moby/tool/blob/master/docs/yaml.md).
to build the example configuration. You can also specify different output formats, eg `linuxkit build --format raw-bios linuxkit.yml` to
output a raw BIOS bootable disk image, or `linuxkit build --format iso-efi linuxkit.yml` to output an EFI bootable ISO image. See `linuxkit build -help` for more information.
### Booting and Testing
@@ -69,6 +75,7 @@ for example VMWare. See `linuxkit run --help`.
- [Raspberry Pi Model 3b](docs/platform-rpi3.md) `[arm64]`
@@ -117,7 +125,7 @@ To customise, copy or modify the [`linuxkit.yml`](linuxkit.yml) to your own `fil
generate its specified output. You can run the output with `linuxkit run file`.
The yaml file specifies a kernel and base init system, a set of containers that are built into the generated image and started at boot time. You can specify the type
of artifact to build with the `moby` tool eg `linuxkit build -format vhd linuxkit.yml`.
of artifact to build eg `linuxkit build -format vhd linuxkit.yml`.
If you want to build your own packages, see this [document](docs/packages.md).
@@ -131,7 +139,7 @@ The yaml format specifies the image to be built:
- `services` is the system services, which normally run for the whole time the system is up
- `files` are additional files to add to the image
For a more detailed overview of the options see [yaml documentation](https://github.com/moby/tool/blob/master/docs/yaml.md)
For a more detailed overview of the options see [yaml documentation](docs/yaml.md)
## Architecture and security
@@ -156,7 +164,11 @@ This is an open project without fixed judgements, open to the community to set t
## Development reports
There are weekly [development reports](reports/) summarizing work carried out in the week.
There are monthly [development reports](reports/) summarising the work carried out each month.
## Adopters
We maintain an incomplete list of [adopters](ADOPTERS.md). Please open a PR if you are using LinuxKit in production or in your project, or both.
This should allow end-users to gracefully reboot or shut down Kubernetes nodes (including control planes) running on vSphere Hypervisor.
Furthermore, it is also mandatory to have `open-vm-tools` installed on your Kubernetes nodes to use vSphere Cloud Provider (i.e. to determine the virtual machine's FQDN).
## Remarks:
- `spec.template.spec.hostNetwork: true`: correctly report node IP address; required
- `spec.template.spec.hostPID: true`: send the right signal to node, instead of killing the container itself; required
- `spec.template.spec.priorityClassName: system-cluster-critical`: critical to a fully functional cluster
- `spec.template.spec.securityContext.privileged: true`: gain more privileges than its parent process; required
git commit -a -s -m "pkgs: Update packages to the latest linuxkit/alpine"
# update package tags - may want to include the release in it if set
cd $LK_ROOT
make update-package-tags
MSG=""
[ -n "$LK_RELEASE" ] && MSG="to $LK_RELEASE"
git commit -a -s -m "Update package tags $MSG"
git push $LK_REMOTE $LK_BRANCH
```
#### Update tools packages
On your primary build machine, update the other tools packages.
Note, the `git checkout` reverts the changes made by
`update-component-sha.sh` to files which are accidentally updated.
Important is the `git checkout` of some sensitive packages that only can be built with
specific older versions of upstream packages:
* `grub-dev`
* `mkimage-rpi3`
Only update those if you know what you are doing with them.
Then we update any dependencies of these tools.
#### Update test packages
Next, we update the test packages to the updated alpine base.
Next, we update the use of test packages to latest.
Some tests also use `linuxkit/alpine`, so we update them as well.
### Update packages
Next, we update the LinuxKit packages. This is really the core of the
release. The other steps above are just there to ensure consistency
across packages.
#### External Tools
Most of the packages are built from `linuxkit/alpine` and source code
in the `linuxkit` repository, but some packages wrap external
tools. Updating all packages, especially at the time of a release,
is a good opportunity to check whether there have been updates. Specifically:
- `pkg/cadvisor`: Check for [new releases](https://github.com/google/cadvisor/releases).
- `pkg/firmware` and `pkg/firmware-all`: Use latest commit from [here](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git).
- `pkg/node_exporter`: Check for [new releases](https://github.com/prometheus/node_exporter/releases).
- Check [docker hub](https://hub.docker.com/r/library/docker/tags/) for the latest `dind` tags, and update `examples/docker.yml`, `examples/docker-for-mac.yml`, `examples/cadvisor.yml`, and `test/cases/030_security/000_docker-bench/test.yml` if necessary.
This is at your discretion.
### Build and push affected downstream packages
**Note**: All of the `make push` and `make forcepush` in this section use `linuxkit pkg push`, which will build for all architectures and push
the images out. See [Build Platforms](./packages.md#Build_Platforms).
The kernel command-line is a string of text that the kernel parses as it is starting up. It is passed by the boot loader
to the kernel and specifies parameters that the kernel uses to configure the system. The command-line is a list of command-line
options separated by spaces. The options are parsed by the kernel and can be used to enable or disable certain features.
LinuxKit passes all command-line options to the kernel, which uses them in the usual way.
There are several options that can be used to control the behaviour of linuxkit itself, or specifically packages
within linuxkit. Unless standard Linux options exist, these all are prefaced with `linuxkit.`.
| Option | Description |
|---|---|
| `linuxkit.unified_cgroup_hierarchy=0` | Start up cgroups v1. If not present or set to 1, default to cgroups v2. |
| `linuxkit.runc_debug=1` | Start runc for `onboot` and `onshutdown` containers to run with `--debug`, and add extra logging messages for each stage of starting those containers. If not present or set to 0, default to usual mode. |
| `linuxkit.runc_console=1` | Send logs for runc for `onboot` and `onshutdown` containers, as well as the output of the containers themselves, to the console, instead of the normal output to logfiles. If not present or set to 0, default to usual mode. |
It often is useful to combine both of the `linuxkit.runc_debug` and `linuxkit.runc_console` options to get the most
information about what is happening with `onboot` containers.
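As a rough illustration of how a program might pick these options out of the command line (for example the contents of `/proc/cmdline`), here is a minimal sketch. `linuxkitOptions` is a hypothetical helper, not LinuxKit's own parser, and it assumes option values contain no quoted whitespace. It uses `strings.Cut`, which needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// linuxkitOptions extracts the "linuxkit."-prefixed key=value options from a
// kernel command line and ignores everything else.
func linuxkitOptions(cmdline string) map[string]string {
	opts := make(map[string]string)
	for _, field := range strings.Fields(cmdline) {
		if !strings.HasPrefix(field, "linuxkit.") {
			continue
		}
		// An option without "=" is recorded with an empty value.
		key, value, _ := strings.Cut(field, "=")
		opts[key] = value
	}
	return opts
}

func main() {
	cmdline := "console=ttyS0 linuxkit.runc_debug=1 linuxkit.runc_console=1"
	opts := linuxkitOptions(cmdline)
	fmt.Println("debug:", opts["linuxkit.runc_debug"] == "1")
	fmt.Println("console:", opts["linuxkit.runc_console"] == "1")
}
```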
This document describes how to install and maintain a LinuxKit development platform. It will grow over time.
The LinuxKit team also maintains several Linux-based build platforms. These are donated by Equinix Metal (arm64) and IBM (s390x).
## Platform-Specific Installation
### arm64 and amd64
The `amd64` and `arm64` platforms are fully supported by most OS vendors and Docker. Just upgrade to the latest OS and install the latest Docker using the
packaging tools. As of this writing, that is:
* Ubuntu/Debian with `apt`
* RHEL/CentOS/Fedora with `yum`. For any of these, use the CentOS 7/8 packages as released by Docker.
Docker does not recommend that you use the packages released by the OS vendors, as those tend to be out of date. Follow the instructions
The s390x platform has modern versions of most OSes, including RHEL and Ubuntu, but does not have recent versions of Docker, neither as
`apt` packages for Ubuntu, nor as static downloads. In any case, these static downloads mostly are replicas.
This section describes how to install modern versions of Docker on these platforms.
#### RHEL
RHEL 7 on s390x only has releases from Docker. Follow the instructions from Docker to install. The rpm packages for RHEL are available at
https://download.docker.com/linux/rhel/
#### Ubuntu
Docker does not release packages for Ubuntu on s390x. The most recent release was for Ubuntu 18.04 Bionic, with Docker version 18.06.3.
This is quite old, and does not support modern capabilities, e.g. buildkit.
To install a more modern version:
1. Upgrade any dependent apt packages `apt upgrade`
1. Upgrade the operating system to your desired version `do-release-upgrade -d`. Note that you can set which versions to suggest via changing `/etc/update-manager/release-upgrades`
1. Download the necessary rpms (yes, rpms) from the Docker RHEL7 site. These are available [here](https://download.docker.com/linux/rhel/7/s390x/stable/Packages/). You need the following packages:
* `containerd.io-*.rpm`
* `docker-ce-*.rpm`
* `docker-ce-cli-*.rpm`
1. Install alien: `apt install alien`
1. Convert each package to a dpkg `alien --scripts <source-rpm-file.rpm>`
1. Install each package with `dpkg -i <source-dpkg>.dpkg`. Dependency management is not great, so we recommend installing them in order:
In the packages section you can find an image to set up dm-crypt encrypted devices in [linuxkit](https://github.com/linuxkit/linuxkit)-generated images.
The above will map `/dev/sda1` as an encrypted device under `/dev/mapper/dm_crypt_name` and mount it under `/var/secure_storage`
The `dm-crypt` container by default bind-mounts `/dev:/dev` and `/etc/dm-crypt:/etc/dm-crypt`. It expects the encryption key to be present in the file `/etc/dm-crypt/key`. You can pass an alternative location as encryption key which can be either a file path relative to `/etc/dm-crypt` or an absolute path.
Providing an alternative encryption key file name:
Note that you have to also map `/dev:/dev` explicitly if you override the default bind-mounts.
The `dm-crypt` container
* Will create an `ext4` file system on the encrypted device if none is present.
* Will also initialize the encrypted device by filling it from `/dev/zero` prior to creating the filesystem, so setting up a device for the first time might take a bit longer.
* Uses the `aes-cbc-essiv:sha256` cipher (it's explicitly specified in case the default ever changes)
* Consequently the encryption key is expected to be 32 bytes long, a random one can be created via
```shell
dd if=/dev/urandom of=dm-crypt.key bs=32 count=1
```
If you see the error `Cannot read requested amount of data.` next to the log message `Creating dm-crypt mapping for ...` then this means your keyfile doesn't contain enough data.
### Examples
There are two examples in the `examples/` folder:
1. `dm-crypt.yml` - formats an external disk and mounts it encrypted.
2. `dm-crypt-loop.yml` - mounts an encrypted loop device backed by a regular file sitting on an external disk
### Options
|Option|Default|Required|Notes|
|---|---|---|---|
|`-k` or `--key`|`key`|No|Encryption key file name. Must be either relative to `/etc/dm-crypt` or an absolute file path.|
|`-l` or `--luks`||No|Use LUKS format for encryption|
|`<dm_name>`||**Yes**|The device-mapper device name to use. The device will be mapped under `/dev/mapper/<dm_name>`|
In order to make the disk available, you need to tell `linuxkit` where the disk file or block device is.
All local `linuxkit run` methods (currently `hyperkit`, `qemu`, and `vmware`) take a `-disk` argument:
All local `linuxkit run` methods (currently `hyperkit`, `qemu`, `virtualization.framework` and `vmware`)
take a `-disk` argument:
* `-disk path,size=100M,format=qcow2`. For size the default is in GB but an `M` can be appended to specify sizes in MB. The format can be omitted for the platform default, and is only useful on `qemu` at present.
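To make the `-disk` syntax concrete, here is a sketch of how such an argument could be parsed. It is an illustration under stated assumptions, not the `linuxkit` implementation: `diskSpec`, `parseDisk` and `parseSizeMB` are hypothetical names, only the `size` and `format` keys from the description above are handled, and the path is assumed not to contain `,` or `=`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// diskSpec holds the parsed pieces of a -disk argument.
type diskSpec struct {
	Path   string
	SizeMB int
	Format string // empty means "use the platform default"
}

// parseSizeMB converts "100M" to 100 and a bare number such as "2" (GB) to 2048.
func parseSizeMB(s string) (int, error) {
	multiplier := 1024 // a bare number is interpreted as GB
	if strings.HasSuffix(s, "M") {
		multiplier = 1
		s = strings.TrimSuffix(s, "M")
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * multiplier, nil
}

// parseDisk parses "path,size=100M,format=qcow2".
func parseDisk(spec string) (diskSpec, error) {
	var d diskSpec
	for i, part := range strings.Split(spec, ",") {
		key, value, hasValue := strings.Cut(part, "=")
		switch {
		case !hasValue && i == 0:
			d.Path = part
		case hasValue && key == "size":
			mb, err := parseSizeMB(value)
			if err != nil {
				return d, err
			}
			d.SizeMB = mb
		case hasValue && key == "format":
			d.Format = value
		default:
			return d, fmt.Errorf("unrecognised disk option %q", part)
		}
	}
	return d, nil
}

func main() {
	d, err := parseDisk("disk.img,size=100M,format=qcow2")
	fmt.Printf("%+v %v\n", d, err)
}
```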
- `-force` can be used to force the partition to be cleared and recreated (if applicable), and the recreated partition formatted. This option would be used to re-init the partition on every boot, rather than persisting the partition between boots.
- `-label` can be used to give the disk a label
- `-type` can be used to specify the type. This is `ext4` by default but `btrfs` and `xfs` are also supported
- `-partition` can be used to specify the partition table type. This is `dos` by default but `gpt` is also supported
- `-verbose` enables verbose logging, which can be used to troubleshoot device auto-detection and (re-)partitioning
- The final (optional) argument specifies the device name
@@ -6,7 +6,7 @@ Please open an issue if you want to add a question here.
LinuxKit does not require being installed on a disk, it is often run from an ISO, PXE or other
such means, so it does not require an on disk upgrade method such as the ChromeOS code that
is often used. It would definitely be possible to use that type of upgrade method if the
system is installed, and it would be useful to support this for that use case, and an
updater container to control this for people who want to use this.
@@ -30,3 +30,95 @@ of dependencies and functionality that we do not need. At present we are using t
`init` process, and a small set of minimal scripts, but we expect to replace that with a small
standalone `init` process and a small piece of code to bring up the system containers where the
real work takes place.
## Console not displaying init or containerd output at boot
If you're not seeing `containerd` logs in the console during boot, make sure that your kernel `cmdline` configuration doesn't list multiple consoles.
`init` and other processes like `containerd` will use the last defined console in the kernel `cmdline`. When using `qemu`, to see the console you need to list `ttyS0` as the last console to properly see the output.
## Enabling and controlling containerd logs
On startup, linuxkit looks for and parses a file `/etc/containerd/runtime-config.toml`. If it exists, the content is used to configure containerd runtime.
Sample config is below:
```toml
cliopts="--log-level debug"
stderr="/var/log/containerd.out.log"
stdout="stdout"
```
The options are as follows:
* `cliopts`: options to pass to the containerd command-line as is.
* `stderr`: where to send stderr from containerd. If blank, it sends it to the default stderr, which is the console.
* `stdout`: where to send stdout from containerd. If blank, it sends it to the default stdout, which is the console. containerd normally does not have any stdout.
The `stderr` and `stdout` options can take exactly one of the following options:
* `stderr` - send to stderr
* `stdout` - send to stdout
* any absolute path (beginning with `/`) - send to that file. If the file exists, append to it; if not, create it and append to it.
Thus, to enable a higher log level, for example `debug`, create a file whose contents are `cliopts="--log-level debug"` and place it on the image:
```yml
files:
  - path: /etc/containerd/runtime-config.toml
    source: "/path/to/runtime-config.toml"
    mode: "0644"
```
Note that the package that parses the `cliopts` splits on _all_ whitespace. It does not, as of this writing, support shell-like parsing, so the following will work:
```
--log-level debug --arg abcd
```
while the following will not:
```
--log-level debug --arg 'abcd def'
```
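The difference between the two cases can be seen with a small sketch. It only illustrates plain whitespace splitting versus shell-like parsing using the Python standard library; it is not the parser LinuxKit uses.

```python
import shlex

opts = "--log-level debug --arg 'abcd def'"

# Splitting on all whitespace, as described above: the quoted value is
# broken into two arguments and the quote characters are kept.
whitespace_split = opts.split()

# Shell-like parsing, shown only for comparison: the quoted value stays
# together as a single argument.
shell_split = shlex.split(opts)

print(whitespace_split)
print(shell_split)
```

Because only the first behaviour applies to `cliopts`, any option value containing spaces cannot currently be passed through.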
## Troubleshooting containers
Linuxkit runs all services in a specific `containerd` namespace called `services.linuxkit`. To list all the defined containers:
```sh
(ns: getty) linuxkit-befde23bc535:~# ctr -n services.linuxkit container ls
CONTAINER IMAGE RUNTIME
getty - io.containerd.runtime.v1.linux
```
To list all running containers and their status:
```sh
(ns: getty) linuxkit-befde23bc535:~# ctr -n services.linuxkit task ls
```
linuxkit builds each runtime OS image from a combination of Docker images.
These images are pulled from a registry and cached locally.
linuxkit does not use the docker image cache to store these images. This is
for two key reasons.
First, docker does not provide support for different architecture versions. For
example, if you want to pull down `docker.io/library/alpine:3.13` by manifest,
with its signature, but get the `arm64` version while you are on an `amd64` device,
it is not supported.
Second, and more importantly, this requires a running docker daemon. Since the
very essence of linuxkit is removing daemons and operating systems where unnecessary,
just laying down bits in a file, removing docker from the image build process
is valuable. It also simplifies many use cases, like CI, where a docker daemon
may be unavailable.
## How LinuxKit Caches Images
LinuxKit pulls images down from a registry and stores them in a local cache.
It stores the root manifest or index of the image, the manifest, and all of the layers
for the requested architecture. It does not pull down layers, manifest or config
for all available architectures, only the requested one. If none is requested, it
defaults to the architecture on which you are running.
By default, LinuxKit caches images in `~/.linuxkit/cache/`. It can be changed
via a command-line option. The structure of the cache directory matches the
[OCI spec for image layout](http://github.com/opencontainers/image-spec/blob/master/image-layout.md).
Image names are kept in `index.json` in the [annotation](https://github.com/opencontainers/image-spec/blob/master/annotations.md) `org.opencontainers.image.ref.name`. For example:
@@ -10,17 +10,51 @@ The LinuxKit kernels are based on the latest stable releases and are
updated frequently to include bug and security fixes. For some
kernels we do carry additional patches, which are mostly back-ported
fixes from newer kernels. The full kernel source with patches can be
found on [github](https://github.com/linuxkit/linux). Each kernel
image is tagged with the full kernel version (e.g.,
`linuxkit/kernel:4.9.33`) and with the full kernel version plus the
hash of the files it was created from (git tree hash of the `./kernel`
directory). For selected kernels (mostly the LTS kernels and latest
stable kernels) we also compile/push kernels with additional debugging
enabled. The hub images for these kernels have the `-dbg` suffix in
the tag. For some kernels, we also provide matching packages
containing the `perf` utility for debugging and performance tracing.
The perf package is called `kernel-perf` and is tagged the same way as
the kernel packages.
found on [github](https://github.com/linuxkit/linux).
## Kernel Image Naming and Tags
We publish the following kernel images:
* primary kernel
* debug kernel
* tools for the specific kernel build - bcc and perf
* builder image for the specific kernel build, useful for compiling compatible kernel modules
### Primary Kernel Images
Each kernel image is tagged with:
* the full kernel version, e.g. `linuxkit/kernel:6.6.13`. This is a multi-arch index, and should be used whenever possible.
* the full kernel version plus hash of the files it was created from (git tree hash of the `./kernel` directory), e.g. `6.6.13-c0d96951e9892a7447a8e7965d2d6bd7e621c3fd`. This is a multi-arch index.
* the full kernel version plus architecture, e.g. `linuxkit/kernel:6.6.13-amd64` or `linuxkit/kernel:6.6.13-arm64`. Each of these is architecture specific.
* the full kernel version plus hash of the files it was created from (git tree hash of the `./kernel` directory) plus architecture, e.g. `6.6.13-c0d96951e9892a7447a8e7965d2d6bd7e621c3fd-arm64`.
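The four tag forms above can be summarised in a short sketch. The function name `kernel_tags` and its defaults are hypothetical; the version and hash are the ones used in the examples above.

```python
def kernel_tags(version, tree_hash, arches=("amd64", "arm64"), image="linuxkit/kernel"):
    """List the tags described above for one kernel build."""
    tags = [
        f"{image}:{version}",              # multi-arch index
        f"{image}:{version}-{tree_hash}",  # multi-arch index, pinned to the tree hash
    ]
    for arch in arches:
        tags.append(f"{image}:{version}-{arch}")              # architecture specific
        tags.append(f"{image}:{version}-{tree_hash}-{arch}")  # hash plus architecture
    return tags


tags = kernel_tags("6.6.13", "c0d96951e9892a7447a8e7965d2d6bd7e621c3fd")
for tag in tags:
    print(tag)
```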
### Debug Kernel Images
With each kernel image, we also publish kernels with additional debugging enabled.
These have the same image name and the same tags as the primary kernel, with the `-dbg` suffix.
In addition to the official images, there are also some
[scripts](../contrib/foreign-kernels) which repackage kernel packages
@@ -32,7 +66,6 @@ use cases for the promising IoT scenarios. All -rt patches are grabbed from
https://www.kernel.org/pub/linux/kernel/projects/rt/. But so far we just
enable it over 4.14.x.
## Loading kernel modules
Most kernel modules are autoloaded with `mdev` but if you need to `modprobe` a module manually you can use the `modprobe` package in the `onboot` section like this:
@@ -45,22 +78,36 @@ Most kernel modules are autoloaded with `mdev` but if you need to `modprobe` a m
## Compiling external kernel modules
This section describes how to build external (out-of-tree) kernel
modules. It is assumed you have the source available to those modules,
and require the correct kernel version headers and compile tools.
modules. You need the following to build external modules. All of
these are to be built for a specific version of the kernel. For
the examples, we will assume 5.10.104; replace with your desired
version.
The LinuxKit kernel packages include `kernel-dev.tar` which contains
* source available to your modules - you need to get those on your own
* kernel development headers - available in the `linuxkit/kernel` image as `kernel-dev.tar`, e.g. `linuxkit/kernel:5.10.104`
* OS with sources and compiler - this **must** be the exact same version as that used to compile the kernel
As described above, the `linuxkit/kernel` images include `kernel-dev.tar` which contains
the headers and other files required to compile kernel modules against
the specific version of the kernel. Currently, the headers are not
included in the initial RAM disk, but it is possible to compile custom
modules offline and then include the modules in the initial RAM disk.
There is a [example](../test/cases/020_kernel/010_kmod_4.9.x), but
The source is available as the same name as the `linuxkit/kernel` image, with the addition of `-builder` on the tag.
For example:
* `linuxkit/kernel:5.10.92` has builder `linuxkit/kernel:5.10.92-builder`
* `linuxkit/kernel:5.15.15` has builder `linuxkit/kernel:5.15.15-builder`
With the above in hand, you can create a multi-stage `Dockerfile` build to compile your modules.
There is an [example](../test/cases/020_kernel/113_kmod_5.10.x), but
basically one can use a multi-stage build to compile the kernel
modules:
```
FROM linuxkit/kernel:4.9.33 AS ksrc
FROM linuxkit/alpine:<hash> AS build
```
```dockerfile
FROM linuxkit/kernel:5.10.104 AS ksrc
FROM linuxkit/kernel:5.10.104-builder AS build
RUN apk add build-base
COPY --from=ksrc /kernel-dev.tar /
```
@@ -73,55 +120,275 @@ To use the kernel module, we recommend adding a final stage to the
Dockerfile above, which copies the kernel module from the `build`
stage and performs a `insmod` as the entry point. You can add this
This section describes how to build kernels, and how to modify existing ones.
Throughout the document, the terms used are:
* kernel version: actual semver version of a kernel, e.g. `6.6.13` or `5.15.27`
* kernel series: major.minor version of a kernel, e.g. `6.6.x` or `5.15.x`
Throughout this document, the architecture used is the kernel-recognized one, available
on most systems as `uname -m`, e.g. `aarch64` or `x86_64`. You may be familiar with the alpine
or golang one, e.g. `amd64` or `arm64`, which are not used here.
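To make the terminology concrete, here is a minimal sketch that derives a kernel series from a kernel version and maps the kernel-recognized architecture names to their golang equivalents. Both helper names are hypothetical, and the mapping covers only the two architectures named in this document.

```python
# Kernel-recognized architecture name (as reported by `uname -m`) to golang name.
KERNEL_TO_GO_ARCH = {"x86_64": "amd64", "aarch64": "arm64"}


def kernel_series(version: str) -> str:
    """Turn a kernel version such as 5.15.27 into its series, 5.15.x."""
    major, minor, _patch = version.split(".", 2)
    return f"{major}.{minor}.x"


print(kernel_series("6.6.13"))
print(KERNEL_TO_GO_ARCH["aarch64"])
```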
**Note:** After changing _and committing any changes_ to the kernel directory or any
subdirectories, you must update tests, examples and other dependencies. This is done
via:
```bash
make update-kernel-yamls
```
Each series of kernels has a dedicated directory in [../kernel/](../kernel),
e.g. [6.6.x](../kernel/6.6.x) or [5.15.x](../kernel/5.15.x).
Variants, like rt kernels, have their own directory as well, e.g. [5.11.x-rt](../kernel/5.11.x-rt).
However, for variants, the patches from _both_ the common kernel, e.g. [5.11.x](../kernel/5.11.x),
and the variant, e.g. [5.11.x-rt](../kernel/5.11.x-rt), are applied, and the configs from _both_ are combined.
Within the series-dedicated directory, there are:
* kernel config file for each architecture named `config-<arch>`, e.g. [6.6.x/config-x86_64](../kernel/6.6.x/config-x86_64), one per target architecture.
* optional patches directory, e.g. [6.6.x/patches](../kernel/6.6.x/patches), which contains patches to apply to the kernel source
The config file and patches are applied during the kernel build process.
**Note**: We try to keep the differences between kernel versions and
architectures to a minimum, so if you make changes to one
configuration also try to apply it to the others. The script [kconfig-split.py](../scripts/kconfig-split.py) can be used to compare kernel config files. For example:
creates a file with the common and the x86_64 and arm64 specific
config options for the 4.9.x kernel series.
config options for the 5.15.x kernel series.
**Note**: The CI pipeline does *not* push out kernel images.
Anyone modifying a kernel should:
1. Follow the steps below for the desired changes and commit them.
1. Run appropriate `make build` or variants to ensure that it works.
1. Open a PR with the changes. This may fail, as the CI pipeline may not have access to the modified kernels.
1. A maintainer should run `make push` to push out the images.
1. Run (or rerun) the tests.
#### Build options
The targets and variants for building are as follows:
* `make build` - make all kernels in the version list and their variants
* `make build-<version>` - make all variants of a specific kernel version
* `make buildkernel-<version>` - make all variants of a specific kernel version
* `make buildplainkernel-<version>` - make just the provided version's kernel
* `make builddebugkernel-<version>` - make just the provided version's debug kernel
* `make buildtools-<version>` - make just the provided version's tools
To push:
* `make push` - push all kernels in the version list and their variants
* `make push-<version>` - push all variants of a specific kernel version
Finally, for convenience:
* `make list` - list all kernels in the version list
By default, it builds for all supported architectures. To build just for a specific
architecture:
```sh
make build ARCH=amd64
```
The variable `ARCH` should use the golang variants only, i.e. `amd64` and `arm64`.
To build for multiple architectures, call it multiple times:
```sh
make build ARCH=amd64
make build ARCH=arm64
```
When building for a specific architecture, the build process will use your local
Docker, passing it `--platforms` for the architecture. If you have a builder on a different
architecture, e.g. you are running on an Apple Silicon Mac (arm64) and want to build for
`x86_64` without emulating (which can be very slow), you can use the `BUILDER` variable:
```sh
make build ARCH=x86_64 BUILDER=remote-amd64-builder
```
Builder also supports a builder pattern. If `BUILDER` contains the string `{{.Arch}}`,
it will be replaced with the architecture being built.
For example:
```sh
make build ARCH=x86_64 BUILDER=remote-{{.Arch}}-builder
make build ARCH=aarch64 BUILDER=remote-{{.Arch}}-builder
```
will build `x86_64` on `remote-amd64-builder` and `aarch64` on `remote-arm64-builder`.
Finally, if no `BUILDER` is specified, the build will look for a builder named
`linuxkit-linux-{{.Arch}}-builder`, e.g. `linuxkit-linux-amd64-builder` or
`linuxkit-linux-arm64-builder`. If that builder does not exist, it will fall back to
your local Docker setup.
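The builder selection described above can be sketched as follows. This is an illustration of the documented behaviour, not the Makefile logic itself. The function name `resolve_builder` is hypothetical, and the sketch assumes that `{{.Arch}}` expands to the golang architecture name, as the `remote-amd64-builder` example suggests.

```python
# Accept either naming scheme for the architecture and normalise to the golang one.
GO_ARCH = {"x86_64": "amd64", "aarch64": "arm64", "amd64": "amd64", "arm64": "arm64"}
DEFAULT_PATTERN = "linuxkit-linux-{{.Arch}}-builder"


def resolve_builder(arch, builder=None, existing=()):
    """Return the builder name to use, or None to fall back to local Docker."""
    go_arch = GO_ARCH[arch]
    if builder:
        # An explicit BUILDER wins; expand the pattern if it contains one.
        return builder.replace("{{.Arch}}", go_arch)
    default = DEFAULT_PATTERN.replace("{{.Arch}}", go_arch)
    # With no BUILDER given, use the conventionally named builder only if it exists.
    return default if default in existing else None


print(resolve_builder("x86_64", "remote-{{.Arch}}-builder"))
print(resolve_builder("aarch64", existing={"linuxkit-linux-arm64-builder"}))
print(resolve_builder("x86_64"))
```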
### Modifying the kernel config
The process of modifying the kernel configuration is as follows:
1. Create a `linuxkit/kconfig` container image: `make kconfig`. This is not pushed out.
1. Run a container based on `linuxkit/kconfig`.
1. In the container, modify the config to suit your needs using normal kernel tools like `make defconfig` or `make menuconfig`.
1. Save the config from the image.
The `linuxkit/kconfig` image contains the patched sources
for all supported kernels and architectures in `/linux-<major>.<minor>.<rev>`.
The kernel source also has the kernel config copied to the default kernel config location,
so that `make menuconfig` and `make defconfig` work correctly.
Run the container as follows:
```sh
docker run --rm -ti -v $(pwd):/src linuxkit/kconfig
```
This will give you an interactive shell where you can modify the kernel
configuration you want, while mounting the directory, so that you can save the
modified config.
To create or modify the config, you must cd to the correct directory,
e.g.
```sh
cd /linux-6.6.13
# or
cd /linux-5.15.27
```
Now you can build the config.
When `make defconfig` or `make menuconfig` is done,
the modified config file will be in `.config`; save the file back to `/src`,
e.g.
```sh
cp .config /src/6.6.x/config-x86_64
```
You can also configure architectures other than the native
one. For example, to configure the arm64 kernel on x86_64, use:
```sh
make ARCH=arm64 defconfig
make ARCH=arm64 oldconfig # or menuconfig
```
Note that the generated file **must** be final. When you actually build the kernel,
it will check that running `make defconfig` will have no changes. If there are changes,
the build will fail.
The easiest way to check it is to rerun `make defconfig` inside the kconfig container.
1. Finish your creation of the config file, as above.
1. Copy the `.config` file to the target location, as above.
1. Copy the `.config` file to the source location for defconfig, e.g. `cp .config arch/x86/configs/x86_64_config` or `cp .config /linux/arch/arm64/configs/defconfig`
1. Run `make defconfig` again, and check that there are no changes, e.g. `diff .config arch/x86/configs/x86_64_config` or `diff .config /linux/arch/arm64/configs/defconfig`
If there are no differences, then you can commit the new config file.
Finally, test that you can build the kernel with that config as `make build-<version>`, e.g. `make build-5.15.148`.
## Adding a new kernel version
If you want to add a new kernel version within an existing series, e.g. `5.15.27` already exists
and you want to add (or replace it with) `5.15.148`, apply the following process.
1. Determine the series, i.e. the kernel major.minor version, followed by `x`. E.g. for `5.15.148`, the series is `5.15.x`.
1. Modify the `KERNEL_VERSION` in the `build-args` file in the series directory to the new version. E.g. `5.15.x/build-args`.
1. Create a new `linuxkit/kconfig` container image: `make kconfig`. This is not pushed out.
1. Run a container based on `linuxkit/kconfig`.
```sh
docker run --rm -ti -v $(pwd):/src linuxkit/kconfig
```
1. In the container, change directory to the kernel source directory for the new version, e.g. `cd /linux-5.15.148`.
1. Run `make defconfig` to create the default config file.
1. If the config file has changed, copy it out of the container and check it in, e.g. `cp .config /src/5.15.x/config-x86_64`.
1. Repeat for other architectures.
1. Commit the changed config files.
1. Test that you can build the kernel with that config as `make build-<version>`, e.g. `make build-5.15.148`.
## Adding a new kernel series
To add a new kernel series, you need to:
1. Create new directory for the series, e.g. `6.7.x`
1. Create config files for each architecture in that directory
1. Optionally, create a `patches/` subdirectory in that directory with any patches to add
1. Create a `build-args` file in that directory with at least the following settings:
```bash
KERNEL_VERSION=<version>
KERNEL_SERIES=<series>
BUILD_IMAGE=linuxkit/alpine:<builder>
```
Since the last major series likely is the best basis for the new one, subject to additional modifications, you can use
the previous one as a starting point.
1. Make the directory for the new series, e.g. `mkdir 7.0.x`
1. Create a new `linuxkit/kconfig` container image: `make kconfig`. This is not pushed out.
1. Run a container based on `linuxkit/kconfig`.
```sh
docker run --rm -ti -v $(pwd):/src linuxkit/kconfig
```
1. In the container, change directory to the kernel source directory for the new version, e.g. `cd /linux-7.0.5`.
1. Copy the existing config file for the previous series, e.g. `cp /src/6.6.x/config-x86_64 .config`.
1. Run `make oldconfig` to create the config file for the new series from the old one. Answer any questions.
1. Save the newly generated config file `.config` to the source directory, e.g. `cp .config /src/7.0.x/config-x86_64`.
1. Repeat for other architectures.
1. Commit the new config files.
1. Test that you can build the kernel with that config as `make build-<version>`, e.g. `make build-7.0.5`.
In addition, there are tests that are applied to a specific kernel version, notably the tests in
[020_kernel](../test/cases/020_kernel/). You will need to add a new test case for the new series,
copying an existing one and modifying it as needed.
## Building and using custom kernels
@@ -149,7 +416,7 @@ appended. Then you can also override the Hub organisation to use the
image elsewhere with (and also disable image signing):
```sh
make ORG=<your hub org> NOTRUST=1
make ORG=<your hub org>
```
The image will be uploaded to Hub and can be used in a YAML file as
@@ -322,7 +589,7 @@ yourself:
```sh
cd kernel
make ORG=<foo> NOTRUST=1 push_zfs_4.9.x # or different kernel version
make ORG=<foo> push_zfs_4.9.x # or different kernel version
```
will build and push a `zfs-kmod-4.9.<version>` image to Docker Hub
@@ -347,3 +614,31 @@ Alpine `zfs` utilities are available in `linuxkit/alpine` and the
version of the kernel module should match the version of the
tools. The container where you run the `zfs` tools might also need
`CAP_SYS_MODULE` to be able to load the kernel modules.
## Kernels in examples and tests
All of the linuxkit `.yml` files use the images from `linuxkit/kernel:<tag>`.
When updating the kernel, you run commands to update the tests. The updates to any file that contains
references to `linuxkit/kernel` in this repository work as follows:
- Semver tags are replaced by the most recent kernel version. For example, `linuxkit/kernel:5.10.104` will become `6.6.13` when available, and then `6.6.15`, and then `7.0.1`, etc. The highest semver always is used.
- Semver+hash tags are replaced by the most recent hash and patch version for that series. For example, `linuxkit/kernel:5.10.104-abcdef1234` will become `5.10.104-aaaa54232` (same semver, newer hash), and then `5.10.105-bbbb12345` (newer semver, newer hash), etc. The highest semver+hash always is used.
This is not an inherent characteristic of the `linuxkit` tool, which will **never** change your `.yml` files. It is part of
the update process for yml files _in this repository_.
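The two replacement rules can be sketched as below. This is an illustration of the documented behaviour under stated assumptions, not the script that performs the update: the function names are hypothetical, the set of available kernels is modelled as a mapping from kernel version to its current tree hash, and the versions and hashes are the placeholder values from the examples above (plus one invented hash for `6.6.13`).

```python
import re


def parse(version):
    """Convert a version such as 5.10.104 into a tuple that sorts numerically."""
    return tuple(int(part) for part in version.split("."))


def update_tag(tag, available):
    """Apply the update rules to a tag, given {kernel version: current tree hash}."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)(?:-([0-9a-f]+))?", tag)
    if match is None:
        return tag  # not a plain semver or semver+hash tag; leave untouched
    version, tree_hash = match.groups()
    if tree_hash is None:
        # Semver tag: always move to the highest available kernel version.
        return max(available, key=parse)
    # Semver+hash tag: stay within the series, take its newest version and hash.
    series = parse(version)[:2]
    candidates = [v for v in available if parse(v)[:2] == series]
    if not candidates:
        return tag
    newest = max(candidates, key=parse)
    return f"{newest}-{available[newest]}"


available = {"5.10.104": "aaaa54232", "5.10.105": "bbbb12345", "6.6.13": "cccc00001"}
print(update_tag("5.10.104", available))
print(update_tag("5.10.104-abcdef1234", available))
```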
The net of the above is the following rule:
* If you want a reference to a specific kernel series, e.g. a test or example that works only with `5.10.x`, then use a specific hash, e.g. `linuxkit/kernel:5.10.104-abcdef1234`. The hash and patch version will update, but not more. The most common use case for this is kernel version-specific tests.
* If you want a reference to the most recent kernel, whatever version it is, then use a semver tag, e.g. `linuxkit/kernel:6.6.13`. The most common use case for this is examples that work with any kernel version, which is the vast majority of cases.
You can get the current hash by executing the following:
Image to set up a loop device backed by a regular file in a [linuxkit](https://github.com/linuxkit/linuxkit)-generated image. The typical use case is to have a portable storage location which can be used to persist settings or other files. Can be combined with the `linuxkit/dm-crypt` package for protection.
## Usage
The setup is a one time step during boot:
```yaml
onboot:
  - name: losetup
    image: linuxkit/losetup:<hash>
    command: ["/usr/bin/loopy", "-c", "/var/test.img"]
```
The above will associate the file `/var/test.img` with `/dev/loop0` and will also create it if it's not present.
The container by default bind-mounts `/var:/var` and `/dev:/dev`. Usually the loop-file will reside on external storage which should be typically mounted under `/var` hence the choice of the defaults. If the loop-file is located somewhere else and you need a different bind-mount for it then do not forget to explicitly bind-mount `/dev:/dev` as well or else `losetup` will fail.
### Options
|Option|Default|Required|Notes|
|---|---|---|---|
|`-c` or `--create`||No|Creates the file if not present. If `--create` is not specified and the file is missing then the loop setup will obviously fail.|
|`-s` or `--size`|10|No|If `--create` was specified and the file is not present then this sets the size in MiB of the created file. The file will be filled from `/dev/zero`.|
|`-d` or `--dev`|`/dev/loop0`|No|Loop device which should be associated with the file.|
|`<file>`||**Yes**|The file to use as backing storage.|
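The interaction between `--create` and `--size` can be sketched as follows. This is an illustration of the behaviour the table describes, not the implementation of the `loopy` script: the function name `ensure_backing_file` is hypothetical, the file is zero-filled directly rather than read from `/dev/zero`, and the step that attaches the file to the loop device is omitted.

```python
import os
import tempfile

MIB = 1024 * 1024


def ensure_backing_file(path, create=False, size_mib=10):
    """Create a zero-filled backing file if it is missing and create is set."""
    if os.path.exists(path):
        return  # an existing file is used as is; size is ignored
    if not create:
        raise FileNotFoundError(f"{path} is missing and --create was not given")
    with open(path, "wb") as backing:
        for _ in range(size_mib):
            backing.write(bytes(MIB))  # one MiB of zero bytes per iteration


# Demonstration in a temporary directory, using a 1 MiB file to keep it quick.
with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "test.img")
    ensure_backing_file(image_path, create=True, size_mib=1)
    created_size = os.path.getsize(image_path)

print(created_size)
```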
@@ -7,23 +7,37 @@ packages, as it's very easy. Packages are the unit of customisation
in a LinuxKit-based project, if you know how to build a container,
you should be able to build a LinuxKit package.
All LinuxKit packages are:
- Signed with Docker Content Trust.
- Enabled with multi-arch manifests to work on multiple architectures.
- Derived from well-known (and signed) sources for repeatable builds.
All official LinuxKit packages are:
- Enabled with multi-arch indexes to work on multiple architectures.
- Derived from well-known sources for repeatable builds.
- Built with multi-stage builds to minimise their size.
## CI and Package Builds
When building and merging packages, it is important to note that our CI process builds packages. The targets `make ci` and `make ci-pr` execute `make -C pkg build`. These in turn execute `linuxkit pkg build` for each package under `pkg/`. This in turn will try to pull the image whose tag matches the tree hash or, failing that, to build it.
We do not want the builds to happen with each CI run for two reasons:
Any released image, i.e. any package under `pkg/` that has _not_ changed as
part of a pull request,
already will be released to Docker Hub. This will cause it to download that image, rather
than try to build it.
Any non-released image, i.e. any package under `pkg/` that _has_ changed as part of
a pull request, will not be in Docker Hub until the PR has merged.
This will cause the download to fail, leading `linuxkit pkg build` to try and build the
image and save it in the cache.
This does have two downsides:
1. It is slower to do a package build than to just pull the latest image.
2. If any of the steps of the build fails, e.g. a `curl` download that depends on an intermittent target, it can cause all of CI to fail.
Thus, if, as a maintainer, you merge any commits into a `pkg/`, even if the change is documentation alone, please do a `linuxkit package push`.
In the past, each PR required a maintainer to build, and push to Docker Hub, every
changed package in `pkg/`. This placed the maintainer in the PR cycle, with the
following downsides:
1. A maintainer had to be involved in every PR, not just reviewing but actually building and pushing. This reduces the ability for others to contribute.
1. The actual package is pushed out by a person, violating good supply-chain practice.
## Package source
@@ -36,11 +50,13 @@ A package source consists of a directory containing at least two files:
- `image` _(string)_: *(mandatory)* The name of the image to build
- `org` _(string)_: The hub/registry organisation to which this package belongs
- `dockerfile` _(string)_: The dockerfile to use to build this package, must be in this directory or below (default: `Dockerfile`)
- `arches` _(list of string)_: The architectures which this package should be built for (valid entries are `GOARCH` names)
- `extra-sources` _(list of strings)_: Additional sources for the package outside the package directory. The format is `src:dst`, where `src` can be relative to the package directory and `dst` is the destination in the build context. This is useful for sharing files, such as vendored go code, between packages.
- `gitrepo` _(string)_: The git repository where the package source is kept.
- `network` _(bool)_: Allow network access during the package build (default: no)
- `disable-content-trust` _(bool)_: Disable Docker content trust for this package (default: no)
- `disable-cache` _(bool)_: Disable build cache for this package (default: no)
- `buildArgs` will forward a list of build arguments down to docker, as if `--build-arg` was specified during `docker build`
- `config`: _(struct `github.com/moby/tool/src/moby.ImageConfig`)_: Image configuration, marshalled to JSON and added as `org.mobyproject.config` label on image (default: no label)
- `depends`: Contains information on prerequisites which must be satisfied in order to build the package. Has subfields:
  - `docker-images`: Docker images to be made available (as `tar` files via `docker image save`) within the package build context. Contains the following nested fields:
@@ -52,9 +68,9 @@ A package source consists of a directory containing at least two files:
### Prerequisites
Before you can build packages you need:
- Docker version 17.06 or newer. If you are on a Mac you also need
`docker-credential-osxkeychain.bin`, which comes with Docker for Mac.
- `make`, `notary`, `base64`, `jq`, and `expect`
- Docker version 19.03 or newer.
- If you are on a Mac you also need `docker-credential-osxkeychain.bin`, which comes with Docker for Mac.
- `make`, `base64`, `jq`, and `expect`
- A *recent* version of `manifest-tool` which you can build with `make
bin/manifest-tool`, or `go get github.com/estesp/manifest-tool`, or
via the LinuxKit homebrew tap with `brew install --HEAD
@@ -65,68 +81,258 @@ Further, when building packages you need to be logged into hub with
`docker login` as some of the tooling extracts your hub credentials
during the build.
### Build Targets
LinuxKit builds packages as docker images. It deposits the built package as a docker image in one or both of two targets:
* the linuxkit cache, which is at `~/.linuxkit/cache/` (configurable)
* the docker image cache (optional)
The package _always_ is built and saved in the linuxkit cache. However, you _also_ can load the package for the current
architecture, if available, into the docker image cache.
If you want to build images and test and run them _in a standalone_ fashion locally, then you should add the docker image cache.
Otherwise, you don't need anything more than the default linuxkit cache. LinuxKit defaults to building OS images using docker
images from this cache, only looking in the docker cache if instructed to via `linuxkit build --docker`.
In the linuxkit cache, it creates all of the layers, the manifest that can be uploaded
to a registry, and the multi-architecture index. If an image already exists for a different architecture in the cache,
it updates the index to include additional manifests created.
The order of building is as follows:
1. Build the image to the linuxkit cache
1. If `--docker` is provided, load the image into the docker image cache
For example:
```bash
linuxkit pkg build pkg/foo # builds pkg/foo and places it in the linuxkit cache
linuxkit pkg build pkg/foo --docker # builds pkg/foo and places it in the linuxkit cache and also loads it into docker
```
#### Build Platforms
By default, `linuxkit pkg build` builds for all supported platforms in the package's `build.yml`, whose syntax is available
[here][Package source]. If no platforms are provided in the `build.yml`, it builds for all platforms that linuxkit supports.
As of this writing, those are:
* `linux/amd64`
* `linux/arm64`
* `linux/s390x`
You can choose to skip one of the platforms from `build.yml` or those selected
The options for `--platforms` are identical to those for [docker build](https://docs.docker.com/engine/reference/commandline/build/).
An example is available in the official [buildx documentation](https://docs.docker.com/buildx/working-with-buildx/#build-multi-platform-images).
Given that this is linuxkit, i.e. all builds are for linux, the `OS` part would seem redundant, and it should be sufficient to pass `--platforms arm64`. However, for complete consistency, the _entire_ platform, e.g. `--platforms linux/amd64,linux/arm64`, must be provided.
#### Where it builds
You are running the `linuxkit pkg build` command on a single platform, e.g. your local linux cloud instance running on `amd64`, or
a MacBook with Apple Silicon running on `arm64`.
How does linuxkit determine where to build the target images?
linuxkit uses [buildkit](https://github.com/moby/buildkit) directly to build all images.
It uses docker contexts to determine _where_ to run those buildkit containers, based on the target
architecture.
When running a package build, linuxkit looks for a container named `linuxkit-builder`, running the appropriate
version of buildkit. If it cannot find a container with that name, it creates it.
If the container already exists but is not running buildkit, or if the version is incorrect, linuxkit stops and removes
the existing `linuxkit-builder` container and creates one running the correct version of buildkit.
When linuxkit needs to build a package for a particular architecture:
1. If a context for that architecture was provided, use that context, looking for and/or starting a buildkit container named `linuxkit-builder`.
1. If no context for that architecture was provided, use the `default` context.
The actual building then will be one of:
1. native, if the provided context has the same architecture as the target build architecture; else
1. cross-build, if the provided context has a different architecture, but the package's `Dockerfile` supports cross-building; else
1. emulated build, using docker's qemu binfmt capabilities
Cross-building, i.e. building on one platform using that platform's binaries to create outputs for a different platform,
depends on the package's `Dockerfile`. Details are available in the
* if the image is just `FROM something`, then it runs it under qemu using binfmt
* if the image is `FROM --platform=$BUILDPLATFORM something`, then it runs it using the local architecture, invoking cross-builders
Read the official docs to learn more how to leverage cross-building with buildx.
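The choice between native, cross and emulated builds described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and detecting cross-build support by scanning `FROM` lines for `--platform=$BUILDPLATFORM` is a simplification of what buildkit actually evaluates.

```python
def supports_cross_build(dockerfile_text: str) -> bool:
    """Report whether any FROM line pins the stage to the build platform."""
    for line in dockerfile_text.splitlines():
        words = line.split()
        if words and words[0].upper() == "FROM" and "--platform=$BUILDPLATFORM" in words:
            return True
    return False


def build_mode(context_arch: str, target_arch: str, dockerfile_text: str) -> str:
    """Pick native, cross-build or emulated, in that order of preference."""
    if context_arch == target_arch:
        return "native"
    if supports_cross_build(dockerfile_text):
        return "cross-build"
    return "emulated"


plain = "FROM something\nRUN make\n"
cross = "FROM --platform=$BUILDPLATFORM something AS build\nRUN make\n"

print(build_mode("amd64", "amd64", plain))
print(build_mode("amd64", "arm64", cross))
print(build_mode("amd64", "arm64", plain))
```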
**Important:** When building, if the local architecture is not one of those being built,
selecting `--docker` to load the images into the docker image cache will result in an error.
You _must_ be building for the local architecture - optionally for others as well - in order to
pass the `--docker` option.
#### Providing native builder nodes
linuxkit is capable of using native build nodes to do the build, even remotely. To do so, you must:
1. Create a [docker context](https://docs.docker.com/engine/context/working-with-contexts/) that references the build node
1. Tell linuxkit to use that context for that architecture
linuxkit will then use that provided context to look for and/or start a container in which to run buildkit for that architecture.
linuxkit looks for contexts in the following descending order of priority:
1. CLI option `--builders <platform>=<context>,<platform>=<context>`, e.g. `--builders linux/arm64=linuxkit-arm64,linux/amd64=default`
1. Environment variable `LINUXKIT_BUILDERS=<platform>=<context>,<platform>=<context>`, e.g. `LINUXKIT_BUILDERS=linux/arm64=linuxkit-arm64,linux/amd64=default`
1. Existing context named `linuxkit-<platform>`, e.g. `linuxkit-linux-arm64` or `linuxkit-linux-s390x`, with "/" replaced by "-", as "/" is an invalid character.
1. Default context
If a builder name is provided for a specific platform, and it doesn't exist, it will be treated as a fatal error.
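The priority order above can be modelled as a small lookup. This is a sketch of the documented behaviour only, not linuxkit's implementation: the function names are hypothetical, and the list of existing contexts is passed in as a space-separated string (in practice it might come from `docker context ls`).

```shell
#!/bin/sh
# Illustrative model of the documented lookup order; not linuxkit's own code.

# lookup_builder LIST PLATFORM
# LIST looks like "linux/arm64=ctx-a,linux/amd64=ctx-b". Prints the context
# mapped to PLATFORM, or returns 1 when LIST has no entry for it.
lookup_builder() {
    lb_old_ifs=$IFS
    IFS=,
    for lb_entry in $1; do
        if [ "${lb_entry%%=*}" = "$2" ]; then
            IFS=$lb_old_ifs
            printf '%s\n' "${lb_entry#*=}"
            return 0
        fi
    done
    IFS=$lb_old_ifs
    return 1
}

# context_exists NAME "CTX CTX ..."; "default" is assumed to always exist.
context_exists() {
    if [ "$1" = default ]; then
        return 0
    fi
    for ce_ctx in $2; do
        if [ "$ce_ctx" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# resolve_context PLATFORM CLI_BUILDERS ENV_BUILDERS "EXISTING CONTEXTS"
resolve_context() {
    # 1 and 2: explicit mappings, command line first, then environment.
    for rc_list in "$2" "$3"; do
        if rc_found=$(lookup_builder "$rc_list" "$1"); then
            if context_exists "$rc_found" "$4"; then
                printf '%s\n' "$rc_found"
                return 0
            fi
            # An explicitly named but missing context is a fatal error.
            echo "context $rc_found for $1 does not exist" >&2
            return 1
        fi
    done
    # 3: existing context named linuxkit-<platform>, with "/" replaced by "-".
    rc_named="linuxkit-$(printf '%s' "$1" | tr '/' '-')"
    if [ "$rc_named" != default ] && context_exists "$rc_named" "$4"; then
        printf '%s\n' "$rc_named"
        return 0
    fi
    # 4: fall back to the default context.
    printf '%s\n' default
}

resolve_context linux/arm64 "" "" "linuxkit-linux-arm64"   # prints "linuxkit-linux-arm64"
```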
#### Examples
##### Simple build
There are no contexts starting with `linuxkit-`, no environment variable `LINUXKIT_BUILDERS`, no command-line argument `--builders`.
linuxkit will build any requested packages using `default` context on the local platform, with a container (created, if necessary) named `linuxkit-builder`.
Builds for the same architecture will be native, builds for other platforms will use either qemu or cross-building.
##### Specified target
You create a context named `my-remote-arm64` and then run:
* for arm64 using the context `linuxkit-linux-arm64`, since there is a context with the name `linuxkit-<platform>`, and you did not override it using `--builders` or the environment variable `LINUXKIT_BUILDERS`
* for amd64 using the context `default` and the `linuxkit` builder, as that is the default fallback
##### Combination
You create a context named `linuxkit-linux-arm64`, and another named `my-remote-builder-amd64` and then run:
* for arm64 using the context `linuxkit-linux-arm64`, since there is a context with the name `linuxkit-<platform>`, and you did not override that particular architecture using `--builders` or the environment variable `LINUXKIT_BUILDERS`
* for amd64 using the context `my-remote-builder-amd64`, since you specified for that architecture using `--builders`
The same would happen if you used `LINUXKIT_BUILDERS=linux/amd64=my-remote-builder-amd64` instead of the `--builders` flag.
##### Missing context
You do not have a context named `my-remote-arm64`, and run:
linuxkit will try to build for `linux/arm64` using the context `my-remote-arm64`. Since that context does not exist, you will get an error.
##### Preset build arguments
When building packages, the following build-args are set for you automatically:
* `SOURCE` - the source repository of the package
* `REVISION` - the git commit that was used for the build
* `GOPKGVERSION` - the go package version or pseudo-version per https://go.dev/ref/mod#glos-pseudo-version
* `PKG_HASH` - the git tree hash of the package directory, e.g. `45a1ad5919f0b6acf0f0cf730e9434abfae11fe6`; tag part of `linuxkit pkg show-tag`
* `PKG_IMAGE` - the name of the image that is being built, e.g. `linuxkit/init`; image name part of `linuxkit pkg show-tag`. Combine with `PKG_HASH` for the full tag.
Note that the above are set **only** if you do not set them in `build.yaml`. Your settings _always_
override these built-in ones.
To use them, simply address them in your `Dockerfile`:
```dockerfile
ARG SOURCE
```
### Build packages as a maintainer
If you have write access to the `linuxkit` organisation on hub, you
should also be set up with signing keys for packages and your signing
key should have a passphrase, which we call `<passphrase>` throughout.
All official LinuxKit packages are multi-arch manifests and most of
them are available for `amd64`, `arm64`, and `s390x`. Official images
*must* be built on both architectures and they must be built *in
@@ -35,7 +35,7 @@ specified bucket, and create a bootable image from the stored image.
Alternatively, you can use the `AWS_BUCKET` environment variable to specify the bucket name.
**Note:** If the push times out before it finishes, you can use the `-timeout` flag to extend the timeout.
**Note:** If the push times out before it finishes, you can use the `-timeout` flag to extend the timeout. You may also want to consider passing `-ena` to enable enhanced networking in the AMI.
@@ -11,17 +11,7 @@ Supported (tested) versions of the relevant OpenStack APIs are:
## Authentication
LinuxKit's support for OpenStack handles two ways of providing the endpoint and authentication details. You can either set the standard set of environment variables and the commands detailed below will inherit those, or you can explicitly provide them on the command-line as options to `push` and `run`. The examples below use the latter, but if you prefer the former then you'll need to set the following:
```shell
OS_USERNAME="admin"
OS_PASSWORD="xxx"
OS_TENANT_NAME="linuxkit"
OS_AUTH_URL="https://keystone.com:5000/v3"
OS_USER_DOMAIN_NAME=default
OS_CACERT=/path/to/cacert.pem
OS_INSECURE=false
```
LinuxKit's support for OpenStack includes configuring access to your cloud as detailed in the official [os-client-config](https://docs.openstack.org/os-client-config/latest/user/configuration.html) documentation.
## Push
@@ -40,32 +30,17 @@ Images generated with Moby can be uploaded into OpenStack's image service with `
```shell
./linuxkit push openstack \
  -authurl=https://keystone.example.com:5000/v3 \
  -username=admin \
  -password=XXXXXXXXXXX \
  -project=linuxkit \
  -img-name=LinuxKitTest \
  ./linuxkit.iso
```
If successful, this will return the image's UUID. If you've set your environment variables up as described above, this command can then be simplified:
```shell
./linuxkit push openstack \
  -img-name "LinuxKitTest" \
  ~/Desktop/linuxkitmage.qcow2
```
## Run
Virtual machines can be launched using `linuxkit run openstack`. As an example:
@@ -24,9 +24,9 @@ specified with `-arch` and currently accepts `x86_64`, `aarch64`, and
`linuxkit run qemu` can boot in different types of images:
- `kernel+initrd`: This is the default mode of `linuxkit run qemu` [`x86_64`, `arm64`, `s390x`]
- `kernel+squashfs`: `linuxkit run qemu -squashfs <path to directory>`. This expects a kernel and a squashfs image. [`x86_64`, `arm64`, `s390x`]
- `iso-bios`: `linuxkit run qemu -iso <path to iso>` [`x86_64`]
- `iso-efi`: `linuxkit run qemu -iso -uefi <path to iso>`. This looks in `/usr/share/ovmf/bios.bin` for the EFI firmware by default. Can be overwritten with `-fw`. [`x86_64`, `arm64`]
- `kernel+squashfs`: `linuxkit run qemu --squashfs <path to directory>`. This expects a kernel and a squashfs image. [`x86_64`, `arm64`, `s390x`]
- `iso-bios`: `linuxkit run qemu --iso <path to iso>` [`x86_64`]
- `iso-efi`: `linuxkit run qemu --iso --uefi <path to iso>`. This looks in `/usr/share/ovmf/bios.bin` for the EFI firmware by default. Can be overwritten with `-fw`. [`x86_64`, `arm64`]
- `qcow-bios`: `linuxkit run qemu disk.qcow2` [`x86_64`]
- `raw-bios`: `linuxkit run qemu disk.img` [`x86_64`]
- `aws`: `linuxkit run qemu disk.img` boots a raw AWS disk image. [`x86_64`]
This is a quick guide to run LinuxKit on Scaleway (only VPS x86_64 for now)
## Setup
You must create a Scaleway API Token (combination of Access and Secret Key), available at [Scaleway Console](https://console.scaleway.com/account/credentials), first.
Then you can use it either with the `SCW_ACCESS_KEY` and `SCW_SECRET_KEY` environment variables or the `-access-key` and `-secret-key` flags
of the `linuxkit push scaleway` and `linuxkit run scaleway` commands.
In addition, the Organization ID value has to be set, either with the `SCW_DEFAULT_ORGANIZATION_ID` environment variable or the `-organization-id` command line flag.
The environment variable `SCW_DEFAULT_ZONE` is used to set the zone (there is also the `-zone` flag).
## Build an image
Scaleway requires an `iso-efi` image. To create one:
* You have to set `root=/dev/vda` in the `cmdline` to have the right device set on boot
* The metadata package is not only used to set the metadata, but also to signal Scaleway that the instance has booted. So it is encouraged to use it (dhcpcd must be set before)
## Push image
You have to do `linuxkit push scaleway scaleway.iso` to upload it to your Scaleway images.
By default the image name is the name of the ISO file without the extension.
It can be overridden with the `-img-name` flag or the `SCW_IMAGE_NAME` environment variable.
**Note 1:** If an image (and snapshot) of the same name exists, it will be replaced.
**Note 2:** The image is zone specific: if you create an image in `par1` you can't use it in `ams1`.
### Push process
Building a Scaleway image has a special process. Basically:
* Create an `image-builder` instance with an additional volume, based on Ubuntu Bionic (only x86_64 for now)
* Copy the ISO image on this instance
* Use `dd` to write the image on the additional volume (`/dev/vdb` by default)
* Terminate the instance, create a snapshot, and create an image from the snapshot
**Note 1:** An image is linked to a snapshot, so you can't delete a snapshot before the image.
**Note 2:** You can specify an already running instance to act as the image builder with the `-instance-id` flag. But if you don't specify the `-no-clean` flag it will be destroyed upon completion.
## Create an instance and connect to it
With the image created, we can now create an instance.
```
linuxkit run scaleway scaleway
```
By default, the instance name is `linuxkit`. It can be overridden with the `-instance-name` flag.
If you don't set the `-no-attach` flag, you will be connected to the serial port.
You can edit the Scaleway example to allow you to SSH to your instance in order to use it.
There are no special integration services available for Virtualization.Framework, but
there are a number of packages, such as `vsudd`, which enable
tighter integration of the VM with the host (see below).
The Virtualization.Framework backend also allows passing custom userdata into the
[metadata package](./metadata.md) using either the `-data` or `-data-file` command-line
option. This attaches a CD device with the data on.
### `vsudd` unix domain socket forwarding
The [`vsudd` package](/pkg/vsudd) provides a daemon that exposes unix
domain socket inside the VM to the host via virtio or Hyper-V sockets.
With Virtualization.Framework, the virtio sockets can be exposed as unix domain
sockets on the host, enabling access to other daemons, like
`containerd` and `dockerd`, from the host. An example configuration
file is available in [examples/vsudd-containerd.yml](/examples/vsudd-containerd.yml).
After building the example, run it with `linuxkit run virtualization.framework
-vsock-ports 2374 vsudd`. This will create a unix domain socket in the state directory that maps to the `containerd` control socket. The socket is called `guest.00000946`.
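The socket name in this example appears to be the vsock port written as eight zero-padded hexadecimal digits (2374 is 0x946). Assuming that naming scheme holds for other ports, the expected file name can be derived as follows; the helper name `vsock_socket_name` is hypothetical.

```shell
#!/bin/sh
# Sketch: derive the host-side socket name from a vsock port number,
# assuming the "guest.<port as 8 hex digits>" scheme seen in the example.
vsock_socket_name() {
    printf 'guest.%08x\n' "$1"
}

vsock_socket_name 2374   # prints "guest.00000946"
```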
If you install the `ctr` tool on the host you should be able to access the
`containerd` running in the VM:
```
$ go get -u -ldflags -s github.com/containerd/containerd/cmd/ctr
This document describes the steps to make a LinuxKit release. A
LinuxKit release consists of:
- A git tag of the form vX.Y on a specific commit.
- Packages on Docker hub, tagged with the release tag.
- All sample `YAML` files updated to use the release packages
- `linuxkit` binaries for all supported architectures.
- Changelog entry
Note, we explicitly do not tag kernel images with LinuxKit release
tags as we encourage users to stay current with the kernel
releases. We also do not tag test and `mkimage` packages as these are
not end-user facing.
## Pre-requisites
Releases can be done by any maintainer. Maintainers need to have
access to build machines for all architectures supported by LinuxKit and
signing keys set up to sign Docker hub images.
## Release preparation
The release preparation is by far the most time consuming task as it
involves updating all packages and YAML files.
The release preparation is performed on a branch of your up-to-date
LinuxKit clone. This document assumes that your clone of the LinuxKit
repository is available as the `origin` remote in your local `git`
clone (in my setup the official LinuxKit repository is available as
`upstream` remote). If your setup is different, you may have to adjust
some of the commands below.
As a starting point you have to be on the up-to-date master branch
and be in the root directory of your local git clone. You should also
have the same setup on all build machines used.
### Update `linuxkit/alpine`
This step is not necessarily required if the alpine base image has
recently been updated, but it is good to pick up any recent bug
fixes. Follow the process in [alpine-base-update.md](./alpine-base-update.md)
There are several important notes to consider when updating alpine base:
* `LK_BRANCH` is set to `rel_$LK_RELEASE` when cutting a release, e.g. `LK_BRANCH=rel_v0.9`
* It is not necessarily required to update the alpine base image if it has recently been updated, but it is good to pick up any recent bug
  fixes. However, you do need to update the tools, packages and tests.
* Releases are a particularly good time to check for updates in wrapped external dependencies, as highlighted in [alpine-base-update.md#External Tools](./alpine-base-update.md#External_Tools)
### Final preparation steps
- Update AUTHORS by running `./scripts/generate-authors.sh`
- Update the `VERSION` variable in the top-level `Makefile`
- Create an entry in `CHANGELOG.md`. Take a look at `git log v0.3..HEAD` and pick interesting updates (of course adjust `v0.3` to the previous version).
- Create a PR with your changes.
## Releasing
Once the PR is merged we can do the actual release.
- Update your local git clone to the latest
- Identify the merge commit for your PR and tag it and push it to the main LinuxKit repository (remote `upstream` in my case):
```
git tag $LK_RELEASE master
git push upstream $LK_RELEASE
```
Then head over to GitHub and look at the `Releases` tab. You should see the new tag. Edit it:
- Add the changelog message
- Head over to the Circle CI page of the master build (try the Circle CI badge in the top level `README.md`)
- Download the artefacts and SHA256 sums file.
- Add the downloaded binaries to the release page (drag-and-drop below the editor window)
- Add the `sha256` sums to the release notes on the release page
Hit the `Publish release` button.
This completes the release, but you are not done: one more step is required.
## Post release
Create a PR which bumps the version number in the top-level `Makefile`
to `$LK_RELEASE+` to make sure that the version reported by `linuxkit
LinuxKit bootable images are composed of existing OCI images.
OCI images, when built, often are scanned to create a
software bill-of-materials (SBoM). The buildkit builder
system itself contains the [ability to integrate SBoM scanning and generation into the build process](https://docs.docker.com/build/attestations/sbom/).
When LinuxKit composes an operating system image using `linuxkit build`,
it will, by default, combine the SBoMs of all the OCI images used to create
@@ -50,8 +50,6 @@ and namespaced separately from the host as appropriate.
LinuxKit's build process heavily leverages Docker images for packaging. Of note, all intermediate build images
are referenced by digest to ensure reproducibility across LinuxKit builds. Tags are mutable, and thus subject to override
(intentionally or maliciously) - referencing by digest mitigates classes of registry poisoning attacks in LinuxKit's buildchain.
Certain images, such as the kernel image, will be signed by LinuxKit maintainers using [Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/),
which guarantees authenticity, integrity, and freshness of the image.
Moreover, LinuxKit's build process leverages [Alpine Linux's](https://alpinelinux.org/) hardened userspace tools such as
Musl libc, and compiler options that include `-fstack-protector` and position-independent executable output. Go binaries
This command will generate some private keys in `~/.docker/trust` and ask you for passphrases such that they are encrypted at rest.
All linuxkit repositories are currently using the same root key so we can pin trust on key ID `1908a0cf4f55710138e63f65ab2a97e8fa3948e5ca3b8857a29f235a3b61ea1b`.
We'll also let the notary server take control of the snapshot key, for easier delegation collaboration:
Maintainers are to sign with `delegation` keys, which are administered by a non-root key.
Thus, they are easily rotated without having to bring the root key online.
Additionally, maintainers can be added to separate roles for auditing purposes: the current setup is to add maintainers to both the `targets/releases` role that is intended
for release consumption, as well as an individual `targets/<maintainer_name>` role for auditing.
Docker will automatically sign into both roles when pushing with Docker Content Trust.
Here's what the command looks like to add all maintainers to the `targets/releases` role:
This document contains a list of known issues related to using, building or testing linuxkit.
## Images
## Packages
### Invalid MediaType
**Problem**
```
Error: error building and pushing "linuxkit/mkimage-iso-efi-initrd:0e66171ffde9bb735b0e014f811f9626fc8b9bc9": PUT https://index.docker.io/v2/linuxkit/mkimage-iso-efi-initrd/manifests/0e66171ffde9bb735b0e014f811f9626fc8b9bc9: MANIFEST_INVALID: manifest invalid; if present, mediaType in image index should be 'application/vnd.oci.image.index.v1+json' not 'application/vnd.docker.distribution.manifest.list.v2+json'
```
The above message is caused by registries, notably docker hub, refusing to accept indexes with the
docker media type of `application/vnd.docker.distribution.manifest.list.v2+json`, rather than the OCI
one `application/vnd.oci.image.index.v1+json`.
Linuxkit _does_ use the OCI media type, however, if the image _already_ exists in the registry, linuxkit will
pull the index down, update it, and push it back up. The above error occurs because the index that exists in
the hub, the one that is pulled down, has the older media type, from when the registry accepted it.
**Solution**
The solution is to force an entirely new build, which will generate the images and index with the correct media
The `linuxkit build` command assembles a set of containerised components into an image. The simplest
type of image is just a `tar` file of the contents (useful for debugging) but more useful
outputs add a `Dockerfile` to build a container, or build a full disk image that can be
booted as a linuxkit VM. The main use case is to build an assembly that includes
`containerd` to run a set of containers, but the tooling is very generic.
The yaml configuration specifies the components used to build up an image. All components
are downloaded at build time to create an image. The image is self-contained and immutable,
so it can be tested reliably for continuous delivery.
Components are specified as Docker images which are pulled from a registry during build if they
are not available locally. See [image-cache](./image-cache.md) for more details on local caching.
The Docker images are optionally verified with Docker Content Trust.
For private registries or private repositories on a registry credentials provided via
`docker login` are re-used.
## Sections
The configuration file is processed in the order `kernel`, `init`, `onboot`, `onshutdown`,
`services`, `files`, `volumes`. Each section adds files to the root file system. Sections may be omitted.
Each container that is specified is allocated a unique `uid` and `gid` that it may use if it
wishes to run as an isolated user (or user namespace). Anywhere you specify a `uid` or `gid`
field you specify either the numeric id, or if you use a name it will refer to the id allocated
to the container with that name.
```
services:
  - name: redis
    image: redis:latest
    uid: redis
    gid: redis
    binds:
      - /etc/redis:/etc/redis
files:
  - path: /etc/redis/redis.conf
    contents: "..."
    uid: redis
    gid: redis
    mode: "0600"
```
### `kernel`
The `kernel` section is only required if booting a VM. The files will be put into the `boot/`
directory, where they are used to build bootable images.
The `kernel` section defines the kernel configuration. The `image` field specifies the Docker image,
which should contain a `kernel` file that will be booted (eg a `bzImage` for `amd64`) and a file
called `kernel.tar` which is a tarball that is unpacked into the root, which should usually
contain a kernel modules directory. `cmdline` specifies the kernel command line options if required.
The contents of `cmdline` are passed to the kernel as-is. There are several special values that are
used to control the behaviour of linuxkit packages. See [kernel command line options](../docs/cmdline.md).
To override the names, you can specify the kernel image name with `binary: bzImage` and the tar image
with `tar: kernel.tar` or the empty string or `none` if you do not want to use a tarball at all.
Kernel packages may also contain a cpio archive containing CPU microcode which needs prepending to
the initrd. To select this option, recommended when booting on bare metal, add `ucode: intel-ucode.cpio`
to the kernel section.
### `init`
The `init` section is a list of images that are used for the `init` system and are unpacked directly
into the root filesystem. This should bring up `containerd`, start the system and daemon containers,
and set up basic filesystem mounts, in the case of a LinuxKit system. For ease of
modification `runc` and `containerd` images, which just contain these programs, are added here
rather than bundled into the `init` container.
### `onboot`
The `onboot` section is a list of images. These images are run before any other
images. They are run sequentially and each must exit before the next one is run.
These images can be used to configure one shot settings. See [Image
specification](#image-specification) for a list of supported fields.
### `onshutdown`
This is a list of images to run on a clean shutdown. Note that you must not rely on these
being run at all, as machines may be powered off or shut down without having time to run
these scripts. If you add anything here you should test both in the case where they are
run and when they are not. Most systems are likely to be "crash only" and not have any setup here,
but you can attempt to deregister cleanly from a network service here, rather than relying
on timeouts, for example.
### `services`
The `services` section is a list of images for long running services which are
run with `containerd`. Startup order is undefined, so containers should wait
on any resources, such as networking, that they need. See [Image
specification](#image-specification) for a list of supported fields.
### `volumes`
The volumes section is a list of named volumes that can be used by other containers,
including those in `services`, `onboot` and `onshutdown`. The volumes are created in a directory
chosen by linuxkit at build-time. The volumes then can be referenced by other containers and
mounted into them.
Volumes normally are blank directories. If an image is provided, the contents of that image
will be used to populate the volume.
The `volumes` section can declare a volume to be read-write or read-only. If the volume is read-write,
a volume that is mounted into a container can be mounted read-only or read-write. If the volume is read-only,
it can only be mounted into a container read-only; attempting to mount it read-write will generate a build-time error.
By default, volumes are created read-write, and are mounted read-write.
Volume names **must** be unique, and must contain only lower-case alphanumeric characters, hyphens, and
underscores.
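The character rule for volume names can be checked before running a build. The following is a minimal sketch of that rule only (it does not check uniqueness), and the function name `valid_volume_name` is hypothetical rather than part of linuxkit.

```shell
#!/bin/sh
# Sketch: check a volume name against the documented character rule.
# LC_ALL=C keeps the a-z range from matching upper-case letters.
LC_ALL=C
export LC_ALL

valid_volume_name() {
    case $1 in
        # Reject empty names and any character outside a-z, 0-9, "_" and "-".
        '' | *[!a-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in vola my_vol-1 volA; do
    if valid_volume_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: invalid"
    fi
done
```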
Sample `volumes` section:
```yml
volumes:
  - name: vola
    image: alpine:latest
    readonly: true
  - name: volb
    image: alpine:latest
    readonly: false
  - name: volc
    readonly: false
```
In the above example:
* `vola` is populated by the contents of `alpine:latest` and is read-only.
* `volb` is populated by the contents of `alpine:latest` and is read-write.
* `volc` is an empty volume and is read-write.
Sample usage of volumes in `services` section:
```yml
services:
  - name: myservice
    image: alpine:latest
    binds:
      - vola:/mnt/volA:ro
      - volb:/mnt/volB
```
### `files`
The files section can be used to add files inline in the config, or from an external file.
```yml
files:
  - path: dir
    directory: true
    mode: "0777"
  - path: dir/name1
    source: "/some/path/on/local/filesystem"
    mode: "0666"
  - path: dir/name2
    source: "/some/path/that/it/is/ok/to/omit"
    optional: true
    mode: "0666"
  - path: dir/name3
    contents: "orange"
    mode: "0644"
    uid: 100
    gid: 100
```
Specifying the `mode` is optional, and will default to `0600`. Leading directories will be
created if not specified. You can use `~/path` in `source` to specify a path in the build
user's home directory.
In addition there is a `metadata` option that will generate the file. Currently the only value
supported here is `"yaml"` which will output the yaml used to generate the image into the specified
file:
```yml
  - path: etc/linuxkit.yml
    metadata: yaml
```
Note that if you use templates in the yaml, the final resolved version will be included in the image,
and not the original input template.
Because a `tmpfs` is mounted onto `/var`, `/run`, and `/tmp` by default, the `tmpfs` mounts will shadow anything specified in `files` section for those directories.
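A quick check for `files` entries that would be hidden at runtime might look like the sketch below. It only models the three default mount points named above; the function name `shadowed_by_tmpfs` is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a `files` path lands under a default tmpfs mount.
# Accepts paths with or without a leading "/".
shadowed_by_tmpfs() {
    sb_path=${1#/}
    case $sb_path in
        var | var/* | run | run/* | tmp | tmp/*) return 0 ;;
        *) return 1 ;;
    esac
}

for path in etc/linuxkit.yml var/lib/example /tmp/scratch; do
    if shadowed_by_tmpfs "$path"; then
        echo "$path: shadowed by a tmpfs mount"
    else
        echo "$path: visible"
    fi
done
```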
## Image specification
Entries in the `onboot`, `onshutdown`, `volumes` and `services` sections specify an OCI image and
options. Default values may be specified using the `org.mobyproject.config` image label.
For more details see the [OCI specification](https://github.com/opencontainers/runtime-spec/blob/master/spec.md).
If the `org.mobyproject.config` label is set in the image, that specifies default values for these fields if they
are not set in the yaml file. While most fields are _replaced_ if they are specified in the yaml file,
some support _add_ via the format `<field>.add`; see below.
You can override the label entirely by setting the value, or setting it to be empty to remove
the specification for that value in the label.
If you need an OCI option that is not specified here please open an issue or pull request as the list is not yet
complete.
By default the containers will be run in the host `net`, `ipc` and `uts` namespaces, as that is the usual requirement;
in many ways they behave like pods in Kubernetes. Mount points must already exist, as must a file or directory being
bind mounted into a container.
- `name` a unique name for the program being executed, used as the `containerd` id.
- `image` the Docker image to use for the root filesystem. The default command, path and environment are
  extracted from this so they need not be filled in.
- `capabilities` the Linux capabilities required, for example `CAP_SYS_ADMIN`. If there is a single
  capability `all` then all capabilities are added.
- `capabilities.add` the Linux capabilities required, but these are added to the defaults, rather than overriding them.
- `ambient` the Linux ambient capabilities (capabilities passed to non root users) that are required.
- `mounts` is the full form for specifying a mount, which requires `type`, `source`, `destination`
  and a list of `options`. If any fields are omitted, sensible defaults are used if possible, for example
  if the `type` is `dev` it is assumed you want to mount at `/dev`. The default mounts and their options
  can be replaced by specifying a mount with new options here at the same mount point.
- `binds` is a simpler interface to specify bind mounts, accepting a string like `/src:/dest:opt1,opt2`
  similar to the `-v` option for bind mounts in Docker.
- `binds.add` is a simpler interface to specify bind mounts, but these are added to the defaults, rather than overriding them.
- `tmpfs` is a simpler interface to mount a `tmpfs`, like `--tmpfs` in Docker, taking `/dest:opt1,opt2`.
- `command` will override the command and entrypoint in the image with a new list of commands.
- `env` will override the environment in the image with a new environment list. Specify variables as `VAR=value`.
- `cwd` will set the working directory, defaults to `/`.
- `net` sets the network namespace, either to a path, or if `none` or `new` is specified it will use a new namespace.
- `ipc` sets the ipc namespace, either to a path, or if `new` is specified it will use a new namespace.
- `uts` sets the uts namespace, either to a path, or if `new` is specified it will use a new namespace.
- `pid` sets the pid namespace, either to a path, or if `host` is specified it will use the host namespace.
- `readonly` sets the root filesystem to read only, and changes the other default filesystems to read only.
- `maskedPaths` sets paths which should be hidden.
- `readonlyPaths` sets paths to read only.
- `uid` sets the user id of the process.
- `gid` sets the group id of the process.
- `additionalGids` sets a list of additional groups for the process.
- `noNewPrivileges`, when `true`, means no additional capabilities can be acquired and `suid` binaries do not work.
- `hostname` sets the hostname inside the image.
- `oomScoreAdj` changes the OOM score.
- `rootfsPropagation` sets the rootfs propagation, eg `shared`, `slave` or (default) `private`.
- `cgroupsPath` sets the path for cgroups.
- `resources` sets cgroup resource limits as per the OCI spec.
- `sysctl` sets a map of `sysctl` key value pairs that are set inside the container namespace.
- `rlimits` sets a list of `rlimit` values in the form `name,soft,hard`, eg `nofile,100,200`. You can use `unlimited` as a value too.
- `annotations` sets a map of key value pairs as OCI metadata.
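To make the `binds` string form concrete, the sketch below splits a `/src:/dest:opt1,opt2` value into its parts. It is an illustration of the format only, not how linuxkit parses it; the function name `parse_bind` is hypothetical, and it assumes neither path contains a colon.

```shell
#!/bin/sh
# Sketch: split a bind string "/src:/dest[:opt1,opt2]" into its fields.
# Assumes the source and destination paths contain no ":" characters.
parse_bind() {
    pb_src=${1%%:*}        # text before the first ":"
    pb_rest=${1#*:}        # everything after the first ":"
    pb_dest=${pb_rest%%:*} # destination, up to the next ":" if any
    case $pb_rest in
        *:*) pb_opts=${pb_rest#*:} ;;
        *) pb_opts= ;;
    esac
    printf 'source=%s destination=%s options=%s\n' "$pb_src" "$pb_dest" "$pb_opts"
}

parse_bind /var:/var:rshared,rbind
# prints "source=/var destination=/var options=rshared,rbind"
```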
There are experimental `userns`, `uidMappings` and `gidMappings` options for user namespaces but these are not yet supported, and may have
permissions issues in use.
In addition to the parts of the specification above used to generate the OCI spec, there is a `runtime` section in the image specification
which specifies some actions to take place when the container is being started.
- `cgroups` takes a list of cgroups that will be created before the container is run.
- `mounts` takes a list of mount specifications (`source`, `destination`, `type`, `options`) and mounts them in the root namespace before the container is created. It will
  try to make any missing destination directories.
- `mkdir` takes a list of directories to create at runtime, in the root mount namespace. These are created before the container is started, so they can be used to create
  directories for bind mounts, for example in `/tmp` or `/run` which would otherwise be empty.
- `interfaces` defines a list of actions to perform on a network interface:
  - `name` specifies the name of an interface. An existing interface with this name will be moved into the container's network namespace.
  - `add` specifies a type of interface to be created in the container's namespace, with the specified name.
  - `createInRoot` is a boolean which specifies that the interface being `add`ed should be created in the root namespace first, then moved. This is needed for `wireguard` interfaces.
  - `peer` specifies the name of the other end when creating a `veth` interface. This end will remain in the root namespace, where it can be attached to a bridge. Specifying this implies `add: veth`.
- `bindNS` specifies a namespace type and a path where the namespace from the container being created will be bound. This allows a namespace to be set up in an `onboot` container, and then
  using `net: path` for a `service` container to use that network namespace later.
- `namespace` overrides the LinuxKit default containerd namespace to put the container in; only applicable to services.
An example of using the `runtime` config to configure a network namespace with `wireguard` and then run `nginx` in that namespace is shown below:
```yml
    command: ["sh","-c","ip link set dev wg0 up; ip address add dev wg0 192.168.2.1 peer 192.168.2.2; wg setconf wg0 /etc/wireguard/wg0.conf; wg show wg0"]
    runtime:
      interfaces:
        - name: wg0
          add: wireguard
          createInRoot: true
      bindNS:
        net: /run/netns/wg
services:
  - name: nginx
    image: nginx:alpine
    net: /run/netns/wg
    capabilities:
      - CAP_NET_BIND_SERVICE
      - CAP_CHOWN
      - CAP_SETUID
      - CAP_SETGID
      - CAP_DAC_OVERRIDE
```
## `devices`
To access the console, it's necessary to explicitly add a "device" definition, for example:
```
devices:
  - path: "/dev/console"
    type: c
    major: 5
    minor: 1
    mode: 0666
```
See the [getty package](../pkg/getty/build.yml) for a more complete example
and see [runc](https://github.com/opencontainers/runc/commit/60e21ec26e15945259d4b1e790e8fd119ee86467) for context.
To grant access to all block devices use:
```
devices:
  - path: all
    type: b
```
See the [format package](../pkg/format/build.yml) for an example.
### Mount Options
When mounting filesystem paths into a container, whether as part of `onboot` or `services`, there are several options you need to be aware of. Using them correctly is necessary for your containers to work as intended.
For most containers - e.g. nginx or even docker - these options are not needed. Simply doing the following will work fine:
```yml
binds:
- /var:/some/var/path
```
Please note that `binds` doesn't **add** to the existing mount points, but **replaces** them.
You can examine the `Dockerfile` of the component (in particular, the `binds` value of the
`org.mobyproject.config` label) to get the list of the existing binds.
However, in some circumstances you will need additional options. These options are used primarily if you intend to make changes to mount points _from within your container_ that should be visible from outside the container, e.g., if you intend to mount an external disk from inside the container but have it be visible outside.
In order for new mounts from within a container to be propagated, you must set the following on the container:
1. `rootfsPropagation: shared`
2. The mount point into the container below which new mounts are to occur must be `rshared,rbind`. In practice, this is `/var` (or some subdir of `/var`), since that is the only true read-write area of the filesystem where you will mount things.
Thus, if you have a regular container that is only reading and writing, go ahead and do:
```yml
binds:
- /var:/some/var/path
```
On the other hand, if you have a container that will make new mounts that you wish to be visible outside the container, do:
```yml
binds:
- /var:/var:rshared,rbind
rootfsPropagation: shared
```
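
For context, a minimal sketch of where these two settings might sit in a complete `services` entry. The service name and image are hypothetical placeholders:

```yaml
services:
  - name: mounter                  # hypothetical
    image: example/mounter:latest  # hypothetical
    binds:
      - /var:/var:rshared,rbind
    rootfsPropagation: shared
```
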
## Templates
The `yaml` file supports templates for the names of images. Wherever an image name in the file begins
with the character `@`, it is treated as a template rather than an actual name. The first word after
the `@` indicates the type of template, and the rest of the line is the argument to the template. The
templates currently supported are:
* `@pkg:` - the argument is the path to a linuxkit package. For example, `@pkg:./pkg/init`.
For `pkg`, linuxkit will resolve the path to the package, and then run the equivalent of `linuxkit pkg show-tag <dir>`.
For example:
```yaml
init:
- "@pkg:../pkg/init"
```
This will cause linuxkit to resolve `../pkg/init` to a package, and then run `linuxkit pkg show-tag ../pkg/init`.
The paths are relative to the directory of the yaml file.
You can specify absolute paths, although it is not recommended, as that can make the yaml file less portable.
The `@pkg:` templating is supported **only** when the yaml file is being read from a local filesystem. It is not
supported when the file is read from stdin, e.g. `cat linuxkit.yml | linuxkit build -`, or from a URL, e.g. `linuxkit build https://example.com/foo.yml`.
The `@pkg:` template currently supports only default `linuxkit pkg` options, i.e. `build.yml` and `tag` options. There
are no command-line options to override them.
**Note:** The character `@` is reserved in yaml. To use it in the beginning of a string, you must put the entire string in
quotes.
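
As a sketch of both points, the template can appear wherever an image is named, and the quotes are what keep it valid yaml. The package path `../pkg/example` and the entry name are hypothetical:

```yaml
onboot:
  - name: example
    # Quoted, because a plain yaml scalar cannot begin with `@`.
    image: "@pkg:../pkg/example"
```
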
If you use the template, the actual derived value, and not the initial template, is what will be stored in the final