We use a dedicated docker container as the builder, and the only way
to clean the data inside it is to re-create it. Let's add disk-usage
and clean commands for the builder.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We cannot build for another arch after building for one arch, because
skipBuild is set to true if any one arch is found. In other words, "linuxkit
pkg build --platforms linux/riscv64,linux/amd64 ..." after "linuxkit pkg
build --platforms linux/amd64 ..." will not build for linux/riscv64,
which is not expected.
In general, when we check for available images and find only some of
the platforms, we do not want to rebuild all of them. So this PR
introduces a platformsToBuild slice, which we fill with the platforms
we still need to build for.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
It is not easy to use cross-platform builds with CGO enabled, so let's
allow building without cgo for darwin and use the Virtualization
framework only if we built with CGO.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
If we cannot open a file for some reason, it is better to skip it
instead of exiting. We should also skip symlinks and directories.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We should expand the list of supported arches so that we can build for them if we want. Without this, the build gets stuck on sending the tarball when building for riscv64.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
To be able to identify successive file changes without a commit, we
should include the content hash in the tag alongside the dirty flag
(<ls-tree>-dirty-<content hash>).
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We pull all arches for the image, which is suboptimal in terms of
storage consumption. Let's pull only the required platforms.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We only check for the existence of the builder container and do not
start it if it is in a not-running state. We should start it, for
example after a node reboot, to be able to build anything.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We noticed that we use the host arch when we want to use a previously
built image in the oci-layout. Let's use the fix on the buildkit side
and improve the test.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We should check if we have args in "FROM" and replace them:
ARG IMAGE=linuxkit/img
FROM ${IMAGE} as src
will be parsed as
FROM linuxkit/img as src
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
We do not allow loading into docker images that target a platform
different from the current arch, presumably because of missing
manifest support. But we can keep all images in place by adding an
arch suffix and using the tag without the arch suffix to point to the
current system arch. This will make it possible to use images from
docker for another arch.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
This option was previously not available and required postprocessing of a `tar-kernel-initrd` output.
Comparison with `iso-efi`:
`iso-efi` only loads the kernel at boot, and the root filesystem is mounted from the actual boot media (eg, a CD-ROM - physical or emulated). This can often cause trouble (it has for us) for multiple reasons:
- the linuxkit kernel might not have the correct drivers built-in for the hardware (see #3154)
- especially with virtual or emulated CD-ROMs, performance can be abysmal: we saw a case where the server IPMI allowed using an ISO stored in AWS S3 over HTTP...you can imagine what happens when you start doing random I/O on the root fs in that case.
- The ISO image has the root device name baked in (ie, `/dev/sr0`) which fails if for some reason the CD-ROM we're running from doesn't end up using that device, so manual tweaking is required (see #2375)
`iso-efi-initrd`, on the other hand, packs the root filesystem as an initramfs (ie similar to what the raw output does, except that in this case we're preparing an ISO image), so both the kernel and the initramfs are loaded in memory by the boot loader and, once running, we don't need to worry about root devices or kernel drivers (and the speed is good, as everything runs in RAM).
Also, the generated ISO can be copied verbatim (eg with `dd`) onto a USB media and it still works.
Finally, the image size is much smaller compared to `iso-efi`.
IMHO, `iso-efi-initrd` could be used almost anywhere `iso-efi` would be used, or might even supersede it. I can't think of a scenario where one might explicitly want to use `iso-efi`.
Points to consider:
- Not tested under aarch64 as I don't have access to that arch. If the automated CI tests also test that, then it should be fine.
- I'm not sure what to put inside `images.yaml` for the `iso-efi-initrd` image. As it is it works of course (my personal image on docker hub), but I guess it'll have to be some more "official" image. However, that cannot be until this PR is merged, so it's kind of a chicken and egg situation. Please advise.
- I can look into adding the corresponding `iso-bios-initrd` builder if there is interest.

Signed-off-by: Davide Brini <waldner@katamail.com>
With Linux kernel 5.15+, changing /proc/sys/net/ipv4/ip_forward requires
CAP_NET_ADMIN (https://github.com/torvalds/linux/commit/8292d7f6). We do
not use ip_forward now, but we should be ready for future changes to
conf files.
Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
This allows multiple build flavors for a single codebase, without
sacrificing reproducible builds. The build-args are set in build.yml,
which is typically under source control (if it is not, then no
reproducible builds are possible anyway). This means that mutating
build-args results in the "dirty" flag being set.
The intended use of this commit is to switch between build flavors by
specifying a different yaml file (presumably also under version
control) via the `-build-yml` option.
Because it is impossible to build a final image from packages in
cache, the test for this feature relies on the `RUN echo $build-arg`
output during the `pkg build` process.
Signed-off-by: Yuri Volchkov <yuri@zededa.com>
These are easier to create than cgroupv1 cgroups as they are only a
single mkdir.
Detect which mode we are in by looking for the presence of the
cgroupv2-only cgroup.controllers file.
Signed-off-by: David Scott <dave@recoil.org>
The kernel config is derived from the 5.12 kernel config we used to
have. We explicitly enable RANDOMIZE_KSTACK_OFFSET_DEFAULT, which is
off by default.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
./scripts/update-component-sha.sh linuxkit/runc:21dbbda709ae138de0af6b0c7e4ae49525db5e88 linuxkit/runc:9f7aad4eb5e4360cc9ed8778a5c501cce6e21601
Signed-off-by: David Scott <dave@recoil.org>
This reverts commit 380f36cc1a.
Now that runc includes a fix for this, this patch can be reverted.
Signed-off-by: Frédéric Dalleau <frederic.dalleau@docker.com>
According to busybox's acpid code, acpid should be allowed to access /dev/input/event*, so we allow all "input" devices (whose major number is 13).
Signed-off-by: Sylvain Prat <sylvain.prat@gmail.com>
Previously when we set `cmd.Stderr = os.Stderr`, the stderr from buildx
would be mixed with the image tar, corrupting it.
Work around this (Windows-specific) problem by adding an explicit
indirection via an io.Pipe().
Signed-off-by: David Scott <dave@recoil.org>
After runc 1.0.0-rc92 mounting /dev with ro will fail to start the
container with an error trying to `mkdir /dev/...` (for example
`/dev/pts`). This can be observed following the runc example
Comparing our `config.json` with the working one generated by
`runc spec`, both have a readonly rootfs (good) but the `runc spec`
one does not set `ro` in the `/dev` mount options.
This patch fixes readonly onboot containers by removing the "ro"
option from `/dev`, to match the `runc spec` example.
Signed-off-by: David Scott <dave@recoil.org>
After the runc security advisory[1] the default cgroup device
whitelist was changed.
In previous versions every container had "rwm" (read, write, mknod)
for every device ("a" for all). Typically this was overridden by
container engines like Docker. In LinuxKit we left the permissive
default.
In recent `runc` versions the default allow-all rule was removed,
so a container can only access a device if it is specifically
granted access, which LinuxKit handles via a device: entry.
However it is inconvenient for pkg/format, pkg/mount, pkg/swap
to list all possible block devices up-front. Therefore we add the
ability to grant access to an entire class of device with a single
rule:
```
- path: all
type: b
```
Obviously a paranoid user can still override this with a specific
major/minor number in a device: rule.
[1] https://github.com/opencontainers/runc/security/advisories/GHSA-g54h-m393-cpwq
Signed-off-by: David Scott <dave@recoil.org>
With 561ce6f4be ("Remove Notary and Content Trust") we
removed support for content trust. No need to have it
in the YAMLs either.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
oprofile kernel support was dropped with 5.12.x with:
f8408264c77a ("drivers: Remove CONFIG_OPROFILE support")
However, the commit stated that the userspace oprofile tools
had stopped using the kernel interface a long time ago. So
drop the check.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
CONFIG_BPFILTER aims to provide a replacement for netfilter.
When CONFIG_BPFILTER is enabled, the kernel tries to contact a user mode helper
for each iptables rule update. However, the implementation of this helper has
not been upstreamed yet. The communication thus fails and the kernel then falls
back to netfilter.
As a result, a rule update takes more than ten times the duration of the
netfilter implementation alone.
This has been reported by Docker Desktop users for whom it can take minutes to
start a container sharing a few hundred ports. https://github.com/for-mac/issues/5668
More details on the situation are described in https://lwn.net/Articles/822744/.
Signed-off-by: Frederic Dalleau <frederic.dalleau@docker.com>
The bcc portion of the build had been disabled because it wasn't
building. Now that bcc is building again, add it back to the list of
default targets in the kernel build.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
This moves up to bcc 0.20.0 and builds on the latest 3.13 Alpine base
image. It uses libelf from Alpine, which allows us to drop a number of
the patches we were carrying and reduce the number of steps taken in the
bcc build.
This builds for me on a branch of tip against 5.11.x, 5.10.x,
5.10.x-dbg, and 5.4.x on x86_64. I have not had a chance to attempt
this on other platforms due to lack of hardware.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
Some kernels are only built for some architectures. The
test assumed that all kernels were built for all architectures.
Now, get a list of architectures for which we have a given
kernel image and then make sure the builder images pointed
to by the label and the builder image tagged by convention
exist and point to the same thing.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Declare KERNEL_SOURCE as an environment variable so it
gets picked up in kernel-source-info.
fixes #3653
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
5.4.x is the only kernel left which does not have
WireGuard in tree, and people should be using more
recent kernels. Remove the special case for
compiling out-of-tree WireGuard.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Prior to this commit we go build -o bin/foo, archive it, and
expand the archive, leaving the resulting artifact in bin.
This doesn't allow us to easily change the bin directory, or
move parts of the makefile around to make things more modular.
This commit changes the behaviour to:
go build -o foo, archive it, expand to `bin`
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
In alpine version 3.12, the open-vm-tools package got split into new
smaller sub-packages. The implication of this is that features such as
reporting of hostname and ip address to vCenter stopped working.
Signed-off-by: Edvin Eriksson <edvin.erikson@leovegas.com>
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This adds a --skip-platforms flag that can be used with
lkt pkg build to ignore any arch specified in build.yml
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This ensures that a platform explicitly specified by the user is not
overridden: lkt pkg build --platform=linux/amd64 pkg/bpftrace should
attempt to build that package for that arch even though it is not in
the build.yml.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit adds the default linuxkit cache directory to the
GitHub Actions cache. This will ensure that we don't pull images
that already exist in the cache, or build them if we've already
done so. It should speed up CI.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Go can be weird about tools having to run in a directory with
go.mod. This commit moves the linuxkit makefile to the same
directory as the code.
It also changes the semantics of the local-build target.
You can now use STATIC=0 for dynamic builds or PIE=1 to
use --buildmode=pie. The binaries we were producing in
local-static weren't actually static, so I fixed that too.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Docker Desktop proxies the docker socket at its default location
(/var/run/docker.sock), but allows connecting to the non-proxied
socket through /var/run/docker.sock.raw.
This patch allows the trim-after-delete utility to customize
the docker socket path, so that it can connect to the non-proxied
socket.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The kernel config is derived from 5.6.x by running it through
make oldconfig.
For x86_64 changed manually:
- CONFIG_VIRTIO_MEM=m -> y
- CONFIG_PLDMFW=y -> not set
For aarch64 changed manually:
- CONFIG_SMSC_PHY=m -> not set
- CONFIG_PLDMFW=y -> not set
No adjustment to s390x config
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
- Introduce separate os/arch to the matrix
- Pass os/arch to the local build
- Switch to upload-artifact@v0 and cache@v2
- Fetch linuxkit binary from artefacts rather than using cache
- Add some debug (print file and hashes)
While at it, add some debug for the generated artefacts.
fixes https://github.com/linuxkit/linuxkit/issues/3522
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
`go get -u` will try to update module dependencies.
`go get` (no `-u`) incorrectly resolves dependencies.
We should instead advise users to use `go install`.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit removes Notary and Content Trust.
Notary v1 is due to be replaced with Notary v2 soon.
There is no clean migration path from one to the other.
For now, this removes all signing from LinuxKit.
We will look to add this back once a new Notary alternative
becomes available.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Dave Scott works on the Docker Desktop team, and maintains
LinuxKit changes internally for that. I think Dave would
make a good addition to the list of maintainers to help
out. :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
From Kubernetes v1.20.0 Release notes:
The label applied to control-plane nodes "node-role.kubernetes.io/master"
is now deprecated and will be removed in a future release after a GA
deprecation period.
Introduce a new label "node-role.kubernetes.io/control-plane" that will
be applied in parallel to "node-role.kubernetes.io/master" until the
removal of the "node-role.kubernetes.io/master" label.
xref: https://kubernetes.io/docs/setup/release/notes/#no-really-you-must-read-this-before-you-upgrade
Signed-off-by: Alex Szakaly <alex.szakaly@gmail.com>
We already run the command after an image delete but
- a container delete
- a volume delete
will also free space on the filesystem.
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: David Scott <dave@recoil.org>
The patch we carry for 5.4 and 5.6 does not apply to
5.4.28. Disable the -rt kernel until the version has
been bumped.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
NOTE: This will be a shared mount, due to root being turned into a
shared mount with `MS_REC` set: `mount("", "/", "", rec|shared, "")`.
For some reason setting `shared` when mounting `/sys/fs/bpf` doesn't
work at all, perhaps that's just a kernel feature.
Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
* Fix using ams1 as zone
* Allow specifying image size (+ calculate default from ISO size)
* Fix mangling logs when asking for ssh passphrase
* Some minor code and docs cleanups
Signed-off-by: Karol Woźniak <wozniakk@gmail.com>
This was previously built for 5.4 and 4.19. The latest LTS is 5.4 and
the latest stable is 5.6. Also skip the s390x build for perf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Reduce the number of packages to build for s390x. Firmware
is only used for physical devices, so disable it for s390x
where we mostly run in virtual machines.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Some drivers offer multiple firmwares, with the WHENCE file
defining the default. Use the copy-firmware.sh script to
create a copy of the firmware repository with the defaults
copied into the right place.
For some reason, 'make ARCH=s390 oldconfig' yields
a different config when executed on a real s390x system...
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
A subsequent commit will make the 5.4 kernel the default.
This is primarily to reduce the number of kernels we need
to compile for every upgrade.
Note, we keep the 4.19 config file for arm64 around since the
-rt kernel config needs it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This should allow end-users to gracefully reboot or shut down
Kubernetes nodes (including control planes) running on vSphere
Hypervisor.
There are several use cases where cluster administrators are not able
to install extra packages onto the host OS.
Fixes #3462
Signed-off-by: Alex Szakaly <alex.szakaly@gmail.com>
- Disable the devmapper snapshotter. We are not using it
- Cherry-pick an upstream commit to be able to disable
the devmapper integration tests
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This commit uses the GitHub Actions cache to ensure that the `rtf`
binary can be re-used between runs if it hasn't changed.
It also caches the linuxkit binaries for use in future stages.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit adds the GCP test that formerly ran in LinuxKitCI to run
under rtf.
As GitHub Actions doesn't currently support adding secret files, I've
skipped this test for now. Credentials can be passed via environment
variable, but as RTF runs with `-x` the contents are viewable in the
logs. I will create an issue to follow up and find either a way of
writing the variable to a file without compromising it, or another
approach that is more compatible with GH Actions.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This commit adds a GitHub Actions workflow to replace both CircleCI and
LinuxKit CI.
It will build the Linuxkit binary, run tests and upload artifacts
It replaces the Integration Tests that are run by Linuxkit CI via
the make ci or make ci-pr targets with multiple sets of Integration
Tests that are run in parallel.
It does not yet test GCP. The GCP test in LinuxKit CI could be moved to RTF
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This new snapshot comes from the brand new linux-compat repo, which
follows the recent upstreaming into net-next. When Linux 5.6 lands in
LinuxKit, we'll be able to remove the module entirely.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This adds a new configuration provider that just reads a file.
This is needed for Docker Desktop, where we will run a LinuxKit distro in an isolated namespace within WSL 2.
In this scenario, the config will be accessible through the WSL2 built-in 9p mount of the Windows filesystem.
Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
Allows us to drop some patches we were carrying, since the bugs were
fixed upstream. Gives numerous tooling improvements too.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
Re-enable perf builds for 5.3.x and 4.19.x since they're the latest
stable and LTS, respectively.
Update the bcc build rules to map to these same kernel releases, too.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
The first patch re-adds symbol definitions that were temporarily omitted
from the 4.19 stable branch.
The latter patch corrects the uapi swab.h so that errors about
"unknown type name '__always_inline'" are no longer present in builds.
Without this patch, bcc would build, but attempts to compile the
internal programs at runtime would fail.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
There were some mistakes in the initial code where writes didn't work; this commit fixes that.
Signed-off-by: Simon Fridlund <simon@fridlund.email>
This commit removes the container backend for QEMU.
QEMU and its tools are available on all platforms.
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
If the swap disk is larger than 1MiB, then use a 1MiB blocksize in `dd`.
On my machine, using a large block size speeds up swap file creation:
```
/ # time dd if=/dev/zero of=output bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
real 0m 4.61s
user 0m 0.79s
sys 0m 3.77s
/ # time dd if=/dev/zero of=output bs=1048576 count=1024
1024+0 records in
1024+0 records out
real 0m 1.06s
user 0m 0.00s
sys 0m 1.04s
```
Signed-off-by: David Scott <dave.scott@docker.com>
The KCONFIG_TAG variable can be used to set a custom kconfig tag.
If KCONFIG_TAG is not set, the image is tagged as linuxkit/kconfig:latest.
This is useful for projects that need to build multiple kernels with
different patches.
When trying to edit an unpatched kernel config after working on a patched
kernel config (same kernel version), one had to rerun make kconfig first
in order to edit the config of the unpatched kernel.
Now it is possible to generate a tagged kconfig image and then get the
wanted config by selecting the corresponding linuxkit/kconfig:tag.
Signed-off-by: Gabriel Chabot <gabriel.chabot@qarnot-computing.com>
This commit updates the Scaleway provider to fetch the cloud-init/cloud-config data from the user_data/cloud-init endpoint. It also makes sure the whole public ssh key is fetched, no longer stripping out the `ssh-rsa` part of the keys.
Signed-off-by: Simon Fridlund <simon@fridlund.email>
Sending a boot signal is no longer needed when booting an instance on
Scaleway, so the method is not needed anymore.
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
Update Gophercloud dependencies and also bring in the 'utils'
package. This provides support for configuring access to OpenStack
clouds as detailed in the [official
documentation](https://docs.openstack.org/os-client-config/latest/user/configuration.html).
By relying on this package we can simplify the code required to
interact with OpenStack's APIs. Support is also provided upstream for
self-signed and insecure SSL configurations.
Tested with a public cloud running OpenStack 'Rocky', the latest release.
Signed-off-by: Nick Jones <nick@dischord.org>
The rootfs fs was removed in 5.3.x but was mostly an irrelevant
entry in the filesystems list anyway.
Here is the upstream commit:
commit fd3e007f6c6a0f677e4ee8aca4b9bab8ad6cab9a
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu May 30 17:48:35 2019 -0400
don't bother with registering rootfs
init_mount_tree() can get to rootfs_fs_type directly and that simplifies
a lot of things. We don't need to register it, we don't need to look
it up *and* we don't need to bother with preventing subsequent userland
mounts. That's the way we should've done that from the very beginning.
There is a user-visible change, namely the disappearance of "rootfs"
from /proc/filesystems. Note that it's been unmountable all along
and it didn't show up in /proc/mounts; however, it *is* a user-visible
change and theoretically some script might've been using its presence
in /proc/filesystems to tell 2.4.11+ from earlier kernels.
*IF* any complaints about behaviour change do show up, we could fake
it in /proc/filesystems. I very much doubt we'll have to, though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Short references without domains will now fail parsing on recent
versions of Go, as the net/url parser is more strict.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
The Intel microcode download is moved earlier in the Dockerfile, before
the kernel is actually built, so that it's available in the context of a
build and can be referenced in CONFIG_EXTRA_FIRMWARE for people who want
the microcode built into the kernel.
It is still copied into the out/ directory so that it is still
available for addition in a 'ucode:' section in linuxkit.yml.
Signed-off-by: Yoann Ricordel <yoann.ricordel@qarnot-computing.com>
Copy firmware files to the correct directory: instead of
<vendor>/<fw-name>/<fw-name>, copy to <vendor>/<fw-name>.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Vultr provides an API that looks a lot like the AWS api, resulting in
the AWS provider succeeding, but missing certain metadata parts that one
would expect to work out of the box on Vultr, such as SSH PubKey
fetching.
Signed-off-by: Sachi King <nakato@nakato.io>
The Vultr provider currently never calls handleSSH, resulting in it
being impossible to bring up a LinuxKit image in vultr with the SSH
pubkey provided via the Vultr metadata API.
Signed-off-by: Sachi King <nakato@nakato.io>
This skips 0.0.20190531
Changelog for 0.0.20190601
== Changes ==
* compat: don't call xgetbv on cpus with no XSAVE
There was an issue with the backport compat layer in yesterday's snapshot,
causing issues on certain (mostly Atom) Intel chips on kernels older than
4.2, due to the use of xgetbv without checking cpu flags for xsave support.
This manifested itself simply at module load time. Indeed it's somewhat tricky
to support 33 different kernel versions (3.10+), plus weird distro
frankenkernels.
Changelog for 0.0.20190531
== Changes ==
* tools: add wincompat layer to wg(8)
Consistent with a lot of the Windows work we've been doing this last cycle,
wg(8) now supports the WireGuard for Windows app by talking through a named
pipe. You can compile this as `PLATFORM=windows make -C src/tools` with mingw.
Because programming things for Windows is pretty ugly, we've done this via a
separate standalone wincompat layer, so that we don't pollute our pretty *nix
utility.
* compat: udp_tunnel: force cast sk_data_ready
This is a hack to work around broken Android kernel wrapper scripts.
* wg-quick: freebsd: workaround SIOCGIFSTATUS race in FreeBSD kernel
FreeBSD had a number of kernel race conditions, some of which we can vaguely
work around. These are in the process of being fixed upstream, but probably
people won't update for a while.
* wg-quick: make darwin and freebsd path search strict like linux
Correctness.
* socket: set ignore_df=1 on xmit
This was intended from early on but didn't work on IPv6 without the ignore_df
flag. It allows sending fragments over IPv6.
* qemu: use newer iproute2 and kernel
* qemu: build iproute2 with libmnl support
* qemu: do not check for alignment with ubsan
The QEMU build system has been improved to compile newer versions. Linking
against libmnl gives us better error messages. As well, enabling the alignment
check on x86 UBSAN isn't realistic.
* wg-quick: look up existing routes properly
* wg-quick: specify protocol to ip(8), because of inconsistencies
The route inclusion check was wrong prior, and Linux 5.1 made it break
entirely. This makes a better invocation of `ip route show match`.
* netlink: use new strict length types in policy for 5.2
* kbuild: account for recent upstream changes
* zinc: arm64: use cpu_get_elf_hwcap accessor for 5.2
The usual churn of changes required for the upcoming 5.2.
* timers: add jitter on ack failure reinitiation
Correctness tweak in the timer system.
* blake2s,chacha: latency tweak
* blake2s: shorten ssse3 loop
In every odd-numbered round, instead of operating over the state
x00 x01 x02 x03
x05 x06 x07 x04
x10 x11 x08 x09
x15 x12 x13 x14
we operate over the rotated state
x03 x00 x01 x02
x04 x05 x06 x07
x09 x10 x11 x08
x14 x15 x12 x13
The advantage here is that this requires no changes to the 'x04 x05 x06 x07'
row, which is in the critical path. This results in a noticeable latency
improvement of roughly R cycles, for R diagonal rounds in the primitive. As
well, the blake2s AVX implementation is now SSSE3 and considerably shorter.
* tools: allow setting WG_ENDPOINT_RESOLUTION_RETRIES
System integrators can now specify things like
WG_ENDPOINT_RESOLUTION_RETRIES=infinity when building wg(8)-based init
scripts and services, or 0, or any other integer.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Linux has documented but somewhat unusual behavior around
SIGSTOP/SIGCONT and certain syscalls, of which epoll_wait(2) is one. In
this particular case, rngd exited unexpectedly after getting ptrace'd
mid-epoll_wait. Fix this by handling EINTR from this syscall, and
continuing to add entropy and wait.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
Update the image tag for the mkimage-rpi3 tool used by the CLI to adopt
the dynamic DTB selection feature.
Signed-off-by: Richard Connon <richard@connon.me.uk>
U-Boot sets the variable fdtfile to the correct file name for the
detected hardware revision. Use this in the boot script to load either
the 3-b or 3-b-plus DTB
Signed-off-by: Richard Connon <richard@connon.me.uk>
Update the u-boot image included in the mkimage-rpi3 image to support
detecting newer hardware versions and setting the fdtfile variable
accordingly
Shallow clone the u-boot repository during docker build to improve build
efficiency
Signed-off-by: Richard Connon <richard@connon.me.uk>
This stops the output from also being copied to logs if the user
has a log driver configured.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Update Raspberry Pi firmware used in mkimage-rpi3 to the latest stable
version to support newer hardware models such as the 3B+
Signed-off-by: Richard Connon <richard@connon.me.uk>
Intel seem to have switched to hosting the microcode on GitHub.
Use this source and update to the 20190514 version.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
== Changes ==
* allowedips: initialize list head when removing intermediate nodes
Fix for an important regression in removing allowed IPs from the last
snapshot. We have new test cases to catch these in the future as well.
* wg-quick: freebsd: rebreak interface loopback, while fixing localhost
* wg-quick: freebsd: export TMPDIR when restoring and don't make empty
Two fixes for FreeBSD which have already been backported into ports.
* tools: genkey: account for short reads of /dev/urandom
* tools: add support for Haiku
The tools now support Haiku! Maybe somebody is working on a WireGuard
implementation for it?
* tools: warn if an AllowedIP has a nonzero host part
If you try to run `wg set wg0 peer ... allowed-ips 192.168.1.82/24`, wg(8)
will now print a warning. Even though we mask this automatically down to
192.168.1.0/24, usually when people specify it like this, it's a mistake.
* wg-quick: add 'strip' subcommand
The new strip subcommand prints the config file to stdout after stripping
it of all wg-quick-specific options. This enables tricks such as:
`wg addconf $DEV <(wg-quick strip $DEV)`.
* tools: avoid unneccessary next_peer assignments in sort_peers()
Small C optimization the compiler was probably already doing.
* peerlookup: rename from hashtables
* allowedips: do not use __always_inline
* device: use skb accessor functions where possible
Suggested tweaks from Dave Miller.
* qemu: set framewarn 1280 for 64bit and 1024 for 32bit
These should indicate to us more clearly when we cross the most strict stack
thresholds expected when using recent compilers with the kernel.
* blake2s: simplify
* blake2s: remove outlen parameter from final
The blake2s implementation has been simplified, since we don't use any of the
fancy tree hashing parameters or the like. We also no longer separate the
output length at initialization time from the output length at finalization
time.
* global: the _bh variety of rcu helpers have been unified
* compat: nf_nat_core.h was removed upstream
* compat: backport skb_mark_not_on_list
The usual assortment of compat fixes for Linux 5.1.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Getting compile errors:
AS [M] /wireguard/crypto/zinc/chacha20/chacha20-x86_64.o
In file included from <command-line>:
/wireguard/compat/compat.h:795:10: fatal error: net/netfilter/nf_nat_core.h: No such file or directory
#include <net/netfilter/nf_nat_core.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
With the current firmware being pulled for the RPi3, recent revisions of
the RPi hardware, such as the 3 B+ will fail to boot.
The issue manifests when the RPi 3 B+ receives power and attempts to
boot: the power LED turns off and the ACT LED flashes 8 times.
According to elinux.org troubleshooting guide[0] this correlates to an
SDRAM initialisation error that can be fixed by updating the firmware.
After updating this firmware the power light stays on, and UBoot can be
seen booting.
[0] - https://elinux.org/R-Pi_Troubleshooting#Green_LED_blinks_in_a_specific_pattern
Signed-off-by: Sachi King <nakato@nakato.io>
Commit d47b283df4 ("kernel: Remove fetch target") removed
the 'fetch' target to simplify the Makefile. This left
dependencies on 'sources' lingering. Remove them.
resolves #3333
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Commit 250b14661b ("kernel: Use elfutils-dev instead
of libelf-dev") switched the kernel build to use
elfutils-dev instead of libelf-dev. This caused the kernel
module tests to fail. The still-installed libelf-dev and
the dynamically linked objtool (and friends) from the
kernel source package failed to execute.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
With kernel 5.0.6 we start seeing compile errors such as:
HOSTCXX -fPIC scripts/gcc-plugins/randomize_layout_plugin.o
In file included from <stdin>:1:
/usr/include/libelf/libelf.h:28:5: error: "__LIBELF_INTERNAL__" is not defined, evaluates to 0 [-Werror=undef]
#if __LIBELF_INTERNAL__
^~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
elfutils-dev installs a different version of libelf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
All our 4.x kernels had CFQ enabled. This was removed
in 5.x and replaced with BFQ. Enable it.
resolves #3308
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
To reduce the number of kernels we maintain, for s390x
and arm64 we only support the latest LTS and newer kernels.
v4.19.x has been out for a while, so let's remove support for
v4.14.x.
resolves #3302
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
See https://github.com/moby/moby/issues/38887
for details. Basically 5.x removed support for
CFQ with f382fb0bcef4 ("block: remove legacy IO
schedulers") and the Moby check still requires it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Many places were checking for a major version -ge 4 plus some minor
version. Such checks fail for 5.x kernels when their minor version is
smaller. Fix it.
While at it, also restructure/simplify the code, make it easier
to run against arbitrary kernel configs, and tidy up some
whitespace.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
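The corrected comparison treats (major, minor) as a tuple. The actual check lives in a shell script; `kernelAtLeast` below is a hypothetical Go sketch of the same logic:

```go
package main

import "fmt"

// kernelAtLeast compares a "major.minor" version string against a floor,
// treating (major, minor) as a tuple. A naive check of the form
// `major >= 4 && minor >= 14`, as described above, wrongly rejects 5.0.
func kernelAtLeast(version string, major, minor int) bool {
	var maj, min int
	if n, _ := fmt.Sscanf(version, "%d.%d", &maj, &min); n != 2 {
		return false
	}
	return maj > major || (maj == major && min >= minor)
}

func main() {
	fmt.Println(kernelAtLeast("5.0", 4, 14)) // true: 5.x is newer than 4.14
	fmt.Println(kernelAtLeast("4.9", 4, 14)) // false
}
```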
This target allowed the kernel source tarballs to be downloaded
locally. We haven't used this for a while and adding
v5.x kernel support for it would add yet another conditional.
Remove it to keep the Makefile simpler.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The make-gcp script in the mkimage-gcp tool creates a virtual fs of exactly 1GB. If your filesystem needs to be larger, make-gcp errors in a poorly explained way. Simply removing the arg makes the fs the same size as the image used to build it.
Signed-off-by: Daniel Smith <daniel@razorsecure.com>
The build of the perf utility has been quite bothersome,
with different arches and kernel versions failing.
Since we now have the full kernel source in the package,
factor out the actual build into Dockerfile.perf
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The compile fails with:
[ 30%] Building CXX object src/ast/CMakeFiles/ast.dir/codegen_llvm.cpp.o
[ 30%] Building CXX object src/ast/CMakeFiles/ast.dir/irbuilderbpf.cpp.o
[ 31%] Building CXX object src/ast/CMakeFiles/ast.dir/printer.cpp.o
[ 31%] Building CXX object src/ast/CMakeFiles/ast.dir/semantic_analyser.cpp.o
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateProbeReadStr(llvm::AllocaInst*, size_t, llvm::Value*)':
/bpftrace/src/ast/irbuilderbpf.cpp:279:16: error: 'BPF_FUNC_probe_read_str' was not declared in this scope
getInt64(BPF_FUNC_probe_read_str),
^~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateProbeReadStr(llvm::Value*, size_t, llvm::Value*)':
/bpftrace/src/ast/irbuilderbpf.cpp:294:16: error: 'BPF_FUNC_probe_read_str' was not declared in this scope
getInt64(BPF_FUNC_probe_read_str),
^~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateGetCurrentCgroupId()':
/bpftrace/src/ast/irbuilderbpf.cpp:422:16: error: 'BPF_FUNC_get_current_cgroup_id' was not declared in this scope
getInt64(BPF_FUNC_get_current_cgroup_id),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateGetCurrentTask()':
/bpftrace/src/ast/irbuilderbpf.cpp:461:16: error: 'BPF_FUNC_get_current_task' was not declared in this scope
getInt64(BPF_FUNC_get_current_task),
^~~~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'llvm::CallInst* bpftrace::ast::IRBuilderBPF::CreateGetStackId(llvm::Value*, bool)':
/bpftrace/src/ast/irbuilderbpf.cpp:497:16: error: 'BPF_FUNC_get_stackid' was not declared in this scope
getInt64(BPF_FUNC_get_stackid),
^~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/semantic_analyser.cpp: In member function 'int bpftrace::ast::SemanticAnalyser::create_maps(bool)':
/bpftrace/src/ast/semantic_analyser.cpp:871:68: error: 'BPF_MAP_TYPE_STACK_TRACE' was not declared in this scope
bpftrace_.stackid_map_ = std::make_unique<bpftrace::FakeMap>(BPF_MAP_TYPE_STACK_TRACE);
^~~~~~~~~~~~~~~~~~~~~~~~
/bpftrace/src/ast/semantic_analyser.cpp:885:64: error: 'BPF_MAP_TYPE_STACK_TRACE' was not declared in this scope
bpftrace_.stackid_map_ = std::make_unique<bpftrace::Map>(BPF_MAP_TYPE_STACK_TRACE);
^~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [src/ast/CMakeFiles/ast.dir/build.make:89: src/ast/CMakeFiles/ast.dir/irbuilderbpf.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [src/ast/CMakeFiles/ast.dir/build.make:115: src/ast/CMakeFiles/ast.dir/semantic_analyser.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:276: src/ast/CMakeFiles/ast.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The wireguard package has some sub-packages which are
now dependencies. Include them in the alpine base.
Also include openresolv, which is required by one
of the wireguard packages.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
* wg-quick: freebsd: allow loopback to work
FreeBSD adds a route for point-to-point destination addresses. We don't
really want to specify any destination address, but unfortunately we
have to. Before we tried to cheat by giving our own address as the
destination, but this had the unfortunate effect of preventing
loopback from working on our local ip address. We work around this with
yet another kludge: we set the destination address to 127.0.0.1. Since
127.0.0.1 is already assigned to an interface, this has the same effect
of not specifying a destination address, and therefore we accomplish the
intended behavior. Note that the bad behavior is still present in Darwin,
where such workaround does not exist.
* tools: remove unused check phony declaration
* highlighter: when subtracting char, cast to unsigned
* chacha20: name enums
* tools: fight compiler slightly harder
* tools: c_acc doesn't need to be initialized
* queueing: more reasonable allocator function convention
Usual nits.
* systemd: wg-quick should depend on nss-lookup.target
Since wg-quick(8) calls wg(8) which does hostname lookups, we should
probably only run this after we're allowed to look up hostnames.
* compat: backport ALIGN_DOWN
* noise: whiten the nanoseconds portion of the timestamp
This mitigates unrelated sidechannel attacks that think they can turn
WireGuard into a useful time oracle.
* hashtables: decouple hashtable allocations from the main device allocation
The hashtable allocations are quite large, and cause the device allocation in
the net framework to stall sometimes while it tries to find a contiguous
region that can fit the device struct. To fix the allocation stalls, decouple
the hashtable allocations from the device allocation and allocate the
hashtables with kvmalloc's implicit __GFP_NORETRY so that the allocations fall
back to vmalloc with little resistance.
* chacha20poly1305: permit unaligned strides on certain platforms
The map allocations required to fix this are mostly slower than unaligned
paths.
* noise: store clamped key instead of raw key
This causes `wg show` to now show the right thing. Useful for doing
comparisons.
* compat: ipv6_stub is sometimes null
On ancient kernels, ipv6_stub is sometimes null in cases where IPv6 has
been disabled with a command line flag or other failures.
* Makefile: don't duplicate code in install and modules-install
* Makefile: make the depmod path configurable
* queueing: net-next has changed signature of skb_probe_transport_header
A 5.1 change. This could change again, but for now it allows us to keep this
snapshot aligned with our upstream submissions.
* netlink: don't remove allowed ips for new peers
* peer: only synchronize_rcu_bh and traverse trie once when removing all peers
* allowedips: maintain per-peer list of allowedips
This is a rather big and important change that makes it much much faster to do
operations involving thousands of peers. Batch peer/allowedip addition and
clearing is several orders of magnitude faster now.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
These tests expect a $TMPDIR which supports user xattrs, which the tmpfs on
/tmp does not. Redirect it to the persistent disk which does.
Signed-off-by: Ian Campbell <ijc@docker.com>
... from the old-skool label scheme.
No semantic change intended. Some keys are in different orders and the "mounts"
entry gained an empty "destination" key, neither of which makes a practical
difference.
Signed-off-by: Ian Campbell <ijc@docker.com>
* tools: curve25519: handle unaligned loads/stores safely
This should fix sporadic crashes with `wg pubkey` on certain architectures.
* netlink: auth socket changes against namespace of socket
In WireGuard, the underlying UDP socket lives in the namespace where the
interface was created and doesn't move if the interface is moved. This
allows one to create the interface in some privileged place that has
Internet access, and then move it into a container namespace that only
has the WireGuard interface for egress. Consider the following
situation:
1. Interface created in namespace A. Socket therefore lives in namespace A.
2. Interface moved to namespace B. Socket remains in namespace A.
3. Namespace B now has access to the interface and changes the listen
port and/or fwmark of socket. Change is reflected in namespace A.
This behavior is arguably _fine_ and perhaps even expected or
acceptable. But there's also an argument to be made that B should have
A's cred to do so. So, this patch adds a simple ns_capable check.
* ratelimiter: build tests with !IPV6
Should reenable building in debug mode for systems without IPv6.
* noise: replace getnstimeofday64 with ktime_get_real_ts64
* ratelimiter: totalram_pages is now a function
* qemu: enable FP on MIPS
Linux 5.0 support.
* keygen-html: bring back pure javascript implementation
Benoît Viguier has proofs that values will stay well within 2^53. We
also have an improved carry function that's much simpler. Probably more
constant time than emscripten's 64-bit integers.
* contrib: introduce simple highlighter library
This is the highlighter library being used in:
- https://twitter.com/EdgeSecurity/status/1085294681003454465
- https://twitter.com/EdgeSecurity/status/1081953278248796165
It's included here as a contrib example, so that others can paste it into
their own GUI clients for having the same strictly validating highlighting.
* netlink: use __kernel_timespec for handshake time
This readies us for Y2038. See https://lwn.net/Articles/776435/ for more info.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This also fixes up test/cases/020_kernel/110_namespace/common.yml
and test/cases/040_packages/032_bcc/test.yml to use the 4.19.x
kernel. I missed these when making the 4.19 kernel the default.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
4.19.x is the new LTS kernel and has been out for a while. Switch
all examples and tests to using it instead of the 4.14.x kernel.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The kernel config was derived from the 4.19.13 kernel config
run through the 'make oldconfig' with all defaults accepted,
except for:
- NET_VENDOR_MICROCHIP (default 'y', set to 'n')
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We already have 4.9.x, 4.14.x, and 4.19.x as LTS releases.
4.9.x also has a longer lifetime than 4.4.x, and fewer security
fixes can be backported to 4.4.x. Remove it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Sort the list of mount points by destination. This makes the list
deterministic for reproducible builds and also ensures that, e.g.,
the mount for /dev happens before the mount for /dev/pts.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
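The deterministic ordering described above can be sketched in a few lines of Go. `Mount` here is a simplified stand-in for the real runtime-spec struct:

```go
package main

import (
	"fmt"
	"sort"
)

// Mount is a minimal stand-in for the runtime-spec mount entry.
type Mount struct {
	Destination string
	Source      string
}

// sortMounts orders mounts by destination, making the list deterministic
// for reproducible builds and ensuring that a parent mount point such as
// /dev sorts (and therefore mounts) before a child such as /dev/pts.
func sortMounts(mounts []Mount) {
	sort.Slice(mounts, func(i, j int) bool {
		return mounts[i].Destination < mounts[j].Destination
	})
}

func main() {
	m := []Mount{{Destination: "/dev/pts"}, {Destination: "/proc"}, {Destination: "/dev"}}
	sortMounts(m)
	for _, e := range m {
		fmt.Println(e.Destination)
	}
}
```

Lexicographic order is enough here because a parent path is always a prefix of its children.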
Currently 'docker export' is used to convert a linuxkit entry
in the YAML file to a tar file of the root filesystem. This
process creates a number of files and directories which have
the timestamp of when the 'docker export' is run. Fix 'em up.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
When creating files for the "intermediate" tar ball,
fix the ModTime. This reduces the difference between
LinuxKit images build from identical inputs.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
packet.net will soon have x86 and arm64 machines with NFPs.
Enable the driver for it.
The 4.9 kernel only has support for the NFP VF driver,
so don't enable it there.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Support plain gzip'ed files, as used on arm64, and bzImage with
embedded gzip'ed kernel, as used on x86.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
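Distinguishing the two cases comes down to looking for the gzip magic bytes, either at offset 0 (plain gzip'ed arm64 kernel) or somewhere inside the x86 bzImage. A sketch with hypothetical helper names:

```go
package main

import (
	"bytes"
	"fmt"
)

// gzipMagic is the gzip member header: the two magic bytes plus the
// deflate compression method byte.
var gzipMagic = []byte{0x1f, 0x8b, 0x08}

// isGzip reports whether data starts with a gzip header, as a plain
// gzip'ed arm64 kernel image does.
func isGzip(data []byte) bool {
	return bytes.HasPrefix(data, gzipMagic)
}

// findEmbeddedGzip returns the offset of the first gzip header inside a
// blob such as an x86 bzImage, or -1 if none is found. A real extractor
// would then gunzip from that offset to recover vmlinux.
func findEmbeddedGzip(data []byte) int {
	return bytes.Index(data, gzipMagic)
}

func main() {
	img := append(make([]byte, 512), gzipMagic...) // fake bzImage: padding, then gzip stream
	fmt.Println(isGzip(img), findEmbeddedGzip(img))
}
```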
Add the '-vmlinux' flag to build and pass it all
the way to the kernel filter.
Note, this commit only adds the flag but does not
yet perform the decompression. This will be added
with the next commit.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Stash the kernel image in a local buffer and
flush it out once done.
This is preparation work for supporting uncompressed
kernels in the next commit.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Please add your use cases here. There are many adopters I know about but
have not documented here; please fill this in.
I have divided this into production users, and also linked a selection of
open source projects that I know about.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
A previous commit removed support for 4.18.x kernels for
arm64 and s390x but did not remove the config files. Fix it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Needed for containerd v1.2.0 otherwise:
$ ctr run -t docker.io/library/hello-world@sha256:f3b3b28a45160805bb16542c9531888519430e9e6d6ffc09d72261b0d26ff74f test
[ 1311.667587] overlayfs: failed to resolve '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5/fs': -2
ctr: failed to mount /tmp/containerd-mount111658703: no such file or directory
Signed-off-by: Ian Campbell <ijc@docker.com>
On Linux a key in `~/.docker/config.json` indicates if a credentials helper is
in use (and which), if one is then the method is identical to the Darwin case
so refactor to support that.
Signed-off-by: Ian Campbell <ijc@docker.com>
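The relevant key is `credsStore`, which is the real Docker config key; `credentialsHelper` below is a hypothetical sketch of reading it (real code would also handle the per-registry `credHelpers` map):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig models the subset of ~/.docker/config.json relevant here:
// "credsStore" names the credentials helper, if one is configured.
type dockerConfig struct {
	CredsStore string `json:"credsStore"`
}

// credentialsHelper returns the configured helper suffix. An empty string
// means no helper is in use, so the auths embedded in config.json are
// read directly instead.
func credentialsHelper(configJSON []byte) (string, error) {
	var cfg dockerConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	return cfg.CredsStore, nil
}

func main() {
	helper, _ := credentialsHelper([]byte(`{"credsStore": "osxkeychain"}`))
	// By convention the helper binary is named docker-credential-<suffix>.
	fmt.Println(helper)
}
```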
If the YAML does not specify a kernel, kernel commandline
or any containers, don't create empty files. Note, an
initrd file is still created if the kernel image contains
CPU ucode.
This only applies to kernel+initrd and tar-kernel-initrd
output formats.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The logic for perf became too complex. Just build for latest LTS
and latest stable.
Disable for arm64 for now as it is broken for 4.19 due to a header
mismatch:
In file included from /linux/tools/arch/arm64/include/uapi/asm/unistd.h:20:0,
from libbpf.c:36:
/linux/tools/include/uapi/asm-generic/unistd.h:754:0: error: "__NR_fcntl" redefined [-Werror]
In file included from /usr/include/sys/syscall.h:4:0,
from /linux/tools/perf/perf-sys.h:7,
from libbpf.c:35:
/usr/include/bits/syscall.h:26:0: note: this is the location of the previous definition
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The kernel configs were constructed by running the 4.18.x config
through the 4.19 oldconfig process.
The 4.19.x kernel has a new option, RANDOM_TRUST_CPU, which indicates
whether the CPU's random number instruction is to be trusted. It defaults
to "no" and this default was accepted.
Most of the defaults were accepted, except for:
BLK_CGROUP_IOLATENCY=y
NFT_TUNNEL=y
NFT_OSF=y
NFT_TPROXY=y
NETFILTER_XT_MATCH_SOCKET=y
NET_VENDOR_CADENCE=n
NET_VENDOR_NETERION=n
NET_VENDOR_PACKET_ENGINES=n
We also disallow CIFS for insecure legacy servers:
CIFS_ALLOW_INSECURE_LEGACY=n
For arm64, the following changes were made to the default:
SENSORS_RASPBERRYPI_HWMON=y
CRYPTO_DEV_QCOM_RNG=m
CRYPTO_DEV_HISI_SEC=m
For s390x, the following additional changes were made to the default:
KERNEL_BZIP2 (default is gzip)
GCC_PLUGINS=y
GCC_PLUGIN_STRUCTLEAK=y
GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
GCC_PLUGIN_RANDSTRUCT=y
GCC_PLUGIN_RANDSTRUCT_PERFORMANCE=y
Running the 4.18 and 4.19 kernel configs through
./scripts/kconfig-split.py yields the following 4.19.x-only
config options for x86_64:
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_BNXT_HWMON=y
CONFIG_BUILD_SALT=""
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_CRASH_CORE=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_MEMCG_KMEM=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_NETFILTER_NETLINK_OSF=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
CONFIG_NFT_OSF=y
CONFIG_NFT_TPROXY=y
CONFIG_NFT_TUNNEL=y
CONFIG_NF_SOCKET_IPV4=y
CONFIG_NF_SOCKET_IPV6=y
CONFIG_XEN_SCRUB_PAGES_DEFAULT=y
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
After 'make oldconfig' we check that the kernel config
is as we expect and error out if it isn't. We used to print
the default 'diff' output on a mismatch, but a unified diff
is easier to read.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Using filepath primitives instead of manipulating file paths manually takes care of platform specific formats.
Signed-off-by: Mathieu Champlon <mathieu.champlon@docker.com>
This cherry picks:
- b6fe0440c637 ("bridge: implement missing ndo_uninit()")
- b1b9d366028f ("bridge: move bridge multicast cleanup to ndo_uninit")
The fix is in b1b9d366028f ("bridge: move bridge multicast cleanup
to ndo_uninit") but it requires b6fe0440c637 ("bridge: implement missing
ndo_uninit()"). Furthermore, b1b9d366028f needed some manual resolution
of a cherry-pick conflict because the surrounding code had changed.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We want to compile BCC for the latest LTS and the latest
stable and missed the update to 4.18 when enabling it. Do
it now.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Support SPI in container environments (introduced in Linux 4.12, 2017-06-02).
This enables the namespace abstraction for the CAN module in containerized
environments. The namespace support was introduced with Linux kernel 4.12 by
M. Kicherer and later O. Hartkopp, to allow containers to bridge such devices.
@see linux-kernel/net/can@fc4c581
Although the KSPP did not explicitly note `CAN` as a secure kernel flag, this
change aims in that direction. As for security concerns, the CAN protocol has
not yielded any user-land or host-level vulnerabilities since it was
introduced as the SocketCAN module in the Linux kernel. Lower-layer [protocol]
standards are not secured by default, since applications are expected to
implement their own security mechanisms.
This global abstraction currently supports the CAN raw, proc and af_can
code. It does not support GW and BCM. The namespace uses _NEWNET on a
pseudo-file system, allows modprobe into the environment, and works by
recv of `pnet` for the given interface.
Signed-off-by: Halis Duraki <duraki@linuxmail.org>
Note, this update skips 4.18.2/4.17.16/4.14.64/4.9.121/4.4.149
as the change was a single patch, a bug fix.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
In setup_net() there are a few particularly slow subsystems that
contribute more than 140ms of time to the new net namespace creation
path. The docker daemon doesn't depend on these, and won't modprobe
them into the kernel. Convert these to modules to reduce the amount of
time it takes for docker to start a container. This change takes an
additional ~120 ms of time off container start time.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
While investigating performance problems around 'docker run' times, it
was observed that a large amount of time was spent in network namespace
creation. Of that time, a large portion involved waiting for RCU grace
periods to elapse. Increasing HZ causes the periodic timer to check for
quiesced periods more frequently, which consequently reduces the amount
of time RCU callers spend waiting for grace periods and in barrier
waits.
By itself, this change took the amount of time to execute a 'docker run
hello-world' down to 570ms from over 2000ms on 4.14, and down to 390ms
from 1260ms on 4.17 and 4.18.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
The kernel config was derived from the 4.17.x kernel config
and then tweaked a little. Specifically:
- Enable XDP_SOCKETS
- Enable NFT_CONNLIMIT
- Enable IP_VS_MH
- Enable BPFILTER (as module)
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The 4.14.63 contains important security fixes in particular
against L1TF (CVE-2018-3615, CVE-2018-3620, CVE-2018-3646) and
userspace-userspace SpectreRSB.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
linuxkit/vsudd:98e554e4f3024c318e42c1f6876b541b654acd9f
linuxkit/host-timesync-daemon:613dc55e67470ec375335a1958650c3711dc4aa6
linuxkit/test-virtsock:57883002c2bc824709efa6cd3818e1ff51a11889
linuxkit/test-ns:a21f996641f391d467a7842e85088a304d24fae5
Signed-off-by: David Scott <dave.scott@docker.com>
In addition to bug fixes, this removes the special protocol used
for `shutdown` needed by old Windows builds < 14393.
Signed-off-by: David Scott <dave.scott@docker.com>
Note: this patch introduces an incompatibility in the
`linuxkit run vbox` arguments.
It wasn't possible to specify more than one network adapter
with the `linuxkit run vbox` command.
This patch allows specifying more than one `-networking` argument to
configure different network adapters.
For instance:
~~~sh
linuxkit run vbox -networking type=nat -networking type=hostonly,adapter=vboxnet0
~~~
will setup the VM with 2 NICs.
It is also possible to get rid of the `type` argument.
Signed-off-by: Brice Figureau <brice@daysofwonder.com>
VirtualBox hardware (like physical hardware) has only a limited number
of IDE devices per IDE controller.
Unfortunately, when using an additional drive, it was given the port
value of 2, which doesn't exist on VirtualBox IDE controllers (as
only 0 and 1 are permitted).
This change hooks additional drives up to the SATA controller, which
can host many more drives.
Signed-off-by: Brice Figureau <brice@daysofwonder.com>
While processing the content of a tar image, linuxkit's moby tool is
blindly reusing the original tar format.
Moreover, it relocates the files under a new prefix, so if the original
file was stored as USTAR in the original archive, the filename length
and new prefix could be greater than the USTAR name limit leading
to a fatal error.
The fix is to always enforce PAX format on all copied files from the
original image archive.
Signed-off-by: Brice Figureau <brice-puppet@daysofwonder.com>
When building the build context, symlinks need special
treatment as the link name needs to be added when
building the tar.FileInfoHeader. This code does that.
We may also need to add a special case for hard links,
as the moby/moby 'archive' package does, but this
should suffice for now.
fixes #3142
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
_This list is currently under construction. Please add your use cases to this with a PR. Thanks!_
# Production Users
**_[Docker Desktop](https://www.docker.com/products/docker-desktop)_** - Docker Desktop for Mac and Windows uses LinuxKit to provide an embedded, invisible virtual machine in order to run Linux containers and to run Kubernetes. There are currently millions of active users.
**_[TagHub](https://www.taghub.net)_** - TagHub is a SaaS product for doing asset management. We use LinuxKit to have small and secure linux nodes powering our multi-cloud infrastructure. TagHub is made by [Smart Management](http://www.smartm.no/).
# Projects Using LinuxKit
**_[LinuxKit Nix](https://github.com/nix-community/linuxkit-nix)_** aims to provide a Linux Nix VM for macOS.
**_[cfdev](https://github.com/cloudfoundry-incubator/cfdev)_** A fast and easy local Cloud Foundry experience on native hypervisors, powered by LinuxKit with VPNKit
**_[dm-linuxkit](https://github.com/dotmesh-io/dm-linuxkit)_** A dotmesh controller for LinuxKit persistent storage management.
**_[Linux Foundation Edge EVE](https://github.com/lf-edge/eve)_** Edge Virtualization Engine Operating System
LinuxKit, a toolkit for building custom minimal, immutable Linux distributions.
- Completely stateless, but persistent storage can be attached
- Easy tooling, with easy iteration
- Built with containers, for running containers
- Designed to create [reproducible builds](./docs/reproducible-builds.md) [WIP]
- Designed for building and running clustered applications, including but not limited to container orchestration such as Docker or Kubernetes
- Designed from the experience of building Docker Editions, but redesigned as a general-purpose toolkit
- Designed to be managed by external tooling, such as [Infrakit](https://github.com/docker/infrakit) (renamed to [deploykit](https://github.com/docker/deploykit) which has been archived in 2019) or similar tools
- Includes a set of longer-term collaborative projects in various stages of development to innovate on kernel and userspace changes, particularly around security
LinuxKit currently supports the `x86_64`, `arm64`, and `s390x` architectures on a variety of platforms, both as virtual machines and baremetal (see [below](#booting-and-testing) for details).
- [linux](https://github.com/linuxkit/linux) A copy of the Linux stable tree with branches for LinuxKit kernels.
- [virtsock](https://github.com/linuxkit/virtsock) A `go` library and test utilities for `virtio` and Hyper-V sockets.
- [rtf](https://github.com/linuxkit/rtf) A regression test framework used for the LinuxKit CI tests (and other projects).
- [homebrew](https://github.com/linuxkit/homebrew-linuxkit) Homebrew packages for the `linuxkit` tool.
## Getting Started
LinuxKit uses the `linuxkit` tool for building, pushing and running VM images.
Simple build instructions: use `make` to build. This will build the tool in `bin/`. Add this
to your `PATH` or copy it to somewhere in your `PATH`, e.g. `sudo cp bin/* /usr/local/bin/`. Or you can use `sudo make install`.
If you already have `go` installed you can use `go install github.com/linuxkit/linuxkit/src/cmd/linuxkit@latest` to install the `linuxkit` tool.
On MacOS there is a `brew tap` available. Detailed instructions are at [linuxkit/homebrew-linuxkit](https://github.com/linuxkit/homebrew-linuxkit),
the short summary is
```sh
brew tap linuxkit/linuxkit
brew install --HEAD linuxkit
```
Build requirements from source using a container:
- GNU `make`
- Docker
- optionally `qemu`
For a local build using `make local`:
- `go`
- `make`
- `go get -u golang.org/x/lint/golint`
- `go get -u github.com/gordonklaus/ineffassign`
### Building images
Once you have built the tool, use
This should allow end-users to gracefully reboot or shutdown Kubernetes nodes (including control planes) running on vSphere Hypervisor.
Furthermore, it is also mandatory to have `open-vm-tools` installed on your Kubernetes nodes to use the vSphere Cloud Provider (i.e. to determine the virtual machine's FQDN).
## Remarks:
- `spec.template.spec.hostNetwork: true`: correctly report node IP address; required
- `spec.template.spec.hostPID: true`: send the right signal to the node, instead of killing the container itself; required
- `spec.template.spec.priorityClassName: system-cluster-critical`: critical to a fully functional cluster
- `spec.template.spec.securityContext.privileged: true`: gain more privileges than its parent process; required
```sh
git commit -a -s -m "pkgs: Update packages to the latest linuxkit/alpine"
# update package tags - may want to include the release in it if set
cd $LK_ROOT
make update-package-tags
MSG=""
[ -n "$LK_RELEASE" ] && MSG="to $LK_RELEASE"
git commit -a -s -m "Update package tags $MSG"
git push $LK_REMOTE $LK_BRANCH
```
#### Update tools packages
On your primary build machine, update the other tools packages.
Note, the `git checkout` reverts changes that
`update-component-sha.sh` accidentally made to files.
The `git checkout` of `grub` is important: grub is a bit old and can only
be built with specific older versions of packages like `gcc`, so it should
not be updated.
Then we update any dependencies of these tools.
#### Update test packages
Next, we update the test packages to the updated alpine base.
Then we update all uses of the test packages to the latest versions.
Some tests also use `linuxkit/alpine`, so we update them as well.
### Update packages
Next, we update the LinuxKit packages. This is really the core of the
release. The other steps above are just there to ensure consistency
across packages.
#### External Tools
Most of the packages are built from `linuxkit/alpine` and source code
in the `linuxkit` repository, but some packages wrap external
tools. Updating all packages, especially around the time of a release,
is a good opportunity to check if there have been updates. Specifically:
- `pkg/cadvisor`: Check for [new releases](https://github.com/google/cadvisor/releases).
- `pkg/firmware` and `pkg/firmware-all`: Use the latest commit from [here](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git).
- `pkg/node_exporter`: Check for [new releases](https://github.com/prometheus/node_exporter/releases).
- Check [docker hub](https://hub.docker.com/r/library/docker/tags/) for the latest `dind` tags and update `examples/docker.yml`, `examples/docker-for-mac.yml`, `examples/cadvisor.yml`, and `test/cases/030_security/000_docker-bench/test.yml` if necessary.
This is at your discretion.
### Build and push affected downstream packages
**Note**: All of the `make push` and `make forcepush` in this section use `linuxkit pkg push`, which will build for all architectures and push
the images out. See [Build Platforms](./packages.md#Build_Platforms).
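As a sketch (the package path `pkg/foo` is a placeholder), pushing a single changed package directly looks like this:

```shell
# Build the package for all configured architectures and push the
# multi-arch index to the registry; this is what `make push` runs
# per package.
linuxkit pkg push pkg/foo
```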
This document describes how to install and maintain a LinuxKit development platform. It will grow over time.
The LinuxKit team also maintains several Linux-based build platforms. These are donated by Equinix Metal (arm64) and IBM (s390x).
## Platform-Specific Installation
### arm64 and amd64
The `amd64` and `arm64` platforms are fully supported by most OS vendors and Docker. Just upgrade to the latest OS and install the latest Docker using the
packaging tools. As of this writing, that is:
* Ubuntu/Debian with `apt`
* RHEL/CentOS/Fedora with `yum`. For any of these, use the CentOS 7/8 packages as released by Docker.
Docker does not recommend using the packages released by the OS vendors, as those tend to be out of date; follow the instructions from Docker instead.
The s390x platform has modern versions of most OSes, including RHEL and Ubuntu, but does not have recent versions of docker, either as
`apt` packages for Ubuntu or as static downloads; the static downloads that do exist are mostly replicas.
This section describes how to install modern versions of Docker on these platforms.
#### RHEL
RHEL 7 on s390x only has releases from Docker. Follow the instructions from Docker to install. The rpm packages for RHEL are available at
https://download.docker.com/linux/rhel/
#### Ubuntu
Docker does not release packages for Ubuntu on s390x. The most recent release was for Ubuntu 18.04 Bionic, with Docker version 18.06.3.
This is quite old, and does not support modern capabilities, e.g. buildkit.
To install a more modern version:
1. Upgrade any dependent apt packages `apt upgrade`
1. Upgrade the operating system to your desired version `do-release-upgrade -d`. Note that you can set which versions to suggest via changing `/etc/update-manager/release-upgrades`
1. Download the necessary rpms (yes, rpms) from the Docker RHEL7 site. These are available [here](https://download.docker.com/linux/rhel/7/s390x/stable/Packages/). You need the following packages:
   * `containerd.io-*.rpm`
   * `docker-ce-*.rpm`
   * `docker-ce-cli-*.rpm`
1. Install alien: `apt install alien`
1. Convert each package to a dpkg `alien --scripts <source-rpm-file.rpm>`
1. Install each package with `dpkg -i <package>.deb`. Dependency management is not great, so we recommend installing them in order:
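The conversion and install steps above can be sketched as follows (the filenames are illustrative, and the install order reflects the dependency chain: containerd first, then the CLI, then the engine):

```shell
apt install -y alien
# Convert each downloaded rpm to a deb; alien writes the .deb next to the .rpm.
for rpm in containerd.io-*.rpm docker-ce-cli-*.rpm docker-ce-*.rpm; do
  alien --scripts "$rpm"
done
# Install in dependency order.
dpkg -i containerd.io_*.deb
dpkg -i docker-ce-cli_*.deb
dpkg -i docker-ce_*.deb
```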
In the packages section you can find an image to setup dm-crypt encrypted devices in [linuxkit](https://github.com/linuxkit/linuxkit)-generated images.
The above will map `/dev/sda1` as an encrypted device under `/dev/mapper/dm_crypt_name` and mount it under `/var/secure_storage`
The `dm-crypt` container by default bind-mounts `/dev:/dev` and `/etc/dm-crypt:/etc/dm-crypt`. It expects the encryption key to be present in the file `/etc/dm-crypt/key`. You can pass an alternative location as encryption key which can be either a file path relative to `/etc/dm-crypt` or an absolute path.
Providing an alternative encryption key file name:
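A minimal sketch of such a configuration, modelled on the other package examples in these docs (the image hash, entrypoint path, key name, and device names are assumptions, not taken from the package itself):

```yaml
onboot:
  - name: dm-crypt
    image: linuxkit/dm-crypt:<hash>
    # "-k my-other-key" is resolved relative to /etc/dm-crypt unless absolute
    command: ["/usr/bin/crypto", "-k", "my-other-key", "dm_crypt_name", "/dev/sda1"]
```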
Note that you have to also map `/dev:/dev` explicitly if you override the default bind-mounts.
The `dm-crypt` container
* Will create an `ext4` file system on the encrypted device if none is present.
* It will also initialize the encrypted device by filling it from `/dev/zero` prior to creating the filesystem, which means first-time setup of a device might take a bit longer.
* Uses the `aes-cbc-essiv:sha256` cipher (it's explicitly specified in case the default ever changes)
* Consequently the encryption key is expected to be 32 bytes long; a random one can be created via
```shell
dd if=/dev/urandom of=dm-crypt.key bs=32 count=1
```
If you see the error `Cannot read requested amount of data.` next to the log message `Creating dm-crypt mapping for ...` then this means your keyfile doesn't contain enough data.
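As a quick sanity check (a sketch; the file name is arbitrary), you can verify that the generated keyfile is exactly 32 bytes before baking it into an image:

```shell
# Generate a 32-byte key and verify its size; a shorter file would
# trigger "Cannot read requested amount of data." at mapping time.
dd if=/dev/urandom of=dm-crypt.key bs=32 count=1 2>/dev/null
test "$(wc -c < dm-crypt.key)" -eq 32 && echo "key OK"
```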
### Examples
There are two examples in the `examples/` folder:
1. `dm-crypt.yml` - formats an external disk and mounts it encrypted.
2. `dm-crypt-loop.yml` - mounts an encrypted loop device backed by a regular file sitting on an external disk
### Options
|Option|Default|Required|Notes|
|---|---|---|---|
|`-k` or `--key`|`key`|No|Encryption key file name. Must be either relative to `/etc/dm-crypt` or an absolute file path.|
|`-l` or `--luks`||No|Use LUKS format for encryption|
|`<dm_name>`||**Yes**|The device-mapper device name to use. The device will be mapped under `/dev/mapper/<dm_name>`|
In order to make the disk available, you need to tell `linuxkit` where the disk file or block device is.
All local `linuxkit run` methods (currently `hyperkit`, `qemu`, `virtualization.framework`, and `vmware`)
take a `-disk` argument:
* `-disk path,size=100M,format=qcow2`. For size the default is in GB but an `M` can be appended to specify sizes in MB. The format can be omitted for the platform default, and is only useful on `qemu` at present.
- `-force` can be used to force the partition to be cleared and recreated (if applicable), and the recreated partition formatted. This option would be used to re-init the partition on every boot, rather than persisting the partition between boots.
- `-label` can be used to give the disk a label
- `-type` can be used to specify the type. This is `ext4` by default but `btrfs` and `xfs` are also supported
- `-partition` can be used to specify the partition table type. This is `dos` by default but `gpt` is also supported
- `-verbose` enables verbose logging, which can be used to troubleshoot device auto-detection and (re-)partitioning
- The final (optional) argument specifies the device name
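A hedged sketch of how these options might appear in an image configuration (the image hash, label, and device name are placeholders, and the entrypoint path is an assumption):

```yaml
onboot:
  - name: format
    image: linuxkit/format:<hash>
    # Format /dev/sda as ext4 with label DATA; add "-force" to re-init
    # the partition on every boot instead of persisting it.
    command: ["/usr/bin/format", "-type", "ext4", "-label", "DATA", "/dev/sda"]
```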
Please open an issue if you want to add a question here.
LinuxKit does not require being installed on a disk; it is often run from an ISO, PXE or other
such means, so it does not require an on disk upgrade method such as the ChromeOS code that
is often used. It would definitely be possible to use that type of upgrade method if the
system is installed, and it would be useful to support this for that use case, along with an
updater container to control it for people who want to use it.
If you're not seeing `containerd` logs in the console during boot, make sure that
`init` and other processes like `containerd` will use the last defined console in the kernel `cmdline`. When using `qemu`, to see the console you need to list `ttyS0` as the last console to properly see the output.
## Enabling and controlling containerd logs
On startup, linuxkit looks for and parses a file `/etc/containerd/runtime-config.toml`. If it exists, the content is used to configure containerd runtime.
Sample config is below:
```toml
cliopts="--log-level debug"
stderr="/var/log/containerd.out.log"
stdout="stdout"
```
The options are as follows:
* `cliopts`: options to pass to the containerd command-line as is.
* `stderr`: where to send stderr from containerd. If blank, it sends it to the default stderr, which is the console.
* `stdout`: where to send stdout from containerd. If blank, it sends it to the default stdout, which is the console. containerd normally does not have any stdout.
The `stderr` and `stdout` options can take exactly one of the following options:
* `stderr` - send to stderr
* `stdout` - send to stdout
* any absolute path (beginning with `/`) - send to that file. If the file exists, append to it; if not, create it and append to it.
Thus, to enable
a higher log level, for example `debug`, create a file whose contents are `--log-level debug` and place it on the image:
```yml
files:
  - path: /etc/containerd/runtime-config.toml
    source: "/path/to/runtime-config.toml"
    mode: "0644"
```
Note that the package that parses the `cliopts` splits on _all_ whitespace. It does not, as of this writing, support shell-like parsing, so the following will work:
```
--log-level debug --arg abcd
```
while the following will not:
```
--log-level debug --arg 'abcd def'
```
## Troubleshooting containers
Linuxkit runs all services in a specific `containerd` namespace called `services.linuxkit`. To list all the defined containers:
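The listing itself (assuming the `ctr` client is available in the running image) is along these lines:

```shell
# List the containers defined in the linuxkit services namespace.
ctr --namespace services.linuxkit containers list
```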
linuxkit builds each runtime OS image from a combination of Docker images.
These images are pulled from a registry and cached locally.
linuxkit does not use the docker image cache to store these images. This is
for two key reasons.
First, docker does not provide good support for multiple architecture versions of an image in its
cache. For example, pulling `docker.io/library/alpine:3.13` by manifest, with its signature, but
getting the `arm64` version while you are on an `amd64` device, is not supported.
Second, and more importantly, this requires a running docker daemon. Since the
very essence of linuxkit is removing daemons and operating systems where unnecessary,
just laying down bits in a file, removing docker from the image build process
is valuable. It also simplifies many use cases, like CI, where a docker daemon
may be unavailable.
## How LinuxKit Caches Images
LinuxKit pulls images down from a registry and stores them in a local cache.
It stores the root manifest or index of the image, the manifest, and all of the layers
for the requested architecture. It does not pull down layers, manifest or config
for all available architectures, only the requested one. If none is requested, it
defaults to the architecture on which you are running.
By default, LinuxKit caches images in `~/.linuxkit/cache/`. It can be changed
via a command-line option. The structure of the cache directory matches the
[OCI spec for image layout](http://github.com/opencontainers/image-spec/blob/master/image-layout.md).
Image names are kept in `index.json` in the [annotation](https://github.com/opencontainers/image-spec/blob/master/annotations.md) `org.opencontainers.image.ref.name`. For example:
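A cache `index.json` carrying the name annotation might look like this (the digest and size are illustrative placeholders):

```json
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "digest": "sha256:0123456789abcdef...",
      "size": 1638,
      "annotations": {
        "org.opencontainers.image.ref.name": "docker.io/library/alpine:3.13"
      }
    }
  ]
}
```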
Image to setup a loop device backed by a regular file in a [linuxkit](https://github.com/linuxkit/linuxkit)-generated image. The typical use case is to have a portable storage location which can be used to persist settings or other files. Can be combined with the `linuxkit/dm-crypt` package for protection.
## Usage
The setup is a one time step during boot:
```yaml
onboot:
  - name: losetup
    image: linuxkit/losetup:<hash>
    command: ["/usr/bin/loopy", "-c", "/var/test.img"]
```
The above will associate the file `/var/test.img` with `/dev/loop0` and will also create it if it's not present.
The container by default bind-mounts `/var:/var` and `/dev:/dev`. Usually the loop-file will reside on external storage which should be typically mounted under `/var` hence the choice of the defaults. If the loop-file is located somewhere else and you need a different bind-mount for it then do not forget to explicitly bind-mount `/dev:/dev` as well or else `losetup` will fail.
### Options
|Option|Default|Required|Notes|
|---|---|---|---|
|`-c` or `--create`||No|Creates the file if not present. If `--create` is not specified and the file is missing then the loop setup will obviously fail.|
|`-s` or `--size`|10|No|If `--create` was specified and the file is not present then this sets the size in MiB of the created file. The file will be filled from `/dev/zero`.|
|`-d` or `--dev`|`/dev/loop0`|No|Loop device which should be associated with the file.|
|`<file>`||**Yes**|The file to use as backing storage.|
packages, as it's very easy. Packages are the unit of customisation
in a LinuxKit-based project; if you know how to build a container,
you should be able to build a LinuxKit package.
All official LinuxKit packages are:
- Enabled with multi-arch indexes to work on multiple architectures.
- Derived from well-known sources for repeatable builds.
- Built with multi-stage builds to minimise their size.
## CI and Package Builds
When building and merging packages, it is important to note that our CI process builds packages. The targets `make ci` and `make ci-pr` execute `make -C pkg build`. These in turn execute `linuxkit pkg build` for each package under `pkg/`. This in turn will try to pull the image whose tag matches the tree hash or, failing that, to build it.
We do not want the builds to happen with each CI run for two reasons:
1. Any released image, i.e. any package under `pkg/` that has _not_ changed as
part of a pull request, will already have been released to Docker Hub. This will
cause CI to download that image, rather than try to build it.
2. Any non-released image, i.e. any package under `pkg/` that _has_ changed as part of
a pull request, will not be in Docker Hub until the PR has merged.
This will cause the download to fail, leading `linuxkit pkg build` to try and build the
image and save it in the cache.
This does have two downsides:
1. It is slower to do a package build than to just pull the latest image.
2. If any of the steps of the build fails, e.g. a `curl` download that depends on an intermittent target, it can cause all of CI to fail.
Thus, if, as a maintainer, you merge any commits into a `pkg/`, even if the change is documentation alone, please do a `linuxkit pkg push`.
In the past, each PR required a maintainer to build, and push to Docker Hub, every
changed package in `pkg/`. This placed the maintainer in the PR cycle, with the
following downsides:
1. A maintainer had to be involved in every PR, not just reviewing but actually building and pushing. This reduces the ability for others to contribute.
1. The actual package is pushed out by a person, violating good supply-chain practice.
## Package source
A package source consists of a directory containing at least two files:
- `extra-sources` _(list of strings)_: Additional sources for the package outside the package directory. The format is `src:dst`, where `src` can be relative to the package directory and `dst` is the destination in the build context. This is useful for sharing files, such as vendored go code, between packages.
- `gitrepo` _(string)_: The git repository where the package source is kept.
- `network` _(bool)_: Allow network access during the package build (default: no)
- `disable-content-trust` _(bool)_: Disable Docker content trust for this package (default: no)
- `disable-cache` _(bool)_: Disable build cache for this package (default: no)
- `buildArgs` _(list of strings)_: Build arguments forwarded to docker, as if `--build-arg` had been specified during `docker build`.
- `config` _(struct `github.com/moby/tool/src/moby.ImageConfig`)_: Image configuration, marshalled to JSON and added as `org.mobyproject.config` label on image (default: no label)
- `depends`: Contains information on prerequisites which must be satisfied in order to build the package. Has subfields:
  - `docker-images`: Docker images to be made available (as `tar` files via `docker image save`) within the package build context. Contains the following nested fields:
### Prerequisites
Before you can build packages you need:
- Docker version 19.03 or newer.
- If you are on a Mac you also need `docker-credential-osxkeychain.bin`, which comes with Docker for Mac.
- `make`, `base64`, `jq`, and `expect`
- A *recent* version of `manifest-tool` which you can build with `make
bin/manifest-tool`, or `go get github.com/estesp/manifest-tool`, or
via the LinuxKit homebrew tap with `brew install --HEAD
Further, when building packages you need to be logged into hub with
`docker login` as some of the tooling extracts your hub credentials
during the build.
### Build Targets
LinuxKit builds packages as docker images. It deposits the built package as a docker image in one or both of two targets:
* the linuxkit cache, which is at `~/.linuxkit/cache/` (configurable)
* the docker image cache (optional)
The package _always_ is built and saved in the linuxkit cache. However, you _also_ can load the package for the current
architecture, if available, into the docker image cache.
If you want to build images and test and run them _in a standalone_ fashion locally, then you should add the docker image cache.
Otherwise, you don't need anything more than the default linuxkit cache. LinuxKit defaults to building OS images using docker
images from this cache, only looking in the docker cache if instructed to via `linuxkit build --docker`.
In the linuxkit cache, it creates all of the layers, the manifest that can be uploaded
to a registry, and the multi-architecture index. If an image already exists for a different architecture in the cache,
it updates the index to include additional manifests created.
The order of building is as follows:
1. Build the image to the linuxkit cache
1. If `--docker` is provided, load the image into the docker image cache
For example:
```bash
linuxkit pkg build pkg/foo # builds pkg/foo and places it in the linuxkit cache
linuxkit pkg build pkg/foo --docker # builds pkg/foo and places it in the linuxkit cache and also loads it into docker
```
#### Build Platforms
By default, `linuxkit pkg build` builds for all supported platforms in the package's `build.yml`, whose syntax is available
above under [Package source](#package-source). If no platforms are provided in the `build.yml`, it builds for all platforms that linuxkit supports.
As of this writing, those are:
* `linux/amd64`
* `linux/arm64`
* `linux/s390x`
You can choose to skip one or more of the platforms from `build.yml`, or those selected by default, using `--skip-platforms`, e.g. `--skip-platforms linux/s390x`.
The options for `--platforms` are identical to those for [docker build](https://docs.docker.com/engine/reference/commandline/build/).
An example is available in the official [buildx documentation](https://docs.docker.com/buildx/working-with-buildx/#build-multi-platform-images).
Given that this is linuxkit, i.e. all builds are for linux, the `OS` part would seem redundant, and it should be sufficient to pass `--platform arm64`. However, for complete consistency, the _entire_ platform, e.g. `--platforms linux/amd64,linux/arm64`, must be provided.
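For instance (with `pkg/foo` as a placeholder package), restricting a build to a subset of platforms might look like:

```shell
# Build only for the two named platforms; platforms whose images are
# already present are not rebuilt.
linuxkit pkg build --platforms linux/amd64,linux/arm64 pkg/foo
```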
#### Where it builds
You are running the `linuxkit pkg build` command on a single platform, e.g. a local linux machine or cloud instance running on `amd64`, or
a MacBook with Apple Silicon running on `arm64`.
How does linuxkit determine where to build the target images?
linuxkit uses [buildkit](https://github.com/moby/buildkit) directly to build all images.
It uses docker contexts to determine _where_ to run those buildkit containers, based on the target
architecture.
When running a package build, linuxkit looks for a container named `linuxkit-builder`, running the appropriate
version of buildkit. If it cannot find a container with that name, it creates it.
If the container already exists but is not running buildkit, or if the version is incorrect, linuxkit stops and removes
the existing `linuxkit-builder` container and creates one running the correct version of buildkit.
When linuxkit needs to build a package for a particular architecture:
1. If a context for that architecture was provided, use that context, looking for and/or starting a buildkit container named `linuxkit-builder`.
1. If no context for that architecture was provided, use the `default` context.
The actual building then will be one of:
1. native, if the provided context has the same architecture as the target build architecture; else
1. cross-build, if the provided context has a different architecture, but the package's `Dockerfile` supports cross-building; else
1. emulated build, using docker's qemu binfmt capabilities
Cross-building, i.e. building on one platform using that platform's binaries to create outputs for a different platform,
depends on how the package's `Dockerfile` is constructed:
* if the image is just `FROM something`, then it runs it under qemu using binfmt
* if the image is `FROM --platform=$BUILDPLATFORM something`, then it runs it using the local architecture, invoking cross-builders
Read the official buildx docs to learn more about how to leverage cross-building.
**Important:** When building, if the local architecture is not one of those being built,
selecting `--docker` to load the images into the docker image cache will result in an error.
You _must_ be building for the local architecture - optionally for others as well - in order to
pass the `--docker` option.
#### Providing native builder nodes
linuxkit is capable of using native build nodes to do the build, even remotely. To do so, you must:
1. Create a [docker context](https://docs.docker.com/engine/context/working-with-contexts/) that references the build node
1. Tell linuxkit to use that context for that architecture
linuxkit will then use that provided context to look for and/or start a container in which to run buildkit for that architecture.
linuxkit looks for contexts in the following descending order of priority:
1. CLI option `--builders <platform>=<context>,<platform>=<context>`, e.g. `--builders linux/arm64=linuxkit-arm64,linux/amd64=default`
1. Environment variable `LINUXKIT_BUILDERS=<platform>=<context>,<platform>=<context>`, e.g. `LINUXKIT_BUILDERS=linux/arm64=linuxkit-arm64,linux/amd64=default`
1. Existing context named `linuxkit-<platform>`, e.g. `linuxkit-linux-arm64` or `linuxkit-linux-s390x`, with "/" replaced by "-", as "/" is an invalid character.
1. Default context
If a builder name is provided for a specific platform, and it doesn't exist, it will be treated as a fatal error.
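A sketch of wiring up a remote native arm64 builder (the host address and package path are placeholders):

```shell
# Create a context pointing at a remote arm64 machine over SSH, named
# so linuxkit picks it up automatically for linux/arm64 builds.
docker context create linuxkit-linux-arm64 \
  --docker "host=ssh://builder@arm64-host.example.com"

# Equivalent, but selecting the context explicitly via the environment:
LINUXKIT_BUILDERS=linux/arm64=linuxkit-linux-arm64 \
  linuxkit pkg build --platforms linux/arm64 pkg/foo
```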
#### Examples
##### Simple build
There are no contexts starting with `linuxkit-`, no environment variable `LINUXKIT_BUILDERS`, no command-line argument `--builders`.
linuxkit will build any requested packages using the `default` context on the local platform, with a container (created, if necessary) named `linuxkit-builder`.
Builds for the same architecture will be native, builds for other platforms will use either qemu or cross-building.
##### Specified target
You create a context named `linuxkit-linux-arm64` and then run a build. The builds happen:
* for arm64 using the context `linuxkit-linux-arm64`, since there is a context with the name `linuxkit-<platform>`, and you did not override it using `--builders` or the environment variable `LINUXKIT_BUILDERS`
* for amd64 using the context `default` and the `linuxkit-builder` container, as that is the default fallback
##### Combination
You create a context named `linuxkit-linux-arm64`, and another named `my-remote-builder-amd64` and then run:
* for arm64 using the context `linuxkit-linux-arm64`, since there is a context with the name `linuxkit-<platform>`, and you did not override that particular architecture using `--builders` or the environment variable `LINUXKIT_BUILDERS`
* for amd64 using the context `my-remote-builder-amd64`, since you specified for that architecture using `--builders`
The same would happen if you used `LINUXKIT_BUILDERS=linux/amd64=my-remote-builder-amd64` instead of the `--builders` flag.
##### Missing context
You do not have a context named `my-remote-arm64`, yet you reference it via `--builders linux/arm64=my-remote-arm64`; since the named context does not exist, linuxkit treats this as a fatal error.
specified bucket, and create a bootable image from the stored image.
Alternatively, you can use the `AWS_BUCKET` environment variable to specify the bucket name.
**Note:** If the push times out before it finishes, you can use the `-timeout` flag to extend the timeout. You may also want to consider passing `-ena` to enable enhanced networking in the AMI.
Supported (tested) versions of the relevant OpenStack APIs are:
## Authentication
LinuxKit's support for OpenStack handles two ways of providing the endpoint and authentication details. You can either set the standard set of environment variables and the commands detailed below will inherit those, or you can explicitly provide them on the command-line as options to `push` and `run`. The examples below use the latter, but if you prefer the former then you'll need to set the following:
```shell
OS_USERNAME="admin"
OS_PASSWORD="xxx"
OS_TENANT_NAME="linuxkit"
OS_AUTH_URL="https://keystone.com:5000/v3"
OS_USER_DOMAIN_NAME=default
OS_CACERT=/path/to/cacert.pem
OS_INSECURE=false
```
LinuxKit's support for OpenStack includes configuring access to your cloud as detailed in the official [os-client-config](https://docs.openstack.org/os-client-config/latest/user/configuration.html) documentation.
## Push
Images generated with Moby can be uploaded into OpenStack's image service with `linuxkit push openstack`:
```shell
./linuxkit push openstack \
-authurl=https://keystone.example.com:5000/v3 \
-username=admin \
-password=XXXXXXXXXXX \
-project=linuxkit \
-img-name=LinuxKitTest \
./linuxkit.iso
```
If successful, this will return the image's UUID. If you've set your environment variables up as described above, this command can then be simplified:
```shell
./linuxkit push openstack \
-img-name "LinuxKitTest" \
~/Desktop/linuxkitmage.qcow2
```
## Run
Virtual machines can be launched using `linuxkit run openstack`. As an example:
This is a quick guide to running LinuxKit on Scaleway (x86_64 VPS only, for now).
## Setup
Before you proceed it's recommended that you set up the [Scaleway CLI](https://github.com/scaleway/scaleway-cli/).
You must create a Scaleway API Token (combination of Access and Secret Key), available at [Scaleway Console](https://console.scaleway.com/account/credentials), first.
Then you can use it either with the `SCW_ACCESS_KEY` and `SCW_SECRET_KEY` environment variables or the `-access-key` and `-secret-key` flags
of the `linuxkit push scaleway` and `linuxkit run scaleway` commands.
The environment variable `SCW_TARGET_REGION` is used to set the region (there is also the `-region` flag)
In addition, Organization ID value has to be set, either with the `SCW_DEFAULT_ORGANIZATION_ID` environment variable or the `-organization-id` command line flag.
The environment variable `SCW_DEFAULT_ZONE` is used to set the zone (there is also the `-zone` flag)
There are no special integration services available for Virtualization.Framework, but
there are a number of packages, such as `vsudd`, which enable
tighter integration of the VM with the host (see below).
The Virtualization.Framework backend also allows passing custom userdata into the
[metadata package](./metadata.md) using either the `-data` or `-data-file` command-line
option. This attaches a CD device with the data on.
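For example (the image and file names are placeholders):

```shell
# Boot the image with the contents of metadata.json attached as a CD
# device, for consumption by the metadata package.
linuxkit run virtualization.framework -data-file ./metadata.json linuxkit-image
```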
### `vsudd` unix domain socket forwarding
The [`vsudd` package](/pkg/vsudd) provides a daemon that exposes unix
domain socket inside the VM to the host via virtio or Hyper-V sockets.
With Virtualization.Framework, the virtio sockets can be exposed as unix domain
sockets on the host, enabling access to other daemons, like
`containerd` and `dockerd`, from the host. An example configuration
file is available in [examples/vsudd-containerd.yml](/examples/vsudd-containerd.yml).
After building the example, run it with `linuxkit run virtualization.framework
-vsock-ports 2374 vsudd`. This will create a unix domain socket in the state directory that maps to the `containerd` control socket. The socket is called `guest.00000946`.
If you install the `ctr` tool on the host you should be able to access the
`containerd` running in the VM:
```
$ go get -u -ldflags -s github.com/containerd/containerd/cmd/ctr
git commit -a -s -m "Update package tags to $LK_RELEASE"
```
* `LK_BRANCH` is set to `rel_$LK_RELEASE` when cutting a release, e.g. `LK_BRANCH=rel_v0.9`
* It is not strictly required to update the alpine base image if it has recently been updated, but it is good to pick up any recent bug
fixes. However, you do need to update the tools, packages and tests.
* Releases are a particularly good time to check for updates in wrapped external dependencies, as highlighted in [alpine-base-update.md#External Tools](./alpine-base-update.md#External_Tools)
### Final preparation steps
This completes the release, but you are not done; one more step is required.
Create a PR which bumps the version number in the top-level `Makefile`
to `$LK_RELEASE+` to make sure that the version reported by `linuxkit version` is correct.
and namespaced separately from the host as appropriate.
LinuxKit's build process heavily leverages Docker images for packaging. Of note, all intermediate build images
are referenced by digest to ensure reproducibility across LinuxKit builds. Tags are mutable, and thus subject to override
(intentionally or maliciously) - referencing by digest mitigates classes of registry poisoning attacks in LinuxKit's buildchain.
Certain images, such as the kernel image, will be signed by LinuxKit maintainers using [Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/),
which guarantees authenticity, integrity, and freshness of the image.
Moreover, LinuxKit's build process leverages [Alpine Linux's](https://alpinelinux.org/) hardened userspace tools such as
Musl libc, and compiler options that include `-fstack-protector` and position-independent executable output. Go binaries
This command will generate some private keys in `~/.docker/trust` and ask you for passphrases such that they are encrypted at rest.
All linuxkit repositories are currently using the same root key so we can pin trust on key ID `1908a0cf4f55710138e63f65ab2a97e8fa3948e5ca3b8857a29f235a3b61ea1b`.
We'll also let the notary server take control of the snapshot key, for easier delegation collaboration:
Maintainers are to sign with `delegation` keys, which are administered by a non-root key.
Thus, they are easily rotated without having to bring the root key online.
Additionally, maintainers can be added to separate roles for auditing purposes: the current setup is to add maintainers to both the `targets/releases` role that is intended
for release consumption, as well as an individual `targets/<maintainer_name>` role for auditing.
Docker will automatically sign into both roles when pushing with Docker Content Trust.
Here's what the command looks like to add all maintainers to the `targets/releases` role:
are downloaded at build time to create an image. The image is self-contained and
so it can be tested reliably for continuous delivery.
Components are specified as Docker images which are pulled from a registry during build if they
are not available locally. See [image-cache](./image-cache.md) for more details on local caching.
The Docker images are optionally verified with Docker Content Trust.
For private registries or private repositories on a registry, credentials provided via
`docker login` are re-used.
The configuration file is processed in the order `kernel`, `init`, `onboot`, `onshutdown`,
`services`, `files`. Each section adds files to the root file system. Sections may be omitted.
Because a `tmpfs` is mounted onto `/var`, `/run`, and `/tmp` by default, the `tmpfs` mounts will shadow anything specified in `files` section for those directories.
## `trust`
The `trust` section specifies which build components are to be cryptographically verified with
[Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/) prior to pulling.
Trust is a central concern in any build system, and LinuxKit's is no exception: Docker Content Trust provides authenticity,
integrity, and freshness guarantees for the components it verifies. The LinuxKit maintainers are responsible for signing
`linuxkit` components, though collaborators can sign their own images with Docker Content Trust or [Notary](https://github.com/docker/notary).
- `image` lists individual images for which pulling with Docker Content Trust is enforced.
The image name may include a tag or digest, but the match also succeeds if the base image name is the same.
- `org` lists organizations for which Docker Content Trust is enforced across all images;
for example, `linuxkit` is the org for `linuxkit/kernel`.
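For illustration, a `trust` section combining both fields might look like the following sketch (the kernel image tag is hypothetical):

```
trust:
  image:
    - linuxkit/kernel:4.9.x   # hypothetical tag, for illustration only
  org:
    - linuxkit
```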
## Image specification
Entries in the `onboot` and `services` sections specify an OCI image and
@@ -144,7 +132,9 @@ options. Default values may be specified using the `org.mobyproject.config` imag
For more details see the [OCI specification](https://github.com/opencontainers/runtime-spec/blob/master/spec.md).
If the `org.mobylinux.config` label is set in the image, that specifies default values for these fields if they
are not set in the yaml file. While most fields are _replaced_ if they are specified in the yaml file,
some support _add_ via the format `<field>.add`; see below.
You can override the label entirely by setting the value, or setting it to be empty to remove
the specification for that value in the label.
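As a sketch, such defaults might be baked into an image at build time via a Dockerfile label (the base image and JSON values here are hypothetical, and the label name assumes the `org.mobyproject.config` form):

```
FROM alpine:3.16
# hypothetical defaults; the JSON fields mirror the yaml image specification
LABEL org.mobyproject.config='{"capabilities": ["CAP_NET_ADMIN"], "binds": ["/var/log:/var/log"]}'
```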
If you need an OCI option that is not specified here, please open an issue or pull request, as the list is not yet complete.
@@ -159,6 +149,7 @@ bind mounted into a container.
extracted from this so they need not be filled in.
- `capabilities` the Linux capabilities required, for example `CAP_SYS_ADMIN`. If there is a single
capability `all` then all capabilities are added.
- `capabilities.add` the Linux capabilities required, but these are added to the defaults, rather than overriding them.
- `ambient` the Linux ambient capabilities (capabilities passed to non-root users) that are required.
- `mounts` is the full form for specifying a mount, which requires `type`, `source`, `destination`
and a list of `options`. If any fields are omitted, sensible defaults are used if possible, for example
@@ -166,6 +157,7 @@ bind mounted into a container.
can be replaced by specifying a mount with new options here at the same mount point.
- `binds` is a simpler interface to specify bind mounts, accepting a string like `/src:/dest:opt1,opt2`
similar to the `-v` option for bind mounts in Docker.
- `binds.add` is a simpler interface to specify bind mounts, but these are added to the defaults, rather than overriding them.
- `tmpfs` is a simpler interface to mount a `tmpfs`, like `--tmpfs` in Docker, taking `/dest:opt1,opt2`.
- `command` will override the command and entrypoint in the image with a new list of commands.
- `env` will override the environment in the image with a new environment list. Specify variables as `VAR=value`.
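To illustrate the `.add` variants described above, here is a minimal sketch of a service entry (the service name, image, and paths are hypothetical):

```
services:
  - name: monitor                  # hypothetical service
    image: myorg/monitor:v1        # hypothetical image
    capabilities.add:              # appended to the image's default capabilities
      - CAP_NET_ADMIN
    binds.add:                     # appended to the image's default bind mounts
      - /var/lib/monitor:/var/lib/monitor
```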
@@ -240,6 +232,31 @@ services:
- CAP_DAC_OVERRIDE
```
## `devices`
To access the console, it's necessary to explicitly add a "device" definition, for example:
```
devices:
- path: "/dev/console"
type: c
major: 5
minor: 1
mode: 0666
```
See the [getty package](../pkg/getty/build.yml) for a more complete example
and see [runc](https://github.com/opencontainers/runc/commit/60e21ec26e15945259d4b1e790e8fd119ee86467) for context.
To grant access to all block devices use:
```
devices:
- path: all
type: b
```
See the [format package](../pkg/format/build.yml) for an example.
### Mount Options
When mounting filesystem paths into a container - whether as part of `onboot` or `services` - there are several options of which you need to be aware. Using them properly is necessary for your containers to function properly.