After 'make oldconfig' we check that the kernel config
is as we expect and error out if it isn't. We used to print
the default 'diff' output on a mismatch, but a unified diff
is easier to read.
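The commit doesn't show the exact command used for the comparison. As a sketch of the same check, Python's difflib produces unified output; check_config and the "expected"/"actual" file labels are hypothetical names chosen for illustration.

```python
import difflib


def check_config(expected: str, actual: str, name: str = "config") -> str:
    """Return a unified diff of two kernel configs, or "" if they match."""
    if expected == actual:
        return ""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=f"{name}.expected",  # hypothetical label for the stored config
        tofile=f"{name}.actual",      # hypothetical label for oldconfig output
    )
    return "".join(diff)


if __name__ == "__main__":
    old = "CONFIG_A=y\nCONFIG_B=m\n"
    new = "CONFIG_A=y\nCONFIG_B=y\n"
    print(check_config(old, new), end="")
```

Only the changed option shows up with '-'/'+' markers, surrounded by context lines, which is what makes the unified form easier to scan than the default output.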
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This cherry picks:
- b6fe0440c637 ("bridge: implement missing ndo_uninit()")
- b1b9d366028f ("bridge: move bridge multicast cleanup to ndo_uninit")
The fix is in b1b9d366028f ("bridge: move bridge multicast cleanup
to ndo_uninit") but it requires b6fe0440c637 ("bridge: implement missing
ndo_uninit()"). Furthermore, b1b9d366028f needed some manual resolution
of a cherry-pick conflict because the surrounding code had changed.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We want to compile BCC for the latest LTS and the latest
stable and missed the update to 4.18 when enabling it. Do
it now.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Note, this update skips 4.18.2/4.17.16/4.14.64/4.9.121/4.4.149
as the change was a single patch, a bug fix.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
In setup_net() there are a few particularly slow subsystems that
contribute more than 140ms of time to the new net namespace creation
path. The docker daemon doesn't depend on these, and won't modprobe
them into the kernel. Convert these to modules to reduce the amount of
time it takes for docker to start a container. This change takes an
additional ~120 ms of time off container start time.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
While investigating performance problems around 'docker run' times, it
was observed that a large amount of time was spent in network namespace
creation. Of that time, a large portion involved waiting for RCU grace
periods to elapse. Increasing HZ causes the periodic timer to check for
quiesced periods more frequently, which consequently reduces the amount
of time RCU callers spend waiting for grace periods and in barrier
waits.
By itself, this change took the amount of time to execute a 'docker run
hello-world' down to 570ms from over 2000ms on 4.14, and down to 390ms
from 1260ms on 4.17 and 4.18.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
The kernel config was derived from the 4.17.x kernel config
and then tweaked a little. Specifically:
- Enable XDP_SOCKETS
- Enable NFT_CONNLIMIT
- Enable IP_VS_MH
- Enable BPFILTER (as module)
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The 4.14.63 kernel contains important security fixes, in particular
against L1TF (CVE-2018-3615, CVE-2018-3620, CVE-2018-3646) and
userspace-userspace SpectreRSB.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The previous commit updated to 4.16.18, which is the last
4.16.x kernel. The 4.16.18 kernel was compiled and pushed
but we may as well now remove it as it has been EOLed.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
While we can re-create the kernel source code we don't have it
handily available in one place. This commit stashes the kernel
and the WireGuard source as /src/linux.tar.xz and
/src/wireguard.tar.xz in the kernel package.
This increases the size of the hub image by around 100MB.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Trying to keep the number of kernels we compile for these
platforms small and 4.16 is likely to be EOLed soon anyway.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The kernel configs are the 4.16.x configs run through
a 'make defconfig && make oldconfig' cycle.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Note, we skip 4.14.45 because 4.14.46 only has 3 patches
in it which unbreak 'perf' compilation.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This microcode bundle comes with a file called "list"
which seems to confuse the 'iucode_tool', so we just
remove it.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 4.14.38 kernel backported the Spectre mitigation, requiring
a change to the kernel config.
Might as well enable the mitigations by default.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This is useful for some baremetal configs, such as using
USB sticks on a RPi3. I enabled it for x86_64 as well
to keep the differences smaller.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
Note, the dependent SERIAL_DEV_CTRL_TTYPORT option defaults to
'N' with the 4.14.x kernel and 'Y' with the 4.16.x kernel.
I chose to stick with the defaults.
This may fix the serial console issue I've seen on the RPi3
with 4.14.x kernels.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
The s390x build VM we have access to is quite slow. Dropping
the 4.15.x kernel, which soon will be EOLed anyway, to
save some time.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This is needed, for example, for kernel module signatures if you do not
provide a key, so add it to the dependencies for kernel builds.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
The kernel config is based on the 4.15.x kernel config
run through 'make defconfig && make oldconfig' and then
tweaked a little by hand.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
There are too many kernels to compile and arm64 takes a bit
too long to compile even on a beefy arm64 server.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
These fix some issues around hot-unplugging devices which may be the cause
of some LCOW issues we are seeing.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
Enable the Integrity Measurement Architecture (IMA) for 4.14.x
and 4.15.x kernels. This pretty much uses the defaults except we
also enable INTEGRITY_ASYMMETRIC_KEYS and IMA_READ_POLICY. The
latter may be useful for debugging.
For s390x we also needed to enable TPM support.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- Disable all network device drivers apart from Mellanox, which
is the only supported NIC on s390x
- Disable Fusion MPT
- Disable DAX/NVMEM/NVME
- Disable USB
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
While this now has some duplication, it is clearer as to which
kernels are compiled for each architecture.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Update building process to add s390 support.
The patch serial-forbid-8250-on-s390.patch has been added to disable
8250 serial for s390.
The patch is available upstream at https://patchwork.kernel.org/patch/10106437/
but has not been backported.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Also remove the 4.4 patch which should have been removed by
231cead2cc ("kernel: Update to 4.15.4/4.14.20/4.9.82/4.4.116")
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
We may soon get another arch, so I wanted to set up the template
for having a per-arch list of kernels to compile.
While at it also drop the 4.4.x kernel for arm64. We never really
tested it and folks should be on 4.9 or 4.14 anyway. I'll leave
4.4.x for x86 for now as it might be useful to test for regressions.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
In order to cut the number of kernels we build, remove the debug
kernel for the now non-default 4.9.x series.
Also remove the -rt debug kernel. Users who need it can build
it themselves with 'make EXTRA=-rt DEBUG=-dbg build_4.14.x'
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
These are part of the Meltdown/Spectre mitigations for arm64
now available for 4.14 and 4.15
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 4.14.20 update has Meltdown/Spectre fixes for arm64
The 4.4.116 update incorporates the proper fix for the
div by zero crash in the firmware loader, so the patch
with the hackish workaround was dropped.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
In order to get a preempt-rt Linux kernel, we grab the -rt patch from
https://www.kernel.org/pub/linux/kernel/projects/rt/. So far we only enable it
for 4.14.x.
Signed-off-by: Tiejun Chen <tiejun.china@gmail.com>
This should make debugging a lot easier. Note, 991f8f1c6eb6
("hyper-v: trace channel events"), patch 18, required some
minor modifications from upstream as another patch was not easy
to cherry-pick.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Drop the hack for the microcode division by 0 on GCP as
a proper fix is in upstream as:
2760f452a718 ("x86/microcode: Do the family check first")
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
These kernels have significant changes/additions for Spectre
mitigation as well as the usual set of other fixes.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The CONFIG_BPF_JIT_ALWAYS_ON option has now been back-ported
to 4.4.115 as well. Enable it.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This adds a patch to avoid a division by zero panic for 4.4.x
and 4.9.x kernels on single vCPU machine types on Google Cloud.
4.14.x and 4.15.x kernels seem to work fine.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This option is not enabled by default, but disables the
BPF interpreter which can be used to inject speculative
execution into the kernel. Enable it as it seems
like a good security measure.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 4.14 and 4.9 kernels have a significant number of
fixes to eBPF and also a fix for kernel level sockets
and namespace removals, i.e. they fix some aspects of
https://github.com/moby/moby/issues/5618
"unregister_netdevice: waiting for lo to become free"
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
* receive: treat packet checking as irrelevant for timers
Small simplification to the state machine, as discussed with Mathias
Hall-Andersen.
* socket: check for null socket before fishing out sport
* wg-quick: ifnames have max len of 15
* tools: plug memleak in config error path
Important bug fixes.
* external-tests: add python implementation
Piotr Lizonczyk has contributed a test vector written in Python.
* poly1305: remove indirect calls
From Samuel Neves, we now are in a better position to mitigate speculative
execution attacks.
* curve25519: modularize implementation
* curve25519: import 32-bit fiat-crypto implementation
* curve25519: import 64-bit hacl-star implementation
* curve25519: resolve symbol clash between fe types
* curve25519: wire up new impls and remove donna
* tools: import new curve25519 implementations
* contrib: keygen-html: update curve25519 implementation
Two of our Curve25519 implementations now use formally verified C. Read this
mailing list post for more information:
https://lists.zx2c4.com/pipermail/wireguard/2018-January/002304.html
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
- Enable RETPOLINE by default. Note, however, this will
only be used if the compiler supports it.
- Enable sysfs interface for vulnerabilities
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 4.14.14 kernel has a number of important fixes/additions:
- New support for retpolines (enabled but requires newer gcc
to take advantage of). This provides mitigation for Spectre
style attacks.
- Various KPTI fixes including fixes for EFI booting
- More eBPF fixes around out-of-bounds and overflow of
maps. These were used for variant 1 of CVE-2017-5753.
- Several KVM fixes related to CVE-2017-5753, CVE-2017-5715,
CVE-2017-17741.
- New sysfs interface listing vulnerabilities:
/sys/devices/system/cpu/vulnerabilities
The 4.9.77 kernel also seems to have most/all of the above
back-ported.
See https://lwn.net/SubscriberLink/744287/1fc3c18173f732e7/
for more details on the Spectre mitigation.
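Each file in the new vulnerabilities directory (for example meltdown, spectre_v1, spectre_v2) holds a one-line status. A small reader, as a sketch: read_vulnerabilities is a hypothetical helper, and it simply returns an empty map on kernels that predate the interface.

```python
import os


def read_vulnerabilities(path="/sys/devices/system/cpu/vulnerabilities"):
    """Map each vulnerability file name to its one-line status string."""
    if not os.path.isdir(path):
        return {}  # older kernel: no sysfs vulnerabilities interface
    status = {}
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name)) as f:
            status[name] = f.read().strip()
    return status


if __name__ == "__main__":
    for name, state in read_vulnerabilities().items():
        print(f"{name:20} {state}")
```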
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Download and verify the Intel microcode package and convert it
to a cpio archive which can be prepended to the initrd.
It also adds the license file to the kernel package.
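The Linux early microcode loader looks for an uncompressed cpio archive containing kernel/x86/microcode/GenuineIntel.bin in front of the initrd. A sketch of building one in Python, assuming the per-CPU Intel files have already been combined into a single blob; the helper names are hypothetical and the actual package build may well use standard tools instead.

```python
def _cpio_entry(name: str, data: bytes, mode: int, ino: int, nlink: int) -> bytes:
    """Encode one member in the cpio 'newc' format (magic 070701)."""
    name_bytes = name.encode() + b"\0"
    fields = [
        ino, mode, 0, 0, nlink, 0,  # ino, mode, uid, gid, nlink, mtime
        len(data),                  # filesize
        0, 0, 0, 0,                 # devmajor, devminor, rdevmajor, rdevminor
        len(name_bytes),            # namesize, including the trailing NUL
        0,                          # check (unused in newc)
    ]
    entry = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    entry += b"\0" * (-len(entry) % 4)  # pad header + name to 4 bytes
    entry += data
    entry += b"\0" * (-len(entry) % 4)  # pad file data to 4 bytes
    return entry


def microcode_cpio(microcode: bytes) -> bytes:
    """Uncompressed cpio holding kernel/x86/microcode/GenuineIntel.bin."""
    archive = b""
    ino = 1
    for d in ("kernel", "kernel/x86", "kernel/x86/microcode"):
        archive += _cpio_entry(d, b"", 0o040755, ino, 2)  # directory entries
        ino += 1
    archive += _cpio_entry("kernel/x86/microcode/GenuineIntel.bin",
                           microcode, 0o100644, ino, 1)
    archive += _cpio_entry("TRAILER!!!", b"", 0, 0, 1)  # end-of-archive marker
    return archive
```

Prepending is then plain concatenation: microcode_cpio(blob) followed by the bytes of the existing initrd.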
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This looks like there are a couple of minor fixes to the
recent KPTI changes but nothing major...
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This is the new Kernel Page Table Isolation (KPTI,
formerly KAISER) introduced with 4.14.11 (and in
4.15.rcX).
KPTI runs the kernel and userspace off separate
pagetables (and uses PCID on more recent processors
to minimise the TLB flush penalty). It comes with
a performance hit but is enabled by default as a
workaround for some serious, not yet disclosed,
bug in Intel processors.
When enabled in the kernel config, KPTI will be
dynamically enabled at boot time depending on the
CPU it is executing on (currently all Intel x86 CPUs).
Depending on the environment, you may choose to
disable it using 'pti=off' on the kernel commandline.
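To check what a running system ended up with, a sketch: pti_forced_off looks for the 'pti=off' parameter, and kpti_active checks for the "Mitigation: PTI" status that kernels with the sysfs vulnerabilities interface report in their meltdown file. Both helper names are hypothetical.

```python
import os


def pti_forced_off(cmdline: str) -> bool:
    """True if the kernel command line contains 'pti=off' as a parameter."""
    return "pti=off" in cmdline.split()


def kpti_active(meltdown_status: str) -> bool:
    """True if the sysfs meltdown status reports the PTI mitigation."""
    return meltdown_status.strip() == "Mitigation: PTI"


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("pti=off given:", pti_forced_off(f.read()))
    meltdown = "/sys/devices/system/cpu/vulnerabilities/meltdown"
    if os.path.exists(meltdown):  # only on kernels with the sysfs interface
        with open(meltdown) as f:
            print("KPTI active:", kpti_active(f.read()))
```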
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This contains the fixes to the eBPF verifier which allowed
privilege escalation in 4.9 and 4.14 kernels.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Commit 340d45d70850 ("locking/refcounts, x86/asm: Enable
CONFIG_ARCH_HAS_REFCOUNT") re-enabled ARCH_HAS_REFCOUNT
as default. Pick it up in our kernel config.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
* curve25519: explictly depend on AS_AVX
* curve25519: modularize dispatch
It's now much cleaner to see which implementation we're calling, and it will
be simpler to add more implementations in the future.
* compat: support RAP in assembly
This should fix PaX/Grsecurity support.
* device: do not clear keys during sleep on Android
While we want to clear keys when going to sleep on ordinary Linux, this
doesn't make sense in the Android world, where phones often sleep but are
woken up every few milliseconds by the radios to process packets.
* compat: fix 3.10 backport
Important compat fixes for non-x86.
* device: clear last handshake timer on ifdown
When bringing up an interface, we don't want the rate limiting to handshakes
to apply.
* netlink: rename symbol to avoid clashes
Allows coexistence with horrible Android drivers.
* kernel-tree: jury rig is the more common spelling
* tools: no need to put this on the stack
* blake2s-x86_64: fix spacing
Small fixes.
* contrib: keygen-html for generating keys in the browser
This was covered here:
https://lists.zx2c4.com/pipermail/wireguard/2017-December/002127.html
* tools: remove undocumented unused syntax
Not only did nobody know about this or use it, but the implementation actually
exposed compiler bugs in Qualcomm's "Snapdragon Clang".
* poly1305: update x86-64 kernel to AVX512F only
From Samuel Neves, this pulls in Andy Polyakov's changes to only require F and
not VL for the Poly implementation.
* chacha20-arm: fix with clang -fno-integrated-as.
This pulls in David Benjamin's clang fix.
* global: add SPDX tags to all files
From Greg KH, we now have SPDX annotations on all files, matching upstream
kernel's new approach to file licenses.
* chacha20poly1305: cleaner generic code
This entirely removes the last remains of Martin Willi's ChaCha
implementation, and now the generic C implementation is extremely small and
clearly written, while delivering a small performance boost too.
* poly1305: fix avx512f alignment bug
Unlucky people may have had their linkers misalign a constant. This fixes that
potential.
* chacha20: avx512vl implementation
From Samuel Neves, this imports Andy Polyakov's AVX512VL implementation of
ChaCha which should have a ~50% performance improvement over AVX2, though it
is still much slower than our AVX512F implementation.
* chacha20poly1305: wire up avx512vl for skylake-x
Some Skylake machines do not have two FMA units (though others do), so we
prefer the AVX512VL implementation over the should-be-faster AVX512F
implementation on those machines. What's needed now is to read the PIROM in
order to determine at runtime whether the particular Skylake-X machine
actually has the second FMA unit or not, but until that happens, we just fall
back to the VL implementation for all Skylake-X.
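The ChaCha20 selection rule described above boils down to a feature-flag check. This is a Python sketch rather than the module's C; pick_chacha20_impl and the is_skylake_x input are hypothetical, and the flag names follow /proc/cpuinfo spelling.

```python
def pick_chacha20_impl(features: set, is_skylake_x: bool) -> str:
    """Pick a ChaCha20 implementation name from a set of CPU feature flags."""
    # AVX512F is preferred, except on Skylake-X where we cannot yet tell
    # whether the second FMA unit is present, so VL is used instead.
    if "avx512f" in features and not is_skylake_x:
        return "avx512f"
    if "avx512vl" in features:
        return "avx512vl"
    if "avx2" in features:
        return "avx2"
    if "ssse3" in features:
        return "ssse3"
    return "generic"  # portable C fallback
```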
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This is a double bump.
Changes 0.0.20171122:
* chacha20poly1305: fast primitives from Andy Polyakov
Samuel Neves and I have spent considerable time and headaches porting,
reworking, and partially rewriting Andy's optimized implementations of
ChaCha20 and Poly1305. We now support the following:
On x86_64:
- Poly1305: integer unit
- ChaCha20: SSSE3
- HChaCha20: SSSE3
- Poly1305: AVX
- ChaCha20: AVX2
- Poly1305: AVX2
- ChaCha20: AVX512
- Poly1305: AVX512
On ARM:
- Poly1305: integer unit
- ChaCha20: NEON
- Poly1305: NEON
On ARM64:
- Poly1305: integer unit
- ChaCha20: NEON
- Poly1305: NEON
On MIPS64:
- Poly1305: integer unit
All others:
- ChaCha20: generic C
- Poly1305: generic C
This is a pretty substantial amount of new handrolled assembly. It will
perhaps MURDER KITTENS, so please tread lightly with this snapshot and adjust
expectations accordingly. I'm looking forward to quickly fixing any issues
folks find while testing.
Performance-wise, this should see increases all around. The biggest speedups
will be on ARM and ARM64, but x86_64 and MIPS64 should also see modest speed
improvements too, especially on Skylake systems supporting AVX512.
* chacha20poly1305: add more test vectors, some of which are weird
Test vectors are pretty important, so we added more to catch odd edge cases
using the following butcher's code:
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
import os

def encode_blob(blob):
    # Emit each byte as a two-digit C hex escape.
    return "".join("\\x%02x" % b for b in blob)

def format_vector(key, nonce, ad, inp, res):
    # Render one C struct initializer for the test vector tables.
    fields = [
        "\t.key\t= \"%s\"" % encode_blob(key),
        "\t.nonce\t= \"%s\"" % encode_blob(nonce),
        "\t.assoc\t= \"%s\"" % encode_blob(ad),
        "\t.alen\t= %d" % len(ad),
        "\t.input\t= \"%s\"" % encode_blob(inp),
        "\t.ilen\t= %d" % len(inp),
        "\t.result\t= \"%s\"" % encode_blob(res),
    ]
    return "{\n" + ",\n".join(fields) + "\n}"

enc = []
dec = []

def make_vector(plen, adlen):
    key = os.urandom(32)
    nonce = os.urandom(8)
    p = os.urandom(plen)
    ad = os.urandom(adlen)
    # The 64-bit nonce is prefixed with 4 zero bytes to form the 96-bit nonce.
    c = ChaCha20Poly1305(key).encrypt(nonce=bytes(4) + nonce, data=p, associated_data=ad)
    enc.append(format_vector(key, nonce, ad, p, c))
    dec.append(format_vector(key, nonce, ad, c, p))

for plen, adlen in [(0, 0), (0, 8), (1, 8), (1, 0), (129, 7), (256, 0),
                    (512, 0), (513, 9), (1024, 16), (1933, 7), (2011, 63)]:
    make_vector(plen, adlen)

print("======== encryption vectors ========")
print(", ".join(enc))
print("\n\n\n======== decryption vectors ========")
print(", ".join(dec))
* wg-quick: document localhost exception and v6 rule
Probably a "kill switch" wants this too:
-m addrtype ! --dst-type LOCAL
so that basic local services can continue to work.
* selftest: allowedips: randomized test mutex update
* allowedips: do not write out of bounds
* device: uninitialize socket first in destruction
* tools: tighten up strtoul parsing
Small fixups.
* qemu: update kernel
* qemu: use unprefixed strip when not cross-compiling
Fedora/Redhat doesn't ship with a prefixed strip, and we don't need
to use it anyway when we're not cross compiling, so don't.
* compat: 3.16.50 got proper rt6_get_cookie
* compat: stable finally backported fix
* compat: new kernels have netlink fixes
* compat: fix compilation with PaX
Usual set of compatibility updates.
* curve25519-neon: compile in thumb mode
In thumb mode, it's not possible to use sp as an operand of and, so
we have to muck around with r3 as a scratch register.
* socket: only free socket after successful creation of new
When an interface is down, the socket port can change freely. A socket
will be allocated when the interface comes up, and if a socket can't be
allocated, the interface doesn't come up.
However, a socket port can change while the interface is up. In this
case, if a new socket with a new port cannot be allocated, it's
important to keep the interface in a consistent state. The choices are
either to bring down the interface or to preserve the old socket. This
patch implements the latter.
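The ordering described above (create the new socket first, only then release the old one) can be sketched as follows; Device and the open_socket callable are hypothetical stand-ins, not WireGuard's kernel code.

```python
class Device:
    """Hypothetical interface that owns exactly one bound UDP socket."""

    def __init__(self, open_socket):
        # open_socket(port) returns a socket-like object or raises OSError.
        self._open_socket = open_socket
        self.sock = None
        self.port = None

    def set_port(self, port):
        """Rebind to a new port, keeping the old socket if that fails."""
        new_sock = self._open_socket(port)  # may raise; old state untouched
        old_sock, self.sock, self.port = self.sock, new_sock, port
        if old_sock is not None:
            old_sock.close()  # freed only after the new one exists
```

If allocation raises, the exception propagates before any field is modified, so the interface stays up on its previous port.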
* global: switch from timeval to timespec
This gets us nanoseconds instead of microseconds, which is better, and
we can do this pretty much without freaking out existing userspace,
which doesn't actually make use of the nano/microseconds field. The below
test program shows that this won't break existing sizes:
zx2c4@thinkpad ~ $ cat a.c
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    puts(sizeof(struct timeval) == sizeof(struct timespec) ?
         "success" : "failure");
    return 0;
}
zx2c4@thinkpad ~ $ gcc a.c -m64 && ./a.out
success
zx2c4@thinkpad ~ $ gcc a.c -m32 && ./a.out
success
Changes 0.0.20171127:
* compat: support timespec64 on old kernels
* compat: support AVX512BW+VL by lying
* compat: fix typo and ranges
* compat: support 4.15's netlink and barrier changes
* poly1305-avx512: requires AVX512F+VL+BW
Numerous compat fixes which should keep us supporting 3.10-4.15-rc1.
* blake2s: AVX512F+VL implementation
* blake2s: tweak avx512 code
* blake2s: hmac space optimization
Another terrific submission from Samuel Neves: we now have an implementation
of Blake2s using AVX512, which is extremely fast.
* allowedips: optimize
* allowedips: simplify
* chacha20: directly assign constant and initial state
Small performance tweaks.
* tools: fix removing preshared keys
* qemu: use netfilter.org https site
* qemu: take shared lock for untarring
Small bug fixes.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The update in 6ede240737 ("kernel: Update to
4.14.1/4.13.15/4.9.64/4.4.100") failed to build on aarch64.
This fixes it.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 'build_perf_' and 'build_zfs_' targets in the Makefile both
depend on the build_$(2)$(3) target. So we pull the image with
DCT (Docker Content Trust) as part of the dependency on build_$(2)$(3)
and then build with DOCKER_CONTENT_TRUST explicitly set to 0.
Signed-off-by: Dennis Chen <dennis.chen@arm.com>
Commit 31c8c4942820 ("security/keys: add CONFIG_KEYS_COMPAT
to Kconfig") moved the KEYS_COMPAT config option to a different
section. Adjust config file.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
REFCOUNT_FULL enables full reference count validation. There is a
potential slowdown, but it protects against certain use-after-free
attacks.
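As a toy model of what full validation buys: a checked counter refuses to increment from zero (the object may already have been freed) and saturates instead of wrapping, so an overflow leaks the object rather than freeing it early. This is a Python sketch with hypothetical names, not the kernel's refcount_t API; the 32-bit ceiling is an assumption, and the kernel warns where this model raises.

```python
class Refcount:
    """Toy checked reference counter (hypothetical, not a kernel API)."""

    MAX = 2**32 - 1  # assumption: models a saturating 32-bit unsigned counter

    def __init__(self, value=1):
        self.value = value

    def inc(self):
        if self.value == 0:
            raise RuntimeError("increment from 0: possible use-after-free")
        if self.value < self.MAX:
            self.value += 1
        # At MAX the counter stays saturated instead of wrapping to 0.

    def dec_and_test(self):
        """Decrement; return True when the last reference was dropped."""
        if self.value == self.MAX:
            return False  # saturated counters never reach zero
        if self.value == 0:
            raise RuntimeError("decrement below 0")
        self.value -= 1
        return self.value == 0
```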
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
On 4.13 and 4.14 kernels GCC_PLUGIN_RANDSTRUCT can be used to randomise
some kernel data structures such as structs with function pointers.
We also select GCC_PLUGIN_RANDSTRUCT_PERFORMANCE which
tries harder to restrict randomisation to cache-lines in order to reduce
performance impact.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 4.13 and 4.14 kernels support GCC_PLUGIN_STRUCTLEAK, a GCC plugin
to zero initialise any structures with the __user attribute to prevent
information exposure.
On 4.14 kernels also enable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL, which is
an extension of the above.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Routine version bump that also removes the necessity of carrying that
extra patch. Changes:
* Kconfig: remove trailing whitespace
* allowedips: rename from routingtable
* tools: remove ioctl cruft
* global: revert checkpatch.pl changes
Cleanliness.
* device: please lockdep
* device: wait for all peers to be freed before destroying
These make the various checkers happy.
* netlink: plug memory leak
* qemu: check for memory leaks
There was a small memory leak on the netlink configuration layer that's now
been fixed.
* receive: hoist fpu outside of receive loop
Should be a small speedup on x86_64.
* qemu: more debugging
* qemu: bump kernel version
Significantly more debugging checkers have been turned on.
* wg-quick: stat the correct enclosing folder of config file
* wg-quick: allow for tabs in keys
Minor fixups for wg-quick(8).
* compat: 4.4.0 has strange ECN function
Nobody actually runs base 4.4.0, but this is more correct anyway.
* netlink: make sure we reserve space for NLMSG_DONE
A rather important change: due to an upstream kernel bug that has existed
since the advent of netlink itself, sometimes wg(8) failed to receive valid
data back from kernelspace, resulting in "ENOBUFS" when trying to dump all
peers. This patch works around it while we wait for upstream to commit the
fix.
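The entry only says space is now reserved for the final NLMSG_DONE message. A toy budgeting model of that idea, assuming a 16-byte struct nlmsghdr, 4-byte netlink alignment, and a 4-byte error code as the NLMSG_DONE payload; fill_dump and the per-peer message sizes are hypothetical.

```python
NLMSG_ALIGNTO = 4
NLMSG_HDRLEN = 16  # sizeof(struct nlmsghdr), already 4-byte aligned


def nlmsg_align(n: int) -> int:
    return (n + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


DONE_SPACE = nlmsg_align(NLMSG_HDRLEN + 4)  # header + int error code


def fill_dump(buf_size: int, peer_sizes: list, reserve_done: bool = True) -> int:
    """Return how many peer messages fit in one dump buffer."""
    budget = buf_size - (DONE_SPACE if reserve_done else 0)
    used = 0
    count = 0
    for size in peer_sizes:
        size = nlmsg_align(size)
        if used + size > budget:
            break  # remaining peers go into the next dump round
        used += size
        count += 1
    return count
```

With the reservation, a buffer that would otherwise be filled to the last byte by peer messages keeps room for the terminating message.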
* curve25519: reject deriving from NULL private keys
* tools: allow for NULL keys everywhere
A null 25519 private point isn't a valid point (prior to normalization), which
is why we use it as the "unsetting" value. Conversely, however, except for
psk, we should be using the existence of it in the netlink message being an
indication of whether or not it's set, for the tools.
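Treating the all-zero key as "unset" and refusing to derive from it might look like this. derive_public, is_null_key, and the injected scalarmult_base callable are hypothetical; unlike kernel code, this comparison is not constant-time.

```python
KEY_LEN = 32  # Curve25519 keys are 32 bytes


def is_null_key(key: bytes) -> bool:
    """True for the all-zero key used as the 'unset' value."""
    return len(key) == KEY_LEN and not any(key)


def derive_public(private: bytes, scalarmult_base):
    """Return the public key, or None if the private key is unset."""
    if len(private) != KEY_LEN:
        raise ValueError("private key must be 32 bytes")
    if is_null_key(private):
        return None  # reject deriving from a NULL private key
    return scalarmult_base(private)
```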
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The previous commit used the 4.13.x config files as the
4.14.x config files. This commit stashes the result of
running the 4.14.x oldconfig over them.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The kernel config files are a copy of the 4.13 kernel configs,
which will be refined in subsequent commits.
This does not yet include any patches which may
be required for LCOW.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
4.14.x has dropped 'make firmware_install' and according to [1]
the in-tree firmware has not been updated since 2013, so drop it
for all kernels.
We will need to find another way to add firmware blobs to a
LinuxKit image (see [2])
[1] https://lkml.org/lkml/2017/9/15/343
[2] https://github.com/linuxkit/linuxkit/issues/2714
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
I got errors when un-tarring the linux-4.14 kernel:
tar: linux-4.14/arch/arm64/boot/dts/arm: Directory renamed before its status could be extracted
tar: linux-4.14/arch/arm64/boot/dts: Directory renamed before its status could be extracted
tar: linux-4.14/arch/arm64/boot: Directory renamed before its status could be extracted
tar: linux-4.14/arch/arm64: Directory renamed before its status could be extracted
tar: linux-4.14/arch: Directory renamed before its status could be extracted
Using bsdtar, this error goes away.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
We don't have it enabled on x86_64 and according to
https://github.com/linuxkit/linuxkit/issues/2434#issuecomment-342370982
it may prevent the ThunderX NIC driver from working.
Note, this also disables MEMORY_ISOLATION and ARCH_HAS_GIGANTIC_PAGE
which are internal config variables no longer needed.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Version 0.0.20171101 errors out when compiled for
debug kernels. This will be fixed in the next release.
In the meantime pull in the patch which fixes the
compile error.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
20171031, the Halloween edition, had a show stopper bug, which was
neither security related, nor did it affect LinuxKit kernels, but
was important enough for me to bump the snapshot. This is the
corresponding LinuxKit bump. Changes:
* wg-quick: save all hooks on save
Tiny bug fix for 'wg-quick save'.
* timers: switch to kees' new timer_list functions
Shiny new things for Linux 4.14.
* compat: unbreak unloading on kernels 4.6 through 4.9
The real motivation for this extra snapshot bump. Before we would run into
some issues when unloading the module, which was not good.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Simple version bump. Changes:
* netns: use read built-in instead of ncat hack for dmesg
* netns: use time-based test instead of quantity-based
* qemu: allow for cross compilation
* qemu: work around ccache bugs
* qemu: test using four cores
* selftest: initialize mutex in routingtable selftest
We now cross compile and run in QEMU for x86_64, i686,
ARMv7, Aarch64, and MIPS. You can see the current build
status on: https://www.wireguard.com/build-status/
* stats: more robust accounting
* compat: fix up stat calculation for udp tunnel
The statistics from `ip link -stats` or from `wg show` are
now much more accurate.
* global: accept decent check_patch.pl suggestions
* global: infuriating kernel iterator style
* global: style nits
* global: use fewer BUG_ONs
* global: get rid of useless forward declarations
* blake2: include headers for macros
* tools: correct type for CTRL_ATTR_FAMILY_ID
Lots of style cleanups.
* crypto/avx: make sure we can actually use ymm registers
This fixes an issue on some Xen platforms that expose
conflicting CPU features.
* peer: get rid of peer_for_each magic
* peer: store total number of peers instead of iterating
A major cleanup of our peer iteration logic, getting rid
of a big ugly macro and clarifying our locking semantics.
* compat: be sure to include header before testing
* wg-quick: allow specifiying multiple hooks
You can now specify {Post,Pre}{Down,Up} multiple times, and
the commands will then run in succession.
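wg-quick(8) itself is a bash script; as a sketch of the semantics only, repeated hook keys can be collected into ordered lists. parse_hooks is hypothetical, matches keys case-sensitively, and ignores everything outside the [Interface] section.

```python
HOOKS = ("PreUp", "PostUp", "PreDown", "PostDown")


def parse_hooks(config_text: str) -> dict:
    """Collect every hook command from [Interface], preserving order."""
    hooks = {name: [] for name in HOOKS}
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Interface" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in hooks:
            hooks[key].append(value)  # repeated keys accumulate
    return hooks
```

Running them in succession is then a loop over, e.g., hooks["PostUp"].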
* wg-quick: remember to rewind DNS settings on failure
Small consistency fix.
* wg-quick: allow for saving existing interface
There is now a 'save' option for saving an existing
configuration without having to bring down the device.
* wg-quick: fsync the temporary file before renaming
In case the system loses power, you are now left with
either the old file or the new file but not an empty file.
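The write, fsync, then rename pattern behind this fix, sketched in Python rather than wg-quick's bash; atomic_write is a hypothetical helper. The temporary file is created in the target's directory because os.replace is only atomic within a single filesystem.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace path with data so a crash leaves either old or new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data reaches disk before the rename
        os.replace(tmp, path)     # atomic swap on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

mkstemp creates the file with mode 0600, which suits a config holding private keys. For stronger durability the containing directory can be fsynced as well; that step is omitted here.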
* wg-quick: allow for the hatchet, but not by default
In order to account for distributions that do not have an
implementation of resolvconf(8), the contrib directory ships
with an alternative implementation that may be patched in.
This was extensively discussed and debated on the mailing
list.
* device: only take reference if netns is different
Solves an important memory leak when tearing down network
namespaces that haven't moved the wireguard device.
* device: expand scope of destruct lock
* timers: guard entire setting in block
Just to be certain.
* curve25519: only enable int128 if compiler support is sound
Allows building for Aarch64 with old gcc (such as that used
by Android) where we don't want to branch to a __multi3.
* contrib: add reresolve-dns
A small script that's been passed around for a while now for
reresolving DNS entries from a cronjob.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Note: There were more conflicts in applying the
vmbus patches to 4.13. For now I've just skipped the
conflicting patches so the end-result may be that
Hyper-V sockets on 4.13 may break (if they were not
already broken by the update to 4.13.6).
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The patches are for vsock and hvsock and anyone using these
should be using more modern kernels.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
It's kinda obvious that these are kernel configuration files
and, looking at various other distros it seems more common
to call the files 'config-<foo>'.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Copying the entire local directory into the container allows
us to check for the existence of the patch directory and
only apply the patches if the directory exists.
An alternative would have been to re-arrange the patch directory
into a sub-directory, but in terms of copying it wouldn't have
made that much of a difference.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
NOTE: Some of the 4.13.x VMBus patches did not apply cleanly and they
were dropped for now. This may break LCOW and other Windows support.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- Enable ARCH_BCM2835
- Enable USB_NET_SMSC95XX.
- Compile in MII and USB_USBNET. These are needed
by the onboard network driver
- Enable the DWC2 USB controller.
- Enabled MMC, MMC_SDHCI, MMC_BCM2835 for SD card access
- Enable various BCM2835 platform devices: HW_RANDOM_BCM2835,
I2C_BCM2835, PINCTRL_BCM2835, DMA_BCM2835, BCM2835_MBOX,
WM_BCM2835, ...
- Enable SERIAL_8250 and friends.
- Enable FB_SIMPLE to get console output
The above configuration gives a minimal working system
with serial console access (via the GPIO pins), networking
and SD storage. The smsc95xx network driver does not
seem to get autoloaded. This is likely an mdev issue.
We specifically do not configure any WLAN,
sound or graphics drivers, as they would pull
too much other cruft into the kernel. To enable
these, we are considering adding a -rpi3 config, similar
to the -dbg config, to provide additional kernel
config options.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Simple version bump. Changes:
* noise: handshake constants can be read-only after init
* noise: no need to take the RCU lock if we're not dereferencing
* send: improve dead packet control flow
* receive: improve control flow
* socket: eliminate dead code
* device: our use of queues means this check is worthless
* device: no need to take lock for integer comparison
* blake2s: modernize API and have faster _final
* compat: support READ_ONCE
* compat: just make ro_after_init read_mostly
Assorted cleanups to the module, including nice things like marking our
precomputations as const.
* Makefile: even prettier output
* Makefile: do not clean before cloc
* selftest: better test index for rate limiter
* netns: disable accept_dad for all interfaces
Fixes in our testing and build infrastructure. Now works on the 4.14 rc
series.
* qemu: add build-only target
* qemu: work on ubuntu toolchain
* qemu: add more debugging options to main makefile
* qemu: simplify shutdown
* qemu: open /dev/console if we're started early
* qemu: phase out bitbanging
* qemu: always create directory before untarring
* qemu: newer packages
* qemu: put hvc directive into configuration
This is the beginning of working out a cross building test suite, so we do
several tricks to be more platform independent.
* tools: encoding: be more paranoid
* tools: retry resolution except when fatal
* tools: don't insist on having a private key
* tools: add pass example to wg-quick man page
* tools: style
* tools: newline after warning
* tools: account for padding being in zero attribute
Several important tools fixes, one of which suppresses a needless warning.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
'make firmware_install' adds the firmware blobs created
during the build to '/lib/firmware' in the result tarball.
These should be installed along with the kernel modules.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
By running:
./scripts/update-component-sha.sh --image linuxkit/alpine ad35b6ddbc70faa07e59a9d7dee7707c08122e8d
Signed-off-by: Ian Campbell <ijc@docker.com>
This is a useful read-only filesystem for images that is efficient and
small, as it supports compression.
For many use cases, when you are writing to media, it makes more sense than
using an initramfs as it does not require RAM, and it is more suitable for
disk media than ISO.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This new feature was disabled by default. Enable it, as it seems
sensible to have. From the documentation:
Detect overflows of buffers in common string and memory functions
where the compiler can determine and validate the buffer sizes.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The patches from 4.12 applied cleanly, except for 81304747d9
("Drivers: hv: vmbus: Fix rescind handling"), which was already
in upstream, so it has been dropped from the patch series.
The kernel config is from 4.12 run through defconfig/oldconfig to
pick up any new defaults.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Note: the instructions added in https://github.com/Microsoft/opengcs/pull/147
add a commit to revert another patch in this series. Instead of applying
c15d7f606f8 ("Revert "vmbus: destroy a hv_sock device only after the RESCIND_OFFER
is received"") we simply drop the original commit e37da6e7a52ea6 ("vmbus: destroy a
hv_sock device only after the RESCIND_OFFER is received") from our list.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
These drivers are for HPE SCSI cards and enabling them subsequently
enabled RAID_ATTRS and CHECK_SIGNATURE.
Only enabled for 4.9 and 4.12 kernels.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Depmod in the zfs makefiles will never run, as `/boot/` and the relevant map files don't exist in our build environments.
Included style suggestions by @rn
Signed-off-by: Matt Johnson <matjohn2@cisco.com>
These are the recommended patches for 4.12 for Hyper-V sockets
and LCOW. Based on: https://github.com/Microsoft/opengcs/pull/138
This also includes a cherry-pick from upstream which fixes the
ext4/nvdimm/pax failures we have seen since 4.11.2.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This enables per task (IO) accounting which is useful
for monitoring IO activity and the like.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This adds building the zfs-kmod package to the kernel build.
The zfs-kmod package contains the matching ZFS kernel modules
for a given kernel in /lib/modules/$(uname -r)/extra.
The zfs-kmod package also contains the standard kernel modules,
and depmod is run over them so that modprobe works.
The zfs-kmod package is not built by default due to unclear
licensing. Users will have to build it themselves.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Note this is not the latest ZFS version but the version matched
by the current alpine zfs utilities.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Note, on x86_64 for 4.12.9 a new kernel option,
HARDLOCKUP_CHECK_TIMESTAMP was added which defaults to enabled. It enables
a low pass filter to compensate for perf based hard lockup detection.
Added this option to the x86_64 4.12.x kernel config file.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Due to https://github.com/moby/moby/issues/34199 we can't supply
the FROM image via --build-arg and use DOCKER_CONTENT_TRUST=1 for build.
So we pull the image with DCT and then explicitly build it without.
This regression was introduced with 8b84baf2 ("kernel: Allow disabling content trust")
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
For some use cases, we may want to add additional kernel
configuration options (e.g. when adding AUFS). This commit
enables it by:
- renaming DEBUG to EXTRA
- appending kernel_config${EXTRA} to the kernel config
- allowing an EXTRA argument to be passed to the Makefile
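A minimal Python sketch of the append step, for illustration only (the real build does this in the Makefile and Dockerfile). `build_config` and `effective_options` are hypothetical names. The "later assignment wins" rule is an assumption that, as far as we know, mirrors how kconfig treats a symbol assigned twice in one file: it warns about the override and keeps the later value.

```python
import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set"
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def build_config(base: str, extra: str = "") -> str:
    """Append the optional kernel_config${EXTRA} fragment to the base config."""
    if not extra:
        return base
    sep = "" if not base or base.endswith("\n") else "\n"
    return base + sep + extra


def effective_options(config: str) -> dict:
    """Resolve each option, letting a later assignment override an earlier one."""
    opts = {}
    for line in config.splitlines():
        line = line.strip()
        if m := _SET.match(line):
            opts[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            opts[m.group(1)] = "n"
    return opts
```

With an empty EXTRA the base config is used unchanged. Appending a fragment such as `CONFIG_AUFS_FS=y` turns an earlier `# CONFIG_AUFS_FS is not set` into `y`.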
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Specifying NOTRUST=1 on the make command line disables
content trust just like with packages.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Without this change, recent Docker builds produce this warning:
[WARNING]: Empty continuation line found in:
RUN apk add xz xz-dev zlib-dev && if [ $(uname -m) == x86_64 ]; then apk add libunwind-dev;
fi
[WARNING]: Empty continuation lines will become errors in a future release.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- The x86_64 kernel config was derived from our 4.11 config
and then adjusted with the recent changes
- The arm64 kernel config was derived from the 4.9 config
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The VMBus/Hyper-V socket patches were partly taken from the now
defunct 4.11 tree and partly from the WIP 4.12 tree at:
https://github.com/dcui/linux/commits/decui/msft-4.12.y
From the 4.11 tree:
- 0001-tools-build-Add-test-for-sched_getcpu.patch
Does not apply, may not be needed anymore to compile perf
- 0002-vmbus-vmbus_open-reset-onchannel_callback-on-error.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0003-vmbus-add-the-matching-tasklet_enable-in-vmbus_close.patch
Already upstream: 5116f5e2e05cf("vmbus: re-enable channel tasklet")
- 0004-vmbus-remove-goto-error_clean_msglist-in-vmbus_open
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0005-vmbus-dynamically-enqueue-dequeue-a-channel-on-vmbus.patch
From the 4.11 patches
- 0006-hv_sock-implements-Hyper-V-transport-for-Virtual-Soc.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0007-VMCI-only-try-to-load-on-VMware-hypervisor.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0008-hv_sock-add-the-support-of-auto-loading.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0009-tools-hv_sock-2-simple-test-cases.patch
Dropped, this was just test code
- 0010-vmbus-introduce-in-place-packet-iterator.patch
Already upstream: f3dd3f4797652("vmbus: introduce in-place packet iterator")
- 0011-hvsock-fix-a-race-in-hvs_stream_dequeue.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0012-hvsock-fix-vsock_dequeue-enqueue_accept-race.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0013-Drivers-hv-vmbus-Fix-rescind-handling.patch
From the 4.11 patches
- 0014-vmbus-fix-hv_percpu_channel_deq-enq-race.patch
From the 4.11 patches
- 0015-vmbus-add-vmbus-onoffer-onoffer_rescind-sync.patch
From the 4.11 patches
- 0016-hv-sock-a-temporary-workaround-for-the-pending_send_.patch
DROPPED. Does not apply at all anymore. Was a hack anyway
- 0017-vmbus-fix-the-missed-signaling-in-hv_signal_on_read.patch
Applied manually from the 4.11 patches
- 0018-hv-sock-avoid-double-FINs-if-shutdown-is-called.patch
From https://github.com/dcui/linux/commits/decui/msft-4.12.y
- 0019-Added-vsock-transport-support-to-9pfs.patch
From the 4.11 patches
- 0020-NVDIMM-reducded-ND_MIN_NAMESPACE_SIZE-from-4MB-to-4K.patch
From the 4.11 patches
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The host side VSOCK implementation introduced with
0009-VSOCK-Introduce-vhost_vsock.ko.patch
does not compile due to vhost_vq_init_access not being defined.
VHOST support (including VHOST_VSOCK) was enabled with
86deeaff ("kernel: Bring 4.4 x86_64 kernel config more in line
with 4.9") but not compile tested. Having VHOST support in
itself is fine; it's just the VHOST_VSOCK portion which is not
available.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The kernel config for debug kernels is created by concatenating
config files, so we can't use diff to check it.
This fixes a regression introduced by:
9362de0a ("kernel: Verify kernel config")
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Note, vhost vsock is disabled on arm64 because it failed to compile.
'vhost_vq_init_access' was not defined, but with a quick check
I could not find where it was supposed to be defined.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The new Dockerfile.kconfig can be used, via the 'kconfig' make target,
to build a 'linuxkit/kconfig' image. This image contains the patched
source and default kernel configs for all supported kernels.
It's useful for updating the kernel config files.
While at it, also update the alpine base.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The kernel build currently downloads the source tar ball every
time, which is a little tedious when experimenting with kernel
configs or when compiling the kernel multiple times.
This commit adds a new 'fetch' make target which downloads the
kernel sources into ./sources. Then in the Dockerfile we add
the directory and only download the source if it is not present.
The tarball's signature is still checked on each build.
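A rough Python sketch of the cache-then-verify logic. It assumes the 'fetch' target has already placed tarballs in ./sources. `ensure_tarball`, `download` and `verify` are hypothetical stand-ins for the shell steps in the Dockerfile (for example a curl download and a gpg check). Only the download is skipped for a cached file; the signature check is never skipped.

```python
from pathlib import Path
from typing import Callable


def ensure_tarball(
    sources: Path,
    name: str,
    download: Callable[[Path], None],
    verify: Callable[[Path], bool],
) -> Path:
    """Return the tarball path, downloading it only if it is not cached.

    The signature is verified on every call, cached or not.
    """
    sources.mkdir(parents=True, exist_ok=True)
    tarball = sources / name
    if not tarball.exists():
        download(tarball)  # hypothetical: e.g. fetch from kernel.org
    if not verify(tarball):  # hypothetical: e.g. a gpg signature check
        raise RuntimeError(f"signature check failed for {tarball}")
    return tarball
```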
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Since we supply a full .config file we can check that after
make defconfig/oldconfig it hasn't changed. This should catch
cases where a config option has changed between releases.
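A sketch of such a check in Python. It assumes both configs are already available as text; file handling is left out, and the function names are hypothetical. It uses difflib's unified format, which is easier to read than plain diff output when the check fails. Any header comments in the regenerated .config are compared verbatim as well.

```python
import difflib


def config_diff(expected: str, actual: str) -> str:
    """Unified diff between the supplied config and the one after oldconfig.

    An empty string means make defconfig/oldconfig left the config unchanged.
    """
    lines = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    return "".join(lines)


def check_config(expected: str, actual: str) -> None:
    """Fail the build, showing the diff, if any option changed."""
    diff = config_diff(expected, actual)
    if diff:
        raise SystemExit("kernel config changed after oldconfig:\n" + diff)
```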
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This is a recommended security measure to protect the low portion
of virtual memory. On x86_64 the recommended value is 65536 while
for arm it shouldn't be higher than 32768.
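A hypothetical sanity check for this setting. It assumes the option in question is CONFIG_DEFAULT_MMAP_MIN_ADDR. The per-arch limits are the ones quoted above, and treating arm64 like arm is an assumption.

```python
import re

# Limits from the text above; mapping arm64 to the arm value is an assumption.
RECOMMENDED_MAX = {"x86_64": 65536, "arm": 32768, "arm64": 32768}

_OPT = re.compile(r"^CONFIG_DEFAULT_MMAP_MIN_ADDR=(\d+)$", re.MULTILINE)


def mmap_min_addr_ok(config: str, arch: str) -> bool:
    """True if the option is set, non-zero and not above the arch's limit."""
    m = _OPT.search(config)
    if m is None:
        return False  # option not set explicitly in this config
    value = int(m.group(1))
    return 0 < value <= RECOMMENDED_MAX[arch]
```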
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The resulting kernel boots fine on qemu and on Cavium Thunder,
though the latter still has some issues.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Enable DEVPTS_MULTIPLE_INSTANCES in kernel configuration file
to avoid the devpts mounting hang issue during bootup when
running LinuxKit.
Signed-off-by: Dennis Chen <dennis.chen@arm.com>
It has been EOLed today and won't receive any further updates.
The images are still on Hub, so they can continue to be used
for the time being.
4.12 support is coming soon.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- Adding NFS debug modules to kernel_config.debug
- Also updating some dead links in the kernels.md doc file
Signed-off-by: Dave Freitag <dcfreita@us.ibm.com>
This is a semi-educated guess of which kernel config options
may be needed to run LCOW based on the config file posted here:
2e5c2fac44/kernelconfig/4.11/kconfig_for_4_11
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- Enable modules for some common 10/40G NICs
from Broadcom, Intel, and Mellanox
- Enable KVM and related modules
These are targeted to support more bare metal
configurations with LinuxKit.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The original kernel Dockerfile hardcodes amd64 as the
only supported arch. This patch removes the hardcoding
and makes the Dockerfile ready to support both amd64 and
arm64 by using the runtime arch type.
Signed-off-by: Dennis Chen <dennis.chen@arm.com>
Added a new patch to the 4.11 and 4.9 kernels based on a patch
submitted to stable: https://patchwork.kernel.org/patch/9829039/
This patch fixes an off-by-one error in the VMBus code.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Otherwise files which have an updated timestamp but no actual changes are
marked as changes because `git diff-index` only uses the `lstat` result and not
the actual file contents. Running `git update-index --refresh` updates the
cache.
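A Python sketch of the dirty check, assuming it runs from the repository root. `repo_is_dirty` and `run_quiet` are hypothetical names; the git commands and flags are standard. The runner is injectable so the logic can be tested without a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_quiet(cmd: Sequence[str]) -> int:
    """Run a command, discard its output and return the exit status."""
    return subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def repo_is_dirty(run: Runner = run_quiet) -> bool:
    """Report whether tracked files differ from HEAD by content."""
    # Refresh the cached stat info first; -q keeps going when some entries
    # need updating, and the exit status is ignored here.
    run(["git", "update-index", "-q", "--refresh"])
    # --quiet implies --exit-code: 0 means clean, anything else (including
    # an error such as a missing HEAD) is treated as dirty.
    return run(["git", "diff-index", "--quiet", "HEAD", "--"]) != 0
```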
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
The definition of `$(TAG)` differs from pkg/package.mk and is only the
HASH+DIRTY since the full tag is defined by the kernel macro and varies for
each kernel.
Also `show-tag` is `show-tags` here due to the multiple builds. Individual
`show-tag_FOO` rules are provided similar to the `build_FOO` rules.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
- Combine 'sign' and 'push' targets like it is done for
package builds.
- Append '-dirty' to the tag if the repository is dirty.
- Don't push to hub if the repository is dirty.
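A minimal Python sketch of the tagging and push decision. It assumes the hash and the dirty flag are computed elsewhere, for example with a diff-index check. `release_plan` and `ReleasePlan` are hypothetical helpers and are not part of the Makefile.

```python
from typing import NamedTuple


class ReleasePlan(NamedTuple):
    tag: str
    push: bool  # sign and push together, and only for a clean tree


def release_plan(tree_hash: str, dirty: bool) -> ReleasePlan:
    """Hypothetical helper: the tag is HASH, plus '-dirty' for a dirty tree."""
    if dirty:
        return ReleasePlan(tag=f"{tree_hash}-dirty", push=False)
    return ReleasePlan(tag=tree_hash, push=True)
```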
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
In particular this contains 1be7107fbe18eed3e319 ("mm: larger stack
guard gap, between vmas") which is a fix for CVE-2017-1000364.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Module loading on hotplug and boot seems to work now, so
move some less commonly used kernel features and drivers
out of the kernel and into modules. Specifically:
- Devices: All non-virtual network device drivers
- Networking: GRE, GENEVE, PPP, non-essential IPv6 protos,
L2TP, MPLS_GSO, bonding, IPSec (XFRM), openvswitch,
queueing/schedulers
- FS: SUNRPC, NFS, NFSD, LOCKD, NTFS
- Misc: ATA over Ethernet
Remove Nozomi serial driver. It doesn't seem to be used
on any of our platforms.
The config files were also run through 'make defconfig &&
make oldconfig' to update any missing options.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
It clashes with libelf-dev but libelf-dev is sufficient
to compile the kernel. This also allows us to remove the
'|| true' from the 'apk add', catching errors with the
tools installation.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>