Commit d47b283df4 ("kernel: Remove fetch target") removed
the 'fetch' target to simplify the Makefile. This left
dependencies on 'sources' lingering. Remove them.
resolves #3333
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
With kernel 5.0.6 we start seeing compile errors such as:
HOSTCXX -fPIC scripts/gcc-plugins/randomize_layout_plugin.o
In file included from <stdin>:1:
/usr/include/libelf/libelf.h:28:5: error: "__LIBELF_INTERNAL__" is not defined, evaluates to 0 [-Werror=undef]
#if __LIBELF_INTERNAL__
^~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
elfutils-dev installs a different version of libelf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
All our 4.x kernels had CFQ enabled. This was removed
in 5.x and replaced with BFQ. Enable it.
resolves #3308
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
To reduce the number of kernels we maintain, for s390x
and arm64 we only support the latest LTS and newer kernels.
v4.19.x has been out for a while, so let's remove support for
v4.14.x.
resolves #3302
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This target allowed downloading the kernel source
tarballs locally. We haven't used this for a while and adding
v5.x kernel support for it would add yet another conditional.
Remove it to keep the Makefile simpler.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The build of the perf utility has been quite bothersome,
with different arches and kernel versions failing.
Since we now have the full kernel source in the package,
factor out the actual build into Dockerfile.perf.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
* wg-quick: freebsd: allow loopback to work
FreeBSD adds a route for point-to-point destination addresses. We don't
really want to specify any destination address, but unfortunately we
have to. Before we tried to cheat by giving our own address as the
destination, but this had the unfortunate effect of preventing
loopback from working on our local ip address. We work around this with
yet another kludge: we set the destination address to 127.0.0.1. Since
127.0.0.1 is already assigned to an interface, this has the same effect
of not specifying a destination address, and therefore we accomplish the
intended behavior. Note that the bad behavior is still present in Darwin,
where such a workaround does not exist.
* tools: remove unused check phony declaration
* highlighter: when subtracting char, cast to unsigned
* chacha20: name enums
* tools: fight compiler slightly harder
* tools: c_acc doesn't need to be initialized
* queueing: more reasonable allocator function convention
Usual nits.
* systemd: wg-quick should depend on nss-lookup.target
Since wg-quick(8) calls wg(8) which does hostname lookups, we should
probably only run this after we're allowed to look up hostnames.
* compat: backport ALIGN_DOWN
* noise: whiten the nanoseconds portion of the timestamp
This mitigates unrelated sidechannel attacks that think they can turn
WireGuard into a useful time oracle.
* hashtables: decouple hashtable allocations from the main device allocation
The hashtable allocations are quite large, and cause the device allocation in
the net framework to stall sometimes while it tries to find a contiguous
region that can fit the device struct. To fix the allocation stalls, decouple
the hashtable allocations from the device allocation and allocate the
hashtables with kvmalloc's implicit __GFP_NORETRY so that the allocations fall
back to vmalloc with little resistance.
* chacha20poly1305: permit unaligned strides on certain platforms
The map allocations required to fix this are mostly slower than unaligned
paths.
* noise: store clamped key instead of raw key
This causes `wg show` to now show the right thing. Useful for doing
comparisons.
* compat: ipv6_stub is sometimes null
On ancient kernels, ipv6_stub is sometimes null in cases where IPv6 has
been disabled with a command line flag or other failures.
* Makefile: don't duplicate code in install and modules-install
* Makefile: make the depmod path configurable
* queueing: net-next has changed signature of skb_probe_transport_header
A 5.1 change. This could change again, but for now it allows us to keep this
snapshot aligned with our upstream submissions.
* netlink: don't remove allowed ips for new peers
* peer: only synchronize_rcu_bh and traverse trie once when removing all peers
* allowedips: maintain per-peer list of allowedips
This is a rather big and important change that makes it much much faster to do
operations involving thousands of peers. Batch peer/allowedip addition and
clearing is several orders of magnitude faster now.
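
The timestamp whitening mentioned in the noise entry above can be illustrated with a minimal Python sketch. It assumes the nanoseconds field is rounded down to a power-of-two granularity. The GRANULARITY_NS value and the whiten_nsec name are hypothetical, chosen only for illustration and not taken from the WireGuard source.

```python
# Hypothetical granularity: 2**24 ns (about 16.8 ms). An assumption for this sketch.
GRANULARITY_NS = 1 << 24


def whiten_nsec(nsec, granularity=GRANULARITY_NS):
    """Round nsec down to a multiple of a power-of-two granularity."""
    if granularity <= 0 or granularity & (granularity - 1):
        raise ValueError("granularity must be a positive power of two")
    # Clearing the low bits discards the fine-grained timing information
    # that a side channel could otherwise read out of the timestamp.
    return nsec & ~(granularity - 1)
```

With this granularity, every nanosecond value inside the same 2**24 ns window maps to the same output, so the timestamp stays monotonic but reveals much less about the exact send time.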
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This skips 4.20.9/4.19.22/4.14.100/4.9.157 because they
contained a bug. See:
https://lwn.net/Articles/779934/
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
* tools: curve25519: handle unaligned loads/stores safely
This should fix sporadic crashes with `wg pubkey` on certain architectures.
* netlink: auth socket changes against namespace of socket
In WireGuard, the underlying UDP socket lives in the namespace where the
interface was created and doesn't move if the interface is moved. This
allows one to create the interface in some privileged place that has
Internet access, and then move it into a container namespace that only
has the WireGuard interface for egress. Consider the following
situation:
1. Interface created in namespace A. Socket therefore lives in namespace A.
2. Interface moved to namespace B. Socket remains in namespace A.
3. Namespace B now has access to the interface and changes the listen
port and/or fwmark of socket. Change is reflected in namespace A.
This behavior is arguably _fine_ and perhaps even expected or
acceptable. But there's also an argument to be made that B should have
A's cred to do so. So, this patch adds a simple ns_capable check.
* ratelimiter: build tests with !IPV6
Should reenable building in debug mode for systems without IPv6.
* noise: replace getnstimeofday64 with ktime_get_real_ts64
* ratelimiter: totalram_pages is now a function
* qemu: enable FP on MIPS
Linux 5.0 support.
* keygen-html: bring back pure javascript implementation
Benoît Viguier has proofs that values will stay well within 2^53. We
also have an improved carry function that's much simpler. Probably more
constant time than emscripten's 64-bit integers.
* contrib: introduce simple highlighter library
This is the highlighter library being used in:
- https://twitter.com/EdgeSecurity/status/1085294681003454465
- https://twitter.com/EdgeSecurity/status/1081953278248796165
It's included here as a contrib example, so that others can paste it into
their own GUI clients for having the same strictly validating highlighting.
* netlink: use __kernel_timespec for handshake time
This readies us for Y2038. See https://lwn.net/Articles/776435/ for more info.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Enable the STACKLEAK GCC plugin which erases the
kernel stack before returning from system calls.
This security option has a reported performance
hit of around 1%, which seems like a reasonable amount.
For more details see: https://outflux.net/blog/archives/2018/12/24/security-things-in-linux-v4-20/
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The kernel config was derived from the 4.19.13 kernel config
run through the 'make oldconfig' with all defaults accepted,
except for:
- NET_VENDOR_MICROCHIP (default 'y', set to 'n')
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We already have 4.9.x, 4.14.x, and 4.19.x as LTS releases.
4.9.x has a longer lifetime than 4.4.x as well and fewer security
fixes can be backported to 4.4.x. Remove 4.4.x.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
packet.net will soon have x86 and arm64 machines with NFPs.
Enable the driver for it.
The 4.9 kernel only has support for the NFP VF driver,
so don't enable it there.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
A previous commit removed support for 4.18.x kernels for
arm64 and s390x but did not remove the config files. Fix it.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The logic for perf became too complex. Just build for latest LTS
and latest stable.
Disable for arm64 for now as it is broken for 4.19 due to a header
mismatch:
In file included from /linux/tools/arch/arm64/include/uapi/asm/unistd.h:20:0,
from libbpf.c:36:
/linux/tools/include/uapi/asm-generic/unistd.h:754:0: error: "__NR_fcntl" redefined [-Werror]
In file included from /usr/include/sys/syscall.h:4:0,
from /linux/tools/perf/perf-sys.h:7,
from libbpf.c:35:
/usr/include/bits/syscall.h:26:0: note: this is the location of the previous definition
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The kernel configs were constructed by running the 4.18.x config
through the 4.19 oldconfig process.
The 4.19.x kernel has a new option, RANDOM_TRUST_CPU, which indicates
if the CPU's random instruction is to be trusted. It defaults to
"no" and this default was accepted.
Most of the defaults were accepted, except for:
BLK_CGROUP_IOLATENCY=y
NFT_TUNNEL=y
NFT_OSF=y
NFT_TPROXY=y
NETFILTER_XT_MATCH_SOCKET=y
NET_VENDOR_CADENCE=n
NET_VENDOR_NETERION=n
NET_VENDOR_PACKET_ENGINES=n
We also disallow CIFS for insecure legacy servers:
CIFS_ALLOW_INSECURE_LEGACY=n
For arm64, the following changes were made to the default:
SENSORS_RASPBERRYPI_HWMON=y
CRYPTO_DEV_QCOM_RNG=m
CRYPTO_DEV_HISI_SEC=m
For s390x, the following additional changes were made to the default:
KERNEL_BZIP2 (default is gzip)
GCC_PLUGINS=y
GCC_PLUGIN_STRUCTLEAK=y
GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
GCC_PLUGIN_RANDSTRUCT=y
GCC_PLUGIN_RANDSTRUCT_PERFORMANCE=y
Running the 4.18 and 4.19 kernel config through
./scripts/kconfig-split.py yields the following 4.19.x
only config options for x86_64:
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_BNXT_HWMON=y
CONFIG_BUILD_SALT=""
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_CRASH_CORE=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_MEMCG_KMEM=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_NETFILTER_NETLINK_OSF=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
CONFIG_NFT_OSF=y
CONFIG_NFT_TPROXY=y
CONFIG_NFT_TUNNEL=y
CONFIG_NF_SOCKET_IPV4=y
CONFIG_NF_SOCKET_IPV6=y
CONFIG_XEN_SCRUB_PAGES_DEFAULT=y
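
The idea of listing options that only appear in the newer config can be sketched in Python as follows. This is not the kconfig-split.py script itself, only a minimal illustration. The function names parse_config and only_in_new are hypothetical, and the sketch makes the simplifying assumption that comment lines (including "# CONFIG_X is not set") are ignored.

```python
def parse_config(text):
    """Map option names to values, skipping comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        options[name] = value
    return options


def only_in_new(old_text, new_text):
    """Return sorted 'NAME=value' entries set in new_text but absent from old_text."""
    old = parse_config(old_text)
    new = parse_config(new_text)
    return sorted(
        "{}={}".format(name, value)
        for name, value in new.items()
        if name not in old
    )
```

Called with the contents of an older and a newer config file, only_in_new returns the entries that are set only in the newer one, in the same sorted style as the list above.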
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
After 'make oldconfig' we check that the kernel config
is as we expect and error out if it is not. We used to print
the default 'diff' output on a mismatch but a unified diff
is easier to read.
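
For illustration only, the same kind of check can be sketched in Python with the standard difflib module. The build itself presumably relies on the 'diff' tool. The config_diff name and the "expected"/"actual" labels are hypothetical.

```python
import difflib


def config_diff(expected, actual):
    """Return a unified diff of two config texts, or '' if they match."""
    lines = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    return "".join(lines)
```

A caller would treat a non-empty return value as a mismatch, print it, and fail the build. The unified format shows each changed option as a removed line and an added line with a few lines of context around them.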
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
This cherry picks:
- b6fe0440c637 ("bridge: implement missing ndo_uninit()")
- b1b9d366028f ("bridge: move bridge multicast cleanup to ndo_uninit")
The fix is in b1b9d366028f ("bridge: move bridge multicast cleanup
to ndo_uninit") but it requires b6fe0440c637 ("bridge: implement missing
ndo_uninit()"). Furthermore, b1b9d366028f needed some manual resolution
of a cherry-pick conflict because the surrounding code had changed.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
We want to compile BCC for the latest LTS and the latest
stable and missed the update to 4.18 when enabling it. Do
it now.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
Note, this update skips 4.18.2/4.17.16/4.14.64/4.9.121/4.4.149
as the change was a single patch, a bug fix.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
In setup_net() there are a few particularly slow subsystems that
contribute more than 140ms of time to the new net namespace creation
path. The docker daemon doesn't depend on these, and won't modprobe
them into the kernel. Convert these to modules to reduce the amount of
time it takes for docker to start a container. This change takes an
additional ~120 ms of time off container start time.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
While investigating performance problems around 'docker run' times, it
was observed that a large amount of time was spent in network namespace
creation. Of that time, a large portion involved waiting for RCU grace
periods to elapse. Increasing HZ causes the periodic timer to check for
quiesced periods more frequently, which consequently reduces the amount
of time RCU callers spend waiting for grace periods and in barrier
waits.
By itself, this change took the amount of time to execute a 'docker run
hello-world' down to 570ms from over 2000ms on 4.14, and down to 390ms
from 1260ms on 4.17 and 4.18.
Signed-off-by: Krister Johansen <krister.johansen@oracle.com>
The kernel config was derived from the 4.17.x kernel config
and then tweaked a little. Specifically:
- Enable XDP_SOCKETS
- Enable NFT_CONNLIMIT
- Enable IP_VS_MH
- Enable BPFILTER (as module)
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The 4.14.63 kernel contains important security fixes, in particular
against L1TF (CVE-2018-3615, CVE-2018-3620, CVE-2018-3646) and
userspace-userspace SpectreRSB.
Signed-off-by: Rolf Neugebauer <rn@rneugeba.io>
The previous commit updated to 4.16.18, which is the last
4.16.x kernel. The 4.16.18 kernel was compiled and pushed
but we may as well now remove it as it has been EOLed.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
While we can re-create the kernel source code we don't have it
handily available in one place. This commit stashes the kernel
and the WireGuard source as /src/linux.tar.xz and
/src/wireguard.tar.xz in the kernel package.
This increases the size of the hub image by around 100MB.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Trying to keep the number of kernels we compile for these
platforms small and 4.16 is likely to be EOLed soon anyway.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The kernel configs are the 4.16.x configs run through
a 'make defconfig && make oldconfig' cycle.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Note, we skip 4.14.45 because 4.14.46 only has 3 patches
in it which unbreak 'perf' compilation.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This microcode bundle comes with a file called "list"
which seems to confuse the 'iucode_tool', so we just
remove it.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
The 4.14.38 kernel backported the Spectre mitigation, requiring
a change of the kernel config.
Might as well enable the mitigations by default.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
This is useful for some baremetal configs, such as using
USB sticks on a RPi3. I enabled it for x86_64 as well
to keep the differences smaller.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
Note, the dependent SERIAL_DEV_CTRL_TTYPORT defaults to
'N' with the 4.14.x kernel and 'Y' for the 4.16.x kernel.
I chose to stick with the defaults.
This may fix the serial console issue I've seen on the RPi3
with 4.14.x kernels.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
The s390x build VM we have access to is quite slow. Dropping
the 4.15.x kernel, which soon will be EOLed anyway, to
save some time.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
For example kernel module signatures if you do not provide a key. So add
to the dependencies for kernel builds.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
The kernel config is based on the 4.15.x kernel config
run through 'make defconfig && make oldconfig' and then
tweaked a little by hand.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
There are too many kernels to compile and arm64 takes a bit
too long to compile even on a beefy arm64 server.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
These fix some issues around hot-unplugging devices which may be the cause
of some LCOW issues we are seeing.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@gmail.com>
Enable the Integrity Measurement Architecture (IMA) for 4.14.x
and 4.15.x kernels. This pretty much uses the defaults except we
also enable INTEGRITY_ASYMMETRIC_KEYS and IMA_READ_POLICY. The
latter may be useful for debugging.
For s390x we also needed to enable TPM support.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
- Disable all network device drivers apart from Mellanox, which
is the only supported NIC on s390x
- Disable Fusion MPT
- Disable DAX/NVMEM/NVME
- Disable USB
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
While this now has some duplication, it is clearer as to which
kernels are compiled for each architecture.
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Update the build process to add s390 support.
The patch serial-forbid-8250-on-s390.patch has been added to disable
8250 serial for s390.
The patch is available upstream at https://patchwork.kernel.org/patch/10106437/
but it is not backported.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>