diff --git a/projects/README.md b/projects/README.md index 2861e2715..2a87f2321 100644 --- a/projects/README.md +++ b/projects/README.md @@ -19,6 +19,7 @@ If you want to create a project, please submit a pull request to create a new di - [Swarmd](swarmd) Standalone swarmkit based orchestrator - [Landlock LSM](landlock/) programmatic access control - [Clear Containers](clear-containers/) Clear Containers image +- [Logging](logging/) Experimental logging tools ## Current projects not yet documented - VMWare support (VMWare) diff --git a/projects/logging/README.md b/projects/logging/README.md new file mode 100644 index 000000000..2a56a2ace --- /dev/null +++ b/projects/logging/README.md @@ -0,0 +1,54 @@ +### Logging tools + +Experimental logging tools for linuxkit. + +This project currently provides three tools for system logs: `logwrite`, `logread` and `memlogd` (+ `startmemlogd` to run `memlogd` with `runc`). + +`memlogd` is the daemon that keeps logs in a circular buffer in memory. It is started automatically by `init`/`startmemlogd` in a runc container. It is passed two sockets - one that allows clients to dump/follow the logs and one that can be used to send open file descriptors to `memlogd`. When `memlogd` receives a file descriptor, it reads from that descriptor, timestamps the content, and appends it to the in-memory log until the file is closed. + +`logwrite` executes a command and sends its stdout and stderr to `memlogd`. It does this by opening a socketpair for stdout and stderr and sending the file descriptors to `memlogd` before executing the specified command. Output is also sent to the normal stdout/stderr. For example, `logwrite ls` both shows the output on the console and records it in the logs. + +`logread` connects to `memlogd` and dumps the ring buffer. 
Parameters `-f` and `-F` can be used to follow the logs and to disable the initial log dump (it behaves similarly to busybox’s `logread`). + +Init is modified to run all `onboot` and `service` containers wrapped in `logwrite` and to run `/usr/bin/startmemlogd`. + +New sockets: +`/tmp/memlogd.sock` - sock_dgram which accepts an fd and a null-terminated source description +`/tmp/memlogdq.sock` - sock_stream used to request a dump/follow of the logs + +Usage examples: +``` +/ # logread -f +2017-04-15T15:37:37Z memlogd memlogd started +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: waiting for carrier +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: carrier acquired +2017-04-15T15:37:37Z 002-dhcpcd.stdout DUID 00:01:00:01:20:84:fa:c1:02:50:00:00:00:24 +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: IAID 00:00:00:24 +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: adding address fe80::84e3:ca52:2590:fe80 +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: soliciting an IPv6 router +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: soliciting a DHCP lease +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: offered 192.168.65.37 from 192.168.65.1 `vpnkit' +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: leased 192.168.65.37 for 7199 seconds +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: adding route to 192.168.65.0/24 +2017-04-15T15:37:37Z 002-dhcpcd.stdout eth0: adding default route via 192.168.65.1 +2017-04-15T15:37:37Z 002-dhcpcd.stdout exiting due to oneshot +2017-04-15T15:37:37Z 002-dhcpcd.stdout dhcpcd exited +2017-04-15T15:37:37Z rngd.stderr Unable to open file: /dev/tpm0 +^C +/ # logwrite echo testing123 +testing123 +/ # logread | tail -n1 +2017-04-15T15:37:45Z echo.stdout testing123 +/ # echo -en "GET / HTTP/1.0\n\n" | nc localhost 80 > /dev/null +/ # logread | grep nginx +2017-04-15T15:42:40Z nginx.stdout 127.0.0.1 - - [15/Apr/2017:15:42:40 +0000] "GET / HTTP/1.0" 200 612 "-" "-" "-" +``` + +Current issues and limitations: + +- The moby tool only supports onboot and service containers. 
`memlogd` runs as a special container that is managed by init, as it needs fds created in advance. To work around this, a memlogd container is exported during build. The init section in the yml is used to extract it to `/containers/init/memlogd` with a pre-created `config.json`. +- No docker logger plugin support yet. It would be nice to add support to memlogd, so that docker container logs are also gathered in one place. +- No syslog compatibility at the moment, and `/dev/log` doesn’t exist. This socket could be created to keep syslog compatibility, e.g. by using https://github.com/mcuadros/go-syslog. Processes that require syslog should then be able to log directly to memlogd. +- Kernel messages are not read on startup yet (but can be captured with `logwrite dmesg`). +- No direct external hooks are exposed at the moment, though some could be added. It should also be possible to pipe output from `logread` to e.g. `oklog` (https://github.com/oklog/oklog). + diff --git a/projects/logging/examples/logging.yml b/projects/logging/examples/logging.yml new file mode 100644 index 000000000..6a42c00c4 --- /dev/null +++ b/projects/logging/examples/logging.yml @@ -0,0 +1,60 @@ +kernel: + image: "mobylinux/kernel:4.9.x" + cmdline: "console=ttyS0 console=tty0 page_poison=1" +init: + - linuxkit/init:b5c88b78cd9cc73ed83b45f66bc9de618223768a # with runc, logwrite, startmemlogd + - mobylinux/runc:b0fb122e10dbb7e4e45115177a61a3f8d68c19a9 + - mobylinux/containerd:18eaf72f3f4f9a9f29ca1951f66df701f873060b # unmodified containerd, from pre pr #1636 + - mobylinux/ca-certificates:eabc5a6e59f05aa91529d80e9a595b85b046f935 + - linuxkit/memlogd:9b5834189f598f43c507f6938077113906f51012 +onboot: + - name: sysctl + image: "mobylinux/sysctl:2cf2f9d5b4d314ba1bfc22b2fe931924af666d8c" + net: host + pid: host + ipc: host + capabilities: + - CAP_SYS_ADMIN + readonly: true + - name: binfmt + image: "linuxkit/binfmt:8881283ac627be1542811bd25c85e7782aebc692" + binds: + - 
/proc/sys/fs/binfmt_misc:/binfmt_misc + readonly: true + - name: dhcpcd + image: "linuxkit/dhcpcd:48e249ebef6a521eed886b3bce032db69fbb4afa" + binds: + - /var:/var + - /tmp/etc:/etc + capabilities: + - CAP_NET_ADMIN + - CAP_NET_BIND_SERVICE + - CAP_NET_RAW + net: host + command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"] +services: + - name: rngd + image: "mobylinux/rngd:3dad6dd43270fa632ac031e99d1947f20b22eec9" + capabilities: + - CAP_SYS_ADMIN + oomScoreAdj: -800 + readonly: true + - name: nginx + image: "nginx:alpine" + capabilities: + - CAP_NET_BIND_SERVICE + - CAP_CHOWN + - CAP_SETUID + - CAP_SETGID + - CAP_DAC_OVERRIDE + net: host +files: + - path: etc/docker/daemon.json + contents: '{"debug": true}' +trust: + image: + - mobylinux/kernel +outputs: + - format: kernel+initrd + - format: iso-bios + - format: iso-efi diff --git a/projects/logging/pkg/init/.gitignore b/projects/logging/pkg/init/.gitignore new file mode 100644 index 000000000..cf40cde3b --- /dev/null +++ b/projects/logging/pkg/init/.gitignore @@ -0,0 +1,2 @@ +sbin/ +usr/ diff --git a/projects/logging/pkg/init/Dockerfile b/projects/logging/pkg/init/Dockerfile new file mode 100644 index 000000000..92dea3588 --- /dev/null +++ b/projects/logging/pkg/init/Dockerfile @@ -0,0 +1,9 @@ +FROM alpine:3.5 + +RUN \ + apk --no-cache update && \ + apk --no-cache upgrade -a && \ + apk --no-cache add \ + && rm -rf /var/cache/apk/* + +COPY . 
./ diff --git a/projects/logging/pkg/init/Makefile b/projects/logging/pkg/init/Makefile new file mode 100644 index 000000000..6cf6e7495 --- /dev/null +++ b/projects/logging/pkg/init/Makefile @@ -0,0 +1,38 @@ +C_COMPILE=linuxkit/c-compile:63b085bbaec1aa7c42a7bd22a4b1c350d900617d@sha256:286e3a729c7a0b1a605ae150235416190f9f430c29b00e65fa50ff73158998e5 +START_STOP_DAEMON=sbin/start-stop-daemon + +default: push + +$(START_STOP_DAEMON): start-stop-daemon.c + mkdir -p $(dir $@) + tar cf - $^ | docker run --rm --net=none --log-driver=none -i $(C_COMPILE) -o $@ | tar xf - + +.PHONY: tag push + +BASE=alpine:3.5 +IMAGE=init + +ETC=$(shell find etc -type f) + +hash: Dockerfile $(ETC) init $(START_STOP_DAEMON) + DOCKER_CONTENT_TRUST=1 docker pull $(BASE) + tar cf - $^ | docker build --no-cache -t $(IMAGE):build - + docker run --rm $(IMAGE):build sh -c 'cat $^ /lib/apk/db/installed | sha1sum' | sed 's/ .*//' > $@ + +push: hash + docker pull linuxkit/$(IMAGE):$(shell cat hash) || \ + (docker tag $(IMAGE):build linuxkit/$(IMAGE):$(shell cat hash) && \ + docker push linuxkit/$(IMAGE):$(shell cat hash)) + docker rmi $(IMAGE):build + rm -f hash + +tag: hash + docker pull linuxkit/$(IMAGE):$(shell cat hash) || \ + docker tag $(IMAGE):build linuxkit/$(IMAGE):$(shell cat hash) + docker rmi $(IMAGE):build + rm -f hash + +clean: + rm -rf hash sbin usr + +.DELETE_ON_ERROR: diff --git a/projects/logging/pkg/init/etc/init.d/containerd b/projects/logging/pkg/init/etc/init.d/containerd new file mode 100755 index 000000000..f62710d7e --- /dev/null +++ b/projects/logging/pkg/init/etc/init.d/containerd @@ -0,0 +1,9 @@ +#!/bin/sh + +# bring up containerd +ulimit -n 1048576 +ulimit -p unlimited + +printf "\nStarting containerd\n" +mkdir -p /var/log +exec /usr/bin/containerd diff --git a/projects/logging/pkg/init/etc/init.d/containers b/projects/logging/pkg/init/etc/init.d/containers new file mode 100755 index 000000000..c43677628 --- /dev/null +++ b/projects/logging/pkg/init/etc/init.d/containers 
@@ -0,0 +1,36 @@ +#!/bin/sh + +# start memlogd container + +/usr/bin/startmemlogd + +# start onboot containers, run to completion + +if [ -d /containers/onboot ] +then + for f in $(find /containers/onboot -mindepth 1 -maxdepth 1 | sort) + do + base="$(basename $f)" + /bin/mount --bind "$f/rootfs" "$f/rootfs" + mount -o remount,rw "$f/rootfs" + /usr/bin/logwrite -n "$(basename $f)" /usr/bin/runc run --bundle "$f" "$(basename $f)" + printf " - $base\n" + done +fi + +# start service containers + +if [ -d /containers/services ] +then + for f in $(find /containers/services -mindepth 1 -maxdepth 1 | sort) + do + base="$(basename $f)" + /bin/mount --bind "$f/rootfs" "$f/rootfs" + mount -o remount,rw "$f/rootfs" + log="/var/log/$base.log" + /usr/bin/logwrite -n "$(basename $f)" /sbin/start-stop-daemon --start --pidfile /run/$base.pid --exec /usr/bin/runc -- run --bundle "$f" --pid-file /run/$base.pid "$(basename $f)" $log >$log & + printf " - $base\n" + done +fi + +wait diff --git a/projects/logging/pkg/init/etc/init.d/rcS b/projects/logging/pkg/init/etc/init.d/rcS new file mode 100755 index 000000000..fdd1faea4 --- /dev/null +++ b/projects/logging/pkg/init/etc/init.d/rcS @@ -0,0 +1,114 @@ +#!/bin/sh + +# mount filesystems +mkdir -p -m 0755 /proc /run /tmp /sys /dev + +mount -n -t proc proc /proc -o nodev,nosuid,noexec,relatime + +mount -n -t tmpfs tmpfs /run -o nodev,nosuid,noexec,relatime,size=10%,mode=755 +mount -n -t tmpfs tmpfs /tmp -o nodev,nosuid,noexec,relatime,size=10%,mode=1777 + +# mount devfs +mount -n -t devtmpfs dev /dev -o nosuid,noexec,relatime,size=10m,nr_inodes=248418,mode=755 +# devices +[ -c /dev/console ] || mknod -m 600 /dev/console c 5 1 +[ -c /dev/tty1 ] || mknod -m 620 /dev/tty1 c 4 1 +[ -c /dev/tty ] || mknod -m 666 /dev/tty c 5 0 + +[ -c /dev/null ] || mknod -m 666 /dev/null c 1 3 +[ -c /dev/kmsg ] || mknod -m 660 /dev/kmsg c 1 11 + +# extra symbolic links not provided by default +[ -e /dev/fd ] || ln -snf /proc/self/fd /dev/fd +[ -e /dev/stdin 
] || ln -snf /proc/self/fd/0 /dev/stdin +[ -e /dev/stdout ] || ln -snf /proc/self/fd/1 /dev/stdout +[ -e /dev/stderr ] || ln -snf /proc/self/fd/2 /dev/stderr +[ -e /proc/kcore ] && ln -snf /proc/kcore /dev/core + +# devfs filesystems +mkdir -p -m 1777 /dev/mqueue +mkdir -p -m 1777 /dev/shm +mkdir -p -m 0755 /dev/pts +mount -n -t mqueue -o noexec,nosuid,nodev mqueue /dev/mqueue +mount -n -t tmpfs -o noexec,nosuid,nodev,mode=1777 shm /dev/shm +mount -n -t devpts -o noexec,nosuid,gid=5,mode=0620 devpts /dev/pts + +# mount sysfs +sysfs_opts=nodev,noexec,nosuid +mount -n -t sysfs -o ${sysfs_opts} sysfs /sys +[ -d /sys/kernel/security ] && mount -n -t securityfs -o ${sysfs_opts} securityfs /sys/kernel/security +[ -d /sys/kernel/debug ] && mount -n -t debugfs -o ${sysfs_opts} debugfs /sys/kernel/debug +[ -d /sys/kernel/config ] && mount -n -t configfs -o ${sysfs_opts} configfs /sys/kernel/config +[ -d /sys/fs/fuse/connections ] && mount -n -t fusectl -o ${sysfs_opts} fusectl /sys/fs/fuse/connections +[ -d /sys/fs/selinux ] && mount -n -t selinuxfs -o nosuid,noexec selinuxfs /sys/fs/selinux +[ -d /sys/fs/pstore ] && mount -n -t pstore pstore -o ${sysfs_opts} /sys/fs/pstore +[ -d /sys/firmware/efi/efivars ] && mount -n -t efivarfs -o ro,${sysfs_opts} efivarfs /sys/firmware/efi/efivars + +# misc /proc mounted fs +[ -d /proc/sys/fs/binfmt_misc ] && mount -t binfmt_misc -o nodev,noexec,nosuid binfmt_misc /proc/sys/fs/binfmt_misc + +# mount cgroups +mount -n -t tmpfs -o nodev,noexec,nosuid,mode=755,size=10m cgroup_root /sys/fs/cgroup + +while read name hier groups enabled rest +do + case "${enabled}" in + 1) mkdir -p /sys/fs/cgroup/${name} + mount -n -t cgroup -o ${sysfs_opts},${name} ${name} /sys/fs/cgroup/${name} + ;; + esac +done < /proc/cgroups + +# use hierarchy for memory +echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy + +# for compatibility +mkdir -p /sys/fs/cgroup/systemd +mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd + +# start mdev for 
hotplug +echo "/sbin/mdev" > /proc/sys/kernel/hotplug + +# mdev -s will not create /dev/usb[1-9] devices with recent kernels +# so we trigger hotplug events for usb for now +for i in $(find /sys/devices -name 'usb[0-9]*'); do + [ -e $i/uevent ] && echo add > $i/uevent +done + +mdev -s + +# set hostname +if [ -s /etc/hostname ] +then + hostname -F /etc/hostname +fi + +if [ $(hostname) = "moby" -a -f /sys/class/net/eth0/address ] +then + mac=$(cat /sys/class/net/eth0/address) + hostname moby-$(echo $mac | sed 's/://g') +fi + +# set system clock from hwclock +hwclock --hctosys --utc + +# bring up loopback interface +ip addr add 127.0.0.1/8 dev lo brd + scope host +ip route add 127.0.0.0/8 dev lo scope host +ip link set lo up + +# for containerising dhcpcd and other containers that need writable etc +mkdir /tmp/etc +mv /etc/resolv.conf /tmp/etc/resolv.conf +ln -snf /tmp/etc/resolv.conf /etc/resolv.conf + +# remount rootfs as readonly +mount -o remount,ro / + +# make /var writeable and shared +mount -o bind /var /var +mount -o remount,rw,nodev,nosuid,noexec,relatime /var /var +mount --make-rshared /var + +# make / rshared +mount --make-rshared / diff --git a/projects/logging/pkg/init/etc/inittab b/projects/logging/pkg/init/etc/inittab new file mode 100644 index 000000000..8ef3e8565 --- /dev/null +++ b/projects/logging/pkg/init/etc/inittab @@ -0,0 +1,15 @@ +# /etc/inittab + +::sysinit:/etc/init.d/rcS +::once:/etc/init.d/containerd +::once:/etc/init.d/containers + +# Stuff to do for the 3-finger salute +::ctrlaltdel:/sbin/reboot + +# Stuff to do before rebooting +::shutdown:/usr/sbin/killall5 -15 +::shutdown:/bin/sleep 5 +::shutdown:/usr/sbin/killall5 -9 +::shutdown:/bin/echo "Unmounting filesystems" +::shutdown:/bin/umount -a -r diff --git a/projects/logging/pkg/init/etc/issue b/projects/logging/pkg/init/etc/issue new file mode 100644 index 000000000..ac3f79e41 --- /dev/null +++ b/projects/logging/pkg/init/etc/issue @@ -0,0 +1,12 @@ + +Welcome to LinuxKit + + ## . 
+ ## ## ## == + ## ## ## ## ## === + /"""""""""""""""""\___/ === + ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ / ===- ~~~ + \______ o __/ + \ \ __/ + \____\_______/ + diff --git a/projects/logging/pkg/init/init b/projects/logging/pkg/init/init new file mode 100755 index 000000000..f27b647b0 --- /dev/null +++ b/projects/logging/pkg/init/init @@ -0,0 +1,45 @@ +#!/bin/sh + +setup_console() { + tty=${1%,*} + speed=${1#*,} + inittab="$2" + securetty="$3" + line= + term="linux" + [ "$speed" = "$1" ] && speed=115200 + + case "$tty" in + ttyS*|ttyAMA*|ttyUSB*|ttyMFD*) + line="-L" + term="vt100" + ;; + tty?) + line="" + speed="38400" + term="" + ;; + esac + # skip consoles already in inittab + grep -q "^$tty:" "$inittab" && return + + echo "$tty::once:cat /etc/issue" >> "$inittab" + echo "$tty::respawn:/sbin/getty -n -l /bin/sh $line $speed $tty $term" >> "$inittab" + if ! grep -q -w "$tty" "$securetty"; then + echo "$tty" >> "$securetty" + fi +} + +/bin/mount -t tmpfs tmpfs /mnt + +/bin/cp -a / /mnt 2>/dev/null + +/bin/mount -t proc -o noexec,nosuid,nodev proc /proc +for opt in $(cat /proc/cmdline); do + case "$opt" in + console=*) + setup_console ${opt#console=} /mnt/etc/inittab /mnt/etc/securetty;; + esac +done + +exec /bin/busybox switch_root /mnt /sbin/init diff --git a/projects/logging/pkg/init/start-stop-daemon.c b/projects/logging/pkg/init/start-stop-daemon.c new file mode 100644 index 000000000..f27406746 --- /dev/null +++ b/projects/logging/pkg/init/start-stop-daemon.c @@ -0,0 +1,1054 @@ +/* + * A rewrite of the original Debian's start-stop-daemon Perl script + * in C (faster - it is executed many times during system startup). + * + * Written by Marek Michalkiewicz , + * public domain. Based conceptually on start-stop-daemon.pl, by Ian + * Jackson . May be used and distributed + * freely for any purpose. Changes by Christian Schwarz + * , to make output conform to the Debian + * Console Message Standard, also placed in public domain. 
Minor + * changes by Klee Dienes , also placed in the Public + * Domain. + * + * Changes by Ben Collins , added --chuid, --background + * and --make-pidfile options, placed in public domain aswell. + * + * Port to OpenBSD by Sontri Tomo Huynh + * and Andreas Schuldei + * + * Changes by Ian Jackson: added --retry (and associated rearrangements). + * + * Modified for Gentoo rc-scripts by Donny Davies : + * I removed the BSD/Hurd/OtherOS stuff, added #include + * and stuck in a #define VERSION "1.9.18". Now it compiles without + * the whole automake/config.h dance. + * + * Modified to compile on Alpine by Justin Cormack + */ + +#include +#define VERSION "1.9.18" + +#define MIN_POLL_INTERVAL 20000 /*us*/ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static int testmode = 0; +static int quietmode = 0; +static int exitnodo = 1; +static int start = 0; +static int stop = 0; +static int background = 0; +static int mpidfile = 0; +static int signal_nr = 15; +static const char *signal_str = NULL; +static int user_id = -1; +static int runas_uid = -1; +static int runas_gid = -1; +static const char *userspec = NULL; +static char *changeuser = NULL; +static const char *changegroup = NULL; +static char *changeroot = NULL; +static const char *cmdname = NULL; +static char *execname = NULL; +static char *startas = NULL; +static const char *pidfile = NULL; +static char what_stop[1024]; +static const char *schedule_str = NULL; +static const char *progname = ""; +static int nicelevel = 0; + +static struct stat exec_stat; + +struct pid_list { + struct pid_list *next; + pid_t pid; +}; + +static struct pid_list *found = NULL; +static struct pid_list *killed = NULL; + +struct schedule_item { + enum { sched_timeout, sched_signal, sched_goto, sched_forever } type; + int value; /* seconds, signal no., or index into 
array */ + /* sched_forever is only seen within parse_schedule and callees */ +}; + +static int schedule_length; +static struct schedule_item *schedule = NULL; + +LIST_HEAD(namespace_head, namespace); + +struct namespace { + LIST_ENTRY(namespace) list; + char *path; + int nstype; +}; + +static struct namespace_head namespace_head; + +static void *xmalloc(int size); +static void push(struct pid_list **list, pid_t pid); +static void do_help(void); +static void parse_options(int argc, char * const *argv); +static int pid_is_user(pid_t pid, uid_t uid); +static int pid_is_cmd(pid_t pid, const char *name); +static void check(pid_t pid); +static void do_pidfile(const char *name); +static void do_stop(int signal_nr, int quietmode, + int *n_killed, int *n_notkilled, int retry_nr); +static int pid_is_exec(pid_t pid, const struct stat *esb); + +#ifdef __GNUC__ +static void fatal(const char *format, ...) + __attribute__((noreturn, format(printf, 1, 2))); +static void badusage(const char *msg) + __attribute__((noreturn)); +#else +static void fatal(const char *format, ...); +static void badusage(const char *msg); +#endif + +/* This next part serves only to construct the TVCALC macro, which + * is used for doing arithmetic on struct timeval's. It works like this: + * TVCALC(result, expression); + * where result is a struct timeval (and must be an lvalue) and + * expression is the single expression for both components. In this + * expression you can use the special values TVELEM, which when fed a + * const struct timeval* gives you the relevant component, and + * TVADJUST. TVADJUST is necessary when subtracting timevals, to make + * it easier to renormalise. Whenver you subtract timeval elements, + * you must make sure that TVADJUST is added to the result of the + * subtraction (before any resulting multiplication or what have you). + * TVELEM must be linear in TVADJUST. 
+ */ +typedef long tvselector(const struct timeval*); +static long tvselector_sec(const struct timeval *tv) { return tv->tv_sec; } +static long tvselector_usec(const struct timeval *tv) { return tv->tv_usec; } +#define TVCALC_ELEM(result, expr, sec, adj) \ +{ \ + const long TVADJUST = adj; \ + long (*const TVELEM)(const struct timeval*) = tvselector_##sec; \ + (result).tv_##sec = (expr); \ +} +#define TVCALC(result,expr) \ +do { \ + TVCALC_ELEM(result, expr, sec, (-1)); \ + TVCALC_ELEM(result, expr, usec, (+1000000)); \ + (result).tv_sec += (result).tv_usec / 1000000; \ + (result).tv_usec %= 1000000; \ +} while(0) + + +static void +fatal(const char *format, ...) +{ + va_list arglist; + + fprintf(stderr, "%s: ", progname); + va_start(arglist, format); + vfprintf(stderr, format, arglist); + va_end(arglist); + putc('\n', stderr); + exit(2); +} + + +static void * +xmalloc(int size) +{ + void *ptr; + + ptr = malloc(size); + if (ptr) + return ptr; + fatal("malloc(%d) failed", size); +} + +static void +xgettimeofday(struct timeval *tv) +{ + if (gettimeofday(tv,0) != 0) + fatal("gettimeofday failed: %s", strerror(errno)); +} + +static void +push(struct pid_list **list, pid_t pid) +{ + struct pid_list *p; + + p = xmalloc(sizeof(*p)); + p->next = *list; + p->pid = pid; + *list = p; +} + +static void +clear(struct pid_list **list) +{ + struct pid_list *here, *next; + + for (here = *list; here != NULL; here = next) { + next = here->next; + free(here); + } + + *list = NULL; +} + +static char * +next_dirname(const char *s) +{ + char *cur; + + cur = (char *)s; + + if (*cur != '\0') { + for (; *cur != '/'; ++cur) + if (*cur == '\0') + return cur; + + for (; *cur == '/'; ++cur) + ; + } + + return cur; +} + +static void +add_namespace(const char *path) +{ + int nstype; + char *nsdirname, *nsname, *cur; + struct namespace *namespace; + + cur = (char *)path; + nsdirname = nsname = ""; + + while ((cur = next_dirname(cur))[0] != '\0') { + nsdirname = nsname; + nsname = cur; + } + + if 
(!memcmp(nsdirname, "ipcns/", strlen("ipcns/"))) + nstype = CLONE_NEWIPC; + else if (!memcmp(nsdirname, "netns/", strlen("netns/"))) + nstype = CLONE_NEWNET; + else if (!memcmp(nsdirname, "utcns/", strlen("utcns/"))) + nstype = CLONE_NEWUTS; + else + badusage("invalid namespace path"); + + namespace = xmalloc(sizeof(*namespace)); + namespace->path = (char *)path; + namespace->nstype = nstype; + LIST_INSERT_HEAD(&namespace_head, namespace, list); +} + +#ifdef HAVE_LXC +static void +set_namespaces() +{ + struct namespace *namespace; + int fd; + + LIST_FOREACH(namespace, &namespace_head, list) { + if ((fd = open(namespace->path, O_RDONLY)) == -1) + fatal("open namespace %s: %s", namespace->path, strerror(errno)); + if (setns(fd, namespace->nstype) == -1) + fatal("setns %s: %s", namespace->path, strerror(errno)); + } +} +#else +static void +set_namespaces() +{ + if (!LIST_EMPTY(&namespace_head)) + fatal("LXC namespaces not supported"); +} +#endif + +static void +do_help(void) +{ + printf( +"start-stop-daemon " VERSION " for Debian - small and fast C version written by\n" +"Marek Michalkiewicz , public domain.\n" +"\n" +"Usage:\n" +" start-stop-daemon -S|--start options ... 
-- arguments ...\n" +" start-stop-daemon -K|--stop options ...\n" +" start-stop-daemon -H|--help\n" +" start-stop-daemon -V|--version\n" +"\n" +"Options (at least one of --exec|--pidfile|--user is required):\n" +" -x|--exec program to start/check if it is running\n" +" -p|--pidfile pid file to check\n" +" -c|--chuid \n" +" change to this user/group before starting process\n" +" -u|--user | stop processes owned by this user\n" +" -n|--name stop processes with this name\n" +" -s|--signal signal to send (default TERM)\n" +" -a|--startas program to start (default is )\n" +" -N|--nicelevel add incr to the process's nice level\n" +" -b|--background force the process to detach\n" +" -m|--make-pidfile create the pidfile before starting\n" +" -R|--retry check whether processes die, and retry\n" +" -t|--test test mode, don't do anything\n" +" -o|--oknodo exit status 0 (not 1) if nothing done\n" +" -q|--quiet be more quiet\n" +" -v|--verbose be more verbose\n" +"Retry is |//... where is one of\n" +" -|[-] send that signal\n" +" wait that many seconds\n" +" forever repeat remainder forever\n" +"or may be just , meaning //KILL/\n" +"\n" +"Exit status: 0 = done 1 = nothing done (=> 0 if --oknodo)\n" +" 3 = trouble 2 = with --retry, processes wouldn't die\n"); +} + + +static void +badusage(const char *msg) +{ + if (msg) + fprintf(stderr, "%s: %s\n", progname, msg); + fprintf(stderr, "Try `%s --help' for more information.\n", progname); + exit(3); +} + +struct sigpair { + const char *name; + int signal; +}; + +const struct sigpair siglist[] = { + { "ABRT", SIGABRT }, + { "ALRM", SIGALRM }, + { "FPE", SIGFPE }, + { "HUP", SIGHUP }, + { "ILL", SIGILL }, + { "INT", SIGINT }, + { "KILL", SIGKILL }, + { "PIPE", SIGPIPE }, + { "QUIT", SIGQUIT }, + { "SEGV", SIGSEGV }, + { "TERM", SIGTERM }, + { "USR1", SIGUSR1 }, + { "USR2", SIGUSR2 }, + { "CHLD", SIGCHLD }, + { "CONT", SIGCONT }, + { "STOP", SIGSTOP }, + { "TSTP", SIGTSTP }, + { "TTIN", SIGTTIN }, + { "TTOU", SIGTTOU } +}; + +static 
int parse_integer (const char *string, int *value_r) { + unsigned long ul; + char *ep; + + if (!string[0]) + return -1; + + ul= strtoul(string,&ep,10); + if (ul > INT_MAX || *ep != '\0') + return -1; + + *value_r= ul; + return 0; +} + +static int parse_signal (const char *signal_str, int *signal_nr) +{ + unsigned int i; + + if (parse_integer(signal_str, signal_nr) == 0) + return 0; + + for (i = 0; i < sizeof (siglist) / sizeof (siglist[0]); i++) { + if (strcmp (signal_str, siglist[i].name) == 0) { + *signal_nr = siglist[i].signal; + return 0; + } + } + return -1; +} + +static void +parse_schedule_item(const char *string, struct schedule_item *item) { + const char *after_hyph; + + if (!strcmp(string,"forever")) { + item->type = sched_forever; + } else if (isdigit(string[0])) { + item->type = sched_timeout; + if (parse_integer(string, &item->value) != 0) + badusage("invalid timeout value in schedule"); + } else if ((after_hyph = string + (string[0] == '-')) && + parse_signal(after_hyph, &item->value) == 0) { + item->type = sched_signal; + } else { + badusage("invalid schedule item (must be [-], " + "-, or `forever'"); + } +} + +static void +parse_schedule(const char *schedule_str) { + char item_buf[20]; + const char *slash; + int count, repeatat; + ptrdiff_t str_len; + + count = 0; + for (slash = schedule_str; *slash; slash++) + if (*slash == '/') + count++; + + schedule_length = (count == 0) ? 4 : count+1; + schedule = xmalloc(sizeof(*schedule) * schedule_length); + + if (count == 0) { + schedule[0].type = sched_signal; + schedule[0].value = signal_nr; + parse_schedule_item(schedule_str, &schedule[1]); + if (schedule[1].type != sched_timeout) { + badusage ("--retry takes timeout, or schedule list" + " of at least two items"); + } + schedule[2].type = sched_signal; + schedule[2].value = SIGKILL; + schedule[3]= schedule[1]; + } else { + count = 0; + repeatat = -1; + while (schedule_str != NULL) { + slash = strchr(schedule_str,'/'); + str_len = slash ? 
slash - schedule_str : strlen(schedule_str); + if (str_len >= (ptrdiff_t)sizeof(item_buf)) + badusage("invalid schedule item: far too long" + " (you must delimit items with slashes)"); + memcpy(item_buf, schedule_str, str_len); + item_buf[str_len] = 0; + schedule_str = slash ? slash+1 : NULL; + + parse_schedule_item(item_buf, &schedule[count]); + if (schedule[count].type == sched_forever) { + if (repeatat >= 0) + badusage("invalid schedule: `forever'" + " appears more than once"); + repeatat = count; + continue; + } + count++; + } + if (repeatat >= 0) { + schedule[count].type = sched_goto; + schedule[count].value = repeatat; + count++; + } + assert(count == schedule_length); + } +} + +static void +parse_options(int argc, char * const *argv) +{ + static struct option longopts[] = { + { "help", 0, NULL, 'H'}, + { "stop", 0, NULL, 'K'}, + { "start", 0, NULL, 'S'}, + { "version", 0, NULL, 'V'}, + { "startas", 1, NULL, 'a'}, + { "name", 1, NULL, 'n'}, + { "oknodo", 0, NULL, 'o'}, + { "pidfile", 1, NULL, 'p'}, + { "quiet", 0, NULL, 'q'}, + { "signal", 1, NULL, 's'}, + { "test", 0, NULL, 't'}, + { "user", 1, NULL, 'u'}, + { "chroot", 1, NULL, 'r'}, + { "namespace", 1, NULL, 'd'}, + { "verbose", 0, NULL, 'v'}, + { "exec", 1, NULL, 'x'}, + { "chuid", 1, NULL, 'c'}, + { "nicelevel", 1, NULL, 'N'}, + { "background", 0, NULL, 'b'}, + { "make-pidfile", 0, NULL, 'm'}, + { "retry", 1, NULL, 'R'}, + { NULL, 0, NULL, 0} + }; + int c; + + for (;;) { + c = getopt_long(argc, argv, "HKSVa:n:op:qr:d:s:tu:vx:c:N:bmR:", + longopts, (int *) 0); + if (c == -1) + break; + switch (c) { + case 'H': /* --help */ + do_help(); + exit(0); + case 'K': /* --stop */ + stop = 1; + break; + case 'S': /* --start */ + start = 1; + break; + case 'V': /* --version */ + printf("start-stop-daemon " VERSION "\n"); + exit(0); + case 'a': /* --startas */ + startas = optarg; + break; + case 'n': /* --name */ + cmdname = optarg; + break; + case 'o': /* --oknodo */ + exitnodo = 0; + break; + case 'p': /* --pidfile 
*/ + pidfile = optarg; + break; + case 'q': /* --quiet */ + quietmode = 1; + break; + case 's': /* --signal */ + signal_str = optarg; + break; + case 't': /* --test */ + testmode = 1; + break; + case 'u': /* --user | */ + userspec = optarg; + break; + case 'v': /* --verbose */ + quietmode = -1; + break; + case 'x': /* --exec */ + execname = optarg; + break; + case 'c': /* --chuid | */ + /* we copy the string just in case we need the + * argument later. */ + changeuser = strdup(optarg); + changeuser = strtok(changeuser, ":"); + changegroup = strtok(NULL, ":"); + break; + case 'r': /* --chroot /new/root */ + changeroot = optarg; + break; + case 'd': /* --namespace /.../||/name */ + add_namespace(optarg); + break; + case 'N': /* --nice */ + nicelevel = atoi(optarg); + break; + case 'b': /* --background */ + background = 1; + break; + case 'm': /* --make-pidfile */ + mpidfile = 1; + break; + case 'R': /* --retry | */ + schedule_str = optarg; + break; + default: + badusage(NULL); /* message printed by getopt */ + } + } + + if (signal_str != NULL) { + if (parse_signal (signal_str, &signal_nr) != 0) + badusage("signal value must be numeric or name" + " of signal (KILL, INTR, ...)"); + } + + if (schedule_str != NULL) { + parse_schedule(schedule_str); + } + + if (start == stop) + badusage("need one of --start or --stop"); + + if (!execname && !pidfile && !userspec && !cmdname) + badusage("need at least one of --exec, --pidfile, --user or --name"); + + if (!startas) + startas = execname; + + if (start && !startas) + badusage("--start needs --exec or --startas"); + + if (mpidfile && pidfile == NULL) + badusage("--make-pidfile is only relevant with --pidfile"); + + if (background && !start) + badusage("--background is only relevant with --start"); + +} + +static int +pid_is_exec(pid_t pid, const struct stat *esb) +{ + struct stat sb; + char buf[32]; + + sprintf(buf, "/proc/%d/exe", pid); + if (stat(buf, &sb) != 0) + return 0; + return (sb.st_dev == esb->st_dev && sb.st_ino == 
esb->st_ino); +} + + +static int +pid_is_user(pid_t pid, uid_t uid) +{ + struct stat sb; + char buf[32]; + + sprintf(buf, "/proc/%d", pid); + if (stat(buf, &sb) != 0) + return 0; + return (sb.st_uid == uid); +} + + +static int +pid_is_cmd(pid_t pid, const char *name) +{ + char buf[32]; + FILE *f; + int c; + + sprintf(buf, "/proc/%d/stat", pid); + f = fopen(buf, "r"); + if (!f) + return 0; + while ((c = getc(f)) != EOF && c != '(') + ; + if (c != '(') { + fclose(f); + return 0; + } + /* this hopefully handles command names containing ')' */ + while ((c = getc(f)) != EOF && c == *name) + name++; + fclose(f); + return (c == ')' && *name == '\0'); +} + + +static void +check(pid_t pid) +{ + if (execname && !pid_is_exec(pid, &exec_stat)) + return; + if (userspec && !pid_is_user(pid, user_id)) + return; + if (cmdname && !pid_is_cmd(pid, cmdname)) + return; + push(&found, pid); +} + +static void +do_pidfile(const char *name) +{ + FILE *f; + pid_t pid; + + f = fopen(name, "r"); + if (f) { + if (fscanf(f, "%d", &pid) == 1) + check(pid); + fclose(f); + } else if (errno != ENOENT) + fatal("open pidfile %s: %s", name, strerror(errno)); + +} + +/* WTA: this needs to be an autoconf check for /proc/pid existence. 
+ */ +static void +do_procinit(void) +{ + DIR *procdir; + struct dirent *entry; + int foundany; + pid_t pid; + + procdir = opendir("/proc"); + if (!procdir) + fatal("opendir /proc: %s", strerror(errno)); + + foundany = 0; + while ((entry = readdir(procdir)) != NULL) { + if (sscanf(entry->d_name, "%d", &pid) != 1) + continue; + foundany++; + check(pid); + } + closedir(procdir); + if (!foundany) + fatal("nothing in /proc - not mounted?"); +} + +static void +do_findprocs(void) +{ + clear(&found); + + if (pidfile) + do_pidfile(pidfile); + else + do_procinit(); +} + +/* return 1 on failure */ +static void +do_stop(int signal_nr, int quietmode, int *n_killed, int *n_notkilled, int retry_nr) +{ + struct pid_list *p; + + do_findprocs(); + + *n_killed = 0; + *n_notkilled = 0; + + if (!found) + return; + + clear(&killed); + + for (p = found; p; p = p->next) { + if (testmode) + printf("Would send signal %d to %d.\n", + signal_nr, p->pid); + else if (kill(p->pid, signal_nr) == 0) { + push(&killed, p->pid); + (*n_killed)++; + } else { + printf("%s: warning: failed to kill %d: %s\n", + progname, p->pid, strerror(errno)); + (*n_notkilled)++; + } + } + if (quietmode < 0 && killed) { + printf("Stopped %s (pid", what_stop); + for (p = killed; p; p = p->next) + printf(" %d", p->pid); + putchar(')'); + if (retry_nr > 0) + printf(", retry #%d", retry_nr); + printf(".\n"); + } +} + + +static void +set_what_stop(const char *str) +{ + strncpy(what_stop, str, sizeof(what_stop)); + what_stop[sizeof(what_stop)-1] = '\0'; +} + +static int +run_stop_schedule(void) +{ + int r, position, n_killed, n_notkilled, value, ratio, anykilled, retry_nr; + struct timeval stopat, before, after, interval, maxinterval; + + if (testmode) { + if (schedule != NULL) { + printf("Ignoring --retry in test mode\n"); + schedule = NULL; + } + } + + if (cmdname) + set_what_stop(cmdname); + else if (execname) + set_what_stop(execname); + else if (pidfile) + sprintf(what_stop, "process in pidfile `%.200s'", pidfile); + 
else if (userspec) + sprintf(what_stop, "process(es) owned by `%.200s'", userspec); + else + fatal("internal error, please report"); + + anykilled = 0; + retry_nr = 0; + + if (schedule == NULL) { + do_stop(signal_nr, quietmode, &n_killed, &n_notkilled, 0); + if (n_notkilled > 0 && quietmode <= 0) + printf("%d pids were not killed\n", n_notkilled); + if (n_killed) + anykilled = 1; + goto x_finished; + } + + for (position = 0; position < schedule_length; ) { + value= schedule[position].value; + n_notkilled = 0; + + switch (schedule[position].type) { + + case sched_goto: + position = value; + continue; + + case sched_signal: + do_stop(value, quietmode, &n_killed, &n_notkilled, retry_nr++); + if (!n_killed) + goto x_finished; + else + anykilled = 1; + goto next_item; + + case sched_timeout: + /* We want to keep polling for the processes, to see if they've exited, + * or until the timeout expires. + * + * This is a somewhat complicated algorithm to try to ensure that we + * notice reasonably quickly when all the processes have exited, but + * don't spend too much CPU time polling. In particular, on a fast + * machine with quick-exiting daemons we don't want to delay system + * shutdown too much, whereas on a slow one, or where processes are + * taking some time to exit, we want to increase the polling + * interval. + * + * The algorithm is as follows: we measure the elapsed time it takes + * to do one poll(), and wait a multiple of this time for the next + * poll. However, if that would put us past the end of the timeout + * period we wait only as long as the timeout period, but in any case + * we always wait at least MIN_POLL_INTERVAL (20ms). The multiple + * (`ratio') starts out as 2, and increases by 1 for each poll to a + * maximum of 10; so we use up to between 30% and 10% of the + * machine's resources (assuming a few reasonable things about system + * performance). 
+ */ + xgettimeofday(&stopat); + stopat.tv_sec += value; + ratio = 1; + for (;;) { + xgettimeofday(&before); + if (timercmp(&before,&stopat,>)) + goto next_item; + + do_stop(0, 1, &n_killed, &n_notkilled, 0); + if (!n_killed) + goto x_finished; + + xgettimeofday(&after); + + if (!timercmp(&after,&stopat,<)) + goto next_item; + + if (ratio < 10) + ratio++; + + TVCALC(interval, ratio * (TVELEM(&after) - TVELEM(&before) + TVADJUST)); + TVCALC(maxinterval, TVELEM(&stopat) - TVELEM(&after) + TVADJUST); + + if (timercmp(&interval,&maxinterval,>)) + interval = maxinterval; + + if (interval.tv_sec == 0 && + interval.tv_usec <= MIN_POLL_INTERVAL) + interval.tv_usec = MIN_POLL_INTERVAL; + + r = select(0,0,0,0,&interval); + if (r < 0 && errno != EINTR) + fatal("select() failed for pause: %s", + strerror(errno)); + } + + default: + assert(!"schedule[].type value must be valid"); + + } + + next_item: + position++; + } + + if (quietmode <= 0) + printf("Program %s, %d process(es), refused to die.\n", + what_stop, n_killed); + + return 2; + +x_finished: + if (!anykilled) { + if (quietmode <= 0) + printf("No %s found running; none killed.\n", what_stop); + return exitnodo; + } else { + return 0; + } +} + +/* +int main(int argc, char **argv) NONRETURNING; +*/ + +int +main(int argc, char **argv) +{ + progname = argv[0]; + + LIST_INIT(&namespace_head); + + parse_options(argc, argv); + argc -= optind; + argv += optind; + + if (execname && stat(execname, &exec_stat)) + fatal("stat %s: %s", execname, strerror(errno)); + + if (userspec && sscanf(userspec, "%d", &user_id) != 1) { + struct passwd *pw; + + pw = getpwnam(userspec); + if (!pw) + fatal("user `%s' not found\n", userspec); + + user_id = pw->pw_uid; + } + + if (changegroup && sscanf(changegroup, "%d", &runas_gid) != 1) { + struct group *gr = getgrnam(changegroup); + if (!gr) + fatal("group `%s' not found\n", changegroup); + runas_gid = gr->gr_gid; + } + if (changeuser && sscanf(changeuser, "%d", &runas_uid) != 1) { + struct passwd 
*pw = getpwnam(changeuser); + if (!pw) + fatal("user `%s' not found\n", changeuser); + runas_uid = pw->pw_uid; + if (changegroup == NULL) { /* pass the default group of this user */ + changegroup = ""; /* just empty */ + runas_gid = pw->pw_gid; + } + } + + if (stop) { + int i = run_stop_schedule(); + exit(i); + } + + do_findprocs(); + + if (found) { + if (quietmode <= 0) + printf("%s already running.\n", execname); + exit(exitnodo); + } + if (testmode) { + printf("Would start %s ", startas); + while (argc-- > 0) + printf("%s ", *argv++); + if (changeuser != NULL) { + printf(" (as user %s[%d]", changeuser, runas_uid); + if (changegroup != NULL) + printf(", and group %s[%d])", changegroup, runas_gid); + else + printf(")"); + } + if (changeroot != NULL) + printf(" in directory %s", changeroot); + if (nicelevel) + printf(", and add %i to the priority", nicelevel); + printf(".\n"); + exit(0); + } + if (quietmode < 0) + printf("Starting %s...\n", startas); + *--argv = startas; + if (changeroot != NULL) { + if (chdir(changeroot) < 0) + fatal("Unable to chdir() to %s", changeroot); + if (chroot(changeroot) < 0) + fatal("Unable to chroot() to %s", changeroot); + } + if (changeuser != NULL) { + if (setgid(runas_gid)) + fatal("Unable to set gid to %d", runas_gid); + if (initgroups(changeuser, runas_gid)) + fatal("Unable to set initgroups() with gid %d", runas_gid); + if (setuid(runas_uid)) + fatal("Unable to set uid to %s", changeuser); + } + + if (background) { /* ok, we need to detach this process */ + int i, fd; + if (quietmode < 0) + printf("Detaching to start %s...", startas); + i = fork(); + if (i<0) { + fatal("Unable to fork.\n"); + } + if (i) { /* parent */ + if (quietmode < 0) + printf("done.\n"); + exit(0); + } + /* child continues here */ + /* now close all extra fds */ + for (i=getdtablesize()-1; i>=0; --i) close(i); + /* change tty */ + fd = open("/dev/tty", O_RDWR); + ioctl(fd, TIOCNOTTY, 0); + close(fd); + chdir("/"); + umask(022); /* set a default for dumb 
programs */ + setpgid(0,0); /* set the process group */ + fd=open("/dev/null", O_RDWR); /* stdin */ + dup(fd); /* stdout */ + dup(fd); /* stderr */ + } + if (nicelevel) { + errno = 0; + if (nice(nicelevel) < 0 && errno) + fatal("Unable to alter nice level by %i: %s", nicelevel, + strerror(errno)); + } + if (mpidfile && pidfile != NULL) { /* user wants _us_ to make the pidfile :) */ + FILE *pidf = fopen(pidfile, "w"); + pid_t pidt = getpid(); + if (pidf == NULL) + fatal("Unable to open pidfile `%s' for writing: %s", pidfile, + strerror(errno)); + fprintf(pidf, "%d\n", pidt); + fclose(pidf); + } + set_namespaces(); + execv(startas, argv); + fatal("Unable to start %s: %s", startas, strerror(errno)); +} diff --git a/projects/logging/pkg/memlogd/.gitignore b/projects/logging/pkg/memlogd/.gitignore new file mode 100644 index 000000000..d1be63e32 --- /dev/null +++ b/projects/logging/pkg/memlogd/.gitignore @@ -0,0 +1,5 @@ +usr +hash +containers +.* +sbin diff --git a/projects/logging/pkg/memlogd/Dockerfile b/projects/logging/pkg/memlogd/Dockerfile new file mode 100644 index 000000000..d610c6d71 --- /dev/null +++ b/projects/logging/pkg/memlogd/Dockerfile @@ -0,0 +1,3 @@ +FROM scratch +COPY . ./ +WORKDIR / diff --git a/projects/logging/pkg/memlogd/Dockerfile.memlogd b/projects/logging/pkg/memlogd/Dockerfile.memlogd new file mode 100644 index 000000000..2d5fd9d28 --- /dev/null +++ b/projects/logging/pkg/memlogd/Dockerfile.memlogd @@ -0,0 +1,3 @@ +FROM scratch +COPY . 
./ +CMD ["/usr/bin/memlogd","-fd-log","3","-fd-query","4"] diff --git a/projects/logging/pkg/memlogd/Makefile b/projects/logging/pkg/memlogd/Makefile new file mode 100644 index 000000000..2d7f1db0f --- /dev/null +++ b/projects/logging/pkg/memlogd/Makefile @@ -0,0 +1,66 @@ +GO_COMPILE=mobylinux/go-compile:3afebc59c5cde31024493c3f91e6102d584a30b9@sha256:e0786141ea7df8ba5735b63f2a24b4ade9eae5a02b0e04c4fca33b425ec69b0a + +SHA_IMAGE=alpine:3.5@sha256:dfbd4a3a8ebca874ebd2474f044a0b33600d4523d03b0df76e5c5986cb02d7e8 + +MEMLOGD_BINARY=usr/bin/memlogd +LOGWRITE_BINARY=usr/bin/logwrite +STARTMEMLOGD_BINARY=usr/bin/startmemlogd +LOGREAD_BINARY=sbin/logread + +IMAGE=memlogd + +.PHONY: tag push clean container +default: tag + +DEPS=$(MEMLOGD_BINARY) $(LOGWRITE_BINARY) $(STARTMEMLOGD_BINARY) $(LOGREAD_BINARY) + +$(MEMLOGD_BINARY): cmd/memlogd/main.go + mkdir -p $(dir $@) + tar -Ccmd/memlogd -cf - main.go | docker run --rm --net=none --log-driver=none -i $(GO_COMPILE) -o $@ | tar xf - + +$(LOGWRITE_BINARY): cmd/logwrite/main.go + mkdir -p $(dir $@) + tar -Ccmd/logwrite -cf - main.go | docker run --rm --net=none --log-driver=none -i $(GO_COMPILE) -o $@ | tar xf - + +$(STARTMEMLOGD_BINARY): cmd/startmemlogd/main.go + mkdir -p $(dir $@) + tar -Ccmd/startmemlogd -cf - main.go | docker run --rm --net=none --log-driver=none -i $(GO_COMPILE) -o $@ | tar xf - + +$(LOGREAD_BINARY): cmd/logread/main.go + mkdir -p $(dir $@) + tar -Ccmd/logread -cf - main.go | docker run --rm --net=none --log-driver=none -i $(GO_COMPILE) -o $@ | tar xf - + +containers: $(MEMLOGD_BINARY) Dockerfile.memlogd config.json + mkdir -p containers/init/memlogd/rootfs + tar -cf - $^ | docker build -f Dockerfile.memlogd -t $(IMAGE):build1 --no-cache - + docker create --name $(IMAGE)-build1 $(IMAGE):build1 + docker export $(IMAGE)-build1 | tar -Ccontainers/init/memlogd/rootfs -xv - + docker rm $(IMAGE)-build1 + docker rmi $(IMAGE):build1 + mv containers/init/memlogd/rootfs/Dockerfile.memlogd containers/init/memlogd/rootfs/Dockerfile + mv 
containers/init/memlogd/rootfs/config.json containers/init/memlogd + +container: Dockerfile $(LOGWRITE_BINARY) $(STARTMEMLOGD_BINARY) $(LOGREAD_BINARY) containers + tar cf - $^ | docker build --no-cache -t $(IMAGE):build - + +hash: Dockerfile Dockerfile.memlogd $(DEPS) + find $^ -type f | xargs cat | docker run --rm -i $(SHA_IMAGE) sha1sum - | sed 's/ .*//' > hash + +push: hash container + docker pull linuxkit/$(IMAGE):$(shell cat hash) || \ + (docker tag $(IMAGE):build linuxkit/$(IMAGE):$(shell cat hash) && \ + docker push linuxkit/$(IMAGE):$(shell cat hash)) + docker rmi $(IMAGE):build + rm -f hash + +tag: hash container + docker pull linuxkit/$(IMAGE):$(shell cat hash) || \ + docker tag $(IMAGE):build linuxkit/$(IMAGE):$(shell cat hash) + docker rmi $(IMAGE):build + rm -f hash + +clean: + rm -rf hash usr containers sbin + +.DELETE_ON_ERROR: + diff --git a/projects/logging/pkg/memlogd/cmd/logread/main.go b/projects/logging/pkg/memlogd/cmd/logread/main.go new file mode 100644 index 000000000..ed53d1555 --- /dev/null +++ b/projects/logging/pkg/memlogd/cmd/logread/main.go @@ -0,0 +1,52 @@ +package main + +import ( + "bufio" + "flag" + "net" + "os" +) + +const ( + logDump byte = iota + logFollow + logDumpFollow +) + +func main() { + var err error + + var socketPath string + var follow bool + var dumpFollow bool + + flag.StringVar(&socketPath, "socket", "/tmp/memlogdq.sock", "memlogd log query socket") + flag.BoolVar(&dumpFollow, "F", false, "dump log, then follow") + flag.BoolVar(&follow, "f", false, "follow log buffer") + flag.Parse() + + addr := net.UnixAddr{socketPath, "unix"} + conn, err := net.DialUnix("unix", nil, &addr) + if err != nil { + panic(err) + } + defer conn.Close() + + var n int + switch { + case dumpFollow: + n, err = conn.Write([]byte{logDumpFollow}) + case follow && !dumpFollow: + n, err = conn.Write([]byte{logFollow}) + default: + n, err = conn.Write([]byte{logDump}) + } + + if err != nil || n < 1 { + panic(err) + } + + r := bufio.NewReader(conn) 
+ r.WriteTo(os.Stdout) + +} diff --git a/projects/logging/pkg/memlogd/cmd/logwrite/main.go b/projects/logging/pkg/memlogd/cmd/logwrite/main.go new file mode 100644 index 000000000..4c325162d --- /dev/null +++ b/projects/logging/pkg/memlogd/cmd/logwrite/main.go @@ -0,0 +1,97 @@ +package main + +import ( + "flag" + "fmt" + "io" + "log" + "net" + "os" + "os/exec" + "syscall" +) + +func getLogFileSocketPair() (*os.File, int) { + fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0) + if err != nil { + panic(err) + } + + localFd := fds[0] + remoteFd := fds[1] + + localLogFile := os.NewFile(uintptr(localFd), "") + return localLogFile, remoteFd +} + +func sendFD(conn *net.UnixConn, remoteAddr *net.UnixAddr, source string, fd int) error { + oobs := syscall.UnixRights(fd) + _, _, err := conn.WriteMsgUnix([]byte(source), oobs, remoteAddr) + return err +} + +func main() { + var err error + var ok bool + + var serverSocket string + var name string + + flag.StringVar(&serverSocket, "socket", "/tmp/memlogd.sock", "socket to pass fd's to memlogd") + flag.StringVar(&name, "n", "", "name of sender, defaults to first argument if left blank") + flag.Parse() + args := flag.Args() + + if len(args) < 1 { + log.Fatal("no command specified") + } + + if name == "" { + name = args[0] + } + + localStdoutLog, remoteStdoutFd := getLogFileSocketPair() + localStderrLog, remoteStderrFd := getLogFileSocketPair() + + var outSocket int + if outSocket, err = syscall.Socket(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0); err != nil { + log.Fatal("Unable to create socket: ", err) + } + + var outFile net.Conn + if outFile, err = net.FileConn(os.NewFile(uintptr(outSocket), "")); err != nil { + log.Fatal(err) + } + + var conn *net.UnixConn + if conn, ok = outFile.(*net.UnixConn); !ok { + log.Fatal("Internal error, invalid cast.") + } + + raddr := net.UnixAddr{Name: serverSocket, Net: "unixgram"} + + if err = sendFD(conn, &raddr, name+".stdout", remoteStdoutFd); err != nil { + log.Fatal("fd 
stdout send failed: ", err) + } + + if err = sendFD(conn, &raddr, name+".stderr", remoteStderrFd); err != nil { + log.Fatal("fd stderr send failed: ", err) + } + + cmd := exec.Command(args[0], args[1:]...) + outStderr := io.MultiWriter(localStderrLog, os.Stderr) + outStdout := io.MultiWriter(localStdoutLog, os.Stdout) + cmd.Stderr = outStderr + cmd.Stdout = outStdout + if err = cmd.Run(); err != nil { + if exitError, ok := err.(*exec.ExitError); ok { + // exit with exit code from process + status := exitError.Sys().(syscall.WaitStatus) + os.Exit(status.ExitStatus()) + } else { + // no exit code, report error and exit 1 + fmt.Println(err) + os.Exit(1) + } + } +} diff --git a/projects/logging/pkg/memlogd/cmd/memlogd/main.go b/projects/logging/pkg/memlogd/cmd/memlogd/main.go new file mode 100644 index 000000000..d9f576d0e --- /dev/null +++ b/projects/logging/pkg/memlogd/cmd/memlogd/main.go @@ -0,0 +1,315 @@ +package main + +import ( + "bufio" + "bytes" + "container/list" + "container/ring" + "flag" + "fmt" + "io" + "log" + "net" + "os" + "sync" + "syscall" + "time" +) + +type logEntry struct { + time time.Time + source string + msg string +} + +type fdMessage struct { + name string + fd int +} + +type logMode byte + +const ( + logDump logMode = iota + logFollow + logDumpFollow +) + +type queryMessage struct { + conn *net.UnixConn + mode logMode +} + +type connListener struct { + conn *net.UnixConn + cond *sync.Cond // condition and mutex used to notify listeners of more data + buffer bytes.Buffer + err error + exitOnEOF bool // exit instead of blocking if no more data in read buffer +} + +func doLog(logCh chan logEntry, msg string) { + logCh <- logEntry{time: time.Now(), source: "memlogd", msg: msg} + return +} + +func logQueryHandler(l *connListener) { + defer l.conn.Close() + + data := make([]byte, 0xffff) + + l.cond.L.Lock() + for { + var n, remaining int + var rerr, werr error + + for rerr == nil && werr == nil { + if n, rerr = l.buffer.Read(data); n == 0 { // 
process data before checking error + break // exit read loop to wait for more data + } + l.cond.L.Unlock() + + remaining = n + w := data[:n] + for remaining > 0 && werr == nil { + // w holds the unsent tail; a short write advances it + n, werr = l.conn.Write(w) + w = w[n:] + remaining = remaining - n + } + + l.cond.L.Lock() + } + + // check errors + if werr != nil { + l.err = werr + l.cond.L.Unlock() + break + } + + if rerr != nil && rerr != io.EOF { // EOF is ok, just wait for more data + l.err = rerr + l.cond.L.Unlock() + break + } + if l.exitOnEOF && rerr == io.EOF { // ... unless we should exit on EOF + l.err = nil + l.cond.L.Unlock() + break + } + l.cond.Wait() // unlock and wait for more data + } +} + +func (msg *logEntry) String() string { + return fmt.Sprintf("%s %s %s", msg.time.Format(time.RFC3339), msg.source, msg.msg) +} + +func ringBufferHandler(ringSize int, logCh chan logEntry, queryMsgChan chan queryMessage) { + // Anything that interacts with the ring buffer goes through this handler + ring := ring.New(ringSize) + listeners := list.New() + + for { + select { + case msg := <-logCh: + fmt.Printf("%s\n", msg.String()) + // add log entry + ring.Value = msg + ring = ring.Next() + + // send to listeners + var l *connListener + var remove []*list.Element + for e := listeners.Front(); e != nil; e = e.Next() { + l = e.Value.(*connListener) + if l.err != nil { + remove = append(remove, e) + continue + } + l.cond.L.Lock() + l.buffer.WriteString(fmt.Sprintf("%s\n", msg.String())) + l.cond.L.Unlock() + l.cond.Signal() + } + if len(remove) > 0 { // remove listeners that returned errors + for _, e := range remove { + l = e.Value.(*connListener) + fmt.Println("Removing connection, error: ", l.err) + listeners.Remove(e) + } + } + + case msg := <-queryMsgChan: + l := connListener{conn: msg.conn, cond: sync.NewCond(&sync.Mutex{}), err: nil, exitOnEOF: (msg.mode == logDump)} + listeners.PushBack(&l) + go logQueryHandler(&l) + if msg.mode == logDumpFollow || msg.mode == logDump { + l.cond.L.Lock() + // 
fill with current data in buffer + ring.Do(func(f interface{}) { + if msg, ok := f.(logEntry); ok { + s := fmt.Sprintf("%s\n", msg.String()) + l.buffer.WriteString(s) + } + }) + l.cond.L.Unlock() + l.cond.Signal() // signal handler that more data is available + } + } + } +} + +func receiveQueryHandler(l *net.UnixListener, logCh chan logEntry, queryMsgChan chan queryMessage) { + for { + var conn *net.UnixConn + var err error + if conn, err = l.AcceptUnix(); err != nil { + doLog(logCh, fmt.Sprintf("Connection error %s", err)) + continue + } + mode := make([]byte, 1) + n, err := conn.Read(mode) + if err != nil || n != 1 { + doLog(logCh, fmt.Sprintf("No mode received: %s", err)) + } + queryMsgChan <- queryMessage{conn, logMode(mode[0])} + } +} + +func receiveFdHandler(conn *net.UnixConn, logCh chan logEntry, fdMsgChan chan fdMessage) { + oob := make([]byte, 512) + b := make([]byte, 512) + + for { + n, oobn, _, _, err := conn.ReadMsgUnix(b, oob) + if err != nil { + doLog(logCh, fmt.Sprintf("ERROR: Unable to read oob data: %s", err.Error())) + continue + } + + if oobn == 0 { + continue + } + + oobmsgs, err := syscall.ParseSocketControlMessage(oob[:oobn]) + if err != nil { + doLog(logCh, fmt.Sprintf("ERROR: Failed to parse socket control message: %s", err.Error())) + continue + } + + for _, oobmsg := range oobmsgs { + r, err := syscall.ParseUnixRights(&oobmsg) + if err != nil { + doLog(logCh, fmt.Sprintf("ERROR: Failed to parse UNIX rights in oob data: %s", err.Error())) + continue + } + for _, fd := range r { + name := "" + if n > 0 { + name = string(b[:n]) + } + fdMsgChan <- fdMessage{name: name, fd: fd} + } + } + } +} + +func readLogFromFd(maxLineLen int, fd int, source string, logCh chan logEntry) { + f := os.NewFile(uintptr(fd), "") + defer f.Close() + + r := bufio.NewReader(f) + l, isPrefix, err := r.ReadLine() + var buffer bytes.Buffer + + for err == nil { + buffer.Write(l) + for isPrefix { + l, isPrefix, err = r.ReadLine() + if err != nil { + break + } + if 
buffer.Len() < maxLineLen { + buffer.Write(l) + } + } + if buffer.Len() > maxLineLen { + buffer.Truncate(maxLineLen) + } + logCh <- logEntry{time: time.Now(), source: source, msg: buffer.String()} + buffer.Reset() + + l, isPrefix, err = r.ReadLine() + } +} + +func main() { + var err error + + var socketQueryPath string + var passedQueryFD int + var socketLogPath string + var passedLogFD int + var linesInBuffer int + var lineMaxLength int + + flag.StringVar(&socketQueryPath, "socket-query", "/tmp/memlogdq.sock", "unix domain socket for responding to log queries. Overridden by -fd-query") + flag.StringVar(&socketLogPath, "socket-log", "/tmp/memlogd.sock", "unix domain socket to listen for new fds to add to log. Overridden by -fd-log") + flag.IntVar(&passedLogFD, "fd-log", -1, "an existing SOCK_DGRAM socket for receiving fd's. Overrides -socket-log.") + flag.IntVar(&passedQueryFD, "fd-query", -1, "an existing SOCK_STREAM for receiving log read requests. Overrides -socket-query.") + flag.IntVar(&linesInBuffer, "max-lines", 5000, "Number of log lines to keep in memory") + flag.IntVar(&lineMaxLength, "max-line-len", 1024, "Maximum line length recorded. 
Additional bytes are dropped.") + + flag.Parse() + + var connLogFd *net.UnixConn + if passedLogFD == -1 { // no fd on command line, use socket path + addr := net.UnixAddr{socketLogPath, "unixgram"} + if connLogFd, err = net.ListenUnixgram("unixgram", &addr); err != nil { + log.Fatal("Unable to open socket: ", err) + } + defer os.Remove(addr.Name) + } else { // use given fd + var f net.Conn + if f, err = net.FileConn(os.NewFile(uintptr(passedLogFD), "")); err != nil { + log.Fatal("Unable to open fd: ", err) + } + connLogFd = f.(*net.UnixConn) + } + defer connLogFd.Close() + + var connQuery *net.UnixListener + if passedQueryFD == -1 { // no fd on command line, use socket path + addr := net.UnixAddr{socketQueryPath, "unix"} + if connQuery, err = net.ListenUnix("unix", &addr); err != nil { + log.Fatal("Unable to open socket: ", err) + } + defer os.Remove(addr.Name) + } else { // use given fd + var f net.Listener + if f, err = net.FileListener(os.NewFile(uintptr(passedQueryFD), "")); err != nil { + log.Fatal("Unable to open fd: ", err) + } + connQuery = f.(*net.UnixListener) + } + defer connQuery.Close() + + logCh := make(chan logEntry) + fdMsgChan := make(chan fdMessage) + queryMsgChan := make(chan queryMessage) + + go receiveFdHandler(connLogFd, logCh, fdMsgChan) + go receiveQueryHandler(connQuery, logCh, queryMsgChan) + go ringBufferHandler(linesInBuffer, logCh, queryMsgChan) + + doLog(logCh, "memlogd started") + + for true { + select { + case msg := <-fdMsgChan: // incoming fd + go readLogFromFd(lineMaxLength, msg.fd, msg.name, logCh) + } + } +} diff --git a/projects/logging/pkg/memlogd/cmd/startmemlogd/main.go b/projects/logging/pkg/memlogd/cmd/startmemlogd/main.go new file mode 100644 index 000000000..890721bbc --- /dev/null +++ b/projects/logging/pkg/memlogd/cmd/startmemlogd/main.go @@ -0,0 +1,76 @@ +package main + +import ( + "flag" + "fmt" + "log" + "net" + "os" + "os/exec" + "syscall" +) + +func main() { + var socketLogPath string + var socketQueryPath string 
+ var memlogdBundle string + var pidFile string + var detach bool + flag.StringVar(&socketLogPath, "socket-log", "/tmp/memlogd.sock", "path to fd logging socket. Created and passed to logging container. Existing socket will be removed.") + flag.StringVar(&socketQueryPath, "socket-query", "/tmp/memlogdq.sock", "path to query socket. Created and passed to logging container. Existing socket will be removed.") + flag.StringVar(&memlogdBundle, "bundle", "/containers/init/memlogd", "runc bundle with memlogd") + flag.StringVar(&pidFile, "pid-file", "/run/memlogd.pid", "path to pid file") + flag.BoolVar(&detach, "detach", true, "detach from subprocess") + flag.Parse() + + laddr := net.UnixAddr{socketLogPath, "unixgram"} + os.Remove(laddr.Name) // remove existing socket + lconn, err := net.ListenUnixgram("unixgram", &laddr) + if err != nil { + panic(err) + } + lfd, err := lconn.File() + if err != nil { + panic(err) + } + + qaddr := net.UnixAddr{socketQueryPath, "unix"} + os.Remove(qaddr.Name) // remove existing socket + qconn, err := net.ListenUnix("unix", &qaddr) + if err != nil { + panic(err) + } + qfd, err := qconn.File() + if err != nil { + panic(err) + } + + cmd := exec.Command("/sbin/start-stop-daemon", "--start", "--pidfile", pidFile, + "--exec", "/usr/bin/runc", "--", "run", "--preserve-fds=2", + "--bundle", memlogdBundle, + "--pid-file", pidFile, "memlogd") + log.Println(cmd.Args) + cmd.ExtraFiles = append(cmd.ExtraFiles, lfd, qfd) + cmd.Stdout = os.Stdout + cmd.Stderr = os.Stderr + if err := cmd.Start(); err != nil { + panic(err) + } + if detach { + if err := cmd.Process.Release(); err != nil { + panic(err) + } + } else { + if err := cmd.Wait(); err != nil { + if exitError, ok := err.(*exec.ExitError); ok { + // exit with exit code from process + status := exitError.Sys().(syscall.WaitStatus) + os.Exit(status.ExitStatus()) + } else { + // no exit code, report error and exit 1 + fmt.Println(err) + os.Exit(1) + } + } + } +} diff --git 
a/projects/logging/pkg/memlogd/config.json b/projects/logging/pkg/memlogd/config.json new file mode 100644 index 000000000..d2b188a27 --- /dev/null +++ b/projects/logging/pkg/memlogd/config.json @@ -0,0 +1,115 @@ +{ + "ociVersion": "1.0.0-rc5-dev", + "platform": { + "os": "linux", + "arch": "amd64" + }, + "process": { + "consoleSize": { + "height": 0, + "width": 0 + }, + "user": { + "uid": 0, + "gid": 0 + }, + "args": [ + "/usr/bin/memlogd", + "-fd-log", + "3", + "-fd-query", + "4" + ], + "env": [ + "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" + ], + "cwd": "/", + "capabilities": {} + }, + "root": { + "path": "rootfs", + "readonly": true + }, + "mounts": [ + { + "destination": "/proc", + "type": "proc", + "source": "proc", + "options": [ + "nosuid", + "nodev", + "noexec", + "relatime" + ] + }, + { + "destination": "/dev", + "type": "tmpfs", + "source": "tmpfs", + "options": [ + "nosuid", + "strictatime", + "mode=755", + "size=65536k", + "ro" + ] + }, + { + "destination": "/sys", + "type": "sysfs", + "source": "sysfs", + "options": [ + "nosuid", + "noexec", + "nodev", + "ro" + ] + }, + { + "destination": "/dev/pts", + "type": "devpts", + "source": "devpts", + "options": [ + "nosuid", + "noexec", + "newinstance", + "ptmxmode=0666", + "mode=0620" + ] + }, + { + "destination": "/sys/fs/cgroup", + "type": "cgroup", + "source": "cgroup", + "options": [ + "nosuid", + "noexec", + "nodev", + "relatime", + "ro" + ] + } + ], + "linux": { + "resources": { + "disableOOMKiller": false + }, + "namespaces": [ + { + "type": "network" + }, + { + "type": "pid" + }, + { + "type": "ipc" + }, + { + "type": "uts" + }, + { + "type": "mount" + } + ] + } +}