Production-Grade Container Scheduling and Management
Go to file
Florent Delannoy e627474e8f Fix fluentd-gcp addon liveness probe
Fix three issues with the fluentd-gcp liveness probe:

h1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS
if defined

Probably a copy/paste issue introduced in edf1ffc074

h1. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always failed when called with `/bin/sh`

Introduced by a844523c20

Given that we call the liveness probe with `/bin/sh`, we cannot use the
double-bracketed `[[` syntax for test, as it is not POSIX-compliant and
will throw an error.

Annoyingly, even through it prints an error, `sh` returns with exit code 0
in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

Which means the liveness probe is considered successful by Kubernetes,
despite failing to test things as it was intended. This is also
probably the reason why this bug wasn't reported sooner :)

Thankfully, the test in this case can just as easily be written as
POSIX-compliant as it doesn't use any bash-specific features within the
`[[` block.

h1. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd
containers being restarted very often, and found an issue with the
underlying logic of the liveness probe.

The probe checks that the pod is still alive by running the following
command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks if any _regular_ file exists under `/var/log/fluentd-buffers`
that is more recent than a predetermined time, and will return an empty
string otherwise.

The issue is that these buffers are temporary and volatile, they get created and
deleted constantly. Here is an example of running that check every second on a
running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing. The LivenessProbe running
under these conditions has a ~50% chance of failing, despite fluentd being
perfectly happy.

I believe that check is probably ok for fluentd installs using large
amounts of buffers, in which case the liveness probe will be correct more
often than not, but fluentd installs that use buffering less intensively
will be negatively impacted by this.

My solution to fix this is to check the last updated time of buffering
_folders_ within `/var/log/fluentd_buffers`. These _do_ get updated when
buffers are created, and do not get deleted as buffers are emptied,
making them the perfect candidate for our use.

Here's an example with the `-d` flag for directories:
```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```

And example of the directory being updated as new buffers come in:
```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root  224 Feb 22 11:18 .
drwxr-xr-x 3 root root   38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root  215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root  429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root  195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
2019-02-25 11:48:31 +00:00
.github delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
api Merge pull request #73555 from bsalamat/priority_to_ga 2019-02-22 16:14:49 -08:00
build bazel: create genrules to produce debs and RPMs without arch-specific names 2019-02-23 15:44:34 -08:00
cluster Fix fluentd-gcp addon liveness probe 2019-02-25 11:48:31 +00:00
cmd Merge pull request #74457 from neolit123/fix-kubeproxy-winkernel 2019-02-23 14:05:15 -08:00
docs Updated OWNERS files to include link to docs 2019-02-04 22:33:12 +01:00
Godeps Merge pull request #74348 from danielqsj/ku 2019-02-21 22:17:57 -08:00
hack Merge pull request #74269 from moshe010/kubelet_gen_cert 2019-02-23 18:41:10 -08:00
logo delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
pkg Merge pull request #74328 from daixiang0/delete-blank 2019-02-23 01:43:58 -08:00
plugin fix typo 2019-02-22 22:38:48 +08:00
staging Merge pull request #74328 from daixiang0/delete-blank 2019-02-23 01:43:58 -08:00
test Merge pull request #72939 from runyontr/test-cmd-what 2019-02-23 02:54:36 -08:00
third_party Updated OWNERS files to include link to docs 2019-02-04 22:33:12 +01:00
translations Updated OWNERS files to include link to docs 2019-02-04 22:33:12 +01:00
vendor Merge pull request #74348 from danielqsj/ku 2019-02-21 22:17:57 -08:00
.bazelrc
.generated_files generated_files: remove line about /docs/.generated_docs 2019-02-11 02:54:45 +02:00
.gitattributes Remove build/verify scripts for swagger 1.2 API docs, API server swagger ui / swagger 1.2 config 2019-01-15 13:33:06 -05:00
.gitignore git ignore vendordiff.patch to avoid accidental checkin 2018-10-30 18:49:53 -04:00
.kazelcfg.json
BUILD.bazel
CHANGELOG-1.2.md
CHANGELOG-1.3.md
CHANGELOG-1.4.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.5.md replace features to enhancements 2018-12-12 11:37:21 +08:00
CHANGELOG-1.6.md replace features to enhancements 2018-12-12 11:37:21 +08:00
CHANGELOG-1.7.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.8.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.9.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.10.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.11.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.12.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG-1.13.md Update CHANGELOG-1.13.md for v1.13.3. 2019-02-01 21:06:28 +00:00
CHANGELOG-1.14.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
CHANGELOG.md Updated a link to the latest changelog 2018-12-28 15:05:09 +13:00
code-of-conduct.md
CONTRIBUTING.md
LICENSE
Makefile
Makefile.generated_files
OWNERS Updated OWNERS files to include link to docs 2019-02-04 22:33:12 +01:00
OWNERS_ALIASES Merge pull request #74270 from Huang-Wei/wei-scheduler-approver 2019-02-23 01:43:37 -08:00
README.md
SECURITY_CONTACTS
SUPPORT.md delete all duplicate empty blanks 2019-02-23 10:28:04 +08:00
WORKSPACE

Kubernetes

GoDoc Widget CII Best Practices


Kubernetes is an open source system for managing containerized applications across multiple hosts; providing basic mechanisms for deployment, maintenance, and scaling of applications.

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.


To start using Kubernetes

See our documentation on kubernetes.io.

Try our interactive tutorial.

Take a free course on Scalable Microservices with Kubernetes.

To start developing Kubernetes

The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.

If you want to build Kubernetes right away there are two options:

You have a working Go environment.
$ go get -d k8s.io/kubernetes
$ cd $GOPATH/src/k8s.io/kubernetes
$ make
You have a working Docker environment.
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make quick-release

For the full story, head over to the developer's documentation.

Support

If you need support, start with the troubleshooting guide, and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another.

Analytics