280 Commits

Author SHA1 Message Date
Davanum Srinivas
50bea1dad8
Move from k8s.gcr.io to registry.k8s.io
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-05-31 10:16:53 -04:00
Jordan Liggitt
a44192b955 Remove PodSecurityPolicy cluster config 2022-05-04 16:00:56 -04:00
Davanum Srinivas
9682b7248f
OWNERS cleanup - Jan 2021 Week 1
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-01-10 08:14:29 -05:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
David Ashpole
3813ed1ef7 fix prometheus-to-sd image for fluentbit 2021-05-27 10:54:10 -07:00
David Ashpole
febf9d9366 update event-exporter and prometheus-to-sd versions in cluster addons 2021-05-13 11:40:41 -07:00
Marian Lobur
41e39dd1fa Add required fields to fluentd-gcp-scaler-policy CRD. 2021-04-19 16:01:46 +02:00
Marian Lobur
d4de8438e3 Switch fluentd-gcp-scaler policy to non deprecated api.
Starting from Kubernetes 1.22 apiextensions.k8s.io/v1beta1 is removed.
Instead apiextensions.k8s.io/v1 should be used: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#customresourcedefinition-v122
2021-04-12 10:28:50 +02:00
Sean McGinnis
be131457ef
Remove stale analytics links from docs
Many README files and other docs contained a link to an appspot
tracking app that is no longer active. Following the links leads to an
error about Go 1.9 no longer being supported. Go 1.9 support was dropped
in appspot in 2019 and disabled June 2020.

This also resulted in a broken image link displaying when viewing these
files on GitHub. Since the app is no longer functioning, and since it
causes a potentially confusing (though admittedly minor) error to display,
this just removes those links as I don't believe they are needed
anymore.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2020-11-18 07:04:48 -06:00
Arjun Ramachandrula
dcc1ab176d Removed broken link to Analytics 2020-08-13 16:03:37 -04:00
Nikhita Raghunath
3a74f461a2 Revert "Merge pull request #93160 from logicalhan/triage-instrumentation"
This reverts commit 1ed2cf1895, reversing
changes made to 04ecdb9eb6.
2020-07-24 18:09:07 +05:30
Han Kang
f3c02d7221 auto triage sig-instrumentation tagged PRs
Change-Id: Ibae7373fb197485aeb222f1455515178cc3b4d13
2020-07-16 13:48:25 -07:00
Kubernetes Prow Robot
81a0e2f62b
Merge pull request #85923 from MrHohn/sig-gcp-owner-file
Migrate OWNERS file to apply the area/provider/gcp label
2020-04-02 19:03:46 -07:00
Yu-Ju Hong
bcd975aa65 Replace Beta OS/arch labels with the GA ones
Beta OS/arch labels have been deprecated since 1.14.
This change replaces these labels with the GA ones.
2020-02-13 09:38:51 -08:00
Jordan Liggitt
5c6371502a Update addon permissions 2019-12-13 12:23:39 -05:00
Zihong Zheng
5463eda704 Migrate OWNERS file to apply the area/provider/gcp label 2019-12-04 17:05:43 -08:00
Marek Siarkowicz
c601d34eba Introduce sig-instrumentation aliases in OWNERS_ALIASES and simplify OWNERS files 2019-10-10 14:04:20 +02:00
Marian Lobur
be1704ae84 Bump version of event-exporter and prometheus-to-sd.
Version 0.3.1 of event-exporter switches from json usage to protobuf.
Version 0.7.2 of prometheus-to-sd introduces security fixes and new
metrics.
2019-10-02 14:55:53 +02:00
Zheng Chen
70a7134906
added override for sd testing env in event-exporter yaml 2019-08-20 16:29:15 -04:00
draveness
495faa22db feat: cleanup pod critical pod annotations feature 2019-08-09 08:41:23 +08:00
draveness
d83526d253 Revert "feat: cleanup pod critical pod annotations feature"
This reverts commit b6d41ee5cc.
2019-07-18 13:31:12 +08:00
draveness
b6d41ee5cc feat: cleanup pod critical pod annotations feature 2019-07-11 08:54:19 +08:00
Marian Lobur
60e5717f4f Bump image of event-exporter.
The image has a new base image that includes some security fixes.
2019-05-13 16:27:25 +02:00
Marek Siarkowicz
37381eb384 Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2 2019-04-18 11:52:53 +02:00
Marek Siarkowicz
9e9b906047 Update gcp images with security patches
[stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes.
[fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes.
[fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[metadata-proxy addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
2019-03-15 09:24:32 +01:00
Kubernetes Prow Robot
45e5f6053b
Merge pull request #74424 from liggitt/drop-k8s-io-node-labels
Clean up self-set node labels
2019-03-06 08:24:26 -08:00
Jordan Liggitt
8975233788 Finish migration of fluentd to daemonset 2019-02-26 11:42:23 -05:00
Florent Delannoy
e627474e8f Fix fluentd-gcp addon liveness probe
Fix three issues with the fluentd-gcp liveness probe:

h1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS
if defined

Probably a copy/paste issue introduced in edf1ffc074

h1. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always fail when called with `/bin/sh`

Introduced by a844523c20

Given that we call the liveness probe with `/bin/sh`, we cannot use the
double-bracketed `[[` syntax for test, as it is not POSIX-compliant and
will throw an error.

Annoyingly, even though it prints an error, `sh` returns with exit code 0
in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

Which means the liveness probe is considered successful by Kubernetes,
despite failing to test things as it was intended. This is also
probably the reason why this bug wasn't reported sooner :)

Thankfully, the test in this case can just as easily be written as
POSIX-compliant as it doesn't use any bash-specific features within the
`[[` block.
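
As a minimal sketch of the first two fixes (not the addon's actual script), the threshold defaults and a POSIX-compliant emptiness test might look like the following. The default values 300 and 900 are taken from the shell session quoted later in this message; the helper name `is_blank` is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; not the addon's actual liveness script.

# Fix 1: each threshold falls back to its own default, so
# STUCK_THRESHOLD_SECONDS is no longer derived from LIVENESS_THRESHOLD_SECONDS.
LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300}
STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900}

# Fix 2: the single-bracket `[` test is POSIX and works under /bin/sh,
# whereas `[[` is a bash extension.
# is_blank (hypothetical helper) succeeds when its argument is the empty string.
is_blank() {
  [ -z "$1" ]
}
```

With a helper like this, the probe could wrap the `find` output in `is_blank "$(...)"` and exit non-zero when nothing is found, without depending on bash.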

h1. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd
containers being restarted very often, and found an issue with the
underlying logic of the liveness probe.

The probe checks that the pod is still alive by running the following
command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks if any _regular_ file exists under `/var/log/fluentd-buffers`
that is more recent than a predetermined time, and will return an empty
string otherwise.

The issue is that these buffers are temporary and volatile: they get created and
deleted constantly. Here is an example of running that check every second on a
running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing. The LivenessProbe running
under these conditions has a ~50% chance of failing, despite fluentd being
perfectly happy.

I believe that check is probably ok for fluentd installs using large
amounts of buffers, in which case the liveness probe will be correct more
often than not, but fluentd installs that use buffering less intensively
will be negatively impacted by this.

My solution to fix this is to check the last updated time of buffering
_folders_ within `/var/log/fluentd-buffers`. These _do_ get updated when
buffers are created, and do not get deleted as buffers are emptied,
making them the perfect candidate for our use.
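
A minimal sketch of that directory-based check, written as a reusable function, could look like this. The function name `has_recent_buffer_dir` and its two-argument form are assumptions made for illustration and are not taken from the actual probe; `-quit` is an extension supported by GNU and BSD `find` rather than part of POSIX.

```shell
#!/bin/sh
# Illustrative sketch only; the function name and arguments are hypothetical.

# has_recent_buffer_dir DIR MARKER
# Succeeds when at least one directory under DIR has a modification time
# newer than the MARKER file, and fails otherwise.
has_recent_buffer_dir() {
  [ -n "$(find "$1" -type d -newer "$2" -print -quit)" ]
}

# Possible use inside a probe (touch -d with a relative date is a GNU extension):
#   touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck
#   has_recent_buffer_dir /var/log/fluentd-buffers /tmp/marker-stuck || exit 1
```

Passing the marker file as an argument keeps the check independent of how the marker timestamp is produced, which makes it easy to exercise with fixed timestamps.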

Here's an example with the `-type d` test for directories:
```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```

An example of the directory being updated as new buffers come in:
```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root  224 Feb 22 11:18 .
drwxr-xr-x 3 root root   38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root  215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root  429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root  195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
2019-02-25 11:48:31 +00:00
Roy Lenferink
b43c04452f Updated OWNERS files to include link to docs 2019-02-04 22:33:12 +01:00
Yu-Ju Hong
9c892243f6 GCE: update addon DaemonSets to select node OS
These DaemonSets support only Linux today, so this change updates the
specs to reflect this limitation. The labels have recently been promoted
to GA. Using the beta labels for now until the node-master version skew
problem no longer exists.
2019-01-23 09:01:40 -08:00
Kubernetes Prow Robot
a938f8b25e
Merge pull request #72243 from cezarygerard/patch-1
[GCP] Update scaler-deployment.yaml CPU_LIMITS
2019-01-05 05:08:15 -08:00
Jordan Liggitt
d2c1fdbcfa Fixup apps/v1 addon manifests 2018-12-26 15:19:01 -05:00
Cezary Zawadka
1affe568e9
replace single quotes with double quotes in yaml 2018-12-20 15:23:41 +01:00
Jordan Liggitt
cc680273e8 Change add-on manifests to apps/v1 2018-12-19 17:30:59 -05:00
Cezary Zawadka
7b3946776c
Update scaler-deployment.yaml CPU_LIMITS
setting CPU_LIMITS to '1' fixes the following log appearing every 60 seconds:
Running: kubectl set resources -n kube-system ds fluentd-gcp-v3.1.0 -c fluentd-gcp --requests=cpu=100m,memory=200Mi --limits=cpu=1000m,memory=500Mi
error: info: {extensions v1beta1 daemonsets} "fluentd-gcp-v3.1.0" was not changed

this PR does not change scaler's behaviour, pods are scaled correctly despite error in the logs
2018-12-19 21:00:22 +01:00
k8s-ci-robot
396271cf52
Merge pull request #70954 from qingling128/master
Upgrade Stackdriver Logging Agent addon image to 0.6-1.6.0-1 to use Fluentd v1.2.
2018-11-25 23:09:07 -08:00
k8s-ci-robot
a19bf332de
Merge pull request #71124 from Random-Liu/make-fluentd-container-runtime-service-configurable
Make fluentd container runtime service configurable.
2018-11-21 07:49:42 -08:00
Mike Danese
98c468de8d update PSPs to allow projected volumes 2018-11-16 19:32:44 +00:00
Lantao Liu
1670b4089a Make fluentd container runtime service configurable. 2018-11-16 02:17:55 -08:00
Ling Huang
02b7ed3291 Upgrade Stackdriver Logging Agent addon image to 0.6-1.6.0-1 to use Fluentd v1.2. 2018-11-12 13:21:44 -05:00
Ling Huang
85d8b5069b Add tolerations for Stackdriver Logging and Metadata Agents. 2018-10-12 11:15:33 -04:00
k8s-ci-robot
1aef63124b
Merge pull request #68920 from qingling128/master
Enable insertId generation, and update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s.
2018-10-11 13:44:51 -07:00
Ling Huang
d8da1baf48 Enable insertId generation, update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s and add priorityClassName for Metadata Agent. 2018-10-09 13:42:40 -04:00
Daniel Kłobuszewski
9454876318 Bump version of fluentd-gcp-scaler 2018-09-19 17:15:05 +02:00
Kubernetes Submit Queue
e2d6362c09
Merge pull request #67691 from loburm/security_fixes
Automatic merge from submit-queue (batch tested with PRs 67691, 68147). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Bump versions of components with latest security patches.

**What this PR does / why we need it**:
Upgrade versions of monitoring components used on GCP, to include latest security patches.

**Release note**:
```release-note
[fluentd-gcp-scaler addon] Bump fluentd-gcp-scaler to 0.4 to pick up security fixes.
[prometheus-to-sd addon] Bump prometheus-to-sd to 0.3.1 to pick up security fixes, bug fixes and new features.
[event-exporter addon] Bump event-exporter to 0.2.3 to pick up security fixes.
```
2018-09-05 09:49:31 -07:00
Kubernetes Submit Queue
888546c325
Merge pull request #68029 from neolit123/fluentd-owners
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

cluster/addons: add labels to fluentd owner files

**What this PR does / why we need it**:
this PR adds SIG labels to fluentd OWNER files:
- cluster/addons/fluentd-elasticsearch/OWNERS
- cluster/addons/fluentd-gcp/OWNERS

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
let me know if the labels need adjustment.

**Release note**:

```release-note
NONE
```

/assign @roberthbailey @mikedanese 
/cc @timothysc 
/sig gcp
/sig instrumentation
/kind cleanup
2018-09-02 12:51:38 -07:00
Arnold Szederjesi
fcdef3ffcc Put fluentd back to host network 2018-08-30 10:44:04 +02:00
Lubomir I. Ivanov
aefb5b3c0e cluster/addons: add labels to fluentd owner files 2018-08-30 00:38:08 +03:00
Marian Lobur
ffa934a939 Bump versions of components with latest security patches. 2018-08-22 11:27:36 +02:00
liangwei
5ea138f4e9 remove rescheduler 2018-08-22 11:49:14 +08:00