kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2025-07-31 23:37:01 +00:00

Author	SHA1	Message	Date
David Ashpole	3813ed1ef7	fix prometheus-to-sd image for fluentbit	2021-05-27 10:54:10 -07:00
David Ashpole	febf9d9366	update event-exporter and prometheus-to-sd versions in cluster addons	2021-05-13 11:40:41 -07:00
Yu-Ju Hong	bcd975aa65	Replace Beta OS/arch labels with the GA ones Beta OS/arch labels have been deprecated since 1.14. This change replaces these labels with the GA ones.	2020-02-13 09:38:51 -08:00
draveness	495faa22db	feat: cleanup pod critical pod annotations feature	2019-08-09 08:41:23 +08:00
draveness	d83526d253	Revert "feat: cleanup pod critical pod annotations feature" This reverts commit `b6d41ee5cc`.	2019-07-18 13:31:12 +08:00
draveness	b6d41ee5cc	feat: cleanup pod critical pod annotations feature	2019-07-11 08:54:19 +08:00
Marek Siarkowicz	9e9b906047	Update gcp images with security patches [stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes. [fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes. [fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [metadata-proxy addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.	2019-03-15 09:24:32 +01:00
Kubernetes Prow Robot	45e5f6053b	Merge pull request #74424 from liggitt/drop-k8s-io-node-labels Clean up self-set node labels	2019-03-06 08:24:26 -08:00
Jordan Liggitt	8975233788	Finish migration of fluentd to daemonset	2019-02-26 11:42:23 -05:00
Florent Delannoy	e627474e8f	Fix fluentd-gcp addon liveness probe

Fix three issues with the fluentd-gcp liveness probe:

h1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS if defined

Probably a copy/paste issue introduced in `edf1ffc074`

h1. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always fail when called with `/bin/sh`

Introduced by `a844523c20`

Given that we call the liveness probe with `/bin/sh`, we cannot use the double-bracketed `[[` syntax for test, as it is not POSIX-compliant and will throw an error. Annoyingly, even though it prints an error, `sh` returns with exit code 0 in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

Which means the liveness probe is considered successful by Kubernetes, despite failing to test things as it was intended. This is also probably the reason why this bug wasn't reported sooner :)

Thankfully, the test in this case can just as easily be written as POSIX-compliant as it doesn't use any bash-specific features within the `[[` block.

h1. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd containers being restarted very often, and found an issue with the underlying logic of the liveness probe. The probe checks that the pod is still alive by running the following command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks if any _regular_ file exists under `/var/log/fluentd-buffers` that is more recent than a predetermined time, and will return an empty string otherwise. The issue is that these buffers are temporary and volatile: they get created and deleted constantly.

Here is an example of running that check every second on a running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing.

The LivenessProbe running under these conditions has a ~50% chance of failing, despite fluentd being perfectly happy. I believe that check is probably ok for fluentd installs using large amounts of buffers, in which case the liveness probe will be correct more often than not, but fluentd installs that use buffering less intensively will be negatively impacted by this.

My solution to fix this is to check the last updated time of buffering _folders_ within `/var/log/fluentd-buffers`. These _do_ get updated when buffers are created, and do not get deleted as buffers are emptied, making them the perfect candidate for our use.

Here's an example with `-type d` for directories:

```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```

And an example of the directory being updated as new buffers come in:

```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root 6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root 224 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root 215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root 429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root 195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root 6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```	2019-02-25 11:48:31 +00:00
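
The three fixes described in commit `e627474e8f` can be illustrated with a minimal sketch. This is not the script shipped in the fluentd-gcp addon: the function name `buffers_active` and its exit codes are hypothetical, and it assumes GNU `touch` (for `-d "N seconds ago"`) and GNU `find` (for `-quit`), as the transcripts in that commit message do. It uses the single-bracket `[` test, checks directories instead of regular files, and defaults each threshold from its own variable.

```shell
#!/bin/sh
# Hypothetical sketch, not the script shipped in the fluentd-gcp addon.
# Assumes GNU touch (-d "N seconds ago") and GNU find (-quit).

# buffers_active DIR SECONDS
# Exit 0 if DIR, or any directory under it, was modified within the last
# SECONDS; exit 1 if not; exit 2 if the marker file cannot be prepared.
buffers_active() {
  dir=$1
  threshold=$2
  marker=$(mktemp) || return 2
  if ! touch -d "${threshold} seconds ago" "$marker"; then
    rm -f "$marker"
    return 2
  fi
  # -type d: buffer directories persist while buffer files come and go.
  recent=$(find "$dir" -type d -newer "$marker" -print -quit 2>/dev/null)
  rm -f "$marker"
  # Single-bracket test is POSIX; `[[` is a bashism.
  [ -n "$recent" ]
}

# Each threshold defaults from its own variable (values taken from the
# commit message's transcript).
STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900}
LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300}
```

A probe script could then end with something like `buffers_active /var/log/fluentd-buffers "$STUCK_THRESHOLD_SECONDS" || exit 1`, so that a stale buffer tree produces a non-zero exit code instead of a silently ignored error.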
Yu-Ju Hong	9c892243f6	GCE: update addon DaemonSets to select node OS These DaemonSets support only Linux today, so this change updates the specs to reflect this limitation. The labels have recently been promoted to GA. Using the beta labels for now until the node-master version skew problem no longer exists.	2019-01-23 09:01:40 -08:00
Jordan Liggitt	cc680273e8	Change add-on manifests to apps/v1	2018-12-19 17:30:59 -05:00
Ling Huang	85d8b5069b	Add tolerations for Stackdriver Logging and Metadata Agents.	2018-10-12 11:15:33 -04:00
Ling Huang	d8da1baf48	Enable insertId generation, update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s and add priorityClassName for Metadata Agent.	2018-10-09 13:42:40 -04:00
Kubernetes Submit Queue	e2d6362c09	Merge pull request #67691 from loburm/security_fixes Automatic merge from submit-queue (batch tested with PRs 67691, 68147). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Bump versions of components with latest security patches. What this PR does / why we need it: Upgrade versions of monitoring components used on GCP, to include latest security patches. Release note: ```release-note [fluentd-gcp-scaler addon] Bump fluentd-gcp-scaler to 0.4 to pick up security fixes. [prometheus-to-sd addon] Bump prometheus-to-sd to 0.3.1 to pick up security fixes, bug fixes and new features. [event-exporter addon] Bump event-exporter to 0.2.3 to pick up security fixes. ```	2018-09-05 09:49:31 -07:00
Arnold Szederjesi	fcdef3ffcc	Put fluentd back to host network	2018-08-30 10:44:04 +02:00
Marian Lobur	ffa934a939	Bump versions of components with latest security patches.	2018-08-22 11:27:36 +02:00
Bryan Moyles	32c2bfadfd	A large set of improvements to the Stackdriver components. Metadata Agent Improvements Bump metadata agent version to 0.2-0.0.21-1. Expand the metadata agent's access to all API groups. Remove metadata agent config maps in favor of command line flags. Update the metadata agent's liveness probe to a new /healthz handler. Logging Agent Improvements Bump logging agent version to 0.2-1.5.33-1-k8s-1. Appropriately set log severity for k8s_container. Fix detect exceptions plugin to analyze message field instead of log field. Fix detect exceptions plugin to analyze streams based on local resource id. Disable the metadata agent for monitored resource construction in logging. Disable timestamp adjustment in logs to optimize performance. Reduce logging agent buffer chunk limit to 512k to optimize performance.	2018-08-06 11:26:35 -04:00
Daniel Kłobuszewski	7773f8f5eb	Increase fluentd-gcp grace termination period to 1min By default, all pods have 30s for graceful termination. This gives fluentd additional 30s to export logs when the node is shutting down.	2018-06-14 10:44:13 +02:00
Bryan Moyles	a0a7686e38	Use the logging agent's node name as the metadata agent URL.	2018-05-02 10:12:35 +02:00
Ling Huang	cbec62ada4	Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.	2018-04-06 08:47:19 -04:00
Mik Vyatskov	d6cef02a9d	Revert "Enable partial success in fluentd-gcp"	2018-03-29 11:48:01 +02:00
Mik Vyatskov	c8773044ea	Enable partial success in fluentd-gcp Signed-off-by: Mik Vyatskov <vmik@google.com>	2018-03-27 15:51:16 +02:00
Shyam Jeedigunta	123fa5c706	Revert "Increase fluentd rolling-upgrade maxUnavailable to large value" This reverts commit `7dd6adc438`.	2018-03-26 15:17:54 +02:00
Shyam Jeedigunta	7dd6adc438	Increase fluentd rolling-upgrade maxUnavailable to large value	2018-03-22 12:33:42 +01:00
Kubernetes Submit Queue	7e063329f3	Merge pull request #60722 from filbranden/fluentd1 Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Remove mapping to /host/lib from fluentd-gcp container. What this PR does / why we need it: This mapping is no longer needed since fluentd-gcp v2.0.16, in which it started using a container image based on Debian Stretch, in which the systemd libraries already include support for all the supported compression algorithms. The `/run.sh` in the image no longer accesses `/host/lib` anyways, so let's stop mapping it here. Related changes: - fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101 - fluentd-es on GoogleCloudPlatform/google-fluentd#80 /assign @timstclair /cc @crassirostris @bmoyles0117 Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): N/A Special notes for your reviewer: N/A Release note: ```release-note NONE ```	2018-03-16 03:38:28 -07:00
Bryan Moyles	a844523c20	Find most recent modified date for fluentd buffers recursively. Due to updates in Fluent v0.14, the buffers directory modified date is no longer updated when files inside the directory are changed. Therefore we must find the most recent modified date recursively to fix liveness probe.	2018-03-12 15:28:55 -04:00
Filipe Brandenburger	cea4c98508	Remove mapping to /host/lib from fluentd-gcp container. This mapping is no longer needed since fluentd-gcp v2.0.16, in which it started using a container image based on Debian Stretch, in which the systemd libraries already include support for all the supported compression algorithms. The /run.sh in the image no longer accesses /host/lib anyways, so let's stop mapping it here. Related changes: - fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101 - fluentd-es on GoogleCloudPlatform/google-fluentd#80	2018-03-02 10:20:08 -08:00
Bryan Moyles	84a86cffce	Update to use Stackdriver Agent image. Prometheus is enabled by default.	2018-02-26 14:05:33 -05:00
Daniel Kłobuszewski	a88ddac1e4	use prometheus-to-sd 0.2.4 and fluentd-gcp-image 2.0.16	2018-02-16 09:16:59 +01:00
Kubernetes Submit Queue	d3bacb914c	Merge pull request #59657 from x13n/manual-fluentd-gcp-scaler Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Enable scaling fluentd-gcp resources using ScalingPolicy. See https://github.com/justinsb/scaler for more details about ScalingPolicy resource. What this PR does / why we need it: This is adding a way to override fluentd-gcp resources in a running cluster. The resources syncing for fluentd-gcp is decoupled from addon manager. Special notes for your reviewer: Release note: ```release-note fluentd-gcp resources can be modified via a ScalingPolicy ``` cc @kawych @justinsb	2018-02-15 03:42:14 -08:00
Kubernetes Submit Queue	bc9c6df31d	Merge pull request #59103 from Random-Liu/upload-container-runtime-log Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Upload container runtime log to sd/es. I've verified this in my environment. My stackdriver has an extra `container-runtime` entry for node log, and it collects container runtime daemon log correctly. @yujuhong @feiskyer @crassirostris @piosz @kubernetes/sig-node-pr-reviews @kubernetes/sig-instrumentation-pr-reviews Signed-off-by: Lantao Liu <lantaol@google.com> Release note: ```release-note Container runtime daemon (e.g. dockerd) logs in GCE cluster will be uploaded to stackdriver and elasticsearch with tag `container-runtime` ```	2018-02-14 03:33:21 -08:00
Lantao Liu	8d920d095c	Upload container runtime log to sd/es. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-02-13 18:25:02 +00:00
Kubernetes Submit Queue	7ef11bd964	Merge pull request #59237 from tanshanshan/addons1 Automatic merge from submit-queue (batch tested with PRs 59767, 56454, 59237, 59730, 55479). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Change critical pods’ template to use priority What this PR does / why we need it: Change critical pods’ template to use priority Thanks. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): ref #57471 Special notes for your reviewer: Release note: ```release-note ```	2018-02-12 15:44:36 -08:00
Daniel Kłobuszewski	2eb24f9ae1	Enable scaling fluentd-gcp resources using ScalingPolicy. See https://github.com/justinsb/scaler for more details about ScalingPolicy resource.	2018-02-09 14:33:33 +01:00
tanshanshan	95b2b94b1b	Change critical pods’ template to use priority	2018-02-08 15:06:27 +08:00
Tim Hockin	3586986416	Switch to k8s.gcr.io vanity domain This is the 2nd attempt. The previous was reverted while we figured out the regional mirrors (oops). New plan: k8s.gcr.io is a read-only facade that auto-detects your source region (us, eu, or asia for now) and pulls from the closest. To publish an image, push k8s-staging.gcr.io and it will be synced to the regionals automatically (similar to today). For now the staging is an alias to gcr.io/google_containers (the legacy URL). When we move off of google-owned projects (working on it), then we just do a one-time sync, and change the google-internal config, and nobody outside should notice. We can, in parallel, change the auto-sync into a manual sync - send a PR to "promote" something from staging, and a bot activates it. Nice and visible, easy to keep track of.	2018-02-07 21:14:19 -08:00
Ross Light	6831581f1c	Bump fluentd-gcp version	2018-01-12 10:16:13 -08:00
Daniel Kłobuszewski	dca74f17fd	Bump fluentd-gcp image used to 2.0.13	2018-01-08 14:54:26 +01:00
Daniel Kłobuszewski	2eded687be	Bump fluentd-gcp version	2018-01-03 11:46:13 +01:00
Tim Hockin	e9dd8a68f6	Revert k8s.gcr.io vanity domain This reverts commit `eba5b6092a`. Fixes https://github.com/kubernetes/kubernetes/issues/57526	2017-12-22 14:36:16 -08:00
Tim Hockin	eba5b6092a	Use k8s.gcr.io vanity domain for container images	2017-12-18 09:18:34 -08:00
Daniel Kłobuszewski	d2cbc37c05	Bump fluentd-gcp version	2017-12-07 14:23:05 +01:00
Rohit Agarwal	ad05928c6e	Add wildcard tolerations to kube-proxy. fluentd-gcp already has these tolerations. kube-proxy when it runs as a static pod gets wildcard `NoExecute` toleration (all static pods get that). So, added the same toleration to kube-proxy when it runs as a daemonset. Also added wildcard `NoSchedule` toleration to kube-proxy.	2017-11-29 12:36:58 -08:00
Mik Vyatskov	e9322b929c	Fix setting resources in fluentd-gcp plugin Signed-off-by: Mik Vyatskov <vmik@google.com>	2017-11-22 12:40:50 +01:00
Lantao Liu	53d7494b9e	Fix CRI fluentd config. Signed-off-by: Lantao Liu <lantaol@google.com>	2017-11-10 20:55:56 +00:00
Lantao Liu	70a0cdfa8e	Add CRI log format support in fluentd.	2017-10-30 06:25:52 +00:00
Kubernetes Submit Queue	949ec719c3	Merge pull request #54635 from loburm/prom-to-sd Automatic merge from submit-queue (batch tested with PRs 54635, 54250, 54657, 54696, 54700). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Bump version of prometheus-to-sd to 0.2.2. Bump version of prometheus-to-sd to improve logging, add pod_name and pod_namespace flags and remove deprecated flags. Fixes #54583 ```release-note NONE ```	2017-10-27 14:38:21 -07:00
Kubernetes Submit Queue	fc8bfe2d89	Merge pull request #54395 from crassirostris/fluentd-gcp-rollback-host-networking Automatic merge from submit-queue (batch tested with PRs 50776, 54395). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Move fluentd-gcp out of host network Since metadata proxy doesn't filter service account after all, make fluentd-gcp addon run in its own network This will mitigate the problem with port collision ```release-note [fluentd-gcp addon] Fluentd now runs in its own network, not in the host one. ```	2017-10-27 03:09:25 -07:00
Marian Lobur	5b62eb29d2	Bump version of prometheus-to-sd to 0.2.2. Bump version of prometheus-to-sd to improve logging, add pod_name and pod_namespace flags and remove deprecated flags.	2017-10-26 15:54:54 +02:00

107 Commits