Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable scaling fluentd-gcp resources using ScalingPolicy.
See https://github.com/justinsb/scaler for more details about ScalingPolicy resource.
**What this PR does / why we need it**:
This adds a way to override fluentd-gcp resources in a running cluster. Resource syncing for fluentd-gcp is now decoupled from the addon manager.
**Release note**:
```release-note
fluentd-gcp resources can be modified via a ScalingPolicy
```
cc @kawych @justinsb
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Upload container runtime log to sd/es.
I've verified this in my environment. My Stackdriver now shows an extra `container-runtime` entry under the node logs, and it collects the container runtime daemon log correctly.
@yujuhong @feiskyer @crassirostris @piosz
@kubernetes/sig-node-pr-reviews @kubernetes/sig-instrumentation-pr-reviews
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
Container runtime daemon (e.g. dockerd) logs in GCE clusters will be uploaded to Stackdriver and Elasticsearch with the tag `container-runtime`
```
Automatic merge from submit-queue (batch tested with PRs 59767, 56454, 59237, 59730, 55479). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change critical pods’ template to use priority
**What this PR does / why we need it**:
Change critical pods’ template to use priority
Thanks.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
ref #57471
This is the 2nd attempt. The previous was reverted while we figured out
the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push to k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
fluentd-gcp already has these tolerations. When kube-proxy runs as a
static pod, it gets a wildcard `NoExecute` toleration (all static pods get
that), so this adds the same toleration to kube-proxy when it runs as a
DaemonSet. It also adds a wildcard `NoSchedule` toleration to kube-proxy.
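To illustrate what a wildcard toleration means here, the following is a minimal sketch of simplified toleration matching, not the scheduler's actual code. The helper names and the `example.com/dedicated` taint are hypothetical; the matching rules (an empty key with `Exists` matches every taint key, an empty effect matches every effect) follow the documented Kubernetes semantics.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the toleration matches the taint (simplified rules)."""
    # An empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key with Exists is the wildcard: it matches every taint key.
        return key == "" or key == taint["key"]
    # Equal: both key and value must match.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def pod_tolerates(tolerations: list, taint: dict) -> bool:
    """A pod tolerates a taint if any one of its tolerations matches it."""
    return any(tolerates(t, taint) for t in tolerations)


# The two wildcard tolerations described above, written as plain dicts.
KUBE_PROXY_TOLERATIONS = [
    {"operator": "Exists", "effect": "NoExecute"},
    {"operator": "Exists", "effect": "NoSchedule"},
]

# Hypothetical taint, used only for illustration.
example_taint = {"key": "example.com/dedicated", "value": "infra", "effect": "NoSchedule"}
print(pod_tolerates(KUBE_PROXY_TOLERATIONS, example_taint))
```

Note that in this sketch a `PreferNoSchedule` taint is not matched by either toleration, because both of them name a specific effect.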
Automatic merge from submit-queue (batch tested with PRs 56207, 55950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix setting resources in fluentd-gcp plugin
Currently, if some of the variables are not set, the script prints an error. The error itself is not critical, since the function is executed in a separate process, but it leads to wrong resulting values.
```release-note
NONE
```
/cc @piosz @x13n
/assign @roberthbailey @mikedanese
Could you please approve?
Automatic merge from submit-queue (batch tested with PRs 54602, 54877, 55243, 55509, 55128). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
PodSecurityPolicies for addons
**What this PR does / why we need it**:
1. Colocate addon PodSecurityPolicy config with the addons (in a `podsecuritypolicies` subdirectory).
2. Add policies for addons that are currently missing policies (not in the default GCE suite)
3. Remove HostPath SSL certs from several heapster deployments, so that heapster doesn't require a special PSP
**Which issue(s) this PR fixes**:
#43538
**Release note**:
```release-note
- Add PodSecurityPolicies for cluster addons
- Remove SSL cert HostPath volumes from heapster addons
```
Automatic merge from submit-queue (batch tested with PRs 54635, 54250, 54657, 54696, 54700). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump version of prometheus-to-sd to 0.2.2.
Bump version of prometheus-to-sd to improve logging, add pod_name and
pod_namespace flags and remove deprecated flags.
Fixes #54583
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50776, 54395). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move fluentd-gcp out of host network
Since the metadata proxy doesn't filter the service account after all, make the fluentd-gcp addon run in its own network.
This mitigates the problem with port collisions.
```release-note
[fluentd-gcp addon] Fluentd now runs in its own network, not in the host one.
```
- Use a dedicated service account to run the fluentd-gcp DS
- Update prometheus-to-sd from v0.1.3 to v0.2.1
- Use the certificates in the prometheus-to-sd image rather than mounting the host certs
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[fluentd-gcp addon] Update Stackdriver plugin to version 0.6.7
Among other fixes, the new gem version fixes Java logging severity parsing and string timestamp parsing.
Also sync the buffer size with the gem guidelines, making it 1M instead of 2M.
/cc @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 51041, 52297, 52296, 52335, 52338)
[fluentd-gcp addon] Restore the metric for the number of read log entries
This previously removed metric makes it possible to monitor the number of log entries that were read but not sent by the output plugin because the liveness probe removed the data.
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
update to rbac v1 in yaml file
**What this PR does / why we need it**:
ref to https://github.com/kubernetes/kubernetes/pull/49642
ref https://github.com/kubernetes/features/issues/2
**Special notes for your reviewer**:
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48812, 48276)
Change fluentd-gcp monitoring to use metrics exposed by SD plugin
Following https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/135, make fluentd-gcp expose metrics in Prometheus registry and use them instead of counting records in the pipeline.
/cc @piosz @igorpeshansky
```release-note
Fluentd-gcp DaemonSet exposes different set of metrics.
```
Automatic merge from submit-queue
Update docs for user-guide
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Make fluentd log to stdio instead of a dedicated file
Also lower the verbosity to reduce the volume of system logs exported to the backend.
Fix https://github.com/kubernetes/kubernetes/issues/43772
/cc @piosz
Automatic merge from submit-queue
Make fluentd-gcp run with host network
Fluentd-gcp needs access to the instance's platform-dependent service account in order to work.
/cc @piosz
Automatic merge from submit-queue
Add event exporter deployment to the fluentd-gcp addon
Introduce event exporter deployment to the fluentd-gcp addon so that by default if logging to Stackdriver is enabled, events will be available there also.
In this release, event exporter is a non-critical pod in BestEffort QoS class to avoid preempting actual workload in tightly loaded clusters. It will become critical in one of the future releases.
```release-note
Stackdriver cluster logging now deploys a new component to export Kubernetes events.
```
Automatic merge from submit-queue (batch tested with PRs 42070, 42127)
Remove fluentd-gcp image sources
This PR removes the fluentd-gcp image sources from the main kubernetes repo in order to move them to `contrib`: https://github.com/kubernetes/contrib/pull/2426
Once the image is moved, it will be maintained by the Stackdriver team (@igorpeshansky, @qingling128 and @dhrupadb)
CC @ixdy @timstclair
Automatic merge from submit-queue (batch tested with PRs 42126, 42130, 42232, 42245, 41932)
Update fluentd-gcp configuration for hosted masters
This PR makes the hosted masters, which cannot use ConfigMaps, use the new fluentd-gcp image, which carries no configuration of its own.
Mirroring https://github.com/kubernetes/kubernetes/pull/42126
Automatic merge from submit-queue
Move fluentd DS config to configmap
This is the logical continuation of https://github.com/kubernetes/kubernetes/pull/41998. This PR makes fluentd-gcp DaemonSet use the new image configured using ConfigMap.
This PR doesn't change the way fluentd-gcp works when the master is not registered; that will be fixed in a separate PR.
CC @ixdy @timstclair @igorpeshansky @qingling128 @dhrupadb
**Release note:**
```release-note
Fluentd-gcp containers spawned by DaemonSet are now configured using ConfigMap
```
Automatic merge from submit-queue
Cleanup fluentd-gcp image, rebase on debian-base
**Why we need this PR**:
There are several problems with our current fluentd-gcp image:
- It pulls in lots of unused packages, which expose unnecessary risk and create noise in CVE scans (and scare customers). The most notable example is the fluent-ui, which pulls in rails.
- `curl | sh ` is not a good practice for a Dockerfile. First, the script is not checked in the same source control branch, so builds are not reproducible. Second, the actions it is taking are opaque. Third, in this case, using non-standard packages means they're harder to manage with CVE scans & upstream fixes.
**What is changed by this PR?**
- Rather than relying on td-agent (which includes fluent-ui), use standard upstream packages. This is largely based off the [official fluentd debian-based image](https://github.com/fluent/fluentd-docker-image/blob/master/v0.12/debian/Dockerfile).
- Rebases the image on debian-base (depends on https://github.com/kubernetes/kubernetes/pull/41915). We would like to move towards a single full-distro base image we can maintain. This change should be relatively minor.
As a result of these changes, the image size is reduced from 360.6 MB to 185.8 MB (nearly half). Many packages were removed, and the full diff (focus on the unversioned files) is listed here: 3fb704f977
**Which issue this PR fixes** https://github.com/kubernetes/kubernetes/issues/40248
**Special notes for your reviewer**:
This change both addresses security concerns, and is expected to greatly reduce the maintenance burden of the fluentd-gcp image. I'd *really* like to get this into 1.6, so please prioritize this review if possible.
I tested this by running the default e2e suite on a private e2e cluster using the new image. If there are other tests you'd like me to run, please let me know ASAP.
**Release note**:
```release-note
Cleanup fluentd-gcp image: rebase on debian-base, switch to upstream packages, remove fluent-ui & rails
```
Automatic merge from submit-queue (batch tested with PRs 39855, 41433, 41567, 41887, 41652)
Add fluentd monitoring to fluentd-gcp image
Right now we are not able to monitor the state of fluentd in the cluster, which may result in the logging subsystem quietly failing. This PR tries to address that problem by introducing fluentd container monitoring:
* fluentd internal metrics, like the number of buffers and the amount of data in buffers
* `logging_line_count`, the number of lines read by fluentd from application containers' logs
  * Has a `tag` label, corresponding to the fluentd tag of the entry
* `logging_entry_count`, the number of entries emitted to the output plugin
  * With label `component` set to `container`, generated by application containers
  * With label `component` set to `system`, generated by system components like kubelet, docker, scheduler, etc.
  * Has a `tag` label, corresponding to the fluentd tag of the entry
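The label layout of the two counters can be sketched as follows. This is only an illustration of how the metrics are keyed, not the code in the image: the metric and label names come from the list above, while the helper functions and the example tags are hypothetical.

```python
from collections import Counter

# Counters keyed by label values. The metric and label names follow the
# description above; the storage layout is an assumption for illustration.
logging_line_count = Counter()   # key: (tag,)
logging_entry_count = Counter()  # key: (component, tag)


def record_line(tag: str) -> None:
    """Count one line read from an application container's log."""
    logging_line_count[(tag,)] += 1


def record_entry(component: str, tag: str) -> None:
    """Count one entry emitted to the output plugin."""
    if component not in ("container", "system"):
        raise ValueError(f"unexpected component label: {component!r}")
    logging_entry_count[(component, tag)] += 1


# Hypothetical tags, used only to show the resulting keys.
record_line("kubernetes.example-pod")
record_entry("container", "kubernetes.example-pod")
record_entry("system", "kubelet")
```

Comparing the two counters for the same `tag` gives a rough view of lines that were read but never emitted.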
CC @fabxc @igorpeshansky @edsiper
Automatic merge from submit-queue
Bump fluentd-gcp google_cloud plugin version
Bump the version of `fluent-plugin-google-cloud` in the fluentd-gcp image, because it's broken for version `0.5.2`.
Recently, the `google-api-client` gem was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud`, which doesn't specify an upper bound on the `google-api-client` gem version. I'm bumping the version used in our image to allow future changes in this release to be run and tested.
This PR doesn't bump the image version, since no effective changes have happened, leaving that for the next PR to do.
CC @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Add toleration to fluentd daemonset to make it run on master
Because of https://github.com/kubernetes/kubernetes/pull/41172, fluentd pods stopped being scheduled on the master node.
This PR introduces a toleration for the master taint for fluentd.
CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs
Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
These files have been created lately, so we don't have much information
about them anyway, so let's just:
- Remove assignees and make them approvers
- Copy approvers as reviewers
Automatic merge from submit-queue
Try to parse golang logs by default
Glog logs to stderr by default, so Stackdriver Logging shows all of its messages as errors. This PR makes fluentd try to parse messages using the glog format and, if parsing succeeds, set the timestamp and severity accordingly.
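A minimal sketch of this parsing step, assuming the usual glog line prefix (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`). The function name, the `year` parameter (glog lines carry no year) and the severity names are illustrative and are not taken from the fluentd configuration in this PR.

```python
import re
from datetime import datetime

# Assumed glog prefix: severity letter, month/day, time with microseconds,
# thread id, source file and line, then the message.
GLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^ \]]+:\d+)\] (?P<message>.*)$"
)

# Illustrative severity names; the names used by the backend may differ.
SEVERITIES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_glog(line: str, year: int) -> dict:
    """Try to parse a glog line; fall back to the raw line if it does not match."""
    fallback = {"message": line, "severity": None, "timestamp": None}
    m = GLOG_RE.match(line)
    if m is None:
        return fallback
    try:
        timestamp = datetime.strptime(
            f"{year}{m['month']}{m['day']} {m['time']}", "%Y%m%d %H:%M:%S.%f"
        )
    except ValueError:
        # Looks like a glog prefix but is not a valid date: leave the line as-is.
        return fallback
    return {
        "message": m["message"],
        "severity": SEVERITIES[m["severity"]],
        "timestamp": timestamp,
    }
```

Lines that do not match are returned unchanged, which mirrors the "try to parse" behavior: non-glog output keeps whatever default severity the pipeline assigns.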
CC @piosz @fgrzadkowski
Automatic merge from submit-queue
Remove all MAINTAINER statements in the codebase as they are deprecated
**What this PR does / why we need it**:
ref: https://github.com/docker/docker/pull/25466
**Release note**:
```release-note
Remove all MAINTAINER statements in Dockerfiles in the codebase as they are deprecated by docker
```
@ixdy @thockin (who else should be notified?)
Automatic merge from submit-queue
Make fluentd pods critical
Related to https://github.com/kubernetes/kubernetes/issues/38322
Make fluentd critical so that it is less likely to be evicted.
CC @piosz @fgrzadkowski
We can then avoid the following warning:
```
WARNING: The '--' argument must be specified between gcloud specific args on the left and DOCKER_ARGS on the right. IMPORTANT: previously, commands allowed the omission of the --, and unparsed arguments were treated as implementation args. This usage is being deprecated and will be removed in March 2017.
This will be strictly enforced in March 2017. Use 'gcloud beta docker' to see new behavior.
```
Signed-off-by: Jess Frazelle <acidburn@google.com>
Only run the systemd-journal plugin when on a platform that requests it.
The plugin crashes the fluentd process if the journal isn't present, so
it can't just be run blindly in all configurations.
It includes some performance improvements for parsing JSON (which is
very important for us, since all Docker logs are JSON) as well as a
couple new settings, like forcing of a flush of multiline logs after a
time period rather than having to wait until a new log is seen before
feeling confident flushing the previous one.
I didn't expect glog to split single log statements onto multiple lines,
but apparently it does if they're long enough. This groups them back
together appropriately.
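The grouping idea can be sketched as follows, assuming that a continuation line is any line that does not start with a glog prefix. This is an illustration of the technique rather than the fluentd multiline configuration used in the change; the function name and the regular expression are assumptions.

```python
import re
from typing import Iterable, Iterator

# Assumed glog prefix: "Lmmdd hh:mm:ss.uuuuuu threadid file:line] ".
GLOG_PREFIX = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [^ \]]+:\d+\] "
)


def group_glog_lines(lines: Iterable[str]) -> Iterator[str]:
    """Join lines without a glog prefix onto the preceding log statement."""
    current = None
    for line in lines:
        if current is None or GLOG_PREFIX.match(line):
            # A new statement starts: emit the previous one, if any.
            if current is not None:
                yield current
            current = line
        else:
            # Continuation of the statement collected so far.
            current += "\n" + line
    if current is not None:
        yield current
```

For example, a prefixed line followed by an unprefixed line and then another prefixed line yields two statements, the first of which contains both of its lines. The last statement is only emitted once the input ends, since there is no later prefixed line to close it.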