Metadata Agent Improvements
Bump metadata agent version to 0.2-0.0.21-1.
Expand the metadata agent's access to all API groups.
Remove metadata agent config maps in favor of command line flags.
Update the metadata agent's liveness probe to a new /healthz handler.
Logging Agent Improvements
Bump logging agent version to 0.2-1.5.33-1-k8s-1.
Appropriately set log severity for k8s_container.
Fix detect exceptions plugin to analyze message field instead of log field.
Fix detect exceptions plugin to analyze streams based on local resource id.
Disable the metadata agent for monitored resource construction in logging.
Disable timestamp adjustment in logs to optimize performance.
Reduce logging agent buffer chunk limit to 512k to optimize performance.
Automatic merge from submit-queue (batch tested with PRs 63569, 63918, 63980, 63295, 63989). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
New event exporter config with support for new stackdriver resources
New event exporter, with support for use new and old stackdriver resource model.
This should also be cherry-picked to release-1.10 branch, as all fluentd-gcp components support new and stackdriver resource model.
```release-note
Update event-exporter to version v0.2.0 that supports old (gke_container/gce_instance) and new (k8s_container/k8s_node/k8s_pod) stackdriver resources.
```
Automatic merge from submit-queue (batch tested with PRs 61918, 62180, 62198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Pass 2: k8s GCR vanity URL
Also push out the old URL deprecation since we have not started the community transition yet and there are some instances of it still floating about.
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes**
Fluentd 0.14 has some memory leak issues that caused the e2e tests to be flaky. Downgrading to v0.12.
**Special notes for your reviewer**:
We never released any previous version with Fluentd v0.14. Only upgraded it very recently. So this downgrading is not visible to users.
**Release note**:
```release-note
Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.
```
Automatic merge from submit-queue (batch tested with PRs 60465, 61773, 61371, 61146). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable partial success in fluentd-gcp
Enable partial success in fluentd-gcp. This will allow to reduce amount of lost data in case of invalid (e.g. too big) entries: instead of dropping the whole request, only failed entries will be dropped.
```release-note
[fluentd-gcp addon] Partial success option is enabled in fluentd.
```
/assign @x13n
/cc @bmoyles0117
Automatic merge from submit-queue (batch tested with PRs 60465, 61773, 61371, 61146). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding resource constraints for fluentd-gcp
**What this PR does / why we need it**:
Adds resource constraints to `fluentd-gcp`. Values mostly lifted from `fluentd-es`, cpu cap set to a sensible value after reviewing various threads.
**Which issue(s) this PR fixes**
Fixes#55416
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61452, 61727, 61462, 61692, 61738). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update event-exporter image
This is a follow-up of https://github.com/GoogleCloudPlatform/k8s-stackdriver/pull/126 to apply the latest patch to the base image of event-exporter.
```release-note
[fluentd-gcp addon] Update event-exporter image to have the latest base image.
```
/assign @x13n
Could you please take a look?
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update GCP fluentd configmap for COS audit logging on GKE node
**What this PR does / why we need it**:
This PR adds a placeholder in fluentd configmap for COS audit logging on GKE node.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60722, 61269). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump fluentd-gcp-scaler version
**What this PR does / why we need it**:
This version fixes a bug in which scaler was setting resources for all containers in the pod, not only fluentd-gcp one.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60763
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove mapping to /host/lib from fluentd-gcp container.
**What this PR does / why we need it**:
This mapping is no longer needed since fluentd-gcp v2.0.16, in which it started using a container image based on Debian Stretch, in which the systemd libraries already include support for all the supported
compression algorithms.
The `/run.sh` in the image no longer accesses `/host/lib` anyways, so let's stop mapping it here.
Related changes:
- fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101
- fluentd-es on GoogleCloudPlatform/google-fluentd#80
/assign @timstclair
/cc @crassirostris @bmoyles0117
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Find most recent modified date for fluentd buffers recursively.
Fixes#60762
**What this PR does / why we need it**:
Due to updates in Fluent v0.14, the buffers directory modified date is no
longer updated when files inside the directory are changed. Therefore we
must find the most recent modified date recursively to fix liveness probe.
**Release note**:
```release-note
NONE
```
Due to updates in Fluent v0.14, the buffers directory modified date is no
longer updated when files inside the directory are changed. Therefore we
must find the most recent modified date recursively to fix liveness probe.
This mapping is no longer needed since fluentd-gcp v2.0.16, in which it
started using a container image based on Debian Stretch, in which the
systemd libraries already include support for all the supported
compression algorithms.
The /run.sh in the image no longer accesses /host/lib anyways, so let's
stop mapping it here.
Related changes:
- fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101
- fluentd-es on GoogleCloudPlatform/google-fluentd#80
Automatic merge from submit-queue (batch tested with PRs 60054, 60202, 60219, 58090, 60275). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Pass location parameter to event exporter.
**What this PR does / why we need it**:
This PR makes event-exporter export cluster location together with events.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable scaling fluentd-gcp resources using ScalingPolicy.
See https://github.com/justinsb/scaler for more details about ScalingPolicy resource.
**What this PR does / why we need it**:
This is adding a way to override fluentd-gcp resources in a running cluster. The resources syncing for fluentd-gcp is decoupled from addon manager.
**Special notes for your reviewer**:
**Release note**:
```release-note
fluentd-gcp resources can be modified via a ScalingPolicy
```
cc @kawych @justinsb
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Upload container runtime log to sd/es.
I've verified this in my environment. My stackdriver has an extra `container-runtime` entry for node log, and it collects container runtime daemon log correctly.
@yujuhong @feiskyer @crassirostris @piosz
@kubernetes/sig-node-pr-reviews @kubernetes/sig-instrumentation-pr-reviews
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
Container runtime daemon (e.g. dockerd) logs in GCE cluster will be uploaded to stackdriver and elasticsearch with tag `container-runtime`
```
Automatic merge from submit-queue (batch tested with PRs 59767, 56454, 59237, 59730, 55479). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change critical pods’ template to use priority
**What this PR does / why we need it**:
Change critical pods’ template to use priority
Thanks.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
ref #57471
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This is the 2nd attempt. The previous was reverted while we figured out
the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
fluend-gcp already has these tolerations. kube-proxy when it runs as a
static pod gets wildcard `NoExecute` toleration (all static pods get
that). So, added the same toleration to kube-proxy when it runs as a
daemonset. Also added wildcard `NoSchedule` toleration to kube-proxy.
Automatic merge from submit-queue (batch tested with PRs 56207, 55950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix setting resources in fluentd-gcp plugin
Currently if some of the variables are not set, scripts prints error, which is not critical, since the function is executed in a separate process, but it leads to the wrong resulting values
```release-note
NONE
```
/cc @piosz @x13n
/assign @roberthbailey @mikedanese
Could you please approve?
Automatic merge from submit-queue (batch tested with PRs 54602, 54877, 55243, 55509, 55128). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
PodSecurityPolicies for addons
**What this PR does / why we need it**:
1. Colocate addon PodSecurityPolicy config with the addons (in a `podsecuritypolicies` subdirectory).
2. Add policies for addons that are currently missing policies (not in the default GCE suite)
3. Remove HostPath SSL certs from several heapster deployments, so that heapster doesn't require a special PSP
**Which issue(s) this PR fixes**:
#43538
**Release note**:
```release-note
- Add PodSecurityPolicies for cluster addons
- Remove SSL cert HostPath volumes from heapster addons
```
Automatic merge from submit-queue (batch tested with PRs 54635, 54250, 54657, 54696, 54700). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump version of prometheus-to-sd to 0.2.2.
Bump version of prometheus-to-sd to improve logging, add pod_name and
pod_namespace flags and remove deprecated flags.
Fixes#54583
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50776, 54395). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move fluentd-gcp out of host network
Since metadata proxy doesn't filter service account after all, make fluentd-gcp addon run in its own network
This will mitigate the problem with port collision
```release-note
[fluentd-gcp addon] Fluentd now runs in its own network, not in the host one.
```
- Use a dedicated service account to run the fluentd-gcp DS
- Update prometheus-to-sd from v0.1.3 to v0.2.1
- Use the certificates in the prometheus-to-sd image rather than mounting the host certs
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
[fluentd-gcp addon] Update Stackdriver plugin to version 0.6.7
A new gem among all fixes Java logging severity parsing and string timestamp parsing
Also sync the buffer size with the gem guidelines, making it 1M instead of 2M.
/cc @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 51041, 52297, 52296, 52335, 52338)
[fluentd-gcp addon] Restore the metric for the number of read log entries
This metric, previously removed, will allow to monitor the number of log entries, that were read, but weren't sent by the output plugin because of liveness probe removing the data.
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
update to rbac v1 in yaml file
**What this PR does / why we need it**:
ref to https://github.com/kubernetes/kubernetes/pull/49642
ref https://github.com/kubernetes/features/issues/2
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48812, 48276)
Change fluentd-gcp monitoring to use metrics exposed by SD plugin
Following https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/135, make fluentd-gcp expose metrics in Prometheus registry and use them instead of counting records in the pipeline.
/cc @piosz @igorpeshansky
```release-note
Fluentd-gcp DaemonSet exposes different set of metrics.
```
Automatic merge from submit-queue
Update docs for user-guide
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```