Add event correlation to client
@@ -35,14 +35,23 @@ Documentation for other releases can be found at
This document captures the design of event compression.

## Background

Kubernetes components can get into a state where they generate tons of events.

The events can be categorized in one of two ways:

1. same - the event is identical to previously generated events, differing only in timestamp
2. similar - the event is identical to previously generated events, differing only in timestamp and message

For example, when pulling a non-existing image, Kubelet will repeatedly generate `image_not_existing` and `container_is_waiting` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](http://issue.k8s.io/3853)).

The goal is to introduce event counting, which increments a counter on same events, and event aggregation, which collapses similar events.
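To make the two categories concrete, here is a small, hypothetical Go sketch; the `Event` struct and the `failedSync` reason below are illustrative stand-ins, not the actual `api.Event` type:

```go
package main

import (
	"fmt"
	"time"
)

// Event is an illustrative, trimmed-down stand-in for api.Event;
// the real object carries many more fields.
type Event struct {
	Reason        string
	Message       string
	LastTimestamp time.Time
}

func main() {
	t := time.Now()
	a := Event{"failedSync", "Error syncing pod", t}
	b := Event{"failedSync", "Error syncing pod", t.Add(time.Second)}
	c := Event{"failedSync", "Error syncing pod, will retry", t.Add(2 * time.Second)}

	// a and b are "same": identical except for timestamp, so the second
	// occurrence should only increment a count on the first.
	fmt.Println("same:", a.Reason == b.Reason && a.Message == b.Message)

	// a and c are "similar": they also differ in message, making them
	// candidates for aggregation rather than counting.
	fmt.Println("similar:", a.Reason == c.Reason && a.Message != c.Message)
}
```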
## Proposal

Each binary that generates events (for example, `kubelet`) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. In addition, if many similar events are created, they should be aggregated into a single event to reduce spam.

Event compression should be best effort (not guaranteed): in the worst case, `n` identical (minus timestamp) events may still result in `n` event entries.
@@ -61,6 +70,24 @@ Instead of a single Timestamp, each event object [contains](http://releases.k8s.
Each binary that generates events:

* Maintains a historical record of previously generated events:
  * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/record/events_cache.go`](../../pkg/client/record/events_cache.go).
  * Implemented behind an `EventCorrelator` that manages two subcomponents: `EventAggregator` and `EventLogger` (a sketch of this pipeline appears after this list).
  * The `EventCorrelator` observes all incoming events and lets each subcomponent visit and modify the event in turn.
  * The `EventAggregator` runs an aggregation function over each event. This function buckets each event based on an `aggregateKey` and identifies the event uniquely with a `localKey` within that bucket.
    * The default aggregation function groups similar events that differ only by `event.Message`. Its `localKey` is `event.Message` and its `aggregateKey` is produced by joining:
      * `event.Source.Component`
      * `event.Source.Host`
      * `event.InvolvedObject.Kind`
      * `event.InvolvedObject.Namespace`
      * `event.InvolvedObject.Name`
      * `event.InvolvedObject.UID`
      * `event.InvolvedObject.APIVersion`
      * `event.Reason`
    * If the `EventAggregator` observes a similar event produced 10 times in a 10-minute window, it drops the event that was provided as input and creates a new event that differs only in the message. The message denotes that this event is used to group similar events that matched on reason. This aggregated `Event` is then used in the event processing sequence.
  * The `EventLogger` observes the event coming out of the `EventAggregator` and tracks the number of times it has observed that event previously by incrementing a key in a cache associated with that matching event.
    * The key in the cache is generated from the event object minus timestamps/count/transient fields; specifically, the following event fields are used to construct a unique key for an event:
      * `event.Source.Component`
      * `event.Source.Host`
@@ -71,7 +98,7 @@ Each binary that generates events:
      * `event.InvolvedObject.APIVersion`
      * `event.Reason`
      * `event.Message`
    * The LRU cache is capped at 4096 events for both `EventAggregator` and `EventLogger` (see the cache sketch after this list). That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache.
* When an event is generated, the previously generated events cache is checked (see [`pkg/client/unversioned/record/event.go`](http://releases.k8s.io/HEAD/pkg/client/unversioned/record/event.go)).
  * If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd:
    * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count.
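To tie the pieces together, here is a minimal, self-contained Go sketch of the correlation pipeline described above. The component names and the 10-events-in-10-minutes threshold come from this document; the concrete types, signatures, and key formats are illustrative assumptions, not the actual `pkg/client/record` implementation:

```go
package record

import (
	"strings"
	"time"
)

// Event is an illustrative stand-in for api.Event.
type Event struct {
	SourceComponent  string
	SourceHost       string
	ObjectKind       string
	ObjectNamespace  string
	ObjectName       string
	ObjectUID        string
	ObjectAPIVersion string
	Reason           string
	Message          string
	Count            int
}

// aggregateKey joins every identifying field except the message, so events
// that differ only by Message land in the same bucket; the message itself
// serves as the localKey inside that bucket.
func aggregateKey(e *Event) string {
	return strings.Join([]string{
		e.SourceComponent, e.SourceHost,
		e.ObjectKind, e.ObjectNamespace, e.ObjectName, e.ObjectUID, e.ObjectAPIVersion,
		e.Reason,
	}, "")
}

// dedupKey additionally includes the message, so only events that are
// identical (minus timestamps/count/transient fields) collapse together.
func dedupKey(e *Event) string {
	return aggregateKey(e) + e.Message
}

// aggregateRecord tracks the distinct messages seen for one aggregateKey.
type aggregateRecord struct {
	localKeys   map[string]bool
	windowStart time.Time
}

// EventAggregator collapses "similar" events: once enough distinct messages
// arrive for one aggregateKey within the window, it swaps in a grouped event.
type EventAggregator struct {
	buckets     map[string]*aggregateRecord
	maxEvents   int           // 10 per this document
	maxInterval time.Duration // 10 minutes per this document
}

func (a *EventAggregator) EventObserve(e *Event) *Event {
	now := time.Now()
	key := aggregateKey(e)
	rec, ok := a.buckets[key]
	if !ok || now.Sub(rec.windowStart) > a.maxInterval {
		rec = &aggregateRecord{localKeys: map[string]bool{}, windowStart: now}
		a.buckets[key] = rec
	}
	rec.localKeys[e.Message] = true
	if len(rec.localKeys) < a.maxEvents {
		return e // below the threshold: pass the event through unchanged
	}
	// Drop the input event; emit a replacement that differs only in message
	// and denotes that similar events matched on reason were grouped.
	agg := *e
	agg.Message = "(combined from similar events): " + e.Reason
	return &agg
}

// EventLogger deduplicates "same" events by counting prior observations.
type EventLogger struct {
	seen map[string]int // observation counts, keyed by dedupKey
}

// EventObserve returns the event with an updated count, plus true when the
// caller should PUT (update) an existing entry instead of creating a new one.
func (l *EventLogger) EventObserve(e *Event) (*Event, bool) {
	key := dedupKey(e)
	l.seen[key]++
	e.Count = l.seen[key]
	return e, e.Count > 1
}

// EventCorrelator lets each subcomponent visit and modify the event in turn.
type EventCorrelator struct {
	aggregator *EventAggregator
	logger     *EventLogger
}

func (c *EventCorrelator) EventCorrelate(e *Event) (*Event, bool) {
	return c.logger.EventObserve(c.aggregator.EventObserve(e))
}
```

In the real client, the boolean result would steer the choice between creating a new event entry and the PUT (update) path described in the last bullet above.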
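And a sketch of the 4096-entry cap, using the groupcache LRU package linked above; `Event` and `dedupKey` continue the hypothetical sketch just shown:

```go
package record

import "github.com/golang/groupcache/lru"

// Cap the history so a long-running component cannot grow it without bound;
// 4096 matches the limit described above.
const maxLRUCacheEntries = 4096

// newEventCache returns an LRU cache that evicts its least recently used
// entry once 4096 unique keys have been inserted.
func newEventCache() *lru.Cache {
	return lru.New(maxLRUCacheEntries)
}

// observe bumps (or initializes) the stored count for this event's key.
func observe(cache *lru.Cache, e *Event) int {
	key := dedupKey(e)
	count := 1
	if prev, ok := cache.Get(key); ok {
		count = prev.(int) + 1
	}
	cache.Add(key, count)
	return count
}
```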