mirror of
				https://github.com/linuxkit/linuxkit.git
				synced 2025-11-04 03:29:06 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			412 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			Diff
		
	
	
	
	
	
			
		
		
	
	
			412 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			Diff
		
	
	
	
	
	
From d3c22a59774ab35444d4c5958df90ced8600758a Mon Sep 17 00:00:00 2001
 | 
						|
From: Tom Zanussi <tom.zanussi@linux.intel.com>
 | 
						|
Date: Mon, 15 Jan 2018 20:52:06 -0600
 | 
						|
Subject: [PATCH 109/450] tracing: Add inter-event hist trigger Documentation
 | 
						|
 | 
						|
Add background and details on inter-event hist triggers, including
 | 
						|
hist variables, synthetic events, and actions.
 | 
						|
 | 
						|
Link: http://lkml.kernel.org/r/b0414efb66535aa52aa7411f58c3d56724027fce.1516069914.git.tom.zanussi@linux.intel.com
 | 
						|
 | 
						|
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
 | 
						|
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
 | 
						|
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
 | 
						|
(cherry picked from commit 7d5f30af5e39e572f6984c1083fe79fd7dc34d04)
 | 
						|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
 | 
						|
---
 | 
						|
 Documentation/trace/histogram.txt | 381 ++++++++++++++++++++++++++++++
 | 
						|
 1 file changed, 381 insertions(+)
 | 
						|
 | 
						|
diff --git a/Documentation/trace/histogram.txt b/Documentation/trace/histogram.txt
 | 
						|
index 0aec2d8e166b..df08882d091c 100644
 | 
						|
--- a/Documentation/trace/histogram.txt
 | 
						|
+++ b/Documentation/trace/histogram.txt
 | 
						|
@@ -1603,3 +1603,384 @@
 | 
						|
         Hits: 489
 | 
						|
         Entries: 7
 | 
						|
         Dropped: 0
 | 
						|
+
 | 
						|
+
 | 
						|
+2.2 Inter-event hist triggers
 | 
						|
+-----------------------------
 | 
						|
+
 | 
						|
+Inter-event hist triggers are hist triggers that combine values from
 | 
						|
+one or more other events and create a histogram using that data.  Data
 | 
						|
+from an inter-event histogram can in turn become the source for
 | 
						|
+further combined histograms, thus providing a chain of related
 | 
						|
+histograms, which is important for some applications.
 | 
						|
+
 | 
						|
+The most important example of an inter-event quantity that can be used
 | 
						|
+in this manner is latency, which is simply a difference in timestamps
 | 
						|
+between two events.  Although latency is the most important
 | 
						|
+inter-event quantity, note that because the support is completely
 | 
						|
+general across the trace event subsystem, any event field can be used
 | 
						|
+in an inter-event quantity.
 | 
						|
+
 | 
						|
+An example of a histogram that combines data from other histograms
 | 
						|
+into a useful chain would be a 'wakeupswitch latency' histogram that
 | 
						|
+combines a 'wakeup latency' histogram and a 'switch latency'
 | 
						|
+histogram.
 | 
						|
+
 | 
						|
+Normally, a hist trigger specification consists of a (possibly
 | 
						|
+compound) key along with one or more numeric values, which are
 | 
						|
+continually updated sums associated with that key.  A histogram
 | 
						|
+specification in this case consists of individual key and value
 | 
						|
+specifications that refer to trace event fields associated with a
 | 
						|
+single event type.
 | 
						|
+
 | 
						|
+The inter-event hist trigger extension allows fields from multiple
 | 
						|
+events to be referenced and combined into a multi-event histogram
 | 
						|
+specification.  In support of this overall goal, a few enabling
 | 
						|
+features have been added to the hist trigger support:
 | 
						|
+
 | 
						|
+  - In order to compute an inter-event quantity, a value from one
 | 
						|
+    event needs to saved and then referenced from another event.  This
 | 
						|
+    requires the introduction of support for histogram 'variables'.
 | 
						|
+
 | 
						|
+  - The computation of inter-event quantities and their combination
 | 
						|
+    require some minimal amount of support for applying simple
 | 
						|
+    expressions to variables (+ and -).
 | 
						|
+
 | 
						|
+  - A histogram consisting of inter-event quantities isn't logically a
 | 
						|
+    histogram on either event (so having the 'hist' file for either
 | 
						|
+    event host the histogram output doesn't really make sense).  To
 | 
						|
+    address the idea that the histogram is associated with a
 | 
						|
+    combination of events, support is added allowing the creation of
 | 
						|
+    'synthetic' events that are events derived from other events.
 | 
						|
+    These synthetic events are full-fledged events just like any other
 | 
						|
+    and can be used as such, as for instance to create the
 | 
						|
+    'combination' histograms mentioned previously.
 | 
						|
+
 | 
						|
+  - A set of 'actions' can be associated with histogram entries -
 | 
						|
+    these can be used to generate the previously mentioned synthetic
 | 
						|
+    events, but can also be used for other purposes, such as for
 | 
						|
+    example saving context when a 'max' latency has been hit.
 | 
						|
+
 | 
						|
+  - Trace events don't have a 'timestamp' associated with them, but
 | 
						|
+    there is an implicit timestamp saved along with an event in the
 | 
						|
+    underlying ftrace ring buffer.  This timestamp is now exposed as a
 | 
						|
+    a synthetic field named 'common_timestamp' which can be used in
 | 
						|
+    histograms as if it were any other event field; it isn't an actual
 | 
						|
+    field in the trace format but rather is a synthesized value that
 | 
						|
+    nonetheless can be used as if it were an actual field.  By default
 | 
						|
+    it is in units of nanoseconds; appending '.usecs' to a
 | 
						|
+    common_timestamp field changes the units to microseconds.
 | 
						|
+
 | 
						|
+These features are decribed in more detail in the following sections.
 | 
						|
+
 | 
						|
+2.2.1 Histogram Variables
 | 
						|
+-------------------------
 | 
						|
+
 | 
						|
+Variables are simply named locations used for saving and retrieving
 | 
						|
+values between matching events.  A 'matching' event is defined as an
 | 
						|
+event that has a matching key - if a variable is saved for a histogram
 | 
						|
+entry corresponding to that key, any subsequent event with a matching
 | 
						|
+key can access that variable.
 | 
						|
+
 | 
						|
+A variable's value is normally available to any subsequent event until
 | 
						|
+it is set to something else by a subsequent event.  The one exception
 | 
						|
+to that rule is that any variable used in an expression is essentially
 | 
						|
+'read-once' - once it's used by an expression in a subsequent event,
 | 
						|
+it's reset to its 'unset' state, which means it can't be used again
 | 
						|
+unless it's set again.  This ensures not only that an event doesn't
 | 
						|
+use an uninitialized variable in a calculation, but that that variable
 | 
						|
+is used only once and not for any unrelated subsequent match.
 | 
						|
+
 | 
						|
+The basic syntax for saving a variable is to simply prefix a unique
 | 
						|
+variable name not corresponding to any keyword along with an '=' sign
 | 
						|
+to any event field.
 | 
						|
+
 | 
						|
+Either keys or values can be saved and retrieved in this way.  This
 | 
						|
+creates a variable named 'ts0' for a histogram entry with the key
 | 
						|
+'next_pid':
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ... >> \
 | 
						|
+	event/trigger
 | 
						|
+
 | 
						|
+The ts0 variable can be accessed by any subsequent event having the
 | 
						|
+same pid as 'next_pid'.
 | 
						|
+
 | 
						|
+Variable references are formed by prepending the variable name with
 | 
						|
+the '$' sign.  Thus for example, the ts0 variable above would be
 | 
						|
+referenced as '$ts0' in expressions.
 | 
						|
+
 | 
						|
+Because 'vals=' is used, the common_timestamp variable value above
 | 
						|
+will also be summed as a normal histogram value would (though for a
 | 
						|
+timestamp it makes little sense).
 | 
						|
+
 | 
						|
+The below shows that a key value can also be saved in the same way:
 | 
						|
+
 | 
						|
+  # echo 'hist:timer_pid=common_pid:key=timer_pid ...' >> event/trigger
 | 
						|
+
 | 
						|
+If a variable isn't a key variable or prefixed with 'vals=', the
 | 
						|
+associated event field will be saved in a variable but won't be summed
 | 
						|
+as a value:
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=next_pid:ts1=common_timestamp ... >> event/trigger
 | 
						|
+
 | 
						|
+Multiple variables can be assigned at the same time.  The below would
 | 
						|
+result in both ts0 and b being created as variables, with both
 | 
						|
+common_timestamp and field1 additionally being summed as values:
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=pid:vals=$ts0,$b:ts0=common_timestamp,b=field1 ... >> \
 | 
						|
+	event/trigger
 | 
						|
+
 | 
						|
+Note that variable assignments can appear either preceding or
 | 
						|
+following their use.  The command below behaves identically to the
 | 
						|
+command above:
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=pid:ts0=common_timestamp,b=field1:vals=$ts0,$b ... >> \
 | 
						|
+	event/trigger
 | 
						|
+
 | 
						|
+Any number of variables not bound to a 'vals=' prefix can also be
 | 
						|
+assigned by simply separating them with colons.  Below is the same
 | 
						|
+thing but without the values being summed in the histogram:
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=pid:ts0=common_timestamp:b=field1 ... >> event/trigger
 | 
						|
+
 | 
						|
+Variables set as above can be referenced and used in expressions on
 | 
						|
+another event.
 | 
						|
+
 | 
						|
+For example, here's how a latency can be calculated:
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=pid,prio:ts0=common_timestamp ... >> event1/trigger
 | 
						|
+  # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ... >> event2/trigger
 | 
						|
+
 | 
						|
+In the first line above, the event's timetamp is saved into the
 | 
						|
+variable ts0.  In the next line, ts0 is subtracted from the second
 | 
						|
+event's timestamp to produce the latency, which is then assigned into
 | 
						|
+yet another variable, 'wakeup_lat'.  The hist trigger below in turn
 | 
						|
+makes use of the wakeup_lat variable to compute a combined latency
 | 
						|
+using the same key and variable from yet another event:
 | 
						|
+
 | 
						|
+  # echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ... >> event3/trigger
 | 
						|
+
 | 
						|
+2.2.2 Synthetic Events
 | 
						|
+----------------------
 | 
						|
+
 | 
						|
+Synthetic events are user-defined events generated from hist trigger
 | 
						|
+variables or fields associated with one or more other events.  Their
 | 
						|
+purpose is to provide a mechanism for displaying data spanning
 | 
						|
+multiple events consistent with the existing and already familiar
 | 
						|
+usage for normal events.
 | 
						|
+
 | 
						|
+To define a synthetic event, the user writes a simple specification
 | 
						|
+consisting of the name of the new event along with one or more
 | 
						|
+variables and their types, which can be any valid field type,
 | 
						|
+separated by semicolons, to the tracing/synthetic_events file.
 | 
						|
+
 | 
						|
+For instance, the following creates a new event named 'wakeup_latency'
 | 
						|
+with 3 fields: lat, pid, and prio.  Each of those fields is simply a
 | 
						|
+variable reference to a variable on another event:
 | 
						|
+
 | 
						|
+  # echo 'wakeup_latency \
 | 
						|
+          u64 lat; \
 | 
						|
+          pid_t pid; \
 | 
						|
+	  int prio' >> \
 | 
						|
+	  /sys/kernel/debug/tracing/synthetic_events
 | 
						|
+
 | 
						|
+Reading the tracing/synthetic_events file lists all the currently
 | 
						|
+defined synthetic events, in this case the event defined above:
 | 
						|
+
 | 
						|
+  # cat /sys/kernel/debug/tracing/synthetic_events
 | 
						|
+    wakeup_latency u64 lat; pid_t pid; int prio
 | 
						|
+
 | 
						|
+An existing synthetic event definition can be removed by prepending
 | 
						|
+the command that defined it with a '!':
 | 
						|
+
 | 
						|
+  # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \
 | 
						|
+    /sys/kernel/debug/tracing/synthetic_events
 | 
						|
+
 | 
						|
+At this point, there isn't yet an actual 'wakeup_latency' event
 | 
						|
+instantiated in the event subsytem - for this to happen, a 'hist
 | 
						|
+trigger action' needs to be instantiated and bound to actual fields
 | 
						|
+and variables defined on other events (see Section 6.3.3 below).
 | 
						|
+
 | 
						|
+Once that is done, an event instance is created, and a histogram can
 | 
						|
+be defined using it:
 | 
						|
+
 | 
						|
+  # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
 | 
						|
+        /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
 | 
						|
+
 | 
						|
+The new event is created under the tracing/events/synthetic/ directory
 | 
						|
+and looks and behaves just like any other event:
 | 
						|
+
 | 
						|
+  # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
 | 
						|
+        enable  filter  format  hist  id  trigger
 | 
						|
+
 | 
						|
+Like any other event, once a histogram is enabled for the event, the
 | 
						|
+output can be displayed by reading the event's 'hist' file.
 | 
						|
+
 | 
						|
+2.2.3 Hist trigger 'actions'
 | 
						|
+----------------------------
 | 
						|
+
 | 
						|
+A hist trigger 'action' is a function that's executed whenever a
 | 
						|
+histogram entry is added or updated.
 | 
						|
+
 | 
						|
+The default 'action' if no special function is explicity specified is
 | 
						|
+as it always has been, to simply update the set of values associated
 | 
						|
+with an entry.  Some applications, however, may want to perform
 | 
						|
+additional actions at that point, such as generate another event, or
 | 
						|
+compare and save a maximum.
 | 
						|
+
 | 
						|
+The following additional actions are available.  To specify an action
 | 
						|
+for a given event, simply specify the action between colons in the
 | 
						|
+hist trigger specification.
 | 
						|
+
 | 
						|
+  - onmatch(matching.event).<synthetic_event_name>(param list)
 | 
						|
+
 | 
						|
+    The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
 | 
						|
+    trigger action is invoked whenever an event matches and the
 | 
						|
+    histogram entry would be added or updated.  It causes the named
 | 
						|
+    synthetic event to be generated with the values given in the
 | 
						|
+    'param list'.  The result is the generation of a synthetic event
 | 
						|
+    that consists of the values contained in those variables at the
 | 
						|
+    time the invoking event was hit.
 | 
						|
+
 | 
						|
+    The 'param list' consists of one or more parameters which may be
 | 
						|
+    either variables or fields defined on either the 'matching.event'
 | 
						|
+    or the target event.  The variables or fields specified in the
 | 
						|
+    param list may be either fully-qualified or unqualified.  If a
 | 
						|
+    variable is specified as unqualified, it must be unique between
 | 
						|
+    the two events.  A field name used as a param can be unqualified
 | 
						|
+    if it refers to the target event, but must be fully qualified if
 | 
						|
+    it refers to the matching event.  A fully-qualified name is of the
 | 
						|
+    form 'system.event_name.$var_name' or 'system.event_name.field'.
 | 
						|
+
 | 
						|
+    The 'matching.event' specification is simply the fully qualified
 | 
						|
+    event name of the event that matches the target event for the
 | 
						|
+    onmatch() functionality, in the form 'system.event_name'.
 | 
						|
+
 | 
						|
+    Finally, the number and type of variables/fields in the 'param
 | 
						|
+    list' must match the number and types of the fields in the
 | 
						|
+    synthetic event being generated.
 | 
						|
+
 | 
						|
+    As an example the below defines a simple synthetic event and uses
 | 
						|
+    a variable defined on the sched_wakeup_new event as a parameter
 | 
						|
+    when invoking the synthetic event.  Here we define the synthetic
 | 
						|
+    event:
 | 
						|
+
 | 
						|
+    # echo 'wakeup_new_test pid_t pid' >> \
 | 
						|
+           /sys/kernel/debug/tracing/synthetic_events
 | 
						|
+
 | 
						|
+    # cat /sys/kernel/debug/tracing/synthetic_events
 | 
						|
+          wakeup_new_test pid_t pid
 | 
						|
+
 | 
						|
+    The following hist trigger both defines the missing testpid
 | 
						|
+    variable and specifies an onmatch() action that generates a
 | 
						|
+    wakeup_new_test synthetic event whenever a sched_wakeup_new event
 | 
						|
+    occurs, which because of the 'if comm == "cyclictest"' filter only
 | 
						|
+    happens when the executable is cyclictest:
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
 | 
						|
+            wakeup_new_test($testpid) if comm=="cyclictest"' >> \
 | 
						|
+            /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
 | 
						|
+
 | 
						|
+    Creating and displaying a histogram based on those events is now
 | 
						|
+    just a matter of using the fields and new synthetic event in the
 | 
						|
+    tracing/events/synthetic directory, as usual:
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=pid:sort=pid' >> \
 | 
						|
+           /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
 | 
						|
+
 | 
						|
+    Running 'cyclictest' should cause wakeup_new events to generate
 | 
						|
+    wakeup_new_test synthetic events which should result in histogram
 | 
						|
+    output in the wakeup_new_test event's hist file:
 | 
						|
+
 | 
						|
+    # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist
 | 
						|
+
 | 
						|
+    A more typical usage would be to use two events to calculate a
 | 
						|
+    latency.  The following example uses a set of hist triggers to
 | 
						|
+    produce a 'wakeup_latency' histogram:
 | 
						|
+
 | 
						|
+    First, we define a 'wakeup_latency' synthetic event:
 | 
						|
+
 | 
						|
+    # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
 | 
						|
+            /sys/kernel/debug/tracing/synthetic_events
 | 
						|
+
 | 
						|
+    Next, we specify that whenever we see a sched_waking event for a
 | 
						|
+    cyclictest thread, save the timestamp in a 'ts0' variable:
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=common_timestamp.usecs \
 | 
						|
+            if comm=="cyclictest"' >> \
 | 
						|
+	    /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
 | 
						|
+
 | 
						|
+    Then, when the corresponding thread is actually scheduled onto the
 | 
						|
+    CPU by a sched_switch event, calculate the latency and use that
 | 
						|
+    along with another variable and an event field to generate a
 | 
						|
+    wakeup_latency synthetic event:
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:\
 | 
						|
+            onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
 | 
						|
+	            $saved_pid,next_prio) if next_comm=="cyclictest"' >> \
 | 
						|
+	    /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
 | 
						|
+
 | 
						|
+    We also need to create a histogram on the wakeup_latency synthetic
 | 
						|
+    event in order to aggregate the generated synthetic event data:
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
 | 
						|
+            /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
 | 
						|
+
 | 
						|
+    Finally, once we've run cyclictest to actually generate some
 | 
						|
+    events, we can see the output by looking at the wakeup_latency
 | 
						|
+    synthetic event's hist file:
 | 
						|
+
 | 
						|
+    # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist
 | 
						|
+
 | 
						|
+  - onmax(var).save(field,..	.)
 | 
						|
+
 | 
						|
+    The 'onmax(var).save(field,...)' hist trigger action is invoked
 | 
						|
+    whenever the value of 'var' associated with a histogram entry
 | 
						|
+    exceeds the current maximum contained in that variable.
 | 
						|
+
 | 
						|
+    The end result is that the trace event fields specified as the
 | 
						|
+    onmax.save() params will be saved if 'var' exceeds the current
 | 
						|
+    maximum for that hist trigger entry.  This allows context from the
 | 
						|
+    event that exhibited the new maximum to be saved for later
 | 
						|
+    reference.  When the histogram is displayed, additional fields
 | 
						|
+    displaying the saved values will be printed.
 | 
						|
+
 | 
						|
+    As an example the below defines a couple of hist triggers, one for
 | 
						|
+    sched_waking and another for sched_switch, keyed on pid.  Whenever
 | 
						|
+    a sched_waking occurs, the timestamp is saved in the entry
 | 
						|
+    corresponding to the current pid, and when the scheduler switches
 | 
						|
+    back to that pid, the timestamp difference is calculated.  If the
 | 
						|
+    resulting latency, stored in wakeup_lat, exceeds the current
 | 
						|
+    maximum latency, the values specified in the save() fields are
 | 
						|
+    recoreded:
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
 | 
						|
+            if comm=="cyclictest"' >> \
 | 
						|
+            /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
 | 
						|
+
 | 
						|
+    # echo 'hist:keys=next_pid:\
 | 
						|
+            wakeup_lat=common_timestamp.usecs-$ts0:\
 | 
						|
+            onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
 | 
						|
+            if next_comm=="cyclictest"' >> \
 | 
						|
+            /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
 | 
						|
+
 | 
						|
+    When the histogram is displayed, the max value and the saved
 | 
						|
+    values corresponding to the max are displayed following the rest
 | 
						|
+    of the fields:
 | 
						|
+
 | 
						|
+    # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
 | 
						|
+      { next_pid:       2255 } hitcount:        239
 | 
						|
+        common_timestamp-ts0:          0
 | 
						|
+        max:         27
 | 
						|
+	next_comm: cyclictest
 | 
						|
+        prev_pid:          0  prev_prio:        120  prev_comm: swapper/1
 | 
						|
+
 | 
						|
+      { next_pid:       2256 } hitcount:       2355
 | 
						|
+        common_timestamp-ts0: 0
 | 
						|
+        max:         49  next_comm: cyclictest
 | 
						|
+        prev_pid:          0  prev_prio:        120  prev_comm: swapper/0
 | 
						|
+
 | 
						|
+      Totals:
 | 
						|
+          Hits: 12970
 | 
						|
+          Entries: 2
 | 
						|
+          Dropped: 0
 | 
						|
-- 
 | 
						|
2.19.2
 | 
						|
 |