doc: content updates to RT perf tuning doc

Signed-off-by: Deb Taylor <deb.taylor@intel.com>
2026-05-12 17:52:39 +00:00 · 2019-09-25 09:32:14 -04:00
parent c8bcab9006
commit 7eb0dc1436
1 changed files with 61 additions and 61 deletions
--- a/doc/tutorials/realtime_performance_tuning.rst
+++ b/doc/tutorials/realtime_performance_tuning.rst
@@ -1,20 +1,20 @@
 .. _rt_performance_tuning:

-Trace and Data Collection for ACRN Real-Time(RT) Performance Tuning
-###################################################################
-The document describes the methods to collect trace/data for ACRN RT VM real-time
-performance analysis. Two parts are included:
+Trace and Data Collection for ACRN Real-Time (RT) Performance Tuning
+####################################################################

- Method to use trace for the VM exits analysis;
- Method to collect performance monitoring counts for tuning based on PMU.
+The document describes the methods to collect trace/data for ACRN RT VM
+real-time performance analysis. Two parts are included:
+
+- Method to use trace for the VM exits analysis.
+- Method to collect performance monitoring counts for tuning based on the Performance Monitoring Unit (PMU).

 VM exits analysis for ACRN RT performance
 *****************************************

-VM exits in response to certain instructions and events are a key source of 
-performance degradation in virtual machines. During the runtime of hard RTVM 
-of ACRN, there are still some instructions and events which will impact the 
-RT latency's determinism.
+VM exits in response to certain instructions and events are a key source of
+performance degradation in virtual machines. During the runtime of a hard
+RTVM of ACRN, the following instructions and events impact real-time latency determinism:

  - CPUID
  - TSC_Adjust read/write
@@ -22,19 +22,18 @@ RT latency's determinism.
  - APICID/LDR read
  - ICR write

-Generally, we don't want to see any VM exits occur during the critical section 
-of the RT task.
+Generally, we don't want to see any VM exits occur during the critical section of the RT task.

-The methodology of VM exits analysis is very simple. Firstly, we should clearly 
-identify the critical section of RT task. The critical section is the duration 
-of time where we do not want to see any VM exits occur. Different RT tasks get 
-different critical section. So this article will take the cyclictest as an example
-to elaborate how to do VM exits analysis.
+The methodology of VM exits analysis is very simple. First, we clearly
+identify the **critical section** of the RT task. The critical section is
+the duration of time where we do not want to see any VM exits occur.
+Different RT tasks use different critical sections. This document uses
+the cyclictest as an example of how to do VM exits analysis.

 The critical sections
 =====================

-Here is example pseudocode of cyclictest implementation.
+Here is example pseudocode of a cyclictest implementation.

 .. code-block:: none

@@ -47,25 +46,25 @@ Here is example pseudocode of cyclictest implementation.
         next += interval
   }

-Time point ``now`` is the actual point at which the cyclictest is wakeuped and 
-scheduled. Time point ``next`` is the expected point at which we want the cyclictest 
-to be woken up and scheduled. Here we can get the latency by ``now - next``. We don't 
-want to see VM exits during ``next`` through ``now``. So define the start point of 
-critical section as ``next`` and end point ``now``.
+Time point ``now`` is the actual point at which the cyclictest is woken up
+and scheduled. Time point ``next`` is the expected point at which we want
+the cyclictest to be woken up and scheduled. The latency is therefore
+``now - next``. We don't want to see VM exits from ``next`` through ``now``, so we define the start point of the critical section as ``next`` and the end
+point as ``now``.

 Log and trace data collection
 =============================

 #. Add timestamps (in TSC) at ``next`` and ``now``.
-#. Capture the log with the above timestamps in RTVM.
-#. Capture the acrntrace log in Service VM at the same time.
+#. Capture the log with the above timestamps in the RTVM.
+#. Capture the acrntrace log in the Service VM at the same time.

 Offline analysis
 ================

 #. Convert the raw trace data to human readable format.
-#. Merge the logs in RTVM and ACRN hypervisor trace based on timestamps (in TSC).
-#. Check if there is any VM exit within the critical sections, the pattern is as follows:
+#. Merge the logs in the RTVM and the ACRN hypervisor trace based on timestamps (in TSC).
+#. Check whether any VM exit occurs within the critical sections. The pattern is as follows:

   .. figure:: images/vm_exits_log.png
      :align: center
@@ -77,14 +76,15 @@ Performance monitoring counts collecting
 Enable Performance Monitoring Unit (PMU) support in VM
 ======================================================

-By default, ACRN hypervisor doesn't expose the PMU related CPUID and MSRs to 
-guest VM. In order to use Performance Monitoring Counters (PMCs) in guest VM, 
-need to modify the ACRN hypervisor code to expose the capability to RTVM.
+By default, the ACRN hypervisor doesn't expose the PMU-related CPUID and
+MSRs to the guest VM. To use Performance Monitoring Counters (PMCs)
+in the guest VM, modify the ACRN hypervisor code to expose the
+capability to the RTVM.

-.. note:: Precise Event Based Sampling (PEBS) is not enabled in VM yet.
+Note that Precise Event Based Sampling (PEBS) is not yet enabled in the VM.
+
+#. Expose the CPUID leaf 0xA as below:

-#. Expose CPUID leaf 0xA as below:
-   
   .. code-block:: none

      --- a/hypervisor/arch/x86/guest/vcpuid.c
@@ -99,7 +99,7 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
      case 0x0fU:
      case 0x10U:

-#. Expose PMU related MSRs to VM as below:
+#. Expose the PMU-related MSRs to the VM as below:

   .. code-block:: none

@@ -108,7 +108,7 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
      @@ -337,6 +337,41 @@ void init_msr_emulation(struct acrn_vcpu *vcpu)
      /* don't need to intercept rdmsr for these MSRs */
      enable_msr_interception(msr_bitmap, MSR_IA32_TIME_STAMP_COUNTER, INTERCEPT_WRITE);
-      
+
      +
      + /* Passthru PMU related MSRs to guest */
      + enable_msr_interception(msr_bitmap, MSR_IA32_FIXED_CTR_CTL, INTERCEPT_DISABLE);
@@ -148,39 +148,39 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
      value64 = hva2hpa(vcpu->arch.msr_bitmap);
      exec_vmwrite64(VMX_MSR_BITMAP_FULL, value64);

-Use Perf/PMU tool in performance analysis
-=========================================
+Perf/PMU tools in performance analysis
+======================================

-After exposing PMU related CPUID/MSRs to VM, the performance analysis tool such as 
-perf and pmu tool can be used inside VM to locate the bottleneck of the application.
-**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away CPU 
-hardware differences in Linux performance measurements and presents a simple command 
-line interface. Perf is based on the perf_events interface exported by recent versions 
-of the Linux kernel.
-**PMU** tools is a collection of tools for profile collection and performance analysis 
-on Intel CPUs on top of Linux Perf. You can refer to the following links for the usage 
-of Perf:
+After exposing PMU-related CPUID/MSRs to the VM, performance analysis tools
+such as **perf** and **pmu-tools** can be used inside the VM to locate
+the bottleneck of the application.
+
+**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away
+CPU hardware differences in Linux performance measurements and presents a
+simple command line interface. Perf is based on the ``perf_events`` interface
+exported by recent versions of the Linux kernel.
+
+**pmu-tools** is a collection of tools for profile collection and performance analysis on Intel CPUs on top of Linux perf. Refer to the following links for perf usage:

  - https://perf.wiki.kernel.org/index.php/Main_Page
  - https://perf.wiki.kernel.org/index.php/Tutorial

-You can refer to https://github.com/andikleen/pmu-tools for the usage of PMU tool.
+Refer to https://github.com/andikleen/pmu-tools for pmu-tools usage.

 Top-down Micro-architecture Analysis Method (TMAM)
 ==================================================

-The Top-down Micro-architecture Analysis Method based on the Top-Down Characterization 
-methodology aims to provide an insight into whether you have made wise choices with your 
-algorithms and data structures. See the Intel |reg| 64 and IA-32 `Architectures Optimization 
-Reference Manual <http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
-Appendix B.1 for more details on the Top-down Micro-architecture Analysis Method.
-You can refer to this `technical paper
-<https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_
-which adopts TMAM for systematic performance benchmarking and analysis of compute-native 
-Network Function data planes executed on Commercial-Off-The-Shelf (COTS) servers using available
-open-source measurement tools.
+The Top-down Micro-architecture Analysis Method (TMAM), based on Top-Down
+Characterization methodology, aims to provide an insight into whether you
+have made wise choices with your algorithms and data structures. See the
+Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual <http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
+Appendix B.1 for more details on TMAM. Refer to this `technical paper <https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_
+which adopts TMAM for systematic performance benchmarking and analysis
+of compute-native Network Function data planes that are executed on
+Commercial-Off-The-Shelf (COTS) servers using available open-source
+measurement tools.

-Example: Using Perf to analysis TMAM level 1 on CPU core 1.
+Example: Using Perf to analyze TMAM level 1 on CPU core 1

   .. code-block:: console

@@ -188,11 +188,11 @@ Example: Using Perf to analysis TMAM level 1 on CPU core 1.
      10+0 records in
      10+0 records out
      5120 bytes (5.1 kB, 5.0 KiB) copied, 0.00336348 s, 1.5 MB/s
-      
+
      Performance counter stats for 'CPU(s) 1':
-      
+
              retiring bad speculation frontend bound backend bound
      S0-C1 1 10.6%               1.5%           3.9%         84.0%
-      
+
      0.006737123 seconds time elapsed
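
Reviewer sketch for the "Offline analysis" steps in this patch: once the RTVM log and the hypervisor trace have been converted to readable form, checking for VM exits inside critical sections amounts to an interval lookup on TSC values. The snippet below is a minimal illustration under stated assumptions, not the tutorial's actual tooling. It assumes both logs have already been parsed into plain Python tuples and that the guest and hypervisor timestamps share the same TSC base. The function name, the tuple layouts, and the sample values are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def find_exits_in_critical_sections(sections, vm_exits):
    """Return the critical sections that contain at least one VM exit.

    sections: list of (next_tsc, now_tsc) pairs taken from the RTVM log
              (assumed layout).
    vm_exits: list of (tsc, reason) pairs taken from the converted
              hypervisor trace (assumed layout).
    """
    exits = sorted(vm_exits)                 # order the exits by TSC
    exit_tscs = [tsc for tsc, _ in exits]    # keys for binary search
    offenders = []
    for start, end in sections:
        lo = bisect_left(exit_tscs, start)   # first exit at or after next
        hi = bisect_right(exit_tscs, end)    # one past the last exit at or before now
        if lo < hi:
            offenders.append((start, end, exits[lo:hi]))
    return offenders


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sections = [(1000, 1050), (2000, 2040), (3000, 3060)]
    vm_exits = [(900, "CPUID"), (2010, "ICR write"), (3500, "APICID/LDR read")]
    for start, end, hits in find_exits_in_critical_sections(sections, vm_exits):
        print(f"critical section [{start}, {end}] contains VM exits: {hits}")
```

Sorting the exits once and using binary search per section keeps the check cheap even for long traces. Any section that is reported points at a VM exit worth investigating.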