doc: content updates to RT perf tuning doc

Signed-off-by: Deb Taylor <deb.taylor@intel.com>
Deb Taylor 2019-09-25 09:32:14 -04:00 committed by deb-intel
parent c8bcab9006
commit 7eb0dc1436


@@ -1,20 +1,20 @@
.. _rt_performance_tuning: .. _rt_performance_tuning:
Trace and Data Collection for ACRN Real-Time(RT) Performance Tuning Trace and Data Collection for ACRN Real-Time (RT) Performance Tuning
################################################################### ####################################################################
The document describes the methods to collect trace/data for ACRN RT VM real-time
performance analysis. Two parts are included:
- Method to use trace for the VM exits analysis; The document describes the methods to collect trace/data for ACRN RT VM
- Method to collect performance monitoring counts for tuning based on PMU. real-time performance analysis. Two parts are included:
- Method to use trace for the VM exits analysis.
- Method to collect performance monitoring counts for tuning based on Performance Monitoring Unit, or PMU.
VM exits analysis for ACRN RT performance VM exits analysis for ACRN RT performance
***************************************** *****************************************
VM exits in response to certain instructions and events are a key source of VM exits in response to certain instructions and events are a key source of
performance degradation in virtual machines. During the runtime of hard RTVM performance degradation in virtual machines. During the runtime of a hard
of ACRN, there are still some instructions and events which will impact the RTVM of ACRN, the following impacts real-time deterministic latency:
RT latency's determinism.
- CPUID - CPUID
- TSC_Adjust read/write - TSC_Adjust read/write
@@ -22,19 +22,18 @@ RT latency's determinism.
- APICID/LDR read - APICID/LDR read
- ICR write - ICR write
Generally, we don't want to see any VM exits occur during the critical section Generally, we don't want to see any VM exits occur during the critical section of the RT task.
of the RT task.
The methodology of VM exits analysis is very simple. Firstly, we should clearly The methodology of VM exits analysis is very simple. First, we clearly
identify the critical section of RT task. The critical section is the duration identify the **critical section** of the RT task. The critical section is
of time where we do not want to see any VM exits occur. Different RT tasks get the duration of time where we do not want to see any VM exits occur.
different critical section. So this article will take the cyclictest as an example Different RT tasks use different critical sections. This document uses
to elaborate how to do VM exits analysis. the cyclictest as an example of how to do VM exits analysis.
The critical sections The critical sections
===================== =====================
Here is example pseudocode of cyclictest implementation. Here is example pseudocode of a cyclictest implementation.
.. code-block:: none .. code-block:: none
@@ -47,25 +46,25 @@ Here is example pseudocode of cyclictest implementation.
next += interval next += interval
} }
Time point ``now`` is the actual point at which the cyclictest is wakeuped and Time point ``now`` is the actual point at which the cyclictest is wakeuped
scheduled. Time point ``next`` is the expected point at which we want the cyclictest and scheduled. Time point ``next`` is the expected point at which we want
to be woken up and scheduled. Here we can get the latency by ``now - next``. We don't the cyclictest to be awakened and scheduled. Here we can get the latency by
want to see VM exits during ``next`` through ``now``. So define the start point of ``now - next``. We don't want to see VM exits during ``next`` through ``now``. So, define the start point of critical section as ``next`` and end
critical section as ``next`` and end point ``now``. point ``now``.
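
As a rough illustration of the definition above, the following Python sketch models one cyclictest iteration as a critical section bounded by ``next`` and ``now`` and derives the latency from it. The ``CriticalSection`` class, its method names, and the TSC frequency passed by the caller are hypothetical and are not part of cyclictest or ACRN. The sketch also assumes the TSC ticks at a constant rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalSection:
    """One cyclictest iteration: expected wakeup (next) to actual wakeup (now)."""

    next_tsc: int  # TSC value at the expected wakeup point
    now_tsc: int   # TSC value at the actual wakeup point

    def latency_cycles(self) -> int:
        # Latency as defined in the text: now - next, in TSC cycles.
        return self.now_tsc - self.next_tsc

    def latency_us(self, tsc_hz: int) -> float:
        # Convert cycles to microseconds; tsc_hz is an assumed constant TSC rate.
        return self.latency_cycles() * 1_000_000 / tsc_hz

    def contains(self, tsc: int) -> bool:
        # True if an event stamped with this TSC value falls inside [next, now].
        return self.next_tsc <= tsc <= self.now_tsc
```

For example, ``CriticalSection(1000, 5000).latency_cycles()`` gives 4000 cycles, and ``contains()`` can be used to test whether a VM exit timestamp lands inside the window.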
Log and trace data collection Log and trace data collection
============================= =============================
#. Add timestamps (in TSC) at ``next`` and ``now``. #. Add timestamps (in TSC) at ``next`` and ``now``.
#. Capture the log with the above timestamps in RTVM. #. Capture the log with the above timestamps in the RTVM.
#. Capture the acrntrace log in Service VM at the same time. #. Capture the acrntrace log in the service VM at the same time.
Offline analysis Offline analysis
================ ================
#. Convert the raw trace data to human readable format. #. Convert the raw trace data to human readable format.
#. Merge the logs in RTVM and ACRN hypervisor trace based on timestamps (in TSC). #. Merge the logs in the RTVM and the ACRN hypervisor trace based on timestamps (in TSC).
#. Check if there is any VM exit within the critical sections, the pattern is as follows: #. Check to see if any VM exit exists within the critical sections. The pattern is as follows:
.. figure:: images/vm_exits_log.png .. figure:: images/vm_exits_log.png
:align: center :align: center
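
The offline analysis steps can be sketched in Python as follows. This is a minimal sketch, not the ACRN tooling: the function name and the tuple layouts for the RTVM log and the converted hypervisor trace are assumptions made for illustration, and both logs are assumed to carry timestamps from the same TSC base.

```python
from bisect import bisect_left, bisect_right


def exits_in_critical_sections(sections, vm_exits):
    """Map each critical section index to the VM exits that fall inside it.

    sections: list of (next_tsc, now_tsc) tuples taken from the RTVM log.
    vm_exits: list of (tsc, reason) tuples taken from the converted trace.
    Both layouts are hypothetical; adapt them to the real log formats.
    """
    # Sort the exits by timestamp so each window can be located by bisection.
    ordered = sorted(vm_exits, key=lambda entry: entry[0])
    stamps = [tsc for tsc, _ in ordered]

    hits = {}
    for index, (next_tsc, now_tsc) in enumerate(sections):
        first = bisect_left(stamps, next_tsc)   # first exit at or after next
        last = bisect_right(stamps, now_tsc)    # one past the last exit at or before now
        if first < last:
            hits[index] = ordered[first:last]
    return hits
```

An empty result means no VM exit was recorded inside any critical section; otherwise each key identifies the iteration to inspect and each value lists the exits to investigate.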
@@ -77,13 +76,14 @@ Performance monitoring counts collecting
Enable Performance Monitoring Unit (PMU) support in VM Enable Performance Monitoring Unit (PMU) support in VM
====================================================== ======================================================
By default, ACRN hypervisor doesn't expose the PMU related CPUID and MSRs to By default, the ACRN hypervisor doesn't expose the PMU-related CPUID and
guest VM. In order to use Performance Monitoring Counters (PMCs) in guest VM, MSRs to the guest VM. In order to use Performance Monitoring Counters (PMCs)
need to modify the ACRN hypervisor code to expose the capability to RTVM. in the guest VM, modify the ACRN hypervisor code in order to expose the
capability to the RTVM.
.. note:: Precise Event Based Sampling (PEBS) is not enabled in VM yet. Note that Precise Event Based Sampling (PEBS) is not yet enabled in the VM.
#. Expose CPUID leaf 0xA as below: #. Expose the CPUID leaf 0xA as below:
.. code-block:: none .. code-block:: none
@@ -99,7 +99,7 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
case 0x0fU: case 0x0fU:
case 0x10U: case 0x10U:
#. Expose PMU related MSRs to VM as below: #. Expose the PMU-related MSRs to the VM as below:
.. code-block:: none .. code-block:: none
@@ -148,39 +148,39 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
value64 = hva2hpa(vcpu->arch.msr_bitmap); value64 = hva2hpa(vcpu->arch.msr_bitmap);
exec_vmwrite64(VMX_MSR_BITMAP_FULL, value64); exec_vmwrite64(VMX_MSR_BITMAP_FULL, value64);
Use Perf/PMU tool in performance analysis Perf/PMU tools in performance analysis
========================================= ======================================
After exposing PMU related CPUID/MSRs to VM, the performance analysis tool such as After exposing PMU-related CPUID/MSRs to the VM, performance analysis tools
perf and pmu tool can be used inside VM to locate the bottleneck of the application. such as **perf** and **pmu** can be used inside the VM to locate
**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away CPU the bottleneck of the application.
hardware differences in Linux performance measurements and presents a simple command
line interface. Perf is based on the perf_events interface exported by recent versions **Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away
of the Linux kernel. CPU hardware differences in Linux performance measurements and presents a
**PMU** tools is a collection of tools for profile collection and performance analysis simple command line interface. Perf is based on the ``perf_events`` interface
on Intel CPUs on top of Linux Perf. You can refer to the following links for the usage exported by recent versions of the Linux kernel.
of Perf:
**PMU** tools is a collection of tools for profile collection and performance analysis on Intel CPUs on top of Linux Perf. Refer to the following links for perf usage:
- https://perf.wiki.kernel.org/index.php/Main_Page - https://perf.wiki.kernel.org/index.php/Main_Page
- https://perf.wiki.kernel.org/index.php/Tutorial - https://perf.wiki.kernel.org/index.php/Tutorial
You can refer to https://github.com/andikleen/pmu-tools for the usage of PMU tool. Refer to https://github.com/andikleen/pmu-tools for pmu usage.
Top-down Micro-architecture Analysis Method (TMAM) Top-down Micro-architecture Analysis Method (TMAM)
================================================== ==================================================
The Top-down Micro-architecture Analysis Method based on the Top-Down Characterization The Top-down Micro-architecture Analysis Method (TMAM), based on Top-Down
methodology aims to provide an insight into whether you have made wise choices with your Characterization methodology, aims to provide an insight into whether you
algorithms and data structures. See the Intel |reg| 64 and IA-32 `Architectures Optimization have made wise choices with your algorithms and data structures. See the
Reference Manual <http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_, Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual <http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
Appendix B.1 for more details on the Top-down Micro-architecture Analysis Method. Appendix B.1 for more details on TMAM. Refer to this `technical paper <https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_
You can refer to this `technical paper which adopts TMAM for systematic performance benchmarking and analysis
<https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_ of compute-native Network Function data planes that are executed on
which adopts TMAM for systematic performance benchmarking and analysis of compute-native Commercial-Off-The-Shelf (COTS) servers using available open-source
Network Function data planes executed on Commercial-Off-The-Shelf (COTS) servers using available measurement tools.
open-source measurement tools.
Example: Using Perf to analysis TMAM level 1 on CPU core 1. Example: Using Perf to analyze TMAM level 1 on CPU core 1
.. code-block:: console .. code-block:: console