mirror of https://github.com/projectacrn/acrn-hypervisor.git (synced 2025-06-24 14:33:38 +00:00)

doc: content updates to RT perf tuning doc

Signed-off-by: Deb Taylor <deb.taylor@intel.com>

commit 7eb0dc1436
parent c8bcab9006
@@ -1,20 +1,20 @@
 .. _rt_performance_tuning:
 
-Trace and Data Collection for ACRN Real-Time(RT) Performance Tuning
-###################################################################
-The document describes the methods to collect trace/data for ACRN RT VM real-time
-performance analysis. Two parts are included:
+Trace and Data Collection for ACRN Real-Time (RT) Performance Tuning
+####################################################################
 
-- Method to use trace for the VM exits analysis;
-- Method to collect performance monitoring counts for tuning based on PMU.
+The document describes the methods to collect trace/data for ACRN RT VM
+real-time performance analysis. Two parts are included:
+
+- Method to use trace for the VM exits analysis.
+- Method to collect performance monitoring counts for tuning based on Performance Monitoring Unit, or PMU.
 
 VM exits analysis for ACRN RT performance
 *****************************************
 
 VM exits in response to certain instructions and events are a key source of
-performance degradation in virtual machines. During the runtime of hard RTVM
-of ACRN, there are still some instructions and events which will impact the
-RT latency's determinism.
+performance degradation in virtual machines. During the runtime of a hard
+RTVM of ACRN, the following impacts real-time deterministic latency:
 
 - CPUID
 - TSC_Adjust read/write
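CPUID is first in the list of instructions that still cause VM exits in a hard RTVM. The C sketch below is a rough illustration and is not part of this commit. It times CPUID against the TSC from inside a guest. It assumes GCC or Clang on x86 (``<cpuid.h>`` for ``__cpuid``, ``<x86intrin.h>`` for ``__rdtsc``). ``min_cpuid_cycles`` is a hypothetical helper, and on other architectures it returns -1.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __cpuid() macro (GCC/Clang) */
#include <x86intrin.h>  /* __rdtsc() */
#endif

/* Hypothetical helper: minimum TSC delta observed across `iterations`
 * executions of CPUID leaf 0, or -1 if unsupported or iterations <= 0. */
int64_t min_cpuid_cycles(int iterations)
{
#if defined(__x86_64__) || defined(__i386__)
    if (iterations <= 0)
        return -1;

    uint64_t best = UINT64_MAX;
    unsigned int a = 0, b = 0, c = 0, d = 0;

    for (int i = 0; i < iterations; i++) {
        uint64_t t0 = __rdtsc();
        /* CPUID is serializing, so the two TSC reads bracket it; in a VMX
         * guest it unconditionally causes a VM exit to the hypervisor. */
        __cpuid(0, a, b, c, d);
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    (void)a; (void)b; (void)c; (void)d;
    return (int64_t)best;
#else
    (void)iterations;
    return -1;
#endif
}
```

Comparing the minimum on bare metal with the minimum inside the RTVM gives a quick sense of what one exit costs. It does not show *where* exits happen. The acrntrace-based method in the document is still what tells you whether an exit lands inside a critical section.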
@@ -22,19 +22,18 @@ RT latency's determinism.
 - APICID/LDR read
 - ICR write
 
-Generally, we don't want to see any VM exits occur during the critical section
-of the RT task.
+Generally, we don't want to see any VM exits occur during the critical section of the RT task.
 
-The methodology of VM exits analysis is very simple. Firstly, we should clearly
-identify the critical section of RT task. The critical section is the duration
-of time where we do not want to see any VM exits occur. Different RT tasks get
-different critical section. So this article will take the cyclictest as an example
-to elaborate how to do VM exits analysis.
+The methodology of VM exits analysis is very simple. First, we clearly
+identify the **critical section** of the RT task. The critical section is
+the duration of time where we do not want to see any VM exits occur.
+Different RT tasks use different critical sections. This document uses
+the cyclictest as an example of how to do VM exits analysis.
 
 The critical sections
 =====================
 
-Here is example pseudocode of cyclictest implementation.
+Here is example pseudocode of a cyclictest implementation.
 
 .. code-block:: none
 
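The hunk above defines the cyclictest critical section and stops where the pseudocode begins. The C sketch below is a simplified cyclictest-like loop written for this page; it is not cyclictest's source. It sleeps until an absolute ``next`` with ``clock_nanosleep(TIMER_ABSTIME)``, reads ``now`` with ``clock_gettime``, and records ``now - next`` as the latency. It assumes POSIX clocks on Linux and uses ``CLOCK_MONOTONIC`` rather than TSC. ``measure_max_latency_ns`` and ``timespec_diff_ns`` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: a - b in nanoseconds. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
           + (a->tv_nsec - b->tv_nsec);
}

/* Advance ts by a positive interval_ns, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *ts, long interval_ns)
{
    ts->tv_nsec += interval_ns;
    while (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec++;
    }
}

/* Simplified cyclictest-like loop. Each cycle's critical section runs from
 * the expected wakeup `next` to the actual wakeup `now`.
 * Returns the worst-case latency (now - next) in ns, or -1 on error. */
int64_t measure_max_latency_ns(int loops, long interval_ns)
{
    struct timespec next, now;
    int64_t max_latency = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    timespec_add_ns(&next, interval_ns);

    for (int i = 0; i < loops; i++) {
        int rc;
        /* Sleep until the absolute time `next`; retry if interrupted. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t latency = timespec_diff_ns(&now, &next);  /* now - next */
        if (latency > max_latency)
            max_latency = latency;

        timespec_add_ns(&next, interval_ns);              /* next += interval */
    }
    return max_latency;
}
```

For the analysis described in the document, the same two points would be stamped with TSC values and logged for every cycle. The sketch instead folds them into a single maximum.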
@@ -47,25 +46,25 @@ Here is example pseudocode of cyclictest implementation.
        next += interval
    }
 
-Time point ``now`` is the actual point at which the cyclictest is wakeuped and
-scheduled. Time point ``next`` is the expected point at which we want the cyclictest
-to be woken up and scheduled. Here we can get the latency by ``now - next``. We don't
-want to see VM exits during ``next`` through ``now``. So define the start point of
-critical section as ``next`` and end point ``now``.
+Time point ``now`` is the actual point at which the cyclictest is wakeuped
+and scheduled. Time point ``next`` is the expected point at which we want
+the cyclictest to be awakened and scheduled. Here we can get the latency by
+``now - next``. We don't want to see VM exits during ``next`` through ``now``. So, define the start point of critical section as ``next`` and end
+point ``now``.
 
 Log and trace data collection
 =============================
 
 #. Add timestamps (in TSC) at ``next`` and ``now``.
-#. Capture the log with the above timestamps in RTVM.
-#. Capture the acrntrace log in Service VM at the same time.
+#. Capture the log with the above timestamps in the RTVM.
+#. Capture the acrntrace log in the service VM at the same time.
 
 Offline analysis
 ================
 
 #. Convert the raw trace data to human readable format.
-#. Merge the logs in RTVM and ACRN hypervisor trace based on timestamps (in TSC).
-#. Check if there is any VM exit within the critical sections, the pattern is as follows:
+#. Merge the logs in the RTVM and the ACRN hypervisor trace based on timestamps (in TSC).
+#. Check to see if any VM exit exists within the critical sections. The pattern is as follows:
 
 .. figure:: images/vm_exits_log.png
    :align: center
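The offline-analysis steps in the hunk above reduce to an interval check. Once the RTVM log and the converted acrntrace log share a TSC timeline, any VM exit whose TSC falls inside a ``[next, now]`` window is a hit. The C sketch below assumes that both logs are already parsed into arrays sorted by TSC and that critical sections do not overlap. The record layouts and ``count_exits_in_critical_sections`` are hypothetical and do not reflect the actual acrntrace output format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record types, already parsed from the two logs. */
struct crit_section {      /* from the RTVM log */
    uint64_t next_tsc;     /* expected wakeup: start of the critical section */
    uint64_t now_tsc;      /* actual wakeup: end of the critical section */
};

struct vm_exit {           /* from the converted acrntrace log */
    uint64_t tsc;
    uint32_t reason;       /* VM exit reason code */
};

/* Count exits whose TSC lies in [next_tsc, now_tsc] of any section.
 * Both arrays must be sorted by TSC and sections must not overlap, so a
 * single merge-style pass over both arrays is enough. */
size_t count_exits_in_critical_sections(const struct crit_section *secs, size_t nsecs,
                                        const struct vm_exit *exits, size_t nexits)
{
    size_t count = 0, s = 0;

    for (size_t e = 0; e < nexits; e++) {
        /* Skip sections that ended before this exit happened. */
        while (s < nsecs && secs[s].now_tsc < exits[e].tsc)
            s++;
        if (s == nsecs)
            break;                  /* no later sections left */
        if (exits[e].tsc >= secs[s].next_tsc)
            count++;                /* exit happened inside secs[s] */
    }
    return count;
}
```

Reporting the ``reason`` of each hit, rather than only counting, points to the exit source (CPUID, TSC_Adjust access, and so on) that needs to move out of the critical path.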
@@ -77,13 +76,14 @@ Performance monitoring counts collecting
 Enable Performance Monitoring Unit (PMU) support in VM
 ======================================================
 
-By default, ACRN hypervisor doesn't expose the PMU related CPUID and MSRs to
-guest VM. In order to use Performance Monitoring Counters (PMCs) in guest VM,
-need to modify the ACRN hypervisor code to expose the capability to RTVM.
+By default, the ACRN hypervisor doesn't expose the PMU-related CPUID and
+MSRs to the guest VM. In order to use Performance Monitoring Counters (PMCs)
+in the guest VM, modify the ACRN hypervisor code in order to expose the
+capability to the RTVM.
 
-.. note:: Precise Event Based Sampling (PEBS) is not enabled in VM yet.
+Note that Precise Event Based Sampling (PEBS) is not yet enabled in the VM.
 
-#. Expose CPUID leaf 0xA as below:
+#. Expose the CPUID leaf 0xA as below:
 
 .. code-block:: none
 
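After leaf 0xA is exposed, the guest can check what it actually sees. The C sketch below decodes the architectural performance monitoring leaf using the field layout from the Intel SDM:

- EAX[7:0]: version ID
- EAX[15:8]: general-purpose counters per logical processor
- EAX[23:16]: bit width of those counters
- EDX[4:0]: fixed-function counters (version 2 and later)
- EDX[12:5]: bit width of the fixed-function counters (version 2 and later)

``struct pmu_info``, ``decode_cpuid_0a``, and ``query_pmu_info`` are hypothetical names. The query path assumes GCC or Clang on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* __get_cpuid() (GCC/Clang) */
#endif

struct pmu_info {
    unsigned version;      /* EAX[7:0]   architectural PMU version ID */
    unsigned num_gp;       /* EAX[15:8]  general-purpose counters per logical CPU */
    unsigned gp_width;     /* EAX[23:16] bit width of general-purpose counters */
    unsigned num_fixed;    /* EDX[4:0]   fixed-function counters (version > 1) */
    unsigned fixed_width;  /* EDX[12:5]  bit width of fixed-function counters */
};

/* Decode raw CPUID.0AH register values. */
struct pmu_info decode_cpuid_0a(uint32_t eax, uint32_t edx)
{
    struct pmu_info p;
    p.version     = eax & 0xffu;
    p.num_gp      = (eax >> 8) & 0xffu;
    p.gp_width    = (eax >> 16) & 0xffu;
    p.num_fixed   = p.version > 1 ? (edx & 0x1fu) : 0;
    p.fixed_width = p.version > 1 ? ((edx >> 5) & 0xffu) : 0;
    return p;
}

/* Query leaf 0xA on this CPU. Returns false on non-x86, when the leaf is
 * not reported, or when the PMU version is 0 (not exposed to the guest). */
bool query_pmu_info(struct pmu_info *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x0a, &eax, &ebx, &ecx, &edx))
        return false;
    (void)ebx; (void)ecx;
    *out = decode_cpuid_0a(eax, edx);
    return out->version != 0;
#else
    (void)out;
    return false;
#endif
}
```

A version ID of 0, or a leaf that is not reported at all, suggests that the hypervisor is still hiding the PMU from the guest.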
@@ -99,7 +99,7 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
    case 0x0fU:
    case 0x10U:
 
-#. Expose PMU related MSRs to VM as below:
+#. Expose the PMU-related MSRs to the VM as below:
 
 .. code-block:: none
 
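The hunk above starts the step that exposes the PMU MSRs. The elided code ends by writing the bitmap's host-physical address to ``VMX_MSR_BITMAP_FULL``. For context, here is a self-contained C sketch of how a VMX MSR bitmap controls interception, following the 4-KByte layout in the Intel SDM:

- Byte offset 0: read bitmap for MSRs 0x00000000-0x00001FFF
- Byte offset 1024: read bitmap for MSRs 0xC0000000-0xC0001FFF
- Byte offsets 2048 and 3072: the matching write bitmaps

A set bit causes a VM exit. A clear bit lets the guest access the MSR directly. The helper names are hypothetical and are not ACRN's API. IA32_PMC0 (0xC1) and IA32_PERFEVTSEL0 (0x186) are used as examples.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE       4096u
#define MSR_IA32_PMC0         0x000000C1u
#define MSR_IA32_PERFEVTSEL0  0x00000186u

/* Byte offset of the read bitmap covering `msr`, or -1 if the MSR is
 * outside both ranges (accesses to such MSRs always cause a VM exit). */
static int read_bitmap_base(uint32_t msr)
{
    if (msr <= 0x1FFFu)
        return 0;                                  /* low MSRs, read */
    if (msr >= 0xC0000000u && msr <= 0xC0001FFFu)
        return 1024;                               /* high MSRs, read */
    return -1;
}

/* Start with every MSR access intercepted. */
void msr_bitmap_intercept_all(uint8_t bitmap[MSR_BITMAP_SIZE])
{
    memset(bitmap, 0xFF, MSR_BITMAP_SIZE);
}

/* Set or clear both the read and write intercept bits for one MSR.
 * Returns false for MSRs that the bitmap cannot cover. */
bool msr_set_intercept(uint8_t bitmap[MSR_BITMAP_SIZE], uint32_t msr, bool intercept)
{
    int base = read_bitmap_base(msr);
    if (base < 0)
        return false;

    uint32_t bit = msr & 0x1FFFu;                  /* index within the 8K-MSR range */
    uint8_t mask = (uint8_t)(1u << (bit & 7u));
    uint8_t *rd = &bitmap[base + bit / 8];
    uint8_t *wr = &bitmap[base + 2048 + bit / 8];  /* write bitmaps follow reads */

    if (intercept) {
        *rd |= mask;
        *wr |= mask;
    } else {
        *rd &= (uint8_t)~mask;
        *wr &= (uint8_t)~mask;
    }
    return true;
}

/* True if a guest read of `msr` would cause a VM exit. */
bool msr_read_intercepted(const uint8_t bitmap[MSR_BITMAP_SIZE], uint32_t msr)
{
    int base = read_bitmap_base(msr);
    if (base < 0)
        return true;
    uint32_t bit = msr & 0x1FFFu;
    return (bitmap[base + bit / 8] >> (bit & 7u)) & 1u;
}
```

Clearing these bits lets software in the guest program and read the counters without a VM exit on every access.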
@@ -148,39 +148,39 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM.
    value64 = hva2hpa(vcpu->arch.msr_bitmap);
    exec_vmwrite64(VMX_MSR_BITMAP_FULL, value64);
 
-Use Perf/PMU tool in performance analysis
-=========================================
+Perf/PMU tools in performance analysis
+======================================
 
-After exposing PMU related CPUID/MSRs to VM, the performance analysis tool such as
-perf and pmu tool can be used inside VM to locate the bottleneck of the application.
-**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away CPU
-hardware differences in Linux performance measurements and presents a simple command
-line interface. Perf is based on the perf_events interface exported by recent versions
-of the Linux kernel.
-**PMU** tools is a collection of tools for profile collection and performance analysis
-on Intel CPUs on top of Linux Perf. You can refer to the following links for the usage
-of Perf:
+After exposing PMU-related CPUID/MSRs to the VM, performance analysis tools
+such as **perf** and **pmu** can be used inside the VM to locate
+the bottleneck of the application.
+
+**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away
+CPU hardware differences in Linux performance measurements and presents a
+simple command line interface. Perf is based on the ``perf_events`` interface
+exported by recent versions of the Linux kernel.
+
+**PMU** tools is a collection of tools for profile collection and performance analysis on Intel CPUs on top of Linux Perf. Refer to the following links for perf usage:
 
 - https://perf.wiki.kernel.org/index.php/Main_Page
 - https://perf.wiki.kernel.org/index.php/Tutorial
 
-You can refer to https://github.com/andikleen/pmu-tools for the usage of PMU tool.
+Refer to https://github.com/andikleen/pmu-tools for pmu usage.
 
 Top-down Micro-architecture Analysis Method (TMAM)
 ==================================================
 
-The Top-down Micro-architecture Analysis Method based on the Top-Down Characterization
-methodology aims to provide an insight into whether you have made wise choices with your
-algorithms and data structures. See the Intel |reg| 64 and IA-32 `Architectures Optimization
-Reference Manual <http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
-Appendix B.1 for more details on the Top-down Micro-architecture Analysis Method.
-You can refer to this `technical paper
-<https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_
-which adopts TMAM for systematic performance benchmarking and analysis of compute-native
-Network Function data planes executed on Commercial-Off-The-Shelf (COTS) servers using available
-open-source measurement tools.
+The Top-down Micro-architecture Analysis Method (TMAM), based on Top-Down
+Characterization methodology, aims to provide an insight into whether you
+have made wise choices with your algorithms and data structures. See the
+Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual <http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
+Appendix B.1 for more details on TMAM. Refer to this `technical paper <https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf>`_
+which adopts TMAM for systematic performance benchmarking and analysis
+of compute-native Network Function data planes that are executed on
+Commercial-Off-The-Shelf (COTS) servers using available open-source
+measurement tools.
 
-Example: Using Perf to analysis TMAM level 1 on CPU core 1.
+Example: Using Perf to analyze TMAM level 1 on CPU core 1
 
 .. code-block:: console
 
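The perf command for the TMAM level-1 example falls outside this hunk. The C sketch below makes the level-1 breakdown concrete by applying the per-thread level-1 formulas, as commonly stated for 4-wide Intel Core microarchitectures, to raw event counts:

- SLOTS = 4 * CPU_CLK_UNHALTED.THREAD
- Frontend Bound = IDQ_UOPS_NOT_DELIVERED.CORE / SLOTS
- Bad Speculation = (UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES) / SLOTS
- Retiring = UOPS_RETIRED.RETIRE_SLOTS / SLOTS
- Backend Bound = the remainder

The event names, the pipeline width of 4, and the omission of SMT adjustments are assumptions that hold only for some Intel Core generations; check the TMAM appendix referenced above for your CPU. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Raw counts for one measurement interval (for example, collected with perf). */
struct tmam_counts {
    uint64_t clk_unhalted_thread;     /* CPU_CLK_UNHALTED.THREAD */
    uint64_t idq_uops_not_delivered;  /* IDQ_UOPS_NOT_DELIVERED.CORE */
    uint64_t uops_issued_any;         /* UOPS_ISSUED.ANY */
    uint64_t uops_retired_slots;      /* UOPS_RETIRED.RETIRE_SLOTS */
    uint64_t recovery_cycles;         /* INT_MISC.RECOVERY_CYCLES */
};

struct tmam_level1 {
    double frontend_bound;
    double bad_speculation;
    double retiring;
    double backend_bound;
};

#define TMAM_PIPELINE_WIDTH 4.0  /* assumed issue width (4-wide cores) */

/* Fill `out` with level-1 fractions of the issue slots.
 * Returns 0 on success, -1 if there are no cycles to normalize by. */
int tmam_level1(const struct tmam_counts *c, struct tmam_level1 *out)
{
    if (c->clk_unhalted_thread == 0)
        return -1;

    double slots = TMAM_PIPELINE_WIDTH * (double)c->clk_unhalted_thread;

    out->frontend_bound  = (double)c->idq_uops_not_delivered / slots;
    out->bad_speculation = ((double)c->uops_issued_any
                            - (double)c->uops_retired_slots
                            + TMAM_PIPELINE_WIDTH * (double)c->recovery_cycles) / slots;
    out->retiring        = (double)c->uops_retired_slots / slots;
    /* Backend Bound is whatever share of the slots is left over. */
    out->backend_bound   = 1.0 - (out->frontend_bound + out->bad_speculation
                                  + out->retiring);
    return 0;
}
```

In practice, ``perf stat --topdown`` and the ``toplev`` tool from pmu-tools compute these metrics for you. The sketch only shows the arithmetic behind level 1.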
|