diff --git a/doc/tutorials/realtime_performance_tuning.rst b/doc/tutorials/realtime_performance_tuning.rst index 0e7ffecb2..7b3bc85cc 100644 --- a/doc/tutorials/realtime_performance_tuning.rst +++ b/doc/tutorials/realtime_performance_tuning.rst @@ -1,20 +1,20 @@ .. _rt_performance_tuning: -Trace and Data Collection for ACRN Real-Time(RT) Performance Tuning -################################################################### -The document describes the methods to collect trace/data for ACRN RT VM real-time -performance analysis. Two parts are included: +Trace and Data Collection for ACRN Real-Time (RT) Performance Tuning +#################################################################### -- Method to use trace for the VM exits analysis; -- Method to collect performance monitoring counts for tuning based on PMU. +The document describes the methods to collect trace/data for ACRN RT VM +real-time performance analysis. Two parts are included: + +- Method to use trace for the VM exits analysis. +- Method to collect performance monitoring counts for tuning based on Performance Monitoring Unit, or PMU. VM exits analysis for ACRN RT performance ***************************************** -VM exits in response to certain instructions and events are a key source of -performance degradation in virtual machines. During the runtime of hard RTVM -of ACRN, there are still some instructions and events which will impact the -RT latency's determinism. +VM exits in response to certain instructions and events are a key source of +performance degradation in virtual machines. During the runtime of a hard +RTVM of ACRN, the following impacts real-time deterministic latency: - CPUID - TSC_Adjust read/write @@ -22,19 +22,18 @@ RT latency's determinism. - APICID/LDR read - ICR write -Generally, we don't want to see any VM exits occur during the critical section -of the RT task. +Generally, we don't want to see any VM exits occur during the critical section of the RT task. 
-The methodology of VM exits analysis is very simple. Firstly, we should clearly -identify the critical section of RT task. The critical section is the duration -of time where we do not want to see any VM exits occur. Different RT tasks get -different critical section. So this article will take the cyclictest as an example -to elaborate how to do VM exits analysis. +The methodology of VM exits analysis is very simple. First, we clearly +identify the **critical section** of the RT task. The critical section is +the duration of time where we do not want to see any VM exits occur. +Different RT tasks use different critical sections. This document uses +the cyclictest as an example of how to do VM exits analysis. The critical sections ===================== -Here is example pseudocode of cyclictest implementation. +Here is example pseudocode of a cyclictest implementation. .. code-block:: none @@ -47,25 +46,25 @@ Here is example pseudocode of cyclictest implementation. next += interval } -Time point ``now`` is the actual point at which the cyclictest is wakeuped and -scheduled. Time point ``next`` is the expected point at which we want the cyclictest -to be woken up and scheduled. Here we can get the latency by ``now - next``. We don't -want to see VM exits during ``next`` through ``now``. So define the start point of -critical section as ``next`` and end point ``now``. +Time point ``now`` is the actual point at which the cyclictest is woken up +and scheduled. Time point ``next`` is the expected point at which we want +the cyclictest to be awakened and scheduled. Here we can get the latency by +``now - next``. We don't want to see VM exits during ``next`` through ``now``. So, define the start point of the critical section as ``next`` and end +point ``now``. Log and trace data collection ============================= #. Add timestamps (in TSC) at ``next`` and ``now``. -#. Capture the log with the above timestamps in RTVM. -#. 
Capture the acrntrace log in Service VM at the same time. +#. Capture the log with the above timestamps in the RTVM. +#. Capture the acrntrace log in the service VM at the same time. Offline analysis ================ #. Convert the raw trace data to human readable format. -#. Merge the logs in RTVM and ACRN hypervisor trace based on timestamps (in TSC). -#. Check if there is any VM exit within the critical sections, the pattern is as follows: +#. Merge the logs in the RTVM and the ACRN hypervisor trace based on timestamps (in TSC). +#. Check to see if any VM exit exists within the critical sections. The pattern is as follows: .. figure:: images/vm_exits_log.png :align: center @@ -77,14 +76,15 @@ Performance monitoring counts collecting Enable Performance Monitoring Unit (PMU) support in VM ====================================================== -By default, ACRN hypervisor doesn't expose the PMU related CPUID and MSRs to -guest VM. In order to use Performance Monitoring Counters (PMCs) in guest VM, -need to modify the ACRN hypervisor code to expose the capability to RTVM. +By default, the ACRN hypervisor doesn't expose the PMU-related CPUID and +MSRs to the guest VM. In order to use Performance Monitoring Counters (PMCs) +in the guest VM, modify the ACRN hypervisor code in order to expose the +capability to the RTVM. -.. note:: Precise Event Based Sampling (PEBS) is not enabled in VM yet. +Note that Precise Event Based Sampling (PEBS) is not yet enabled in the VM. + +#. Expose the CPUID leaf 0xA as below: -#. Expose CPUID leaf 0xA as below: - .. code-block:: none --- a/hypervisor/arch/x86/guest/vcpuid.c @@ -99,7 +99,7 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM. case 0x0fU: case 0x10U: -#. Expose PMU related MSRs to VM as below: +#. Expose the PMU-related MSRs to the VM as below: .. code-block:: none @@ -108,7 +108,7 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM. 
@@ -337,6 +337,41 @@ void init_msr_emulation(struct acrn_vcpu *vcpu) /* don't need to intercept rdmsr for these MSRs */ enable_msr_interception(msr_bitmap, MSR_IA32_TIME_STAMP_COUNTER, INTERCEPT_WRITE); - + + + /* Passthru PMU related MSRs to guest */ + enable_msr_interception(msr_bitmap, MSR_IA32_FIXED_CTR_CTL, INTERCEPT_DISABLE); @@ -148,39 +148,39 @@ need to modify the ACRN hypervisor code to expose the capability to RTVM. value64 = hva2hpa(vcpu->arch.msr_bitmap); exec_vmwrite64(VMX_MSR_BITMAP_FULL, value64); -Use Perf/PMU tool in performance analysis -========================================= +Perf/PMU tools in performance analysis +====================================== -After exposing PMU related CPUID/MSRs to VM, the performance analysis tool such as -perf and pmu tool can be used inside VM to locate the bottleneck of the application. -**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away CPU -hardware differences in Linux performance measurements and presents a simple command -line interface. Perf is based on the perf_events interface exported by recent versions -of the Linux kernel. -**PMU** tools is a collection of tools for profile collection and performance analysis -on Intel CPUs on top of Linux Perf. You can refer to the following links for the usage -of Perf: +After exposing PMU-related CPUID/MSRs to the VM, performance analysis tools +such as **perf** and **pmu** can be used inside the VM to locate +the bottleneck of the application. + +**Perf** is a profiler tool for Linux 2.6+ based systems that abstracts away +CPU hardware differences in Linux performance measurements and presents a +simple command line interface. Perf is based on the ``perf_events`` interface +exported by recent versions of the Linux kernel. + +**PMU** tools is a collection of tools for profile collection and performance analysis on Intel CPUs on top of Linux Perf. 
Refer to the following links for perf usage: - https://perf.wiki.kernel.org/index.php/Main_Page - https://perf.wiki.kernel.org/index.php/Tutorial -You can refer to https://github.com/andikleen/pmu-tools for the usage of PMU tool. +Refer to https://github.com/andikleen/pmu-tools for pmu usage. Top-down Micro-architecture Analysis Method (TMAM) ================================================== -The Top-down Micro-architecture Analysis Method based on the Top-Down Characterization -methodology aims to provide an insight into whether you have made wise choices with your -algorithms and data structures. See the Intel |reg| 64 and IA-32 `Architectures Optimization -Reference Manual `_, -Appendix B.1 for more details on the Top-down Micro-architecture Analysis Method. -You can refer to this `technical paper -`_ -which adopts TMAM for systematic performance benchmarking and analysis of compute-native -Network Function data planes executed on Commercial-Off-The-Shelf (COTS) servers using available -open-source measurement tools. +The Top-down Micro-architecture Analysis Method (TMAM), based on Top-Down +Characterization methodology, aims to provide an insight into whether you +have made wise choices with your algorithms and data structures. See the +Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual `_, +Appendix B.1 for more details on TMAM. Refer to this `technical paper `_ +which adopts TMAM for systematic performance benchmarking and analysis +of compute-native Network Function data planes that are executed on +Commercial-Off-The-Shelf (COTS) servers using available open-source +measurement tools. -Example: Using Perf to analysis TMAM level 1 on CPU core 1. +Example: Using Perf to analyze TMAM level 1 on CPU core 1 .. code-block:: console @@ -188,11 +188,11 @@ Example: Using Perf to analysis TMAM level 1 on CPU core 1. 
10+0 records in 10+0 records out 5120 bytes (5.1 kB, 5.0 KiB) copied, 0.00336348 s, 1.5 MB/s - + Performance counter stats for 'CPU(s) 1': - + retiring bad speculation frontend bound backend bound S0-C1 1 10.6% 1.5% 3.9% 84.0% - + 0.006737123 seconds time elapsed
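The level 1 breakdown in the output above can also be read programmatically. The following is a minimal sketch, not part of the ACRN tooling: it assumes the per-core row has the layout shown in this example (a core label followed by four percentages in the order retiring, bad speculation, frontend bound, backend bound), and the function names ``parse_topdown_row`` and ``dominant_category`` are hypothetical. The column layout may differ between perf versions, so treat the parsing as an assumption to verify against your own output.

```python
import re

# TMAM level 1 categories, in the column order of the example output above.
# This ordering is an assumption taken from that one sample.
CATEGORIES = ("retiring", "bad speculation", "frontend bound", "backend bound")


def parse_topdown_row(row):
    """Return {category: percent} for one per-core level 1 row."""
    # Only tokens that end in '%' are treated as measurements, so the
    # core label and thread count at the start of the row are ignored.
    percents = [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)%", row)]
    if len(percents) != len(CATEGORIES):
        raise ValueError(
            f"expected {len(CATEGORIES)} percentages, got {len(percents)}"
        )
    return dict(zip(CATEGORIES, percents))


def dominant_category(breakdown):
    """Return the category that accounts for the largest share of slots."""
    return max(breakdown, key=breakdown.get)


if __name__ == "__main__":
    # Row copied from the example output above.
    row = "S0-C1 1 10.6% 1.5% 3.9% 84.0%"
    breakdown = parse_topdown_row(row)
    for name, pct in breakdown.items():
        print(f"{name:16s} {pct:5.1f}%")
    print("dominant:", dominant_category(breakdown))
```

For the sample row, the largest share is backend bound, which suggests that the workload on core 1 is mostly stalled waiting on back-end resources such as memory or execution units, so level 2 analysis would start there.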