doc: edits for rtvm_performance_tips doc

Fixed windows line endings, improved tip formatting, additional grammar
and content simplification edits.

Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
David B. Kinder 2020-04-20 08:45:45 -07:00 committed by deb-intel
parent f7594f0a93
commit 714b3a35d6

@ -1,213 +1,196 @@
.. _rt_perf_tips_rtvm: .. _rt_perf_tips_rtvm:
ACRN Real-Time VM Performance Tips ACRN Real-Time VM Performance Tips
################################## ##################################
Background Background
********** **********
The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM. In The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM.
order to achieve bare metal-like RT performance, a set of constraints and This document shows how you can configure RTVMs to potentially achieve
technologies are applied to the RTVM compared to the ACRN standard VM. With near bare-metal performance by configuring certain key technologies and
these additional constraints and technologies, RT tasks can run on the RTVM eliminating use of a VM-exit within RT tasks, thereby avoiding this
without a VM-exit, which is a key virtualization overhead issue. common virtualization overhead issue.
In addition to the VM-exit, interference from neighbor VMs, such as Service Neighbor VMs such as Service VMs, Human-Machine-Interface (HMI) VMs, or
VMs, Human-Machine-Interface (HMI) VMs, or other RT VMs may affect the other real-time VMs, may negatively affect the execution of real-time
execution of real-time tasks on a certain RTVM. Other technologies are tasks on an RTVM. This document also shows technologies used to isolate
applied to isolate noise from the neighbor VMs. potential runtime noise from neighbor VMs.
Here is the list of key technologies applied to enable the bare metal-like Here are some key technologies that can significantly improve
RT performance: RTVM performance:
- LAPIC passthrough with core partitioning. - LAPIC passthrough with core partitioning.
- PCIe Device Passthrough: Only MSI interrupt-capable PCI devices will be - PCIe Device Passthrough: Only MSI interrupt-capable PCI devices are
supported for the RTVM. supported for the RTVM.
- Enable CAT (Cache Allocation Technology)-based cache isolation: RTVM uses - Enable CAT (Cache Allocation Technology)-based cache isolation: RTVM uses
a dedicated CLOS (Class of Service). While others may share CLOS, the GPU a dedicated CLOS (Class of Service). While others may share CLOS, the GPU
uses a CLOS that will not overlap with the RTVM CLOS. uses a CLOS that will not overlap with the RTVM CLOS.
- PMD virtio: Both virtio BE and FE work in polling mode so that the - PMD virtio: Both virtio BE and FE work in polling mode so
interrupts or notification between the Service VM and RTVM are not needed. interrupts and notification between the Service VM and RTVM are not needed.
The RTVM guest memory is hidden from the Service VM except for the virtio All RTVM guest memory is hidden from the Service VM except for the virtio
queue memory which is all that the Service VM can access. queue memory.
This document list tips that are summarized from issues encountered and This document summarizes tips from issues encountered and
resolved during real-time development and performance tuning. resolved during real-time development and performance tuning.
Mandatory options for an RTVM Mandatory options for an RTVM
***************************** *****************************
An RTVM is a post-launched VM with LAPIC passthrough. To launch an ACRN An RTVM is a post-launched VM with LAPIC passthrough. Pay attention to
RTVM, take note of the following options: these options when you launch an ACRN RTVM:
**Tip 1:** Apply the acrn-dm option "--lapic_pt" and make the guest RTVM Tip: Apply the acrn-dm option ``--lapic_pt``
operate under the LAPIC X2APIC mode to enable the LAPIC passthrough. The LAPIC passthrough feature of ACRN is configured via the
``--lapic_pt`` option, but the feature is actually enabled when LAPIC is
The LAPIC passthrough feature of ACRN is configured via the "--lapic_pt" switched to X2APIC mode. Both conditions should be met to enable an
option, but the feature is actually enabled when LAPIC is switched to X2APIC RTVM. The ``--rtvm`` option will be automatically attached once
mode. So, both conditions should be met to enable an RTVM. The "--rtvm" ``--lapic_pt`` is applied.
option will be automatically attached once "--lapic_pt" is applied.
Tip: Use virtio polling mode
**Tip 2:** If necessary, use virtio polling mode to prevent the frontend of Polling mode prevents the frontend of the VM-exit from sending a
the VM-exit from sending a notification to the backend. notification to the backend. We recommend that you passthrough a
physical peripheral device (such as block or an ethernet device), to an
We recommend that you passthrough a physical peripheral device to an RTVM, RTVM. If no physical device is available, ACRN supports virtio devices
such as block or an ethernet device. If no physical device is available, and enables polling mode to avoid a VM-exit at the frontend. Enable
ACRN supports virtio devices and enables the polling mode to avoid a VM-exit virtio polling mode via the option ``--virtio_poll [polling interval]``.
at the frontend. Virtio polling mode can be enabled via the option
"--virtio_poll [polling interval]". Avoid VM-exit latency
*********************
Avoid VM-exit latency
********************* VM-exit has a significant negative impact on virtualization performance.
A single VM-exit causes a several micro-second or longer latency,
VM-exit has a significant negative impact on virtualization performance. depending on what's done in VMX-root mode. VM-exit is classified into two
A single VM-exit can cause several micro-second latencies, or even longer, types: triggered by external CPU events or triggered by operations initiated
depending on what's done in VMX-root mode. VM-exit is classified into two by the vCPU.
types: triggered by external CPU events or triggered by operations initiated
by the vCPU. ACRN eliminates almost all VM-exits triggered by external events by
using LAPIC passthrough. A few exceptions exist:
ACRN eliminates almost all VM-exits triggered by external events via the
LAPIC passthrough. A few exceptions exist: - SMI - This brings the processor into the SMM, causing a much longer
performance impact. The SMI should be handled in the BIOS.
- SMI - it will bring the processor into the SMM, causing a much longer
performance impact. The SMI should be handled in the BIOS. - NMI - ACRN uses NMI for system-level notification.
- NMI - ACRN uses NMI for system-level notification. You should avoid VM-exits triggered by operations initiated by the
vCPU. Refer to the `Intel Software Developer Manuals (SDM)
Users should take care of VM-exits that are triggered by operations <https://software.intel.com/en-us/articles/intel-sdm>`_ "Instructions
initiated by the vCPU. Refer to the Intel SDM: "Instructions Cause VM-exits Cause VM-exits Unconditionally" (SDM V3, 25.1.2) and "Instructions That
Unconditionally" (SDM V3, 25.1.2) and "Instructions That Cause VM-exits Cause VM-exits Conditionally" (SDM V3, 25.1.3).
Conditionally" (SDM V3, 25.1.3).
Tip: Do not use CPUID in a real-time critical section.
**Tip 3:** Do not use CPUID in the RT critical section. The CPUID instruction causes VM-exits unconditionally. You should
detect CPU capability **before** entering a RT-critical section.
CPUID is an instruction that causes VM-exits unconditionally. As to the CPUID can be executed at any privilege level to serialize instruction
normal usage of CPUID, this can be avoided by detecting the CPU capability execution and its high efficiency of execution. It's commonly used as a
before entering the RT critical section. CPUID can be executed at any serializing instruction in an application by using CPUID
privilege level to serialize instruction execution and its high efficiency immediately before and after RDTSC. Remove use of CPUID in this case by
of execution. It's commonly used as a serializing instruction in an using RDTSCP instead of RDTSC. RDTSCP waits until all previous
application, and a typical case is using CPUID immediately before and after instructions have been executed before reading the counter, and the
RDTSC. In order to remove CPUID in this case, use RDTSCP instead of RDTSC. subsequent instructions after the RDTSCP normally have data dependency
Because RDTSCP waits until all previous instructions have been executed on it, so they must wait until the RDTSCP has been executed.
before reading the counter, and the subsequent instructions after the RDTSCP
normally have data dependency on it, they must wait until the RDTSCP has RDMSR or WRMSR are instructions that cause VM-exits conditionally. On the
been executed. ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a
VM-exit. But there are exceptions for security consideration:
RDMSR or WRMSR are instructions that cause VM-exits conditionally. On the
ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a 1) read from APICID and LDR;
VM-exit. But there are exceptions for security consideration: 1) read from 2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
APICID and LDR; 2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero; otherwise, read and write to TSC_ADJUST and TSC_DEADLINE;
otherwise, read and write to TSC_ADJUST and TSC_DEADLINE; 3) write to ICR. 3) write to ICR.
**Tip 4:** Do not use RDMSR to access APICID and LDR at the RT critical Tip: Do not use RDMSR to access APICID and LDR in an RT critical section.
section. ACRN does not present a physical APICID to a guest, so APICID
and LDR are virtualized even though LAPIC is passthrough. As a result,
ACRN does not intend to present a physical APICID to a guest so that APICID access to APICID and LDR can cause a VM-exit.
and LDR are virtualized even though LAPIC is passthrough. As a result,
access to APICID and LDR can cause a VM-exit. Tip: Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not access TSC_ADJUST and TSC_DEADLINE in the RT critical section.
ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
**Tip 5:** Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not pTSC_ADJUST. If VMX_TSC_OFFSET_FULL is zero, intercepting
access TSC_ADJUST and TSC_DEADLINE in the RT critical section. TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they should be
intercepted to guarantee functionality.
ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
pTSC_ADJUST; therefore, if VMX_TSC_OFFSET_FULL is zero, intercepting Tip: Utilize Preempt-RT Linux mechanisms to reduce the access of ICR from the RT core.
TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they should be #. Add ``domain`` to ``isolcpus`` ( ``isolcpus=nohz,domain,1`` ) to the kernel parameters.
intercepted to guarantee functionality. #. Add ``idle=poll`` to the kernel parameters.
#. Add ``rcu_nocb_poll`` along with ``rcu_nocbs=1`` to the kernel parameters.
**Tip 6:** Utilize Preempt-RT Linux mechanisms to reduce the access of ICR #. Disable the logging service like journald, syslogd if possible.
from the RT core:
The parameters shown above are recommended for the guest Preempt-RT
#. Add "domain" to the "isolcpus" ( “isolcpus=nohz,domain,1” ) to the kernel parameters. Linux. For an UP RTVM, ICR interception is not a problem. But for an SMP
#. Add "idle=poll" to the kernel parameters. RTVM, IPI may be needed between vCPUs. These tips are about reducing ICR
#. Add "rcu_nocb_poll" along with "rcu_nocbs=1" to the kernel parameters. access. The example above assumes it is a dual-core RTVM, while core 0
#. Disable the logging service like journald, syslogd if possible. is a housekeeping core and core 1 is a real-time core. The ``domain``
flag makes strong isolation of the RT core from the general SMP
These parameters are recommended for the guest Preempt-RT Linux. For a UP balancing and scheduling algorithms. The parameters ``idle=poll`` and
RTVM, ICR interception is not a problem. But for an SMP RTVM, IPI may be ``rcu_nocb_poll`` could prevent the RT core from sending reschedule IPI
needed between vCPUs; these tips are about to reduce the ICR access. The to wakeup tasks on core 0 in most cases. The logging service is disabled
example above assumes it is a dual-core RTVM, while core 0 is a housekeeping because an IPI may be issued to the housekeeping core to notify the
core and core 1 is a real-time core. The "domain" flag makes strong logging service when there are kernel messages output on the RT core.
isolation of the RT core from the general SMP balancing and scheduling
algorithms. "idle=poll" and "rcu_nocb_poll" could prevent the RT core from .. note::
sending reschedule IPI to wakeup tasks on core 0 in most cases. And the If an ICR access is inevitable within the RT critical section, be
disabling of the logging service is because an IPI may be issued to the aware of the extra 3~4 microsecond latency for each access.
housekeeping core to notify the logging service when there are kernel
messages output on the RT core. Tip: Create and initialize the RT tasks at the beginning to avoid runtime access to control registers.
Accessing Control Registers is another cause of a VM-exit. An ACRN access
.. note:: to CR3 and CR8 does not cause a VM-exit. However, writes to CR0 and CR4 may cause a
If an ICR access is inevitable within the RT critical section, please be VM-exit, which would happen at the spawning and initialization of a new task.
aware of the extra 3~4 us latency from each access.
Isolating the impact of neighbor VMs
**TIP 7:** Create and initialize the RT tasks at the beginning to avoid ************************************
runtime access to control registers.
ACRN makes use of several technologies and hardware features to avoid
The access to Control Registers is another cause of a VM-exit. An ACRN access performance impact on the RTVM by neighbor VMs:
to CR3 and CR8 do not cause a VM-exit, but writes to CR0 and CR4 may cause a
VM-exit, which would happen at the spawning and initialization of a new task. Tip: Do not share CPUs allocated to the RTVM with other RT or non-RT VMs.
ACRN enables CPU sharing to improve the utilization of CPU resources.
Isolating the impact of neighbor VMs However, for an RT VM, CPUs should be dedicatedly allocated for determinism.
************************************
Tip: Use RDT such as CAT and MBA to allocate dedicated resources to the RTVM.
ACRN makes use of several technologies and hardware features to avoid the ACRN enables Intel® Resource Director Technology such as CAT, and MBA
impact to the RTVM from neighbor VMs: components such as the GPU via the memory hierarchy. The availability of RDT is
hardware-specific. Refer to the :ref:`rdt_configuration`.
**TIP 8:** Do not share CPUs allocated to the RTVM with other RT/non-RT VMs.
Tip: Lock the GPU to a feasible lowest frequency.
ACRN enables CPU sharing to improve the utilization of CPU resources. A GPU can put a heavy load on the power/memory subsystem. Locking
However, for RT VM, CPUs should be dedicatedly allocated for the determinism. the GPU frequency as low as possible can help improve RT performance
determinism. GPU frequency can usually be locked in the BIOS, but such
**TIP 9:** Use RDT such as CAT and MBA to allocate dedicated resources to BIOS support is platform-specific.
the RTVM.
Miscellaneous
ACRN enables the Intel® Resource Director Technology, such as CAT and MBA, *************
components such as the GPU via memory hierarchy. The availability of RDT is
hardware-specific. Refer to the :ref:`rdt_configuration`. Tip: Disable timer migration on Preempt-RT Linux.
Because most tasks are set affinitive to the housekeeping core, the timer
**TIP 10:** Lock the GPU to a feasible lowest frequency. armed by RT tasks might be migrated to the nearest busy CPU for power
saving. But it will hurt RT determinism because the timer interrupts raised
GPU can put heavy pressure on the power/memory subsystem, so locking the GPU on the housekeeping core need to be resent to the RT core. The timer
frequency as low as possible can help to improve the determinism of RT migration can be disabled by the command::
performance. It can be locked in the BIOS, but the availability of certain
BIOS option is platform-specific. echo 0 > /proc/sys/kernel/timer_migration
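
The timer migration setting can be checked before and after it is changed.
The following is a minimal sketch, not part of the original tip: it assumes
the setting is exposed at ``/proc/sys/kernel/timer_migration`` (its location
on mainline Linux), and the helper name ``timer_migration_state`` is
hypothetical.

.. code-block:: bash

   #!/bin/sh
   # Assumed mainline Linux location of the timer migration setting.
   TIMER_MIGRATION=/proc/sys/kernel/timer_migration

   # Translate a raw value of the setting into a readable state.
   # (Hypothetical helper, introduced only for this sketch.)
   timer_migration_state() {
       case "$1" in
           0) echo "disabled" ;;
           1) echo "enabled" ;;
           *) echo "unknown" ;;
       esac
   }

   # Report the current state only when the file is readable on this kernel.
   if [ -r "$TIMER_MIGRATION" ]; then
       echo "timer migration: $(timer_migration_state "$(cat "$TIMER_MIGRATION")")"
   fi

Writing to the file requires root privileges, and the value does not persist
across reboots unless it is also set at boot time.
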
Miscellaneous Tip: Add ``mce=off`` to RT VM kernel parameters.
************* This parameter disables the mce periodic timer and avoids a VM-exit.
**TIP 11:** Disable timer migration on Preempt-RT Linux. Tip: Disable the Intel processor C-State and P-State of the RTVM.
Power management of a processor could save power, but it could also impact
Because most tasks are set affinitive to the housekeeping core, the timer the RT performance because the power state is changing. C-State and P-State
armed by RT tasks might be migrated to the nearest busy CPU for power PM mechanism can be disabled by adding ``processor.max_cstate=0
saving. But it will hurt the determinism because the timer interrupts raised intel_idle.max_cstate=0 intel_pstate=disable`` to the kernel parameters.
on the housekeeping core need to be resent to the RT core. The timer
migration could be disabled by cmd: "echo 0 > /proc/sys/kernel/timer_migration" Tip: Exercise caution when setting ``/proc/sys/kernel/sched_rt_runtime_us``.
Setting ``/proc/sys/kernel/sched_rt_runtime_us`` to ``-1`` can be a
**TIP 12:** Add "mce=off" to RT VM kernel parameters. problem. A value of ``-1`` allows RT tasks to monopolize a CPU, so that
a mechanism such as ``nohz`` might get no chance to work, which can hurt
"mce=off" can disable the mce periodic timer in order to void a VM-exit. the RT performance or even (potentially) lock up a system.
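
Before changing ``sched_rt_runtime_us``, it can help to see what the guest is
currently using. The sketch below is illustrative only: the helper name
``rt_throttling_state`` is hypothetical, and it assumes the companion file
``/proc/sys/kernel/sched_rt_period_us`` is present, as it is on mainline
Linux.

.. code-block:: bash

   #!/bin/sh
   # Classify an RT throttling configuration.
   # $1: value of sched_rt_runtime_us, $2: value of sched_rt_period_us
   # (Hypothetical helper, introduced only for this sketch.)
   rt_throttling_state() {
       if [ "$1" -eq -1 ]; then
           # -1 removes the limit, so RT tasks may monopolize a CPU.
           echo "unlimited"
       else
           echo "limited to $1 us per $2 us period"
       fi
   }

   RUNTIME=/proc/sys/kernel/sched_rt_runtime_us
   PERIOD=/proc/sys/kernel/sched_rt_period_us

   # Report the current setting only when both files are readable.
   if [ -r "$RUNTIME" ] && [ -r "$PERIOD" ]; then
       rt_throttling_state "$(cat "$RUNTIME")" "$(cat "$PERIOD")"
   fi

A result of ``unlimited`` corresponds to the ``-1`` setting that this tip
warns about.
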
**TIP 13:** Disable the Intel processor C-State and P-State of the RTVM. Tip: Disable the software workaround for Machine Check Error on Page Size Change.
By default, the software workaround for Machine Check Error on Page Size
Power management of a processor could save power, but it could also impact Change is conditionally applied to the models that may be affected by the
the RT performance because the power state is changing. C-State and P-State issue. However, the software workaround has a negative impact on
PM mechanism can be disabled by adding "processor.max_cstate=0 performance. If all guest OS kernels are trusted, the
intel_idle.max_cstate=0 intel_pstate=disable" to the kernel parameters. :option:`CONFIG_MCE_ON_PSC_WORKAROUND_DISABLED` option could be set for performance.
**TIP 14:** Exercise caution when setting /proc/sys/kernel/sched_rt_runtime_us. .. note::
The tips for preempt-RT Linux are mostly applicable to the Linux-based RT OS as well, such as Xenomai.
Setting /proc/sys/kernel/sched_rt_runtime_us to -1 can be dangerous. A value
of -1 allows RT tasks to monopolize a CPU, so that the mechanism such as
"nohz" might get no chance to work, which can hurt the RT performance or
even (potentially) lock up a system.
**TIP 15:** Disable the software workaround for Machine Check Error on Page
Size Change.
By default, the software workaround for Machine Check Error on Page Size
Change is conditionally applied to the models that may be affected by the
issue. However, the software workaround has a negative impact on
performance. If all guest OS kernels are trusted, the
:option:`CONFIG_MCE_ON_PSC_WORKAROUND_DISABLED` option could be set for performance.
.. note::
The tips for preempt-RT Linux is mostly applicable to the Linux-based RT OS as well, such as Xenomai.