mirror of
https://github.com/projectacrn/acrn-hypervisor.git
synced 2025-08-02 16:30:12 +00:00
doc: edits for rtvm_performance_tips doc
Fixed windows line endings, improved tip formatting, additional grammar and content simplification edits. Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
parent
f7594f0a93
commit
714b3a35d6
@ -1,213 +1,196 @@
|
|||||||
.. _rt_perf_tips_rtvm:
|
.. _rt_perf_tips_rtvm:
|
||||||
|
|
||||||
ACRN Real-Time VM Performance Tips
|
ACRN Real-Time VM Performance Tips
|
||||||
##################################
|
##################################
|
||||||
|
|
||||||
Background
|
Background
|
||||||
**********
|
**********
|
||||||
|
|
||||||
The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM. In
|
The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM.
|
||||||
order to achieve bare metal-like RT performance, a set of constraints and
|
This document shows how you can configure RTVMs to potentially achieve
|
||||||
technologies are applied to the RTVM compared to the ACRN standard VM. With
|
near bare-metal performance by configuring certain key technologies and
|
||||||
these additional constraints and technologies, RT tasks can run on the RTVM
|
eliminating use of a VM-exit within RT tasks, thereby avoiding this
|
||||||
without a VM-exit, which is a key virtualization overhead issue.
|
common virtualization overhead issue.
|
||||||
|
|
||||||
In addition to the VM-exit, interference from neighbor VMs, such as Service
|
Neighbor VMs such as Service VMs, Human-Machine-Interface (HMI) VMs, or
|
||||||
VMs, Human-Machine-Interface (HMI) VMs, or other RT VMs may affect the
|
other real-time VMs, may negatively affect the execution of real-time
|
||||||
execution of real-time tasks on a certain RTVM. Other technologies are
|
tasks on an RTVM. This document also shows technologies used to isolate
|
||||||
applied to isolate noise from the neighbor VMs.
|
potential runtime noise from neighbor VMs.
|
||||||
|
|
||||||
Here is the list of key technologies applied to enable the bare metal-like
|
Here are some key technologies that can significantly improve
|
||||||
RT performance:
|
RTVM performance:
|
||||||
|
|
||||||
- LAPIC passthrough with core partitioning.
|
- LAPIC passthrough with core partitioning.
|
||||||
- PCIe Device Passthrough: Only MSI interrupt-capable PCI devices will be
|
- PCIe Device Passthrough: Only MSI interrupt-capable PCI devices are
|
||||||
supported for the RTVM.
|
supported for the RTVM.
|
||||||
- Enable CAT (Cache Allocation Technology)-based cache isolation: RTVM uses
|
- Enable CAT (Cache Allocation Technology)-based cache isolation: RTVM uses
|
||||||
a dedicated CLOS (Class of Service). While others may share CLOS, the GPU
|
a dedicated CLOS (Class of Service). While others may share CLOS, the GPU
|
||||||
uses a CLOS that will not overlap with the RTVM CLOS.
|
uses a CLOS that will not overlap with the RTVM CLOS.
|
||||||
- PMD virtio: Both virtio BE and FE work in polling mode so that the
|
- PMD virtio: Both virtio BE and FE work in polling mode so
|
||||||
interrupts or notification between the Service VM and RTVM are not needed.
|
interrupts and notifications between the Service VM and RTVM are not needed.
|
||||||
The RTVM guest memory is hidden from the Service VM except for the virtio
|
All RTVM guest memory is hidden from the Service VM except for the virtio
|
||||||
queue memory which is all that the Service VM can access.
|
queue memory.
|
||||||
|
|
||||||
This document list tips that are summarized from issues encountered and
|
This document summarizes tips from issues encountered and
|
||||||
resolved during real-time development and performance tuning.
|
resolved during real-time development and performance tuning.
|
||||||
|
|
||||||
Mandatory options for an RTVM
|
Mandatory options for an RTVM
|
||||||
*****************************
|
*****************************
|
||||||
|
|
||||||
An RTVM is a post-launched VM with LAPIC passthrough. To launch an ACRN
|
An RTVM is a post-launched VM with LAPIC passthrough. Pay attention to
|
||||||
RTVM, take note of the following options:
|
these options when you launch an ACRN RTVM:
|
||||||
|
|
||||||
**Tip 1:** Apply the acrn-dm option "--lapic_pt" and make the guest RTVM
|
Tip: Apply the acrn-dm option ``--lapic_pt``
|
||||||
operate under the LAPIC X2APIC mode to enable the LAPIC passthrough.
|
The LAPIC passthrough feature of ACRN is configured via the
|
||||||
|
``--lapic_pt`` option, but the feature is actually enabled when LAPIC is
|
||||||
The LAPIC passthrough feature of ACRN is configured via the "--lapic_pt"
|
switched to X2APIC mode. Both conditions should be met to enable an
|
||||||
option, but the feature is actually enabled when LAPIC is switched to X2APIC
|
RTVM. The ``--rtvm`` option will be automatically attached once
|
||||||
mode. So, both conditions should be met to enable an RTVM. The "--rtvm"
|
``--lapic_pt`` is applied.
|
||||||
option will be automatically attached once "--lapic_pt" is applied.
|
|
||||||
|
Tip: Use virtio polling mode
|
||||||
**Tip 2:** If necessary, use virtio polling mode to prevent the frontend of
|
Polling mode prevents the frontend from causing a VM-exit to send a
|
||||||
the VM-exit from sending a notification to the backend.
|
notification to the backend. We recommend that you pass through a
|
||||||
|
physical peripheral device (such as a block or an Ethernet device) to an
|
||||||
We recommend that you passthrough a physical peripheral device to an RTVM,
|
RTVM. If no physical device is available, ACRN supports virtio devices
|
||||||
such as block or an ethernet device. If no physical device is available,
|
and enables polling mode to avoid a VM-exit at the frontend. Enable
|
||||||
ACRN supports virtio devices and enables the polling mode to avoid a VM-exit
|
virtio polling mode via the option ``--virtio_poll [polling interval]``.
|
||||||
at the frontend. Virtio polling mode can be enabled via the option
|
|
||||||
"--virtio_poll [polling interval]".
|
Avoid VM-exit latency
|
||||||
|
*********************
|
||||||
Avoid VM-exit latency
|
|
||||||
*********************
|
VM-exit has a significant negative impact on virtualization performance.
|
||||||
|
A single VM-exit causes a latency of several microseconds or longer,
|
||||||
VM-exit has a significant negative impact on virtualization performance.
|
depending on what's done in VMX-root mode. VM-exit is classified into two
|
||||||
A single VM-exit can cause several micro-second latencies, or even longer,
|
types: triggered by external CPU events or triggered by operations initiated
|
||||||
depending on what's done in VMX-root mode. VM-exit is classified into two
|
by the vCPU.
|
||||||
types: triggered by external CPU events or triggered by operations initiated
|
|
||||||
by the vCPU.
|
ACRN eliminates almost all VM-exits triggered by external events by
|
||||||
|
using LAPIC passthrough. A few exceptions exist:
|
||||||
ACRN eliminates almost all VM-exits triggered by external events via the
|
|
||||||
LAPIC passthrough. A few exceptions exist:
|
- SMI - This brings the processor into the SMM, causing a much longer
|
||||||
|
performance impact. The SMI should be handled in the BIOS.
|
||||||
- SMI - it will bring the processor into the SMM, causing a much longer
|
|
||||||
performance impact. The SMI should be handled in the BIOS.
|
- NMI - ACRN uses NMI for system-level notification.
|
||||||
|
|
||||||
- NMI - ACRN uses NMI for system-level notification.
|
You should avoid VM-exits triggered by operations initiated by the
|
||||||
|
vCPU. Refer to the `Intel Software Developer Manuals (SDM)
|
||||||
Users should take care of VM-exits that are triggered by operations
|
<https://software.intel.com/en-us/articles/intel-sdm>`_ "Instructions
|
||||||
initiated by the vCPU. Refer to the Intel SMD: "Instructions Cause VM-exits
|
Cause VM-exits Unconditionally" (SDM V3, 25.1.2) and "Instructions That
|
||||||
Unconditionally" (SDM V3, 25.1.2) and "Instructions That Cause VM-exits
|
Cause VM-exits Conditionally" (SDM V3, 25.1.3).
|
||||||
Conditionally" (SDM V3, 25.1.3).
|
|
||||||
|
Tip: Do not use CPUID in a real-time critical section.
|
||||||
**Tip 3:** Do not use CPUID in the RT critical section.
|
The CPUID instruction causes VM-exits unconditionally. You should
|
||||||
|
detect CPU capability **before** entering an RT-critical section.
|
||||||
CPUID is an instruction that causes VM-exits unconditionally. As to the
|
CPUID can be executed at any privilege level to serialize instruction
|
||||||
normal usage of CPUID, this can be avoided by detecting the CPU capability
|
execution, and it executes efficiently. It's commonly used as a
|
||||||
before entering the RT critical section. CPUID can be executed at any
|
serializing instruction in an application by using CPUID
|
||||||
privilege level to serialize instruction execution and its high efficiency
|
immediately before and after RDTSC. Remove use of CPUID in this case by
|
||||||
of execution. It's commonly used as a serializing instruction in an
|
using RDTSCP instead of RDTSC. RDTSCP waits until all previous
|
||||||
application, and a typical case is using CPUID immediately before and after
|
instructions have been executed before reading the counter, and the
|
||||||
RDTSC. In order to remove CPUID in this case, use RDTSCP instead of RDTSC.
|
subsequent instructions after the RDTSCP normally have data dependency
|
||||||
Because RDTSCP waits until all previous instructions have been executed
|
on it, so they must wait until the RDTSCP has been executed.
|
||||||
before reading the counter, and the subsequent instructions after the RDTSCP
|
|
||||||
normally have data dependency on it, they must wait until the RDTSCP has
|
RDMSR and WRMSR are instructions that cause VM-exits conditionally. On the
|
||||||
been executed.
|
ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a
|
||||||
|
VM-exit. But there are exceptions for security reasons:
|
||||||
RDMSR or WRMSR are instructions that cause VM-exits conditionally. On the
|
|
||||||
ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a
|
1) read from APICID and LDR;
|
||||||
VM-exit. But there are exceptions for security consideration: 1) read from
|
2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
|
||||||
APICID and LDR; 2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
|
otherwise, read and write to TSC_ADJUST and TSC_DEADLINE;
|
||||||
otherwise, read and write to TSC_ADJUST and TSC_DEADLINE; 3) write to ICR.
|
3) write to ICR.
|
||||||
|
|
||||||
**Tip 4:** Do not use RDMSR to access APICID and LDR at the RT critical
|
Tip: Do not use RDMSR to access APICID and LDR in an RT critical section.
|
||||||
section.
|
ACRN does not present a physical APICID to a guest, so APICID
|
||||||
|
and LDR are virtualized even though the LAPIC is passed through. As a result,
|
||||||
ACRN does not intend to present a physical APICID to a guest so that APICID
|
access to APICID and LDR can cause a VM-exit.
|
||||||
and LDR are virtualized even though LAPIC is passthrough. As a result,
|
|
||||||
access to APICID and LDR can cause a VM-exit.
|
Tip: Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not access TSC_ADJUST and TSC_DEADLINE in the RT critical section.
|
||||||
|
ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
|
||||||
**Tip 5:** Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not
|
pTSC_ADJUST. If VMX_TSC_OFFSET_FULL is zero, intercepting
|
||||||
access TSC_ADJUST and TSC_DEADLINE in the RT critical section.
|
TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they should be
|
||||||
|
intercepted to guarantee functionality.
|
||||||
ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
|
|
||||||
pTSC_ADJUST; therefore, if VMX_TSC_OFFSET_FULL is zero, intercepting
|
Tip: Use Preempt-RT Linux mechanisms to reduce ICR access from the RT core.
|
||||||
TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they should be
|
#. Add ``domain`` to ``isolcpus`` ( ``isolcpus=nohz,domain,1`` ) to the kernel parameters.
|
||||||
intercepted to guarantee functionality.
|
#. Add ``idle=poll`` to the kernel parameters.
|
||||||
|
#. Add ``rcu_nocb_poll`` along with ``rcu_nocbs=1`` to the kernel parameters.
|
||||||
**Tip 6:** Utilize Preempt-RT Linux mechanisms to reduce the access of ICR
|
#. Disable logging services such as journald and syslogd if possible.
|
||||||
from the RT core:
|
|
||||||
|
The parameters shown above are recommended for the guest Preempt-RT
|
||||||
#. Add "domain" to the "isolcpus" ( “isolcpus=nohz,domain,1” ) to the kernel parameters.
|
Linux. For a UP RTVM, ICR interception is not a problem. But for an SMP
|
||||||
#. Add "idle=poll" to the kernel parameters.
|
RTVM, IPI may be needed between vCPUs. These tips are about reducing ICR
|
||||||
#. Add "rcu_nocb_poll" along with "rcu_nocbs=1" to the kernel parameters.
|
access. The example above assumes a dual-core RTVM, where core 0
|
||||||
#. Disable the logging service like journald, syslogd if possible.
|
is a housekeeping core and core 1 is a real-time core. The ``domain``
|
||||||
|
flag strongly isolates the RT core from the general SMP
|
||||||
These parameters are recommended for the guest Preempt-RT Linux. For a UP
|
balancing and scheduling algorithms. The parameters ``idle=poll`` and
|
||||||
RTVM, ICR interception is not a problem. But for an SMP RTVM, IPI may be
|
``rcu_nocb_poll`` can prevent the RT core from sending reschedule IPIs
|
||||||
needed between vCPUs; these tips are about to reduce the ICR access. The
|
to wake up tasks on core 0 in most cases. The logging service is disabled
|
||||||
example above assumes it is a dual-core RTVM, while core 0 is a housekeeping
|
because an IPI may be issued to the housekeeping core to notify the
|
||||||
core and core 1 is a real-time core. The "domain" flag makes strong
|
logging service when there are kernel messages output on the RT core.
|
||||||
isolation of the RT core from the general SMP balancing and scheduling
|
|
||||||
algorithms. "idle=poll" and "rcu_nocb_poll" could prevent the RT core from
|
.. note::
|
||||||
sending reschedule IPI to wakeup tasks on core 0 in most cases. And the
|
If an ICR access is inevitable within the RT critical section, be
|
||||||
disabling of the logging service is because an IPI may be issued to the
|
aware of the extra 3~4 microsecond latency for each access.
|
||||||
housekeeping core to notify the logging service when there are kernel
|
|
||||||
messages output on the RT core.
|
Tip: Create and initialize the RT tasks at the beginning to avoid runtime access to control registers.
|
||||||
|
Accessing control registers is another cause of a VM-exit. In ACRN, access
|
||||||
.. note::
|
to CR3 and CR8 does not cause a VM-exit. However, writes to CR0 and CR4 may cause a
|
||||||
If an ICR access is inevitable within the RT critical section, please be
|
VM-exit, which would happen at the spawning and initialization of a new task.
|
||||||
aware of the extra 3~4 us latency from each access.
|
|
||||||
|
Isolating the impact of neighbor VMs
|
||||||
**TIP 7:** Create and initialize the RT tasks at the beginning to avoid
|
************************************
|
||||||
runtime access to control registers.
|
|
||||||
|
ACRN makes use of several technologies and hardware features to avoid
|
||||||
The access to Control Registers is another cause of a VM-exit. An ACRN access
|
performance impact on the RTVM by neighbor VMs:
|
||||||
to CR3 and CR8 do not cause a VM-exit, but writes to CR0 and CR4 may cause a
|
|
||||||
VM-exit, which would happen at the spawning and initialization of a new task.
|
Tip: Do not share CPUs allocated to the RTVM with other RT or non-RT VMs.
|
||||||
|
ACRN enables CPU sharing to improve the utilization of CPU resources.
|
||||||
Isolating the impact of neighbor VMs
|
However, for an RTVM, CPUs should be dedicated to it for determinism.
|
||||||
************************************
|
|
||||||
|
Tip: Use RDT such as CAT and MBA to allocate dedicated resources to the RTVM.
|
||||||
ACRN makes use of several technologies and hardware features to avoid the
|
ACRN enables Intel® Resource Director Technology (RDT) features such as CAT and MBA to isolate the RTVM from interference caused by
|
||||||
impact to the RTVM from neighbor VMs:
|
components such as the GPU via the memory hierarchy. The availability of RDT is
|
||||||
|
hardware-specific. Refer to the :ref:`rdt_configuration`.
|
||||||
**TIP 8:** Do not share CPUs allocated to the RTVM with other RT/non-RT VMs.
|
|
||||||
|
Tip: Lock the GPU to the lowest feasible frequency.
|
||||||
ACRN enables CPU sharing to improve the utilization of CPU resources.
|
A GPU can put a heavy load on the power/memory subsystem. Locking
|
||||||
However, for RT VM, CPUs should be dedicatedly allocated for the determinism.
|
the GPU frequency as low as possible can help improve RT performance
|
||||||
|
determinism. GPU frequency can usually be locked in the BIOS, but such
|
||||||
**TIP 9:** Use RDT such as CAT and MBA to allocate dedicated resources to
|
BIOS support is platform-specific.
|
||||||
the RTVM.
|
|
||||||
|
Miscellaneous
|
||||||
ACRN enables the Intel® Resource Director Technology, such as CAT and MBA,
|
*************
|
||||||
components such as the GPU via memory hierarchy. The availability of RDT is
|
|
||||||
hardware-specific. Refer to the :ref:`rdt_configuration`.
|
Tip: Disable timer migration on Preempt-RT Linux.
|
||||||
|
Because most tasks have their affinity set to the housekeeping core, the timer
|
||||||
**TIP 10:** Lock the GPU to a feasible lowest frequency.
|
armed by RT tasks might be migrated to the nearest busy CPU for power
|
||||||
|
saving. This hurts RT determinism because the timer interrupts raised
|
||||||
GPU can put heavy pressure on the power/memory subsystem, so locking the GPU
|
on the housekeeping core need to be resent to the RT core. The timer
|
||||||
frequency as low as possible can help to improve the determinism of RT
|
migration can be disabled by the command::
|
||||||
performance. It can be locked in the BIOS, but the availability of certain
|
|
||||||
BIOS option is platform-specific.
|
   echo 0 > /proc/sys/kernel/timer_migration
|
||||||
|
|
||||||
Miscellaneous
|
Tip: Add ``mce=off`` to the RTVM kernel parameters.
|
||||||
*************
|
This parameter disables the MCE periodic timer and avoids a VM-exit.
|
||||||
|
|
||||||
**TIP 11:** Disable timer migration on Preempt-RT Linux.
|
Tip: Disable the Intel processor C-State and P-State of the RTVM.
|
||||||
|
Power management of a processor could save power, but it could also impact
|
||||||
Because most tasks are set affinitive to the housekeeping core, the timer
|
RT performance because the power state changes. The C-State and P-State
|
||||||
armed by RT tasks might be migrated to the nearest busy CPU for power
|
PM mechanisms can be disabled by adding ``processor.max_cstate=0
|
||||||
saving. But it will hurt the determinism because the timer interrupts raised
|
intel_idle.max_cstate=0 intel_pstate=disable`` to the kernel parameters.
|
||||||
on the housekeeping core need to be resent to the RT core. The timer
|
|
||||||
migration could be disabled by cmd: "echo 0 > /proc/kernel/timer_migration"
|
Tip: Exercise caution when setting ``/proc/sys/kernel/sched_rt_runtime_us``.
|
||||||
|
Setting ``/proc/sys/kernel/sched_rt_runtime_us`` to ``-1`` can be a
|
||||||
**TIP 12:** Add "mce=off" to RT VM kernel parameters.
|
problem. A value of ``-1`` allows RT tasks to monopolize a CPU, so that
|
||||||
|
a mechanism such as ``nohz`` might get no chance to work, which can hurt
|
||||||
"mce=off" can disable the mce periodic timer in order to void a VM-exit.
|
the RT performance or even (potentially) lock up a system.
|
||||||
|
|
||||||
**TIP 13:** Disable the Intel processor C-State and P-State of the RTVM.
|
Tip: Disable the software workaround for Machine Check Error on Page Size Change.
|
||||||
|
By default, the software workaround for Machine Check Error on Page Size
|
||||||
Power management of a processor could save power, but it could also impact
|
Change is conditionally applied to the models that may be affected by the
|
||||||
the RT performance because the power state is changing. C-State and P-State
|
issue. However, the software workaround has a negative impact on
|
||||||
PM mechanism can be disabled by adding "processor.max_cstate=0
|
performance. If all guest OS kernels are trusted, the
|
||||||
intel_idle.max_cstate=0 intel_pstate=disabled" to the kernel parameters.
|
:option:`CONFIG_MCE_ON_PSC_WORKAROUND_DISABLED` option can be set to improve performance.
|
||||||
|
|
||||||
**TIP 14:** Exercise caution when setting /proc/sys/kernel/sched_rt_runtime_us.
|
.. note::
|
||||||
|
The tips for Preempt-RT Linux are mostly applicable to other Linux-based RT OSes as well, such as Xenomai.
|
||||||
Setting /proc/sys/kernel/sched_rt_runtime_us to -1 can be dangerous. A value
|
|
||||||
of -1 allows RT tasks to monopolize a CPU, so that the mechanism such as
|
|
||||||
"nohz" might get no chance to work, which can hurt the RT performance or
|
|
||||||
even (potentially) lock up a system.
|
|
||||||
|
|
||||||
**TIP 15:** Disable the software workaround for Machine Check Error on Page
|
|
||||||
Size Change.
|
|
||||||
|
|
||||||
By default, the software workaround for Machine Check Error on Page Size
|
|
||||||
Change is conditionally applied to the models that may be affected by the
|
|
||||||
issue. However, the software workaround has a negative impact on
|
|
||||||
performance. If all guest OS kernels are trusted, the
|
|
||||||
:option:`CONFIG_MCE_ON_PSC_WORKAROUND_DISABLED` option could be set for performance.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
The tips for preempt-RT Linux is mostly applicable to the Linux-based RT OS as well, such as Xenomai.
|
|
||||||
|
|
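The two runtime settings discussed in the Miscellaneous tips (timer migration and ``sched_rt_runtime_us``) can be checked from inside the guest. The following is a minimal sketch, not part of ACRN: the function names are hypothetical, and it assumes both values are exposed as integer files under the standard sysctl tree ``/proc/sys/kernel``. The ``root`` argument exists only so the check can be pointed at a test directory.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl such as 'kernel.timer_migration'.

    The dotted name is mapped to a path below `root` (assumed layout).
    """
    path = Path(root).joinpath(*name.split("."))
    return int(path.read_text().strip())


def check_rt_sysctls(root="/proc/sys"):
    """Return warnings for settings that conflict with the tips above."""
    warnings = []
    # Timer migration should be disabled (0) on Preempt-RT Linux.
    if read_sysctl("kernel.timer_migration", root) != 0:
        warnings.append("timer migration is enabled")
    # -1 removes the RT throttling limit, so RT tasks can monopolize a CPU.
    if read_sysctl("kernel.sched_rt_runtime_us", root) == -1:
        warnings.append("sched_rt_runtime_us is -1")
    return warnings
```

Calling ``check_rt_sysctls()`` with no argument reads the live values; an empty list means neither setting was flagged. Some kernels may not provide ``timer_migration`` at all, in which case the read raises ``FileNotFoundError``.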
||||||
|
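Several tips above add parameters to the guest kernel command line. As a rough illustration, the sketch below checks a command-line string for those parameters. The helper names are hypothetical, the ``isolcpus`` and ``rcu_nocbs`` values assume the dual-core example (core 1 is the RT core), and ``intel_pstate=disable`` is the spelling the Linux kernel documents for that parameter.

```python
# Parameters collected from the tips in this document. Values assume the
# dual-core example where core 0 is housekeeping and core 1 is the RT core.
RECOMMENDED = {
    "isolcpus": "nohz,domain,1",
    "idle": "poll",
    "rcu_nocb_poll": None,  # bare flag without a value
    "rcu_nocbs": "1",
    "mce": "off",
    "processor.max_cstate": "0",
    "intel_idle.max_cstate": "0",
    "intel_pstate": "disable",  # kernel-documented spelling
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value-or-None} dict."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, recommended=RECOMMENDED):
    """Return recommended parameters that are absent or set differently."""
    present = parse_cmdline(cmdline)
    missing = []
    for name, value in recommended.items():
        if name not in present or present[name] != value:
            missing.append(name if value is None else f"{name}={value}")
    return missing
```

On a running guest, passing the contents of ``/proc/cmdline`` to ``missing_params()`` lists anything that differs from the table. The comparison is an exact string match, so an equivalent but differently ordered ``isolcpus`` value would be reported as different.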