mirror of
https://github.com/projectacrn/acrn-hypervisor.git
synced 2025-07-31 23:38:24 +00:00
doc: edits for rtvm_performance_tips doc
Fixed windows line endings, improved tip formatting, additional grammar and content simplification edits.

Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
This commit is contained in:
parent
f7594f0a93
commit
714b3a35d6
@ -1,213 +1,196 @@

.. _rt_perf_tips_rtvm:

ACRN Real-Time VM Performance Tips
##################################

Background
**********

The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM. In
order to achieve bare metal-like RT performance, a set of constraints and
technologies are applied to the RTVM compared to the ACRN standard VM. With
these additional constraints and technologies, RT tasks can run on the RTVM
without a VM-exit, which is a key virtualization overhead issue.

In addition to the VM-exit, interference from neighbor VMs, such as Service
VMs, Human-Machine-Interface (HMI) VMs, or other RT VMs may affect the
execution of real-time tasks on a certain RTVM. Other technologies are
applied to isolate noise from the neighbor VMs.

Here is the list of key technologies applied to enable the bare metal-like
RT performance:

- LAPIC passthrough with core partitioning.
- PCIe Device Passthrough: Only MSI interrupt-capable PCI devices will be
  supported for the RTVM.
- Enable CAT (Cache Allocation Technology)-based cache isolation: RTVM uses
  a dedicated CLOS (Class of Service). While others may share CLOS, the GPU
  uses a CLOS that will not overlap with the RTVM CLOS.
- PMD virtio: Both virtio BE and FE work in polling mode so that the
  interrupts or notification between the Service VM and RTVM are not needed.
  The RTVM guest memory is hidden from the Service VM except for the virtio
  queue memory which is all that the Service VM can access.

This document list tips that are summarized from issues encountered and
resolved during real-time development and performance tuning.

Mandatory options for an RTVM
*****************************

An RTVM is a post-launched VM with LAPIC passthrough. To launch an ACRN
RTVM, take note of the following options:

**Tip 1:** Apply the acrn-dm option "--lapic_pt" and make the guest RTVM
operate under the LAPIC X2APIC mode to enable the LAPIC passthrough.

The LAPIC passthrough feature of ACRN is configured via the "--lapic_pt"
option, but the feature is actually enabled when LAPIC is switched to X2APIC
mode. So, both conditions should be met to enable an RTVM. The "--rtvm"
option will be automatically attached once "--lapic_pt" is applied.

**Tip 2:** If necessary, use virtio polling mode to prevent the frontend of
the VM-exit from sending a notification to the backend.

We recommend that you passthrough a physical peripheral device to an RTVM,
such as block or an ethernet device. If no physical device is available,
ACRN supports virtio devices and enables the polling mode to avoid a VM-exit
at the frontend. Virtio polling mode can be enabled via the option
"--virtio_poll [polling interval]".

Avoid VM-exit latency
*********************

VM-exit has a significant negative impact on virtualization performance.
A single VM-exit can cause several micro-second latencies, or even longer,
depending on what's done in VMX-root mode. VM-exit is classified into two
types: triggered by external CPU events or triggered by operations initiated
by the vCPU.

ACRN eliminates almost all VM-exits triggered by external events via the
LAPIC passthrough. A few exceptions exist:

- SMI - it will bring the processor into the SMM, causing a much longer
  performance impact. The SMI should be handled in the BIOS.

- NMI - ACRN uses NMI for system-level notification.

Users should take care of VM-exits that are triggered by operations
initiated by the vCPU. Refer to the Intel SMD: "Instructions Cause VM-exits
Unconditionally" (SDM V3, 25.1.2) and "Instructions That Cause VM-exits
Conditionally" (SDM V3, 25.1.3).

**Tip 3:** Do not use CPUID in the RT critical section.

CPUID is an instruction that causes VM-exits unconditionally. As to the
normal usage of CPUID, this can be avoided by detecting the CPU capability
before entering the RT critical section. CPUID can be executed at any
privilege level to serialize instruction execution and its high efficiency
of execution. It's commonly used as a serializing instruction in an
application, and a typical case is using CPUID immediately before and after
RDTSC. In order to remove CPUID in this case, use RDTSCP instead of RDTSC.
Because RDTSCP waits until all previous instructions have been executed
before reading the counter, and the subsequent instructions after the RDTSCP
normally have data dependency on it, they must wait until the RDTSCP has
been executed.

RDMSR or WRMSR are instructions that cause VM-exits conditionally. On the
ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a
VM-exit. But there are exceptions for security consideration: 1) read from
APICID and LDR; 2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
otherwise, read and write to TSC_ADJUST and TSC_DEADLINE; 3) write to ICR.

**Tip 4:** Do not use RDMSR to access APICID and LDR at the RT critical
section.

ACRN does not intend to present a physical APICID to a guest so that APICID
and LDR are virtualized even though LAPIC is passthrough. As a result,
access to APICID and LDR can cause a VM-exit.

**Tip 5:** Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not
access TSC_ADJUST and TSC_DEADLINE in the RT critical section.

ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
pTSC_ADJUST; therefore, if VMX_TSC_OFFSET_FULL is zero, intercepting
TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they should be
intercepted to guarantee functionality.

**Tip 6:** Utilize Preempt-RT Linux mechanisms to reduce the access of ICR
from the RT core:

#. Add "domain" to the "isolcpus" ( “isolcpus=nohz,domain,1” ) to the kernel parameters.
#. Add "idle=poll" to the kernel parameters.
#. Add "rcu_nocb_poll" along with "rcu_nocbs=1" to the kernel parameters.
#. Disable the logging service like journald, syslogd if possible.

These parameters are recommended for the guest Preempt-RT Linux. For a UP
RTVM, ICR interception is not a problem. But for an SMP RTVM, IPI may be
needed between vCPUs; these tips are about to reduce the ICR access. The
example above assumes it is a dual-core RTVM, while core 0 is a housekeeping
core and core 1 is a real-time core. The "domain" flag makes strong
isolation of the RT core from the general SMP balancing and scheduling
algorithms. "idle=poll" and "rcu_nocb_poll" could prevent the RT core from
sending reschedule IPI to wakeup tasks on core 0 in most cases. And the
disabling of the logging service is because an IPI may be issued to the
housekeeping core to notify the logging service when there are kernel
messages output on the RT core.

.. note::
   If an ICR access is inevitable within the RT critical section, please be
   aware of the extra 3~4 us latency from each access.

**TIP 7:** Create and initialize the RT tasks at the beginning to avoid
runtime access to control registers.

The access to Control Registers is another cause of a VM-exit. An ACRN access
to CR3 and CR8 do not cause a VM-exit, but writes to CR0 and CR4 may cause a
VM-exit, which would happen at the spawning and initialization of a new task.

Isolating the impact of neighbor VMs
************************************

ACRN makes use of several technologies and hardware features to avoid the
impact to the RTVM from neighbor VMs:

**TIP 8:** Do not share CPUs allocated to the RTVM with other RT/non-RT VMs.

ACRN enables CPU sharing to improve the utilization of CPU resources.
However, for RT VM, CPUs should be dedicatedly allocated for the determinism.

**TIP 9:** Use RDT such as CAT and MBA to allocate dedicated resources to
the RTVM.

ACRN enables the Intel® Resource Director Technology, such as CAT and MBA,
components such as the GPU via memory hierarchy. The availability of RDT is
hardware-specific. Refer to the :ref:`rdt_configuration`.

**TIP 10:** Lock the GPU to a feasible lowest frequency.

GPU can put heavy pressure on the power/memory subsystem, so locking the GPU
frequency as low as possible can help to improve the determinism of RT
performance. It can be locked in the BIOS, but the availability of certain
BIOS option is platform-specific.

Miscellaneous
*************

**TIP 11:** Disable timer migration on Preempt-RT Linux.

Because most tasks are set affinitive to the housekeeping core, the timer
armed by RT tasks might be migrated to the nearest busy CPU for power
saving. But it will hurt the determinism because the timer interrupts raised
on the housekeeping core need to be resent to the RT core. The timer
migration could be disabled by cmd: "echo 0 > /proc/kernel/timer_migration"

**TIP 12:** Add "mce=off" to RT VM kernel parameters.

"mce=off" can disable the mce periodic timer in order to void a VM-exit.

**TIP 13:** Disable the Intel processor C-State and P-State of the RTVM.

Power management of a processor could save power, but it could also impact
the RT performance because the power state is changing. C-State and P-State
PM mechanism can be disabled by adding "processor.max_cstate=0
intel_idle.max_cstate=0 intel_pstate=disabled" to the kernel parameters.

**TIP 14:** Exercise caution when setting /proc/sys/kernel/sched_rt_runtime_us.

Setting /proc/sys/kernel/sched_rt_runtime_us to -1 can be dangerous. A value
of -1 allows RT tasks to monopolize a CPU, so that the mechanism such as
"nohz" might get no chance to work, which can hurt the RT performance or
even (potentially) lock up a system.

**TIP 15:** Disable the software workaround for Machine Check Error on Page
Size Change.

By default, the software workaround for Machine Check Error on Page Size
Change is conditionally applied to the models that may be affected by the
issue. However, the software workaround has a negative impact on
performance. If all guest OS kernels are trusted, the
:option:`CONFIG_MCE_ON_PSC_WORKAROUND_DISABLED` option could be set for performance.

.. note::
   The tips for preempt-RT Linux is mostly applicable to the Linux-based RT OS as well, such as Xenomai.

.. _rt_perf_tips_rtvm:

ACRN Real-Time VM Performance Tips
##################################

Background
**********

The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM.
This document shows how you can configure RTVMs to potentially achieve
near bare-metal performance by configuring certain key technologies and
avoiding VM-exits within RT tasks, thereby eliminating this common
source of virtualization overhead.

Neighbor VMs such as Service VMs, Human-Machine-Interface (HMI) VMs, or
other real-time VMs may negatively affect the execution of real-time
tasks on an RTVM. This document also shows technologies used to isolate
the RTVM from potential runtime noise from neighbor VMs.

Here are some key technologies that can significantly improve
RTVM performance:

- LAPIC passthrough with core partitioning.
- PCIe Device Passthrough: Only MSI interrupt-capable PCI devices are
  supported for the RTVM.
- Enable CAT (Cache Allocation Technology)-based cache isolation: The RTVM
  uses a dedicated CLOS (Class of Service). Other VMs may share a CLOS, but
  the GPU uses a CLOS that does not overlap with the RTVM CLOS.
- PMD virtio: Both virtio BE and FE work in polling mode, so
  interrupts and notifications between the Service VM and RTVM are not needed.
  All RTVM guest memory is hidden from the Service VM except for the virtio
  queue memory.

This document summarizes tips from issues encountered and
resolved during real-time development and performance tuning.

Mandatory options for an RTVM
*****************************

An RTVM is a post-launched VM with LAPIC passthrough. Pay attention to
these options when you launch an ACRN RTVM:

Tip: Apply the acrn-dm option ``--lapic_pt``
   The LAPIC passthrough feature of ACRN is configured via the
   ``--lapic_pt`` option, but the feature is actually enabled only when the
   guest switches the LAPIC to X2APIC mode. Both conditions must be met to
   enable an RTVM. The ``--rtvm`` option is attached automatically once
   ``--lapic_pt`` is applied.

Tip: Use virtio polling mode
   Polling mode prevents the frontend from triggering a VM-exit when it
   sends a notification to the backend. We recommend that you pass through
   a physical peripheral device (such as a block or Ethernet device) to an
   RTVM. If no physical device is available, ACRN supports virtio devices
   and enables polling mode to avoid a VM-exit at the frontend. Enable
   virtio polling mode via the option ``--virtio_poll [polling interval]``.

Avoid VM-exit latency
*********************

VM-exit has a significant negative impact on virtualization performance.
A single VM-exit adds a latency of several microseconds or more,
depending on what's done in VMX-root mode. VM-exits fall into two
types: those triggered by external CPU events and those triggered by
operations initiated by the vCPU.

ACRN eliminates almost all VM-exits triggered by external events by
using LAPIC passthrough. A few exceptions exist:

- SMI - This brings the processor into SMM, causing a much longer
  performance impact. The SMI should be handled in the BIOS.

- NMI - ACRN uses NMI for system-level notification.

You should avoid VM-exits triggered by operations initiated by the
vCPU. Refer to the `Intel Software Developer Manuals (SDM)
<https://software.intel.com/en-us/articles/intel-sdm>`_ "Instructions
That Cause VM-exits Unconditionally" (SDM V3, 25.1.2) and "Instructions That
Cause VM-exits Conditionally" (SDM V3, 25.1.3).

Tip: Do not use CPUID in a real-time critical section.
   The CPUID instruction causes VM-exits unconditionally. You should
   detect CPU capability **before** entering an RT-critical section.
   Because CPUID can be executed at any privilege level to serialize
   instruction execution, applications commonly use it as a serializing
   instruction, typically immediately before and after RDTSC. Remove use
   of CPUID in this case by using RDTSCP instead of RDTSC. RDTSCP waits
   until all previous instructions have been executed before reading the
   counter, and the subsequent instructions after the RDTSCP normally have
   a data dependency on it, so they must wait until the RDTSCP has been
   executed.

RDMSR and WRMSR are instructions that cause VM-exits conditionally. On the
ACRN RTVM, most MSRs are not intercepted by the HV, so they won't cause a
VM-exit. But there are exceptions for security reasons:

1) read from APICID and LDR;
2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
   otherwise, read and write to TSC_ADJUST and TSC_DEADLINE;
3) write to ICR.

Tip: Do not use RDMSR to access APICID and LDR in an RT critical section.
   ACRN does not present a physical APICID to a guest, so APICID
   and LDR are virtualized even though the LAPIC is passed through. As a
   result, access to APICID and LDR can cause a VM-exit.

Tip: Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not access TSC_ADJUST and TSC_DEADLINE in the RT critical section.
   ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
   pTSC_ADJUST. If VMX_TSC_OFFSET_FULL is zero, intercepting
   TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they must be
   intercepted to guarantee functionality.

Tip: Utilize Preempt-RT Linux mechanisms to reduce the access of ICR from the RT core.
   #. Add ``domain`` to ``isolcpus`` (``isolcpus=nohz,domain,1``) in the kernel parameters.
   #. Add ``idle=poll`` to the kernel parameters.
   #. Add ``rcu_nocb_poll`` along with ``rcu_nocbs=1`` to the kernel parameters.
   #. Disable logging services such as journald and syslogd if possible.

   The parameters shown above are recommended for the guest Preempt-RT
   Linux. For a UP RTVM, ICR interception is not a problem. But for an SMP
   RTVM, IPIs may be needed between vCPUs. These tips reduce ICR
   access. The example above assumes a dual-core RTVM, where core 0
   is a housekeeping core and core 1 is a real-time core. The ``domain``
   flag strongly isolates the RT core from the general SMP
   balancing and scheduling algorithms. The parameters ``idle=poll`` and
   ``rcu_nocb_poll`` can prevent the RT core from sending reschedule IPIs
   to wake up tasks on core 0 in most cases. The logging service is disabled
   because an IPI may be issued to the housekeeping core to notify the
   logging service when there are kernel messages output on the RT core.
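
As an illustration of the kernel parameters listed above, the following minimal Python sketch checks whether a guest kernel command line contains them. The helper names ``missing_rt_params`` and ``read_cmdline`` are hypothetical and not part of ACRN. The parameter values assume the dual-core example above (core 1 is the RT core), and matching is by exact token, so an equivalent ``isolcpus`` value with its flags in a different order would be reported as missing.

```python
# Parameters taken from the tip above for a dual-core RTVM
# (core 0 is the housekeeping core, core 1 is the RT core).
RECOMMENDED_PARAMS = (
    "isolcpus=nohz,domain,1",
    "idle=poll",
    "rcu_nocb_poll",
    "rcu_nocbs=1",
)


def missing_rt_params(cmdline, wanted=RECOMMENDED_PARAMS):
    """Return the recommended parameters absent from a kernel command line.

    Matching is by exact whitespace-separated token (an assumption made
    for simplicity), in the order given by ``wanted``.
    """
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read().strip()
```

Inside the guest, ``missing_rt_params(read_cmdline())`` returns an empty list when all four parameters are present.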

.. note::
   If an ICR access is inevitable within the RT critical section, be
   aware of the extra 3 to 4 microseconds of latency for each access.

Tip: Create and initialize the RT tasks at the beginning to avoid runtime access to control registers.
   Accessing control registers is another cause of a VM-exit. On ACRN,
   accesses to CR3 and CR8 do not cause a VM-exit. However, writes to CR0
   and CR4 may cause a VM-exit, which would happen at the spawning and
   initialization of a new task.

Isolating the impact of neighbor VMs
************************************

ACRN makes use of several technologies and hardware features to avoid
performance impact on the RTVM by neighbor VMs:

Tip: Do not share CPUs allocated to the RTVM with other RT or non-RT VMs.
   ACRN enables CPU sharing to improve the utilization of CPU resources.
   However, for an RTVM, CPUs should be dedicated to the VM for determinism.

Tip: Use RDT such as CAT and MBA to allocate dedicated resources to the RTVM.
   ACRN enables Intel® Resource Director Technology (RDT) features such as
   CAT and MBA to limit interference that reaches the RTVM through the
   memory hierarchy from neighbor VMs and components such as the GPU. The
   availability of RDT is hardware-specific. Refer to
   :ref:`rdt_configuration`.
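
To make the "dedicated, non-overlapping CLOS" idea concrete, here is a minimal Python sketch that checks a set of CAT capacity bitmasks before they are applied. The function names and the mask values in the usage note are hypothetical and do not come from ACRN or from any real platform; the sketch only assumes that each CLOS is described by a bitmask of cache ways and that Intel CAT expects the set bits of a mask to be contiguous.

```python
def is_contiguous(mask):
    """True if the set bits of a non-zero mask form one contiguous run."""
    if mask <= 0:
        return False
    lowest_set_bit = mask & -mask
    normalized = mask // lowest_set_bit  # shift out the trailing zero bits
    return (normalized & (normalized + 1)) == 0


def overlapping_clos(rtvm_mask, other_masks):
    """Return the names of CLOS entries whose cache ways overlap the RTVM's.

    ``other_masks`` maps a label (for example "gpu") to its capacity
    bitmask. An empty result means the RTVM's ways are dedicated to it.
    """
    if not is_contiguous(rtvm_mask):
        raise ValueError("RTVM cache mask must be a contiguous run of set bits")
    return sorted(name for name, mask in other_masks.items() if mask & rtvm_mask)
```

For example, with a hypothetical 8-way cache, ``overlapping_clos(0xF0, {"gpu": 0x0F, "service_vm": 0x0F})`` returns an empty list, while widening the GPU mask to ``0x1F`` reports ``["gpu"]``.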

Tip: Lock the GPU to the lowest feasible frequency.
   A GPU can put a heavy load on the power/memory subsystem. Locking
   the GPU frequency as low as possible can help improve RT performance
   determinism. GPU frequency can usually be locked in the BIOS, but such
   BIOS support is platform-specific.

Miscellaneous
*************

Tip: Disable timer migration on Preempt-RT Linux.
   Because most tasks have their affinity set to the housekeeping core,
   timers armed by RT tasks might be migrated to the nearest busy CPU to
   save power. This hurts RT determinism because timer interrupts raised
   on the housekeeping core must be resent to the RT core. Timer
   migration can be disabled with the command::

      echo 0 > /proc/sys/kernel/timer_migration
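
The same setting can be checked or changed from a script. The following Python sketch is one possible way to do it; the helper names are hypothetical, and it assumes the sysctl is exposed at ``/proc/sys/kernel/timer_migration`` (the standard Linux sysctl location) and that the script runs as root inside the guest when it writes the value.

```python
from pathlib import Path

# Standard sysctl location on Linux; pass another path to test the helpers.
TIMER_MIGRATION = Path("/proc/sys/kernel/timer_migration")


def timer_migration_enabled(path=TIMER_MIGRATION):
    """Return True unless the sysctl file currently holds "0"."""
    return Path(path).read_text().strip() != "0"


def disable_timer_migration(path=TIMER_MIGRATION):
    """Write 0 to the sysctl file (requires root on a real system)."""
    Path(path).write_text("0\n")
```

The setting does not persist across reboots, so a startup script would typically call ``disable_timer_migration()`` before the RT tasks are started.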

Tip: Add ``mce=off`` to RTVM kernel parameters.
   This parameter disables the MCE periodic timer and avoids a VM-exit.

Tip: Disable the Intel processor C-State and P-State of the RTVM.
   Processor power management saves power, but it can also hurt
   RT performance because the power state keeps changing. The C-State and
   P-State PM mechanisms can be disabled by adding ``processor.max_cstate=0
   intel_idle.max_cstate=0 intel_pstate=disable`` to the kernel parameters.

Tip: Exercise caution when setting ``/proc/sys/kernel/sched_rt_runtime_us``.
   Setting ``/proc/sys/kernel/sched_rt_runtime_us`` to ``-1`` can be a
   problem. A value of ``-1`` allows RT tasks to monopolize a CPU, so that
   a mechanism such as ``nohz`` might get no chance to work, which can hurt
   RT performance or even (potentially) lock up a system.
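
To see what a given setting means in practice, the short Python sketch below converts ``sched_rt_runtime_us`` into the fraction of CPU time that RT tasks may consume. The function name is hypothetical. It relies on the companion Linux setting ``sched_rt_period_us``, which is not discussed above; the usage note uses the kernel defaults of 950000 and 1000000 microseconds.

```python
def rt_bandwidth(runtime_us, period_us):
    """Fraction of each period that RT tasks may run.

    Returns None when runtime_us is -1, which means RT throttling is
    disabled and RT tasks may monopolize the CPU.
    """
    if runtime_us == -1:
        return None
    if period_us <= 0 or not 0 <= runtime_us <= period_us:
        raise ValueError("expected 0 <= runtime_us <= period_us and period_us > 0")
    return runtime_us / period_us
```

With the kernel defaults, ``rt_bandwidth(950000, 1000000)`` gives 0.95, so 5% of each period remains available to non-RT work; with ``-1`` nothing is reserved.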

Tip: Disable the software workaround for Machine Check Error on Page Size Change.
   By default, the software workaround for Machine Check Error on Page Size
   Change is conditionally applied to the models that may be affected by the
   issue. However, the software workaround has a negative impact on
   performance. If all guest OS kernels are trusted, the
   :option:`CONFIG_MCE_ON_PSC_WORKAROUND_DISABLED` option could be set for performance.

.. note::
   The tips for Preempt-RT Linux also mostly apply to other Linux-based RT OSes, such as Xenomai.