mirror of
https://github.com/projectacrn/acrn-hypervisor.git
synced 2025-06-19 12:12:16 +00:00
doc: add VT-d posted interrupt documentation
Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
This commit is contained in:
parent
691a0e2e56
commit
c8fb0d76ba
@ -4,20 +4,21 @@ Device Passthrough
|
|||||||
##################
|
##################
|
||||||
|
|
||||||
A critical part of virtualization is virtualizing devices: exposing all
|
A critical part of virtualization is virtualizing devices: exposing all
|
||||||
aspects of a device including its I/O, interrupts, DMA, and configuration.
|
aspects of a device including its I/O, interrupts, DMA, and
|
||||||
There are three typical device
|
configuration. There are three typical device virtualization methods:
|
||||||
virtualization methods: emulation, para-virtualization, and passthrough.
|
emulation, para-virtualization, and passthrough. All emulation,
|
||||||
All emulation, para-virtualization and passthrough are used in ACRN project. Device
|
para-virtualization and passthrough are used in ACRN project. Device
|
||||||
emulation is discussed in :ref:`hld-io-emulation`, para-virtualization is discussed
|
emulation is discussed in :ref:`hld-io-emulation`, para-virtualization
|
||||||
in :ref:`hld-virtio-devices` and device passthrough will be discussed here.
|
is discussed in :ref:`hld-virtio-devices` and device passthrough will be
|
||||||
|
discussed here.
|
||||||
|
|
||||||
In the ACRN project, device emulation means emulating all existing hardware
|
In the ACRN project, device emulation means emulating all existing
|
||||||
resource through a software component device model running in the
|
hardware resource through a software component device model running in
|
||||||
Service OS (SOS). Device
|
the Service OS (SOS). Device emulation must maintain the same SW
|
||||||
emulation must maintain the same SW interface as a native device,
|
interface as a native device, providing transparency to the VM software
|
||||||
providing transparency to the VM software stack. Passthrough implemented in
|
stack. Passthrough implemented in hypervisor assigns a physical device
|
||||||
hypervisor assigns a physical device to a VM so the VM can access
|
to a VM so the VM can access the hardware device directly with minimal
|
||||||
the hardware device directly with minimal (if any) VMM involvement.
|
(if any) VMM involvement.
|
||||||
|
|
||||||
The difference between device emulation and passthrough is shown in
|
The difference between device emulation and passthrough is shown in
|
||||||
:numref:`emu-passthru-diff`. You can notice device emulation has
|
:numref:`emu-passthru-diff`. You can notice device emulation has
|
||||||
@ -143,6 +144,102 @@ interrupt vector after checking the external interrupt request is valid. Transla
|
|||||||
physical vector to virtual vector is still needed to be done by hypervisor, which is
|
physical vector to virtual vector is still needed to be done by hypervisor, which is
|
||||||
also described in the below section :ref:`interrupt-remapping`.
|
also described in the below section :ref:`interrupt-remapping`.
|
||||||
|
|
||||||
|
VT-d posted interrupt (PI) enables direct delivery of external interrupts from
|
||||||
|
passthrough devices to VMs without having to exit to hypervisor, thereby improving
|
||||||
|
interrupt performance. ACRN uses VT-d posted interrupts if the platform
|
||||||
|
supports them. VT-d distinguishes between remapped
|
||||||
|
and posted interrupt modes by bit 15 in the low 64-bit of the IRTE. If cleared the
|
||||||
|
entry is remapped, if set it's posted.
|
||||||
|
The idea for posted interrupt is to keep a Posted Interrupt Descriptor (PID) in memory.
|
||||||
|
The PID is a 64-byte data structure that contains several fields:
|
||||||
|
|
||||||
|
Posted Interrupt Request (PIR):
|
||||||
|
a 256-bit field, one bit per request vector;
|
||||||
|
this is where the interrupts are posted;
|
||||||
|
|
||||||
|
Suppress Notification (SN):
|
||||||
|
determines whether to notify (``SN=0``) or not notify (``SN=1``)
|
||||||
|
the CPU for non-urgent interrupts. For ACRN,
|
||||||
|
all interrupts are treated as non-urgent. ACRN sets SN=0 during initialization
|
||||||
|
and then never changes it at runtime;
|
||||||
|
|
||||||
|
Notification Vector (NV):
|
||||||
|
the CPU must be notified with an interrupt and this
|
||||||
|
field specifies the vector for notification;
|
||||||
|
|
||||||
|
Notification Destination (NDST):
|
||||||
|
the physical APIC-ID of the destination.
|
||||||
|
ACRN does not support vCPU migration, one vCPU always runs on the same pCPU,
|
||||||
|
so for ACRN, NDST is never changed after initialization.
|
||||||
|
|
||||||
|
Outstanding Notification (ON):
|
||||||
|
indicates if a notification event is outstanding
|
||||||
|
|
||||||
|
The ACRN scheduler supports vCPU scheduling, where two or more vCPUs can
|
||||||
|
share the same pCPU using a time sharing technique. One issue emerges
|
||||||
|
here for VT-d posted interrupt handling process, where IRQs could happen
|
||||||
|
when the target vCPU is in a halted state. We need to handle the case
|
||||||
|
where the running vCPU disrupted by the external interrupt, is not the
|
||||||
|
target vCPU that an external interrupt should be delivered.
|
||||||
|
|
||||||
|
Consider this scenario:
|
||||||
|
|
||||||
|
* vCPU0 runs on pCPU0 and then enters a halted state,
|
||||||
|
* ACRN scheduler now chooses vCPU1 to run on pCPU0.
|
||||||
|
|
||||||
|
If an external interrupt from an assigned device destined to vCPU0
|
||||||
|
happens at this time, we do not want this interrupt to be incorrectly
|
||||||
|
consumed by vCPU1 currently running on pCPU0. This would happen if we
|
||||||
|
allocate the same Activation Notification Vector (ANV) to all vCPUs.
|
||||||
|
|
||||||
|
To circumvent this issue, ACRN allocates unique ANVs for each vCPU that
|
||||||
|
belongs to the same pCPU. The ANVs need only be unique within each pCPU,
|
||||||
|
not across all vCPUs. Since vCPU0's ANV is different from vCPU1's ANV,
|
||||||
|
if a vCPU0 is in a halted state, external interrupts from an assigned
|
||||||
|
device destined to vCPU0 delivered through the PID will not trigger the
|
||||||
|
posted interrupt processing. Instead, a VMExit to ACRN happens that can
|
||||||
|
then process the event such as waking up the halted vCPU0 and kick it
|
||||||
|
to run on pCPU0.
|
||||||
|
|
||||||
|
For ACRN, ``CONFIG_MAX_VM_NUM`` vCPUs may be running on top of a pCPU. ACRN
|
||||||
|
does not support two vCPUs of the same VM running on top of the same
|
||||||
|
pCPU. This reduces the number of pre-allocated ANVs for posted
|
||||||
|
interrupts to ``CONFIG_MAX_VM_NUM``, and enables ACRN to avoid switching
|
||||||
|
between active and wake-up vector values in the posted interrupt
|
||||||
|
descriptor on vCPU scheduling state changes. ACRN uses the following
|
||||||
|
formula to assign posted interrupt vectors to vCPUs::
|
||||||
|
|
||||||
|
NV = POSTED_INTR_VECTOR + vcpu->vm->vm_id
|
||||||
|
|
||||||
|
where ``POSTED_INTR_VECTOR`` is the starting vector (0xe3) for posted interrupts.
|
||||||
|
|
||||||
|
ACRN maintains a per-PCPU vCPU array that stores the pointers to
|
||||||
|
assigned vCPUs for each pCPU and is indexed by ``vcpu->vm->vm_id``.
|
||||||
|
When the vCPU is created, ACRN adds the vCPU to the containing pCPU's
|
||||||
|
vCPU array. When the vCPU is offline, ACRN removes the vCPU from the
|
||||||
|
related vCPU array.
|
||||||
|
|
||||||
|
An example to illustrate our solution:
|
||||||
|
|
||||||
|
.. figure:: images/passthru-image50.png
|
||||||
|
:align: center
|
||||||
|
|
||||||
|
ACRN sets ``SN=0`` during initialization and then never change it at
|
||||||
|
runtime. This means posted interrupt notification is never suppressed.
|
||||||
|
After posting the interrupt in Posted Interrupt Request (PIR), VT-d will
|
||||||
|
always notify the CPU using the interrupt vector NV, in both root and
|
||||||
|
non-root mode. With this scheme, if the target vCPU is running under
|
||||||
|
VMX non-root mode, it will receive the interrupts coming from
|
||||||
|
passed-through device without a VMExit (and therefore without any
|
||||||
|
intervention of the ACRN hypervisor).
|
||||||
|
|
||||||
|
If the target vCPU is in a halted state (under VMX non-root mode), a
|
||||||
|
scheduling request will be raised to wake it up. This is needed to
|
||||||
|
achieve real time behavior. If an RT-VM is waiting for an event, when
|
||||||
|
the event is fired (a PI interrupt fires), we need to wake up the VM
|
||||||
|
immediately.
|
||||||
|
|
||||||
|
|
||||||
MMIO Remapping
|
MMIO Remapping
|
||||||
**************
|
**************
|
||||||
|
|
||||||
|
BIN
doc/developer-guides/hld/images/passthru-image50.png
Normal file
BIN
doc/developer-guides/hld/images/passthru-image50.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 12 KiB |
Loading…
Reference in New Issue
Block a user