doc: update interrupt hld section
Transcode, edit, and upload HLD 0.7 section 3.5 (Physical Interrupts)

Tracked-on: #1610
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
@ -11,4 +11,4 @@ Hypervisor high-level design
|
|||||||
hv-cpu-virt
|
hv-cpu-virt
|
||||||
Memory management <memmgt-hld>
|
Memory management <memmgt-hld>
|
||||||
I/O Emulation <hld-io-emulation>
|
I/O Emulation <hld-io-emulation>
|
||||||
Interrupt management <interrupt-hld>
|
Physical Interrupt <interrupt-hld>
|
||||||
|
BIN
doc/developer-guides/hld/images/interrupt-image39.png
Normal file
After Width: | Height: | Size: 115 KiB |
Before Width: | Height: | Size: 36 KiB |
BIN
doc/developer-guides/hld/images/interrupt-image46.png
Normal file
After Width: | Height: | Size: 45 KiB |
BIN
doc/developer-guides/hld/images/interrupt-image48.png
Normal file
After Width: | Height: | Size: 127 KiB |
Before Width: | Height: | Size: 121 KiB |
Before Width: | Height: | Size: 29 KiB |
BIN
doc/developer-guides/hld/images/interrupt-image66.png
Normal file
After Width: | Height: | Size: 58 KiB |
Before Width: | Height: | Size: 18 KiB |
BIN
doc/developer-guides/hld/images/interrupt-image76.png
Normal file
After Width: | Height: | Size: 99 KiB |
BIN
doc/developer-guides/hld/images/interrupt-image89.png
Normal file
After Width: | Height: | Size: 11 KiB |
@ -1,15 +1,11 @@
|
|||||||
.. _interrupt-hld:
|
.. _interrupt-hld:
|
||||||
|
|
||||||
Interrupt Management high-level design
|
Physical Interrupt high-level design
|
||||||
######################################
|
####################################
|
||||||
|
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
********
|
********
|
||||||
|
|
||||||
This document describes the interrupt management high-level design for
|
|
||||||
the ACRN hypervisor.
|
|
||||||
|
|
||||||
The ACRN hypervisor implements a simple but fully functional framework
|
The ACRN hypervisor implements a simple but fully functional framework
|
||||||
to manage interrupts and exceptions, as shown in
|
to manage interrupts and exceptions, as shown in
|
||||||
:numref:`interrupt-modules-overview`. In its native layer, it configures
|
:numref:`interrupt-modules-overview`. In its native layer, it configures
|
||||||
@ -42,445 +38,451 @@ necessary virtual interrupt into the specific VM
|
|||||||
|
|
||||||
ACRN Interrupt SW Modules Overview
|
ACRN Interrupt SW Modules Overview
|
||||||
|
|
||||||
Hypervisor Physical Interrupt Management
|
|
||||||
****************************************
|
|
||||||
|
|
||||||
The ACRN hypervisor is responsible for all the physical interrupt
|
The hypervisor implements the following functionalities for handling
|
||||||
handling. All physical interrupts are first handled in VMX root-mode.
|
physical interrupts:
|
||||||
The "external-interrupt exiting" bit in the VM-Execution controls field
|
|
||||||
is set to support this. The ACRN hypervisor also initializes all the
|
|
||||||
interrupt related modules such as IDT, PIC, IOAPIC, and LAPIC.
|
|
||||||
|
|
||||||
Only a few physical interrupts (such as TSC-Deadline timer and IOMMU)
|
- Configure interrupt-related hardware including IDT, PIC, LAPIC, and
|
||||||
are fully serviced in the hypervisor. Most interrupts come from pass-thru
|
IOAPIC on startup.
|
||||||
devices whose interrupt are remapped to a virtual INTx/MSI source and
|
|
||||||
injected to the SOS or UOS, according to the pass-thru device
|
|
||||||
configuration.
|
|
||||||
|
|
||||||
The ACRN hypervisor does handle exceptions and any exception coming from
|
- Provide APIs to manipulate the registers of LAPIC and IOAPIC.
|
||||||
the VMX root-mode will lead to the CPU halting. For guest exception, the
|
|
||||||
hypervisor only traps #MC (machine check), prints a warning message, and
|
- Acknowledge physical interrupts.
|
||||||
injects the exception back into the guest OS.
|
|
||||||
|
- Set up a callback mechanism for the other components in the
|
||||||
|
hypervisor to request an interrupt vector and register a
|
||||||
|
handler for that interrupt.
|
||||||
|
|
||||||
|
HV owns all native physical interrupts and manages 256 vectors per CPU.
|
||||||
|
All physical interrupts are first handled in VMX root-mode. The
|
||||||
|
"external-interrupt exiting" bit in the VM-Execution controls field is set
|
||||||
|
to support this. The ACRN hypervisor also initializes all the interrupt
|
||||||
|
related modules like IDT, PIC, IOAPIC, and LAPIC.
|
||||||
|
|
||||||
|
HV does not own any host devices (except UART). All devices are by
|
||||||
|
default assigned to SOS. Any interrupts received by Guest VM (SOS or
|
||||||
|
UOS) device drivers are virtual interrupts injected by HV (via vLAPIC).
|
||||||
|
HV manages a Host-to-Guest mapping. When a native IRQ/interrupt occurs,
|
||||||
|
HV decides whether this IRQ/interrupt should be forwarded to a VM and
|
||||||
|
which VM to forward to (if any). Refer to section 3.7.6 for virtual
|
||||||
|
interrupt injection and section 3.9.6 for the management of interrupt
|
||||||
|
remapping.
|
||||||
|
|
||||||
|
HV does not own any exceptions. Guest VMCS are configured so no VM Exit
|
||||||
|
happens, with some exceptions such as #INT3 and #MC. This is to
|
||||||
|
simplify the design as HV does not support any exception handling
|
||||||
|
itself. HV supports only static memory mapping, so there should be no
|
||||||
|
#PF or #GP. If HV receives an exception indicating an error, an assert
|
||||||
|
function is then executed, an error message is printed, and the
|
||||||
|
system then halts.
|
||||||
|
|
||||||
|
Native interrupts could be generated from one of the following
|
||||||
|
sources:
|
||||||
|
|
||||||
|
- GSI interrupts
|
||||||
|
|
||||||
|
- PIC or legacy device IRQs (0-15)
|
||||||
|
- IOAPIC pin
|
||||||
|
|
||||||
|
- PCI MSI/MSI-X vectors
|
||||||
|
- Inter CPU IPI
|
||||||
|
- LAPIC timer
|
||||||
|
|
||||||
Physical Interrupt Initialization
|
Physical Interrupt Initialization
|
||||||
=================================
|
*********************************
|
||||||
|
|
||||||
After the ACRN hypervisor get control from the bootloader, it
|
After ACRN hypervisor gets control from the bootloader, it
|
||||||
initializes all physical interrupt-related modules for all the CPUs. The
|
initializes all physical interrupt-related modules for all the CPUs. ACRN
|
||||||
ACRN hypervisor creates a framework to manage the physical interrupt for
|
hypervisor creates a framework to manage physical interrupts for
|
||||||
hypervisor-local devices, pass-thru devices, and IPI between CPUs.
|
hypervisor-local devices, pass-thru devices, and IPIs between CPUs, as
|
||||||
|
shown in :numref:`hv-interrupt-init`:
|
||||||
|
|
||||||
IDT
|
.. figure:: images/interrupt-image66.png
|
||||||
---
|
|
||||||
|
|
||||||
The ACRN hypervisor builds its native Interrupt Descriptor Table (IDT) during
|
|
||||||
interrupt initialization. For exceptions, it links to function
|
|
||||||
``dispatch_exception``, and for external interrupts it links to function
|
|
||||||
``dispatch_interrupt``. Please refer to ``arch/x86/idt.S`` for more details.
|
|
||||||
|
|
||||||
LAPIC
|
|
||||||
-----
|
|
||||||
|
|
||||||
The ACRN hypervisor resets LAPIC for each CPU, and provides basic APIs
|
|
||||||
used, for example, by the local timer (TSC Deadline)
|
|
||||||
program and IPI notification program. These APIs include
|
|
||||||
write_laipic_reg32, send_lapic_eoi, send_startup_ipi, and
|
|
||||||
send_single_ipi.
|
|
||||||
|
|
||||||
|
|
||||||
.. comment
|
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
|
||||||
in hypervisor/include/arch/x86/lapic.h
|
|
||||||
|
|
||||||
PIC/IOAPIC
|
|
||||||
----------
|
|
||||||
|
|
||||||
The ACRN hypervisor masks all interrupts from PIC, so all the
|
|
||||||
legacy interrupts from PIC (<16) are linked to IOAPIC, as shown in
|
|
||||||
:numref:`interrupt-pic-pin`.
|
|
||||||
|
|
||||||
ACRN will pre-allocate vectors and mask them for these legacy interrupts
|
|
||||||
in IOAPIC RTE. For others (>= 16) ACRN will mask them with vector 0 in
|
|
||||||
RTE, and the vector will be dynamically allocated on demand.
|
|
||||||
|
|
||||||
.. figure:: images/interrupt-image5.png
|
|
||||||
:align: center
|
:align: center
|
||||||
:width: 600px
|
:name: hv-interrupt-init
|
||||||
:name: interrupt-pic-pin
|
|
||||||
|
|
||||||
PIC & IOAPIC Pin Connection
|
Physical Interrupt Initialization
|
||||||
|
|
||||||
Irq Desc
|
IDT Initialization
|
||||||
--------
|
==================
|
||||||
|
|
||||||
The ACRN hypervisor maintains a global ``irq_desc[]`` array shared among the
|
ACRN hypervisor builds its native IDT (interrupt descriptor table)
|
||||||
CPUs and uses a flat mode to manage the interrupts. The same
|
during interrupt initialization and sets up the following handlers:
|
||||||
vector is linked to the same IRQ number for all CPUs.
|
|
||||||
|
|
||||||
.. comment
|
- On an exception, the hypervisor dumps its context and halts the current
|
||||||
|
physical processor (because physical exceptions are not expected).
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
- For external interrupts, HV may mask the interrupt (depending on the
|
||||||
for ``struct irq_desc`` in hypervisor/include/common/irq.h
|
trigger mode), followed by interrupt acknowledgement and dispatch
|
||||||
|
to the registered handler, if any.
|
||||||
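The external-interrupt path described in the IDT initialization bullets can be sketched as below. This is a minimal, hardware-free illustration, not ACRN's actual code: `vector_slot`, `register_callback`, `dispatch_interrupt_sketch`, and the `mask_source`/`ack_interrupt` stand-ins are hypothetical names, and the rule "mask only level-triggered sources" is an assumption about what "depending on the trigger mode" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VECTORS 256U

typedef void (*irq_callback_t)(uint32_t vector, void *data);

/* Hypothetical per-vector bookkeeping; not ACRN's real data structure. */
struct vector_slot {
    irq_callback_t callback;   /* registered handler, or NULL if none */
    void *data;                /* private data handed to the handler */
    bool level_triggered;      /* assumed: only level sources get masked */
};

static struct vector_slot slots[NR_VECTORS];
static uint32_t masked_count;  /* counts simulated mask operations */
static uint32_t eoi_count;     /* counts simulated acknowledgements */

/* Stand-ins for the hardware accesses (IOAPIC pin mask, LAPIC EOI). */
static void mask_source(uint32_t vector) { (void)vector; masked_count++; }
static void ack_interrupt(void) { eoi_count++; }

/* Example callback: increments the counter passed as private data. */
static void count_callback(uint32_t vector, void *data)
{
    (void)vector;
    (*(uint32_t *)data)++;
}

static void register_callback(uint32_t vector, irq_callback_t callback,
                              void *data, bool level_triggered)
{
    assert((vector < NR_VECTORS) && (callback != NULL));
    slots[vector] = (struct vector_slot){
        .callback = callback,
        .data = data,
        .level_triggered = level_triggered,
    };
}

/* Mask (if needed), acknowledge, then call the registered handler, if any.
 * Returns true when a handler was called. */
static bool dispatch_interrupt_sketch(uint32_t vector)
{
    if (vector >= NR_VECTORS) {
        return false;
    }

    struct vector_slot *slot = &slots[vector];

    if (slot->level_triggered) {
        mask_source(vector);
    }
    ack_interrupt();

    if (slot->callback == NULL) {
        return false;
    }
    slot->callback(vector, slot->data);
    return true;
}
```

In this sketch an interrupt with no registered handler is still acknowledged, so an unclaimed vector cannot block later interrupts.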
|
|
||||||
|
Most interrupts and exceptions are handled without a stack switch,
|
||||||
|
except for machine-check, double fault, and stack fault exceptions which
|
||||||
|
have their own stack set in TSS.
|
||||||
|
|
||||||
The ``irq_desc[]`` array is indexed by the IRQ number. An
|
PIC/IOAPIC Initialization
|
||||||
``irq_handler`` field can be set to a common edge, level, or quick
|
|
||||||
handler called from ``interrupt_dispatch``. The ``irq_desc`` structure
|
|
||||||
also contains the ``dev_list`` field to maintain this IRQ's action
|
|
||||||
handler list.
|
|
||||||
|
|
||||||
The global array ``vector_to_irq[]`` is used to manage the vector
|
|
||||||
resource. This array is initialized with value ``IRQ_INVALID`` for all
|
|
||||||
vectors, and will be set to a valid IRQ number after the corresponding
|
|
||||||
vector is registered.
|
|
||||||
|
|
||||||
For example, if the local timer registers interrupt with IRQ number 271 and
|
|
||||||
vector 0xEF, then the arrays mentioned above will be set to::
|
|
||||||
|
|
||||||
irq_desc[271].irq = 271;
|
|
||||||
irq_desc[271].vector = 0xEF;
|
|
||||||
vector_to_irq[0xEF] = 271;
|
|
||||||
|
|
||||||
Physical Interrupt Flow
|
|
||||||
=======================
|
|
||||||
|
|
||||||
|
|
||||||
When an physical interrupt occurs, and the CPU is running under VMX root
|
|
||||||
mode, the interrupt is triggered from the standard native irq flow:
|
|
||||||
interrupt gate to irq handler. However, if the CPU is running under VMX
|
|
||||||
non-root mode, an external interrupt will trigger a VM exit for reason
|
|
||||||
"external-interrupt". See :numref:`interrupt-handle-flow`.
|
|
||||||
|
|
||||||
.. figure:: images/interrupt-image4.png
|
|
||||||
:align: center
|
|
||||||
:width: 800px
|
|
||||||
:name: interrupt-handle-flow
|
|
||||||
|
|
||||||
ACRN Hypervisor Interrupt Handle Flow
|
|
||||||
|
|
||||||
After an interrupt happens (in either case noted above), the ACRN
|
|
||||||
hypervisor jumps to ``dispatch_interrupt``. This function will check
|
|
||||||
which vector caused this interrupt, and the corresponding ``irq_desc``
|
|
||||||
structure's ``irq_handler`` will be called for the service.
|
|
||||||
|
|
||||||
There are several irq_handler's defined in the ACRN hypervisor, as shown
|
|
||||||
in :numref:`interrupt-handle-flow`, designed for different uses. For
|
|
||||||
example, ``quick_handler_nolock`` is used when no critical data needs
|
|
||||||
protection in the action handlers; the VCPU notification IPI and local
|
|
||||||
timer are good example of this use case.
|
|
||||||
|
|
||||||
The more complicated ``common_dev_handler_level`` handler is intended
|
|
||||||
for pass-thru devices with level triggered interrupts. To avoid
|
|
||||||
continuously triggering the interrupt, it initially masks IOAPIC pin and
|
|
||||||
unmasks it only when the corresponding vIOAPIC pin gets an explicit EOI
|
|
||||||
ACK from the guest.
|
|
||||||
|
|
||||||
All the irq handler's finally call their own action handler list, as
|
|
||||||
shown here:
|
|
||||||
|
|
||||||
.. code-block: c
|
|
||||||
|
|
||||||
struct dev_handler_node \*dev = desc->dev_list;
|
|
||||||
while (dev != NULL) {
|
|
||||||
if (dev->dev_handler != NULL)
|
|
||||||
dev->dev_handler(desc->irq, dev->dev_data);
|
|
||||||
dev = dev->next;
|
|
||||||
}
|
|
||||||
|
|
||||||
The common APIs for registering, updating, and unregistering
|
|
||||||
interrupt handlers include irq_to_vector, dev_to_irq, dev_to_vector,
|
|
||||||
pri_register_handler, normal_register_handler,
|
|
||||||
unregister_handler_common, and update_irq_handler.
|
|
||||||
|
|
||||||
.. comment
|
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
|
||||||
in hypervisor/include/common/irq.h
|
|
||||||
|
|
||||||
.. _physical_interrupt_source:
|
|
||||||
|
|
||||||
Physical Interrupt Source
|
|
||||||
=========================
|
=========================
|
||||||
|
|
||||||
The ACRN hypervisor handles interrupts from many different sources, as
|
ACRN hypervisor masks all interrupts from the PIC. All legacy interrupts
|
||||||
shown in :numref:`interrupt-source`:
|
from PIC (<16) will be linked to IOAPIC, as shown in the connections in
|
||||||
|
:numref:`hv-pic-config`.
|
||||||
|
|
||||||
|
ACRN will pre-allocate vectors and mask them for these legacy interrupts
|
||||||
|
in IOAPIC RTE. For others (>= 16), ACRN will mask them with vector 0 in
|
||||||
|
RTE, and the vector will be dynamically allocated on demand.
|
||||||
|
|
||||||
.. list-table:: Physical Interrupt Source
|
All external IOAPIC pins are categorized as GSI interrupts according to the
|
||||||
:widths: 15 10 60
|
ACPI definition. HV supports multiple IOAPIC components. IRQ PIN to GSI
|
||||||
|
mappings are maintained internally to determine GSI source IOAPIC.
|
||||||
|
Native PIC is not used in the system.
|
||||||
|
|
||||||
|
.. figure:: images/interrupt-image46.png
|
||||||
|
:align: center
|
||||||
|
:name: hv-pic-config
|
||||||
|
|
||||||
|
HV PIC/IOAPIC/LAPIC configuration
|
||||||
|
|
||||||
|
LAPIC Initialization
|
||||||
|
====================
|
||||||
|
|
||||||
|
Physical LAPICs are in xAPIC mode in ACRN hypervisor. The hypervisor
|
||||||
|
initializes LAPIC for each physical CPU by masking all interrupts in the
|
||||||
|
local vector table (LVT), clearing all ISRs, and enabling LAPIC.
|
||||||
|
|
||||||
|
APIs are provided to access LAPIC for the other components in the
|
||||||
|
hypervisor, for use by, for example, the local timer (TSC Deadline)
|
||||||
|
program, IPI notification program, etc. See :ref:`hv_interrupt-data-api`
|
||||||
|
for a complete list.
|
||||||
|
|
||||||
|
HV Interrupt Vectors and Delivery Mode
|
||||||
|
======================================
|
||||||
|
|
||||||
|
The interrupt vectors are assigned as shown here:
|
||||||
|
|
||||||
|
**Vector: 0x0-0x1F**
|
||||||
|
are exceptions that are not handled by HV. If
|
||||||
|
such an exception does occur, the system then halts.
|
||||||
|
|
||||||
|
**Vector: 0x20-0x2F**
|
||||||
|
are allocated statically for legacy IRQ0-15.
|
||||||
|
|
||||||
|
**Vector: 0x30-0xDF**
|
||||||
|
are dynamically allocated vectors for PCI devices
|
||||||
|
INTx or MSI/MSI-X usage. Depending on the interrupt delivery mode
|
||||||
|
(FLAT or PER_CPU mode), an interrupt will be assigned to a vector for
|
||||||
|
all the CPUs or a particular CPU.
|
||||||
|
|
||||||
|
**Vector: 0xE0-0xFE**
|
||||||
|
are high priority vectors reserved by HV for
|
||||||
|
dedicated purposes. For example, 0xEF is used for timer, 0xF0 is used
|
||||||
|
for IPI.
|
||||||
|
|
||||||
|
.. list-table::
|
||||||
|
:widths: 30 70
|
||||||
:header-rows: 1
|
:header-rows: 1
|
||||||
:name: interrupt-source
|
|
||||||
|
|
||||||
* - Interrupt Source
|
* - Vectors
|
||||||
- Vector
|
- Usage
|
||||||
- Description
|
|
||||||
* - TSC Deadline Timer
|
|
||||||
- 0xEF
|
|
||||||
- The TSC deadline timer implements the timer framework in
|
|
||||||
the hypervisor based on the LAPIC TSC deadline. This interrupt's
|
|
||||||
target is specific to the CPU to which the LAPIC belongs.
|
|
||||||
* - CPU Startup IPI
|
|
||||||
- N/A
|
|
||||||
- The BSP needs to trigger an INIT-SIPI sequence to wake up the
|
|
||||||
APs. This interrupt's target is specified by the BSP calling
|
|
||||||
`` start_cpus()``.
|
|
||||||
* - VCPU Notify IPI
|
|
||||||
- 0xF0
|
|
||||||
- When the hypervisor needs to kick the VCPU out of VMX non-root
|
|
||||||
mode to do requests such as virtual interrupt injection, EPT
|
|
||||||
flush, etc. This interrupt's target is specified by function
|
|
||||||
``send_single_ipi()``.
|
|
||||||
* - IOMMU MSI
|
|
||||||
- dynamic
|
|
||||||
- IOMMU device supports an MSI interrupt. The vtd device driver in
|
|
||||||
the hypervisor will register an interrupt to handle dmar fault.
|
|
||||||
This interrupt's target is specified by vtd device driver.
|
|
||||||
* - PTdev INTx
|
|
||||||
- dynamic
|
|
||||||
- All native devices are owned by the guest (SOS or UOS), taking
|
|
||||||
advantage of the pass-thru method. Each pass-thru device connected
|
|
||||||
with IOAPIC/PIC (PTdev INTx) will register an interrupt when
|
|
||||||
its attached interrupt controller pin first gets unmasked.
|
|
||||||
This interrupt's target is defined by and RTE entry in the IOAPIC.
|
|
||||||
* - PTdev MSI
|
|
||||||
- dynamic
|
|
||||||
- All native devices are owned by the guest (SOS or UOS), taking
|
|
||||||
advantage of pass-thru method. Each pass-thru device with
|
|
||||||
enabled MSI (PTdev MSI) will register an interrupt when the SOS
|
|
||||||
does an explicit hypercall. This interrupt's target is defined
|
|
||||||
by an MSI address entry.
|
|
||||||
|
|
||||||
Softirq
|
* - 0x0-0x13
|
||||||
=======
|
- Exceptions: NMI, INT3, page fault, GP, debug.
|
||||||
|
|
||||||
ACRN hypervisor implements a simple bottom-half softirq to execute the
|
* - 0x14-0x1F
|
||||||
interrupt handler, as showed in :numref:`interrupt-handle-flow`.
|
- Reserved
|
||||||
The softirq is executed when an interrupt is enabled. Several APIs for softirq
|
|
||||||
are defined including enable_softirq, disable_softirq, raise_softirq,
|
|
||||||
and exec_softirq.
|
|
||||||
|
|
||||||
.. comment
|
* - 0x20-0x2F
|
||||||
|
- Statically allocated for external IRQ (IRQ0-IRQ15)
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
* - 0x30-0xDF
|
||||||
in hypervisor/include/common/softirq.h
|
- Dynamically allocated for IOAPIC IRQ from PCI INTx/MSI
|
||||||
|
|
||||||
Physical Exception Handling
|
* - 0xE0-0xFE
|
||||||
===========================
|
- Statically allocated for HV
|
||||||
|
|
||||||
As mentioned earlier, the ACRN hypervisor does not handle any
|
* - 0xEF
|
||||||
physical exceptions. The VMX root mode code path should guarantee no
|
- Timer
|
||||||
exceptions are triggered while the hypervisor is running.
|
|
||||||
|
|
||||||
Guest Virtual Interrupt Management
|
* - 0xF0
|
||||||
**********************************
|
- IPI
|
||||||
|
|
||||||
The previous sections describe physical interrupt management in the ACRN
|
* - 0xFF
|
||||||
hypervisor. After a physical interrupt happens, a registered action
|
- SPURIOUS_APIC_VECTOR
|
||||||
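The vector ranges in the table above can be written as a small lookup. This is only a sketch for illustration: `vector_class`, `classify_vector`, and `legacy_irq_to_vector` are hypothetical names, and the linear mapping of legacy IRQ *n* to vector 0x20 + *n* is an assumption (the text only says that 0x20-0x2F are statically allocated for IRQ0-15).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification; the ranges are taken from the vector table. */
enum vector_class {
    VECTOR_EXCEPTION,   /* 0x00-0x13: exceptions */
    VECTOR_RESERVED,    /* 0x14-0x1F: reserved */
    VECTOR_LEGACY_IRQ,  /* 0x20-0x2F: static, legacy IRQ0-IRQ15 */
    VECTOR_DYNAMIC,     /* 0x30-0xDF: dynamic, PCI INTx/MSI */
    VECTOR_HV_STATIC,   /* 0xE0-0xFE: static, reserved for HV (timer, IPI) */
    VECTOR_SPURIOUS     /* 0xFF: spurious APIC vector */
};

/* Map a vector number to the range it falls in. */
static enum vector_class classify_vector(uint8_t vector)
{
    if (vector <= 0x13U) { return VECTOR_EXCEPTION; }
    if (vector <= 0x1FU) { return VECTOR_RESERVED; }
    if (vector <= 0x2FU) { return VECTOR_LEGACY_IRQ; }
    if (vector <= 0xDFU) { return VECTOR_DYNAMIC; }
    if (vector <= 0xFEU) { return VECTOR_HV_STATIC; }
    return VECTOR_SPURIOUS;
}

/* Assumed linear mapping: legacy IRQ n uses vector 0x20 + n. */
static uint8_t legacy_irq_to_vector(uint8_t irq)
{
    assert(irq < 16U);
    return (uint8_t)(0x20U + irq);
}
```

With these ranges, the timer vector 0xEF and the IPI vector 0xF0 both classify as `VECTOR_HV_STATIC`.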
handler is executed. Usually, the action handler represents a service
|
|
||||||
for virtual interrupt injection. For example, if an interrupt is
|
|
||||||
triggered from a pass-thru device, the appropriate virtual interrupt
|
|
||||||
should be injected into its guest VM.
|
|
||||||
|
|
||||||
The virtual interrupt injection could also come from an emulated device.
|
Interrupts from either IOAPIC or MSI can be delivered to a target CPU.
|
||||||
The I/O mediator in the Service OS (SOS) could trigger an interrupt
|
By default they are configured as Lowest Priority (FLAT mode), i.e. they
|
||||||
through a hypercall, and then do the virtual interrupt injection in the
|
are delivered to a CPU core that is currently idle or executing the lowest
|
||||||
hypervisor.
|
priority ISR. There is no guarantee a device's interrupt will be
|
||||||
|
delivered to a specific Guest's CPU. Timer interrupts are an exception -
|
||||||
|
these are always delivered to the CPU which programs the LAPIC timer.
|
||||||
|
|
||||||
The following sections give an introduction to the ACRN guest virtual
|
There are two interrupt delivery modes: FLAT mode and PER_CPU mode. ACRN
|
||||||
interrupt management, including VCPU request for virtual interrupt kick
|
uses FLAT MODE where the interrupt/irq to vector mapping is the same on all CPUs. Every
|
||||||
off, vPIC/vIOAPIC/vLAPIC for virtual interrupt injection interfaces,
|
CPU receives the same interrupts. IOAPIC and LAPIC MSI delivery modes are
|
||||||
physical-to-virtual interrupt mapping for a pass-thru device, and the
|
configured to Lowest Priority.
|
||||||
process of VMX interrupt/exception injection.
|
|
||||||
|
|
||||||
VCPU Request
|
Vector allocation for CPUs is shown here:
|
||||||
============
|
|
||||||
|
|
||||||
As mentioned in `physical_interrupt_source`_, physical vector 0xF0 is
|
.. figure:: images/interrupt-image89.png
|
||||||
used to kick the VCPU out of its VMX non-root mode, and make a request
|
|
||||||
for virtual interrupt injection or other requests such as flush EPT.
|
|
||||||
|
|
||||||
The request-make API (vcpu_make_request) and eventid supports virtual interrupt
|
|
||||||
injection.
|
|
||||||
|
|
||||||
.. comment
|
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
|
||||||
in hypervisor/include/common/irq.h
|
|
||||||
|
|
||||||
There are requests for exception injection (ACRN_REQUEST_EXCP), vLAPIC
|
|
||||||
event (ACRN_REQUEST_EVENT), external interrupt from vPIC
|
|
||||||
(ACRN_REQUEST_EXTINT) and non-maskable-interrupt (ACRN_REQUEST_NMI).
|
|
||||||
|
|
||||||
The ``vcpu_make_request`` is necessary for a virtual interrupt
|
|
||||||
injection. If the target VCPU is running under VMX non-root mode, it
|
|
||||||
will send an IPI to kick it out and results in an external-interrupt
|
|
||||||
VM-Exit. The flow of :numref:`interrupt-handle-flow` could be executed
|
|
||||||
to complete the injection of a virtual interrupt.
|
|
||||||
|
|
||||||
There are some cases that do not need to send an IPI when making a
|
|
||||||
request because the CPU making the request is the target VCPU. For
|
|
||||||
example, the #GP exception request always happens on the current CPU
|
|
||||||
when an invalid emulation happens. An external interrupt for a pass-thru
|
|
||||||
device always happens on the VCPUs the device belongs to, so after it
|
|
||||||
triggers an external-interrupt VM-Exit, the current CPU is also the
|
|
||||||
target VCPU.
|
|
||||||
|
|
||||||
Virtual PIC
|
|
||||||
===========
|
|
||||||
|
|
||||||
The ACRN hypervisor emulates a vPIC for each VM based on IO ranges
|
|
||||||
0x20-0x21, 0xa0-0xa1, or 0x4d0-0x4d1.
|
|
||||||
|
|
||||||
If an interrupt source from vPIC needs to inject an interrupt,
|
|
||||||
the vpic_assert_irq, vpic_deassert_irq, or vpic_pulse_irq functions can
|
|
||||||
be called to make a request for ACRN_REQUEST_EXTINT or
|
|
||||||
ACRN_REQUEST_EVENT:
|
|
||||||
|
|
||||||
.. comment
|
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
|
||||||
in hypervisor/include/common/vpic.h
|
|
||||||
|
|
||||||
The vpic_pending_intr and vpic_intr_accepted APIs are used to query the
|
|
||||||
vector being injected and ACK the service, by moving the interrupt from
|
|
||||||
request service (IRR) to in service (ISR).
|
|
||||||
|
|
||||||
|
|
||||||
Virtual IOAPIC
|
|
||||||
==============
|
|
||||||
|
|
||||||
ACRN hypervisor emulates a vIOAPIC for each VM based on MMIO
|
|
||||||
VIOAPIC_BASE.
|
|
||||||
|
|
||||||
If an interrupt source from vIOAPIC needs to inject an interrupt, the
|
|
||||||
vioapic_assert_irq, vioapic_dessert_irq, and vioapic_pulse_irq APIs are
|
|
||||||
used to make a request for ACRN_REQUEST_EVENT.
|
|
||||||
|
|
||||||
As the vIOAPIC is always associated with a vLAPIC, the virtual interrupt
|
|
||||||
injection from vIOAPIC will finally trigger a request for an vLAPIC
|
|
||||||
event.
|
|
||||||
|
|
||||||
Virtual LAPIC
|
|
||||||
=============
|
|
||||||
|
|
||||||
The ACRN hypervisor emulates a vLAPIC for each VCPU based on MMIO
|
|
||||||
DEFAULT_APIC_BASE.
|
|
||||||
|
|
||||||
If an interrupt source from vLAPIC needs to inject an interrupt (e.g.,
|
|
||||||
from LVT such as an LAPIC timer, from vIOAPIC for a pass-thru device
|
|
||||||
interrupt, or from an emulated device for a MSI), vlapic_intr_level,
|
|
||||||
vlapic_intr_edge, vlapic_set_local_intr, vlapic_intr_msi,
|
|
||||||
vlapic_deliver_intr APIs need to be called, resulting in a request for
|
|
||||||
ACRN_REQUEST_EVENT.
|
|
||||||
|
|
||||||
.. comment
|
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
|
||||||
in hypervisor/include/common/vlapic.h
|
|
||||||
|
|
||||||
|
|
||||||
The vlapic_pending_intr and vlapic_intr_accepted APIs are used to query
|
|
||||||
the vector that needs to be injected and ACK
|
|
||||||
the service that move the interrupt from request service (IRR) to in
|
|
||||||
service (ISR).
|
|
||||||
|
|
||||||
By default, the ACRN hypervisor enables vAPIC to improve the performance of
|
|
||||||
a vLAPIC emulation.
|
|
||||||
|
|
||||||
Virtual Exception
|
|
||||||
=================
|
|
||||||
|
|
||||||
When doing emulation, an exception may be triggered in the hypervisor,
|
|
||||||
for example, if guest accesses an invalid vMSR register, or the
|
|
||||||
hypervisor needs to inject a #GP, or during instruction emulation, an
|
|
||||||
instruction fetch may access a non-exist page from rip_gva, and a #PF
|
|
||||||
must be injected.
|
|
||||||
|
|
||||||
ACRN hypervisor implements virtual exception injection using the
|
|
||||||
vcpu_queue_exception, vcpu_inject_gq, and vcpu_inject_pf APIs.
|
|
||||||
|
|
||||||
.. comment
|
|
||||||
|
|
||||||
Need reference to API doc generated from doxygen comments
|
|
||||||
in hypervisor/include/common/irq.h
|
|
||||||
|
|
||||||
The ACRN hypervisor uses vcpu_inject_gp/vcpu_inject_pf functions to
|
|
||||||
queue exception requests, and follows `Intel Software
|
|
||||||
Developer Manual, Vol 3. <SDM vol3>`_ - 6.15, Table 6-5
|
|
||||||
listing conditions for generating a double fault.
|
|
||||||
|
|
||||||
.. _SDM vol3: https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html
|
|
||||||
|
|
||||||
Interrupt Mapping for a Pass-thru Device
|
|
||||||
========================================
|
|
||||||
|
|
||||||
A VM can control a PCI device directly through pass-thru device
|
|
||||||
assignment. The pass-thru entry is the major info object, and it is:
|
|
||||||
|
|
||||||
- A physical interrupt source, and could be a MSI/MSIX entry, PIC pins, or
|
|
||||||
IOAPIC pins
|
|
||||||
- Pass-thru remapping information between physical and virtual interrupt
|
|
||||||
source, for MSI/MSIX it is identified by a PCI device's BDF. For
|
|
||||||
PIC/IOAPIC it is identified by the pin number.
|
|
||||||
|
|
||||||
.. figure:: images/interrupt-image7.png
|
|
||||||
:align: center
|
:align: center
|
||||||
:width: 600px
|
|
||||||
:name: interrupt-pass-thru
|
|
||||||
|
|
||||||
Pass-thru Device Entry Assignment
|
FLAT mode vector allocation
|
||||||
|
|
||||||
As shown in :numref:`interrupt-pass-thru` above, a UOS will assign its
|
IRQ Descriptor Table
|
||||||
pass-thru device entry by the DM, and it will fill its entry info from:
|
====================
|
||||||
|
|
||||||
- vPIC/vIOAPIC interrupt mask/unmask
|
ACRN hypervisor maintains a global IRQ Descriptor Table shared among the
|
||||||
- MSI IOReq from UOS then MSI hypercall from SOS
|
physical CPUs. ACRN uses FLAT MODE to manage the interrupts so the
|
||||||
|
same vector will link to the same IRQ number for all CPUs.
|
||||||
|
|
||||||
The SOS adds its pass-thru device entry at runtime and fills info for:
|
.. note:: need to reference API doc for irq_desc
|
||||||
|
|
||||||
- vPIC/vIOAPIC interrupt mask/unmask
|
|
||||||
- MSI hypercall from SOS
|
|
||||||
|
|
||||||
During the pass-thru device entry info filling, the hypervisor builds
|
The *irq_desc[]* array's index represents the IRQ number. An *irq_handler*
|
||||||
native IOAPIC RTE/MSI entry based on vIOAPIC/vPIC/vMSI configuration,
|
field can be set to a common edge, level, or quick handler, which is
|
||||||
and register the physical interrupt handler for it. Then with the pass-thru
|
called from *interrupt_dispatch*. The *irq_desc* structure also
|
||||||
device entry as the handler private data, the physical interrupt can
|
contains the *dev_list* field to maintain this IRQ's action handler
|
||||||
be linked to a virtual pin of a guest's vPIC/vIOAPIC or virtual vector of
|
list.
|
||||||
a guest's vMSI. The handler then injects the corresponding virtual
|
|
||||||
interrupt into the guest, based on vPIC/vIOAPIC/vLAPIC APIs described
|
|
||||||
earlier.
|
|
||||||
|
|
||||||
Interrupt Storm Mitigation
|
Another reverse mapping from vector to IRQ is used in addition to the
|
||||||
==========================
|
IRQ descriptor table which maintains the mapping from IRQ to vector.
|
||||||
|
|
||||||
When the Device Model (DM) launches a User OS (UOS), the ACRN hypervisor
|
On initialization, the descriptors of the legacy IRQs are initialized with
|
||||||
will remap the interrupt for this user OS's pass-through devices. When
|
proper vectors and the corresponding reverse mapping is set up.
|
||||||
an interrupt occurs for a pass-through device, the CPU core is assigned
|
The descriptors of other IRQs are filled with an invalid
|
||||||
to that User OS gets trapped into the hypervisor. The benefit of such a
|
vector which will be updated on IRQ allocation.
|
||||||
mechanism is that, should an interrupt storm happen in a particular UOS,
|
|
||||||
it will have only a minimal effect on the performance of the Service OS.
|
|
||||||
|
|
||||||
Interrupt/Exception Injection Process
|
For example, if the local timer registers an interrupt with IRQ number 271 and
|
||||||
=====================================
|
vector 0xEF, then this data will be set up:
|
||||||
|
|
||||||
As shown in :numref:`interrupt-handle-flow`, the ACRN hypervisor injects
|
.. code-block:: c
|
||||||
virtual interrupt/exception to the guest before its VM-Entry.
|
|
||||||
|
|
||||||
This is done by updating the VMX_ENTRY_INT_INFO_FIELD of the VCPU's
|
irq_desc[271].irq = 271;
|
||||||
VMCS. As this field is unique, the interrupt/exception injection must
|
irq_desc[271].vector = 0xEF;
|
||||||
follow a priority rule to handle one-by-one.
|
vector_to_irq[0xEF] = 271;
|
||||||
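The two mappings described in the IRQ Descriptor Table section (IRQ to vector in `irq_desc[]`, vector to IRQ in `vector_to_irq[]`) can be sketched as follows. This is an illustration under stated assumptions, not the hypervisor's implementation: the table sizes, the `VECTOR_INVALID` marker, the helper names `init_irq_tables` and `bind_irq_to_vector`, and the legacy mapping of IRQ *n* to vector 0x20 + *n* are all hypothetical. Only the `irq` and `vector` fields, the `IRQ_INVALID` name, and the IRQ 271 / vector 0xEF example come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define NR_IRQS         272U         /* assumed: just large enough for IRQ 271 */
#define NR_VECTORS      256U
#define NR_LEGACY_IRQS  16U
#define IRQ_INVALID     0xFFFFFFFFU
#define VECTOR_INVALID  0xFFFFFFFFU  /* hypothetical "no vector yet" marker */

/* Reduced descriptor: only the two fields used in the example above. */
struct irq_desc_sketch {
    uint32_t irq;
    uint32_t vector;
};

static struct irq_desc_sketch irq_desc[NR_IRQS];
static uint32_t vector_to_irq[NR_VECTORS];

/* Legacy IRQs get fixed vectors and a reverse mapping; every other
 * descriptor starts with an invalid vector until it is allocated. */
static void init_irq_tables(void)
{
    for (uint32_t v = 0U; v < NR_VECTORS; v++) {
        vector_to_irq[v] = IRQ_INVALID;
    }
    for (uint32_t irq = 0U; irq < NR_IRQS; irq++) {
        irq_desc[irq].irq = irq;
        irq_desc[irq].vector = VECTOR_INVALID;
    }
    for (uint32_t irq = 0U; irq < NR_LEGACY_IRQS; irq++) {
        uint32_t vector = 0x20U + irq;  /* assumed linear legacy mapping */
        irq_desc[irq].vector = vector;
        vector_to_irq[vector] = irq;
    }
}

/* Record the forward and reverse mapping together.
 * Returns 0 on success, -1 if either side is out of range or taken. */
static int bind_irq_to_vector(uint32_t irq, uint32_t vector)
{
    if ((irq >= NR_IRQS) || (vector >= NR_VECTORS)) {
        return -1;
    }
    if ((vector_to_irq[vector] != IRQ_INVALID) ||
        (irq_desc[irq].vector != VECTOR_INVALID)) {
        return -1;
    }
    irq_desc[irq].irq = irq;
    irq_desc[irq].vector = vector;
    vector_to_irq[vector] = irq;

    /* Invariant: the reverse mapping points back at the same IRQ. */
    assert(vector_to_irq[irq_desc[irq].vector] == irq);
    return 0;
}
```

Calling `init_irq_tables()` and then `bind_irq_to_vector(271U, 0xEFU)` leaves the three entries in the state shown in the example. Updating both tables in one helper keeps the forward and reverse mappings from drifting apart.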
|
|
||||||
:numref:`interrupt-injection` below shows the rules about how to inject
|
External Interrupt Handling
|
||||||
virtual interrupt/exception one-by-one. If a high priority
|
***************************
|
||||||
interrupt/exception was already injected, the next pending
|
|
||||||
interrupt/exception will enable an interrupt window where the next
|
|
||||||
injection will be done by the following VM-Exit, triggered by the
|
|
||||||
interrupt window.
|
|
||||||
|
|
||||||
.. figure:: images/interrupt-image6.png
|
CPU runs under VMX non-root mode and inside Guest VMs.
|
||||||
|
``MSR_IA32_VMX_PINBASED_CTLS.bit[0]`` and
|
||||||
|
``MSR_IA32_VMX_EXIT_CTLS.bit[15]`` are set to allow vCPU VM Exit to HV
|
||||||
|
whenever there are interrupts to that physical CPU under
|
||||||
|
non-root mode. HV ACKs the interrupts in VMX non-root and saves the
|
||||||
|
interrupt vector to the relevant VM Exit field for HV IRQ processing.
|
||||||
|
|
||||||
|
Note that as discussed above, an external interrupt causing vCPU VM Exit
|
||||||
|
to HV does not mean that the interrupt belongs to that Guest VM. When
|
||||||
|
CPU executes VM Exit into root-mode, interrupt handling will be enabled
|
||||||
|
and the interrupt will be delivered and processed as quickly as possible
|
||||||
|
inside HV. HV may emulate a virtual interrupt and inject to Guest if
|
||||||
|
necessary.
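
As a minimal sketch of the two control bits named above, the snippet below checks that external-interrupt exiting and acknowledge-on-exit are both set, then extracts the vector from a VM-exit interruption-information value. The macro and function names are hypothetical and not taken from the ACRN source; the field layout (valid flag in bit 31, vector in bits 7:0) follows the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions named in the text; the macro names are hypothetical. */
#define PINBASED_EXT_INT_EXITING   (1U << 0)   /* pin-based controls, bit 0 */
#define EXIT_CTLS_ACK_INT_ON_EXIT  (1U << 15)  /* VM-exit controls, bit 15 */

/* VM-exit interruption-information layout (Intel SDM). */
#define EXIT_INT_INFO_VALID        (1U << 31)
#define EXIT_INT_INFO_VECTOR_MASK  0xFFU

/* True when a physical interrupt in non-root mode exits to the hypervisor
 * and the vector is acknowledged and saved on that exit. */
static bool ext_int_exiting_enabled(uint32_t pin_ctls, uint32_t exit_ctls)
{
    return ((pin_ctls & PINBASED_EXT_INT_EXITING) != 0U) &&
           ((exit_ctls & EXIT_CTLS_ACK_INT_ON_EXIT) != 0U);
}

/* Extract the saved vector; returns false if the field is not valid. */
static bool exit_int_info_to_vector(uint32_t int_info, uint32_t *vector)
{
    assert(vector != NULL);
    if ((int_info & EXIT_INT_INFO_VALID) == 0U) {
        return false;
    }
    *vector = int_info & EXIT_INT_INFO_VECTOR_MASK;
    return true;
}
```

The vector obtained this way belongs to the physical CPU, not necessarily to the guest that was running, which is why it is handed to the hypervisor's own IRQ processing.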
|
||||||
|
|
||||||
|
When a physical interrupt occurs on a CPU, that CPU could be running
|
||||||
|
under VMX root mode or non-root mode. If the CPU is running under VMX
|
||||||
|
root mode, the interrupt is triggered from standard native IRQ flow -
|
||||||
|
interrupt gate to IRQ handler. If the CPU is running under VMX non-root
|
||||||
|
mode, an external interrupt will trigger a VM exit for reason
|
||||||
|
"external-interrupt".
|
||||||
|
|
||||||
|
Interrupt and IRQ processing flow diagrams are shown below:
|
||||||
|
|
||||||
|
.. figure:: images/interrupt-image48.png
|
||||||
:align: center
|
:align: center
|
||||||
:width: 600px
|
:name: phy-interrupt-processing
|
||||||
:name: interrupt-injection
|
|
||||||
|
|
||||||
ACRN Hypervisor Interrupt/Exception Injection Process
|
Processing of physical interrupts
|
||||||
|
|
||||||
|
.. figure:: images/interrupt-image39.png
|
||||||
|
:align: center
|
||||||
|
|
||||||
|
IRQ processing control flow
|
||||||
|
|
||||||
|
When a physical interrupt is raised and delivered to a physical CPU, the
|
||||||
|
CPU may be running under either VMX root mode or non-root mode.
|
||||||
|
|
||||||
|
- If the CPU is running under VMX root mode, the interrupt is handled
|
||||||
|
following the standard native IRQ flow: interrupt gate to
|
||||||
|
dispatch_interrupt(), IRQ handler, and finally the registered callback.
|
||||||
|
- If the CPU is running under VMX non-root mode, an external interrupt
|
||||||
|
causes a VM exit for reason "external-interrupt", and then the VM
|
||||||
|
exit processing flow will call dispatch_interrupt() to dispatch and
|
||||||
|
handle the interrupt.
|
||||||
|
|
||||||
|
After an interrupt occurs from either path shown in
|
||||||
|
:numref:`phy-interrupt-processing`, ACRN hypervisor will jump to
|
||||||
|
dispatch_interrupt. This function gets the vector of the generated
|
||||||
|
interrupt from the context, gets IRQ number from vector_to_irq[], and
|
||||||
|
then gets the corresponding irq_desc.
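
The lookup chain described above (vector, then ``vector_to_irq[]``, then ``irq_desc``) can be sketched as follows. This is an illustration under stated assumptions rather than the ACRN implementation: the table sizes, the ``IRQ_INVALID`` marker, the ``irq_action_t`` signature, and the helper names ``init_tables``, ``bind_irq``, ``dispatch_vector``, and ``count_action`` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_IRQS      272U         /* hypothetical: large enough for IRQ 271 */
#define NR_VECTORS   256U
#define IRQ_INVALID  0xFFFFFFFFU  /* hypothetical "no IRQ bound" marker */

/* Assumed callback shape; the real irq_action_t may differ. */
typedef void (*irq_action_t)(uint32_t irq, void *priv_data);

struct irq_desc {
    uint32_t irq;
    uint32_t vector;
    irq_action_t action;
    void *priv_data;
};

static struct irq_desc irq_desc[NR_IRQS];
static uint32_t vector_to_irq[NR_VECTORS];

/* Mark every vector as unbound. */
static void init_tables(void)
{
    for (uint32_t v = 0U; v < NR_VECTORS; v++) {
        vector_to_irq[v] = IRQ_INVALID;
    }
}

/* Set up the forward (irq -> vector) and reverse (vector -> irq) mapping. */
static void bind_irq(uint32_t irq, uint32_t vector,
                     irq_action_t action, void *priv_data)
{
    assert((irq < NR_IRQS) && (vector < NR_VECTORS));
    irq_desc[irq].irq = irq;
    irq_desc[irq].vector = vector;
    irq_desc[irq].action = action;
    irq_desc[irq].priv_data = priv_data;
    vector_to_irq[vector] = irq;
}

/* Returns 0 if an action ran, -1 if the vector has no registered IRQ. */
static int dispatch_vector(uint32_t vector)
{
    if (vector >= NR_VECTORS) {
        return -1;
    }
    uint32_t irq = vector_to_irq[vector];
    if (irq == IRQ_INVALID) {
        return -1;
    }
    struct irq_desc *desc = &irq_desc[irq];
    if (desc->action == NULL) {
        return -1;
    }
    desc->action(desc->irq, desc->priv_data);
    return 0;
}

/* Example action: count how many times the IRQ fired. */
static void count_action(uint32_t irq, void *priv_data)
{
    (void)irq;
    (*(uint32_t *)priv_data)++;
}
```

With the local timer values used earlier (IRQ 271, vector 0xEF), ``bind_irq(271U, 0xEFU, count_action, &hits)`` followed by ``dispatch_vector(0xEFU)`` runs the callback once.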
|
||||||
|
|
||||||
|
Though there is only one generic IRQ handler for registered interrupts,
|
||||||
|
there are three different handling flows according to flags:
|
||||||
|
|
||||||
|
- ``!IRQF_LEVEL``
|
||||||
|
- ``IRQF_LEVEL && !IRQF_PT``
|
||||||
|
|
||||||
|
To avoid continuous interrupt triggers, it masks the IOAPIC pin and
|
||||||
|
unmasks it only after the IRQ action callback is executed.
|
||||||
|
|
||||||
|
- ``IRQF_LEVEL && IRQF_PT``
|
||||||
|
|
||||||
|
For pass-thru devices, to avoid continuous interrupt triggers, it masks
|
||||||
|
the IOAPIC pin and leaves it masked until the corresponding vIOAPIC
|
||||||
|
pin gets an explicit EOI ACK from guest.
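
The three flows can be sketched with a small model of one IOAPIC pin. The flag bit values, ``struct pin_state``, and the helper names are hypothetical; only the flag names ``IRQF_LEVEL`` and ``IRQF_PT`` come from the text. The sketch also assumes that a pass-through level-triggered pin stays masked until the guest's EOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IRQF_LEVEL  (1U << 1)  /* hypothetical bit values */
#define IRQF_PT     (1U << 2)

/* Toy model of one IOAPIC pin and its handler statistics. */
struct pin_state {
    bool masked;
    uint32_t handled;
};

/* Stand-in for the registered IRQ action callback. */
static void run_action(struct pin_state *pin)
{
    pin->handled++;
}

static void handle_irq(struct pin_state *pin, uint32_t flags)
{
    assert(pin != NULL);
    bool level = (flags & IRQF_LEVEL) != 0U;
    bool pt = (flags & IRQF_PT) != 0U;

    if (level) {
        pin->masked = true;   /* stop the level source from re-triggering */
    }
    run_action(pin);
    if (level && !pt) {
        pin->masked = false;  /* hypervisor-owned: unmask after the action */
    }
    /* level && pt: stays masked until guest_eoi() below. */
}

/* Called when the corresponding vIOAPIC pin gets the guest's EOI. */
static void guest_eoi(struct pin_state *pin)
{
    assert(pin != NULL);
    pin->masked = false;
}
```

Edge-triggered interrupts never touch the mask, so only the two level-triggered cases differ in when the pin is unmasked.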
|
||||||
|
|
||||||
|
Since interrupts are not shared for multiple devices, there is only one
|
||||||
|
IRQ action registered for each interrupt.
|
||||||
|
|
||||||
|
The IRQ number inside HV is a software concept to identify GSI and
|
||||||
|
Vectors. Each GSI will be mapped to one IRQ. The GSI number is usually the same
|
||||||
|
as the IRQ number. IRQ numbers greater than max GSI (nr_gsi) number are dynamically
|
||||||
|
assigned. For example, HV allocates an interrupt vector to a PCI device,
|
||||||
|
an IRQ number is then assigned to that vector. When the vector later
|
||||||
|
reaches a CPU, the corresponding IRQ routine is located and executed.
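
A minimal sketch of this numbering scheme: a caller that names a GSI gets the same number back as its IRQ, while a caller that passes no number gets the first free IRQ at or above the GSI count. ``NR_GSI``, ``NR_IRQS``, ``IRQ_INVALID``, and ``alloc_irq_num`` are hypothetical names and values chosen for illustration, and rejecting an IRQ that is already in use is an assumption based on interrupts not being shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GSI       24U          /* hypothetical number of GSIs */
#define NR_IRQS      64U          /* hypothetical size of the IRQ space */
#define IRQ_INVALID  0xFFFFFFFFU  /* "let the hypervisor choose" / failure */

static bool irq_used[NR_IRQS];

/* Returns the allocated IRQ number, or IRQ_INVALID on failure. */
static uint32_t alloc_irq_num(uint32_t req_irq)
{
    assert(NR_GSI <= NR_IRQS);

    if (req_irq != IRQ_INVALID) {
        /* Fixed request, e.g. a GSI: IRQ number equals the GSI number. */
        if ((req_irq >= NR_IRQS) || irq_used[req_irq]) {
            return IRQ_INVALID;
        }
        irq_used[req_irq] = true;
        return req_irq;
    }

    /* Dynamic request: search above the GSI range. */
    for (uint32_t irq = NR_GSI; irq < NR_IRQS; irq++) {
        if (!irq_used[irq]) {
            irq_used[irq] = true;
            return irq;
        }
    }
    return IRQ_INVALID;
}
```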
|
||||||
|
|
||||||
|
See :numref:`request-irq` for request IRQ control flow for different
|
||||||
|
conditions:
|
||||||
|
|
||||||
|
.. figure:: images/interrupt-image76.png
|
||||||
|
:align: center
|
||||||
|
:name: request-irq
|
||||||
|
|
||||||
|
Request IRQ for different conditions
|
||||||
|
|
||||||
|
IPI Management
|
||||||
|
**************
|
||||||
|
|
||||||
|
The only purpose of IPI use in HV is to kick a vCPU out of non-root mode
|
||||||
|
and enter HV mode. This requires I/O request and virtual interrupt
|
||||||
|
injection be distributed to different IPI vectors. The I/O request uses
|
||||||
|
IPI vector 0xF4 upcall (refer to Chapter 5.4). The virtual interrupt
|
||||||
|
injection uses IPI vector 0xF0.
|
||||||
|
|
||||||
|
0xF4 upcall
|
||||||
|
A Guest vCPU exits (VM Exit) due to an EPT violation or I/O instruction trap.
|
||||||
|
It requires Device Module to emulate the MMIO/PortIO instruction.
|
||||||
|
However it could be that the Service OS (SOS) vCPU0 is still in non-root
|
||||||
|
mode. So an IPI (0xF4 upcall vector) should be sent to the physical CPU0
|
||||||
|
(with non-root mode as vCPU0 inside SOS) to force vCPU0 to VM Exit due
|
||||||
|
to the external interrupt. The virtual upcall vector is then injected to
|
||||||
|
SOS, and the vCPU0 inside SOS then will pick up the IO request and do
|
||||||
|
emulation for the other Guest.
|
||||||
|
|
||||||
|
0xF0 IPI flow
|
||||||
|
If Device Module inside SOS needs to inject an interrupt to another Guest
|
||||||
|
such as vCPU1, it will issue an IPI first to kick CPU1 (assuming vCPU1 is
|
||||||
|
running on CPU1) to root mode. CPU1 will inject the
|
||||||
|
interrupt before VM Entry.
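
The two IPI uses can be sketched as a vector choice plus a "kick" that forces a VM exit on the target CPU. The vector values 0xF4 and 0xF0 come from the text; the macro names, ``enum kick_reason``, ``struct pcpu``, and ``kick_pcpu`` are hypothetical. Skipping the IPI when the target is already in root mode is an assumption made for this sketch, and the real send path (for example ``send_single_ipi()``) is replaced by a recorded value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_UPCALL       0xF4U  /* I/O request upcall */
#define VECTOR_NOTIFY_VCPU  0xF0U  /* virtual interrupt injection */

enum kick_reason {
    KICK_IO_REQUEST,
    KICK_VIRQ_INJECT
};

/* Toy model of a physical CPU. */
struct pcpu {
    bool in_non_root;          /* currently running guest code */
    uint32_t last_ipi_vector;  /* stand-in for the IPI actually sent */
};

static uint32_t ipi_vector_for(enum kick_reason reason)
{
    return (reason == KICK_IO_REQUEST) ? VECTOR_UPCALL : VECTOR_NOTIFY_VCPU;
}

/* Returns true if an IPI was needed to bring the CPU into root mode. */
static bool kick_pcpu(struct pcpu *cpu, enum kick_reason reason)
{
    assert(cpu != NULL);
    if (!cpu->in_non_root) {
        return false;  /* already in the hypervisor; nothing to kick */
    }
    cpu->last_ipi_vector = ipi_vector_for(reason);
    cpu->in_non_root = false;  /* external-interrupt VM exit */
    return true;
}
```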
|
||||||
|
|
||||||
|
.. _hv_interrupt-data-api:
|
||||||
|
|
||||||
|
Data structures and interfaces
|
||||||
|
******************************
|
||||||
|
|
||||||
|
IOAPIC
|
||||||
|
======
|
||||||
|
|
||||||
|
The following APIs are external interfaces for IOAPIC related
|
||||||
|
operations.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void ioapic_get_rte(uint32_t irq, union ioapic_rte *rte)
|
||||||
|
/* Get the redirection table entry of an irq. */
|
||||||
|
|
||||||
|
void ioapic_set_rte(uint32_t irq, union ioapic_rte rte)
|
||||||
|
/* Set the redirection table entry of an irq. */
|
||||||
|
|
||||||
|
uint32_t pin_to_irq(uint8_t pin)
|
||||||
|
/* Get irq num from physical irq pin num */
|
||||||
|
|
||||||
|
void suspend_ioapic(void)
|
||||||
|
/* Suspend ioapic, mainly save the RTEs. */
|
||||||
|
|
||||||
|
void resume_ioapic(void)
|
||||||
|
/* Resume ioapic, mainly restore the RTEs. */
|
||||||
|
|
||||||
|
int get_ioapic_info(char *str_arg, int str_max_len)
|
||||||
|
/* Dump information of ioapic for debug, such as irq num, pin,
|
||||||
|
* RTE, vector, trigger mode etc. For debugging only.
|
||||||
|
*/
|
||||||
|
|
||||||
|
LAPIC
|
||||||
|
=====
|
||||||
|
|
||||||
|
The following APIs are external interfaces for LAPIC related operations.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void write_lapic_reg32(uint32_t offset, uint32_t value)
|
||||||
|
/* Write to lapic register. */
|
||||||
|
|
||||||
|
void early_init_lapic(void)
|
||||||
|
/* To get the local apic base addr, map lapic registers and check the
|
||||||
|
* xAPIC/x2APIC capability.
|
||||||
|
*/
|
||||||
|
|
||||||
|
void save_lapic(struct lapic_regs *regs)
|
||||||
|
/* Save context of lapic before entering s3. */
|
||||||
|
|
||||||
|
void restore_lapic(struct lapic_regs *regs)
|
||||||
|
/* Restore context of lapic when resuming from s3. */
|
||||||
|
|
||||||
|
void resume_lapic(void)
|
||||||
|
/* Resume lapic by setting the apic base addr and restore the registers. */
|
||||||
|
|
||||||
|
uint8_t get_cur_lapic_id(void)
|
||||||
|
/* Get the lapic id. */
|
||||||
|
|
||||||
|
IPI
|
||||||
|
===
|
||||||
|
|
||||||
|
The following APIs are external interfaces for IPI related operations.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void send_startup_ipi(enum intr_cpu_startup_shorthand cpu_startup_shorthand,
|
||||||
|
uint16_t dest_pcpu_id, uint64_t cpu_startup_start_address)
|
||||||
|
/* Send an SIPI to a specific cpu, to notify the cpu to start booting. */
|
||||||
|
|
||||||
|
void send_dest_ipi(uint32_t dest, uint32_t vector, uint32_t dest_mode)
|
||||||
|
/* Send an IPI to a specific cpu with dest mode specified. */
|
||||||
|
|
||||||
|
void send_single_ipi(uint16_t pcpu_id, uint32_t vector)
|
||||||
|
/* Send an IPI to a specific cpu with physical dest mode. */
|
||||||
|
|
||||||
|
|
||||||
|
Physical Interrupt
|
||||||
|
==================
|
||||||
|
|
||||||
|
The following APIs are external interfaces for physical interrupt
|
||||||
|
related operations.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int32_t request_irq(uint32_t req_irq, irq_action_t action_fn, void *priv_data,
|
||||||
|
uint32_t flags)
|
||||||
|
/* Allocate an interrupt num if not specified, and register irq action for the
|
||||||
|
* specified/allocated irq.
|
||||||
|
*/
|
||||||
|
|
||||||
|
void free_irq(uint32_t irq)
|
||||||
|
/* Free irq num and unregister the irq action. */
|
||||||
|
|
||||||
|
void set_irq_trigger_mode(uint32_t irq, bool is_level_trigger)
|
||||||
|
/* Set the irq trigger mode: edge-triggered or level-triggered */
|
||||||
|
|
||||||
|
uint32_t irq_to_vector(uint32_t irq)
|
||||||
|
/* Convert irq num to vector */
|
||||||
|
|
||||||
|
void get_cpu_interrupt_info(char *str_arg, int str_max)
|
||||||
|
/* To dump interrupt statistics info, such as irq num, vector,
|
||||||
|
* irq count on each physical cpu.
|
||||||
|
*/
|
||||||
|
|
||||||
|
void dispatch_interrupt(struct intr_excp_ctx *ctx)
|
||||||
|
/* To dispatch an interrupt, an action callback will be called if registered. */
|
||||||
|
|
||||||
|
void interrupt_init(uint16_t pcpu_id)
|
||||||
|
/* To do interrupt initialization for a cpu, will be called for
|
||||||
|
* each physical cpu.
|
||||||
|
*/
|
||||||
|