diff --git a/doc/developer-guides/hld/hld-hypervisor.rst b/doc/developer-guides/hld/hld-hypervisor.rst index 6ca6fb440..0edc8754a 100644 --- a/doc/developer-guides/hld/hld-hypervisor.rst +++ b/doc/developer-guides/hld/hld-hypervisor.rst @@ -15,3 +15,4 @@ Hypervisor high-level design Timer Virtual Interrupt VT-d + Device Passthrough diff --git a/doc/developer-guides/hld/hv-dev-passthrough.rst b/doc/developer-guides/hld/hv-dev-passthrough.rst new file mode 100644 index 000000000..35fc79f79 --- /dev/null +++ b/doc/developer-guides/hld/hv-dev-passthrough.rst @@ -0,0 +1,275 @@ +.. _hv-device-passthrough: + +Device PassThrough +################## + +A critical part of virtualization is virtualizing devices: exposing all +aspects of a device including its I/O, interrupts, DMA, and configuration. +There are three typical device +virtualization methods: emulation, para-virtualization, and passthrough. +Both emulation and passthrough are used in the ACRN project. Device +emulation is discussed in :ref:`hld-io-emulation` and +device passthrough is discussed here. + +In the ACRN project, device emulation means emulating all existing hardware +resources through a software component, the device model, running in the +Service OS (SOS). Device +emulation must maintain the same SW interface as a native device, +providing transparency to the VM software stack. Passthrough, implemented in +the hypervisor, assigns a physical device to a VM so the VM can access +the hardware device directly with minimal (if any) VMM involvement. + +The difference between device emulation and passthrough is shown in +:numref:`emu-passthru-diff`. Device emulation has +a longer access path, which results in worse performance than +passthrough. Passthrough can deliver near-native performance, but +can’t support device sharing. + +.. 
figure:: images/passthru-image30.png + :align: center + :name: emu-passthru-diff + + Difference between emulation and passthrough + +Passthrough in the hypervisor provides the following functionalities to +allow a VM to access PCI devices directly: + +- DMA remapping by VT-d for PCI devices: the hypervisor sets up DMA + remapping during the VM initialization phase. +- MMIO remapping between virtual and physical BARs +- Device configuration emulation +- Interrupt remapping for PCI devices +- ACPI configuration virtualization +- GSI sharing violation check + +The following diagram details the passthrough initialization control flow in ACRN: + +.. figure:: images/passthru-image22.png + :align: center + + Passthrough devices initialization control flow + +Passthrough Device Status +************************* + +Most common devices on supported platforms are enabled for +passthrough, as detailed here: + +.. figure:: images/passthru-image77.png + :align: center + + Passthrough Device Status + +DMA Remapping +************* + +With passthrough, a VM can only program DMA with guest physical addresses (GPA), +while physical DMA requires host physical addresses (HPA). One workaround +is building an identity mapping so that GPA is equal to HPA, but this +is not recommended as some VMs don’t support relocation well. To +address this issue, Intel introduced VT-d in the chipset, adding a +remapping engine to translate GPA to HPA for DMA operations. + +Each VT-d engine (DMAR unit) maintains a remapping structure +similar to a page table, with the device BDF (Bus/Dev/Func) as input and the final +page table for GPA/HPA translation as output. The GPA/HPA translation +page table is similar to a normal multi-level page table. + +VM DMA depends on Intel VT-d to do the translation from GPA to HPA, so we +need to enable the VT-d IOMMU engine in ACRN before we can pass through any device. SOS +in ACRN is a VM running in non-root mode which also depends +on VT-d to access a device. In the SOS DMA remapping +engine settings, GPA is equal to HPA. 
+ +The ACRN hypervisor checks the DMA Remapping Hardware Unit Definition (DRHD) in +the host DMAR ACPI table to get basic information, then sets up each DMAR unit. For +simplicity, ACRN reuses the EPT table as the translation table in the DMAR +unit for each passthrough device. The control flow is shown in the +following figures: + +.. figure:: images/passthru-image72.png + :align: center + + DMA Remapping control flow during HV init + +.. figure:: images/passthru-image86.png + :align: center + + ptdev assignment control flow + +.. figure:: images/passthru-image42.png + :align: center + + ptdev de-assignment control flow + + +MMIO Remapping +************** + +For a PCI MMIO BAR, the hypervisor builds an EPT mapping between the virtual BAR and +the physical BAR so that the VM can access MMIO directly. + +Device Configuration Emulation +****************************** + +PCI configuration is based on accesses to I/O ports 0xCF8 and 0xCFC. ACRN +implements PCI configuration emulation to handle the 0xCF8/0xCFC accesses that control +a PCI device. There are two possible paths: emulation in the hypervisor or in the SOS device +model. + +- When configuration emulation is in the hypervisor, the interception of + the 0xCF8/0xCFC ports and the emulation of PCI configuration space accesses are + tricky and unclean. Therefore the final solution is to reuse the + PCI emulation infrastructure of the SOS device model. The hypervisor + routes the UOS 0xCF8/0xCFC accesses to the device model and remains unaware of the + physical PCI devices. Upon receiving a UOS PCI configuration space access + request, the device model needs to emulate some critical space, for instance, + BAR, MSI capability, and INTLINE/INTPIN. + +- For other accesses, the device model + reads/writes the physical configuration space on behalf of the UOS. To do + this, the device model is linked with libpciaccess to access the physical PCI + device. + +Interrupt Remapping +******************* + +When the physical interrupt of a passthrough device happens, the hypervisor has +to distribute it to the relevant VM according to interrupt remapping +relationships. 
The structure ``ptdev_remapping_info`` is used to define +the relationship between a physical interrupt and a VM, the +virtual destination, etc. See the following figure for details: + +.. figure:: images/passthru-image91.png + :align: center + + Remapping of physical interrupts + +There are two different types of interrupt sources: IOAPIC and MSI. +The hypervisor records different information for interrupt +distribution: the physical and virtual IOAPIC pins for an IOAPIC source, and the +physical and virtual BDF and other information for an MSI source. + +SOS passthrough is also in the scope of interrupt remapping, which is +done on demand rather than at hypervisor initialization. + +.. figure:: images/passthru-image102.png + :align: center + :name: init-remapping + + Initialization of remapping of virtual IOAPIC interrupts for SOS + +:numref:`init-remapping` above illustrates how remapping of (virtual) IOAPIC +interrupts is set up for SOS. A VM exit occurs whenever SOS tries to +unmask an interrupt in the (virtual) IOAPIC by writing to the Redirection +Table Entry (RTE). The hypervisor then invokes the IOAPIC emulation +handler (refer to :ref:`hld-io-emulation` for details on I/O emulation), which +calls APIs to set up a remapping for the to-be-unmasked interrupt. + +Remapping of (virtual) PIC interrupts is set up in a similar sequence: + +.. figure:: images/passthru-image98.png + :align: center + + Initialization of remapping of virtual MSI for SOS + +This figure illustrates how mappings of MSI or MSI-X are set up for +SOS. SOS is responsible for issuing a hypercall to notify the +hypervisor before it configures the PCI configuration space to enable an +MSI. The hypervisor takes this opportunity to set up a remapping for the +given MSI or MSI-X before it is actually enabled by SOS. 
+ +When the UOS needs to access the physical device by passthrough, it uses +the following steps: + +- UOS gets a virtual interrupt. +- A VM exit happens and the trapped vCPU is the target where the interrupt + will be injected. +- The hypervisor handles the interrupt and translates the vector + according to ``ptdev_remapping_info``. +- The hypervisor delivers the interrupt to UOS. + +When the SOS needs to use the physical device, the passthrough is also +active because the SOS is the first VM. The detailed steps are: + +- SOS gets all physical interrupts. It assigns different interrupts to + different VMs during initialization and reassigns them when a VM is created or + deleted. +- When a physical interrupt is trapped, an exception will happen after the VMCS + has been set. +- The hypervisor handles the VM exit according to + ``ptdev_remapping_info`` and translates the vector. +- The interrupt is injected the same way as a virtual interrupt. + +ACPI Virtualization +******************* + +ACPI virtualization is designed in ACRN with these assumptions: + +- HV has no knowledge of ACPI, +- SOS owns all physical ACPI resources, +- UOS sees virtual ACPI resources emulated by the device model. + +Some passthrough devices require a physical ACPI table entry for +initialization. The device model creates such a device entry based on +the physical one according to vendor ID and device ID. Virtualization is +implemented in the SOS device model and is not in the scope of the hypervisor. + +GSI Sharing Violation Check +*************************** + +All the PCI devices that are sharing the same GSI should be assigned to +the same VM to avoid physical GSI sharing between multiple VMs. For +devices that don't support MSI, ACRN DM +puts the devices that share the same GSI pin into a GSI +sharing group. The devices in the same group should be assigned together to +the current VM; otherwise, none of them should be assigned to the +current VM. A device that violates the rule will be rejected for +passthrough. 
The checking logic is implemented in the Device Model and is not +in the scope of the hypervisor. + +Data structures and interfaces +****************************** + +.. note:: replace with reference to API docs + +The following APIs are provided to initialize interrupt remapping for +SOS: + +- int ptdev_intx_pin_remap(struct vm \*vm, uint8_t virt_pin, enum + ptdev_vpin_source vpin_src); + + Set up the remapping of the given virtual pin for the given VM. + +- int ptdev_msix_remap(struct vm \*vm, uint16_t virt_bdf, uint16_t + entry_nr, struct ptdev_msi_info \*info); + +The following APIs are provided to manipulate the interrupt remapping +for UOS: + +- int ptdev_add_intx_remapping(struct vm \*vm, uint16_t virt_bdf, + uint16_t phys_bdf, uint8_t virt_pin, uint8_t phys_pin, bool + pic_pin); + + Add a mapping between the given virtual and physical pin for the + given VM. + +- void ptdev_remove_intx_remapping(struct vm \*vm, uint8_t + virt_pin, bool pic_pin); + + Remove the mapping of the given virtual pin for the given VM. + +- int ptdev_add_msix_remapping(struct vm \*vm, uint16_t virt_bdf, + uint16_t phys_bdf, uint32_t vector_count); + + Add a mapping of the given number of vectors between the given + physical and virtual BDF for the given VM. + +- void ptdev_remove_msix_remapping(struct vm \*vm, uint16_t + virt_bdf, uint32_t vector_count); + + Remove the mapping of the given number of vectors of the given virtual + BDF for the given VM. + +The following APIs are provided to acknowledge a virtual interrupt. 
+ diff --git a/doc/developer-guides/hld/images/passthru-image102.png b/doc/developer-guides/hld/images/passthru-image102.png new file mode 100644 index 000000000..df6217253 Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image102.png differ diff --git a/doc/developer-guides/hld/images/passthru-image22.png b/doc/developer-guides/hld/images/passthru-image22.png new file mode 100644 index 000000000..b3be30714 Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image22.png differ diff --git a/doc/developer-guides/hld/images/passthru-image30.png b/doc/developer-guides/hld/images/passthru-image30.png new file mode 100644 index 000000000..688594ad0 Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image30.png differ diff --git a/doc/developer-guides/hld/images/passthru-image42.png b/doc/developer-guides/hld/images/passthru-image42.png new file mode 100644 index 000000000..3de59eba8 Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image42.png differ diff --git a/doc/developer-guides/hld/images/passthru-image72.png b/doc/developer-guides/hld/images/passthru-image72.png new file mode 100644 index 000000000..311a50d8a Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image72.png differ diff --git a/doc/developer-guides/hld/images/passthru-image77.png b/doc/developer-guides/hld/images/passthru-image77.png new file mode 100644 index 000000000..254c85032 Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image77.png differ diff --git a/doc/developer-guides/hld/images/passthru-image86.png b/doc/developer-guides/hld/images/passthru-image86.png new file mode 100644 index 000000000..ce7b3290e Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image86.png differ diff --git a/doc/developer-guides/hld/images/passthru-image91.png b/doc/developer-guides/hld/images/passthru-image91.png new file mode 100644 index 000000000..dd3aa6e6a Binary files 
/dev/null and b/doc/developer-guides/hld/images/passthru-image91.png differ diff --git a/doc/developer-guides/hld/images/passthru-image98.png b/doc/developer-guides/hld/images/passthru-image98.png new file mode 100644 index 000000000..d03471d24 Binary files /dev/null and b/doc/developer-guides/hld/images/passthru-image98.png differ