diff --git a/doc/developer-guides/hld/hld-hypervisor.rst b/doc/developer-guides/hld/hld-hypervisor.rst
index 17fd4f4f9..6ca6fb440 100644
--- a/doc/developer-guides/hld/hld-hypervisor.rst
+++ b/doc/developer-guides/hld/hld-hypervisor.rst
@@ -13,4 +13,5 @@ Hypervisor high-level design
    I/O Emulation
    Physical Interrupt
    Timer
-   Virtual Interrupt
+   Virtual Interrupt
+   VT-d
diff --git a/doc/developer-guides/hld/hv-vt-d.rst b/doc/developer-guides/hld/hv-vt-d.rst
new file mode 100644
index 000000000..cd097df0a
--- /dev/null
+++ b/doc/developer-guides/hld/hv-vt-d.rst
@@ -0,0 +1,372 @@
+.. _vt-d-hld:
+
+VT-d
+####
+
+VT-d stands for Intel Virtualization Technology for Directed I/O, and provides
+hardware capabilities to assign I/O devices to VMs and to extend the
+protection and isolation properties of VMs for I/O operations.
+
+VT-d provides the following main functions:
+
+- **DMA remapping**: for supporting address translations for DMA from
+  devices.
+
+- **Interrupt remapping**: for supporting isolation and routing of
+  interrupts from devices and external interrupt controllers to
+  appropriate VMs.
+
+- **Interrupt posting**: for supporting direct delivery of virtual
+  interrupts from devices and external controllers to virtual
+  processors.
+
+The ACRN hypervisor supports DMA remapping that provides address translation
+capability for PCI pass-through devices, and second-level translation,
+which applies to requests-without-PASID. ACRN does not support
+first-level or nested translation.
+
+DMAR Engines Discovery
+**********************
+
+DMA Remapping Report ACPI table
+===============================
+
+For generic platforms, the ACRN hypervisor retrieves DMAR information from
+the ACPI table, and parses the DMAR reporting structure to discover the
+number of DMA-remapping hardware units present in the platform as well as
+the devices under the scope of a remapping hardware unit, as shown in
+:numref:`dma-remap-report`:
+
+.. 
figure:: images/vt-d-image90.png
+   :align: center
+   :name: dma-remap-report
+
+   DMA Remapping Reporting Structure
+
+Pre-parsed DMAR information
+===========================
+
+For specific platforms, the ACRN hypervisor uses pre-parsed DMA remapping
+reporting information directly to save time during hypervisor boot-up.
+
+DMA remapping unit for integrated graphics device
+=================================================
+
+Generally, there is a dedicated remapping hardware unit for the Intel
+integrated graphics device. ACRN implements GVT-g for graphics, but
+GVT-g is not compatible with VT-d. The remapping hardware unit for the
+graphics device is disabled on ACRN if GVT-g is enabled. If the graphics
+device needs to be passed through to a VM, then the remapping hardware unit
+must be enabled.
+
+DMA Remapping
+*************
+
+DMA remapping hardware is used to isolate device access to memory,
+enabling each device in the system to be assigned to a specific domain
+through a distinct set of paging structures.
+
+Domains
+=======
+
+A domain is abstractly defined as an isolated environment in the
+platform, to which a subset of the host physical memory is allocated.
+The memory resource of a domain is specified by the address translation
+tables.
+
+Device to Domain Mapping Structure
+==================================
+
+VT-d hardware uses the root-table and context-tables to build the mapping
+between devices and domains, as shown in :numref:`vt-d-mapping`.
+
+.. figure:: images/vt-d-image44.png
+   :align: center
+   :name: vt-d-mapping
+
+   Device to Domain Mapping structures
+
+The root-table is 4-KByte in size and contains 256 root-entries to cover
+the PCI bus number space (0-255). Each root-entry contains a
+context-table pointer to reference the context-table for devices on the
+bus identified by the root-entry, if the present flag of the root-entry
+is set.
+
+Each context-table contains 256 entries, with each entry corresponding
+to a PCI device function on the bus.
For a PCI device, the device and
+function numbers (8 bits) are used to index into the context-table. Each
+context-entry contains a Second-level Page-table Pointer, which provides
+the host physical address of the address translation structure in system
+memory to be used for remapping requests-without-PASID processed through
+the context-entry.
+
+For a given Bus, Device, and Function combination as shown in
+:numref:`bdf-passthru`, a pass-through device can be associated with
+address translation structures for a domain.
+
+.. figure:: images/vt-d-image19.png
+   :align: center
+   :name: bdf-passthru
+
+   BDF Format of Pass-through Device
+
+Refer to the `VT-d spec`_ for more details on the device to domain
+mapping structures.
+
+.. _VT-d spec:
+   https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf
+
+Address Translation Structures
+==============================
+
+On ACRN, the EPT table of a domain is used as the address translation
+structure for the devices assigned to the domain, as shown in
+:numref:`vt-d-DMA`.
+
+.. figure:: images/vt-d-image40.png
+   :align: center
+   :name: vt-d-DMA
+
+   DMA Remapping Diagram
+
+When the device attempts to access system memory, the DMA
+remapping hardware intercepts the access, uses the EPT table of the
+domain to determine whether the access is allowed, and translates the DMA
+address according to the EPT table from guest physical address (GPA) to
+host physical address (HPA).
+
+Domains and Memory Isolation
+============================
+
+There are no DMA operations inside the hypervisor, so ACRN doesn’t
+create a domain for the hypervisor. No DMA operations from pass-through
+devices can access the hypervisor memory.
+
+ACRN treats each virtual machine (VM) as a separate domain. For a VM,
+there is an EPT table for the Normal World, and there may be an EPT table for
+the Secure World. The Secure World can access the Normal World's memory, but
+the Normal World cannot access the Secure World's memory.
+
+VM0 domain
+   VM0 domain is created when the hypervisor creates VM0 for the
+   Service OS.
+
+   IOMMU uses the EPT table of the Normal World of VM0 as the address
+   translation structures for the devices in VM0 domain. The Normal World’s
+   EPT table of VM0 doesn’t include the memory resource of the hypervisor
+   and Secure Worlds, if any. So the devices in VM0 domain can’t access the
+   memory belonging to the hypervisor or Secure Worlds.
+
+Other domains
+   Other VM domains are created when the hypervisor creates a User OS, one
+   domain for each User OS.
+
+   IOMMU uses the EPT table of the Normal World of a VM as the address
+   translation structures for the devices in the domain. The Normal World’s
+   EPT table of the VM only allows devices to access the memory
+   allocated for the Normal World of the VM.
+
+Page-walk coherency
+===================
+
+For VT-d hardware that doesn’t support page-walk coherency, the
+hypervisor needs to make sure updates to the VT-d tables are synced to
+memory:
+
+- Device to Domain Mapping Structures, including Root-entries and
+  Context-entries
+
+- EPT table of a VM.
+
+ACRN flushes the related cache lines after updating these structures
+if the VT-d hardware doesn’t support page-walk coherency.
+
+Super-page support
+==================
+
+ACRN VT-d reuses the EPT table as an address translation table. The VT-d
+capability for super-page support should match the super-page usage of the
+EPT table.
+
+Snoop control
+=============
+
+If the VT-d hardware supports snoop control, VT-d can be set to
+ignore the “no-snoop attribute” in PCIe transactions.
+
+The following table shows the snoop behavior of a DMA operation controlled by the
+combination of:
+
+- Snoop Control capability of the VT-d DMAR unit
+- The setting of the SNP field in the leaf PTE
+- No-snoop attribute in the PCIe request
+
+.. 
list-table::
+   :widths: 25 25 25 25
+   :header-rows: 1
+
+   * - SC cap of VT-d
+     - SNP field in leaf PTE
+     - No-snoop attribute in request
+     - Snoop behavior
+
+   * - 0
+     - 0 (must be 0)
+     - no snoop
+     - No snoop
+
+   * - 0
+     - 0 (must be 0)
+     - snoop
+     - Snoop
+
+   * - 1
+     - 1
+     - snoop / no snoop
+     - Snoop
+
+   * - 1
+     - 0
+     - no snoop
+     - No snoop
+
+   * - 1
+     - 0
+     - snoop
+     - Snoop
+
+If all enabled VT-d DMAR units support Snoop Control, ACRN enables Snoop
+Control by default by setting bit 11 of the leaf PTE of the EPT table. Bit 11
+of the leaf PTE of EPT is ignored by the MMU, so there is no side effect for the MMU.
+
+If one of the enabled VT-d DMAR units doesn’t support Snoop Control,
+then bit 11 of the leaf PTE of EPT is not set, since the field is treated as
+reserved (0) by VT-d hardware implementations not supporting Snoop
+Control.
+
+Initialization
+**************
+
+During initialization, the hypervisor registers DMAR units on the
+platform according to the pre-parsed information or the DMAR table. There may
+be multiple DMAR units on the platform, and ACRN allows some of the DMAR
+units to be ignored. DMAR units marked as ignored are not
+enabled.
+
+The hypervisor creates VM0 domain using the Normal World’s EPT table of VM0
+as the address translation table when creating VM0 as the Service OS. All
+PCI devices on the platform are added to VM0 domain. The hypervisor then
+enables DMAR translation for the DMAR unit(s) not marked as ignored.
+
+Device assignment
+*****************
+
+All devices are initially added to VM0 domain.
+To assign a device means to assign the device to a User OS. The device
+is removed from VM0 domain and added to the VM domain related to the User
+OS, which changes the address translation table from EPT of VM0 to EPT
+of User OS for the device.
+
+To un-assign a device means to un-assign the device from a User OS.
The
+device is removed from the VM domain related to the User OS, then added
+back to VM0 domain, which changes the address translation table from EPT
+of User OS to EPT of VM0 for the device.
+
+Power Management support for S3
+*******************************
+
+During platform S3 suspend and resume, the VT-d register values will be
+lost. ACRN VT-d provides APIs to be called during S3 suspend and resume.
+
+During S3 suspend, some register values are saved in memory, and
+DMAR translation is disabled. During S3 resume, the saved register values
+are restored, the root table address register is set, and DMAR translation
+is enabled.
+
+All the operations for S3 suspend and resume are performed on all DMAR
+units on the platform, except for the DMAR units marked ignored.
+
+Error Handling
+**************
+
+ACRN VT-d supports DMA remapping error reporting. ACRN VT-d requests an
+IRQ / vector for DMAR error reporting. A DMAR fault handler is
+registered for the IRQ. The DMAR unit supports reporting fault events via MSI.
+When a fault event occurs, an MSI is generated, so that the DMAR fault
+handler is called to report the error event.
+
+Data structures and interfaces
+******************************
+
+.. note:: Needs API reference to include/arch/x86/vtd.h
+
+Initialization and deinitialization
+===================================
+
+The following APIs are provided during initialization and
+deinitialization:
+
+- void init_iommu(void)
+
+  Register DMAR units on the platform according to the pre-parsed
+  information or DMAR table.
+
+- void init_iommu_vm0_domain(struct vm \*vm0)
+
+  Create VM0 domain using the Normal World’s EPT table of VM0 as address
+  translation table. Add all PCI devices on the platform to VM0 domain. Then enable
+  DMAR translation.
+
+- void destroy_iommu_domain(struct iommu_domain \*domain)
+
+  Destroy the iommu domain.
+
+Runtime
+=======
+
+The following APIs are provided during runtime:
+
+- void suspend_iommu(void)
+
+  Suspend IOMMU.
+
+- void resume_iommu(void)
+
+  Resume IOMMU.
+
+- struct iommu_domain \*create_iommu_domain(uint16_t vm_id,
+  uint64_t translation_table, uint32_t addr_width)
+
+  Create an iommu domain for the VM specified by vm_id.
+  translation_table should be the physical address of the EPT table of the VM
+  specified by vm_id; the value cannot be NULL.
+  Return the iommu_domain created for the VM.
+  A NULL return value indicates an error.
+
+- int assign_iommu_device(struct iommu_domain \*domain, uint8_t
+  bus, uint8_t devfun)
+
+  Assign a device specified by bus & devfun to an iommu domain. The device
+  is removed from VM0 domain and added to the domain specified.
+
+  domain: specifies the domain the device should be assigned to.
+
+  bus: the 8-bit bus value of the pass-through device.
+
+  devfun: the 8-bit device function value of the pass-through device.
+
+  return: 0 on success, another value on error.
+
+- int unassign_iommu_device(struct iommu_domain \*domain,
+  uint8_t bus, uint8_t devfun)
+
+  Unassign a device specified by bus & devfun from an iommu domain.
+
+  domain: specifies the domain the device should be removed from.
+
+  bus: the 8-bit bus value of the pass-through device.
+
+  devfun: the 8-bit device function value of the pass-through device.
+
+  return: 0 on success, another value on error.
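The call sequence for the runtime APIs listed above can be sketched as follows. This is a toy model for illustration only, not the ACRN implementation: the function names and signatures follow the list above, but the bodies, the ``struct iommu_domain`` layout, the ``owner`` table, ``MAX_DOMAINS``, ``VM0_ID``, and the ``make_devfun`` helper are all assumptions introduced here. The only hardware fact relied on is the standard PCI encoding of ``devfun`` (5-bit device number in the upper bits, 3-bit function number in the lower bits).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DOMAINS 8U   /* hypothetical limit for this sketch */
#define VM0_ID      0U   /* assumption: VM0 uses vm_id 0 */

/* Hypothetical layout; the real structure is defined by the hypervisor. */
struct iommu_domain {
    uint16_t vm_id;
    uint64_t translation_table; /* physical address of the VM's EPT table */
    uint32_t addr_width;
    int in_use;
};

static struct iommu_domain domains[MAX_DOMAINS];

/* owner[bus][devfun] records which VM's domain a device belongs to.
 * Zero-initialized, so every device starts in VM0 domain. */
static uint16_t owner[256][256];

/* Pack a 5-bit device number and a 3-bit function number into devfun. */
uint8_t make_devfun(uint8_t dev, uint8_t fun)
{
    return (uint8_t)(((dev & 0x1FU) << 3) | (fun & 0x7U));
}

struct iommu_domain *create_iommu_domain(uint16_t vm_id,
        uint64_t translation_table, uint32_t addr_width)
{
    if (translation_table == 0UL) {
        return NULL; /* the translation table address cannot be NULL */
    }
    for (size_t i = 0; i < MAX_DOMAINS; i++) {
        if (!domains[i].in_use) {
            domains[i].vm_id = vm_id;
            domains[i].translation_table = translation_table;
            domains[i].addr_width = addr_width;
            domains[i].in_use = 1;
            return &domains[i];
        }
    }
    return NULL; /* no free slot */
}

/* Move a device from VM0 domain to the given domain. */
int assign_iommu_device(struct iommu_domain *domain, uint8_t bus, uint8_t devfun)
{
    if (domain == NULL || owner[bus][devfun] != VM0_ID) {
        return -1;
    }
    owner[bus][devfun] = domain->vm_id;
    return 0;
}

/* Move a device from the given domain back to VM0 domain. */
int unassign_iommu_device(struct iommu_domain *domain, uint8_t bus, uint8_t devfun)
{
    if (domain == NULL || owner[bus][devfun] != domain->vm_id) {
        return -1;
    }
    owner[bus][devfun] = VM0_ID;
    return 0;
}
```

In this model, a caller would create a domain for a User OS with the physical address of that VM's EPT table, call ``assign_iommu_device`` with the bus and ``make_devfun(dev, fun)`` values of the pass-through device, and later call ``unassign_iommu_device`` with the same arguments to return the device to VM0 domain.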
+
diff --git a/doc/developer-guides/hld/images/vt-d-image19.png b/doc/developer-guides/hld/images/vt-d-image19.png
new file mode 100644
index 000000000..1448b0179
Binary files /dev/null and b/doc/developer-guides/hld/images/vt-d-image19.png differ
diff --git a/doc/developer-guides/hld/images/vt-d-image40.png b/doc/developer-guides/hld/images/vt-d-image40.png
new file mode 100644
index 000000000..d32cea32d
Binary files /dev/null and b/doc/developer-guides/hld/images/vt-d-image40.png differ
diff --git a/doc/developer-guides/hld/images/vt-d-image44.png b/doc/developer-guides/hld/images/vt-d-image44.png
new file mode 100644
index 000000000..c680cf8fd
Binary files /dev/null and b/doc/developer-guides/hld/images/vt-d-image44.png differ
diff --git a/doc/developer-guides/hld/images/vt-d-image90.png b/doc/developer-guides/hld/images/vt-d-image90.png
new file mode 100644
index 000000000..da13caba6
Binary files /dev/null and b/doc/developer-guides/hld/images/vt-d-image90.png differ