mirror of
https://github.com/projectacrn/acrn-hypervisor.git
synced 2025-06-20 04:33:55 +00:00
doc: update HLD VT-d
transcode, edit, and upload HLD 0.7 section 3.8 (VT-d) Tracked-on: #1643 Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
This commit is contained in:
parent
e141150e4c
commit
7c192db1ba
@ -13,4 +13,5 @@ Hypervisor high-level design
|
|||||||
I/O Emulation <hv-io-emulation>
|
I/O Emulation <hv-io-emulation>
|
||||||
Physical Interrupt <hv-interrupt>
|
Physical Interrupt <hv-interrupt>
|
||||||
Timer <hv-timer>
|
Timer <hv-timer>
|
||||||
Virtual Interrupt <hv-virt-interrupt.rst>
|
Virtual Interrupt <hv-virt-interrupt>
|
||||||
|
VT-d <hv-vt-d>
|
||||||
|
372
doc/developer-guides/hld/hv-vt-d.rst
Normal file
372
doc/developer-guides/hld/hv-vt-d.rst
Normal file
@ -0,0 +1,372 @@
|
|||||||
|
.. _vt-d-hld:
|
||||||
|
|
||||||
|
VT-d
|
||||||
|
####
|
||||||
|
|
||||||
|
VT-d stands for Intel Virtual Technology for Directed IO, and provides
|
||||||
|
hardware capabilities to assign I/O devices to VMs and extending the
|
||||||
|
protection and isolation properties of VMs for I/O operations.
|
||||||
|
|
||||||
|
VT-d provides the following main functions:
|
||||||
|
|
||||||
|
- **DMA remapping**: for supporting address translations for DMA from
|
||||||
|
devices.
|
||||||
|
|
||||||
|
- **Interrupt remapping**: for supporting isolation and routing of
|
||||||
|
interrupts from devices and external interrupt controllers to
|
||||||
|
appropriate VMs.
|
||||||
|
|
||||||
|
- **Interrupt posting**: for supporting direct delivery of virtual
|
||||||
|
interrupts from devices and external controllers to virtual
|
||||||
|
processors.
|
||||||
|
|
||||||
|
ACRN hypervisor supports DMA remapping that provides address translation
|
||||||
|
capability for PCI pass-through devices, and second-level translation,
|
||||||
|
which applies to requests-without-PASID. ACRN does not support
|
||||||
|
First-level / nested translation.
|
||||||
|
|
||||||
|
DMAR Engines Discovery
|
||||||
|
**********************
|
||||||
|
|
||||||
|
DMA Remapping Report ACPI table
|
||||||
|
===============================
|
||||||
|
|
||||||
|
For generic platforms, ACRN hypervisor retrieves DMAR information from
|
||||||
|
the ACPI table, and parses the DMAR reporting structure to discover the
|
||||||
|
number of DMA-remapping hardware units present in the platform as well as
|
||||||
|
the devices under the scope of a remapping hardware unit, as shown in
|
||||||
|
:numref:`dma-remap-report`:
|
||||||
|
|
||||||
|
.. figure:: images/vt-d-image90.png
|
||||||
|
:align: center
|
||||||
|
:name: dma-remap-report
|
||||||
|
|
||||||
|
DMA Remapping Reporting Structure
|
||||||
|
|
||||||
|
Pre-parsed DMAR information
|
||||||
|
===========================
|
||||||
|
|
||||||
|
For specific platforms, ACRN hypervisor uses pre-parsed DMA remapping
|
||||||
|
reporting information directly to save time for hypervisor boot-up.
|
||||||
|
|
||||||
|
DMA remapping unit for integrated graphics device
|
||||||
|
=================================================
|
||||||
|
|
||||||
|
Generally, there is a dedicated remapping hardware unit for the Intel
|
||||||
|
integrated graphics device. ACRN implements GVT-g for graphics, but
|
||||||
|
GVT-g is not compatible with VT-d. The remapping hardware unit for
|
||||||
|
graphics device is disabled on ACRN if GVT-g is enabled. If the graphics
|
||||||
|
device needs to pass-through to a VM, then the remapping hardware unit
|
||||||
|
must be enabled.
|
||||||
|
|
||||||
|
DMA Remapping
|
||||||
|
*************
|
||||||
|
|
||||||
|
DMA remapping hardware is used to isolate device access to memory,
|
||||||
|
enabling each device in the system to be assigned to a specific domain
|
||||||
|
through a distinct set of paging structures.
|
||||||
|
|
||||||
|
Domains
|
||||||
|
=======
|
||||||
|
|
||||||
|
A domain is abstractly defined as an isolated environment in the
|
||||||
|
platform, to which a subset of the host physical memory is allocated.
|
||||||
|
The memory resource of a domain is specified by the address translation
|
||||||
|
tables.
|
||||||
|
|
||||||
|
Device to Domain Mapping Structure
|
||||||
|
==================================
|
||||||
|
|
||||||
|
VT-d hardware uses root-table and context-tables to build the mapping
|
||||||
|
between devices and domains as shown in :numref:`vt-d-mapping`.
|
||||||
|
|
||||||
|
.. figure:: images/vt-d-image44.png
|
||||||
|
:align: center
|
||||||
|
:name: vt-d-mapping
|
||||||
|
|
||||||
|
Device to Domain Mapping structures
|
||||||
|
|
||||||
|
The root-table is 4-KByte in size and contains 256 root-entries to cover
|
||||||
|
the PCI bus number space (0-255). Each root-entry contains a
|
||||||
|
context-table pointer to reference the context-table for devices on the
|
||||||
|
bus identified by the root-entry, if the present flag of the root-entry
|
||||||
|
is set.
|
||||||
|
|
||||||
|
Each context-table contains 256 entries, with each entry corresponding
|
||||||
|
to a PCI device function on the bus. For a PCI device, the device and
|
||||||
|
function numbers (8-bits) are used to index into the context-table. Each
|
||||||
|
context-entry contains a Second-level Page-table Pointer, which provides
|
||||||
|
the host physical address of the address translation structure in system
|
||||||
|
memory to be used for remapping requests-without-PASID processed through
|
||||||
|
the context-entry.
|
||||||
|
|
||||||
|
For a given Bus, Device, and Function combination as shown in
|
||||||
|
:numref:`bdf-passthru`, a pass-through device can be associated with
|
||||||
|
address translation structures for a domain.
|
||||||
|
|
||||||
|
.. figure:: images/vt-d-image19.png
|
||||||
|
:align: center
|
||||||
|
:name: bdf-passthru
|
||||||
|
|
||||||
|
BDF Format of Pass-through Device
|
||||||
|
|
||||||
|
Refer to the `VT-d spec`_ for the more details of Device to domain
|
||||||
|
mapping structures.
|
||||||
|
|
||||||
|
.. _VT-d spec:
|
||||||
|
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf
|
||||||
|
|
||||||
|
Address Translation Structures
|
||||||
|
==============================
|
||||||
|
|
||||||
|
On ACRN, EPT table of a domain is used as the address translation
|
||||||
|
structures for the devices assigned to the domain, as shown
|
||||||
|
:numref:`vt-d-DMA`.
|
||||||
|
|
||||||
|
.. figure:: images/vt-d-image40.png
|
||||||
|
:align: center
|
||||||
|
:name: vt-d-DMA
|
||||||
|
|
||||||
|
DMA Remapping Diagram
|
||||||
|
|
||||||
|
When the device attempts to access system memory, the DMA
|
||||||
|
remapping hardware intercepts the access, utilizes the EPT table of the
|
||||||
|
domain to determine whether the access is allowed, and translates the DMA
|
||||||
|
address according to the EPT table from guest physical address (GPA) to
|
||||||
|
host physical address (HPA).
|
||||||
|
|
||||||
|
Domains and Memory Isolation
|
||||||
|
============================
|
||||||
|
|
||||||
|
There are no DMA operations inside the hypervisor, so ACRN doesn’t
|
||||||
|
create a domain for the hypervisor. No DMA operations from pass-through
|
||||||
|
devices can access the hypervisor memory.
|
||||||
|
|
||||||
|
ACRN treats each virtual machine (VM) as a separate domain. For a VM,
|
||||||
|
there is a EPT table for Normal world, and there may be a EPT table for
|
||||||
|
Secure World. Secure world can access Normal World's memory, but Normal
|
||||||
|
world cannot access Secure World's memory.
|
||||||
|
|
||||||
|
VM0 domain
|
||||||
|
VM0 domain is created when ithe hypervisor creates VM0 for the
|
||||||
|
Service OS.
|
||||||
|
|
||||||
|
IOMMU uses the EPT table of Normal world of VM0 as the address
|
||||||
|
translation structures for the devices in VM0 domain. The Normal world’s
|
||||||
|
EPT table of VM0 doesn’t include the memory resource of ithe hypervisor
|
||||||
|
and Secure worlds if any. So the devices in VM0 domain can’t access the
|
||||||
|
memory belong to hypervisor or secure worlds.
|
||||||
|
|
||||||
|
Other domains
|
||||||
|
Other VM domains will be created when hypervisor creates User OS. One
|
||||||
|
domain for each User OS.
|
||||||
|
|
||||||
|
IOMMU uses the EPT table of Normal world of a VM as the address
|
||||||
|
translation structures for the devices in the domain. The Normal world’s
|
||||||
|
EPT table of the VM only allows devices to access the memory
|
||||||
|
allocated for Normal world of the VM.
|
||||||
|
|
||||||
|
Page-walk coherency
|
||||||
|
===================
|
||||||
|
|
||||||
|
For the VT-d hardware, which doesn’t support page-walk coherency,
|
||||||
|
hypervisor needs to make sure the updates of VT-d tables are synced in
|
||||||
|
memory:
|
||||||
|
|
||||||
|
- Device to Domain Mapping Structures, including Root-entries and
|
||||||
|
Context-entries
|
||||||
|
|
||||||
|
- EPT table of a VM.
|
||||||
|
|
||||||
|
ACRN will flush the related cache line after updates of these structures
|
||||||
|
if the VT-d hardware doesn’t support page-walk coherency.
|
||||||
|
|
||||||
|
Super-page support
|
||||||
|
==================
|
||||||
|
|
||||||
|
ACRN VT-d reuses the EPT table as address a translation table. VT-d capability
|
||||||
|
for super-page support should be identical with the usage of EPT table.
|
||||||
|
|
||||||
|
Snoop control
|
||||||
|
=============
|
||||||
|
|
||||||
|
If VT-d hardware supports snoop control, it allows VT-d to control to
|
||||||
|
ignore the “no-snoop attribute” in PCI-E transactions.
|
||||||
|
|
||||||
|
The following table shows the snoop behavior of DMA operation controlled by the
|
||||||
|
combination of:
|
||||||
|
|
||||||
|
- Snoop Control capability of VT-d DMAR unit
|
||||||
|
- The setting of SNP filed in leaf PTE
|
||||||
|
- No-snoop attribute in PCI-e request
|
||||||
|
|
||||||
|
.. list-table::
|
||||||
|
:widths: 25 25 25 25
|
||||||
|
:header-rows: 1
|
||||||
|
|
||||||
|
* - SC cap of VT-d
|
||||||
|
- SNP filed in leaf PTE
|
||||||
|
- No-snoop attribute in request
|
||||||
|
- Snoop behavior
|
||||||
|
|
||||||
|
* - 0
|
||||||
|
- 0 (must be 0)
|
||||||
|
- no snoop
|
||||||
|
- No snoop
|
||||||
|
|
||||||
|
* - 0
|
||||||
|
- 0 (must be 0)
|
||||||
|
- snoop
|
||||||
|
- Snoop
|
||||||
|
|
||||||
|
* - 1
|
||||||
|
- 1
|
||||||
|
- snoop / no snoop
|
||||||
|
- Snoop
|
||||||
|
|
||||||
|
* - 1
|
||||||
|
- 0
|
||||||
|
- no snoop
|
||||||
|
- No snoop
|
||||||
|
|
||||||
|
* - 1
|
||||||
|
- 0
|
||||||
|
- snoop
|
||||||
|
- Snoop
|
||||||
|
|
||||||
|
ACRN enable Snoop Control by default if all enabled VT-d DMAR units
|
||||||
|
support Snoop Control by setting bit 11 of leaf PTE of EPT table. Bit 11
|
||||||
|
of leaf PTE of EPT is ignored by MMU. So no side effect for MMU.
|
||||||
|
|
||||||
|
If one of the enabled VT-d DMAR units doesn’t support Snoop Control,
|
||||||
|
then Bit 11 of leaf PET of EPT is not set since the field is treated as
|
||||||
|
reserved(0) by VT-d hardware implementations not supporting Snoop
|
||||||
|
Control.
|
||||||
|
|
||||||
|
Initialization
|
||||||
|
**************
|
||||||
|
|
||||||
|
During hypervisor initialization, it registers DMAR units on the
|
||||||
|
platform according to the reparsed information or DMAR table. There may
|
||||||
|
be multiple DMAR units on the platform, ACRN allows some of the DMAR
|
||||||
|
units to be ignored. If some DMAR unit(s) are marked as ignored, they
|
||||||
|
would not be enabled.
|
||||||
|
|
||||||
|
Hypervisor creates VM0 domain using the Normal World’s EPT table of VM0
|
||||||
|
as address translation table when creating VM0 as Service OS. And all
|
||||||
|
PCI devices on the platform are added to VM0 domain. Then enable DMAR
|
||||||
|
translation for DMAR unit(s) if they are not marked as ignored.
|
||||||
|
|
||||||
|
Device assignment
|
||||||
|
*****************
|
||||||
|
|
||||||
|
All devices are initially added to VM0 domain.
|
||||||
|
To assign a device means to assign the device to an User OS. The device
|
||||||
|
is remove from VM0 domain and added to the VM domain related to the User
|
||||||
|
OS, which changes the address translation table from EPT of VM0 to EPT
|
||||||
|
of User OS for the device.
|
||||||
|
|
||||||
|
To un-assign a device means to un-assign the device from an User OS. The
|
||||||
|
device is remove from the VM domain related to the User OS, then added
|
||||||
|
back to VM0 domain, which changes the address translation table from EPT
|
||||||
|
of User OS to EPT of VM0 for the device.
|
||||||
|
|
||||||
|
Power Management support for S3
|
||||||
|
*******************************
|
||||||
|
|
||||||
|
During platform S3 suspend and resume, the VT-d register values will be
|
||||||
|
lost. ACRN VT-d provide APIs to be called during S3 suspend and resume.
|
||||||
|
|
||||||
|
During S3 suspend, some register values are saved in the memory, and
|
||||||
|
DMAR translation is disabled. During S3 resume, the register values
|
||||||
|
saved are restored. Root table address register is set. DMAR translation
|
||||||
|
is enabled.
|
||||||
|
|
||||||
|
All the operations for S3 suspend and resume are performed on all DMAR
|
||||||
|
units on the platform, except for the DMAR units marked ignored.
|
||||||
|
|
||||||
|
Error Handling
|
||||||
|
**************
|
||||||
|
|
||||||
|
ACRN VT-d supports DMA remapping error reporting. ACRN VT-d requests a
|
||||||
|
IRQ / vector for DMAR error reporting. A DMAR fault handler is
|
||||||
|
registered for the IRQ. DMAR unit supports report fault event via MSI.
|
||||||
|
When a fault event occurs, a MSI is generated, so that the DMAR fault
|
||||||
|
handler will be called to report error event.
|
||||||
|
|
||||||
|
Data structures and interfaces
|
||||||
|
******************************
|
||||||
|
|
||||||
|
.. note:: Needs API reference to include/arch/x86/vtd.h
|
||||||
|
|
||||||
|
initialization and deinitialization
|
||||||
|
===================================
|
||||||
|
|
||||||
|
The following APIs are provided during initialization and
|
||||||
|
deinitialization:
|
||||||
|
|
||||||
|
- void init_iommu(void)
|
||||||
|
|
||||||
|
Register DMAR units on the platform according to the reparsed
|
||||||
|
information or DMAR table.
|
||||||
|
|
||||||
|
- void init_iommu_vm0_domain(struct vm \*vm0)
|
||||||
|
|
||||||
|
Create VM0 domain using the Normal World’s EPT table of VM0 as address
|
||||||
|
translation table. Add all PCI devices on the platform to VM0 domain. Then enable
|
||||||
|
DMAR translation.
|
||||||
|
|
||||||
|
- void destroy_iommu_domain(struct iommu_domain \*domain)
|
||||||
|
|
||||||
|
Destroy the iommu domain.
|
||||||
|
|
||||||
|
VT-d
|
||||||
|
====
|
||||||
|
|
||||||
|
The following API are provided during runtime:
|
||||||
|
|
||||||
|
- void suspend_iommu(void)
|
||||||
|
|
||||||
|
Suspend IOMMU.
|
||||||
|
|
||||||
|
- void resume_iommu(void)
|
||||||
|
|
||||||
|
Resume IOMMU.
|
||||||
|
|
||||||
|
- struct iommu_domain \*create_iommu_domain(uint16_t vm_id,
|
||||||
|
uint64_t translation_table, uint32_t addr_width)
|
||||||
|
|
||||||
|
Create a iommu domain for a VM specified by vm_id.
|
||||||
|
translation_table should be the physical address of EPT table of the VM
|
||||||
|
specified by the vm_id, the value cannot be NULL.
|
||||||
|
Return the iommu_domain created for the VM if not NULL.
|
||||||
|
Error if the return NULL.
|
||||||
|
|
||||||
|
- int assign_iommu_device(struct iommu_domain \*domain, uint8_t
|
||||||
|
bus, uint8_t devfun)
|
||||||
|
|
||||||
|
Assign a device specified by bus & devfun to a iommu domain. The device
|
||||||
|
is removed from VM0 domain and added to the domain specified.
|
||||||
|
|
||||||
|
domain: specified the domain the device should be assigned to.
|
||||||
|
|
||||||
|
bus: the 8-bit bus value of the pass-through device.
|
||||||
|
|
||||||
|
devfun: the 8-bit device function value of the pass-through device.
|
||||||
|
|
||||||
|
return: return 0 if success, other if error.
|
||||||
|
|
||||||
|
- int unassign_iommu_device(struct iommu_domain \*domain,
|
||||||
|
|
||||||
|
uint8_t bus, uint8_t devfun);
|
||||||
|
|
||||||
|
Unassign a device specified by bus & devfun from a iommu domain.
|
||||||
|
|
||||||
|
domain: specified the domain the device should be removed from.
|
||||||
|
|
||||||
|
bus: the 8-bit bus value of the pass-through device.
|
||||||
|
|
||||||
|
devfun: the 8-bit device function value of the pass-through device.
|
||||||
|
|
||||||
|
return: return 0 if success, other if error.
|
||||||
|
|
BIN
doc/developer-guides/hld/images/vt-d-image19.png
Normal file
BIN
doc/developer-guides/hld/images/vt-d-image19.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 1.9 KiB |
BIN
doc/developer-guides/hld/images/vt-d-image40.png
Normal file
BIN
doc/developer-guides/hld/images/vt-d-image40.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 7.8 KiB |
BIN
doc/developer-guides/hld/images/vt-d-image44.png
Normal file
BIN
doc/developer-guides/hld/images/vt-d-image44.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 48 KiB |
BIN
doc/developer-guides/hld/images/vt-d-image90.png
Normal file
BIN
doc/developer-guides/hld/images/vt-d-image90.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 7.5 KiB |
Loading…
Reference in New Issue
Block a user