Doc: Final edits to Memory Mangt HLD doc.

Signed-off-by: Deb Taylor <deb.taylor@intel.com>
Deb Taylor 2019-11-08 14:42:04 -05:00 committed by wenlingz
parent 1c3f16f56d
commit a74a75514a

@@ -9,29 +9,28 @@ Overview
********

The hypervisor (HV) virtualizes real physical memory so an unmodified OS
(such as Linux or Android) that is running in a virtual machine can
manage its own contiguous physical memory. The HV uses virtual-processor
identifiers (VPIDs) and the extended page-table mechanism (EPT) to
translate a guest-physical address into a host-physical address. The HV
enables EPT and VPID hardware virtualization features, establishes EPT
page tables for Service and User VMs, and provides EPT page table
operation interfaces to others.

In the ACRN hypervisor system, there are a few different memory spaces
to consider. From the hypervisor's point of view:

- **Host Physical Address (HPA)**: the native physical address space.
- **Host Virtual Address (HVA)**: the native virtual address space based
  on an MMU. A page table is used to translate from HVA to HPA spaces.

From the Guest OS running on a hypervisor:

- **Guest Physical Address (GPA)**: the guest physical address space from
  a virtual machine. GPA to HPA transition is usually based on an
  MMU-like hardware module (EPT in X86), and is associated with a page
  table.
- **Guest Virtual Address (GVA)**: the guest virtual address space from a
  virtual machine based on a vMMU.
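
The GVA, GPA, and HPA spaces are linked by two translation stages: the
vMMU (guest page tables) resolves a GVA to a GPA, and the EPT resolves
the GPA to an HPA. The fragment below only illustrates this composition;
both helpers are hypothetical placeholders (stubbed as identity mappings
so the fragment is self-contained), not ACRN APIs.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical stand-ins for the two translation stages (not ACRN APIs);
    * stubbed as identity mappings so the fragment compiles on its own. */
   static uint64_t guest_page_walk(uint64_t gva) { return gva; } /* GVA -> GPA via the vMMU */
   static uint64_t ept_walk(uint64_t gpa)        { return gpa; } /* GPA -> HPA via the EPT  */

   /* A guest virtual address reaches host physical memory in two stages. */
   static uint64_t gva_to_hpa(uint64_t gva)
   {
       uint64_t gpa = guest_page_walk(gva);

       return ept_walk(gpa);
   }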

.. figure:: images/mem-image2.png
   :align: center

@@ -52,25 +51,25 @@ ACRN hypervisor and how it handles the different memory space views
inside the hypervisor and from a VM:

- How ACRN hypervisor manages host memory (HPA/HVA)
- How ACRN hypervisor manages the Service VM guest memory (HPA/GPA)
- How ACRN hypervisor and the Service VM DM manage the User VM guest memory (HPA/GPA)

Hypervisor Physical Memory Management
*************************************

In ACRN, the HV initializes MMU page tables to manage all physical
memory and then switches to the new MMU page tables. After MMU page
tables are initialized at the platform initialization stage, no updates
are made for MMU page tables except when hv_access_memory_region_update
is called. However, the memory region updated by
hv_access_memory_region_update must not be accessed by the ACRN
hypervisor in advance, because such access could create a mapping in the
TLB and there is no TLB flush mechanism for the ACRN HV memory.

Hypervisor Physical Memory Layout - E820
========================================

The ACRN hypervisor is the primary owner for managing system memory.
Typically, the boot firmware (e.g., EFI) passes the platform physical
memory layout - E820 table to the hypervisor. The ACRN hypervisor does
its memory management based on this table using 4-level paging.

@@ -85,21 +84,20 @@ the platform.
Physical Memory Layout Example

:numref:`mem-layout` is an example of the physical memory layout based
on a simple platform E820 table.

Hypervisor Memory Initialization
================================

The ACRN hypervisor runs in paging mode. After the bootstrap
processor (BSP) gets the platform E820 table, the BSP creates its MMU
page table based on it. This is done by the function *init_paging()*.
After the application processor (AP) receives the IPI CPU startup
interrupt, it uses the MMU page tables created by the BSP. In order to
bring the memory access rights into effect, some other APIs are
provided: enable_paging will enable IA32_EFER.NXE and CR0.WP,
enable_smep will enable CR4.SMEP, and enable_smap will enable CR4.SMAP.
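
The access-right controls named above map to well-defined architectural
bits. The sketch below is illustrative only: it manipulates plain C
register images rather than the real control registers and MSR, the bit
positions follow the Intel SDM, and the structure and function names are
hypothetical rather than the ACRN implementations of these APIs.

.. code-block:: c

   #include <stdint.h>

   #define CR0_WP    (1UL << 16)  /* write-protect supervisor writes to read-only pages */
   #define CR4_SMEP  (1UL << 20)  /* supervisor-mode execution prevention */
   #define CR4_SMAP  (1UL << 21)  /* supervisor-mode access prevention */
   #define EFER_NXE  (1UL << 11)  /* enable the execute-disable (NX) page-table bit */

   /* Illustrative register images; real code programs CR0/CR4 and MSR 0xC0000080. */
   struct cpu_ctrl_regs {
       uint64_t cr0;
       uint64_t cr4;
       uint64_t ia32_efer;
   };

   /* Roughly what enable_paging/enable_smep/enable_smap are described as doing. */
   static void apply_memory_access_rights(struct cpu_ctrl_regs *regs)
   {
       regs->ia32_efer |= EFER_NXE;  /* enable_paging: IA32_EFER.NXE */
       regs->cr0       |= CR0_WP;    /* enable_paging: CR0.WP */
       regs->cr4       |= CR4_SMEP;  /* enable_smep:   CR4.SMEP */
       regs->cr4       |= CR4_SMAP;  /* enable_smap:   CR4.SMAP */
   }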

:numref:`hv-mem-init` describes the hypervisor memory initialization for
the BSP and APs.

.. figure:: images/mem-image8.png

@@ -108,18 +106,18 @@ and APs.
Hypervisor Memory Initialization

The following memory mapping policy is used (a simplified
attribute-selection sketch follows the list):

- Identical mapping (ACRN hypervisor memory could be relocatable in
  the future)
- Map all address spaces with UNCACHED type, read/write, user
  and execute-disable access right
- Remap [0, low32_max_ram) regions to WRITE-BACK type
- Remap [4G, high64_max_ram) regions to WRITE-BACK type
- Set the paging-structure entries' U/S flag to
  supervisor-mode for hypervisor-owned memory
  (excluding the memory reserved for trusty)
- Remove the 'NX' bit for pages that contain the hv code section
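
One way to read the policy above is as a per-address attribute decision
over the identity-mapped range. The snippet below is a minimal,
self-contained sketch of that decision; the region bounds, flag names,
and helper are illustrative placeholders, not ACRN code or its attribute
encoding.

.. code-block:: c

   #include <stdint.h>

   /* Illustrative attribute flags; ACRN uses its own page-table attribute encoding. */
   #define ATTR_WRITEBACK   (1U << 0)  /* cacheable (WB); otherwise UNCACHED */
   #define ATTR_SUPERVISOR  (1U << 1)  /* U/S = supervisor-mode only */
   #define ATTR_NX          (1U << 2)  /* execute-disable */

   /* Placeholder platform bounds and hypervisor code-section range (example values). */
   static const uint64_t low32_max_ram  = 0x80000000ULL;   /* top of RAM below 4G */
   static const uint64_t high64_max_ram = 0x240000000ULL;  /* top of RAM above 4G */
   static const uint64_t hv_code_start  = 0x100000ULL;
   static const uint64_t hv_code_end    = 0x180000ULL;

   /* Decide the mapping attributes for one hypervisor-owned physical address
    * (identity mapped, so VA == PA in this sketch). */
   static uint32_t hv_mapping_attr(uint64_t pa)
   {
       uint32_t attr = ATTR_SUPERVISOR | ATTR_NX;   /* default: supervisor, no-execute */

       if ((pa < low32_max_ram) ||
           ((pa >= 0x100000000ULL) && (pa < high64_max_ram))) {
           attr |= ATTR_WRITEBACK;                  /* RAM regions remapped WRITE-BACK */
       }
       if ((pa >= hv_code_start) && (pa < hv_code_end)) {
           attr &= ~ATTR_NX;                        /* hv code section stays executable */
       }
       return attr;
   }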

.. figure:: images/mem-image69.png
   :align: center

@@ -136,8 +134,8 @@ The memory mapping policy used is:
code/data (< 1M part is for secondary CPU reset code)

The hypervisor should use minimum memory pages to map from the virtual
address space into the physical address space. So ACRN only supports
mapping linear addresses to 2-MByte pages or 1-GByte pages; it doesn't
support mapping linear addresses to 4-KByte pages.

- If 1GB hugepage can be used

@@ -147,23 +145,23 @@ support map linear addresses to 4-KByte pages.
address space mapping and 2MB hugepage can be used, the corresponding
PDT entry shall be set for this 2MB hugepage.

If the memory type or access rights of a page are updated, or some
virtual address space is deleted, it will lead to splitting of the
corresponding page. The hypervisor will still keep using minimum memory
pages to map from the virtual address space into the physical address
space.
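
The page-size choice can be thought of as picking the largest supported
granule that the current alignment and remaining length allow. The
following is a simplified sketch of that selection; the constant and
function names are illustrative, not the ACRN mapping code, and the
range is assumed to be at least 2-MByte aligned since 4-KByte mappings
are not used.

.. code-block:: c

   #include <stdint.h>

   #define PDE_SIZE   (2UL * 1024UL * 1024UL)           /* 2-MByte page, mapped by a PDT entry  */
   #define PDPTE_SIZE (1UL * 1024UL * 1024UL * 1024UL)  /* 1-GByte page, mapped by a PDPT entry */

   /*
    * Pick the mapping granule for the next chunk of [va, va + remaining):
    * use a 1-GByte page when both the addresses and the remaining length
    * allow it, otherwise a 2-MByte page. The caller is assumed to pass
    * ranges that are at least 2-MByte aligned.
    */
   static uint64_t next_mapping_size(uint64_t va, uint64_t pa, uint64_t remaining)
   {
       if ((((va | pa) % PDPTE_SIZE) == 0UL) && (remaining >= PDPTE_SIZE)) {
           return PDPTE_SIZE;
       }
       return PDE_SIZE;
   }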

Memory Pages Pool Functions
===========================

Memory pages pool functions provide static management of one
4KB page-size memory block for each page level for each VM or the HV;
these pools are used by the hypervisor to do memory mapping.
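
A statically sized pool of this kind is commonly implemented as a fixed
page array plus a free bitmap. The sketch below illustrates that idea
only; the pool size, structure layout, and function names are
placeholders and are not the ACRN implementation.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define PAGE_SIZE  4096U
   #define POOL_PAGES 256U   /* illustrative capacity; sized per VM/HV at build time */

   struct page {
       uint8_t contents[PAGE_SIZE];
   };

   /* One statically allocated pool: no runtime heap allocation is involved. */
   struct page_pool {
       struct page pages[POOL_PAGES];
       uint64_t    bitmap[POOL_PAGES / 64U];  /* 1 bit per page: set = in use */
   };

   /* Allocate one zeroed 4KB page from the pool, or NULL when it is exhausted. */
   static struct page *alloc_page(struct page_pool *pool)
   {
       for (uint32_t i = 0U; i < POOL_PAGES; i++) {
           uint64_t mask = 1ULL << (i % 64U);

           if ((pool->bitmap[i / 64U] & mask) == 0UL) {
               pool->bitmap[i / 64U] |= mask;
               memset(&pool->pages[i], 0, PAGE_SIZE);
               return &pool->pages[i];
           }
       }
       return NULL;  /* pool exhausted */
   }

   /* Return a page to the pool. */
   static void free_page(struct page_pool *pool, struct page *pg)
   {
       uint32_t i = (uint32_t)(pg - pool->pages);

       pool->bitmap[i / 64U] &= ~(1ULL << (i % 64U));
   }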

Data Flow Design
================

The physical memory management unit provides MMU 4-level page table
creation and update services, an MMU page table switching service, an
SMEP enable service, and an HPA/HVA retrieving service to other units.
:numref:`mem-data-flow-physical` shows the data flow diagram
of physical memory management.

@@ -212,35 +210,35 @@ Address Space Translation

Hypervisor Memory Virtualization
********************************

The hypervisor provides a contiguous region of physical memory for the
Service VM and each User VM. It also guarantees that the Service and
User VMs cannot access the code and internal data in the hypervisor, and
each User VM cannot access the code and internal data of the Service VM
and other User VMs.

The hypervisor:

- enables EPT and VPID hardware virtualization features
- establishes EPT page tables for the Service and User VMs
- provides EPT page table operation services
- virtualizes MTRR for the Service and User VMs
- provides VPID operation services
- provides services for address space translation between GPA and HPA
- provides services for data transfer between the hypervisor and the virtual machine

Memory Virtualization Capability Checking
=========================================

In the hypervisor, memory virtualization provides an EPT/VPID capability
checking service and an EPT hugepage support checking service. Before
the HV enables memory virtualization and uses the EPT hugepage, these
services need to be invoked by other units.
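
Both checks ultimately come down to testing bits reported by the VMX
capability MSRs. The sketch below decodes a raw IA32_VMX_EPT_VPID_CAP
value supplied by the caller; the bit positions follow the Intel SDM,
while the structure and function names are illustrative rather than the
ACRN checking APIs.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Selected IA32_VMX_EPT_VPID_CAP (MSR 0x48C) bits, per the Intel SDM. */
   #define EPT_VPID_CAP_2MB_PAGE  (1ULL << 16)  /* EPT supports 2-MByte pages */
   #define EPT_VPID_CAP_1GB_PAGE  (1ULL << 17)  /* EPT supports 1-GByte pages */
   #define EPT_VPID_CAP_INVEPT    (1ULL << 20)  /* INVEPT instruction supported */
   #define EPT_VPID_CAP_INVVPID   (1ULL << 32)  /* INVVPID instruction supported */

   struct ept_vpid_caps {
       bool ept_2mb_page;
       bool ept_1gb_page;
       bool invept;
       bool invvpid;
   };

   /* Decode the raw MSR value read by the caller (e.g., via rdmsr). */
   static struct ept_vpid_caps decode_ept_vpid_cap(uint64_t msr_val)
   {
       struct ept_vpid_caps caps = {
           .ept_2mb_page = (msr_val & EPT_VPID_CAP_2MB_PAGE) != 0ULL,
           .ept_1gb_page = (msr_val & EPT_VPID_CAP_1GB_PAGE) != 0ULL,
           .invept       = (msr_val & EPT_VPID_CAP_INVEPT)   != 0ULL,
           .invvpid      = (msr_val & EPT_VPID_CAP_INVVPID)  != 0ULL,
       };

       return caps;
   }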

Data Transfer between Different Address Spaces
==============================================

In ACRN, different memory space management is used in the hypervisor,
Service VM, and User VM to achieve spatial isolation. Between memory
spaces, there are different kinds of data transfer, such as when a
Service/User VM makes a hypercall to request hypervisor services that
include data transfer, or when the hypervisor does instruction
emulation: the HV needs to access the guest instruction pointer register
to fetch guest

@@ -249,9 +247,9 @@ instruction data.

Access GPA from Hypervisor
--------------------------

When the hypervisor needs to access the GPA for data transfer, the
caller from the guest side must make sure this memory range's GPA is
contiguous. But the HPA in the hypervisor could be discontiguous
(especially for a User VM under the hugetlb allocation mechanism). For
example, a 4M GPA range may map to 2 different 2M huge host-physical
pages. The ACRN hypervisor must take care of this kind of data transfer
by doing EPT page walking based on

@@ -260,9 +258,9 @@ its HPA.

Access GVA from Hypervisor
--------------------------

When the hypervisor needs to access a GVA for data transfer, it is
likely that both the GPA and HPA could be discontiguous. The ACRN
hypervisor must watch for this kind of data transfer and handle it by
doing page walking based on both its GPA and HPA.
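
Because the backing HPA may change at every page boundary, such
transfers are typically done one page-bounded chunk at a time,
re-translating the address for each chunk. The following is a minimal
sketch of that pattern for GPA access; ``gpa2hva()`` is a placeholder
for the per-page EPT walk plus the hypervisor's own mapping, stubbed out
here only so the sketch stands alone, and the function names are not the
ACRN APIs.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define PAGE_SIZE 4096UL

   /* Placeholder for the per-page GPA -> HVA translation (EPT walk plus the
    * hypervisor's HPA -> HVA mapping); stubbed so the sketch compiles. */
   static void *gpa2hva(uint64_t gpa)
   {
       return (void *)(uintptr_t)gpa;
   }

   /* Copy 'len' bytes from a contiguous GPA range whose HPA may be discontiguous. */
   static void copy_from_gpa_chunked(void *h_dst, uint64_t gpa, size_t len)
   {
       uint8_t *dst = h_dst;

       while (len > 0U) {
           /* Never cross a page boundary in one step: the next page may live
            * in a completely different host-physical page. */
           size_t offset = (size_t)(gpa & (PAGE_SIZE - 1UL));
           size_t chunk  = PAGE_SIZE - offset;

           if (chunk > len) {
               chunk = len;
           }
           memcpy(dst, gpa2hva(gpa), chunk);

           dst += chunk;
           gpa += chunk;
           len -= chunk;
       }
   }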

EPT Page Tables Operations

@@ -317,7 +315,7 @@ VPID operations
===============

Virtual-processor identifier (VPID) is a hardware feature to optimize
TLB management. When VPID is enabled, hardware will add a tag for the
TLB of a logical processor and cache information for multiple
linear-address spaces. VMX transitions may retain cached information and
the logical processor switches to a different address space, avoiding
unnecessary

@@ -382,8 +380,8 @@ VM Exit about EPT

There are two VM exit handlers for EPT violation and EPT
misconfiguration in the hypervisor. EPT page tables are always
configured correctly for the Service and User VMs. If an EPT
misconfiguration is detected, a fatal error is reported by the HV. The
hypervisor uses EPT violation to intercept MMIO access to do device
emulation. EPT violation handling data flow is described in
:ref:`instruction-emulation`.
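
When an EPT violation exit is taken, the exit qualification reports how
the guest touched the faulting GPA, which is the first thing the MMIO
emulation path needs. The sketch below decodes only the access-type bits
of a raw exit qualification value; the bit layout follows the Intel SDM,
and the structure and function names are illustrative, not the ACRN
handler.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* EPT-violation exit qualification access-type bits, per the Intel SDM. */
   #define EPT_VIOL_READ   (1ULL << 0)  /* guest attempted a data read */
   #define EPT_VIOL_WRITE  (1ULL << 1)  /* guest attempted a data write */
   #define EPT_VIOL_FETCH  (1ULL << 2)  /* guest attempted an instruction fetch */

   struct mmio_access {
       uint64_t gpa;      /* faulting guest-physical address (from the VMCS) */
       bool     is_write; /* direction the device-emulation code must handle */
   };

   /*
    * Turn a raw exit qualification plus the faulting GPA into the request
    * handed to device emulation. Instruction fetches from MMIO space are
    * not emulated here and are left for the caller to treat as an error.
    */
   static bool decode_ept_violation(uint64_t exit_qual, uint64_t gpa,
                                    struct mmio_access *req)
   {
       if ((exit_qual & EPT_VIOL_FETCH) != 0ULL) {
           return false;  /* not a plain MMIO data access */
       }
       req->gpa = gpa;
       req->is_write = (exit_qual & EPT_VIOL_WRITE) != 0ULL;
       return true;
   }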

@@ -391,7 +389,7 @@ violation handling data flow is described in the

Memory Virtualization APIs
==========================

Here is a list of major memory related APIs in the HV:

EPT/VPID Capability Checking
----------------------------

@@ -470,20 +468,20 @@ VPID

Service OS Memory Management
****************************

After the ACRN hypervisor starts, it creates the Service VM as its first
VM. The Service VM runs all the native device drivers, manages the
hardware devices, and provides I/O mediation to guest VMs. The Service
OS is in charge of the memory allocation for Guest VMs as well.

ACRN hypervisor passes the whole system memory access (except its own
part) to the Service VM. The Service VM must be able to access all of
the system memory except the hypervisor part.

Guest Physical Memory Layout - E820
===================================

The ACRN hypervisor passes the original E820 table to the Service VM
after filtering out its own part. So from the Service VM's view, it sees
almost all the system memory as shown here:

.. figure:: images/mem-image3.png

@@ -496,9 +494,9 @@ almost all the system memory as shown here:

Host to Guest Mapping
=====================

ACRN hypervisor creates the Service OS's guest (GPA) to host (HPA)
mapping (EPT mapping) through the function ``prepare_sos_vm_memmap()``
when it creates the Service VM. It follows these rules:

- Identical mapping
- Map all memory range with UNCACHED type

@@ -507,18 +505,18 @@ when it creates the SOS VM.
- Unmap all platform EPC resource
- Unmap ACRN hypervisor emulated vLAPIC/vIOAPIC MMIO range

The guest to host mapping is static for the Service VM; it will not
change after the Service VM begins running, except that the PCI device
BAR address mapping could be re-programmed by the Service VM. EPT
violation is serving for the vLAPIC/vIOAPIC emulation or PCI MSI-X table
BAR emulation in the hypervisor for the Service VM.

Trusty
******

For an Android User OS, there is a secure world named trusty world
support, whose memory must be secured by the ACRN hypervisor and
must not be accessible by the Service/User VM normal world.

.. figure:: images/mem-image18.png
   :align: center