doc: update Memory management HLD

Update HLD documentation with HLD 0.7 section 3.3 (Memory Management).
Add a referenced target link to hv-cpu-virt.rst

Tracked-on: #1590

Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
@@ -1207,6 +1207,7 @@ APIs to register its IO/MMIO range:
- unregister a MMIO emulation handler for a hypervisor emulated device
  by specific MMIO range

.. _instruction-emulation:

Instruction Emulation
*********************

@@ -8,21 +8,30 @@ This document describes memory management for the ACRN hypervisor.
Overview
********
The hypervisor (HV) virtualizes real physical memory so that an unmodified OS
(such as Linux or Android) running in a virtual machine has the view of
managing its own contiguous physical memory. The HV uses virtual-processor
identifiers (VPIDs) and the extended page-table mechanism (EPT) to
translate guest-physical addresses into host-physical addresses. The HV enables
EPT and VPID hardware virtualization features, establishes EPT page
tables for SOS/UOS, and provides EPT page-table operation interfaces to
other modules.
In the ACRN hypervisor system, there are a few different memory spaces to
consider. From the hypervisor's point of view there are:

- **Host Physical Address (HPA)**: the native physical address space, and
- **Host Virtual Address (HVA)**: the native virtual address space based on
  an MMU. A page table is used to translate between HPA and HVA
  spaces.
From the Guest OS running on a hypervisor there are:

- **Guest Physical Address (GPA)**: the guest physical address space from a
  virtual machine. GPA to HPA translation is usually done by an
  MMU-like hardware module (EPT in X86), and associated with a page
  table
- **Guest Virtual Address (GVA)**: the guest virtual address space from a
  virtual machine based on a vMMU
.. figure:: images/mem-image2.png
@@ -47,19 +56,25 @@ inside the hypervisor and from a VM:
- How ACRN hypervisor manages SOS guest memory (HPA/GPA)
- How ACRN hypervisor & SOS DM manage UOS guest memory (HPA/GPA)
Hypervisor Physical Memory Management
*************************************
In ACRN, the HV initializes MMU page tables to manage all physical
memory and then switches to the new MMU page tables. After the MMU page
tables are initialized at the platform initialization stage, no updates
are made to them.
Hypervisor Physical Memory Layout - E820
========================================
The ACRN hypervisor is the primary owner of system memory management.
Typically the boot firmware (e.g., EFI) passes the platform physical
memory layout (the E820 table) to the hypervisor. The ACRN hypervisor
does its memory management based on this table using 4-level paging.

The BIOS/bootloader firmware (e.g., EFI) passes the E820 table through a
multiboot protocol. This table contains the original memory layout for
the platform.
.. figure:: images/mem-image1.png
   :align: center
@@ -69,38 +84,511 @@ This table contains the original memory layout for the platform.
Physical Memory Layout Example
:numref:`mem-layout` is an example of the physical memory layout based on a simple
platform E820 table.
Hypervisor Memory Initialization
================================
The ACRN hypervisor runs under paging mode. After the bootstrap
processor (BSP) gets the platform E820 table, the BSP creates its MMU page
tables based on it. This is done by the functions *init_paging()* and
*enable_smep()*. After an application processor (AP) receives the IPI CPU
startup interrupt, it uses the MMU page tables created by the BSP and enables
SMEP. :numref:`hv-mem-init` describes the hypervisor memory initialization
for the BSP and APs.
.. figure:: images/mem-image8.png
   :align: center
   :name: hv-mem-init

   Hypervisor Memory Initialization
The memory mapping policy used is:

- Identical mapping (ACRN hypervisor memory could be relocatable in
  the future)
- Map all memory regions with UNCACHED type
- Remap RAM regions to WRITE-BACK type
.. figure:: images/mem-image69.png
   :align: center
   :name: hv-mem-vm-init

   Hypervisor Virtual Memory Layout
:numref:`hv-mem-vm-init` above shows:

- Hypervisor has a view of and can access all system memory
- Hypervisor has an UNCACHED MMIO/PCI hole reserved for devices such as
  LAPIC/IOAPIC access
- Hypervisor has its own memory with WRITE-BACK cache type for its
  code/data (the < 1MB part is for secondary CPU reset code)
The hypervisor should use a minimum number of memory pages to map from virtual
address space into physical address space:

- If a 1GB hugepage can be used for the virtual address space mapping, the
  corresponding PDPT entry shall be set for this 1GB hugepage.
- If a 1GB hugepage can't be used for the virtual address space mapping and
  a 2MB hugepage can be used, the corresponding PDT entry shall be set for
  this 2MB hugepage.
- If neither a 1GB hugepage nor a 2MB hugepage can be used for the virtual
  address space mapping, the corresponding PT entry shall be set.
If the memory type or access rights of a page are updated, or some virtual
address space is deleted, the corresponding page will be split. The
hypervisor will still use the minimum number of memory pages to map from
virtual address space into physical address space.
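The mapping-granularity rules above can be illustrated with a short sketch.
This is not the hypervisor's actual implementation; the helper names
(set_pdpte_1g() and friends) and the simplified loop are hypothetical and only
show how a mapper could prefer 1GB, then 2MB, then 4KB pages for an
identity-mapped range.

.. code-block:: c

   #include <stdint.h>

   #define SIZE_4K  0x1000UL
   #define SIZE_2M  0x200000UL
   #define SIZE_1G  0x40000000UL

   /* Hypothetical helpers: each fills the page-table entry of the matching
    * level (PDPT entry for 1GB, PDT entry for 2MB, PT entry for 4KB). */
   extern void set_pdpte_1g(uint64_t hpa, uint64_t attr);
   extern void set_pde_2m(uint64_t hpa, uint64_t attr);
   extern void set_pte_4k(uint64_t hpa, uint64_t attr);

   /* Identity-map [hpa, hpa + size), always choosing the largest page size
    * that is both aligned and fits in the remaining range. */
   static void map_identity(uint64_t hpa, uint64_t size, uint64_t attr)
   {
       uint64_t end = hpa + size;

       while (hpa < end) {
           uint64_t left = end - hpa;

           if (((hpa & (SIZE_1G - 1UL)) == 0UL) && (left >= SIZE_1G)) {
               set_pdpte_1g(hpa, attr);    /* 1GB hugepage */
               hpa += SIZE_1G;
           } else if (((hpa & (SIZE_2M - 1UL)) == 0UL) && (left >= SIZE_2M)) {
               set_pde_2m(hpa, attr);      /* 2MB hugepage */
               hpa += SIZE_2M;
           } else {
               set_pte_4k(hpa, attr);      /* 4KB page */
               hpa += SIZE_4K;
           }
       }
   }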
Memory Pages Pool Functions
===========================
Memory pages pool functions provide dynamic management of multiple
4KB page-size memory blocks, used by the hypervisor to store internal
data. Through these functions, the hypervisor can allocate and
deallocate pages.
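A minimal sketch of such a pool is shown below, assuming a statically reserved
array of 4KB pages tracked by a bitmap; the names and sizes are illustrative
and are not the hypervisor's actual data structures.

.. code-block:: c

   #include <stdint.h>
   #include <stddef.h>
   #include <string.h>

   #define PAGE_SIZE   4096U
   #define POOL_PAGES  256U     /* illustrative pool size */

   static uint8_t  pool[POOL_PAGES][PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
   static uint64_t bitmap[POOL_PAGES / 64U];   /* 1 bit per page, 1 = in use */

   /* Allocate one zeroed 4KB page from the pool, or NULL when exhausted. */
   static void *page_alloc(void)
   {
       for (uint32_t i = 0U; i < POOL_PAGES; i++) {
           if ((bitmap[i / 64U] & (1UL << (i % 64U))) == 0UL) {
               bitmap[i / 64U] |= (1UL << (i % 64U));
               (void)memset(pool[i], 0, PAGE_SIZE);
               return pool[i];
           }
       }
       return NULL;
   }

   /* Return a page previously obtained from page_alloc(). */
   static void page_free(void *page)
   {
       uint32_t i = (uint32_t)((uint8_t (*)[PAGE_SIZE])page - pool);

       bitmap[i / 64U] &= ~(1UL << (i % 64U));
   }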
Data Flow Design
================
The physical memory management unit provides services to the other units
for creating and updating the MMU 4-level page tables, switching MMU page
tables, enabling SMEP, and retrieving HPA/HVA mappings.
:numref:`mem-data-flow-physical` shows the data flow diagram
of physical memory management.
.. figure:: images/mem-image45.png
   :align: center
   :name: mem-data-flow-physical

   Data Flow of Hypervisor Physical Memory Management
Data Structure Design
=====================
The page tables operation type:

.. code-block:: c

   enum _page_table_type {
           PTT_HOST = 0,            /* Operations for MMU page tables */
           PTT_EPT = 1,             /* Operations for EPT page tables */
           PAGETABLE_TYPE_UNKNOWN,  /* Page tables operation type is unknown */
   };
Interfaces Design
=================
MMU Initialization
------------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - void enable_smep(void)
     - Supervisor-mode execution prevention (SMEP) enable
   * - void enable_paging(uint64_t pml64_base_addr)
     - MMU paging enable
   * - void init_paging(void)
     - MMU page tables initialization
   * - uint64_t get_paging_pml4(void)
     - Page map level 4 (PML4) table start address getting
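Read together, the interfaces above form a small initialization sequence. The
glue code below is a hypothetical sketch of how the BSP and an AP might call
them, matching the BSP/AP description earlier in this section; it is not the
hypervisor's actual boot code.

.. code-block:: c

   #include <stdint.h>

   /* Interfaces from the table above (declarations only). */
   void     enable_smep(void);
   void     enable_paging(uint64_t pml4_base_addr);
   void     init_paging(void);
   uint64_t get_paging_pml4(void);

   /* BSP: build the MMU page tables once, then turn on paging and SMEP. */
   static void bsp_mmu_init(void)
   {
       init_paging();                     /* create page tables from the E820 table */
       enable_paging(get_paging_pml4());  /* switch to the new PML4 */
       enable_smep();                     /* supervisor-mode execution prevention */
   }

   /* AP: reuse the page tables already created by the BSP. */
   static void ap_mmu_init(void)
   {
       enable_paging(get_paging_pml4());
       enable_smep();
   }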
Page Allocation
---------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - void \* alloc_paging_struct(void)
     - Allocate one page from memory page pool
Address Space Translation
-------------------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - HPA2HVA(x)
     - Translate host-physical address to host-virtual address
   * - HVA2HPA(x)
     - Translate host-virtual address to host-physical address
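Because the hypervisor currently uses identical (1:1) mapping for its own
address space, these translation macros can be trivial. The definitions below
are only a sketch of what identity-mapping macros could look like; the real
definitions in the source tree may differ.

.. code-block:: c

   #include <stdint.h>

   /* With identical mapping, HPA and HVA have the same numeric value,
    * so translation reduces to a cast (illustrative definitions only). */
   #define HPA2HVA(x)  ((void *)(uint64_t)(x))
   #define HVA2HPA(x)  ((uint64_t)(x))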
Hypervisor Memory Virtualization
********************************
The hypervisor provides a contiguous region of physical memory for the SOS
and each UOS. It also guarantees that the SOS and UOS cannot access
code and internal data in the hypervisor, and each UOS cannot access
code and internal data of the SOS and other UOSs.

The hypervisor:

- enables EPT and VPID hardware virtualization features,
- establishes EPT page tables for SOS/UOS,
- provides EPT page tables operations services,
- virtualizes MTRR for SOS/UOS,
- provides VPID operations services,
- provides services for address space translation between GPA and HPA, and
- provides services for data transfer between the hypervisor and virtual machines.
Memory Virtualization Capability Checking
=========================================
In the hypervisor, memory virtualization provides an EPT/VPID capability
checking service and an EPT hugepage support checking service. Before the HV
enables memory virtualization and uses EPT hugepages, these services need
to be invoked by other units.
Data Transfer between Different Address Spaces
==============================================
In ACRN, different memory space management is used in the hypervisor,
Service OS, and User OS to achieve spatial isolation. Between memory
spaces, there are different kinds of data transfer, such as when a SOS/UOS
issues a hypercall to request hypervisor services that include data
transfer, or when the hypervisor does instruction emulation: the HV
needs to access the guest instruction pointer register to fetch guest
instruction data.
Access GPA from Hypervisor
--------------------------
When the hypervisor needs to access GPA for data transfer, the caller from the
guest must make sure this memory range's GPA is contiguous. But its HPA in the
hypervisor could be discontiguous (especially for a UOS using the hugetlb
allocation mechanism). For example, a 4MB GPA range may map to 2
different 2MB huge host-physical pages. The ACRN hypervisor must take
care of this kind of data transfer by doing EPT page walking based on
its HPA.
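A sketch of that per-page walk is shown below. It assumes a gpa2hpa()
translation service and the HPA2HVA() identity-mapping macro described
earlier; each copy chunk is bounded so it never crosses a 4KB guest page,
because the next page may live in a different host hugepage.

.. code-block:: c

   #include <stdint.h>
   #include <string.h>

   #define PAGE_SIZE  4096UL
   #define HPA2HVA(x) ((void *)(uint64_t)(x))

   struct vm;                                    /* opaque VM handle */
   uint64_t gpa2hpa(const struct vm *vm, uint64_t gpa);

   /* Copy 'size' bytes from a guest-physical range into a hypervisor buffer.
    * The GPA range is contiguous, but every 4KB page may map to a different HPA. */
   static void copy_from_gpa_sketch(const struct vm *vm, void *h_ptr,
                                    uint64_t gpa, uint64_t size)
   {
       uint8_t *dst = h_ptr;

       while (size > 0UL) {
           uint64_t off   = gpa & (PAGE_SIZE - 1UL);
           uint64_t chunk = PAGE_SIZE - off;     /* stay within one guest page */

           if (chunk > size) {
               chunk = size;
           }
           (void)memcpy(dst, HPA2HVA(gpa2hpa(vm, gpa)), chunk);
           gpa  += chunk;
           dst  += chunk;
           size -= chunk;
       }
   }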
Access GVA from Hypervisor
--------------------------
When the hypervisor needs to access GVA for data transfer, it is likely that
both the GPA and the HPA are discontiguous. The ACRN hypervisor must
watch for this kind of data transfer, and handle it by doing page
walking based on both its GPA and HPA.
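The GVA case adds one more translation step. The sketch below assumes a
hypothetical gva2gpa() helper that walks the guest's MMU page tables
(returning a non-zero error code on a fault) and then reuses the
copy_from_gpa() interface that appears in the API tables later in this
section; the error-handling convention is assumed.

.. code-block:: c

   #include <stdint.h>

   #define PAGE_SIZE  4096UL

   struct vm;
   struct vcpu;

   /* Assumed guest page walker: translate one GVA to a GPA, or return a
    * non-zero error code and report the faulting address. */
   int gva2gpa(struct vcpu *vcpu, uint64_t gva, uint64_t *gpa,
               uint32_t *err_code, uint64_t *fault_addr);

   /* Interface listed in the API tables below. */
   int copy_from_gpa(const struct vm *vm, void *h_ptr, uint64_t gpa, uint32_t size);

   static int copy_from_gva_sketch(struct vcpu *vcpu, const struct vm *vm,
                                   void *h_ptr, uint64_t gva, uint64_t size,
                                   uint32_t *err_code, uint64_t *fault_addr)
   {
       uint8_t *dst = h_ptr;

       while (size > 0UL) {
           uint64_t gpa;
           uint64_t off   = gva & (PAGE_SIZE - 1UL);
           uint64_t chunk = PAGE_SIZE - off;

           if (chunk > size) {
               chunk = size;
           }
           if (gva2gpa(vcpu, gva, &gpa, err_code, fault_addr) != 0) {
               return -1;    /* guest page walk faulted */
           }
           if (copy_from_gpa(vm, dst, gpa, (uint32_t)chunk) != 0) {
               return -1;
           }
           gva  += chunk;
           dst  += chunk;
           size -= chunk;
       }
       return 0;
   }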
EPT Page Tables Operations
==========================
The hypervisor should use a minimum number of memory pages to map from
guest-physical address (GPA) space into host-physical address (HPA)
space:

- If a 1GB hugepage can be used for the GPA space mapping, the
  corresponding EPT PDPT entry shall be set for this 1GB hugepage.
- If a 1GB hugepage can't be used for the GPA space mapping and a 2MB
  hugepage can be used, the corresponding EPT PDT entry shall be set for
  this 2MB hugepage.
- If neither a 1GB hugepage nor a 2MB hugepage can be used for the GPA
  space mapping, the corresponding EPT PT entry shall be set.

If the memory type or access rights of a page are updated, or some GPA space
is deleted, the corresponding EPT page will be split. The hypervisor should
still use the minimum number of EPT pages to map from GPA space into HPA
space.
The hypervisor provides services for adding, modifying, and deleting EPT
guest-physical mappings, for deallocating EPT page tables, and for
invalidating EPT guest-physical mappings.
Virtual MTRR
============
In ACRN, the hypervisor only virtualizes the MTRRs' fixed range (0~1MB).
The HV sets the fixed-range MTRRs as Write-Back for a UOS, and the SOS reads
the native fixed-range MTRRs set by the BIOS.

If the guest physical address is not in the fixed range (0~1MB), the
hypervisor uses the default memory type in the MTRR (Write-Back).

When the guest disables MTRRs, the HV sets the guest address memory type
as UC.

If the guest physical address is in the fixed range (0~1MB), the HV sets
the memory type according to the fixed virtual MTRRs.
When the guest enables MTRRs, the physical MTRRs have no effect on the memory
type used for accesses to GPA; the memory type is taken from the EPT. The HV
therefore intercepts MTRR MSR register accesses through MSR-access VM exits
and updates the memory type field in the EPT PTEs according to the memory
type selected by the MTRRs. This combines with the PAT entry in the PAT MSR
(which is determined by the PAT, PCD, and PWT bits from the guest paging
structures) to determine the effective memory type.
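As a simplified illustration of that flow, the sketch below shows a
hypothetical intercept for a write to the virtual IA32_MTRR_DEF_TYPE MSR that
pushes the selected default type into the EPT entries through an assumed
ept_update_mt() helper. The structure, the helper, and the fixed-range
handling (omitted here) are illustrative only.

.. code-block:: c

   #include <stdint.h>

   #define MSR_IA32_MTRR_DEF_TYPE  0x2FFU
   #define MTRR_ENABLE             (1UL << 11U)  /* "E" (MTRR enable) bit */

   /* EPT memory type encodings from the data-structure section above. */
   #define EPT_MT_SHIFT  3U
   #define EPT_UNCACHED  (0UL << EPT_MT_SHIFT)
   #define EPT_WB        (6UL << EPT_MT_SHIFT)

   struct vcpu;                                  /* opaque vCPU handle */

   /* Assumed helper: rewrite the memory-type bits of the EPT entries
    * covering [gpa, gpa + size) for this vCPU's VM. */
   void ept_update_mt(struct vcpu *vcpu, uint64_t gpa, uint64_t size, uint64_t mt);

   struct vmtrr {
       uint64_t def_type;                        /* virtual IA32_MTRR_DEF_TYPE */
   };

   /* When the guest disables its MTRRs, all guest memory becomes UC in the
    * EPT; when it re-enables them, the default type (WB) is restored and the
    * fixed-range types for GPA 0~1MB would be re-applied (omitted here). */
   static void vmtrr_wrmsr_def_type(struct vcpu *vcpu, struct vmtrr *vmtrr,
                                    uint64_t value, uint64_t guest_mem_size)
   {
       vmtrr->def_type = value;

       if ((value & MTRR_ENABLE) == 0UL) {
           ept_update_mt(vcpu, 0UL, guest_mem_size, EPT_UNCACHED);
       } else {
           ept_update_mt(vcpu, 0UL, guest_mem_size, EPT_WB);
       }
   }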
VPID operations
===============
Virtual-processor identifier (VPID) is a hardware feature to optimize
TLB management. When VPID is enabled, hardware adds a tag to the TLB entries
of a logical processor and can cache information for multiple linear-address
spaces. VMX transitions may then retain cached information when the logical
processor switches to a different address space, avoiding unnecessary
TLB flushes.
In ACRN, a unique VPID must be allocated for each virtual CPU
when a virtual CPU is created. The logical processor invalidates linear
mappings and combined mappings associated with all VPIDs (except VPID
0000H), and with all PCIDs, when the logical processor launches the virtual
CPU. The logical processor invalidates all linear mappings and combined
mappings associated with the specified VPID when the pending-interrupt
request handling needs to invalidate the cached mappings of the specified
VPID.
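As a small illustration of the allocation policy described above, the sketch
below hands out VPIDs starting from 1 so that VPID 0000H is never assigned to
a guest vCPU; a real allocator would use an atomic increment and handle
exhaustion, and the function name is illustrative.

.. code-block:: c

   #include <stdint.h>

   /* VPID 0000H is reserved, so guest vCPUs get IDs starting at 1.
    * Illustrative only: a real allocator must be atomic and must
    * handle running out of the 16-bit VPID space. */
   static uint16_t next_vpid = 1U;

   static uint16_t allocate_vpid_sketch(void)
   {
       return next_vpid++;
   }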
Data Flow Design
================
The memory virtualization unit includes address space translation
functions, data transfer functions, VM EPT operations functions,
VPID operations functions, VM exit handling for EPT violation and EPT
misconfiguration, and MTRR virtualization functions. This unit handles
guest-physical mapping updates by creating or updating related EPT page
tables. It virtualizes MTRR for the guest OS by updating related EPT page
tables. It handles address translation from GPA to HPA by walking EPT
page tables. It copies data from a VM into the HV or from the HV to a VM by
walking guest MMU page tables and EPT page tables. It provides services
to allocate a VPID for each virtual CPU and TLB invalidation related to VPID.
It handles VM exits for EPT violation and EPT misconfiguration.
:numref:`mem-flow-mem-virt` below shows the data flow diagram of
the memory virtualization unit.
.. figure:: images/mem-image84.png
   :align: center
   :name: mem-flow-mem-virt

   Data Flow of Hypervisor Memory Virtualization
Data Structure Design
=====================
EPT Memory Type Data Definition:

.. code-block:: c

   /* EPT memory type is specified in bits 5:3 of the last EPT
    * paging-structure entry */
   #define EPT_MT_SHIFT    3U

   /* EPT memory type is uncacheable */
   #define EPT_UNCACHED    (0UL << EPT_MT_SHIFT)
   /* EPT memory type is write combining */
   #define EPT_WC          (1UL << EPT_MT_SHIFT)
   /* EPT memory type is write through */
   #define EPT_WT          (4UL << EPT_MT_SHIFT)
   /* EPT memory type is write protected */
   #define EPT_WP          (5UL << EPT_MT_SHIFT)
   /* EPT memory type is write back */
   #define EPT_WB          (6UL << EPT_MT_SHIFT)
EPT Memory Access Right Definition:

.. code-block:: c

   /* EPT memory access right is read-only */
   #define EPT_RD          (1UL << 0U)
   /* EPT memory access right is read/write */
   #define EPT_WR          (1UL << 1U)
   /* EPT memory access right is executable */
   #define EPT_EXE         (1UL << 2U)
   /* EPT memory access right is read/write and executable */
   #define EPT_RWX         (EPT_RD | EPT_WR | EPT_EXE)
Interfaces Design
=================
The memory virtualization unit interacts with external units through VM
exit and APIs.
VM Exit about EPT
=================
There are two VM exit handlers for EPT violation and EPT
misconfiguration in the hypervisor. EPT page tables are
always configured correctly for SOS and UOS. If an EPT misconfiguration is
detected, a fatal error is reported by the HV. The hypervisor
uses EPT violations to intercept MMIO accesses and do device emulation. EPT
violation handling data flow is described in
:ref:`instruction-emulation`.
Memory Virtualization APIs
==========================
Here is a list of major memory related APIs in HV:
EPT/VPID Capability Checking
----------------------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - int check_vmx_mmu_cap(void)
     - EPT and VPID capability checking
1GB Hugepage Supporting Checking
--------------------------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - bool check_mmu_1gb_support(enum _page_table_type page_table_type)
     - 1GB page supporting capability checking
Data Transferring between hypervisor and VM
-------------------------------------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - int copy_from_gpa(const struct vm \*vm, void \*h_ptr, uint64_t gpa, uint32_t size)
     - Copy data from VM GPA space to HV address space
   * - int copy_to_gpa(const struct vm \*vm, void \*h_ptr, uint64_t gpa, uint32_t size)
     - Copy data from HV address space to VM GPA space
   * - int copy_from_gva(struct vcpu \*vcpu, void \*h_ptr, uint64_t gva,
       uint32_t size, uint32_t \*err_code, uint64_t \*fault_addr)
     - Copy data from VM GVA space to HV address space
   * - int copy_to_gva(struct vcpu \*vcpu, void \*h_ptr, uint64_t gva,
       uint32_t size, uint32_t \*err_code, uint64_t \*fault_addr)
     - Copy data from HV address space to VM GVA space
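As a usage illustration, a hypercall handler that receives a GPA pointing at a
guest parameter block could pull the block into hypervisor memory with
copy_from_gpa() before validating it. Only the copy_from_gpa() signature comes
from the table above; the handler, the parameter structure, and the error
convention below are hypothetical.

.. code-block:: c

   #include <stdint.h>

   struct vm;

   /* Signature from the table above. */
   int copy_from_gpa(const struct vm *vm, void *h_ptr, uint64_t gpa, uint32_t size);

   /* Hypothetical hypercall parameter block passed by the guest via GPA. */
   struct hc_dummy_param {
       uint64_t field_a;
       uint64_t field_b;
   };

   static int handle_dummy_hypercall(const struct vm *vm, uint64_t param_gpa)
   {
       struct hc_dummy_param param;

       /* Copy the guest's parameter block into hypervisor memory first,
        * then validate and use only the local copy. */
       if (copy_from_gpa(vm, &param, param_gpa, (uint32_t)sizeof(param)) != 0) {
           return -1;
       }
       /* ... validate param.field_a / param.field_b and act on them ... */
       return 0;
   }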
Address Space Translation
-------------------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - uint64_t gpa2hpa(const struct vm \*vm, uint64_t gpa)
     - Translating from guest-physical address to host-physical address
   * - uint64_t hpa2gpa(const struct vm \*vm, uint64_t hpa)
     - Translating from host-physical address to guest-physical address
   * - bool check_continuous_hpa(struct vm \*vm, uint64_t gpa_arg, uint64_t size_arg)
     - Host-physical address continuous checking
EPT
---
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - int ept_mr_add(const struct vm \*vm, uint64_t hpa_arg, uint64_t gpa_arg,
       uint64_t size, uint32_t prot_arg)
     - Guest-physical memory region mapping
   * - int ept_mr_del(const struct vm \*vm, uint64_t \*pml4_page, uint64_t gpa,
       uint64_t size)
     - Guest-physical memory region unmapping
   * - int ept_mr_modify(const struct vm \*vm, uint64_t \*pml4_page, uint64_t gpa,
       uint64_t size, uint64_t prot_set, uint64_t prot_clr)
     - Guest-physical memory page access right or memory type updating
   * - void destroy_ept(struct vm \*vm)
     - EPT page tables destroy
   * - void free_ept_mem(void \*pml4_addr)
     - EPT page tables free
   * - void invept(struct vcpu \*vcpu)
     - Guest-physical mappings and combined mappings invalidation
   * - int ept_violation_vmexit_handler(struct vcpu \*vcpu)
     - EPT violation handling
   * - int ept_misconfig_vmexit_handler(__unused struct vcpu \*vcpu)
     - EPT misconfiguration handling
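For example, mapping a guest-physical RAM region with write-back caching and
full access rights could combine the EPT property bits from the
data-structure section with ept_mr_add(). The wrapper below is hypothetical;
only the ept_mr_add() signature and the EPT_* bits come from this document.

.. code-block:: c

   #include <stdint.h>

   struct vm;

   /* Signature from the table above. */
   int ept_mr_add(const struct vm *vm, uint64_t hpa, uint64_t gpa,
                  uint64_t size, uint32_t prot);

   /* EPT property bits from the data-structure section. */
   #define EPT_MT_SHIFT  3U
   #define EPT_WB        (6UL << EPT_MT_SHIFT)
   #define EPT_RD        (1UL << 0U)
   #define EPT_WR        (1UL << 1U)
   #define EPT_EXE       (1UL << 2U)
   #define EPT_RWX       (EPT_RD | EPT_WR | EPT_EXE)

   #define MEM_2M        0x200000UL

   /* Map one 2MB guest RAM region as cacheable (write-back) and
    * readable/writable/executable. */
   static int map_guest_ram_2m(const struct vm *vm, uint64_t hpa, uint64_t gpa)
   {
       return ept_mr_add(vm, hpa, gpa, MEM_2M, (uint32_t)(EPT_WB | EPT_RWX));
   }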
Virtual MTRR
------------
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - void init_mtrr(struct vcpu \*vcpu)
     - Virtual MTRR initialization
   * - void mtrr_wrmsr(struct vcpu \*vcpu, uint32_t msr, uint64_t value)
     - Virtual MTRR MSR write
   * - uint64_t mtrr_rdmsr(struct vcpu \*vcpu, uint32_t msr)
     - Virtual MTRR MSR read
VPID
----
.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - APIs
     - Description
   * - uint16_t allocate_vpid(void)
     - VPID allocation
   * - void flush_vpid_single(uint16_t vpid)
     - Specified VPID flush
   * - void flush_vpid_global(void)
     - All VPID flush
Service OS Memory Management
****************************
@@ -132,9 +620,8 @@ Host to Guest Mapping
=====================
ACRN hypervisor creates Service OS's host (HPA) to guest (GPA) mapping
(EPT mapping) through the function ``prepare_vm0_memmap_and_e820()``
when it creates the SOS VM. It follows these rules:
- Identical mapping
- Map all memory range with UNCACHED type
@@ -148,101 +635,14 @@ can access its MMIO through this static mapping. EPT violation is only
serving for vLAPIC/vIOAPIC's emulation in the hypervisor for Service OS
VM.
User OS Memory Management
*************************
User OS VM is created by the DM (Device Model) application running in
the Service OS. DM is responsible for the memory allocation for a User
or Guest OS VM.
Guest Physical Memory Layout - E820
===================================
DM will create the E820 table for a User OS VM based on these simple
rules (see the sketch after this list):

- If requested VM memory size < low memory limitation (defined in DM,
  as 2GB), then low memory range = [0, requested VM memory size]
- If requested VM memory size > low memory limitation (defined in DM,
  as 2GB), then low memory range = [0, 2GB], high memory range = [4GB,
  4GB + requested VM memory size - 2GB]
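The two rules above can be expressed compactly as a calculation. The sketch
below hard-codes the 2GB low-memory limit from the description and places the
high range at 4GB; the structure and function names are illustrative, not the
DM's actual E820 construction code.

.. code-block:: c

   #include <stdint.h>

   #define GB            (1024UL * 1024UL * 1024UL)
   #define LOWMEM_LIMIT  (2UL * GB)   /* low memory limit defined in DM */
   #define HIGHMEM_BASE  (4UL * GB)   /* high memory range starts at 4GB */

   struct uos_mem_ranges {
       uint64_t lowmem_size;          /* low range:  [0, lowmem_size) */
       uint64_t highmem_size;         /* high range: [HIGHMEM_BASE, HIGHMEM_BASE + highmem_size) */
   };

   static struct uos_mem_ranges split_uos_memory(uint64_t requested_size)
   {
       struct uos_mem_ranges r;

       if (requested_size <= LOWMEM_LIMIT) {
           r.lowmem_size  = requested_size;
           r.highmem_size = 0UL;
       } else {
           r.lowmem_size  = LOWMEM_LIMIT;
           r.highmem_size = requested_size - LOWMEM_LIMIT;
       }
       return r;
   }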
.. figure:: images/mem-image6.png
   :align: center
   :width: 900px
   :name: uos-mem-layout

   UOS Physical Memory Layout
By default, DM allocates UOS memory based on the hugeTLB mechanism.
The real memory mapping
may be scattered in SOS physical memory space, as shown below:
.. figure:: images/mem-image5.png
   :align: center
   :width: 900px
   :name: uos-mem-layout-hugetlb

   UOS Physical Memory Layout Based on Hugetlb
Host to Guest Mapping
=====================
A User OS VM's memory is allocated by the Service OS DM application, and
may come from different huge pages in the Service OS as shown in
:ref:`uos-mem-layout-hugetlb`.

As the Service OS has the full information about these huge pages (size,
SOS-GPA, and UOS-GPA), it works with the hypervisor to complete the UOS's
host to guest mapping using this pseudo code:
.. code-block:: c

   for x in allocated huge pages do
      x.hpa = gpa2hpa_for_sos(x.sos_gpa)
      host2guest_map_for_uos(x.hpa, x.uos_gpa, x.size)
   end
Trusty
******
For an Android User OS, there is a secure world named trusty world
support, whose memory must be secured by the ACRN hypervisor and
must not be accessible by SOS and UOS normal world.
.. figure:: images/mem-image18.png
   :align: center
   :name: uos-mem-layout-trusty

   UOS Physical Memory Layout with Trusty
Memory Interaction
******************
Previous sections described different memory spaces management in the
ACRN hypervisor, Service OS, and User OS. Among these memory spaces,
there are different kinds of interaction, for example, a VM may do a
hypercall to the hypervisor that includes a data transfer, or an
instruction emulation in the hypervisor may need to access the Guest
instruction pointer register to fetch instruction data.
Access GPA from Hypervisor
==========================
When the hypervisor needs access to the GPA for data transfers, the caller
from the Guest must make sure this memory range's GPA is
contiguous. But for HPA in the hypervisor, it could be
discontiguous (especially for a UOS using the hugetlb allocation mechanism).
For example, a 4MB GPA range may map to 2 different 2MB huge pages. The
ACRN hypervisor needs to take care of this kind of data transfer by
doing EPT page walking based on its HPA.
Access GVA from Hypervisor
==========================
Likewise, when the hypervisor needs to access GVA for data transfer, both the
GPA and the HPA could be discontiguous. The ACRN hypervisor must pay
attention to this kind of data transfer, and handle it by doing page
walking based on both its GPA and HPA.