doc: add partition mode hld
Partition mode HLD content added to hypervisor hld. Add target links in
referenced docs.

Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
@ -16,6 +16,7 @@ Hypervisor high-level design
   Virtual Interrupt <hv-virt-interrupt>
   VT-d <hv-vt-d>
   Device Passthrough <hv-dev-passthrough>
   hv-partitionmode
   Power Management <hv-pm>
   Console, Shell, and vUART <hv-console>
   Hypercall / VHM upcall <hv-hypercall>
@ -1,8 +1,10 @@
.. _hv-console:
.. _hv-console-shell-uart:

Hypervisor console, hypervisor shell, and virtual UART
######################################################

.. _hv-console:

Hypervisor console
******************

367
doc/developer-guides/hld/hv-partitionmode.rst
Normal file
@ -0,0 +1,367 @@
.. _partition-mode-hld:

Partition mode
##############

ACRN is a type-1 hypervisor that supports running multiple guest operating
systems (OS). Typically, the platform BIOS/boot-loader boots ACRN, and
ACRN loads one or more guest OSes. Refer to :ref:`hv-startup` for
details on the start-up flow of the ACRN hypervisor.

ACRN supports two modes of operation: Sharing mode and Partition mode.
This document describes ACRN's high-level design for Partition mode
support.

.. contents::
   :depth: 2
   :local:

Introduction
************

In partition mode, ACRN provides guests with exclusive access to cores,
memory, cache, and peripheral devices. Partition mode lets developers
dedicate resources exclusively to individual guests. However, there is no
support today in x86 hardware or in ACRN for partitioning resources such
as peripheral buses (e.g., PCI) or memory bandwidth. Cache partitioning
technology, such as Cache Allocation Technology (CAT) in x86, can be
used by developers to partition Last Level Cache (LLC) among the guests.
(Note: ACRN support for x86 CAT is on the roadmap but is not currently
available.)

ACRN expects static partitioning of resources, either by modifying the
code for guest configuration or through compile-time config options. All
the devices exposed to the guests are either physical resources or
emulated in the hypervisor, so there is no need for a device model or a
Service OS. :numref:`pmode2vms` shows a partition mode example of two VMs
with exclusive access to physical resources.

.. figure:: images/partition-image3.png
   :align: center
   :name: pmode2vms

   Partition Mode example with two VMs

Guest info
**********

ACRN uses the multi-boot info passed from the platform boot-loader to
locate each guest kernel in memory. ACRN copies each guest kernel into
that guest's memory. The current implementation of ACRN requires
developers to specify kernel parameters for the guests as part of the
guest configuration. ACRN picks up the kernel parameters from the guest
configuration and copies them to the corresponding guest memory.

.. figure:: images/partition-image18.png
   :align: center

ACRN set-up for guests
**********************

Cores
=====

ACRN requires the developer to specify the number of guests and the
cores dedicated to each guest. The developer also needs to specify the
physical core used as the Bootstrap Processor (BSP) for each guest. As
the hypervisor brings up each processor, it checks whether that processor
is configured as the BSP for any of the guests. If it is, ACRN builds the
memory mapping, mptable, E820 entries, and zero page for that guest. As
described in `Guest info`_, ACRN copies the guest kernel and kernel
parameters into guest memory. :numref:`partBSPsetup` shows these events
in chronological order.

.. figure:: images/partition-image7.png
   :align: center
   :name: partBSPsetup
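
The BSP check described above can be pictured with a short Python sketch.
It is illustrative only and is not ACRN's C implementation: the
``GuestConfig`` class, its field names, and the example core assignments
are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuestConfig:
    """Hypothetical static description of one partition mode guest."""
    vmid: int
    pcpus: tuple   # physical core IDs dedicated to this guest
    bsp_pcpu: int  # physical core that acts as the guest's BSP


def validate(guests):
    """Check that cores are dedicated and each BSP belongs to its guest."""
    seen = set()
    for guest in guests:
        if guest.bsp_pcpu not in guest.pcpus:
            raise ValueError(f"VM{guest.vmid}: BSP is not one of its cores")
        if seen & set(guest.pcpus):
            raise ValueError(f"VM{guest.vmid}: core shared with another guest")
        seen |= set(guest.pcpus)


def guest_to_set_up(pcpu_id, guests):
    """Return the guest whose BSP is pcpu_id, or None for a non-BSP core."""
    for guest in guests:
        if guest.bsp_pcpu == pcpu_id:
            return guest
    return None


# Example: two guests, two cores each (values are made up).
GUESTS = (GuestConfig(0, (0, 2), 0), GuestConfig(1, (1, 3), 3))
validate(GUESTS)
```

In this sketch, only a core for which ``guest_to_set_up`` returns a guest
would go on to build that guest's memory mapping, mptable, E820 entries,
and zero page.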

Memory
======

For each guest in partition mode, the ACRN developer specifies, in the
guest configuration, the size of the guest's memory and its starting
address in the host physical address space. There is no support for
HIGHMEM for partition mode guests. The developer needs to take care of
two aspects when assigning host memory to the guests:

1) The sum of the guest PCI hole and the guest "System RAM" must be less
   than 4GB.

2) The starting host physical address and the size must be chosen so
   that the range does not overlap with any reserved regions in the
   host E820.

ACRN creates the EPT mapping for the guest between GPA (0, memory size)
and HPA (starting address in guest configuration, memory size).
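
As a rough illustration of the two rules and the linear GPA-to-HPA
mapping above, consider the following Python sketch. The function names,
the half-open ``(start, end)`` representation of reserved regions, and
the example addresses are assumptions for this sketch, not ACRN code.

```python
FOUR_GB = 1 << 32


def check_guest_memory(hpa_start, size, pci_hole_size, host_reserved):
    """Apply the two rules for assigning host memory to a guest.

    host_reserved is an iterable of half-open (start, end) ranges taken
    from the reserved entries of the host E820.
    """
    # Rule 1: guest PCI hole plus guest "System RAM" must be below 4GB.
    if size + pci_hole_size >= FOUR_GB:
        return False
    # Rule 2: the host range must not overlap any reserved region.
    hpa_end = hpa_start + size
    return all(hpa_end <= r_start or r_end <= hpa_start
               for r_start, r_end in host_reserved)


def gpa_to_hpa(gpa, hpa_start, size):
    """Linear mapping: GPA [0, size) -> HPA [hpa_start, hpa_start + size)."""
    if not 0 <= gpa < size:
        raise ValueError("GPA outside guest memory")
    return hpa_start + gpa
```

For example, a 512MB guest placed at host address ``0x100000000`` maps
GPA ``0x1000`` to HPA ``0x100001000``.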

E820 and zero page info
=======================

A default E820 is used for all the guests in partition mode. The table
below shows the reference E820 layout. The zero page is created with this
E820 info for all the guests.

+------------------------+
| RAM                    |
|                        |
| 0 - 0xEFFFF            |
+------------------------+
| RESERVED (MPTABLE)     |
|                        |
| 0xF0000 - 0x100000     |
+------------------------+
| RAM                    |
|                        |
| 0x100000 - LOWMEM      |
+------------------------+
| RESERVED               |
+------------------------+
| PCI HOLE               |
+------------------------+
| RESERVED               |
+------------------------+

Platform info - mptable
=======================

ACRN, in partition mode, uses the mptable to convey platform info to each
guest. ACRN builds the mptable from this platform information, the number
of cores used for each guest, and whether the guest needs devices with
INTx, and copies it to the guest memory. In partition mode, ACRN passes
physical APIC IDs to the guests.

I/O - Virtual devices
=====================

Port I/O is supported for PCI device config space 0xcfc and 0xcf8, vUART
0x3f8, vRTC 0x70 and 0x71, and vPIC ranges 0x20/21, 0xa0/a1, and
0x4d0/4d1. MMIO is supported for vIOAPIC. ACRN exposes a virtual
host bridge at BDF (Bus Device Function) 0:0.0 to each guest. ACRN
emulates access to the 256 bytes of config space of the virtual host
bridge.

I/O - Pass-thru devices
=======================

ACRN, in partition mode, supports passing thru PCI devices on the
platform. All the pass-thru devices are exposed as child devices under
the virtual host bridge. ACRN supports neither passing thru bridges nor
emulating virtual bridges. Pass-thru devices should be statically
allocated to each guest using the guest configuration. ACRN expects the
developer to provide, for all the pass-thru devices, the mapping from
virtual BDF to the BDF of the physical device as
part of each guest configuration.
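
A virtual-to-physical BDF mapping of this kind might look like the
following Python sketch. The table name ``VM0_PT_DEVS``, the helper
functions, and the device addresses are hypothetical; only the
conventional bus/device/function bit packing (8/5/3 bits) is standard.

```python
def bdf(bus, dev, func):
    """Pack bus/device/function into the conventional 16-bit BDF value."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (dev << 3) | func


# Hypothetical guest configuration: virtual BDF -> physical BDF.
# Virtual 0:0.0 is the emulated host bridge, so it is not listed here.
VM0_PT_DEVS = {
    bdf(0, 1, 0): bdf(0, 0x14, 0),
    bdf(0, 2, 0): bdf(2, 0, 0),
}


def lookup_pbdf(pt_devs, vbdf):
    """Return the physical BDF behind a virtual BDF, or None if unassigned."""
    return pt_devs.get(vbdf)
```

A lookup that returns ``None`` corresponds to a virtual BDF with no
device behind it in that guest.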

Run-time ACRN support for guests
********************************

ACRN, in partition mode, supports an option to pass-thru the LAPIC of the
physical CPUs to the guest. ACRN expects developers to specify whether
the guest needs LAPIC pass-thru using the guest configuration. When the
guest configures the vLAPIC as x2APIC, and the guest configuration has
LAPIC pass-thru enabled, ACRN passes the LAPIC to the guest. The guest
can then access the LAPIC hardware directly without hypervisor
interception. During runtime of the guest, this option determines how
ACRN supports inter-processor interrupt handling and device interrupt
handling. The corresponding sections below discuss this in detail.

.. figure:: images/partition-image16.png
   :align: center
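
The two conditions for LAPIC pass-thru can be written as a small
predicate. This Python sketch assumes the guest's x2APIC state is read
from its virtual ``IA32_APIC_BASE`` value, where bit 11 is the APIC
global enable and bit 10 is the x2APIC enable; the function name and
parameters are hypothetical.

```python
APIC_BASE_X2APIC_ENABLE = 1 << 10  # IA32_APIC_BASE.EXTD
APIC_BASE_GLOBAL_ENABLE = 1 << 11  # IA32_APIC_BASE.EN


def lapic_passthru_active(cfg_lapic_pt, vlapic_apic_base):
    """True only if the guest config enables pass-thru AND the guest has
    switched its vLAPIC to x2APIC mode."""
    x2apic_bits = APIC_BASE_X2APIC_ENABLE | APIC_BASE_GLOBAL_ENABLE
    in_x2apic = (vlapic_apic_base & x2apic_bits) == x2apic_bits
    return bool(cfg_lapic_pt) and in_x2apic
```

Until both conditions hold, the guest in this sketch is treated like a
guest without LAPIC pass-thru.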

Guest SMP boot flow
===================

The core APIC IDs are reported to the guest using mptable info. The SMP
boot flow is similar to sharing mode. Refer to :ref:`vm-startup`
for the guest SMP boot flow in ACRN. Partition mode guest startup is the
same as the SOS startup in sharing mode.

Inter-processor Interrupt (IPI) Handling
========================================

Guests w/o LAPIC pass-thru
--------------------------

For guests without LAPIC pass-thru, IPIs between guest CPUs are handled in
the same way as in sharing mode of ACRN. Refer to
:ref:`virtual-interrupt-hld` for more details.

Guests w/ LAPIC pass-thru
-------------------------

ACRN supports pass-thru if and only if the guest is using x2APIC mode
for the vLAPIC. In LAPIC pass-thru mode, writes to the Interrupt Command
Register (ICR) x2APIC MSR are intercepted. The guest writes the IPI info,
including the vector and the destination APIC IDs, to the ICR. Upon an
IPI request from the guest, ACRN performs a sanity check on the
destination processors programmed into the ICR. If the destination is a
valid target for the guest, ACRN sends an IPI with the same vector from
the ICR to the physical CPUs corresponding to the destination processor
info in the ICR.

.. figure:: images/partition-image14.png
   :align: center
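
The sanity check on an intercepted ICR write can be sketched in Python as
follows. The x2APIC ICR layout used here (vector in bits 7:0, destination
in bits 63:32, MSR 0x830) is architectural; the function name, the
``guest_apic_ids`` set, and the restriction to a single physical
destination without shorthands are simplifying assumptions.

```python
X2APIC_ICR_MSR = 0x830  # architectural x2APIC ICR MSR index


def handle_icr_write(icr_value, guest_apic_ids):
    """Validate a guest ICR write and return (dest_apic_id, vector).

    Returns None when the destination is not one of the guest's own
    APIC IDs, in which case no physical IPI is sent.
    """
    vector = icr_value & 0xFF
    dest = (icr_value >> 32) & 0xFFFFFFFF
    if dest not in guest_apic_ids:
        return None
    # Same vector, sent to the physical CPU with that APIC ID. The IDs
    # are physical because the guest is given physical APIC IDs.
    return dest, vector
```

A guest that owns APIC IDs 0 and 2 can therefore target ID 2, while a
write naming ID 1 is dropped in this sketch.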

Pass-thru device support
========================

Configuration space access
--------------------------

ACRN emulates the Configuration Space Address (0xcf8) I/O port and the
Configuration Space Data (0xcfc) I/O port for guests to access the
configuration space of PCI devices. Within the config space of a device,
the Base Address Registers (BARs), at offsets 0x10 through 0x24, provide
the information about the resources (I/O and MMIO) used by the PCI
device. ACRN virtualizes the BARs and, for the rest of the config space,
forwards reads and writes to the physical config space of the
pass-thru devices. Refer to the `I/O`_ section below for more details.

.. figure:: images/partition-image1.png
   :align: center
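
The split between virtualized BARs and forwarded registers can be
illustrated by decoding a value written to port 0xcf8. The bit layout
below is the standard PCI configuration address format; the function
names and the returned labels are hypothetical.

```python
def decode_cf8(value):
    """Decode a 0xcf8 write into (bus, device, function, register).

    Returns None when the enable bit (bit 31) is clear.
    """
    if not value & 0x80000000:
        return None
    bus = (value >> 16) & 0xFF
    dev = (value >> 11) & 0x1F
    func = (value >> 8) & 0x7
    reg = value & 0xFC  # dword-aligned register offset
    return bus, dev, func, reg


def is_bar_offset(reg):
    """BARs occupy config space offsets 0x10 through 0x24."""
    return 0x10 <= reg <= 0x24


def route_config_access(cf8_value):
    """Decide how a following 0xcfc access would be handled."""
    decoded = decode_cf8(cf8_value)
    if decoded is None:
        return "ignored"
    return "virtual-bar" if is_bar_offset(decoded[3]) else "forward"
```

For instance, ``0x80000810`` selects register 0x10 (BAR0) of device
0:1.0 and is handled by the virtual BAR logic, while ``0x8000083C``
selects register 0x3C and is forwarded.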

DMA
---

ACRN developers need to statically define the pass-thru devices for each
guest using the guest configuration. For devices to DMA to/from guest
memory directly, ACRN parses the list of pass-thru devices for each
guest and creates context entries in the VT-d remapping hardware. The EPT
page tables created for the guest are also used as the VT-d page tables.

I/O
---

ACRN supports I/O for pass-thru devices with two restrictions.

1) Only MMIO is supported, so developers must expose I/O BARs as
   not present in the guest configuration.

2) Only the 32-bit MMIO BAR type is supported.

As the guest PCI sub-system scans the PCI bus and assigns a Guest Physical
Address (GPA) to the MMIO BAR, ACRN maps the GPA to the address in the
physical BAR of the pass-thru device using EPT. The following timeline
chart shows how PCI devices are assigned to the guest and how BARs are
mapped upon guest initialization.

.. figure:: images/partition-image13.png
   :align: center
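
The two restrictions and the GPA-to-physical-BAR mapping can be sketched
in Python. The BAR type bits (bit 0 for I/O space, bits 2:1 for the
memory type) follow the PCI specification; the function names, the
4KB page granularity, and the ``dict`` standing in for the EPT are
assumptions of this sketch.

```python
PAGE_SIZE = 0x1000


def classify_bar(raw):
    """Classify a raw BAR value by its low type bits."""
    if raw & 0x1:
        return "io"
    mem_type = (raw >> 1) & 0x3
    return {0b00: "mmio32", 0b10: "mmio64"}.get(mem_type, "reserved")


def bar_supported(raw):
    """Only 32-bit MMIO BARs are supported for pass-thru devices."""
    return classify_bar(raw) == "mmio32"


def map_bar(ept, gpa_base, hpa_base, size):
    """Record page-granular GPA -> HPA translations for one MMIO BAR."""
    for offset in range(0, size, PAGE_SIZE):
        ept[gpa_base + offset] = hpa_base + offset
```

When the guest programs a new GPA into the virtual BAR, a real
implementation would also need to remove the previous translation; that
step is omitted here.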

Interrupt Configuration
-----------------------

ACRN supports both legacy (INTx) and MSI interrupts for pass-thru
devices.

INTx support
~~~~~~~~~~~~

ACRN expects developers to identify the interrupt line info (config space
offset 0x3C) of the physical pass-thru device and build an interrupt entry
in the mptable for the corresponding guest. As the guest configures the
vIOAPIC for the interrupt RTE, ACRN writes the info from the guest RTE
into the physical IOAPIC RTE. When the guest masks the RTE in the
vIOAPIC, ACRN masks the corresponding interrupt RTE in the physical
IOAPIC. Level-triggered interrupts are not supported.

MSI support
~~~~~~~~~~~

Guest reads and writes to the PCI configuration space that configure MSI
interrupts through the address, data, and control registers are passed
through to the physical configuration space of the pass-thru device.
Refer to `Configuration space access`_ for details on how PCI
configuration space is emulated.

Virtual device support
======================

ACRN provides read-only vRTC support for partition mode guests. Writes
to the data port are discarded.

For port I/O to ports other than vPIC, vRTC, or vUART, reads return 0xFF and
writes are discarded.
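
The port I/O behavior described in this document can be summarized with a
small dispatch table. In this Python sketch the port list is taken from
the `I/O - Virtual devices`_ section; treating 0x71 as the vRTC data port
follows the usual PC convention, and the function names and return
tuples are hypothetical.

```python
# Ports named in this document; anything else is not emulated.
EMULATED_PORTS = {
    0xCF8: "pci-config-address",
    0xCFC: "pci-config-data",
    0x3F8: "vuart",
    0x70: "vrtc", 0x71: "vrtc",
    0x20: "vpic", 0x21: "vpic",
    0xA0: "vpic", 0xA1: "vpic",
    0x4D0: "vpic", 0x4D1: "vpic",
}
VRTC_DATA_PORT = 0x71  # assumption: index at 0x70, data at 0x71


def port_io_read(port):
    """Reads from unhandled ports return 0xFF."""
    device = EMULATED_PORTS.get(port)
    if device is None:
        return ("value", 0xFF)
    return ("emulate", device)


def port_io_write(port):
    """Writes to unhandled ports, and to the vRTC data port, are dropped."""
    device = EMULATED_PORTS.get(port)
    if device is None or port == VRTC_DATA_PORT:
        return ("discard", None)
    return ("emulate", device)
```

The vRTC stays readable through ``port_io_read`` while its data port
ignores writes, which matches the read-only behavior described above.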

Interrupt delivery
==================

Guests w/o LAPIC pass-thru
--------------------------

In partition mode of ACRN, interrupts stay disabled after a vmexit. The
processor does not take interrupts when it is executing in VMX root
mode. ACRN configures the processor to take a vmexit upon an external
interrupt if the processor is executing in VMX non-root mode. Upon an
external interrupt, after sending EOI to the physical LAPIC, ACRN
injects the vector into the vLAPIC of the vCPU currently running on the
processor. Guests running a Linux kernel use vectors less than 0xEC
for device interrupts.

.. figure:: images/partition-image20.png
   :align: center

Guests w/ LAPIC pass-thru
-------------------------

For guests with LAPIC pass-thru, ACRN does not configure a vmexit upon
external interrupts. Device interrupts do not cause a vmexit; they are
handled through the guest IDT.

Hypervisor IPI service
======================

ACRN needs IPIs for events such as flushing TLBs across CPUs, sending virtual
device interrupts (e.g. vUART to vCPUs), and others.

Guests w/o LAPIC pass-thru
--------------------------

Hypervisor IPIs work the same way as in sharing mode.

Guests w/ LAPIC pass-thru
-------------------------

Since external interrupts are passed through to the guest IDT, IPIs do
not trigger a vmexit. ACRN therefore uses NMI delivery mode and enables
NMI exiting for the vCPUs. When the NMI arrives on the target processor,
if the processor is in non-root mode, a vmexit happens on the processor
and the event mask is checked for servicing the events.
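
The pattern of posting an event, sending an NMI, and draining the event
mask on the resulting vmexit can be sketched as follows. The event names,
bit numbers, and the ``PcpuEventMask`` class are hypothetical; the actual
NMI delivery is hardware behavior that this Python sketch does not model.

```python
# Hypothetical event bit numbers; this document does not define them.
EVENT_FLUSH_TLB = 0
EVENT_VUART_IRQ = 1


class PcpuEventMask:
    """Pending hypervisor events for one physical CPU (illustrative)."""

    def __init__(self):
        self.pending = 0

    def request(self, event):
        """Record an event; a hypervisor would then send an NMI IPI."""
        self.pending |= 1 << event

    def on_nmi_vmexit(self):
        """Drain the mask on the NMI-induced vmexit, lowest bit first."""
        serviced = []
        while self.pending:
            lowest = self.pending & -self.pending  # isolate lowest set bit
            serviced.append(lowest.bit_length() - 1)
            self.pending &= ~lowest
        return serviced
```

Because several requests can be posted before the target CPU takes the
vmexit, one NMI may service more than one event in this model.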

Debug Console
=============

For details on how the hypervisor console works, refer to
:ref:`hv-console`.

For a guest console in partition mode, ACRN provides an option to pass
``vmid`` as an argument to ``sos_console``. The ``vmid`` is the same as
the one the developer uses in the guest configuration.

Guests w/o LAPIC pass-thru
--------------------------

The debug console works the same way as in sharing mode.

Hypervisor Console
==================

ACRN uses the TSC deadline timer to provide timer service. The hypervisor
console uses a timer on CPU0 to poll characters on the serial device. To
support LAPIC pass-thru, the TSC deadline MSR is passed through and the
local timer interrupt is also delivered to the guest IDT. Instead of the
TSC deadline timer, ACRN uses the VMX preemption timer to poll the serial
device.

Guest Console
=============

ACRN exposes a vUART to partition mode guests. The vUART uses the vPIC to
inject interrupts to the guest BSP. When a guest has more than one core,
the vUART might need to inject an interrupt to the guest BSP from
another core (other than the BSP) during runtime. As mentioned in
`Hypervisor IPI service`_, ACRN uses NMI delivery mode for notifying the
CPU running the BSP of the guest.
@ -101,6 +101,8 @@ initial states, including IDT and physical PICs.
After BSP detects that all APs are up, BSP will start creating the first
VM, i.e. SOS, as explained in the next section.

.. _vm-startup:

VM Startup
**********

BIN doc/developer-guides/hld/images/partition-image1.png (Normal file, 22 KiB)
BIN doc/developer-guides/hld/images/partition-image13.png (Normal file, 37 KiB)
BIN doc/developer-guides/hld/images/partition-image14.png (Normal file, 23 KiB)
BIN doc/developer-guides/hld/images/partition-image16.png (Normal file, 28 KiB)
BIN doc/developer-guides/hld/images/partition-image18.png (Normal file, 41 KiB)
BIN doc/developer-guides/hld/images/partition-image20.png (Normal file, 37 KiB)
BIN doc/developer-guides/hld/images/partition-image3.png (Normal file, 38 KiB)
BIN doc/developer-guides/hld/images/partition-image7.png (Normal file, 46 KiB)