doc: continue doc restructuring

Changing the folder structure will cause too many broken links for external
references (from other sites). So, let's put the content back where it was
before the reorg and instead use the new persona-based navigation to point to
documents in their original locations. Also, introduce redirects for some
documents that no longer exist.

Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
doc/developer-guides/hld/hld-APL_GVT-g.rst (new file, 954 lines)
@@ -0,0 +1,954 @@
.. _APL_GVT-g-hld:

GVT-g high-level design
#######################

Introduction
************

Purpose of this Document
========================

This high-level design (HLD) document describes the usage requirements
and high-level design for Intel |reg| Graphics Virtualization Technology for
shared virtual :term:`GPU` technology (:term:`GVT-g`) on Apollo Lake-I
SoCs.

This document describes:

- The different GPU virtualization techniques
- GVT-g mediated pass-through
- High-level design
- Key components
- GVT-g new architecture differentiation

Audience
========

This document is for developers, validation teams, architects, and
maintainers of Intel |reg| GVT-g for the Apollo Lake SoCs.

The reader should have some familiarity with the basic concepts of
system virtualization and Intel processor graphics.

Reference Documents
===================

The following documents were used as references for this specification:

- Paper in USENIX ATC '14 - *Full GPU Virtualization Solution with
  Mediated Pass-Through* - https://www.usenix.org/node/183932

- Hardware Specification - PRMs -
  https://01.org/linuxgraphics/documentation/hardware-specification-prms

Background
**********

Intel GVT-g is an enabling technology in emerging graphics
virtualization scenarios. It adopts a full GPU virtualization approach,
based on mediated pass-through technology, to achieve good performance,
scalability, and secure isolation among Virtual Machines (VMs). A virtual
GPU (vGPU), with full GPU features, is presented to each VM so that a
native graphics driver can run directly inside a VM.

Intel GVT-g technology for Apollo Lake (APL) has been implemented in
open source hypervisors or Virtual Machine Monitors (VMMs):

- Intel GVT-g for ACRN, also known as "AcrnGT"
- Intel GVT-g for KVM, also known as "KVMGT"
- Intel GVT-g for Xen, also known as "XenGT"

The core vGPU device model is released under the BSD/MIT dual license, so it
can be reused in other proprietary hypervisors.

Intel has a portfolio of graphics virtualization technologies
(:term:`GVT-g`, :term:`GVT-d`, and :term:`GVT-s`). GVT-d and GVT-s are
outside of the scope of this document.

This HLD applies to the Apollo Lake platform only. Support of other
hardware is outside the scope of this HLD.

Targeted Usages
===============

The main targeted usage of GVT-g is in automotive applications, such as:

- An Instrument cluster running in one domain
- An In-Vehicle Infotainment (IVI) solution running in another domain
- Additional domains for specific purposes, such as Rear Seat
  Entertainment or video camera capturing.

.. figure:: images/APL_GVT-g-ive-use-case.png
   :width: 900px
   :align: center
   :name: ive-use-case

   IVE Use Case

Existing Techniques
===================

A graphics device is no different from any other I/O device with
respect to how the device I/O interface is virtualized. Therefore,
existing I/O virtualization techniques can be applied to graphics
virtualization. However, none of the existing techniques can meet the
general requirements of performance, scalability, and secure isolation
simultaneously. In this section, we review the pros and cons of each
technique in detail, enabling the audience to understand the rationale
behind the entire GVT-g effort.

Emulation
---------

A device can be emulated fully in software, including its I/O registers
and internal functional blocks. Because there is no dependency on the
underlying hardware capability, compatibility can be achieved
across platforms. However, due to the CPU emulation cost, this technique
is usually used only for legacy devices such as a keyboard, mouse, and VGA
card. Fully emulating a modern accelerator such as a GPU would involve
great complexity and deliver extremely low performance. It may be acceptable
for use in a simulation environment, but it is definitely not suitable
for production usage.

API Forwarding
--------------

API forwarding, or a split driver model, is another widely used I/O
virtualization technology. It has been used in commercial virtualization
products, for example, VMware*, PCoIP*, and Microsoft* RemoteFx*.
It is a natural path when researchers study a new type of
I/O virtualization usage, for example, when GPGPU computing in a VM was
initially proposed. Intel GVT-s is based on this approach.

The architecture of API forwarding is shown in :numref:`api-forwarding`:

.. figure:: images/APL_GVT-g-api-forwarding.png
   :width: 400px
   :align: center
   :name: api-forwarding

   API Forwarding

A frontend driver is employed to forward high-level API calls (OpenGL,
DirectX, and so on) inside a VM to a backend driver in the hypervisor
for acceleration. The backend may be using a different graphics stack,
so API translation between different graphics protocols may be required.
The backend driver allocates a physical GPU resource for each VM,
behaving like a normal graphics application in the hypervisor. Shared
memory may be used to reduce memory copying between the host and guest
graphics stacks.

API forwarding can bring hardware acceleration capability into a VM,
with other merits such as vendor independence and high density. However, it
also suffers from the following intrinsic limitations:

- Lagging features - Every new API version needs to be specifically
  handled, which means a slow time-to-market (TTM) for supporting new
  standards. For example, only DirectX 9 is supported when DirectX 11 is
  already in the market. Also, there is a big gap in supporting media and
  compute usages.

- Compatibility issues - A GPU is very complex, and consequently so are
  high-level graphics APIs. Different protocols are not 100% compatible
  on every subtle API, so the customer can observe feature/quality loss
  for specific applications.

- Maintenance burden - Grows as the number of supported protocols and
  protocol versions increases.

- Performance overhead - Different API forwarding implementations
  exhibit quite different performance, which gives rise to a need for a
  fine-grained graphics tuning effort.

Direct Pass-Through
-------------------

"Direct pass-through" dedicates the GPU to a single VM, providing full
features and good performance at the cost of device sharing
capability among VMs. Only one VM at a time can use the hardware
acceleration capability of the GPU, which is a major limitation of this
technique. However, it is still a good approach for enabling graphics
virtualization usages on Intel server platforms, as an intermediate
solution. Intel GVT-d uses this mechanism.

.. figure:: images/APL_GVT-g-pass-through.png
   :width: 400px
   :align: center
   :name: gvt-pass-through

   Pass-Through

SR-IOV
------

Single Root I/O Virtualization (SR-IOV) implements I/O virtualization
directly on a device. Multiple Virtual Functions (VFs) are implemented,
with each VF directly assignable to a VM.

.. _Graphic_mediation:

Mediated Pass-Through
*********************

Intel GVT-g achieves full GPU virtualization using a "mediated
pass-through" technique.

Concept
=======

Mediated pass-through allows a VM to access performance-critical I/O
resources (usually partitioned) directly, without intervention from the
hypervisor in most cases. Privileged operations from this VM are
trapped-and-emulated to provide secure isolation among VMs.

.. figure:: images/APL_GVT-g-mediated-pass-through.png
   :width: 400px
   :align: center
   :name: mediated-pass-through

   Mediated Pass-Through

The hypervisor must ensure that no vulnerability is exposed when
assigning performance-critical resources to each VM. When a
performance-critical resource cannot be partitioned, a scheduler must be
implemented (either in software or hardware) to allow time-based sharing
among multiple VMs. In this case, the device must allow the hypervisor
to save and restore the hardware state associated with the shared resource,
either through direct I/O register reads and writes (when there is no
software-invisible state) or through a device-specific context save and
restore mechanism (where there is software-invisible state).

Examples of performance-critical I/O resources include the following:

.. figure:: images/APL_GVT-g-perf-critical.png
   :width: 800px
   :align: center
   :name: perf-critical

   Performance-Critical I/O Resources

The key to implementing mediated pass-through for a specific device is
to define the right policy for various I/O resources.

Virtualization Policies for GPU Resources
=========================================

:numref:`graphics-arch` shows how Intel Processor Graphics works at a high level.
Software drivers write commands into a command buffer through the CPU.
The Render Engine in the GPU fetches these commands and executes them.
The Display Engine fetches pixel data from the Frame Buffer and sends
it to the external monitors for display.

.. figure:: images/APL_GVT-g-graphics-arch.png
   :width: 400px
   :align: center
   :name: graphics-arch

   Architecture of Intel Processor Graphics

This architecture abstraction applies to most modern GPUs, but may
differ in how graphics memory is implemented. Intel Processor Graphics
uses system memory as graphics memory. System memory can be mapped into
multiple virtual address spaces by GPU page tables. A 4 GB global
virtual address space called "global graphics memory", accessible from
both the GPU and CPU, is mapped through a global page table. Local
graphics memory spaces are supported in the form of multiple 4 GB local
virtual address spaces, but access to them is limited to the Render
Engine through local page tables. Global graphics memory is mostly used
for the Frame Buffer and also serves as the Command Buffer. Massive data
accesses are made to local graphics memory when hardware acceleration is
in progress. Other GPUs have a similar page table mechanism accompanying
the on-die memory.

The CPU programs the GPU through GPU-specific commands, shown in
:numref:`graphics-arch`, using a producer-consumer model. The graphics
driver programs GPU commands into the Command Buffer, including primary
buffer and batch buffer, according to the high-level programming APIs,
such as OpenGL* or DirectX*. Then, the GPU fetches and executes the
commands. The primary buffer (called a ring buffer) may chain other
batch buffers together. The terms primary buffer and ring buffer are used
interchangeably hereafter. The batch buffer is used to convey the
majority of the commands (up to ~98% of them) per programming model. A
register tuple (head, tail) is used to control the ring buffer. The CPU
submits the commands to the GPU by updating the tail, while the GPU
fetches commands from the head, and then notifies the CPU by updating
the head after the commands have finished execution. Therefore, when
the GPU has executed all commands from the ring buffer, the head and
tail pointers are the same.

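The head/tail protocol can be illustrated with a small sketch. The names
below (``ring_regs``, ``submit_commands``) are illustrative only and do not
correspond to actual driver symbols; the point is the producer-consumer
update of the tail by the CPU and of the head by the GPU.

.. code-block:: c

   #include <stdint.h>
   #include <stdbool.h>

   /* Illustrative ring-buffer control tuple; not actual i915 symbols. */
   struct ring_regs {
       volatile uint32_t head;  /* advanced by the GPU as it consumes commands */
       volatile uint32_t tail;  /* advanced by the CPU as it produces commands */
       uint32_t size;           /* ring size in bytes, power of two */
   };

   /* CPU side: copy commands into the ring, then publish them by moving the tail. */
   static bool submit_commands(struct ring_regs *ring, uint8_t *ring_base,
                               const void *cmds, uint32_t len)
   {
       uint32_t head = ring->head;
       uint32_t tail = ring->tail;
       uint32_t free = (head - tail - 4) & (ring->size - 1);

       if (len > free)
           return false;          /* ring full; try again later */

       for (uint32_t i = 0; i < len; i++)
           ring_base[(tail + i) & (ring->size - 1)] = ((const uint8_t *)cmds)[i];

       ring->tail = (tail + len) & (ring->size - 1);   /* GPU starts fetching */
       return true;
   }

   /* The ring is idle when the GPU's head has caught up with the CPU's tail. */
   static bool ring_idle(const struct ring_regs *ring)
   {
       return ring->head == ring->tail;
   }
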
Having introduced the GPU architecture abstraction, it is important for
us to understand how real-world graphics applications use the GPU
hardware so that we can virtualize it in VMs efficiently. To do so, we
characterized, for some representative GPU-intensive 3D workloads (the
Phoronix Test Suite), the usages of the four critical interfaces:

1) the Frame Buffer,
2) the Command Buffer,
3) the GPU Page Table Entries (PTEs), which carry the GPU page tables, and
4) the I/O registers, including Memory-Mapped I/O (MMIO) registers,
   Port I/O (PIO) registers, and PCI configuration space registers
   for internal state.

:numref:`access-patterns` shows the average access frequency of running
Phoronix 3D workloads on the four interfaces.

The Frame Buffer and Command Buffer are the most
performance-critical resources, as shown in :numref:`access-patterns`.
When the applications are being loaded, many source vertices and
pixels are written by the CPU, so the Frame Buffer accesses occur in the
range of hundreds of thousands per second. Then at run-time, the CPU
programs the GPU through the commands to render the Frame Buffer, so
the Command Buffer accesses become the largest group, also in the
hundreds of thousands per second. PTE and I/O accesses are minor in both
load and run-time phases, ranging in the tens of thousands per second.

.. figure:: images/APL_GVT-g-access-patterns.png
   :width: 400px
   :align: center
   :name: access-patterns

   Access Patterns of Running 3D Workloads

High-Level Architecture
***********************

:numref:`gvt-arch` shows the overall architecture of GVT-g, based on the
ACRN hypervisor, with the SOS as the privileged VM and multiple user
guests. A GVT-g device model, working with the ACRN hypervisor,
implements the policies of trap and pass-through. Each guest runs the
native graphics driver and can directly access performance-critical
resources: the Frame Buffer and Command Buffer, with resource
partitioning (as presented later). To protect privileged resources, that
is, the I/O registers and PTEs, corresponding accesses from the graphics
driver in user VMs are trapped and forwarded to the GVT device model in
the SOS for emulation. The device model leverages i915 interfaces to access
the physical GPU.

In addition, the device model implements a GPU scheduler that runs
concurrently with the CPU scheduler in ACRN to share the physical GPU
timeslot among the VMs. GVT-g uses the physical GPU to directly execute
all the commands submitted from a VM, so it avoids the complexity of
emulating the Render Engine, which is the most complex part of the GPU.
Meanwhile, the resource pass-through of both the Frame Buffer and
Command Buffer minimizes the hypervisor's intervention in CPU accesses,
while the GPU scheduler guarantees every VM a quantum time-slice for
direct GPU execution. With that, GVT-g can achieve near-native
performance for a VM workload.

In :numref:`gvt-arch`, the yellow GVT device model works as a client on
top of an i915 driver in the SOS. It has a generic Mediated Pass-Through
(MPT) interface, compatible with all types of hypervisors. For ACRN,
some extra development work is needed for such MPT interfaces. For
example, we need some changes in ACRN-DM to make ACRN compatible with
the MPT framework. The vGPU lifecycle is the same as the lifecycle of
the guest VM created through ACRN-DM. They interact through sysfs,
exposed by the GVT device model.

.. figure:: images/APL_GVT-g-arch.png
   :width: 600px
   :align: center
   :name: gvt-arch

   AcrnGT High-level Architecture

Key Techniques
**************

vGPU Device Model
=================

The vGPU device model is the main component because it constructs the
vGPU instance for each guest to satisfy every GPU request from the guest
and gives the corresponding result back to the guest.

The vGPU device model provides the basic framework to do
trap-and-emulation, including MMIO virtualization, interrupt
virtualization, and display virtualization. It also handles and
processes all the requests internally (such as command scan and shadow),
schedules them properly, and finally submits them to
the SOS i915 driver.

.. figure:: images/APL_GVT-g-DM.png
   :width: 800px
   :align: center
   :name: GVT-DM

   GVT-g Device Model

MMIO Virtualization
-------------------

Intel Processor Graphics implements two PCI MMIO BARs:

- **GTTMMADR BAR**: Combines both the :term:`GGTT` modification range and the
  Memory-Mapped I/O range. It is 16 MB on :term:`BDW`, with 2 MB used by MMIO,
  6 MB reserved, and 8 MB allocated to the GGTT. The GGTT starts from
  :term:`GTTMMADR` + 8 MB. In this section, we focus on virtualization of
  the MMIO range; GGTT virtualization is discussed later.

- **GMADR BAR**: As the PCI aperture is used by the CPU to access tiled
  graphics memory, GVT-g partitions this aperture range among VMs for
  performance reasons.

A 2 MB virtual MMIO structure is allocated per vGPU instance.

All the virtual MMIO registers are emulated as simple in-memory
read-write; that is, the guest driver will read back the same value that was
programmed earlier. A common emulation handler (for example,
``intel_gvt_emulate_read/write``) is enough to handle such general
emulation requirements. However, some registers need to be emulated with
specific logic: for example, registers affected by changes of other states,
or registers that need additional audit or translation when the virtual
register is updated. Therefore, a specific emulation handler must be
installed for those special registers.

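A minimal sketch of this split between default in-memory emulation and
per-register handlers is shown below. The table and function names are
hypothetical (the text above mentions ``intel_gvt_emulate_read/write`` only as
an example of a common handler); real GVT-g code registers handlers per MMIO
offset in a similar spirit.

.. code-block:: c

   #include <stdint.h>
   #include <string.h>

   #define VGPU_MMIO_SIZE (2u << 20)   /* 2 MB virtual MMIO block per vGPU */

   struct vgpu;   /* opaque per-guest vGPU instance */

   /* Optional special handler for registers that need extra logic/auditing. */
   typedef void (*mmio_write_fn)(struct vgpu *vgpu, uint32_t offset, uint32_t val);

   struct mmio_reg_info {
       uint32_t offset;
       mmio_write_fn write;    /* NULL means plain in-memory emulation */
   };

   /* Hypothetical lookup into a per-offset handler table. */
   extern struct mmio_reg_info *find_mmio_info(uint32_t offset);
   extern uint8_t *vgpu_mmio_base(struct vgpu *vgpu);   /* 2 MB backing store */

   void emulate_mmio_write(struct vgpu *vgpu, uint32_t offset, uint32_t val)
   {
       struct mmio_reg_info *info = find_mmio_info(offset);

       if (info && info->write) {
           /* Register with side effects: audit/translate before storing. */
           info->write(vgpu, offset, val);
       } else {
           /* Default: store the value so the guest reads back what it wrote. */
           memcpy(vgpu_mmio_base(vgpu) + offset, &val, sizeof(val));
       }
   }

   uint32_t emulate_mmio_read(struct vgpu *vgpu, uint32_t offset)
   {
       uint32_t val;

       memcpy(&val, vgpu_mmio_base(vgpu) + offset, sizeof(val));
       return val;
   }
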
The graphics driver may have assumptions about the initial device state,
namely the state left at the point when the BIOS transitions to the OS. To
meet this driver expectation, we need to provide an initial vGPU state that
matches what a driver would observe on a pGPU. The host graphics driver is
therefore expected to generate a snapshot of the physical GPU state, which
it does before the guest driver's initialization. This snapshot is used as
the initial vGPU state by the device model.

PCI Configuration Space Virtualization
--------------------------------------

PCI configuration space also needs to be virtualized in the device
model. Different implementations may choose to implement the logic
within the vGPU device model or in the default system device model (for
example, ACRN-DM). GVT-g emulates the logic in the device model.

Some information is vital for the vGPU device model, including:
the Guest PCI BAR, the Guest PCI MSI, and the base of the ACPI OpRegion.

Legacy VGA Port I/O Virtualization
----------------------------------

Legacy VGA is not supported in the vGPU device model. We rely on the
default device model (for example, :term:`QEMU`) to provide legacy VGA
emulation, which means either ISA VGA emulation or
PCI VGA emulation.

Interrupt Virtualization
------------------------

The GVT device model does not touch hardware interrupts in the new
architecture, since it is hard to combine the interrupt-control
logic of the virtual device model and the host driver. To prevent
architectural changes in the host driver, the host GPU interrupt does
not go to the virtual device model, and the virtual device model has to
handle GPU interrupt virtualization by itself. Virtual GPU
interrupts are categorized into three types:

- Periodic GPU interrupts are emulated by timers. However, a notable
  exception to this is the VBlank interrupt. Due to the demands of user
  space compositors, such as Wayland, which require a flip-done event
  to be synchronized with a VBlank, this interrupt is forwarded from the
  SOS to the UOS when the SOS receives it from the hardware.

- Event-based GPU interrupts are emulated by the emulation logic, for
  example, the AUX Channel interrupt.

- GPU command interrupts are emulated by a command parser and workload
  dispatcher. The command parser marks out which GPU command interrupts
  are generated during command execution, and the workload
  dispatcher injects those interrupts into the VM after the workload is
  finished.

.. figure:: images/APL_GVT-g-interrupt-virt.png
   :width: 400px
   :align: center
   :name: interrupt-virt

   Interrupt Virtualization

Workload Scheduler
------------------

The scheduling policy and workload scheduler are decoupled for
scalability reasons. For example, a future QoS enhancement will only
impact the scheduling policy, while any i915 interface change or HW submission
interface change (from execlist to :term:`GuC`) will only need workload
scheduler updates.

The scheduling policy framework is the core of the vGPU workload
scheduling system. It controls all of the scheduling actions and
provides the developer with a generic framework for easy development of
scheduling policies. The scheduling policy framework controls the work
scheduling process without caring about how the workload is dispatched
or completed. All the detailed workload dispatching is hidden in the
workload scheduler, which is the actual executor of a vGPU workload.

The workload scheduler handles everything about one vGPU workload. Each
hardware ring is backed by one workload scheduler kernel thread. The
workload scheduler picks a workload from the current vGPU workload queue
and communicates with the virtual HW submission interface to emulate the
"schedule-in" status for the vGPU. It performs context shadowing, Command
Buffer scan and shadow, and PPGTT page table pin/unpin/out-of-sync operations
before submitting this workload to the host i915 driver. When the vGPU
workload is completed, the workload scheduler asks the virtual HW submission
interface to emulate the "schedule-out" status for the vGPU. The VM
graphics driver then knows that a GPU workload is finished.

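The per-ring scheduler thread described above can be sketched roughly as
follows. All names here (``workload``, ``pick_next_workload``, and the
shadow/submit helpers) are placeholders for illustration, not actual GVT-g
functions; the real implementation lives in the i915 GVT scheduler code.

.. code-block:: c

   #include <stdbool.h>

   struct workload;     /* one vGPU workload queued for this engine */

   /* Placeholder helpers; real GVT-g code performs these steps differently. */
   extern struct workload *pick_next_workload(int ring_id);  /* blocks if empty */
   extern void emulate_schedule_in(struct workload *w);
   extern void shadow_context_and_commands(struct workload *w); /* scan + shadow */
   extern void pin_ppgtt_pages(struct workload *w);
   extern int  submit_to_host_i915(struct workload *w);
   extern void wait_for_completion(struct workload *w);
   extern void unpin_ppgtt_pages(struct workload *w);
   extern void emulate_schedule_out(struct workload *w);    /* notifies the guest */

   /* One such loop runs in a kernel thread per hardware ring. */
   void workload_thread(int ring_id)
   {
       for (;;) {
           struct workload *w = pick_next_workload(ring_id);

           emulate_schedule_in(w);           /* vGPU sees "schedule-in" */
           shadow_context_and_commands(w);   /* context shadow, cmd scan/shadow */
           pin_ppgtt_pages(w);

           if (submit_to_host_i915(w) == 0)
               wait_for_completion(w);       /* host i915 executes the workload */

           unpin_ppgtt_pages(w);
           emulate_schedule_out(w);          /* guest driver sees completion */
       }
   }
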
.. figure:: images/APL_GVT-g-scheduling.png
   :width: 500px
   :align: center
   :name: scheduling

   GVT-g Scheduling Framework

Workload Submission Path
------------------------

On Intel Processor Graphics before Broadwell, software submits workloads
using the legacy ring buffer mode, which is no longer supported by the
GVT-g virtual device model. A new HW submission interface named
"Execlist" was introduced with Broadwell. With the new HW submission
interface, software can achieve better programmability and easier
context management. In Intel GVT-g, the vGPU submits workloads
through the virtual HW submission interface. Each submitted workload
is represented as an ``intel_vgpu_workload`` data structure (a vGPU
workload), which is put on a per-vGPU and per-engine workload queue
after a few basic checks and verifications.

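The flow from a trapped submission to a queued vGPU workload might look
roughly like the sketch below. The structure layout and helpers are simplified
stand-ins; the actual ``intel_vgpu_workload`` structure in the i915 GVT code
carries considerably more state.

.. code-block:: c

   #include <stdint.h>

   struct vgpu;
   struct list_head { struct list_head *next, *prev; };

   /* Simplified stand-in for the per-submission workload descriptor. */
   struct vgpu_workload {
       struct vgpu *vgpu;
       int ring_id;                /* engine this workload targets */
       uint64_t ctx_desc;          /* context descriptor written to the vELSP */
       uint64_t ring_buffer_gma;   /* guest graphics address of the ring */
       struct list_head link;      /* per-vGPU, per-engine queue node */
   };

   extern int audit_context_descriptor(struct vgpu *vgpu, uint64_t desc);
   extern struct vgpu_workload *alloc_workload(struct vgpu *vgpu, int ring_id,
                                               uint64_t desc);
   extern void queue_workload(struct vgpu_workload *w);  /* wakes the scheduler */

   /* Called when a guest write to the virtual submission port is trapped. */
   int handle_virtual_submission(struct vgpu *vgpu, int ring_id, uint64_t desc)
   {
       if (audit_context_descriptor(vgpu, desc) != 0)
           return -1;              /* basic checks and verifications failed */

       queue_workload(alloc_workload(vgpu, ring_id, desc));
       return 0;
   }
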
.. figure:: images/APL_GVT-g-workload.png
   :width: 800px
   :align: center
   :name: workload

   GVT-g Workload Submission

Display Virtualization
----------------------

GVT-g reuses the i915 graphics driver in the SOS to initialize the Display
Engine, and then manages the Display Engine to show different VM frame
buffers. When two vGPUs have the same resolution, only the frame buffer
locations are switched.

.. figure:: images/APL_GVT-g-display-virt.png
   :width: 800px
   :align: center
   :name: display-virt

   Display Virtualization

Direct Display Model
--------------------

.. figure:: images/APL_GVT-g-direct-display.png
   :width: 600px
   :align: center
   :name: direct-display

   Direct Display Model

A typical automotive use case is where there are two displays in the car
and each one needs to show one domain's content, with the two domains
being the instrument cluster and the In-Vehicle Infotainment (IVI) system. As
shown in :numref:`direct-display`, this can be accomplished through the direct
display model of GVT-g, where the SOS and UOS are each assigned all HW
planes of two different pipes. GVT-g has a concept of display owner on a
per-HW-plane basis. If it determines that a particular domain is the
owner of a HW plane, then it allows the domain's MMIO register write that
flips a frame buffer to that plane to go through to the HW. Otherwise,
such writes are blocked by GVT-g.

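A sketch of this per-plane ownership check is shown below; the names
(``plane_owner`` table, ``handle_flip_mmio_write``) are illustrative, not
actual GVT-g symbols.

.. code-block:: c

   #include <stdint.h>
   #include <stdbool.h>

   #define MAX_PIPES  3
   #define MAX_PLANES 4

   /* Illustrative ownership table: which domain owns each HW plane. */
   static int plane_owner[MAX_PIPES][MAX_PLANES];   /* domain id, e.g. 0 = SOS */

   extern void write_phys_mmio(uint32_t offset, uint32_t val);

   /* Trapped write to a plane surface (flip) register from domain `dom`. */
   bool handle_flip_mmio_write(int dom, int pipe, int plane,
                               uint32_t offset, uint32_t val)
   {
       if (plane_owner[pipe][plane] != dom)
           return false;              /* not the owner: block the flip */

       write_phys_mmio(offset, val);  /* owner: let the flip reach the HW */
       return true;
   }
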
Indirect Display Model
----------------------

.. figure:: images/APL_GVT-g-indirect-display.png
   :width: 600px
   :align: center
   :name: indirect-display

   Indirect Display Model

For security or fast-boot reasons, it may be determined that the UOS is
either not allowed to display its content directly on the HW, or that it
boots up too late to display its content in time. In such a
scenario, the responsibility of displaying content on all displays lies
with the SOS. One of the use cases that can be realized is to display the
entire frame buffer of the UOS on a secondary display. GVT-g allows for this
model by first trapping all MMIO writes by the UOS to the HW. A proxy
application can then capture the address in the GGTT where the UOS has written
its frame buffer and, with the help of the Hypervisor and the SOS's i915
driver, convert the Guest Physical Addresses (GPAs) into Host
Physical Addresses (HPAs) before making a texture source or EGL image
out of the frame buffer, and then either post-process it further or
simply display it on a HW plane of the secondary display.

GGTT-Based Surface Sharing
--------------------------

One of the major automotive use cases is called "surface sharing". This
use case requires that the SOS access an individual surface or a set of
surfaces from the UOS without having to access the entire frame buffer of
the UOS. Unlike the previous two models, where the UOS did not have to do
anything to show its content and therefore a completely unmodified UOS
could continue to run, this model requires changes to the UOS.

This model can be considered an extension of the indirect display model.
Under the indirect display model, the UOS's frame buffer was temporarily
pinned by it in video memory, accessed through the Global Graphics
Translation Table. This GGTT-based surface sharing model takes this a
step further by having a compositor of the UOS temporarily pin all
application buffers into the GGTT. It then also requires the compositor to
create a metadata table with relevant surface information, such as width,
height, and GGTT offset, and to flip that in lieu of the frame buffer.
In the SOS, the proxy application knows that the GGTT offset has been
flipped, maps it, and through it can access the GGTT offset of an
application that it wants to access. It is worth mentioning that in this
model, UOS applications do not require any changes; only the
compositor, Mesa, and the i915 driver have to be modified.

This model has a major benefit and a major limitation. The
benefit is that since it builds on top of the indirect display model,
there are no special drivers necessary for it on either the SOS or the UOS.
Therefore, any Real-Time Operating System (RTOS) that uses
this model can simply do so without having to implement a driver, the
infrastructure for which may not be present in its operating system.
The limitation of this model is that video memory dedicated to a UOS is
generally limited to a couple of hundred MBs. This can easily be
exhausted by a few application buffers, so the number and size of buffers
are limited. Since it is not a highly scalable model, in general, Intel
recommends the Hyper DMA buffer sharing model, described next.

Hyper DMA Buffer Sharing
------------------------

.. figure:: images/APL_GVT-g-hyper-dma.png
   :width: 800px
   :align: center
   :name: hyper-dma

   Hyper DMA Buffer Design

Another approach to surface sharing is Hyper DMA Buffer sharing. This
model extends the Linux DMA buffer sharing mechanism, in which one driver is
able to share its pages with another driver within one domain.

Application buffers are backed by i915 Graphics Execution Manager
Buffer Objects (GEM BOs). As in GGTT-based surface
sharing, this model also requires compositor changes. The compositor of the
UOS requests i915 to export these application GEM BOs and then passes
them on to a special driver called the Hyper DMA Buf exporter, whose job
is to create a scatter-gather list of the pages mapped by PDEs and PTEs and
export a Hyper DMA Buf ID back to the compositor.

The compositor then shares this Hyper DMA Buf ID with the SOS's Hyper DMA
Buf importer driver, which then maps the memory represented by this ID in
the SOS. A proxy application in the SOS can then provide the ID of this driver
to the SOS i915, which can create its own GEM BO. Finally, the application
can use it as an EGL image and do any post-processing required before
either providing it to the SOS compositor or directly flipping it on a
HW plane in the compositor's absence.

This model is highly scalable and can be used to share up to 4 GB worth
of pages. It is also not limited to sharing only graphics buffers; other
buffers, such as those for the IPU, can also be shared through it. However, it
does require that the SOS port the Hyper DMA Buffer importer driver. Also,
the SOS must comprehend and implement the DMA buffer sharing model.

For detailed information about this model, please refer to the `Linux
HYPER_DMABUF Driver High Level Design
<https://github.com/downor/linux_hyper_dmabuf/blob/hyper_dmabuf_integration_v4/Documentation/hyper-dmabuf-sharing.txt>`_.

.. _plane_restriction:

Plane-Based Domain Ownership
----------------------------

.. figure:: images/APL_GVT-g-plane-based.png
   :width: 600px
   :align: center
   :name: plane-based

   Plane-Based Domain Ownership

Yet another mechanism for showing the content of both the SOS and UOS on the
same physical display is called plane-based domain ownership. Under this
model, both the SOS and UOS are provided a set of HW planes that they can
flip their contents onto. Since each domain provides its own content, there
is no need for any extra composition to be done through the SOS. The display
controller handles alpha blending of the contents of different domains on a
single pipe. This avoids extra complexity in either the SOS or the UOS
SW stack.

It is important to provide only specific planes and have them statically
assigned to different domains. To achieve this, the i915 driver of each
domain is provided with a command line parameter that specifies the exact
planes that this domain has access to. The i915 driver then enumerates
only those HW planes and exposes them to its compositor. It is then left
to the compositor configuration to use these planes appropriately and
show the correct content on them. No other changes are necessary.

While the biggest benefit of this model is that it is extremely simple and
quick to implement, it also has some drawbacks. First, since each domain
is responsible for showing the content on the screen, there is no
control of the UOS by the SOS. If the UOS is untrusted, this could
potentially cause some unwanted content to be displayed. Also, there is
no post-processing capability, except that provided by the display
controller (for example, scaling, rotation, and so on). So each domain
must provide finished buffers with the expectation that alpha blending
with another domain will not cause any corruption or unwanted artifacts.

Graphics Memory Virtualization
==============================

To achieve near-to-native graphics performance, GVT-g passes through the
performance-critical operations, such as Frame Buffer and Command Buffer
accesses from the VM. For the global graphics memory space, GVT-g uses
graphics memory resource partitioning and an address space ballooning
mechanism. For local graphics memory spaces, GVT-g implements per-VM local
graphics memory through a render context switch because local graphics
memory is only accessible by the GPU.

Global Graphics Memory
----------------------

Graphics Memory Resource Partitioning
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

GVT-g partitions the global graphics memory among VMs. Splitting the
CPU/GPU scheduling mechanism requires that the global graphics memory of
different VMs can be accessed by the CPU and the GPU simultaneously.
Consequently, GVT-g must, at any time, present each VM with its own
resource, leading to the resource partitioning approach, for global
graphics memory, shown in :numref:`mem-part`.

.. figure:: images/APL_GVT-g-mem-part.png
   :width: 800px
   :align: center
   :name: mem-part

   Memory Partition and Ballooning

The performance impact of the reduced global graphics memory resource
due to memory partitioning is very limited, according to various test
results.

Address Space Ballooning
%%%%%%%%%%%%%%%%%%%%%%%%

The address space ballooning technique is introduced to eliminate the
address translation overhead, shown in :numref:`mem-part`. GVT-g exposes the
partitioning information to the VM graphics driver through the PVINFO
MMIO window. The graphics driver marks the other VMs' regions as
'ballooned' and reserves them as unusable in its graphics
memory allocator. Under this design, the guest view of the global graphics
memory space is exactly the same as the host view, and the driver-programmed
addresses, using guest physical addresses, can be used directly
by the hardware. Address space ballooning is different from traditional
memory ballooning techniques. Memory ballooning is for memory usage
control, concerning the number of ballooned pages, while address space
ballooning balloons special memory address ranges.

Another benefit of address space ballooning is that there is no address
translation overhead, as we use the guest Command Buffer for direct GPU
execution.

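The guest-side effect of ballooning can be sketched as follows: the guest
driver reads its assigned ranges from the PVINFO window and marks everything
else as unusable in its allocator. The structure fields and function names
below are illustrative placeholders, not the actual PVINFO layout.

.. code-block:: c

   #include <stdint.h>

   /* Illustrative layout of the partition info a guest reads from PVINFO. */
   struct gvt_partition_info {
       uint32_t mappable_base,   mappable_size;     /* low (aperture) GM range */
       uint32_t unmappable_base, unmappable_size;   /* high GM range */
   };

   #define GGTT_TOTAL (4ull << 30)   /* 4 GB global graphics memory space */

   /* Hypothetical allocator hook: mark [start, start+len) as not allocatable. */
   extern void gm_reserve_ballooned(uint64_t start, uint64_t len);

   /* Balloon out every range that lies outside this VM's two assigned windows. */
   void balloon_global_gm(const struct gvt_partition_info *p)
   {
       uint64_t map_end   = (uint64_t)p->mappable_base + p->mappable_size;
       uint64_t unmap_end = (uint64_t)p->unmappable_base + p->unmappable_size;

       gm_reserve_ballooned(0, p->mappable_base);                   /* below low window    */
       gm_reserve_ballooned(map_end, p->unmappable_base - map_end); /* between the windows */
       gm_reserve_ballooned(unmap_end, GGTT_TOTAL - unmap_end);     /* above high window   */
   }
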
Per-VM Local Graphics Memory
----------------------------

GVT-g allows each VM to use the full local graphics memory spaces of its
own, similar to the virtual address spaces on the CPU. The local
graphics memory spaces are only visible to the Render Engine in the GPU.
Therefore, any valid local graphics memory address programmed by a VM
can be used directly by the GPU. The GVT-g device model switches the
local graphics memory spaces between VMs when switching render
ownership.

GPU Page Table Virtualization
=============================

Shared Shadow GGTT
------------------

To achieve resource partitioning and address space ballooning, GVT-g
implements a shared shadow global page table for all VMs. Each VM has
its own guest global page table to translate the graphics memory page
number to the Guest memory Page Number (GPN). The shadow global page
table then translates the graphics memory page number to the
Host memory Page Number (HPN).

The shared shadow global page table maintains the translations for all
VMs to support concurrent accesses from the CPU and GPU.
Therefore, GVT-g implements a single, shared shadow global page table by
trapping guest PTE updates, as shown in :numref:`shared-shadow`. The
global page table, in MMIO space, has 1024K PTE entries, each pointing
to a 4 KB system memory page, so the global page table overall creates a
4 GB global graphics memory space. GVT-g audits the guest PTE values
according to the address space ballooning information before updating
the shadow PTE entries.

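A trapped guest GGTT PTE write can be handled roughly as sketched below:
audit the entry against the ranges this VM owns, translate the guest page to
a host page, then update the shared shadow table. The helper names are
illustrative; ``gpn_to_hpn`` stands in for the hypervisor's address
translation service.

.. code-block:: c

   #include <stdint.h>
   #include <stdbool.h>

   struct vgpu;

   /* Illustrative helpers; the real paths live in GVT-g and the hypervisor. */
   extern bool gm_offset_owned_by(struct vgpu *vgpu, uint64_t gm_offset);
   extern uint64_t gpn_to_hpn(struct vgpu *vgpu, uint64_t gpn);
   extern void write_shadow_ggtt_pte(uint32_t index, uint64_t pte);

   #define PTE_ADDR_MASK   (~0xfffull)   /* page-frame bits of a GGTT PTE  */
   #define PTE_FLAGS_MASK  (0xfffull)    /* valid bit, cache attributes... */

   /* Called when a guest write into the GGTT range of GTTMMADR is trapped. */
   int handle_guest_ggtt_pte_write(struct vgpu *vgpu, uint32_t index,
                                   uint64_t guest_pte)
   {
       uint64_t gm_offset = (uint64_t)index << 12;   /* each PTE maps 4 KB */

       /* Audit: the VM may only touch entries inside its own partition. */
       if (!gm_offset_owned_by(vgpu, gm_offset))
           return -1;

       /* Translate the guest page number to a host page number. */
       uint64_t hpn = gpn_to_hpn(vgpu, (guest_pte & PTE_ADDR_MASK) >> 12);

       write_shadow_ggtt_pte(index, (hpn << 12) | (guest_pte & PTE_FLAGS_MASK));
       return 0;
   }
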
.. figure:: images/APL_GVT-g-shared-shadow.png
   :width: 600px
   :align: center
   :name: shared-shadow

   Shared Shadow Global Page Table

Per-VM Shadow PPGTT
-------------------

To support local graphics memory access pass-through, GVT-g implements
per-VM shadow local page tables. The local graphics memory is only
accessible from the Render Engine. The local page tables have two-level
paging structures, as shown in :numref:`per-vm-shadow`.

The first level, Page Directory Entries (PDEs), located in the global
page table, points to the second level, Page Table Entries (PTEs), in
system memory, so guest accesses to the PDEs are trapped and emulated
through the implementation of the shared shadow global page table.

GVT-g also write-protects a list of guest PTE pages for each VM. The
GVT-g device model synchronizes the shadow page with the guest page at
the time of a write-protection page fault, and switches the shadow local
page tables at render context switches.

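The write-protection path can be sketched as below; the helper names are
placeholders rather than the actual GVT-g shadow PPGTT code.

.. code-block:: c

   #include <stdint.h>

   struct vgpu;
   struct shadow_ppgtt_page;

   extern struct shadow_ppgtt_page *find_wp_shadow_page(struct vgpu *vgpu,
                                                        uint64_t gfn);
   extern void emulate_guest_write(struct vgpu *vgpu, uint64_t gpa, uint64_t val);
   extern void update_shadow_pte(struct shadow_ppgtt_page *sp,
                                 unsigned int index, uint64_t guest_pte);

   /* Fault handler for a guest write to a write-protected PPGTT PTE page. */
   int handle_wp_page_fault(struct vgpu *vgpu, uint64_t gpa, uint64_t val)
   {
       struct shadow_ppgtt_page *sp = find_wp_shadow_page(vgpu, gpa >> 12);

       if (!sp)
           return -1;                    /* not a tracked guest page-table page */

       emulate_guest_write(vgpu, gpa, val);            /* keep the guest view   */
       update_shadow_pte(sp, (gpa & 0xfff) / 8, val);  /* sync the shadow entry */
       return 0;
   }
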
.. figure:: images/APL_GVT-g-per-vm-shadow.png
   :width: 800px
   :align: center
   :name: per-vm-shadow

   Per-VM Shadow PPGTT

.. _GVT-g-prioritized-rendering:

Prioritized Rendering and Preemption
====================================

Different Schedulers and Their Roles
------------------------------------

.. figure:: images/APL_GVT-g-scheduling-policy.png
   :width: 800px
   :align: center
   :name: scheduling-policy

   Scheduling Policy

In the system, there are three different schedulers for the GPU:

- i915 UOS scheduler
- Mediator GVT scheduler
- i915 SOS scheduler

Since the UOS always uses the host-based command submission (ELSP) model
and never accesses the GPU or the Graphics Microcontroller (GuC)
directly, its scheduler cannot do any preemption by itself.
The i915 scheduler does ensure that batch buffers are
submitted in dependency order; that is, if a compositor has to wait for
an application buffer to finish before its workload can be submitted to
the GPU, then the i915 scheduler of the UOS ensures that this happens.

The UOS assumes that by submitting its batch buffers to the Execlist
Submission Port (ELSP), the GPU will start working on them. However,
the MMIO write to the ELSP is captured by the Hypervisor, which forwards
these requests to the GVT module. GVT then creates a shadow context
based on this batch buffer and submits the shadow context to the SOS
i915 driver.

However, it is dependent on a second scheduler called the GVT
scheduler. This scheduler is time-based and uses a round-robin algorithm
to provide a specific time window in which each UOS can submit its workload
when it is considered the "render owner". The workloads of the UOSs that
are not render owners during a specific time period end up waiting in the
virtual GPU context until the GVT scheduler makes them render owners.
The GVT shadow context submits only one workload at
a time, and once the workload is finished by the GPU, it copies any
context state back to DomU and sends the appropriate interrupts before
picking up any other workloads from either this UOS or another one. This
also implies that this scheduler does not do any preemption of
workloads.

Finally, there is the i915 scheduler in the SOS. This scheduler uses the
GuC or ELSP to do command submission of SOS local content as well as any
content that GVT is submitting to it on behalf of the UOSs. This
scheduler uses the GuC or ELSP to preempt workloads. The GuC has four
different priority queues, but the SOS i915 driver uses only two of them.
One of them is considered high priority and the other is normal priority,
with a GuC rule being that any command submitted on the high-priority queue
would immediately try to preempt any workload submitted on the
normal-priority queue. For ELSP submission, the i915 will submit a preempt
context to preempt the currently running context and then wait for the GPU
engine to be idle.

While the identification of workloads to be preempted is decided by
customizable scheduling policies, once a candidate for preemption is
identified, the i915 scheduler simply submits a preemption request to
the GuC high-priority queue. Based on the HW's ability to preempt (on an
Apollo Lake SoC, a 3D workload is preemptible at the 3D primitive level with
some exceptions), the currently executing workload is saved and
preempted. The GuC informs the driver of a preemption
event with an interrupt. After handling the interrupt, the driver submits the
high-priority workload through the normal-priority GuC queue. As such,
the normal-priority GuC queue is used for actual execbuf submission most
of the time, with the high-priority GuC queue only being used for the
preemption of lower-priority workloads.

Scheduling policies are customizable and left to customers to change if
they are not satisfied with the built-in i915 driver policy, where all
workloads of the SOS are considered higher priority than those of the
UOS. This policy can be enforced through an SOS i915 kernel command line
parameter, and can replace the default in-order command submission (no
preemption) policy.

AcrnGT
*******

ACRN is a flexible, lightweight reference hypervisor, built with
real-time and safety-criticality in mind, optimized to streamline
embedded development through an open source platform.

AcrnGT is the GVT-g implementation on the ACRN hypervisor. It adapts
the MPT interface of GVT-g onto ACRN by using the kernel APIs provided
by ACRN.

:numref:`full-pic` shows the full architecture of AcrnGT with a Linux Guest
OS and an Android Guest OS.

.. figure:: images/APL_GVT-g-full-pic.png
   :width: 800px
   :align: center
   :name: full-pic

   Full picture of the AcrnGT

AcrnGT in kernel
=================

The AcrnGT module in the SOS kernel acts as an adaptation layer to connect
GVT-g in the i915 driver, the VHM module, and the ACRN-DM user space
application:

- The AcrnGT module implements the MPT interface of GVT-g to provide
  services to it, including setting and unsetting trap areas, setting and
  unsetting write-protected pages, etc.

- It calls the VHM APIs provided by the ACRN VHM module in the SOS
  kernel, to eventually call into the routines provided by the ACRN
  hypervisor through hypercalls.

- It provides user space interfaces through ``sysfs`` to the user space
  ACRN-DM, so that the DM can manage the lifecycle of the virtual GPUs.

AcrnGT in DM
=============

To emulate a PCI device to a Guest, we need an AcrnGT sub-module in the
ACRN-DM. This sub-module is responsible for:

- registering the virtual GPU device to the PCI device tree presented to
  the guest;

- registering the MMIO resources to ACRN-DM so that it can reserve
  resources in the ACPI table;

- managing the lifecycle of the virtual GPU device, such as creation,
  destruction, and resetting according to the state of the virtual
  machine.
doc/developer-guides/hld/hld-devicemodel.rst (new file, 1180 lines)

doc/developer-guides/hld/hld-emulated-devices.rst (new file, 18 lines)
@@ -0,0 +1,18 @@
.. _hld-emulated-devices:

Emulated devices high-level design
##################################

Full virtualization device models can typically
reuse existing native device drivers to avoid implementing front-end
drivers. ACRN implements several fully virtualized devices, as
documented in this section.

.. toctree::
   :maxdepth: 1

   usb-virt-hld
   UART virtualization <uart-virt-hld>
   Watchdog virtualization <watchdog-hld>
   random-virt-hld
   GVT-g GPU Virtualization <hld-APL_GVT-g>
doc/developer-guides/hld/hld-hypervisor.rst (new file, 24 lines)
@@ -0,0 +1,24 @@
.. _hld-hypervisor:

Hypervisor high-level design
############################

.. toctree::
   :maxdepth: 1

   hv-startup
   hv-cpu-virt
   Memory management <hv-memmgt>
   I/O Emulation <hv-io-emulation>
   IOC Virtualization <hv-ioc-virt>
   Physical Interrupt <hv-interrupt>
   Timer <hv-timer>
   Virtual Interrupt <hv-virt-interrupt>
   VT-d <hv-vt-d>
   Device Passthrough <hv-dev-passthrough>
   hv-partitionmode
   Power Management <hv-pm>
   Console, Shell, and vUART <hv-console>
   Hypercall / VHM upcall <hv-hypercall>
   Compile-time configuration <hv-config>
doc/developer-guides/hld/hld-overview.rst (new file, 529 lines)
@@ -0,0 +1,529 @@
.. _hld-overview:

ACRN high-level design overview
###############################

ACRN is an open source reference hypervisor (HV) running on top of Intel
Apollo Lake platforms for Software Defined Cockpit (SDC) or In-Vehicle
Experience (IVE) solutions. ACRN provides embedded hypervisor vendors
with a reference I/O mediation solution with a permissive license and
provides automakers with a reference software stack for in-vehicle use.

ACRN Supported Use Cases
************************

Software Defined Cockpit
========================

The SDC system consists of multiple systems: the instrument cluster (IC)
system, the In-Vehicle Infotainment (IVI) system, and one or more rear
seat entertainment (RSE) systems. Each system runs as a VM for better
isolation.

The instrument cluster (IC) system manages the graphics display of:

- driving speed, engine RPM, temperature, fuel level, odometer, trip mileage, etc.
- alerts of low fuel or tire pressure
- rear-view camera (RVC) and surround-camera view for driving assistance.

In-Vehicle Infotainment
=======================

A typical In-Vehicle Infotainment (IVI) system would support:

- Navigation systems;
- Radios, audio and video playback;
- Mobile device connection for calls, music, and applications via voice
  recognition and/or gesture recognition / touch;
- Rear-seat RSE services such as:

  - entertainment system
  - virtual office
  - connection to the IVI front system and mobile devices (cloud
    connectivity)

ACRN supports guest OSes of Clear Linux OS and Android. OEMs can use the ACRN
hypervisor and Linux or Android guest OS reference code to implement their own
VMs for a customized IC/IVI/RSE.

Hardware Requirements
*********************

Mandatory IA CPU features are support for:

- Long mode
- MTRR
- TSC deadline timer
- NX, SMAP, SMEP
- Intel-VT, including VMX, EPT, VT-d, APICv, VPID, invept and invvpid

Recommended memory: 4 GB, with 8 GB preferred.

ACRN Architecture
*****************

ACRN is a type-1 hypervisor, running on top of bare metal. It supports
Intel Apollo Lake platforms and can be easily extended to support future
platforms. ACRN implements a hybrid VMM architecture, using a privileged
service VM running the service OS (SOS) to manage I/O devices and
provide I/O mediation. Multiple user VMs can be supported, running Clear
Linux OS or Android OS as the user OS (UOS).

Instrument cluster applications are critical in the Software Defined
Cockpit (SDC) use case, and may require functional safety certification
in the future. Running the IC system in a separate VM can isolate it from
other VMs and their applications, thereby reducing the attack surface
and minimizing potential interference. However, running the IC system in
a separate VM introduces additional latency for the IC applications.
Some countries' regulations require an IVE system to show a rear-view
camera (RVC) image within 2 seconds, which is difficult to achieve if a
separate instrument cluster VM is started only after the SOS has booted.

:numref:`overview-arch` shows the architecture of ACRN together with the IC VM
and service VM. As shown, the SOS owns most of the platform devices and
provides I/O mediation to VMs. Some of the PCIe devices may be passed through
to UOSs according to the VM configuration. In addition, the SOS can run
the IC applications and HV helper applications such as the Device Model and
VM manager, where the VM manager is responsible for VM
start/stop/pause and virtual CPU pause/resume, etc.

.. figure:: images/over-image34.png
   :align: center
   :name: overview-arch

   ACRN Architecture

.. _intro-io-emulation:

Device Emulation
================

ACRN adopts various approaches for emulating devices for the UOS:

- **Emulated device**: A virtual device using this approach is emulated in
  the SOS by trapping accesses to the device in the UOS. Two sub-categories
  exist for emulated devices:

  - fully emulated, allowing native drivers to be used
    unmodified in the UOS, and
  - para-virtualized, requiring front-end drivers in
    the UOS to function.

- **Pass-through device**: A device passed through to the UOS is fully
  accessible to the UOS without interception. However, interrupts
  are first handled by the hypervisor before
  being injected into the UOS.

- **Mediated pass-through device**: A mediated pass-through device is a
  hybrid of the previous two approaches. Performance-critical
  resources (mostly data-plane related) are passed through to UOSes, and
  others (mostly control-plane related) are emulated.

I/O Emulation
-------------

The device model (DM) is a place for managing UOS devices: it allocates
memory for UOSes, configures and initializes the devices shared by the
guest, loads the virtual BIOS and initializes the virtual CPU state, and
invokes hypervisor services to execute the guest instructions.

The following diagram illustrates the control flow of emulating a port
I/O read from the UOS.

.. figure:: images/over-image29.png
   :align: center
   :name: overview-io-emu-path

   I/O (PIO/MMIO) Emulation Path

:numref:`overview-io-emu-path` shows an example I/O emulation flow path.
When a guest executes an I/O instruction (port I/O or MMIO), a VM exit
happens. The HV takes control and executes the request based on the VM exit
reason, for example ``VMX_EXIT_REASON_IO_INSTRUCTION`` for port I/O access.
The HV then fetches the additional guest instructions, if any,
processes the port I/O instruction at a pre-configured port address
(``in AL, 20h``, for example), places the decoded information, such as
the port I/O address, size of access, read/write, and target register,
into the I/O request in the I/O request buffer (shown in
:numref:`overview-io-emu-path`), and notifies/interrupts the SOS to process it.

The virtio and HV service module (VHM) in the SOS intercepts HV interrupts
and accesses the I/O request buffer for the port I/O instructions. It then
checks whether there is any kernel device claiming ownership of the
I/O port. The owning device, if any, executes the requested APIs from a
VM. Otherwise, the VHM module leaves the I/O request in the request buffer
and wakes up the DM thread for processing.

The DM follows the same mechanism as the VHM. The I/O processing thread of the
DM queries the I/O request buffer to get the PIO instruction details and
checks to see if any (guest) device emulation module claims ownership of
the I/O port. If so, the owning module is invoked to execute the requested
APIs.

When the DM completes the emulation (port I/O 20h access in this example)
of a device such as uDev1, uDev1 puts the result into the request
buffer (register AL). The DM then returns control to the HV,
indicating completion of an I/O instruction emulation, typically through
a VHM hypercall. The HV then stores the result in the guest register
context, advances the guest IP to indicate the completion of instruction
execution, and resumes the guest.

The MMIO access path is similar, except that the VM exit reason is *EPT
violation*.

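A simplified sketch of the request-buffer handoff is shown below. The
structure layout and constants are illustrative, not the actual ACRN request
definitions; they only show how the DM's I/O thread might consume a pending
port I/O request and post the result.

.. code-block:: c

   #include <stdint.h>

   #define REQ_STATE_PENDING   1
   #define REQ_STATE_COMPLETE  2

   /* Illustrative request slot shared between HV, VHM, and DM (one per vCPU). */
   struct io_request {
       volatile uint32_t state;  /* PENDING while the guest vCPU is waiting  */
       uint32_t type;            /* port I/O or MMIO                         */
       uint64_t address;         /* e.g. port 0x20 for "in AL, 20h"          */
       uint32_t size;            /* access width in bytes                    */
       uint32_t is_read;         /* 1 = read (result returned to the guest)  */
       uint64_t value;           /* write data in, read result out           */
   };

   /* Hypothetical emulation hook looked up by port range in the DM. */
   extern uint64_t emulate_pio(uint64_t port, uint32_t size,
                               uint32_t is_read, uint64_t value);
   extern void notify_request_done(int vcpu);   /* completion notification via VHM */

   /* DM I/O thread: complete one pending request and hand control back. */
   void dm_handle_io_request(struct io_request *req, int vcpu)
   {
       if (req->state != REQ_STATE_PENDING)
           return;

       req->value = emulate_pio(req->address, req->size,
                                req->is_read, req->value);
       req->state = REQ_STATE_COMPLETE;
       notify_request_done(vcpu);   /* HV copies the result (e.g. into AL)
                                       and resumes the guest */
   }
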
DMA Emulation
-------------

Currently the only fully virtualized devices for the UOS are USB xHCI, UART, and the Automotive I/O controller. None of these require emulating DMA transactions. ACRN does not currently support virtual DMA.

Hypervisor
**********

ACRN takes advantage of Intel Virtualization Technology (Intel VT). The ACRN HV runs in Virtual Machine Extension (VMX) root operation, also called host mode or VMM mode, while the SOS and UOS guests run in VMX non-root operation, or guest mode. (We'll use "root mode" and "non-root mode" for simplicity.)

The VMM mode has 4 rings. ACRN runs the HV in ring 0 privilege only and leaves rings 1-3 unused. A guest running in non-root mode has its own full set of rings (ring 0 to 3). The guest kernel runs in ring 0 of guest mode, while guest user-land applications run in ring 3 of guest mode (rings 1 and 2 are usually not used by commercial OSes).

.. figure:: images/over-image11.png
   :align: center
   :name: overview-arch-hv

   Architecture of ACRN hypervisor

:numref:`overview-arch-hv` shows an overview of the ACRN hypervisor architecture.

- A platform initialization layer provides an entry point, checking hardware capabilities and initializing the processors, memory, and interrupts. Relocation of the hypervisor image and derivation of encryption seeds are also supported by this component.

- A hardware management and utilities layer provides services for managing physical resources at runtime. Examples include handling physical interrupts and low power state changes.

- A layer sitting on top of hardware management enables virtual CPUs (or vCPUs), leveraging Intel VT. A vCPU loop runs a vCPU in non-root mode and handles VM exit events triggered by the vCPU. This layer handles CPU and memory related VM exits and provides a way to inject exceptions or interrupts to a vCPU.

- On top of vCPUs are three components for device emulation: one for emulation inside the hypervisor, another for communicating with the SOS for mediation, and the third for managing pass-through devices.

- The highest layer is a VM management module providing VM lifecycle and power operations.

- A library component provides basic utilities for the rest of the hypervisor, including encryption algorithms, mutual-exclusion primitives, etc.

There are three ways the hypervisor interacts with the SOS: VM exits (including hypercalls), upcalls, and the I/O request buffer. Interaction between the hypervisor and the UOS is more restricted, including only VM exits and hypercalls related to trusty.

SOS
***

The SOS (Service OS) is an important guest OS in the ACRN architecture. It runs in non-root mode and contains many critical components, including the VM manager, the device model (DM), ACRN services, kernel mediators, and the virtio and hypercall module (VHM). The DM manages the UOS (User OS) and provides device emulation for it. The SOS also provides services for system power lifecycle management through the ACRN service and VM manager, and services for system debugging through the ACRN log/trace tools.

DM
==

The DM (Device Model) is a user-level, QEMU-like application in the SOS responsible for creating a UOS VM and then performing device emulation based on command line configurations.

Based on a VHM kernel module, the DM interacts with the VM manager to create the UOS VM. It then emulates devices through full virtualization at the DM user level, para-virtualization based on kernel mediators (such as virtio and GVT), or pass-through based on kernel VHM APIs.

Refer to :ref:`hld-devicemodel` for more details.

VM Manager
==========

VM Manager is a user-level service in the SOS handling UOS VM creation and VM state management, according to the application requirements or system power operations.

VM Manager creates the UOS VM based on the DM application, and does UOS VM state management by interacting with the lifecycle service in the ACRN service.

Please refer to the VM management chapter for more details.

ACRN Service
============

The ACRN service provides system lifecycle management based on IOC polling. It communicates with the VM manager to handle UOS VM states, such as S3 and power-off.

VHM
===

The VHM (virtio and hypercall module) kernel module is an SOS kernel driver supporting UOS VM management and device emulation. The Device Model follows the standard Linux char device API (ioctl) to access VHM functionalities. The VHM communicates with the ACRN hypervisor through hypercalls or upcall interrupts.

Please refer to the VHM chapter for more details.

Kernel Mediators
================

Kernel mediators are kernel modules providing a para-virtualization method for the UOS VMs, for example, the i915 GVT driver.

Log/Trace Tools
===============

The ACRN log/trace tools are user-level applications used to capture ACRN hypervisor log and trace data. The VHM kernel module provides a middle layer to support these tools.

Refer to :ref:`hld-trace-log` for more details.

UOS
***

Currently, ACRN can boot Linux and Android guest OSes. For an Android guest OS, ACRN provides a VM environment with two worlds: a normal world and a trusty world. The Android OS runs in the normal world. The trusty OS and security-sensitive applications run in the trusty world. The trusty world can see the memory of the normal world, but the normal world cannot see the trusty world.

Guest Physical Memory Layout - UOS E820
=========================================

The DM creates the E820 table for a User OS VM based on these simple rules (a sketch of the resulting ranges follows the list):

- If the requested VM memory size is below the low memory limitation (currently 2 GB, defined in the DM), then the low memory range = [0, requested VM memory size]

- If the requested VM memory size is above the low memory limitation, then the low memory range = [0, 2G], and the high memory range = [4G, 4G + requested VM memory size - 2G]
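
A minimal C sketch of that split is shown below. The 2 GB threshold and the 4 GB high-memory base follow the rules above, while the structure and function names are only illustrative; the real DM code differs.

.. code-block:: c

   #include <stdint.h>

   #define LOW_MEM_LIMIT   (2ULL << 30)   /* 2 GB low-memory limitation in the DM */
   #define HIGH_MEM_BASE   (4ULL << 30)   /* high memory starts at 4 GB GPA       */

   struct mem_range { uint64_t base, size; };

   /* Compute the UOS low/high RAM ranges for a requested memory size. */
   static void uos_e820_split(uint64_t requested,
                              struct mem_range *low, struct mem_range *high)
   {
       if (requested <= LOW_MEM_LIMIT) {
           low->base = 0;  low->size = requested;
           high->base = 0; high->size = 0;           /* no high memory range */
       } else {
           low->base = 0;  low->size = LOW_MEM_LIMIT;
           high->base = HIGH_MEM_BASE;
           high->size = requested - LOW_MEM_LIMIT;   /* [4G, 4G + requested - 2G] */
       }
   }
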
.. figure:: images/over-image13.png
   :align: center

   UOS Physical Memory Layout

UOS Memory Allocation
=====================

The DM does UOS memory allocation based on the hugetlb mechanism by default. The real memory mapping may be scattered in SOS physical memory space, as shown in :numref:`overview-mem-layout`:

.. figure:: images/over-image15.png
   :align: center
   :name: overview-mem-layout

   UOS Physical Memory Layout Based on Hugetlb

The User OS's memory is allocated by the Service OS DM application; it may come from different huge pages in the Service OS, as shown in :numref:`overview-mem-layout`.

As the Service OS has full knowledge of these huge pages' size, GPA\ :sup:`SOS` and GPA\ :sup:`UOS`, it works with the hypervisor to complete the UOS's host-to-guest mapping using this pseudo code:

.. code-block:: none

   for x in allocated huge pages do
      x.hpa = gpa2hpa_for_sos(x.sos_gpa)
      host2guest_map_for_uos(x.hpa, x.uos_gpa, x.size)
   end

Virtual Slim bootloader
=======================

The Virtual Slim bootloader (vSBL) is the virtual bootloader that supports booting the UOS on the ACRN hypervisor platform. The vSBL design is derived from Slim Bootloader. It follows a staged design approach: hardware initialization, followed by payload launching that provides the boot logic. As shown in :numref:`overview-sbl`, the virtual SBL has an initialization unit to initialize virtual hardware, and a payload unit to boot the Linux or Android guest OS.

.. figure:: images/over-image110.png
   :align: center
   :name: overview-sbl

   vSBL System Context Diagram

The vSBL image is released as a part of the Service OS (SOS) root filesystem (rootfs). The vSBL is copied to UOS memory by the VM manager in the SOS while creating the UOS's virtual BSP. The SOS passes the start address of the vSBL and related information to the HV. The HV sets the guest RIP of the UOS virtual BSP to the start of the vSBL, sets the related guest registers, and launches the UOS virtual BSP. The vSBL starts running in virtual real mode within the UOS. Conceptually, the vSBL is part of the UOS runtime.

In the current design, the vSBL supports booting an Android guest OS or a Linux guest OS using the same vSBL image.

For an Android VM, the vSBL loads and verifies the trusty OS first, and the trusty OS then loads and verifies the Android OS according to the Android OS verification mechanism.

Freedom From Interference
*************************

The hypervisor is critical for preventing inter-VM interference, using the following mechanisms:

- Each physical CPU is dedicated to one vCPU.

  Sharing a physical CPU among multiple vCPUs gives rise to multiple sources of interference, such as the vCPU of one VM flushing the L1 and L2 caches for another, or excessive interrupts for one VM delaying the execution of another. It also requires vCPU scheduling in the hypervisor to consider more complexities, such as scheduling latency and vCPU priority, exposing more opportunities for one VM to interfere with another.

  To prevent such interference, the ACRN hypervisor adopts static core partitioning by dedicating each physical CPU to one vCPU. The physical CPU loops in idle when the vCPU is paused by I/O emulation. This makes vCPU scheduling deterministic, and physical resource sharing is minimized.

- Hardware mechanisms including EPT, VT-d, SMAP, and SMEP are leveraged to prevent unintended memory accesses.

  Memory corruption can be a common failure mode. The ACRN hypervisor properly sets up the memory-related hardware mechanisms to ensure that:

  1. the SOS cannot access the memory of the hypervisor, unless explicitly allowed,

  2. the UOS cannot access the memory of the SOS and the hypervisor, and

  3. the hypervisor does not unintendedly access the memory of the SOS or UOS.

- The destination of external interrupts is set to the physical core where the VM that handles them is running.

  External interrupts are always handled by the hypervisor in ACRN. Excessive interrupts to one VM (say VM A) could slow down another VM (VM B) if they are handled by the physical core running VM B instead of VM A. Two mechanisms are designed to mitigate such interference:

  1. The destination of an external interrupt is set to the physical core that runs the vCPU where virtual interrupts will be injected.

  2. The hypervisor maintains statistics on the total number of received interrupts (available to the SOS via a hypercall) and has a delay mechanism to temporarily block certain virtual interrupts from being injected. This allows the SOS to detect the occurrence of an interrupt storm and control the interrupt injection rate when necessary.

- Mitigation of DMA storm.

  (To be documented later.)

Boot Flow
*********

.. figure:: images/over-image85.png
   :align: center

   ACRN Boot Flow

Power Management
****************

CPU P-state & C-state
=====================

In ACRN, CPU P-states and C-states (Px/Cx) are controlled by the guest OS. The corresponding governors are managed in the SOS/UOS for best power efficiency and simplicity.

Guests should be able to process the ACPI P/C-state requests from OSPM. The needed ACPI objects for P/C-state management should be ready in the ACPI table.

The hypervisor can restrict a guest's P/C-state requests (per customer requirement). MSR accesses of P-state requests could be intercepted by the hypervisor and forwarded to the host directly if the requested P-state is valid. Guest MWAIT/port I/O accesses of C-state control could be passed through to the host with no hypervisor interception to minimize performance impacts.

This diagram shows the CPU P/C-state management blocks:

.. figure:: images/over-image4.png
   :align: center

   CPU P/C-state management block diagram

System power state
==================

ACRN supports the ACPI standard defined power states S3 and S5 at the system level. For each guest, ACRN assumes the guest implements OSPM and controls its own power state accordingly. ACRN doesn't involve the guest OSPM. Instead, it traps the power state transition requests from the guest and emulates them.

.. figure:: images/over-image21.png
   :align: center
   :name: overview-pm-block

   ACRN Power Management Diagram Block

:numref:`overview-pm-block` shows the basic block diagram for ACRN PM. The OSPM in each guest manages the guest power state transition. The Device Model running in the SOS traps and emulates the power state transition of the UOS (Linux VM or Android VM in :numref:`overview-pm-block`). The VM Manager knows all UOS power states and notifies the OSPM of the SOS (Service OS in :numref:`overview-pm-block`) once the active UOS is in the required power state.

Then the OSPM of the SOS starts the power state transition of the SOS, which is trapped to the "Sx Agency" in ACRN, and the agency starts the power state transition.

Some details about the ACPI tables for the UOS and SOS:

- The ACPI table in the UOS is emulated by the Device Model. The Device Model knows which register the UOS writes to trigger power state transitions, and must register an I/O handler for it.

- The ACPI table in the SOS is passed through; there is no ACPI parser in the ACRN HV. The power management related ACPI table is generated offline and hardcoded in the ACRN HV.

179
doc/developer-guides/hld/hld-power-management.rst
Normal file
@@ -0,0 +1,179 @@

.. _hld-power-management:

Power Management high-level design
##################################

P-state/C-state management
**************************

ACPI Px/Cx data
===============

CPU P-states and C-states are controlled by the guest OS. The ACPI P/C-state driver relies on some P/C-state-related ACPI data in the guest ACPI table.

The SOS can run its ACPI driver with no problem because it can access the native ACPI table. For the UOS though, we need to prepare the corresponding ACPI data for the Device Model to build a virtual ACPI table.

The Px/Cx data includes four ACPI objects: _PCT, _PPC, and _PSS for P-state management, and _CST for C-state management. All these ACPI data must be consistent with the native data because the control method is a kind of pass-through.

These ACPI object data are parsed by an offline tool and hard-coded in a hypervisor module named CPU state table:

.. code-block:: c

   struct cpu_px_data {
       uint64_t core_frequency;        /* megahertz */
       uint64_t power;                 /* milliWatts */
       uint64_t transition_latency;    /* microseconds */
       uint64_t bus_master_latency;    /* microseconds */
       uint64_t control;               /* control value */
       uint64_t status;                /* success indicator */
   } __attribute__((aligned(8)));

   struct acpi_generic_address {
       uint8_t  space_id;
       uint8_t  bit_width;
       uint8_t  bit_offset;
       uint8_t  access_size;
       uint64_t address;
   } __attribute__((aligned(8)));

   struct cpu_cx_data {
       struct acpi_generic_address cx_reg;
       uint8_t  type;
       uint32_t latency;
       uint64_t power;
   } __attribute__((aligned(8)));

With these Px/Cx data, the hypervisor is able to intercept the guest's P/C-state requests with the desired restrictions.

Virtual ACPI table build flow
=============================

:numref:`vACPItable` shows how to build the virtual ACPI table with the Px/Cx data for UOS P/C-state management:

.. figure:: images/hld-pm-image28.png
   :align: center
   :name: vACPItable

   System block for building vACPI table with Px/Cx data

Some ioctl APIs are defined for the Device Model to query Px/Cx data from the SOS VHM. The hypervisor needs to provide hypercall APIs to transfer the Px/Cx data from the CPU state table to the SOS VHM.

The build flow is (a sketch of the DM-side query step follows the list):

1) Use an offline tool (e.g. **iasl**) to parse the Px/Cx data and hard-code it into the CPU state table in the hypervisor. The hypervisor loads the data after system boot up.
2) Before UOS launching, the Device Model queries the Px/Cx data from the SOS VHM via the ioctl interface.
3) The VHM transmits the query request to the hypervisor by hypercall.
4) The hypervisor returns the Px/Cx data.
5) The Device Model builds the virtual ACPI table with these Px/Cx data.
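
The following C sketch illustrates step 2 from the DM side. The device node path, the ioctl request code ``IC_PM_GET_CPU_STATE``, and the argument layout are assumptions made for illustration only; the authoritative DM/VHM interface is defined in the ACRN source.

.. code-block:: c

   #include <stdint.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <sys/ioctl.h>

   /* Hypothetical request code and argument layout for querying the Px data
    * of one pCPU through the VHM char device; the real ACRN definitions live
    * in the VHM ioctl headers and may differ. */
   #define IC_PM_GET_CPU_STATE  0xA0U   /* placeholder, not the real value */

   struct cpu_px_query {
       uint32_t pm_cmd;        /* which item is requested, e.g. Px count or a Px entry */
       uint32_t cpu_id;        /* physical CPU whose data is requested                 */
       uint64_t buf;           /* buffer the VHM fills after the hypercall returns     */
   };

   static int query_px_data(uint32_t cpu_id, uint64_t buf)
   {
       int fd = open("/dev/acrn_vhm", O_RDWR);   /* assumed VHM char device node */
       if (fd < 0)
           return -1;

       struct cpu_px_query q = { .pm_cmd = 0, .cpu_id = cpu_id, .buf = buf };
       int ret = ioctl(fd, IC_PM_GET_CPU_STATE, &q);  /* VHM forwards via hypercall */

       close(fd);
       return ret;
   }
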
Intercept Policy
================

The hypervisor should be able to restrict a guest's P/C-state requests with a user-customized policy.

The hypervisor should intercept guest P-state requests and validate whether each one is a valid P-state. Any invalid P-state (e.g. one that doesn't exist in the CPU state table) should be rejected, as in the sketch below.
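
A minimal sketch of such a validation is shown below. It assumes the hypervisor keeps the parsed ``struct cpu_px_data`` array described earlier and compares the control value of a guest ``IA32_PERF_CTL`` write against that table; the helper name is illustrative.

.. code-block:: c

   #include <stdint.h>
   #include <stdbool.h>

   struct cpu_px_data {
       uint64_t core_frequency;
       uint64_t power;
       uint64_t transition_latency;
       uint64_t bus_master_latency;
       uint64_t control;
       uint64_t status;
   };

   /* Accept a guest P-state request only if the requested control value
    * matches one of the entries parsed from the native _PSS data. */
   static bool is_valid_px_request(const struct cpu_px_data *px_table,
                                   uint32_t px_cnt, uint64_t perf_ctl)
   {
       for (uint32_t i = 0; i < px_cnt; i++) {
           if (px_table[i].control == perf_ctl)
               return true;    /* forward the MSR write to hardware */
       }
       return false;           /* invalid request: reject it */
   }
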
It is better not to intercept C-state requests because the trap would impact both power and performance.

.. note:: For P-state control, pay attention to the SoC core voltage domain design when doing P-state measurements. The highest P-state wins if different P-states are requested on cores sharing the same voltage domain. In this case APERF/MPERF must be used to see what P-state was granted on that core.

S3/S5
*****

ACRN assumes the guest has complete S3/S5 power state management and follows the ACPI standard exactly. System S3/S5 needs to follow well-defined enter/exit paths and cooperate among different components.

System low power state enter process
====================================

Each time the OSPM of the UOS starts a power state transition, it finally writes the ACPI register per the ACPI spec requirements. With the help of the ACRN I/O emulation framework, the UOS ACPI register write is dispatched to the Device Model, and the Device Model emulates the UOS power state (pausing the UOS VM for S3, and powering off the UOS VM for S5).

The VM Manager monitors all UOSes. If all active UOSes are in the required power state, the VM Manager notifies the OSPM of the SOS to start the SOS power state transition. The OSPM of the SOS follows a very similar process to the UOS for the power state transition. The difference is that the SOS ACPI register write is trapped to the ACRN HV, and the ACRN HV emulates the SOS power state (pausing the SOS VM for S3, with no special action for S5).

Once the SOS low power state is done, the ACRN HV goes through its own low power state enter path.

The whole system is finally put into the low power state.

System low power state exit process
===================================

The low power state exit process is in reverse order. The ACRN hypervisor is woken up first. It goes through its own low power state exit path. Then the ACRN hypervisor resumes the SOS to let the SOS go through the SOS low power state exit path. After that, the DM is resumed and lets the UOS go through the UOS low power state exit path. The system is resumed to the running state after at least one UOS is resumed to the running state.

:numref:`pmworkflow` shows the flow of the low power S3 enter/exit process (S5 follows a very similar process):

.. figure:: images/hld-pm-image62.png
   :align: center
   :name: pmworkflow

   ACRN system power management workflow

For system power state entry:

1. The UOS OSPM starts UOS S3 entry.
2. The UOS S3 entering request is trapped to the ACPI PM device of the DM.
3. The DM pauses the UOS VM to emulate UOS S3 and notifies the VM Manager that the UOS dedicated to it is in S3.
4. If all UOSes are in S3, the VM Manager notifies the OSPM of the SOS.
5. The SOS OSPM starts SOS S3 entry.
6. The SOS S3 entering request is trapped to the Sx Agency in the ACRN HV.
7. The ACRN HV pauses the SOS VM to emulate SOS S3 and starts ACRN HV S3 entry.

For system power state exit:

1. When the system is resumed from S3, the native bootloader jumps to the wake up vector of the HV.
2. The HV resumes from S3 and jumps to the wake up vector to emulate SOS resume from S3.
3. The OSPM of the SOS is running.
4. The OSPM of the SOS notifies the VM Manager that it's ready to wake up the UOS.
5. The VM Manager notifies the DM to resume the UOS.
6. The DM resets the UOS VM to emulate UOS resume from S3.

According to the ACPI standard, S3 is mapped to suspend-to-RAM and S5 is mapped to shutdown, so the S5 process differs from the S3 process as follows:

- UOS enters S3 -> UOS powers off
- System enters S3 -> System powers off
- System resumes from S3 -> System fresh start
- UOS resumes from S3 -> UOS fresh startup

1079
doc/developer-guides/hld/hld-security.rst
Normal file
241
doc/developer-guides/hld/hld-trace-log.rst
Normal file
@@ -0,0 +1,241 @@

.. _hld-trace-log:

Tracing and Logging high-level design
#####################################

Both Trace and Log are built on top of a mechanism named shared buffer (Sbuf).

Shared Buffer
*************

The shared buffer is a ring buffer divided into predetermined-size slots. There are two use scenarios for the Sbuf:

- The sbuf can serve as a lockless ring buffer to share data from the ACRN HV to the SOS in non-overwritten mode. (Writing will fail if an overrun happens.)
- The sbuf can serve as a conventional ring buffer in the hypervisor in over-written mode. A lock is required to synchronize access by the producer and consumer.

Both ACRNTrace and ACRNLog use the sbuf as a lockless ring buffer. The Sbuf is allocated by the SOS and assigned to the HV via a hypercall. To hold the pointers to the sbufs passed down via hypercall, an array ``sbuf[ACRN_SBUF_ID_MAX]`` is defined in the per_cpu region of the HV, with predefined sbuf IDs to identify the usage, such as ACRNTrace and ACRNLog.

For each physical CPU there is a dedicated Sbuf. Only a single producer is allowed to put data into that Sbuf in the HV, and a single consumer is allowed to get data from the Sbuf in the SOS. Therefore, no lock is required to synchronize access by the producer and consumer.
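
The single-producer/single-consumer property is what makes the lockless scheme work: the producer only advances the tail and the consumer only advances the head. The C sketch below shows that idea with a simplified element layout; the field names and sizes are illustrative, not the actual ACRN sbuf definition, and a production implementation would also need memory barriers between the data copy and the index update.

.. code-block:: c

   #include <stdint.h>
   #include <string.h>
   #include <stdbool.h>

   #define ELEM_SIZE  32U
   #define ELEM_NUM   64U

   /* Simplified shared-buffer header plus slot array living in shared memory. */
   struct sbuf {
       volatile uint32_t head;              /* next slot the consumer reads  */
       volatile uint32_t tail;              /* next slot the producer writes */
       uint8_t data[ELEM_NUM][ELEM_SIZE];   /* fixed-size slots              */
   };

   /* HV side (single producer): fail instead of overwriting on overrun. */
   static bool sbuf_put(struct sbuf *sb, const void *elem)
   {
       uint32_t next = (sb->tail + 1U) % ELEM_NUM;

       if (next == sb->head)
           return false;                    /* full: non-overwritten mode */

       memcpy(sb->data[sb->tail], elem, ELEM_SIZE);
       sb->tail = next;                     /* publish after the copy */
       return true;
   }

   /* SOS side (single consumer): read one slot if available. */
   static bool sbuf_get(struct sbuf *sb, void *elem)
   {
       if (sb->head == sb->tail)
           return false;                    /* empty */

       memcpy(elem, sb->data[sb->head], ELEM_SIZE);
       sb->head = (sb->head + 1U) % ELEM_NUM;
       return true;
   }
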
sbuf APIs
=========

.. note:: Reference the APIs defined in ``hypervisor/include/debug/sbuf.h``.

ACRN Trace
**********

ACRNTrace is a tool running on the Service OS (SOS) to capture trace data. It allows developers to add performance profiling trace points at key locations to get a picture of what is going on inside the hypervisor. Scripts to analyze the collected trace data are also provided.

As shown in :numref:`acrntrace-arch`, ACRNTrace is built using shared buffers (Sbuf), and consists of three parts from the bottom layer up:

- **ACRNTrace userland app**: Userland application collecting trace data to files (per physical CPU)

- **SOS Trace Module**: allocates/frees SBufs, creates a device for each SBuf, sets up the sbuf shared between the SOS and HV, and provides a dev node for the userland app to retrieve trace data from the Sbuf

- **Trace APIs**: provide APIs to generate trace events and insert them into the Sbuf.

.. figure:: images/log-image50.png
   :align: center
   :name: acrntrace-arch

   Architectural diagram of ACRNTrace

Trace APIs
==========

.. note:: Reference the APIs defined in ``hypervisor/include/debug/trace.h`` for the trace_entry struct and functions.

SOS Trace Module
================

The SOS trace module is responsible for:

- allocating an sbuf in the SOS memory range for each physical CPU, and assigning the GPA of each Sbuf to ``per_cpu sbuf[ACRN_TRACE]``
- creating a misc device for each physical CPU
- providing an mmap operation to map the entire Sbuf to userspace for highly flexible and efficient access.

On SOS shutdown, the trace module is responsible for removing the misc devices, freeing the SBufs, and setting ``per_cpu sbuf[ACRN_TRACE]`` to null.

ACRNTrace Application
=====================

The ACRNTrace application includes a binary to retrieve trace data from the Sbuf, and Python scripts to convert trace data from raw format into readable text and do analysis.

Figure 2.2 shows the sequence of trace initialization and trace data collection. With a debug build, trace components are initialized at boot time. After initialization, the HV writes trace event data into the sbuf until the sbuf is full, which can happen easily if the ACRNTrace app is not consuming trace data from the Sbuf in SOS user space.

Once ACRNTrace is launched, a consumer thread is created for each physical CPU to periodically read RAW trace data from the sbuf and write it to a file.

.. note:: figure is missing

   Figure 2.2 Sequence of trace init and trace data collection

These are the Python scripts provided:

- **acrntrace_format.py** converts RAW trace data to human-readable text offline according to a given format;

- **acrnalyze.py** analyzes trace data (as output by acrntrace) based on given analyzer filters, such as vm_exit or irq, and generates a report.

See :ref:`acrntrace` for details and usage.

ACRN Log
********

acrnlog is a tool used to capture ACRN hypervisor logs to files on the SOS filesystem. It can run as an SOS service at boot, capturing two kinds of logs:

- current runtime logs;
- logs remaining in the buffer from the last crashed run.

Architectural diagram
=====================

Similar to the design of ACRN Trace, ACRN Log is built on top of the shared buffer (Sbuf), and consists of three parts from the bottom layer up:

- **ACRN Log app**: Userland application collecting hypervisor logs to files;
- **SOS ACRN Log Module**: constructs/frees SBufs at a reserved memory area, creates devices for the current/last logs, sets up the sbuf shared between the SOS and HV, and provides a dev node for the userland app to retrieve logs;
- **ACRN log support in HV**: puts logs at the specified loglevel into the Sbuf.

.. figure:: images/log-image73.png
   :align: center

   Architectural diagram of ACRN Log

ACRN log support in Hypervisor
==============================

To support acrn log, the following adaptations were made to the hypervisor log system:

- log messages with a severity level higher than a specified value are put into the Sbuf when calling logmsg in the hypervisor
- an sbuf is allocated to accommodate early hypervisor logs before the SOS can allocate and set up the sbuf

There are 6 different loglevels, as shown below. The specified severity loglevel is stored in ``mem_loglevel``, initialized by :option:`CONFIG_MEM_LOGLEVEL_DEFAULT`. The loglevel can be set to a new value at runtime via the hypervisor shell command "loglevel".

.. code-block:: c

   #define LOG_FATAL     1U
   #define LOG_ACRN      2U
   #define LOG_ERROR     3U
   #define LOG_WARNING   4U
   #define LOG_INFO      5U
   #define LOG_DEBUG     6U

The element size of the sbuf for logs is fixed at 80 bytes, and the max size of a single log message is 320 bytes. Log messages with a length between 80 and 320 bytes are split into multiple sbuf elements. Log messages with a length larger than 320 bytes are truncated.
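
A short C sketch of that splitting rule is shown below; the element and limit sizes come from the text above, while the function itself is only illustrative.

.. code-block:: c

   #include <stdint.h>
   #include <string.h>

   #define LOG_ENTRY_SIZE   80U    /* fixed sbuf element size for logs */
   #define LOG_MESSAGE_MAX  320U   /* longer messages are truncated    */

   /* Split one log message into consecutive 80-byte sbuf elements and
    * return the number of elements produced. */
   static uint32_t emit_log(const char *msg, uint32_t len,
                            void (*put_element)(const uint8_t *elem))
   {
       uint8_t elem[LOG_ENTRY_SIZE];
       uint32_t emitted = 0U;

       if (len > LOG_MESSAGE_MAX)
           len = LOG_MESSAGE_MAX;             /* truncate oversized messages */

       while (len > 0U) {
           uint32_t chunk = (len < LOG_ENTRY_SIZE) ? len : LOG_ENTRY_SIZE;

           memset(elem, 0, sizeof(elem));
           memcpy(elem, msg, chunk);
           put_element(elem);                 /* e.g. a put into the per-CPU sbuf */

           msg += chunk;
           len -= chunk;
           emitted++;
       }
       return emitted;
   }
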
For security, the SOS allocates the sbuf in its memory range and assigns it to the hypervisor. To handle log messages before the SOS boots, an sbuf for each physical CPU is allocated in the ACRN hypervisor memory range for any early log entries. Once the sbuf in the SOS memory range is allocated and assigned to the hypervisor via hypercall, the hypervisor logmsg switches to using the SOS-allocated sbuf, the early logs are copied over, and the early sbuf in the hypervisor memory range is freed.

SOS ACRN Log Module
===================

To enable retrieving log messages from a crash, 4 MB of memory starting at 0x6DE00000 is reserved for acrn log. This space is further divided into two ranges, one for the current run and one for the last (previous) run:

.. figure:: images/log-image59.png
   :align: center

   ACRN Log crash log/current log buffers

On SOS boot, the SOS acrnlog module is responsible for:

- examining whether there are log messages remaining from the last crashed run by checking the magic number of each sbuf

- if there are previous crash logs, constructing sbufs and creating misc devices for these last logs

- constructing an sbuf in the usable buffer range for each physical CPU, assigning the GPA of the Sbuf to ``per_cpu sbuf[ACRN_LOG]``, and creating a misc device for each physical CPU

- the misc devices implement the read() file operation to allow the userspace app to read one Sbuf element.

When checking the validity of the sbufs for last-log examination, the module sets the current sbuf with the magic number ``0x5aa57aa71aa13aa3``, and changes the magic number of the last sbuf to ``0x5aa57aa71aa13aa2``, to distinguish which is the current one and which is the last one.

On SOS shutdown, the module is responsible for removing the misc devices, freeing the SBufs, and setting ``per_cpu sbuf[ACRN_LOG]`` to null.

ACRN Log Application
====================

The ACRNLog application reads log messages from the sbuf for each physical CPU and combines them into log files with log messages in ascending order by the global sequence number. If the sequence number is not continuous, a warning of "incontinuous logs" is inserted.

To avoid using up storage space, the size of a single log file and the total number of log files are both limited. By default, the log file size limitation is 1 MB and the file number limitation is 4.

If there are last-log devices, ACRN log reads out the log messages, combines them, and saves them into last-log files.

See :ref:`acrnlog` for usage details.

763
doc/developer-guides/hld/hld-virtio-devices.rst
Normal file
@@ -0,0 +1,763 @@

.. _hld-virtio-devices:
.. _virtio-hld:

Virtio devices high-level design
################################

The ACRN Hypervisor follows the `Virtual I/O Device (virtio) specification <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>`_ to realize I/O virtualization for many performance-critical devices supported in the ACRN project. Adopting the virtio specification lets us reuse many frontend virtio drivers already available in a Linux-based User OS, drastically reducing potential development effort for frontend virtio drivers. To further reduce the development effort of backend virtio drivers, the hypervisor provides the virtio backend service (VBS) APIs, which make it very straightforward to implement a virtio device in the hypervisor.

The virtio APIs can be divided into 3 groups: DM APIs, virtio backend service (VBS) APIs, and virtqueue (VQ) APIs, as shown in :numref:`be-interface`.

.. figure:: images/virtio-hld-image0.png
   :width: 900px
   :align: center
   :name: be-interface

   ACRN Virtio Backend Service Interface

- **DM APIs** are exported by the DM, and are mainly used during the device initialization phase and runtime. The DM APIs also include PCIe emulation APIs because each virtio device is a PCIe device in the SOS and UOS.
- **VBS APIs** are mainly exported by the VBS and related modules. Generally they are callbacks to be registered into the DM.
- **VQ APIs** are used by a virtio backend device to access and parse information from the shared memory between the frontend and backend device drivers.

The virtio framework is the para-virtualization specification that ACRN follows to implement I/O virtualization of performance-critical devices such as audio, eAVB/TSN, IPU, and CSMU devices. This section gives an overview of virtio history, motivation, and advantages, and then highlights virtio key concepts. Second, this section describes ACRN's virtio architectures and elaborates on the ACRN virtio APIs. Finally, this section introduces all the virtio devices currently supported by ACRN.

Virtio introduction
*******************

Virtio is an abstraction layer over devices in a para-virtualized hypervisor. Virtio was developed by Rusty Russell when he worked at IBM Research to support his lguest hypervisor in 2007, and it quickly became the de facto standard for KVM's para-virtualized I/O devices.

Virtio is very popular for virtual I/O devices because it provides a straightforward, efficient, standard, and extensible mechanism, and eliminates the need for boutique, per-environment, or per-OS mechanisms. For example, rather than having a variety of device emulation mechanisms, virtio provides a common frontend driver framework that standardizes device interfaces and increases code reuse across different virtualization platforms.

Given the advantages of virtio, ACRN also follows the virtio specification.

Key Concepts
************

To better understand virtio, especially its usage in ACRN, we'll highlight several key virtio concepts important to ACRN:

Frontend virtio driver (FE)
  Virtio adopts a frontend-backend architecture that enables a simple but flexible framework for both frontend and backend virtio drivers. The FE driver merely needs to offer services to configure the interface, pass messages, produce requests, and kick the backend virtio driver. As a result, the FE driver is easy to implement and the performance overhead of emulating a device is eliminated.

Backend virtio driver (BE)
  Similar to the FE driver, the BE driver, running either in user-land or kernel-land of the host OS, consumes requests from the FE driver and sends them to the host native device driver. Once the requests are done by the host native device driver, the BE driver notifies the FE driver that the request is complete.

  Note: to distinguish the BE driver from the host native device driver, the host native device driver is called the "native driver" in this document.

Straightforward: virtio devices as standard devices on existing buses
  Instead of creating new device buses from scratch, virtio devices are built on existing buses. This gives a straightforward way for both FE and BE drivers to interact with each other. For example, the FE driver could read/write registers of the device, and the virtual device could interrupt the FE driver, on behalf of the BE driver, in case something of interest is happening.

  Currently virtio supports the PCI/PCIe bus and the MMIO bus. In ACRN, only the PCI/PCIe bus is supported, and all the virtio devices share the same vendor ID 0x1AF4.

  Note: For MMIO, the "bus" is a bit of an overstatement since basically it is a few descriptors describing the devices.

Efficient: batching operation is encouraged
  Batching operations and deferred notification are important to achieve high-performance I/O, since notification between the FE and BE driver usually involves an expensive exit of the guest. Therefore batching operations and notification suppression are highly encouraged if possible. This gives an efficient implementation for performance-critical devices.

Standard: virtqueue
  All virtio devices share a standard ring buffer and descriptor mechanism, called a virtqueue, shown in :numref:`virtqueue`. A virtqueue is a queue of scatter-gather buffers. There are three important methods on virtqueues:

  - **add_buf** is for adding a request/response buffer to a virtqueue,
  - **get_buf** is for getting a response/request from a virtqueue, and
  - **kick** is for notifying the other side for a virtqueue to consume buffers.

  The virtqueues are created in guest physical memory by the FE drivers. BE drivers only need to parse the virtqueue structures to obtain the requests and process them. How a virtqueue is organized is specific to the Guest OS. In the Linux implementation of virtio, the virtqueue is implemented as a ring buffer structure called vring.

  In ACRN, the virtqueue APIs can be leveraged directly so that users don't need to worry about the details of the virtqueue. (Refer to the guest OS for more details about the virtqueue implementation.) A sketch of BE-side virtqueue processing follows the figure below.

  .. figure:: images/virtio-hld-image2.png
     :width: 900px
     :align: center
     :name: virtqueue

     Virtqueue
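
  The C sketch below shows, at a conceptual level, how a BE driver drains a virtqueue after a kick: it repeatedly gets a posted buffer, handles the request, and returns the used buffer before notifying the FE side once. The types and function names are illustrative placeholders for the three methods listed above, not the actual ACRN or Linux APIs.

  .. code-block:: c

     #include <stdbool.h>
     #include <stddef.h>

     /* Illustrative scatter-gather buffer posted by the FE driver. */
     struct vq_buf {
         void    *data;
         size_t   len;
         bool     write;      /* true if the BE should write into it */
     };

     struct virtqueue;        /* opaque here; owned by the virtio layer */

     /* Placeholder declarations for the three virtqueue methods. */
     bool vq_get_buf(struct virtqueue *vq, struct vq_buf *buf);       /* "get_buf" */
     void vq_add_used(struct virtqueue *vq, struct vq_buf *buf, size_t written);
     void vq_kick_fe(struct virtqueue *vq);                           /* notify FE */

     /* BE handling after the FE "kick": drain all pending buffers, then
      * send a single notification (batching/notification suppression). */
     static void be_process_virtqueue(struct virtqueue *vq,
                                      size_t (*handle)(struct vq_buf *buf))
     {
         struct vq_buf buf;
         bool did_work = false;

         while (vq_get_buf(vq, &buf)) {
             size_t written = handle(&buf);   /* e.g. submit to the native driver */
             vq_add_used(vq, &buf, written);
             did_work = true;
         }

         if (did_work)
             vq_kick_fe(vq);                  /* one interrupt for the whole batch */
     }
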
Extensible: feature bits
  A simple extensible feature negotiation mechanism exists for each virtual device and its driver. Each virtual device could claim its device-specific features while the corresponding driver could respond to the device with the subset of features the driver understands. The feature mechanism enables forward and backward compatibility for the virtual device and driver.

Virtio Device Modes
  The virtio specification defines three modes of virtio devices: a legacy mode device, a transitional mode device, and a modern mode device. A legacy mode device is compliant to virtio specification version 0.95, a transitional mode device is compliant to both the 0.95 and 1.0 spec versions, and a modern mode device is only compatible with the version 1.0 specification.

  In ACRN, all the virtio devices are transitional devices, meaning that they should be compatible with both the 0.95 and 1.0 versions of the virtio specification.

Virtio Device Discovery
  Virtio devices are commonly implemented as PCI/PCIe devices. A virtio device using virtio over the PCI/PCIe bus must expose an interface to the Guest OS that meets the PCI/PCIe specifications.

  Conventionally, any PCI device with Vendor ID 0x1AF4, PCI_VENDOR_ID_REDHAT_QUMRANET, and Device ID 0x1000 through 0x107F inclusive is a virtio device. Among the Device IDs, the legacy/transitional mode virtio devices occupy the first 64 IDs ranging from 0x1000 to 0x103F, while the range 0x1040-0x107F belongs to virtio modern devices. In addition, the Subsystem Vendor ID should reflect the PCI/PCIe vendor ID of the environment, and the Subsystem Device ID indicates which virtio device is supported by the device.

Virtio Frameworks
*****************

This section describes the overall architecture of virtio, and then introduces the ACRN-specific implementations of the virtio framework.

Architecture
============

Virtio adopts a frontend-backend architecture, as shown in :numref:`virtio-arch`. Basically the FE and BE drivers communicate with each other through shared memory, via the virtqueues. The FE driver talks to the BE driver in the same way it would talk to a real PCIe device. The BE driver handles requests from the FE driver and notifies the FE driver when a request has been processed.

.. figure:: images/virtio-hld-image1.png
   :width: 900px
   :align: center
   :name: virtio-arch

   Virtio Architecture

In addition to virtio's frontend-backend architecture, both FE and BE drivers follow a layered architecture, as shown in :numref:`virtio-fe-be`. Each side has three layers: transports, core models, and device types. All virtio devices share the same virtio infrastructure, including virtqueues, feature mechanisms, configuration space, and buses.

.. figure:: images/virtio-hld-image4.png
   :width: 900px
   :align: center
   :name: virtio-fe-be

   Virtio Frontend/Backend Layered Architecture

Virtio Framework Considerations
===============================

How to realize the virtio framework is specific to a hypervisor implementation. In ACRN, the virtio framework implementations can be classified into two types, virtio backend service in user-land (VBS-U) and virtio backend service in kernel-land (VBS-K), according to where the virtio backend service (VBS) is located. Although different in BE drivers, both VBS-U and VBS-K share the same FE drivers. The reason behind the two virtio implementations is to meet the requirement of supporting a large number of diverse I/O devices in the ACRN project.

When developing a virtio BE device driver, the device owner should choose carefully between VBS-U and VBS-K. Generally VBS-U targets non-performance-critical devices, but enables easy development and debugging. VBS-K targets performance-critical devices.

The next two sections introduce ACRN's two implementations of the virtio framework.

User-Land Virtio Framework
==========================

The architecture of the ACRN user-land virtio framework (VBS-U) is shown in :numref:`virtio-userland`.

The FE driver talks to the BE driver as if it were talking to a PCIe device. This means for the "control plane", the FE driver could poke device registers through PIO or MMIO, and the device will interrupt the FE driver when something happens. For the "data plane", the communication between the FE and BE drivers is through shared memory, in the form of virtqueues.

On the service OS side where the BE driver is located, there are several key components in ACRN, including the device model (DM), the virtio and HV service module (VHM), VBS-U, and the user-level vring service API helpers.

The DM bridges the FE driver and BE driver since each VBS-U module emulates a PCIe virtio device. The VHM bridges the DM and the hypervisor by providing remote memory map APIs and notification APIs. VBS-U accesses the virtqueue through the user-level vring service API helpers.

.. figure:: images/virtio-hld-image3.png
   :width: 900px
   :align: center
   :name: virtio-userland

   ACRN User-Land Virtio Framework

Kernel-Land Virtio Framework
============================

ACRN supports two kernel-land virtio frameworks: VBS-K, designed from scratch for ACRN, and Vhost, which is compatible with Linux Vhost.

VBS-K framework
---------------

The architecture of ACRN VBS-K is shown in :numref:`kernel-virtio-framework` below.

Generally VBS-K provides acceleration for performance-critical devices emulated by VBS-U modules by handling the "data plane" of the devices directly in the kernel. When VBS-K is enabled for certain devices, the kernel-land vring service API helpers, instead of the user-land helpers, are used to access the virtqueues shared by the FE driver. Compared to VBS-U, this eliminates the overhead of copying data back and forth between user-land and kernel-land within the service OS, but pays with the extra implementation complexity of the BE drivers.

Except for the differences mentioned above, VBS-K still relies on VBS-U for feature negotiation between the FE and BE drivers. This means the "control plane" of the virtio device still remains in VBS-U. When feature negotiation is done, which is determined by the FE driver setting up an indicative flag, the VBS-K module is initialized by VBS-U. Afterwards, all request handling is offloaded to VBS-K in the kernel.

Finally, the FE driver is not aware of how the BE driver is implemented, either in VBS-U or VBS-K. This saves engineering effort regarding FE driver development.

.. figure:: images/virtio-hld-image54.png
   :align: center
   :name: kernel-virtio-framework

   ACRN Kernel Land Virtio Framework

Vhost framework
---------------

Vhost is similar to VBS-K. Vhost is a common solution upstreamed in the Linux kernel, with several kernel mediators based on it.

Architecture
~~~~~~~~~~~~

Vhost/virtio is a semi-virtualized device abstraction interface specification that has been widely applied in various virtualization solutions. Vhost is a specific kind of virtio where the data plane is put into host kernel space to reduce the context switches while processing the I/O request. It is usually called "virtio" when used as a front-end driver in a guest operating system or "vhost" when used as a back-end driver in a host. Compared with a pure virtio solution on a host, vhost uses the same frontend driver as the virtio solution and can achieve better performance. :numref:`vhost-arch` shows the vhost architecture on ACRN.

.. figure:: images/virtio-hld-image71.png
   :align: center
   :name: vhost-arch

   Vhost Architecture on ACRN

Compared with a userspace virtio solution, vhost decomposes the data plane from user space to kernel space. The vhost general data plane workflow can be described as follows (a sketch of the eventfd setup follows the list):

1. The vhost proxy creates two eventfds per virtqueue: one for kick (an ioeventfd), the other for call (an irqfd).
2. The vhost proxy registers the two eventfds to the VHM through the VHM character device:

   a) The ioeventfd is bound to a PIO/MMIO range. If it is a PIO, it is registered with (fd, port, len, value). If it is an MMIO, it is registered with (fd, addr, len).
   b) The irqfd is registered with an MSI vector.

3. The vhost proxy sets the two fds to the vhost kernel through ioctls of the vhost device.
4. The vhost device starts polling the kick fd and wakes up when the guest kicks a virtqueue, which results in an event_signal on the kick fd by the VHM ioeventfd.
5. The vhost device in the kernel signals on the irqfd to notify the guest.
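
A minimal user-space sketch of steps 1-3 is shown below. The ``eventfd()`` call, the ``/dev/vhost-net`` path, and the ``VHOST_SET_VRING_KICK``/``VHOST_SET_VRING_CALL`` ioctls are standard Linux interfaces; the VHM registration step is represented by hypothetical ``vhm_register_*()`` helpers because the actual VHM ioctl layout is not reproduced here.

.. code-block:: c

   #include <stdint.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <sys/eventfd.h>
   #include <sys/ioctl.h>
   #include <linux/vhost.h>

   /* Hypothetical wrappers for the VHM character-device registration
    * described in step 2; the real ioctl numbers/arguments differ. */
   int vhm_register_ioeventfd(int kick_fd, uint16_t port, uint32_t len);
   int vhm_register_irqfd(int call_fd, uint32_t msi_vector);

   static int setup_vhost_vq(uint16_t pio_port, uint32_t msi_vector, int vq_index)
   {
       /* Step 1: one eventfd for kick (ioeventfd), one for call (irqfd). */
       int kick_fd = eventfd(0, EFD_NONBLOCK);
       int call_fd = eventfd(0, EFD_NONBLOCK);

       /* Step 2: let the VHM signal kick_fd on guest PIO writes and
        * inject an MSI when call_fd is signaled. */
       vhm_register_ioeventfd(kick_fd, pio_port, 2);
       vhm_register_irqfd(call_fd, msi_vector);

       /* Step 3: hand both fds to the vhost kernel driver. */
       int vhost_fd = open("/dev/vhost-net", O_RDWR);
       ioctl(vhost_fd, VHOST_SET_OWNER);

       struct vhost_vring_file kick = { .index = vq_index, .fd = kick_fd };
       struct vhost_vring_file call = { .index = vq_index, .fd = call_fd };
       ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
       ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);

       return vhost_fd;   /* error handling omitted for brevity */
   }
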
Ioeventfd implementation
~~~~~~~~~~~~~~~~~~~~~~~~

The ioeventfd module is implemented in the VHM, and can enhance a registered eventfd to listen to I/O requests (PIO/MMIO) from the VHM ioreq module and signal the eventfd when needed. :numref:`ioeventfd-workflow` shows the general workflow of ioeventfd.

.. figure:: images/virtio-hld-image58.png
   :align: center
   :name: ioeventfd-workflow

   ioeventfd general work flow

The workflow can be summarized as:

1. The vhost device initializes. The vhost proxy creates two eventfds, one for the ioeventfd and one for the irqfd.
2. The ioeventfd is passed to the vhost kernel driver.
3. The ioeventfd is passed to the VHM driver.
4. The UOS FE driver triggers an ioreq, which is forwarded to the SOS by the hypervisor.
5. The ioreq is dispatched by the VHM driver to the related VHM client.
6. The ioeventfd VHM client traverses the io_range list and finds the corresponding eventfd.
7. The signal is triggered on the related eventfd.

Irqfd implementation
~~~~~~~~~~~~~~~~~~~~

The irqfd module is implemented in the VHM, and can enhance a registered eventfd to inject an interrupt into a guest OS when the eventfd gets signaled. :numref:`irqfd-workflow` shows the general flow for irqfd.

.. figure:: images/virtio-hld-image60.png
   :align: center
   :name: irqfd-workflow

   irqfd general flow

The workflow can be summarized as:

1. The vhost device initializes. The vhost proxy creates two eventfds, one for the ioeventfd and one for the irqfd.
2. The irqfd is passed to the vhost kernel driver.
3. The irqfd is passed to the VHM driver.
4. The vhost device driver signals the irq eventfd once the related native transfer is completed.
5. The irqfd logic traverses the irqfd list to retrieve the related irq information.
6. The irqfd logic injects an interrupt through the VHM interrupt API.
7. The interrupt is delivered to the UOS FE driver through the hypervisor.

Virtio APIs
***********

This section provides details on the ACRN virtio APIs. As outlined previously, the ACRN virtio APIs can be divided into three groups: DM_APIs, VBS_APIs, and VQ_APIs. The following sections will elaborate on these APIs.

VBS-U Key Data Structures
=========================

The key data structures for VBS-U are listed as follows, and their relationships are shown in :numref:`VBS-U-data`.

``struct pci_virtio_blk``
  An example virtio device, such as virtio-blk.
``struct virtio_common``
  A common component of any virtio device.
``struct virtio_ops``
  Virtio-specific operation functions for this type of virtio device.
``struct pci_vdev``
  Instance of a virtual PCIe device; any virtio device is a virtual PCIe device.
``struct pci_vdev_ops``
  PCIe device operation functions for this type of device.
``struct vqueue_info``
  Instance of a virtqueue.

.. figure:: images/virtio-hld-image5.png
   :width: 900px
   :align: center
   :name: VBS-U-data

   VBS-U Key Data Structures

Each virtio device is a PCIe device. In addition, each virtio device could have none or multiple virtqueues, depending on the device type. The ``struct virtio_common`` is a key data structure to be manipulated by the DM, and the DM finds other key data structures through it. The ``struct virtio_ops`` abstracts a series of virtio callbacks to be provided by the device owner.

VBS-K Key Data Structures
=========================

The key data structures for VBS-K are listed as follows, and their relationships are shown in :numref:`VBS-K-data`.

``struct vbs_k_rng``
  In-kernel VBS-K component handling the data plane of a VBS-U virtio device, for example a virtio random number generator.
``struct vbs_k_dev``
  In-kernel VBS-K component common to all VBS-K modules.
``struct vbs_k_vq``
  In-kernel VBS-K component working with the kernel vring service API helpers.
``struct vbs_k_dev_info``
  Virtio device information to be synchronized from VBS-U to the VBS-K kernel module.
``struct vbs_k_vq_info``
  Information on a single virtqueue to be synchronized from VBS-U to the VBS-K kernel module.
``struct vbs_k_vqs_info``
  Information on the virtqueue(s) of a virtio device to be synchronized from VBS-U to the VBS-K kernel module.

.. figure:: images/virtio-hld-image8.png
   :width: 900px
   :align: center
   :name: VBS-K-data

   VBS-K Key Data Structures

In VBS-K, the struct vbs_k_xxx represents the in-kernel component handling a virtio device's data plane. It presents a char device for VBS-U to open, and registers the device status after feature negotiation with the FE driver.

The device status includes the negotiated features, the number of virtqueues, interrupt information, and more. All this status is synchronized from VBS-U to VBS-K. In VBS-U, the ``struct vbs_k_dev_info`` and ``struct vbs_k_vqs_info`` collect all the information and notify VBS-K through ioctls. In VBS-K, the ``struct vbs_k_dev`` and ``struct vbs_k_vq``, which are common to all VBS-K modules, are the counterparts that preserve the related information. The related information is necessary to the kernel-land vring service API helpers.

VHOST Key Data Structures
=========================

The key data structures for vhost are listed as follows.

.. doxygenstruct:: vhost_dev
   :project: Project ACRN

.. doxygenstruct:: vhost_vq
   :project: Project ACRN

DM APIs
=======

The DM APIs are exported by the DM, and they should be used when realizing BE device drivers on ACRN.
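
As a quick illustration of how the PCIe emulation helpers listed below are typically used, here is a hedged sketch of a BE device writing the virtio vendor/device IDs into its emulated configuration space and later raising an MSI. The declarations, register-offset macros, and exact signatures are assumptions for this sketch; the authoritative definitions are in the ACRN DM headers.

.. code-block:: c

   #include <stdint.h>

   /* Assumed declarations mirroring the DM helpers documented below;
    * see the ACRN DM headers for the authoritative signatures. */
   struct pci_vdev;
   void pci_set_cfgdata16(struct pci_vdev *dev, int offset, uint16_t val);
   void pci_generate_msi(struct pci_vdev *dev, int index);

   /* Assumed standard PCI configuration-space offsets. */
   #define PCIR_VENDOR   0x00
   #define PCIR_DEVICE   0x02

   #define VIRTIO_VENDOR_ID      0x1AF4   /* PCI_VENDOR_ID_REDHAT_QUMRANET */
   #define VIRTIO_DEV_ID_LEGACY  0x1000   /* first legacy/transitional ID  */

   /* Fill in the identification registers of an emulated virtio device. */
   static void my_virtio_dev_init_cfg(struct pci_vdev *dev)
   {
       pci_set_cfgdata16(dev, PCIR_VENDOR, VIRTIO_VENDOR_ID);
       pci_set_cfgdata16(dev, PCIR_DEVICE, VIRTIO_DEV_ID_LEGACY);
   }

   /* Notify the FE driver, e.g. after new data was placed in a virtqueue. */
   static void my_virtio_dev_notify(struct pci_vdev *dev, int msi_index)
   {
       pci_generate_msi(dev, msi_index);
   }
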
|
||||
|
||||
.. doxygenfunction:: paddr_guest2host
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_set_cfgdata8
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_set_cfgdata16
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_set_cfgdata32
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_get_cfgdata8
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_get_cfgdata16
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_get_cfgdata32
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_lintr_assert
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_lintr_deassert
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_generate_msi
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pci_generate_msix
|
||||
:project: Project ACRN
|
||||
|
||||

VBS APIs
========

The VBS APIs are exported by VBS related modules, including VBS, DM, and SOS kernel modules. They can be classified into VBS-U and VBS-K APIs, listed as follows.

VBS-U APIs
----------

These APIs provided by VBS-U are callbacks to be registered to DM, and the virtio framework within DM will invoke them appropriately.

.. doxygenstruct:: virtio_ops
   :project: Project ACRN

.. doxygenfunction:: virtio_pci_read
   :project: Project ACRN

.. doxygenfunction:: virtio_pci_write
   :project: Project ACRN

.. doxygenfunction:: virtio_interrupt_init
   :project: Project ACRN

.. doxygenfunction:: virtio_linkup
   :project: Project ACRN

.. doxygenfunction:: virtio_reset_dev
   :project: Project ACRN

.. doxygenfunction:: virtio_set_io_bar
   :project: Project ACRN

.. doxygenfunction:: virtio_set_modern_bar
   :project: Project ACRN

.. doxygenfunction:: virtio_config_changed
   :project: Project ACRN

VBS-K APIs
----------

The VBS-K APIs are exported by VBS-K related modules. Users could use the following APIs to implement their VBS-K modules.

APIs provided by DM
~~~~~~~~~~~~~~~~~~~

.. doxygenfunction:: vbs_kernel_reset
   :project: Project ACRN

.. doxygenfunction:: vbs_kernel_start
   :project: Project ACRN

.. doxygenfunction:: vbs_kernel_stop
   :project: Project ACRN

APIs provided by VBS-K modules in service OS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/vbs/vbs.h
   :functions: virtio_dev_init
               virtio_dev_ioctl
               virtio_vqs_ioctl
               virtio_dev_register
               virtio_dev_deregister
               virtio_vqs_index_get
               virtio_dev_reset

VHOST APIs
==========

APIs provided by DM
-------------------

.. doxygenfunction:: vhost_dev_init
   :project: Project ACRN

.. doxygenfunction:: vhost_dev_deinit
   :project: Project ACRN

.. doxygenfunction:: vhost_dev_start
   :project: Project ACRN

.. doxygenfunction:: vhost_dev_stop
   :project: Project ACRN

Linux vhost IOCTLs
------------------

``#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)``
   This IOCTL is used to get the feature flags supported by the vhost kernel driver.

``#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)``
   This IOCTL is used to set the feature flags to be used by the vhost kernel driver.

``#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)``
   This IOCTL is used to set the current process as the exclusive owner of the vhost char device. It must be called before any other vhost command.

``#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)``
   This IOCTL is used to give up the ownership of the vhost char device.

``#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)``
   This IOCTL is used to convey the guest OS memory layout to the vhost kernel driver.

``#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)``
   This IOCTL is used to set the number of descriptors in the virtio ring. It cannot be modified while the virtio ring is running.

``#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)``
   This IOCTL is used to set the address of the virtio ring.

``#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)``
   This IOCTL is used to set the base value where the virtqueue looks for available descriptors.

``#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)``
   This IOCTL is used to get the base value where the virtqueue looks for available descriptors.

``#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)``
   This IOCTL is used to set the eventfd on which vhost can poll for guest virtqueue kicks.

``#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)``
   This IOCTL is used to set the eventfd that vhost uses to inject a virtual interrupt.
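
As a concrete illustration, the sketch below shows the typical order in which a user-space backend drives these IOCTLs when handing one virtqueue's data plane to a vhost kernel driver. It is a minimal sketch: the device node name ``/dev/vhost-xxx`` is a placeholder, a single queue (index 0) is assumed, and error handling is omitted.

.. code-block:: c

   /* Minimal sketch of the canonical vhost setup sequence using the IOCTLs above. */
   #include <fcntl.h>
   #include <stdint.h>
   #include <sys/eventfd.h>
   #include <sys/ioctl.h>
   #include <linux/vhost.h>

   static int vhost_setup(struct vhost_memory *mem, struct vhost_vring_addr *addr,
                          uint16_t qsize)
   {
           int fd = open("/dev/vhost-xxx", O_RDWR);     /* placeholder device node */
           struct vhost_vring_state state = { .index = 0 };
           struct vhost_vring_file kick = { .index = 0, .fd = eventfd(0, 0) };
           struct vhost_vring_file call = { .index = 0, .fd = eventfd(0, 0) };
           uint64_t features;

           ioctl(fd, VHOST_SET_OWNER);                  /* claim the char device      */
           ioctl(fd, VHOST_GET_FEATURES, &features);    /* query supported features   */
           ioctl(fd, VHOST_SET_FEATURES, &features);    /* set the negotiated subset  */
           ioctl(fd, VHOST_SET_MEM_TABLE, mem);         /* guest memory layout        */

           state.num = qsize;
           ioctl(fd, VHOST_SET_VRING_NUM, &state);      /* ring size                  */
           state.num = 0;
           ioctl(fd, VHOST_SET_VRING_BASE, &state);     /* start from descriptor 0    */
           ioctl(fd, VHOST_SET_VRING_ADDR, addr);       /* ring addresses             */
           ioctl(fd, VHOST_SET_VRING_KICK, &kick);      /* FE notifications to vhost  */
           ioctl(fd, VHOST_SET_VRING_CALL, &call);      /* vhost interrupts to the VM */
           return fd;
   }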

VHM eventfd IOCTLs
------------------

.. doxygenstruct:: acrn_ioeventfd
   :project: Project ACRN

``#define IC_EVENT_IOEVENTFD _IC_ID(IC_ID, IC_ID_EVENT_BASE + 0x00)``
   This IOCTL is used to register/unregister an ioeventfd with the appropriate address, length, and data value.

.. doxygenstruct:: acrn_irqfd
   :project: Project ACRN

``#define IC_EVENT_IRQFD _IC_ID(IC_ID, IC_ID_EVENT_BASE + 0x01)``
   This IOCTL is used to register/unregister an irqfd with the appropriate MSI information.

VQ APIs
=======

The virtqueue APIs, or VQ APIs, are used by a BE device driver to access the virtqueues shared by the FE driver. The VQ APIs abstract the details of virtqueues so that users don't need to worry about the data structures within the virtqueues. In addition, the VQ APIs are designed to be identical between VBS-U and VBS-K, so that users don't need to learn different APIs when implementing BE drivers based on VBS-U and VBS-K.

.. doxygenfunction:: vq_interrupt
   :project: Project ACRN

.. doxygenfunction:: vq_getchain
   :project: Project ACRN

.. doxygenfunction:: vq_retchain
   :project: Project ACRN

.. doxygenfunction:: vq_relchain
   :project: Project ACRN

.. doxygenfunction:: vq_endchains
   :project: Project ACRN

Below is an example showing the typical logic of how a BE driver handles requests from a FE driver.

.. code-block:: c

   static void BE_callback(struct pci_virtio_xxx *pv, struct vqueue_info *vq)
   {
           struct iovec iov;
           uint16_t idx;
           uint32_t len;

           while (vq_has_descs(vq)) {
                   vq_getchain(vq, &idx, &iov, 1, NULL);
                   /* handle requests in iov */
                   len = request_handle_proc();
                   /* Release this chain and handle more */
                   vq_relchain(vq, idx, len);
           }
           /* Generate interrupt if appropriate. 1 means ring empty */
           vq_endchains(vq, 1);
   }

Supported Virtio Devices
************************

All the BE virtio drivers are implemented using the ACRN virtio APIs, and the FE drivers reuse the standard Linux FE virtio drivers. Devices with FE drivers available in the Linux kernel should use the standard virtio Vendor ID/Device ID and Subsystem Vendor ID/Subsystem Device ID. For other devices within ACRN, their temporary IDs are listed in the following table.

.. table:: Virtio Devices without existing FE drivers in Linux
   :align: center
   :name: virtio-device-table

   +--------------+-------------+-------------+-------------+-------------+
   | virtio       | Vendor ID   | Device ID   | Subvendor   | Subdevice   |
   | device       |             |             | ID          | ID          |
   +--------------+-------------+-------------+-------------+-------------+
   | RPMB         | 0x8086      | 0x8601      | 0x8086      | 0xFFFF      |
   +--------------+-------------+-------------+-------------+-------------+
   | HECI         | 0x8086      | 0x8602      | 0x8086      | 0xFFFE      |
   +--------------+-------------+-------------+-------------+-------------+
   | audio        | 0x8086      | 0x8603      | 0x8086      | 0xFFFD      |
   +--------------+-------------+-------------+-------------+-------------+
   | IPU          | 0x8086      | 0x8604      | 0x8086      | 0xFFFC      |
   +--------------+-------------+-------------+-------------+-------------+
   | TSN/AVB      | 0x8086      | 0x8605      | 0x8086      | 0xFFFB      |
   +--------------+-------------+-------------+-------------+-------------+
   | hyper_dmabuf | 0x8086      | 0x8606      | 0x8086      | 0xFFFA      |
   +--------------+-------------+-------------+-------------+-------------+
   | HDCP         | 0x8086      | 0x8607      | 0x8086      | 0xFFF9      |
   +--------------+-------------+-------------+-------------+-------------+
   | COREU        | 0x8086      | 0x8608      | 0x8086      | 0xFFF8      |
   +--------------+-------------+-------------+-------------+-------------+

The following sections introduce the status of virtio devices currently supported in ACRN.

.. toctree::
   :maxdepth: 1

   virtio-blk
   virtio-net
   virtio-input
   virtio-console
   virtio-rnd

161
doc/developer-guides/hld/hld-vm-management.rst
Normal file
@@ -0,0 +1,161 @@

.. _hld-vm-management:

VM Management high-level design
###############################

Managing a Virtual Machine (VM) means switching the VM to the right state, according to the requirements of applications or system power operations.

VM state
********

Generally, a VM is not running at the beginning: it is in a 'stopped' state. After its UOS is launched successfully, the VM enters a 'running' state. When the UOS powers off, the VM returns to a 'stopped' state again. A UOS can sleep while it is running, so there is also a 'paused' state.

Because VMs are designed to work under an SOS environment, a VM can only run and change its state when the SOS is running. A VM must be put into the 'paused' or 'stopped' state before the SOS can sleep or power off. Otherwise the VM may be damaged and user data would be lost.
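
As an illustration only, the states described above can be summarized in a small enumeration; the type and enumerator names below are hypothetical and are not taken from the ACRN sources.

.. code-block:: c

   /* Illustrative only: the VM lifecycle states described above.
    * The names are hypothetical, not taken from the ACRN sources.
    */
   enum example_vm_state {
           EXAMPLE_VM_STOPPED,   /* initial state; also after UOS power-off     */
           EXAMPLE_VM_RUNNING,   /* after the UOS is launched successfully      */
           EXAMPLE_VM_PAUSED,    /* the running UOS has gone to sleep (e.g. S3) */
   };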

Scenarios of VM state change
****************************

Button-initiated System Power On
================================

When the user presses the power button to power on the system, everything is started from the beginning. VMs that run user applications are launched automatically after the SOS is ready.

Button-initiated VM Power on
============================

At SOS boot up, SOS-Life-Cycle-Service and Acrnd are automatically started as system services. SOS-Life-Cycle-Service notifies Acrnd that SOS has started, then Acrnd starts an Acrn-DM for launching each UOS, whose state changes from 'stopped' to 'running'.

Button-initiated VM Power off
=============================

When SOS is about to shut down, IOC powers off all VMs. SOS-Life-Cycle-Service delays the SOS shutdown operation using its heartbeat and waits for Acrnd to notify it that it can shut down.

Acrnd keeps querying the states of all VMs. When all of them are 'stopped', it notifies SOS-Life-Cycle-Service. SOS-Life-Cycle-Service then stops sending the delay-shutdown heartbeat, allowing SOS to continue the shutdown process.

RTC S3/S5 entry
===============

UOS asks Acrnd to resume/restart itself later by sending an RTC timer request, and then suspends/powers-off. SOS suspends/powers-off before that RTC timer expires. Acrnd stores the RTC resume/restart time in a file and sends the RTC timer request to SOS-Life-Cycle-Service. SOS-Life-Cycle-Service sets the RTC timer in the IOC. Finally, the SOS is suspended/powered-off.

RTC S3/S5 exiting
=================

SOS is resumed/started by the IOC RTC timer. SOS-Life-Cycle-Service notifies Acrnd that SOS has become alive again. Acrnd checks that the wakeup reason is that SOS was resumed/started by the IOC RTC. It then reads the UOS resume/restart time from the file, and resumes/restarts the UOS when that time expires.

VM State management
*******************

Overview of VM State Management
===============================

Management of VMs on SOS uses the SOS-Life-Cycle-Service, Acrnd, and Acrn-dm, working together and using the Acrn-Manager API as the IPC interface.

* The Lifecycle-Service gets the Wakeup-Reason from the IOC controller. It can set a different power cycle method, and the RTC timer, by sending a heartbeat to the IOC with proper data.

* Acrnd gets the Wakeup Reason from the Lifecycle-Service and forwards it to Acrn-dm. It coordinates the lifecycle of VMs and SOS and handles IOC-timed wakeup/poweron.

* Acrn-Dm is the device model of a VM running on SOS. The virtual IOC inside Acrn-DM is responsible for controlling the VM power state, usually triggered by Acrnd.

SOS Life Cycle Service
======================

SOS-Life-Cycle-Service (SOS-LCS) is a daemon service running on SOS.

SOS-LCS listens on the ``/dev/cbc-lifecycle`` tty port to receive "wakeup reason" information from the IOC controller. SOS-LCS keeps reading the system status from the IOC to discover which power cycle method the IOC is performing. SOS-LCS should reply with a heartbeat to the IOC. This heartbeat can tell the IOC to keep using this power cycle method, or to change to another power cycle method. The SOS-LCS heartbeat can also set an RTC timer in the IOC.

SOS-LCS handles SHUTDOWN, SUSPEND, and REBOOT acrn-manager message requests from Acrnd. When these messages are received, SOS-LCS switches the IOC power cycle method to shutdown, suspend, and reboot, respectively.

SOS-LCS handles WAKEUP_REASON acrn-manager message requests from Acrnd. When it receives this message, SOS-LCS sends the "wakeup reason" to Acrnd.

SOS-LCS handles RTC_TIMER acrn-manager message requests from Acrnd. When it receives this message, SOS-LCS sets up the IOC RTC timer for Acrnd.

SOS-LCS notifies Acrnd the moment the system becomes alive again from another status.

Acrnd
=====

Acrnd is a daemon service running on SOS.

Acrnd can start/resume VMs and query VM states for SOS-LCS, helping SOS-LCS decide which power cycle method is right. It also helps UOS be started/resumed by a timer, as required by the S3/S5 feature.

Acrnd forwards the wakeup reason to acrn-dm. Acrnd is responsible for retrieving the wakeup reason from the SOS-LCS service and attaching it to the acrn-dm parameter for ioc-dm.

When SOS is about to suspend/shutdown, the SOS lifecycle service sends a request to Acrnd to guarantee all guest VMs are suspended or shut down before the SOS suspend/shutdown process continues. On receiving the request, Acrnd starts polling the guest VM states, and notifies the SOS lifecycle service when all guest VMs have been put into the proper state gracefully.

A guest UOS may need to resume/start at a future time for some tasks. To set up a timed resume/start, ioc-dm sends a request to acrnd, which maintains a list of timed requests from guest VMs. Acrnd selects the nearest request and sends it to the SOS lifecycle service, which will set up the physical IOC.

Acrn-DM
=======

Acrn-Dm is the device model of a VM running on SOS. Dm-IOC inside Acrn-DM operates the virtual IOC to control the VM power state, and collects VM power state information. The Acrn-DM Monitor abstracts these virtual IOC functions into monitor-vm-ops, and allows Acrnd to use them via the Acrn-Manager IPC helper functions.

Acrn-manager IPC helper
=======================

SOS-LCS, Acrnd, and Acrn-DM use sockets to do IPC. The Acrn-Manager IPC helper API makes the socket transparent for them. These are:

- ``int mngr_open_un()`` - create a descriptor for VM management IPC
- ``void mngr_close()`` - close the descriptor and release its resources
- ``int mngr_add_handler()`` - add a handler for a specified message
- ``int mngr_send_msg()`` - send a message and wait for acknowledgement

4
doc/developer-guides/hld/hld-vsbl.rst
Normal file
@@ -0,0 +1,4 @@

.. _hld-vsbl:

Virtual Slim-Bootloader high-level design
#########################################

51
doc/developer-guides/hld/hv-config.rst
Normal file
@@ -0,0 +1,51 @@

.. _hv-config:

Compile-time Configuration
##########################

The hypervisor provides a kconfig-like way of manipulating compile-time configurations. Basically, the hypervisor defines a set of configuration symbols and declares their default values. A configuration file containing the values of each symbol is created before building the sources.

Similar to Linux kconfig, there are three files involved:

- **.config** This file stores the values of all configuration symbols.

- **config.mk** This file is a conversion of .config in Makefile syntax, and can be included in makefiles so that the build process can rely on the configurations.

- **config.h** This file is a conversion of .config in C syntax, and is automatically included in every source file so that the values of the configuration symbols are available in the sources.
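
For illustration, here is how a hypothetical symbol might appear across the three files; the symbol name and value are made up for this example and are not taken from the actual configuration.

.. code-block:: c

   /* Illustration with a made-up symbol CONFIG_EXAMPLE_MAX_VCPU.
    *
    *   .config   :  CONFIG_EXAMPLE_MAX_VCPU=8
    *   config.mk :  CONFIG_EXAMPLE_MAX_VCPU := 8   (usable in make conditionals)
    *   config.h  :  the generated C form below, visible in every source file
    */
   #define CONFIG_EXAMPLE_MAX_VCPU 8U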

.. figure:: images/config-image103.png
   :align: center
   :name: config-build-workflow

   Hypervisor configuration and build workflow

:numref:`config-build-workflow` shows the workflow of building the hypervisor:

1. Three targets are introduced for manipulating the configurations.

   a. **defconfig** creates a .config based on a predefined configuration file.

   b. **oldconfig** updates an existing .config, creating one if it does not exist.

   c. **menuconfig** presents a terminal UI to navigate and modify the configurations in an interactive manner.

2. The target oldconfig is also used to create a .config if a .config file does not exist when building the source directly.

3. The other two files for makefiles and C sources are regenerated after .config changes.

Refer to :ref:`configuration` for a complete list of configuration symbols.

101
doc/developer-guides/hld/hv-console.rst
Normal file
@@ -0,0 +1,101 @@

.. _hv-console-shell-uart:

Hypervisor console, hypervisor shell, and virtual UART
######################################################

.. _hv-console:

Hypervisor console
******************

The hypervisor console is a text-based terminal accessible from the UART. :numref:`console-processing` shows the workflow of the console:

.. figure:: images/console-image93.png
   :align: center
   :name: console-processing

   Periodic console processing

A periodic timer is set on initialization to trigger console processing every 40ms. Processing behavior depends on whether the vUART is active:

- If it is not active, the hypervisor shell is kicked to handle inputs from the physical UART, if there are any.

- If the vUART is active, the bytes from the physical UART are redirected to the RX fifo of the vUART, and those in the vUART TX fifo to the physical UART.

.. note:: The console is only available in the debug version of the hypervisor, configured at compile time. In the release version, the console is disabled and the physical UART is not used by the hypervisor or SOS.

Hypervisor shell
****************

For debugging, the hypervisor shell provides commands to list some internal states and statistics of the hypervisor. It is accessible on the physical UART only when the vUART is deactivated. See :ref:`acrnshell` for the list of available hypervisor shell commands.

Virtual UART
************

Currently the UART 16550 is owned by the hypervisor itself and used for debugging purposes. Its properties are configured by the hypervisor command line. The hypervisor emulates a UART device at address 0x3F8 to SOS; it acts as the console of SOS with these features:

- The vUART is exposed via I/O port 0x3f8.
- It incorporates a 256-byte RX buffer and a 65536-byte TX buffer.
- Full emulation of input/output bytes and related interrupts.
- For other read-write registers the value is stored without effect and reads get the latest stored value. For read-only registers writes are ignored.
- vUART activation via a shell command and deactivation via a hotkey.

The following diagram shows the activation state transition of the vUART.

.. figure:: images/console-image41.png
   :align: center

   vUART activation state transition

Specifically:

- After initialization the vUART is disabled.
- The vUART is activated after the command "vm_console" is executed on the hypervisor shell. Inputs to the physical UART will be redirected to the vUART starting from the next timer event.

- The vUART is deactivated after a :kbd:`Ctrl + Space` hotkey is received from the physical UART. Inputs to the physical UART will be handled by the hypervisor shell starting from the next timer event.

The workflows are described as follows:

- RX flow:

  - Characters are read from the UART HW into an sbuf whose size is 2048 bytes, triggered by console_read

  - Characters are read from this sbuf and put into rxFIFO, triggered by vuart_console_rx_chars

  - A virtual interrupt is sent to SOS, triggered by a read from SOS. Characters in rxFIFO are sent to SOS by emulation of reads of register UART16550_RBR

- TX flow:

  - Characters are put into txFIFO by emulation of writes of register UART16550_THR

  - Characters in txFIFO are read out one by one and sent to the console by printf, triggered by vuart_console_tx_chars

  - The implementation of printf is based on the console, which finally sends characters to the UART HW by writing to register UART16550_RBR

1238
doc/developer-guides/hld/hv-cpu-virt.rst
Normal file
261
doc/developer-guides/hld/hv-dev-passthrough.rst
Normal file
@@ -0,0 +1,261 @@

.. _hv-device-passthrough:

Device Passthrough
##################

A critical part of virtualization is virtualizing devices: exposing all aspects of a device including its I/O, interrupts, DMA, and configuration. There are three typical device virtualization methods: emulation, para-virtualization, and passthrough. Both emulation and passthrough are used in the ACRN project. Device emulation is discussed in :ref:`hld-io-emulation`; device passthrough is discussed here.

In the ACRN project, device emulation means emulating all existing hardware resources through a software component, the device model running in the Service OS (SOS). Device emulation must maintain the same SW interface as a native device, providing transparency to the VM software stack. Passthrough, implemented in the hypervisor, assigns a physical device to a VM so the VM can access the hardware device directly with minimal (if any) VMM involvement.

The difference between device emulation and passthrough is shown in :numref:`emu-passthru-diff`. Notice that device emulation has a longer access path, which causes worse performance compared with passthrough. Passthrough can deliver near-native performance, but can't support device sharing.

.. figure:: images/passthru-image30.png
   :align: center
   :name: emu-passthru-diff

   Difference between Emulation and passthrough

Passthrough in the hypervisor provides the following functionalities to allow a VM to access PCI devices directly:

- DMA remapping by VT-d for PCI devices: the hypervisor sets up DMA remapping during the VM initialization phase.
- MMIO remapping between virtual and physical BAR
- Device configuration emulation
- Remapping interrupts for PCI devices
- ACPI configuration virtualization
- GSI sharing violation check

The following diagram details the passthrough initialization control flow in ACRN:

.. figure:: images/passthru-image22.png
   :align: center

   Passthrough devices initialization control flow

Passthrough Device status
*************************

Most common devices on supported platforms are enabled for passthrough, as detailed here:

.. figure:: images/passthru-image77.png
   :align: center

   Passthrough Device Status

DMA Remapping
*************

To enable passthrough, DMA from a VM can only use GPA while physical DMA requires HPA. One work-around is building an identity mapping so that GPA is equal to HPA, but this is not recommended as some VMs don't support relocation well. To address this issue, Intel introduced VT-d in the chipset, which adds a remapping engine to translate GPA to HPA for DMA operations.

Each VT-d engine (DMAR unit) maintains a remapping structure similar to a page table, with the device BDF (Bus/Dev/Func) as input and the final page table for GPA/HPA translation as output. The GPA/HPA translation page table is similar to a normal multi-level page table.

VM DMA depends on Intel VT-d to do the translation from GPA to HPA, so we need to enable the VT-d IOMMU engine in ACRN before we can pass through any device. SOS in ACRN is a VM running in non-root mode which also depends on VT-d to access a device. In the SOS DMA remapping engine settings, GPA is equal to HPA.

The ACRN hypervisor checks the DMA-Remapping Hardware unit Definition (DRHD) in the host DMAR ACPI table to get basic info, then sets up each DMAR unit. For simplicity, ACRN reuses the EPT table as the translation table in the DMAR unit for each passthrough device. The control flow is shown in the following figures:

.. figure:: images/passthru-image72.png
   :align: center

   DMA Remapping control flow during HV init

.. figure:: images/passthru-image86.png
   :align: center

   ptdev assignment control flow

.. figure:: images/passthru-image42.png
   :align: center

   ptdev de-assignment control flow

MMIO Remapping
**************

For a PCI MMIO BAR, the hypervisor builds an EPT mapping between the virtual BAR and the physical BAR, so the VM can access the MMIO directly.

Device configuration emulation
******************************

PCI configuration is based on accesses to ports 0xCF8/0xCFC. ACRN implements PCI configuration emulation to handle 0xCF8/0xCFC and control PCI devices through two paths: implemented in the hypervisor or in the SOS device model.

- When configuration emulation is in the hypervisor, the interception of the 0xCF8/0xCFC ports and the emulation of PCI configuration space accesses are tricky and unclean. Therefore the final solution is to reuse the PCI emulation infrastructure of the SOS device model. The hypervisor routes the UOS 0xCF8/0xCFC accesses to the device model, and remains blind to the physical PCI devices. Upon receiving a UOS PCI configuration space access request, the device model emulates some critical space, for instance, BAR, MSI capability, and INTLINE/INTPIN.

- For other accesses, the device model reads/writes the physical configuration space on behalf of the UOS. To do this, the device model is linked with lib pci access to access the physical PCI device.

Interrupt Remapping
*******************

When the physical interrupt of a passthrough device happens, the hypervisor has to distribute it to the relevant VM according to the interrupt remapping relationships. The structure ``ptirq_remapping_info`` is used to define the subordination relation between the physical interrupt and the VM, the virtual destination, etc. See the following figure for details:

.. figure:: images/passthru-image91.png
   :align: center

   Remapping of physical interrupts

There are two different types of interrupt source: IOAPIC and MSI. The hypervisor records different information for interrupt distribution: the physical and virtual IOAPIC pin for an IOAPIC source, and the physical and virtual BDF and other info for an MSI source.

SOS passthrough is also in the scope of interrupt remapping, which is done on demand rather than on hypervisor initialization.

.. figure:: images/passthru-image102.png
   :align: center
   :name: init-remapping

   Initialization of remapping of virtual IOAPIC interrupts for SOS

:numref:`init-remapping` above illustrates how (virtual) IOAPIC interrupts are remapped for SOS. A VM exit occurs whenever SOS tries to unmask an interrupt in the (virtual) IOAPIC by writing to the Redirection Table Entry (RTE). The hypervisor then invokes the IOAPIC emulation handler (refer to :ref:`hld-io-emulation` for details on I/O emulation) which calls APIs to set up a remapping for the to-be-unmasked interrupt.

Remapping of (virtual) PIC interrupts is set up in a similar sequence:

.. figure:: images/passthru-image98.png
   :align: center

   Initialization of remapping of virtual MSI for SOS

This figure illustrates how mappings of MSI or MSI-X are set up for SOS. SOS is responsible for issuing a hypercall to notify the hypervisor before it configures the PCI configuration space to enable an MSI. The hypervisor takes this opportunity to set up a remapping for the given MSI or MSI-X before it is actually enabled by SOS.

When the UOS needs to access a physical device by passthrough, it uses the following steps:

- UOS gets a virtual interrupt
- A VM exit happens and the trapped vCPU is the target where the interrupt will be injected.
- The hypervisor handles the interrupt and translates the vector according to ptirq_remapping_info.
- The hypervisor delivers the interrupt to UOS.

When the SOS needs to use a physical device, passthrough is also active because the SOS is the first VM. The detailed steps are:

- SOS gets all physical interrupts. It assigns different interrupts to different VMs during initialization and reassigns them when a VM is created or deleted.
- When a physical interrupt is trapped, an exception will happen after VMCS has been set.
- The hypervisor handles the VM exit according to ptirq_remapping_info and translates the vector.
- The interrupt is injected the same as a virtual interrupt.
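
To make the recorded relationship more concrete, the sketch below shows the kind of information such a remapping entry carries. The field names and layout are illustrative only and do not reflect the actual ``ptirq_remapping_info`` definition in the ACRN sources.

.. code-block:: c

   /* Illustrative only: the kind of data a remapping entry carries.
    * This is NOT the actual ptirq_remapping_info layout in ACRN.
    */
   #include <stdint.h>

   struct example_ptirq_entry {
           uint32_t intr_type;       /* IOAPIC pin based or MSI based            */
           uint16_t vm_id;           /* VM this physical interrupt belongs to    */

           /* IOAPIC source: physical pin and the virtual pin seen by the guest */
           uint32_t phys_pin;
           uint32_t virt_pin;

           /* MSI source: physical and virtual BDF plus vector information      */
           uint16_t phys_bdf;
           uint16_t virt_bdf;
           uint32_t phys_vector;
           uint32_t virt_vector;
   };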

ACPI Virtualization
*******************

ACPI virtualization is designed in ACRN with these assumptions:

- HV has no knowledge of ACPI,
- SOS owns all physical ACPI resources,
- UOS sees virtual ACPI resources emulated by the device model.

Some passthrough devices require a physical ACPI table entry for initialization. The device model creates such a device entry based on the physical one according to vendor ID and device ID. This virtualization is implemented in the SOS device model and is not in the scope of the hypervisor.

GSI Sharing Violation Check
***************************

All the PCI devices that share the same GSI should be assigned to the same VM to avoid physical GSI sharing between multiple VMs. For devices that don't support MSI, ACRN DM puts the devices sharing the same GSI pin into a GSI sharing group. The devices in the same group should be assigned together to the current VM; otherwise, none of them should be assigned to the current VM. A device that violates this rule will be rejected for passthrough. The checking logic is implemented in the Device Model and is not in the scope of the hypervisor.

Data structures and interfaces
******************************

The following APIs are provided to initialize interrupt remapping for SOS:

.. doxygenfunction:: ptirq_intx_pin_remap
   :project: Project ACRN

.. doxygenfunction:: ptirq_msix_remap
   :project: Project ACRN

The following APIs are provided to manipulate the interrupt remapping for UOS.

.. doxygenfunction:: ptirq_add_intx_remapping
   :project: Project ACRN

.. doxygenfunction:: ptirq_remove_intx_remapping
   :project: Project ACRN

.. doxygenfunction:: ptirq_add_msix_remapping
   :project: Project ACRN

.. doxygenfunction:: ptirq_remove_msix_remapping
   :project: Project ACRN

The following APIs are provided to acknowledge a virtual interrupt.

.. doxygenfunction:: ptirq_intx_ack
   :project: Project ACRN

21
doc/developer-guides/hld/hv-hypercall.rst
Normal file
@@ -0,0 +1,21 @@

.. _hv-hypercall:

Hypercall / VHM upcall
######################

HV currently supports hypercall APIs for VM management, I/O request distribution, and guest memory mapping.

HV and Service OS (SOS) also use vector 0xF7, reserved as the x86 platform IPI vector for HV notification to SOS. This upcall is necessary whenever there is a device emulation requirement to SOS. The upcall vector 0xF7 is injected to SOS vCPU0.

SOS registers the irq handler for vector 0xF7 and notifies the I/O emulation module in SOS once the irq is triggered.

.. note:: Add API doc references for General interface, VM management interface, IRQ and Interrupts, Device Model IO request distribution, Guest memory management, PCI assignment and IOMMU, Debug, Trusty, Power management

423
doc/developer-guides/hld/hv-interrupt.rst
Normal file
@@ -0,0 +1,423 @@

.. _interrupt-hld:

Physical Interrupt high-level design
####################################

Overview
********

The ACRN hypervisor implements a simple but fully functional framework to manage interrupts and exceptions, as shown in :numref:`interrupt-modules-overview`. In its native layer, it configures the physical PIC, IOAPIC, and LAPIC to support different interrupt sources, from the local timer/IPI to external INTx/MSI. In its virtual guest layer, it emulates a virtual PIC, virtual IOAPIC, and virtual LAPIC, and provides full APIs allowing virtual interrupt injection from emulated or pass-thru devices.

.. figure:: images/interrupt-image3.png
   :align: center
   :width: 600px
   :name: interrupt-modules-overview

   ACRN Interrupt Modules Overview

In the software modules view shown in :numref:`interrupt-sw-modules`, the ACRN hypervisor sets up the physical interrupt in its basic interrupt modules (e.g., IOAPIC/LAPIC/IDT). It dispatches the interrupt in the hypervisor interrupt flow control layer to the corresponding handlers; these could be a pre-defined IPI notification, a timer, or a runtime registered pass-thru device. The ACRN hypervisor then uses its VM interfaces, based on the vPIC, vIOAPIC, and vMSI modules, to inject the necessary virtual interrupt into the specific VM.

.. figure:: images/interrupt-image2.png
   :align: center
   :width: 600px
   :name: interrupt-sw-modules

   ACRN Interrupt SW Modules Overview

The hypervisor implements the following functionalities for handling physical interrupts:

- Configure interrupt-related hardware including IDT, PIC, LAPIC, and IOAPIC on startup.

- Provide APIs to manipulate the registers of LAPIC and IOAPIC.

- Acknowledge physical interrupts.

- Set up a callback mechanism for the other components in the hypervisor to request an interrupt vector and register a handler for that interrupt.

HV owns all native physical interrupts and manages 256 vectors per CPU. All physical interrupts are first handled in VMX root mode. The "external-interrupt exiting" bit in the VM-Execution controls field is set to support this. The ACRN hypervisor also initializes all the interrupt related modules like IDT, PIC, IOAPIC, and LAPIC.

HV does not own any host devices (except the UART). All devices are by default assigned to SOS. Any interrupts received by Guest VM (SOS or UOS) device drivers are virtual interrupts injected by HV (via vLAPIC). HV manages a Host-to-Guest mapping. When a native IRQ/interrupt occurs, HV decides whether this IRQ/interrupt should be forwarded to a VM and which VM to forward it to (if any). Refer to section 3.7.6 for virtual interrupt injection and section 3.9.6 for the management of interrupt remapping.

HV does not own any exceptions. Guest VMCS are configured so that no VM exit happens, with some exceptions such as #INT3 and #MC. This is to simplify the design, as HV does not support any exception handling itself. HV supports only static memory mapping, so there should be no #PF or #GP. If HV receives an exception indicating an error, an assert function is executed with an error message printed out, and the system then halts.

Native interrupts can be generated from one of the following sources:

- GSI interrupts

  - PIC or legacy device IRQs (0~15)
  - IOAPIC pins

- PCI MSI/MSI-X vectors
- Inter-CPU IPI
- LAPIC timer

Physical Interrupt Initialization
*********************************

After the ACRN hypervisor gets control from the bootloader, it initializes all physical interrupt-related modules for all the CPUs. The ACRN hypervisor creates a framework to manage the physical interrupt for hypervisor local devices, pass-thru devices, and IPI between CPUs, as shown in :numref:`hv-interrupt-init`:

.. figure:: images/interrupt-image66.png
   :align: center
   :name: hv-interrupt-init

   Physical Interrupt Initialization

IDT Initialization
==================

The ACRN hypervisor builds its native IDT (interrupt descriptor table) during interrupt initialization and sets up the following handlers:

- On an exception, the hypervisor dumps its context and halts the current physical processor (because physical exceptions are not expected).

- For external interrupts, HV may mask the interrupt (depending on the trigger mode), followed by interrupt acknowledgement and dispatch to the registered handler, if any.

Most interrupts and exceptions are handled without a stack switch, except for the machine-check, double fault, and stack fault exceptions, which have their own stacks set in the TSS.

PIC/IOAPIC Initialization
=========================

The ACRN hypervisor masks all interrupts from the PIC. All legacy interrupts from the PIC (<16) will be linked to the IOAPIC, as shown in the connections in :numref:`hv-pic-config`.

ACRN pre-allocates vectors and masks them for these legacy interrupts in the IOAPIC RTEs. For others (>= 16), ACRN masks them with vector 0 in the RTE, and the vector will be dynamically allocated on demand.

All external IOAPIC pins are categorized as GSI interrupts according to the ACPI definition. HV supports multiple IOAPIC components. IRQ-pin-to-GSI mappings are maintained internally to determine the GSI source IOAPIC. The native PIC is not used in the system.

.. figure:: images/interrupt-image46.png
   :align: center
   :name: hv-pic-config

   HV PIC/IOAPIC/LAPIC configuration

LAPIC Initialization
====================

Physical LAPICs are in xAPIC mode in the ACRN hypervisor. The hypervisor initializes the LAPIC for each physical CPU by masking all interrupts in the local vector table (LVT), clearing all ISRs, and enabling the LAPIC.

APIs are provided for the other components in the hypervisor to access the LAPIC, aiming at further usage such as the local timer (TSC deadline) program and the IPI notification program. See :ref:`hv_interrupt-data-api` for a complete list.

HV Interrupt Vectors and Delivery Mode
======================================

The interrupt vectors are assigned as shown here:

**Vector 0-0x1F**
   are exceptions that are not handled by HV. If such an exception does occur, the system halts.

**Vector: 0x20-0x2F**
   are allocated statically for legacy IRQ0-15.

**Vector: 0x30-0xDF**
   are dynamically allocated vectors for PCI device INTx or MSI/MSI-X usage. Depending on the interrupt delivery mode (FLAT or PER_CPU mode), an interrupt will be assigned to a vector for all the CPUs or for a particular CPU.

**Vector: 0xE0-0xFE**
   are high priority vectors reserved by HV for dedicated purposes. For example, 0xEF is used for the timer, and 0xF0 is used for IPI.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Vectors
     - Usage

   * - 0x0-0x13
     - Exceptions: NMI, INT3, page fault, GP, debug.

   * - 0x14-0x1F
     - Reserved

   * - 0x20-0x2F
     - Statically allocated for external IRQ (IRQ0-IRQ15)

   * - 0x30-0xDF
     - Dynamically allocated for IOAPIC IRQ from PCI INTx/MSI

   * - 0xE0-0xFE
     - Statically allocated for HV

   * - 0xEF
     - Timer

   * - 0xF0
     - IPI

   * - 0xFF
     - SPURIOUS_APIC_VECTOR

Interrupts from either the IOAPIC or MSI can be delivered to a target CPU. By default they are configured as Lowest Priority (FLAT mode), i.e. they are delivered to a CPU core that is currently idle or executing the lowest priority ISR. There is no guarantee a device's interrupt will be delivered to a specific Guest's CPU. Timer interrupts are an exception - these are always delivered to the CPU which programs the LAPIC timer.

There are two interrupt delivery modes: FLAT mode and PER_CPU mode. ACRN uses FLAT mode, where the interrupt/IRQ to vector mapping is the same on all CPUs. Every CPU receives the same interrupts. The IOAPIC and LAPIC MSI delivery modes are configured to Lowest Priority.

Vector allocation for CPUs is shown here:

.. figure:: images/interrupt-image89.png
   :align: center

   FLAT mode vector allocation

IRQ Descriptor Table
====================

The ACRN hypervisor maintains a global IRQ Descriptor Table shared among the physical CPUs. ACRN uses FLAT mode to manage the interrupts, so the same vector links to the same IRQ number for all CPUs.

.. note:: need to reference API doc for irq_desc

The *irq_desc[]* array's index represents the IRQ number. An *irq_handler* field can be set to the common edge/level/quick handler, which will be called from *interrupt_dispatch*. The *irq_desc* structure also contains the *dev_list* field to maintain this IRQ's action handler list.

A reverse mapping from vector to IRQ is used in addition to the IRQ descriptor table, which maintains the mapping from IRQ to vector.

On initialization, the descriptors of the legacy IRQs are initialized with proper vectors and the corresponding reverse mapping is set up. The descriptors of other IRQs are filled with an invalid vector, which will be updated on IRQ allocation.

For example, if the local timer registers an interrupt with IRQ number 271 and vector 0xEF, then this data will be set up:

.. code-block:: c

   irq_desc[271].irq = 271
   irq_desc[271].vector = 0xEF
   vector_to_irq[0xEF] = 271

External Interrupt Handling
***************************

The CPU runs under VMX non-root mode inside Guest VMs. ``MSR_IA32_VMX_PINBASED_CTLS.bit[0]`` and ``MSR_IA32_VMX_EXIT_CTLS.bit[15]`` are set to allow a vCPU VM exit to HV whenever there are interrupts to that physical CPU under non-root mode. HV ACKs the interrupts in VMX non-root mode and saves the interrupt vector to the relevant VM exit field for HV IRQ processing.

Note that, as discussed above, an external interrupt causing a vCPU VM exit to HV does not mean that the interrupt belongs to that Guest VM. When the CPU executes a VM exit into root mode, interrupt handling is enabled and the interrupt is delivered and processed as quickly as possible inside HV. HV may emulate a virtual interrupt and inject it to a Guest if necessary.

When a physical interrupt happens on a CPU, that CPU could be running under VMX root mode or non-root mode. If the CPU is running under VMX root mode, the interrupt is triggered from the standard native IRQ flow - interrupt gate to IRQ handler. If the CPU is running under VMX non-root mode, an external interrupt will trigger a VM exit for reason "external-interrupt".

Interrupt and IRQ processing flow diagrams are shown below:

.. figure:: images/interrupt-image48.png
   :align: center
   :name: phy-interrupt-processing

   Processing of physical interrupts

.. figure:: images/interrupt-image39.png
   :align: center

   IRQ processing control flow

When a physical interrupt is raised and delivered to a physical CPU, the CPU may be running under either VMX root mode or non-root mode.

- If the CPU is running under VMX root mode, the interrupt is handled following the standard native IRQ flow: interrupt gate to dispatch_interrupt(), IRQ handler, and finally the registered callback.
- If the CPU is running under VMX non-root mode, an external interrupt causes a VM exit for reason "external-interrupt", and then the VM exit processing flow calls dispatch_interrupt() to dispatch and handle the interrupt.

After an interrupt occurs from either path shown in :numref:`phy-interrupt-processing`, the ACRN hypervisor jumps to dispatch_interrupt. This function gets the vector of the generated interrupt from the context, gets the IRQ number from vector_to_irq[], and then gets the corresponding irq_desc.
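
The lookup just described can be pictured with the following minimal sketch; the structure layout and helper names are simplified stand-ins for the real hypervisor code and are shown only to illustrate the vector-to-IRQ indirection.

.. code-block:: c

   /* Simplified sketch of the dispatch path described above; the types and
    * globals here are illustrative stand-ins, not the ACRN definitions.
    */
   #include <stdint.h>
   #include <stddef.h>

   #define NR_IRQS    256U
   #define NR_VECTORS 256U

   struct example_irq_desc {
           uint32_t irq;
           uint32_t vector;
           void (*irq_handler)(uint32_t irq, void *data);
           void *data;
   };

   static struct example_irq_desc irq_desc[NR_IRQS];
   static uint32_t vector_to_irq[NR_VECTORS];

   /* Called from the interrupt gate (root mode) or from the
    * "external-interrupt" VM-exit handler (non-root mode).
    */
   static void dispatch_interrupt_sketch(uint32_t vector)
   {
           uint32_t irq = vector_to_irq[vector];          /* reverse mapping    */
           struct example_irq_desc *desc = &irq_desc[irq];

           if (desc->irq_handler != NULL)
                   desc->irq_handler(desc->irq, desc->data);  /* registered action */
   }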

Though there is only one generic IRQ handler for registered interrupts, there are three different handling flows according to the flags:

- ``!IRQF_LEVEL``
- ``IRQF_LEVEL && !IRQF_PT``

  To avoid continuous interrupt triggers, the handler masks the IOAPIC pin and unmasks it only after the IRQ action callback is executed.

- ``IRQF_LEVEL && IRQF_PT``

  For pass-thru devices, to avoid continuous interrupt triggers, the handler masks the IOAPIC pin and leaves it masked until the corresponding vIOAPIC pin gets an explicit EOI ACK from the guest.

Since interrupts are not shared between multiple devices, there is only one IRQ action registered for each interrupt.

The IRQ number inside HV is a software concept to identify GSIs and vectors. Each GSI is mapped to one IRQ. The GSI number is usually the same as the IRQ number. IRQ numbers greater than the max GSI (nr_gsi) number are dynamically assigned. For example, when HV allocates an interrupt vector to a PCI device, an IRQ number is then assigned to that vector. When the vector later reaches a CPU, the corresponding IRQ routine is located and executed.

See :numref:`request-irq` for the request IRQ control flow for different conditions:

.. figure:: images/interrupt-image76.png
   :align: center
   :name: request-irq

   Request IRQ for different conditions

.. _ipi-management:

IPI Management
**************

The only purpose of IPI use in HV is to kick a vCPU out of non-root mode and into HV mode. This requires I/O request and virtual interrupt injection to be distributed to different IPI vectors. The I/O request uses the IPI vector 0xF4 upcall (refer to Chapter 5.4). The virtual interrupt injection uses IPI vector 0xF0.

0xF4 upcall
   A Guest vCPU VM exits due to an EPT violation or an I/O instruction trap. It requires the Device Module to emulate the MMIO/PortIO instruction. However, it could be that the Service OS (SOS) vCPU0 is still in non-root mode. So an IPI (0xF4 upcall vector) should be sent to the physical CPU0 (running in non-root mode as vCPU0 inside SOS) to force vCPU0 to VM exit due to the external interrupt. The virtual upcall vector is then injected to SOS, and the vCPU0 inside SOS then picks up the I/O request and does the emulation for the other Guest.

0xF0 IPI flow
   If the Device Module inside SOS needs to inject an interrupt to another Guest's vCPU, such as vCPU1, it issues an IPI first to kick CPU1 (assuming CPU1 is running on vCPU1) to root mode. CPU1 will inject the interrupt before VM Enter.

.. _hv_interrupt-data-api:

Data structures and interfaces
******************************

IOAPIC
======

The following APIs are external interfaces for IOAPIC related operations.

.. doxygengroup:: ioapic_ext_apis
   :project: Project ACRN
   :content-only:

LAPIC
=====

The following APIs are external interfaces for LAPIC related operations.

.. doxygengroup:: lapic_ext_apis
   :project: Project ACRN
   :content-only:

IPI
===

The following APIs are external interfaces for IPI related operations.

.. doxygengroup:: ipi_ext_apis
   :project: Project ACRN
   :content-only:

Physical Interrupt
==================

The following APIs are external interfaces for physical interrupt related operations.

.. doxygengroup:: phys_int_ext_apis
   :project: Project ACRN
   :content-only:

329
doc/developer-guides/hld/hv-io-emulation.rst
Normal file
@@ -0,0 +1,329 @@

.. _hld-io-emulation:

I/O Emulation high-level design
###############################

As discussed in :ref:`intro-io-emulation`, there are multiple ways and places to handle I/O emulation, including HV, SOS Kernel VHM, and the SOS user-land device model (acrn-dm).

I/O emulation in the hypervisor provides these functionalities:

- Maintain lists of port I/O or MMIO handlers in the hypervisor for emulating trapped I/O accesses in a certain range.

- Forward I/O accesses to SOS when they cannot be handled by the hypervisor by any registered handlers.

:numref:`io-control-flow` illustrates the main control flow steps of I/O emulation inside the hypervisor:

1. Trap and decode I/O accesses by VM exits and decode the access from the exit qualification or by invoking the instruction decoder.

2. If the range of the I/O access overlaps with any registered handler, call that handler if it completely covers the range of the access, or ignore the access if the access crosses the boundary.

3. If the range of the I/O access does not overlap the range of any I/O handler, deliver an I/O request to SOS.

.. figure:: images/ioem-image101.png
   :align: center
   :name: io-control-flow

   Control flow of I/O emulation in the hypervisor

I/O emulation does not rely on any calibration data.

Trap Path
*********

Port I/O accesses are trapped by VM exits with the basic exit reason "I/O instruction". The port address to be accessed, the size, and the direction (read or write) are fetched from the VM exit qualification. For writes, the value to be written to the I/O port is fetched from guest registers al, ax, or eax, depending on the access size.

MMIO accesses are trapped by VM exits with the basic exit reason "EPT violation". The instruction emulator is invoked to decode the instruction that triggers the VM exit to get the memory address being accessed, the size, the direction (read or write), and the involved register.

The I/O bitmaps and EPT are used to configure the addresses that will trigger VM exits when accessed by a VM. Refer to :ref:`io-mmio-emulation` for details.
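
As a concrete illustration of the port I/O case, the following sketch decodes the fields named above from the raw exit qualification value. The bit positions follow the Intel SDM layout for "I/O instruction" VM exits, while the structure and function names are made up for this example.

.. code-block:: c

   /* Decode a port I/O access from the VM-exit qualification, as described
    * above.  Bit layout follows the Intel SDM for "I/O instruction" exits;
    * the struct and function names are illustrative only.
    */
   #include <stdbool.h>
   #include <stdint.h>

   struct example_pio_access {
           uint16_t port;      /* I/O port address                 */
           uint8_t  size;      /* access width in bytes: 1, 2 or 4 */
           bool     is_read;   /* true for IN, false for OUT       */
   };

   static struct example_pio_access decode_pio(uint64_t exit_qual)
   {
           struct example_pio_access acc;

           acc.size    = (uint8_t)((exit_qual & 0x7U) + 1U); /* bits 2:0 = size - 1  */
           acc.is_read = ((exit_qual & 0x8U) != 0U);         /* bit 3: 1 = IN (read) */
           acc.port    = (uint16_t)(exit_qual >> 16);        /* bits 31:16 = port    */
           return acc;
   }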

I/O Emulation in the Hypervisor
*******************************

When a port I/O or MMIO access is trapped, the hypervisor first checks whether the to-be-accessed address falls in the range of any registered handler, and calls the handler when such a handler exists.

Handler Management
==================

Each VM has two lists of I/O handlers, one for port I/O and the other for MMIO. Each element of the list contains a memory range and a pointer to the handler which emulates the accesses falling in the range. See :ref:`io-handler-init` for descriptions of the related data structures.

The I/O handlers are registered on VM creation and never changed until the destruction of that VM, when the handlers are unregistered. If multiple handlers are registered for the same address, the one registered later wins. See :ref:`io-handler-init` for the interfaces used to register and unregister I/O handlers.

I/O Dispatching
===============

When a port I/O or MMIO access is trapped, the hypervisor first walks through the corresponding I/O handler list in the reverse order of registration, looking for a proper handler to emulate the access. The following cases exist:

- If a handler whose range overlaps the range of the I/O access is found,

  - If the range of the I/O access falls completely in the range the handler can emulate, that handler is called.

  - Otherwise it is implied that the access crosses the boundary of multiple devices which the hypervisor does not emulate. Thus no handler is called and no I/O request will be delivered to SOS. I/O reads get all 1's and I/O writes are dropped.

- If the range of the I/O access does not overlap with any range of the handlers, the I/O access is delivered to SOS as an I/O request for further processing.
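
The dispatching rule above can be sketched as follows; the handler structure and list layout are simplified placeholders rather than the hypervisor's actual data structures.

.. code-block:: c

   /* Simplified sketch of the port I/O dispatching rule described above.
    * The types and list layout are placeholders, not the ACRN definitions.
    */
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   struct example_pio_handler {
           uint16_t base;
           uint16_t len;
           void (*emulate)(uint16_t port, uint8_t size, bool is_read);
   };

   /* Walk the per-VM list in reverse registration order (latest wins). */
   static bool dispatch_pio(struct example_pio_handler *list, size_t count,
                            uint16_t port, uint8_t size, bool is_read)
   {
           for (size_t i = count; i > 0; i--) {
                   struct example_pio_handler *h = &list[i - 1];

                   if (port + size <= h->base || port >= h->base + h->len)
                           continue;                   /* no overlap at all       */
                   if (port >= h->base && port + size <= h->base + h->len) {
                           h->emulate(port, size, is_read);
                           return true;                /* fully covered: emulated */
                   }
                   return true;  /* crosses a boundary: drop writes, reads get 1's */
           }
           return false;         /* no overlap: deliver an I/O request to SOS      */
   }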

I/O Requests
************

An I/O request is delivered to SOS vCPU 0 if the hypervisor does not find any handler that overlaps the range of a trapped I/O access. This section describes the initialization of the I/O request mechanism and how an I/O access is emulated via I/O requests in the hypervisor.

Initialization
==============

For each UOS the hypervisor shares a page with SOS to exchange I/O requests. The 4-KByte page consists of 16 256-Byte slots, indexed by vCPU ID. The DM is required to allocate and set up the request buffer on VM creation; otherwise, I/O accesses from UOS cannot be emulated by SOS, and all I/O accesses not handled by the I/O handlers in the hypervisor will be dropped (reads get all 1's).

Refer to Section 4.4.1 for the details of I/O requests and the initialization of the I/O request buffer.

Types of I/O Requests
=====================

There are four types of I/O requests:

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - I/O Request Type
     - Description

   * - PIO
     - A port I/O access.

   * - MMIO
     - A MMIO access to a GPA with no mapping in EPT.

   * - PCI
     - A PCI configuration space access.

   * - WP
     - A MMIO access to a GPA with a read-only mapping in EPT.

For port I/O accesses, the hypervisor will always deliver an I/O request of type PIO to SOS. For MMIO accesses, the hypervisor will deliver an I/O request of either MMIO or WP, depending on the mapping of the accessed address (in GPA) in the EPT of the vCPU. The hypervisor will never deliver any I/O request of type PCI, but will handle such I/O requests in the same way as port I/O accesses on their completion.

Refer to :ref:`io-structs-interfaces` for a detailed description of the data held by each type of I/O request.

I/O Request State Transitions
=============================

Each slot in the I/O request buffer is managed by a finite state machine with four states. The following figure illustrates the state transitions and the events that trigger them.

.. figure:: images/ioem-image92.png
   :align: center

   State Transition of I/O Requests

The four states are:

FREE
   The I/O request slot is not used and new I/O requests can be delivered. This is the initial state on UOS creation.

PENDING
   The I/O request slot is occupied with an I/O request pending to be processed by SOS.

PROCESSING
   The I/O request has been dispatched to a client but the client has not finished handling it yet.

COMPLETE
   The client has completed the I/O request but the hypervisor has not consumed the results yet.

The contents of an I/O request slot are owned by the hypervisor when the state of the slot is FREE or COMPLETE. In such cases SOS can only access the state of that slot. Similarly, the contents are owned by SOS when the state is PENDING or PROCESSING, and the hypervisor can only access the state of that slot.
The states are transferred as follow:
|
||||
|
||||
1. To deliver an I/O request, the hypervisor takes the slot
|
||||
corresponding to the vCPU triggering the I/O access, fills the
|
||||
contents, changes the state to PENDING and notifies SOS via
|
||||
upcall.
|
||||
|
||||
2. On upcalls, SOS dispatches each I/O request in the PENDING state to
|
||||
clients and changes the state to PROCESSING.
|
||||
|
||||
3. The client assigned an I/O request changes the state to COMPLETE
|
||||
after it completes the emulation of the I/O request. A hypercall
|
||||
is made to notify the hypervisor on I/O request completion after
|
||||
the state change.
|
||||
|
||||
4. The hypervisor finishes the post-work of a I/O request after it is
|
||||
notified on its completion and change the state back to FREE.
|
||||
|
||||
States are accessed using atomic operations to avoid getting unexpected
|
||||
states on one core when it is written on another.
|
||||
|
||||
Note that there is no state to represent a 'failed' I/O request. SOS
|
||||
should return all 1's for reads and ignore writes whenever it cannot
|
||||
handle the I/O request, and change the state of the request to COMPLETE.
|
||||
|
||||
Post-work
|
||||
=========
|
||||
|
||||
After an I/O request is completed, some more work needs to be done for
|
||||
I/O reads to update guest registers accordingly. Currently the
|
||||
hypervisor re-enters the vCPU thread every time a vCPU is scheduled back
|
||||
in, rather than switching to where the vCPU is scheduled out. As a result,
|
||||
post-work is introduced for this purpose.
|
||||
|
||||
The hypervisor pauses a vCPU before an I/O request is delivered to SOS.
|
||||
Once the I/O request emulation is completed, a client notifies the
|
||||
hypervisor by a hypercall. The hypervisor will pick up that request, do
|
||||
the post-work, and resume the guest vCPU. The post-work takes care of
|
||||
updating the vCPU guest state to reflect the effect of the I/O reads.
|
||||
|
||||
.. figure:: images/ioem-image100.png
|
||||
:align: center
|
||||
|
||||
Workflow of MMIO I/O request completion
|
||||
|
||||
The figure above illustrates the workflow to complete an I/O
|
||||
request for MMIO. Once the I/O request is completed, SOS makes a
|
||||
hypercall to notify the hypervisor which resumes the UOS vCPU triggering
|
||||
the access after requesting post-work on that vCPU. After the UOS vCPU
|
||||
resumes, it does the post-work first to update the guest registers if
|
||||
the access reads an address, changes the state of the corresponding I/O
|
||||
request slot to FREE, and continues execution of the vCPU.
|
||||
|
||||
.. figure:: images/ioem-image106.png
|
||||
:align: center
|
||||
:name: port-io-completion
|
||||
|
||||
Workflow of port I/O request completion
|
||||
|
||||
Completion of a port I/O request (shown in :numref:`port-io-completion`
|
||||
above) is
|
||||
similar to the MMIO case, except the post-work is done before resuming
|
||||
the vCPU. This is because the post-work for port I/O reads need to update
|
||||
the general register eax of the vCPU, while the post-work for MMIO reads
|
||||
need further emulation of the trapped instruction. This is much more
|
||||
complex and may impact the performance of SOS.
|
||||
|
||||
.. _io-structs-interfaces:
|
||||
|
||||
Data Structures and Interfaces
|
||||
******************************
|
||||
|
||||
External Interfaces
|
||||
===================
|
||||
|
||||
The following structures represent an I/O request. *struct vhm_request*
|
||||
is the main structure and the others are detailed representations of I/O
|
||||
requests of different kinds. Refer to Section 4.4.4 for the usage of
|
||||
*struct pci_request*.
|
||||
|
||||
.. doxygenstruct:: mmio_request
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenstruct:: pio_request
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenstruct:: pci_request
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenunion:: vhm_io_request
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenstruct:: vhm_request
|
||||
:project: Project ACRN
|
||||
|
||||
For hypercalls related to I/O emulation, refer to Section 3.11.4.
|
||||
|
||||
.. _io-handler-init:
|
||||
|
||||
Initialization and Deinitialization
|
||||
===================================
|
||||
|
||||
The following structure represents a port I/O handler:
|
||||
|
||||
.. doxygenstruct:: vm_io_handler_desc
|
||||
:project: Project ACRN
|
||||
|
||||
The following structure represents a MMIO handler.
|
||||
|
||||
.. doxygenstruct:: mem_io_node
|
||||
:project: Project ACRN
|
||||
|
||||
The following APIs are provided to initialize, deinitialize or configure
|
||||
I/O bitmaps and register or unregister I/O handlers:
|
||||
|
||||
.. doxygenfunction:: allow_guest_pio_access
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: register_pio_emulation_handler
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: register_mmio_emulation_handler
|
||||
:project: Project ACRN
|
||||
|
||||
I/O Emulation
|
||||
=============
|
||||
|
||||
The following APIs are provided for I/O emulation at runtime:
|
||||
|
||||
.. doxygenfunction:: acrn_insert_request
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: pio_instr_vmexit_handler
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: ept_violation_vmexit_handler
|
||||
:project: Project ACRN
|
728
doc/developer-guides/hld/hv-ioc-virt.rst
Normal file
@@ -0,0 +1,728 @@
|
||||
.. _IOC_virtualization_hld:
|
||||
|
||||
IOC Virtualization high-level design
|
||||
####################################
|
||||
|
||||
|
||||
.. author: Yuan Liu
|
||||
|
||||
The I/O Controller (IOC) is an SoC bridge we can use to communicate
|
||||
with a Vehicle Bus in automotive applications, routing Vehicle Bus
|
||||
signals, such as those extracted from CAN messages, from the IOC to the
|
||||
SoC and back, as well as signals the SoC uses to control onboard
|
||||
peripherals.
|
||||
|
||||
.. note::
|
||||
NUC and UP2 platforms do not support IOC hardware, and as such, IOC
|
||||
virtualization is not supported on these platforms.
|
||||
|
||||
The main purpose of IOC virtualization is to transfer data between
|
||||
native Carrier Board Communication (CBC) char devices and a virtual
|
||||
UART. IOC virtualization is implemented as full virtualization so the
|
||||
user OS can directly reuse native CBC driver.
|
||||
|
||||
The IOC Mediator has several virtualization requirements, such as S3/S5
|
||||
wakeup reason emulation, CBC link frame packing/unpacking, signal
|
||||
whitelist, and RTC configuration.
|
||||
|
||||
IOC Mediator Design
|
||||
*******************
|
||||
|
||||
Architecture Diagrams
|
||||
=====================
|
||||
|
||||
IOC introduction
|
||||
----------------
|
||||
|
||||
.. figure:: images/ioc-image12.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
:name: ioc-mediator-arch
|
||||
|
||||
IOC Mediator Architecture
|
||||
|
||||
- Vehicle Bus communication involves a wide range of individual signals
|
||||
to be used, varying from single GPIO signals on the IOC up to
|
||||
complete automotive networks that connect many external ECUs.
|
||||
- IOC (I/O controller) is an SoC bridge to communicate with a Vehicle
|
||||
Bus. It routes Vehicle Bus signals (extracted from CAN
|
||||
messages for example) back and forth between the IOC and SoC. It also
|
||||
controls the onboard peripherals from the SoC.
|
||||
- IOC is always turned on. The power supply of the SoC and its memory are
|
||||
controlled by the IOC. IOC monitors some wakeup reason to control SoC
|
||||
lifecycle-related features.
|
||||
- Some hardware signals are connected to the IOC, allowing the SoC to control
|
||||
them.
|
||||
- Besides, there is one NVM (Non-Volatile Memory) that is connected to
|
||||
IOC for storing persistent data. The IOC is in charge of accessing NVM
|
||||
following the SoC's requirements.
|
||||
|
||||
CBC protocol introduction
|
||||
-------------------------
|
||||
|
||||
The Carrier Board Communication (CBC) protocol multiplexes and
|
||||
prioritizes communication from the available interface between the SoC
|
||||
and the IOC.
|
||||
|
||||
The CBC protocol offers a layered approach, which allows it to run on
|
||||
different serial connections, such as SPI or UART.
|
||||
|
||||
.. figure:: images/ioc-image14.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-cbc-frame-def
|
||||
|
||||
IOC Native - CBC frame definition
|
||||
|
||||
The CBC protocol is based on a four-layer system:
|
||||
|
||||
- The **Physical layer** is a serial interface with full
|
||||
duplex capabilities. A hardware handshake is required. The required
|
||||
bit rate depends on the peripherals connected, e.g. UART, and SPI.
|
||||
- The **Link layer** handles the length and payload verification.
|
||||
- The **Address Layer** is used to distinguish between the general data
|
||||
transferred. It is placed in front of the underlying Service Layer
|
||||
and contains Multiplexer (MUX) and Priority fields.
|
||||
- The **Service Layer** contains the payload data.
|
||||
|
||||
Native architecture
|
||||
-------------------
|
||||
|
||||
In the native architecture, the IOC controller connects to UART
|
||||
hardware, and communicates with the CAN bus to access peripheral
|
||||
devices. ``cbc_attach`` is an application to enable the CBC ldisc
|
||||
function, which creates several CBC char devices. All userspace
|
||||
subsystems or services communicate with IOC firmware via the CBC char
|
||||
devices.
|
||||
|
||||
.. figure:: images/ioc-image13.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-software-arch
|
||||
|
||||
IOC Native - Software architecture
|
||||
|
||||
Virtualization architecture
|
||||
---------------------------
|
||||
|
||||
In the virtualization architecture, the IOC Device Model (DM) is
|
||||
responsible for communication between the UOS and IOC firmware. The IOC
|
||||
DM communicates with several native CBC char devices and a PTY device.
|
||||
The native CBC char devices only include ``/dev/cbc-lifecycle``,
|
||||
``/dev/cbc-signals``, and ``/dev/cbc-raw0`` - ``/dev/cbc-raw11``. Others
|
||||
are not used by the IOC DM. IOC DM opens the ``/dev/ptmx`` device to
|
||||
create a pair of devices (master and slave), The IOC DM uses these
|
||||
devices to communicate with UART DM since UART DM needs a TTY capable
|
||||
device as its backend.
|
||||
|
||||
.. figure:: images/ioc-image15.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-virt-software-arch
|
||||
|
||||
IOC Virtualization - Software architecture
|
||||
|
||||
High-Level Design
|
||||
=================
|
||||
|
||||
There are five parts in this high-level design:
|
||||
|
||||
* Software data flow introduces data transfer in the IOC mediator
|
||||
* State transfer introduces IOC mediator work states
|
||||
* CBC protocol illustrates the CBC data packing/unpacking
|
||||
* Power management involves boot/resume/suspend/shutdown flows
|
||||
* Emulated CBC commands introduces some commands work flow
|
||||
|
||||
IOC mediator has three threads to transfer data between UOS and SOS. The
|
||||
core thread is responsible for data reception, and Tx and Rx threads are
|
||||
used for data transmission. Each of the transmission threads has one
|
||||
data queue as a buffer, so that the IOC mediator can read data from CBC
|
||||
char devices and UART DM immediately.
|
||||
|
||||
.. figure:: images/ioc-image16.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-med-sw-data-flow
|
||||
|
||||
IOC Mediator - Software data flow
|
||||
|
||||
- For Tx direction, the data comes from IOC firmware. IOC mediator
|
||||
receives service data from native CBC char devices such as
|
||||
``/dev/cbc-lifecycle``. If service data is CBC wakeup reason, some wakeup
|
||||
reason bits will be masked. If service data is CBC signal, the data
|
||||
will be dropped and will not be defined in the whitelist. If service
|
||||
data comes from a raw channel, the data will be passed forward. Before
|
||||
transmitting to the virtual UART interface, all data needs to be
|
||||
packed with an address header and link header.
|
||||
- For Rx direction, the data comes from the UOS. The IOC mediator receives link
|
||||
data from the virtual UART interface. The data will be unpacked by Core
|
||||
thread, and then forwarded to Rx queue, similar to how the Tx direction flow
|
||||
is done except that the heartbeat and RTC are only used by the IOC
|
||||
mediator and will not be transferred to IOC
|
||||
firmware.
|
||||
- Currently, IOC mediator only cares about lifecycle, signal, and raw data.
|
||||
Others, e.g. diagnosis, are not used by the IOC mediator.
|
||||
|
||||
State transfer
|
||||
--------------
|
||||
|
||||
IOC mediator has four states and five events for state transfer.
|
||||
|
||||
.. figure:: images/ioc-image18.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
:name: ioc-state-transfer
|
||||
|
||||
IOC Mediator - State Transfer
|
||||
|
||||
- **INIT state**: This state is the initialized state of the IOC mediator.
|
||||
All CBC protocol packets are handled normally. In this state, the UOS
|
||||
has not yet sent an active heartbeat.
|
||||
- **ACTIVE state**: Enter this state if an HB ACTIVE event is triggered,
|
||||
indicating that the UOS state has been active and need to set the bit
|
||||
23 (SoC bit) in the wakeup reason.
|
||||
- **SUSPENDING state**: Enter this state if a RAM REFRESH event or HB
|
||||
INACTIVE event is triggered. The related event handler needs to mask
|
||||
all wakeup reason bits except SoC bit and drop the queued CBC
|
||||
protocol frames.
|
||||
- **SUSPENDED state**: Enter this state if a SHUTDOWN event is triggered to
|
||||
close all native CBC char devices. The IOC mediator will be put to
|
||||
sleep until a RESUME event is triggered to re-open the closed native
|
||||
CBC char devices and transition to the INIT state.
|
||||
|
||||
CBC protocol
|
||||
------------
|
||||
|
||||
IOC mediator needs to pack/unpack the CBC link frame for IOC
|
||||
virtualization, as shown in the detailed flow below:
|
||||
|
||||
.. figure:: images/ioc-image17.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-cbc-frame-usage
|
||||
|
||||
IOC Native - CBC frame usage
|
||||
|
||||
In the native architecture, the CBC link frame is unpacked by CBC
|
||||
driver. The usage services only get the service data from the CBC char
|
||||
devices. For data packing, CBC driver will compute the checksum and set
|
||||
priority for the frame, then send data to the UART driver.
|
||||
|
||||
.. figure:: images/ioc-image20.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-cbc-prot
|
||||
|
||||
IOC Virtualizaton - CBC protocol virtualization
|
||||
|
||||
The difference between the native and virtualization architectures is
|
||||
that the IOC mediator needs to re-compute the checksum and reset
|
||||
priority. Currently, priority is not supported by IOC firmware; the
|
||||
priority setting by the IOC mediator is based on the priority setting of
|
||||
the CBC driver. The SOS and UOS use the same CBC driver.
|
||||
|
||||
Power management virtualization
|
||||
-------------------------------
|
||||
|
||||
In acrn-dm, the IOC power management architecture involves PM DM, IOC
|
||||
DM, and UART DM modules. PM DM is responsible for UOS power management,
|
||||
and IOC DM is responsible for heartbeat and wakeup reason flows for IOC
|
||||
firmware. The heartbeat flow is used to control IOC firmware power state
|
||||
and wakeup reason flow is used to indicate IOC power state to the OS.
|
||||
UART DM transfers all IOC data between the SOS and UOS. These modules
|
||||
complete boot/suspend/resume/shutdown functions.
|
||||
|
||||
Boot flow
|
||||
+++++++++
|
||||
|
||||
.. figure:: images/ioc-image19.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-virt-boot
|
||||
|
||||
IOC Virtualizaton - Boot flow
|
||||
|
||||
#. Press ignition button for booting.
|
||||
#. SOS lifecycle service gets a "booting" wakeup reason.
|
||||
#. SOS lifecycle service notifies wakeup reason to VM Manager, and VM
|
||||
Manager starts VM.
|
||||
#. VM Manager sets the VM state to "start".
|
||||
#. IOC DM forwards the wakeup reason to UOS.
|
||||
#. PM DM starts UOS.
|
||||
#. UOS lifecycle gets a "booting" wakeup reason.
|
||||
|
||||
Suspend & Shutdown flow
|
||||
+++++++++++++++++++++++
|
||||
|
||||
.. figure:: images/ioc-image21.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-suspend
|
||||
|
||||
IOC Virtualizaton - Suspend and Shutdown by Ignition
|
||||
|
||||
#. Press ignition button to suspend or shutdown.
|
||||
#. SOS lifecycle service gets a 0x800000 wakeup reason, then keeps
|
||||
sending a shutdown delay heartbeat to IOC firmware, and notifies a
|
||||
"stop" event to VM Manager.
|
||||
#. IOC DM forwards the wakeup reason to UOS lifecycle service.
|
||||
#. SOS lifecycle service sends a "stop" event to VM Manager, and waits for
|
||||
the stop response before timeout.
|
||||
#. UOS lifecycle service gets a 0x800000 wakeup reason and sends inactive
|
||||
heartbeat with suspend or shutdown SUS_STAT to IOC DM.
|
||||
#. UOS lifecycle service gets a 0x000000 wakeup reason, then enters
|
||||
suspend or shutdown kernel PM flow based on SUS_STAT.
|
||||
#. PM DM executes UOS suspend/shutdown request based on ACPI.
|
||||
#. VM Manager queries each VM state from PM DM. Suspend request maps
|
||||
to a paused state and shutdown request maps to a stop state.
|
||||
#. VM Manager collects all VMs state, and reports it to SOS lifecycle
|
||||
service.
|
||||
#. SOS lifecycle sends inactive heartbeat to IOC firmware with
|
||||
suspend/shutdown SUS_STAT, based on the SOS' own lifecycle service
|
||||
policy.
|
||||
|
||||
Resume flow
|
||||
+++++++++++
|
||||
|
||||
.. figure:: images/ioc-image22.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-resume
|
||||
|
||||
IOC Virtualizaton - Resume flow
|
||||
|
||||
The resume reason contains both the ignition button and RTC, and have
|
||||
the same flow blocks.
|
||||
|
||||
For ignition resume flow:
|
||||
|
||||
#. Press ignition button to resume.
|
||||
#. SOS lifecycle service gets an initial wakeup reason from the IOC
|
||||
firmware. The wakeup reason is 0x000020, from which the ignition button
|
||||
bit is set. It then sends active or initial heartbeat to IOC firmware.
|
||||
#. SOS lifecycle forwards the wakeup reason and sends start event to VM
|
||||
Manager. The VM Manager starts to resume VMs.
|
||||
#. IOC DM gets the wakeup reason from the VM Manager and forwards it to UOS
|
||||
lifecycle service.
|
||||
#. VM Manager sets the VM state to starting for PM DM.
|
||||
#. PM DM resumes UOS.
|
||||
#. UOS lifecycle service gets wakeup reason 0x000020, and then sends an initial
|
||||
or active heartbeat. The UOS gets wakeup reason 0x800020 after
|
||||
resuming.
|
||||
|
||||
For RTC resume flow
|
||||
|
||||
#. RTC timer expires.
|
||||
#. SOS lifecycle service gets initial wakeup reason from the IOC
|
||||
firmware. The wakeup reason is 0x000200, from which RTC bit is set.
|
||||
It then sends active or initial heartbeat to IOC firmware.
|
||||
#. SOS lifecycle forwards the wakeup reason and sends start event to VM
|
||||
Manager. VM Manager begins resuming VMs.
|
||||
#. IOC DM gets the wakeup reason from the VM Manager, and forwards it to
|
||||
the UOS lifecycle service.
|
||||
#. VM Manager sets the VM state to starting for PM DM.
|
||||
#. PM DM resumes UOS.
|
||||
#. UOS lifecycle service gets the wakeup reason 0x000200, and sends
|
||||
initial or active heartbeat. The UOS gets wakeup reason 0x800200
|
||||
after resuming..
|
||||
|
||||
System control data
|
||||
-------------------
|
||||
|
||||
IOC mediator has several emulated CBC commands, including wakeup reason,
|
||||
heartbeat, and RTC.
|
||||
|
||||
The wakeup reason, heartbeat, and RTC commands belong to the system
|
||||
control frames, which are used for startup or shutdown control. System
|
||||
control includes Wakeup Reasons, Heartbeat, Boot Selector, Suppress
|
||||
Heartbeat Check, and Set Wakeup Timer functions. Details are in this
|
||||
table:
|
||||
|
||||
.. list-table:: System control SVC values
|
||||
:header-rows: 1
|
||||
|
||||
* - System Control
|
||||
- Value Name
|
||||
- Description
|
||||
- Data Direction
|
||||
|
||||
* - 1
|
||||
- Wakeup Reasons
|
||||
- Wakeup Reasons
|
||||
- IOC to SoC
|
||||
|
||||
* - 2
|
||||
- Heartbeat
|
||||
- Heartbeat
|
||||
- Soc to IOC
|
||||
|
||||
* - 3
|
||||
- Boot Selector
|
||||
- Boot Selector
|
||||
- Soc to IOC
|
||||
|
||||
* - 4
|
||||
- Suppress Heartbeat Check
|
||||
- Suppress Heartbeat Check
|
||||
- Soc to IOC
|
||||
|
||||
* - 5
|
||||
- Set Wakeup Timer
|
||||
- Set Wakeup Timer in AIOC firmware
|
||||
- Soc to IOC
|
||||
|
||||
- IOC mediator only supports wakeup reasons Heartbeat and Set Wakeup
|
||||
Timer.
|
||||
- The Boot Selector command is used to configure which partition the
|
||||
IOC has to use for normal and emergency boots. Additionally, the IOC
|
||||
has to report to the SoC after the CBC communication has been
|
||||
established successfully with which boot partition has been started
|
||||
and for what reason.
|
||||
- The Suppress Heartbeat Check command is sent by the SoC in
|
||||
preparation for maintenance tasks which requires the CBC Server to be
|
||||
shut down for a certain period of time. It instructs the IOC not to
|
||||
expect CBC heartbeat messages during the specified time. The IOC must
|
||||
disable any watchdog on the CBC heartbeat messages during this period
|
||||
of time.
|
||||
|
||||
Wakeup reason
|
||||
+++++++++++++
|
||||
|
||||
The wakeup reasons command contains a bit mask of all reasons, which is
|
||||
currently keeping the SoC/IOC active. The SoC itself also has a wakeup
|
||||
reason, which allows the SoC to keep the IOC active. The wakeup reasons
|
||||
should be sent every 1000 ms by the IOC.
|
||||
|
||||
Wakeup reason frame definition is as below:
|
||||
|
||||
.. figure:: images/ioc-image24.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-wakeup-reason
|
||||
|
||||
Wakeup Reason Frame Definition
|
||||
|
||||
Currently the wakeup reason bits are supported by sources shown here:
|
||||
|
||||
.. list-table:: Wakeup Reason Bits
|
||||
:header-rows: 1
|
||||
|
||||
* - Wakeup Reason
|
||||
- Bit
|
||||
- Source
|
||||
|
||||
* - wakeup_button
|
||||
- 5
|
||||
- Get from IOC FW, forward to UOS
|
||||
|
||||
* - RTC wakeup
|
||||
- 9
|
||||
- Get from IOC FW, forward to UOS
|
||||
|
||||
* - car door wakeup
|
||||
- 11
|
||||
- Get from IOC FW, forward to UOS
|
||||
|
||||
* - SoC wakeup
|
||||
- 23
|
||||
- Emulation (Depends on UOS's heartbeat message
|
||||
|
||||
- CBC_WK_RSN_BTN (bit 5): ignition button.
|
||||
- CBC_WK_RSN_RTC (bit 9): RTC timer.
|
||||
- CBC_WK_RSN_DOR (bit 11): Car door.
|
||||
- CBC_WK_RSN_SOC (bit 23): SoC active/inactive.
|
||||
|
||||
.. figure:: images/ioc-image4.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
:name: ioc-wakeup-flow
|
||||
|
||||
IOC Mediator - Wakeup reason flow
|
||||
|
||||
Bit 23 is for the SoC wakeup indicator and should not be forwarded
|
||||
directly because every VM has a different heartbeat status.
|
||||
|
||||
Heartbeat
|
||||
+++++++++
|
||||
|
||||
The Heartbeat is used for SOC watchdog, indicating the SOC power
|
||||
reset behavior. Heartbeat needs to be sent every 1000 ms by
|
||||
the SoC.
|
||||
|
||||
.. figure:: images/ioc-image5.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-heartbeat
|
||||
|
||||
System control - Heartbeat
|
||||
|
||||
Heartbeat frame definition is shown here:
|
||||
|
||||
.. figure:: images/ioc-image6.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-heartbeat-frame
|
||||
|
||||
Heartbeat Frame Definition
|
||||
|
||||
- Heartbeat active is repeatedly sent from SoC to IOC to signal that
|
||||
the SoC is active and intends to stay active. The On SUS_STAT action
|
||||
must be set to invalid.
|
||||
- Heartbeat inactive is sent once from SoC to IOC to signal that the
|
||||
SoC is ready for power shutdown. The On SUS_STAT action must be set
|
||||
to a required value.
|
||||
- Heartbeat delay is repeatedly sent from SoC to IOC to signal that the
|
||||
SoC has received the shutdown request, but isn't ready for
|
||||
shutdown yet (for example, a phone call or other time consuming
|
||||
action is active). The On SUS_STAT action must be set to invalid.
|
||||
|
||||
.. figure:: images/ioc-image7.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
:name: ioc-heartbeat-commands
|
||||
|
||||
Heartbeat Commands
|
||||
|
||||
- SUS_STAT invalid action needs to be set with a heartbeat active
|
||||
message.
|
||||
- For the heartbeat inactive message, the SoC needs to be set from
|
||||
command 1 to 7 following the related scenarios. For example: S3 case
|
||||
needs to be set at 7 to prevent from power gating the memory.
|
||||
- The difference between halt and reboot is related if the power rail
|
||||
that supplies to customer peripherals (such as Fan, HDMI-in, BT/Wi-Fi,
|
||||
M.2, and Ethernet) is reset.
|
||||
|
||||
.. figure:: images/ioc-image8.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-heartbeat-flow
|
||||
|
||||
IOC Mediator - Heartbeat Flow
|
||||
|
||||
- IOC DM will not maintain a watchdog timer for a heartbeat message. This
|
||||
is because it already has other watchdog features, so the main use of
|
||||
Heartbeat active command is to maintain the virtual wakeup reason
|
||||
bitmap variable.
|
||||
- For Heartbeat, IOC mediator supports Heartbeat shutdown prepared,
|
||||
Heartbeat active, Heartbeat shutdown delay, Heartbeat initial, and
|
||||
Heartbeat Standby.
|
||||
- For SUS_STAT, IOC mediator supports invalid action and RAM refresh
|
||||
action.
|
||||
- For Suppress heartbeat check will also be dropped directly.
|
||||
|
||||
RTC
|
||||
+++
|
||||
|
||||
RTC timer is used to wakeup SoC when the timer is expired. (A use
|
||||
case is for an automatic software upgrade with a specific time.) RTC frame
|
||||
definition is as below.
|
||||
|
||||
.. figure:: images/ioc-image9.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
|
||||
- The RTC command contains a relative time but not an absolute time.
|
||||
- SOS lifecycle service will re-compute the time offset before it is
|
||||
sent to the IOC firmware.
|
||||
|
||||
.. figure:: images/ioc-image10.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-rtc-flow
|
||||
|
||||
IOC Mediator - RTC flow
|
||||
|
||||
Signal data
|
||||
-----------
|
||||
|
||||
Signal channel is an API between the SOC and IOC for
|
||||
miscellaneous requirements. The process data includes all vehicle bus and
|
||||
carrier board data (GPIO, sensors, and so on). It supports
|
||||
transportation of single signals and group signals. Each signal consists
|
||||
of a signal ID (reference), its value, and its length. IOC and SOC need
|
||||
agreement on the definition of signal IDs that can be treated as API
|
||||
interface definitions.
|
||||
|
||||
IOC signal type definitions are as below.
|
||||
|
||||
.. figure:: images/ioc-image1.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
:name: ioc-process-data-svc-val
|
||||
|
||||
Process Data SVC values
|
||||
|
||||
.. figure:: images/ioc-image2.png
|
||||
:width: 900px
|
||||
:align: center
|
||||
:name: ioc-med-signal-flow
|
||||
|
||||
IOC Mediator - Signal flow
|
||||
|
||||
- The IOC backend needs to emulate the channel open/reset/close message which
|
||||
shouldn't be forward to the native cbc signal channel. The SOS signal
|
||||
related services should do a real open/reset/close signal channel.
|
||||
- Every backend should maintain a whitelist for different VMs. The
|
||||
whitelist can be stored in the SOS file system (Read only) in the
|
||||
future, but currently it is hard coded.
|
||||
|
||||
IOC mediator has two whitelist tables, one is used for rx
|
||||
signals(SOC->IOC), and the other one is used for tx signals. The IOC
|
||||
mediator drops the single signals and group signals if the signals are
|
||||
not defined in the whitelist. For multi signal, IOC mediator generates a
|
||||
new multi signal, which contains the signals in the whitelist.
|
||||
|
||||
.. figure:: images/ioc-image3.png
|
||||
:width: 600px
|
||||
:align: center
|
||||
:name: ioc-med-multi-signal
|
||||
|
||||
IOC Mediator - Multi-Signal whitelist
|
||||
|
||||
Raw data
|
||||
--------
|
||||
|
||||
OEM raw channel only assigns to a specific UOS following that OEM
|
||||
configuration. The IOC Mediator will directly forward all read/write
|
||||
message from IOC firmware to UOS without any modification.
|
||||
|
||||
Dependencies and Constraints
|
||||
****************************
|
||||
|
||||
HW External Dependencies
|
||||
========================
|
||||
|
||||
+--------------------------------------+--------------------------------------+
|
||||
| Dependency | Runtime Mechanism to Detect |
|
||||
| | Violations |
|
||||
+======================================+======================================+
|
||||
| VMX should be supported | Boot-time checks to CPUID. See |
|
||||
| | section A.1 in SDM for details. |
|
||||
+--------------------------------------+--------------------------------------+
|
||||
| EPT should be supported | Boot-time checks to primary and |
|
||||
| | secondary processor-based |
|
||||
| | VM-execution controls. See section |
|
||||
| | A.3.2 and A.3.3 in SDM for details. |
|
||||
+--------------------------------------+--------------------------------------+
|
||||
|
||||
SW External Dependencies
|
||||
========================
|
||||
|
||||
+--------------------------------------+--------------------------------------+
|
||||
| Dependency | Runtime Mechanism to Detect |
|
||||
| | Violations |
|
||||
+======================================+======================================+
|
||||
| When invoking the hypervisor, the | Check the magic value in EAX. See |
|
||||
| bootloader should have established a | section 3.2 & 3.3 in Multiboot |
|
||||
| multiboot-compliant state | Specification for details. |
|
||||
+--------------------------------------+--------------------------------------+
|
||||
|
||||
Constraints
|
||||
===========
|
||||
|
||||
+--------------------------+--------------------------+--------------------------+
|
||||
| Description | Rationale | How such constraint is |
|
||||
| | | enforced |
|
||||
+==========================+==========================+==========================+
|
||||
| Physical cores are | To avoid interference | A bitmap indicating free |
|
||||
| exclusively assigned to | between vcpus on the | pcpus; on vcpu creation |
|
||||
| vcpus. | same core. | a free pcpu is picked. |
|
||||
+--------------------------+--------------------------+--------------------------+
|
||||
| Only PCI devices | Without HW reset it is | |
|
||||
| supporting HW reset can | challenging to manage | |
|
||||
| be passed through to a | devices on UOS crashes | |
|
||||
| UOS. | | |
|
||||
+--------------------------+--------------------------+--------------------------+
|
||||
|
||||
|
||||
Interface Specification
|
||||
***********************
|
||||
|
||||
Doxygen-style comments in the code are used for interface specification.
|
||||
This section provides some examples on how functions and structure
|
||||
should be commented.
|
||||
|
||||
Function Header Template
|
||||
========================
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/**
|
||||
* @brief Initialize environment for Trusty-OS on a VCPU.
|
||||
*
|
||||
* More info here.
|
||||
*
|
||||
* @param[in] vcpu Pointer to VCPU data structure
|
||||
* @param[inout] param guest physical address. This gpa points to
|
||||
* struct trusty_boot_param
|
||||
*
|
||||
* @return 0 - on success.
|
||||
* @return -EIO - (description when this error can happen)
|
||||
* @return -EINVAL - (description )
|
||||
*
|
||||
* @pre vcpu must not be NULL.
|
||||
* @pre param must ...
|
||||
*
|
||||
* @post the return value is non-zero if param is ....
|
||||
* @post
|
||||
*
|
||||
* @remark The api must be invoked with interrupt disabled.
|
||||
* @remark (Other usage constraints here)
|
||||
*/
|
||||
|
||||
|
||||
Structure
|
||||
=========
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/**
|
||||
* @brief An mmio request.
|
||||
*
|
||||
* More info here.
|
||||
*/
|
||||
struct mmio_request {
|
||||
uint32_t direction; /**< Direction of this request. */
|
||||
uint32_t reserved; /**< Reserved. */
|
||||
int64_t address; /**< gpa of the register to be accessed. */
|
||||
int64_t size; /**< Width of the register to be accessed. */
|
||||
int64_t value; /**< Value read from or to be written to the
|
||||
register. */
|
||||
} __aligned(8);
|
||||
|
||||
|
||||
IOC Mediator Configuration
|
||||
**************************
|
||||
|
||||
TBD
|
||||
|
||||
IOC Mediator Usage
|
||||
******************
|
||||
|
||||
The device model configuration command syntax for IOC mediator is as
|
||||
follows::
|
||||
|
||||
-i,[ioc_channel_path],[wakeup_reason]
|
||||
-l,[lpc_port],[ioc_channel_path]
|
||||
|
||||
The "ioc_channel_path" is an absolute path for communication between
|
||||
IOC mediator and UART DM.
|
||||
|
||||
The "lpc_port" is "com1" or "com2", IOC mediator needs one unassigned
|
||||
lpc port for data transfer between UOS and SOS.
|
||||
|
||||
The "wakeup_reason" is IOC mediator boot up reason, each bit represents
|
||||
one wakeup reason.
|
||||
|
||||
For example, the following commands are used to enable IOC feature, the
|
||||
initial wakeup reason is the ignition button and cbc_attach uses ttyS1
|
||||
for TTY line discipline in UOS::
|
||||
|
||||
-i /run/acrn/ioc_$vm_name,0x20
|
||||
-l com2,/run/acrn/ioc_$vm_name
|
||||
|
||||
|
||||
Porting and adaptation to different platforms
|
||||
*********************************************
|
||||
|
||||
TBD
|
498
doc/developer-guides/hld/hv-memmgt.rst
Normal file
@@ -0,0 +1,498 @@
|
||||
.. _memmgt-hld:
|
||||
|
||||
Memory Management high-level design
|
||||
###################################
|
||||
|
||||
This document describes memory management for the ACRN hypervisor.
|
||||
|
||||
Overview
|
||||
********
|
||||
|
||||
The hypervisor (HV) virtualizes real physical memory so an unmodified OS
|
||||
(such as Linux or Android) running in a virtual machine, has the view of
|
||||
managing its own contiguous physical memory. HV uses virtual-processor
|
||||
identifiers (VPIDs) and the extended page-table mechanism (EPT) to
|
||||
translate guest-physical address into host-physical address. HV enables
|
||||
EPT and VPID hardware virtualization features, establishes EPT page
|
||||
tables for SOS/UOS, and provides EPT page tables operation interfaces to
|
||||
others.
|
||||
|
||||
In the ACRN hypervisor system, there are few different memory spaces to
|
||||
consider. From the hypervisor's point of view there are:
|
||||
|
||||
- **Host Physical Address (HPA)**: the native physical address space, and
|
||||
- **Host Virtual Address (HVA)**: the native virtual address space based on
|
||||
a MMU. A page table is used to translate between HPA and HVA
|
||||
spaces.
|
||||
|
||||
From the Guest OS running on a hypervisor there are:
|
||||
|
||||
- **Guest Physical Address (GPA)**: the guest physical address space from a
|
||||
virtual machine. GPA to HPA transition is usually based on a
|
||||
MMU-like hardware module (EPT in X86), and associated with a page
|
||||
table
|
||||
- **Guest Virtual Address (GVA)**: the guest virtual address space from a
|
||||
virtual machine based on a vMMU
|
||||
|
||||
.. figure:: images/mem-image2.png
|
||||
:align: center
|
||||
:width: 900px
|
||||
:name: mem-overview
|
||||
|
||||
ACRN Memory Mapping Overview
|
||||
|
||||
:numref:`mem-overview` provides an overview of the ACRN system memory
|
||||
mapping, showing:
|
||||
|
||||
- GVA to GPA mapping based on vMMU on a VCPU in a VM
|
||||
- GPA to HPA mapping based on EPT for a VM in the hypervisor
|
||||
- HVA to HPA mapping based on MMU in the hypervisor
|
||||
|
||||
This document illustrates the memory management infrastructure for the
|
||||
ACRN hypervisor and how it handles the different memory space views
|
||||
inside the hypervisor and from a VM:
|
||||
|
||||
- How ACRN hypervisor manages host memory (HPA/HVA)
|
||||
- How ACRN hypervisor manages SOS guest memory (HPA/GPA)
|
||||
- How ACRN hypervisor & SOS DM manage UOS guest memory (HPA/GPA)
|
||||
|
||||
Hypervisor Physical Memory Management
|
||||
*************************************
|
||||
|
||||
In the ACRN, the HV initializes MMU page tables to manage all physical
|
||||
memory and then switches to the new MMU page tables. After MMU page
|
||||
tables are initialized at the platform initialization stage, no updates
|
||||
are made for MMU page tables.
|
||||
|
||||
Hypervisor Physical Memory Layout - E820
|
||||
========================================
|
||||
|
||||
The ACRN hypervisor is the primary owner to manage system memory.
|
||||
Typically the boot firmware (e.g., EFI) passes the platform physical
|
||||
memory layout - E820 table to the hypervisor. The ACRN hypervisor does
|
||||
its memory management based on this table using 4-level paging.
|
||||
|
||||
The BIOS/bootloader firmware (e.g., EFI) passes the E820 table through a
|
||||
multiboot protocol. This table contains the original memory layout for
|
||||
the platform.
|
||||
|
||||
.. figure:: images/mem-image1.png
|
||||
:align: center
|
||||
:width: 900px
|
||||
:name: mem-layout
|
||||
|
||||
Physical Memory Layout Example
|
||||
|
||||
:numref:`mem-layout` is an example of the physical memory layout based on a simple
|
||||
platform E820 table.
|
||||
|
||||
Hypervisor Memory Initialization
|
||||
================================
|
||||
|
||||
The ACRN hypervisor runs under paging mode. After the bootstrap
|
||||
processor (BSP) gets the platform E820 table, BSP creates its MMU page
|
||||
table based on it. This is done by the function *init_paging()* and
|
||||
*smep()*. After the application processor (AP) receives IPI CPU startup
|
||||
interrupt, it uses the MMU page tables created by BSP and enable SMEP.
|
||||
:numref:`hv-mem-init` describes the hypervisor memory initialization for BSP
|
||||
and APs.
|
||||
|
||||
.. figure:: images/mem-image8.png
|
||||
:align: center
|
||||
:name: hv-mem-init
|
||||
|
||||
Hypervisor Memory Initialization
|
||||
|
||||
The memory mapping policy used is:
|
||||
|
||||
- Identical mapping (ACRN hypervisor memory could be relocatable in
|
||||
the future)
|
||||
- Map all memory regions with UNCACHED type
|
||||
- Remap RAM regions to WRITE-BACK type
|
||||
|
||||
.. figure:: images/mem-image69.png
|
||||
:align: center
|
||||
:name: hv-mem-vm-init
|
||||
|
||||
Hypervisor Virtual Memory Layout
|
||||
|
||||
:numref:`hv-mem-vm-init` above shows:
|
||||
|
||||
- Hypervisor has a view of and can access all system memory
|
||||
- Hypervisor has UNCACHED MMIO/PCI hole reserved for devices such as
|
||||
LAPIC/IOAPIC accessing
|
||||
- Hypervisor has its own memory with WRITE-BACK cache type for its
|
||||
code/data (< 1M part is for secondary CPU reset code)
|
||||
|
||||
The hypervisor should use minimum memory pages to map from virtual
|
||||
address space into physical address space.
|
||||
|
||||
- If 1GB hugepage can be used
|
||||
for virtual address space mapping, the corresponding PDPT entry shall be
|
||||
set for this 1GB hugepage.
|
||||
- If 1GB hugepage can't be used for virtual
|
||||
address space mapping and 2MB hugepage can be used, the corresponding
|
||||
PDT entry shall be set for this 2MB hugepage.
|
||||
- If both of 1GB hugepage
|
||||
and 2MB hugepage can't be used for virtual address space mapping, the
|
||||
corresponding PT entry shall be set.
|
||||
|
||||
If memory type or access rights of a page is updated, or some virtual
|
||||
address space is deleted, it will lead to splitting of the corresponding
|
||||
page. The hypervisor will still keep using minimum memory pages to map from
|
||||
virtual address space into physical address space.
|
||||
|
||||
Memory Pages Pool Functions
|
||||
===========================
|
||||
|
||||
Memory pages pool functions provide dynamic management of multiple
|
||||
4KB page-size memory blocks, used by the hypervisor to store internal
|
||||
data. Through these functions, the hypervisor can allocate and
|
||||
deallocate pages.
|
||||
|
||||
Data Flow Design
|
||||
================
|
||||
|
||||
The physical memory management unit provides MMU 4-level page tables
|
||||
creating and updating services, MMU page tables switching service, SMEP
|
||||
enable service, and HPA/HVA retrieving service to other units.
|
||||
:numref:`mem-data-flow-physical` shows the data flow diagram
|
||||
of physical memory management.
|
||||
|
||||
.. figure:: images/mem-image45.png
|
||||
:align: center
|
||||
:name: mem-data-flow-physical
|
||||
|
||||
Data Flow of Hypervisor Physical Memory Management
|
||||
|
||||
Interfaces Design
|
||||
=================
|
||||
|
||||
|
||||
MMU Initialization
|
||||
------------------
|
||||
|
||||
.. doxygenfunction:: enable_smep
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: enable_paging
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: init_paging
|
||||
:project: Project ACRN
|
||||
|
||||
Address Space Translation
|
||||
-------------------------
|
||||
|
||||
.. doxygenfunction:: hpa2hva
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: hva2hpa
|
||||
:project: Project ACRN
|
||||
|
||||
|
||||
Hypervisor Memory Virtualization
|
||||
********************************
|
||||
|
||||
The hypervisor provides a contiguous region of physical memory for SOS
|
||||
and each UOS. It also guarantees that the SOS and UOS can not access
|
||||
code and internal data in the hypervisor, and each UOS can not access
|
||||
code and internal data of the SOS and other UOSs.
|
||||
|
||||
The hypervisor:
|
||||
|
||||
- enables EPT and VPID hardware virtualization features,
|
||||
- establishes EPT page tables for SOS/UOS,
|
||||
- provides EPT page tables operations services,
|
||||
- virtualizes MTRR for SOS/UOS,
|
||||
- provides VPID operations services,
|
||||
- provides services for address spaces translation between GPA and HPA, and
|
||||
- provides services for data transfer between hypervisor and virtual machine.
|
||||
|
||||
Memory Virtualization Capability Checking
|
||||
=========================================
|
||||
|
||||
In the hypervisor, memory virtualization provides EPT/VPID capability
|
||||
checking service and EPT hugepage supporting checking service. Before HV
|
||||
enables memory virtualization and uses EPT hugepage, these service need
|
||||
to be invoked by other units.
|
||||
|
||||
Data Transfer between Different Address Spaces
|
||||
==============================================
|
||||
|
||||
In ACRN, different memory space management is used in the hypervisor,
|
||||
Service OS, and User OS to achieve spatial isolation. Between memory
|
||||
spaces, there are different kinds of data transfer, such as a SOS/UOS
|
||||
may hypercall to request hypervisor services which includes data
|
||||
transferring, or when the hypervisor does instruction emulation: the HV
|
||||
needs to access the guest instruction pointer register to fetch guest
|
||||
instruction data.
|
||||
|
||||
Access GPA from Hypervisor
|
||||
--------------------------
|
||||
|
||||
When hypervisor need access GPA for data transfer, the caller from guest
|
||||
must make sure this memory range's GPA is continuous. But for HPA in
|
||||
hypervisor, it could be dis-continuous (especially for UOS under hugetlb
|
||||
allocation mechanism). For example, a 4M GPA range may map to 2
|
||||
different 2M huge host-physical pages. The ACRN hypervisor must take
|
||||
care of this kind of data transfer by doing EPT page walking based on
|
||||
its HPA.
|
||||
|
||||
Access GVA from Hypervisor
|
||||
--------------------------
|
||||
|
||||
When hypervisor needs to access GVA for data transfer, it's likely both
|
||||
GPA and HPA could be address dis-continuous. The ACRN hypervisor must
|
||||
watch for this kind of data transfer, and handle it by doing page
|
||||
walking based on both its GPA and HPA.
|
||||
|
||||
EPT Page Tables Operations
|
||||
==========================
|
||||
|
||||
The hypervisor should use a minimum of memory pages to map from
|
||||
guest-physical address (GPA) space into host-physical address (HPA)
|
||||
space.
|
||||
|
||||
- If 1GB hugepage can be used for GPA space mapping, the
|
||||
corresponding EPT PDPT entry shall be set for this 1GB hugepage.
|
||||
- If 1GB hugepage can't be used for GPA space mapping and 2MB hugepage can be
|
||||
used, the corresponding EPT PDT entry shall be set for this 2MB
|
||||
hugepage.
|
||||
- If both 1GB hugepage and 2MB hugepage can't be used for GPA
|
||||
space mapping, the corresponding EPT PT entry shall be set.
|
||||
|
||||
If memory type or access rights of a page is updated or some GPA space
|
||||
is deleted, it will lead to the corresponding EPT page being split. The
|
||||
hypervisor should still keep to using minimum EPT pages to map from GPA
|
||||
space into HPA space.
|
||||
|
||||
The hypervisor provides EPT guest-physical mappings adding service, EPT
|
||||
guest-physical mappings modifying/deleting service, EPT page tables
|
||||
deallocation, and EPT guest-physical mappings invalidation service.
|
||||
|
||||
Virtual MTRR
|
||||
************
|
||||
|
||||
In ACRN, the hypervisor only virtualizes MTRRs fixed range (0~1MB).
|
||||
The HV sets MTRRs of the fixed range as Write-Back for UOS, and the SOS reads
|
||||
native MTRRs of the fixed range set by BIOS.
|
||||
|
||||
If the guest physical address is not in the fixed range (0~1MB), the
|
||||
hypervisor uses the default memory type in the MTRR (Write-Back).
|
||||
|
||||
When the guest disables MTRRs, the HV sets the guest address memory type
|
||||
as UC.
|
||||
|
||||
If the guest physical address is in fixed range (0~1MB), the HV sets
|
||||
memory type according to the fixed virtual MTRRs.
|
||||
|
||||
When the guest enable MTRRs, MTRRs have no effect on the memory type
|
||||
used for access to GPA. The HV first intercepts MTRR MSR registers
|
||||
access through MSR access VM exit and updates EPT memory type field in EPT
|
||||
PTE according to the memory type selected by MTRRs. This combines with
|
||||
PAT entry in the PAT MSR (which is determined by PAT, PCD, and PWT bits
|
||||
from the guest paging structures) to determine the effective memory
|
||||
type.
|
||||
|
||||
VPID operations
|
||||
===============
|
||||
|
||||
Virtual-processor identifier (VPID) is a hardware feature to optimize
|
||||
TLB management. When VPID is enable, hardware will add a tag for TLB of
|
||||
a logical processor and cache information for multiple linear-address
|
||||
spaces. VMX transitions may retain cached information and the logical
|
||||
processor switches to a different address space, avoiding unnecessary
|
||||
TLB flushes.
|
||||
|
||||
In ACRN, an unique VPID must be allocated for each virtual CPU
|
||||
when a virtual CPU is created. The logical processor invalidates linear
|
||||
mappings and combined mapping associated with all VPIDs (except VPID
|
||||
0000H), and with all PCIDs when the logical processor launches the virtual
|
||||
CPU. The logical processor invalidates all linear mapping and combined
|
||||
mappings associated with the specified VPID when the interrupt pending
|
||||
request handling needs to invalidate cached mapping of the specified
|
||||
VPID.
|
||||
|
||||
Data Flow Design
|
||||
================
|
||||
|
||||
The memory virtualization unit includes address space translation
|
||||
functions, data transferring functions, VM EPT operations functions,
|
||||
VPID operations functions, VM exit hanging about EPT violation and EPT
|
||||
misconfiguration, and MTRR virtualization functions. This unit handles
|
||||
guest-physical mapping updates by creating or updating related EPT page
|
||||
tables. It virtualizes MTRR for guest OS by updating related EPT page
|
||||
tables. It handles address translation from GPA to HPA by walking EPT
|
||||
page tables. It copies data from VM into the HV or from the HV to VM by
|
||||
walking guest MMU page tables and EPT page tables. It provides services
|
||||
to allocate VPID for each virtual CPU and TLB invalidation related VPID.
|
||||
It handles VM exit about EPT violation and EPT misconfiguration. The
|
||||
following :numref:`mem-flow-mem-virt` describes the data flow diagram of
|
||||
the memory virtualization unit.
|
||||
|
||||
.. figure:: images/mem-image84.png
|
||||
:align: center
|
||||
:name: mem-flow-mem-virt
|
||||
|
||||
Data Flow of Hypervisor Memory Virtualization
|
||||
|
||||
Data Structure Design
|
||||
=====================
|
||||
|
||||
EPT Memory Type Definition:
|
||||
|
||||
.. doxygengroup:: ept_mem_type
|
||||
:project: Project ACRN
|
||||
:content-only:
|
||||
|
||||
EPT Memory Access Right Definition:
|
||||
|
||||
.. doxygengroup:: ept_mem_access_right
|
||||
:project: Project ACRN
|
||||
:content-only:
|
||||
|
||||
|
||||
Interfaces Design
|
||||
=================
|
||||
|
||||
The memory virtualization unit interacts with external units through VM
|
||||
exit and APIs.
|
||||
|
||||
VM Exit about EPT
|
||||
=================
|
||||
|
||||
There are two VM exit handlers for EPT violation and EPT
|
||||
misconfiguration in the hypervisor. EPT page tables are
|
||||
always configured correctly for SOS and UOS. If EPT misconfiguration is
|
||||
detected, a fatal error is reported by HV. The hypervisor
|
||||
uses EPT violation to intercept MMIO access to do device emulation. EPT
|
||||
violation handling data flow is described in the
|
||||
:ref:`instruction-emulation`.
|
||||
|
||||
Memory Virtualization APIs
|
||||
==========================
|
||||
|
||||
Here is a list of major memory related APIs in HV:
|
||||
|
||||
EPT/VPID Capability Checking
|
||||
----------------------------
|
||||
|
||||
Data Transferring between hypervisor and VM
|
||||
-------------------------------------------
|
||||
|
||||
.. doxygenfunction:: copy_from_gpa
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: copy_to_gpa
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: copy_from_gva
|
||||
:project: Project ACRN
|
||||
|
||||
Address Space Translation
|
||||
-------------------------
|
||||
|
||||
.. doxygenfunction:: gpa2hpa
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: sos_vm_hpa2gpa
|
||||
:project: Project ACRN
|
||||
|
||||
EPT
|
||||
---
|
||||
|
||||
.. doxygenfunction:: ept_add_mr
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: ept_del_mr
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: ept_modify_mr
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: destroy_ept
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: invept
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: ept_misconfig_vmexit_handler
|
||||
:project: Project ACRN
|
||||
|
||||
Virtual MTRR
|
||||
------------
|
||||
|
||||
.. doxygenfunction:: init_vmtrr
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: write_vmtrr
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: read_vmtrr
|
||||
:project: Project ACRN
|
||||
|
||||
VPID
|
||||
----
|
||||
.. doxygenfunction:: flush_vpid_single
|
||||
:project: Project ACRN
|
||||
|
||||
.. doxygenfunction:: flush_vpid_global
|
||||
:project: Project ACRN
|
||||
|
||||

Service OS Memory Management
****************************

After the ACRN hypervisor starts, it creates the Service OS as its first
VM. The Service OS runs all the native device drivers, manages the
hardware devices, and provides I/O mediation to guest VMs. The Service
OS is also in charge of memory allocation for guest VMs.

The ACRN hypervisor passes access to almost all system memory
(everything except its own part) to the Service OS. The Service OS must
be able to access all of the system memory except the hypervisor part.

Guest Physical Memory Layout - E820
===================================

The ACRN hypervisor passes the original E820 table to the Service OS
after filtering out its own part. From the Service OS's point of view,
it sees almost all of the system memory, as shown here:

.. figure:: images/mem-image3.png
   :align: center
   :width: 900px
   :name: sos-mem-layout

   SOS Physical Memory Layout

Host to Guest Mapping
=====================

The ACRN hypervisor creates the Service OS's host (HPA) to guest (GPA)
mapping (EPT mapping) through the function ``prepare_sos_vm_memmap()``
when it creates the SOS VM. It follows these rules:

- Identical mapping (HPA equals GPA)
- Map the whole memory range with UNCACHED type
- Remap RAM entries in the (revised) E820 with WRITE-BACK type
- Unmap the ACRN hypervisor memory range
- Unmap the ACRN hypervisor emulated vLAPIC/vIOAPIC MMIO range

The host to guest mapping is static for the Service OS; it does not
change after the Service OS begins running. Each native device driver
can access its MMIO through this static mapping. For the Service OS VM,
EPT violations only serve the vLAPIC/vIOAPIC emulation in the
hypervisor.
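
The mapping rules above can be pictured as a short sequence of EPT
updates. The following is an illustrative sketch only: the helper names
``ept_map``/``ept_unmap``, the constants, and the E820 walk are
hypothetical stand-ins for what ``prepare_sos_vm_memmap()`` and the
``ept_add_mr``/``ept_del_mr`` APIs listed above actually do.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical E820 entry and memory-type encodings for this sketch only. */
   struct e820_entry { uint64_t base; uint64_t length; uint32_t type; };
   #define E820_TYPE_RAM   1U
   #define MT_UNCACHED     0U
   #define MT_WRITE_BACK   6U

   /* Stand-ins for ept_add_mr()/ept_del_mr(); real signatures differ. */
   void ept_map(uint64_t gpa, uint64_t hpa, uint64_t size, uint32_t memtype);
   void ept_unmap(uint64_t gpa, uint64_t size);

   void sketch_prepare_sos_vm_memmap(const struct e820_entry *e820, uint32_t entries,
                                     uint64_t hv_base, uint64_t hv_size,
                                     uint64_t lapic_mmio, uint64_t ioapic_mmio)
   {
       /* 1. Identical (GPA == HPA) mapping of the whole range, UNCACHED. */
       ept_map(0UL, 0UL, 0x8000000000ULL, MT_UNCACHED);  /* size is illustrative */

       /* 2. Remap RAM entries from the revised E820 as WRITE-BACK. */
       for (uint32_t i = 0U; i < entries; i++) {
           if (e820[i].type == E820_TYPE_RAM) {
               ept_map(e820[i].base, e820[i].base, e820[i].length, MT_WRITE_BACK);
           }
       }

       /* 3. Unmap the hypervisor's own memory range. */
       ept_unmap(hv_base, hv_size);

       /* 4. Unmap the emulated vLAPIC/vIOAPIC MMIO pages so accesses trap. */
       ept_unmap(lapic_mmio, 0x1000UL);
       ept_unmap(ioapic_mmio, 0x1000UL);
   }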

Trusty
******

For an Android User OS, there is a secure world called the trusty
world, whose memory must be secured by the ACRN hypervisor and must not
be accessible by the SOS or by the UOS normal world.

.. figure:: images/mem-image18.png
   :align: center

   UOS Physical Memory Layout with Trusty
367
doc/developer-guides/hld/hv-partitionmode.rst
Normal file
@@ -0,0 +1,367 @@

.. _partition-mode-hld:

Partition mode
##############

ACRN is a type-1 hypervisor that supports running multiple guest
operating systems (OS). Typically, the platform BIOS/boot-loader boots
ACRN, and ACRN loads one or more guest OSes. Refer to :ref:`hv-startup`
for details on the start-up flow of the ACRN hypervisor.

ACRN supports two modes of operation: Sharing mode and Partition mode.
This document describes ACRN's high-level design for Partition mode
support.

.. contents::
   :depth: 2
   :local:

Introduction
************

In partition mode, ACRN provides guests with exclusive access to cores,
memory, cache, and peripheral devices. Partition mode enables developers
to dedicate resources exclusively among the guests. However, there is no
support today in x86 hardware or in ACRN to partition resources such as
peripheral buses (e.g., PCI) or memory bandwidth. Cache partitioning
technology, such as Cache Allocation Technology (CAT) in x86, can be
used by developers to partition the Last Level Cache (LLC) among the
guests. (Note: ACRN support for x86 CAT is on the roadmap, but not
currently supported.)

ACRN expects static partitioning of resources, either by code
modification for guest configuration or through compile-time config
options. All the devices exposed to the guests are either physical
resources or emulated in the hypervisor, so there is no need for a
device-model and Service OS. :numref:`pmode2vms` shows a partition mode
example of two VMs with exclusive access to physical resources.

.. figure:: images/partition-image3.png
   :align: center
   :name: pmode2vms

   Partition Mode example with two VMs

Guest info
**********

ACRN uses the multi-boot info passed from the platform boot-loader to
know the location of each guest kernel in memory. ACRN copies each
guest kernel into that guest's memory. The current implementation of
ACRN requires developers to specify kernel parameters for the guests as
part of the guest configuration. ACRN picks up the kernel parameters
from the guest configuration and copies them to the corresponding guest
memory.

.. figure:: images/partition-image18.png
   :align: center

ACRN set-up for guests
**********************

Cores
=====

ACRN requires the developer to specify the number of guests and the
cores dedicated to each guest. The developer also needs to specify the
physical core used as the Boot Strap Processor (BSP) for each guest. As
the processors are brought up in the hypervisor, it checks whether they
are configured as the BSP of any of the guests. If a processor is the
BSP of one of the guests, ACRN proceeds to build the memory mapping,
mptable, E820 entries, and zero page for that guest. As described in
`Guest info`_, ACRN copies the guest kernel and kernel parameters into
guest memory. :numref:`partBSPsetup` shows these events in
chronological order.

.. figure:: images/partition-image7.png
   :align: center
   :name: partBSPsetup

Memory
======

For each guest in partition mode, the ACRN developer specifies, in the
guest configuration, the size of the guest's memory and its starting
address in the host physical address space. There is no support for
HIGHMEM for partition mode guests. The developer needs to take care of
two aspects when assigning host memory to the guests:

1) The sum of the guest PCI hole and guest "System RAM" is less than 4GB.

2) Pick the starting host physical address and the size so that they do
   not overlap with any reserved regions in the host E820.

ACRN creates an EPT mapping for the guest between GPA (0, memory size)
and HPA (starting address in the guest configuration, memory size).
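
As a rough illustration of the two constraints above, the fragment below
sketches a hypothetical partition-mode guest memory configuration and
the checks a developer could apply. The structure and field names
(``start_hpa``, ``mem_size``, ``pci_hole``) are illustrative; they are
not the actual ACRN guest configuration layout.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Illustrative guest memory configuration; not the real ACRN structure. */
   struct pm_guest_mem_cfg {
       uint64_t start_hpa;   /* start of the guest's RAM in host physical space */
       uint64_t mem_size;    /* guest RAM size; GPA range is [0, mem_size)      */
       uint64_t pci_hole;    /* size of the guest PCI hole below 4GB            */
   };

   struct host_reserved { uint64_t base; uint64_t size; };

   static bool overlaps(uint64_t a_base, uint64_t a_size,
                        uint64_t b_base, uint64_t b_size)
   {
       return (a_base < (b_base + b_size)) && (b_base < (a_base + a_size));
   }

   /* Check rule 1 (RAM + PCI hole below 4GB) and rule 2 (no overlap with
    * reserved regions in the host E820). */
   static bool guest_mem_cfg_is_valid(const struct pm_guest_mem_cfg *cfg,
                                      const struct host_reserved *rsvd,
                                      uint32_t nr_rsvd)
   {
       if ((cfg->mem_size + cfg->pci_hole) >= 0x100000000ULL) {
           return false;
       }
       for (uint32_t i = 0U; i < nr_rsvd; i++) {
           if (overlaps(cfg->start_hpa, cfg->mem_size, rsvd[i].base, rsvd[i].size)) {
               return false;
           }
       }
       /* ACRN would then create the EPT mapping GPA [0, mem_size) ->
        * HPA [start_hpa, start_hpa + mem_size). */
       return true;
   }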

E820 and zero page info
=======================

A default E820 is used for all the guests in partition mode. The
following table shows the reference E820 layout. The zero page is
created with this E820 info for all the guests.

+------------------------+
| RAM                    |
|                        |
| 0 - 0xEFFFFH           |
+------------------------+
| RESERVED (MPTABLE)     |
|                        |
| 0xF0000H - 0x100000H   |
+------------------------+
| RAM                    |
|                        |
| 0x100000H - LOWMEM     |
+------------------------+
| RESERVED               |
+------------------------+
| PCI HOLE               |
+------------------------+
| RESERVED               |
+------------------------+
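
For illustration, the reference layout above could be encoded as an
array of E820 entries like the sketch below. The ``struct e820_entry``
layout, the type constants, and the ``LOWMEM``/PCI-hole boundary values
are assumptions for this example, not the actual ACRN definitions.

.. code-block:: c

   #include <stdint.h>

   /* Illustrative E820 entry layout and type codes (assumed, not ACRN's). */
   struct e820_entry {
       uint64_t baseaddr;
       uint64_t length;
       uint32_t type;
   };
   #define E820_TYPE_RAM       1U
   #define E820_TYPE_RESERVED  2U

   /* Example boundaries; in ACRN these come from the guest configuration. */
   #define GUEST_LOWMEM_END    0x20000000UL   /* "LOWMEM", e.g. 512MB of RAM */
   #define GUEST_PCI_HOLE_BASE 0xE0000000UL   /* start of the guest PCI hole */
   #define GUEST_PCI_HOLE_END  0xF0000000UL

   static const struct e820_entry guest_default_e820[] = {
       { 0x0UL,      0xF0000UL,                       E820_TYPE_RAM      },
       { 0xF0000UL,  0x10000UL,  /* mptable */        E820_TYPE_RESERVED },
       { 0x100000UL, GUEST_LOWMEM_END - 0x100000UL,   E820_TYPE_RAM      },
       { GUEST_LOWMEM_END,
         GUEST_PCI_HOLE_BASE - GUEST_LOWMEM_END,      E820_TYPE_RESERVED },
       { GUEST_PCI_HOLE_BASE,   /* PCI hole */
         GUEST_PCI_HOLE_END - GUEST_PCI_HOLE_BASE,    E820_TYPE_RESERVED },
       { GUEST_PCI_HOLE_END,
         0x100000000ULL - GUEST_PCI_HOLE_END,         E820_TYPE_RESERVED },
   };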

Platform info - mptable
=======================

ACRN, in partition mode, uses an mptable to convey platform info to
each guest. From this platform information, the number of cores used
for each guest, and whether the guest needs devices with INTX, ACRN
builds the mptable and copies it to the guest memory. In partition
mode, ACRN passes physical APIC IDs to the guests.

I/O - Virtual devices
=====================

Port I/O is supported for the PCI device config space ports 0xcf8 and
0xcfc, vUART 0x3f8, vRTC 0x70 and 0x71, and the vPIC ranges 0x20/0x21,
0xa0/0xa1, and 0x4d0/0x4d1. MMIO is supported for the vIOAPIC. ACRN
exposes a virtual host bridge at BDF (Bus:Device.Function) 0:0.0 to
each guest. Access to the 256 bytes of config space of the virtual host
bridge is emulated.

I/O - Pass-thru devices
=======================

ACRN, in partition mode, supports passing through PCI devices on the
platform. All the pass-thru devices are exposed as child devices under
the virtual host bridge. ACRN supports neither passing through bridges
nor emulating virtual bridges. Pass-thru devices should be statically
allocated to each guest using the guest configuration. ACRN expects the
developer to provide the virtual BDF to physical BDF mapping for all
the pass-thru devices as part of each guest configuration.

Run-time ACRN support for guests
********************************

ACRN, in partition mode, supports an option to pass through the LAPIC
of the physical CPUs to the guest. ACRN expects developers to specify
whether the guest needs LAPIC pass-thru in the guest configuration.
When the guest configures the vLAPIC as x2APIC, and the guest
configuration has LAPIC pass-thru enabled, ACRN passes the LAPIC to the
guest. The guest can then access the LAPIC hardware directly without
hypervisor interception. During runtime of the guest, this option
changes how ACRN supports inter-processor interrupt handling and device
interrupt handling. This is discussed in detail in the corresponding
sections below.

.. figure:: images/partition-image16.png
   :align: center


Guest SMP boot flow
===================

The core APIC IDs are reported to the guest using the mptable info. The
SMP boot flow is similar to sharing mode. Refer to :ref:`vm-startup`
for the guest SMP boot flow in ACRN. Partition mode guest startup is
the same as SOS startup in sharing mode.

Inter-processor Interrupt (IPI) Handling
========================================

Guests w/o LAPIC pass-thru
--------------------------

For guests without LAPIC pass-thru, IPIs between guest CPUs are handled
in the same way as in sharing mode of ACRN. Refer to
:ref:`virtual-interrupt-hld` for more details.

Guests w/ LAPIC pass-thru
-------------------------

ACRN supports LAPIC pass-thru if and only if the guest is using x2APIC
mode for the vLAPIC. In LAPIC pass-thru mode, writes to the Interrupt
Command Register (ICR) x2APIC MSR are intercepted. The guest writes the
IPI info, including the vector and destination APIC IDs, to the ICR.
Upon an IPI request from the guest, ACRN does a sanity check on the
destination processors programmed into the ICR. If the destination is a
valid target for the guest, ACRN sends an IPI with the same vector from
the ICR to the physical CPUs corresponding to the destination processor
info in the ICR.

.. figure:: images/partition-image14.png
   :align: center
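
The ICR interception described above can be outlined roughly as
follows. This is an illustrative sketch only; the MSR number is the
architectural x2APIC ICR (0x830), while the helper names and the
destination-validation logic are hypothetical simplifications of what
ACRN actually does.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define MSR_IA32_EXT_APIC_ICR  0x830U   /* architectural x2APIC ICR MSR */

   /* Hypothetical helpers standing in for the hypervisor's internals. */
   bool vcpu_owns_dest_apic_id(const void *vcpu, uint32_t dest_apic_id);
   uint32_t virt_to_phys_apic_id(const void *vcpu, uint32_t virt_apic_id);
   void send_physical_ipi(uint32_t phys_apic_id, uint32_t vector);

   /* Called on a VM exit caused by a guest WRMSR to the x2APIC ICR. */
   static int handle_icr_write(const void *vcpu, uint32_t msr, uint64_t value)
   {
       if (msr != MSR_IA32_EXT_APIC_ICR) {
           return -1;
       }

       uint32_t vector  = (uint32_t)(value & 0xFFUL);   /* ICR[7:0]   */
       uint32_t dest_id = (uint32_t)(value >> 32);      /* ICR[63:32] */

       /* Reject IPIs aimed at processors the guest does not own. */
       if (!vcpu_owns_dest_apic_id(vcpu, dest_id)) {
           return -1;
       }

       /* Forward the IPI with the same vector to the physical CPU. */
       send_physical_ipi(virt_to_phys_apic_id(vcpu, dest_id), vector);
       return 0;
   }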

Pass-thru device support
========================

Configuration space access
--------------------------

ACRN emulates the Configuration Space Address (0xcf8) I/O port and the
Configuration Space Data (0xcfc) I/O port for guests to access the PCI
devices' configuration space. Within the config space of a device, the
Base Address Registers (BARs), at offsets 0x10 through 0x24, provide
the information about the resources (I/O and MMIO) used by the PCI
device. ACRN virtualizes the BAR registers and, for the rest of the
config space, forwards reads and writes to the physical config space of
the pass-thru devices. Refer to the `I/O`_ section below for more
details.

.. figure:: images/partition-image1.png
   :align: center


DMA
---

ACRN developers need to statically define the pass-thru devices for
each guest using the guest configuration. For devices to DMA to/from
guest memory directly, ACRN parses the list of pass-thru devices for
each guest and creates context entries in the VT-d remapping hardware.
The EPT page tables created for the guest are used as the VT-d page
tables.

I/O
---

ACRN supports I/O for pass-thru devices with two restrictions:

1) Only MMIO is supported, so developers are required to expose I/O
   BARs as not present in the guest configuration.

2) Only the 32-bit MMIO BAR type is supported.

As the guest PCI sub-system scans the PCI bus and assigns a Guest
Physical Address (GPA) to the MMIO BAR, ACRN maps the GPA to the
address in the physical BAR of the pass-thru device using EPT. The
following timeline chart explains how PCI devices are assigned to the
guest and how BARs are mapped upon guest initialization.

.. figure:: images/partition-image13.png
   :align: center
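
The BAR remapping step can be pictured as follows. This is a minimal
sketch, assuming a hypothetical ``ept_map_mmio`` helper and a
simplified view of a 32-bit memory BAR write; the real flow lives in
ACRN's config-space emulation.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical stand-in for the hypervisor's EPT MMIO mapping primitive. */
   void ept_map_mmio(uint64_t gpa, uint64_t hpa, uint64_t size);

   struct vbar {
       uint64_t phys_base;  /* base of the physical BAR (from the real device) */
       uint64_t size;       /* BAR size, power of two                          */
       uint64_t guest_base; /* last GPA the guest programmed into the vBAR     */
   };

   /* Called when the guest writes a new base into a 32-bit memory vBAR. */
   static void vbar_write(struct vbar *bar, uint32_t new_base)
   {
       /* A real implementation would first unmap the old guest_base range. */
       bar->guest_base = (uint64_t)new_base & ~(bar->size - 1UL);

       /* Map the guest BAR GPA range onto the device's physical BAR via EPT. */
       ept_map_mmio(bar->guest_base, bar->phys_base, bar->size);
   }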

Interrupt Configuration
-----------------------

ACRN supports both legacy (INTx) and MSI interrupts for pass-thru
devices.

INTx support
~~~~~~~~~~~~

ACRN expects developers to identify the interrupt line info (config
space offset 0x3C) of the pass-thru device and build an interrupt entry
in the mptable for the corresponding guest. As the guest configures the
vIOAPIC RTE for the interrupt, ACRN writes the info from the guest RTE
into the physical IOAPIC RTE. When the guest masks the RTE in the
vIOAPIC, ACRN masks the corresponding interrupt RTE in the physical
IOAPIC. Level triggered interrupts are not supported.

MSI support
~~~~~~~~~~~

Guest reads and writes of the PCI configuration space registers used to
configure MSI interrupts (address, data, and control) are passed
through to the physical device. Refer to `Configuration space access`_
for details on how the PCI configuration space is emulated.

Virtual device support
======================

ACRN provides read-only vRTC support for partition mode guests. Writes
to the data port are discarded.

For port I/O to ports other than the vPIC, vRTC, or vUART, reads return
0xFF and writes are discarded.

Interrupt delivery
==================

Guests w/o LAPIC pass-thru
--------------------------

In partition mode of ACRN, interrupts stay disabled after a vmexit. The
processor does not take interrupts while it is executing in VMX root
mode. ACRN configures the processor to take a vmexit upon an external
interrupt if the processor is executing in VMX non-root mode. Upon an
external interrupt, after sending an EOI to the physical LAPIC, ACRN
injects the vector into the vLAPIC of the vCPU currently running on the
processor. Guests using Linux as the kernel use vectors below 0xEC for
device interrupts.

.. figure:: images/partition-image20.png
   :align: center


Guests w/ LAPIC pass-thru
-------------------------

For guests with LAPIC pass-thru, ACRN does not configure vmexit upon
external interrupts. There is no vmexit upon device interrupts, and
they are handled by the guest IDT.

Hypervisor IPI service
======================

ACRN needs IPIs for events such as flushing TLBs across CPUs, sending
virtual device interrupts (e.g., vUART to vCPUs), and others.

Guests w/o LAPIC pass-thru
--------------------------

Hypervisor IPIs work the same way as in sharing mode.

Guests w/ LAPIC pass-thru
-------------------------

Since external interrupts are passed through to the guest IDT, IPIs do
not trigger a vmexit. ACRN uses the NMI delivery mode, and NMI exiting
is enabled for the vCPUs. Upon an NMI interrupt on the target
processor, if the processor is in non-root mode, a vmexit happens on
the processor and the event mask is checked for servicing the events.

Debug Console
=============

For details on how the hypervisor console works, refer to
:ref:`hv-console`.

For a guest console in partition mode, ACRN provides an option to pass
``vmid`` as an argument to ``vm_console``. The vmid is the same as the
one the developer uses in the guest configuration.

Guests w/o LAPIC pass-thru
--------------------------

Works the same way as sharing mode.

Hypervisor Console
==================

ACRN uses the TSC deadline timer to provide timer services. The
hypervisor console uses a timer on CPU0 to poll characters on the
serial device. To support LAPIC pass-thru, the TSC deadline MSR is
passed through, and the local timer interrupt is also delivered to the
guest IDT. Instead of the TSC deadline timer, ACRN uses the VMX
preemption timer to poll the serial device.

Guest Console
=============

ACRN exposes a vUART to partition mode guests. The vUART uses the vPIC
to inject interrupts to the guest BSP. When a guest has more than one
core, at runtime the vUART might need to inject an interrupt to the
guest BSP from another core (other than the BSP). As mentioned in
`Hypervisor IPI service`_, ACRN uses the NMI delivery mode for
notifying the CPU running the BSP of the guest.
49
doc/developer-guides/hld/hv-pm.rst
Normal file
@@ -0,0 +1,49 @@

.. _pm_hld:

Power Management
################

System PM module
****************

The PM module in the hypervisor does three things:

- When all UOSes enter a low-power state, VM management notifies the
  SOS lifecycle service and triggers the SOS to enter a low-power
  state. The SOS follows its own standard low-power state entry process
  and writes the ACPI control register to put the SOS into a low-power
  state. The hypervisor traps the ACPI control register write and
  emulates SOS low-power state entry.

- Once SOS low-power emulation is done, the hypervisor handles its own
  low-power state transition.

- Once the system resumes from low-power mode, the hypervisor handles
  its own resume and emulates SOS resume too.

It is assumed that the SOS does not trigger any power state transition
until the VM manager of ACRN notifies it that all UOSes are inactive
and the SOS has offlined all its virtual APs.

:numref:`pm-low-power-transition` shows the SOS/Hypervisor low-power
state transition process. The SOS triggers the power state transition
by writing the ACPI control register on its virtual BSP (which is
pinned to the physical BSP). The hypervisor then does the following in
sequence before it writes to the physical ACPI control register to
trigger the physical power state transition (a simplified sketch of
this sequence follows the figure below):

- Pause the SOS.
- Offline all physical APs.
- Save the context of the console, the SOS ioapic, the I/O MMU, the SOS
  lapic, and the virtual BSP.
- Save the context of the physical BSP.

When exiting from low-power mode, the hypervisor does similar steps in
reverse order to restore contexts, start the APs, and resume the SOS.
The SOS is responsible for starting its own virtual APs as well as the
UOSes.

.. figure:: images/pm-image24-105.png
   :align: center
   :name: pm-low-power-transition

   SOS/Hypervisor low power state transition process
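
The suspend-side ordering above can be expressed as a short sequence.
This is only an illustrative outline; the function names are
hypothetical placeholders for the hypervisor's internal suspend
routines, and error handling is omitted.

.. code-block:: c

   /* Hypothetical suspend-path helpers; names are placeholders. */
   void pause_sos_vm(void);
   void offline_all_physical_aps(void);
   void save_console_context(void);
   void save_sos_ioapic_context(void);
   void save_iommu_context(void);
   void save_sos_lapic_context(void);
   void save_virtual_bsp_context(void);
   void save_physical_bsp_context(void);
   void write_physical_acpi_control_register(void);

   /* Ordered steps the hypervisor performs before entering the platform
    * low-power state, mirroring the list above. */
   static void enter_low_power_state(void)
   {
       pause_sos_vm();
       offline_all_physical_aps();

       save_console_context();
       save_sos_ioapic_context();
       save_iommu_context();
       save_sos_lapic_context();
       save_virtual_bsp_context();

       save_physical_bsp_context();

       /* Finally trigger the physical power state transition. */
       write_physical_acpi_control_register();
   }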
207
doc/developer-guides/hld/hv-startup.rst
Normal file
@@ -0,0 +1,207 @@

.. _hv-startup:

Hypervisor Startup
##################

This section is an overview of the ACRN hypervisor startup. The ACRN
hypervisor compiles to a 32-bit multiboot-compliant ELF file. The
bootloader (ABL or SBL) loads the hypervisor according to the addresses
specified in the ELF header. The BSP starts the hypervisor with an
initial state compliant with the multiboot 1 specification, after the
bootloader prepares the full configuration, including ACPI, E820, etc.

The HV startup has two parts: the native startup followed by VM
startup.

Native Startup
**************

.. figure:: images/hld-image107.png
   :align: center
   :name: hvstart-nativeflow

   Hypervisor Native Startup Flow

Native startup sets up a baseline environment for the HV, including
basic memory and interrupt initialization, as shown in
:numref:`hvstart-nativeflow`. Here is a short description of the flow:

- **BSP Startup:** The starting point for the bootstrap processor.

- **Relocation:** Relocate the hypervisor image if the hypervisor image
  is not placed at the assumed base address.

- **UART Init:** Initialize a pre-configured UART device used as the
  base physical console for the HV and Service OS.

- **Shell Init:** Start a command shell for the HV accessible via the
  UART.

- **Memory Init:** Initialize the memory type and cache policy, and
  create the MMU page table mapping for the HV.

- **Interrupt Init:** Initialize interrupts and exceptions for the
  native HV, including the IDT and the ``do_IRQ`` infrastructure; a
  timer interrupt framework is then built. The native/physical
  interrupts go through this ``do_IRQ`` infrastructure and are then
  distributed to specific targets (HV or VMs).

- **Start AP:** The BSP kicks off the ``INIT-SIPI-SIPI`` IPI sequence
  to start the other native APs (application processors). Each AP
  initializes its own memory and interrupts, notifies the BSP on
  completion, and enters the default idle loop.

Symbols in the hypervisor are placed with an assumed base address, but
the bootloader may not place the hypervisor at that specified base. In
such a case, the hypervisor will relocate itself to where the
bootloader loads it.

Here is a summary of the CPU and memory initial states that are set up
after native startup.

CPU
   The ACRN hypervisor brings all physical processors to 64-bit IA32e
   mode, with the assumption that the BSP starts in protected mode
   where segmentation and paging set an identity mapping of the first
   4G addresses without permission restrictions. The control registers
   and some MSRs are set as follows:

   - cr0: The following features are enabled: paging, write protection,
     protection mode, numeric error, and co-processor monitoring.

   - cr3: refer to the initial state of memory.

   - cr4: The following features are enabled: physical address
     extension, machine-check, FXSAVE/FXRSTOR, SMEP, VMX operation, and
     unmasked SIMD FP exception. The other features are disabled.

   - MSR_IA32_EFER: only IA32e mode is enabled.

   - MSR_IA32_FS_BASE: the address of the stack canary, used for
     detecting stack smashing.

   - MSR_IA32_TSC_AUX: a unique logical ID is set for each physical
     processor.

   - stack: each physical processor has a separate stack.

Memory
   All physical processors are in 64-bit IA32e mode after startup. The
   GDT holds four entries: one unused, one for code and another for
   data, both of which have a base of all 0's and a limit of all 1's,
   and the last one for the 64-bit TSS. The TSS only holds three stack
   pointers (for machine-check, double fault, and stack fault) in the
   interrupt stack table (IST), which are different across physical
   processors. The LDT is disabled.

Refer to section 3.5.2 for a detailed description of interrupt-related
initial states, including the IDT and physical PICs.

After the BSP detects that all APs are up, the BSP starts creating the
first VM, i.e. the SOS, as explained in the next section.
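
The cr0/cr4 settings listed above correspond to well-known
architectural bits. The following fragment shows, as a sketch only, how
those bit masks could be composed; the exact values ACRN writes (and
any additional bits) are defined by its own headers, not by this
example.

.. code-block:: c

   #include <stdint.h>

   /* Architectural control-register bits (Intel SDM names). */
   #define CR0_PE  (1UL << 0)    /* protection enable       */
   #define CR0_MP  (1UL << 1)    /* co-processor monitoring */
   #define CR0_NE  (1UL << 5)    /* numeric error           */
   #define CR0_WP  (1UL << 16)   /* write protect           */
   #define CR0_PG  (1UL << 31)   /* paging                  */

   #define CR4_PAE        (1UL << 5)
   #define CR4_MCE        (1UL << 6)
   #define CR4_OSFXSR     (1UL << 9)
   #define CR4_OSXMMEXCPT (1UL << 10)
   #define CR4_VMXE       (1UL << 13)
   #define CR4_SMEP       (1UL << 20)

   #define MSR_IA32_EFER  0xC0000080U
   #define EFER_LME       (1UL << 8)    /* IA32e (long) mode enable, set so
                                           that paging + LME yields IA32e mode */

   /* Illustrative masks matching the feature list above. */
   static const uint64_t cr0_init = CR0_PE | CR0_MP | CR0_NE | CR0_WP | CR0_PG;
   static const uint64_t cr4_init = CR4_PAE | CR4_MCE | CR4_OSFXSR |
                                    CR4_OSXMMEXCPT | CR4_VMXE | CR4_SMEP;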

.. _vm-startup:

VM Startup
**********

The SOS is created and launched on the physical BSP after the
hypervisor initializes itself. Meanwhile, the APs enter the default
idle loop (refer to :ref:`VCPU_lifecycle` for details), waiting for any
vCPU to be scheduled to them.

:numref:`hvstart-vmflow` illustrates the high-level execution flow of
creating and launching a VM, applicable to both SOS and UOS. One major
difference in the creation of SOS and UOS is that the SOS is created by
the hypervisor, while the creation of UOSes is triggered by the DM in
the SOS. The main steps include:

- **Create VM:** A VM structure is allocated and initialized. A unique
  VM ID is picked, the EPT is created, the I/O bitmap is set up, the
  I/O emulation handlers are initialized and registered, and the
  virtual CPUID entries are filled. For the SOS, an additional E820
  table is prepared.

- **Create vCPUs:** Create the vCPUs, assign the physical processor
  each one is pinned to, a unique-per-VM vCPU ID, and a globally unique
  VPID, and initialize their virtual LAPIC and MTRR. For the SOS, one
  vCPU is created for each physical CPU on the platform. For a UOS, the
  DM determines the number of vCPUs to be created.

- **SW Load:** The BSP of a VM also prepares each VM's SW
  configuration, including the kernel entry address, ramdisk address,
  bootargs, zero page, etc. This is done by the hypervisor for the SOS
  and by the DM for a UOS.

- **Schedule vCPUs:** The vCPUs are scheduled to the corresponding
  physical processors for execution.

- **Init VMCS:** Initialize each vCPU's VMCS for its host state, guest
  state, execution control, entry control, and exit control. This is
  the last configuration before the vCPU runs.

- **vCPU thread:** The vCPU starts to run. The "Primary CPU" starts
  running the kernel image configured during SW Load; a "Non-Primary
  CPU" waits for the INIT-SIPI-SIPI IPI sequence triggered by its
  "Primary CPU".

.. figure:: images/hld-image104.png
   :align: center
   :name: hvstart-vmflow

   Hypervisor VM Startup Flow

SW configuration for Service OS (SOS_VM):

- **ACPI**: The HV passes the entire ACPI table from the bootloader to
  the Service OS directly. Legacy mode is currently supported, as the
  ACPI table is loaded at the F-Segment.

- **E820**: The HV passes the E820 table from the bootloader through
  the multi-boot information after the HV reserved memory (32M, for
  example) is filtered out.

- **Zero Page**: The HV prepares the zero page at the high end of
  Service OS memory, which is determined by the SOS_VM guest FIT binary
  build. The zero page includes the configuration for the ramdisk,
  bootargs, and E820 entries. The zero page address is set into the
  "Primary CPU" RSI register before the VCPU gets run.

- **Entry address**: The HV copies the Service OS kernel image to
  0x1000000 as the entry address for SOS_VM's "Primary CPU". This entry
  address is set into the "Primary CPU" RIP register before the VCPU
  gets run.

SW configuration for User OS (VMx):

- **ACPI**: The virtual ACPI table is built by the DM and put at VMx's
  F-Segment. Refer to :ref:`hld-io-emulation` for details.

- **E820**: The virtual E820 table is built by the DM and then passed
  to the zero page. Refer to :ref:`hld-io-emulation` for details.

- **Zero Page**: The DM prepares the zero page at the location
  "lowmem_top - 4K" in VMx. This location is set into VMx's
  "Primary CPU" RSI register in **SW Load**.

- **Entry address**: The DM copies the User OS kernel image to
  0x1000000 as the entry address for VMx's "Primary CPU". This entry
  address is set into the "Primary CPU" RIP register before the VCPU
  gets run.

Here is the initial mode of the vCPUs:

+------------------------------+-------------------------------+
| VM and Processor Type        | Initial Mode                  |
+=============+================+===============================+
| SOS         | BSP            | Same as physical BSP          |
|             +----------------+-------------------------------+
|             | AP             | Real Mode                     |
+-------------+----------------+-------------------------------+
| UOS         | BSP            | Real Mode                     |
|             +----------------+-------------------------------+
|             | AP             | Real Mode                     |
+-------------+----------------+-------------------------------+

Note that the SOS is started with the same number of vCPUs as physical
CPUs to speed up boot. The SOS will offline the APs right before it
starts any UOS.
61
doc/developer-guides/hld/hv-timer.rst
Normal file
@@ -0,0 +1,61 @@

.. _timer-hld:

Timer
#####

Because ACRN is a flexible, lightweight reference hypervisor, we
provide limited timer management services:

- Only the LAPIC tsc-deadline timer is supported as the clock source.

- A timer can only be added on the logical CPU for a process or thread.
  Timer scheduling or timer migration are not supported.

How it works
************

When the system boots, we check that the hardware supports the LAPIC
tsc-deadline timer by checking CPUID.01H:ECX.TSC_Deadline[bit 24]. If
support is missing, we output an error message and panic the
hypervisor. If supported, we register the timer interrupt callback that
raises a timer softirq on each logical CPU and set the LAPIC timer mode
to tsc-deadline timer mode by writing the local APIC LVT register.
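
The detection and mode-selection steps described above boil down to one
CPUID check and one LVT Timer register write. The sketch below uses
only architectural facts (CPUID leaf 01H, ECX bit 24; the LVT Timer
register at xAPIC offset 0x320 with mode bits [18:17] = 10b for
TSC-deadline); the surrounding helper names are illustrative, not
ACRN's.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define LVT_TIMER_OFFSET      0x320U       /* xAPIC LVT Timer register */
   #define LVT_TIMER_MODE_TSCDL  (2U << 17)   /* mode bits [18:17] = 10b  */

   /* Illustrative stand-ins for the hypervisor's own primitives. */
   void lapic_write32(uint32_t offset, uint32_t value);
   uint32_t lapic_read32(uint32_t offset);
   void panic(const char *msg);

   static bool cpu_has_tsc_deadline(void)
   {
       uint32_t eax = 1U, ebx, ecx, edx;

       __asm__ volatile ("cpuid"
                         : "+a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                         : : );
       return (ecx & (1U << 24)) != 0U;       /* CPUID.01H:ECX.TSC_Deadline */
   }

   static void timer_hw_init(uint32_t timer_vector)
   {
       if (!cpu_has_tsc_deadline()) {
           panic("TSC deadline timer is not supported");
       }

       /* Select TSC-deadline mode and the interrupt vector in the LVT Timer. */
       uint32_t lvt = lapic_read32(LVT_TIMER_OFFSET);
       lvt &= ~0xFFU;                         /* clear the old vector         */
       lvt |= timer_vector | LVT_TIMER_MODE_TSCDL;
       lapic_write32(LVT_TIMER_OFFSET, lvt);
   }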

Data Structures and APIs
************************

Interfaces Design
=================

.. doxygenfunction:: initialize_timer
   :project: Project ACRN

.. doxygenfunction:: timer_expired
   :project: Project ACRN

.. doxygenfunction:: add_timer
   :project: Project ACRN

.. doxygenfunction:: del_timer
   :project: Project ACRN

.. doxygenfunction:: timer_init
   :project: Project ACRN

.. doxygenfunction:: calibrate_tsc
   :project: Project ACRN

.. doxygenfunction:: us_to_ticks
   :project: Project ACRN

.. doxygenfunction:: ticks_to_us
   :project: Project ACRN

.. doxygenfunction:: ticks_to_ms
   :project: Project ACRN

.. doxygenfunction:: rdtsc
   :project: Project ACRN

.. doxygenfunction:: get_tsc_khz
   :project: Project ACRN
264
doc/developer-guides/hld/hv-virt-interrupt.rst
Normal file
@@ -0,0 +1,264 @@

.. _virtual-interrupt-hld:

Virtual Interrupt
#################

This section introduces ACRN guest virtual interrupt management, which
includes:

- vCPU request for virtual interrupt kick off,
- vPIC/vIOAPIC/vLAPIC for virtual interrupt injection interfaces,
- physical-to-virtual interrupt mapping for a pass-thru device, and
- the process of VMX interrupt/exception injection.

A guest VM never owns any physical interrupts. All interrupts received
by the Guest OS come from a virtual interrupt injected by the vLAPIC,
vIOAPIC, or vPIC. Such virtual interrupts are triggered either from a
pass-through device or from I/O mediators in the SOS via hypercalls.
Section 3.8.6 introduces how the hypervisor manages the mapping between
physical and virtual interrupts for pass-through devices.

Emulation for devices is inside the SOS user space device model, i.e.,
acrn-dm. However, for performance considerations, the vLAPIC, vIOAPIC,
and vPIC are emulated inside the HV directly.

From the guest OS point of view, the vPIC uses Virtual Wire Mode via
the vIOAPIC. The symmetric I/O mode is shown in
:numref:`pending-virt-interrupt` later in this section.

The following command line options to a guest Linux affect whether it
uses the PIC or IOAPIC:

- **Kernel boot param with vPIC**: add "maxcpu=0"; the Guest OS will
  use the PIC.
- **Kernel boot param with vIOAPIC**: add "maxcpu=1" (any value other
  than "0"); the Guest OS will use the IOAPIC and keep IOAPIC pin2 as
  the source of the PIC.

vCPU Request for Interrupt Injection
************************************

The vCPU request mechanism (described in :ref:`pending-request-handlers`)
is leveraged to inject interrupts to a certain vCPU. As mentioned in
:ref:`ipi-management`, physical vector 0xF0 is used to kick the vCPU
out of its VMX non-root mode, and is used to make a request for virtual
interrupt injection or other requests such as flushing the EPT.

The event IDs supported for virtual interrupt injection include:

.. doxygengroup:: virt_int_injection
   :project: Project ACRN
   :content-only:


The *vcpu_make_request* call is necessary for a virtual interrupt
injection. If the target vCPU is running under VMX non-root mode, it
sends an IPI to kick it out, which leads to an external-interrupt
VM-Exit. In some cases there is no need to send an IPI when making a
request, because the CPU making the request is itself the target vCPU.
For example, a #GP exception request always happens on the current CPU
when it finds that an invalid emulation has happened. An external
interrupt for a pass-thru device always happens on the vCPU that the
device belongs to, so after it triggers an external-interrupt VM-Exit,
the current CPU is also the target vCPU.
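
As a usage illustration, an interrupt source would first update the
vLAPIC (or queue an exception) and then raise a request so the target
vCPU picks it up on its next VM entry. The sketch below is hedged:
``vcpu_make_request()`` and the ``ACRN_REQUEST_EVENT`` event ID are
named in this section, but the surrounding types and the exact
signatures shown here are simplified assumptions.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Simplified declarations for this sketch only; see the doxygen
    * entries in this section for the real interfaces. */
   struct acrn_vcpu;
   #define ACRN_REQUEST_EVENT  0U   /* placeholder value for the event ID */

   void vlapic_set_intr(struct acrn_vcpu *vcpu, uint32_t vector, bool level); /* assumed helper */
   void vcpu_make_request(struct acrn_vcpu *vcpu, uint16_t eventid);

   /* Inject an edge-triggered virtual vector into a target vCPU. */
   static void inject_virtual_vector(struct acrn_vcpu *vcpu, uint32_t vector)
   {
       /* 1. Record the pending vector in the target vCPU's vLAPIC state. */
       vlapic_set_intr(vcpu, vector, false);

       /* 2. Raise the request; if the vCPU is in non-root mode this sends
        *    the 0xF0 kick IPI so the interrupt is evaluated on VM entry. */
       vcpu_make_request(vcpu, ACRN_REQUEST_EVENT);
   }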

Virtual LAPIC
*************

The LAPIC is virtualized for all guest types: SOS and UOS. Given
support by the physical processor, APICv Virtual Interrupt Delivery
(VID) is enabled and will support the Posted-Interrupt feature.
Otherwise, it falls back to the legacy virtual interrupt injection
mode.

The vLAPIC provides the same features as the native LAPIC:

- Vector mask/unmask
- Virtual vector injection (level or edge trigger mode) to a vCPU
- vIOAPIC notification of EOI processing
- TSC Timer service
- CR8 support to update the TPR
- INIT/STARTUP handling

vLAPIC APIs
===========

APIs are provided when an interrupt source from the vLAPIC needs to
inject an interrupt, for example:

- from an LVT source such as the LAPIC timer
- from the vIOAPIC for a pass-thru device interrupt
- from an emulated device for an MSI

These APIs finish by making a request for *ACRN_REQUEST_EVENT*.

.. doxygenfunction:: vlapic_set_local_intr
   :project: Project ACRN

.. doxygenfunction:: vlapic_intr_msi
   :project: Project ACRN

.. doxygenfunction:: apicv_get_pir_desc_paddr
   :project: Project ACRN

EOI processing
==============

EOI virtualization is enabled if APICv virtual interrupt delivery is
supported. Except for level triggered interrupts, the VM will not exit
on an EOI.

Without APICv virtual interrupt delivery support, the vLAPIC requires
an EOI from the Guest OS whenever a vector is acknowledged and
processed by the guest. The vLAPIC behavior is the same as the HW
LAPIC. Once an EOI is received, it clears the highest priority vector
in the ISR and TMR, and updates the PPR status. The vLAPIC then
notifies the vIOAPIC if the corresponding vector comes from the
vIOAPIC. This only occurs for level triggered interrupts.

LAPIC passthrough based on vLAPIC
=================================

LAPIC passthrough is supported based on the vLAPIC, after the switch to
x2APIC mode. With LAPIC passthrough based on the vLAPIC, the system has
the following characteristics:

* IRQs received by the LAPIC can be handled by the Guest VM without a
  ``vmexit``
* The Guest VM always sees virtual LAPIC IDs, for security reasons
* Most MSRs are directly accessible from the Guest VM except for
  ``XAPICID``, ``LDR``, and ``ICR``. Write operations to ``ICR`` are
  trapped to avoid malicious IPIs. Read operations to ``XAPICID`` and
  ``LDR`` are trapped so that the Guest VM always sees the virtual
  LAPIC IDs instead of the physical ones.

Virtual IOAPIC
**************

The vIOAPIC is emulated by the HV when the Guest accesses the MMIO GPA
range 0xFEC00000-0xFEC01000. The vIOAPIC for the SOS should match the
native HW IOAPIC pin numbers. The vIOAPIC for a UOS provides 48 pins.
As the vIOAPIC is always associated with the vLAPIC, a virtual
interrupt injection from the vIOAPIC finally triggers a request for a
vLAPIC event by calling the vLAPIC APIs.

**Supported APIs:**

.. doxygenfunction:: vioapic_set_irqline_lock
   :project: Project ACRN

.. doxygenfunction:: vioapic_set_irqline_nolock
   :project: Project ACRN

Virtual PIC
***********

The vPIC is required for TSC calculation. Normally a UOS boots with the
vIOAPIC and vPIC as the sources of external interrupts to the Guest. On
every VM Exit, the HV checks whether there are any pending external PIC
interrupts. vPIC API usage is similar to the vIOAPIC.

The ACRN hypervisor emulates a vPIC for each VM based on the I/O ranges
0x20~0x21, 0xa0~0xa1, and 0x4d0~0x4d1.

If an interrupt source from the vPIC needs to inject an interrupt, the
following API needs to be called, which finally makes a request for
*ACRN_REQUEST_EXTINT* or *ACRN_REQUEST_EVENT*:

.. doxygenfunction:: vpic_set_irqline
   :project: Project ACRN

The following APIs are used to query the vector to be injected and to
ACK the service (i.e., move the interrupt from the interrupt request
register - IRR to the in-service register - ISR):

.. doxygenfunction:: vpic_pending_intr
   :project: Project ACRN

.. doxygenfunction:: vpic_intr_accepted
   :project: Project ACRN

Virtual Exception
*****************

When doing emulation, an exception may need to be triggered in the
hypervisor, for example:

- the guest accesses an invalid vMSR register, and the hypervisor needs
  to inject a #GP, or
- during instruction emulation, an instruction fetch may access a
  non-existent page from ``rip_gva``; in that case a #PF needs to be
  injected.

The ACRN hypervisor implements virtual exception injection using these
APIs:

.. doxygenfunction:: vcpu_queue_exception
   :project: Project ACRN

.. doxygenfunction:: vcpu_inject_extint
   :project: Project ACRN

.. doxygenfunction:: vcpu_inject_nmi
   :project: Project ACRN

.. doxygenfunction:: vcpu_inject_gp
   :project: Project ACRN

.. doxygenfunction:: vcpu_inject_pf
   :project: Project ACRN

.. doxygenfunction:: vcpu_inject_ud
   :project: Project ACRN

.. doxygenfunction:: vcpu_inject_ss
   :project: Project ACRN

The ACRN hypervisor uses the *vcpu_inject_gp/vcpu_inject_pf* functions
to queue an exception request, and follows SDM Vol. 3, Section 6.15,
Table 6-5 to generate a double fault if the condition is met.
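
For example, an MSR-emulation path would typically react to an access
to an unknown vMSR by injecting #GP, as outlined below. The
``vcpu_inject_gp`` name comes from the API list above; the error-code
argument and the surrounding MSR-handler shape are simplified
assumptions for this sketch.

.. code-block:: c

   #include <stdint.h>

   struct acrn_vcpu;                       /* opaque for this sketch */
   void vcpu_inject_gp(struct acrn_vcpu *vcpu, uint32_t err_code); /* assumed signature */

   /* Hypothetical RDMSR emulation outline. */
   static int emulate_rdmsr(struct acrn_vcpu *vcpu, uint32_t msr, uint64_t *val)
   {
       switch (msr) {
       case 0x1B0U:                        /* e.g. an emulated MSR */
           *val = 0UL;
           return 0;
       default:
           /* Unknown/invalid vMSR: inject #GP(0) into the guest instead of
            * returning a value. */
           vcpu_inject_gp(vcpu, 0U);
           return -1;
       }
   }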

Virtual Interrupt Injection
***************************

The source of a virtual interrupt is either the DM or an assigned
device.

- **For SOS assigned devices**: All devices are assigned to the SOS
  directly. Whenever a device raises a physical interrupt, the
  corresponding virtual interrupt is injected to the SOS via the
  vLAPIC/vIOAPIC. The SOS does not use the vPIC and does not have
  emulated devices. See section 3.8.5 Device assignment.

- **For UOS assigned devices**: Only PCI devices can be assigned to a
  UOS. Virtual interrupt injection follows the same path as for the
  SOS. A virtual interrupt injection operation is triggered when the
  device's physical interrupt occurs.

- **For UOS emulated devices**: The DM (acrn-dm) is responsible for the
  interrupt lifecycle management of UOS emulated devices. The DM knows
  when an emulated device needs to assert a virtual IOAPIC/PIC pin or
  needs to send a virtual MSI vector to the Guest. This logic is
  entirely handled by the DM.

.. figure:: images/virtint-image64.png
   :align: center
   :name: pending-virt-interrupt

   Handle pending virtual interrupt

Before APICv virtual interrupt delivery, a virtual interrupt can be
injected only if the guest allows interrupts. There are many cases in
which the Guest ``RFLAGS.IF`` is cleared and the guest will not accept
any further interrupts. The HV checks for an available Guest IRQ window
before injection.

An NMI is an unmasked interrupt, and its injection is always allowed
regardless of the guest IRQ window status. If the IRQ window is not
currently open, the HV enables
``MSR_IA32_VMX_PROCBASED_CTLS_IRQ_WIN (PROCBASED_CTRL.bit[2])`` and
does a VM Enter directly. The injection is then done on the next VM
Exit, once the Guest issues ``STI (GuestRFLAG.IF=1)``.
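
The IRQ-window handling above amounts to a guard before the actual
injection. The following is a minimal sketch, assuming hypothetical
accessors for the VMCS primary processor-based execution controls;
bit 2 (interrupt-window exiting) is the architectural control named in
the text.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define PROCBASED_CTRL_IRQ_WIN  (1U << 2)  /* interrupt-window exiting */

   /* Hypothetical VMCS / injection helpers for this sketch. */
   uint32_t exec_vmread32_proc_ctrl(void);
   void exec_vmwrite32_proc_ctrl(uint32_t value);
   bool guest_irq_window_open(void);          /* RFLAGS.IF set, no blocking */
   void inject_pending_vector(void);

   static void try_inject_virtual_interrupt(void)
   {
       if (guest_irq_window_open()) {
           inject_pending_vector();
       } else {
           /* Ask the CPU to VM-exit as soon as the guest can take interrupts
            * (e.g. right after STI); the injection is retried on that exit. */
           uint32_t ctrl = exec_vmread32_proc_ctrl();
           exec_vmwrite32_proc_ctrl(ctrl | PROCBASED_CTRL_IRQ_WIN);
       }
   }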

Data structures and interfaces
******************************

There is no data structure exported to the other components in the
hypervisor for virtual interrupts. The APIs listed in the previous
sections are meant to be called whenever a virtual interrupt should be
injected or acknowledged.
327
doc/developer-guides/hld/hv-vt-d.rst
Normal file
@@ -0,0 +1,327 @@

.. _vt-d-hld:

VT-d
####

VT-d stands for Intel Virtualization Technology for Directed I/O. It
provides hardware capabilities to assign I/O devices to VMs and to
extend the protection and isolation properties of VMs for I/O
operations.

VT-d provides the following main functions:

- **DMA remapping**: for supporting address translations for DMA from
  devices.

- **Interrupt remapping**: for supporting isolation and routing of
  interrupts from devices and external interrupt controllers to
  appropriate VMs.

- **Interrupt posting**: for supporting direct delivery of virtual
  interrupts from devices and external controllers to virtual
  processors.

The ACRN hypervisor supports DMA remapping, which provides address
translation capability for PCI pass-through devices, and second-level
translation, which applies to requests-without-PASID. ACRN does not
support first-level / nested translation.

DMAR Engines Discovery
**********************

DMA Remapping Report ACPI table
===============================

For generic platforms, the ACRN hypervisor retrieves DMAR information
from the ACPI table, and parses the DMAR reporting structure to
discover the number of DMA-remapping hardware units present in the
platform as well as the devices under the scope of each remapping
hardware unit, as shown in :numref:`dma-remap-report`:

.. figure:: images/vt-d-image90.png
   :align: center
   :name: dma-remap-report

   DMA Remapping Reporting Structure

Pre-parsed DMAR information
===========================

For specific platforms, the ACRN hypervisor uses pre-parsed DMA
remapping reporting information directly, to save time during
hypervisor boot-up.

DMA remapping unit for integrated graphics device
=================================================

Generally, there is a dedicated remapping hardware unit for the Intel
integrated graphics device. ACRN implements GVT-g for graphics, but
GVT-g is not compatible with VT-d. The remapping hardware unit for the
graphics device is disabled on ACRN if GVT-g is enabled. If the
graphics device needs to be passed through to a VM, then the remapping
hardware unit must be enabled.

DMA Remapping
*************

DMA remapping hardware is used to isolate device access to memory,
enabling each device in the system to be assigned to a specific domain
through a distinct set of paging structures.

Domains
=======

A domain is abstractly defined as an isolated environment in the
platform, to which a subset of the host physical memory is allocated.
The memory resources of a domain are specified by its address
translation tables.

Device to Domain Mapping Structure
==================================

VT-d hardware uses the root-table and context-tables to build the
mapping between devices and domains, as shown in
:numref:`vt-d-mapping`.

.. figure:: images/vt-d-image44.png
   :align: center
   :name: vt-d-mapping

   Device to Domain Mapping structures

The root-table is 4-KByte in size and contains 256 root-entries to
cover the PCI bus number space (0-255). Each root-entry contains a
context-table pointer that references the context-table for the
devices on the bus identified by the root-entry, if the present flag of
the root-entry is set.

Each context-table contains 256 entries, with each entry corresponding
to a PCI device function on the bus. For a PCI device, the device and
function numbers (8 bits) are used to index into the context-table.
Each context-entry contains a Second-level Page-table Pointer, which
provides the host physical address of the address translation structure
in system memory to be used for remapping requests-without-PASID
processed through the context-entry.

For a given Bus, Device, and Function combination, as shown in
:numref:`bdf-passthru`, a pass-through device can be associated with
the address translation structures of a domain.

.. figure:: images/vt-d-image19.png
   :align: center
   :name: bdf-passthru

   BDF Format of Pass-through Device

Refer to the `VT-d spec`_ for more details on the device to domain
mapping structures.

.. _VT-d spec:
   https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf
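
The bus/device/function indexing described above can be shown with a
small lookup sketch. The table walk here follows the VT-d architecture
(root-table indexed by bus, context-table indexed by device:function);
the structure layouts are reduced to bare pointers and a present bit
and are not the real hardware formats.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Reduced, illustrative views of the VT-d tables (not the real layouts). */
   struct context_entry { uint64_t slptptr; uint8_t present; }; /* 2nd-level PT ptr */
   struct context_table { struct context_entry entry[256]; };
   struct root_entry    { struct context_table *ctx_table; uint8_t present; };
   struct root_table    { struct root_entry entry[256]; };

   /* Return the second-level page-table pointer (the domain's translation
    * structure) for a device at bus:dev.func, or 0 if not present. */
   static uint64_t lookup_slpt(const struct root_table *root,
                               uint8_t bus, uint8_t dev, uint8_t func)
   {
       const struct root_entry *re = &root->entry[bus];
       if ((re->present == 0U) || (re->ctx_table == NULL)) {
           return 0UL;
       }

       uint8_t devfn = (uint8_t)((dev << 3) | (func & 0x7U));  /* 8-bit index */
       const struct context_entry *ce = &re->ctx_table->entry[devfn];

       return (ce->present != 0U) ? ce->slptptr : 0UL;
   }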

Address Translation Structures
==============================

On ACRN, the EPT table of a domain is used as the address translation
structure for the devices assigned to that domain, as shown in
:numref:`vt-d-DMA`.

.. figure:: images/vt-d-image40.png
   :align: center
   :name: vt-d-DMA

   DMA Remapping Diagram

When a device attempts to access system memory, the DMA remapping
hardware intercepts the access, uses the EPT table of the domain to
determine whether the access is allowed, and translates the DMA address
according to the EPT table from a guest physical address (GPA) to a
host physical address (HPA).

Domains and Memory Isolation
============================

There are no DMA operations inside the hypervisor, so ACRN doesn't
create a domain for the hypervisor. No DMA operations from pass-through
devices can access the hypervisor memory.

ACRN treats each virtual machine (VM) as a separate domain. For a VM,
there is an EPT table for the Normal world, and there may be an EPT
table for the Secure World. The Secure world can access the Normal
World's memory, but the Normal world cannot access the Secure World's
memory.

SOS_VM domain
   The SOS_VM domain is created when the hypervisor creates the VM for
   the Service OS.

   The IOMMU uses the EPT table of the Normal world of SOS_VM as the
   address translation structure for the devices in the SOS_VM domain.
   The Normal world's EPT table of SOS_VM doesn't include the memory
   resources of the hypervisor or of the Secure worlds, if any. So the
   devices in the SOS_VM domain can't access the memory belonging to
   the hypervisor or the secure worlds.

Other domains
   Other VM domains are created when the hypervisor creates a User OS,
   one domain for each User OS.

   The IOMMU uses the EPT table of the Normal world of a VM as the
   address translation structure for the devices in that domain. The
   Normal world's EPT table of the VM only allows devices to access the
   memory allocated for the Normal world of the VM.

Page-walk coherency
===================

For VT-d hardware that doesn't support page-walk coherency, the
hypervisor needs to make sure the updates of the VT-d tables are synced
in memory:

- Device to Domain Mapping structures, including root-entries and
  context-entries

- The EPT table of a VM.

ACRN flushes the related cache lines after updates of these structures
if the VT-d hardware doesn't support page-walk coherency.

Super-page support
==================

ACRN VT-d reuses the EPT table as the address translation table. The
VT-d capability for super-page support should be identical with the
page sizes used in the EPT table.

Snoop control
=============

If the VT-d hardware supports snoop control, it allows VT-d to control
whether to ignore the "no-snoop attribute" in PCI-E transactions.

The following table shows the snoop behavior of a DMA operation,
controlled by the combination of:

- The Snoop Control capability of the VT-d DMAR unit
- The setting of the SNP field in the leaf PTE
- The no-snoop attribute in the PCI-e request

.. list-table::
   :widths: 25 25 25 25
   :header-rows: 1

   * - SC cap of VT-d
     - SNP field in leaf PTE
     - No-snoop attribute in request
     - Snoop behavior

   * - 0
     - 0 (must be 0)
     - no snoop
     - No snoop

   * - 0
     - 0 (must be 0)
     - snoop
     - Snoop

   * - 1
     - 1
     - snoop / no snoop
     - Snoop

   * - 1
     - 0
     - no snoop
     - No snoop

   * - 1
     - 0
     - snoop
     - Snoop

ACRN enables Snoop Control by default, if all enabled VT-d DMAR units
support Snoop Control, by setting bit 11 of the leaf PTE of the EPT
table. Bit 11 of the leaf PTE of the EPT is ignored by the MMU, so
there is no side effect for the MMU.

If one of the enabled VT-d DMAR units doesn't support Snoop Control,
then bit 11 of the leaf PTE of the EPT is not set, since the field is
treated as reserved (0) by VT-d hardware implementations that do not
support Snoop Control.

Initialization
**************

During hypervisor initialization, ACRN registers the DMAR units on the
platform according to the pre-parsed information or the DMAR table.
There may be multiple DMAR units on the platform; ACRN allows some of
the DMAR units to be ignored. If some DMAR unit(s) are marked as
ignored, they will not be enabled.

The hypervisor creates the SOS_VM domain using the Normal World's EPT
table of SOS_VM as the address translation table when creating SOS_VM
as the Service OS, and all PCI devices on the platform are added to the
SOS_VM domain. DMAR translation is then enabled for the DMAR unit(s)
that are not marked as ignored.

Device assignment
*****************

All devices are initially added to the SOS_VM domain. To assign a
device means to assign the device to a User OS. The device is removed
from the SOS_VM domain and added to the VM domain related to the User
OS, which changes the address translation table for the device from the
EPT of SOS_VM to the EPT of the User OS.

To unassign a device means to unassign the device from a User OS. The
device is removed from the VM domain related to the User OS, then added
back to the SOS_VM domain, which changes the address translation table
for the device from the EPT of the User OS to the EPT of SOS_VM.
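
Assignment and unassignment are therefore symmetric moves between IOMMU
domains. The sketch below illustrates that symmetry; it assumes a
``move_pt_device()``-style call taking a source domain, a destination
domain, and the device's bus/devfun (the real prototype is documented
in the API section at the end of this page and may differ).

.. code-block:: c

   #include <stdint.h>

   struct iommu_domain;   /* opaque for this sketch */

   /* Assumed shape of the domain-move primitive; see the doxygen entry below. */
   int32_t move_pt_device(const struct iommu_domain *from, struct iommu_domain *to,
                          uint8_t bus, uint8_t devfun);

   /* Assign: SOS_VM domain -> User OS domain. */
   static int32_t assign_to_uos(struct iommu_domain *sos_domain,
                                struct iommu_domain *uos_domain,
                                uint8_t bus, uint8_t devfun)
   {
       return move_pt_device(sos_domain, uos_domain, bus, devfun);
   }

   /* Unassign: User OS domain -> back to SOS_VM domain. */
   static int32_t unassign_from_uos(struct iommu_domain *sos_domain,
                                    struct iommu_domain *uos_domain,
                                    uint8_t bus, uint8_t devfun)
   {
       return move_pt_device(uos_domain, sos_domain, bus, devfun);
   }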

Power Management support for S3
*******************************

During platform S3 suspend and resume, the VT-d register values will be
lost. ACRN VT-d provides APIs to be called during S3 suspend and
resume.

During S3 suspend, some register values are saved in memory, and DMAR
translation is disabled. During S3 resume, the saved register values
are restored, the root table address register is set, and DMAR
translation is enabled.

All the operations for S3 suspend and resume are performed on all DMAR
units on the platform, except for the DMAR units marked as ignored.

Error Handling
**************

ACRN VT-d supports DMA remapping error reporting. ACRN VT-d requests an
IRQ / vector for DMAR error reporting, and a DMAR fault handler is
registered for that IRQ. The DMAR unit supports reporting fault events
via MSI. When a fault event occurs, an MSI is generated so that the
DMAR fault handler is called to report the error event.

Data structures and interfaces
******************************

initialization and deinitialization
===================================

The following APIs are provided during initialization and
deinitialization:

.. doxygenfunction:: init_iommu
   :project: Project ACRN

runtime
=======

The following APIs are provided during runtime:

.. doxygenfunction:: create_iommu_domain
   :project: Project ACRN

.. doxygenfunction:: destroy_iommu_domain
   :project: Project ACRN

.. doxygenfunction:: suspend_iommu
   :project: Project ACRN

.. doxygenfunction:: resume_iommu
   :project: Project ACRN

.. doxygenfunction:: move_pt_device
   :project: Project ACRN
BIN  doc/developer-guides/hld/images/APL_GVT-g-DM.png  (new file, 81 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-access-patterns.png  (new file, 26 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-api-forwarding.png  (new file, 72 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-arch.png  (new file, 60 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-direct-display.png  (new file, 34 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-display-virt.png  (new file, 173 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-full-pic.png  (new file, 201 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-graphics-arch.png  (new file, 14 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-hyper-dma.png  (new file, 147 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-indirect-display.png  (new file, 35 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-interrupt-virt.png  (new file, 58 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-ive-use-case.png  (new file, 117 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-mem-part.png  (new file, 101 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-pass-through.png  (new file, 71 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-per-vm-shadow.png  (new file, 450 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-perf-critical.png  (new file, 62 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-plane-based.png  (new file, 76 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-scheduling-policy.png  (new file, 75 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-scheduling.png  (new file, 86 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-shared-shadow.png  (new file, 34 KiB)
BIN  doc/developer-guides/hld/images/APL_GVT-g-workload.png  (new file, 32 KiB)
BIN  doc/developer-guides/hld/images/config-image103.png  (new file, 38 KiB)
BIN  doc/developer-guides/hld/images/console-image41.png  (new file, 8.1 KiB)
BIN  doc/developer-guides/hld/images/console-image93.png  (new file, 20 KiB)
BIN  doc/developer-guides/hld/images/dm-image108.png  (new file, 41 KiB)
BIN  doc/developer-guides/hld/images/dm-image26.png  (new file, 36 KiB)
BIN  doc/developer-guides/hld/images/dm-image33.png  (new file, 60 KiB)
BIN  doc/developer-guides/hld/images/dm-image36.png  (new file, 37 KiB)
BIN  doc/developer-guides/hld/images/dm-image43.png  (new file, 30 KiB)
BIN  doc/developer-guides/hld/images/dm-image52.png  (new file, 50 KiB)
BIN  doc/developer-guides/hld/images/dm-image74.png  (new file, 18 KiB)
BIN  doc/developer-guides/hld/images/dm-image75.png  (new file, 54 KiB)
BIN  doc/developer-guides/hld/images/dm-image80.png  (new file, 27 KiB)
BIN  doc/developer-guides/hld/images/dm-image83.png  (new file, 80 KiB)
BIN  doc/developer-guides/hld/images/dm-image94.png  (new file, 23 KiB)
BIN  doc/developer-guides/hld/images/dm-image96.png  (new file, 23 KiB)
BIN  doc/developer-guides/hld/images/dm-image97.png  (new file, 22 KiB)
BIN  doc/developer-guides/hld/images/dm-image99.png  (new file, 14 KiB)
BIN  doc/developer-guides/hld/images/hld-image104.png  (new file, 20 KiB)
BIN  doc/developer-guides/hld/images/hld-image107.png  (new file, 25 KiB)
BIN  doc/developer-guides/hld/images/hld-image17.png  (new file, 56 KiB)
BIN  doc/developer-guides/hld/images/hld-image35.png  (new file, 56 KiB)
BIN  doc/developer-guides/hld/images/hld-image38.png  (new file, 18 KiB)
BIN  doc/developer-guides/hld/images/hld-image47.png  (new file, 42 KiB)
BIN  doc/developer-guides/hld/images/hld-image68.png  (new file, 28 KiB)
BIN  doc/developer-guides/hld/images/hld-image7.png  (new file, 67 KiB)
BIN  doc/developer-guides/hld/images/hld-image82.png  (new file, 33 KiB)
BIN  doc/developer-guides/hld/images/hld-pm-image28.png  (new file, 57 KiB)
BIN  doc/developer-guides/hld/images/hld-pm-image62.png  (new file, 21 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image2.png  (new file, 20 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image3.png  (new file, 24 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image39.png  (new file, 115 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image46.png  (new file, 45 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image48.png  (new file, 127 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image66.png  (new file, 58 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image76.png  (new file, 99 KiB)
BIN  doc/developer-guides/hld/images/interrupt-image89.png  (new file, 11 KiB)
BIN  doc/developer-guides/hld/images/ioc-image1.png  (new file, 82 KiB)
BIN  doc/developer-guides/hld/images/ioc-image10.png  (new file, 44 KiB)
BIN  doc/developer-guides/hld/images/ioc-image12.png  (new file, 8.8 KiB)
BIN  doc/developer-guides/hld/images/ioc-image13.png  (new file, 37 KiB)
BIN  doc/developer-guides/hld/images/ioc-image14.png  (new file, 24 KiB)
BIN  doc/developer-guides/hld/images/ioc-image15.png  (new file, 56 KiB)
BIN  doc/developer-guides/hld/images/ioc-image16.png  (new file, 46 KiB)
BIN  doc/developer-guides/hld/images/ioc-image17.png  (new file, 31 KiB)
BIN  doc/developer-guides/hld/images/ioc-image18.png  (new file, 18 KiB)
BIN  doc/developer-guides/hld/images/ioc-image19.png  (new file, 58 KiB)
BIN  doc/developer-guides/hld/images/ioc-image2.png  (new file, 31 KiB)
BIN  doc/developer-guides/hld/images/ioc-image20.png  (new file, 53 KiB)
BIN  doc/developer-guides/hld/images/ioc-image21.png  (new file, 62 KiB)
BIN  doc/developer-guides/hld/images/ioc-image22.png  (new file, 58 KiB)
BIN  doc/developer-guides/hld/images/ioc-image24.png  (new file, 14 KiB)
BIN  doc/developer-guides/hld/images/ioc-image3.png  (new file, 15 KiB)