mirror of https://github.com/projectacrn/acrn-hypervisor.git (synced 2025-06-18 19:57:31 +00:00)

doc: copy editing in the hld topics

Signed-off-by: Benjamin Fitch <benjamin.fitch@intel.com>

parent 5df3455e8d
commit 2c4249fb96
@@ -4,11 +4,11 @@ AHCI Virtualization in Device Model
 ###################################

 AHCI (Advanced Host Controller Interface) is a hardware mechanism
-that allows software to communicate with Serial ATA devices. AHCI HBA
+that enables software to communicate with Serial ATA devices. AHCI HBA
 (host bus adapters) is a PCI class device that acts as a data movement
 engine between system memory and Serial ATA devices. The ACPI HBA in
 ACRN supports both ATA and ATAPI devices. The architecture is shown in
-the below diagram.
+the diagram below:

 .. figure:: images/ahci-image1.png
    :align: center
@@ -16,17 +16,17 @@ the below diagram.
    :name: achi-device

 HBA is registered to the PCI system with device id 0x2821 and vendor id
-0x8086. Its memory registers are mapped in BAR 5. It only supports 6
-ports (refer to ICH8 AHCI). AHCI driver in the User VM can access HBA in DM
-through the PCI BAR. And HBA can inject MSI interrupts through the PCI
+0x8086. Its memory registers are mapped in BAR 5. It supports only six
+ports (refer to ICH8 AHCI). The AHCI driver in the User VM can access HBA in
+DM through the PCI BAR, and HBA can inject MSI interrupts through the PCI
 framework.

 When the application in the User VM reads data from /dev/sda, the request will
-send through the AHCI driver and then the PCI driver. The User VM will trap to
-hypervisor, and hypervisor dispatch the request to DM. According to the
-offset in the BAR, the request will dispatch to port control handler.
-Then the request is parse to a block I/O request which can be processed
-by Block backend model.
+be sent through the AHCI driver and then the PCI driver. The Hypervisor will
+trap the request from the User VM and dispatch it to the DM. According to the
+offset in the BAR, the request will be dispatched to the port control handler.
+Then the request is parsed to a block I/O request which can be processed by
+the Block backend model.

 Usage:

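The request path in the hunk above (trapped BAR access, dispatch by offset to a port control handler, then a block I/O request for the block backend) can be pictured with a minimal C sketch; the function names and structure are illustrative only and are not taken from the acrn-dm sources::

   /* Hypothetical sketch of dispatching a trapped BAR 5 write to a per-port
    * handler by offset; not the actual acrn-dm AHCI code. */
   #include <stdint.h>
   #include <stdio.h>

   #define AHCI_PORT_BASE 0x100U   /* per the AHCI spec, port registers start at 0x100 */
   #define AHCI_PORT_SIZE 0x80U    /* each port occupies 0x80 bytes */

   static void port_ctrl_write(unsigned int port, uint64_t off, uint32_t val)
   {
       /* Here the DM would parse the command and turn it into a block I/O
        * request for the block backend model. */
       printf("port %u: reg 0x%llx <- 0x%x\n", port, (unsigned long long)off, val);
   }

   static void ahci_bar5_write(uint64_t offset, uint32_t value)
   {
       if (offset >= AHCI_PORT_BASE) {
           unsigned int port = (unsigned int)((offset - AHCI_PORT_BASE) / AHCI_PORT_SIZE);
           port_ctrl_write(port, (offset - AHCI_PORT_BASE) % AHCI_PORT_SIZE, value);
       } else {
           /* Generic HBA registers (CAP, GHC, IS, ...). */
           printf("global HBA reg 0x%llx <- 0x%x\n", (unsigned long long)offset, value);
       }
   }

   int main(void)
   {
       ahci_bar5_write(0x118, 0x1);   /* lands in port 0's register block */
       return 0;
   }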
@@ -34,7 +34,7 @@ Usage:

 Type: 'hd' and 'cd' are available.

-Filepath: the path for the backend file, could be a partition or a
+Filepath: the path for the backend file; could be a partition or a
 regular file.

 For example,
@@ -19,7 +19,7 @@ The PS2 port is a 6-pin mini-Din connector used for connecting keyboards and mice.
 PS2 Keyboard Emulation
 **********************

-ACRN supports AT keyboard controller for PS2 keyboard that can be accessed through I/O ports(0x60 and 0x64). 0x60 is used to access AT keyboard controller data register, 0x64 is used to access AT keyboard controller address register.
+ACRN supports the AT keyboard controller for PS2 keyboard that can be accessed through I/O ports (0x60 and 0x64). 0x60 is used to access AT keyboard controller data register; 0x64 is used to access AT keyboard controller address register.

 The PS2 keyboard ACPI description as below::

@@ -48,8 +48,8 @@ The PS2 keyboard ACPI description as below::
 PS2 Mouse Emulation
 *******************

-ACRN supports AT keyboard controller for PS2 mouse that can be accessed through I/O ports(0x60 and 0x64).
-0x60 is used to access AT keyboard controller data register, 0x64 is used to access AT keyboard controller address register.
+ACRN supports AT keyboard controller for PS2 mouse that can be accessed through I/O ports (0x60 and 0x64).
+0x60 is used to access AT keyboard controller data register; 0x64 is used to access AT keyboard controller address register.

 The PS2 mouse ACPI description as below::

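As a rough illustration of the 0x60/0x64 split described in these two sections, a DM could register port I/O handlers along these lines; the handler and structure names below are hypothetical, not the actual acrn-dm implementation::

   /* Hypothetical sketch of trapping the AT keyboard controller ports
    * (0x60 = data register, 0x64 = command/status register). */
   #include <stdint.h>
   #include <stdio.h>

   #define KBD_DATA_PORT 0x60
   #define KBD_CMD_PORT  0x64

   struct ps2_kbd {
       uint8_t status;   /* status register read back through 0x64 */
       uint8_t outbuf;   /* next byte (scan code / ACK) returned through 0x60 */
   };

   static uint8_t ps2_io_read(struct ps2_kbd *kbd, uint16_t port)
   {
       return (port == KBD_CMD_PORT) ? kbd->status : kbd->outbuf;
   }

   static void ps2_io_write(struct ps2_kbd *kbd, uint16_t port, uint8_t val)
   {
       if (port == KBD_CMD_PORT)
           printf("controller command 0x%02x\n", val);   /* e.g. 0xD4: write to mouse */
       else
           printf("device data byte 0x%02x\n", val);     /* e.g. set LEDs, sample rate */
   }

   int main(void)
   {
       struct ps2_kbd kbd = { .status = 0x1C, .outbuf = 0xFA };
       ps2_io_write(&kbd, KBD_CMD_PORT, 0xD4);
       printf("status=0x%02x data=0x%02x\n",
              ps2_io_read(&kbd, KBD_CMD_PORT), ps2_io_read(&kbd, KBD_DATA_PORT));
       return 0;
   }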
@@ -10,7 +10,7 @@ Purpose of This Document
 ========================

 This high-level design (HLD) document describes the usage requirements
-and high level design for Intel |reg| Graphics Virtualization Technology for
+and high-level design for Intel |reg| Graphics Virtualization Technology for
 shared virtual :term:`GPU` technology (:term:`GVT-g`) on Apollo Lake-I
 SoCs.

@@ -18,14 +18,14 @@ This document describes:

 - The different GPU virtualization techniques
 - GVT-g mediated passthrough
-- High level design
+- High-level design
 - Key components
 - GVT-g new architecture differentiation

 Audience
 ========

-This document is for developers, validation teams, architects and
+This document is for developers, validation teams, architects, and
 maintainers of Intel |reg| GVT-g for the Apollo Lake SoCs.

 The reader should have some familiarity with the basic concepts of
@@ -47,24 +47,24 @@ Background

 Intel GVT-g is an enabling technology in emerging graphics
 virtualization scenarios. It adopts a full GPU virtualization approach
-based on mediated passthrough technology, to achieve good performance,
-scalability and secure isolation among Virtual Machines (VMs). A virtual
+based on mediated passthrough technology to achieve good performance,
+scalability, and secure isolation among Virtual Machines (VMs). A virtual
 GPU (vGPU), with full GPU features, is presented to each VM so that a
 native graphics driver can run directly inside a VM.

 Intel GVT-g technology for Apollo Lake (APL) has been implemented in
-open source hypervisors or Virtual Machine Monitors (VMMs):
+open-source hypervisors or Virtual Machine Monitors (VMMs):

 - Intel GVT-g for ACRN, also known as, "AcrnGT"
 - Intel GVT-g for KVM, also known as, "KVMGT"
 - Intel GVT-g for Xen, also known as, "XenGT"

-The core vGPU device model is released under BSD/MIT dual license, so it
+The core vGPU device model is released under the BSD/MIT dual license, so it
 can be reused in other proprietary hypervisors.

 Intel has a portfolio of graphics virtualization technologies
-(:term:`GVT-g`, :term:`GVT-d` and :term:`GVT-s`). GVT-d and GVT-s are
-outside of the scope of this document.
+(:term:`GVT-g`, :term:`GVT-d`, and :term:`GVT-s`). GVT-d and GVT-s are
+outside the scope of this document.

 This HLD applies to the Apollo Lake platform only. Support of other
 hardware is outside the scope of this HLD.
@@ -89,11 +89,11 @@ The main targeted usage of GVT-g is in automotive applications, such as:
 Existing Techniques
 ===================

-A graphics device is no different from any other I/O device, with
+A graphics device is no different from any other I/O device with
 respect to how the device I/O interface is virtualized. Therefore,
 existing I/O virtualization techniques can be applied to graphics
 virtualization. However, none of the existing techniques can meet the
-general requirement of performance, scalability, and secure isolation
+general requirements of performance, scalability, and secure isolation
 simultaneously. In this section, we review the pros and cons of each
 technique in detail, enabling the audience to understand the rationale
 behind the entire GVT-g effort.
@@ -102,12 +102,12 @@ Emulation
 ---------

 A device can be emulated fully in software, including its I/O registers
-and internal functional blocks. There would be no dependency on the
-underlying hardware capability, therefore compatibility can be achieved
+and internal functional blocks. Because there is no dependency on the
+underlying hardware capability, compatibility can be achieved
 across platforms. However, due to the CPU emulation cost, this technique
-is usually used for legacy devices, such as a keyboard, mouse, and VGA
-card. There would be great complexity and extremely low performance to
-fully emulate a modern accelerator, such as a GPU. It may be acceptable
+is usually used only for legacy devices such as a keyboard, mouse, and VGA
+card. Fully emulating a modern accelerator such as a GPU would involve great
+complexity and extremely low performance. It may be acceptable
 for use in a simulation environment, but it is definitely not suitable
 for production usage.

@@ -116,9 +116,9 @@ API Forwarding

 API forwarding, or a split driver model, is another widely-used I/O
 virtualization technology. It has been used in commercial virtualization
-productions, for example, VMware*, PCoIP*, and Microsoft* RemoteFx*.
+productions such as VMware*, PCoIP*, and Microsoft* RemoteFx*.
 It is a natural path when researchers study a new type of
-I/O virtualization usage, for example, when GPGPU computing in VM was
+I/O virtualization usage—for example, when GPGPU computing in a VM was
 initially proposed. Intel GVT-s is based on this approach.

 The architecture of API forwarding is shown in :numref:`api-forwarding`:
@@ -131,10 +131,10 @@ The architecture of API forwarding is shown in :numref:`api-forwarding`:
    API Forwarding

 A frontend driver is employed to forward high-level API calls (OpenGL,
-Directx, and so on) inside a VM, to a Backend driver in the Hypervisor
-for acceleration. The Backend may be using a different graphics stack,
+DirectX, and so on) inside a VM to a backend driver in the Hypervisor
+for acceleration. The backend may be using a different graphics stack,
 so API translation between different graphics protocols may be required.
-The Backend driver allocates a physical GPU resource for each VM,
+The backend driver allocates a physical GPU resource for each VM,
 behaving like a normal graphics application in a Hypervisor. Shared
 memory may be used to reduce memory copying between the host and guest
 graphic stacks.
@@ -143,16 +143,16 @@ API forwarding can bring hardware acceleration capability into a VM,
 with other merits such as vendor independence and high density. However, it
 also suffers from the following intrinsic limitations:

-- Lagging features - Every new API version needs to be specifically
-  handled, so it means slow time-to-market (TTM) to support new standards.
+- Lagging features - Every new API version must be specifically
+  handled, which means slow time-to-market (TTM) to support new standards.
   For example,
-  only DirectX9 is supported, when DirectX11 is already in the market.
+  only DirectX9 is supported while DirectX11 is already in the market.
   Also, there is a big gap in supporting media and compute usages.

 - Compatibility issues - A GPU is very complex, and consequently so are
-  high level graphics APIs. Different protocols are not 100% compatible
-  on every subtle API, so the customer can observe feature/quality loss
-  for specific applications.
+  high-level graphics APIs. Different protocols are not 100% compatible
+  on every subtly different API, so the customer can observe feature/quality
+  loss for specific applications.

 - Maintenance burden - Occurs when supported protocols and specific
   versions are incremented.
@@ -165,10 +165,10 @@ Direct Passthrough
 -------------------

 "Direct passthrough" dedicates the GPU to a single VM, providing full
-features and good performance, but at the cost of device sharing
+features and good performance at the cost of device sharing
 capability among VMs. Only one VM at a time can use the hardware
 acceleration capability of the GPU, which is a major limitation of this
-technique. However, it is still a good approach to enable graphics
+technique. However, it is still a good approach for enabling graphics
 virtualization usages on Intel server platforms, as an intermediate
 solution. Intel GVT-d uses this mechanism.

@@ -197,7 +197,7 @@ passthrough" technique.
 Concept
 =======

-Mediated passthrough allows a VM to access performance-critical I/O
+Mediated passthrough enables a VM to access performance-critical I/O
 resources (usually partitioned) directly, without intervention from the
 hypervisor in most cases. Privileged operations from this VM are
 trapped-and-emulated to provide secure isolation among VMs.
@@ -212,7 +212,7 @@ trapped-and-emulated to provide secure isolation among VMs.
 The Hypervisor must ensure that no vulnerability is exposed when
 assigning performance-critical resource to each VM. When a
 performance-critical resource cannot be partitioned, a scheduler must be
-implemented (either in software or hardware) to allow time-based sharing
+implemented (either in software or hardware) to enable time-based sharing
 among multiple VMs. In this case, the device must allow the hypervisor
 to save and restore the hardware state associated with the shared resource,
 either through direct I/O register reads and writes (when there is no software
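The time-based sharing described in this hunk (save the shared resource's state, switch, restore the next VM's state) is essentially a context switch; a generic sketch, not tied to any real device or to the GVT-g code, could look like::

   /* Generic time-based sharing sketch: save/restore per-VM device state
    * when the scheduler switches the shared resource between VMs. */
   #include <stdio.h>

   #define NUM_VMS 2

   struct dev_state { unsigned int regs[4]; };     /* stand-in for HW registers */

   static struct dev_state hw;                     /* the one shared "device" */
   static struct dev_state saved[NUM_VMS];         /* per-VM saved copies */

   static void switch_vm(int from, int to)
   {
       saved[from] = hw;        /* save state of the outgoing VM */
       hw = saved[to];          /* restore state of the incoming VM */
       printf("switched shared device from VM%d to VM%d\n", from, to);
   }

   int main(void)
   {
       hw.regs[0] = 0x11;       /* VM0 programs the device during its slice */
       switch_vm(0, 1);
       hw.regs[0] = 0x22;       /* VM1's slice */
       switch_vm(1, 0);
       printf("VM0 sees 0x%x again\n", hw.regs[0]);
       return 0;
   }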
@@ -255,7 +255,7 @@ multiple virtual address spaces by GPU page tables. A 4 GB global
 virtual address space called "global graphics memory", accessible from
 both the GPU and CPU, is mapped through a global page table. Local
 graphics memory spaces are supported in the form of multiple 4 GB local
-virtual address spaces, but are only limited to access by the Render
+virtual address spaces but are limited to access by the Render
 Engine through local page tables. Global graphics memory is mostly used
 for the Frame Buffer and also serves as the Command Buffer. Massive data
 accesses are made to local graphics memory when hardware acceleration is
@@ -265,24 +265,24 @@ the on-die memory.
 The CPU programs the GPU through GPU-specific commands, shown in
 :numref:`graphics-arch`, using a producer-consumer model. The graphics
 driver programs GPU commands into the Command Buffer, including primary
-buffer and batch buffer, according to the high-level programming APIs,
-such as OpenGL* or DirectX*. Then, the GPU fetches and executes the
+buffer and batch buffer, according to the high-level programming APIs
+such as OpenGL* and DirectX*. Then, the GPU fetches and executes the
 commands. The primary buffer (called a ring buffer) may chain other
 batch buffers together. The primary buffer and ring buffer are used
 interchangeably thereafter. The batch buffer is used to convey the
 majority of the commands (up to ~98% of them) per programming model. A
 register tuple (head, tail) is used to control the ring buffer. The CPU
 submits the commands to the GPU by updating the tail, while the GPU
-fetches commands from the head, and then notifies the CPU by updating
-the head, after the commands have finished execution. Therefore, when
+fetches commands from the head and then notifies the CPU by updating
+the head after the commands have finished execution. Therefore, when
 the GPU has executed all commands from the ring buffer, the head and
 tail pointers are the same.

 Having introduced the GPU architecture abstraction, it is important for
 us to understand how real-world graphics applications use the GPU
 hardware so that we can virtualize it in VMs efficiently. To do so, we
-characterized, for some representative GPU-intensive 3D workloads (the
-Phoronix Test Suite), the usages of the four critical interfaces:
+characterized the usages of the four critical interfaces for some
+representative GPU-intensive 3D workloads (the Phoronix Test Suite):

 1) the Frame Buffer,
 2) the Command Buffer,
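The (head, tail) register tuple described in the hunk above behaves like a classic producer-consumer ring; a minimal sketch of that protocol (illustrative only, not the actual i915 or GVT-g code) might look like this::

   /* Minimal producer-consumer ring sketch of the (head, tail) register tuple. */
   #include <stdio.h>
   #include <stdint.h>

   #define RING_SIZE 8U   /* number of command slots (power of two) */

   struct ring {
       uint32_t cmds[RING_SIZE];
       uint32_t head;   /* updated by the "GPU" after executing commands */
       uint32_t tail;   /* updated by the "CPU" when submitting commands */
   };

   /* CPU side: place a command and advance the tail. */
   static void submit(struct ring *r, uint32_t cmd)
   {
       r->cmds[r->tail % RING_SIZE] = cmd;
       r->tail++;
   }

   /* GPU side: consume everything up to the tail, then advance the head. */
   static void execute_all(struct ring *r)
   {
       while (r->head != r->tail) {
           printf("executing cmd 0x%x\n", r->cmds[r->head % RING_SIZE]);
           r->head++;
       }
   }

   int main(void)
   {
       struct ring r = { { 0 }, 0, 0 };
       submit(&r, 0xdead);
       submit(&r, 0xbeef);
       execute_all(&r);
       /* head == tail: the ring is idle, as the text describes. */
       printf("idle: head=%u tail=%u\n", r.head, r.tail);
       return 0;
   }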
@@ -299,9 +299,9 @@ performance-critical resources, as shown in :numref:`access-patterns`.
 When the applications are being loaded, lots of source vertices and
 pixels are written by the CPU, so the Frame Buffer accesses occur in the
 range of hundreds of thousands per second. Then at run-time, the CPU
-programs the GPU through the commands, to render the Frame Buffer, so
-the Command Buffer accesses become the largest group, also in the
-hundreds of thousands per second. PTE and I/O accesses are minor in both
+programs the GPU through the commands to render the Frame Buffer, so
+the Command Buffer accesses become the largest group (also in the
+hundreds of thousands per second). PTE and I/O accesses are minor in both
 load and run-time phases ranging in tens of thousands per second.

 .. figure:: images/APL_GVT-g-access-patterns.png
@@ -311,18 +311,18 @@ load and run-time phases ranging in tens of thousands per second.

    Access Patterns of Running 3D Workloads

-High Level Architecture
+High-Level Architecture
 ***********************

 :numref:`gvt-arch` shows the overall architecture of GVT-g, based on the
 ACRN hypervisor, with Service VM as the privileged VM, and multiple user
-guests. A GVT-g device model working with the ACRN hypervisor,
+guests. A GVT-g device model working with the ACRN hypervisor
 implements the policies of trap and passthrough. Each guest runs the
 native graphics driver and can directly access performance-critical
 resources: the Frame Buffer and Command Buffer, with resource
-partitioning (as presented later). To protect privileged resources, that
-is, the I/O registers and PTEs, corresponding accesses from the graphics
-driver in user VMs are trapped and forwarded to the GVT device model in
+partitioning (as presented later). To protect privileged resources—that
+is, the I/O registers and PTEs—corresponding accesses from the graphics
+driver in user VMs are trapped and forwarded to the GVT device model in the
 Service VM for emulation. The device model leverages i915 interfaces to access
 the physical GPU.

@@ -366,7 +366,7 @@ and gives the corresponding result back to the guest.
 The vGPU Device Model provides the basic framework to do
 trap-and-emulation, including MMIO virtualization, interrupt
 virtualization, and display virtualization. It also handles and
-processes all the requests internally, such as, command scan and shadow,
+processes all the requests internally (such as command scan and shadow),
 schedules them in the proper manner, and finally submits to
 the Service VM i915 driver.

@@ -384,9 +384,9 @@ Intel Processor Graphics implements two PCI MMIO BARs:

 - **GTTMMADR BAR**: Combines both :term:`GGTT` modification range and Memory
   Mapped IO range. It is 16 MB on :term:`BDW`, with 2 MB used by MMIO, 6 MB
-  reserved and 8 MB allocated to GGTT. GGTT starts from
+  reserved, and 8 MB allocated to GGTT. GGTT starts from
   :term:`GTTMMADR` + 8 MB. In this section, we focus on virtualization of
-  the MMIO range, discussing GGTT virtualization later.
+  the MMIO range, leaving discussion of GGTT virtualization for later.

 - **GMADR BAR**: As the PCI aperture is used by the CPU to access tiled
   graphics memory, GVT-g partitions this aperture range among VMs for
@@ -395,11 +395,11 @@ Intel Processor Graphics implements two PCI MMIO BARs:
 A 2 MB virtual MMIO structure is allocated per vGPU instance.

 All the virtual MMIO registers are emulated as simple in-memory
-read-write, that is, guest driver will read back the same value that was
+read-write; that is, the guest driver will read back the same value that was
 programmed earlier. A common emulation handler (for example,
 intel_gvt_emulate_read/write) is enough to handle such general
-emulation requirements. However, some registers need to be emulated with
-specific logic, for example, affected by change of other states or
+emulation requirements. However, some registers must be emulated with
+specific logic—for example, affected by change of other states or
 additional audit or translation when updating the virtual register.
 Therefore, a specific emulation handler must be installed for those
 special registers.
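The split between plain in-memory registers and registers that need specific logic, described in the hunk above, can be sketched as a handler table; the types and names here are illustrative and are not the real intel_gvt structures::

   /* Illustrative sketch of MMIO emulation dispatch: most registers are plain
    * in-memory read/write; a few install a specific handler. */
   #include <stdint.h>
   #include <stdio.h>

   #define NUM_REGS 16U          /* toy register file, 4-byte registers */

   typedef void (*mmio_write_fn)(uint32_t *reg, uint32_t val);

   static void default_write(uint32_t *reg, uint32_t val)
   {
       *reg = val;               /* guest reads back exactly what it wrote */
   }

   static void special_write(uint32_t *reg, uint32_t val)
   {
       *reg = val & ~0x1U;       /* e.g. audit/translate before updating */
       printf("special handler masked bit 0\n");
   }

   int main(void)
   {
       uint32_t regs[NUM_REGS] = { 0 };
       mmio_write_fn handlers[NUM_REGS];

       for (unsigned int i = 0; i < NUM_REGS; i++)
           handlers[i] = default_write;      /* common handler is enough */
       handlers[3] = special_write;          /* one register needs specific logic */

       handlers[0](&regs[0], 0xff);
       handlers[3](&regs[3], 0xff);
       printf("reg0=0x%x reg3=0x%x\n", regs[0], regs[3]);
       return 0;
   }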
@@ -408,19 +408,19 @@ The graphics driver may have assumptions about the initial device state,
 which stays with the point when the BIOS transitions to the OS. To meet
 the driver expectation, we need to provide an initial state of vGPU that
 a driver may observe on a pGPU. So the host graphics driver is expected
-to generate a snapshot of physical GPU state, which it does before guest
+to generate a snapshot of physical GPU state, which it does before the guest
 driver's initialization. This snapshot is used as the initial vGPU state
 by the device model.

 PCI Configuration Space Virtualization
 --------------------------------------

-PCI configuration space also needs to be virtualized in the device
+The PCI configuration space also must be virtualized in the device
 model. Different implementations may choose to implement the logic
-within the vGPU device model or in default system device model (for
+within the vGPU device model or in the default system device model (for
 example, ACRN-DM). GVT-g emulates the logic in the device model.

-Some information is vital for the vGPU device model, including:
+Some information is vital for the vGPU device model, including
 Guest PCI BAR, Guest PCI MSI, and Base of ACPI OpRegion.

 Legacy VGA Port I/O Virtualization
@@ -443,17 +443,17 @@ handle the GPU interrupt virtualization by itself. Virtual GPU
 interrupts are categorized into three types:

 - Periodic GPU interrupts are emulated by timers. However, a notable
-  exception to this is the VBlank interrupt. Due to the demands of user
-  space compositors, such as Wayland, which requires a flip done event
-  to be synchronized with a VBlank, this interrupt is forwarded from
-  Service VM to User VM when Service VM receives it from the hardware.
+  exception to this is the VBlank interrupt. Due to the demands of user space
+  compositors such as Wayland, which requires a flip done event to be
+  synchronized with a VBlank, this interrupt is forwarded from the Service VM
+  to the User VM when the Service VM receives it from the hardware.

-- Event-based GPU interrupts are emulated by the emulation logic. For
-  example, AUX Channel Interrupt.
+- Event-based GPU interrupts are emulated by the emulation logic (for
+  example, AUX Channel Interrupt).

 - GPU command interrupts are emulated by a command parser and workload
   dispatcher. The command parser marks out which GPU command interrupts
-  are generated during the command execution and the workload
+  are generated during the command execution, and the workload
   dispatcher injects those interrupts into the VM after the workload is
   finished.

@@ -468,27 +468,27 @@ Workload Scheduler
 ------------------

 The scheduling policy and workload scheduler are decoupled for
-scalability reasons. For example, a future QoS enhancement will only
-impact the scheduling policy, any i915 interface change or HW submission
-interface change (from execlist to :term:`GuC`) will only need workload
+scalability reasons. For example, a future QoS enhancement will impact
+only the scheduling policy, and any i915 interface change or hardware submission
+interface change (from execlist to :term:`GuC`) will need only workload
 scheduler updates.

 The scheduling policy framework is the core of the vGPU workload
 scheduling system. It controls all of the scheduling actions and
 provides the developer with a generic framework for easy development of
 scheduling policies. The scheduling policy framework controls the work
-scheduling process without caring about how the workload is dispatched
+scheduling process without regard for how the workload is dispatched
 or completed. All the detailed workload dispatching is hidden in the
 workload scheduler, which is the actual executer of a vGPU workload.

 The workload scheduler handles everything about one vGPU workload. Each
 hardware ring is backed by one workload scheduler kernel thread. The
 workload scheduler picks the workload from current vGPU workload queue
-and communicates with the virtual HW submission interface to emulate the
+and communicates with the virtual hardware submission interface to emulate the
 "schedule-in" status for the vGPU. It performs context shadow, Command
-Buffer scan and shadow, PPGTT page table pin/unpin/out-of-sync, before
+Buffer scan and shadow, and PPGTT page table pin/unpin/out-of-sync before
 submitting this workload to the host i915 driver. When the vGPU workload
-is completed, the workload scheduler asks the virtual HW submission
+is completed, the workload scheduler asks the virtual hardware submission
 interface to emulate the "schedule-out" status for the vGPU. The VM
 graphics driver then knows that a GPU workload is finished.

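A very rough sketch of that per-ring scheduler loop (purely illustrative; the structure and function names are not the real GVT-g code) might look like::

   /* Illustrative per-ring workload scheduler loop (not the real GVT-g code). */
   #include <stdio.h>
   #include <stdbool.h>

   struct workload { int id; };

   /* Stubs standing in for the steps described in the text. */
   static bool pick_workload(struct workload *w) { static int n; w->id = n; return ++n <= 2; }
   static void emulate_schedule_in(struct workload *w)  { printf("schedule-in %d\n", w->id); }
   static void shadow_and_scan(struct workload *w)      { printf("shadow/scan %d\n", w->id); }
   static void submit_to_host_i915(struct workload *w)  { printf("submit %d\n", w->id); }
   static void emulate_schedule_out(struct workload *w) { printf("schedule-out %d\n", w->id); }

   int main(void)
   {
       struct workload w;

       /* One kernel thread per hardware ring would run a loop like this. */
       while (pick_workload(&w)) {
           emulate_schedule_in(&w);   /* virtual hardware submission interface */
           shadow_and_scan(&w);       /* context shadow, command scan, PPGTT pin */
           submit_to_host_i915(&w);
           emulate_schedule_out(&w);  /* guest driver sees the workload finish */
       }
       return 0;
   }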
@@ -504,11 +504,11 @@ Workload Submission Path

 Software submits the workload using the legacy ring buffer mode on Intel
 Processor Graphics before Broadwell, which is no longer supported by the
-GVT-g virtual device model. A new HW submission interface named
-"Execlist" is introduced since Broadwell. With the new HW submission
+GVT-g virtual device model. A new hardware submission interface named
+"Execlist" is introduced since Broadwell. With the new hardware submission
 interface, software can achieve better programmability and easier
 context management. In Intel GVT-g, the vGPU submits the workload
-through the virtual HW submission interface. Each workload in submission
+through the virtual hardware submission interface. Each workload in submission
 will be represented as an ``intel_vgpu_workload`` data structure, a vGPU
 workload, which will be put on a per-vGPU and per-engine workload queue
 later after performing a few basic checks and verifications.
@@ -546,15 +546,15 @@ Direct Display Model

    Direct Display Model

-A typical automotive use case is where there are two displays in the car
-and each one needs to show one domain's content, with the two domains
+In a typical automotive use case, there are two displays in the car
+and each one must show one domain's content, with the two domains
 being the Instrument cluster and the In Vehicle Infotainment (IVI). As
 shown in :numref:`direct-display`, this can be accomplished through the direct
-display model of GVT-g, where the Service VM and User VM are each assigned all HW
+display model of GVT-g, where the Service VM and User VM are each assigned all hardware
 planes of two different pipes. GVT-g has a concept of display owner on a
-per HW plane basis. If it determines that a particular domain is the
-owner of a HW plane, then it allows the domain's MMIO register write to
-flip a frame buffer to that plane to go through to the HW. Otherwise,
+per hardware plane basis. If it determines that a particular domain is the
+owner of a hardware plane, then it allows the domain's MMIO register write to
+flip a frame buffer to that plane to go through to the hardware. Otherwise,
 such writes are blocked by the GVT-g.

 Indirect Display Model
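The plane-ownership check described for the direct display model can be sketched roughly as follows (hypothetical names, not the GVT-g sources)::

   /* Illustrative plane-ownership check for a trapped flip register write. */
   #include <stdio.h>

   #define NUM_PLANES 4

   static int plane_owner[NUM_PLANES] = { 0, 0, 1, 1 };  /* domain id per HW plane */

   static void handle_flip_write(int domain, int plane, unsigned long fb_addr)
   {
       if (plane_owner[plane] == domain) {
           /* Owner: let the MMIO write go through to the hardware. */
           printf("domain %d flips plane %d to 0x%lx\n", domain, plane, fb_addr);
       } else {
           /* Not the owner: the write is blocked by GVT-g. */
           printf("domain %d blocked on plane %d\n", domain, plane);
       }
   }

   int main(void)
   {
       handle_flip_write(1, 2, 0x80000000UL);  /* allowed */
       handle_flip_write(1, 0, 0x80000000UL);  /* blocked */
       return 0;
   }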
@@ -568,23 +568,23 @@ Indirect Display Model

    Indirect Display Model

 For security or fastboot reasons, it may be determined that the User VM is
-either not allowed to display its content directly on the HW or it may
+either not allowed to display its content directly on the hardware or it may
 be too late before it boots up and displays its content. In such a
 scenario, the responsibility of displaying content on all displays lies
 with the Service VM. One of the use cases that can be realized is to display the
 entire frame buffer of the User VM on a secondary display. GVT-g allows for this
-model by first trapping all MMIO writes by the User VM to the HW. A proxy
+model by first trapping all MMIO writes by the User VM to the hardware. A proxy
 application can then capture the address in GGTT where the User VM has written
 its frame buffer and using the help of the Hypervisor and the Service VM's i915
 driver, can convert the Guest Physical Addresses (GPAs) into Host
 Physical Addresses (HPAs) before making a texture source or EGL image
 out of the frame buffer and then either post processing it further or
-simply displaying it on a HW plane of the secondary display.
+simply displaying it on a hardware plane of the secondary display.

 GGTT-Based Surface Sharing
 --------------------------

-One of the major automotive use case is called "surface sharing". This
+One of the major automotive use cases is called "surface sharing". This
 use case requires that the Service VM accesses an individual surface or a set of
 surfaces from the User VM without having to access the entire frame buffer of
@@ -608,13 +608,13 @@ compositor, Mesa, and i915 driver had to be modified.
 This model has a major benefit and a major limitation. The
 benefit is that since it builds on top of the indirect display model,
 there are no special drivers necessary for it on either Service VM or User VM.
-Therefore, any Real Time Operating System (RTOS) that use
+Therefore, any Real Time Operating System (RTOS) that uses
 this model can simply do so without having to implement a driver, the
 infrastructure for which may not be present in their operating system.
 The limitation of this model is that video memory dedicated for a User VM is
 generally limited to a couple of hundred MBs. This can easily be
 exhausted by a few application buffers so the number and size of buffers
-is limited. Since it is not a highly-scalable model, in general, Intel
+is limited. Since it is not a highly-scalable model in general, Intel
 recommends the Hyper DMA buffer sharing model, described next.

 Hyper DMA Buffer Sharing
@@ -628,12 +628,12 @@ Hyper DMA Buffer Sharing

    Hyper DMA Buffer Design

 Another approach to surface sharing is Hyper DMA Buffer sharing. This
-model extends the Linux DMA buffer sharing mechanism where one driver is
+model extends the Linux DMA buffer sharing mechanism in which one driver is
 able to share its pages with another driver within one domain.

 Applications buffers are backed by i915 Graphics Execution Manager
 Buffer Objects (GEM BOs). As in GGTT surface
-sharing, this model also requires compositor changes. The compositor of
+sharing, this model also requires compositor changes. The compositor of the
 User VM requests i915 to export these application GEM BOs and then passes
 them on to a special driver called the Hyper DMA Buf exporter whose job
 is to create a scatter gather list of pages mapped by PDEs and PTEs and
@@ -643,13 +643,13 @@ The compositor then shares this Hyper DMA Buf ID with the Service VM's Hyper DMA
 Buf importer driver which then maps the memory represented by this ID in
 the Service VM. A proxy application in the Service VM can then provide the ID of this driver
 to the Service VM i915, which can create its own GEM BO. Finally, the application
-can use it as an EGL image and do any post processing required before
+can use it as an EGL image and do any post-processing required before
 either providing it to the Service VM compositor or directly flipping it on a
-HW plane in the compositor's absence.
+hardware plane in the compositor's absence.

 This model is highly scalable and can be used to share up to 4 GB worth
-of pages. It is also not limited to only sharing graphics buffers. Other
-buffers for the IPU and others, can also be shared with it. However, it
+of pages. It is also not limited to sharing graphics buffers. Other
+buffers for the IPU and others can also be shared with it. However, it
 does require that the Service VM port the Hyper DMA Buffer importer driver. Also,
 the Service VM must comprehend and implement the DMA buffer sharing model.

@@ -671,8 +671,8 @@ Plane-Based Domain Ownership

 Yet another mechanism for showing content of both the Service VM and User VM on the
 same physical display is called plane-based domain ownership. Under this
-model, both the Service VM and User VM are provided a set of HW planes that they can
-flip their contents on to. Since each domain provides its content, there
+model, both the Service VM and User VM are provided a set of hardware planes that they can
+flip their contents onto. Since each domain provides its content, there
 is no need for any extra composition to be done through the Service VM. The display
 controller handles alpha blending contents of different domains on a
 single pipe. This saves on any complexity on either the Service VM or the User VM
@@ -680,9 +680,9 @@ SW stack.

 It is important to provide only specific planes and have them statically
 assigned to different Domains. To achieve this, the i915 driver of both
-domains is provided a command line parameter that specifies the exact
+domains is provided a command-line parameter that specifies the exact
 planes that this domain has access to. The i915 driver then enumerates
-only those HW planes and exposes them to its compositor. It is then left
+only those hardware planes and exposes them to its compositor. It is then left
 to the compositor configuration to use these planes appropriately and
 show the correct content on them. No other changes are necessary.

@@ -691,7 +691,7 @@ quick to implement, it also has some drawbacks. First, since each domain
 is responsible for showing the content on the screen, there is no
 control of the User VM by the Service VM. If the User VM is untrusted, this could
 potentially cause some unwanted content to be displayed. Also, there is
-no post processing capability, except that provided by the display
+no post-processing capability, except that provided by the display
 controller (for example, scaling, rotation, and so on). So each domain
 must provide finished buffers with the expectation that alpha blending
 with another domain will not cause any corruption or unwanted artifacts.
@@ -705,7 +705,7 @@ from the VM. For the global graphics memory space, GVT-g uses graphics
 memory resource partitioning and an address space ballooning mechanism.
 For local graphics memory spaces, GVT-g implements per-VM local graphics
 memory through a render context switch because local graphics memory is
-only accessible by the GPU.
+accessible only by the GPU.

 Global Graphics Memory
 ----------------------
@@ -717,7 +717,7 @@ GVT-g partitions the global graphics memory among VMs. Splitting the
 CPU/GPU scheduling mechanism requires that the global graphics memory of
 different VMs can be accessed by the CPU and the GPU simultaneously.
 Consequently, GVT-g must, at any time, present each VM with its own
-resource, leading to the resource partitioning approaching, for global
+resource, leading to the resource partitioning approach, for global
 graphics memory, as shown in :numref:`mem-part`.

 .. figure:: images/APL_GVT-g-mem-part.png
@@ -727,7 +727,7 @@ graphics memory, as shown in :numref:`mem-part`.

    Memory Partition and Ballooning

-The performance impact of reduced global graphics memory resource
+The performance impact of reduced global graphics memory resources
 due to memory partitioning is very limited according to various test
 results.

@@ -740,7 +740,7 @@ partitioning information to the VM graphics driver through the PVINFO
 MMIO window. The graphics driver marks the other VMs' regions as
 'ballooned', and reserves them as not being used from its graphics
 memory allocator. Under this design, the guest view of global graphics
-memory space is exactly the same as the host view and the driver
+memory space is exactly the same as the host view, and the driver
 programmed addresses, using guest physical address, can be directly used
 by the hardware. Address space ballooning is different from traditional
 memory ballooning techniques. Memory ballooning is for memory usage
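A toy sketch of the ballooning idea in the hunk above (the guest treats the other VMs' ranges as reserved so its own allocations stay inside its partition); the names and the PVINFO layout here are purely illustrative::

   /* Toy address-space ballooning sketch: the guest only allocates inside
    * the range the (hypothetical) PVINFO window reports as its own. */
   #include <stdint.h>
   #include <stdio.h>

   struct pvinfo {                 /* illustrative, not the real PVINFO layout */
       uint32_t my_base;           /* start of this VM's global graphics memory */
       uint32_t my_size;
   };

   static uint32_t next_alloc;

   static int ggtt_alloc(const struct pvinfo *pv, uint32_t size, uint32_t *out)
   {
       if (next_alloc + size > pv->my_base + pv->my_size)
           return -1;              /* outside our partition: "ballooned" space */
       *out = next_alloc;
       next_alloc += size;
       return 0;
   }

   int main(void)
   {
       struct pvinfo pv = { .my_base = 0x40000000U, .my_size = 0x10000000U };
       uint32_t addr;

       next_alloc = pv.my_base;    /* regions of other VMs are never handed out */
       if (ggtt_alloc(&pv, 0x100000U, &addr) == 0)
           printf("allocated guest-visible address 0x%x\n", addr);
       return 0;
   }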
@@ -756,7 +756,7 @@ Per-VM Local Graphics Memory

 GVT-g allows each VM to use the full local graphics memory spaces of its
 own, similar to the virtual address spaces on the CPU. The local
-graphics memory spaces are only visible to the Render Engine in the GPU.
+graphics memory spaces are visible only to the Render Engine in the GPU.
 Therefore, any valid local graphics memory address, programmed by a VM,
 can be used directly by the GPU. The GVT-g device model switches the
 local graphics memory spaces, between VMs, when switching render
@@ -796,13 +796,13 @@ Per-VM Shadow PPGTT
 -------------------

 To support local graphics memory access passthrough, GVT-g implements
-per-VM shadow local page tables. The local graphics memory is only
-accessible from the Render Engine. The local page tables have two-level
+per-VM shadow local page tables. The local graphics memory is accessible
+only from the Render Engine. The local page tables have two-level
 paging structures, as shown in :numref:`per-vm-shadow`.

 The first level, Page Directory Entries (PDEs), located in the global
 page table, points to the second level, Page Table Entries (PTEs) in
-system memory, so guest accesses to the PDE are trapped and emulated,
+system memory, so guest accesses to the PDE are trapped and emulated
 through the implementation of shared shadow global page table.

 GVT-g also write-protects a list of guest PTE pages for each VM. The
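The two-level walk described here can be pictured with a small, generic translation sketch; the field widths and table sizes are simplified and are not the real GGTT/PPGTT formats::

   /* Simplified two-level page walk: a PDE selects a PTE page, a PTE selects
    * the final graphics page (field widths are illustrative only). */
   #include <stdint.h>
   #include <stdio.h>

   #define ENTRIES 4U

   static uint32_t pte_pages[ENTRIES][ENTRIES];   /* second level, in "system memory" */
   static uint32_t pde_table[ENTRIES];            /* first level, in the global page table */

   static uint32_t translate(uint32_t gma)        /* gma: guest graphics memory address */
   {
       uint32_t pde_idx = (gma >> 22) & (ENTRIES - 1);
       uint32_t pte_idx = (gma >> 12) & (ENTRIES - 1);
       uint32_t pte_page = pde_table[pde_idx];    /* in GVT-g, PDE accesses are trapped */
       return pte_pages[pte_page][pte_idx] | (gma & 0xFFFU);
   }

   int main(void)
   {
       pde_table[1] = 2;                 /* PDE 1 points at PTE page 2 */
       pte_pages[2][3] = 0xABC000U;      /* PTE 3 maps to physical page 0xABC */
       printf("0x%x -> 0x%x\n", 0x00403123U, translate(0x00403123U));
       return 0;
   }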
@@ -838,11 +838,11 @@ In the system, there are three different schedulers for the GPU:
 - Mediator GVT scheduler
 - i915 Service VM scheduler

-Since User VM always uses the host-based command submission (ELSP) model,
+Because the User VM always uses the host-based command submission (ELSP) model
 and it never accesses the GPU or the Graphic Micro Controller (:term:`GuC`)
 directly, its scheduler cannot do any preemption by itself.
-The i915 scheduler does ensure batch buffers are
-submitted in dependency order, that is, if a compositor had to wait for
+The i915 scheduler does ensure that batch buffers are
+submitted in dependency order—that is, if a compositor has to wait for
 an application buffer to finish before its workload can be submitted to
 the GPU, then the i915 scheduler of the User VM ensures that this happens.

@@ -879,23 +879,23 @@ context to preempt the current running context and then wait for the GPU
 engine to be idle.

 While the identification of workloads to be preempted is decided by
-customizable scheduling policies, once a candidate for preemption is
-identified, the i915 scheduler simply submits a preemption request to
-the :term:`GuC` high-priority queue. Based on the HW's ability to preempt (on an
+customizable scheduling policies, the i915 scheduler simply submits a
+preemption request to the :term:`GuC` high-priority queue once a candidate for
+preemption is identified. Based on the hardware's ability to preempt (on an
 Apollo Lake SoC, 3D workload is preemptible on a 3D primitive level with
 some exceptions), the currently executing workload is saved and
 preempted. The :term:`GuC` informs the driver using an interrupt of a preemption
 event occurring. After handling the interrupt, the driver submits the
 high-priority workload through the normal priority :term:`GuC` queue. As such,
 the normal priority :term:`GuC` queue is used for actual execbuf submission most
-of the time with the high-priority :term:`GuC` queue only being used for the
+of the time with the high-priority :term:`GuC` queue being used only for the
 preemption of lower-priority workload.

 Scheduling policies are customizable and left to customers to change if
 they are not satisfied with the built-in i915 driver policy, where all
 workloads of the Service VM are considered higher priority than those of the
-User VM. This policy can be enforced through an Service VM i915 kernel command line
-parameter, and can replace the default in-order command submission (no
+User VM. This policy can be enforced through a Service VM i915 kernel command-line
+parameter and can replace the default in-order command submission (no
 preemption) policy.

 AcrnGT
@@ -903,7 +903,7 @@ AcrnGT

 ACRN is a flexible, lightweight reference hypervisor, built with
 real-time and safety-criticality in mind, optimized to streamline
-embedded development through an open source platform.
+embedded development through an open-source platform.

 AcrnGT is the GVT-g implementation on the ACRN hypervisor. It adapts
 the MPT interface of GVT-g onto ACRN by using the kernel APIs provided
@@ -935,7 +935,7 @@ application:
   hypervisor through hyper-calls.

 - It provides user space interfaces through ``sysfs`` to the user space
-  ACRN-DM, so that DM can manage the lifecycle of the virtual GPUs.
+  ACRN-DM so that DM can manage the lifecycle of the virtual GPUs.

 AcrnGT in DM
 =============