doc: copy editing in the hld topics

Signed-off-by: Benjamin Fitch <benjamin.fitch@intel.com>
Benjamin Fitch 2021-05-12 16:30:45 -07:00 committed by fitchbe
parent 5df3455e8d
commit 2c4249fb96
3 changed files with 129 additions and 129 deletions


@ -4,11 +4,11 @@ AHCI Virtualization in Device Model
###################################
AHCI (Advanced Host Controller Interface) is a hardware mechanism
that enables software to communicate with Serial ATA devices. AHCI HBA
(host bus adapter) is a PCI class device that acts as a data movement
engine between system memory and Serial ATA devices. The AHCI HBA in
ACRN supports both ATA and ATAPI devices. The architecture is shown in
the diagram below:
.. figure:: images/ahci-image1.png
   :align: center
@ -16,17 +16,17 @@ the below diagram.
   :name: achi-device
HBA is registered to the PCI system with device id 0x2821 and vendor id
0x8086. Its memory registers are mapped in BAR 5. It supports only six
ports (refer to ICH8 AHCI). The AHCI driver in the User VM can access HBA in
DM through the PCI BAR, and HBA can inject MSI interrupts through the PCI
framework.
When the application in the User VM reads data from /dev/sda, the request will
be sent through the AHCI driver and then the PCI driver. The Hypervisor will
trap the request from the User VM and dispatch it to the DM. According to the
offset in the BAR, the request will be dispatched to the port control handler.
Then the request is parsed into a block I/O request that can be processed by
the Block backend model.
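This dispatch path can be pictured with a small C sketch (the structure and
function names below are illustrative only, not the actual Device Model code);
the offset within BAR 5 selects the port register block, and the port handler
is where the trapped access is turned into a block I/O request for the
backend::

   #include <stdint.h>
   #include <stdio.h>

   /* Per the AHCI spec, each port owns a 0x80-byte register block that
    * starts at offset 0x100 inside BAR 5; PxCI (command issue) is at 0x38. */
   #define AHCI_PORT_BASE  0x100
   #define AHCI_PORT_SIZE  0x80
   #define AHCI_MAX_PORTS  6

   struct blockif_req {                 /* simplified block I/O request */
       uint64_t lba;
       uint32_t count;
       int      is_write;
   };

   /* Port handler: a real one would decode the command list and FIS here. */
   static void ahci_port_write(int port, uint32_t reg, uint32_t value)
   {
       struct blockif_req req = { .lba = 0, .count = 1, .is_write = 0 };

       printf("port %d: reg 0x%02x <- 0x%x (queue block request)\n",
              port, reg, value);
       (void)req;                       /* handed to the block backend model */
   }

   /* Dispatch a trapped BAR 5 write by offset to the owning port. */
   static void ahci_bar_write(uint64_t offset, uint32_t value)
   {
       if (offset < AHCI_PORT_BASE)
           return;                      /* global HBA registers */

       int port = (int)((offset - AHCI_PORT_BASE) / AHCI_PORT_SIZE);
       uint32_t reg = (uint32_t)((offset - AHCI_PORT_BASE) % AHCI_PORT_SIZE);

       if (port < AHCI_MAX_PORTS)
           ahci_port_write(port, reg, value);
   }

   int main(void)
   {
       ahci_bar_write(AHCI_PORT_BASE + 0x38, 0x1);   /* write to port 0 PxCI */
       return 0;
   }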
Usage:
@ -34,7 +34,7 @@ Usage:
Type: 'hd' and 'cd' are available.
Filepath: the path for the backend file; could be a partition or a
regular file.
For example,
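As an illustration only (the option syntax and values here are placeholders,
not taken from this diff), a disk backend is typically attached with an
``acrn-dm`` option of the form ``-s <slot>,ahci,hd:<filepath>``, where the
slot number and backend path must be adapted to the target platform.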


@ -19,7 +19,7 @@ The PS2 port is a 6-pin mini-Din connector used for connecting keyboards and mic
PS2 Keyboard Emulation
**********************
ACRN supports the AT keyboard controller for PS2 keyboard that can be accessed through I/O ports (0x60 and 0x64). Port 0x60 is used to access the AT keyboard controller data register; port 0x64 is used to access the AT keyboard controller address register.
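The split between the two ports can be sketched in a few lines of C (the
handler and structure names are illustrative and are not the ACRN DM
interfaces)::

   #include <stdint.h>
   #include <stdio.h>

   #define KBD_DATA_PORT 0x60   /* AT keyboard controller data register */
   #define KBD_CTRL_PORT 0x64   /* AT keyboard controller address register */

   struct ps2_kbd {
       uint8_t status;          /* status returned on reads of 0x64 */
       uint8_t outbuf;          /* scan code / response byte read from 0x60 */
   };

   /* Illustrative trap handlers for guest port I/O to the controller. */
   static uint8_t ps2_pio_read(struct ps2_kbd *kbd, uint16_t port)
   {
       if (port == KBD_DATA_PORT)
           return kbd->outbuf;          /* deliver pending data byte */
       if (port == KBD_CTRL_PORT)
           return kbd->status;          /* deliver controller status */
       return 0xff;                     /* not a keyboard controller port */
   }

   static void ps2_pio_write(struct ps2_kbd *kbd, uint16_t port, uint8_t val)
   {
       if (port == KBD_DATA_PORT)
           printf("data register write: 0x%02x\n", val);
       else if (port == KBD_CTRL_PORT)
           printf("controller command: 0x%02x\n", val);
   }

   int main(void)
   {
       struct ps2_kbd kbd = { .status = 0x1c, .outbuf = 0xfa /* ACK */ };

       ps2_pio_write(&kbd, KBD_CTRL_PORT, 0xae);   /* e.g. "enable keyboard" */
       printf("status 0x%02x, data 0x%02x\n",
              ps2_pio_read(&kbd, KBD_CTRL_PORT),
              ps2_pio_read(&kbd, KBD_DATA_PORT));
       return 0;
   }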
The PS2 keyboard ACPI description is as below::
@ -48,8 +48,8 @@ The PS2 keyboard ACPI description as below::
PS2 Mouse Emulation
*******************
ACRN supports the AT keyboard controller for PS2 mouse that can be accessed through I/O ports (0x60 and 0x64).
Port 0x60 is used to access the AT keyboard controller data register; port 0x64 is used to access the AT keyboard controller address register.
The PS2 mouse ACPI description is as below::


@ -10,7 +10,7 @@ Purpose of This Document
========================
This high-level design (HLD) document describes the usage requirements
and high-level design for Intel |reg| Graphics Virtualization Technology for
shared virtual :term:`GPU` technology (:term:`GVT-g`) on Apollo Lake-I
SoCs.
@ -18,14 +18,14 @@ This document describes:
- The different GPU virtualization techniques
- GVT-g mediated passthrough
- High-level design
- Key components
- GVT-g new architecture differentiation
Audience
========
This document is for developers, validation teams, architects, and
maintainers of Intel |reg| GVT-g for the Apollo Lake SoCs.
The reader should have some familiarity with the basic concepts of
@ -47,24 +47,24 @@ Background
Intel GVT-g is an enabling technology in emerging graphics
virtualization scenarios. It adopts a full GPU virtualization approach
based on mediated passthrough technology to achieve good performance,
scalability, and secure isolation among Virtual Machines (VMs). A virtual
GPU (vGPU), with full GPU features, is presented to each VM so that a
native graphics driver can run directly inside a VM.
Intel GVT-g technology for Apollo Lake (APL) has been implemented in
open-source hypervisors or Virtual Machine Monitors (VMMs):
- Intel GVT-g for ACRN, also known as "AcrnGT"
- Intel GVT-g for KVM, also known as "KVMGT"
- Intel GVT-g for Xen, also known as "XenGT"
The core vGPU device model is released under the BSD/MIT dual license, so it
can be reused in other proprietary hypervisors.
Intel has a portfolio of graphics virtualization technologies
(:term:`GVT-g`, :term:`GVT-d`, and :term:`GVT-s`). GVT-d and GVT-s are
outside the scope of this document.
This HLD applies to the Apollo Lake platform only. Support of other
hardware is outside the scope of this HLD.
@ -89,11 +89,11 @@ The main targeted usage of GVT-g is in automotive applications, such as:
Existing Techniques
===================
A graphics device is no different from any other I/O device with
respect to how the device I/O interface is virtualized. Therefore,
existing I/O virtualization techniques can be applied to graphics
virtualization. However, none of the existing techniques can meet the
general requirements of performance, scalability, and secure isolation
simultaneously. In this section, we review the pros and cons of each
technique in detail, enabling the audience to understand the rationale
behind the entire GVT-g effort.
@ -102,12 +102,12 @@ Emulation
---------
A device can be emulated fully in software, including its I/O registers
and internal functional blocks. Because there is no dependency on the
underlying hardware capability, compatibility can be achieved
across platforms. However, due to the CPU emulation cost, this technique
is usually used only for legacy devices such as a keyboard, mouse, and VGA
card. Fully emulating a modern accelerator such as a GPU would involve great
complexity and extremely low performance. It may be acceptable
for use in a simulation environment, but it is definitely not suitable
for production usage.
@ -116,9 +116,9 @@ API Forwarding
API forwarding, or a split driver model, is another widely-used I/O
virtualization technology. It has been used in commercial virtualization
products such as VMware*, PCoIP*, and Microsoft* RemoteFx*.
It is a natural path when researchers study a new type of
I/O virtualization usage, for example, when GPGPU computing in a VM was
initially proposed. Intel GVT-s is based on this approach.
The architecture of API forwarding is shown in :numref:`api-forwarding`:
@ -131,10 +131,10 @@ The architecture of API forwarding is shown in :numref:`api-forwarding`:
API Forwarding
A frontend driver is employed to forward high-level API calls (OpenGL,
DirectX, and so on) inside a VM to a backend driver in the Hypervisor
for acceleration. The backend may be using a different graphics stack,
so API translation between different graphics protocols may be required.
The backend driver allocates a physical GPU resource for each VM,
behaving like a normal graphics application in a Hypervisor. Shared
memory may be used to reduce memory copying between the host and guest
graphics stacks.
@ -143,16 +143,16 @@ API forwarding can bring hardware acceleration capability into a VM,
with other merits such as vendor independence and high density. However, it
also suffers from the following intrinsic limitations:
- Lagging features - Every new API version must be specifically
handled, which means slow time-to-market (TTM) to support new standards.
For example,
only DirectX9 is supported while DirectX11 is already in the market.
Also, there is a big gap in supporting media and compute usages.
- Compatibility issues - A GPU is very complex, and consequently so are
high-level graphics APIs. Different protocols are not 100% compatible
on every subtly different API, so the customer can observe feature/quality
loss for specific applications.
- Maintenance burden - Occurs when supported protocols and specific
versions are incremented.
@ -165,10 +165,10 @@ Direct Passthrough
-------------------
"Direct passthrough" dedicates the GPU to a single VM, providing full
features and good performance at the cost of device sharing
capability among VMs. Only one VM at a time can use the hardware
acceleration capability of the GPU, which is a major limitation of this
technique. However, it is still a good approach for enabling graphics
virtualization usages on Intel server platforms, as an intermediate
solution. Intel GVT-d uses this mechanism.
@ -197,7 +197,7 @@ passthrough" technique.
Concept
=======
Mediated passthrough enables a VM to access performance-critical I/O
resources (usually partitioned) directly, without intervention from the
hypervisor in most cases. Privileged operations from this VM are
trapped-and-emulated to provide secure isolation among VMs.
@ -212,7 +212,7 @@ trapped-and-emulated to provide secure isolation among VMs.
The Hypervisor must ensure that no vulnerability is exposed when
assigning performance-critical resources to each VM. When a
performance-critical resource cannot be partitioned, a scheduler must be
implemented (either in software or hardware) to enable time-based sharing
among multiple VMs. In this case, the device must allow the hypervisor
to save and restore the hardware state associated with the shared resource,
either through direct I/O register reads and writes (when there is no software
@ -255,7 +255,7 @@ multiple virtual address spaces by GPU page tables. A 4 GB global
virtual address space called "global graphics memory", accessible from
both the GPU and CPU, is mapped through a global page table. Local
graphics memory spaces are supported in the form of multiple 4 GB local
virtual address spaces but are limited to access by the Render
Engine through local page tables. Global graphics memory is mostly used
for the Frame Buffer and also serves as the Command Buffer. Massive data
accesses are made to local graphics memory when hardware acceleration is
@ -265,24 +265,24 @@ the on-die memory.
The CPU programs the GPU through GPU-specific commands, shown in
:numref:`graphics-arch`, using a producer-consumer model. The graphics
driver programs GPU commands into the Command Buffer, including primary
buffer and batch buffer, according to the high-level programming APIs
such as OpenGL* and DirectX*. Then, the GPU fetches and executes the
commands. The primary buffer (called a ring buffer) may chain other
batch buffers together. The primary buffer and ring buffer are used
interchangeably thereafter. The batch buffer is used to convey the
majority of the commands (up to ~98% of them) per programming model. A
register tuple (head, tail) is used to control the ring buffer. The CPU
submits the commands to the GPU by updating the tail, while the GPU
fetches commands from the head and then notifies the CPU by updating
the head after the commands have finished execution. Therefore, when
the GPU has executed all commands from the ring buffer, the head and
tail pointers are the same.
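A minimal C illustration of this producer-consumer protocol follows (a generic
sketch of the head/tail logic, not the hardware register layout)::

   #include <stdint.h>
   #include <stdbool.h>
   #include <stdio.h>

   #define RING_SIZE 256            /* illustrative ring size, in commands */

   struct ring {
       uint32_t cmds[RING_SIZE];
       uint32_t head;               /* advanced by the consumer (the GPU) */
       uint32_t tail;               /* advanced by the producer (the CPU) */
   };

   /* CPU side: place a command in the ring and publish it by moving the tail. */
   static bool ring_submit(struct ring *r, uint32_t cmd)
   {
       uint32_t next = (r->tail + 1) % RING_SIZE;

       if (next == r->head)
           return false;            /* ring full */
       r->cmds[r->tail] = cmd;
       r->tail = next;
       return true;
   }

   /* GPU side: fetch one command and notify completion by moving the head. */
   static bool ring_consume(struct ring *r, uint32_t *cmd)
   {
       if (r->head == r->tail)
           return false;            /* ring empty: all commands executed */
       *cmd = r->cmds[r->head];
       r->head = (r->head + 1) % RING_SIZE;
       return true;
   }

   int main(void)
   {
       struct ring r = { .head = 0, .tail = 0 };
       uint32_t cmd;

       ring_submit(&r, 0x1);
       ring_submit(&r, 0x2);
       while (ring_consume(&r, &cmd))
           printf("executed command 0x%x\n", cmd);
       printf("drained: head == tail -> %u == %u\n", r.head, r.tail);
       return 0;
   }

When ``head == tail``, the ring has been drained, which is exactly the
condition described above.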
Having introduced the GPU architecture abstraction, it is important for
us to understand how real-world graphics applications use the GPU
hardware so that we can virtualize it in VMs efficiently. To do so, we
characterized the usages of the four critical interfaces for some
representative GPU-intensive 3D workloads (the Phoronix Test Suite):
1) the Frame Buffer,
2) the Command Buffer,
@ -299,9 +299,9 @@ performance-critical resources, as shown in :numref:`access-patterns`.
When the applications are being loaded, lots of source vertices and
pixels are written by the CPU, so the Frame Buffer accesses occur in the
range of hundreds of thousands per second. Then at run-time, the CPU
programs the GPU through the commands to render the Frame Buffer, so
the Command Buffer accesses become the largest group (also in the
hundreds of thousands per second). PTE and I/O accesses are minor in both
the load and run-time phases, ranging in the tens of thousands per second.
.. figure:: images/APL_GVT-g-access-patterns.png
@ -311,18 +311,18 @@ load and run-time phases ranging in tens of thousands per second.
Access Patterns of Running 3D Workloads
High-Level Architecture
***********************
:numref:`gvt-arch` shows the overall architecture of GVT-g, based on the
ACRN hypervisor, with Service VM as the privileged VM, and multiple user
guests. A GVT-g device model working with the ACRN hypervisor
implements the policies of trap and passthrough. Each guest runs the
native graphics driver and can directly access performance-critical
resources: the Frame Buffer and Command Buffer, with resource
partitioning (as presented later). To protect privileged resources (that
is, the I/O registers and PTEs), corresponding accesses from the graphics
driver in user VMs are trapped and forwarded to the GVT device model in the
Service VM for emulation. The device model leverages i915 interfaces to access
the physical GPU.
@ -366,7 +366,7 @@ and gives the corresponding result back to the guest.
The vGPU Device Model provides the basic framework to do
trap-and-emulation, including MMIO virtualization, interrupt
virtualization, and display virtualization. It also handles and
processes all the requests internally (such as command scan and shadow),
schedules them in the proper manner, and finally submits them to
the Service VM i915 driver.
@ -384,9 +384,9 @@ Intel Processor Graphics implements two PCI MMIO BARs:
- **GTTMMADR BAR**: Combines both :term:`GGTT` modification range and Memory
Mapped IO range. It is 16 MB on :term:`BDW`, with 2 MB used by MMIO, 6 MB
reserved, and 8 MB allocated to GGTT. GGTT starts from
:term:`GTTMMADR` + 8 MB. In this section, we focus on virtualization of
the MMIO range, leaving discussion of GGTT virtualization for later.
- **GMADR BAR**: As the PCI aperture is used by the CPU to access tiled
graphics memory, GVT-g partitions this aperture range among VMs for
@ -395,11 +395,11 @@ Intel Processor Graphics implements two PCI MMIO BARs:
A 2 MB virtual MMIO structure is allocated per vGPU instance.
All the virtual MMIO registers are emulated as simple in-memory
read-write; that is, the guest driver will read back the same value that was
programmed earlier. A common emulation handler (for example,
intel_gvt_emulate_read/write) is enough to handle such general
emulation requirements. However, some registers must be emulated with
specific logic: for example, registers affected by a change in other states, or
registers needing additional audit or translation when the virtual register is updated.
Therefore, a specific emulation handler must be installed for those
special registers.
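The split between the common in-memory path and registers that need dedicated
logic can be sketched as follows in C (the table layout, register offset, and
handler names are illustrative, not the GVT-g sources)::

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define VGPU_MMIO_SIZE (2u * 1024 * 1024)     /* 2 MB virtual MMIO image */

   struct vgpu {
       uint8_t mmio[VGPU_MMIO_SIZE];             /* backing store for registers */
   };

   typedef void (*mmio_write_fn)(struct vgpu *v, uint32_t off, uint32_t val);

   /* Common handler: in-memory write; reads return what was programmed. */
   static void mmio_write_default(struct vgpu *v, uint32_t off, uint32_t val)
   {
       memcpy(&v->mmio[off], &val, sizeof(val));
   }

   /* A register needing specific logic, e.g. an audit before the update. */
   static void mmio_write_special(struct vgpu *v, uint32_t off, uint32_t val)
   {
       mmio_write_default(v, off, val & 0x0000ffffu);  /* mask reserved bits */
   }

   /* Small table of registers with dedicated handlers (offset is made up). */
   static const struct { uint32_t off; mmio_write_fn fn; } special_regs[] = {
       { 0x2030, mmio_write_special },
   };

   static void vgpu_mmio_write(struct vgpu *v, uint32_t off, uint32_t val)
   {
       for (size_t i = 0; i < sizeof(special_regs) / sizeof(special_regs[0]); i++) {
           if (special_regs[i].off == off) {
               special_regs[i].fn(v, off, val);
               return;
           }
       }
       mmio_write_default(v, off, val);          /* fall back to common path */
   }

   int main(void)
   {
       static struct vgpu v;

       vgpu_mmio_write(&v, 0x0100, 0xdeadbeef);  /* simple in-memory register */
       vgpu_mmio_write(&v, 0x2030, 0xdeadbeef);  /* goes through specific logic */
       return 0;
   }

Registers not present in the table take the common path, matching the default
in-memory behavior described above.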
@ -408,19 +408,19 @@ The graphics driver may have assumptions about the initial device state,
which corresponds to the point when the BIOS transitions to the OS. To meet
the driver expectation, we need to provide an initial state of vGPU that
a driver may observe on a pGPU. So the host graphics driver is expected
to generate a snapshot of physical GPU state, which it does before the guest
driver's initialization. This snapshot is used as the initial vGPU state
by the device model.
PCI Configuration Space Virtualization
--------------------------------------
The PCI configuration space also must be virtualized in the device
model. Different implementations may choose to implement the logic
within the vGPU device model or in the default system device model (for
example, ACRN-DM). GVT-g emulates the logic in the device model.
Some information is vital for the vGPU device model, including
the Guest PCI BAR, Guest PCI MSI, and Base of ACPI OpRegion.
Legacy VGA Port I/O Virtualization
@ -443,17 +443,17 @@ handle the GPU interrupt virtualization by itself. Virtual GPU
interrupts are categorized into three types:
- Periodic GPU interrupts are emulated by timers. However, a notable
exception to this is the VBlank interrupt. Due to the demands of user space
compositors such as Wayland, which require a flip done event to be
synchronized with a VBlank, this interrupt is forwarded from the Service VM
to the User VM when the Service VM receives it from the hardware.
- Event-based GPU interrupts are emulated by the emulation logic (for
example, AUX Channel Interrupt).
- GPU command interrupts are emulated by a command parser and workload
dispatcher. The command parser marks out which GPU command interrupts
are generated during the command execution, and the workload
dispatcher injects those interrupts into the VM after the workload is
finished, as sketched below.
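As a rough C sketch of that last flow (the interrupt bit names and structures
are invented for illustration), the command scan records pending interrupt
events and the dispatcher injects them once the host reports completion::

   #include <stdint.h>
   #include <stdio.h>

   /* Illustrative interrupt bits that a command scan might mark as pending. */
   #define VGPU_IRQ_USER_INTERRUPT  (1u << 0)
   #define VGPU_IRQ_CTX_SWITCH      (1u << 1)

   struct vgpu_workload {
       uint32_t pending_irqs;      /* interrupts discovered during command scan */
       int      completed;
   };

   /* Command parser: record which interrupts the commands will generate. */
   static void scan_workload(struct vgpu_workload *w)
   {
       w->pending_irqs |= VGPU_IRQ_USER_INTERRUPT;
   }

   /* Workload dispatcher: once the host reports completion, inject the
    * recorded interrupts into the guest (stubbed out as a print here). */
   static void complete_workload(struct vgpu_workload *w)
   {
       w->completed = 1;
       if (w->pending_irqs & VGPU_IRQ_USER_INTERRUPT)
           printf("inject user interrupt into the VM\n");
       if (w->pending_irqs & VGPU_IRQ_CTX_SWITCH)
           printf("inject context-switch interrupt into the VM\n");
   }

   int main(void)
   {
       struct vgpu_workload w = { 0 };

       scan_workload(&w);
       complete_workload(&w);
       return 0;
   }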
@ -468,27 +468,27 @@ Workload Scheduler
------------------
The scheduling policy and workload scheduler are decoupled for
scalability reasons. For example, a future QoS enhancement will impact
only the scheduling policy, and any i915 interface change or hardware submission
interface change (from execlist to :term:`GuC`) will need only workload
scheduler updates.
The scheduling policy framework is the core of the vGPU workload
scheduling system. It controls all of the scheduling actions and
provides the developer with a generic framework for easy development of
scheduling policies. The scheduling policy framework controls the work
scheduling process without regard for how the workload is dispatched
or completed. All the detailed workload dispatching is hidden in the
workload scheduler, which is the actual executor of a vGPU workload.
The workload scheduler handles everything about one vGPU workload. Each
hardware ring is backed by one workload scheduler kernel thread. The
workload scheduler picks the workload from the current vGPU workload queue
and communicates with the virtual hardware submission interface to emulate the
"schedule-in" status for the vGPU. It performs context shadow, Command
Buffer scan and shadow, and PPGTT page table pin/unpin/out-of-sync before
submitting this workload to the host i915 driver. When the vGPU workload
is completed, the workload scheduler asks the virtual hardware submission
interface to emulate the "schedule-out" status for the vGPU. The VM
graphics driver then knows that a GPU workload is finished.
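A highly simplified C sketch of one iteration of such a per-ring scheduler
thread follows (states and function names are illustrative; real shadowing
and submission are stubbed out as comments)::

   #include <stdbool.h>
   #include <stdio.h>

   /* Illustrative workload states seen by a per-ring scheduler thread. */
   enum workload_state { QUEUED, SCHEDULED_IN, SUBMITTED, SCHEDULED_OUT };

   struct workload {
       int ring_id;
       enum workload_state state;
   };

   static bool pick_workload(struct workload *w, int ring_id)
   {
       /* A real scheduler would pop from the current vGPU's queue here. */
       w->ring_id = ring_id;
       w->state = QUEUED;
       return true;
   }

   /* One iteration of the per-ring workload scheduler thread. */
   static void schedule_one(int ring_id)
   {
       struct workload w;

       if (!pick_workload(&w, ring_id))
           return;

       w.state = SCHEDULED_IN;      /* emulate "schedule-in" for the vGPU      */
       /* context shadow, command buffer scan and shadow, and PPGTT pin would
        * happen here before handing the job to the host i915 driver           */
       w.state = SUBMITTED;
       /* ... host execution completes ...                                     */
       w.state = SCHEDULED_OUT;     /* emulate "schedule-out"; guest sees done */
       printf("ring %d: workload finished\n", w.ring_id);
   }

   int main(void)
   {
       schedule_one(0);
       return 0;
   }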
@ -504,11 +504,11 @@ Workload Submission Path
Software submits the workload using the legacy ring buffer mode on Intel
Processor Graphics before Broadwell, which is no longer supported by the
GVT-g virtual device model. A new hardware submission interface named
"Execlist" was introduced with Broadwell. With the new hardware submission
interface, software can achieve better programmability and easier
context management. In Intel GVT-g, the vGPU submits the workload
through the virtual hardware submission interface. Each workload in submission
will be represented as an ``intel_vgpu_workload`` data structure, a vGPU
workload, which will be put on a per-vGPU and per-engine workload queue
later after performing a few basic checks and verifications.
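The queue arrangement can be pictured with the following C sketch;
``vgpu_workload_desc`` is a simplified stand-in for the real
``intel_vgpu_workload`` structure, and the engine count and queue depth are
arbitrary::

   #include <stdio.h>

   #define NUM_ENGINES   4          /* illustrative engine count */
   #define QUEUE_DEPTH   8

   /* A simplified stand-in for the intel_vgpu_workload structure. */
   struct vgpu_workload_desc {
       unsigned int ctx_id;            /* which guest context submitted it */
       unsigned long ring_buffer_gpa;  /* guest address of the command buffer */
   };

   /* Per-vGPU state: one workload queue per engine. */
   struct vgpu_queues {
       struct vgpu_workload_desc q[NUM_ENGINES][QUEUE_DEPTH];
       int count[NUM_ENGINES];
   };

   /* Queue a workload on the submitting engine after basic checks. */
   static int queue_workload(struct vgpu_queues *v, int engine,
                             struct vgpu_workload_desc w)
   {
       if (engine < 0 || engine >= NUM_ENGINES)
           return -1;               /* basic verification failed */
       if (v->count[engine] == QUEUE_DEPTH)
           return -1;               /* queue full */
       v->q[engine][v->count[engine]++] = w;
       return 0;
   }

   int main(void)
   {
       struct vgpu_queues vq = { .count = { 0 } };
       struct vgpu_workload_desc w = { .ctx_id = 1, .ring_buffer_gpa = 0x1000 };

       if (queue_workload(&vq, 0, w) == 0)
           printf("queued on engine 0 (depth %d)\n", vq.count[0]);
       return 0;
   }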
@ -546,15 +546,15 @@ Direct Display Model
Direct Display Model
In a typical automotive use case, there are two displays in the car
and each one must show one domain's content, with the two domains
being the Instrument cluster and the In Vehicle Infotainment (IVI). As
shown in :numref:`direct-display`, this can be accomplished through the direct
display model of GVT-g, where the Service VM and User VM are each assigned all hardware
planes of two different pipes. GVT-g has a concept of display owner on a
per hardware plane basis. If it determines that a particular domain is the
owner of a hardware plane, then it allows the domain's MMIO register write to
flip a frame buffer to that plane to go through to the hardware. Otherwise,
such writes are blocked by GVT-g.
Indirect Display Model
@ -568,23 +568,23 @@ Indirect Display Model
Indirect Display Model
For security or fastboot reasons, it may be determined that the User VM is
either not allowed to display its content directly on the hardware or it may
be too late before it boots up and displays its content. In such a
scenario, the responsibility of displaying content on all displays lies
with the Service VM. One of the use cases that can be realized is to display the
entire frame buffer of the User VM on a secondary display. GVT-g allows for this
model by first trapping all MMIO writes by the User VM to the hardware. A proxy
application can then capture the address in GGTT where the User VM has written
its frame buffer and, with the help of the Hypervisor and the Service VM's i915
driver, convert the Guest Physical Addresses (GPAs) into Host
Physical Addresses (HPAs) before making a texture source or EGL image
out of the frame buffer and then either post-processing it further or
simply displaying it on a hardware plane of the secondary display.
GGTT-Based Surface Sharing
--------------------------
One of the major automotive use cases is called "surface sharing". This
use case requires that the Service VM accesses an individual surface or a set of
surfaces from the User VM without having to access the entire frame buffer of
the User VM. Unlike the previous two models, where the User VM did not have to do
@ -608,13 +608,13 @@ compositor, Mesa, and i915 driver had to be modified.
This model has a major benefit and a major limitation. The
benefit is that since it builds on top of the indirect display model,
there are no special drivers necessary for it on either Service VM or User VM.
Therefore, any Real Time Operating System (RTOS) that uses
this model can simply do so without having to implement a driver, the
infrastructure for which may not be present in its operating system.
The limitation of this model is that video memory dedicated for a User VM is
generally limited to a couple of hundred MBs. This can easily be
exhausted by a few application buffers, so the number and size of buffers
are limited. Since it is not a highly scalable model, in general, Intel
recommends the Hyper DMA buffer sharing model, described next.
Hyper DMA Buffer Sharing
@ -628,12 +628,12 @@ Hyper DMA Buffer Sharing
Hyper DMA Buffer Design
Another approach to surface sharing is Hyper DMA Buffer sharing. This
model extends the Linux DMA buffer sharing mechanism in which one driver is
able to share its pages with another driver within one domain.
Application buffers are backed by i915 Graphics Execution Manager
Buffer Objects (GEM BOs). As in GGTT surface
sharing, this model also requires compositor changes. The compositor of the
User VM requests i915 to export these application GEM BOs and then passes
them on to a special driver called the Hyper DMA Buf exporter, whose job
is to create a scatter-gather list of pages mapped by PDEs and PTEs and
@ -643,13 +643,13 @@ The compositor then shares this Hyper DMA Buf ID with the Service VM's Hyper DMA
Buf importer driver, which then maps the memory represented by this ID in
the Service VM. A proxy application in the Service VM can then provide the ID of this driver
to the Service VM i915, which can create its own GEM BO. Finally, the application
can use it as an EGL image and do any post-processing required before
either providing it to the Service VM compositor or directly flipping it on a
hardware plane in the compositor's absence.
This model is highly scalable and can be used to share up to 4 GB worth
of pages. It is also not limited to sharing graphics buffers. Other
buffers, such as those for the IPU, can also be shared with it. However, it
does require that the Service VM port the Hyper DMA Buffer importer driver. Also,
the Service VM must comprehend and implement the DMA buffer sharing model.
@ -671,8 +671,8 @@ Plane-Based Domain Ownership
Yet another mechanism for showing content of both the Service VM and User VM on the
same physical display is called plane-based domain ownership. Under this
model, both the Service VM and User VM are provided a set of hardware planes that they can
flip their contents onto. Since each domain provides its content, there
is no need for any extra composition to be done through the Service VM. The display
controller handles alpha blending contents of different domains on a
single pipe. This avoids extra complexity in either the Service VM or the User VM
software stack.
It is important to provide only specific planes and have them statically
assigned to different domains. To achieve this, the i915 driver of both
domains is provided a command-line parameter that specifies the exact
planes that this domain has access to. The i915 driver then enumerates
only those hardware planes and exposes them to its compositor. It is then left
to the compositor configuration to use these planes appropriately and
show the correct content on them. No other changes are necessary.
@ -691,7 +691,7 @@ quick to implement, it also has some drawbacks. First, since each domain
is responsible for showing the content on the screen, there is no
control of the User VM by the Service VM. If the User VM is untrusted, this could
potentially cause some unwanted content to be displayed. Also, there is
no post-processing capability, except that provided by the display
controller (for example, scaling, rotation, and so on). So each domain
must provide finished buffers with the expectation that alpha blending
with another domain will not cause any corruption or unwanted artifacts.
@ -705,7 +705,7 @@ from the VM. For the global graphics memory space, GVT-g uses graphics
memory resource partitioning and an address space ballooning mechanism.
For local graphics memory spaces, GVT-g implements per-VM local graphics
memory through a render context switch because local graphics memory is
accessible only by the GPU.
Global Graphics Memory
----------------------
@ -717,7 +717,7 @@ GVT-g partitions the global graphics memory among VMs. Splitting the
CPU/GPU scheduling mechanism requires that the global graphics memory of
different VMs can be accessed by the CPU and the GPU simultaneously.
Consequently, GVT-g must, at any time, present each VM with its own
resource, leading to the resource partitioning approach for global
graphics memory, as shown in :numref:`mem-part`.
.. figure:: images/APL_GVT-g-mem-part.png
@ -727,7 +727,7 @@ graphics memory, as shown in :numref:`mem-part`.
Memory Partition and Ballooning
The performance impact of reduced global graphics memory resources
due to memory partitioning is very limited according to various test
results.
@ -740,7 +740,7 @@ partitioning information to the VM graphics driver through the PVINFO
MMIO window. The graphics driver marks the other VMs' regions as
'ballooned', and reserves them as not being used from its graphics
memory allocator. Under this design, the guest view of global graphics
memory space is exactly the same as the host view, and the driver-programmed
addresses, using guest physical addresses, can be used directly
by the hardware. Address space ballooning is different from traditional
memory ballooning techniques. Memory ballooning is for memory usage
@ -756,7 +756,7 @@ Per-VM Local Graphics Memory
GVT-g allows each VM to use the full local graphics memory spaces of its
own, similar to the virtual address spaces on the CPU. The local
graphics memory spaces are visible only to the Render Engine in the GPU.
Therefore, any valid local graphics memory address, programmed by a VM,
can be used directly by the GPU. The GVT-g device model switches the
local graphics memory spaces between VMs when switching render
@ -796,13 +796,13 @@ Per-VM Shadow PPGTT
-------------------
To support local graphics memory access passthrough, GVT-g implements
per-VM shadow local page tables. The local graphics memory is accessible
only from the Render Engine. The local page tables have two-level
paging structures, as shown in :numref:`per-vm-shadow`.
The first level, Page Directory Entries (PDEs), located in the global
page table, points to the second level, Page Table Entries (PTEs) in
system memory, so guest accesses to the PDE are trapped and emulated
through the implementation of the shared shadow global page table.
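A toy C sketch of this trap-and-emulate flow for a guest page table update
follows (the translation function and fixed offset are stand-ins for the
hypervisor's real GPA-to-HPA mapping)::

   #include <stdint.h>
   #include <stdio.h>

   /* Toy address-space parameters for the sketch. */
   #define PAGE_SHIFT 12
   #define NUM_PTES   16

   static uint64_t guest_pt[NUM_PTES];    /* guest-visible PTEs (hold GPAs)     */
   static uint64_t shadow_pt[NUM_PTES];   /* shadow PTEs used by the GPU (HPAs) */

   /* Stand-in for the hypervisor's GPA -> HPA translation. */
   static uint64_t gpa_to_hpa(uint64_t gpa)
   {
       return gpa + 0x100000000ull;       /* illustrative fixed offset mapping */
   }

   /* Called when a trapped guest page table entry is written: emulate the
    * guest update and keep the shadow entry in sync with the translated HPA. */
   static void handle_pte_write(unsigned int index, uint64_t guest_entry)
   {
       uint64_t gpa = guest_entry & ~((1ull << PAGE_SHIFT) - 1);
       uint64_t flags = guest_entry & ((1ull << PAGE_SHIFT) - 1);

       guest_pt[index]  = guest_entry;                /* what the guest reads  */
       shadow_pt[index] = gpa_to_hpa(gpa) | flags;    /* what the GPU walks    */
   }

   int main(void)
   {
       handle_pte_write(3, 0x0000000012345000ull | 0x1 /* present */);
       printf("guest PTE[3]  = 0x%016llx\n", (unsigned long long)guest_pt[3]);
       printf("shadow PTE[3] = 0x%016llx\n", (unsigned long long)shadow_pt[3]);
       return 0;
   }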
GVT-g also write-protects a list of guest PTE pages for each VM. The
@ -838,11 +838,11 @@ In the system, there are three different schedulers for the GPU:
- Mediator GVT scheduler
- i915 Service VM scheduler
Because the User VM always uses the host-based command submission (ELSP) model
and it never accesses the GPU or the Graphic Micro Controller (:term:`GuC`)
directly, its scheduler cannot do any preemption by itself.
The i915 scheduler does ensure that batch buffers are
submitted in dependency order; that is, if a compositor has to wait for
an application buffer to finish before its workload can be submitted to
the GPU, then the i915 scheduler of the User VM ensures that this happens.
@ -879,23 +879,23 @@ context to preempt the current running context and then wait for the GPU
engine to be idle.
While the identification of workloads to be preempted is decided by
customizable scheduling policies, the i915 scheduler simply submits a
preemption request to the :term:`GuC` high-priority queue once a candidate for
preemption is identified. Based on the hardware's ability to preempt (on an
Apollo Lake SoC, a 3D workload is preemptible at the 3D primitive level with
some exceptions), the currently executing workload is saved and
preempted. The :term:`GuC` informs the driver of a preemption
event through an interrupt. After handling the interrupt, the driver submits the
high-priority workload through the normal priority :term:`GuC` queue. As such,
the normal priority :term:`GuC` queue is used for actual execbuf submission most
of the time, with the high-priority :term:`GuC` queue being used only for the
preemption of lower-priority workloads.
Scheduling policies are customizable and left to customers to change if
they are not satisfied with the built-in i915 driver policy, where all
workloads of the Service VM are considered higher priority than those of the
User VM. This policy can be enforced through a Service VM i915 kernel command-line
parameter and can replace the default in-order command submission (no
preemption) policy.
AcrnGT
@ -903,7 +903,7 @@ AcrnGT
ACRN is a flexible, lightweight reference hypervisor, built with
real-time and safety-criticality in mind, optimized to streamline
embedded development through an open-source platform.
AcrnGT is the GVT-g implementation on the ACRN hypervisor. It adapts
the MPT interface of GVT-g onto ACRN by using the kernel APIs provided
@ -935,7 +935,7 @@ application:
hypervisor through hyper-calls.
- It provides user space interfaces through ``sysfs`` to the user space
ACRN-DM so that DM can manage the lifecycle of the virtual GPUs.
AcrnGT in DM
=============