mirror of
https://github.com/projectacrn/acrn-hypervisor.git
synced 2025-07-02 02:03:20 +00:00
doc: Style cleanup in CPU virt hld
Style changes per Acrolinx recommendations and for consistency

Signed-off-by: Reyes, Amy <amy.reyes@intel.com>
parent 81336a8ee4
commit 09980e778e
@ -18,13 +18,13 @@ Based on Intel VT-x virtualization technology, ACRN emulates a virtual CPU
|
||||
- **core partition**: one vCPU is dedicated and associated with one
|
||||
physical CPU (pCPU),
|
||||
making much of the hardware register emulation simply
|
||||
passthrough. This provides good isolation for physical interrupts
|
||||
passthrough. This method provides good isolation for physical interrupts
|
||||
and guest execution. (See `Static CPU Partitioning`_ for more
|
||||
information.)
|
||||
|
||||
- **core sharing** (to be added): two or more vCPUs share one
|
||||
physical CPU (pCPU). A more complicated context switch is needed
|
||||
between different vCPUs' switching. This provides flexible computing
|
||||
when switching between different vCPUs. This method provides flexible computing
|
||||
resource sharing for vCPU tasks with low performance demands.
|
||||
(See `Flexible CPU Sharing`_ for more information.)
|
||||
|
||||
@ -36,47 +36,49 @@ Based on Intel VT-x virtualization technology, ACRN emulates a virtual CPU
|
||||
the vCPU thread for emulating a guest CPU, switching between VMX root
|
||||
mode and non-root mode. A CPU schedules out to default idle when an
|
||||
operation needs it to stay in VMX root mode, such as when waiting for
|
||||
an I/O request from the DM or when ready to destroy.
|
||||
an I/O request from the Device Model (DM) or when ready to destroy.
|
||||
|
||||
- **round-robin scheduler** (to be added): allows more vCPU thread loops
|
||||
to run on a CPU. A CPU switches among different vCPU threads and default
|
||||
idle threads as their timeslices run out or when a thread must
schedule out, such as when waiting for an I/O request. A vCPU can yield
|
||||
itself as well, such as when it executes "PAUSE" instruction.
|
||||
itself as well, such as when it executes a "PAUSE" instruction.
|
||||
|
||||
|
||||
Static CPU Partitioning
|
||||
***********************
|
||||
|
||||
CPU partitioning is a policy for mapping a virtual
|
||||
CPU (vCPU) to a physical CPU. To enable this, the ACRN hypervisor can
|
||||
CPU (vCPU) to a physical CPU. To enable this feature, the ACRN hypervisor can
|
||||
configure a noop scheduler as the schedule policy for this physical CPU.
|
||||
|
||||
ACRN then forces a fixed 1:1 mapping between a vCPU and this physical CPU
|
||||
when creating a vCPU for the guest Operating System. This makes the vCPU
|
||||
when creating a vCPU for the guest operating system. This makes the vCPU
|
||||
management code much simpler.
|
||||
|
||||
``cpu_affinity`` in ``vm config`` helps to decide which physical CPU a
|
||||
VCPU in a VM affines to, then finalize the fixed mapping. When launching a
|
||||
User VM, need to choose pCPUs from the VM's cpu_affinity that are not
|
||||
used by any other VMs.
|
||||
ACRN uses the ``cpu_affinity`` parameter in ``vm config`` to decide which
|
||||
physical CPU to map to a vCPU in a VM, then finalizes the fixed mapping. When
|
||||
launching a User VM, choose pCPUs from the VM's ``cpu_affinity`` that
|
||||
are not used by any other VMs.
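
The non-overlap rule for statically partitioned pCPUs can be sketched as a
bitmap check. This is a minimal illustration, not ACRN code: it assumes each
VM's ``cpu_affinity`` is held as a 64-bit mask in which bit *n* stands for
pCPU *n*, and the function name ``pcpus_are_exclusive`` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when no pCPU bit appears in more than
 * one VM's affinity mask, which is what a fixed 1:1 partition requires. */
bool pcpus_are_exclusive(const uint64_t *affinity, size_t vm_count)
{
    uint64_t used = 0;

    assert(affinity != NULL || vm_count == 0);
    for (size_t i = 0; i < vm_count; i++) {
        if ((used & affinity[i]) != 0) {
            return false;   /* a pCPU is already claimed by an earlier VM */
        }
        used |= affinity[i];
    }
    return true;
}
```

With masks ``0x3`` and ``0xC`` the check passes because the two VMs use
disjoint pCPUs; with ``0x3`` and ``0x6`` it fails because pCPU 1 appears in
both masks.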
|
||||
|
||||
Flexible CPU Sharing
|
||||
********************
|
||||
|
||||
To enable CPU sharing, ACRN hypervisor can configure IORR
|
||||
To enable CPU sharing, the ACRN hypervisor can configure the IORR
|
||||
(IO sensitive Round-Robin) or the BVT (Borrowed Virtual Time) scheduler
|
||||
policy.
|
||||
|
||||
``cpu_affinity`` in ``vm config`` indicates all the physical CPUs on which
|
||||
this VM is allowed to run. A pCPU can be shared among a Service VM and any
|
||||
User VM as long as the local APIC passthrough is not enabled in that User
|
||||
The ``cpu_affinity`` parameter in ``vm config`` indicates all the physical CPUs
|
||||
on which this VM is allowed to run. A pCPU can be shared among a Service VM and
|
||||
any User VM as long as the local APIC passthrough is not enabled in that User
|
||||
VM.
|
||||
|
||||
See :ref:`cpu_sharing` for more information.
|
||||
|
||||
.. _hv-cpu-virt-cpu-mgmt-partition:
|
||||
|
||||
CPU Management in the Service VM Under Static CPU Partitioning
|
||||
==============================================================
|
||||
**************************************************************
|
||||
|
||||
With ACRN, all ACPI table entries are passed through to the Service VM, including
the Multiple APIC Description Table (MADT). The Service VM sees all
|
||||
@ -84,7 +86,7 @@ physical CPUs by parsing the MADT when the Service VM kernel boots. All
|
||||
physical CPUs are initially assigned to the Service VM by creating the same
|
||||
number of virtual CPUs.
|
||||
|
||||
When the Service VM boot is finished, it releases the physical CPUs intended
|
||||
After the Service VM boots, it releases the physical CPUs intended
|
||||
for User VM use.
|
||||
|
||||
Here is an example flow of CPU allocation on a multi-core platform.
|
||||
@ -94,35 +96,33 @@ Here is an example flow of CPU allocation on a multi-core platform.
|
||||
:align: center
|
||||
:name: static-core-cpu-allocation
|
||||
|
||||
CPU allocation on a multi-core platform
|
||||
CPU Allocation on a Multi-core Platform
|
||||
|
||||
CPU Management in the Service VM Under Flexible CPU Sharing
|
||||
===========================================================
|
||||
***********************************************************
|
||||
|
||||
As all Service VM CPUs could share with different User VMs, ACRN can still passthrough
|
||||
MADT to Service VM, and the Service VM is still able to see all physical CPUs.
|
||||
|
||||
But as under CPU sharing, the Service VM does not need offline/release the physical
|
||||
CPUs intended for User VM use.
|
||||
The Service VM sees all physical CPUs via the MADT, as described in
|
||||
:ref:`hv-cpu-virt-cpu-mgmt-partition`. However, the Service VM does not release
|
||||
the physical CPUs intended for User VM use.
|
||||
|
||||
CPU Management in the User VM
|
||||
=============================
|
||||
*****************************
|
||||
|
||||
``cpu_affinity`` in ``vm config`` defines a set of pCPUs that a User VM
|
||||
is allowed to run on. acrn-dm could choose to launch on only a subset of the pCPUs
|
||||
or on all pCPUs listed in cpu_affinity, but it can't assign
|
||||
any pCPU that is not included in it.
|
||||
The ``cpu_affinity`` parameter in ``vm config`` defines a set of pCPUs that a
|
||||
User VM is allowed to run on. The Device Model can launch a User VM on only a
|
||||
subset of the pCPUs or on all pCPUs listed in ``cpu_affinity``, but it cannot
|
||||
assign any pCPU that is not included in it.
|
||||
|
||||
CPU Assignment Management in HV
|
||||
===============================
|
||||
CPU Assignment Management in the Hypervisor
|
||||
*******************************************
|
||||
|
||||
The physical CPU assignment is predefined by ``cpu_affinity`` in
|
||||
``vm config``, while post-launched VMs could be launched on pCPUs that are
|
||||
a subset of it.
|
||||
|
||||
Currently, the ACRN hypervisor does not support virtual CPU migration to
|
||||
different physical CPUs. This means no changes to the virtual CPU to
|
||||
physical CPU can happen without first calling offline_vcpu.
|
||||
different physical CPUs. No changes to the mapping of the virtual CPU to
|
||||
physical CPU can happen without first calling ``offline_vcpu``.
|
||||
|
||||
|
||||
.. _vCPU_lifecycle:
|
||||
@ -134,26 +134,26 @@ A vCPU lifecycle is shown in :numref:`hv-vcpu-transitions` below, where
|
||||
the major states are:
|
||||
|
||||
- **VCPU_INIT**: vCPU is in an initialized state, and its vCPU thread
|
||||
is not ready to run on its associated CPU
|
||||
is not ready to run on its associated CPU.
|
||||
|
||||
- **VCPU_RUNNING**: vCPU is running, and its vCPU thread is ready (in
|
||||
the queue) or running on its associated CPU
|
||||
the queue) or running on its associated CPU.
|
||||
|
||||
- **VCPU_PAUSED**: vCPU is paused, and its vCPU thread is not running
|
||||
on its associated CPU
|
||||
on its associated CPU.
|
||||
|
||||
- **VPCU_ZOMBIE**: vCPU is being offline, and its vCPU thread is not
|
||||
running on its associated CPU
|
||||
- **VCPU_ZOMBIE**: vCPU is transitioning to an offline state, and its vCPU thread is
|
||||
not running on its associated CPU.
|
||||
|
||||
- **VPCU_OFFLINE**: vCPU is offline
|
||||
- **VCPU_OFFLINE**: vCPU is offline.
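
The states listed above can be modeled as a small transition table. The
sketch below is illustrative only: the set of allowed transitions is an
assumption inferred from the state descriptions in this section, not a copy
of the ACRN state diagram, and ``vcpu_transition_ok`` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric values are arbitrary in this sketch. */
enum vcpu_state {
    VCPU_OFFLINE = 0,
    VCPU_INIT,
    VCPU_RUNNING,
    VCPU_PAUSED,
    VCPU_ZOMBIE,
    VCPU_STATE_COUNT
};

/* ASSUMED transition set: allowed[from][to] is true when this sketch
 * permits the move. Entries not listed default to false. */
static const bool allowed[VCPU_STATE_COUNT][VCPU_STATE_COUNT] = {
    [VCPU_OFFLINE] = { [VCPU_INIT] = true },                          /* create */
    [VCPU_INIT]    = { [VCPU_RUNNING] = true, [VCPU_ZOMBIE] = true }, /* launch */
    [VCPU_RUNNING] = { [VCPU_PAUSED] = true, [VCPU_ZOMBIE] = true },  /* pause */
    [VCPU_PAUSED]  = { [VCPU_RUNNING] = true, [VCPU_ZOMBIE] = true }, /* resume */
    [VCPU_ZOMBIE]  = { [VCPU_INIT] = true, [VCPU_OFFLINE] = true },   /* reset, offline */
};

bool vcpu_transition_ok(enum vcpu_state from, enum vcpu_state to)
{
    assert((int)from >= 0 && (int)from < VCPU_STATE_COUNT);
    assert((int)to >= 0 && (int)to < VCPU_STATE_COUNT);
    return allowed[from][to];
}
```

A table keeps the policy in one place, so a rejected transition such as
offline to running is a lookup result rather than scattered conditionals.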
|
||||
|
||||
.. figure:: images/hld-image17.png
|
||||
:align: center
|
||||
:name: hv-vcpu-transitions
|
||||
|
||||
ACRN vCPU state transitions
|
||||
ACRN vCPU State Transitions
|
||||
|
||||
Following functions are used to drive the state machine of the vCPU
|
||||
The following functions are used to drive the state machine of the vCPU
|
||||
lifecycle:
|
||||
|
||||
.. doxygenfunction:: create_vcpu
|
||||
@ -176,21 +176,20 @@ vCPU Scheduling Under Static CPU Partitioning
|
||||
:align: center
|
||||
:name: hv-vcpu-schedule
|
||||
|
||||
ACRN vCPU scheduling flow under static CPU partitioning
|
||||
ACRN vCPU Scheduling Flow Under Static CPU Partitioning
|
||||
|
||||
As describes in the CPU virtualization overview, if under static
|
||||
CPU partitioning, ACRN implements a simple scheduling mechanism
|
||||
based on two threads: vcpu_thread and default_idle. A vCPU with
|
||||
VCPU_RUNNING state always runs in a vcpu_thread loop, meanwhile
|
||||
a vCPU with VCPU_PAUSED or VCPU_ZOMBIE state runs in default_idle
|
||||
loop. The detail behaviors in vcpu_thread and default_idle threads
|
||||
For static CPU partitioning, ACRN implements a simple scheduling mechanism
|
||||
based on two threads: vcpu_thread and default_idle. A vCPU in the
|
||||
VCPU_RUNNING state always runs in a vcpu_thread loop.
|
||||
A vCPU in the VCPU_PAUSED or VCPU_ZOMBIE state runs in a default_idle
|
||||
loop. The behaviors in the vcpu_thread and default_idle threads
|
||||
are illustrated in :numref:`hv-vcpu-schedule`:
|
||||
|
||||
- The **vcpu_thread** loop handles VM exits and pending requests around
the VM entry/exit.
It also checks for a reschedule request and then schedules out to
|
||||
default_idle if necessary. See `vCPU Thread`_ for more details
|
||||
of vcpu_thread.
|
||||
about vcpu_thread.
|
||||
|
||||
- The **default_idle** loop simply does do_cpu_idle while also
|
||||
checking for need-offline and reschedule requests.
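
The checks that the two loops make can be summarized as one decision
function. This is a simplified model, assuming two per-pCPU flags; the type
and function names (``pcpu_flags``, ``next_thread``) are hypothetical and do
not appear in ACRN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum thread_kind { THREAD_VCPU, THREAD_IDLE, THREAD_STOPPED };

/* Illustrative per-pCPU request flags. */
struct pcpu_flags {
    bool need_reschedule;   /* ask the current thread to schedule out */
    bool need_offline;      /* ask the idle loop to take the pCPU offline */
};

/* Decide which thread runs next on the pCPU:
 * - the idle loop honors an offline request first,
 * - either loop switches to the other thread on a reschedule request,
 * - otherwise the current thread keeps running. */
enum thread_kind next_thread(enum thread_kind cur, struct pcpu_flags *f)
{
    assert(f != NULL);
    if (cur == THREAD_STOPPED) {
        return THREAD_STOPPED;
    }
    if (cur == THREAD_IDLE && f->need_offline) {
        return THREAD_STOPPED;
    }
    if (f->need_reschedule) {
        f->need_reschedule = false;          /* the request is consumed */
        return (cur == THREAD_VCPU) ? THREAD_IDLE : THREAD_VCPU;
    }
    return cur;
}
```

In this model a paused vCPU corresponds to the pCPU staying in
``THREAD_IDLE`` until a later reschedule request moves it back to
``THREAD_VCPU``.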
|
||||
@ -206,23 +205,23 @@ Some example scenario flows are shown here:
|
||||
.. figure:: images/hld-image7.png
|
||||
:align: center
|
||||
|
||||
ACRN vCPU scheduling scenarios
|
||||
ACRN vCPU Scheduling Scenarios
|
||||
|
||||
- **During starting a VM**: after create a vCPU, BSP calls *launch_vcpu*
|
||||
through *start_vm*, AP calls *launch_vcpu* through vlapic
|
||||
INIT-SIPI emulation, finally this vCPU runs in a
|
||||
*vcpu_thread* loop.
|
||||
- **During VM startup**: after a vCPU is created, the bootstrap processor (BSP)
|
||||
calls *launch_vcpu* through *start_vm*. The application processor (AP) calls
|
||||
*launch_vcpu* through vLAPIC INIT-SIPI emulation. Finally, this vCPU runs in
|
||||
a *vcpu_thread* loop.
|
||||
|
||||
- **During shutting down a VM**: *pause_vm* function call makes a vCPU
|
||||
- **During VM shutdown**: *pause_vm* function forces a vCPU
|
||||
running in *vcpu_thread* to schedule out to *default_idle*. The
|
||||
following *reset_vcpu* and *offline_vcpu* de-init and then offline
|
||||
this vCPU instance.
|
||||
|
||||
- **During IOReq handling**: after an IOReq is sent to DM for emulation, a
|
||||
vCPU running in *vcpu_thread* schedules out to *default_idle*
|
||||
through *acrn_insert_request_wait->pause_vcpu*. After DM
|
||||
complete the emulation for this IOReq, it calls
|
||||
*hcall_notify_ioreq_finish->resume_vcpu* and makes the vCPU
|
||||
through *acrn_insert_request_wait->pause_vcpu*. After the DM
|
||||
completes the emulation for this IOReq, it calls
|
||||
*hcall_notify_ioreq_finish->resume_vcpu* and changes the vCPU
|
||||
schedule back to *vcpu_thread* to continue its guest execution.
|
||||
|
||||
vCPU Scheduling Under Flexible CPU Sharing
|
||||
@ -238,7 +237,7 @@ The vCPU thread flow is a loop as shown and described below:
|
||||
.. figure:: images/hld-image68.png
|
||||
:align: center
|
||||
|
||||
ACRN vCPU thread
|
||||
ACRN vCPU Thread
|
||||
|
||||
|
||||
1. Check if *vcpu_thread* needs to schedule out to *default_idle* or
|
||||
@ -251,7 +250,7 @@ The vCPU thread flow is a loop as shown and described below:
|
||||
3. VM Enter by calling *start/run_vcpu*, then enter non-root mode to do
|
||||
guest execution.
|
||||
|
||||
4. VM Exit from *start/run_vcpu* when guest trigger VM exit reason in
|
||||
4. VM Exit from *start/run_vcpu* when the guest triggers a VM exit reason in
|
||||
non-root mode.
|
||||
|
||||
5. Handle VM exit based on specific reason.
|
||||
@ -272,8 +271,8 @@ categories:
|
||||
|
||||
- Always save/restore during VM exit/entry:
|
||||
|
||||
- These registers must be saved every time VM exit, and restored
|
||||
every time VM entry
|
||||
- These registers must be saved for each VM exit, and restored
|
||||
for each VM entry
|
||||
- Registers include: general purpose registers, CR2, and
|
||||
IA32_SPEC_CTRL
|
||||
- Definition in *vcpu->run_context*
|
||||
@ -360,7 +359,7 @@ that will trigger an error message and return without handling:
|
||||
|
||||
* - **VM Exit Reason**
|
||||
- **Handler**
|
||||
- **Desc**
|
||||
- **Description**
|
||||
|
||||
* - VMX_EXIT_REASON_EXCEPTION_OR_NMI
|
||||
- exception_vmexit_handler
|
||||
@ -372,11 +371,11 @@ that will trigger an error message and return without handling:
|
||||
|
||||
* - VMX_EXIT_REASON_TRIPLE_FAULT
|
||||
- triple_fault_vmexit_handler
|
||||
- Handle triple fault from vcpu
|
||||
- Handle triple fault from vCPU
|
||||
|
||||
* - VMX_EXIT_REASON_INIT_SIGNAL
|
||||
- init_signal_vmexit_handler
|
||||
- Handle INIT signal from vcpu
|
||||
- Handle INIT signal from vCPU
|
||||
|
||||
* - VMX_EXIT_REASON_INTERRUPT_WINDOW
|
||||
- interrupt_window_vmexit_handler
|
||||
@ -482,7 +481,7 @@ running). See :ref:`vcpu-request-interrupt-injection` for details.
|
||||
}
|
||||
}
|
||||
|
||||
For each request, function *acrn_handle_pending_request* handles each
|
||||
The function *acrn_handle_pending_request* handles each
|
||||
request as shown below.
|
||||
|
||||
|
||||
@ -491,7 +490,7 @@ request as shown below.
|
||||
:header-rows: 1
|
||||
|
||||
* - **Request**
|
||||
- **Desc**
|
||||
- **Description**
|
||||
- **Request Maker**
|
||||
- **Request Handler**
|
||||
|
||||
@ -504,7 +503,7 @@ request as shown below.
|
||||
on exception priority
|
||||
|
||||
* - ACRN_REQUEST_EVENT
|
||||
- Request for vlapic interrupt vector injection
|
||||
- Request for vLAPIC interrupt vector injection
|
||||
- vlapic_fire_lvt or vlapic_set_intr, which could be triggered
|
||||
by vLAPIC LVT, vIOAPIC, or vMSI
|
||||
- vcpu_do_pending_event
|
||||
@ -517,10 +516,10 @@ request as shown below.
|
||||
* - ACRN_REQUEST_NMI
|
||||
- Request for NMI injection
|
||||
- vcpu_inject_nmi
|
||||
- program VMX_ENTRY_INT_INFO_FIELD directly
|
||||
- Program VMX_ENTRY_INT_INFO_FIELD directly
|
||||
|
||||
* - ACRN_REQUEST_EOI_EXIT_BITMAP_UPDATE
|
||||
- Request for update VEOI bitmap update for level triggered vector
|
||||
- Request for VEOI bitmap update for level triggered vector
|
||||
- vlapic_reset_tmr or vlapic_set_tmr change trigger mode in RTC
|
||||
- vcpu_set_vmcs_eoi_exit
|
||||
|
||||
@ -546,27 +545,26 @@ request as shown below.
|
||||
VMX Initialization
|
||||
******************
|
||||
|
||||
ACRN will attempt to initialize the vCPU's VMCS before its first
|
||||
launch with the host state, execution control, guest state,
|
||||
entry control and exit control, as shown in the table below.
|
||||
ACRN attempts to initialize the vCPU's VMCS before its first
|
||||
launch. ACRN sets the host state, execution control, guest state,
|
||||
entry control, and exit control, as shown in the table below.
|
||||
|
||||
The table briefly shows how each field got configured.
|
||||
The guest state field is critical for a guest CPU start to run
|
||||
The table briefly shows how each field is configured.
|
||||
The guest state field is critical for running a guest CPU
|
||||
based on different CPU modes.
|
||||
|
||||
For a guest vCPU's state initialization:
|
||||
|
||||
- If it's BSP, the guest state configuration is done in SW load,
|
||||
which could be initialized by different objects:
|
||||
|
||||
- The Service VM BSP: hypervisor will do context initialization in different
|
||||
SW load based on different boot mode
|
||||
- If it's BSP, the guest state configuration is done in software load,
|
||||
which can be initialized by different objects:
|
||||
|
||||
- Service VM BSP: Hypervisor does context initialization in different
|
||||
software load based on different boot mode
|
||||
|
||||
- User VM BSP: DM context initialization through hypercall
|
||||
|
||||
- If it's AP, then it will always start from real mode, and the start
|
||||
vector will always come from vlapic INIT-SIPI emulation.
|
||||
- If it's AP, it always starts from real mode, and the start
|
||||
vector always comes from vLAPIC INIT-SIPI emulation.
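
For the AP case, the real-mode start state follows directly from the SIPI
vector. The sketch below shows only that arithmetic; the structure and
function names are hypothetical and the field set is far smaller than a real
guest register context.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal real-mode start state; field names are illustrative only. */
struct ap_start_state {
    uint16_t cs_selector;
    uint32_t cs_base;
    uint16_t ip;
};

/* A SIPI with vector VV starts the AP at physical address 0xVV000:
 * CS selector = VV << 8, CS base = VV << 12, IP = 0. */
struct ap_start_state ap_state_from_sipi(uint8_t sipi_vector)
{
    struct ap_start_state s;

    s.cs_selector = (uint16_t)((uint16_t)sipi_vector << 8);
    s.cs_base     = (uint32_t)sipi_vector << 12;
    s.ip          = 0;
    /* Real-mode rule: segment base is the selector shifted left by 4. */
    assert(s.cs_base == ((uint32_t)s.cs_selector << 4));
    return s;
}
```

For example, a SIPI vector of ``0x10`` gives a CS selector of ``0x1000`` and
a CS base of ``0x10000``.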
|
||||
|
||||
.. doxygenstruct:: acrn_regs
|
||||
:project: Project ACRN
|
||||
@ -605,7 +603,7 @@ For a guest vCPU's state initialization:
|
||||
- n/a
|
||||
- Set to 0
|
||||
|
||||
* - **exec control**
|
||||
* - **execution control**
|
||||
- VMX_PIN_VM_EXEC_CONTROLS
|
||||
- 0
|
||||
- Enable external-interrupt exiting
|
||||
@ -759,20 +757,20 @@ For a guest vCPU's state initialization:
|
||||
CPUID Virtualization
|
||||
********************
|
||||
|
||||
CPUID access from guest would cause VM exits unconditionally if executed
|
||||
CPUID access from a guest would cause VM exits unconditionally if executed
|
||||
as a VMX non-root operation. ACRN must return the emulated processor
|
||||
identification and feature information in the EAX, EBX, ECX, and EDX
|
||||
registers.
|
||||
|
||||
To simplify, ACRN returns the same values from the physical CPU for most
|
||||
of the CPUID, and specially handle a few CPUID features which are APIC
|
||||
of the CPUID, and specially handles a few CPUID features that are APIC
|
||||
ID related such as CPUID.01H.
|
||||
|
||||
ACRN emulates some extra CPUID features for the hypervisor as well.
|
||||
|
||||
There is a per-vm *vcpuid_entries* array, initialized during VM creation
|
||||
The per-VM *vcpuid_entries* array is initialized during VM creation
|
||||
and used to cache most of the CPUID entries for each VM. During guest
|
||||
CPUID emulation, ACRN will read the cached value from this array, except
|
||||
CPUID emulation, ACRN reads the cached value from this array, except
|
||||
some APIC ID-related CPUID data emulated at runtime.
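
A cached lookup of this kind might look like the following. It is a minimal
sketch, assuming a flat array searched linearly; the structure
``vcpuid_entry_sketch`` and the function ``find_vcpuid_entry`` are
hypothetical, and the real ACRN entry layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative cache entry, keyed by CPUID leaf and subleaf. */
struct vcpuid_entry_sketch {
    uint32_t leaf, subleaf;
    uint32_t eax, ebx, ecx, edx;
};

/* Linear search of the per-VM cache. Returns NULL when the leaf is not
 * cached, in which case a caller would fall back to runtime emulation. */
const struct vcpuid_entry_sketch *
find_vcpuid_entry(const struct vcpuid_entry_sketch *entries, size_t count,
                  uint32_t leaf, uint32_t subleaf)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (entries[i].leaf == leaf && entries[i].subleaf == subleaf) {
            return &entries[i];
        }
    }
    return NULL;
}
```

The cache is filled once at VM creation, so the per-exit cost is a read-only
search; only values that depend on the executing vCPU need to be computed at
exit time.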
|
||||
|
||||
This table describes details for CPUID emulation:
|
||||
@ -809,8 +807,8 @@ This table describes details for CPUID emulation:
|
||||
|
||||
* - 16H
|
||||
- - Get from per-vm CPUID entries cache
|
||||
- If physical CPU support CPUID.16H, read from physical CPUID
|
||||
- If physical CPU does not support it, emulate with tsc freq
|
||||
- If physical CPU supports CPUID.16H, read from physical CPUID
|
||||
- If physical CPU does not support it, emulate with TSC frequency
|
||||
|
||||
* - 40000000H
|
||||
- - Get from per-vm CPUID entries cache
|
||||
@ -819,41 +817,41 @@ This table describes details for CPUID emulation:
|
||||
|
||||
* - 40000010H
|
||||
- - Get from per-vm CPUID entries cache
|
||||
- EAX: virtual TSC frequency in KHz
|
||||
- EAX: virtual TSC frequency in kHz
|
||||
- EBX, ECX, EDX: reserved to 0
|
||||
|
||||
* - 0AH
|
||||
- - PMU Currently disabled
|
||||
- - PMU currently disabled
|
||||
|
||||
* - 0FH, 10H
|
||||
- - Intel RDT Currently disabled
|
||||
- - Intel RDT currently disabled
|
||||
|
||||
* - 12H
|
||||
- - Fill according to SGX virtualization
|
||||
|
||||
* - 14H
|
||||
- - Intel Processor Trace Currently disabled
|
||||
- - Intel Processor Trace currently disabled
|
||||
|
||||
* - Others
|
||||
- - Get from per-vm CPUID entries cache
|
||||
|
||||
.. note:: ACRN needs to take care of
|
||||
some CPUID values that can change at runtime, for example, XD feature in
|
||||
CPUID.80000001H may be cleared by MISC_ENABLE MSR.
|
||||
some CPUID values that can change at runtime, for example, the XD feature in
|
||||
CPUID.80000001H may be cleared by the MISC_ENABLE MSR.
|
||||
|
||||
|
||||
MSR Virtualization
|
||||
******************
|
||||
|
||||
ACRN always enables MSR bitmap in *VMX_PROC_VM_EXEC_CONTROLS* VMX
|
||||
ACRN always enables an MSR bitmap in the *VMX_PROC_VM_EXEC_CONTROLS* VMX
|
||||
execution control field. This bitmap marks the MSRs to cause a VM
|
||||
exit upon guest access for both read and write. The VM
|
||||
exit reason for reading or writing these MSRs is respectively
|
||||
*VMX_EXIT_REASON_RDMSR* or *VMX_EXIT_REASON_WRMSR* and the VM exit
|
||||
handler is *rdmsr_vmexit_handler* or *wrmsr_vmexit_handler*.
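
The MSR bitmap is a 4-KByte region split into four 1-KByte areas: read-low,
read-high, write-low, and write-high. The sketch below shows how one MSR's
bits could be set in such a region. The helper name
``msr_bitmap_intercept`` is hypothetical and is not the ACRN function; only
the region layout comes from the VMX architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of the four 1-KByte areas inside the 4-KByte MSR bitmap. */
#define READ_LOW_OFF   0u      /* reads of MSRs 0x00000000-0x00001FFF */
#define READ_HIGH_OFF  1024u   /* reads of MSRs 0xC0000000-0xC0001FFF */
#define WRITE_LOW_OFF  2048u   /* writes to MSRs 0x00000000-0x00001FFF */
#define WRITE_HIGH_OFF 3072u   /* writes to MSRs 0xC0000000-0xC0001FFF */

/* Hypothetical helper: mark one MSR so that guest reads and/or writes
 * cause a VM exit. Returns false for MSRs outside the two covered ranges;
 * accesses to those always exit when the bitmap is in use. */
bool msr_bitmap_intercept(uint8_t *bitmap, uint32_t msr,
                          bool on_read, bool on_write)
{
    uint32_t index, read_off, write_off;

    assert(bitmap != NULL);
    if (msr <= 0x1FFFu) {
        index = msr;
        read_off = READ_LOW_OFF;
        write_off = WRITE_LOW_OFF;
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        index = msr - 0xC0000000u;
        read_off = READ_HIGH_OFF;
        write_off = WRITE_HIGH_OFF;
    } else {
        return false;
    }
    if (on_read) {
        bitmap[read_off + (index >> 3)] |= (uint8_t)(1u << (index & 7u));
    }
    if (on_write) {
        bitmap[write_off + (index >> 3)] |= (uint8_t)(1u << (index & 7u));
    }
    return true;
}
```

An MSR whose bits stay clear in both areas is passed through to the guest
without a VM exit.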
|
||||
|
||||
This table shows the predefined MSRs ACRN will trap for all the guests. For
|
||||
the MSRs whose bitmap are not set in the MSR bitmap, guest access will be
|
||||
This table shows the predefined MSRs that ACRN will trap for all the guests. For
|
||||
the MSRs whose bitmap values are not set in the MSR bitmap, guest access will be
|
||||
passthrough directly:
|
||||
|
||||
.. list-table::
|
||||
@ -866,15 +864,15 @@ passthrough directly:
|
||||
|
||||
* - MSR_IA32_TSC_ADJUST
|
||||
- TSC adjustment of local APIC's TSC deadline mode
|
||||
- emulates with vlapic
|
||||
- Emulates with vLAPIC
|
||||
|
||||
* - MSR_IA32_TSC_DEADLINE
|
||||
- TSC target of local APIC's TSC deadline mode
|
||||
- emulates with vlapic
|
||||
- Emulates with vLAPIC
|
||||
|
||||
* - MSR_IA32_BIOS_UPDT_TRIG
|
||||
- BIOS update trigger
|
||||
- work for update microcode from the Service VM, the signature ID read is from
|
||||
- Update microcode from the Service VM. The signature ID read is from
|
||||
physical MSR, and a BIOS update trigger from the Service VM will trigger a
|
||||
physical microcode update.
|
||||
|
||||
@ -884,44 +882,44 @@ passthrough directly:
|
||||
|
||||
* - MSR_IA32_TIME_STAMP_COUNTER
|
||||
- Time-stamp counter
|
||||
- work with VMX_TSC_OFFSET_FULL to emulate virtual TSC
|
||||
- Work with VMX_TSC_OFFSET_FULL to emulate virtual TSC
|
||||
|
||||
* - MSR_IA32_APIC_BASE
|
||||
- APIC base address
|
||||
- emulates with vlapic
|
||||
- Emulates with vLAPIC
|
||||
|
||||
* - MSR_IA32_PAT
|
||||
- Page-attribute table
|
||||
- save/restore in vCPU, write to VMX_GUEST_IA32_PAT_FULL if cr0.cd is 0
|
||||
- Save/restore in vCPU, write to VMX_GUEST_IA32_PAT_FULL if cr0.cd is 0
|
||||
|
||||
* - MSR_IA32_PERF_CTL
|
||||
- Performance control
|
||||
- Trigger real p-state change if p-state is valid when writing,
|
||||
- Trigger real P-state change if P-state is valid when writing,
|
||||
fetch physical MSR when reading
|
||||
|
||||
* - MSR_IA32_FEATURE_CONTROL
|
||||
- Feature control bits that configure operation of VMX and SMX
|
||||
- disabled, locked
|
||||
- Disabled, locked
|
||||
|
||||
* - MSR_IA32_MCG_CAP/STATUS
|
||||
- Machine-Check global control/status
|
||||
- emulates with vMCE
|
||||
- Emulates with vMCE
|
||||
|
||||
* - MSR_IA32_MISC_ENABLE
|
||||
- Miscellaneous feature control
|
||||
- readonly, except MONITOR/MWAIT enable bit
|
||||
- Read-only, except MONITOR/MWAIT enable bit
|
||||
|
||||
* - MSR_IA32_SGXLEPUBKEYHASH0/1/2/3
|
||||
- SHA256 digest of the authorized launch enclaves
|
||||
- emulates with vSGX
|
||||
- Emulates with vSGX
|
||||
|
||||
* - MSR_IA32_SGX_SVN_STATUS
|
||||
- Status and SVN threshold of SGX support for ACM
|
||||
- readonly, emulates with vSGX
|
||||
- Read-only, emulates with vSGX
|
||||
|
||||
* - MSR_IA32_MTRR_CAP
|
||||
- Memory type range register related
|
||||
- Handled by MTRR emulation.
|
||||
- Handled by MTRR emulation
|
||||
|
||||
* - MSR_IA32_MTRR_DEF_TYPE
|
||||
- \"
|
||||
@ -945,23 +943,23 @@ passthrough directly:
|
||||
|
||||
* - MSR_IA32_X2APIC_*
|
||||
- x2APIC related MSRs (offset from 0x800 to 0x900)
|
||||
- emulates with vlapic
|
||||
- Emulates with vLAPIC
|
||||
|
||||
* - MSR_IA32_L2_MASK_BASE~n
|
||||
- L2 CAT mask for CLOSn
|
||||
- disabled for guest access
|
||||
- Disabled for guest access
|
||||
|
||||
* - MSR_IA32_L3_MASK_BASE~n
|
||||
- L3 CAT mask for CLOSn
|
||||
- disabled for guest access
|
||||
- Disabled for guest access
|
||||
|
||||
* - MSR_IA32_MBA_MASK_BASE~n
|
||||
- MBA delay mask for CLOSn
|
||||
- disabled for guest access
|
||||
- Disabled for guest access
|
||||
|
||||
* - MSR_IA32_VMX_BASIC~VMX_TRUE_ENTRY_CTLS
|
||||
- VMX related MSRs
|
||||
- not support, access will inject #GP
|
||||
- Not supported, access will inject #GP
|
||||
|
||||
|
||||
CR Virtualization
|
||||
@ -976,7 +974,7 @@ from cr8`` through *cr_access_vmexit_handler* based on
|
||||
*VMX_PROC_VM_EXEC_CONTROLS*.
|
||||
|
||||
A VM can ``mov from cr0`` and ``mov from
|
||||
cr4`` without triggering a VM exit. The value read are the read shadows
|
||||
cr4`` without triggering a VM exit. The values read are the read shadows
|
||||
of the corresponding register in VMCS. The shadows are updated by the
|
||||
hypervisor on CR writes.
|
||||
|
||||
@ -991,13 +989,13 @@ hypervisor on CR writes.
|
||||
- Based on vCPU set context API: vcpu_set_cr0 -> vmx_write_cr0
|
||||
|
||||
* - mov to cr4
|
||||
- Based on vCPU set context API: vcpu_set_cr4 ->vmx_write_cr4
|
||||
- Based on vCPU set context API: vcpu_set_cr4 -> vmx_write_cr4
|
||||
|
||||
* - mov to cr8
|
||||
- Based on vlapic tpr API: vlapic_set_cr8->vlapic_set_tpr
|
||||
- Based on vLAPIC tpr API: vlapic_set_cr8 -> vlapic_set_tpr
|
||||
|
||||
* - mov from cr8
|
||||
- Based on vlapic tpr API: vlapic_get_cr8->vlapic_get_tpr
|
||||
- Based on vLAPIC tpr API: vlapic_get_cr8 -> vlapic_get_tpr
|
||||
|
||||
|
||||
For ``mov to cr0`` and ``mov to cr4``, ACRN sets
|
||||
@ -1006,7 +1004,7 @@ for the bitmask causing VM exit.
|
||||
|
||||
As ACRN always enables ``unrestricted guest`` in
|
||||
*VMX_PROC_VM_EXEC_CONTROLS2*, *CR0.PE* and *CR0.PG* can be
|
||||
controlled by guest.
|
||||
controlled by the guest.
|
||||
|
||||
.. list-table::
|
||||
:widths: 20 40 40
|
||||
@ -1018,29 +1016,29 @@ controlled by guest.
|
||||
|
||||
* - cr0_always_on_mask
|
||||
- fixed0 & (~(CR0_PE | CR0_PG))
|
||||
- where fixed0 is gotten from MSR_IA32_VMX_CR0_FIXED0, means these bits
|
||||
are fixed to be 1 under VMX operation
|
||||
- fixed0 comes from MSR_IA32_VMX_CR0_FIXED0; these bits
|
||||
are fixed to be 1 under VMX operation.
|
||||
|
||||
* - cr0_always_off_mask
|
||||
- ~fixed1
|
||||
- where ~fixed1 is gotten from MSR_IA32_VMX_CR0_FIXED1, means these bits
|
||||
are fixed to be 0 under VMX operation
|
||||
- fixed1 comes from MSR_IA32_VMX_CR0_FIXED1; the bits set in ~fixed1
|
||||
are fixed to be 0 under VMX operation.
|
||||
|
||||
* - CR0_TRAP_MASK
|
||||
- CR0_PE | CR0_PG | CR0_WP | CR0_CD | CR0_NW
|
||||
- ACRN will also trap PE, PG, WP, CD, and NW bits
|
||||
- ACRN will also trap PE, PG, WP, CD, and NW bits.
|
||||
|
||||
* - cr0_host_mask
|
||||
- ~(fixed0 ^ fixed1) | CR0_TRAP_MASK
|
||||
- ACRN will finally trap bits under VMX root mode control plus
|
||||
additionally added bits
|
||||
additionally added bits.
|
||||
|
||||
|
||||
For ``mov to cr0`` emulation, ACRN will handle a paging mode change based on
|
||||
PG bit change, and a cache mode change based on CD and NW bits changes.
|
||||
ACRN also takes care of illegal writing from guest to invalid
|
||||
ACRN also takes care of illegal writing from a guest to invalid
|
||||
CR0 bits (for example, set PG while CR4.PAE = 0 and IA32_EFER.LME = 1),
|
||||
which will finally inject a #GP to guest. Finally,
|
||||
which will finally inject a #GP to the guest. Finally,
|
||||
*VMX_CR0_READ_SHADOW* will be updated for guest reading of host
|
||||
controlled bits, and *VMX_GUEST_CR0* will be updated for the real VMX CR0
|
||||
setting.
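
The CR0 mask formulas from the table can be written out as a short
computation. This is a sketch under stated assumptions: ``fixed0`` and
``fixed1`` stand for the values read from the two VMX fixed-bit MSRs, and
the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1ULL << 0)
#define CR0_WP (1ULL << 16)
#define CR0_NW (1ULL << 29)
#define CR0_CD (1ULL << 30)
#define CR0_PG (1ULL << 31)
#define CR0_TRAP_MASK (CR0_PE | CR0_PG | CR0_WP | CR0_CD | CR0_NW)

struct cr0_masks {
    uint64_t always_on;
    uint64_t always_off;
    uint64_t host_mask;
};

/* Derive the three masks from the fixed-bit MSR values. */
struct cr0_masks cr0_masks_from_fixed(uint64_t fixed0, uint64_t fixed1)
{
    struct cr0_masks m;

    /* A bit cannot be required to be both 1 and 0. */
    assert((fixed0 & ~fixed1) == 0);
    m.always_on  = fixed0 & ~(CR0_PE | CR0_PG);  /* unrestricted guest frees PE/PG */
    m.always_off = ~fixed1;
    m.host_mask  = ~(fixed0 ^ fixed1) | CR0_TRAP_MASK;
    return m;
}

/* A guest CR0 value passes the fixed-bit check only if every always-on
 * bit is set and no always-off bit is set. */
bool cr0_fixed_bits_ok(const struct cr0_masks *m, uint64_t cr0)
{
    return ((cr0 & m->always_on) == m->always_on) &&
           ((cr0 & m->always_off) == 0);
}
```

With the illustrative inputs ``fixed0 = 0x80000021`` and
``fixed1 = 0xFFFFFFFF``, only the NE bit (``0x20``) remains always on after
PE and PG are excluded.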
|
||||
@ -1055,12 +1053,12 @@ setting.
|
||||
|
||||
* - cr4_always_on_mask
|
||||
- fixed0
|
||||
- where fixed0 is gotten from MSR_IA32_VMX_CR4_FIXED0, means these bits
|
||||
- fixed0 comes from MSR_IA32_VMX_CR4_FIXED0; these bits
|
||||
are fixed to be 1 under VMX operation
|
||||
|
||||
* - cr4_always_off_mask
|
||||
- ~fixed1
|
||||
- where ~fixed1 is gotten from MSR_IA32_VMX_CR4_FIXED1, means these bits
|
||||
- fixed1 comes from MSR_IA32_VMX_CR4_FIXED1; the bits set in ~fixed1
|
||||
are fixed to be 0 under VMX operation
|
||||
|
||||
* - CR4_TRAP_MASK
|
||||
@ -1080,39 +1078,36 @@ The ``mov to cr4`` emulation is similar to cr0 emulation noted above.
|
||||
IO/MMIO Emulation
|
||||
*****************
|
||||
|
||||
ACRN always enables I/O bitmap in *VMX_PROC_VM_EXEC_CONTROLS* and EPT
|
||||
ACRN always enables an I/O bitmap in *VMX_PROC_VM_EXEC_CONTROLS* and EPT
|
||||
in *VMX_PROC_VM_EXEC_CONTROLS2*. Based on them,
|
||||
*pio_instr_vmexit_handler* and *ept_violation_vmexit_handler* are
|
||||
used for IO/MMIO emulation for a emulated device. The emulated device
|
||||
could locate in hypervisor or DM in the Service VM. Refer to the "I/O
|
||||
Emulation" section for more details.
|
||||
used for IO/MMIO emulation for an emulated device. The device can
|
||||
be emulated by the hypervisor or DM in the Service VM.
|
||||
|
||||
For an emulated device done in the hypervisor, ACRN provide some basic
|
||||
For a device emulated by the hypervisor, ACRN provides some basic
|
||||
APIs to register its IO/MMIO range:
|
||||
|
||||
- For the Service VM, the default I/O bitmap are all set to 0, which means
|
||||
-the Service VM will passthrough all I/O port access by default. Adding an I/O handler
-for a hypervisor emulated device needs to first set its corresponding
-I/O bitmap to 1.
+- For the Service VM, the default I/O bitmap values are all set to 0, which
+means the Service VM will passthrough all I/O port access by default. Adding
+an I/O handler for a hypervisor emulated device needs to first set its
+corresponding I/O bitmap to 1.

-- For the User VM, the default I/O bitmap are all set to 1, which means the User Vm will trap
-all I/O port access by default. Adding an I/O handler for a
-hypervisor emulated device does not need change its I/O bitmap.
-If the trapped I/O port access does not fall into a hypervisor
-emulated device, it will create an I/O request and pass it to the Service VM
-DM.
+- For the User VM, the default I/O bitmap values are all set to 1, which means
+the User VM will trap all I/O port access by default. Adding an I/O handler
+for a hypervisor emulated device does not need to change its I/O bitmap. If
+the trapped I/O port access does not fall into a hypervisor emulated device,
+it will create an I/O request and pass it to the Service VM DM.

-- For the Service VM, EPT maps all range of memory to the Service VM except for ACRN hypervisor
-area. This means the Service VM will passthrough all MMIO access by
-default. Adding a MMIO handler for a hypervisor emulated
+- For the Service VM, EPT maps the entire range of memory to the Service VM
+except for the ACRN hypervisor area. The Service VM will passthrough all
+MMIO access by default. Adding an MMIO handler for a hypervisor emulated
 device needs to first remove its MMIO range from EPT mapping.

-- For the User VM, EPT only maps its system RAM to the User VM, which means the User VM will
-trap all MMIO access by default. Adding an MMIO handler for a
-hypervisor emulated device does not need to change its EPT mapping.
-If the trapped MMIO access does not fall into a hypervisor
-emulated device, it will create an I/O request and pass it to the Service VM
-DM.
+- For the User VM, EPT only maps its system RAM to the User VM, which means the
+User VM will trap all MMIO access by default. Adding an MMIO handler for a
+hypervisor emulated device does not need to change its EPT mapping. If the
+trapped MMIO access does not fall into a hypervisor emulated device, it will
+create an I/O request and pass it to the Service VM DM.

 .. list-table::
 :widths: 30 70
@@ -1122,12 +1117,12 @@ APIs to register its IO/MMIO range:
 - **Description**

 * - register_pio_emulation_handler
-- register an I/O emulation handler for a hypervisor emulated device
-by specific I/O range
+- Register an I/O emulation handler for a hypervisor emulated device
+by specific I/O range.

 * - register_mmio_emulation_handler
-- register a MMIO emulation handler for a hypervisor emulated device
-by specific MMIO range
+- Register an MMIO emulation handler for a hypervisor emulated device
+by specific MMIO range.
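
The I/O bitmap defaults described in this hunk can be illustrated with a small model. The following is a minimal sketch, not ACRN code: ``toy_vm``, ``toy_init_io_bitmap``, ``toy_port_traps``, and ``toy_register_pio_range`` are hypothetical names, and the flat 64 Ki-bit bitmap is an assumption made for brevity. It only shows why registering a port I/O handler has to set bitmap bits for the Service VM but changes nothing for a User VM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PIO_PORTS 65536U /* x86 I/O port space: one bitmap bit per port */

/* Toy VM model; the field names are illustrative, not ACRN's structures. */
struct toy_vm {
    bool is_service_vm;
    uint8_t io_bitmap[PIO_PORTS / 8U]; /* bit = 1: the access traps to the hypervisor */
};

/* Defaults from the text: Service VM all 0 (passthrough), User VM all 1 (trap). */
static void toy_init_io_bitmap(struct toy_vm *vm, bool is_service_vm)
{
    vm->is_service_vm = is_service_vm;
    memset(vm->io_bitmap, is_service_vm ? 0x00 : 0xFF, sizeof(vm->io_bitmap));
}

/* Returns true if an access to 'port' would cause a VM exit in this model. */
static bool toy_port_traps(const struct toy_vm *vm, uint16_t port)
{
    return (vm->io_bitmap[port / 8U] & (1U << (port % 8U))) != 0U;
}

/* Registering a handler marks [base, base + len) as trapped.
 * For a User VM the bits are already 1, so this changes nothing there. */
static void toy_register_pio_range(struct toy_vm *vm, uint16_t base, uint32_t len)
{
    assert((uint32_t)base + len <= PIO_PORTS);
    for (uint32_t p = base; p < (uint32_t)base + len; p++) {
        vm->io_bitmap[p / 8U] |= (uint8_t)(1U << (p % 8U));
    }
}
```

In this model, a Service VM access to a port passes through until a range covering that port is registered, while a User VM access always traps.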

 .. _instruction-emulation:

@@ -1140,7 +1135,7 @@ hypervisor needs to decode the instruction from RIP then attempt the
 corresponding emulation based on its instruction and read/write direction.

 ACRN currently supports emulating instructions for ``mov``, ``movx``,
-``movs``, ``stos``, ``test``, ``and``, ``or``, ``cmp``, ``sub`` and
+``movs``, ``stos``, ``test``, ``and``, ``or``, ``cmp``, ``sub``, and
 ``bittest`` without support for lock prefix. Real mode emulation is not
 supported.

@@ -1151,21 +1146,21 @@ supported.

 In the handlers for EPT violation or APIC access VM exit, ACRN will:

-1. Fetch the MMIO access request's address and size
+1. Fetch the MMIO access request's address and size.

-2. Do *decode_instruction* for the instruction in current RIP
+2. Do *decode_instruction* for the instruction in the current RIP
 with the following check:

-a. Is the instruction supported? If not, inject #UD to guest.
-b. Is GVA of RIP, dest, and src valid? If not, inject #PF to guest.
-c. Is stack valid? If not, inject #SS to guest.
+a. Is the instruction supported? If not, inject #UD to the guest.
+b. Is the GVA of RIP, dest, and src valid? If not, inject #PF to the guest.
+c. Is the stack valid? If not, inject #SS to the guest.

 3. If step 2 succeeds, check the access direction. If it's a write, then
-do *emulate_instruction* to fetch MMIO request's value from
+do *emulate_instruction* to fetch the MMIO request's value from
 instruction operands.

-4. Execute MMIO request handler, for EPT violation is *emulate_io*
-while APIC access is *vlapic_write/read* based on access
+4. Execute the MMIO request handler. For EPT violation, it is *emulate_io*.
+For APIC access, it is *vlapic_write/read* based on access
 direction. It will finally complete this MMIO request emulation
 by:

@@ -1173,25 +1168,25 @@ In the handlers for EPT violation or APIC access VM exit, ACRN will:
 b. getting req.val from req.addr for read operation

 5. If the access direction is read, then do *emulate_instruction* to
-put MMIO request's value into instruction operands.
+put the MMIO request's value into instruction operands.

-6. Return to guest.
+6. Return to the guest.
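
The ordering of steps 2 through 6 can be sketched as a small state machine. This is a toy model under stated assumptions, not ACRN's handler: every ``toy_*`` name is hypothetical, instruction decoding is reduced to a single ``insn_supported`` flag, and the emulated device is one 64-bit register. It is only meant to show that a write fetches the value from the instruction operand before the handler runs, while a read copies the value into the operand after the handler runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VECTOR_UD 6 /* #UD is exception vector 6 on x86 */

enum toy_dir { TOY_READ, TOY_WRITE };

struct toy_mmio_req {
    enum toy_dir dir;
    uint64_t addr;
    uint64_t size;
    uint64_t val;
};

/* Hypothetical vCPU state: one register operand and a stand-in decode result. */
struct toy_vcpu {
    uint64_t operand_reg;
    bool insn_supported;
    int pending_exception; /* 0 means none */
};

static uint64_t toy_dev_reg; /* backing store of a one-register emulated device */

static uint64_t toy_size_mask(uint64_t size)
{
    return (size >= 8U) ? UINT64_MAX : ((1ULL << (size * 8U)) - 1ULL);
}

/* Step 4: the device handler moves req.val to or from the device register. */
static void toy_mmio_handler(struct toy_mmio_req *req)
{
    if (req->dir == TOY_WRITE) {
        toy_dev_reg = req->val;
    } else {
        req->val = toy_dev_reg;
    }
}

/* Returns true if the access was emulated, false if an exception is pending. */
static bool toy_handle_mmio_exit(struct toy_vcpu *vcpu, enum toy_dir dir,
                                 uint64_t addr, uint64_t size)
{
    struct toy_mmio_req req = { .dir = dir, .addr = addr, .size = size, .val = 0U };

    assert(size == 1U || size == 2U || size == 4U || size == 8U);

    /* Step 2: decode; an unsupported instruction results in #UD. */
    if (!vcpu->insn_supported) {
        vcpu->pending_exception = TOY_VECTOR_UD;
        return false;
    }
    /* Step 3: for a write, fetch the value from the instruction operand. */
    if (dir == TOY_WRITE) {
        req.val = vcpu->operand_reg & toy_size_mask(size);
    }
    /* Step 4: run the handler registered for this address. */
    toy_mmio_handler(&req);
    /* Step 5: for a read, put the value into the instruction operand. */
    if (dir == TOY_READ) {
        vcpu->operand_reg = req.val & toy_size_mask(size);
    }
    /* Step 6: the caller resumes the guest. */
    return true;
}
```

The GVA and stack checks (#PF, #SS) from step 2 are omitted here; they would sit next to the #UD check and return early in the same way.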

 TSC Emulation
 *************

 Guest vCPU execution of *RDTSC/RDTSCP* and access to
-*MSR_IA32_TSC_AUX* does not cause a VM Exit to the hypervisor.
-Hypervisor uses *MSR_IA32_TSC_AUX* to record CPU ID, thus
-the CPU ID provided by *MSR_IA32_TSC_AUX* might be changed via Guest.
+*MSR_IA32_TSC_AUX* do not cause a VM Exit to the hypervisor.
+The hypervisor uses *MSR_IA32_TSC_AUX* to record CPU ID, thus
+the CPU ID provided by *MSR_IA32_TSC_AUX* might be changed via the guest.

-*RDTSCP* is widely used by hypervisor to identify current CPU ID. Due
-to no VM Exit for *MSR_IA32_TSC_AUX* MSR register, ACRN hypervisor
-saves/restores *MSR_IA32_TSC_AUX* value on every VM Exit/Enter.
-Before hypervisor restores host CPU ID, *rdtscp* should not be
-called as it could get vCPU ID instead of host CPU ID.
+*RDTSCP* is widely used by the hypervisor to identify the current CPU ID. Due
+to no VM Exit for the *MSR_IA32_TSC_AUX* MSR register, the ACRN hypervisor
+saves the *MSR_IA32_TSC_AUX* value on every VM Exit and restores it on every VM Enter.
+Before the hypervisor restores the host CPU ID, *rdtscp* should not be
+called as it could get the vCPU ID instead of the host CPU ID.

-The *MSR_IA32_TIME_STAMP_COUNTER* is emulated by ACRN hypervisor, with a
+The *MSR_IA32_TIME_STAMP_COUNTER* is emulated by the ACRN hypervisor, with a
 simple implementation based on *TSC_OFFSET* (enabled
 in *VMX_PROC_VM_EXEC_CONTROLS*):

@@ -1202,7 +1197,7 @@ ART Virtualization
 ******************

 The invariant TSC is based on the invariant timekeeping hardware (called
-Always Running Timer or ART), that runs at the core crystal clock frequency.
+Always Running Timer or ART), which runs at the core crystal clock frequency.
 The ratio defined by the CPUID leaf 15H expresses the frequency relationship
 between the ART hardware and the TSC.

@@ -1215,34 +1210,34 @@ Where `K` is an offset that can be adjusted by a privileged agent.
 When ART hardware is reset, both invariant TSC and K are also reset.

 The guideline of ART virtualization (vART) is that software in native can run in
-VM too. The vART solution is:
+the VM too. The vART solution is:

-- Present the ART capability to guest through CPUID leaf 15H for `CPUID.15H:EBX[31:0]`
+- Present the ART capability to the guest through CPUID leaf 15H for `CPUID.15H:EBX[31:0]`
 and `CPUID.15H:EAX[31:0]`.
-- Passthrough devices see the physical ART_Value (vART_Value = pART_Value)
-- Relationship between the ART and TSC in guest is:
+- Passthrough devices see the physical ART_Value (vART_Value = pART_Value).
+- Relationship between the ART and TSC in the guest is:
 ``vTSC_Value = (vART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + vK``
-Where `vK = K + VMCS.TSC_OFFSET`.
-- If `vK` or `vTSC_Value` are changed by guest, we change the `VMCS.TSC_OFFSET` accordingly.
-- `K` should never be changed by hypervisor.
+where `vK = K + VMCS.TSC_OFFSET`.
+- If the guest changes `vK` or `vTSC_Value`, we change the `VMCS.TSC_OFFSET` accordingly.
+- `K` should never be changed by the hypervisor.
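
The vART relationship above can be checked with a few lines of arithmetic. The sketch below is illustrative only: ``toy_art_state`` and the ``toy_*`` functions are hypothetical names, the ``tsc_offset`` field merely stands in for `VMCS.TSC_OFFSET`, and all arithmetic wraps modulo 2^64 as a TSC does. It shows that a guest-requested TSC change can be absorbed entirely by the offset while `K` stays untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU view of the values named in the text. */
struct toy_art_state {
    uint32_t cpuid_15h_eax; /* denominator of the TSC/ART ratio */
    uint32_t cpuid_15h_ebx; /* numerator of the TSC/ART ratio */
    uint64_t k;             /* physical offset K; never modified by this code */
    uint64_t tsc_offset;    /* stand-in for VMCS.TSC_OFFSET */
};

/* floor(art * EBX / EAX); the quotient/remainder split keeps the
 * remainder term within 64 bits. */
static uint64_t toy_scale_art(const struct toy_art_state *s, uint64_t art)
{
    assert(s->cpuid_15h_eax != 0U);
    uint64_t q = art / s->cpuid_15h_eax;
    uint64_t r = art % s->cpuid_15h_eax;
    return q * s->cpuid_15h_ebx + (r * s->cpuid_15h_ebx) / s->cpuid_15h_eax;
}

/* vTSC_Value = (vART_Value * EBX) / EAX + vK, with vK = K + TSC_OFFSET. */
static uint64_t toy_vtsc(const struct toy_art_state *s, uint64_t art)
{
    uint64_t vk = s->k + s->tsc_offset;
    return toy_scale_art(s, art) + vk;
}

/* The guest asks for its TSC to read new_vtsc at ART value 'art'.
 * Only tsc_offset is adjusted; K is left alone. */
static void toy_guest_set_vtsc(struct toy_art_state *s, uint64_t art, uint64_t new_vtsc)
{
    s->tsc_offset = new_vtsc - toy_scale_art(s, art) - s->k;
}
```

With EAX = 2, EBX = 200, and K = 10, an ART value of 1001 scales to 100100, so the guest TSC reads 100110 while the offset is zero.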

 XSAVE Emulation
 ***************

-The XSAVE feature set is comprised of eight instructions:
+The XSAVE feature set is composed of eight instructions:

 - *XGETBV* and *XSETBV* allow software to read and write the extended
 control register *XCR0*, which controls the operation of the
 XSAVE feature set.

 - *XSAVE*, *XSAVEOPT*, *XSAVEC*, and *XSAVES* are four instructions
-that save processor state to memory.
+that save the processor state to memory.

-- *XRSTOR* and *XRSTORS* are corresponding instructions that load
+- *XRSTOR* and *XRSTORS* are corresponding instructions that load the
 processor state from memory.

 - *XGETBV*, *XSAVE*, *XSAVEOPT*, *XSAVEC*, and *XRSTOR* can be executed
-at any privilege level;
+at any privilege level.

 - *XSETBV*, *XSAVES*, and *XRSTORS* can be executed only if CPL = 0.

@@ -1256,7 +1251,7 @@ and IA32_XSS MSR. Refer to the `Intel SDM Volume 1`_ chapter 13 for more details
 .. figure:: images/hld-image38.png
 :align: center

-ACRN Hypervisor XSAVE emulation
+ACRN Hypervisor XSAVE Emulation

 By default, ACRN enables XSAVES/XRSTORS in
 *VMX_PROC_VM_EXEC_CONTROLS2*, so it allows the guest to use the XSAVE
@@ -1265,12 +1260,12 @@ exit, ACRN actually needs to take care of XCR0 access.

 ACRN emulates XSAVE features through the following rules:

-1. Enumerate CPUID.01H for native XSAVE feature support
-2. If yes for step 1, enable XSAVE in hypervisor by CR4.OSXSAVE
-3. Emulates XSAVE related CPUID.01H & CPUID.0DH to guest
-4. Emulates XCR0 access through *xsetbv_vmexit_handler*
-5. ACRN passthrough the access of IA32_XSS MSR to guest
-6. ACRN hypervisor does NOT use any feature of XSAVE
-7. As ACRN emulate vCPU with partition mode, so based on above rules 5
+1. Enumerate CPUID.01H for native XSAVE feature support.
+2. If yes for step 1, enable XSAVE in the hypervisor by CR4.OSXSAVE.
+3. Emulate XSAVE related CPUID.01H and CPUID.0DH to the guest.
+4. Emulate XCR0 access through *xsetbv_vmexit_handler*.
+5. Passthrough the access of IA32_XSS MSR to the guest.
+6. ACRN hypervisor does NOT use any feature of XSAVE.
+7. When ACRN emulates the vCPU with partition mode: based on above rules 5
 and 6, a guest vCPU will fully control the XSAVE feature in
 non-root mode.
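
Rule 4 implies that the hypervisor has to validate a guest-requested XCR0 value before applying it. The following is a minimal sketch of such a check, assuming only architectural constraints that are well known (XSETBV requires CPL 0, XCR0 bit 0 must stay set, and AVX state requires SSE state). It is not the body of *xsetbv_vmexit_handler*; ``toy_xcr0_write_ok`` and its parameters are hypothetical, and the checks for other state components (for example AVX-512) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XCR0_X87 (1ULL << 0) /* x87 state; must always be enabled */
#define XCR0_SSE (1ULL << 1) /* SSE state */
#define XCR0_AVX (1ULL << 2) /* AVX state; requires SSE state */

/* Returns true if the guest's XSETBV may be applied. A false result means
 * a real handler would inject #GP into the guest instead. */
static bool toy_xcr0_write_ok(uint32_t xcr_index, uint64_t new_xcr0,
                              uint64_t supported_mask, uint8_t cpl)
{
    /* The host-supported mask itself must include x87. */
    assert((supported_mask & XCR0_X87) != 0ULL);

    if (cpl != 0U) {
        return false; /* XSETBV is only allowed at CPL = 0 */
    }
    if (xcr_index != 0U) {
        return false; /* only XCR0 is handled in this sketch */
    }
    if ((new_xcr0 & ~supported_mask) != 0ULL) {
        return false; /* bits the platform does not support */
    }
    if ((new_xcr0 & XCR0_X87) == 0ULL) {
        return false; /* bit 0 can never be cleared */
    }
    if (((new_xcr0 & XCR0_AVX) != 0ULL) && ((new_xcr0 & XCR0_SSE) == 0ULL)) {
        return false; /* AVX without SSE is invalid */
    }
    return true;
}
```

Keeping the check as a pure function of its inputs makes it easy to exercise without a running guest; the caller decides how to inject the fault.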
Binary file not shown.
Size before: 58 KiB. Size after: 68 KiB.