.. _hv-cpu-virt:

CPU Virtualization
##################

.. figure:: images/hld-image47.png
   :align: center
   :name: hv-cpu-virt-components

   ACRN Hypervisor CPU Virtualization Components

The following sections discuss the major modules (shown in blue) in the
CPU virtualization overview shown in :numref:`hv-cpu-virt-components`.

Based on Intel VT-x virtualization technology, ACRN emulates a virtual CPU
(vCPU) with the following methods:

- **core partition**: one vCPU is dedicated to and associated with one
  physical CPU (pCPU). This makes much of the hardware register emulation
  a simple pass-through and provides good isolation for physical
  interrupts and guest execution. (See `Static CPU Partitioning`_ for more
  information.)

- **core sharing** (to be added): two or more vCPUs share one
  physical CPU (pCPU). This requires a more complicated context switch
  between vCPUs, but provides flexible sharing of computing
  resources for vCPU tasks with low performance demands.
  (See `Flexible CPU Sharing`_ for more information.)

- **simple schedule**: a well-designed scheduler framework allows ACRN
  to adopt different scheduling policies, for example, noop and round-robin:

  noop scheduler: only two thread loops are maintained for a CPU, the
  vCPU thread and the default idle thread. A CPU runs most of the time in
  the vCPU thread to emulate a guest CPU, switching between VMX root
  mode and non-root mode. A CPU schedules out to the default idle thread
  when an operation needs it to stay in VMX root mode, such as when
  waiting for the DM to complete an I/O request or when the vCPU is ready
  to be destroyed.

  round-robin scheduler (to be added): allows more vCPU thread loops
  to run on a CPU. A CPU switches among the different vCPU threads and the
  default idle thread when the running thread uses up its timeslice or
  must schedule out, such as when waiting for an I/O request. A vCPU can
  also yield itself, for example when it executes a "PAUSE" instruction.

Static CPU Partitioning
***********************

CPU partitioning is a policy for mapping a virtual
CPU (vCPU) to a physical CPU. To enable this, the ACRN hypervisor can
configure the "noop scheduler" as the scheduling policy for this physical CPU.

ACRN then forces a fixed 1:1 mapping between a vCPU and this physical CPU
when creating a vCPU for the guest operating system. This makes the vCPU
management code much simpler.

``vcpu_affinity`` in ``vm config`` decides which physical CPU each
vCPU in a VM is affined to, and so finalizes the fixed mapping.

Flexible CPU Sharing
********************

This is a TODO feature.
To enable CPU sharing, the ACRN hypervisor can configure the "round-robin
scheduler" as the scheduling policy for the corresponding physical CPU.

``vcpu_affinity`` in ``vm config`` decides which physical CPU two
or more vCPUs from different VMs share.

CPU management in the Service VM under static CPU partitioning
==============================================================

With ACRN, all ACPI table entries are passed through to the Service VM,
including the Multiple APIC Description Table (MADT). The Service VM sees all
physical CPUs by parsing the MADT when the Service VM kernel boots. All
physical CPUs are initially assigned to the Service VM by creating the same
number of virtual CPUs.

When the Service VM has finished booting, it releases the physical CPUs
intended for User VM use.

Here is an example flow of CPU allocation on a multi-core platform.

.. figure:: images/static-core-image2.png
   :width: 600px
   :align: center
   :name: static-core-cpu-allocation

   CPU allocation on a multi-core platform

CPU management in the Service VM under flexible CPU sharing
============================================================

Because all Service VM CPUs can be shared with different User VMs, ACRN can
still pass through the MADT to the Service VM, and the Service VM is still
able to see all physical CPUs.

Under CPU sharing, however, the Service VM does not need to offline or
release the physical CPUs intended for User VM use.

CPU management in User VM
=========================

From the User VM point of view, CPU management is very simple: when the DM
makes hypercalls to create a VM, the hypervisor creates its virtual CPUs
based on the configuration in this User VM's ``vm config``.

As mentioned previously, ``vcpu_affinity`` in ``vm config``
tells which physical CPUs a VM's vCPUs will use, and the scheduler policy
associated with the corresponding physical CPU decides whether a vCPU runs in
partition or sharing mode.

CPU assignment management in HV
===============================

The physical CPU assignment is pre-defined by ``vcpu_affinity`` in
``vm config``. A sanity check must be done to ensure that:

- in one VM, each vCPU has only one preferred physical CPU

- in one VM, the vCPUs do not share the same physical CPU

- in one VM, if a vCPU uses the "noop scheduler", the corresponding
  physical CPU is not shared with any other VM's vCPU

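The first two checks above can be sketched in C as follows. This is a
minimal illustration rather than ACRN's implementation: it assumes each
vCPU's affinity is a 64-bit pCPU bitmask, and the names
``is_single_pcpu`` and ``vcpu_affinity_is_sane`` are hypothetical. The
third check needs the affinities of all other VMs and is left out.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical helper: true if exactly one bit is set in the mask. */
   static bool is_single_pcpu(uint64_t mask)
   {
       return (mask != 0U) && ((mask & (mask - 1U)) == 0U);
   }

   /*
    * Hypothetical check over one VM's affinity array (one mask per vCPU):
    * every vCPU names exactly one pCPU, and no pCPU is named twice.
    */
   bool vcpu_affinity_is_sane(const uint64_t *affinity, size_t vcpu_num)
   {
       uint64_t used = 0U;   /* pCPUs already claimed by earlier vCPUs */
       size_t i;

       assert((affinity != NULL) || (vcpu_num == 0U));

       for (i = 0U; i < vcpu_num; i++) {
           if (!is_single_pcpu(affinity[i])) {
               return false;   /* zero or several preferred pCPUs */
           }
           if ((used & affinity[i]) != 0U) {
               return false;   /* pCPU shared inside the same VM */
           }
           used |= affinity[i];
       }
       return true;
   }

For example, the masks ``{0x1, 0x2, 0x8}`` pass, while ``{0x1, 0x1}``
(shared pCPU) and ``{0x3}`` (two preferred pCPUs) are rejected.
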
Currently, the ACRN hypervisor does not support virtual CPU migration to
different physical CPUs. This means the mapping of a virtual CPU to a
physical CPU cannot change without first calling ``offline_vcpu``.

.. _vCPU_lifecycle:

vCPU Lifecycle
**************

A vCPU lifecycle is shown in :numref:`hv-vcpu-transitions` below, where
the major states are:

- **VCPU_INIT**: vCPU is in an initialized state, and its vCPU thread
  is not ready to run on its associated CPU

- **VCPU_RUNNING**: vCPU is running, and its vCPU thread is ready (in
  the queue) or running on its associated CPU

- **VCPU_PAUSED**: vCPU is paused, and its vCPU thread is not running
  on its associated CPU

- **VCPU_ZOMBIE**: vCPU is being taken offline, and its vCPU thread is not
  running on its associated CPU

- **VCPU_OFFLINE**: vCPU is offline

.. figure:: images/hld-image17.png
   :align: center
   :name: hv-vcpu-transitions

   ACRN vCPU state transitions

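The states above can be modeled as a small state machine. The sketch
below is only an illustration: the authoritative transitions are the ones
in the figure, and the rules encoded here (as well as the enum values and
the names ``vcpu_transition_allowed`` and ``vcpu_set_state``) are
assumptions inferred from the function names in this section.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>

   enum vcpu_state {
       VCPU_OFFLINE,
       VCPU_INIT,
       VCPU_RUNNING,
       VCPU_PAUSED,
       VCPU_ZOMBIE
   };

   /* Assumed transition rules, one case per source state. */
   bool vcpu_transition_allowed(enum vcpu_state from, enum vcpu_state to)
   {
       switch (from) {
       case VCPU_OFFLINE:
           return to == VCPU_INIT;            /* create_vcpu */
       case VCPU_INIT:
           return (to == VCPU_RUNNING) ||     /* launch */
                  (to == VCPU_ZOMBIE);        /* offline_vcpu */
       case VCPU_RUNNING:
           return to == VCPU_PAUSED;          /* pause_vcpu */
       case VCPU_PAUSED:
           return (to == VCPU_RUNNING) ||     /* resume_vcpu */
                  (to == VCPU_INIT);          /* reset_vcpu */
       case VCPU_ZOMBIE:
           return to == VCPU_OFFLINE;         /* offline completed */
       default:
           return false;
       }
   }

   /* Change state only through a checked transition. */
   void vcpu_set_state(enum vcpu_state *state, enum vcpu_state next)
   {
       assert(state != NULL);
       assert(vcpu_transition_allowed(*state, next));
       *state = next;
   }

Funneling every state change through one checked setter makes an
unexpected transition fail loudly instead of silently corrupting the
lifecycle.
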
The following functions are used to drive the state machine of the vCPU
lifecycle:

.. doxygenfunction:: create_vcpu
   :project: Project ACRN

.. doxygenfunction:: pause_vcpu
   :project: Project ACRN

.. doxygenfunction:: resume_vcpu
   :project: Project ACRN

.. doxygenfunction:: reset_vcpu
   :project: Project ACRN

.. doxygenfunction:: offline_vcpu
   :project: Project ACRN

vCPU Scheduling under static CPU partitioning
*********************************************

.. figure:: images/hld-image35.png
   :align: center
   :name: hv-vcpu-schedule

   ACRN vCPU scheduling flow under static CPU partitioning

As described in the CPU virtualization overview, under static
CPU partitioning ACRN implements a simple scheduling mechanism
based on two threads: vcpu_thread and default_idle. A vCPU in the
VCPU_RUNNING state always runs in a vcpu_thread loop, while
a vCPU in the VCPU_PAUSED or VCPU_ZOMBIE state runs in the default_idle
loop. The detailed behaviors of the vcpu_thread and default_idle threads
are illustrated in :numref:`hv-vcpu-schedule`:

- The **vcpu_thread** loop handles VM exits and pending requests
  around the VM entry/exit.
  It also checks for a reschedule request and schedules out to
  default_idle if necessary. See `vCPU Thread`_ for more details
  of vcpu_thread.

- The **default_idle** loop simply does do_cpu_idle while also
  checking for need-offline and reschedule requests.
  If a CPU is marked as need-offline, it will go to cpu_dead.
  If a reschedule request is made for this CPU, it will
  schedule out to vcpu_thread if necessary.

- The function ``make_reschedule_request`` drives the thread
  switch between vcpu_thread and default_idle.

Some example scenario flows are shown here:

.. figure:: images/hld-image7.png
   :align: center

   ACRN vCPU scheduling scenarios

- **During starting a VM**: after creating a vCPU, the BSP calls
  *launch_vcpu* through *start_vm*, and an AP calls *launch_vcpu* through
  vlapic INIT-SIPI emulation. Finally, this vCPU runs in a
  *vcpu_thread* loop.

- **During shutting down a VM**: the *pause_vm* function call makes a vCPU
  running in *vcpu_thread* schedule out to *default_idle*. The
  following *reset_vcpu* and *offline_vcpu* calls de-initialize and then
  offline this vCPU instance.

- **During IOReq handling**: after an IOReq is sent to the DM for emulation,
  a vCPU running in *vcpu_thread* schedules out to *default_idle*
  through *acrn_insert_request_wait->pause_vcpu*. After the DM
  completes the emulation for this IOReq, it calls
  *hcall_notify_ioreq_finish->resume_vcpu* and makes the vCPU
  schedule back to *vcpu_thread* to continue its guest execution.

vCPU Scheduling under flexible CPU sharing
******************************************

To be added.

vCPU Thread
***********

The vCPU thread flow is a loop as shown and described below:

.. figure:: images/hld-image68.png
   :align: center

   ACRN vCPU thread

1. Check whether a reschedule request requires *vcpu_thread* to schedule
   out to *default_idle* or to another *vcpu_thread*. If so, schedule
   out to *default_idle* or the other *vcpu_thread*.

2. Handle pending requests by calling *acrn_handle_pending_request*.
   (See `Pending Request Handlers`_.)

3. VM Enter by calling *start/run_vcpu*, then enter non-root mode to do
   guest execution.

4. VM Exit from *start/run_vcpu* when the guest triggers a VM exit reason
   in non-root mode.

5. Handle the VM exit based on its specific reason.

6. Loop back to step 1.

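The six steps above can be sketched as a plain C loop. This is a
simulation only, with no VMX involved: ``struct vcpu_sim`` and every
function name below are hypothetical stand-ins for the real reschedule
check, *acrn_handle_pending_request*, *run_vcpu*, and the VM exit
handlers.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   struct vcpu_sim {
       bool need_resched;        /* step 1: reschedule request pending */
       uint64_t pending_req;     /* step 2: pending request bitmap */
       uint32_t exits_left;      /* simulated guest: VM exits still to come */
       uint32_t handled_exits;   /* step 5: VM exits handled so far */
   };

   static void handle_pending_request(struct vcpu_sim *vcpu)
   {
       vcpu->pending_req = 0U;   /* pretend every request was serviced */
   }

   /* Steps 3 and 4: "enter" the guest and return on the next VM exit. */
   static uint32_t run_guest(struct vcpu_sim *vcpu)
   {
       if (vcpu->exits_left > 0U) {
           vcpu->exits_left--;
       }
       if (vcpu->exits_left == 0U) {
           vcpu->need_resched = true;   /* nothing left to run */
       }
       return 10U;                      /* arbitrary exit reason */
   }

   static void handle_vmexit(struct vcpu_sim *vcpu, uint32_t reason)
   {
       (void)reason;
       vcpu->handled_exits++;
   }

   void vcpu_thread_sim(struct vcpu_sim *vcpu)
   {
       uint32_t reason;

       assert(vcpu != NULL);
       for (;;) {
           if (vcpu->need_resched) {        /* step 1 */
               break;                       /* schedule out */
           }
           handle_pending_request(vcpu);    /* step 2 */
           reason = run_guest(vcpu);        /* steps 3 and 4 */
           handle_vmexit(vcpu, reason);     /* step 5 */
       }                                    /* step 6: loop */
   }

The point of the ordering is that pending requests are always processed
immediately before a VM entry, so a request posted while the guest was
running is picked up on the next pass through the loop.
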
vCPU Run Context
================

During a vCPU switch between root and non-root mode, the run context of
the vCPU is saved and restored using this structure:

.. doxygenstruct:: run_context
   :project: Project ACRN

The vCPU handles runtime context saving in three different
categories:

- Always save/restore during VM exit/entry:

  - These registers must be saved on every VM exit and restored
    on every VM entry
  - Registers include: general purpose registers, CR2, and
    IA32_SPEC_CTRL
  - Definition in *vcpu->run_context*
  - Get/Set them through *vcpu_get/set_xxx*

- On-demand cache/update during VM exit/entry:

  - These registers are used frequently. They should be cached from
    the VMCS on first access after a VM exit, and written back to the
    VMCS on VM entry if marked dirty
  - Registers include: RSP, RIP, EFER, RFLAGS, CR0, and CR4
  - Definition in *vcpu->run_context*
  - Get/Set them through *vcpu_get/set_xxx*

- Always read/write from/to VMCS:

  - These registers are rarely used. Access to them is always
    from/to the VMCS.
  - Registers are in the VMCS but not listed in the two cases above.
  - No definition in *vcpu->run_context*
  - Get/Set them through the VMCS API

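The second category (on-demand cache/update) can be illustrated with two
bitmaps, one marking registers already cached since the last VM exit and
one marking registers that must be written back on VM entry. The sketch
below uses a plain array as a stand-in for the VMCS; all names in it
(``fake_vmcs``, ``struct run_ctx_sim``, ``ctx_get``, and so on) are
hypothetical and not ACRN's API.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   enum cached_reg { REG_RIP, REG_RSP, REG_NUM };

   uint64_t fake_vmcs[REG_NUM];   /* stand-in for the VMCS guest fields */

   struct run_ctx_sim {
       uint64_t val[REG_NUM];
       uint32_t cached;   /* bit n set: val[n] is valid since last VM exit */
       uint32_t dirty;    /* bit n set: val[n] must be written on VM entry */
   };

   /* Read a register, fetching it from the "VMCS" only on first access. */
   uint64_t ctx_get(struct run_ctx_sim *ctx, enum cached_reg reg)
   {
       assert(ctx != NULL);
       if ((ctx->cached & (1U << reg)) == 0U) {
           ctx->val[reg] = fake_vmcs[reg];
           ctx->cached |= (1U << reg);
       }
       return ctx->val[reg];
   }

   /* Write a register into the cache and mark it dirty. */
   void ctx_set(struct run_ctx_sim *ctx, enum cached_reg reg, uint64_t val)
   {
       assert(ctx != NULL);
       ctx->val[reg] = val;
       ctx->cached |= (1U << reg);
       ctx->dirty |= (1U << reg);
   }

   /* Before VM entry: flush dirty registers back to the "VMCS". */
   void ctx_on_vm_entry(struct run_ctx_sim *ctx)
   {
       uint32_t reg;

       assert(ctx != NULL);
       for (reg = 0U; reg < (uint32_t)REG_NUM; reg++) {
           if ((ctx->dirty & (1U << reg)) != 0U) {
               fake_vmcs[reg] = ctx->val[reg];
           }
       }
       ctx->dirty = 0U;
   }

   /* After VM exit: the guest may have changed anything, drop the cache. */
   void ctx_on_vm_exit(struct run_ctx_sim *ctx)
   {
       assert(ctx != NULL);
       ctx->cached = 0U;
   }

This avoids a VMCS access for registers that are never touched during a
given VM exit, and defers writes until the next VM entry.
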
For the first two categories above, ACRN provides these get/set APIs:

.. doxygenfunction:: vcpu_get_gpreg
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_gpreg
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_rip
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_rip
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_rsp
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_rsp
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_efer
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_efer
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_rflags
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_rflags
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_cr0
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_cr0
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_cr2
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_cr2
   :project: Project ACRN

.. doxygenfunction:: vcpu_get_cr4
   :project: Project ACRN

.. doxygenfunction:: vcpu_set_cr4
   :project: Project ACRN

VM Exit Handlers
================

ACRN implements its VM exit handlers with a static table. Except for the
exit reasons listed below, a default *unhandled_vmexit_handler* is used
that prints an error message and returns without handling the exit:

.. list-table::
   :widths: 33 33 33
   :header-rows: 1

   * - **VM Exit Reason**
     - **Handler**
     - **Description**

   * - VMX_EXIT_REASON_EXCEPTION_OR_NMI
     - exception_vmexit_handler
     - Only trap #MC, print error then inject back to guest

   * - VMX_EXIT_REASON_EXTERNAL_INTERRUPT
     - external_interrupt_vmexit_handler
     - External interrupt handler for physical interrupt happening in non-root mode

   * - VMX_EXIT_REASON_TRIPLE_FAULT
     - triple_fault_vmexit_handler
     - Handle triple fault from vCPU

   * - VMX_EXIT_REASON_INIT_SIGNAL
     - init_signal_vmexit_handler
     - Handle INIT signal from vCPU

   * - VMX_EXIT_REASON_INTERRUPT_WINDOW
     - interrupt_window_vmexit_handler
     - To support interrupt window if VID is disabled

   * - VMX_EXIT_REASON_CPUID
     - cpuid_vmexit_handler
     - Handle CPUID access from guest

   * - VMX_EXIT_REASON_VMCALL
     - vmcall_vmexit_handler
     - Handle hypercall from guest

   * - VMX_EXIT_REASON_CR_ACCESS
     - cr_access_vmexit_handler
     - Handle CR registers access from guest

   * - VMX_EXIT_REASON_IO_INSTRUCTION
     - pio_instr_vmexit_handler
     - Emulate I/O access with range in IO_BITMAP,
       which may have a handler in hypervisor (such as vuart or vpic),
       or need to create an I/O request to DM

   * - VMX_EXIT_REASON_RDMSR
     - rdmsr_vmexit_handler
     - Read MSR from guest in MSR_BITMAP

   * - VMX_EXIT_REASON_WRMSR
     - wrmsr_vmexit_handler
     - Write MSR from guest in MSR_BITMAP

   * - VMX_EXIT_REASON_APIC_ACCESS
     - apic_access_vmexit_handler
     - APIC access for APICv

   * - VMX_EXIT_REASON_VIRTUALIZED_EOI
     - veoi_vmexit_handler
     - Trap vLAPIC EOI for specific vector with level trigger mode
       in vIOAPIC, required for supporting PTdev

   * - VMX_EXIT_REASON_EPT_VIOLATION
     - ept_violation_vmexit_handler
     - MMIO emulation, which may have a handler in hypervisor
       (such as vLAPIC or vIOAPIC), or need to create an I/O
       request to DM

   * - VMX_EXIT_REASON_XSETBV
     - xsetbv_vmexit_handler
     - Set host owned XCR0 for supporting xsave

   * - VMX_EXIT_REASON_APIC_WRITE
     - apic_write_vmexit_handler
     - APIC write for APICv

Details of each VM exit reason handler are described in other sections.

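A static dispatch table with a default handler, as described at the start
of this section, might look like the following sketch. It is not ACRN's
table: the struct, the handler bodies, and the table size
``NR_EXIT_REASONS`` are hypothetical. Only the basic exit reason numbers
for CPUID (10) and VMCALL (18) are taken from the Intel SDM.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <stdio.h>

   #define NR_EXIT_REASONS     65U   /* hypothetical table size */
   #define EXIT_REASON_CPUID   10U
   #define EXIT_REASON_VMCALL  18U
   #define HANDLED_NONE        UINT32_MAX

   struct vcpu_sim {
       uint32_t exit_reason;   /* basic exit reason of the last VM exit */
       uint32_t handled_by;    /* which handler ran (illustration only) */
   };

   typedef int32_t (*vmexit_handler_t)(struct vcpu_sim *vcpu);

   static int32_t cpuid_handler(struct vcpu_sim *vcpu)
   {
       vcpu->handled_by = EXIT_REASON_CPUID;
       return 0;
   }

   static int32_t vmcall_handler(struct vcpu_sim *vcpu)
   {
       vcpu->handled_by = EXIT_REASON_VMCALL;
       return 0;
   }

   /* Default: report the exit and return without handling it. */
   static int32_t unhandled_handler(struct vcpu_sim *vcpu)
   {
       fprintf(stderr, "unhandled VM exit reason %u\n",
               (unsigned int)vcpu->exit_reason);
       vcpu->handled_by = HANDLED_NONE;
       return 0;
   }

   /* Entries not listed here stay NULL and fall back to the default. */
   static const vmexit_handler_t dispatch_table[NR_EXIT_REASONS] = {
       [EXIT_REASON_CPUID]  = cpuid_handler,
       [EXIT_REASON_VMCALL] = vmcall_handler,
   };

   int32_t dispatch_vmexit(struct vcpu_sim *vcpu)
   {
       vmexit_handler_t handler = unhandled_handler;

       assert(vcpu != NULL);
       if ((vcpu->exit_reason < NR_EXIT_REASONS) &&
           (dispatch_table[vcpu->exit_reason] != NULL)) {
           handler = dispatch_table[vcpu->exit_reason];
       }
       return handler(vcpu);
   }

Indexing by exit reason keeps dispatch constant-time, and the bounds check
guards against an exit reason beyond the end of the table.
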
.. _pending-request-handlers:

Pending Request Handlers
========================

ACRN uses the function *acrn_handle_pending_request* to handle
requests before VM entry in *vcpu_thread*.

A bitmap in the vCPU structure lists the different requests:

.. code-block:: c

   #define ACRN_REQUEST_EXCP                    0U
   #define ACRN_REQUEST_EVENT                   1U
   #define ACRN_REQUEST_EXTINT                  2U
   #define ACRN_REQUEST_NMI                     3U
   #define ACRN_REQUEST_EOI_EXIT_BITMAP_UPDATE  4U
   #define ACRN_REQUEST_EPT_FLUSH               5U
   #define ACRN_REQUEST_TRP_FAULT               6U
   #define ACRN_REQUEST_VPID_FLUSH              7U /* flush vpid tlb */

ACRN provides the function *vcpu_make_request* to make different
requests. It sets the bit of the corresponding request in the bitmap and
notifies the target vCPU through an IPI if necessary (when the target vCPU
is not currently running). See :ref:`vcpu-request-interrupt-injection` for
details.

.. code-block:: c

   void vcpu_make_request(struct vcpu *vcpu, uint16_t eventid)
   {
           uint16_t pcpu_id = pcpuid_from_vcpu(vcpu);

           /* Record the request in the vCPU's pending-request bitmap. */
           bitmap_set_lock(eventid, &vcpu->arch_vcpu.pending_req);
           /*
            * If the current host CPU is not the target vCPU's host CPU,
            * send an IPI to wake up the target vCPU.
            *
            * TODO: Here we just compare with cpuid, since cpuid currently is
            * global under the pCPU / vCPU 1:1 mapping. If vCPU scheduling is
            * enabled later, this must change to determine whether the target
            * vCPU is in VMX non-root or root mode.
            */
           if (get_cpu_id() != pcpu_id) {
                   send_single_ipi(pcpu_id, VECTOR_NOTIFY_VCPU);
           }
   }

The function *acrn_handle_pending_request* handles each
request as shown below.

.. list-table::
   :widths: 25 25 25 25
   :header-rows: 1

   * - **Request**
     - **Description**
     - **Request Maker**
     - **Request Handler**

   * - ACRN_REQUEST_EXCP
     - Request for exception injection
     - vcpu_inject_gp, vcpu_inject_pf, vcpu_inject_ud, vcpu_inject_ac,
       or vcpu_inject_ss, which then queue the corresponding exception by
       vcpu_queue_exception
     - vcpu_inject_hi_exception, vcpu_inject_lo_exception based
       on exception priority

   * - ACRN_REQUEST_EVENT
     - Request for vlapic interrupt vector injection
     - vlapic_fire_lvt or vlapic_set_intr, which could be triggered
       by vlapic lvt, vioapic, or vmsi
     - vcpu_do_pending_event

   * - ACRN_REQUEST_EXTINT
     - Request for extint vector injection
     - vcpu_inject_extint, triggered by vpic
     - vcpu_do_pending_extint

   * - ACRN_REQUEST_NMI
     - Request for NMI injection
     - vcpu_inject_nmi
     - program VMX_ENTRY_INT_INFO_FIELD directly

   * - ACRN_REQUEST_EOI_EXIT_BITMAP_UPDATE
     - Request to update the VEOI bitmap for level-triggered vectors
     - vlapic_reset_tmr or vlapic_set_tmr, when the trigger mode changes in
       the RTE
     - vcpu_set_vmcs_eoi_exit

   * - ACRN_REQUEST_EPT_FLUSH
     - Request for EPT flush
     - ept_add_mr, ept_modify_mr, ept_del_mr, or vmx_write_cr0 disable cache
     - invept

   * - ACRN_REQUEST_TRP_FAULT
     - Request for handling triple fault
     - vcpu_queue_exception, when it meets a triple fault
     - fatal error

   * - ACRN_REQUEST_VPID_FLUSH
     - Request for VPID flush
     - None
     - flush_vpid_single

.. note:: Refer to the interrupt management chapter for the request
   handling order for exceptions, NMIs, and interrupts. For other requests
   such as TMR update or EPT flush, there is no mandatory order.

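On the consuming side, each request bit is typically tested and cleared
before the matching action runs. The sketch below shows that pattern for
the two flush requests only. It is a simplified, non-atomic illustration:
``test_and_clear_bit``, ``struct flush_stats``, and
``handle_flush_requests`` are hypothetical names, and the counters stand
in for the real ``invept`` and ``flush_vpid_single`` actions.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define ACRN_REQUEST_EPT_FLUSH   5U
   #define ACRN_REQUEST_VPID_FLUSH  7U

   /* Non-atomic stand-in for a locked test-and-clear bitmap helper. */
   bool test_and_clear_bit(uint16_t bit, uint64_t *bitmap)
   {
       uint64_t mask = UINT64_C(1) << bit;
       bool was_set = (*bitmap & mask) != 0U;

       *bitmap &= ~mask;
       return was_set;
   }

   struct flush_stats {
       uint32_t ept_flushes;    /* stands in for invept */
       uint32_t vpid_flushes;   /* stands in for flush_vpid_single */
   };

   /* Consume the flush requests; all other request bits are left set. */
   void handle_flush_requests(uint64_t *pending_req, struct flush_stats *stats)
   {
       assert((pending_req != NULL) && (stats != NULL));

       if (test_and_clear_bit(ACRN_REQUEST_EPT_FLUSH, pending_req)) {
           stats->ept_flushes++;
       }
       if (test_and_clear_bit(ACRN_REQUEST_VPID_FLUSH, pending_req)) {
           stats->vpid_flushes++;
       }
   }

Clearing the bit before acting on it means a request posted again while
the action is in progress is not lost; it is simply seen on the next pass.
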
VMX Initialization
******************

ACRN initializes the vCPU's VMCS before its first
launch with the host state, execution control, guest state,
entry control, and exit control, as shown in the table below.

The table briefly shows how each field is configured.
The guest state field is critical for a guest CPU to start running
based on different CPU modes.

For a guest vCPU's state initialization:

- If it's the BSP, the guest state configuration is done in SW load,
  which could be initialized by different objects:

  - Service VM BSP: the hypervisor does context initialization in different
    SW load based on different boot mode

  - User VM BSP: DM context initialization through hypercall

- If it's an AP, then it always starts from real mode, and the start
  vector always comes from vlapic INIT-SIPI emulation.

.. doxygenstruct:: acrn_vcpu_regs
   :project: Project ACRN

.. list-table::
   :widths: 20 40 10 30
   :header-rows: 1

   * - **VMX Domain**
     - **Fields**
     - **Bits**
     - **Description**

   * - **host state**
     - CS, DS, ES, FS, GS, TR, LDTR, GDTR, IDTR
     - n/a
     - According to host

   * -
     - MSR_IA32_PAT, MSR_IA32_EFER
     - n/a
     - According to host

   * -
     - CR0, CR3, CR4
     - n/a
     - According to host

   * -
     - RIP
     - n/a
     - Set to vm_exit pointer

   * -
     - IA32_SYSENTER_CS/ESP/EIP
     - n/a
     - Set to 0

   * - **exec control**
     - VMX_PIN_VM_EXEC_CONTROLS
     - 0
     - Enable external-interrupt exiting

   * -
     -
     - 7
     - Enable posted interrupts

   * -
     - VMX_PROC_VM_EXEC_CONTROLS
     - 3
     - Use TSC offsetting

   * -
     -
     - 21
     - Use TPR shadow

   * -
     -
     - 25
     - Use I/O bitmaps

   * -
     -
     - 28
     - Use MSR bitmaps

   * -
     -
     - 31
     - Activate secondary controls

   * -
     - VMX_PROC_VM_EXEC_CONTROLS2
     - 0
     - Virtualize APIC accesses

   * -
     -
     - 1
     - Enable EPT

   * -
     -
     - 3
     - Enable RDTSCP

   * -
     -
     - 5
     - Enable VPID

   * -
     -
     - 7
     - Unrestricted guest

   * -
     -
     - 8
     - APIC-register virtualization

   * -
     -
     - 9
     - Virtual-interrupt delivery

   * -
     -
     - 20
     - Enable XSAVES/XRSTORS

   * - **guest state**
     - CS, DS, ES, FS, GS, TR, LDTR, GDTR, IDTR
     - n/a
     - According to vCPU mode and init_ctx

   * -
     - RIP, RSP
     - n/a
     - According to vCPU mode and init_ctx

   * -
     - CR0, CR3, CR4
     - n/a
     - According to vCPU mode and init_ctx

   * -
     - GUEST_IA32_SYSENTER_CS/ESP/EIP
     - n/a
     - Set to 0

   * -
     - GUEST_IA32_PAT
     - n/a
     - Set to PAT_POWER_ON_VALUE

   * - **entry control**
     - VMX_ENTRY_CONTROLS
     - 2
     - Load debug controls

   * -
     -
     - 14
     - Load IA32_PAT

   * -
     -
     - 15
     - Load IA32_EFER

   * - **exit control**
     - VMX_EXIT_CONTROLS
     - 2
     - Save debug controls

   * -
     -
     - 9
     - Host address space size

   * -
     -
     - 15
     - Acknowledge Interrupt on exit

   * -
     -
     - 18
     - Save IA32_PAT

   * -
     -
     - 19
     - Load IA32_PAT

   * -
     -
     - 20
     - Save IA32_EFER

   * -
     -
     - 21
     - Load IA32_EFER

CPUID Virtualization
********************

A CPUID access from a guest causes a VM exit unconditionally when executed
in VMX non-root operation. ACRN must return the emulated processor
identification and feature information in the EAX, EBX, ECX, and EDX
registers.

To keep things simple, ACRN returns the same values as the physical CPU for
most CPUID leaves, and specially handles a few CPUID features that are APIC
ID related, such as CPUID.01H.

ACRN emulates some extra CPUID features for the hypervisor as well.

There is a per-VM *vcpuid_entries* array, initialized during VM creation
and used to cache most of the CPUID entries for each VM. During guest
CPUID emulation, ACRN reads the cached value from this array, except
for some APIC ID-related CPUID data that is emulated at runtime.

This table describes details for CPUID emulation:

.. list-table::
   :widths: 20 80
   :header-rows: 1

   * - **CPUID**
     - **Emulation Description**

   * - 01H
     - - Get original value from physical CPUID
       - Fill APIC ID from vLAPIC
       - Disable x2APIC
       - Disable PCID
       - Disable VMX
       - Disable XSAVE if host not enabled

   * - 0BH
     - - Fill according to X2APIC feature support (default is disabled)
       - If not supported, fill all registers with 0
       - If supported, get from physical CPUID

   * - 0DH
     - - Fill according to XSAVE feature support
       - If not supported, fill all registers with 0
       - If supported, get from physical CPUID

   * - 07H
     - - Get from per-VM CPUID entries cache
       - For subleaf 0, disable INVPCID and Intel RDT

   * - 16H
     - - Get from per-VM CPUID entries cache
       - If the physical CPU supports CPUID.16H, read from physical CPUID
       - If the physical CPU does not support it, emulate with TSC frequency

   * - 40000000H
     - - Get from per-VM CPUID entries cache
       - EAX: the maximum input value for CPUID supported by ACRN (40000010H)
       - EBX, ECX, EDX: hypervisor vendor ID signature - "ACRNACRNACRN"

   * - 40000010H
     - - Get from per-VM CPUID entries cache
       - EAX: virtual TSC frequency in kHz
       - EBX, ECX, EDX: reserved to 0

   * - 0AH
     - - PMU currently disabled

   * - 0FH, 10H
     - - Intel RDT currently disabled

   * - 12H
     - - Fill according to SGX virtualization

   * - 14H
     - - Intel Processor Trace currently disabled

   * - Others
     - - Get from per-VM CPUID entries cache

.. note:: ACRN needs to take care of
   some CPUID values that can change at runtime. For example, the XD feature
   in CPUID.80000001H may be cleared by the MISC_ENABLE MSR.

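As a concrete case, the hypervisor leaf 40000000H in the table above
packs the 12-byte signature "ACRNACRNACRN" into EBX, ECX, and EDX, four
bytes per register. The sketch below shows one way to build that reply;
``struct cpuid_regs`` and ``fill_hv_vendor_leaf`` are hypothetical names,
not ACRN functions.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   struct cpuid_regs {
       uint32_t eax;
       uint32_t ebx;
       uint32_t ecx;
       uint32_t edx;
   };

   /* Build the reply for CPUID leaf 0x40000000 as described in the table. */
   void fill_hv_vendor_leaf(struct cpuid_regs *regs)
   {
       static const char sig[] = "ACRNACRNACRN";

       assert(regs != NULL);

       /* EAX: highest hypervisor CPUID leaf supported. */
       regs->eax = 0x40000010U;

       /* EBX, ECX, EDX: the signature, 4 bytes each, in memory order. */
       memcpy(&regs->ebx, &sig[0], sizeof(regs->ebx));
       memcpy(&regs->ecx, &sig[4], sizeof(regs->ecx));
       memcpy(&regs->edx, &sig[8], sizeof(regs->edx));
   }

A guest can recover the string by storing EBX, ECX, and EDX back to
memory in that order, which is the usual convention for CPUID vendor
strings in the 4000_0000H range.
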
MSR Virtualization
******************

ACRN always enables the MSR bitmap in the *VMX_PROC_VM_EXEC_CONTROLS* VMX
execution control field. This bitmap marks the MSRs that cause a VM
exit upon guest access, for both read and write. The VM
exit reason for reading or writing these MSRs is
*VMX_EXIT_REASON_RDMSR* or *VMX_EXIT_REASON_WRMSR*, respectively, and the
VM exit handler is *rdmsr_vmexit_handler* or *wrmsr_vmexit_handler*.

This table shows the predefined MSRs that ACRN traps for all guests. For
MSRs whose bits are not set in the MSR bitmap, guest access is
passed through directly:

.. list-table::
   :widths: 33 33 33
   :header-rows: 1

   * - **MSR**
     - **Description**
     - **Handler**

   * - MSR_IA32_TSC_ADJUST
     - TSC adjustment of local APIC's TSC deadline mode
     - emulates with vlapic

   * - MSR_IA32_TSC_DEADLINE
     - TSC target of local APIC's TSC deadline mode
     - emulates with vlapic

   * - MSR_IA32_BIOS_UPDT_TRIG
     - BIOS update trigger
     - used for microcode updates from the Service VM: the signature ID is
       read from the physical MSR, and a BIOS update trigger from the
       Service VM triggers a physical microcode update.

   * - MSR_IA32_BIOS_SIGN_ID
     - BIOS update signature ID
     - "

   * - MSR_IA32_TIME_STAMP_COUNTER
     - Time-stamp counter
     - work with VMX_TSC_OFFSET_FULL to emulate virtual TSC

   * - MSR_IA32_APIC_BASE
     - APIC base address
     - emulates with vlapic

   * - MSR_IA32_PAT
     - Page-attribute table
     - save/restore in vCPU, write to VMX_GUEST_IA32_PAT_FULL if cr0.cd is 0

   * - MSR_IA32_PERF_CTL
     - Performance control
     - Trigger real p-state change if p-state is valid when writing,
       fetch physical MSR when reading

   * - MSR_IA32_FEATURE_CONTROL
     - Feature control bits that configure operation of VMX and SMX
     - disabled, locked

   * - MSR_IA32_MCG_CAP/STATUS
     - Machine-Check global control/status
     - emulates with vMCE

   * - MSR_IA32_MISC_ENABLE
     - Miscellaneous feature control
     - read-only, except MONITOR/MWAIT enable bit

   * - MSR_IA32_SGXLEPUBKEYHASH0/1/2/3
     - SHA256 digest of the authorized launch enclaves
     - emulates with vSGX

   * - MSR_IA32_SGX_SVN_STATUS
     - Status and SVN threshold of SGX support for ACM
     - read-only, emulates with vSGX

   * - MSR_IA32_MTRR_CAP
     - Memory type range register related
     - Handled by MTRR emulation.

   * - MSR_IA32_MTRR_DEF_TYPE
     - "
     - "

   * - MSR_IA32_MTRR_PHYSBASE_0~9
     - "
     - "

   * - MSR_IA32_MTRR_FIX64K_00000
     - "
     - "

   * - MSR_IA32_MTRR_FIX16K_80000/A0000
     - "
     - "

   * - MSR_IA32_MTRR_FIX4K_C0000~F8000
     - "
     - "

   * - MSR_IA32_X2APIC_*
     - x2APIC related MSRs (offset from 0x800 to 0x900)
     - emulates with vlapic

   * - MSR_IA32_L2_MASK_BASE~n
     - L2 CAT mask for CLOSn
     - disabled for guest access

   * - MSR_IA32_L3_MASK_BASE~n
     - L3 CAT mask for CLOSn
     - disabled for guest access

   * - MSR_IA32_MBA_MASK_BASE~n
     - MBA delay mask for CLOSn
     - disabled for guest access

   * - MSR_IA32_VMX_BASIC~VMX_TRUE_ENTRY_CTLS
     - VMX related MSRs
     - not supported, access will inject #GP

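Marking an MSR as trapped means setting its bits in the 4-KByte MSR
bitmap. The sketch below follows the layout given in the Intel SDM: four
1-KByte regions for read-low, read-high, write-low, and write-high, where
"low" covers MSRs 00000000H to 00001FFFH and "high" covers C0000000H to
C0001FFFH. The function name ``msr_bitmap_set_trap`` is hypothetical and
this is not ACRN's code.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define MSR_BITMAP_SIZE     4096U
   #define MSR_READ_LOW_OFF    0U
   #define MSR_READ_HIGH_OFF   1024U
   #define MSR_WRITE_OFF_DELTA 2048U   /* write regions follow read regions */

   /*
    * Trap both reads and writes of one MSR. Returns false if the MSR is
    * outside the two ranges the bitmap covers; such MSRs always cause a
    * VM exit when MSR bitmaps are in use, so no bit is needed.
    */
   bool msr_bitmap_set_trap(uint8_t *bitmap, uint32_t msr)
   {
       uint32_t base;
       uint32_t idx;
       uint8_t bit;

       assert(bitmap != NULL);

       if (msr <= 0x1FFFU) {
           base = MSR_READ_LOW_OFF;
       } else if ((msr >= 0xC0000000U) && (msr <= 0xC0001FFFU)) {
           base = MSR_READ_HIGH_OFF;
       } else {
           return false;
       }

       idx = msr & 0x1FFFU;                  /* bit index inside the region */
       bit = (uint8_t)(1U << (idx % 8U));

       bitmap[base + (idx / 8U)] |= bit;                        /* RDMSR */
       bitmap[base + MSR_WRITE_OFF_DELTA + (idx / 8U)] |= bit;  /* WRMSR */
       return true;
   }

For example, IA32_PAT (277H) lands in byte 78, bit 7 of the read-low and
write-low regions, and IA32_EFER (C0000080H) in byte 16, bit 0 of the two
high regions.
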
CR Virtualization
*****************

ACRN emulates ``mov to cr0``, ``mov to cr4``, ``mov to cr8``, and ``mov
from cr8`` through *cr_access_vmexit_handler* based on
*VMX_EXIT_REASON_CR_ACCESS*.

.. note:: Currently ``mov to cr8`` and ``mov from cr8`` are not actually
   trapped, because the ``CR8-load/store exiting`` bits are set to 0 in
   *VMX_PROC_VM_EXEC_CONTROLS*.

A VM can ``mov from cr0`` and ``mov from
cr4`` without triggering a VM exit. The values read are the read shadows
of the corresponding registers in the VMCS. The shadows are updated by the
hypervisor on CR writes.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - **Operation**
     - **Handler**

   * - mov to cr0
     - Based on vCPU set context API: vcpu_set_cr0->vmx_write_cr0

   * - mov to cr4
     - Based on vCPU set context API: vcpu_set_cr4->vmx_write_cr4

   * - mov to cr8
     - Based on vlapic tpr API: vlapic_set_cr8->vlapic_set_tpr

   * - mov from cr8
     - Based on vlapic tpr API: vlapic_get_cr8->vlapic_get_tpr

For ``mov to cr0`` and ``mov to cr4``, ACRN sets
*cr0_host_mask/cr4_host_mask* into *VMX_CR0_MASK/VMX_CR4_MASK*
as the bitmask of bits whose modification causes a VM exit.

As ACRN always enables ``unrestricted guest`` in
*VMX_PROC_VM_EXEC_CONTROLS2*, *CR0.PE* and *CR0.PG* can be
controlled by the guest.

.. list-table::
   :widths: 20 40 40
   :header-rows: 1

   * - **CR0 MASK**
     - **Value**
     - **Comments**

   * - cr0_always_on_mask
     - fixed0 & (~(CR0_PE | CR0_PG))
     - where fixed0 is read from MSR_IA32_VMX_CR0_FIXED0; these bits
       are fixed to 1 under VMX operation

   * - cr0_always_off_mask
     - ~fixed1
     - where fixed1 is read from MSR_IA32_VMX_CR0_FIXED1; these bits
       are fixed to 0 under VMX operation

   * - CR0_TRAP_MASK
     - CR0_PE | CR0_PG | CR0_WP | CR0_CD | CR0_NW
     - ACRN will also trap PE, PG, WP, CD, and NW bits

   * - cr0_host_mask
     - ~(fixed0 ^ fixed1) | CR0_TRAP_MASK
     - ACRN traps the bits that are fixed under VMX operation plus
       the additional bits in CR0_TRAP_MASK

For ``mov to cr0`` emulation, ACRN handles a paging mode change based on
a PG bit change, and a cache mode change based on CD and NW bit changes.
ACRN also takes care of illegal guest writes to
CR0 bits (for example, setting PG while CR4.PAE = 0 and IA32_EFER.LME = 1),
which result in a #GP being injected into the guest. Finally,
*VMX_CR0_READ_SHADOW* is updated for guest reads of host
controlled bits, and *VMX_GUEST_CR0* is updated with the real VMX CR0
setting.

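The CR0 mask formulas in the table above translate directly into C. In
the sketch below the bit positions of PE, WP, NW, CD, and PG are the
architectural ones; the function names are hypothetical, and the
``fixed0``/``fixed1`` arguments stand for the values read from
MSR_IA32_VMX_CR0_FIXED0 and MSR_IA32_VMX_CR0_FIXED1.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define CR0_PE  (UINT64_C(1) << 0)
   #define CR0_WP  (UINT64_C(1) << 16)
   #define CR0_NW  (UINT64_C(1) << 29)
   #define CR0_CD  (UINT64_C(1) << 30)
   #define CR0_PG  (UINT64_C(1) << 31)

   #define CR0_TRAP_MASK (CR0_PE | CR0_PG | CR0_WP | CR0_CD | CR0_NW)

   /* Bits that must stay 1, except PE/PG (unrestricted guest). */
   uint64_t cr0_always_on_mask(uint64_t fixed0)
   {
       return fixed0 & ~(CR0_PE | CR0_PG);
   }

   /* Bits that must stay 0. */
   uint64_t cr0_always_off_mask(uint64_t fixed1)
   {
       return ~fixed1;
   }

   /*
    * Bits owned by the host. fixed0 ^ fixed1 is 1 exactly where a bit is
    * flexible (may be 0 or 1), so its complement marks the fixed bits.
    */
   uint64_t cr0_host_mask(uint64_t fixed0, uint64_t fixed1)
   {
       /* A bit required to be 1 must also be allowed to be 1. */
       assert((fixed0 & ~fixed1) == 0U);
       return ~(fixed0 ^ fixed1) | CR0_TRAP_MASK;
   }

With the illustrative inputs ``fixed0 = 0x80000021`` and
``fixed1 = 0xFFFFFFFF``, the host mask is ``0xFFFFFFFFE0010021``: the
fixed bits 0, 5, and 31, the upper 32 bits, and the extra trapped bits
WP, NW, and CD.
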
.. list-table::
   :widths: 20 40 40
   :header-rows: 1

   * - **CR4 MASK**
     - **Value**
     - **Comments**

   * - cr4_always_on_mask
     - fixed0
     - where fixed0 is read from MSR_IA32_VMX_CR4_FIXED0; these bits
       are fixed to 1 under VMX operation

   * - cr4_always_off_mask
     - ~fixed1
     - where fixed1 is read from MSR_IA32_VMX_CR4_FIXED1; these bits
       are fixed to 0 under VMX operation

   * - CR4_TRAP_MASK
     - CR4_PSE | CR4_PAE | CR4_VMXE | CR4_PCIDE | CR4_SMEP | CR4_SMAP | CR4_PKE
     - ACRN will also trap PSE, PAE, VMXE, PCIDE, SMEP, SMAP, and PKE bits

   * - cr4_host_mask
     - ~(fixed0 ^ fixed1) | CR4_TRAP_MASK
     - ACRN traps the bits that are fixed under VMX operation plus
       the additional bits in CR4_TRAP_MASK

The ``mov to cr4`` emulation is similar to the cr0 emulation noted above.

.. _io-mmio-emulation:

IO/MMIO Emulation
*****************

ACRN always enables the I/O bitmap in *VMX_PROC_VM_EXEC_CONTROLS* and EPT
in *VMX_PROC_VM_EXEC_CONTROLS2*. Based on them,
*pio_instr_vmexit_handler* and *ept_violation_vmexit_handler* are
used for IO/MMIO emulation for an emulated device. The emulated device
can reside in the hypervisor or in the DM in the Service VM. Refer to the "I/O
Emulation" section for more details.

For an emulated device implemented in the hypervisor, ACRN provides some basic
APIs to register its IO/MMIO range:
|
|
|
|
- For the Service VM, the default I/O bitmap bits are all set to 0, which
  means the Service VM passes through all I/O port accesses by default.
  Adding an I/O handler for a hypervisor-emulated device requires first
  setting the corresponding I/O bitmap bits to 1.

- For a User VM, the default I/O bitmap bits are all set to 1, which means
  the User VM traps all I/O port accesses by default. Adding an I/O handler
  for a hypervisor-emulated device does not require changing its I/O bitmap.
  If a trapped I/O port access does not fall into the range of a
  hypervisor-emulated device, the hypervisor creates an I/O request and
  passes it to the Service VM DM.

- For the Service VM, EPT maps the entire memory range to the Service VM
  except for the ACRN hypervisor area. This means the Service VM passes
  through all MMIO accesses by default. Adding an MMIO handler for a
  hypervisor-emulated device requires first removing its MMIO range from
  the EPT mapping.

- For a User VM, EPT maps only its system RAM to the User VM, which means
  the User VM traps all MMIO accesses by default. Adding an MMIO handler
  for a hypervisor-emulated device does not require changing its EPT
  mapping. If a trapped MMIO access does not fall into the range of a
  hypervisor-emulated device, the hypervisor creates an I/O request and
  passes it to the Service VM DM.
|
|
|
|
.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - **API**
     - **Description**

   * - register_pio_emulation_handler
     - Register an I/O emulation handler for a hypervisor-emulated device
       by specific I/O range

   * - register_mmio_emulation_handler
     - Register an MMIO emulation handler for a hypervisor-emulated device
       by specific MMIO range
|
|
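The port I/O rules above can be modeled with a small, self-contained sketch.
Everything here is hypothetical (``struct vm_pio``,
``sketch_register_pio_handler``, ``find_pio_handler``) and does not reproduce
the signature of ``register_pio_emulation_handler``; it only shows how the
default bitmap differs between the Service VM and a User VM, and why
registering a handler must set the bitmap bits in the Service VM case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORTS    65536u
#define MAX_HANDLERS 8u

typedef uint32_t (*pio_read_fn)(uint16_t port);

struct pio_handler {
    uint16_t base;
    uint16_t len;
    pio_read_fn read;
};

/* Hypothetical per-VM state: one bit per port, 1 = trap, 0 = pass through. */
struct vm_pio {
    uint8_t bitmap[NUM_PORTS / 8];
    struct pio_handler handlers[MAX_HANDLERS];
    size_t count;
};

/* The Service VM starts with all bits 0, a User VM with all bits 1. */
void vm_pio_init(struct vm_pio *vm, bool is_service_vm)
{
    memset(vm->bitmap, is_service_vm ? 0x00 : 0xFF, sizeof(vm->bitmap));
    vm->count = 0;
}

bool port_traps(const struct vm_pio *vm, uint16_t port)
{
    return ((vm->bitmap[port / 8] >> (port % 8)) & 1u) != 0;
}

/* Record the handler and make its ports trap (already true for a User VM). */
bool sketch_register_pio_handler(struct vm_pio *vm, uint16_t base,
                                 uint16_t len, pio_read_fn read)
{
    if (vm->count >= MAX_HANDLERS || (uint32_t)base + len > NUM_PORTS)
        return false;
    for (uint32_t p = base; p < (uint32_t)base + len; p++)
        vm->bitmap[p / 8] |= (uint8_t)(1u << (p % 8));
    vm->handlers[vm->count++] = (struct pio_handler){ base, len, read };
    return true;
}

/* Returns the hypervisor handler for a trapped port, or NULL when the
 * access should become an I/O request for the DM in the Service VM. */
const struct pio_handler *find_pio_handler(const struct vm_pio *vm,
                                           uint16_t port)
{
    for (size_t i = 0; i < vm->count; i++) {
        const struct pio_handler *h = &vm->handlers[i];
        if (port >= h->base && (uint32_t)port < (uint32_t)h->base + h->len)
            return h;
    }
    return NULL;
}
```

The MMIO case follows the same pattern, with EPT mappings taking the place of
the I/O bitmap: removing a range from the mapping corresponds to setting the
bitmap bits to 1.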
|
|
.. _instruction-emulation:

Instruction Emulation
*********************

ACRN implements a simple instruction emulation infrastructure for
MMIO (EPT) and APIC access emulation. When such a VM exit is triggered, the
hypervisor decodes the instruction at RIP and then attempts the
corresponding emulation based on the instruction and its read/write direction.

ACRN currently supports emulating the ``mov``, ``movx``,
``movs``, ``stos``, ``test``, ``and``, ``or``, ``cmp``, ``sub``, and
``bittest`` instructions, without support for the lock prefix. Real mode
emulation is not supported.

.. figure:: images/hld-image82.png
   :align: center

   Instruction Emulation Work Flow
|
|
|
|
In the handlers for EPT violation or APIC access VM exit, ACRN will:

1. Fetch the MMIO access request's address and size.

2. Run *decode_instruction* on the instruction at the current RIP,
   with the following checks:

   a. Is the instruction supported? If not, inject #UD into the guest.
   b. Are the GVAs of RIP, dest, and src valid? If not, inject #PF into the guest.
   c. Is the stack valid? If not, inject #SS into the guest.

3. If step 2 succeeds, check the access direction. For a write,
   run *emulate_instruction* to fetch the MMIO request's value from the
   instruction operands.

4. Execute the MMIO request handler: *emulate_io* for an EPT violation, or
   *vlapic_write/read* (depending on the access direction) for an APIC
   access. The handler completes the MMIO request emulation by:

   a. putting req.val to req.addr for a write operation
   b. getting req.val from req.addr for a read operation

5. If the access direction is read, run *emulate_instruction* to
   put the MMIO request's value into the instruction operands.

6. Return to the guest.
|
|
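The ordering of steps 2 through 6 differs for reads and writes, which the
following sketch makes explicit. It is a heavily simplified model under
stated assumptions: the types and ``handle_mmio_exit`` are hypothetical,
decoding is reduced to a single flag, only the #UD case is modeled, and the
device is a single 64-bit variable.

```c
#include <stdbool.h>
#include <stdint.h>

enum direction { ACCESS_READ, ACCESS_WRITE };

/* Hypothetical, simplified MMIO request. */
struct mmio_req {
    enum direction dir;
    uint64_t addr;
    uint64_t val;
};

/* Hypothetical, simplified vCPU state. */
struct vcpu_sketch {
    uint64_t operand_reg;   /* register named by the decoded instruction */
    bool insn_supported;    /* stands in for the result of decoding */
    bool ud_injected;
};

/* Returns true when the access was emulated, false when #UD was injected. */
bool handle_mmio_exit(struct vcpu_sketch *vcpu, struct mmio_req *req,
                      uint64_t *device_reg)
{
    /* Step 2: decode; an unsupported instruction results in #UD. */
    if (!vcpu->insn_supported) {
        vcpu->ud_injected = true;
        return false;
    }

    /* Step 3: for a write, take the value from the instruction operand. */
    if (req->dir == ACCESS_WRITE)
        req->val = vcpu->operand_reg;

    /* Step 4: run the handler against the simulated device. */
    if (req->dir == ACCESS_WRITE)
        *device_reg = req->val;
    else
        req->val = *device_reg;

    /* Step 5: for a read, place the value into the instruction operand. */
    if (req->dir == ACCESS_READ)
        vcpu->operand_reg = req->val;

    return true;    /* Step 6: resume the guest. */
}
```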
|
|
TSC Emulation
*************

Guest vCPU execution of *RDTSC/RDTSCP* and access to
*MSR_IA32_TSC_AUX* do not cause a VM exit to the hypervisor.
The hypervisor uses *MSR_IA32_TSC_AUX* to record the CPU ID, so
the CPU ID provided by *MSR_IA32_TSC_AUX* might be changed by the guest.

*RDTSCP* is widely used by the hypervisor to identify the current CPU ID.
Because accesses to *MSR_IA32_TSC_AUX* do not cause a VM exit, the ACRN
hypervisor saves/restores the *MSR_IA32_TSC_AUX* value on every VM exit/entry.
Before the hypervisor restores the host CPU ID, *rdtscp* should not be
called because it could return the vCPU ID instead of the host CPU ID.

The *MSR_IA32_TIME_STAMP_COUNTER* is emulated by the ACRN hypervisor, with a
simple implementation based on *TSC_OFFSET* (enabled
in *VMX_PROC_VM_EXEC_CONTROLS*):

- For read: ``val = rdtsc() + exec_vmread64(VMX_TSC_OFFSET_FULL)``
- For write: ``exec_vmwrite64(VMX_TSC_OFFSET_FULL, val - rdtsc())``
|
|
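The read and write rules are inverses of each other, which the following
sketch demonstrates. It is a user-space model, not hypervisor code: the
hypothetical ``struct vtsc`` stands in for the VMCS TSC offset field, and the
host TSC is passed in as an argument instead of being read with ``rdtsc()``.

```c
#include <stdint.h>

/* Hypothetical stand-in for the VMCS TSC offset field of one vCPU. */
struct vtsc {
    uint64_t tsc_offset;
};

/* Guest read of the TSC MSR: host TSC plus the offset. */
uint64_t vtsc_read(const struct vtsc *v, uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset;
}

/* Guest write of the TSC MSR: choose the offset so that a read at the
 * same host TSC returns the written value. Unsigned arithmetic wraps
 * modulo 2^64, so a guest value below the host TSC also works. */
void vtsc_write(struct vtsc *v, uint64_t host_tsc, uint64_t guest_val)
{
    v->tsc_offset = guest_val - host_tsc;
}
```

After a write, the guest TSC advances at the same rate as the host TSC
because only the offset changes.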
|
|
ART Virtualization
******************

The invariant TSC is based on the invariant timekeeping hardware (called
Always Running Timer or ART), which runs at the core crystal clock frequency.
The ratio defined by CPUID leaf 15H expresses the frequency relationship
between the ART hardware and the TSC.

If CPUID.15H:EBX[31:0] != 0 and CPUID.80000007H:EDX[InvariantTSC] = 1, the
following linear relationship holds between the TSC and the ART hardware:

``TSC_Value = (ART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + K``

Where `K` is an offset that can be adjusted by a privileged agent.
When the ART hardware is reset, both the invariant TSC and K are also reset.

The guideline for ART virtualization (vART) is that software that runs
natively should also run in a VM. The vART solution is:

- Present the ART capability to the guest through CPUID leaf 15H for `CPUID.15H:EBX[31:0]`
  and `CPUID.15H:EAX[31:0]`.
- Passthrough devices see the physical ART_Value (vART_Value = pART_Value).
- The relationship between the ART and TSC in the guest is:
  ``vTSC_Value = (vART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + vK``
  where `vK = K + VMCS.TSC_OFFSET`.
- If `vK` or `vTSC_Value` is changed by the guest, the hypervisor changes `VMCS.TSC_OFFSET` accordingly.
- `K` should never be changed by the hypervisor.
|
|
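The two formulas can be evaluated with the sketch below. The function names
are hypothetical, and the CPUID values are passed in as arguments. The
multiplication is split into quotient and remainder parts so that the
intermediate product cannot overflow 64 bits; this is an implementation
choice of the sketch, not something the text prescribes.

```c
#include <stdint.h>

/* num = CPUID.15H:EBX[31:0], den = CPUID.15H:EAX[31:0]; den must be nonzero. */
uint64_t art_to_tsc(uint64_t art, uint32_t num, uint32_t den, uint64_t k)
{
    uint64_t quot = art / den;
    uint64_t rem = art % den;

    /* Equals floor(art * num / den) + k without a 128-bit intermediate. */
    return quot * num + (rem * num) / den + k;
}

/* Guest view: vART equals the physical ART and vK = K + TSC offset. */
uint64_t art_to_vtsc(uint64_t art, uint32_t num, uint32_t den,
                     uint64_t k, uint64_t tsc_offset)
{
    return art_to_tsc(art, num, den, k + tsc_offset);
}
```

For the same ART value, the guest and host TSC values differ by exactly the
TSC offset, which is why a guest change to `vK` or `vTSC_Value` can be
absorbed by adjusting `VMCS.TSC_OFFSET` alone.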
|
|
XSAVE Emulation
***************

The XSAVE feature set comprises eight instructions:

- *XGETBV* and *XSETBV* allow software to read and write the extended
  control register *XCR0*, which controls the operation of the
  XSAVE feature set.

- *XSAVE*, *XSAVEOPT*, *XSAVEC*, and *XSAVES* are four instructions
  that save processor state to memory.

- *XRSTOR* and *XRSTORS* are corresponding instructions that load
  processor state from memory.

- *XGETBV*, *XSAVE*, *XSAVEOPT*, *XSAVEC*, and *XRSTOR* can be executed
  at any privilege level.

- *XSETBV*, *XSAVES*, and *XRSTORS* can be executed only if CPL = 0.

The XSAVE feature set is enabled through CR4.OSXSAVE, and the state
components it manages are selected by XCR0 (through XSETBV)
and the IA32_XSS MSR. Refer to the `Intel SDM Volume 1`_ chapter 13 for more details.

.. _Intel SDM Volume 1:
   https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-1-manual.html

.. figure:: images/hld-image38.png
   :align: center

   ACRN Hypervisor XSAVE emulation

By default, ACRN enables XSAVES/XRSTORS in
*VMX_PROC_VM_EXEC_CONTROLS2*, which allows the guest to use the XSAVE
feature. Because guest execution of *XSETBV* always triggers an XSETBV VM
exit, ACRN must handle XCR0 accesses.
|
|
|
|
ACRN emulates XSAVE features through the following rules:

1. Enumerate CPUID.01H for native XSAVE feature support.
2. If step 1 reports support, enable XSAVE in the hypervisor through CR4.OSXSAVE.
3. Emulate the XSAVE-related CPUID.01H and CPUID.0DH leaves for the guest.
4. Emulate XCR0 access through *xsetbv_vmexit_handler*.
5. Pass through access to the IA32_XSS MSR to the guest.
6. The ACRN hypervisor does NOT use any XSAVE feature.
7. Because ACRN emulates vCPUs in partition mode, rules 5 and 6 mean that
   a guest vCPU fully controls the XSAVE feature in non-root mode.
|
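Rule 4 implies that the hypervisor validates each guest XCR0 write before
applying it. The sketch below shows a subset of the architectural
constraints on XCR0 that such a handler would typically enforce. The
function ``xcr0_write_is_valid`` is hypothetical and is not taken from
*xsetbv_vmexit_handler*; the ``supported`` argument stands in for the XCR0
bits reported through CPUID.0DH.

```c
#include <stdbool.h>
#include <stdint.h>

#define XCR0_X87 (UINT64_C(1) << 0)
#define XCR0_SSE (UINT64_C(1) << 1)
#define XCR0_AVX (UINT64_C(1) << 2)

/* Returns true when the write is acceptable; false means inject #GP. */
bool xcr0_write_is_valid(uint32_t xcr_index, uint64_t val, uint64_t supported)
{
    if (xcr_index != 0)              /* only XCR0 is handled in this sketch */
        return false;
    if ((val & XCR0_X87) == 0)       /* x87 state must stay enabled */
        return false;
    if ((val & ~supported) != 0)     /* no unsupported bits may be set */
        return false;
    if ((val & XCR0_AVX) && !(val & XCR0_SSE))  /* AVX requires SSE */
        return false;
    return true;
}
```

A full handler would also check the guest's privilege level and the
remaining component dependencies, which are omitted here for brevity.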