Commit Graph

35 Commits

Author SHA1 Message Date
Zide Chen
cbf3825140 hv: Pass-through IA32_TSC_AUX MSR to L1 guest
Use an unused MSR on host to save ACRN pcpu ID and avoid saving and
restoring TSC AUX MSR on VMX transitions.

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
2021-08-26 09:25:54 +08:00
Zide Chen
0980420aea hv: minor cleanup of hv_main.c
- remove vcpu->arch.nrexits which is useless.
- record full 32 bits of exit_reason to TRACE_2L(). Make the code simpler.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-25 08:49:54 +08:00
Zide Chen
6d7eb6d7b6 hv: emulate IA32_EFER and adjust Load EFER VMX controls
This helps to improve performance:

- Don't need to execute VMREAD in vcpu_get_efer(), which is frequently
  called.

- VMX_EXIT_CTLS_SAVE_EFER can be removed from VM-Exit Controls.

- If the value of IA32_EFER MSR is identical between the host and guest
  (highly likely), adjust the VMX controls not to load IA32_EFER on
  VMExit and VMEntry.

It's convenient to continue use the exiting vcpu_s/get_efer() APIs,
other than the common vcpu_s/get_guest_msr().

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-24 11:16:53 +08:00
Zhou, Wu
53f6720d13 HV: Combine the acpi loading fucntion to one place
Remove the acpi loading function from elf_loader, rawimage_loaer and
bzimage_loader, and call it together in vm_sw_loader.

Now the vm_sw_loader's job is not just loading sw, so we rename it to
prepare_os_image.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Fei Li
20061b7c39 hv: remove xsave dependence
ACRN could run without XSAVE Capability. So remove XSAVE dependence to support
more (hardware or virtual) platforms.

Tracked-On: #6287
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-10 16:36:15 +08:00
Victor Sun
2fbc4c26e6 HV: vm_load: remove kernel_load_addr in sw_kernel_info struct
When guest kernel has multiple loading segments like ELF format image, just
define one load address in sw_kernel_info struct is meaningless.

The patch removes kernel_load_addr member in struct sw_kernel_info, the load
address should be parsed in each specified format image processing.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Tao Yuhong
2aba7f31db HV: rename splitlock file name
Because the emulation code is for both split-lock and uc-lock,
rename splitlock.c/splitlock.h to lock_instr_emul.c/lock_instr_emul.h

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
7926504011 HV: rename split-lock emulation APIs
Because the emulation code is for both split-lock and uc-lock, Changed
these API names:
vcpu_kick_splitlock_emulation() -> vcpu_kick_lock_instr_emulation()
vcpu_complete_splitlock_emulation() -> vcpu_complete_lock_instr_emulation()
emulate_splitlock() -> emulate_lock_instr()

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
553d59644b HV: Fix decode_instruction() trigger #UD for emulating UC-lock
When ACRN uses decode_instruction to emulate split-lock/uc-lock
instruction, It is actually a try-decode to see if it is XCHG.
If the instruction is XCHG instruction, ACRN must emulate it
(inject #PF if it is triggered) with peer VCPUs paused, and advance
the guest IP. If the instruction is a LOCK prefixed instruction
with accessing the UC memory, ACRN Halted the peer VCPUs, and
advance the IP to skip the LOCK prefix, and then let the VCPU
Executes one instruction by enabling IRQ Windows vm-exit. For
other cases, ACRN injects the exception back to VCPU without
emulating it.
So change the API to decode_instruction(vcpu, bool full_decode),
when full_decode is true, the API does same thing as before. When
full_decode is false, the different is if decode_instruction() meet unknown
instruction, will keep return = -1 and do not inject #UD. We can use
this to distinguish that an #UD has been skipped, and need inject #AC/#GP back.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-07-21 11:25:47 +08:00
Shuo A Liu
6e0b12180c hv: dm: Use new power management data structures
struct cpu_px_data		->	struct acrn_pstate_data
struct cpu_cx_data		->	struct acrn_cstate_data
enum pm_cmd_type		->	enum acrn_pm_cmd_type
struct acpi_generic_address	->	struct acrn_acpi_generic_address
cpu_cx_data			->	acrn_cstate_data
cpu_px_data			->	acrn_pstate_data

IC_PM_GET_CPU_STATE		->	ACRN_IOCTL_PM_GET_CPU_STATE

PMCMD_GET_PX_CNT		->	ACRN_PMCMD_GET_PX_CNT
PMCMD_GET_CX_CNT		->	ACRN_PMCMD_GET_CX_CNT
PMCMD_GET_PX_DATA		->	ACRN_PMCMD_GET_PX_DATA
PMCMD_GET_CX_DATA		->	ACRN_PMCMD_GET_CX_DATA

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu
9c910bae44 hv: dm: Use new I/O request data structures
struct vhm_request		->	struct acrn_io_request
union vhm_request_buffer	->	struct acrn_io_request_buffer
struct pio_request		->	struct acrn_pio_request
struct mmio_request		->	struct acrn_mmio_request
struct ioreq_notify		->	struct acrn_ioreq_notify

VHM_REQ_PIO_INVAL		->	IOREQ_PIO_INVAL
VHM_REQ_MMIO_INVAL		->	IOREQ_MMIO_INVAL
REQ_PORTIO			->	ACRN_IOREQ_TYPE_PORTIO
REQ_MMIO			->	ACRN_IOREQ_TYPE_MMIO
REQ_PCICFG			->	ACRN_IOREQ_TYPE_PCICFG
REQ_WP				->	ACRN_IOREQ_TYPE_WP

REQUEST_READ			->	ACRN_IOREQ_DIR_READ
REQUEST_WRITE			->	ACRN_IOREQ_DIR_WRITE
REQ_STATE_PROCESSING		->	ACRN_IOREQ_STATE_PROCESSING
REQ_STATE_PENDING		->	ACRN_IOREQ_STATE_PENDING
REQ_STATE_COMPLETE		->	ACRN_IOREQ_STATE_COMPLETE
REQ_STATE_FREE			->	ACRN_IOREQ_STATE_FREE

IC_CREATE_IOREQ_CLIENT		->	ACRN_IOCTL_CREATE_IOREQ_CLIENT
IC_DESTROY_IOREQ_CLIENT		->	ACRN_IOCTL_DESTROY_IOREQ_CLIENT
IC_ATTACH_IOREQ_CLIENT		->	ACRN_IOCTL_ATTACH_IOREQ_CLIENT
IC_NOTIFY_REQUEST_FINISH	->	ACRN_IOCTL_NOTIFY_REQUEST_FINISH
IC_CLEAR_VM_IOREQ		->	ACRN_IOCTL_CLEAR_VM_IOREQ
HYPERVISOR_CALLBACK_VHM_VECTOR	->	HYPERVISOR_CALLBACK_HSM_VECTOR

arch_fire_vhm_interrupt()	->	arch_fire_hsm_interrupt()
get_vhm_notification_vector()	->	get_hsm_notification_vector()
set_vhm_notification_vector()	->	set_hsm_notification_vector()
acrn_vhm_notification_vector	->	acrn_hsm_notification_vector
get_vhm_req_state()		->	get_io_req_state()
set_vhm_req_state()		->	set_io_req_state()

Below structures have slight difference with former ones.

  struct acrn_ioreq_notify
  strcut acrn_io_request

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu
107cae316a hv: dm: Use new ioctl ACRN_IOCTL_SET_VCPU_REGS
struct acrn_set_vcpu_regs	->	struct acrn_vcpu_regs
struct acrn_vcpu_regs		->	struct acrn_regs
IC_SET_VCPU_REGS		->	ACRN_IOCTL_SET_VCPU_REGS

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Victor Sun
ed97022646 HV: add find_space_from_ve820() api
The API would search ve820 table and return a valid GPA when the requested
size of memory is available in the specified memory range, or return
INVALID_GPA if the requested memory slot is not available;

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Shuo A Liu
387ea23961 hv: Rename get_ept_entry() to get_eptp()
get_ept_entry() actually returns the EPTP of a VM. So rename it to
get_eptp() for readability.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-09 10:07:05 +08:00
Shuo A Liu
b10b5658bd hv: nested: Introduce L2 VM EPT VIOLATION handler
With shadow EPT, the hypervisor walks through guest EPT table:

  * If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION
    to L1 VM and resumes to L1 VM.

  * If the entry is present in guest EPT, do the EPT_MISCONFIG check.
    Inject EPT_MISCONFIG to L1 VM if the check failed.

  * If the entry is present in guest EPT, do permission check.
    Reflect EPT_VIOLATION to L1 VM if the check failed.

  * If the entry is present in guest EPT but shadow EPT entry is not
    present, create the shadow entry and resumes to L2 VM.

  * If the entry is present in guest EPT but the GPA in the entry is
    invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu
540a484147 hv: nested: Manage shadow EPTP according to guest VMCS change
'struct nept_desc' is used to associate guest EPTP with a shadow EPTP.
It's created in the first reference and be freed while no reference.

The life cycle seems like,

While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc'
of the new guest EPTP is referenced; the 'struct nept_desc' of the old
guest EPTP is dereferenced.

While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc'
of the old guest EPTP is dereferenced.

While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct
nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc'
of the old guest EPTP is dereferenced.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu
10ec896f99 hv: nested: Introduce shadow EPT infrastructure
To shadow guest EPT, the hypervisor needs construct a shadow EPT for each
guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP
(EPT pointer). This patch provides following structure to do the association.

	struct nept_desc {
	       /*
	        * A shadow EPTP.
	        * The format is same with 'EPT pointer' in VMCS.
	        * Its PML4 address field is a HVA of the hypervisor.
	        */
	       uint64_t shadow_eptp;
	       /*
	        * An guest EPTP configured by L1 VM.
	        * The format is same with 'EPT pointer' in VMCS.
	        * Its PML4 address field is a GPA of the L1 VM.
	        */
	       uint64_t guest_eptp;
	       uint32_t ref_count;
	};

Due to lack of dynamic memory allocation of the hypervisor, a array
nept_bucket of type 'struct nept_desc' is introduced to store those
association information. A guest EPT might be shared between different
L2 vCPUs, so this patch provides several functions to handle the
reference of the structure.

Interface get_shadow_eptp() also is introduced. To find the shadow EPTP
of a specified guest EPTP.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu
17bc7f08c9 hv: nested: Create a page pool for shadow EPT construction
Shadow EPT uses lots of pages to construct the shadow page table. To
utilize the memory more efficient, a page poll sept_page_pool is
introduced.

For simplicity, total platform RAM size is considered to calculate the
memory needed for shadow page tables. This is not an accurate upper
bound.  This can satisfy typical use-cases where there is not a lot
of overcommitment and sharing of memory between L2 VMs.

Memory of the pool is marked as reserved from E820 table in early stage.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Zide Chen
811e367ad9 hv: nested: implement nested VM exit handler
Nested VM exits happen when vCPU is in guest mode (VMCS02 is current).
Initially we reflect all nested VM exits to L1 hypervisor. To prepare
the environment to run L1 guest:

- restore some VMCS fields to the value as what L1 hypervisor programmed.
- VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing.
- load the non-shadowing host states from VMCS12 to VMCS01 guest states.
- VMRESUME to L1 guest with this modified VMCS01.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen
4acc65eacc hv: nested: support for INVEPT and INVVPID emulation
invvpid and invept instructions cause VM exits unconditionally.
For initial support, we pass all the instruction operands as is
to the pCPU.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen
4c29a0bb29 hv: nested: support for VMLAUNCH and VMRESUME emulation
Implement the VMLAUNCH and VMRESUME instructions, allowing a L1
hypervisor to run nested guests.

- merge VMCS control fields and VMCS guest fields to VMCS02
- clear shadow VMCS indicator on VMCS02 and load VMCS02 as current
- set VMCS12 launch state to "launched" in VMLAUNCH handler

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen
6d69058a9d hv: nested: support for VMREAD and VMWRITE emulation
This patch implements the VMREAD and VMWRITE instructions.

When L1 guest is running with an active VMCS12, the “VMCS shadowing”
VM-execution control is always set to 1 in VMCS01. Thus the possible
behavior of VMREAD or VMWRITE from L1 could be:

- It causes a VM exit to L0 if the bit corresponds to the target VMCS
  field in the VMREAD bitmap or VMWRITE bitmap is set to 1.
- It accesses the VMCS referenced by VMCS01 link pointer (VMCS02 in
  our case) if the above mentioned bit is set to 0.

This patch handles the VMREAD and VMWRITE VM exits in this way:

- on VMWRITE, it writes the desired VMCS value to the respective field
  in the cached VMCS12. For VMCS fields that need to be synced to VMCS02,
  sets the corresponding dirty flag.

- on VMREAD, it reads the desired VMCS value from the cached VMCS12.

Tracked-On: #5923
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
2bd269c11c hv: nested: support for VMCLEAR emulation
This patch is to emulate VMCLEAR instruction.

L1 hypervisor issues VMCLEAR on a VMCS12 whose state could be any of
these: active and current, active but not current, not yet VMPTRLDed.

To emulate the VMCLEAR instruction, ACRN sets the VMCS12 launch state to
"clear", and if L0 already cached this VMCS12, need to sync it back to
guest memory:

- sync shadow fields from shadow VMCS VMCS to cache VMCS12
- copy cache VMCS12 to L1 guest memory

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
863e58e539 hv: nested: define software layout for VMCS12 and helper functions
Software layout of VMCS12 data is a contract between L1 guest and L0
hypervisor to run a L2 guest.

ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor
by the VMPTRLD instructin. At the time of VMCLEAR, ACRN syncs the cached
VMCS12 back to L1 guest memory.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
f5744174b5 hv: nested: support for VMPTRLD emulation
This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches
the VMCS12 that is passed down from the VMPTRLD instruction, and merges it
with VMCS01 to create VMCS02 to run the nested VM.

- Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to
  flushes active but not current VMCS12s to L1 guest.
- ACRN creates VMCS02 to run nested VM based on VMCS12:
  1) copy VMCS12 from guest memory to the per vCPU cache VMCS12
  2) initialize VMCS02 revision ID and host-state area
  3) load shadow fields from cache VMCS12 to VMCS02
  4) enable VMCS shadowing before L1 Vm entry

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
0a1ac2f4a0 hv: nested: support for VMXOFF emulation
This patch implements the VMXOFF instruction. By issuing VMXOFF,
L1 guest Leaves VMX Operation.

- cleanup VCPU nested virtualization context states in VMXOFF handler.
- implement check_vmx_permission() to check permission for VMX operation
  for VMXOFF and other VMX instructions.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
3fdad3c6d1 hv: nested: check prerequisites to enter VMX operation
According to VMXON Instruction Reference, do the following checks in the
virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID
in VMXON region, etc.

Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD
exception if L1 hypervisor is not running in 64-bit mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
fc8f07e740 hv: nested: support for VMXON emulation
This patch emulates VMXON instruction. Basically checks some
prerequisites to enable VMX operation on L1 guest (next patch), and
prepares some virtual hardware environment in L0.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Li Fei1
a69e67b58b hv: vlapic: wrap a function to calculate destination vcpu mask by shorthand
1. Rename vlapic_calc_dest to vlapic_calc_dest_noshort
2. Remove vlapic_calc_dest_lapic_pt, use vlapic_calc_dest_noshort instead
3. Wrap vlapic_calc_dest to calculate destination vcpu mask according shorthand

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Liang Yi
3547c9cd23 hv/mod_timer: make timer into an arch-independent module
x86/timer.[ch] was moved to the common directory largely unchanged.

x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.

Other than these two functions, the timer module is largely arch
agnostic.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Zide Chen
c9982e8c7e hv: nested: setup emulated VMX MSRs
We emulated these MSRs:

- MSR_IA32_VMX_PINBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS2
- MSR_IA32_VMX_EXIT_CTLS
- MSR_IA32_VMX_ENTRY_CTLS
- MSR_IA32_VMX_BASIC: emulate VMCS revision ID, etc.
- MSR_IA32_VMX_MISC

For the following MSRs, we pass through the physical value to L1 guests:

- MSR_IA32_VMX_EPT_VPID_CAP
- MSR_IA32_VMX_VMCS_ENUM
- MSR_IA32_VMX_CR0_FIXED0
- MSR_IA32_VMX_CR0_FIXED1
- MSR_IA32_VMX_CR4_FIXED0
- MSR_IA32_VMX_CR4_FIXED1

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
4930992118 hv: nested: implement the framework for VMX MSR emulation
Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to
L1 guests if nested virtualization is enabled.
- If CONFIG_NVMX_ENABLED is set, these MSRs are included in
  emulated_guest_msrs[].
- otherwise, they are included in unsupported_msrs[].

In this way we can take advantage of the existing infrastructure to
emulate these MSRs.

Tracked-On: #5923
Spick igned-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
ccfdf9cdd7 hv: nested: enable nested virtualization
Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set:

- move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS
  so that CR4_VMXE is removed from cr4_reserved_bits_mask.
- force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE
  is able to be set.

Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED
is set.

Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests.

Constraints on ACRN VM.
- LAPIC passthrough should be enabled.
- use SCHED_NOOP scheduler.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Shuo A Liu
3fffa68665 hv: Support WAITPKG instructions in guest VM
TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause
a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2
is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2.

Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So
load corresponding vMSR value during context switch in of a vCPU.

Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.

Performance impact:
    MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
    MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.

Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-05-13 14:19:50 +08:00
Liang Yi
688a41c290 hv: mod: do not use explicit arch name when including headers
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".

In other words, we are adopting the same practice in Linux kernel.

Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-08 11:15:46 +08:00