Commit Graph

1263 Commits

Author SHA1 Message Date
Tao Yuhong
7926504011 HV: rename split-lock emulation APIs
Because the emulation code is for both split-lock and uc-lock, Changed
these API names:
vcpu_kick_splitlock_emulation() -> vcpu_kick_lock_instr_emulation()
vcpu_complete_splitlock_emulation() -> vcpu_complete_lock_instr_emulation()
emulate_splitlock() -> emulate_lock_instr()

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
bbd7b7091b HV: re-use split-lock emulation code for uc-lock
Split-lock emulation can be re-used for uc-lock. In emulate_splitlock(),
it only work if this vmexit is for #AC trap and guest do not handle
split-lock and HV enable #AC for splitlock.
Add another condition to let emulate_splitlock() also work for #GP trap
and guest do not handle uc-lock and HV enable #GP for uc-lock.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
553d59644b HV: Fix decode_instruction() trigger #UD for emulating UC-lock
When ACRN uses decode_instruction to emulate split-lock/uc-lock
instruction, It is actually a try-decode to see if it is XCHG.
If the instruction is XCHG instruction, ACRN must emulate it
(inject #PF if it is triggered) with peer VCPUs paused, and advance
the guest IP. If the instruction is a LOCK prefixed instruction
with accessing the UC memory, ACRN Halted the peer VCPUs, and
advance the IP to skip the LOCK prefix, and then let the VCPU
Executes one instruction by enabling IRQ Windows vm-exit. For
other cases, ACRN injects the exception back to VCPU without
emulating it.
So change the API to decode_instruction(vcpu, bool full_decode),
when full_decode is true, the API does same thing as before. When
full_decode is false, the different is if decode_instruction() meet unknown
instruction, will keep return = -1 and do not inject #UD. We can use
this to distinguish that an #UD has been skipped, and need inject #AC/#GP back.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-07-21 11:25:47 +08:00
Shuo A Liu
6e0b12180c hv: dm: Use new power management data structures
struct cpu_px_data		->	struct acrn_pstate_data
struct cpu_cx_data		->	struct acrn_cstate_data
enum pm_cmd_type		->	enum acrn_pm_cmd_type
struct acpi_generic_address	->	struct acrn_acpi_generic_address
cpu_cx_data			->	acrn_cstate_data
cpu_px_data			->	acrn_pstate_data

IC_PM_GET_CPU_STATE		->	ACRN_IOCTL_PM_GET_CPU_STATE

PMCMD_GET_PX_CNT		->	ACRN_PMCMD_GET_PX_CNT
PMCMD_GET_CX_CNT		->	ACRN_PMCMD_GET_CX_CNT
PMCMD_GET_PX_DATA		->	ACRN_PMCMD_GET_PX_DATA
PMCMD_GET_CX_DATA		->	ACRN_PMCMD_GET_CX_DATA

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu
9c910bae44 hv: dm: Use new I/O request data structures
struct vhm_request		->	struct acrn_io_request
union vhm_request_buffer	->	struct acrn_io_request_buffer
struct pio_request		->	struct acrn_pio_request
struct mmio_request		->	struct acrn_mmio_request
struct ioreq_notify		->	struct acrn_ioreq_notify

VHM_REQ_PIO_INVAL		->	IOREQ_PIO_INVAL
VHM_REQ_MMIO_INVAL		->	IOREQ_MMIO_INVAL
REQ_PORTIO			->	ACRN_IOREQ_TYPE_PORTIO
REQ_MMIO			->	ACRN_IOREQ_TYPE_MMIO
REQ_PCICFG			->	ACRN_IOREQ_TYPE_PCICFG
REQ_WP				->	ACRN_IOREQ_TYPE_WP

REQUEST_READ			->	ACRN_IOREQ_DIR_READ
REQUEST_WRITE			->	ACRN_IOREQ_DIR_WRITE
REQ_STATE_PROCESSING		->	ACRN_IOREQ_STATE_PROCESSING
REQ_STATE_PENDING		->	ACRN_IOREQ_STATE_PENDING
REQ_STATE_COMPLETE		->	ACRN_IOREQ_STATE_COMPLETE
REQ_STATE_FREE			->	ACRN_IOREQ_STATE_FREE

IC_CREATE_IOREQ_CLIENT		->	ACRN_IOCTL_CREATE_IOREQ_CLIENT
IC_DESTROY_IOREQ_CLIENT		->	ACRN_IOCTL_DESTROY_IOREQ_CLIENT
IC_ATTACH_IOREQ_CLIENT		->	ACRN_IOCTL_ATTACH_IOREQ_CLIENT
IC_NOTIFY_REQUEST_FINISH	->	ACRN_IOCTL_NOTIFY_REQUEST_FINISH
IC_CLEAR_VM_IOREQ		->	ACRN_IOCTL_CLEAR_VM_IOREQ
HYPERVISOR_CALLBACK_VHM_VECTOR	->	HYPERVISOR_CALLBACK_HSM_VECTOR

arch_fire_vhm_interrupt()	->	arch_fire_hsm_interrupt()
get_vhm_notification_vector()	->	get_hsm_notification_vector()
set_vhm_notification_vector()	->	set_hsm_notification_vector()
acrn_vhm_notification_vector	->	acrn_hsm_notification_vector
get_vhm_req_state()		->	get_io_req_state()
set_vhm_req_state()		->	set_io_req_state()

Below structures have slight difference with former ones.

  struct acrn_ioreq_notify
  strcut acrn_io_request

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu
107cae316a hv: dm: Use new ioctl ACRN_IOCTL_SET_VCPU_REGS
struct acrn_set_vcpu_regs	->	struct acrn_vcpu_regs
struct acrn_vcpu_regs		->	struct acrn_regs
IC_SET_VCPU_REGS		->	ACRN_IOCTL_SET_VCPU_REGS

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu
3deb973b7a dm: Use new ioctl ACRN_IOCTL_GET_PLATFORM_INFO
IC_GET_PLATFORM_INFO	->	ACRN_IOCTL_GET_PLATFORM_INFO
struct acrn_vm_config	->	struct acrn_vm_config_header(DM only)
struct platform_info	->	struct acrn_platform_info

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Shuo A Liu
a431cff94e hv: Use 64 bits definition for 64 bits MSR_IA32_VMX_EPT_VPID_CAP operation
MSR_IA32_VMX_EPT_VPID_CAP is 64 bits. Using 32 bits MACROs with it may
cause the bit expression wrong.

Unify the MSR_IA32_VMX_EPT_VPID_CAP operation with 64 bits definition.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-02 09:24:12 +08:00
Rong Liu
321e560968 hv: add max payload to vrp
It seems important that passthru device's max payload settings match
the settings on the native device otherwise passthru device may not work.
So we have to set vrp's max payload capacity as native root port
otherwise we may accidentally change passthru device's max payload
since during guest OS's pci device enumeration, pass-thru device will
renegotiate its max payload's setting with vrp.

Tracked-On: #5915

Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-15 08:53:53 +08:00
Victor Sun
1b3a75c984 HV: place kernel and ramdisk by find_space_from_ve820()
We should not hardcode the VM ramdisk load address right after kernel
load address because of two reasons:
	1. Per Linux kernel boot protocol, the Kernel need a size of
	   contiguous memory(i.e. init_size field in zeropage) from
	   its load address to boot, then the address would overlap
	   with ramdisk;
	2. The hardcoded address could not be ensured as a valid address
	   in guest e820 table, especially with a huge ramdisk;

Also we should not hardcode the VM kernel load address to its pref_address
which work for non-relocatable kernel only. For a relocatable kernel,
it could run from any valid address where bootloader load to.

The patch will set the VM kernel and ramdisk load address by scanning
guest e820 table with find_space_from_ve820() api:
	1. For SOS VM, the ramdisk has been loaded by multiboot bootloader
	   already so set the load address as module source address,
	   the relocatable kernel would be relocated to a appropriate address
	   out space of hypervisor and boot modules to avoid guest memory
	   copy corruption;
	2. For pre-launched VM, the kernel would be loaded to pref_address
	   first, then ramdisk will be put to a appropriate address out space
	   of kernel according to guest memory layout and maximum ramdisk
	   address limit under 4GB;

Tracked-On: #5879

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun
ed97022646 HV: add find_space_from_ve820() api
The API would search ve820 table and return a valid GPA when the requested
size of memory is available in the specified memory range, or return
INVALID_GPA if the requested memory slot is not available;

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Victor Sun
28b7cee412 HV: modularization: rename multiboot.h to boot.h
Given the structure in multiboot.h could be used for any boot protocol,
use a more generic name "boot.h" instead;

Tracked-On: #5661

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-06-11 10:06:02 +08:00
Liang Yi
400d31916a doc: update timer HLD doc after modularization
Replace rdstc() and get_tsc_khz() with their architectural agnostic
counterparts cpu_ticks() and cpu_tickrate().

Tracked-On: #5920
Signed-off-by: Yi Liang <yi.liang@intel.com>
2021-06-09 17:11:25 -04:00
Shuo A Liu
d965f6e6a1 hv: Enlarge E820_MAX_ENTRIES to 64
e820_alloc_memory() splits one E820 entry into two entries. With vEPT
enabled, e820_alloc_memory() is called one more. On some platforms, the
e820 entries might exceed 32.

Enlarge E820_MAX_ENTRIES to 64. Please note, it must be less than 128
due to constrain of zeropage. Linux kernel defines it as 128.

Tracked-On: #6168
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-09 10:07:05 +08:00
Shuo A Liu
387ea23961 hv: Rename get_ept_entry() to get_eptp()
get_ept_entry() actually returns the EPTP of a VM. So rename it to
get_eptp() for readability.

Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-09 10:07:05 +08:00
Shuo A Liu
b10b5658bd hv: nested: Introduce L2 VM EPT VIOLATION handler
With shadow EPT, the hypervisor walks through guest EPT table:

  * If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION
    to L1 VM and resumes to L1 VM.

  * If the entry is present in guest EPT, do the EPT_MISCONFIG check.
    Inject EPT_MISCONFIG to L1 VM if the check failed.

  * If the entry is present in guest EPT, do permission check.
    Reflect EPT_VIOLATION to L1 VM if the check failed.

  * If the entry is present in guest EPT but shadow EPT entry is not
    present, create the shadow entry and resumes to L2 VM.

  * If the entry is present in guest EPT but the GPA in the entry is
    invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu
540a484147 hv: nested: Manage shadow EPTP according to guest VMCS change
'struct nept_desc' is used to associate guest EPTP with a shadow EPTP.
It's created in the first reference and be freed while no reference.

The life cycle seems like,

While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc'
of the new guest EPTP is referenced; the 'struct nept_desc' of the old
guest EPTP is dereferenced.

While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc'
of the old guest EPTP is dereferenced.

While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct
nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc'
of the old guest EPTP is dereferenced.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu
10ec896f99 hv: nested: Introduce shadow EPT infrastructure
To shadow guest EPT, the hypervisor needs construct a shadow EPT for each
guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP
(EPT pointer). This patch provides following structure to do the association.

	struct nept_desc {
	       /*
	        * A shadow EPTP.
	        * The format is same with 'EPT pointer' in VMCS.
	        * Its PML4 address field is a HVA of the hypervisor.
	        */
	       uint64_t shadow_eptp;
	       /*
	        * An guest EPTP configured by L1 VM.
	        * The format is same with 'EPT pointer' in VMCS.
	        * Its PML4 address field is a GPA of the L1 VM.
	        */
	       uint64_t guest_eptp;
	       uint32_t ref_count;
	};

Due to lack of dynamic memory allocation of the hypervisor, a array
nept_bucket of type 'struct nept_desc' is introduced to store those
association information. A guest EPT might be shared between different
L2 vCPUs, so this patch provides several functions to handle the
reference of the structure.

Interface get_shadow_eptp() also is introduced. To find the shadow EPTP
of a specified guest EPTP.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Shuo A Liu
17bc7f08c9 hv: nested: Create a page pool for shadow EPT construction
Shadow EPT uses lots of pages to construct the shadow page table. To
utilize the memory more efficient, a page poll sept_page_pool is
introduced.

For simplicity, total platform RAM size is considered to calculate the
memory needed for shadow page tables. This is not an accurate upper
bound.  This can satisfy typical use-cases where there is not a lot
of overcommitment and sharing of memory between L2 VMs.

Memory of the pool is marked as reserved from E820 table in early stage.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-04 13:53:47 +08:00
Zide Chen
811e367ad9 hv: nested: implement nested VM exit handler
Nested VM exits happen when vCPU is in guest mode (VMCS02 is current).
Initially we reflect all nested VM exits to L1 hypervisor. To prepare
the environment to run L1 guest:

- restore some VMCS fields to the value as what L1 hypervisor programmed.
- VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing.
- load the non-shadowing host states from VMCS12 to VMCS01 guest states.
- VMRESUME to L1 guest with this modified VMCS01.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen
4acc65eacc hv: nested: support for INVEPT and INVVPID emulation
invvpid and invept instructions cause VM exits unconditionally.
For initial support, we pass all the instruction operands as is
to the pCPU.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Zide Chen
4c29a0bb29 hv: nested: support for VMLAUNCH and VMRESUME emulation
Implement the VMLAUNCH and VMRESUME instructions, allowing a L1
hypervisor to run nested guests.

- merge VMCS control fields and VMCS guest fields to VMCS02
- clear shadow VMCS indicator on VMCS02 and load VMCS02 as current
- set VMCS12 launch state to "launched" in VMLAUNCH handler

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Yonghua Huang
1a6ead9af5 hv: update RTCT ACPI table detecting
Signature of RTCT ACPI table maybe "PTCT"(v1) or "RTCT"(v2).
 and the MAGIC number in CRL header is also changed from "PTCM"
 to "RTCM".

 This patch refine the code to detect RTCT table for both
 v1 and v2.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-06-01 08:22:20 +08:00
Zide Chen
6d69058a9d hv: nested: support for VMREAD and VMWRITE emulation
This patch implements the VMREAD and VMWRITE instructions.

When L1 guest is running with an active VMCS12, the “VMCS shadowing”
VM-execution control is always set to 1 in VMCS01. Thus the possible
behavior of VMREAD or VMWRITE from L1 could be:

- It causes a VM exit to L0 if the bit corresponds to the target VMCS
  field in the VMREAD bitmap or VMWRITE bitmap is set to 1.
- It accesses the VMCS referenced by VMCS01 link pointer (VMCS02 in
  our case) if the above mentioned bit is set to 0.

This patch handles the VMREAD and VMWRITE VM exits in this way:

- on VMWRITE, it writes the desired VMCS value to the respective field
  in the cached VMCS12. For VMCS fields that need to be synced to VMCS02,
  sets the corresponding dirty flag.

- on VMREAD, it reads the desired VMCS value from the cached VMCS12.

Tracked-On: #5923
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
2bd269c11c hv: nested: support for VMCLEAR emulation
This patch is to emulate VMCLEAR instruction.

L1 hypervisor issues VMCLEAR on a VMCS12 whose state could be any of
these: active and current, active but not current, not yet VMPTRLDed.

To emulate the VMCLEAR instruction, ACRN sets the VMCS12 launch state to
"clear", and if L0 already cached this VMCS12, need to sync it back to
guest memory:

- sync shadow fields from shadow VMCS VMCS to cache VMCS12
- copy cache VMCS12 to L1 guest memory

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
5379b14108 hv: nested: define VMCS shadow fields
Enable VMCS shadowing for most of the VMCS fields, so that execution of
the VMREAD or VMWRITE on these shadow VMCS fields from L1 hypervisor
won't cause VM exits, but read from or write to the shadow VMCS.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
863e58e539 hv: nested: define software layout for VMCS12 and helper functions
Software layout of VMCS12 data is a contract between L1 guest and L0
hypervisor to run a L2 guest.

ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor
by the VMPTRLD instructin. At the time of VMCLEAR, ACRN syncs the cached
VMCS12 back to L1 guest memory.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
f5744174b5 hv: nested: support for VMPTRLD emulation
This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches
the VMCS12 that is passed down from the VMPTRLD instruction, and merges it
with VMCS01 to create VMCS02 to run the nested VM.

- Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to
  flushes active but not current VMCS12s to L1 guest.
- ACRN creates VMCS02 to run nested VM based on VMCS12:
  1) copy VMCS12 from guest memory to the per vCPU cache VMCS12
  2) initialize VMCS02 revision ID and host-state area
  3) load shadow fields from cache VMCS12 to VMCS02
  4) enable VMCS shadowing before L1 Vm entry

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
0a1ac2f4a0 hv: nested: support for VMXOFF emulation
This patch implements the VMXOFF instruction. By issuing VMXOFF,
L1 guest Leaves VMX Operation.

- cleanup VCPU nested virtualization context states in VMXOFF handler.
- implement check_vmx_permission() to check permission for VMX operation
  for VMXOFF and other VMX instructions.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
3fdad3c6d1 hv: nested: check prerequisites to enter VMX operation
According to VMXON Instruction Reference, do the following checks in the
virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID
in VMXON region, etc.

Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD
exception if L1 hypervisor is not running in 64-bit mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
fc8f07e740 hv: nested: support for VMXON emulation
This patch emulates VMXON instruction. Basically checks some
prerequisites to enable VMX operation on L1 guest (next patch), and
prepares some virtual hardware environment in L0.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Li Fei1
a69e67b58b hv: vlapic: wrap a function to calculate destination vcpu mask by shorthand
1. Rename vlapic_calc_dest to vlapic_calc_dest_noshort
2. Remove vlapic_calc_dest_lapic_pt, use vlapic_calc_dest_noshort instead
3. Wrap vlapic_calc_dest to calculate destination vcpu mask according shorthand

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Rong Liu
3db4491e1c hv: PTM: Create virtual root port
Create virtual root port through add_vdev hypercall. add_vdev
identifies the virtual device to add by its vendor id and device id, then
call the corresponding function to create virtual device.

	-create_vrp(): Find the right virtual root port to create
by its secondary bus number, then initialize the virtual root port.
And finally initialize PTM related configurations.

	-destroy_vrp(): nothing to destroy

Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
2021-05-19 13:54:24 +08:00
Rong Liu
d57bf51c89 hv: PTM: Add virtual root port
Add virtual root port that supports the most basic pci-e bridge and root port operations.

	- init_vroot_port(): init vroot_port's basic registers.

	- deinit_vroot_port(): reset vroot_port

	- read_vroot_port_cfg(): read from vroot_port's virtual config space.

	- write_vroot_port_cfg(): write to vroot_port's virtual config space.

Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
2021-05-19 13:54:24 +08:00
Liang Yi
3547c9cd23 hv/mod_timer: make timer into an arch-independent module
x86/timer.[ch] was moved to the common directory largely unchanged.

x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.

Other than these two functions, the timer module is largely arch
agnostic.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
51204a8d11 hv/mod_timer: separate delay functions from the timer module
Modules that use udelay() should include "delay.h" explicitly.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
5a2b89b0a4 hv/mod_timer: split tsc handling code from timer.
Generalize and split basic cpu cycle/tick routines from x86/timer:
- Instead of rdstc(), use cpu_ticks() in generic code.
- Instead of get_tsc_khz(), use cpu_tickrate() in generic code.
- Include "common/ticks.h" instead of "x86/timer.h" in generic code.
- CYCLES_PER_MS is renamed to TICKS_PER_MS.

The x86 specific API rdstc() and get_tsc_khz(), as well as TSC_PER_MS
are still available in arch/x86/tsc.h but only for x86 specific usage.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Signed-off-by: Yi Liang <yi.liang@intel.com>
2021-05-18 16:43:28 +08:00
Yonghua Huang
00b3a28d5d hv: update RTCT parser to support RTCT version 2
RTCT has been updated to version 2,
  this patch updates hypervisor RTCT parser to support
  both version 1 and version 2 of RTCT.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
2021-05-17 17:19:11 +08:00
Zide Chen
c9982e8c7e hv: nested: setup emulated VMX MSRs
We emulated these MSRs:

- MSR_IA32_VMX_PINBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS2
- MSR_IA32_VMX_EXIT_CTLS
- MSR_IA32_VMX_ENTRY_CTLS
- MSR_IA32_VMX_BASIC: emulate VMCS revision ID, etc.
- MSR_IA32_VMX_MISC

For the following MSRs, we pass through the physical value to L1 guests:

- MSR_IA32_VMX_EPT_VPID_CAP
- MSR_IA32_VMX_VMCS_ENUM
- MSR_IA32_VMX_CR0_FIXED0
- MSR_IA32_VMX_CR0_FIXED1
- MSR_IA32_VMX_CR4_FIXED0
- MSR_IA32_VMX_CR4_FIXED1

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
4930992118 hv: nested: implement the framework for VMX MSR emulation
Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to
L1 guests if nested virtualization is enabled.
- If CONFIG_NVMX_ENABLED is set, these MSRs are included in
  emulated_guest_msrs[].
- otherwise, they are included in unsupported_msrs[].

In this way we can take advantage of the existing infrastructure to
emulate these MSRs.

Tracked-On: #5923
Spick igned-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Yonghua Huang
e9870893a3 hv: rename some software SRAM local names
For simplification purpose, use 'ssram' instead of
 'software sram' for local names inside rtcm module.

Tracked-On: #6015
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 10:08:17 +08:00
Li Fei1
30febed0e1 hv: cache: wrap common APIs
Wrap three common Cache APIs:
- flush_invalidate_all_cache
- flush_cacheline
- flush_cache_range

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
77e64f6092 hv: tlb: wrap common APIs
Wrap two common TLB APIs: flush_tlb and flush_tlb_range.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
d94582389e hv: mmu: move arch specific parts into cpu.h
Move Cache/TLB arch specific parts into cpu.h
After this change, we should not expose arch specific parts out from mmu.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
d6362b6e0a hv: paging: rename ppt_set/clear_ATTR to set_paging_ATTR
Rename ppt_set/clear_(attribute) to set_paging_(attribute)

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Zide Chen
ccfdf9cdd7 hv: nested: enable nested virtualization
Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set:

- move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS
  so that CR4_VMXE is removed from cr4_reserved_bits_mask.
- force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE
  is able to be set.

Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED
is set.

Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests.

Constraints on ACRN VM.
- LAPIC passthrough should be enabled.
- use SCHED_NOOP scheduler.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Zide Chen
dd90eccc25 hv: move invvpid and invept helper code from mmu.c to mmu.h
moving invvpid and invept helper code from mmu.c to mmu.h, so that they
can be accessed by the nested virtualization code.

No logical changes.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Shuo A Liu
3fffa68665 hv: Support WAITPKG instructions in guest VM
TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause
a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2
is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2.

Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So
load corresponding vMSR value during context switch in of a vCPU.

Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.

Performance impact:
    MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
    MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.

Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-05-13 14:19:50 +08:00
Liang Yi
688a41c290 hv: mod: do not use explicit arch name when including headers
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".

In other words, we are adopting the same practice in Linux kernel.

Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-08 11:15:46 +08:00
Shuo A Liu
dc88c2e397 hv: Save/restore MSR_IA32_CSTAR during context switch
Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while
Linux uses it rarely. Now vcpu context switch doesn't save/restore it.
Windows detects the change of the MSR and rises a exception.

Do the save/resotre MSR_IA32_CSTAR during context switch.

Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 11:21:52 +08:00
Jian Jun Chen
31b8b698ce hv: TLFS: Add tsc_offset support for reference time
TLFS spec defines that when a VM is created, the value of
HV_X64_MSR_TIME_REF_COUNT is set to zero. Now tsc_offset is not
supported properly, so guest get a drifted reference time.

This patch implements tsc_offset. tsc_scale and tsc_offset
are calculated when a VM is launched and are saved in
struct acrn_hyperv of struct acrn_vm.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Li Fei1
628bca5cad hv: pgtable: use new algo to calculate PPT/EPT_PD_PAGE_NUM
In order to support platform (such as Ander Lake) which physical address width
bits is 46, the current code need to reserve 2^16 PD page ((2^46) / (2^30)).
This is a complete waste of memory.

This patch would reserve PD page by three parts:
1. DRAM - may take PD_PAGE_NUM(CONFIG_PLATFORM_RAM_SIZE) PD pages at most;
2. low MMIO - may take PD_PAGE_NUM(MEM_1G << 2U) PD pages at most;
3. high MMIO - may takes (CONFIG_MAX_PCI_DEV_NUM * 6U) PD pages (may plus
PDPT entries if its size is larger than 1GB ) at most for:
(a) MMIO BAR size must be a power of 2 from 16 bytes;
(b) MMIO BAR base address must be power of two in size and are aligned with
its size.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1
41e2d40d1f hv: e820: remove get_mem_range_info
No one uses get_mem_range_info to get the top/bottom/size of the physical memory.
We could get these informations by e820 table easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
ad15053304 hv: mmu: remove get_mem_range_info in init_paging
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address. This assumes the platform must have high
memory space.

This patch calculates the high 64 bits max memory address according the e820 tables
and removes the assumption "The platform must have high memory space" by map the
low RAM region and high RAM region separately.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
d1ae797742 hv: pgtable: move sanitize_pte into pagetable.c
sanitize_pte is used to set page table entry to map to an sanitized page to
mitigate l1tf. It should belongs to pgtable module. So move it to pagetable.c

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
ef90bb6db3 hv:pgtable: rename lookup_address to pgtable_lookup_entry
lookup_address is used to lookup a pagetable entry by an address. So rename it
to pgtable_lookup_entry to indicate this clearly.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
36ddd87a09 hv: pgtable: remove alloc_ept_page
alloc_page/free_page should been called in pagetable module. In order to do this,
we add pgtable_create_root and pgtable_create_trusty_root to create PML4 page table
page for normal world and secure world.

After this done, no one uses alloc_ept_page. So remove it.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
ea701c63c7 hv: pgtable: add pgtable_create_trusty_root
Add pgtable_create_trusty_root to allocate a page for trusty PML4 page table page.
This function also copy PDPT entries from Normal world to Secure world.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
596c349600 hv: pgtable: add pgtable_create_root
Add pgtable_create_root to allocate a page for PMl4 page table page.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
eb52e2193a hv: pgtable: refine name for pgtable add/modify/del
Rename mmu_add to pgtable_add_map;
Rename mmu_modify_or_del to pgtable_modify_or_del_map.
And move these functions declaration into pgtable.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Liang Yi
33ef656462 hv/mod-irq: use arch specific header files
Requires explicit arch path name in the include directive.

The config scripts was also updated to reflect this change.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
df36da1b80 hv/mod_irq: do not include x86/irq.h in common/irq.h
Each .c file includes the arch specific irq header file (with full
path) by itself if required.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
741a208a02 hv/mod_irq: cleanup x86 lapic/ioapic header files
Declarations referenced nowhere else are moved into the c file.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
6f0a7016d3 hv/mod_irq: move IPI declarations out of x86/irq.h
They are moved into the new header file x86/notify.h.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
ff732cfb2a hv/mod_irq: move guest interrupt API out of x86/irq.h
A new x86/guest/virq.h head file now contains all guest
related interrupt handling API.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
6098648373 hv/mod_irq: cleanup x86/irq.h
Move exception stack layout struct and exception/NMI handling
declarations from x86/irq.h into x86/cpu.h.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
3a50f949e1 hv/mod_irq: split irq.c into arch/x86/irq.c and common/irq.c
The common irq file is responsible for managing the central
irq_desc data structure and provides the following APIs for
host interrupt handling.
- init_interrupt()
- reserve_irq_num()
- request_irq()
- free_irq()
- set_irq_trigger_mode()
- do_irq()

API prototypes, constant and data structures belonging to common
interrupt handling are all moved into include/common/irq.h.

Conversely, the following arch specific APIs are added which are
called from the common code at various points:
- init_irq_descs_arch()
- setup_irqs_arch()
- init_interrupt_arch()
- free_irq_arch()
- request_irq_arch()
- pre_irq_arch()
- post_irq_arch()

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
c46e3c71ac hv/mod_irq: decouple irq number reservation from ioapic
This is done be adding irq_rsvd_bitmap as an auxiliary bitmap
besides irq_alloc_bitmap.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
f3cae9e258 hv/mod_irq: hide arch specific data in irq_desc
Arch specific IRQ data is now an opaque pointer in irq_desc.

This is a preparation step for spliting IRQ handling into common
and architecture specific parts.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Li Fei1
9000381f34 hv: pgtable: move pgtable definition to pgtable.h
This patch moves pgtable definition to pgtable.h and include the proper
header file for page module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
0278a3f46e hv: pgatble: move the EPT page table related APIs to ept.c
Move the EPT page table related APIs to ept.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for EPT should defined in EPT module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
5c71ca456a hv: pgatble: move the MMU page table related APIs to mmu.c
Move the MMU page table related APIs to mmu.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for MMU should defined in MMU module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
15d68675e9 hv: pgtable: separate common APIs for MMU/EPT
We would move the MMU page table related APIs to mmu.c and move the EPT related
APIs to EPT.c. The page table module only provides APIs to add/modify/delete/lookup
page table entry.

This patch separates common APIs and adds separate APIs of page table module
for MMU/EPT.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
80bd3ac02a hv: trusty: move post_uos_sworld_memory into vm.c
post_uos_sworld_memory are used for post-launched VM which support trusty.
It's more VM related. So move it definition into vm.c

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 13:48:52 +08:00
Yonghua Huang
1a011bd91b hv: disable guest MONITOR-WAIT support when SW SRAM is configured
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.

Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.

This patch disable ACRN guest MONITOR-WAIT support if software
SRAM is configured.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Yonghua Huang
ae43b2a847 hv: disable host MONITOR-WAIT support when SW SRAM is enabled
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.

Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.

This patch disable hypervisor(host) MONITOR-WAIT support and refine
software sram initializaion flow.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Yonghua Huang
ea44bb6c4d hv: wrap function to check software SRAM support
Below boolean function are defined in this patch:
 - is_software_sram_enabled() to check if SW SRAM
   feature is enabled or not.
 - set global variable 'is_sw_sram_initialized'
   to file static.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Li Fei1
768e483cd2 hv: pgtable: rename 'struct memory_ops' to 'struct pgtable'
The fields and APIs in old 'struct memory_ops' are used to add/modify/delete
page table (page or entry). So rename 'struct memory_ops' to 'struct pgtable'.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-10 11:42:13 +08:00
Li Fei1
ef98fa69ce hv: pgtable: remove get_default_access_right API
Use default_access_right field to replace get_default_access_right API.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-10 11:42:13 +08:00
Li Fei1
1db32f4d03 hv: ept: build 4KB page mapping in EPT for code pages of rtvm
RTVM is enforced to use 4KB pages to mitigate CVE-2018-12207 and performance jitter,
which may be introduced by splitting large page into 4KB pages on demand. It works
fine in previous hardware platform where the size of address space for the RTVM is
relatively small. However, this is a problem when the platforms support 64 bits
high MMIO space, which could be super large and therefore consumes large # of
EPT page table pages.

This patch optimize it by using large page for purely data pages, such as MMIO spaces,
even for the RTVM.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-03 13:46:49 +08:00
Li Fei1
0579e2ee24 hv: page: add free_page
Add free_page to free page when unmap pagetable.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1
8d9f12f3b7 hv: page: use dynamic page allocation for pagetable mapping
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.

However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.

This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1
5621fabbcb hv: memory: remove get_sworld_memory_base API
memory_ops structure will be changed to store page table related fields.
However, secure world memory base address is not one of them, it's VM
related. So save sworld_memory_base_hva in vm_arch structure directly.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Yonghua Huang
fdfd28b140 hv: unmap software region of pre-RTVM from Service VM EPT
Accessing to software SRAM region is not allowed when
 software SRAM is pass-thru to prelaunch RTVM.

 This patch removes software SRAM region from service VM
 EPT if it is enabled for prelaunch RTVM.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-02-25 09:35:31 +08:00
Sainath Grandhi
80a91987f4 hv: Fix incorrect struct definition for ir_bits
Fixing an incorrect struct definition for ir_bits in ioapic_rte. Since bits after
the delivery status in the lower 32 bits are not touched by code,
this has never showed up as an issue. And the higher 32 bits in the RTE
are aligned by the compiler.

Tracked-On: #5773
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
2021-02-25 09:34:49 +08:00
Shuo A Liu
d4aaf99d86 hv: keylocker: Support keylocker backup MSRs for Guest VM
The logical processor scoped IWKey can be copied to or from a
platform-scope storage copy called IWKeyBackup. Copying IWKey to
IWKeyBackup is called ‘backing up IWKey’ and copying from IWKeyBackup to
IWKey is called ‘restoring IWKey’.

IWKeyBackup and the path between it and IWKey are protected against
software and simple hardware attacks. This means that IWKeyBackup can be
used to distribute an IWKey within the logical processors in a platform
in a protected manner.

Linux keylocker implementation uses this feature, so they are
introduced by this patch.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
38cd5b481d hv: keylocker: host keylocker iwkey context switch
Different vCPU may have different IWKeys. Hypervisor need do the iwkey
context switch.

This patch introduce a load_iwkey() function to do that. Switches the
host iwkey when the switch_in vCPU satisfies:
  1) keylocker feature enabled
  2) Different from the current loaded one.

Two opportunities to do the load_iwkey():
  1) Guest enables CR4.KL bit.
  2) vCPU thread context switch.

load_iwkey() costs ~600 cycles when do the load IWKey action.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
c11c07e0fe hv: keylocker: Support Key Locker feature for guest VM
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm. These keys are more valuable than what they guard. If stolen
once, the key can be repeatedly used even on another system and even
after vulnerability closed.

It also introduces a CPU-internal wrapping key (IWKey), which is a key-
encryption key to wrap AES keys into handles. While the IWKey is
inaccessible to software, randomizing the value during the boot-time
helps its value unpredictable.

Keylocker usage:
 - New “ENCODEKEY” instructions take original key input and returns HANDLE
   crypted by an internal wrap key (IWKey, init by “LOADIWKEY” instruction)
 - Software can then delete the original key from memory
 - Early in boot/software, less likely to have vulnerability that allows
   stealing original key
 - Later encrypt/decrypt can use the HANDLE through new AES KeyLocker
   instructions
 - Note:
      * Software can use original key without knowing it (use HANDLE)
      * HANDLE cannot be used on other systems or after warm/cold reset
      * IWKey cannot be read from CPU after it's loaded (this is the
        nature of this feature) and only 1 copy of IWKey inside CPU.

The virtualization implementation of Key Locker on ACRN is:
 - Each vCPU has a 'struct iwkey' to store its IWKey in struct
   acrn_vcpu_arch.
 - At initilization, every vCPU is created with a random IWKey.
 - Hypervisor traps the execution of LOADIWKEY (by 'LOADIWKEY exiting'
   VM-exectuion control) of vCPU to capture and save the IWKey if guest
   set a new IWKey. Don't support randomization (emulate CPUID to
   disable) of the LOADIWKEY as hypervisor cannot capture and save the
   random IWKey. From keylocker spec:
   "Note that a VMM may wish to enumerate no support for HW random IWKeys
   to the guest (i.e. enumerate CPUID.19H:ECX[1] as 0) as such IWKeys
   cannot be easily context switched. A guest ENCODEKEY will return the
   type of IWKey used (IWKey.KeySource) and thus will notice if a VMM
   virtualized a HW random IWKey with a SW specified IWKey."
 - In context_switch_in() of each vCPU, hypervisor loads that vCPU's
   IWKey into pCPU by LOADIWKEY instruction.
 - There is an assumption that ACRN hypervisor will never use the
   KeyLocker feature itself.

This patch implements the vCPU's IWKey management and the next patch
implements host context save/restore IWKey logic.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
4483e93bd1 hv: keylocker: Enable the tertiary VM-execution controls
In order for a VMM to capture the IWKey values of guests, processors
that support Key Locker also support a new "LOADIWKEY exiting"
VM-execution control in bit 0 of the tertiary processor-based
VM-execution controls.

This patch enables the tertiary VM-execution controls.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
e9247dbca0 hv: keylocker: Simulate CPUID of keylocker caps for guest VM
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm.

This patch emulates Keylocker CPUID leaf 19H to support Keylocker
feature for guest VM.

To make the hypervisor being able to manage the IWKey correctly, this
patch doesn't expose hardware random IWKey capability
(CPUID.0x19.ECX[1]) to guest VM.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
15c967ad34 hv: keylocker: Add CR4 bit CR4_KL as CR4_TRAP_AND_PASSTHRU_BITS
Bit19 (CR4_KL) of CR4 is CPU KeyLocker feature enable bit. Hypervisor
traps the bit's writing to track the keylocker feature on/off of guest.
While the bit is set by guest,
 - set cr4_kl_enabled to indicate the vcpu's keylocker feature enabled status
 - load vcpu's IWKey in host (will add in later patch)
While the bit is clear by guest,
 - clear cr4_kl_enabled

This patch trap and passthru the CR4_KL bit to guest for operation.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Li Fei1
94a980c923 hv: hypercall: prevent sos can touch hv/pre-launched VM resource
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.

This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.

Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang  <yonghua.huang@intel.com>
2021-02-02 16:55:40 +08:00
Yonghua Huang
8bec63a6ea hv: remove the hardcoding of Software SRAM GPA base
Currently, we hardcode the GPA base of Software SRAM
 to an address that is derived from TGL platform,
 as this GPA is identical with HPA for Pre-launch VM,
 This hardcoded address may not work on other platforms
 if the HPA bases of Software SRAM are different.

 Now, Offline tool configures above GPA based on the
 detection of Software SRAM on specific platform.

 This patch removes the hardcoding GPA of Software SRAM,
 and also renames MACRO 'SOFTWARE_SRAM_BASE_GPA' to
 'PRE_RTVM_SW_SRAM_BASE_GPA' to avoid confusing, as it
 is for Prelaunch VM only.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-30 13:41:02 +08:00
Yonghua Huang
a6e666dbe7 hv: remove hardcoding of SW SRAM HPA base
Physical address to SW SRAM region maybe different
 on different platforms, this hardcoded address may
 result in address mismatch for SW SRAM operations.

 This patch removes above hardcoded address and uses
 the physical address parsed from native RTCT.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang
a6420e8cfa hv: cleanup legacy terminologies in RTCM module
This patch updates below terminologies according
 to the latest TCC Spec:
  PTCT -> RTCT
  PTCM -> RTCM
  pSRAM -> Software SRAM

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang
806f479108 hv: rename RTCM source files
'ptcm' and 'ptct' are legacy name according
   to the latest TCC spec, hence rename below files
   to avoid confusing:

  ptcm.c -> rtcm.c
  ptcm.h -> rtcm.h
  ptct.h -> rtct.h

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-01-28 11:29:25 +08:00
Liang Yi
1de396363f hv: modularization: avoid dependency of multiboot on zeropage.h.
Split off definition of "struct efi_info" into a separate header
file lib/efi.h.

Tracked-On: #5661
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
8f9ec59a53 hv: modularization: cleanup boot.h
Move multiboot specific declarations from boot.h to multiboot.h.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Jie Deng
8aebf5526f hv: move split-lock logic into dedicated file
This patch move the split-lock logic into dedicated file
to reduce LOC. This may make the logic more clear.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-08 17:37:20 +08:00
Jie Deng
27d5711b62 hv: add a cache register for VMX_PROC_VM_EXEC_CONTROLS
This patch adds a cache register for VMX_PROC_VM_EXEC_CONTROLS
to avoid the frequent VMCS access.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-08 17:37:20 +08:00