acrn-hypervisor

mirror of https://github.com/projectacrn/acrn-hypervisor.git synced 2025-11-24 09:29:16 +00:00

Author	SHA1	Message	Date
Zide Chen	2e6cf2b85b	hv: nested: fix bugs in init_vmx_msrs() Currently init_vmx_msrs() emulates same value for the IA32_VMX_xxx_CTLS and IA32_VMX_TRUE_xxx_CTLS MSRs. But the value of physical MSRs could be different between the pair, and we need to adjust the emulated value accordingly. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-20 09:40:50 +08:00
Zide Chen	ad37553873	hv: nested: redundant permission check on nested_vmentry() check_vmx_permission() is called in vmresume_vmexit_handler() and vmlaunch_vmexit_handler() already. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-08-20 08:14:40 +08:00
Zhou, Wu	53f6720d13	HV: Combine the acpi loading function to one place Remove the acpi loading function from elf_loader, rawimage_loader and bzimage_loader, and call it together in vm_sw_loader. Now the vm_sw_loader's job is not just loading sw, so we rename it to prepare_os_image. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Victor Sun	3124018917	HV: vm_load: rename vboot_info.h to vboot.h vboot_info.h declares vm loader function also, so rename the file name to vboot.h; Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Fei Li	2e7491a8ec	hv: mmiodev: a minor bug fix about refine acrn_mmiodev data structure Rename base_hpa to host_pa in acrn_mmiodev data structure. Tracked-On: #6366 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-19 12:01:35 +08:00
Liu,Junming	2c5c8754de	hv:enable GVT-d for pre-launched linux guest in logical partition mode When passing the GPU through to a pre-launched Linux guest, the GPU OpRegion must be passed to the guest as well. The detailed steps are: 1. reserve a memory region in ve820 table for GPU OpRegion 2. build EPT mapping for GPU OpRegion to pass-thru OpRegion to guest 3. emulate the pci config register for OpRegion For the third step, here's the detailed description: the OpRegion address is located at PCI config space offset 0xFC. A normal Linux guest won't write this register, so we can regard this register as read-only. When guest reads this register, return the emulated value. When guest writes this register, ignore the operation. Tracked-On: #6387 Signed-off-by: Liu,Junming <junming.liu@intel.com>	2021-08-19 11:56:26 +08:00
Fei Li	a705ff2dac	hv: relocate ACPI DATA address to 0x7fe00000 Relocate ACPI address to 0x7fe00000 and ACPI NVS to 0x7ff00000 correspondingly. In this case, we could include TPM event log region [0x7ffb0000, 0x80000000) into ACPI NVS. Tracked-On: #6320 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	74e68e39d1	hv: tpm2: do tpm2 fixup for security vm ACRN used to prepare the vTPM2 ACPI Table for pre-launched VM at the build stage using config tools. This is OK if the TPM2 ACPI Table never changes. However, the TPM2 ACPI Table may change under some conditions, for example when the BIOS configuration is changed or the BIOS is updated. This patch does a TPM2 fixup to update the vTPM2 ACPI Table and TPM2 MMIO resource configuration according to the physical TPM2 ACPI Table. Tracked-On: #6366 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	f81b39225c	HV: refine acrn_mmiodev data structure 1. add a name field to indicate what the MMIO Device is. 2. add two more MMIO resource to the acrn_mmiodev data structure. Tracked-On: #6366 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	20061b7c39	hv: remove xsave dependence ACRN could run without XSAVE Capability. So remove XSAVE dependence to support more (hardware or virtual) platforms. Tracked-On: #6287 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-10 16:36:15 +08:00
Tao Yuhong	171856c46b	hv: uc-lock: Fix do not trap #GP If the HV enables #GP for uc-lock and is about to emulate guest uc-lock instructions, it should trap guest #GP. A guest uc-lock instruction triggers #GP, which causes a VM exit for #GP; the HV handles this VM exit and emulates the uc-lock instruction. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-08-09 15:33:12 +08:00
Minggui Cao	80ae3224d9	hv: expose PMC to core partition VM for core partition VM (like RTVM), PMC is always used for performance profiling / tuning, so expose PMC capability and pass-through its MSRs to the VM. Tracked-On: #6307 Signed-off-by: Minggui Cao <minggui.cao@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-27 14:58:28 +08:00
Minggui Cao	eba8c4e78b	hv: use ARRAY_SIZE to calc local array size if one array just used in local only, and its size not used extern, use ARRAY_SIZE macro to calculate its size. Tracked-On: #6307 Signed-off-by: Minggui Cao <minggui.cao@intel.com> Reviewed-by: Junjie Mao <junjie.mao@intel.com>	2021-07-27 14:58:28 +08:00
Yifan Liu	69fef2e685	hv: debug: Add hv console callback to VM-exit event In some scenarios (e.g., nested) where lapic-pt is enabled for a vcpu running on a pcpu hosting console timer, the hv console will be inaccessible. This patch adds the console callback to every VM-exit event so that the console can still be somewhat functional under such circumstance. Since this is VM-exit driven, the VM-exit/second can be low in certain cases (e.g., idle or running stress workload). In extreme cases where the guest panics/hangs, there will be no VM-exits at all. In most cases, the shell is laggy but functional (probably enough for debugging purpose). Tracked-On: #6312 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-07-22 10:08:23 +08:00
Tao Yuhong	8360c3dfe6	HV: enable #GP for UC lock For an atomic operation using bus locking, it would generate LOCK# bus signal, if it has Non-WB memory operand. This is an UC lock. It will ruin the RT behavior of the system. If MSR_IA32_CORE_CAPABILITIES[bit4] is 1, then CPU can trigger #GP for instructions which cause UC lock. This feature is controlled by MSR_TEST_CTL[bit28]. This patch enables #GP for guest UC lock. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	2aba7f31db	HV: rename splitlock file name Because the emulation code is for both split-lock and uc-lock, rename splitlock.c/splitlock.h to lock_instr_emul.c/lock_instr_emul.h Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	7926504011	HV: rename split-lock emulation APIs Because the emulation code is for both split-lock and uc-lock, Changed these API names: vcpu_kick_splitlock_emulation() -> vcpu_kick_lock_instr_emulation() vcpu_complete_splitlock_emulation() -> vcpu_complete_lock_instr_emulation() emulate_splitlock() -> emulate_lock_instr() Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	bbd7b7091b	HV: re-use split-lock emulation code for uc-lock Split-lock emulation can be re-used for uc-lock. In emulate_splitlock(), it only work if this vmexit is for #AC trap and guest do not handle split-lock and HV enable #AC for splitlock. Add another condition to let emulate_splitlock() also work for #GP trap and guest do not handle uc-lock and HV enable #GP for uc-lock. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	553d59644b	HV: Fix decode_instruction() trigger #UD for emulating UC-lock When ACRN uses decode_instruction() to emulate a split-lock/uc-lock instruction, it is actually a try-decode to see whether the instruction is XCHG. If the instruction is XCHG, ACRN must emulate it (injecting #PF if one is triggered) with the peer vCPUs paused, and advance the guest IP. If the instruction is a LOCK-prefixed instruction accessing UC memory, ACRN halts the peer vCPUs, advances the IP to skip the LOCK prefix, and then lets the vCPU execute one instruction by enabling the IRQ window VM exit. In other cases, ACRN injects the exception back into the vCPU without emulating it. So change the API to decode_instruction(vcpu, bool full_decode). When full_decode is true, the API does the same thing as before. When full_decode is false, the difference is that if decode_instruction() meets an unknown instruction, it keeps the return value at -1 and does not inject #UD. We can use this to detect that a #UD has been skipped and that #AC/#GP needs to be injected back. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-07-21 11:25:47 +08:00
Shuo A Liu	6e0b12180c	hv: dm: Use new power management data structures struct cpu_px_data -> struct acrn_pstate_data struct cpu_cx_data -> struct acrn_cstate_data enum pm_cmd_type -> enum acrn_pm_cmd_type struct acpi_generic_address -> struct acrn_acpi_generic_address cpu_cx_data -> acrn_cstate_data cpu_px_data -> acrn_pstate_data IC_PM_GET_CPU_STATE -> ACRN_IOCTL_PM_GET_CPU_STATE PMCMD_GET_PX_CNT -> ACRN_PMCMD_GET_PX_CNT PMCMD_GET_CX_CNT -> ACRN_PMCMD_GET_CX_CNT PMCMD_GET_PX_DATA -> ACRN_PMCMD_GET_PX_DATA PMCMD_GET_CX_DATA -> ACRN_PMCMD_GET_CX_DATA Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	9c910bae44	hv: dm: Use new I/O request data structures struct vhm_request -> struct acrn_io_request union vhm_request_buffer -> struct acrn_io_request_buffer struct pio_request -> struct acrn_pio_request struct mmio_request -> struct acrn_mmio_request struct ioreq_notify -> struct acrn_ioreq_notify VHM_REQ_PIO_INVAL -> IOREQ_PIO_INVAL VHM_REQ_MMIO_INVAL -> IOREQ_MMIO_INVAL REQ_PORTIO -> ACRN_IOREQ_TYPE_PORTIO REQ_MMIO -> ACRN_IOREQ_TYPE_MMIO REQ_PCICFG -> ACRN_IOREQ_TYPE_PCICFG REQ_WP -> ACRN_IOREQ_TYPE_WP REQUEST_READ -> ACRN_IOREQ_DIR_READ REQUEST_WRITE -> ACRN_IOREQ_DIR_WRITE REQ_STATE_PROCESSING -> ACRN_IOREQ_STATE_PROCESSING REQ_STATE_PENDING -> ACRN_IOREQ_STATE_PENDING REQ_STATE_COMPLETE -> ACRN_IOREQ_STATE_COMPLETE REQ_STATE_FREE -> ACRN_IOREQ_STATE_FREE IC_CREATE_IOREQ_CLIENT -> ACRN_IOCTL_CREATE_IOREQ_CLIENT IC_DESTROY_IOREQ_CLIENT -> ACRN_IOCTL_DESTROY_IOREQ_CLIENT IC_ATTACH_IOREQ_CLIENT -> ACRN_IOCTL_ATTACH_IOREQ_CLIENT IC_NOTIFY_REQUEST_FINISH -> ACRN_IOCTL_NOTIFY_REQUEST_FINISH IC_CLEAR_VM_IOREQ -> ACRN_IOCTL_CLEAR_VM_IOREQ HYPERVISOR_CALLBACK_VHM_VECTOR -> HYPERVISOR_CALLBACK_HSM_VECTOR arch_fire_vhm_interrupt() -> arch_fire_hsm_interrupt() get_vhm_notification_vector() -> get_hsm_notification_vector() set_vhm_notification_vector() -> set_hsm_notification_vector() acrn_vhm_notification_vector -> acrn_hsm_notification_vector get_vhm_req_state() -> get_io_req_state() set_vhm_req_state() -> set_io_req_state() The structures below differ slightly from the former ones: struct acrn_ioreq_notify struct acrn_io_request Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	107cae316a	hv: dm: Use new ioctl ACRN_IOCTL_SET_VCPU_REGS struct acrn_set_vcpu_regs -> struct acrn_vcpu_regs struct acrn_vcpu_regs -> struct acrn_regs IC_SET_VCPU_REGS -> ACRN_IOCTL_SET_VCPU_REGS Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	f476ca55ab	hv: dm: Use new VM management ioctls IC_CREATE_VM -> ACRN_IOCTL_CREATE_VM IC_DESTROY_VM -> ACRN_IOCTL_DESTROY_VM IC_START_VM -> ACRN_IOCTL_START_VM IC_PAUSE_VM -> ACRN_IOCTL_PAUSE_VM IC_RESET_VM -> ACRN_IOCTL_RESET_VM struct acrn_create_vm -> struct acrn_vm_creation Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	9c1caad25a	hv: nested: Keep privilege bits sync in shadow EPT entry A guest may not use the INVEPT instruction after changing any of bits 2:0 from 0 to 1 in a present EPT entry, in which case the shadow EPT entry has no chance to sync with the guest EPT entry. According to the SDM, """ Software may use the INVEPT instruction after modifying a present EPT paging-structure entry (see Section 28.2.2) to change any of the privilege bits 2:0 from 0 to 1. Failure to do so may cause an EPT violation that would not otherwise occur. Because an EPT violation invalidates any mappings that would be used by the access that caused the EPT violation (see Section 28.3.3.1), an EPT violation will not recur if the original access is performed again, even if the INVEPT instruction is not executed. """ Sync the later-enabled privilege bits from the guest EPT entry to the shadow EPT entry to cover the above case. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-02 09:24:12 +08:00
Shuo A Liu	a431cff94e	hv: Use 64 bits definition for 64 bits MSR_IA32_VMX_EPT_VPID_CAP operation MSR_IA32_VMX_EPT_VPID_CAP is 64 bits. Using 32 bits MACROs with it may cause the bit expression wrong. Unify the MSR_IA32_VMX_EPT_VPID_CAP operation with 64 bits definition. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-02 09:24:12 +08:00
Victor Sun	e371432695	HV: avoid pre-launched VM modules being corrupted by SOS kernel load When hypervisor boots, the multiboot modules have been loaded to host space by bootloader already. The space range of pre-launched VM modules is also exposed to SOS VM, so SOS VM kernel might pick this range to extract kernel when KASLR enabled. This would corrupt pre-launched VM modules and result in pre-launched VM boot fail. This patch will try to fix this issue. The SOS VM will not be loaded to guest space until all pre-launched VMs are loaded successfully. Tracked-On: #5879 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	268d4c3f3c	HV: boot guest with boot params Previously the load GPA of LaaG boot params like zeropage/cmdline and initgdt are all hard-coded, this would bring potential LaaG boot issues. The patch will try to fix this issue by finding a 32KB load_params memory block for LaaG to store these guest boot params. For other guest with raw image, in general only vgdt need to be cared of so the load_params will be put at 0x800 since it is a common place that most guests won't touch for entering protected mode. Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	ed97022646	HV: add find_space_from_ve820() api The API would search ve820 table and return a valid GPA when the requested size of memory is available in the specified memory range, or return INVALID_GPA if the requested memory slot is not available; Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	6127c0c5d2	HV: modify low 1MB area for pre-launched VM e820 The memory range of [0xA0000, 0xFFFFF] is a known reserved area for BIOS; in fact, the Linux kernel enforces this area to be reserved during its boot stage. Setting this area to usable would cause potential compatibility issues. The patch sets the range to reserved type to make it consistent with the real world. BTW, there should be an EBDA (Extended BIOS Data Area) of reserved type right before 0xA0000 in the real world for non-EFI boot. But given that ACRN has no legacy BIOS emulation, we simply skip the EBDA in vE820. Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	28b7cee412	HV: modularization: rename multiboot.h to boot.h Given the structure in multiboot.h could be used for any boot protocol, use a more generic name "boot.h" instead; Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Shuo A Liu	9ae32f96af	hv: Wrap same code as a static function vmptrld_vmexit_handler() has a same code snippet with vmclear_vmexit_handler(). Wrap the same code snippet as a static function clear_vmcs02(). There is only a small logic change that add nested->current_vmcs12_ptr = INVALID_GPA in vmptrld_vmexit_handler() for the old VMCS. That's reasonable. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-09 10:07:05 +08:00
Shuo A Liu	387ea23961	hv: Rename get_ept_entry() to get_eptp() get_ept_entry() actually returns the EPTP of a VM. So rename it to get_eptp() for readability. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-09 10:07:05 +08:00
Zide Chen	b6b5373818	hv: deny access to HV owned legacy PIO UART from SOS We need to deny accesses from SOS to the HV owned UART device, otherwise SOS could have direct access to this physical device and mess up the HV console. If ACRN debug UART is configured as PIO based, For example, CONFIG_SERIAL_PIO_BASE is generated from acrn-config tool, or the UART config is overwritten by hypervisor parameter "uart=port@<port address>", it could run into problem if ACRN doesn't emulate this UART PIO port to SOS. For example: - none of the ACRN emulated vUART devices has same PIO port with the port of the debug UART device. - ACRN emulates PCI vUART for SOS (configure "console_vuart" with PCI_VUART in the scenario configuration) This patch fixes the above issue by masking PIO accesses from SOS. deny_hv_owned_devices() is moved after setup_io_bitmap() where vm->arch_vm.io_bitmap is initialized. Commit `50d852561` ("HV: deny HV owned PCI bar access from SOS") handles the case that ACRN debug UART is configured as a PCI device. e.g., hypervisor parameter "uart=bdf@<BDF value>" is appended. If the hypervisor debug UART is MMIO based, need to configured it as a PCI type device, so that it can be hidden from SOS. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-08 16:16:14 +08:00
Yonghua Huang	25c0e3817e	hv: validate input for dmar_free_irte function Malicious input 'index' may trigger buffer overflow on array 'irte_alloc_bitmap[]'. This patch validates that 'index' is less than 'CONFIG_MAX_IR_ENTRIES' and, with this fix in place, also removes the now unnecessary check on 'index' in the 'ptirq_free_irte()' function. Tracked-On: #6132 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2021-06-08 09:03:10 +08:00
Yonghua Huang	4acaeb91bd	hv: remove unnecessary ASSERT in vlapic_write vlapic_write handle 'offset' that is valid and ignore all other invalid 'offset'. so ASSERT on this 'offset' input is unnecessary. This patch removes above ASSERT to avoid potential hypervisor crash by guest malicious input when debug build is used. Tracked-On: #6131 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2021-06-08 09:03:10 +08:00
Shuo A Liu	15e6c5b9cf	hv: nested: audit guest EPT mapping during shadow EPT entries setup generate_shadow_ept_entry() didn't verify the correctness of the requested guest EPT mapping. That might leak host memory access to L2 VM. To simplify the implementation of the guest EPT audit, hide capabilities 'map 2-Mbyte page' and 'map 1-Gbyte page' from L1 VM. In addition, minimize the attribute bits of EPT entry when create a shadow EPT entry. Also, for invalid requested mapping address, reflect the EPT_VIOLATION to L1 VM. Here, we have some TODOs: 1) Enable large page support in generate_shadow_ept_entry() 2) Evaluate if need to emulate the invalid GPA access of L2 in HV directly. 3) Minimize EPT entry attributes. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	3110e70d0a	hv: nested: INVEPT emulation supports shadow EPT The L1 VM changes the guest EPT and does INVEPT to invalidate the previous TLB cache of EPT entries. The shadow EPT relies on the INVEPT instruction to do the update. The target shadow EPTs can be found according to the 'type' of INVEPT. Here are the two types and their target shadow EPTs: 1) Single-context invalidation: get the EPTP from the INVEPT descriptor, then find the target shadow EPT. 2) Global invalidation: all shadow EPTs of the L1 VM. The INVEPT emulation handler invalidates all the EPT entries of the target shadow EPTs. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	1dc7b7f798	hv: nested: Introduce shadow EPT release function When a shadow EPT is not used anymore, its resources need to be released. free_sept_table() is introduced to walk the whole shadow EPT table and free the pagetable pages. Please note, the PML4E page of shadow EPT is not freed by free_sept_table() as it still be used to present a shadow EPT pointer. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	b10b5658bd	hv: nested: Introduce L2 VM EPT VIOLATION handler With shadow EPT, the hypervisor walks through guest EPT table: * If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION to L1 VM and resumes to L1 VM. * If the entry is present in guest EPT, do the EPT_MISCONFIG check. Inject EPT_MISCONFIG to L1 VM if the check failed. * If the entry is present in guest EPT, do permission check. Reflect EPT_VIOLATION to L1 VM if the check failed. * If the entry is present in guest EPT but shadow EPT entry is not present, create the shadow entry and resumes to L2 VM. * If the entry is present in guest EPT but the GPA in the entry is invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	8565750bbe	hv: nested: Hide some capability bits from L1 guest * Hide 5 level EPT capability, let L1 guest stick to 4 level EPT. * Access/Dirty bits are not supported currently, hide corresponding EPT capability bits. * "Mode-based execute control for EPT" is also not well supported currently, hide its capability bit from MSR_IA32_VMX_PROCBASED_CTLS2. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	540a484147	hv: nested: Manage shadow EPTP according to guest VMCS change 'struct nept_desc' is used to associate guest EPTP with a shadow EPTP. It's created in the first reference and be freed while no reference. The life cycle seems like, While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc' of the new guest EPTP is referenced; the 'struct nept_desc' of the old guest EPTP is dereferenced. While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc' of the old guest EPTP is dereferenced. While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc' of the old guest EPTP is dereferenced. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	10ec896f99	hv: nested: Introduce shadow EPT infrastructure To shadow guest EPT, the hypervisor needs to construct a shadow EPT for each guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP (EPT pointer). This patch provides the following structure to do the association. struct nept_desc { /* * A shadow EPTP. * The format is same with 'EPT pointer' in VMCS. * Its PML4 address field is a HVA of the hypervisor. */ uint64_t shadow_eptp; /* * A guest EPTP configured by L1 VM. * The format is same with 'EPT pointer' in VMCS. * Its PML4 address field is a GPA of the L1 VM. */ uint64_t guest_eptp; uint32_t ref_count; }; Due to the lack of dynamic memory allocation in the hypervisor, an array nept_bucket of type 'struct nept_desc' is introduced to store the association information. A guest EPT might be shared between different L2 vCPUs, so this patch provides several functions to handle the reference of the structure. The interface get_shadow_eptp() is also introduced to find the shadow EPTP of a specified guest EPTP. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	17bc7f08c9	hv: nested: Create a page pool for shadow EPT construction Shadow EPT uses lots of pages to construct the shadow page table. To utilize the memory more efficiently, a page pool sept_page_pool is introduced. For simplicity, total platform RAM size is considered to calculate the memory needed for shadow page tables. This is not an accurate upper bound. This can satisfy typical use-cases where there is not a lot of overcommitment and sharing of memory between L2 VMs. Memory of the pool is marked as reserved from E820 table in early stage. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Zide Chen	811e367ad9	hv: nested: implement nested VM exit handler Nested VM exits happen when vCPU is in guest mode (VMCS02 is current). Initially we reflect all nested VM exits to L1 hypervisor. To prepare the environment to run L1 guest: - restore some VMCS fields to the value as what L1 hypervisor programmed. - VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing. - load the non-shadowing host states from VMCS12 to VMCS01 guest states. - VMRESUME to L1 guest with this modified VMCS01. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Alexander Merritt <alex.merritt@intel.com>	2021-06-03 15:23:25 +08:00
Zide Chen	22d225663f	hv: nested: update run_vcpu() function for nested case Since L2 guest vCPU mode and VPID are managed by L1 hypervisor, so we can skip these handling in run_vcpu(). And be careful that we can't cache L2 registers in struct acrn_vcpu. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-03 15:23:25 +08:00
Zide Chen	4acc65eacc	hv: nested: support for INVEPT and INVVPID emulation invvpid and invept instructions cause VM exits unconditionally. For initial support, we pass all the instruction operands as is to the pCPU. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-03 15:23:25 +08:00
Zide Chen	4c29a0bb29	hv: nested: support for VMLAUNCH and VMRESUME emulation Implement the VMLAUNCH and VMRESUME instructions, allowing a L1 hypervisor to run nested guests. - merge VMCS control fields and VMCS guest fields to VMCS02 - clear shadow VMCS indicator on VMCS02 and load VMCS02 as current - set VMCS12 launch state to "launched" in VMLAUNCH handler Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Alex Merritt <alex.merritt@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-03 15:23:25 +08:00
Li Fei1	0d5f12e281	hv: vlapic: a minor refine about vlapic_x2apic_pt_icr_access In physical destination mode, the destination processor is specified by its local APIC ID. When a CPU switch xAPIC Mode to x2APIC Mode or vice versa, the local APIC ID is not changed. So a vcpu in x2APIC Mode could use physical Destination Mode to send an IPI to another vcpu in xAPIC Mode by writing ICR. This patch adds support for a vCPU A could write ICR to send IPI to another vCPU B which is in different APIC mode. Tracked-On: #5923 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-05-27 09:00:08 +08:00
dongshen	5e3c6ae941	hv: vcpuid: passthrough host CPUID leaf.0BH to guest VMs Using physical APIC IDs as vLAPIC IDs for pre-Launched and post-launched VMs is not sufficient to replicate the host CPU and cache topologies in guest VMs, we also need to passthrough host CPUID leaf.0BH to guest VMs, otherwise, guest VMs may see weird CPU topology. Note that in current code, ACRN has already passthroughed host cache CPUID leaf 04H to guest VMs Tracked-On: #6020 Reviewed-by: Wang, Yu1 <yu1.wang@intel.com> Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-05-26 11:23:06 +08:00
dongshen	f332ef15b2	hv: vlapic: use physical APIC IDs as vLAPIC IDs for pre-launched and post-launched VMs In current code, ACRN uses physical APIC IDs as vLAPIC IDs for SOS, and vCPU ids (contiguous) as vLAPIC IDs for pre-Launched and post-Launched VMs. Using vCPU ids as vLAPIC IDs for pre-Launched and post-Launched VMs would result in wrong CPU and cache topologies showing in the guest VMs, and could adversely affect performance if the guest VM chooses to detect CPU and cache topologies and optimize its behavior accordingly. Uses physical APIC IDs as vLAPIC IDs (and related CPU/cache topology enumeration CPUIDs passthrough) will replicate the host CPU and cache topologies in pre-Launched and post-Launched VMs. Tracked-On: #6020 Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-05-26 11:23:06 +08:00
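
Commit ed97022646 above describes find_space_from_ve820() as searching the ve820 table and returning a valid GPA when the requested size is available in the specified range, or INVALID_GPA otherwise. A minimal sketch of that idea follows. It is not ACRN's implementation: the entry layout, the names prefixed with sketch_/SKETCH_, the parameter list, and the choice to return the lowest fitting address are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; ACRN's real definitions may differ. */
#define SKETCH_INVALID_GPA (~(uint64_t)0)
#define SKETCH_E820_RAM    1U

struct sketch_e820_entry {
	uint64_t base;   /* start GPA of the region */
	uint64_t length; /* region size in bytes */
	uint32_t type;   /* SKETCH_E820_RAM means usable memory */
};

/*
 * Return the lowest GPA inside [min_addr, max_addr) at which `size`
 * bytes fit entirely within a single RAM entry, or SKETCH_INVALID_GPA
 * if no entry can hold the request. Assumes base + length does not
 * overflow for any entry.
 */
static uint64_t sketch_find_space(const struct sketch_e820_entry *tbl,
		size_t count, uint64_t size,
		uint64_t min_addr, uint64_t max_addr)
{
	uint64_t best = SKETCH_INVALID_GPA;

	assert((tbl != NULL) || (count == 0U));

	for (size_t i = 0U; i < count; i++) {
		if (tbl[i].type != SKETCH_E820_RAM) {
			continue; /* reserved, ACPI, etc. are never handed out */
		}

		/* Clip the entry to the caller's search window. */
		uint64_t start = (tbl[i].base > min_addr) ? tbl[i].base : min_addr;
		uint64_t end = tbl[i].base + tbl[i].length;

		if (end > max_addr) {
			end = max_addr;
		}

		if ((start < end) && ((end - start) >= size) && (start < best)) {
			best = start;
		}
	}

	return best;
}
```

A caller would compare the result against SKETCH_INVALID_GPA before using it, in the same way the commit message describes checking for INVALID_GPA.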


1184 Commits