acrn-hypervisor

mirror of https://github.com/projectacrn/acrn-hypervisor.git synced 2025-07-04 19:17:34 +00:00

Author	SHA1	Message	Date
Yifan Liu	10963b04d1	hv: Fix vcpu signaling racing problem in lock instruction emulation In lock instruction emulation, we use vcpu_make_request and signal_event pairs to shoot down/release other vcpus. However, vcpu_make_request is async and does not guarantee an execution of wait_event on target vcpu, and we want wait_event to be consistent with signal_event. Consider following scenarios: 1, When target vcpu's state has not yet turned to VCPU_RUNNING, vcpu_make_request on ACRN_REQUEST_SPLIT_LOCK does not make sense, and will not result in wait_event. 2, When target vcpu is already requested on ACRN_REQUEST_SPLIT_LOCK (i.e., the corresponding bit in pending_req is set) but not yet handled, the vcpu_make_request call does not result in wait_event as 1 bit is not enough to cache multiple requests. This patch tries to add checks in vcpu_kick_lock_instr_emulation and vcpu_complete_lock_instr_emulation to resolve these issues. Tracked-On: #6502 Signed-off-by: Yifan Liu <yifan1.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-11-02 15:01:20 +08:00
Liu Long	3f4ea38158	ACRN: misc: Unify terminology for service vm/user vm Rename SOS_VM type to SERVICE_VM rename UOS to User VM in XML description rename uos_thread_pid to user_vm_thread_pid rename devname_uos to devname_user_vm rename uosid to user_vmid rename UOS_ACK to USER_VM_ACK rename SOS_VM_CONFIG_CPU_AFFINITY to SERVICE_VM_CONFIG_CPU_AFFINITY rename SOS_COM to SERVICE_VM_COM rename SOS_UART1_VALID_NUM" to SERVICE_VM_UART1_VALID_NUM rename SOS_BOOTARGS_DIFF to SERVICE_VM_BOOTARGS_DIFF rename uos to user_vm in launch script and xml Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	14c6e21efa	ACRN: misc: Unify terminology for sos/uos in macros Rename SOS_VM_NUM to SERVICE_VM_NUM. rename SOS_SOCKET_PORT to SERVICE_VM_SOCKET_PORT. rename PROCESS_RUN_IN_SOS to PROCESS_RUN_IN_SERVICE_VM. rename PCI_DEV_TYPE_SOSEMUL to PCI_DEV_TYPE_SERVICE_VM_EMUL. rename SHUTDOWN_REQ_FROM_SOS to SHUTDOWN_REQ_FROM_SERVICE_VM. rename PROCESS_RUN_IN_SOS to PROCESS_RUN_IN_SERVICE_VM. rename SHUTDOWN_REQ_FROM_UOS to SHUTDOWN_REQ_FROM_USER_VM. rename UOS_SOCKET_PORT to USER_VM_SOCKET_PORT. rename SOS_CONSOLE to SERVICE_VM_OS_CONSOLE. rename SOS_LCS_SOCK to SERVICE_VM_LCS_SOCK. rename SOS_VM_BOOTARGS to SERVICE_VM_OS_BOOTARGS. rename SOS_ROOTFS to SERVICE_VM_ROOTFS. rename SOS_IDLE to SERVICE_VM_IDLE. rename SEVERITY_SOS to SEVERITY_SERVICE_VM. rename SOS_VM_UUID to SERVICE_VM_UUID. rename SOS_REQ to SERVICE_VM_REQ. rename RTCT_NATIVE_FILE_PATH_IN_SOS to RTCT_NATIVE_FILE_PATH_IN_SERVICE_VM. rename CBC_REQ_T_UOS_ACTIVE to CBC_REQ_T_USER_VM_ACTIVE. rename CBC_REQ_T_UOS_INACTIVE to CBC_REQ_T_USER_VM_INACTIV. rename uos_active to user_vm_active. Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	e9c4ced460	ACRN: hv: Unify terminology for user vm Rename gpa_uos to gpa_user_vm rename base_gpa_in_uos to base_gpa_in_user_vm rename UOS_VIRT_PCI_MMCFG_BASE to USER_VM_VIRT_PCI_MMCFG_BASE rename UOS_VIRT_PCI_MMCFG_START_BUS to USER_VM_VIRT_PCI_MMCFG_START_BUS rename UOS_VIRT_PCI_MMCFG_END_BUS to USER_VM_VIRT_PCI_MMCFG_END_BUS rename UOS_VIRT_PCI_MEMBASE32 to USER_VM_VIRT_PCI_MEMBASE32 rename UOS_VIRT_PCI_MEMLIMIT32 to USER_VM_VIRT_PCI_MEMLIMIT32 rename UOS_VIRT_PCI_MEMBASE64 to USER_VM_VIRT_PCI_MEMBASE64 rename UOS_VIRT_PCI_MEMLIMIT64 to USER_VM_VIRT_PCI_MEMLIMIT64 rename UOS in comments message to User VM. Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	92b7d6a9a3	ACRN: hv: Terminology modification in hv code Rename sos_vm to service_vm. rename sos_vmid to service_vmid. rename sos_vm_ptr to service_vm_ptr. rename get_sos_vm to get_service_vm. rename sos_vm_gpa to service_vm_gpa. rename sos_vm_e820 to service_vm_e820. rename sos_efi_info to service_vm_efi_info. rename sos_vm_config to service_vm_config. rename sos_vm_hpa2gpa to service_vm_hpa2gpa. rename vdev_in_sos to vdev_in_service_vm. rename create_sos_vm_e820 to create_service_vm_e820. rename sos_high64_max_ram to service_vm_high64_max_ram. rename prepare_sos_vm_memmap to prepare_service_vm_memmap. rename post_uos_sworld_memory to post_user_vm_sworld_memory rename hcall_sos_offline_cpu to hcall_service_vm_offline_cpu. rename filter_mem_from_sos_e820 to filter_mem_from_service_vm_e820. rename create_sos_vm_efi_mmap_desc to create_service_vm_efi_mmap_desc. rename HC_SOS_OFFLINE_CPU to HC_SERVICE_VM_OFFLINE_CPU. rename SOS to Service VM in comments message. Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	26e507a06e	ACRN: hv: Unify terminology for service vm Rename is_sos_vm to is_service_vm Tracked-On: #6744 Signed-off-by: Liu Long <longliu@intel.com>	2021-11-02 10:00:55 +08:00
dongshen	dcafcadaf9	hv: rename some C preprocessor macros Rename some C preprocessor macros: NUM_GUEST_MSRS --> NUM_EMULATED_MSRS CAT_MSR_START_INDEX --> FLEXIBLE_MSR_INDEX NUM_VCAT_MSRS --> NUM_CAT_MSRS NUM_VCAT_L2_MSRS --> NUM_CAT_L2_MSRS NUM_VCAT_L3_MSRS --> NUM_CAT_L3_MSRS Tracked-On: #5917 Signed-off-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	c0d95558c1	hv: vCAT: propagate vCBM to other vCPUs that share cache with vcpu Implement the propagate_vcbm() function: Set vCBM to all the vCPUs that share cache with vcpu to mimic hardware CAT behavior Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	a7014f4654	hv: vCAT: implementing the vCAT MSRs write handler Implement the write_vcbm() function to handle the MSR_IA32_type_MASK_n vCBM MSRs write request Call write_vclosid() to handle MSR_IA32_PQR_ASSOC MSR write request Several vCAT P2V (physical to virtual) and V2P (virtual to physical) mappings exist: struct acrn_vm_config *vm_config = get_vm_config(vm_id) max_pcbm = vm_config->max_type_pcbm (type: l2 or l3) mask_shift = ffs64(max_pcbm) vclosid = vmsr - MSR_IA32_type_MASK_0 pclosid = vm_config->pclosids[vclosid] pmsr = MSR_IA32_type_MASK_0 + pclosid pcbm = vcbm << mask_shift vcbm = pcbm >> mask_shift Where MSR_IA32_type_MASK_n: L2 or L3 mask msr address for CLOSIDn, from 0C90H through 0D8FH (inclusive). max_pcbm: a bitmask that selects all the physical cache ways assigned to the VM vclosid: virtual CLOSID, always starts from 0 pclosid: corresponding physical CLOSID for a given vclosid vmsr: virtual msr address, passed to vCAT handlers by the caller functions rdmsr_vmexit_handler()/wrmsr_vmexit_handler() pmsr: physical msr address vcbm: virtual CBM, passed to vCAT handlers by the caller functions rdmsr_vmexit_handler()/wrmsr_vmexit_handler() pcbm: physical CBM Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
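The P2V/V2P mappings listed in commit a7014f4654 can be sketched in C as below. This is a minimal illustration, not the hypervisor's code: `struct vcat_cfg`, `lowest_set_bit()` and the helper names are hypothetical, the L3 mask base 0xC90 is assumed as the `type` in `MSR_IA32_type_MASK_0`, and `ffs64()` is assumed to return the zero-based index of the lowest set bit.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vCAT-related fields of struct acrn_vm_config. */
struct vcat_cfg {
	uint64_t max_pcbm;        /* bitmask of the physical cache ways assigned to the VM */
	const uint16_t *pclosids; /* physical CLOSIDs, indexed by virtual CLOSID */
	uint16_t num_pclosids;    /* number of entries in pclosids[] */
};

/* Assumed base address: the L3 mask MSR for CLOSID 0. */
#define L3_MASK_0 0xC90U

/* Zero-based index of the lowest set bit (assumed ffs64() behavior); value must be non-zero. */
static inline uint32_t lowest_set_bit(uint64_t value)
{
	uint32_t idx = 0U;

	while ((value & 1ULL) == 0ULL) {
		value >>= 1U;
		idx++;
	}
	return idx;
}

/* pcbm = vcbm << mask_shift */
static inline uint64_t vcbm_to_pcbm(const struct vcat_cfg *cfg, uint64_t vcbm)
{
	return vcbm << lowest_set_bit(cfg->max_pcbm);
}

/* vcbm = pcbm >> mask_shift */
static inline uint64_t pcbm_to_vcbm(const struct vcat_cfg *cfg, uint64_t pcbm)
{
	return pcbm >> lowest_set_bit(cfg->max_pcbm);
}

/* vclosid = vmsr - MASK_0; pclosid = pclosids[vclosid]; pmsr = MASK_0 + pclosid.
 * The caller must ensure that vclosid is less than cfg->num_pclosids. */
static inline uint32_t vmsr_to_pmsr(const struct vcat_cfg *cfg, uint32_t vmsr)
{
	uint32_t vclosid = vmsr - L3_MASK_0;

	return L3_MASK_0 + cfg->pclosids[vclosid];
}
```

With max_pcbm = 0xff800 the shift is 11, so a guest-visible vCBM of 0x1ff corresponds to the physical CBM 0xff800, and virtual CLOSID 0 backed by physical CLOSID 3 turns virtual MSR 0xC90 into physical MSR 0xC93.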
dongshen	3ab50f2ef5	hv: vCAT: implementing the vCAT MSRs read handlers Implement the read_vcbm() and read_vclosid() functions to handle the MSR_IA32_PQR_ASSOC and MSR_IA32_type_MASK_n vCAT MSRs read request. Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	be855d2352	hv: vCAT: expose CAT capabilities to vCAT-enabled VM Expose CAT feature to vCAT VM by reporting the number of cache ways/CLOSIDs via the 04H/10H cpuid instructions, so that the VM can take advantage of CAT to prioritize and partition cache resource for its own tasks. Add the vcat_pcbm_to_vcbm() function to map pcbm to vcbm Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	77ae989379	hv: vCAT: initialize vCAT MSRs during vmcs init Initialize vCBM MSRs Initialize vCLOSID MSR Add some vCAT functions: Retrieve max_vcbm and max_pcbm Check if vCAT is configured or not for the VM Map vclosid to pclosid write_vclosid: vCLOSID MSR write handler write_vcbm: vCBM MSR write handler Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
Fei Li	1905ed6124	hv: vMSR: minor fix about rdmsr_vmexit_handler Specifying a reserved or unimplemented MSR address in ECX for rdmsr will cause a general protection exception. In this case, we should not change the contents of registers EDX:EAX. Tracked-On: #4550 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-10-27 08:23:43 +08:00
dongshen	39461ef9dd	hv: vCAT: initialize the emulated_guest_msrs array for CAT msrs during platform initialization Initialize the emulated_guest_msrs[] array at runtime for MSR_IA32_type_MASK_n and MSR_IA32_PQR_ASSOC msrs, there is no good way to do this initialization statically at build time Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-26 11:48:27 +08:00
dongshen	cb2bb78b6f	hv/config_tools: amend the struct acrn_vm_config to make it compatible with vCAT For vCAT, it may need to store more than MAX_VCPUS_PER_VM of closids, change clos in vm_config.h to a pointer to accommodate this situation Rename clos to pclosids pclosids now is a pointer to an array of physical CLOSIDs that is defined in vm_configurations.c by vmconfig. The number of elements in the array must be equal to the value given by num_pclosids Add max_type_pcbm (type: l2 or l3) to struct acrn_vm_config, which stores a bitmask that selects/covers all the physical cache ways assigned to the VM Change vmsr.c to accommodate this amended data structure Change the config-tools to generate vm_configurations.c, and fill in the num_closids and clos pointers based on the information from the scenario file. Now vm_configurations.c.xsl generates all the clos related code so remove the same code from misc_cfg.h.xsl. Examples: Scenario file: <RDT> <RDT_ENABLED>y</RDT_ENABLED> <CDP_ENABLED>n</CDP_ENABLED> <VCAT_ENABLED>y</VCAT_ENABLED> <CLOS_MASK>0x7ff</CLOS_MASK> <CLOS_MASK>0x7ff</CLOS_MASK> <CLOS_MASK>0x7ff</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> </RDT> <vm id="0"> <guest_flags> <guest_flag>GUEST_FLAG_VCAT_ENABLED</guest_flag> </guest_flags> <clos> <vcpu_clos>3</vcpu_clos> <vcpu_clos>4</vcpu_clos> <vcpu_clos>5</vcpu_clos> <vcpu_clos>6</vcpu_clos> <vcpu_clos>7</vcpu_clos> </clos> </vm> <vm id="1"> <clos> <vcpu_clos>1</vcpu_clos> <vcpu_clos>2</vcpu_clos> </clos> </vm> vm_configurations.c (generated by config-tools) with the above vCAT config: static uint16_t vm0_vcpu_clos[5U] = {3U, 4U, 5U, 6U, 7U}; static uint16_t vm1_vcpu_clos[2U] = {1U, 2U}; struct acrn_vm_config vm_configs[CONFIG_MAX_VM_NUM] = { { .guest_flags = (GUEST_FLAG_VCAT_ENABLED), .pclosids = vm0_vcpu_clos, .num_pclosids = 5U, .max_l3_pcbm = 0xff800U, }, { .pclosids = vm1_vcpu_clos, .num_pclosids = 2U, }, }; Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-26 11:48:27 +08:00
dongshen	368f158b46	hv/config-tools: add the support for vCAT Add the VCAT_ENABLED element to RDTType so that user can enable/disable vCAT globally Add the GUEST_FLAG_VCAT_ENABLED guest flag to enable/disable vCAT per-VM. Currently we have the following per-VM clos element in scenario file for RDT use: <clos> <vcpu_clos>0</vcpu_clos> <vcpu_clos>0</vcpu_clos> </clos> When the GUEST_FLAG_VCAT_ENABLED guest flag is not specified, clos is for RDT use, vcpu_clos is per-CPU and it configures each CPU in VMs to a desired CLOS ID. When the GUEST_FLAG_VCAT_ENABLED guest flag is specified, vCAT is enabled for this VM, clos is for vCAT use, vcpu_clos is not per-CPU anymore in this case, just a list of physical CLOSIDs (minimum 2) that are assigned to VMs for vCAT use. Each vcpu_clos will be mapped to a virtual CLOSID, the first vcpu_clos is mapped to virtual CLOSID 0 and the second is mapped to virtual CLOSID 1, etc Add xs:assert to prevent any problems with invalid configuration data for vCAT: If any GUEST_FLAG_VCAT_ENABLED guest flag is specified, both RDT_ENABLED and VCAT_ENABLED must be 'y' If VCAT_ENABLED is 'y', RDT_ENABLED must be 'y' and CDP_ENABLED must be 'n' For a vCAT VM, vcpu_clos cannot be set to CLOSID 0, CLOSID 0 is reserved to be used by hypervisor For a vCAT VM, number of clos/vcpu_clos elements must be greater than 1 For a vCAT VM, each clos/vcpu_clos must be less than L2/L3 COS_MAX For a vCAT VM, its clos/vcpu_clos elements cannot contain duplicate values There should not be any CLOS IDs overlap between a vCAT VM and any other VMs Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-26 11:48:27 +08:00
Yonghua Huang	c8e2060d37	hv: unmap IOMMU register pages from service VM EPT IOMMU hardware resource is owned by hypervisor, while IOMMU capability is reported to service VM in its ACPI table. In this case, Service VM may access IOMMU hardware resource, which is not expected. This patch unmaps all Intel IOMMU register pages for service VM EPT. Tracked-On: #6677 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-22 09:31:10 +08:00
David B. Kinder	2913395123	doc: update uses of VHM in doxygen comments PR #6283 updated code and docs to the new kernel HSM driver. Fix some references to VHM missed in the doxygen comments. Also fixed some misspellings while in these files. Tracked-On: #6282 Signed-off-by: David B. Kinder <david.b.kinder@intel.com>	2021-10-18 19:09:07 -07:00
Liu,Junming	79a5d7a787	hv: initialize IGD offset 0xfc of CFG space for Service VM For the IGD device the opregion addr is returned by reading the 0xFC config of 0:02.0 bdf. And the opregion addr is required by GPU driver. The opregion_addr should be the GPA addr. When the IGD is assigned to pre-launched VM, the value in 0xFC of igd_vdev is programmed into with new GPA addr. In such case the prelaunched VM reads the value from 0xFC of 0:02.0 vdev. But for the Service VM, the IGD is initialized by using the same policy as other PCI devices. We only initialize the vdev_head_conf(0x0-0x3F) by checking the corresponding pbdf. The remaining pci_config_space will be read by leveraging the corresponding pdev. But as the above code doesn't handle the scenario for Service VM, it causes that the Service VM fails to read the 0xFC config_space for IGD vdev. Then the i915 GPU driver in SOS has some issues because of incorrect 0xFC pci_conf_space. This patch initializes offset 0xfc of CFG space of IGD for Service VM, it is simple and can cover post-launched VM too. Tracked-On: #6387 Signed-off-by: Liu,Junming <junming.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-18 09:11:16 +08:00
Xiangyang Wu	dec8d7e22f	hv: support at most MAX_VUART_NUM_PER_VM legacy vuarts The current hypervisor supports at most two legacy vuarts (COM1 and COM2) per VM: COM1 is usually configured as the VM console, and COM2 as the communication channel for the S5 feature. The hypervisor can support MAX_VUART_NUM_PER_VM (8) legacy vuarts, but it registers handlers for only two because it assumes a VM has no more than two legacy vuarts. In the current hypervisor configuration, I/O port 2F8H is always allocated to virtual COM2, which is unfriendly if the user wants to assign this port to physical COM2. A legacy vuart is a common communication channel between the service VM and a user VM; it can work in polling mode and its driver exists in each guest OS. The channel can be used to send the shutdown command to a user VM for the S5 feature, so several vuarts need to be configured for the service VM and one vuart for each user VM. The following changes are made to support at most MAX_VUART_NUM_PER_VM legacy vuarts: - Refine legacy vuart initialization to register a PIO handler for each related vuart. - Update the assumption about the number of legacy vuarts. Config tool updates for legacy vuarts will be made in another patch. v1-->v2: Update commit message to make this patch's purpose clearer; If vuart index is valid, register handler for it. Tracked-On: #6652 Signed-off-by: Xiangyang Wu <xiangyang.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-15 10:00:02 +08:00
Fei Li	6c5bf4a642	hv: enhance e820_alloc_memory to allocate memory above 4G Enhance e820_alloc_memory so that it can allocate memory above 4G. Tracked-On: #5830 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-10-14 15:04:36 +08:00
Fei Li	df7ffab441	hv: remove CONFIG_HV_RAM_SIZE It's difficult to configure CONFIG_HV_RAM_SIZE properly at once. This patch removes CONFIG_HV_RAM_SIZE and uses the ld linker script to get the HV RAM size dynamically. Tracked-On: #6663 Signed-off-by: Fei Li <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-14 15:04:36 +08:00
Zide Chen	e48962faa6	hv: optimize run_vcpu() for nested This patch implements a separate path for L2 VMEntry in run_vcpu(), which has several benefits: - keep run_vcpu() clean, to reduce the number of is_vcpu_in_l2_guest() statements: - current code has three is_vcpu_in_l2_guest() already. - supposed to have another 2 statement so that nested VMEntry won't hit the "Starting vCPU" and "vCPU launched" pr_info() and a few other statements in the VM launch path. - save few other things in run_vcpu() that are not needed for nested. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-13 15:55:31 +08:00
Zide Chen	89bbc44962	hv: inject external interrupts only if LAPIC is not passthru Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-08 09:18:34 +08:00
Zide Chen	228b052fdb	hv: operations on vcpu->reg_cached/reg_updated don't need LOCK prefix In run time, one vCPU won't read or write a register on other vCPUs, thus we don't need the LOCK prefixed instructions on reg_cached and reg_updated. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-08 09:11:10 +08:00
Zide Chen	2b683f8f5b	hv: call vcpu_inject_exception() only when ACRN_REQUEST_EXCP is set Move the bitmap test call out of vcpu_inject_exception(), so that we call the expensive bitmap_test_and_clear_lock() only when pending_req_bits is non-zero and call vcpu_inject_exception() only if needed. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-07 20:48:43 +08:00
Zide Chen	f801ba4ed7	hv: update guest RIP only if vcpu->arch.inst_len is non zero In a very large number of VM exits, the VM-exit instruction length could be zero, and there is no need to update VMX_GUEST_RIP. Some examples: - all external interrupt VM exits in non LAPIC passthru setup. - all the nested VM-exits that are reflected to the L1 hypervisor. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-07 20:47:07 +08:00
Zide Chen	b7e9a68923	hv: code cleanup in run_vcpu() - wrap a new function exec_vmentry() to reduce code duplication. - remove exec_vmread(VMX_GUEST_RSP) since ACRN doesn't need to know guest RSP in run time. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-07 20:47:07 +08:00
Zide Chen	ee12daff84	hv: nested: refine vmcs12_read/write_field APIs Change "uint64_t vmcs_hva" to "void *vmcs_hva" in the input argument list, so that no type casting is needed when calling them from pointers. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-07 20:45:34 +08:00
Kunhui-Li	2a8c587824	config_tools: update board name in makefile update board name from nuc7i7dnb to nuc11tnbi5 in makefile because we have removed the nuc7i7dnb board folder, and also update the scenario name from industry to shared to fix "make all" build issue. Tracked-On: #6315 Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>	2021-09-29 16:53:44 +08:00
Liu,Junming	545c006a33	hv: inject #GP if guest tries to reprogram pass-thru dev PIO bar In current design, when pass-thru dev, for the PIO bar, need to ensure the guest PIO start address equals to host PIO start address. But malicious guest may reprogram the PIO bar, then hv will pass-thru the reprogramed PIO address to guest. This isn't safe behavior. When guest tries to reprogram pass-thru dev PIO bar, inject #GP to guest directly. Tracked-On: #6508 Signed-off-by: Liu,Junming <junming.liu@intel.com> Reviewed-by: Zhao Yakui <yakui.zhao@intel.com> Reviewed-by: Fei Li <fei1.li@intel.com>	2021-09-28 08:49:01 +08:00
Liu,Junming	4105ca2cb4	hv: deny the launch of VM if pass-thru PIO bar isn't identical mapping In current design, when pass-thru dev, for the PIO bar, need to ensure the guest PIO start address equals to host PIO start address. Then set the VMCS io bitmap to pass-thru the corresponding port io to guest for performance. ACRN-DM and acrn-config should ensure the identical mapping of PIO bar. If ACRN-DM or acrn-config failed to achieve this, we should deny the launch of VM Tracked-On: #6508 Signed-off-by: Liu,Junming <junming.liu@intel.com> Reviewed-by: Zhao Yakui <yakui.zhao@intel.com> Reviewed-by: Fei Li <fei1.li@intel.com>	2021-09-28 08:49:01 +08:00
Victor Sun	28824c1e74	HV: init e820 before init paging In the commit of `4e1deab3d9`, we changed the init sequence that init paging first and then init e820 because we worried about the efi memory map could be beyond 4GB space on some platform. After we double checked multiboot2 spec, when system boot from multiboot2 protocol, the efi memory map info will be embedded in multiboot info so it is guaranteed that the efi memory map must be under 4GB space. Consider that the page table will be allocated in free memory space in future, we have to change the init sequence back that init e820 first and then init paging. If we need to support other boot protocol in future that the efi memory map might be put beyond 4GB, we could have below options: 1. Request bootloader put efi memory map below 4GB; 2. Call EFI_BOOT_SERVICES.GetMemoryMap() before ExitBootServices(); 3. Enable a early 64bit page table to get the efi memory map only; Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com>	2021-09-27 09:03:15 +08:00
Zide Chen	a62dd6ad8a	hv: nested: fixed vmxoff_vmexit_handler() issue In VMXOFF vmexit handler, it's supposed to remove VMCS shadowing. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-26 08:49:35 +08:00
Zide Chen	45b036e028	hv: nested: enable multiple active VMCS12 support This patch changes the size of vvmcs[] array from 1 to PER_VCPU_ACTIVE_VVMCS_NUM, and actually enables multiple active VMCS12 support in ACRN. The basic operations: - if L1 VMPTRLDs a VMCS12 without previously VMCLEAR the current VMCS12, ACRN no longer unconditionally flushes the current VMCS12 back to L1. Instead, it tries to keep both the current and the newly loaded VMCS12 in the nested->vvmcs[] array, unless: - if there is no more available vvmcs[] entry, ACRN flushes one active VMCS12 to make room for this new VMCS12. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-26 08:49:35 +08:00
Mingqiang Chi	f39c882359	hv: change log level for check_vmx_ctrl Some processors don't support the VMX_PROCBASED_CTLS_TERTIARY bit and the VMX_PROCBASED_CTLS2_UWAIT_PAUSE bit in MSRs (IA32_VMX_PROCBASED_CTLS & IA32_VMX_PROCBASED_CTLS2). In that case HV outputs an error log, which causes confusion, so change the log level from pr_err to pr_info. Tracked-On: #6397 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>	2021-09-24 10:17:19 +08:00
Jie Deng	064fd7647f	hv: add priority based scheduler This patch adds a new priority based scheduler to support vCPU scheduling based on their pre-configured priorities. A vCPU can be running only if there is no higher priority vCPU running on the same pCPU. Tracked-On: #6571 Signed-off-by: Jie Deng <jie.deng@intel.com>	2021-09-24 09:32:18 +08:00
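The selection rule described in commit 064fd7647f (a vCPU runs only if no higher-priority vCPU is runnable on the same pCPU) can be illustrated with the sketch below. It is not the scheduler added by the commit: `struct sched_entity`, its fields and `pick_next()` are hypothetical, and the sketch assumes that a larger `prio` value means higher priority and that ties go to the earlier entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-vCPU scheduling record for one pCPU's run queue. */
struct sched_entity {
	uint32_t prio;  /* pre-configured priority; larger means higher (assumption) */
	bool runnable;  /* true if the vCPU is ready to run */
};

/* Return the runnable entry with the highest priority, or NULL if none is runnable.
 * On equal priority, the entry that appears first in the queue is kept. */
static inline const struct sched_entity *pick_next(const struct sched_entity *queue, size_t count)
{
	const struct sched_entity *best = NULL;
	size_t i;

	for (i = 0U; i < count; i++) {
		if (queue[i].runnable && ((best == NULL) || (queue[i].prio > best->prio))) {
			best = &queue[i];
		}
	}
	return best;
}
```

A linear scan keeps the sketch short; a real run queue would more likely be kept sorted so that the next vCPU can be picked in constant time.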
Junjie Mao	efcb9e2fdf	Makefile: fix wrong reference to board XML and skip binary in diffconfig The current config.mk uses the variable BOARD_FILE as the path to the board XML when generating an unmodified copy of configuration files for comparison, which is incorrect. The right variable is HV_BOARD_XML which is the path to the copy of board XML that is actually used for the build. This patch corrects the bug above. In addition, this patch also skips binary files (which are not meant to be edited manually) when calculating the differences. Tracked-On: #6592 Signed-off-by: Junjie Mao <junjie.mao@intel.com>	2021-09-19 20:23:44 +08:00
Fei Li	53fe6d63be	hv: vioapic: update remote IRR for lapic-pt For local APIC passthrough case, EOI would not trigger VM-exit. So virtual 'Remote IRR' would not be updated. Needs to read physical IOxAPIC RTE to update virtual 'Remote IRR' field each time when guest wants to read I/O REDIRECTION TABLE REGISTERS Tracked-On: #5923 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-09-18 09:42:44 +08:00
Zide Chen	94cbe909ee	hv: irq: identical vector mapping if LAPIC passthrough In the local APIC passthrough case, when a device triggers an INTx interrupt, the interrupt is delivered to the vCPU directly. For this case, we need to set the virtual vector in the 'Interrupt Vector' field of the physical IOxAPIC I/O REDIRECTION TABLE REGISTER (bits 7:0) and the 'Vector' field of the VT-d Interrupt Remapping Table Entry (IRTE) for Remapped Interrupts. Assumption: (a) IOAPIC pins won't be shared between LAPIC PT guest and other guests; (b) The guest would not trigger this IRQ before it switched to x2 APIC mode. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-18 09:42:44 +08:00
Mingqiang Chi	db98f01b6e	add vmx capability check Check some essential VMX capabilities; panic if the processor doesn't support them. Tracked-On: #6584 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-18 08:44:30 +08:00
dongshen	08d4517431	hv: fix bugs in RDT's CDP code In current RDT code, if CDP is configured, L2/L3 resources' num_closids calculation is wrong: res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U) + 1U; Should be: res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U + 1) >> 1U; Also, in order to enable CDP system-wide, need to enable the CDP bit (bit 0) on all pcpus, not just on pcpu 0. Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-09-17 16:29:05 +08:00
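The arithmetic behind the num_closids fix in commit 08d4517431 can be illustrated as follows. The sketch assumes that EDX[15:0] of CPUID leaf 10H holds COS_MAX (the highest CLOSID), so the CLOSID count is COS_MAX + 1, and that enabling CDP halves that count. The function names are hypothetical and the expressions are an interpretation, not a copy of the patched line.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed intent: number of CLOSIDs is COS_MAX + 1, halved when CDP is enabled.
 * edx is the EDX value returned by CPUID.(EAX=10H, ECX=ResID). */
static inline uint16_t rdt_num_closids(uint32_t edx, bool cdp_enabled)
{
	uint32_t num = (edx & 0xffffU) + 1U;

	if (cdp_enabled) {
		num >>= 1U;
	}
	return (uint16_t)num;
}

/* The calculation the commit message describes as wrong: halve COS_MAX first, then add 1. */
static inline uint16_t rdt_num_closids_cdp_old(uint32_t edx)
{
	return (uint16_t)(((edx & 0xffffU) >> 1U) + 1U);
}
```

The two forms agree when COS_MAX is odd (COS_MAX = 15 gives 8 either way) and differ when it is even (COS_MAX = 14 gives 7 with the halved count but 8 with the old expression).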
dongshen	f4cdbba0bd	hv: some cosmetic fixes to rdt.c/rdt.h Rename the clos_max field in struct rdt_info to num_closids Rename variable valid_clos_num to common_num_closids and make it static Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-09-17 16:29:05 +08:00
Liu Long	2de395b6f6	HV: Normalize hypervisor help output format Normalize hypervisor help command output format, remove the 10 lines limit for one screen, fix the misspelled words. Tracked-On: #5112 Signed-off-by: Liu Long <long.liu@intel.com> Reviewed-by: VanCutsem, Geoffroy <geoffroy.vancutsem@intel.com>	2021-09-17 11:06:18 +08:00
Zide Chen	0466d7055f	hv: nested: move the VMCS12 dirty flags to struct acrn_vvmcs These dirty flags are supposed to be per VMCS12, so move them from the per vCPU acrn_nested struct to the newly added acrn_vvmcs struct. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	4e54c3880b	hv: nested: remove vcpu->arch.nested.current_vmcs12_ptr This variable represents the L1 GPA of the current VMCS12. But it's no longer needed in the multiple active VMCS12 case, which uses the following variables for this purpose. - nested->current_vvmcs refers to the vvmcs[] entry which contains the cached current VMCS12, its associated VMCS02, and other context info. - nested->current_vvmcs->vmcs12_gpa refers to the L1 GPA of this current VMCS12. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	799a4d332a	hv: nested: initial implementation of struct acrn_vvmcs Add an array of struct acrn_vvmcs to struct acrn_nested, so it is possible to cache multiple active VMCS12s. This patch declares the size of this array to 1, meaning that there is only one active VMCS12. This is to minimize the logical code changes. Add pointer current_vvmcs to struct acrn_nested, which refers to the current vvmcs[] entry. In this patch, if any VMCS12 is active, it always points to vvmcs[0]. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	cf697e753d	hv: nested: some API signature changes No logical changes; this patch prepares for multiple active VMCS12 support. - currently it's easy to get the vmcs12 pointer from the vcpu pointer. In multiple active vmcs12 case, we need to explicitly add "struct acrn_vmcs12 *vmcs12" to certain APIs' input argument list, in order to get the desired vmcs12 pointer. - merge flush_current_vmcs12() into clear_vmcs02() for multiple reasons: a) it's called only once; b) we don't wrap the opposite operation (loading vmcs12) in an API; c) this API has simple and clear logic. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	e9eb72d319	hv: nested: flush L2 VPID only when it could conflict with L1 VPIDs By changing the way to assign L1 VPID from bottom-up to top-down, the possibilities for VPID conflicts between L1 and L2 guests are small. Then we can flush VPID just in case of conflicting. Tracked-On: #6289 Signed-off-by: Anthony Xu <anthony.xu@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-16 09:26:10 +08:00
Fei Li	0a515ab2ea	hv: pci: fix a minor bug about is_pci_cfg_multifunction Before checking whether a PCI device is a Multi-Function Device or not, we need make sure this PCI device is a valid PCI device. For a valid PCI device, the 'Header Layout' field in Header Type Register must be 000 0000b (Type 0 PCI device) or 000 0001b (Type 1 PCI device). So for a valid PCI device, the Header Type can't be 0xff. Tracked-On: #4134 Signed-off-by: Fei Li <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-15 13:24:18 +08:00
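The check described in commit 0a515ab2ea can be sketched as below: validate the 'Header Layout' field first, then test the multi-function bit. The macro and function names are hypothetical. The sketch follows the commit message in accepting only layouts 0 and 1, and relies on bit 7 of the Header Type register being the multi-function flag.

```c
#include <stdint.h>
#include <stdbool.h>

#define HDR_TYPE_LAYOUT_MASK 0x7fU /* bits 6:0: Header Layout */
#define HDR_TYPE_MULTIFUNC   0x80U /* bit 7: multi-function device flag */

/* A header type is treated as valid only for layout 0 (Type 0) or layout 1 (Type 1). */
static inline bool hdr_type_is_valid(uint8_t hdr_type)
{
	uint8_t layout = (uint8_t)(hdr_type & HDR_TYPE_LAYOUT_MASK);

	return (layout == 0x00U) || (layout == 0x01U);
}

/* Report multi-function only for a valid header type, so that 0xff
 * (typically read back when no device is present) is rejected. */
static inline bool hdr_type_is_multifunction(uint8_t hdr_type)
{
	return hdr_type_is_valid(hdr_type) && ((hdr_type & HDR_TYPE_MULTIFUNC) != 0U);
}
```

Without the validity check, a read of 0xff would have bit 7 set and would be misreported as a multi-function device.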
Zide Chen	1ab65825ba	hv: nested: merge gpa_field_dirty and control_field_dirty flag In run time, it's rare for L1 to write to the intercepted non host-state VMCS fields, and using multiple dirty flags is not necessary. This patch uses one single dirty flag to manage all non host-state VMCS fields. This helps to simplify current code and in the future we may not need to declare new dirty flags when we intercept more VMCS fields. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-13 15:50:01 +08:00
Zide Chen	6376d5a0d3	hv: nested: fix bug in syncing EPTP from VMCS12 to VMCS02 Currently vmptrld_vmexit_handler() doesn't sync VMX_EPT_POINTER_FULL from vmcs12 to vmcs02, instead it sets gpa_field_dirty and relies on nested_vmentry() to sync EPTP in next nested VMentry. This creates readability issue since all other intercepted VMCS fields are synced in sync_vmcs12_to_vmcs02(). Another issue is that other VMCS fields managed by gpa_field_dirty are repeatedly synced in both vmptrld and nested vmentry handler. This patch moves get_nept_desc() ahead of sync_vmcs12_to_vmcs02(), such that shadow_eptp is allocated before sync_vmcs12_to_vmcs02() which can sync EPTP properly. BTW, in nested_vmexit_handler(), don't need to read from VMCS to get the exit reason, since vcpu->arch.exit_reason has it already. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-13 15:50:01 +08:00
Geoffroy Van Cutsem	01bf5110c5	Makefile: add missing deps in top-level and hypervisor Makefile Add a couple of missing dependencies in the ACRN Makefiles: 1. 'acrn.bin' is required before the hypervisor can be installed 2. The 'acrn_mngr.h' needs to be installed ('tools-install') in the build folder. Tracked-On: #6360 Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-09-13 11:28:14 +08:00
Junjie Mao	2bfaa34cf2	config_tools: populate default values in scenario XML While we have default values of configuration entries stated in the schema of scenario XMLs, today we still require user-given scenario XMLs to contain literally ALL XML nodes. Missing of a single node will cause schema validation errors even though we can use its default value defined in the schema. This patch allows user-given scenario XMLs to ignore nodes with default values. It is done by adding the missing nodes, all containing the defined default values, to the input scenario XML when copying it to the build directory. This approach imposes no changes to either the schema or subsequent scripts in the build system. Tracked-On: #6292 Signed-off-by: Junjie Mao <junjie.mao@intel.com>	2021-09-13 09:05:52 +08:00
Yifan Liu	0a1ad45b32	hv: Avoid using SMBIOS major version Previously it is (falsely) assumed that the major_ver of 32-bit SMBIOS entry point structure (which is called SMBIOS 2.1 in spec, or SMBIOS2 in code) will have a value of 2 and major_ver of 64-bit SMBIOS (which is called SMBIOS 3.0 in spec, and SMBIOS3 in code) will have a value of 3. This turned out to be wrong. This major_ver refers to the implemented doc revision, and 32-bit SMBIOS2 can have its major_ver to be 3 (current most recent implementation). This patch removes the use of major_ver to distinguish between SMBIOS2/3, and use a doc-defined anchor string instead. Tracked-On: #6528 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-09-08 15:22:12 +08:00
Zide Chen	11c2f3eabb	hv: check bitmap before calling bitmap_test_and_clear_lock() The locked btr instruction is expensive. This patch changes the logic to ensure that the bitmap is non-zero before executing bitmap_test_and_clear_lock(). The VMX transition time gets significant improvement. SOS running on TGL, the CPUID roundtrip reduces from ~2400 cycles to ~2000 cycles. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-02 16:09:33 +08:00
Zide Chen	7cde4a8d40	hv: initialize host IA32_PAT MSR Currently ACRN assumes firmware setup IA32_PAT correctly. This patch explicitly initializes host IA32_PAT MSR according to ISDM Table 11-12. Memory Type Setting of PAT Entries Following a Power-up or Reset. ACRN creates host page tables based on PAT0 (WB) and PAT3 (UC). Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-02 09:15:39 +08:00
Zide Chen	aeb3690b6f	hv: simplify is_lapic_pt_enabled() is_lapic_pt_enabled() is called at least twice in one loop of the vCPU thread, and it's called in vmexit_handler() frequently if LAPIC is not pass-through. Thus the efficiency of this function has direct impact to the system performance. Since the LAPIC mode is not changed in run time, we don't have to calculate it on the fly in is_lapic_pt_enabled(). BTW, removed the unused lapic_mask from struct acrn_vcpu_arch. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-26 09:52:10 +08:00
Shiqing Gao	d90dbc0d91	hv: check the capability of XSAVES/XRSTORS instructions before execution For platforms that do not support XSAVES/XRSTORS instructions, like QEMU, executing these instructions causes #UD. This patch adds the check before the execution of XSAVES/XRSTORS instructions. It also refines the logic inside rstore_xsave_area for the following reason: If XSAVES/XRSTORS instructions are supported, restore XSAVE area if any of the following conditions is met: 1. "vcpu->launched" is false (state initialization for guest) 2. "vcpu->arch.xsave_enabled" is true (state restoring for guest) * Before vCPU is launched, condition 1 is satisfied. * After vCPU is launched, condition 2 is satisfied because is_valid_xsave_combination() guarantees that "vcpu->arch.xsave_enabled" is consistent with pcpu_has_cap(X86_FEATURE_XSAVES). Therefore, the check against "vcpu->launched" and "vcpu->arch.xsave_enabled" can be eliminated here. Tracked-On: #6481 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-08-26 09:42:23 +08:00
Zide Chen	cbf3825140	hv: Pass-through IA32_TSC_AUX MSR to L1 guest Use an unused MSR on host to save ACRN pcpu ID and avoid saving and restoring TSC AUX MSR on VMX transitions. Tracked-On: #6289 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Reviewed-by: Eddie Dong <eddie.dong@intel.com>	2021-08-26 09:25:54 +08:00
Yifan Liu	d33c76f701	hv: quirks: SMBIOS passthrough for prelaunched-VM This feature is guarded under config CONFIG_SECURITY_VM_FIXUP, which by default should be disabled. This patch passthrough native SMBIOS information to prelaunched VM. SMBIOS table contains a small entry point structure and a table, of which the entry point structure will be put in 0xf0000-0xfffff region in guest address space, and the table will be put in the ACPI_NVS region in guest address space. v2 -> v3: uuid_is_equal moved to util.h as inline API result -> pVendortable, in function efi_search_guid recalc_checksum -> generate_checksum efi_search_smbios -> efi_search_smbios_eps scan_smbios_eps -> mem_search_smbios_eps EFI GUID definition kept Tracked-On: #6320 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-08-26 09:24:50 +08:00
Yifan Liu	975ff33e01	hv: Move uuid_is_equal to util.h This patch moves uuid_is_equal from vm_config.c to util.h as inline API. Tracked-On: #6320 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-08-26 09:24:50 +08:00
Yifan Liu	32d6ead8de	hv && config-tool: Rename GUEST_FLAG_TPM2_FIXUP This patch renames the GUEST_FLAG_TPM2_FIXUP to GUEST_FLAG_SECURITY_VM. v2 -> v3: The "FIXUP" suffix is removed. Tracked-On: #6320 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-08-26 09:24:50 +08:00
Liu Long	31598ae895	ACRN:hv: Fix vcpu_dumpreg command hang issue In ACRN RT VM if the lapic is passthrough to the guest, the ipi can't trigger VM_EXIT and the vNMI is just for notification, it can't handle the smp_call function. Modify vcpu_dumpreg function prompt user switch to vLAPIC mode for vCPU register dump. Tracked-On: #6473 Signed-off-by: Liu Long <long.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-25 08:54:27 +08:00
Zide Chen	0980420aea	hv: minor cleanup of hv_main.c - remove vcpu->arch.nrexits which is useless. - record full 32 bits of exit_reason to TRACE_2L(). Make the code simpler. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-25 08:49:54 +08:00
Jian Jun Chen	8de39f7b61	hv: GSI of hcall_set_irqline should be checked against target_vm GSI of hcall_set_irqline should be checked against target_vm's total GSI count instead of SOS's total GSI count. Tracked-On: #6357 Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-25 08:48:47 +08:00
Zide Chen	6d7eb6d7b6	hv: emulate IA32_EFER and adjust Load EFER VMX controls This helps to improve performance: - Don't need to execute VMREAD in vcpu_get_efer(), which is frequently called. - VMX_EXIT_CTLS_SAVE_EFER can be removed from VM-Exit Controls. - If the value of IA32_EFER MSR is identical between the host and guest (highly likely), adjust the VMX controls not to load IA32_EFER on VMExit and VMEntry. It's convenient to continue using the existing vcpu_s/get_efer() APIs rather than the common vcpu_s/get_guest_msr(). Tracked-On: #6289 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-08-24 11:16:53 +08:00
Liang Yi	499f62e8bd	hv: use per platform maximum physical address width MAXIMUM_PA_WIDTH will be calculated from board information. Tracked-On: #6357 Signed-off-by: Liang Yi <yi.liang@intel.com> Signed-off-by: Junjie Mao <junjie.mao@intel.com>	2021-08-20 11:02:21 +08:00
Liang Yi	2b3620de7d	hv: mask off LA57 in cpuid Mask off support of 57-bit linear addresses and five-level paging. ICX-D has LA57 but ACRN doesn't support 5-level paging yet. Tracked-On: #6357 Signed-off-by: Liang Yi <yi.liang@intel.com> Signed-off-by: Li, Fei1 <fei1.li@intel.com>	2021-08-20 11:02:21 +08:00
Shiqing Gao	91777a83b5	config_tools: add a new entry MAX_EFI_MMAP_ENTRIES It is used to specify the maximum number of EFI memmap entries. On some platforms, like Tiger Lake, the number of EFI memmap entries becomes 268 when the BIOS settings are changed. The current value of MAX_EFI_MMAP_ENTRIES (256) defined in hypervisor is not big enough to cover such cases. As the number of EFI memmap entries depends on the platforms and the BIOS settings, this patch introduces a new entry MAX_EFI_MMAP_ENTRIES in configurations so that it can be adjusted for different cases. Tracked-On: #6442 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>	2021-08-20 09:50:39 +08:00
Shiqing Gao	651d44432c	hv: initialize the XSAVE related processor state for guest If SOS is using kernel 5.4, hypervisor got panic with #GP. Here is an example on KBL showing how the panic occurs when kernel 5.4 is used: Notes: * Physical MSR_IA32_XSS[bit 8] is 1 when physical CPU boots up. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is initialized to 0. Following thread switches would happen at run time: 1. idle thread -> vcpu thread context_switch_in happens and rstore_xsave_area is called. At this moment, vcpu->arch.xsave_enabled is false as vcpu is not launched yet and init_vmcs is not called yet (where xsave_enabled is set to true). Thus, physical MSR_IA32_XSS is not updated with the value of guest MSR_IA32_XSS. States at this point: * Physical MSR_IA32_XSS[bit 8] is 1. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0. 2. vcpu thread -> idle thread context_switch_out happens and save_xsave_area is called. At this moment, vcpu->arch.xsave_enabled is true. Processor state is saved to memory with XSAVES instruction. As physical MSR_IA32_XSS[bit 8] is 1, ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is set to 1 after the execution of XSAVES instruction. States at this point: * Physical MSR_IA32_XSS[bit 8] is 1. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0. * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1. 3. idle thread -> vcpu thread context_switch_in happens and rstore_xsave_area is called. At this moment, vcpu->arch.xsave_enabled is true. Physical MSR_IA32_XSS is updated with the value of guest MSR_IA32_XSS, which is 0. States at this point: * Physical MSR_IA32_XSS[bit 8] is 0. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0. * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1. Processor state is restored from memory with XRSTORS instruction afterwards. According to SDM Vol1 13.12 OPERATION OF XRSTORS, a #GP occurs if XCOMP_BV sets a bit in the range 62:0 that is not set in XCR0 \| IA32_XSS. 
So, #GP occurs once XRSTORS instruction is executed. Such issue does not happen with kernel 5.10. Because kernel 5.10 writes to MSR_IA32_XSS during initialization, while kernel 5.4 does not do such write. Once guest writes to MSR_IA32_XSS, it would be trapped to hypervisor, then, physical MSR_IA32_XSS and the value of MSR_IA32_XSS in vcpu->arch.guest_msrs are updated with the value specified by guest. So, in the point 2 above, correct processor state is saved. And #GP would not happen in the point 3. This patch initializes the XSAVE related processor state for guest. If vcpu is not launched yet, the processor state is initialized according to the initial value of vcpu_get_guest_msr(vcpu, MSR_IA32_XSS), ectx->xcr0, and ectx->xs_area. With this approach, the physical processor state is consistent with the one presented to guest. Tracked-On: #6434 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com> Reviewed-by: Li Fei1 <fei1.li@intel.com>	2021-08-20 09:46:09 +08:00
Zide Chen	2e6cf2b85b	hv: nested: fix bugs in init_vmx_msrs() Currently init_vmx_msrs() emulates same value for the IA32_VMX_xxx_CTLS and IA32_VMX_TRUE_xxx_CTLS MSRs. But the value of physical MSRs could be different between the pair, and we need to adjust the emulated value accordingly. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-20 09:40:50 +08:00
Zide Chen	ad37553873	hv: nested: redundant permission check on nested_vmentry() check_vmx_permission() is called in vmresume_vmexit_handler() and vmlaunch_vmexit_handler() already. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-08-20 08:14:40 +08:00
Yifan Liu	d575edf79a	hv: Change sched_event structure to resolve data race in event handling Currently the sched event handling may encounter data race problem, and as a result some vcpus might be stalled forever. One example can be wbinvd handling where more than 1 vcpus are doing wbinvd concurrently. The following is a possible execution of 3 vcpus: ------- 0 1 2 req [Note: 0] req bit0 set [Note: 1] IPI -> 0 req bit2 set IPI -> 2 VMExit req bit2 cleared wait vcpu2 descheduled VMExit req bit0 cleared wait vcpu0 descheduled signal 0 event0->set=true wake 0 signal 2 event2->set=true [Note: 3] wake 2 vcpu2 scheduled event2->set=false resume req req bit0 set IPI -> 0 req bit1 set IPI -> 1 (doesn't matter) vcpu0 scheduled [Note: 4] signal 0 event0->set=true (no wake) [Note: 2] event0->set=false (the rest doesn't matter) resume Any VMExit req bit0 cleared wait idle running (blocked forever) Notes: 0: req: vcpu_make_request(vcpu, ACRN_REQUEST_WAIT_WBINVD). 1: req bit: Bit in pending_req_bits. Bit0 stands for bit for vcpu0. 2: In function signal_event, At this time the event->waiting_thread is not NULL, so wake_thread will not execute 3: eventX: struct sched_event of vcpuX. 4: In function wait_event, the lock does not strictly cover the execution between schedule() and event->set=false, so other threads may kick in. ----- As shown in above example, before the last random VMExit, vcpu0 ended up with request bit set but event->set==false, so blocked forever. This patch proposes to change event->set from a boolean variable to an integer. The semantic is very similar to a semaphore. The wait_event will add 1 to this value, and block when this value is > 0, whereas signal_event will decrease this value by 1. It may happen that this value was decreased to a negative number but that is OK. As long as the wait_event and signal_event are paired and program order is observed (that is, wait_event always happens-before signal_event on a single vcpu), this value will eventually be 0. 
Tracked-On: #6405 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-08-20 08:11:40 +08:00
Zhou, Wu	b394777908	HV: Add implements of 32bit and 64bit elf loader This is a simple implementation of the 32bit and 64bit elf loader. The loading function first reads the image header and finds the program entries that are marked as PT_LOAD, then loads segments from the elf file to guest ram. After that, it finds the bss section in the elf section entries and clears the ram area it points to. Limitations: 1. The e_type of the elf image must be ET_EXEC (executable). Relocatable or dynamic code is not supported. 2. The loader only copies program segments that have a p_type of PT_LOAD (loadable segment). Other segments are ignored. 3. The loader doesn't support sections that are relocatable (sh_type is SHT_REL or SHT_RELA). 4. The 64bit elf's entry address must be below 4G. 5. The elf is assumed to be able to put segments to valid guest memory. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Zhou, Wu	c2468d2791	HV: Add elf loader sketch This patch adds a function elf_loader() to load elf image. It checks the elf header, get its 32/64 bit type, then calls the corresponding loading routines, which are empty, and will be realized later. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Zhou, Wu	537f69dde9	HV: Add elf header file for elf loader Source: https://github.com/freebsd/freebsd-src/blob/main/sys/sys/elf_common.h Trimmed to meet the minimal requirements for the Zephyr elf file to be loaded. Also added elf file header data struct and program/section entry data structs. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Zhou, Wu	8100b1dd56	HV: Remove 'vm_' of vm_elf_loader and etc. In order to make better sense, vm_elf_loader, vm_bzimage_loader and vm_rawimage_loader are changed to elf_loader, bzimage_loader and rawimage_loader. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Zhou, Wu	53f6720d13	HV: Combine the acpi loading function to one place Remove the acpi loading function from elf_loader, rawimage_loader and bzimage_loader, and call it together in vm_sw_loader. Now the vm_sw_loader's job is not just loading sw, so we rename it to prepare_os_image. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Zhou, Wu	e78aacbe55	HV: Correct some naming issues For the guest OS loaders, prepare_loading_xxx are not accurate for what those functions actually do. Now they are changed to load_xxx: load_rawimage, load_bzimage. And the 'bsp' expression is confusing in the comments for init_vcpu_protect_mode_regs, changed to a better way. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Victor Sun	3124018917	HV: vm_load: rename vboot_info.h to vboot.h vboot_info.h declares vm loader function also, so rename the file name to vboot.h; Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Victor Sun	9b632c0e4b	HV: vm_load: split vm_load.c to support diff kernel format The patch splits the vm_load.c to three parts, the loader function of bzImage kernel is moved to bzimage_loader.c, the loader function of raw image kernel is moved to rawimage_loader.c, the stub is still stayed in vm_load.c to load the corresponding kernel loader function. Each loader function could be isolated by CONFIG_GUEST_KERNEL_XXX macro which generated by config tool. Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Victor Sun	2524572fb2	HV: vm_load: refine vm_sw_loader API Change if condition to switch in vm_sw_loader() so that the sw loader could be compiled conditionally. Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Yang,Yu-chu	73dc610d90	config-tool: refine guest kernel types Rename KERNEL_ZEPHYR to KERNEL_RAWIMAGE. Added new type "KERNEL_ELF". Add CONFIG_GUEST_KERNEL_RAWIMAGE, CONFIG_GUEST_KERNEL_ELF and/or CONFIG_GUEST_KERNEL_BZIMAGE to config.h if it's configured. Tracked-On: #6323 Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com>	2021-08-19 20:00:45 +08:00
Victor Sun	178b3e85e3	HV: vm_load: change kernel type for zephyr image Previously we only support loading raw format of zephyr image as prelaunched Zephyr VM, this would cause guest F segment overridden issue because the zephyr raw image covers memory space from 0x1000 to 0x100000 upper. To fix this issue, we should support ELF format image loading so that parse and load the multiple segments from ELF image directly. Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Fei Li	2e7491a8ec	hv: mmiodev: a minor bug fix about refine acrn_mmiodev data structure Rename base_hpa to host_pa in acrn_mmiodev data structure. Tracked-On: #6366 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-19 12:01:35 +08:00
Liu,Junming	2c5c8754de	hv:enable GVT-d for pre-launched linux guest in logical partition mode When passing through the GPU to a pre-launched Linux guest, the GPU OpRegion needs to be passed to the guest. Here are the detailed steps: 1. reserve a memory region in ve820 table for GPU OpRegion 2. build EPT mapping for GPU OpRegion to pass-thru OpRegion to guest 3. emulate the pci config register for OpRegion For the third step, here's the detailed description: The address of the OpRegion is located at PCI config space offset 0xFC. A normal Linux guest won't write this register, so we can regard this register as read-only. When guest reads this register, return the emulated value. When guest writes this register, ignore the operation. Tracked-On: #6387 Signed-off-by: Liu,Junming <junming.liu@intel.com>	2021-08-19 11:56:26 +08:00
Jian Jun Chen	dc77ef9e52	hv: ivshmem: map SHM BAR with PAT ignored ACRN does not support the variable range vMTRR. The default memory type of vMTRR is UC. With this vMTRR emulation guest VM such as Linux refuses to map the MMIO address space as WB. In order to get better performance SHM BAR of ivshmem is mapped with PAT ignored and memory type of SHM BAR is fixed to WB. Tracked-On: #6389 Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-13 11:17:15 +08:00
Yang,Yu-chu	d997f4bbc1	config-tools: refine bin_gen.py and create virtual TPM2 acpi table Create virtual acpi table of tpm2 based on the raw data if the TPM2 device is presented and the passthrough tpm2 is enabled. Refine the arguments of bin_gen.py. The --board and --scenario take the path to the XMLs as the argument. The allocation.xml is needed for bin_gen.py to generate tpm2 acpi table. Refine the condition of tpm2_acpi_gen. The tpm2 device "MSFT0101" can be present in device id or compatible_id(CID). Check both attributes and child node of tpm2 device. Tracked-On: #6320 Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	a705ff2dac	hv: relocate ACPI DATA address to 0x7fe00000 Relocate ACPI address to 0x7fe00000 and ACPI NVS to 0x7ff00000 correspondingly. In this case, we could include TPM event log region [0x7ffb0000, 0x80000000) into ACPI NVS. Tracked-On: #6320 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	74e68e39d1	hv: tpm2: do tpm2 fixup for security vm ACRN used to prepare the vTPM2 ACPI Table for pre-launched VM at the build stage using config tools. This is OK if the TPM2 ACPI Table never changes. However, TPM2 ACPI Table may be changed in some conditions: change BIOS configuration or update BIOS. This patch do TPM2 fixup to update the vTPM2 ACPI Table and TPM2 MMIO resource configuration according to the physical TPM2 ACPI Table. Tracked-On: #6366 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	f81b39225c	HV: refine acrn_mmiodev data structure 1. add a name field to indicate what the MMIO Device is. 2. add two more MMIO resource to the acrn_mmiodev data structure. Tracked-On: #6366 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	20061b7c39	hv: remove xsave dependence ACRN could run without XSAVE Capability. So remove XSAVE dependence to support more (hardware or virtual) platforms. Tracked-On: #6287 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-10 16:36:15 +08:00
Fei Li	84235bf07c	hv: vtd: a minor refine about dmar_wait_completion Check whether condition is met before check whether time is out after iommu_read32. This is because iommu_read32 would cause time out on some virtual platform in spite of the current DMAR status meets the pre_condition. Tracked-On: #6371 Signed-off-by: Fei Li <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-10 16:36:15 +08:00
Tao Yuhong	171856c46b	hv: uc-lock: Fix do not trap #GP If the HV enables #GP triggering for uc-lock and is going to emulate guest uc-lock instructions, it should trap guest #GP. A guest uc-lock instruction triggers #GP, which causes a vmexit for #GP; the HV handles this vmexit and emulates the uc-lock instruction. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-08-09 15:33:12 +08:00
liu hang1	d07bd78b13	Makefile:add targz-pkg entry in Makefile User could use make targz-pkg command to generate tar package in build directory,which could help user simplify the process of installing acrn hypervisor in target board. user need to copy the tarball package to target board,and extract it to "/" directory. Tracked-On: #6355 Signed-off-by: liu hang1 <hang1.liu@intel.com> Reviewed-by: VanCutsem, Geoffroy <geoffroy.vancutsem@intel.com> Acked-by: Wang, Yu1 <yu1.wang@intel.com>	2021-08-09 11:52:27 +08:00
Kunhui-Li	578c18b962	config_tools: remove obsolete kconfig files Remove obsolete Kconfig files; Update Kconfig related README and error message. Tracked-On: #6315 Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>	2021-08-09 09:25:02 +08:00
Victor Sun	4a53a23faa	HV: debug: support 64bit BAR pci uart with 32bit space Currently the HV console does not support PCI UART with 64bit BAR, but in the case that the BAR is in 64bit and the BAR space is below 4GB (i.e. the high 32bit address of the 64bit BAR is zero), HV should be able to support it. Tracked-On: #6334 Signed-off-by: Victor Sun <victor.sun@intel.com>	2021-08-04 10:10:35 +08:00
Victor Sun	2fbc4c26e6	HV: vm_load: remove kernel_load_addr in sw_kernel_info struct When guest kernel has multiple loading segments like ELF format image, just define one load address in sw_kernel_info struct is meaningless. The patch removes kernel_load_addr member in struct sw_kernel_info, the load address should be parsed in each specified format image processing. Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-08-03 13:44:51 +08:00
Victor Sun	d1d59437ea	HV: vm_load: correct needed size of bzImage kernel The previous code did not load bzImage start from protected mode part, result in the protected mode part un-align with kernel_alignment field and then cause kernel decompression start from a later aligned address. In this case we had to enlarge the needed size of bzImage kernel to kernel_init_size plus double size of kernel_alignment. With loading issue of bzImage protected mode part fixed, the kernel needed size is corrected in this patch. Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-08-03 13:44:51 +08:00
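
The validity check described in commit 0a515ab2ea (is_pci_cfg_multifunction) can be sketched in C roughly as follows. This is a minimal illustration, not the ACRN implementation: the function and macro names are hypothetical, and the sketch assumes the caller has already read the 8-bit Header Type register from PCI configuration space. It relies only on what the commit message states (a valid device has a Header Layout of 0 or 1, so the register can never read 0xff) plus the standard PCI convention that bit 7 of the Header Type register is the multi-function flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not taken from the ACRN sources. */
#define SKETCH_HDR_MULTIFUNCTION  0x80U  /* bit 7: multi-function device flag */
#define SKETCH_HDR_LAYOUT_MASK    0x7FU  /* bits 6:0: Header Layout field */
#define SKETCH_HDR_LAYOUT_TYPE0   0x00U  /* Type 0 PCI device */
#define SKETCH_HDR_LAYOUT_TYPE1   0x01U  /* Type 1 PCI device */

/* A header type is treated as valid only if its layout field is Type 0 or Type 1.
 * A read of 0xff (typical when no device responds) fails this check. */
static bool sketch_is_valid_header_type(uint8_t header_type)
{
	uint8_t layout = (uint8_t)(header_type & SKETCH_HDR_LAYOUT_MASK);

	return (layout == SKETCH_HDR_LAYOUT_TYPE0) || (layout == SKETCH_HDR_LAYOUT_TYPE1);
}

/* Report multi-function only for a valid device, so that 0xff is not
 * misread as "multi-function flag set". */
static bool sketch_is_multifunction(uint8_t header_type)
{
	return sketch_is_valid_header_type(header_type) &&
	       ((header_type & SKETCH_HDR_MULTIFUNCTION) != 0U);
}
```

Without the validity check, a header type of 0xff would have bit 7 set and could be mistaken for a multi-function device; with it, only values such as 0x80 or 0x81 are reported as multi-function.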

3293 Commits