acrn-hypervisor

mirror of https://github.com/projectacrn/acrn-hypervisor.git synced 2025-06-04 21:29:43 +00:00

Author	SHA1	Message	Date
Liu Long	14c6e21efa	ACRN: misc: Unify terminology for sos/uos in macro Rename SOS_VM_NUM to SERVICE_VM_NUM. rename SOS_SOCKET_PORT to SERVICE_VM_SOCKET_PORT. rename PROCESS_RUN_IN_SOS to PROCESS_RUN_IN_SERVICE_VM. rename PCI_DEV_TYPE_SOSEMUL to PCI_DEV_TYPE_SERVICE_VM_EMUL. rename SHUTDOWN_REQ_FROM_SOS to SHUTDOWN_REQ_FROM_SERVICE_VM. rename PROCESS_RUN_IN_SOS to PROCESS_RUN_IN_SERVICE_VM. rename SHUTDOWN_REQ_FROM_UOS to SHUTDOWN_REQ_FROM_USER_VM. rename UOS_SOCKET_PORT to USER_VM_SOCKET_PORT. rename SOS_CONSOLE to SERVICE_VM_OS_CONSOLE. rename SOS_LCS_SOCK to SERVICE_VM_LCS_SOCK. rename SOS_VM_BOOTARGS to SERVICE_VM_OS_BOOTARGS. rename SOS_ROOTFS to SERVICE_VM_ROOTFS. rename SOS_IDLE to SERVICE_VM_IDLE. rename SEVERITY_SOS to SEVERITY_SERVICE_VM. rename SOS_VM_UUID to SERVICE_VM_UUID. rename SOS_REQ to SERVICE_VM_REQ. rename RTCT_NATIVE_FILE_PATH_IN_SOS to RTCT_NATIVE_FILE_PATH_IN_SERVICE_VM. rename CBC_REQ_T_UOS_ACTIVE to CBC_REQ_T_USER_VM_ACTIVE. rename CBC_REQ_T_UOS_INACTIVE to CBC_REQ_T_USER_VM_INACTIV. rename uos_active to user_vm_active. Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	e9c4ced460	ACRN: hv: Unify terminology for user vm Rename gpa_uos to gpa_user_vm rename base_gpa_in_uos to base_gpa_in_user_vm rename UOS_VIRT_PCI_MMCFG_BASE to USER_VM_VIRT_PCI_MMCFG_BASE rename UOS_VIRT_PCI_MMCFG_START_BUS to USER_VM_VIRT_PCI_MMCFG_START_BUS rename UOS_VIRT_PCI_MMCFG_END_BUS to USER_VM_VIRT_PCI_MMCFG_END_BUS rename UOS_VIRT_PCI_MEMBASE32 to USER_VM_VIRT_PCI_MEMBASE32 rename UOS_VIRT_PCI_MEMLIMIT32 to USER_VM_VIRT_PCI_MEMLIMIT32 rename UOS_VIRT_PCI_MEMBASE64 to USER_VM_VIRT_PCI_MEMBASE64 rename UOS_VIRT_PCI_MEMLIMIT64 to USER_VM_VIRT_PCI_MEMLIMIT64 rename UOS in comments message to User VM. Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	92b7d6a9a3	ACRN: hv: Terminology modification in hv code Rename sos_vm to service_vm. rename sos_vmid to service_vmid. rename sos_vm_ptr to service_vm_ptr. rename get_sos_vm to get_service_vm. rename sos_vm_gpa to service_vm_gpa. rename sos_vm_e820 to service_vm_e820. rename sos_efi_info to service_vm_efi_info. rename sos_vm_config to service_vm_config. rename sos_vm_hpa2gpa to service_vm_hpa2gpa. rename vdev_in_sos to vdev_in_service_vm. rename create_sos_vm_e820 to create_service_vm_e820. rename sos_high64_max_ram to service_vm_high64_max_ram. rename prepare_sos_vm_memmap to prepare_service_vm_memmap. rename post_uos_sworld_memory to post_user_vm_sworld_memory rename hcall_sos_offline_cpu to hcall_service_vm_offline_cpu. rename filter_mem_from_sos_e820 to filter_mem_from_service_vm_e820. rename create_sos_vm_efi_mmap_desc to create_service_vm_efi_mmap_desc. rename HC_SOS_OFFLINE_CPU to HC_SERVICE_VM_OFFLINE_CPU. rename SOS to Service VM in comments message. Tracked-On: #6744 Signed-off-by: Liu Long <long.liu@linux.intel.com> Reviewed-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>	2021-11-02 10:00:55 +08:00
Liu Long	26e507a06e	ACRN: hv: Unify terminology for service vm Rename is_sos_vm to is_service_vm Tracked-On: #6744 Signed-off-by: Liu Long <longliu@intel.com>	2021-11-02 10:00:55 +08:00
dongshen	dcafcadaf9	hv: rename some C preprocessor macros Rename some C preprocessor macros: NUM_GUEST_MSRS --> NUM_EMULATED_MSRS CAT_MSR_START_INDEX --> FLEXIBLE_MSR_INDEX NUM_VCAT_MSRS --> NUM_CAT_MSRS NUM_VCAT_L2_MSRS --> NUM_CAT_L2_MSRS NUM_VCAT_L3_MSRS --> NUM_CAT_L3_MSRS Tracked-On: #5917 Signed-off-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	c0d95558c1	hv: vCAT: propagate vCBM to other vCPUs that share cache with vcpu Implement the propagate_vcbm() function: Set vCBM to all the vCPUs that share cache with vcpu to mimic hardware CAT behavior Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	a7014f4654	hv: vCAT: implementing the vCAT MSRs write handler Implement the write_vcbm() function to handle the MSR_IA32_type_MASK_n vCBM MSRs write request Call write_vclosid() to handle MSR_IA32_PQR_ASSOC MSR write request Several vCAT P2V (physical to virtual) and V2P (virtual to physical) mappings exist: struct acrn_vm_config *vm_config = get_vm_config(vm_id) max_pcbm = vm_config->max_type_pcbm (type: l2 or l3) mask_shift = ffs64(max_pcbm) vclosid = vmsr - MSR_IA32_type_MASK_0 pclosid = vm_config->pclosids[vclosid] pmsr = MSR_IA32_type_MASK_0 + pclosid pcbm = vcbm << mask_shift vcbm = pcbm >> mask_shift Where MSR_IA32_type_MASK_n: L2 or L3 mask msr address for CLOSIDn, from 0C90H through 0D8FH (inclusive). max_pcbm: a bitmask that selects all the physical cache ways assigned to the VM vclosid: virtual CLOSID, always starts from 0 pclosid: corresponding physical CLOSID for a given vclosid vmsr: virtual msr address, passed to vCAT handlers by the caller functions rdmsr_vmexit_handler()/wrmsr_vmexit_handler() pmsr: physical msr address vcbm: virtual CBM, passed to vCAT handlers by the caller functions rdmsr_vmexit_handler()/wrmsr_vmexit_handler() pcbm: physical CBM Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	3ab50f2ef5	hv: vCAT: implementing the vCAT MSRs read handlers Implement the read_vcbm() and read_vclosid() functions to handle the MSR_IA32_PQR_ASSOC and MSR_IA32_type_MASK_n vCAT MSRs read request. Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	be855d2352	hv: vCAT: expose CAT capabilities to vCAT-enabled VM Expose CAT feature to vCAT VM by reporting the number of cache ways/CLOSIDs via the 04H/10H cpuid instructions, so that the VM can take advantage of CAT to prioritize and partition cache resource for its own tasks. Add the vcat_pcbm_to_vcbm() function to map pcbm to vcbm Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
dongshen	77ae989379	hv: vCAT: initialize vCAT MSRs during vmcs init Initialize vCBM MSRs Initialize vCLOSID MSR Add some vCAT functions: Retrieve max_vcbm and max_pcbm Check if vCAT is configured or not for the VM Map vclosid to pclosid write_vclosid: vCLOSID MSR write handler write_vcbm: vCBM MSR write handler Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-28 19:12:29 +08:00
Fei Li	1905ed6124	hv: vMSR: minor fix about rdmsr_vmexit_handler Specifying a reserved or unimplemented MSR address in ECX for rdmsr will cause a general protection exception. In this case, we should not change the contents of registers EDX:EAX. Tracked-On: #4550 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-10-27 08:23:43 +08:00
dongshen	39461ef9dd	hv: vCAT: initialize the emulated_guest_msrs array for CAT msrs during platform initialization Initialize the emulated_guest_msrs[] array at runtime for MSR_IA32_type_MASK_n and MSR_IA32_PQR_ASSOC msrs, there is no good way to do this initialization statically at build time Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-26 11:48:27 +08:00
dongshen	cb2bb78b6f	hv/config_tools: amend the struct acrn_vm_config to make it compatible with vCAT For vCAT, it may need to store more than MAX_VCPUS_PER_VM of closids, change clos in vm_config.h to a pointer to accommodate this situation Rename clos to pclosids pclosids now is a pointer to an array of physical CLOSIDs that is defined in vm_configurations.c by vmconfig. The number of elements in the array must be equal to the value given by num_pclosids Add max_type_pcbm (type: l2 or l3) to struct acrn_vm_config, which stores a bitmask that selects/covers all the physical cache ways assigned to the VM Change vmsr.c to accommodate this amended data structure Change the config-tools to generate vm_configurations.c, and fill in the num_closids and clos pointers based on the information from the scenario file. Now vm_configurations.c.xsl generates all the clos related code so remove the same code from misc_cfg.h.xsl. Examples: Scenario file: <RDT> <RDT_ENABLED>y</RDT_ENABLED> <CDP_ENABLED>n</CDP_ENABLED> <VCAT_ENABLED>y</VCAT_ENABLED> <CLOS_MASK>0x7ff</CLOS_MASK> <CLOS_MASK>0x7ff</CLOS_MASK> <CLOS_MASK>0x7ff</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> <CLOS_MASK>0xff800</CLOS_MASK> </RDT> <vm id="0"> <guest_flags> <guest_flag>GUEST_FLAG_VCAT_ENABLED</guest_flag> </guest_flags> <clos> <vcpu_clos>3</vcpu_clos> <vcpu_clos>4</vcpu_clos> <vcpu_clos>5</vcpu_clos> <vcpu_clos>6</vcpu_clos> <vcpu_clos>7</vcpu_clos> </clos> </vm> <vm id="1"> <clos> <vcpu_clos>1</vcpu_clos> <vcpu_clos>2</vcpu_clos> </clos> </vm> vm_configurations.c (generated by config-tools) with the above vCAT config: static uint16_t vm0_vcpu_clos[5U] = {3U, 4U, 5U, 6U, 7U}; static uint16_t vm1_vcpu_clos[2U] = {1U, 2U}; struct acrn_vm_config vm_configs[CONFIG_MAX_VM_NUM] = { { .guest_flags = (GUEST_FLAG_VCAT_ENABLED), .pclosids = vm0_vcpu_clos, .num_pclosids = 5U, .max_l3_pcbm = 0xff800U, }, { .pclosids = vm1_vcpu_clos, .num_pclosids = 2U, }, }; Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-26 11:48:27 +08:00
Yonghua Huang	c8e2060d37	hv: unmap IOMMU register pages from service VM EPT IOMMU hardware resource is owned by hypervisor, while IOMMU capability is reported to service VM in its ACPI table. In this case, Service VM may access IOMMU hardware resource, which is not expected. This patch unmaps all Intel IOMMU register pages for service VM EPT. Tracked-On: #6677 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-22 09:31:10 +08:00
Fei Li	6c5bf4a642	hv: enhance e820_alloc_memory could allocate memory than 4G Enhance e820_alloc_memory could allocate memory than 4G. Tracked-On: #5830 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-10-14 15:04:36 +08:00
Fei Li	df7ffab441	hv: remove CONFIG_HV_RAM_SIZE It's difficult to configure CONFIG_HV_RAM_SIZE properly at once. This patch not only remove CONFIG_HV_RAM_SIZE, but also we use ld linker script to dynamically get the size of HV RAM size. Tracked-On: #6663 Signed-off-by: Fei Li <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-14 15:04:36 +08:00
Zide Chen	e48962faa6	hv: optimize run_vcpu() for nested This patch implements a separate path for L2 VMEntry in run_vcpu(), which has several benefits: - keep run_vcpu() clean, to reduce the number of is_vcpu_in_l2_guest() statements: - current code has three is_vcpu_in_l2_guest() already. - supposed to have another 2 statement so that nested VMEntry won't hit the "Starting vCPU" and "vCPU launched" pr_info() and a few other statements in the VM launch path. - save few other things in run_vcpu() that are not needed for nested. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-13 15:55:31 +08:00
Zide Chen	89bbc44962	hv: inject external interrupts only if LAPIC is not passthru Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-08 09:18:34 +08:00
Zide Chen	228b052fdb	hv: operations on vcpu->reg_cached/reg_updated don't need LOCK prefix In run time, one vCPU won't read or write a register on other vCPUs, thus we don't need the LOCK prefixed instructions on reg_cached and reg_updated. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-08 09:11:10 +08:00
Zide Chen	2b683f8f5b	hv: call vcpu_inject_exception() only when ACRN_REQUEST_EXCP is set move the bitmap test call out of vcpu_inject_exception(), then we call the expensive bitmap_test_and_clear_lock() only pending_req_bits is non-zero and call vcpu_inject_exception() only if needed. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-10-07 20:48:43 +08:00
Zide Chen	f801ba4ed7	hv: update guest RIP only if vcpu->arch.inst_len is non zero In a very large number of VM exits, the VM-exit instruction length could be zero, and there is no need to update VMX_GUEST_RIP. Some examples: - all external interrupt VM exits in non LAPIC passthru setup. - for all the nested VM-exits that are reflecting to L1 hypervisor. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-07 20:47:07 +08:00
Zide Chen	b7e9a68923	hv: code cleanup in run_vcpu() - wrap a new function exec_vmentry() to reduce code duplication. - remove exec_vmread(VMX_GUEST_RSP) since ACRN doesn't need to know guest RSP in run time. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-07 20:47:07 +08:00
Zide Chen	ee12daff84	hv: nested: refine vmcs12_read/write_field APIs Change "uint64_t vmcs_hva" to "void *vmcs_hva" in the input argument, list, so that no type casting is needed when calling them from pointers. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-10-07 20:45:34 +08:00
Liu,Junming	4105ca2cb4	hv: deny the launch of VM if pass-thru PIO bar isn't identical mapping In current design, when pass-thru dev, for the PIO bar, need to ensure the guest PIO start address equals to host PIO start address. Then set the VMCS io bitmap to pass-thru the corresponding port io to guest for performance. ACRN-DM and acrn-config should ensure the identical mapping of PIO bar. If ACRN-DM or acrn-config failed to achieve this, we should deny the launch of VM Tracked-On: #6508 Signed-off-by: Liu,Junming <junming.liu@intel.com> Reviewed-by: Zhao Yakui <yakui.zhao@intel.com> Reviewed-by: Fei Li <fei1.li@intel.com>	2021-09-28 08:49:01 +08:00
Victor Sun	28824c1e74	HV: init e820 before init paging In the commit of `4e1deab3d9`, we changed the init sequence that init paging first and then init e820 because we worried about the efi memory map could be beyond 4GB space on some platform. After we double checked multiboot2 spec, when system boot from multiboot2 protocol, the efi memory map info will be embedded in multiboot info so it is guaranteed that the efi memory map must be under 4GB space. Consider that the page table will be allocated in free memory space in future, we have to change the init sequence back that init e820 first and then init paging. If we need to support other boot protocol in future that the efi memory map might be put beyond 4GB, we could have below options: 1. Request bootloader put efi memory map below 4GB; 2. Call EFI_BOOT_SERVICES.GetMemoryMap() before ExitBootServices(); 3. Enable a early 64bit page table to get the efi memory map only; Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com>	2021-09-27 09:03:15 +08:00
Zide Chen	a62dd6ad8a	hv: nested: fixed vmxoff_vmexit_handler() issue In VMXOFF vmexit handler, it's supposed to remove VMCS shadowing. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-26 08:49:35 +08:00
Zide Chen	45b036e028	hv: nested: enable multiple active VMCS12 support This patch changes the size of vvmcs[] array from 1 to PER_VCPU_ACTIVE_VVMCS_NUM, and actually enables multiple active VMCS12 support in ACRN. The basic operations: - if L1 VMPTRLDs a VMCS12 without previously VMCLEAR the current VMCS12, ACRN no longer unconditionally flushes the current VMCS12 back to L1. Instead, it tries to keep both the current and the newly loaded VMCS12 in the nested->vvmcs[] array, unless: - if there is no more available vvmcs[] entry, ACRN flushes one active VMCS12 to make room for this new VMCS12. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-26 08:49:35 +08:00
Mingqiang Chi	f39c882359	hv:change log level for check_vmx_ctrl Some processors don't support VMX_PROCBASED_CTLS_TERTIARY bit and VMX_PROCBASED_CTLS2_UWAIT_PAUSE bit in MSRs (IA32_VMX_PROCBASED_CTLS & IA32_VMX_PROCBASED_CTLS2), HV will output error log which will cause confusion, change the log level from pr_err to pr_info. Tracked-On: #6397 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>	2021-09-24 10:17:19 +08:00
Jie Deng	064fd7647f	hv: add priority based scheduler This patch adds a new priority based scheduler to support vCPU scheduling based on their pre-configured priorities. A vCPU can be running only if there is no higher priority vCPU running on the same pCPU. Tracked-On: #6571 Signed-off-by: Jie Deng <jie.deng@intel.com>	2021-09-24 09:32:18 +08:00
Zide Chen	94cbe909ee	hv: irq: identical vector mapping if LAPIC passthrough In local APIC passthrough case, when a device triggers an INTx interrupt, this interrupt would be delivered to vCPU directly. For this case, need to set the virtual vector in the 'Interrupt Vector' field of physical IOxAPIC I/O REDIRECTION TABLE REGISTER (bits 7:0) and 'Vector' field of vt-d Interrupt Remapping Table Entry (IRTE) for Remapped Interrupts. Assumption: (a) IOAPIC pins won't be shared between LAPIC PT guest and other guests; (b) The guest would not trigger this IRQ before it switched to x2 APIC mode. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-18 09:42:44 +08:00
Mingqiang Chi	db98f01b6e	add vmx capability check Check some essential VMX capabilities; the hypervisor will panic if the processor doesn't support them. Tracked-On: #6584 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-18 08:44:30 +08:00
dongshen	08d4517431	hv: fix bugs in RDT's CDP code In current RDT code, if CDP is configured, L2/L3 resources' num_closids calculation is wrong: res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U) + 1U; Should be: res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U + 1) >> 1U; Also, in order to enable CDP system-wide, need to enable the CDP bit (bit 0) on all pcpus, not just on pcpu 0. Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-09-17 16:29:05 +08:00
dongshen	f4cdbba0bd	hv: some cosmetic fixes to rdt.c/rdt.h Rename the clos_max field in struct rdt_info to num_closids Rename variable valid_clos_num to common_num_closids and make it static Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-09-17 16:29:05 +08:00
Zide Chen	0466d7055f	hv: nested: move the VMCS12 dirty flags to struct acrn_vvmcs These dirty flags are supposed to be per VMCS12, so move them from the per vCPU acrn_nested struct to the newly added acrn_vvmcs struct. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	4e54c3880b	hv: nested: remove vcpu->arch.nested.current_vmcs12_ptr This variable represents the L1 GPA of the current VMCS12. But it's no longer needed in the multiple active VMCS12 case, which uses the following variables for this purpose. - nested->current_vvmcs refers to the vvmcs[] entry which contains the cached current VMCS12, its associated VMCS02, and other context info. - nested->current_vvmcs->vmcs12_gpa refers to the L1 GPA of this current VMCS12. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	799a4d332a	hv: nested: initial implementation of struct acrn_vvmcs Add an array of struct acrn_vvmcs to struct acrn_nested, so it is possible to cache multiple active VMCS12s. This patch declares the size of this array to 1, meaning that there is only one active VMCS12. This is to minimize the logical code changes. Add pointer current_vvmcs to struct acrn_nested, which refers to the current vvmcs[] entry. In this patch, if any VMCS12 is active, it always points to vvmcs[0]. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	cf697e753d	hv: nested: some API signature changes No any logical changes, this patch is preparing for multiple active VMCS12 support. - currently it's easy to get the vmcs12 pointer from the vcpu pointer. In multiple active vmcs12 case, we need to explicitly add "struct acrn_vmcs12 *vmcs12" to certain APIs' input argument list, in order to get the desired vmcs12 pointer. - merge flush_current_vmcs12() into clear_vmcs02() for multiple reasons: a) it's called only once; b) we don't wrap the opposite operation (loading vmcs12) in an API; c) this API has simple and clear logic. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-17 10:58:43 +08:00
Zide Chen	e9eb72d319	hv: nested: flush L2 VPID only when it could conflict with L1 VPIDs By changing the way to assign L1 VPID from bottom-up to top-down, the possibilities for VPID conflicts between L1 and L2 guests are small. Then we can flush VPID just in case of conflicting. Tracked-On: #6289 Signed-off-by: Anthony Xu <anthony.xu@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-16 09:26:10 +08:00
Zide Chen	1ab65825ba	hv: nested: merge gpa_field_dirty and control_field_dirty flag In run time, it's rare for L1 to write to the intercepted non host-state VMCS fields, and using multiple dirty flags is not necessary. This patch uses one single dirty flag to manage all non host-state VMCS fields. This helps to simplify current code and in the future we may not need to declare new dirty flags when we intercept more VMCS fields. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-13 15:50:01 +08:00
Zide Chen	6376d5a0d3	hv: nested: fix bug in syncing EPTP from VMCS12 to VMCS02 Currently vmptrld_vmexit_handler() doesn't sync VMX_EPT_POINTER_FULL from vmcs12 to vmcs02, instead it sets gpa_field_dirty and relies on nested_vmentry() to sync EPTP in next nested VMentry. This creates readability issue since all other intercepted VMCS fields are synced in sync_vmcs12_to_vmcs02(). Another issue is that other VMCS fields managed by gpa_field_dirty are repeatedly synced in both vmptrld and nested vmentry handler. This patch moves get_nept_desc() ahead of sync_vmcs12_to_vmcs02(), such that shadow_eptp is allocated before sync_vmcs12_to_vmcs02() which can sync EPTP properly. BTW, in nested_vmexit_handler(), don't need to read from VMCS to get the exit reason, since vcpu->arch.exit_reason has it already. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-13 15:50:01 +08:00
Zide Chen	11c2f3eabb	hv: check bitmap before calling bitmap_test_and_clear_lock() The locked btr instruction is expensive. This patch changes the logic to ensure that the bitmap is non-zero before executing bitmap_test_and_clear_lock(). The VMX transition time gets significant improvement. SOS running on TGL, the CPUID roundtrip reduces from ~2400 cycles to ~2000 cycles. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-09-02 16:09:33 +08:00
Zide Chen	7cde4a8d40	hv: initialize host IA32_PAT MSR Currently ACRN assumes firmware setup IA32_PAT correctly. This patch explicitly initializes host IA32_PAT MSR according to ISDM Table 11-12. Memory Type Setting of PAT Entries Following a Power-up or Reset. ACRN creates host page tables based on PAT0 (WB) and PAT3 (UC). Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-09-02 09:15:39 +08:00
Zide Chen	aeb3690b6f	hv: simplify is_lapic_pt_enabled() is_lapic_pt_enabled() is called at least twice in one loop of the vCPU thread, and it's called in vmexit_handler() frequently if LAPIC is not pass-through. Thus the efficiency of this function has direct impact to the system performance. Since the LAPIC mode is not changed in run time, we don't have to calculate it on the fly in is_lapic_pt_enabled(). BTW, removed the unused lapic_mask from struct acrn_vcpu_arch. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-26 09:52:10 +08:00
Shiqing Gao	d90dbc0d91	hv: check the capability of XSAVES/XRSTORS instructions before execution For platforms that do not support XSAVES/XRSTORS instructions, like QEMU, executing these instructions causes #UD. This patch adds the check before the execution of XSAVES/XRSTORS instructions. It also refines the logic inside rstore_xsave_area for the following reason: If XSAVES/XRSTORS instructions are supported, restore XSAVE area if any of the following conditions is met: 1. "vcpu->launched" is false (state initialization for guest) 2. "vcpu->arch.xsave_enabled" is true (state restoring for guest) * Before vCPU is launched, condition 1 is satisfied. * After vCPU is launched, condition 2 is satisfied because is_valid_xsave_combination() guarantees that "vcpu->arch.xsave_enabled" is consistent with pcpu_has_cap(X86_FEATURE_XSAVES). Therefore, the check against "vcpu->launched" and "vcpu->arch.xsave_enabled" can be eliminated here. Tracked-On: #6481 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-08-26 09:42:23 +08:00
Zide Chen	cbf3825140	hv: Pass-through IA32_TSC_AUX MSR to L1 guest Use an unused MSR on host to save ACRN pcpu ID and avoid saving and restoring TSC AUX MSR on VMX transitions. Tracked-On: #6289 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Reviewed-by: Eddie Dong <eddie.dong@intel.com>	2021-08-26 09:25:54 +08:00
Yifan Liu	d33c76f701	hv: quirks: SMBIOS passthrough for prelaunched-VM This feature is guarded under config CONFIG_SECURITY_VM_FIXUP, which by default should be disabled. This patch passthrough native SMBIOS information to prelaunched VM. SMBIOS table contains a small entry point structure and a table, of which the entry point structure will be put in 0xf0000-0xfffff region in guest address space, and the table will be put in the ACPI_NVS region in guest address space. v2 -> v3: uuid_is_equal moved to util.h as inline API result -> pVendortable, in function efi_search_guid recalc_checksum -> generate_checksum efi_search_smbios -> efi_search_smbios_eps scan_smbios_eps -> mem_search_smbios_eps EFI GUID definition kept Tracked-On: #6320 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-08-26 09:24:50 +08:00
Yifan Liu	975ff33e01	hv: Move uuid_is_equal to util.h This patch moves uuid_is_equal from vm_config.c to util.h as inline API. Tracked-On: #6320 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-08-26 09:24:50 +08:00
Zide Chen	6d7eb6d7b6	hv: emulate IA32_EFER and adjust Load EFER VMX controls This helps to improve performance: - Don't need to execute VMREAD in vcpu_get_efer(), which is frequently called. - VMX_EXIT_CTLS_SAVE_EFER can be removed from VM-Exit Controls. - If the value of IA32_EFER MSR is identical between the host and guest (highly likely), adjust the VMX controls not to load IA32_EFER on VMExit and VMEntry. It's convenient to continue use the exiting vcpu_s/get_efer() APIs, other than the common vcpu_s/get_guest_msr(). Tracked-On: #6289 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-08-24 11:16:53 +08:00
Liang Yi	2b3620de7d	hv: mask off LA57 in cpuid Mask off support of 57-bit linear addresses and five-level paging. ICX-D has LA57 but ACRN doesn't support 5-level paging yet. Tracked-On: #6357 Signed-off-by: Liang Yi <yi.liang@intel.com> Signed-off-by: Li, Fei1 <fei1.li@intel.com>	2021-08-20 11:02:21 +08:00
Shiqing Gao	651d44432c	hv: initialize the XSAVE related processor state for guest If SOS is using kernel 5.4, hypervisor got panic with #GP. Here is an example on KBL showing how the panic occurs when kernel 5.4 is used: Notes: * Physical MSR_IA32_XSS[bit 8] is 1 when physical CPU boots up. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is initialized to 0. Following thread switches would happen at run time: 1. idle thread -> vcpu thread context_switch_in happens and rstore_xsave_area is called. At this moment, vcpu->arch.xsave_enabled is false as vcpu is not launched yet and init_vmcs is not called yet (where xsave_enabled is set to true). Thus, physical MSR_IA32_XSS is not updated with the value of guest MSR_IA32_XSS. States at this point: * Physical MSR_IA32_XSS[bit 8] is 1. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0. 2. vcpu thread -> idle thread context_switch_out happens and save_xsave_area is called. At this moment, vcpu->arch.xsave_enabled is true. Processor state is saved to memory with XSAVES instruction. As physical MSR_IA32_XSS[bit 8] is 1, ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is set to 1 after the execution of XSAVES instruction. States at this point: * Physical MSR_IA32_XSS[bit 8] is 1. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0. * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1. 3. idle thread -> vcpu thread context_switch_in happens and rstore_xsave_area is called. At this moment, vcpu->arch.xsave_enabled is true. Physical MSR_IA32_XSS is updated with the value of guest MSR_IA32_XSS, which is 0. States at this point: * Physical MSR_IA32_XSS[bit 8] is 0. * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0. * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1. Processor state is restored from memory with XRSTORS instruction afterwards. According to SDM Vol1 13.12 OPERATION OF XRSTORS, a #GP occurs if XCOMP_BV sets a bit in the range 62:0 that is not set in XCR0 | IA32_XSS. So, #GP occurs once XRSTORS instruction is executed. Such issue does not happen with kernel 5.10. Because kernel 5.10 writes to MSR_IA32_XSS during initialization, while kernel 5.4 does not do such write. Once guest writes to MSR_IA32_XSS, it would be trapped to hypervisor, then, physical MSR_IA32_XSS and the value of MSR_IA32_XSS in vcpu->arch.guest_msrs are updated with the value specified by guest. So, in the point 2 above, correct processor state is saved. And #GP would not happen in the point 3. This patch initializes the XSAVE related processor state for guest. If vcpu is not launched yet, the processor state is initialized according to the initial value of vcpu_get_guest_msr(vcpu, MSR_IA32_XSS), ectx->xcr0, and ectx->xs_area. With this approach, the physical processor state is consistent with the one presented to guest. Tracked-On: #6434 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com> Reviewed-by: Li Fei1 <fei1.li@intel.com>	2021-08-20 09:46:09 +08:00
Zide Chen	2e6cf2b85b	hv: nested: fix bugs in init_vmx_msrs() Currently init_vmx_msrs() emulates same value for the IA32_VMX_xxx_CTLS and IA32_VMX_TRUE_xxx_CTLS MSRs. But the value of physical MSRs could be different between the pair, and we need to adjust the emulated value accordingly. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-20 09:40:50 +08:00
Zide Chen	ad37553873	hv: nested: redundant permission check on nested_vmentry() check_vmx_permission() is called in vmresume_vmexit_handler() and vmlaunch_vmexit_handler() already. Tracked-On: #6289 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-08-20 08:14:40 +08:00
Zhou, Wu	53f6720d13	HV: Combine the acpi loading function to one place Remove the acpi loading function from elf_loader, rawimage_loader and bzimage_loader, and call it together in vm_sw_loader. Now the vm_sw_loader's job is not just loading sw, so we rename it to prepare_os_image. Tracked-On: #6323 Signed-off-by: Zhou, Wu <wu.zhou@intel.com> Reviewed-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Victor Sun	3124018917	HV: vm_load: rename vboot_info.h to vboot.h vboot_info.h declares vm loader function also, so rename the file name to vboot.h; Tracked-On: #6323 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-19 20:00:45 +08:00
Fei Li	2e7491a8ec	hv: mmiodev: a minor bug fix about refine acrn_mmiodev data structure Rename base_hpa to host_pa in acrn_mmiodev data structure. Tracked-On: #6366 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-19 12:01:35 +08:00
Liu,Junming	2c5c8754de	hv: enable GVT-d for pre-launched Linux guest in logical partition mode When passing a GPU through to a pre-launched Linux guest, the GPU OpRegion must also be passed to the guest. The steps are: 1. Reserve a memory region in the ve820 table for the GPU OpRegion. 2. Build the EPT mapping for the GPU OpRegion to pass it through to the guest. 3. Emulate the PCI config register for the OpRegion. For the third step: the OpRegion address is located at PCI config space offset 0xFC. A normal Linux guest does not write this register, so it can be treated as read-only. When the guest reads this register, return the emulated value. When the guest writes this register, ignore the operation. Tracked-On: #6387 Signed-off-by: Liu,Junming <junming.liu@intel.com>	2021-08-19 11:56:26 +08:00
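The read-only emulation described in step 3 of the entry above can be sketched roughly as follows. This is a minimal illustration, not the ACRN implementation: the struct, the function names and the flat dword array are all hypothetical, and it assumes 4-byte aligned accesses within a 256-byte config space.

```c
#include <stdint.h>

/* Offset of the OpRegion address register, as stated in the commit message. */
#define OPREGION_CFG_OFFSET_SKETCH 0xFCU

/* Hypothetical virtual GPU config space; not an ACRN data structure. */
struct vgpu_cfg_sketch {
    uint32_t opregion_gpa;  /* emulated value returned for offset 0xFC */
    uint32_t regs[64];      /* 256-byte config space stored as dwords */
};

/* Reads of offset 0xFC return the emulated OpRegion address. */
uint32_t vgpu_cfg_read_sketch(const struct vgpu_cfg_sketch *dev, uint32_t offset)
{
    if (offset == OPREGION_CFG_OFFSET_SKETCH) {
        return dev->opregion_gpa;
    }
    return dev->regs[(offset & 0xFCU) / 4U];
}

/* Writes to offset 0xFC are silently ignored; other registers are stored. */
void vgpu_cfg_write_sketch(struct vgpu_cfg_sketch *dev, uint32_t offset, uint32_t val)
{
    if (offset == OPREGION_CFG_OFFSET_SKETCH) {
        return;
    }
    dev->regs[(offset & 0xFCU) / 4U] = val;
}
```

Treating the register as read-only keeps the guest from relocating the OpRegion away from the range reserved in the ve820 table.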
Fei Li	a705ff2dac	hv: relocate ACPI DATA address to 0x7fe00000 Relocate ACPI address to 0x7fe00000 and ACPI NVS to 0x7ff00000 correspondingly. In this case, we could include TPM event log region [0x7ffb0000, 0x80000000) into ACPI NVS. Tracked-On: #6320 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	74e68e39d1	hv: tpm2: do tpm2 fixup for security vm ACRN used to prepare the vTPM2 ACPI Table for a pre-launched VM at the build stage using config tools. This is OK if the TPM2 ACPI Table never changes. However, the TPM2 ACPI Table may change under some conditions, such as changing the BIOS configuration or updating the BIOS. This patch does a TPM2 fixup to update the vTPM2 ACPI Table and the TPM2 MMIO resource configuration according to the physical TPM2 ACPI Table. Tracked-On: #6366 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	f81b39225c	HV: refine acrn_mmiodev data structure 1. add a name field to indicate what the MMIO Device is. 2. add two more MMIO resource to the acrn_mmiodev data structure. Tracked-On: #6366 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-11 14:45:55 +08:00
Fei Li	20061b7c39	hv: remove xsave dependence ACRN could run without XSAVE Capability. So remove XSAVE dependence to support more (hardware or virtual) platforms. Tracked-On: #6287 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-10 16:36:15 +08:00
Fei Li	84235bf07c	hv: vtd: a minor refine about dmar_wait_completion Check whether condition is met before check whether time is out after iommu_read32. This is because iommu_read32 would cause time out on some virtual platform in spite of the current DMAR status meets the pre_condition. Tracked-On: #6371 Signed-off-by: Fei Li <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-08-10 16:36:15 +08:00
Tao Yuhong	171856c46b	hv: uc-lock: Fix #GP not being trapped If the HV enables #GP for uc-lock and is going to emulate guest uc-lock instructions, it must trap guest #GP. A guest uc-lock instruction triggers #GP, which causes a VM exit; the HV handles this VM exit and emulates the uc-lock instruction. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-08-09 15:33:12 +08:00
Kunhui-Li	578c18b962	config_tools: remove obsolete kconfig files Remove obsolete Kconfig files; Update Kconfig related README and error message. Tracked-On: #6315 Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>	2021-08-09 09:25:02 +08:00
Fei Li	481e9fba9d	hv: remove the constraint "MMU and EPT must both support large page or not" There're some virtual platform which doesn't meet this constraint. So remove this constraint. Tracked-On: #6329 Signed-off-by: Fei Li <fei1.li@intel.com>	2021-08-03 11:01:24 +08:00
Minggui Cao	80ae3224d9	hv: expose PMC to core partition VM for core partition VM (like RTVM), PMC is always used for performance profiling / tuning, so expose PMC capability and pass-through its MSRs to the VM. Tracked-On: #6307 Signed-off-by: Minggui Cao <minggui.cao@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-27 14:58:28 +08:00
Minggui Cao	eba8c4e78b	hv: use ARRAY_SIZE to calc local array size if one array just used in local only, and its size not used extern, use ARRAY_SIZE macro to calculate its size. Tracked-On: #6307 Signed-off-by: Minggui Cao <minggui.cao@intel.com> Reviewed-by: Junjie Mao <junjie.mao@intel.com>	2021-07-27 14:58:28 +08:00
Yifan Liu	69fef2e685	hv: debug: Add hv console callback to VM-exit event In some scenarios (e.g., nested) where lapic-pt is enabled for a vcpu running on a pcpu hosting console timer, the hv console will be inaccessible. This patch adds the console callback to every VM-exit event so that the console can still be somewhat functional under such circumstance. Since this is VM-exit driven, the VM-exit/second can be low in certain cases (e.g., idle or running stress workload). In extreme cases where the guest panics/hangs, there will be no VM-exits at all. In most cases, the shell is laggy but functional (probably enough for debugging purpose). Tracked-On: #6312 Signed-off-by: Yifan Liu <yifan1.liu@intel.com>	2021-07-22 10:08:23 +08:00
Tao Yuhong	8360c3dfe6	HV: enable #GP for UC lock For an atomic operation using bus locking, it would generate LOCK# bus signal, if it has Non-WB memory operand. This is an UC lock. It will ruin the RT behavior of the system. If MSR_IA32_CORE_CAPABILITIES[bit4] is 1, then CPU can trigger #GP for instructions which cause UC lock. This feature is controlled by MSR_TEST_CTL[bit28]. This patch enables #GP for guest UC lock. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	2aba7f31db	HV: rename splitlock file name Because the emulation code is for both split-lock and uc-lock, rename splitlock.c/splitlock.h to lock_instr_emul.c/lock_instr_emul.h Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	7926504011	HV: rename split-lock emulation APIs Because the emulation code is for both split-lock and uc-lock, Changed these API names: vcpu_kick_splitlock_emulation() -> vcpu_kick_lock_instr_emulation() vcpu_complete_splitlock_emulation() -> vcpu_complete_lock_instr_emulation() emulate_splitlock() -> emulate_lock_instr() Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	bbd7b7091b	HV: re-use split-lock emulation code for uc-lock Split-lock emulation can be re-used for uc-lock. emulate_splitlock() only works if the VM exit is for an #AC trap, the guest does not handle split-lock, and the HV enables #AC for split-lock. Add another condition so that emulate_splitlock() also works when the VM exit is for a #GP trap, the guest does not handle uc-lock, and the HV enables #GP for uc-lock. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-21 11:25:47 +08:00
Tao Yuhong	553d59644b	HV: Fix decode_instruction() triggering #UD when emulating UC-lock When ACRN uses decode_instruction() to emulate a split-lock/uc-lock instruction, it is actually a trial decode to see whether the instruction is XCHG. If the instruction is XCHG, ACRN must emulate it (injecting #PF if one is triggered) with peer vCPUs paused, and advance the guest IP. If the instruction is a LOCK-prefixed instruction accessing UC memory, ACRN halts the peer vCPUs, advances the IP to skip the LOCK prefix, and then lets the vCPU execute one instruction by enabling the IRQ window VM exit. In other cases, ACRN injects the exception back to the vCPU without emulating it. So change the API to decode_instruction(vcpu, bool full_decode). When full_decode is true, the API behaves as before. When full_decode is false, the difference is that if decode_instruction() meets an unknown instruction, it returns -1 and does not inject #UD. The caller can use this to tell that a #UD has been skipped and that #AC/#GP must be injected back. Tracked-On: #6299 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-07-21 11:25:47 +08:00
Shuo A Liu	6e0b12180c	hv: dm: Use new power management data structures struct cpu_px_data -> struct acrn_pstate_data struct cpu_cx_data -> struct acrn_cstate_data enum pm_cmd_type -> enum acrn_pm_cmd_type struct acpi_generic_address -> struct acrn_acpi_generic_address cpu_cx_data -> acrn_cstate_data cpu_px_data -> acrn_pstate_data IC_PM_GET_CPU_STATE -> ACRN_IOCTL_PM_GET_CPU_STATE PMCMD_GET_PX_CNT -> ACRN_PMCMD_GET_PX_CNT PMCMD_GET_CX_CNT -> ACRN_PMCMD_GET_CX_CNT PMCMD_GET_PX_DATA -> ACRN_PMCMD_GET_PX_DATA PMCMD_GET_CX_DATA -> ACRN_PMCMD_GET_CX_DATA Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	9c910bae44	hv: dm: Use new I/O request data structures struct vhm_request -> struct acrn_io_request union vhm_request_buffer -> struct acrn_io_request_buffer struct pio_request -> struct acrn_pio_request struct mmio_request -> struct acrn_mmio_request struct ioreq_notify -> struct acrn_ioreq_notify VHM_REQ_PIO_INVAL -> IOREQ_PIO_INVAL VHM_REQ_MMIO_INVAL -> IOREQ_MMIO_INVAL REQ_PORTIO -> ACRN_IOREQ_TYPE_PORTIO REQ_MMIO -> ACRN_IOREQ_TYPE_MMIO REQ_PCICFG -> ACRN_IOREQ_TYPE_PCICFG REQ_WP -> ACRN_IOREQ_TYPE_WP REQUEST_READ -> ACRN_IOREQ_DIR_READ REQUEST_WRITE -> ACRN_IOREQ_DIR_WRITE REQ_STATE_PROCESSING -> ACRN_IOREQ_STATE_PROCESSING REQ_STATE_PENDING -> ACRN_IOREQ_STATE_PENDING REQ_STATE_COMPLETE -> ACRN_IOREQ_STATE_COMPLETE REQ_STATE_FREE -> ACRN_IOREQ_STATE_FREE IC_CREATE_IOREQ_CLIENT -> ACRN_IOCTL_CREATE_IOREQ_CLIENT IC_DESTROY_IOREQ_CLIENT -> ACRN_IOCTL_DESTROY_IOREQ_CLIENT IC_ATTACH_IOREQ_CLIENT -> ACRN_IOCTL_ATTACH_IOREQ_CLIENT IC_NOTIFY_REQUEST_FINISH -> ACRN_IOCTL_NOTIFY_REQUEST_FINISH IC_CLEAR_VM_IOREQ -> ACRN_IOCTL_CLEAR_VM_IOREQ HYPERVISOR_CALLBACK_VHM_VECTOR -> HYPERVISOR_CALLBACK_HSM_VECTOR arch_fire_vhm_interrupt() -> arch_fire_hsm_interrupt() get_vhm_notification_vector() -> get_hsm_notification_vector() set_vhm_notification_vector() -> set_hsm_notification_vector() acrn_vhm_notification_vector -> acrn_hsm_notification_vector get_vhm_req_state() -> get_io_req_state() set_vhm_req_state() -> set_io_req_state() The structures below differ slightly from the former ones: struct acrn_ioreq_notify struct acrn_io_request Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	107cae316a	hv: dm: Use new ioctl ACRN_IOCTL_SET_VCPU_REGS struct acrn_set_vcpu_regs -> struct acrn_vcpu_regs struct acrn_vcpu_regs -> struct acrn_regs IC_SET_VCPU_REGS -> ACRN_IOCTL_SET_VCPU_REGS Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	f476ca55ab	hv: dm: Use new VM management ioctls IC_CREATE_VM -> ACRN_IOCTL_CREATE_VM IC_DESTROY_VM -> ACRN_IOCTL_DESTROY_VM IC_START_VM -> ACRN_IOCTL_START_VM IC_PAUSE_VM -> ACRN_IOCTL_PAUSE_VM IC_RESET_VM -> ACRN_IOCTL_RESET_VM struct acrn_create_vm -> struct acrn_vm_creation Tracked-On: #6282 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-07-15 11:53:54 +08:00
Shuo A Liu	9c1caad25a	hv: nested: Keep privilege bits sync in shadow EPT entry Guest may not use INVEPT instruction after enabling any of bits 2:0 from 0 to 1 of a present EPT entry, then the shadow EPT entry has no chance to sync guest EPT entry. According to the SDM, """ Software may use the INVEPT instruction after modifying a present EPT paging-structure entry (see Section 28.2.2) to change any of the privilege bits 2:0 from 0 to 1.1 Failure to do so may cause an EPT violation that would not otherwise occur. Because an EPT violation invalidates any mappings that would be used by the access that caused the EPT violation (see Section 28.3.3.1), an EPT violation will not recur if the original access is performed again, even if the INVEPT instruction is not executed. """ Sync the afterthought of privilege bits from guest EPT entry to shadow EPT entry to cover above case. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-02 09:24:12 +08:00
Shuo A Liu	a431cff94e	hv: Use 64 bits definition for 64 bits MSR_IA32_VMX_EPT_VPID_CAP operation MSR_IA32_VMX_EPT_VPID_CAP is 64 bits. Using 32 bits MACROs with it may cause the bit expression wrong. Unify the MSR_IA32_VMX_EPT_VPID_CAP operation with 64 bits definition. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-07-02 09:24:12 +08:00
Victor Sun	e371432695	HV: avoid pre-launched VM modules being corrupted by SOS kernel load When hypervisor boots, the multiboot modules have been loaded to host space by bootloader already. The space range of pre-launched VM modules is also exposed to SOS VM, so SOS VM kernel might pick this range to extract kernel when KASLR enabled. This would corrupt pre-launched VM modules and result in pre-launched VM boot fail. This patch will try to fix this issue. The SOS VM will not be loaded to guest space until all pre-launched VMs are loaded successfully. Tracked-On: #5879 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	268d4c3f3c	HV: boot guest with boot params Previously the load GPA of LaaG boot params like zeropage/cmdline and initgdt are all hard-coded, this would bring potential LaaG boot issues. The patch will try to fix this issue by finding a 32KB load_params memory block for LaaG to store these guest boot params. For other guest with raw image, in general only vgdt need to be cared of so the load_params will be put at 0x800 since it is a common place that most guests won't touch for entering protected mode. Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	ed97022646	HV: add find_space_from_ve820() api The API would search ve820 table and return a valid GPA when the requested size of memory is available in the specified memory range, or return INVALID_GPA if the requested memory slot is not available; Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
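The search behaviour that the find_space_from_ve820() entry above describes can be sketched like this. It is a simplified illustration under assumptions, not the ACRN code: the entry layout, the function name, the sentinel value and the 4 KiB alignment are all hypothetical stand-ins; only the E820 RAM type value of 1 is standard.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_TYPE_RAM_SKETCH 1U          /* standard E820 "usable RAM" type */
#define INVALID_GPA_SKETCH   UINT64_MAX  /* assumed sentinel, not ACRN's value */
#define PAGE_MASK_SKETCH     0xFFFULL    /* assumes 4 KiB page alignment */

/* Hypothetical ve820 entry layout used only for this sketch. */
struct e820_entry_sketch {
    uint64_t baseaddr;
    uint64_t length;
    uint32_t type;
};

/*
 * Return the first page-aligned GPA inside [min_addr, max_addr) where `size`
 * bytes of usable RAM are available, or INVALID_GPA_SKETCH if none exists.
 */
uint64_t find_space_sketch(const struct e820_entry_sketch *tbl, size_t count,
                           uint64_t size, uint64_t min_addr, uint64_t max_addr)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t start, end;

        if (tbl[i].type != E820_TYPE_RAM_SKETCH) {
            continue;
        }
        /* Clip the entry to the requested window. */
        start = (tbl[i].baseaddr > min_addr) ? tbl[i].baseaddr : min_addr;
        end = tbl[i].baseaddr + tbl[i].length;
        if (end > max_addr) {
            end = max_addr;
        }
        /* Round the candidate start up to a page boundary. */
        start = (start + PAGE_MASK_SKETCH) & ~PAGE_MASK_SKETCH;
        if ((end > start) && ((end - start) >= size)) {
            return start;
        }
    }
    return INVALID_GPA_SKETCH;
}
```

A caller would check the result against the sentinel before using it, for example when looking for a 32KB block to hold guest boot parameters.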
Victor Sun	6127c0c5d2	HV: modify low 1MB area for pre-launched VM e820 The memory range [0xA0000, 0xFFFFF] is a known reserved area for the BIOS; the Linux kernel enforces this area as reserved during its boot stage. Marking this area as usable could cause compatibility issues. This patch sets the range to the reserved type to make it consistent with real hardware. Note that on real hardware there would be an EBDA (Extended BIOS Data Area) of reserved type right before 0xA0000 for non-EFI boot. But given that ACRN has no legacy BIOS emulation, the EBDA is simply skipped in the vE820. Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	9dfac7a7a3	HV: init hv_e820 from efi mmap if boot from uefi Hypervisor use e820_alloc_memory() api to allocate memory for trampoline code and ept pages, whereas the usable ram in hv_e820 might include efi boot service region if system boot from uefi environment, this would result in some uefi service broken in SOS. These boot service region should be filtered from hv_e820. This patch will parse the efi memory descriptor entries info from efi memory map pointer when system boot from uefi environment, and then initialize hv_e820 accordingly, that all efi boot service region would be kept as reserved in hv_e820. Please note the original efi memory map could be above 4GB address space, so the efi memory parsing process must be done after enable_paging(). Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	4e1deab3d9	HV: init paging before init e820 With this patch, the hv_e820 will be initialized after enable paging. This is because the hv_e820 will be initialized from efi mmap when system boot from uefi, which the efi mmap could be above 4G space. Tracked-On: #5626 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	82a1d4406c	HV: modularization: use abi_mmap struct in acrn boot info Use more generic abi_mmap struct to replace multiboot_mmap struct in acrn_boot_info; Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	484d3ec9df	HV: modularization: use cmdline char array in acrn boot info The name of mi_cmdline in acrn_boot_info structure would cause confusion with mi_cmdline in multiboot_info structure, rename it to cmdline. At the same time, the data type is changed from pointer to array to avoid accessing the original multiboot info region which might be used by other software modules. Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	b11dfb6f20	HV: modularization: add boot.c to wrap multiboot module Add a wrapper API init_acrn_boot_info() so that it could be used to boot ACRN with any boot protocol; Another change is change term of multiboot1 to multiboot because there is no such term officially; Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	28b7cee412	HV: modularization: rename multiboot.h to boot.h Given the structure in multiboot.h could be used for any boot protocol, use a more generic name "boot.h" instead; Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	e8f726e321	HV: modularization: remove mi_flags from acrn boot info The mi_flags is not needed any more so remove it from acrn_boot_info struct; Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Victor Sun	8f24d91108	HV: modularization: name change on acrn_multiboot_info The acrn_multiboot_info structure stores acrn specific boot info and should not be limited to support multiboot protocol related structure only. This patch only do below changes: 1. change name of acrn_multiboot_info to acrn_boot_info; 2. change name of mbi to abi because of the change in 1, also the naming might bring confusion with native multiboot info; Tracked-On: #5661 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-06-11 10:06:02 +08:00
Shuo A Liu	9ae32f96af	hv: Wrap same code as a static function vmptrld_vmexit_handler() has a same code snippet with vmclear_vmexit_handler(). Wrap the same code snippet as a static function clear_vmcs02(). There is only a small logic change that add nested->current_vmcs12_ptr = INVALID_GPA in vmptrld_vmexit_handler() for the old VMCS. That's reasonable. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-09 10:07:05 +08:00
Shuo A Liu	387ea23961	hv: Rename get_ept_entry() to get_eptp() get_ept_entry() actually returns the EPTP of a VM. So rename it to get_eptp() for readability. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-09 10:07:05 +08:00
Zide Chen	b6b5373818	hv: deny access to HV owned legacy PIO UART from SOS We need to deny accesses from SOS to the HV owned UART device, otherwise SOS could have direct access to this physical device and mess up the HV console. If ACRN debug UART is configured as PIO based, For example, CONFIG_SERIAL_PIO_BASE is generated from acrn-config tool, or the UART config is overwritten by hypervisor parameter "uart=port@<port address>", it could run into problem if ACRN doesn't emulate this UART PIO port to SOS. For example: - none of the ACRN emulated vUART devices has same PIO port with the port of the debug UART device. - ACRN emulates PCI vUART for SOS (configure "console_vuart" with PCI_VUART in the scenario configuration) This patch fixes the above issue by masking PIO accesses from SOS. deny_hv_owned_devices() is moved after setup_io_bitmap() where vm->arch_vm.io_bitmap is initialized. Commit `50d852561` ("HV: deny HV owned PCI bar access from SOS") handles the case that ACRN debug UART is configured as a PCI device. e.g., hypervisor parameter "uart=bdf@<BDF value>" is appended. If the hypervisor debug UART is MMIO based, need to configured it as a PCI type device, so that it can be hidden from SOS. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-08 16:16:14 +08:00
Yonghua Huang	25c0e3817e	hv: validate input for dmar_free_irte function Malicious input 'index' may trigger buffer overflow on array 'irte_alloc_bitmap[]'. This patch validate that 'index' shall be less than 'CONFIG_MAX_IR_ENTRIES' and also remove unnecessary check on 'index' in 'ptirq_free_irte()' function with this fix. Tracked-On: #6132 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2021-06-08 09:03:10 +08:00
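The bounds check described in the dmar_free_irte entry above amounts to rejecting an out-of-range index before touching the bitmap. A minimal sketch, assuming a 256-entry table and a plain array of 64-bit words (both hypothetical; the real code uses CONFIG_MAX_IR_ENTRIES and ACRN's own bitmap helpers):

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_IR_ENTRIES_SKETCH 256U  /* stand-in for CONFIG_MAX_IR_ENTRIES */

/* Hypothetical allocation bitmap: one bit per interrupt remapping entry. */
uint64_t irte_bitmap_sketch[MAX_IR_ENTRIES_SKETCH / 64U];

/* Clear the allocation bit for `index`; refuse any index outside the table. */
bool free_irte_sketch(uint16_t index)
{
    if (index >= MAX_IR_ENTRIES_SKETCH) {
        return false;  /* would otherwise write past the end of the bitmap */
    }
    irte_bitmap_sketch[index / 64U] &= ~(1ULL << (index % 64U));
    return true;
}
```

Placing the check inside the free routine itself is what allows callers such as ptirq_free_irte() to drop their own duplicate check.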
Yonghua Huang	4acaeb91bd	hv: remove unnecessary ASSERT in vlapic_write vlapic_write handle 'offset' that is valid and ignore all other invalid 'offset'. so ASSERT on this 'offset' input is unnecessary. This patch removes above ASSERT to avoid potential hypervisor crash by guest malicious input when debug build is used. Tracked-On: #6131 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2021-06-08 09:03:10 +08:00
Shuo A Liu	15e6c5b9cf	hv: nested: audit guest EPT mapping during shadow EPT entries setup generate_shadow_ept_entry() didn't verify the correctness of the requested guest EPT mapping. That might leak host memory access to L2 VM. To simplify the implementation of the guest EPT audit, hide capabilities 'map 2-Mbyte page' and 'map 1-Gbyte page' from L1 VM. In addition, minimize the attribute bits of EPT entry when create a shadow EPT entry. Also, for invalid requested mapping address, reflect the EPT_VIOLATION to L1 VM. Here, we have some TODOs: 1) Enable large page support in generate_shadow_ept_entry() 2) Evaluate if need to emulate the invalid GPA access of L2 in HV directly. 3) Minimize EPT entry attributes. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	3110e70d0a	hv: nested: INVEPT emulation supports shadow EPT The L1 VM changes the guest EPT and executes INVEPT to invalidate the previously cached EPT translations. The shadow EPT relies on the INVEPT instruction to perform the update. The target shadow EPTs are found according to the 'type' of INVEPT. There are two types, each with its target shadow EPTs: 1) Single-context invalidation: get the EPTP from the INVEPT descriptor, then find the target shadow EPT. 2) Global invalidation: all shadow EPTs of the L1 VM. The INVEPT emulation handler invalidates all the EPT entries of the target shadow EPTs. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	1dc7b7f798	hv: nested: Introduce shadow EPT release function When a shadow EPT is no longer used, its resources need to be released. free_sept_table() is introduced to walk the whole shadow EPT table and free the page table pages. Note that the PML4E page of the shadow EPT is not freed by free_sept_table(), as it is still used to present a shadow EPT pointer. Tracked-On: #5923 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	b10b5658bd	hv: nested: Introduce L2 VM EPT VIOLATION handler With shadow EPT, the hypervisor walks through guest EPT table: * If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION to L1 VM and resumes to L1 VM. * If the entry is present in guest EPT, do the EPT_MISCONFIG check. Inject EPT_MISCONFIG to L1 VM if the check failed. * If the entry is present in guest EPT, do permission check. Reflect EPT_VIOLATION to L1 VM if the check failed. * If the entry is present in guest EPT but shadow EPT entry is not present, create the shadow entry and resumes to L2 VM. * If the entry is present in guest EPT but the GPA in the entry is invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	8565750bbe	hv: nested: Hide some capability bits from L1 guest * Hide the 5-level EPT capability so that the L1 guest sticks to 4-level EPT. * Access/Dirty bits are not supported currently, so hide the corresponding EPT capability bits. * "Mode-based execute control for EPT" is also not well supported currently, so hide its capability bit from MSR_IA32_VMX_PROCBASED_CTLS2. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	540a484147	hv: nested: Manage shadow EPTP according to guest VMCS change 'struct nept_desc' is used to associate guest EPTP with a shadow EPTP. It's created in the first reference and be freed while no reference. The life cycle seems like, While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc' of the new guest EPTP is referenced; the 'struct nept_desc' of the old guest EPTP is dereferenced. While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc' of the old guest EPTP is dereferenced. While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc' of the old guest EPTP is dereferenced. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Shuo A Liu	10ec896f99	hv: nested: Introduce shadow EPT infrastructure To shadow guest EPT, the hypervisor needs to construct a shadow EPT for each guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP (EPT pointer). This patch provides the following structure to do the association. struct nept_desc { /* * A shadow EPTP. * The format is same with 'EPT pointer' in VMCS. * Its PML4 address field is a HVA of the hypervisor. */ uint64_t shadow_eptp; /* * An guest EPTP configured by L1 VM. * The format is same with 'EPT pointer' in VMCS. * Its PML4 address field is a GPA of the L1 VM. */ uint64_t guest_eptp; uint32_t ref_count; }; Because the hypervisor lacks dynamic memory allocation, an array nept_bucket of type 'struct nept_desc' is introduced to store the association information. A guest EPT might be shared between different L2 vCPUs, so this patch provides several functions to handle references to the structure. The interface get_shadow_eptp() is also introduced, to find the shadow EPTP of a specified guest EPTP. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
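The reference-counted association between a guest EPTP and a shadow EPTP described in the entry above might look roughly like the following. This is only a sketch: the bucket size, the helper names and the placeholder shadow EPTP value are hypothetical, and the real code would allocate a PML4 page for the shadow EPT and take a lock around the bucket.

```c
#include <stddef.h>
#include <stdint.h>

#define NEPT_BUCKET_SIZE_SKETCH 4U  /* assumed size, for illustration only */

/* Same three fields as the struct quoted in the commit message. */
struct nept_desc_sketch {
    uint64_t shadow_eptp;
    uint64_t guest_eptp;
    uint32_t ref_count;
};

/* Static array instead of dynamic allocation, as the commit describes. */
struct nept_desc_sketch nept_bucket_sketch[NEPT_BUCKET_SIZE_SKETCH];

/* Take a reference: reuse the descriptor for guest_eptp or claim a free slot. */
struct nept_desc_sketch *get_nept_desc_sketch(uint64_t guest_eptp)
{
    struct nept_desc_sketch *free_slot = NULL;

    for (uint32_t i = 0U; i < NEPT_BUCKET_SIZE_SKETCH; i++) {
        struct nept_desc_sketch *d = &nept_bucket_sketch[i];

        if ((d->ref_count != 0U) && (d->guest_eptp == guest_eptp)) {
            d->ref_count++;
            return d;
        }
        if ((d->ref_count == 0U) && (free_slot == NULL)) {
            free_slot = d;
        }
    }
    if (free_slot != NULL) {
        free_slot->guest_eptp = guest_eptp;
        /* Placeholder value standing in for a real shadow PML4 address. */
        free_slot->shadow_eptp =
            0x1000ULL * (uint64_t)((free_slot - nept_bucket_sketch) + 1);
        free_slot->ref_count = 1U;
    }
    return free_slot;  /* NULL when the bucket is full */
}

/* Drop a reference; the slot becomes free when the count reaches zero. */
void put_nept_desc_sketch(struct nept_desc_sketch *d)
{
    if ((d != NULL) && (d->ref_count != 0U)) {
        d->ref_count--;
        if (d->ref_count == 0U) {
            d->shadow_eptp = 0ULL;
            d->guest_eptp = 0ULL;
        }
    }
}

/* Look up the shadow EPTP for a guest EPTP; 0 means no live association. */
uint64_t get_shadow_eptp_sketch(uint64_t guest_eptp)
{
    for (uint32_t i = 0U; i < NEPT_BUCKET_SIZE_SKETCH; i++) {
        const struct nept_desc_sketch *d = &nept_bucket_sketch[i];

        if ((d->ref_count != 0U) && (d->guest_eptp == guest_eptp)) {
            return d->shadow_eptp;
        }
    }
    return 0ULL;
}
```

Counting references lets several L2 vCPUs share one guest EPT while the shadow copy is released only after the last user drops it.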
Shuo A Liu	17bc7f08c9	hv: nested: Create a page pool for shadow EPT construction Shadow EPT uses many pages to construct the shadow page table. To use memory more efficiently, a page pool, sept_page_pool, is introduced. For simplicity, the total platform RAM size is used to calculate the memory needed for shadow page tables. This is not an accurate upper bound, but it satisfies typical use cases where there is little overcommitment and sharing of memory between L2 VMs. Memory for the pool is marked as reserved in the E820 table at an early stage. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-04 13:53:47 +08:00
Zide Chen	811e367ad9	hv: nested: implement nested VM exit handler Nested VM exits happen when vCPU is in guest mode (VMCS02 is current). Initially we reflect all nested VM exits to L1 hypervisor. To prepare the environment to run L1 guest: - restore some VMCS fields to the value as what L1 hypervisor programmed. - VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing. - load the non-shadowing host states from VMCS12 to VMCS01 guest states. - VMRESUME to L1 guest with this modified VMCS01. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Alexander Merritt <alex.merritt@intel.com>	2021-06-03 15:23:25 +08:00
Zide Chen	22d225663f	hv: nested: update run_vcpu() function for nested case Since L2 guest vCPU mode and VPID are managed by L1 hypervisor, so we can skip these handling in run_vcpu(). And be careful that we can't cache L2 registers in struct acrn_vcpu. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-03 15:23:25 +08:00
Zide Chen	4acc65eacc	hv: nested: support for INVEPT and INVVPID emulation invvpid and invept instructions cause VM exits unconditionally. For initial support, we pass all the instruction operands as is to the pCPU. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-03 15:23:25 +08:00
Zide Chen	4c29a0bb29	hv: nested: support for VMLAUNCH and VMRESUME emulation Implement the VMLAUNCH and VMRESUME instructions, allowing a L1 hypervisor to run nested guests. - merge VMCS control fields and VMCS guest fields to VMCS02 - clear shadow VMCS indicator on VMCS02 and load VMCS02 as current - set VMCS12 launch state to "launched" in VMLAUNCH handler Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Alex Merritt <alex.merritt@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-06-03 15:23:25 +08:00
Yonghua Huang	1a6ead9af5	hv: update RTCT ACPI table detecting Signature of RTCT ACPI table maybe "PTCT"(v1) or "RTCT"(v2). and the MAGIC number in CRL header is also changed from "PTCM" to "RTCM". This patch refine the code to detect RTCT table for both v1 and v2. Tracked-On: #6020 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2021-06-01 08:22:20 +08:00
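Detecting either table signature, as the RTCT entry above describes, can be illustrated with a small helper. The function name and the returned version numbers are hypothetical conventions for this sketch; only the two 4-byte signatures come from the commit message.

```c
#include <string.h>

/*
 * Classify a 4-byte ACPI table signature.
 * Returns 1 for "PTCT" (v1), 2 for "RTCT" (v2), and 0 for anything else.
 */
int rtct_version_sketch(const char *sig)
{
    if (memcmp(sig, "PTCT", 4) == 0) {
        return 1;
    }
    if (memcmp(sig, "RTCT", 4) == 0) {
        return 2;
    }
    return 0;
}
```

The same pattern would apply to the "PTCM" versus "RTCM" magic number in the CRL header.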
Li Fei1	0d5f12e281	hv: vlapic: a minor refine about vlapic_x2apic_pt_icr_access In physical destination mode, the destination processor is specified by its local APIC ID. When a CPU switch xAPIC Mode to x2APIC Mode or vice versa, the local APIC ID is not changed. So a vcpu in x2APIC Mode could use physical Destination Mode to send an IPI to another vcpu in xAPIC Mode by writing ICR. This patch adds support for a vCPU A could write ICR to send IPI to another vCPU B which is in different APIC mode. Tracked-On: #5923 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-05-27 09:00:08 +08:00
dongshen	5e3c6ae941	hv: vcpuid: passthrough host CPUID leaf.0BH to guest VMs Using physical APIC IDs as vLAPIC IDs for pre-Launched and post-launched VMs is not sufficient to replicate the host CPU and cache topologies in guest VMs, we also need to passthrough host CPUID leaf.0BH to guest VMs, otherwise, guest VMs may see weird CPU topology. Note that in current code, ACRN has already passthroughed host cache CPUID leaf 04H to guest VMs Tracked-On: #6020 Reviewed-by: Wang, Yu1 <yu1.wang@intel.com> Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-05-26 11:23:06 +08:00
dongshen	f332ef15b2	hv: vlapic: use physical APIC IDs as vLAPIC IDs for pre-launched and post-launched VMs In current code, ACRN uses physical APIC IDs as vLAPIC IDs for SOS, and vCPU ids (contiguous) as vLAPIC IDs for pre-Launched and post-Launched VMs. Using vCPU ids as vLAPIC IDs for pre-Launched and post-Launched VMs would result in wrong CPU and cache topologies showing in the guest VMs, and could adversely affect performance if the guest VM chooses to detect CPU and cache topologies and optimize its behavior accordingly. Uses physical APIC IDs as vLAPIC IDs (and related CPU/cache topology enumeration CPUIDs passthrough) will replicate the host CPU and cache topologies in pre-Launched and post-Launched VMs. Tracked-On: #6020 Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-05-26 11:23:06 +08:00
Zide Chen	6428ca8f5b	hv: VMPTRLD and VMCLEAR VMCS with the common APIs Remove the direct calls to exec_vmptrld() or exec_vmclear(), and replace with the wrapper APIs load_va_vmcs() and clear_va_vmcs(). Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-05-26 11:22:26 +08:00
Zide Chen	6d69058a9d	hv: nested: support for VMREAD and VMWRITE emulation This patch implements the VMREAD and VMWRITE instructions. When L1 guest is running with an active VMCS12, the “VMCS shadowing” VM-execution control is always set to 1 in VMCS01. Thus the possible behavior of VMREAD or VMWRITE from L1 could be: - It causes a VM exit to L0 if the bit corresponds to the target VMCS field in the VMREAD bitmap or VMWRITE bitmap is set to 1. - It accesses the VMCS referenced by VMCS01 link pointer (VMCS02 in our case) if the above mentioned bit is set to 0. This patch handles the VMREAD and VMWRITE VM exits in this way: - on VMWRITE, it writes the desired VMCS value to the respective field in the cached VMCS12. For VMCS fields that need to be synced to VMCS02, sets the corresponding dirty flag. - on VMREAD, it reads the desired VMCS value from the cached VMCS12. Tracked-On: #5923 Signed-off-by: Alex Merritt <alex.merritt@intel.com> Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	2bd269c11c	hv: nested: support for VMCLEAR emulation This patch emulates the VMCLEAR instruction. L1 hypervisor issues VMCLEAR on a VMCS12 whose state could be any of these: active and current, active but not current, not yet VMPTRLDed. To emulate the VMCLEAR instruction, ACRN sets the VMCS12 launch state to "clear", and if L0 has already cached this VMCS12, it needs to sync it back to guest memory: - sync shadow fields from shadow VMCS to cache VMCS12 - copy cache VMCS12 to L1 guest memory Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	5379b14108	hv: nested: define VMCS shadow fields Enable VMCS shadowing for most of the VMCS fields, so that execution of the VMREAD or VMWRITE on these shadow VMCS fields from L1 hypervisor won't cause VM exits, but read from or write to the shadow VMCS. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Alexander Merritt <alex.merritt@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	863e58e539	hv: nested: define software layout for VMCS12 and helper functions Software layout of VMCS12 data is a contract between L1 guest and L0 hypervisor to run an L2 guest. ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor by the VMPTRLD instruction. At the time of VMCLEAR, ACRN syncs the cached VMCS12 back to L1 guest memory. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	f5744174b5	hv: nested: support for VMPTRLD emulation This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches the VMCS12 that is passed down from the VMPTRLD instruction, and merges it with VMCS01 to create VMCS02 to run the nested VM. - Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to flush active but not current VMCS12s to L1 guest. - ACRN creates VMCS02 to run nested VM based on VMCS12: 1) copy VMCS12 from guest memory to the per vCPU cache VMCS12 2) initialize VMCS02 revision ID and host-state area 3) load shadow fields from cache VMCS12 to VMCS02 4) enable VMCS shadowing before L1 VM entry Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	0a1ac2f4a0	hv: nested: support for VMXOFF emulation This patch implements the VMXOFF instruction. By issuing VMXOFF, L1 guest Leaves VMX Operation. - cleanup VCPU nested virtualization context states in VMXOFF handler. - implement check_vmx_permission() to check permission for VMX operation for VMXOFF and other VMX instructions. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	3fdad3c6d1	hv: nested: check prerequisites to enter VMX operation According to VMXON Instruction Reference, do the following checks in the virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID in VMXON region, etc. Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD exception if L1 hypervisor is not running in 64-bit mode. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-05-24 10:34:01 +08:00
Zide Chen	fc8f07e740	hv: nested: support for VMXON emulation This patch emulates VMXON instruction. Basically checks some prerequisites to enable VMX operation on L1 guest (next patch), and prepares some virtual hardware environment in L0. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2021-05-24 10:34:01 +08:00
Tao Yuhong	b93d6b2ef0	HV: Fix mistaken use of stac() & clac() Commit `2ab70f43e5` ("HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV") uses stac()/clac() incorrectly. Tracked-On: #6020 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-05-24 10:32:54 +08:00
Li Fei1	6077152c4b	hv: vlapic: extend vlapic_x2apic_pt_icr_access to support more destination mode Now guest would use `Destination Shorthand` to broadcast IPIs if there're more than one destination. However, it is not supported when the guest is in LAPIC passthru situation, and all active VCPUs are working in X2APIC mode. As a result, the guest would not work properly since this kind broadcast IPIs was ignored by ACRN. What's worse, ACRN Hypervisor would inject GP to the guest in this case. This patch extend vlapic_x2apic_pt_icr_access to support more destination modes (both `Physical` and `Logical`) and destination shorthand (`No Shorthand`, `Self`, `All Including Self` and `All Excluding Self`). Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-24 10:27:32 +08:00
Li Fei1	a69e67b58b	hv: vlapic: wrap a function to calculate destination vcpu mask by shorthand 1. Rename vlapic_calc_dest to vlapic_calc_dest_noshort 2. Remove vlapic_calc_dest_lapic_pt, use vlapic_calc_dest_noshort instead 3. Wrap vlapic_calc_dest to calculate destination vcpu mask according shorthand Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-24 10:27:32 +08:00
Tao Yuhong	2ab70f43e5	HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV The access right of HV RAM can be changed to PAGE_USER (e.g. trusty RAM of a post-launched VM). So before using clflush (or clflushopt) to flush the HV RAM cache, the hypervisor must allow explicit supervisor-mode data accesses to user-mode pages. Otherwise, it may trigger a page fault. Tracked-On: #6020 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>	2021-05-21 09:20:46 +08:00
Liang Yi	6805510d77	hv/mod_timer: refine timer interface 1. do not allow external modules to touch internal field of a timer. 2. make timer mode internal, period_in_ticks will decide the mode. API wise: 1. the "mode" parameter was taken out of initialize_timer(). 2. a new function update_timer() was added to update the timeout and period fields. 3. the timer_expired() function was extended with an output parameter to return the remaining cycles before expiration. Also, the "fire_tsc" field name of hv_timer was renamed to "timeout". With the new API, however, this change should not concern user code. Tracked-On: #5920 Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-05-18 16:43:28 +08:00
Liang Yi	3547c9cd23	hv/mod_timer: make timer into an arch-independent module x86/timer.[ch] was moved to the common directory largely unchanged. x86 specific code now resides in x86/tsc_deadline_timer.c and its interface was defined in hw/hw_timer.h. The interface defines two functions: init_hw_timer() and set_hw_timeout() that provides HW specific initialization and timer interrupt source. Other than these two functions, the timer module is largely arch agnostic. Tracked-On: #5920 Signed-off-by: Rong Liu <rong2.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-05-18 16:43:28 +08:00
Liang Yi	51204a8d11	hv/mod_timer: separate delay functions from the timer module Modules that use udelay() should include "delay.h" explicitly. Tracked-On: #5920 Signed-off-by: Rong Liu <rong2.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-05-18 16:43:28 +08:00
Liang Yi	5a2b89b0a4	hv/mod_timer: split tsc handling code from timer. Generalize and split basic cpu cycle/tick routines from x86/timer: - Instead of rdtsc(), use cpu_ticks() in generic code. - Instead of get_tsc_khz(), use cpu_tickrate() in generic code. - Include "common/ticks.h" instead of "x86/timer.h" in generic code. - CYCLES_PER_MS is renamed to TICKS_PER_MS. The x86 specific API rdtsc() and get_tsc_khz(), as well as TSC_PER_MS are still available in arch/x86/tsc.h but only for x86 specific usage. Tracked-On: #5920 Signed-off-by: Rong Liu <rong2.liu@intel.com> Signed-off-by: Yi Liang <yi.liang@intel.com>	2021-05-18 16:43:28 +08:00
Yonghua Huang	00b3a28d5d	hv: update RTCT parser to support RTCT version 2 RTCT has been updated to version 2, this patch updates hypervisor RTCT parser to support both version 1 and version 2 of RTCT. Tracked-On: #6020 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>	2021-05-17 17:19:11 +08:00
Yonghua Huang	9facbb43b3	config-tool: rename PSRAM to SSRAM 'psram' and 'PSRAM' are legacy names and replaced with 'ssram' and 'SSRAM' respectively. Tracked-On: #6012 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Shuang Zheng <shuang.zheng@intel.com>	2021-05-17 14:31:42 +08:00
Zide Chen	c9982e8c7e	hv: nested: setup emulated VMX MSRs We emulated these MSRs: - MSR_IA32_VMX_PINBASED_CTLS - MSR_IA32_VMX_PROCBASED_CTLS - MSR_IA32_VMX_PROCBASED_CTLS2 - MSR_IA32_VMX_EXIT_CTLS - MSR_IA32_VMX_ENTRY_CTLS - MSR_IA32_VMX_BASIC: emulate VMCS revision ID, etc. - MSR_IA32_VMX_MISC For the following MSRs, we pass through the physical value to L1 guests: - MSR_IA32_VMX_EPT_VPID_CAP - MSR_IA32_VMX_VMCS_ENUM - MSR_IA32_VMX_CR0_FIXED0 - MSR_IA32_VMX_CR0_FIXED1 - MSR_IA32_VMX_CR4_FIXED0 - MSR_IA32_VMX_CR4_FIXED1 Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-16 19:05:21 +08:00
Zide Chen	4930992118	hv: nested: implement the framework for VMX MSR emulation Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to L1 guests if nested virtualization is enabled. - If CONFIG_NVMX_ENABLED is set, these MSRs are included in emulated_guest_msrs[]. - otherwise, they are included in unsupported_msrs[]. In this way we can take advantage of the existing infrastructure to emulate these MSRs. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-16 19:05:21 +08:00
Zide Chen	97df220f49	hv: vmsr: emulate IA32_FEATURE_CONTROL MSR for nested virtualization In order to support nested virtualization, need to expose the "Enable VMX outside SMX operation" bit to L1 hypervisor. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-16 19:05:21 +08:00
Yonghua Huang	e9870893a3	hv: rename some software SRAM local names For simplification purpose, use 'ssram' instead of 'software sram' for local names inside rtcm module. Tracked-On: #6015 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-16 10:08:17 +08:00
Li Fei1	30febed0e1	hv: cache: wrap common APIs Wrap three common Cache APIs: - flush_invalidate_all_cache - flush_cacheline - flush_cache_range Tracked-On: #5830 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-05-14 09:18:00 +08:00
Li Fei1	77e64f6092	hv: tlb: wrap common APIs Wrap two common TLB APIs: flush_tlb and flush_tlb_range. Tracked-On: #5830 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-05-14 09:18:00 +08:00
Li Fei1	d94582389e	hv: mmu: move arch specific parts into cpu.h Move Cache/TLB arch specific parts into cpu.h After this change, we should not expose arch specific parts out from mmu.h Tracked-On: #5830 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-05-14 09:18:00 +08:00
Li Fei1	d6362b6e0a	hv: paging: rename ppt_set/clear_ATTR to set_paging_ATTR Rename ppt_set/clear_(attribute) to set_paging_(attribute) Tracked-On: #5830 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-05-14 09:18:00 +08:00
Zide Chen	ccfdf9cdd7	hv: nested: enable nested virtualization Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set: - move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS so that CR4_VMXE is removed from cr4_reserved_bits_mask. - force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE is able to be set. Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED is set. Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests. Constraints on ACRN VM. - LAPIC passthrough should be enabled. - use SCHED_NOOP scheduler. Tracked-On: #5923 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-13 16:16:30 +08:00
Zide Chen	dd90eccc25	hv: move invvpid and invept helper code from mmu.c to mmu.h moving invvpid and invept helper code from mmu.c to mmu.h, so that they can be accessed by the nested virtualization code. No logical changes. Tracked-On: #5923 Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-13 16:16:30 +08:00
Shuo A Liu	3fffa68665	hv: Support WAITPKG instructions in guest VM TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2 is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2. Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So load corresponding vMSR value during context switch in of a vCPU. Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the "RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't cause a VM exit. Performance impact: MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles; MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles. Tracked-On: #6006 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>	2021-05-13 14:19:50 +08:00
dongshen	ebadf00de8	hv: some coding style fixes Fix issues reported by checkpatch.pl Tracked-On: #5917 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2021-05-12 16:50:34 +08:00
Junjie Mao	ea4eadf0a5	hv: hypercalls: refactor permission-checking and dispatching logic The current permission-checking and dispatching mechanism of hypercalls is not unified because: 1. Some hypercalls require the exact vCPU initiating the call, while the others only need to know the VM. 2. Different hypercalls have different permission requirements: the trusty-related ones are enabled by a guest flag, while the others require the initiating VM to be the Service OS. Without a unified logic it could be hard to scale when more kinds of hypercalls are added later. The objectives of this patch are as follows. 1. All hypercalls have the same prototype and are dispatched by a unified logic. 2. Permissions are checked by a unified logic without consulting the hypercall ID. To achieve the first objective, this patch modifies the type of the first parameter of hcall_* functions (which are the callbacks implementing the hypercalls) from `struct acrn_vm ` to `struct acrn_vcpu `. The doxygen-style documentations are updated accordingly. To achieve the second objective, this patch adds to `struct hc_dispatch` a `permission_flags` field which specifies the guest flags that must ALL be set for a VM to be able to invoke the hypercall. The default value (which is 0UL) indicates that this hypercall is for SOS only. Currently only the `permission_flag` of trusty-related hypercalls have the non-zero value GUEST_FLAG_SECURE_WORLD_ENABLED. With `permission_flag`, the permission checking logic of hypercalls is unified as follows. 1. General checks i. If the VM is neither SOS nor having any guest flag that allows certain hypercalls, it gets #UD upon executing the `vmcall` instruction. ii. If the VM is allowed to execute the `vmcall` instruction, but attempts to execute it in ring 1, 2 or 3, the VM gets #GP(0). 2. Hypercall-specific checks i. If the hypercall is for SOS (i.e. `permission_flag` is 0), the initiating VM must be SOS and the specified target VM cannot be a pre-launched VM. 
Otherwise the hypercall returns -EINVAL without further actions. ii. If the hypercall requires certain guest flags, the initiating VM must have all the required flags. Otherwise the hypercall returns -EINVAL without further actions. iii. A hypercall with an unknown hypercall ID makes the hypercall returns -EINVAL without further actions. The logic above is different from the current implementation in the following aspects. 1. A pre-launched VM now gets #UD (rather than #GP(0)) when it attempts to execute `vmcall` in ring 1, 2 or 3. 2. A pre-launched VM now gets #UD (rather than the return value -EPERM) when it attempts to execute a trusty hypercall in ring 0. 3. The SOS now gets the return value -EINVAL (rather than -EPERM) when it attempts to invoke a trusty hypercall. 4. A post-launched VM with trusty support now gets the return value -EINVAL (rather than #UD) when it attempts to invoke a non-trusty hypercall or an invalid hypercall. v1 -> v2: - Update documentation that describe hypercall behavior. - Fix Doxygen warnings Tracked-On: #5924 Signed-off-by: Junjie Mao <junjie.mao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-05-12 13:43:41 +08:00
Liang Yi	688a41c290	hv: mod: do not use explicit arch name when including headers Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>". In other words, we are adopting the same practice in Linux kernel. Tracked-On: #5920 Signed-off-by: Liang Yi <yi.liang@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2021-05-08 11:15:46 +08:00
Li Fei1	f3327364c3	hv: mmu: fix a minor bug We should only map [low32_max_ram, 4G) MMIO region as UC attribute, not map [low32_max_ram, low32_max_ram + 4G) region as UC attribute. Otherwise, the HV will complain [4G, low32_max_ram + 4G) region has already mapped. Tracked-On: #5830 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-04-29 08:57:13 +08:00
Shuo A Liu	dc88c2e397	hv: Save/restore MSR_IA32_CSTAR during context switch Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while Linux uses it rarely. Currently the vcpu context switch doesn't save/restore it. Windows detects the change of the MSR and raises an exception. Save/restore MSR_IA32_CSTAR during context switch. Tracked-On: #5899 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-04-23 11:21:52 +08:00
Jian Jun Chen	31b8b698ce	hv: TLFS: Add tsc_offset support for reference time TLFS spec defines that when a VM is created, the value of HV_X64_MSR_TIME_REF_COUNT is set to zero. Now tsc_offset is not supported properly, so guest get a drifted reference time. This patch implements tsc_offset. tsc_scale and tsc_offset are calculated when a VM is launched and are saved in struct acrn_hyperv of struct acrn_vm. Tracked-On: #5956 Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-04-23 10:48:07 +08:00
Jian Jun Chen	b4312efbd7	hv: TLFS: inject #GP to guest VM for writing of read-only MSRs TLFS spec defines that HV_X64_MSR_VP_INDEX and HV_X64_MSR_TIME_REF_COUNT are read-only MSRs. Any attempt to write to them results in a #GP fault. Fix the issue by returning error in handler hyperv_wrmsr() of MSRs HV_X64_MSR_VP_INDEX/HV_X64_MSR_TIME_REF_COUNT emulation. Tracked-On: #5956 Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-04-23 10:48:07 +08:00
Jian Jun Chen	dd524d076d	hv: TLFS: Setup hypercall page according to the vcpu mode TLFS spec defines different hypercall ABIs for X86 and x64. Currently x64 hypercall interface is not supported well. Setup the hypercall interface page according to the vcpu mode. Tracked-On: #5956 Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2021-04-23 10:48:07 +08:00
Li Fei1	628bca5cad	hv: pgtable: use new algo to calculate PPT/EPT_PD_PAGE_NUM In order to support platforms (such as Alder Lake) whose physical address width is 46 bits, the current code needs to reserve 2^16 PD pages ((2^46) / (2^30)). This is a complete waste of memory. This patch reserves PD pages in three parts: 1. DRAM - may take PD_PAGE_NUM(CONFIG_PLATFORM_RAM_SIZE) PD pages at most; 2. low MMIO - may take PD_PAGE_NUM(MEM_1G << 2U) PD pages at most; 3. high MMIO - may take (CONFIG_MAX_PCI_DEV_NUM * 6U) PD pages (plus PDPT entries if its size is larger than 1GB) at most, because: (a) MMIO BAR size must be a power of 2 from 16 bytes; (b) MMIO BAR base address must be aligned with its size. Tracked-On: #5929 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2021-04-22 14:35:57 +08:00
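
The page-count arithmetic described in commit 628bca5cad can be sketched in C as follows. This is an illustration only, not the ACRN source: the helper names `pd_page_num`, `naive_pd_page_num` and `split_pd_page_num` are hypothetical, and the sketch assumes that one PD page maps 1 GiB (512 entries of 2 MiB each). It also ignores the extra PDPT entries the commit mentions for high MMIO regions larger than 1 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one page-directory (PD) page holds 512 entries of 2 MiB, so it maps 1 GiB. */
#define GIB_SHIFT 30U
#define GIB_SIZE  (1ULL << GIB_SHIFT)

/* Hypothetical helper: PD pages needed to map `size` bytes, rounded up to whole pages. */
static uint64_t pd_page_num(uint64_t size)
{
    return (size + GIB_SIZE - 1ULL) >> GIB_SHIFT;
}

/* Old approach: reserve PD pages for the entire physical address space. */
static uint64_t naive_pd_page_num(uint32_t phys_addr_bits)
{
    assert(phys_addr_bits >= GIB_SHIFT && phys_addr_bits < 64U);
    return 1ULL << (phys_addr_bits - GIB_SHIFT);
}

/* New approach: DRAM + low MMIO (the 4 GiB below 4G) + high MMIO (6 PD pages per PCI device). */
static uint64_t split_pd_page_num(uint64_t ram_size, uint64_t max_pci_dev_num)
{
    return pd_page_num(ram_size) + pd_page_num(4ULL * GIB_SIZE) + (max_pci_dev_num * 6ULL);
}
```

With a 46-bit physical address width, `naive_pd_page_num(46U)` gives 65536 pages. For a hypothetical configuration with 16 GiB of RAM and 96 PCI devices, `split_pd_page_num(16ULL << 30, 96U)` gives 16 + 4 + 576 = 596 pages.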


2306 Commits