Commit Graph

3275 Commits

Liu,Junming
79a5d7a787 hv: initialize IGD offset 0xfc of CFG space for Service VM
For the IGD device, the OpRegion address is returned by reading config space
offset 0xFC of BDF 0:02.0, and that address is required by the GPU driver.
The OpRegion address should be a GPA.

When the IGD is assigned to a pre-launched VM, offset 0xFC of igd_vdev is
programmed with the new GPA, and in that case the pre-launched VM reads
the value from offset 0xFC of the 0:02.0 vdev.

For the Service VM, however, the IGD is initialized with the same policy as other
PCI devices: only the vdev_head_conf (0x0-0x3F) is initialized from the
corresponding pbdf, and the remaining PCI config space is read through the
corresponding pdev. Because that code doesn't handle the Service VM scenario,
the Service VM fails to read config space offset 0xFC of the IGD vdev, and the
i915 GPU driver in the SOS then misbehaves due to the incorrect 0xFC value.

This patch initializes offset 0xFC of the IGD config space for the Service VM;
it is simple and covers post-launched VMs too.

Tracked-On: #6387

Signed-off-by: Liu,Junming <junming.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-18 09:11:16 +08:00
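A minimal sketch of the Service VM case described above, assuming hypothetical names (init_igd_opregion_for_sos, struct pci_vdev/pci_pdev, read_cfg) rather than the actual ACRN code. Since the Service VM's guest physical space is identity-mapped, the value read from the physical 0:02.0 device at offset 0xFC can simply be cached into the vdev config space at init time:

#include <stdint.h>

#define IGD_OPREGION_REG  0xFCU   /* OpRegion base address register (ASLS) */

/* Hypothetical minimal views of a physical and a virtual PCI device. */
struct pci_pdev { uint32_t (*read_cfg)(uint32_t offset); };
struct pci_vdev { uint32_t cfg[64]; struct pci_pdev *pdev; };

/*
 * For the Service VM, GPA == HPA, so the OpRegion address read from the
 * physical IGD can be stored into the vdev config space as-is; later guest
 * reads of offset 0xFC then hit the cached value instead of failing.
 */
static void init_igd_opregion_for_sos(struct pci_vdev *igd)
{
    uint32_t opregion_hpa = igd->pdev->read_cfg(IGD_OPREGION_REG);

    igd->cfg[IGD_OPREGION_REG / 4U] = opregion_hpa;
}
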
Xiangyang Wu
dec8d7e22f hv: support at most MAX_VUART_NUM_PER_VM legacy vuarts
The current hypervisor supports at most two legacy vuarts
(COM1 and COM2) per VM: COM1 is usually configured as the VM console,
and COM2 as the communication channel for the S5 feature.
The hypervisor can support MAX_VUART_NUM_PER_VM (8) legacy vuarts, but it
only registers handlers for two of them because of the assumption that a
VM has at most two legacy vuarts.
In the current hypervisor configuration, I/O port 2F8H is always
allocated for virtual COM2, which is unfriendly if the user wants to
assign this port to physical COM2.
The legacy vuart is a common communication channel between the Service VM
and User VMs; it can work in polling mode, and its driver exists in each
guest OS. The channel can be used to send a shutdown command to a User VM
for the S5 feature, so several vuarts need to be configured for the Service
VM and one vuart for each User VM.

The following changes are made to support at most
MAX_VUART_NUM_PER_VM legacy vuarts:
   - Refine legacy vuart initialization to register a PIO handler for
each related vuart.
   - Update the assumption about the number of legacy vuarts.

Config-tool updates for legacy vuarts will be made in a separate
patch.

v1-->v2:
	Update commit message to make this patch's purpose clearer;
	If vuart index is valid, register handler for it.

Tracked-On: #6652
Signed-off-by: Xiangyang Wu <xiangyang.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-15 10:00:02 +08:00
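A rough sketch of the registration loop described in the commit above, with hypothetical names (struct vuart_config, acrn_vm_stub, register_pio_handler) standing in for the real ACRN structures:

#include <stdint.h>

#define MAX_VUART_NUM_PER_VM  8U
#define INVALID_PIO_BASE      0U

/* Hypothetical per-VM legacy vuart configuration. */
struct vuart_config { uint16_t port_base; };
struct acrn_vm_stub { struct vuart_config vuart_cfg[MAX_VUART_NUM_PER_VM]; };

/* Placeholder for the real vPIO handler registration. */
static void register_pio_handler(struct acrn_vm_stub *vm, uint16_t base, uint16_t len)
{
    (void)vm; (void)base; (void)len;
}

/*
 * Instead of hard-coding COM1/COM2, walk every possible legacy vuart slot
 * and register a PIO handler only for the slots marked valid by the
 * configuration, up to MAX_VUART_NUM_PER_VM.
 */
static void init_legacy_vuarts(struct acrn_vm_stub *vm)
{
    for (uint32_t i = 0U; i < MAX_VUART_NUM_PER_VM; i++) {
        if (vm->vuart_cfg[i].port_base != INVALID_PIO_BASE) {
            register_pio_handler(vm, vm->vuart_cfg[i].port_base, 8U);
        }
    }
}
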
Fei Li
6c5bf4a642 hv: enhance e820_alloc_memory to allocate memory above 4G
Enhance e820_alloc_memory so that it can allocate memory above 4G.

Tracked-On: #5830
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-10-14 15:04:36 +08:00
Fei Li
df7ffab441 hv: remove CONFIG_HV_RAM_SIZE
It's difficult to configure CONFIG_HV_RAM_SIZE properly up front. This patch
removes CONFIG_HV_RAM_SIZE and instead uses the ld linker script to
determine the HV RAM size dynamically.

Tracked-On: #6663
Signed-off-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-14 15:04:36 +08:00
Zide Chen
e48962faa6 hv: optimize run_vcpu() for nested
This patch implements a separate path for L2 VMEntry in run_vcpu(),
which has several benefits:

- keep run_vcpu() clean, reducing the number of is_vcpu_in_l2_guest()
  statements:
  - the current code already has three is_vcpu_in_l2_guest() calls.
  - another two would otherwise be needed so that nested VMEntry won't
    hit the "Starting vCPU" and "vCPU launched" pr_info() calls and a few
    other statements in the VM launch path.

- skip a few other things in run_vcpu() that are not needed for nested.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-13 15:55:31 +08:00
Zide Chen
89bbc44962 hv: inject external interrupts only if LAPIC is not passthru
Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-08 09:18:34 +08:00
Zide Chen
228b052fdb hv: operations on vcpu->reg_cached/reg_updated don't need LOCK prefix
At run time, one vCPU never reads or writes another vCPU's registers,
thus we don't need the LOCK-prefixed instructions on reg_cached and
reg_updated.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-08 09:11:10 +08:00
Zide Chen
2b683f8f5b hv: call vcpu_inject_exception() only when ACRN_REQUEST_EXCP is set
Move the bitmap test out of vcpu_inject_exception(): the expensive
bitmap_test_and_clear_lock() is called only when pending_req_bits is
non-zero, and vcpu_inject_exception() is called only when needed.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-07 20:48:43 +08:00
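The control flow described above can be sketched as below; bitmap_test_and_clear_lock and vcpu_inject_exception are stubbed out here, so this is an illustration of the pattern only, not the actual ACRN code:

#include <stdbool.h>
#include <stdint.h>

#define ACRN_REQUEST_EXCP  0U

/* Stand-in for the locked test-and-clear (the real one uses LOCK btr). */
static bool bitmap_test_and_clear_lock(uint32_t bit, volatile uint64_t *bits)
{
    uint64_t mask = 1UL << bit;
    bool was_set = ((*bits & mask) != 0UL);
    *bits &= ~mask;
    return was_set;
}

static void vcpu_inject_exception_stub(void)
{
    /* inject the highest-priority pending exception here */
}

/*
 * Touch the expensive locked bitmap operation only when at least one
 * request is pending, and inject only when the exception bit was set.
 */
static void handle_pending_exception(volatile uint64_t *pending_req_bits)
{
    if (*pending_req_bits != 0UL) {
        if (bitmap_test_and_clear_lock(ACRN_REQUEST_EXCP, pending_req_bits)) {
            vcpu_inject_exception_stub();
        }
    }
}
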
Zide Chen
f801ba4ed7 hv: update guest RIP only if vcpu->arch.inst_len is non zero
In a very large number of VM exits, the VM-exit instruction length can be
zero, and there is no need to update VMX_GUEST_RIP.

Some examples:

- all external interrupt VM exits in a non-LAPIC-passthrough setup.
- all nested VM exits that are reflected to the L1 hypervisor.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:47:07 +08:00
Zide Chen
b7e9a68923 hv: code cleanup in run_vcpu()
- wrap a new function exec_vmentry() to reduce code duplication.
- remove exec_vmread(VMX_GUEST_RSP) since ACRN doesn't need to know the
  guest RSP at run time.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:47:07 +08:00
Zide Chen
ee12daff84 hv: nested: refine vmcs12_read/write_field APIs
Change "uint64_t vmcs_hva" to "void *vmcs_hva" in the input argument,
list, so that no type casting is needed when calling them from pointers.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:45:34 +08:00
Kunhui-Li
2a8c587824 config_tools: update board name in makefile
Update the board name from nuc7i7dnb to nuc11tnbi5 in the makefile because
the nuc7i7dnb board folder has been removed, and also update the
scenario name from industry to shared to fix the "make all" build issue.

Tracked-On: #6315
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
2021-09-29 16:53:44 +08:00
Liu,Junming
545c006a33 hv: inject #GP if guest tries to reprogram pass-thru dev PIO bar
In the current design, when a device is passed through,
for the PIO BAR we need to ensure that the guest PIO start address
equals the host PIO start address.

But a malicious guest may reprogram the PIO BAR,
and the hypervisor would then pass the reprogrammed PIO address through to the guest.
This isn't safe behavior.
When the guest tries to reprogram a pass-through device's PIO BAR,
inject #GP into the guest directly.

Tracked-On: #6508

Signed-off-by: Liu,Junming <junming.liu@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2021-09-28 08:49:01 +08:00
Liu,Junming
4105ca2cb4 hv: deny the launch of VM if pass-thru PIO bar isn't identical mapping
In the current design, when a device is passed through,
for the PIO BAR we need to ensure that the guest PIO start address
equals the host PIO start address.
The VMCS I/O bitmap is then set to pass the corresponding
port I/O through to the guest for performance.

ACRN-DM and acrn-config should ensure the identical mapping of the PIO BAR.
If ACRN-DM or acrn-config fails to achieve this,
we should deny the launch of the VM.

Tracked-On: #6508

Signed-off-by: Liu,Junming <junming.liu@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2021-09-28 08:49:01 +08:00
Victor Sun
28824c1e74 HV: init e820 before init paging
In commit 4e1deab3d9, we changed the
init sequence to init paging first and then init e820, because we worried
that the EFI memory map could be beyond the 4GB space on some platforms.

After double-checking the multiboot2 spec: when the system boots via the
multiboot2 protocol, the EFI memory map info is embedded in the multiboot
info, so it is guaranteed that the EFI memory map is below the 4GB space.
Considering that the page tables will be allocated from free memory space in
the future, we have to change the init sequence back to init e820 first and
then init paging.

If we need to support another boot protocol in the future where the EFI
memory map might be placed beyond 4GB, we have the options below:
	1. Request that the bootloader put the EFI memory map below 4GB;
	2. Call EFI_BOOT_SERVICES.GetMemoryMap() before ExitBootServices();
	3. Enable an early 64-bit page table to read only the EFI memory map;

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
2021-09-27 09:03:15 +08:00
Zide Chen
a62dd6ad8a hv: nested: fixed vmxoff_vmexit_handler() issue
The VMXOFF VM-exit handler is supposed to remove VMCS shadowing.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-26 08:49:35 +08:00
Zide Chen
45b036e028 hv: nested: enable multiple active VMCS12 support
This patch changes the size of vvmcs[] array from 1 to
PER_VCPU_ACTIVE_VVMCS_NUM, and actually enables multiple active VMCS12
support in ACRN.  The basic operations:

- if L1 VMPTRLDs a VMCS12 without first VMCLEARing the current
  VMCS12, ACRN no longer unconditionally flushes the current VMCS12
  back to L1.  Instead, it tries to keep both the current and the newly
  loaded VMCS12 in the nested->vvmcs[] array, unless:

- there is no available vvmcs[] entry left, in which case ACRN flushes one
  active VMCS12 to make room for the new VMCS12.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-26 08:49:35 +08:00
Mingqiang Chi
f39c882359 hv:change log level for check_vmx_ctrl
Some processors don't support the VMX_PROCBASED_CTLS_TERTIARY bit
and the VMX_PROCBASED_CTLS2_UWAIT_PAUSE bit in the MSRs
(IA32_VMX_PROCBASED_CTLS & IA32_VMX_PROCBASED_CTLS2),
so the HV outputs an error log, which causes confusion;
change the log level from pr_err to pr_info.

Tracked-On: #6397

Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2021-09-24 10:17:19 +08:00
Jie Deng
064fd7647f hv: add priority based scheduler
This patch adds a new priority based scheduler to support
vCPU scheduling based on their pre-configured priorities.
A vCPU can be running only if there is no higher priority
vCPU running on the same pCPU.

Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-09-24 09:32:18 +08:00
Junjie Mao
efcb9e2fdf Makefile: fix wrong reference to board XML and skip binary in diffconfig
The current config.mk uses the variable BOARD_FILE as the path to the board
XML when generating an unmodified copy of configuration files for
comparison, which is incorrect. The right variable is HV_BOARD_XML which is
the path to the copy of board XML that is actually used for the build.

This patch corrects the bug above.

In addition, this patch also skips binary files (which are not meant to be
edited manually) when calculating the differences.

Tracked-On: #6592
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-09-19 20:23:44 +08:00
Fei Li
53fe6d63be hv: vioapic: update remote IRR for lapic-pt
In the local APIC passthrough case, EOI does not trigger a VM exit, so the virtual
'Remote IRR' field is not updated. We need to read the physical IOAPIC RTE to
update the virtual 'Remote IRR' field each time the guest reads the I/O
REDIRECTION TABLE REGISTERS.

Tracked-On: #5923
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-09-18 09:42:44 +08:00
Zide Chen
94cbe909ee hv: irq: identical vector mapping if LAPIC passthrough
In the local APIC passthrough case, when a device triggers an INTx interrupt, the
interrupt is delivered to the vCPU directly. For this case, the virtual vector
needs to be set in
the 'Interrupt Vector' field of the physical IOAPIC I/O REDIRECTION TABLE REGISTER
(bits 7:0) and the 'Vector' field of the VT-d Interrupt Remapping Table Entry (IRTE)
for Remapped Interrupts.

Assumption:
(a) IOAPIC pins won't be shared between LAPIC PT guest and other guests;
(b) The guest would not trigger this IRQ before it switched to x2 APIC mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-18 09:42:44 +08:00
Mingqiang Chi
db98f01b6e add vmx capability check
Check some essential VMX capabilities;
panic if the processor doesn't support them.

Tracked-On: #6584

Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-18 08:44:30 +08:00
dongshen
08d4517431 hv: fix bugs in RDT's CDP code
In the current RDT code, if CDP is configured, the L2/L3 resources' num_closids
calculation is wrong:
res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U) + 1U;

Should be:
res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U + 1) >> 1U;

Also, in order to enable CDP system-wide, the CDP bit (bit 0) needs to be enabled
on all pcpus, not just on pcpu 0.

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-09-17 16:29:05 +08:00
dongshen
f4cdbba0bd hv: some cosmetic fixes to rdt.c/rdt.h
Rename the clos_max field in struct rdt_info to num_closids

Rename variable valid_clos_num to common_num_closids and make it static

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-09-17 16:29:05 +08:00
Liu Long
2de395b6f6 HV: Normalize hypervisor help output format
Normalize the hypervisor help command output format, remove the 10-line
limit per screen, and fix the misspelled words.

Tracked-On: #5112
Signed-off-by: Liu Long <long.liu@intel.com>
Reviewed-by: VanCutsem, Geoffroy <geoffroy.vancutsem@intel.com>
2021-09-17 11:06:18 +08:00
Zide Chen
0466d7055f hv: nested: move the VMCS12 dirty flags to struct acrn_vvmcs
These dirty flags are supposed to be per VMCS12, so move them from the
per vCPU acrn_nested struct to the newly added acrn_vvmcs struct.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
4e54c3880b hv: nested: remove vcpu->arch.nested.current_vmcs12_ptr
This variable represents the L1 GPA of the current VMCS12.  But it's
no longer needed in the multiple active VMCS12 case, which uses the
following variables for this purpose.

- nested->current_vvmcs refers to the vvmcs[] entry which contains the
  cached current VMCS12, its associated VMCS02, and other context info.

- nested->current_vvmcs->vmcs12_gpa refers to the L1 GPA of this
  current VMCS12.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
799a4d332a hv: nested: initial implementation of struct acrn_vvmcs
Add an array of struct acrn_vvmcs to struct acrn_nested, so it is
possible to cache multiple active VMCS12s.

This patch declares the size of this array as 1, meaning that there is
only one active VMCS12.  This is to minimize the logical code changes.

Add pointer current_vvmcs to struct acrn_nested, which refers to the
current vvmcs[] entry.  In this patch, if any VMCS12 is active, it
always points to vvmcs[0].

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
cf697e753d hv: nested: some API signature changes
No logical changes; this patch prepares for multiple active
VMCS12 support.

- currently it's easy to get the vmcs12 pointer from the vcpu pointer.
  In multiple active vmcs12 case, we need to explicitly add "struct
  acrn_vmcs12 *vmcs12" to certain APIs' input argument list, in order to
  get the desired vmcs12 pointer.

- merge flush_current_vmcs12() into clear_vmcs02() for multiple reasons:
  a) it's called only once; b) we don't wrap the opposite operation
  (loading vmcs12) in an API; c) this API has simple and clear logic.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
e9eb72d319 hv: nested: flush L2 VPID only when it could conflict with L1 VPIDs
By changing the way L1 VPIDs are assigned from bottom-up to top-down,
the possibility of VPID conflicts between L1 and L2 guests becomes
small.

Then we can flush the VPID only in case of a conflict.

Tracked-On: #6289
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-16 09:26:10 +08:00
Fei Li
0a515ab2ea hv: pci: fix a minor bug about is_pci_cfg_multifunction
Before checking whether a PCI device is a multi-function device, we need to
make sure this PCI device is a valid PCI device. For a valid PCI device, the
'Header Layout' field in the Header Type Register must be 000 0000b (Type 0 PCI device)
or 000 0001b (Type 1 PCI device).

So for a valid PCI device, the Header Type can't be 0xFF.

Tracked-On: #4134
Signed-off-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-15 13:24:18 +08:00
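A small sketch of the check described above; the macro name PCIM_MFDEV is illustrative, not necessarily the one used in ACRN:

#include <stdbool.h>
#include <stdint.h>

#define PCIM_MFDEV  0x80U   /* multi-function bit in the Header Type register */

/*
 * A device whose Header Type register reads 0xFF is not a valid PCI device
 * (the 'Header Layout' field can only be Type 0 or Type 1), so it cannot be
 * a multi-function device either.
 */
static bool is_pci_cfg_multifunction(uint8_t header_type)
{
    return (header_type != 0xFFU) && ((header_type & PCIM_MFDEV) == PCIM_MFDEV);
}
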
Zide Chen
1ab65825ba hv: nested: merge gpa_field_dirty and control_field_dirty flag
At run time, it's rare for L1 to write to the intercepted non-host-state
VMCS fields, so using multiple dirty flags is not necessary.

This patch uses one single dirty flag to manage all non host-state VMCS
fields.  This helps to simplify current code and in the future we may
not need to declare new dirty flags when we intercept more VMCS fields.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-13 15:50:01 +08:00
Zide Chen
6376d5a0d3 hv: nested: fix bug in syncing EPTP from VMCS12 to VMCS02
Currently vmptrld_vmexit_handler() doesn't sync VMX_EPT_POINTER_FULL
from vmcs12 to vmcs02, instead it sets gpa_field_dirty and relies on
nested_vmentry() to sync EPTP in next nested VMentry.

This creates a readability issue since all other intercepted VMCS fields
are synced in sync_vmcs12_to_vmcs02().  Another issue is that other
VMCS fields managed by gpa_field_dirty are repeatedly synced in both the
vmptrld and nested vmentry handlers.

This patch moves get_nept_desc() ahead of sync_vmcs12_to_vmcs02(), such
that shadow_eptp is allocated before sync_vmcs12_to_vmcs02() which
can sync EPTP properly.

Also, in nested_vmexit_handler() there is no need to read the exit reason
from the VMCS, since vcpu->arch.exit_reason already has it.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-13 15:50:01 +08:00
Geoffroy Van Cutsem
01bf5110c5 Makefile: add missing deps in top-level and hypervisor Makefile
Add a couple of missing dependencies in the ACRN Makefiles:
1. 'acrn.bin' is required before the hypervisor can be installed
2. The 'acrn_mngr.h' needs to be installed ('tools-install') in
the build folder.

Tracked-On: #6360
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
2021-09-13 11:28:14 +08:00
Junjie Mao
2bfaa34cf2 config_tools: populate default values in scenario XML
While we have default values of configuration entries stated in the schema
of scenario XMLs, today we still require user-given scenario XMLs to
contain literally ALL XML nodes. Missing of a single node will cause schema
validation errors even though we can use its default value defined in the
schema.

This patch allows user-given scenario XMLs to ignore nodes with default
values. It is done by adding the missing nodes, all containing the defined
default values, to the input scenario XML when copying it to the build
directory. This approach imposes no changes to either the schema or
subsequent scripts in the build system.

Tracked-On: #6292
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-09-13 09:05:52 +08:00
Yifan Liu
0a1ad45b32 hv: Avoid using SMBIOS major version
Previously it was (falsely) assumed that the major_ver of the 32-bit SMBIOS
entry point structure (which is called SMBIOS 2.1 in the spec, or SMBIOS2 in code)
would have a value of 2 and that the major_ver of the 64-bit SMBIOS (which is called
SMBIOS 3.0 in the spec, and SMBIOS3 in code) would have a value of 3. This turned out
to be wrong. The major_ver refers to the implemented doc revision, and a 32-bit
SMBIOS2 can have a major_ver of 3 (the most recent implementation).

This patch removes the use of major_ver to distinguish between
SMBIOS2/3 and uses the spec-defined anchor string instead.

Tracked-On: #6528
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-09-08 15:22:12 +08:00
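For reference, the spec-defined anchor strings are "_SM_" for the 32-bit (SMBIOS 2.1) entry point and "_SM3_" for the 64-bit (SMBIOS 3.0) entry point. A minimal sketch of such a check, with illustrative function names:

#include <stdbool.h>
#include <string.h>

/* Identify the entry point by its anchor string instead of major_ver. */
static bool is_smbios2_eps(const void *p)   /* 32-bit entry point */
{
    return memcmp(p, "_SM_", 4) == 0;
}

static bool is_smbios3_eps(const void *p)   /* 64-bit entry point */
{
    return memcmp(p, "_SM3_", 5) == 0;
}
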
Zide Chen
11c2f3eabb hv: check bitmap before calling bitmap_test_and_clear_lock()
The locked btr instruction is expensive.  This patch changes the
logic to ensure that the bitmap is non-zero before executing
bitmap_test_and_clear_lock().

The VMX transition time improves significantly.  With the SOS running
on TGL, the CPUID round trip drops from ~2400 cycles to ~2000 cycles.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-02 16:09:33 +08:00
Zide Chen
7cde4a8d40 hv: initialize host IA32_PAT MSR
Currently ACRN assumes the firmware sets up IA32_PAT correctly.  This patch
explicitly initializes the host IA32_PAT MSR according to ISDM Table 11-12,
"Memory Type Setting of PAT Entries Following a Power-up or Reset".

ACRN creates host page tables based on PAT0 (WB) and PAT3 (UC).

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-02 09:15:39 +08:00
Zide Chen
aeb3690b6f hv: simplify is_lapic_pt_enabled()
is_lapic_pt_enabled() is called at least twice in one loop of the vCPU
thread, and it's called frequently in vmexit_handler() if the LAPIC is not
passed through.  Thus the efficiency of this function has a direct
impact on system performance.

Since the LAPIC mode is not changed at run time, we don't have to
calculate it on the fly in is_lapic_pt_enabled().

Also removed the unused lapic_mask from struct acrn_vcpu_arch.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-26 09:52:10 +08:00
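A trivial sketch of the caching idea, assuming a hypothetical per-vCPU field lapic_pt_enabled that is set once when the guest switches into x2APIC passthrough mode:

#include <stdbool.h>

struct vcpu_stub {
    bool lapic_pt_enabled;   /* decided once, never changes at run time */
};

/* Return the cached value instead of recomputing the LAPIC mode per call. */
static inline bool is_lapic_pt_enabled(const struct vcpu_stub *vcpu)
{
    return vcpu->lapic_pt_enabled;
}
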
Shiqing Gao
d90dbc0d91 hv: check the capability of XSAVES/XRSTORS instructions before execution
For platforms that do not support XSAVES/XRSTORS instructions, like QEMU,
executing these instructions causes #UD.
This patch adds the check before the execution of XSAVES/XRSTORS instructions.

It also refines the logic inside rstore_xsave_area for the following reason:
If XSAVES/XRSTORS instructions are supported, restore XSAVE area if any of the
following conditions is met:
 1. "vcpu->launched" is false (state initialization for guest)
 2. "vcpu->arch.xsave_enabled" is true (state restoring for guest)

 * Before vCPU is launched, condition 1 is satisfied.
 * After vCPU is launched, condition 2 is satisfied because
   is_valid_xsave_combination() guarantees that "vcpu->arch.xsave_enabled"
   is consistent with pcpu_has_cap(X86_FEATURE_XSAVES).
Therefore, the check against "vcpu->launched" and "vcpu->arch.xsave_enabled"
can be eliminated here.

Tracked-On: #6481

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-08-26 09:42:23 +08:00
Zide Chen
cbf3825140 hv: Pass-through IA32_TSC_AUX MSR to L1 guest
Use an unused MSR on host to save ACRN pcpu ID and avoid saving and
restoring TSC AUX MSR on VMX transitions.

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
2021-08-26 09:25:54 +08:00
Yifan Liu
d33c76f701 hv: quirks: SMBIOS passthrough for prelaunched-VM
This feature is guarded by the config CONFIG_SECURITY_VM_FIXUP, which
by default should be disabled.

This patch passes native SMBIOS information through to the pre-launched VM.
The SMBIOS consists of a small entry point structure and a table; the
entry point structure is put in the 0xf0000-0xfffff region in guest
address space, and the table is put in the ACPI_NVS region in guest
address space.

v2 -> v3:
uuid_is_equal moved to util.h as inline API
result -> pVendortable, in function efi_search_guid
recalc_checksum -> generate_checksum
efi_search_smbios -> efi_search_smbios_eps
scan_smbios_eps -> mem_search_smbios_eps
EFI GUID definition kept

Tracked-On: #6320
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-26 09:24:50 +08:00
Yifan Liu
975ff33e01 hv: Move uuid_is_equal to util.h
This patch moves uuid_is_equal from vm_config.c to util.h as inline API.

Tracked-On: #6320
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-26 09:24:50 +08:00
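A possible shape of the inline helper described above (a sketch using memcmp; the actual ACRN implementation may compare the 16 bytes differently):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* UUIDs are 16 bytes; a byte-wise compare is sufficient. */
static inline bool uuid_is_equal(const uint8_t *uuid1, const uint8_t *uuid2)
{
    return memcmp(uuid1, uuid2, 16U) == 0;
}
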
Yifan Liu
32d6ead8de hv && config-tool: Rename GUEST_FLAG_TPM2_FIXUP
This patch renames the GUEST_FLAG_TPM2_FIXUP to
GUEST_FLAG_SECURITY_VM.

v2 -> v3:
The "FIXUP" suffix is removed.

Tracked-On: #6320
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-26 09:24:50 +08:00
Liu Long
31598ae895 ACRN:hv: Fix vcpu_dumpreg command hang issue
In an ACRN RT VM, if the LAPIC is passed through to the guest, an IPI can't
trigger a VM exit and the vNMI is just for notification; it can't handle
the smp_call function. Modify the vcpu_dumpreg function to prompt the user
to switch to vLAPIC mode for the vCPU register dump.

Tracked-On: #6473
Signed-off-by: Liu Long <long.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-25 08:54:27 +08:00
Zide Chen
0980420aea hv: minor cleanup of hv_main.c
- remove vcpu->arch.nrexits which is useless.
- record full 32 bits of exit_reason to TRACE_2L(). Make the code simpler.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-25 08:49:54 +08:00
Jian Jun Chen
8de39f7b61 hv: GSI of hcall_set_irqline should be checked against target_vm
The GSI of hcall_set_irqline should be checked against the target_vm's
total GSI count instead of the SOS's total GSI count.

Tracked-On: #6357
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-25 08:48:47 +08:00
Zide Chen
6d7eb6d7b6 hv: emulate IA32_EFER and adjust Load EFER VMX controls
This helps to improve performance:

- Don't need to execute VMREAD in vcpu_get_efer(), which is frequently
  called.

- VMX_EXIT_CTLS_SAVE_EFER can be removed from VM-Exit Controls.

- If the value of IA32_EFER MSR is identical between the host and guest
  (highly likely), adjust the VMX controls not to load IA32_EFER on
  VMExit and VMEntry.

It's convenient to continue using the existing vcpu_s/get_efer() APIs
rather than the common vcpu_s/get_guest_msr().

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-24 11:16:53 +08:00
Liang Yi
499f62e8bd hv: use per platform maximum physical address width
MAXIMUM_PA_WIDTH will be calculated from board information.

Tracked-On: #6357
Signed-off-by: Liang Yi <yi.liang@intel.com>
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2021-08-20 11:02:21 +08:00
Liang Yi
2b3620de7d hv: mask off LA57 in cpuid
Mask off support of 57-bit linear addresses and five-level paging.

ICX-D has LA57 but ACRN doesn't support 5-level paging yet.

Tracked-On: #6357
Signed-off-by: Liang Yi <yi.liang@intel.com>
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
2021-08-20 11:02:21 +08:00
Shiqing Gao
91777a83b5 config_tools: add a new entry MAX_EFI_MMAP_ENTRIES
It is used to specify the maximum number of EFI memmap entries.

On some platforms, like Tiger Lake, the number of EFI memmap entries
becomes 268 when the BIOS settings are changed.
The current value of MAX_EFI_MMAP_ENTRIES (256) defined in hypervisor
is not big enough to cover such cases.

As the number of EFI memmap entries depends on the platforms and the
BIOS settings, this patch introduces a new entry MAX_EFI_MMAP_ENTRIES
in configurations so that it can be adjusted for different cases.

Tracked-On: #6442

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
2021-08-20 09:50:39 +08:00
Shiqing Gao
651d44432c hv: initialize the XSAVE related processor state for guest
If the SOS is using kernel 5.4, the hypervisor panics with #GP.

Here is an example on KBL showing how the panic occurs when kernel 5.4 is used:
Notes:
 * Physical MSR_IA32_XSS[bit 8] is 1 when physical CPU boots up.
 * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is initialized to 0.

Following thread switches would happen at run time:
1. idle thread -> vcpu thread
   context_switch_in happens and rstore_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is false as vcpu is not launched yet
   and init_vmcs is not called yet (where xsave_enabled is set to true).
   Thus, physical MSR_IA32_XSS is not updated with the value of guest MSR_IA32_XSS.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 1.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.

2. vcpu thread -> idle thread
   context_switch_out happens and save_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is true. Processor state is saved
   to memory with XSAVES instruction. As physical MSR_IA32_XSS[bit 8] is 1,
   ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is set to 1 after the execution
   of XSAVES instruction.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 1.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.
    * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1.

3. idle thread -> vcpu thread
   context_switch_in happens and rstore_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is true. Physical MSR_IA32_XSS is
   updated with the value of guest MSR_IA32_XSS, which is 0.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 0.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.
    * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1.

   Processor state is restored from memory with XRSTORS instruction afterwards.
   According to SDM Vol1 13.12 OPERATION OF XRSTORS, a #GP occurs if XCOMP_BV
   sets a bit in the range 62:0 that is not set in XCR0 | IA32_XSS.
   So, #GP occurs once XRSTORS instruction is executed.

Such an issue does not happen with kernel 5.10, because kernel 5.10 writes to
MSR_IA32_XSS during initialization, while kernel 5.4 does not.
Once the guest writes to MSR_IA32_XSS, it is trapped to the hypervisor; then the
physical MSR_IA32_XSS and the value of MSR_IA32_XSS in vcpu->arch.guest_msrs
are updated with the value specified by the guest. So, at point 2 above, the
correct processor state is saved, and #GP would not happen at point 3.

This patch initializes the XSAVE related processor state for guest.
If vcpu is not launched yet, the processor state is initialized according to
the initial value of vcpu_get_guest_msr(vcpu, MSR_IA32_XSS), ectx->xcr0,
and ectx->xs_area. With this approach, the physical processor state is
consistent with the one presented to guest.

Tracked-On: #6434

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Li Fei1 <fei1.li@intel.com>
2021-08-20 09:46:09 +08:00
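A distilled sketch of the restore path described above, with stub types and write helpers (vcpu_xsave_stub, ext_ctx_stub, write_xcr0, write_ia32_xss, xrstors_area are illustrative names, not the ACRN code):

#include <stdbool.h>
#include <stdint.h>

struct ext_ctx_stub {
    uint64_t xcr0;
    uint64_t xss;        /* guest MSR_IA32_XSS value */
    /* the XSAVE area itself is omitted in this sketch */
};

struct vcpu_xsave_stub {
    bool launched;
    bool xsave_enabled;
    struct ext_ctx_stub ectx;
};

static void write_xcr0(uint64_t v)     { (void)v; /* xsetbv in real code  */ }
static void write_ia32_xss(uint64_t v) { (void)v; /* wrmsr in real code   */ }
static void xrstors_area(const struct ext_ctx_stub *e) { (void)e; /* xrstors */ }

/*
 * Keep the physical XCR0/IA32_XSS consistent with the values the XSAVE area
 * was written with, including before the vCPU is launched for the first time.
 * Since one of the two conditions below always holds, the check could be dropped.
 */
static void rstore_xsave_area(const struct vcpu_xsave_stub *vcpu)
{
    if (!vcpu->launched || vcpu->xsave_enabled) {
        write_xcr0(vcpu->ectx.xcr0);
        write_ia32_xss(vcpu->ectx.xss);
        xrstors_area(&vcpu->ectx);
    }
}
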
Zide Chen
2e6cf2b85b hv: nested: fix bugs in init_vmx_msrs()
Currently init_vmx_msrs() emulates the same value for the IA32_VMX_xxx_CTLS
and IA32_VMX_TRUE_xxx_CTLS MSRs.

But the values of the physical MSRs can differ within such a pair,
and we need to adjust the emulated values accordingly.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-20 09:40:50 +08:00
Zide Chen
ad37553873 hv: nested: redundant permission check on nested_vmentry()
check_vmx_permission() is called in vmresume_vmexit_handler() and
vmlaunch_vmexit_handler() already.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-20 08:14:40 +08:00
Yifan Liu
d575edf79a hv: Change sched_event structure to resolve data race in event handling
Currently the sched event handling may encounter a data race problem, and
as a result some vcpus might be stalled forever.

One example is wbinvd handling, where more than one vcpu is doing
wbinvd concurrently. The following is a possible execution of 3 vcpus:
-------
0                            1                           2
                             req [Note: 0]
                             req bit0 set [Note: 1]
                             IPI -> 0
                             req bit2 set
                             IPI -> 2
                                                         VMExit
                                                         req bit2 cleared
                                                         wait
                                                         vcpu2 descheduled

VMExit
req bit0 cleared
wait
vcpu0 descheduled
                             signal 0
                             event0->set=true
                             wake 0

                             signal 2
                             event2->set=true [Note: 3]
                             wake 2
                                                         vcpu2 scheduled
                                                         event2->set=false
                                                         resume

                                                         req
                                                         req bit0 set
                                                         IPI -> 0
                                                         req bit1 set
                                                         IPI -> 1
                             (doesn't matter)
vcpu0 scheduled [Note: 4]
                                                         signal 0
                                                         event0->set=true
                                                         (no wake) [Note: 2]
event0->set=false                                        (the rest doesn't matter)
resume

Any VMExit
req bit0 cleared
wait
idle running

(blocked forever)

Notes:
0: req: vcpu_make_request(vcpu, ACRN_REQUEST_WAIT_WBINVD).
1: req bit: Bit in pending_req_bits. Bit0 stands for bit for vcpu0.
2: In function signal_event, At this time the event->waiting_thread
    is not NULL, so wake_thread will not execute
3: eventX: struct sched_event of vcpuX.
4: In function wait_event, the lock does not strictly cover the execution between
    schedule() and event->set=false, so other threads may kick in.
-----

As shown in the example above, before the last random VMExit, vcpu0 ended up
with its request bit set but event->set == false, so it blocked forever.

This patch proposes changing event->set from a boolean variable to an
integer. The semantics are very similar to a semaphore: wait_event
adds 1 to this value and blocks when the value is > 0, whereas signal_event
decreases this value by 1.

It may happen that this value is decreased to a negative number, but that
is OK. As long as wait_event and signal_event are paired and
program order is observed (that is, wait_event always happens-before signal_event
on a single vcpu), this value will eventually be 0.

Tracked-On: #6405
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-20 08:11:40 +08:00
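The counter semantics can be modeled in isolation as below; this is a simplified standalone model (spinning instead of sleeping, C11 atomics instead of the hypervisor's lock and scheduler), not the actual ACRN implementation:

#include <stdatomic.h>

/*
 * event->set becomes a counter: wait adds 1 and blocks while the value is
 * positive; signal subtracts 1. A signal that arrives before the wait drives
 * the counter negative, so the later wait sees a non-positive value and does
 * not block - the wakeup can no longer be lost as with a boolean flag.
 */
struct sched_event { atomic_int nqueued; };

static void wait_event(struct sched_event *ev)
{
    int pending = atomic_fetch_add(&ev->nqueued, 1) + 1;

    while (pending > 0) {
        /* in the hypervisor this is "sleep and reschedule", not a spin */
        pending = atomic_load(&ev->nqueued);
    }
}

static void signal_event(struct sched_event *ev)
{
    atomic_fetch_sub(&ev->nqueued, 1);
    /* wake the waiting thread if there is one (omitted in this model) */
}
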
Zhou, Wu
b394777908 HV: Add implements of 32bit and 64bit elf loader
This is a simple implementation of the 32-bit and 64-bit ELF loaders.

The loading function first reads the image header and finds the program
entries that are marked as PT_LOAD, then loads the segments from the ELF file
into guest RAM. After that, it finds the .bss section in the ELF section
entries and clears the RAM area it points to.

Limitations:
1. The e_type of the ELF image must be ET_EXEC (executable). Relocatable or
   dynamic code is not supported.
2. The loader only copies program segments that have a p_type of
   PT_LOAD (loadable segment). Other segments are ignored.
3. The loader doesn't support sections that are relocatable
   (sh_type is SHT_REL or SHT_RELA).
4. The 64-bit ELF's entry address must be below 4G.
5. The ELF is assumed to place its segments in valid guest memory.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
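A compact sketch of the PT_LOAD walk described above, using trimmed-down ELF32 structures; guest_hva() stands in for the real GPA-to-HVA translation, and the zero-fill here uses the p_memsz/p_filesz convention rather than the section-table .bss lookup the patch performs:

#include <stdint.h>
#include <string.h>

struct elf32_ehdr {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};

struct elf32_phdr {
    uint32_t p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align;
};

#define PT_LOAD 1U

/* Stand-in for the hypervisor's GPA-to-HVA translation. */
static void *guest_hva(uint32_t gpa) { return (void *)(uintptr_t)gpa; }

/* Copy every PT_LOAD segment into guest memory and return the entry point. */
static uint32_t load_elf32(const uint8_t *img)
{
    const struct elf32_ehdr *ehdr = (const struct elf32_ehdr *)img;

    for (uint16_t i = 0U; i < ehdr->e_phnum; i++) {
        const struct elf32_phdr *phdr = (const struct elf32_phdr *)
            (img + ehdr->e_phoff + (uint32_t)i * ehdr->e_phentsize);

        if (phdr->p_type == PT_LOAD) {
            uint8_t *dst = guest_hva(phdr->p_paddr);

            memcpy(dst, img + phdr->p_offset, phdr->p_filesz);
            /* zero the tail (e.g. .bss) that has no file backing */
            memset(dst + phdr->p_filesz, 0, phdr->p_memsz - phdr->p_filesz);
        }
    }
    return ehdr->e_entry;
}
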
Zhou, Wu
c2468d2791 HV: Add elf loader sketch
This patch adds a function elf_loader() to load elf image.
It checks the ELF header, gets its 32/64-bit type, then calls
the corresponding loading routines, which are empty for now and
will be implemented later.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu
537f69dde9 HV: Add elf header file for elf loader
Source: https://github.com/freebsd/freebsd-src/blob/main/sys/sys/elf_common.h
Trimmed to meet the minimal requirements for loading the Zephyr ELF file.
Also added the ELF file header data struct and program/section entry data structs.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu
8100b1dd56 HV: Remove 'vm_' of vm_elf_loader and etc.
In order to make better sense, vm_elf_loader, vm_bzimage_loader and
vm_rawimage_loader are renamed to elf_loader, bzimage_loader and
rawimage_loader.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu
53f6720d13 HV: Combine the ACPI loading function into one place
Remove the ACPI loading function from elf_loader, rawimage_loader and
bzimage_loader, and call it in one place in vm_sw_loader.

Now vm_sw_loader's job is not just loading SW, so we rename it to
prepare_os_image.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Zhou, Wu
e78aacbe55 HV: Correct some naming issues
For the guest OS loaders, the prepare_loading_xxx names are not accurate for
what those functions actually do. They are now changed to load_xxx:
load_rawimage, load_bzimage.

Also, the 'bsp' wording was confusing in the comments for
init_vcpu_protect_mode_regs; it has been reworded.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun
3124018917 HV: vm_load: rename vboot_info.h to vboot.h
vboot_info.h also declares VM loader functions, so rename the file to
vboot.h.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun
9b632c0e4b HV: vm_load: split vm_load.c to support diff kernel format
The patch splits vm_load.c into three parts: the loader function of the bzImage
kernel is moved to bzimage_loader.c, the loader function of the raw image kernel
is moved to rawimage_loader.c, and the stub stays in vm_load.c to dispatch to
the corresponding kernel loader function. Each loader function can be
isolated by the CONFIG_GUEST_KERNEL_XXX macro generated by the config tool.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun
2524572fb2 HV: vm_load: refine vm_sw_loader API
Change the if condition to a switch in vm_sw_loader() so that each SW loader
can be compiled conditionally.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Yang,Yu-chu
73dc610d90 config-tool: refine guest kernel types
Rename KERNEL_ZEPHYR to KERNEL_RAWIMAGE and add a new type, "KERNEL_ELF".

Add CONFIG_GUEST_KERNEL_RAWIMAGE, CONFIG_GUEST_KERNEL_ELF and/or
CONFIG_GUEST_KERNEL_BZIMAGE to config.h if it is configured.

Tracked-On: #6323

Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun
178b3e85e3 HV: vm_load: change kernel type for zephyr image
Previously we only supported loading the raw format of the Zephyr image for the
pre-launched Zephyr VM; this caused a guest F-segment override issue because the
Zephyr raw image covers memory space from 0x1000 up past 0x100000. To fix this
issue, we should support ELF format image loading so that the multiple
segments from the ELF image can be parsed and loaded directly.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Fei Li
2e7491a8ec hv: mmiodev: a minor bug fix in the refined acrn_mmiodev data structure
Rename base_hpa to host_pa in the acrn_mmiodev data structure.

Tracked-On: #6366
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-19 12:01:35 +08:00
Liu,Junming
2c5c8754de hv: enable GVT-d for pre-launched Linux guest in logical partition mode
When passing the GPU through to a pre-launched Linux guest,
the GPU OpRegion needs to be passed to the guest.
Here are the detailed steps:
1. reserve a memory region in the ve820 table for the GPU OpRegion
2. build an EPT mapping for the GPU OpRegion to pass the OpRegion through to the guest
3. emulate the PCI config register for the OpRegion
For the third step, here is the detailed description:
The address of the OpRegion is located at PCI config space offset 0xFC.
A normal Linux guest won't write this register,
so we can regard this register as read-only.
When the guest reads this register, return the emulated value.
When the guest writes this register, ignore the operation.

Tracked-On: #6387

Signed-off-by: Liu,Junming <junming.liu@intel.com>
2021-08-19 11:56:26 +08:00
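A sketch of the read-only emulation of offset 0xFC for the pre-launched guest; struct igd_vdev and the handler names are illustrative only:

#include <stdint.h>

#define IGD_OPREGION_REG  0xFCU

struct igd_vdev { uint32_t opregion_gpa; };   /* GPA reserved in the ve820 table */

/* Reads of offset 0xFC return the emulated OpRegion GPA. */
static uint32_t igd_cfg_read(const struct igd_vdev *vdev, uint32_t offset)
{
    return (offset == IGD_OPREGION_REG) ? vdev->opregion_gpa : 0U;
}

/* Writes to offset 0xFC are dropped: the register is treated as read-only. */
static void igd_cfg_write(struct igd_vdev *vdev, uint32_t offset, uint32_t val)
{
    (void)vdev; (void)val;
    if (offset == IGD_OPREGION_REG) {
        return;
    }
    /* other offsets would go through the normal vPCI write path (omitted) */
}
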
Jian Jun Chen
dc77ef9e52 hv: ivshmem: map SHM BAR with PAT ignored
ACRN does not support the variable-range vMTRR. The default
memory type of the vMTRR is UC. With this vMTRR emulation, a guest VM
such as Linux refuses to map the MMIO address space as WB. In
order to get better performance, the SHM BAR of ivshmem is mapped
with PAT ignored and the memory type of the SHM BAR is fixed to WB.

Tracked-On: #6389
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-13 11:17:15 +08:00
Yang,Yu-chu
d997f4bbc1 config-tools: refine bin_gen.py and create virtual TPM2 acpi table
Create the virtual ACPI table of TPM2 based on the raw data if the TPM2
device is present and TPM2 passthrough is enabled.

Refine the arguments of bin_gen.py. The --board and --scenario options take the
paths to the XMLs as arguments. The allocation.xml is needed for
bin_gen.py to generate the TPM2 ACPI table.

Refine the condition of tpm2_acpi_gen. The TPM2 device "MSFT0101" can be
present in the device id or the compatible_id (CID). Check both attributes and
the child node of the TPM2 device.

Tracked-On: #6320
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li
a705ff2dac hv: relocate ACPI DATA address to 0x7fe00000
Relocate ACPI address to 0x7fe00000 and ACPI NVS to 0x7ff00000 correspondingly.
In this case, we could include TPM event log region [0x7ffb0000, 0x80000000)
into ACPI NVS.

Tracked-On: #6320
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li
74e68e39d1 hv: tpm2: do tpm2 fixup for security vm
ACRN used to prepare the vTPM2 ACPI table for a pre-launched VM at the build stage
using the config tools. This is OK if the TPM2 ACPI table never changes. However,
the TPM2 ACPI table may change in some conditions: a BIOS configuration change or
a BIOS update.

This patch does a TPM2 fixup to update the vTPM2 ACPI table and the TPM2 MMIO resource
configuration according to the physical TPM2 ACPI table.

Tracked-On: #6366
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li
f81b39225c HV: refine acrn_mmiodev data structure
1. add a name field to indicate what the MMIO device is.
2. add two more MMIO resources to the acrn_mmiodev data structure.

Tracked-On: #6366
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-11 14:45:55 +08:00
Fei Li
20061b7c39 hv: remove xsave dependence
ACRN can run without the XSAVE capability, so remove the XSAVE dependency to support
more (hardware or virtual) platforms.

Tracked-On: #6287
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-10 16:36:15 +08:00
Fei Li
84235bf07c hv: vtd: a minor refine about dmar_wait_completion
Check whether the condition is met before checking whether time is out after iommu_read32.
This is because iommu_read32 can cause a timeout on some virtual platforms even
though the current DMAR status already meets the precondition.

Tracked-On: #6371
Signed-off-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-10 16:36:15 +08:00
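The reordered polling loop can be sketched as follows (iommu_read32 and cpu_ticks are stubs here; the real register offsets and timeout handling differ):

#include <stdbool.h>
#include <stdint.h>

static uint32_t iommu_read32(uint32_t reg) { (void)reg; return 0U; }
static uint64_t cpu_ticks(void)            { return 0UL; }

/*
 * Test the completion condition right after the (possibly slow) register
 * read and before the timeout test, so a satisfied condition can never be
 * reported as a timeout.
 */
static bool dmar_wait_completion(uint32_t reg, uint32_t mask, uint32_t expected,
                                 uint64_t timeout_ticks)
{
    uint64_t start = cpu_ticks();

    while (1) {
        uint32_t status = iommu_read32(reg);

        if ((status & mask) == expected) {
            return true;                           /* condition met first */
        }
        if ((cpu_ticks() - start) > timeout_ticks) {
            return false;                          /* only then give up   */
        }
    }
}
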
Tao Yuhong
171856c46b hv: uc-lock: Fix do not trap #GP
If the HV enables triggering #GP for uc-lock and is about to emulate guest uc-lock
instructions, it should trap the guest #GP. A guest uc-lock instruction triggers #GP
and causes a VM exit for #GP; the HV handles this VM exit and emulates the uc-lock
instruction.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-08-09 15:33:12 +08:00
liu hang1
d07bd78b13 Makefile:add targz-pkg entry in Makefile
Users can run the make targz-pkg command to generate a tar package in the
build directory, which helps simplify the process
of installing the ACRN hypervisor on the target board. The user needs to copy the
tarball to the target board and extract it to the "/" directory.

Tracked-On: #6355
Signed-off-by: liu hang1 <hang1.liu@intel.com>
Reviewed-by: VanCutsem, Geoffroy <geoffroy.vancutsem@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2021-08-09 11:52:27 +08:00
Kunhui-Li
578c18b962 config_tools: remove obsolete kconfig files
Remove obsolete Kconfig files;
Update Kconfig related README and error message.

Tracked-On: #6315
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
2021-08-09 09:25:02 +08:00
Victor Sun
4a53a23faa HV: debug: support 64bit BAR pci uart with 32bit space
Currently the HV console does not support a PCI UART with a 64-bit BAR, but in the
case that the BAR is 64-bit and the BAR space is below 4GB (i.e. the high
32 bits of the 64-bit BAR are zero), the HV should be able to support it.

Tracked-On: #6334

Signed-off-by: Victor Sun <victor.sun@intel.com>
2021-08-04 10:10:35 +08:00
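A sketch of the acceptance check: a 64-bit memory BAR (type bits 2:1 == 10b in the low dword) is usable for the console as long as its upper dword is zero. Names are illustrative:

#include <stdbool.h>
#include <stdint.h>

#define PCIM_BAR_MEM_TYPE_MASK  0x6U
#define PCIM_BAR_MEM_64         0x4U

static bool pci_uart_bar_supported(uint32_t bar_lo, uint32_t bar_hi)
{
    bool is_64bit = ((bar_lo & PCIM_BAR_MEM_TYPE_MASK) == PCIM_BAR_MEM_64);

    /* 32-bit BARs are always fine; 64-bit BARs only if they sit below 4GB. */
    return !is_64bit || (bar_hi == 0U);
}
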
Victor Sun
2fbc4c26e6 HV: vm_load: remove kernel_load_addr in sw_kernel_info struct
When the guest kernel has multiple load segments, as an ELF format image does,
defining just one load address in the sw_kernel_info struct is meaningless.

The patch removes the kernel_load_addr member from struct sw_kernel_info; the load
address should be parsed when processing each specific image format.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
d1d59437ea HV: vm_load: correct needed size of bzImage kernel
The previous code did not load the bzImage starting from its protected-mode part,
which left the protected-mode part unaligned with the kernel_alignment field and
caused kernel decompression to start from a later, aligned address. In that case we
had to enlarge the needed size of the bzImage kernel to kernel_init_size plus twice
the kernel_alignment.

With the loading issue of the bzImage protected-mode part fixed, the needed kernel
size is corrected in this patch.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
2b5bd2e87a HV: vm_load: load protected mode code only for bzImage
When LaaG boots with a bzImage module file, only the protected-mode code needs
to be loaded into guest space, since the VM will boot from protected mode
directly. Furthermore, per the Linux boot protocol, the protected-mode code
had better be aligned with the kernel_alignment field in the zeropage; otherwise
the kernel will take time to do "rep movs" to the aligned address.

In the previous code, the bzImage was loaded to an address aligned with
kernel_alignment, which made the protected-mode code unaligned with
kernel_alignment. If the kernel is configured with CONFIG_RELOCATABLE=n,
the guest would not boot. This patch fixes the issue.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
9caff7360f HV: vm_load: set kernel load addr in vm_load.c
This patch moves get_bzimage_kernel_load_addr() from init_vm_sw_load() to
the vm_sw_loader() stage, so the kernel load address of a bzImage-type kernel
is set in vm_bzimage_loader() in vm_load.c.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
e40a258102 HV: vm_load: set ramdisk load addr in vm_load.c
This patch moves the get_initrd_load_addr() API from init_vm_sw_load() to
the vm_sw_loader() stage. The patch assumes that the kernel image has already
been loaded into guest space.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
afe24731a7 HV: vm_load: remove load_sw_modules() api
In the load_sw_modules() implementation, we always assumed that the guest kernel
module has one load address and that the whole kernel image is loaded
into guest space from that address. This is not true when the guest kernel
has multiple load addresses, as an ELF format kernel image does.

This patch removes the load_sw_modules() API; the loading method for each
kernel image format can be specified in its prepare_loading_xxximage() API.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
2938eca363 HV: vm_load: refine api of get_bzimage_kernel_load_addr()
As the previous commit said, setting the kernel load address should be moved
from init_vm_sw_load() to the vm_sw_loader() stage. This patch refines
the API of get_bzimage_kernel_load_addr() in init_vm_kernel_info()
for later use.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Victor Sun
33d226bf58 HV: vm_load: refine api of get_initrd_load_addr()
Currently the guest kernel load address and ramdisk load address are
initialized during the init_vm_sw_load() stage; this is meaningless when the
guest kernel has multiple segments with different load addresses.
In that case, the kernel load addresses should be parsed and loaded
in the vm_sw_loader() stage, and the ramdisk load address should also be set
in that stage because it depends on the kernel load address.

This patch refines the API of get_initrd_load_addr(), which will set the
proper initrd load address of a bzImage-type kernel for later use.

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-08-03 13:44:51 +08:00
Fei Li
bc5c3a0bb7 hv: vpci: modify Interrupt Line Register as writable
According to the PCIe spec, for RW register bits, if the optional feature
that is associated with the bits is not implemented, the bits are permitted
to be hardwired to 0b. However, Zephyr uses the INTx Line Register as writable
even when the PCI device has no INTx, so emulate the INTx Line Register as writable.

Tracked-On: #6330
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-03 11:01:24 +08:00
Fei Li
481e9fba9d hv: remove the constraint "MMU and EPT must both support large page or not"
There are some virtual platforms which don't meet this constraint, so remove
the constraint.

Tracked-On: #6329
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-03 11:01:24 +08:00
Minggui Cao
80ae3224d9 hv: expose PMC to core partition VM
For a core-partition VM (like an RTVM), the PMC is always used for performance
profiling / tuning, so expose the PMC capability and pass through its MSRs
to the VM.

Tracked-On: #6307
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-27 14:58:28 +08:00
Minggui Cao
eba8c4e78b hv: use ARRAY_SIZE to calc local array size
If an array is only used locally and its size is not needed externally,
use the ARRAY_SIZE macro to calculate its size.

Tracked-On: #6307
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2021-07-27 14:58:28 +08:00
Yifan Liu
69fef2e685 hv: debug: Add hv console callback to VM-exit event
In some scenarios (e.g., nested) where lapic-pt is enabled for a vcpu
running on the pcpu hosting the console timer, the hv console becomes
inaccessible.

This patch adds the console callback to every VM-exit event so that the
console can still be somewhat functional under such circumstances.

Since this is VM-exit driven, the VM exits per second can be low in certain
cases (e.g., idle or running a stress workload). In extreme cases where
the guest panics/hangs, there will be no VM exits at all.

In most cases, the shell is laggy but functional (probably enough for
debugging purposes).

Tracked-On: #6312
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-07-22 10:08:23 +08:00
Tao Yuhong
8360c3dfe6 HV: enable #GP for UC lock
For an atomic operation that uses bus locking, a LOCK# bus
signal is generated if the operation has a non-WB memory operand. This is a
UC lock, and it ruins the RT behavior of the system.
If MSR_IA32_CORE_CAPABILITIES[bit 4] is 1, then the CPU can trigger #GP
for instructions which cause a UC lock. This feature is controlled by
MSR_TEST_CTL[bit 28].
This patch enables #GP for guest UC locks.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
2aba7f31db HV: rename splitlock file name
Because the emulation code is for both split-lock and uc-lock,
rename splitlock.c/splitlock.h to lock_instr_emul.c/lock_instr_emul.h

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
7926504011 HV: rename split-lock emulation APIs
Because the emulation code is for both split-lock and uc-lock, Changed
these API names:
vcpu_kick_splitlock_emulation() -> vcpu_kick_lock_instr_emulation()
vcpu_complete_splitlock_emulation() -> vcpu_complete_lock_instr_emulation()
emulate_splitlock() -> emulate_lock_instr()

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
bbd7b7091b HV: re-use split-lock emulation code for uc-lock
Split-lock emulation can be re-used for uc-lock. Currently emulate_splitlock()
only works if the VM exit is for an #AC trap, the guest does not handle
split-lock, and the HV enables #AC for split-lock.
Add another condition so that emulate_splitlock() also works when the VM exit
is for a #GP trap, the guest does not handle uc-lock, and the HV enables #GP
for uc-lock.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-07-21 11:25:47 +08:00
Tao Yuhong
553d59644b HV: Fix decode_instruction() trigger #UD for emulating UC-lock
When ACRN uses decode_instruction to emulate a split-lock/uc-lock
instruction, it is actually a try-decode to see if the instruction is XCHG.
If it is an XCHG instruction, ACRN must emulate it
(injecting #PF if it is triggered) with the peer vCPUs paused, and advance
the guest IP. If the instruction is a LOCK-prefixed instruction
accessing UC memory, ACRN halts the peer vCPUs,
advances the IP to skip the LOCK prefix, and then lets the vCPU
execute one instruction by enabling the IRQ-window VM exit. For
other cases, ACRN injects the exception back to the vCPU without
emulating it.
So change the API to decode_instruction(vcpu, bool full_decode):
when full_decode is true, the API does the same thing as before. When
full_decode is false, the difference is that if decode_instruction() meets an
unknown instruction, it keeps the return value of -1 and does not inject #UD.
We can use this to distinguish that a #UD has been skipped and that #AC/#GP
needs to be injected back.

Tracked-On: #6299
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-07-21 11:25:47 +08:00
Yonghua Huang
48908d522f hv: minor coding style fix in list.h
Add brackets around '(char *)(ptr)' in the container_of() macro,
which may be used recursively.

Tracked-On: #6284
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-07-15 14:18:13 +08:00
Shuo A Liu
4896faaebb hv: dm: Remove aligned attribute of common structures
Common structures are used by the DM, kernel, and HV. The aligned attribute might
cause a structure size mismatch between DM/HV and the kernel, as the kernel uses
the default GCC alignment.

So, make DM/HV also use the default GCC alignment.

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
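The kind of mismatch being avoided can be demonstrated with a toy structure; io_request_aligned/io_request_default are made-up names, not the actual shared ACRN structures:

#include <stdio.h>
#include <stdint.h>

/* Forcing alignment pads the struct: sizeof() becomes 8 ... */
struct io_request_aligned {
    uint32_t reason;
} __attribute__((aligned(8)));

/* ... while the kernel's copy with default GCC alignment stays at 4 bytes. */
struct io_request_default {
    uint32_t reason;
};

int main(void)
{
    printf("aligned: %zu, default: %zu\n",
           sizeof(struct io_request_aligned), sizeof(struct io_request_default));
    return 0;
}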