acrn-hypervisor

mirror of https://github.com/projectacrn/acrn-hypervisor.git synced 2025-07-06 12:06:25 +00:00

Author	SHA1	Message	Date
Jie Deng	47e193a7bb	hv: Add split-lock emulation for LOCK prefix instruction This patch adds the split-lock emulation. If a #AC is caused by instruction with LOCK prefix then emulate it, otherwise, inject it back as it used to be. 1. Kick other vcpus of the guest to stop execution and set the TF flag to have #DB if the guest has more than one vcpu. 2. Skip over the LOCK prefix and resume the current vcpu back to guest for execution. 3. Notify other vcpus to restart exception at the end of handling the #DB since we have completed the LOCK prefix instruction emulation. Tracked-On: #5605 Signed-off-by: Jie Deng <jie.deng@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-12-31 11:12:33 +08:00
Yonghua Huang	643bbcfe34	hv: check the availability of guest CR4 features Check hardware support for all features in CR4, and hide bits from guest by vcpuid if they're not supported for guests OS. Tracked-On: #5586 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-12-18 11:21:22 +08:00
Yonghua Huang	442fc30117	hv: refine virtualization flow for cr0 and cr4 - The current code to virtualize CR0/CR4 is not well designed, and hard to read. This patch reshuffle the logic to make it clear and classify those bits into PASSTHRU, TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits. Tracked-On: #5586 Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2020-12-18 11:21:22 +08:00
Yonghua Huang	08c42f91c9	hv: rename hypercall for hv-emulated device management Coding style cleanup, use add/remove instead of create/destroy. Tracked-On: #5586 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2020-12-07 16:25:17 +08:00
Shiqing Gao	6f10bd00bf	hv: coding style clean-up related to Boolean While following two styles are both correct, the 2nd one is simpler. bool is_level_triggered; 1. if (is_level_triggered == true) {...} 2. if (is_level_triggered) {...} This patch cleans up the style in hypervisor. Tracked-On: #861 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>	2020-11-28 14:51:32 +08:00
Junming Liu	1cd932e568	hv: refine code style refine code style Tracked-On: #4020 Signed-off-by: Junming Liu <junming.liu@intel.com>	2020-11-26 12:56:28 +08:00
Junming Liu	56eb859ea4	hv: vmexit: refine xsetbv_vmexit_handler API From SDM Vol.2C - XSETBV instruction description, If CR4.OSXSAVE[bit 18] = 0, execute "XSETBV" instruction will generate #UD exception. From SDM Vol.3C 25.1.1,#UD exception has priority over VM exits, So if vCPU execute "XSETBV" instruction when CR4.OSXSAVE[bit 18] = 0, VM exits won't happen. While hv inject #GP if vCPU execute "XSETBV" instruction when CR4.OSXSAVE[bit 18] = 0. It's a wrong behavior, this patch will fix the bug. Tracked-On: #4020 Signed-off-by: Junming Liu <junming.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-11-26 12:56:28 +08:00
Peter Fang	68dc8d9f8f	hv: pm: avoid duplicate shutdowns on RTVM It is possible for more than one vCPUs to trigger shutdown on an RTVM. We need to avoid entering VM_READY_TO_POWEROFF state again after the RTVM has been paused or shut down. Also, make sure an RTVM enters VM_READY_TO_POWEROFF state before it can be paused. v1 -> v2: - rename to poweroff_if_rt_vm for better clarity Tracked-On: #5411 Signed-off-by: Peter Fang <peter.fang@intel.com>	2020-11-11 14:05:39 +08:00
dongshen	ca5683f78d	hv: add support for shutdown for pre-launched VMs Currently, ACRN only support shutdown when triple fault happens, because ACRN doesn't present/emulate a virtual HW, i.e. port IO, to support shutdown. This patch emulate a virtual shutdown component, and the vACPI method for guest OS to use. Pre-launched VM uses ACPI reduced HW mode, intercept the virtual sleep control/status registers for pre-launched VMs shutdown Tracked-On: #5411 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2020-11-04 10:33:31 +08:00
dongshen	8f79ceefbd	hv: fix out-of-date comments related to pre-launched VMs rebooting Like post-launched VMs, for pre-launched VMs, the ACPI reset register is also fixed at 0xcf9 and the reset value is 0xE, so pre-launched VMs now also use ACPI reset register for rebooting. Tracked-On: #5411 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2020-11-04 10:33:31 +08:00
Peter Fang	70b1218952	hv: pm: support shutting down multiple VMs when pCPUs are shared More than one VM may request shutdown on the same pCPU before shutdown_vm_from_idle() is called in the idle thread when pCPUs are shared among VMs. Use a per-pCPU bitmap to store all the VMIDs requesting shutdown. v1 -> v2: - use vm_lock to avoid a race on shutdown Tracked-On: #5411 Signed-off-by: Peter Fang <peter.fang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-11-04 10:33:31 +08:00
Li Fei1	c6f9404f55	hv: psram: add kconfig to enable psram Add two pSRAM Kconfig options: one for whether to enable the pSRAM on the platform or not; another for, if the pSRAM is enabled on the platform, whether to enable the pSRAM in the pre-launched RTVM. If we enable the pSRAM on the platform, we should remove the pSRAM EPT mapping from the SOS to prevent it from flushing the pSRAM cache. Tracked-On: #5330 Signed-off-by: Qian Wang <qian1.wang@intel.com>	2020-11-02 15:56:30 +08:00
Qian Wang	99ee76781f	hv: pSRAM: add pSRAM support for pre-launched RTVM 1.Modified the virtual e820 table for pre-launched VM. We added a segment for pSRAM, and thus lowmem RAM is split into two parts. Logics are added to deal with the split. 2.Added EPT mapping of pSRAM segment for pre-launched RTVM if it uses pSRAM. Tracked-On: #5330 Signed-off-by: Qian Wang <qian1.wang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-11-02 15:56:30 +08:00
Qian Wang	a557105e71	hv: ept: set EPT cache attribute to WB for pSRAM pSRAM memory should be cacheable. However, it's not RAM or normal MMIO, so we can't use an existing API to do the EPT mapping and set the EPT cache attribute to WB for it. Now we assume that SOS must assign the PSRAM area as a whole and as a separate memory region whose base address is PSRAM_BASE_HPA. If the hpa of the EPT mapping region is equal to PSRAM_BASE_HPA, we think this EPT mapping is for pSRAM, and we change the EPT mapping cache attribute to WB. Also fix a minor bug when SOS traps out to emulate wbinvd when pSRAM is enabled. Tracked-On: #5330 Signed-off-by: Qian Wang <qian1.wang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-11-02 15:56:30 +08:00
Qian Wang	ca2aee225c	hv: skip pSRAM for guest WBINVD emulation Use ept_flush_leaf_page to emulate guest WBINVD when PTCM is enabled and skip the pSRAM in ept_flush_leaf_page. TODO: do we need to emulate WBINVD in HV side. Tracked-On: #5330 Signed-off-by: Qian Wang <qian1.wang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-11-02 10:29:43 +08:00
Qian Wang	77269c15c5	hv: vcr: remove wbinvd for CR0.CD emulation According to 11.5.1 Cache Control Registers and Bits, Intel SDM Vol 3, changing CR0.CD will not flush the cache to ensure memory coherency. So there is no need to call wbinvd to flush the cache in ACRN Hypervisor. That's what the guest should do. Tracked-On: #5330 Signed-off-by: Qian Wang <qian1.wang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-11-02 10:29:43 +08:00
Tao Yuhong	4120bd391a	HV: decouple legacy vuart interface from acrn_vuart layer support pci-vuart type, and refine: 1.Rename init_vuart() to init_legacy_vuarts(), only init PIO type. 2.Rename deinit_vuart() to deinit_legacy_vuarts(), only deinit PIO type. 3.Move io handler code out of setup_vuart(), into init_legacy_vuarts() 4.add init_pci_vuart(), deinit_pci_vuart, for one pci vuart vdev. and some change from requirement: 1.Increase MAX_VUART_NUM_PER_VM to 8. Tracked-On: #5394 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com> Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>	2020-10-30 20:41:34 +08:00
Yonghua Huang	3ea1ae1e11	hv: refine msi interrupt injection functions 1. refine the prototype of 'inject_msi_lapic_pt()' 2. rename below function: - rename 'vlapic_intr_msi()' to 'vlapic_inject_msi()' - rename 'inject_msi_lapic_pt()' to 'inject_msi_for_lapic_pt()' - rename 'inject_msi_lapic_virt()' to 'inject_msi_for_non_lapic_pt()' Tracked-On: #5407 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Li Fei <fei1.li@intel.com> Reviewed-by: Wang, Yu1 <yu1.wang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-10-26 08:44:13 +08:00
Yonghua Huang	012927d0bd	hv: move function 'inject_msi_lapic_pt()' to vlapic.c This function can be used by other modules instead of hypercall handling only, hence move it to vlapic.c Tracked-On: #5407 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Li, Fei <fei1.li@intel.com> Reviewed-by: Wang, Yu1 <yu1.wang@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-10-26 08:44:13 +08:00
Zide Chen	bebffb29fc	hv: remove de-privilege boot mode support and remove vboot wrappers Now ACRN supports direct boot mode, which could be SBL/ABL, or GRUB boot. Thus the vboot wrapper layer can be removed and the direct boot functions don't need to be wrapped in direct_boot.c: - remove call to init_vboot(), and call e820_alloc_memory() directly at the time when the trampoline buffer is actually needed. - Similarly, call CPU_IRQ_ENABLE() instead of the wrapper init_vboot_irq(). - remove get_ap_trampoline_buf(), since the existing function get_trampoline_start16_paddr() returns the exact same value. - merge init_general_vm_boot_info() into init_vm_boot_info(). - remove vm_sw_loader pointer, and call direct_boot_sw_loader() directly. - move get_rsdp_ptr() from vboot_wrapper.c to multiboot.c, and remove the wrapper over two boot modes. Tracked-On: #5197 Signed-off-by: Zide Chen <zide.chen@intel.com>	2020-10-21 15:09:26 +08:00
Victor Sun	c63899fc81	HV: correct hpa calculation for pre-launched VM The commit `da81a0041d` "HV: add e820 ACPI entry for pre-launched VM" introduced an issue: base_hpa and remaining_hpa_size are also calculated on the entry of the 32-bit PCI hole, which spans 0x80000000 to 0xffffffff; this is incorrect. Tracked-On: #5266 Signed-off-by: Victor Sun <victor.sun@intel.com>	2020-09-15 09:45:10 +08:00
Li Fei1	a2fd8c5a9d	pci: mcfg: limit device bus numbers which could access by ECAM Per PCI Firmware Specification Revision 3.0, 4.1.2. MCFG Table Description: Memory Mapped Enhanced Configuration Space Base Address Allocation Structure assign the Start Bus Number and the End Bus Number which could decoded by the Host Bridge. We should not access the PCI device which bus number outside of the range of [Start Bus Number, End Bus Number). For ACRN, we should: 1. Don't detect PCI device which bus number outside the range of [Start Bus Number, End Bus Number) of MCFG ACPI Table. 2. Only trap the ECAM MMIO size: [MMCFG_BASE_ADDRESS, MMCFG_BASE_ADDRESS + (End Bus Number - Start Bus Number + 1) * 0x100000) for SOS. Tracked-On: #5233 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-09-09 09:31:56 +08:00
Victor Sun	34547e1e19	HV: add acpi module support for pre-launched VM Previously we use a pre-defined structure as vACPI table for pre-launched VM, the structure is initialized by HV code. Now change the method to use a pre-loaded multiboot module instead. The module file will be generated by acrn-config tool and loaded to GPA 0x7ff00000, a hardcoded RSDP table at GPA 0x000f2400 will point to the XSDT table which at GPA 0x7ff00080; Tracked-On: #5266 Signed-off-by: Victor Sun <victor.sun@intel.com> Signed-off-by: Shuang Zheng <shuang.zheng@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-09-08 19:52:25 +08:00
Victor Sun	da81a0041d	HV: add e820 ACPI entry for pre-launched VM Previously the ACPI table was stored in F segment which might not be big enough for a customized ACPI table, hence reserve 1MB space in pre-launched VM e820 table to store the ACPI related data: 0x7ff00000 ~ 0x7ffeffff : ACPI Reclaim memory 0x7fff0000 ~ 0x7fffffff : ACPI NVS memory Tracked-On: #5266 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-09-08 19:52:25 +08:00
Nishioka, Toshiki	77fb21e98c	hv: add vgpio device model support When HV pass through the P2SB MMIO device to pre-launched VM, vgpio device model traps MMIO access to the GPIO registers within P2SB so that it can expose virtual IOAPIC pins to the VM in accordance with the programmed mappings between gsi and vgsi. Tracked-On: #5246 Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com> Reviewed-by: Junjie Mao <junjie.mao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-09-07 14:52:02 +08:00
Nishioka, Toshiki	ba99984f69	hv: add INTx mapping for pre-launched VMs Add the capability of forwarding specified physical IOAPIC interrupt lines to pre-launched VMs as virtual IOAPIC interrupts. This is for the sake of the certain MMIO pass-thru devices on EHL CRB which can support only INTx interrupts. Tracked-On: #5245 Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com> Reviewed-by: Junjie Mao <junjie.mao@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-09-07 14:52:02 +08:00
Yonghua Huang	c03623f3fb	hv[v2]: Remove deprecated term in vPIC submodule This patch cleanup below deprecated terms: 'master' -> 'primary' 'slave' -> 'secondary' v2 update: Refine comments. Tracked-On: #5249 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2020-09-01 09:30:08 +08:00
Yuan Liu	8a34cf03ca	hv: add new hypercalls to create and destroy an emulated device in hypervisor Add HC_CREATE_VDEV and HC_DESTROY_VDEV two hypercalls that are used to create and destroy an emulated device(PCI device or legacy device) in hypervisor v3: 1) change HC_CREATE_DEVICE and HC_DESTROY_DEVICE to HC_CREATE_VDEV and HC_DESTROY_VDEV 2) refine code style v4: 1) remove unnecessary parameter 2) add VM state check for HC_CREATE_VDEV and HC_DESTROY hypercalls Tracked-On: #4853 Reviewed-by: Wang, Yu1 <yu1.wang@intel.com> Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-08-28 16:53:12 +08:00
Mingqiang Chi	53b11d1048	refine hypercall -- use an array to fast locate the hypercall handler to replace switch case. -- uniform hypercall handler as below: int32_t (*handler)(sos_vm, target_vm, param1, param2) Tracked-On: #4958 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Reviewed-by: Eddie Dong <eddie.dong@intel.com>	2020-08-26 14:55:24 +08:00
Junming Liu	3631a85c3c	hv:cpu-caps:refine is_apl_platform func and clean up duplicated code Fix the bug for "is_apl_platform" func. "monitor_cap_buggy" is identical to "is_apl_platform", so remove it. On apl platform: 1) ACRN doesn't use monitor/mwait instructions 2) ACRN disable GPU IOMMU Tracked-On:#3675 Signed-off-by: Junming Liu <junming.liu@intel.com>	2020-08-14 10:08:50 +08:00
Mingqiang Chi	a67a85c70d	hv:refine vm & vcpu lock -- move vm_state_lock to other place in vm structure to avoid the memory waste because of the page-aligned. -- remove the memset from create_vm -- explicitly set max_emul_mmio_regions and vcpuid_entry_nr to 0 inside create_vm to avoid use without initialization. -- rename max_emul_mmio_regions to nr_emul_mmio_regions v1->v2: add deinit_emul_io in shutdown_vm Tracked-On: #4958 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-08-05 13:39:28 +08:00
Victor Sun	b9ad04d24d	HV: add cpu affinity info for SOS VM Previously the CPU affinity of SOS VM was initialized at runtime during the sanitize_vm_config() stage, following the policy that all physical CPUs except those occupied by pre-launched VMs belong to SOS_VM. Now change the process so that SOS CPU affinity is initialized at build time, with the assumption that its validity is guaranteed before runtime. Tracked-On: #5077 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-08-04 09:05:29 +08:00
Shuo A Liu	112f02851c	hv: Disable XSAVE-managed CET state of guest VM To hide CET feature from guest VM completely, the MSR IA32_MSR_XSS also needs to be intercepted because it comprises CET_U and CET_S feature bits of xsaves/xrstors operations. Mask these two bits in IA32_MSR_XSS writing. With IA32_MSR_XSS interception, member 'xss' of 'struct ext_context' can be removed because it is duplicated with the MSR store array 'vcpu->arch.guest_msrs[]'. Tracked-On: #5074 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2020-07-23 20:15:57 +08:00
Shuo A Liu	ac598b0856	hv: Hide CET feature from guest VM Return-oriented programming (ROP), and similarly CALL/JMP-oriented programming (COP/JOP), have been the prevalent attack methodologies for stealth exploit writers targeting vulnerabilities in programs. CET (Control-flow Enforcement Technology) provides the following capabilities to defend against ROP/COP/JOP style control-flow subversion attacks: * Shadow stack: Return address protection to defend against ROP. * Indirect branch tracking: Free branch protection to defend against COP/JOP The full support of CET for Linux kernel has not been merged yet. As the first stage, hide CET from guest VM. Tracked-On: #5074 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>	2020-07-23 20:15:57 +08:00
Li Fei1	5e605e0daf	hv: vmcall: check vm id in dispatch_sos_hypercall Check whether vm_id is valid in dispatch_sos_hypercall Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-07-23 20:13:20 +08:00
Li Fei1	acc69007e2	hv: mmio_dev: add mmio device pass through support Add mmio device pass through support for pre-launched VM. When we pass through a MMIO device to pre-launched VM, we would remove its resource from the SOS. Now these resources only include the MMIO regions. Tracked-On: #5053 Acked-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-07-23 20:13:20 +08:00
Li Fei1	baf77a79ad	hv: mmio_dev: add hypercall to support mmio device pass through Add two hypercalls to support MMIO device pass through for post-launched VM. And when we support MMIO pass through for pre-launched VM, we could re-use the code in mmio_dev.c Tracked-On: #5053 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-07-23 20:13:20 +08:00
Conghui Chen	821c65b40c	hv: fix possible SSE region mismatch issue During context switch in hypervisor, xsave/xrstor are used to save/restore the XSAVE area according to the XCR0 and XSS. The legacy region in the XSAVE area includes FPU and SSE; we should make sure the legacy region is saved during context switch. FPU in XCR0 is always enabled according to SDM. For SSE, we enable it in XCR0 during context switch. Tracked-On: #5062 Signed-off-by: Conghui Chen <conghui.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-07-22 14:19:21 +08:00
Conghui Chen	53d4a7169b	hv: remove kick_thread from scheduler module kick_thread function is only used by kick_vcpu to kick vcpu out of non-root mode, the implementation in it is sending IPI to target CPU if target obj is running and target PCPU is not current one; while for runnable obj, it will just make reschedule request. So the kick_thread is not actually belong to scheduler module, we can drop it and just do the cpu notification in kick_vcpu. Tracked-On: #5057 Signed-off-by: Conghui Chen <conghui.chen@intel.com> Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-07-22 13:38:41 +08:00
Conghui Chen	b6422f8985	hv: remove 'running' from vcpu structure vcpu->running is duplicated with THREAD_STS_RUNNING status of thread object. Introduce an API sleep_thread_sync(), which can utilize the inner status of thread object, to do the sync sleep for zombie_vcpu(). Tracked-On: #5057 Signed-off-by: Conghui Chen <conghui.chen@intel.com> Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-07-22 13:38:41 +08:00
Mingqiang Chi	aa89eb3541	hv:add per-vm lock for vm & vcpu state change -- replace global hypercall lock with per-vm lock -- add spinlock protection for vm & vcpu state change v1-->v2: change get_vm_lock/put_vm_lock parameter from vm_id to vm move lock obtain before vm state check move all lock from vmcall.c to hypercall.c Tracked-On: #4958 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-07-20 11:22:17 +08:00
Li Fei1	80c7da8f1c	hv: vioapic: expose ioapic to guest unconditionally Some OSes assume the platform must have the IOAPIC. For example: Linux Kernel allocates IRQ force from GSI (0 if there's no PIC and IOAPIC) on x86. And it thinks IRQ 0 is an architecture special IRQ, not for device driver. As a result, the device driver may go wrong if the allocated IRQ is 0 for RTVM. This patch exposes vIOAPIC to RTVM with LAPIC passthru even though the RTVM can't use IOAPIC; it serves as a placeholder to fulfill the guest assumption. After vIOAPIC has been exposed to guest unconditionally, the 'ready' field could be removed since we do vIOAPIC initialization for each guest. Tracked-On: #4691 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-07-10 19:33:46 +08:00
Shuo A Liu	0397cb7174	hv: Fix the interrupts lost issue with PI support Currently, not all platforms support posted interrupt processing of both VT-x and VT-d. On EHL, VT-d doesn't support posted interrupt processing. So in such scenario, is_pi_capable() in vcpu_handle_pi_notification() will bypass the PIR pending bits check which might cause a self-NV-IPI lost. With commit "bf1ff8c98 (hv: Offload syncing PIR to vIRR to processor hardware)", the syncing PIR to vIRR is postponed and it is handled by a self-NV-IPI in the following VMEnter. The process looks like, a) vcpu A accepts a virtual interrupt -> 1) ACRN_REQUEST_EVENT is set 2) corresponding bit in PIR is set 3) Posted Interrupt ON bit is set b) vcpu A does virtual interrupt injection on resume path due to the pending ACRN_REQUEST_EVENT -> 1) hypervisor disables host interrupt 2) ACRN_REQUEST_EVENT is cleared 3) a self-NV-IPI is sent via ICR of LAPIC. 4) IRR bit of the self-NV-IPI is set c) (VM-ENTRY) vcpu A returns into non-root mode 1) host interrupt enable(by HW) 2) posted interrupt processing clears the ON bit, sync PIR to vIRR 3) deliver the virtual interrupt if guest rflags.IF=1 d) (VM-EXIT) vcpu A traps due to an instruction execution (e.g. HLT) 1) host interrupt disable(by HW) 2) hypervisor enable host interrupt Above illustrates a normal process of the virtual interrupt injection with cpu PI support. However, a failing case is observed. The failing case is that the self-NV-IPI from b-3 is not accepted by the core until a timing between d-1 and d-2. b-4 happening between d-1 and d-2 is observed by debug trace. So the self-NV-IPI will be handled in root-mode which cannot do the syncing PIR to vIRR processing. Due to the bug described in the first paragraph, vcpu_handle_pi_notification() cannot succeed the virtual interrupt injection request. This patch fixes it by removing the wrong check in vcpu_handle_pi_notification() because vcpu_handle_pi_notification() only happens on platform with cpu PI support. Here are some cost data for sending IPI via LAPIC ICR register. Normally, the cycles between ICR write and IRR got set is 140~260, which is not accurate due to the MSR read overhead. And from b-3 to c is about 560 cycles. So b-4 happens during this period. But in bad case, b-4 doesn't happen even c is triggered. The worst case I captured is that ICR write and IRR got set costs more than 1900 cycles. Now, the best GUESS of the huge cost of IPI via ICR is the APIC bus arbitration(refer to SDM 10.6.3, 10.7 and Figure 10-17). Tracked-On: #4937 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-06-28 10:33:22 +08:00
Li Fei1	da7c2ba3e9	hv: ept: wrap a function to do guest ept flush Wrap a function to do guest ept flush. This function doesn't do real EPT flush. It just make the EPT flush request and do the real flush just before vcpu vmenter. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-06-22 16:25:03 +08:00
Mingqiang Chi	1b84741a56	rename vm_lock/vlapic_state in VM structure rename: vlapic_state-->vlapic_mode vm_lock --> vlapic_mode_lock check_vm_vlapic_state --> check_vm_vlapic_mode Tracked-On: #4958 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>	2020-06-19 16:13:20 +08:00
Mingqiang Chi	d808031a04	remove spin lock for micro code update remove spin lock for micro code update since the guest operating system will take lock Tracked-On: #4958 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>	2020-06-19 16:13:20 +08:00
Conghui Chen	906284eec8	hv: remove unnecessary debug symbols remove unnecessary debug symbols. Tracked-On: #4956 Signed-off-by: Conghui Chen <conghui.chen@intel.com>	2020-06-18 14:05:56 +08:00
Conghui Chen	f4292752b0	hv: remove check for OSXSAVE in host We always assume the physical platform has XSAVE, and we always enable XSAVE at the beginning, so, no need to check the OSXSAVE in host. Tracked-On: #4956 Signed-off-by: Conghui Chen <conghui.chen@intel.com>	2020-06-18 14:05:56 +08:00
Conghui Chen	53f74f18ac	hv: remove repeated assignment remove repeated assignment for vmcs_pa. Tracked-On: #4956 Signed-off-by: Conghui Chen <conghui.chen@intel.com>	2020-06-18 14:05:56 +08:00
Binbin Wu	7bfcc673a6	hv: ptirq: associate an irte with ptirq_remapping_info entry For a ptirq_remapping_info entry, when building the IRTE: - If the caller provides a valid IRTE, use that IRTE - If the caller doesn't provide a valid IRTE, allocate an IRTE when the entry doesn't have a valid IRTE; in this case, the IRTE will be freed when the entry is freed. Tracked-On:#4831 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-06-16 08:52:56 +08:00
Binbin Wu	2fe4280cfa	hv: vtd: add two parameters for dmar_assign_irte idx_in: - If the caller of dmar_assign_irte passes a valid IRTE index, it will be reused; - If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index, the function will allocate a new IRTE. idx_out: This parameter returns the actual index of the IRTE used. The caller needs to check whether the return value is valid or not. Also this patch adds an internal function alloc_irte. The function takes count as input parameter to allocate continuous IRTEs. The count can only be 1, 2, 4, 8, 16 or 32. This is prepared for multiple MSI vector support. Tracked-On: #4831 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-06-16 08:52:56 +08:00
Li Fei1	ae4fa40adc	hv: vpci: refine pci device assignment logic Now Host Bridge and PCI Bridge could only be added to SOS's acrn_vm_pci_dev_config. So for UOS, we always emulate Host Bridge and PCI Bridge for it and assign PCI device to it; for SOS, if it's the highest severity VM, we will assign Host Bridge and PCI Bridge to it directly, otherwise, we will emulate them same as UOS. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-06-03 22:00:43 +08:00
Minggui Cao	8c090c71ca	HV: fix bug to clear guest flags after it not used In shutdown_vm, guest flags are used when handling the physical CPUs whose LAPIC is pass-through. So if they are cleared first, the related vCPUs and pCPUs cannot be switched to the correct state. So move the clear action after the flags are used. Tracked-On: #4848 Signed-off-by: Minggui Cao <minggui.cao@intel.com> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>	2020-05-27 11:35:47 +08:00
Shuo A Liu	9a15ea82ee	hv: pause all other vCPUs in same VM when do wbinvd emulation Invalidate cache by scanning and flushing the whole guest memory is inefficient which might cause long execution time for WBINVD emulation. A long execution in hypervisor might cause a vCPU stuck phenomenon what impact Windows Guest booting. This patch introduce a workaround method that pausing all other vCPUs in the same VM when do wbinvd emulation. Tracked-On: #4703 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-05-21 15:21:29 +08:00
Mingqiang Chi	f994b5ffaf	hv:cleanup vcpu state -- remove VCPU_PAUSED and resume_vcpu -- remove vcpu->prev_state in vcpu structure -- rename pause_vcpu to zombie_vcpu Tracked-On: #4320 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>	2020-05-21 15:08:49 +08:00
Li Fei1	53af096726	hv: ptirq: refine find_ptirq_entry by hashing Refine find_ptirq_entry by hashing instead of walk each of the PTIRQ entries one by one. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong<eddie.dong@Intel.com>	2020-05-20 16:04:16 +08:00
Yonghua Huang	3391bffb27	hv:fix rtvm hang with maxcpus=0/1 in bootargs RTVM (with lapic PT) hangs on boot when maxcpus is assigned a value less than the CPU number configured in hypervisor. In this case, vlapic_state (per VM) is left in TRANSITION state after BSP boot, which blocks interrupts from being injected to this UOS. Tracked-On: #4803 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Reviewed-by: Li, Fei <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-05-15 10:09:13 +08:00
Li Fei1	27a66acd0e	hv: ptdev: refine look up MSI ptirq entry There's no need to look up MSI ptirq entry by virtual SID any more since the MSI ptirq entry would be removed before the device is assigned to a VM. Now the logic of MSI interrupt remap could simplify as: 1. Add the MSI interrupt remap first; 2. If step is already done, just do the remap part. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong<eddie.dong@Intel.com> Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>	2020-05-13 14:31:01 +08:00
Zide Chen	0a956c34c7	hv: add a new field cpu_affinity in struct acrn_vm For post-launched VMs, the configured CPU affinity could be different from the actual running CPU affinity. This new field acrn_vm->cpu_affinity recognizes this difference so that it's possible that CREATE_VM hypercall won't overwrite the configured CPU affinity. Change name cpu_affinity_bitmap in acrn_vm_config to cpu_affinity. This is read-only in run time, never overwritten by acrn-dm. Remove vm_config->vcpu_num, which means the number of vCPUs of the configured CPU affinity. This is not to be confused with the actual running vCPU number: vm->hw.created_vcpus. Changed get_vm_bsp_pcpu_id() to get_configured_bsp_pcpu_id() for less confusion. Tracked-On: #4616 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-05-08 11:04:31 +08:00
Sainath Grandhi	bf1ff8c98f	hv: Offload syncing PIR to vIRR to processor hardware ACRN syncs PIR to vIRR in the software in cases the Posted Interrupt notification happens while the pCPU is in root mode. Sync can be achieved by processor hardware by sending a posted interrupt notification vector. This patch sends a self-IPI, if there are interrupts pending in PIR, which is serviced by the logical processor at the next VMEnter Tracked-On: #4777 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>	2020-05-08 10:01:07 +08:00
Li Fei1	0c6b3e57d6	hv: ptdev: minor refine about ptirq_build_physical_msi The virtual MSI information could be included in ptirq_remapping_info structure, there's no need to pass another input parameter for this purpose. So we could remove the ptirq_msi_info input. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-05-06 11:51:11 +08:00
Li Fei1	73335b7276	hv: ptirq: rename ptirq_lookup_entry_by_sid to find_ptirq_entry We look up PTIRQ entry only by SID. So _by_sid could be removed. And refine function name to verb-obj style. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-05-06 11:51:11 +08:00
Yin Fengwei	68269a559f	gpa2hva: add INVALID_HPA return value check For return value of local_gpa2hpa, either INVALID_HPA or NULL means the EPT walking failure. Current code only handles the NULL return and treats INVALID_HPA as a valid case. In some cases (if guest page table is filled with invalid memory address), it could crash ACRN from guest. Add INVALID_HPA return check as well. Also add @pre assumptions for some gpa2hpa usages. Tracked-On: #4730 Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>	2020-05-06 11:29:30 +08:00
Zide Chen	9150284ca7	hv: replace vcpu_affinity array with cpu_affinity_bitmap Currently the vcpu_affinity[] array fixes the vCPU to pCPU mapping. The new cpu_affinity_bitmap doesn't explicitly specify this mapping; instead, it implicitly assumes that vCPU0 maps to the pCPU with lowest pCPU ID, vCPU1 maps to the second lowest pCPU ID, and so on. This makes it possible for post-launched VM to run vCPUs on a subset of these pCPUs only, and not all of them. acrn-dm may launch post-launched VMs with the current approach: indicate VM UUID and hypervisor launches all VCPUs from the PCPUs that are masked in cpu_affinity_bitmap. Also acrn-dm can choose to launch the VM on a subset of PCPUs that is defined in cpu_affinity_bitmap. In this way, acrn-dm must specify the subset of PCPUs in the CREATE_VM hypercall. Additionally, with this change, a guest's vcpu_num can be easily calculated from cpu_affinity_bitmap, so don't assign vcpu_num in vm_configuration.c. Tracked-On: #4616 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-23 09:38:54 +08:00
Sainath Grandhi	60c4ec0c59	hv: Wake up vCPU for interrupts from vPIC Wake up vCPUs that are blocked upon interrupts from vPIC. Tracked-On: #4664 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>	2020-04-20 09:49:41 +08:00
Victor Sun	dfb947fe91	HV: fix wrong gpa start of hpa2 in ve820.c The current logic puts hpa2 above GPA 4G always, which is incorrect. Need to set gpa start of hpa2 right after hpa1 when hpa1 size is less than 2G. Tracked-On: #4458 Signed-off-by: Victor Sun <victor.sun@intel.com>	2020-04-17 14:08:54 +08:00
yuhong.tao@intel.com	7c80acee95	HV: emulate MSR_TEST_CTL If CPU has MSR_TEST_CTL, show an emulated one to VCPU. Tracked-On: #4496 Signed-off-by: Tao Yuhong <yuhong.tao@intel.com> Reviewed-by: Yan, Like <like.yan@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-17 09:53:59 +08:00
Mingqiang Chi	f90100e382	hv: add pre-condition for vcpu APIs remove unnecessary state check and add pre-condition for vcpu APIs. Tracked-On: #4320 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-16 21:59:03 +08:00
Jason Chen CJ	0584981c03	hv:add pre-condition for vm APIs check the vm state in hypercall api, add pre-condition for vm api. Tracked-On: #4320 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-16 21:59:03 +08:00
Mingqiang Chi	fe929d0a10	hv: move out pause_vm from shutdown_vm now it will call pause_vm in shutdown_vm, move it out from shutdown_vm to reduce coupling. Tracked-On: #4320 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-16 21:59:03 +08:00
Xiaoguang Wu	d4f789f47e	hv: iommu: remove snoop related code ACRN disables Snoop Control in VT-d DMAR engines for simplifying the implementation. Also, since the snoop behavior of PCIE transactions can be controlled by guest drivers, some devices may take the advantage of the NO_SNOOP_ATTRIBUTE of PCIE transactions for better performance when snoop is not needed. No matter ACRN enables or disables Snoop Control, the DMA operations of passthrough devices behave correctly from guests' point of view. This patch is used to clean all the snoop related code. Tracked-On: #4509 Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com> Reviewed-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2020-04-16 08:40:17 +08:00
Xiaoguang Wu	b4f1e5aa85	hv: iommu: disable snoop bit in EPT-PTE/SL-PTE Due to the fact that i915 iommu doesn't support snoop, hence it can't access memory when the SNOOP bit of Secondary Level page PTE (SL-PTE) is set, this will cause many undefined issues such as invisible cursor in WaaG etc. Current hv design uses EPT as Secondary Level Page for iommu, and this patch removes the codes of setting SNOOP bit in both EPT-PTE and SL-PTE to avoid errors. And according to SDM 28.2.2, the SNOOP bit (11th bit) will be ignored by EPT, so it will not affect the CPU address translation. Tracked-On: #4509 Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com> Reviewed-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2020-04-16 08:40:17 +08:00
Conghui Chen	84ad340898	hv: fix for waag 2 core reboot issue Waag will send NMIs to all its cores during reboot. But currently, NMI cannot be injected to vcpu which is in HLT state. To fix the problem, need to wakeup target vcpu, and inject NMI through interrupt-window. Tracked-On: #4620 Signed-off-by: Conghui Chen <conghui.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-15 14:42:00 +08:00
Binbin Wu	597f7658fc	hv: guest: fix bug in get_vcpu_paging_mode Align the implementation to SDM Vol.3 4.1.1. Also this patch fixed a bug that doesn't check paging status first in some cpu mode. Tracked-On: #4628 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-15 14:40:02 +08:00
Zide Chen	6040d8f6a2	hv: fix SOS vapic_id assignment issue Currently vlapic_build_id() uses vcpu_id to retrieve the lapic_id per_cpu variable: vlapic_id = per_cpu(lapic_id, vcpu->vcpu_id); SOS vcpu_id may not be equal to pcpu_id, and in that case it runs into problems. For example, if any pre-launched VMs are launched on PCPUs whose IDs are smaller than any PCPU IDs that are used by SOS. This patch fixes the issue and simplifies the code to create or get vapic_id by: - assign vapic_id in create_vlapic(), which now takes pcpu_id as input argument, and save it in the new field: vlapic->vapic_id, which will never be changed. - simplify vlapic_get_apicid() by returning the saved vapic_id directly. - remove vlapic_build_id(). - vlapic_init() is only called once, merge it into vlapic_create(). Tracked-On: #4268 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-15 14:34:15 +08:00
dongshen	00ad3863a1	hv: maintain a per-pCPU array of vCPUs and handle posted interrupt IRQs Maintain a per-pCPU array of vCPUs (struct acrn_vcpu *vcpu_array[CONFIG_MAX_VM_NUM]), one VM cannot have multiple vCPUs share one pcpu, so we can utilize this property and use the containing VM's vm_id as the index to the vCPU array: In create_vcpu(), we simply do: per_cpu(vcpu_array, pcpu_id)[vm->vm_id] = vcpu; In offline_vcpu(): per_cpu(vcpu_array, pcpuid_from_vcpu(vcpu))[vcpu->vm->vm_id] = NULL; so basically we use the containing VM's vm_id as the index to the vCPU array, as well as the index of posted interrupt IRQ/vector pair that are assigned to this vCPU: 0: first vCPU and first posted interrupt IRQs/vector pair (POSTED_INTR_IRQ/POSTED_INTR_VECTOR) ... CONFIG_MAX_VM_NUM-1: last vCPU and last posted interrupt IRQs/vector pair ((POSTED_INTR_IRQ + CONFIG_MAX_VM_NUM - 1U)/(POSTED_INTR_VECTOR + CONFIG_MAX_VM_NUM - 1U) In the posted interrupt handler, it will do the following: Translate the IRQ into a zero based index of where the vCPU is located in the vCPU list for current pCPU. Once the vCPU is found, we wake up the waiting thread and record this request as ACRN_REQUEST_EVENT Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com> Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>	2020-04-15 13:47:22 +08:00
dongshen	14fa9c563c	hv: define posted interrupt IRQs/vectors This is a preparation patch for adding support for VT-d PI related vCPU scheduling. ACRN does not support vCPU migration, one vCPU always runs on the same pCPU, so PI's ndst is never changed after startup. VCPUs of a VM won’t share same pCPU. So the maximum possible number of VCPUs that can run on a pCPU is CONFIG_MAX_VM_NUM. Allocate unique Activation Notification Vectors (ANV) for each vCPU that belongs to the same pCPU, the ANVs need only be unique within each pCPU, not across all vCPUs. This reduces # of pre-allocated ANVs for posted interrupts to CONFIG_MAX_VM_NUM, and enables ACRN to avoid switching between active and wake-up vector values in the posted interrupt descriptor on vCPU scheduling state changes. A total of CONFIG_MAX_VM_NUM consecutive IRQs/vectors are reserved for posted interrupts use. The code first initializes vcpu->arch.pid.control.bits.nv dynamically (will be added in subsequent patch), the other code shall use vcpu->arch.pid.control.bits.nv instead of the hard-coded notification vectors. Rename some functions: apicv_post_intr --> apicv_trigger_pi_anv posted_intr_notification --> handle_pi_notification setup_posted_intr_notification --> setup_pi_notification Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-04-15 13:47:22 +08:00
dongshen	f7be985a23	hv: check if the IRQ is intended for a single destination vCPU Given the vcpumask, check if the IRQ is single destination and return the destination vCPU if so, the address of associated PI descriptor for this vCPU can then be passed to dmar_assign_irte() to set up the posted interrupt IRTE for this device. For fixed mode interrupt delivery, all vCPUs listed in vcpumask should service the interrupt requested. But VT-d PI cannot support multicast/broadcast IRQs, it only supports single CPU destination. So the number of vCPUs shall be 1 in order to handle IRQ in posted mode for this device. Add pid_paddr to struct intr_source. If platform_caps.pi is true and the IRQ is single-destination, pass the physical address of the destination vCPU's PID to ptirq_build_physical_msi and dmar_assign_irte Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-04-15 13:47:22 +08:00
dongshen	6496da7c56	hv: add function to check if using posted interrupt is possible for vm Add platform_caps.c to maintain platform related information Set platform_caps.pi to true if all iommus are posted interrupt capable, false otherwise If lapic passthru is not configured and platform_caps.pi is true, the vm may be able to use posted interrupt for a ptdev, if the ptdev's IRQ is single-destination Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-04-15 13:47:22 +08:00
Jian Jun Chen	159c9ec759	hv: add lock for ept add/modify/del EPT table can be changed concurrently by more than one vcpus. This patch add a lock to protect the add/modify/delete operations from different vcpus concurrently. Tracked-On: #4253 Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com> Reviewed-by: Li, Fei1 <fei1.li@intel.com>	2020-04-13 11:38:55 +08:00
Li Fei1	74edf2e54b	hv: vmcs: remove vmcs field check for a vcpu The VMCS field is an embedded array for a vCPU. So there's no need to check for NULL before use. Tracked-On: #3813 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-04-09 09:40:26 +08:00
Li Fei1	366214e567	hv: virq: refine pending event inject sequence Inject pending exception prior pending interrupt to complete the previous instruction. Tracked-On: #1842 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-09 09:40:00 +08:00
Li Fei1	572f755037	hv: vm: refine the devices unregistration sequence of vm shutdown Conceptually, the devices unregistration sequence of the shutdown process should be opposite to create. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-04-08 10:13:37 +08:00
Yan, Like	2997c4b570	HV: CAT: support cache allocation for each vcpu This commit allows hypervisor to allocate cache to vcpu by assigning different clos to vcpus of a same VM. For example, we could allocate different cache to housekeeping core and real-time core of an RTVM in order to isolate the interference of housekeeping core via cache hierarchy. Tracked-On: #4566 Signed-off-by: Yan, Like <like.yan@intel.com> Reviewed-by: Chen, Zide <zide.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-02 13:55:35 +08:00
Sainath Grandhi	8ffe6fc67a	hv: Reserve space for VMs' EPT 4k pages after boot As ACRN prepares to support servers with large amounts of memory current logic to allocate space for 4K pages of EPT at compile time will increase the size of .bss section of ACRN binary. Bootloaders could run into a situation where they cannot find enough contiguous space to load ACRN binary under 4GB, which is typically heavily fragmented with E820 types Reserved, ACPI data, 32-bit PCI hole etc. This patch does the following 1) Works only for "direct" mode of vboot 2) reserves space for 4K pages of EPT, after boot by parsing platform E820 table, for all types of VMs. Size comparison: w/o patch Size of DRAM Size of .bss 48 GB 0xe1bbc98 (~226 MB) 128 GB 0x222abc98 (~548 MB) w/ patch Size of DRAM Size of .bss 48 GB 0x1991c98 (~26 MB) 128 GB 0x1a81c98 (~28 MB) Tracked-On: #4563 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-04-01 21:13:37 +08:00
Li Fei1	ea2616fbbf	hv: vlapic: minor fix about dereference vcpu from vlapic Since vcpu is removed from vlapic, we can no longer dereference it directly. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-03-31 15:59:52 +08:00
Li Fei1	2b7168da9e	hv: vmtrr: remove vcpu structure pointer from vmtrr We could use container_of to get vcpu structure pointer from vmtrr. So vcpu structure pointer is no need in vmtrr structure. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-03-31 10:57:47 +08:00
Li Fei1	a7768fdb6a	hv: vlapic: remove vcpu/vm structure pointer from vlapic We could use container_of to get vcpu/vm structure pointer from vlapic. So vcpu/vm structure pointer is no need in vlapic structure. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-03-31 10:57:47 +08:00
Li Fei1	7f342bf62f	hv: list: rename list_entry to container_of This function casts a member of a structure out to the containing structure. So rename to container_of is more readable. Tracked-On: #4550 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-03-31 10:57:47 +08:00
dongshen	1328dcb205	hv: extend union dmar_ir_entry to support VT-d posted interrupts Extend union dmar_ir_entry to support VT-d posted interrupts. Rename some fields of union dmar_ir_entry: entry --> value sw_bits --> avail Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-31 10:30:30 +08:00
dongshen	016c1a5073	hv: pass pointer to functions Pass intr_src and dmar_ir_entry irte as pointers to dmar_assign_irte(), which fixes the "Attempt to change parameter passed by value" MISRA C violation. A few coding style fixes Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-31 10:30:30 +08:00
dongshen	0f3c876a91	hv: extend struct pi_desc to support VT-d posted interrupts For CPU side posted interrupts, it only uses bit 0 (ON) of the PI's 64-bit control; other bits are don't-care. This is not the case for VT-d posted interrupts, define more bit fields for the PI's 64-bit control. Use bitmap functions to manipulate the bit fields atomically. Some MISRA-C violation and coding style fixes Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-31 10:30:30 +08:00
dongshen	8f732f2809	hv: move pi_desc related code from vlapic.h/vlapic.c to vmx.h/vmx.c/vcpu.h The posted interrupt descriptor is more of a vmx/vmcs concept than a vlapic concept. struct acrn_vcpu_arch stores the vmx/vmcs info, so put struct pi_desc in struct acrn_vcpu_arch. Remove the function apicv_get_pir_desc_paddr() A few coding style/typo fixes Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-31 10:30:30 +08:00
dongshen	b384d04ad1	hv: rename vlapic_pir_desc to pi_desc Rename struct vlapic_pir_desc to pi_desc Rename struct member and local variable pir_desc to pid pir=posted interrupt request, pi=posted interrupt pid=posted interrupt descriptor pir is part of pi descriptor, so it is better to use pi instead of pir struct pi_desc will be moved to vmx.h in subsequent commit. Tracked-On: #4506 Signed-off-by: dongshen <dongsheng.x.zhang@intel.com> Reviewed-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-31 10:30:30 +08:00
Zide Chen	eef3b51eda	hv: move error message logging into gpa copy APIs In this way, the code looks simpler and line of code is reduced. Tracked-On: #3854 Signed-off-by: Zide Chen <zide.chen@intel.com>	2020-03-30 13:19:01 +08:00
Li Fei1	4512ef7ec9	hv: cpuid: remove cpuid() The cpuid() can be replaced with cpuid_subleaf, which is more clear. Having both APIs makes reading difficult. Tracked-On: #4526 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-03-25 13:26:58 +08:00
Sainath Grandhi	fe5a108c7b	hv: vioapic init for SOS VM on platforms with multiple IO-APICs For SOS VM, when the target platform has multiple IO-APICs, there should be equal number of virtual IO-APICs. This patch adds support for emulating multiple vIOAPICs per VM. Tracked-On: #4151 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-25 09:36:18 +08:00
Sainath Grandhi	85217e362f	hv: Introduce Global System Interrupt (GSI) into INTx Remapping As ACRN prepares to support platforms with multiple IO-APICs, GSI is a better way to represent physical and virtual INTx interrupt source. 1) This patch replaces usage of "pin" with "gsi" wherever applicable across the modules. 2) PIC pin to gsi is trickier and needs to consider the usage of "Interrupt Source Override" structure in ACPI for the corresponding VM. Tracked-On: #4151 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-25 09:36:18 +08:00
Sainath Grandhi	dd6c80c305	hv: Move error checking for hypercall parameters out of assign module Moving checks on validity of IOAPIC interrupt remapping hypercall parameters to hypercall module Tracked-On: #4151 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-25 09:36:18 +08:00
Sainath Grandhi	06b59e0bc1	hv: Use ptirq_lookup_entry_by_sid to lookup virtual source id in IOAPIC irq entries Reverts `538ba08c`: hv:Add vpin to ptdev entry mapping for vpic/vioapic ACRN uses an array of size per VM to store ptirq entries against the vIOAPIC pin and an array of size per VM to store ptirq entries against the vPIC pin. This is done to speed up "ptirq entry" lookup at runtime for Level triggered interrupts in API ptirq_intx_ack used on EOI. This patch switches the lookup API for INTx interrupts to the API, ptirq_lookup_entry_by_sid This could add delay to processing EOI for Level triggered interrupts. Trade-off here is space saved for array/s of size CONFIG_MAX_IOAPIC_LINES with 8 bytes per data. On a server platform, ACRN needs to emulate multiple vIOAPICs for SOS VM, same as the number of physical IO-APICs. Thereby ACRN would need around 10 such arrays per VM. Removes the need of "pic_pin" except for the APIs facing the hypercalls hcall_set_ptdev_intr_info, hcall_reset_ptdev_intr_info Tracked-On: #4151 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Acked-by: Eddie Dong <eddie.dong@Intel.com>	2020-03-25 09:36:18 +08:00
Li Fei1	e5c7a96513	hv: vpci: sos could access low severity guest pci cfg space There're some cases the SOS (higher severity guest) needs to access the post-launched VM (lower severity guest) PCI CFG space: 1. The SR-IOV PF needs to reset the VF 2. Some pass through device still need DM to handle some quirk. In the case a device is assigned to a UOS and is not in a zombie state, the SOS is able to access, if and only if the SOS has higher severity than the UOS. Tracked-On: #4371 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-03-20 10:08:43 +08:00
Mingqiang Chi	14692ef60c	hv:Rename two VM states Rename: VM_STARTED --> VM_RUNNING VM_POWERING_OFF --> VM_READY_TO_POWEROFF Tracked-On: #4320 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-03-13 10:34:29 +08:00
Victor Sun	e74553492a	HV: move create_sos_vm_e820 to ve820.c ve820.c is a common file in arch/x86/guest/ now, so move function of create_sos_vm_e820() to this file to make code structure clear; Tracked-On: #4458 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-03-12 14:56:34 +08:00
Victor Sun	d7eac3fe6a	HV: decouple prelaunch VM ve820 from board configs hypervisor/arch/x86/configs/$(BOARD)/ve820.c is used to store pre-launched VM specific e820 entries according to memory configuration of customer. It should be a scenario based configurations but we had to put it in per board folder because of different board memory settings. This brings concerns to customer on configuration organization. Currently the file provides same e820 layout for all pre-launched VMs, but they should have different e820 when their memory are configured differently. Although we have acrn-config tool to generate ve820.c automatically, it is not friendly to modify hardcoded ve820 layout manually, so the patch changes the entries initialization method by calculating each entry item in C code. Tracked-On: #4458 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-03-12 14:56:34 +08:00
Sainath Grandhi	460e7ee5b1	hv: Variable/macro renaming for intr handling of PT devices using IO-APIC/PIC 1. Renames DEFINE_IOAPIC_SID with DEFINE_INTX_SID as the virtual source can be IOAPIC or PIC 2. Rename the src member of source_id.intx_id to ctlr to indicate interrupt controller 3. Changes the type of src member of source_id.intx_id from uint32_t to enum with INTX_CTLR_IOAPIC and INTX_CTLR_PIC Tracked-On: #4447 Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>	2020-03-06 11:29:02 +08:00
Conghui Chen	595cefe3f2	hv: xsave: move assembler to individual function Current code avoid the rule 88 S in MISRA-C, so move xsaves and xrstors assembler to individual functions. Tracked-On: #4436 Signed-off-by: Conghui Chen <conghui.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-02-28 17:55:06 +08:00
Conghui Chen	c246d1c9b8	hv: xsave: bugfix for init value The init value for XCR0 and XSS should be the same with spec: In SDM Vol1 13.3: XCR0[0] is associated with x87 state (see Section 13.5.1). XCR0[0] is always 1. The other bits in XCR0 are all 0 coming out of RESET. The IA32_XSS MSR (with MSR index DA0H) is zero coming out of RESET. The previous code try to fix the xsave area leak to other VMs during init phase, but bring the error to linux. Besides, it cannot avoid the possible leak in running phase. Need find a better solution. Tracked-On: #4430 Signed-off-by: Conghui Chen <conghui.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-02-28 09:19:29 +08:00
Vijay Dhanraj	887e3813bc	HV: Add both HW and SW checks for RDT support There can be times when user unknowingly enables CONFIG_CAT_ENABLED SW flag, but the hardware might not support L3 or L2 CAT. In such case software can end up writing to the CAT MSRs which can cause undefined results. The patch fixes the issue by enabling CAT only when both HW and software (via CONFIG_CAT_ENABLED) support CAT. The patch also addresses a typo in the "clos2prq_msr" function name. It should be "clos2pqr_msr" instead. PQR stands for platform qos register. Tracked-On: #3715 Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-02-27 10:44:07 +08:00
Vijay Dhanraj	2597429903	HV: Rename cat.c/.h files to rdt.c/.h As part of rdt cat refactoring, goal is to combine all rdt specific features such as CAT under one module. So renaming rdt resource specific files such as cat.c/.h to generic rdt.c/.h files. Tracked-On: #3715 Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-02-27 10:44:07 +08:00
Yonghua Huang	64b874ce4c	hv: rename BOOT_CPU_ID to BSP_CPU_ID 1. Rename BOOT_CPU_ID to BSP_CPU_ID 2. Replace hardcoded value with BSP_CPU_ID when ID of BSP is referenced. Tracked-On: #4420 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2020-02-25 09:08:14 +08:00
Li Fei1	e8479f84cd	hv: vPCI: remove passthrough PCI device unuse code Now we split passthrough PCI device from DM to HV, we could remove all the passthrough PCI device unused code. Tracked-On: #4371 Signed-off-by: Li Fei1 <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-02-24 16:17:38 +08:00
Li Fei1	dafa3da693	vPCI: split passthrough PCI device from DM to HV In this case, we could handle all the passthrough PCI devices in ACRN hypervisor. But we still need DM to initialize BAR resources and Intx for passthrough PCI device for post-launched VM since this information should be filled into ACPI tables. So 1. we add a HC vm_assign_pcidev to pass the extra information to replace the old vm_assign_ptdev. 2. we also remove HC vm_set_ptdev_msix_info since it could be set by the post-launched VM now same as SOS. 3. remove vm_map_ptdev_mmio call for PTDev in DM since ACRN hypervisor will handle these BAR access. 4. the most important thing is to trap PCI configure space access for PTDev in HV for post-launched VM and bypass the virtual PCI device configure space access to DM. This patch doesn't do the clean work. Will do it in the next patch. Tracked-On: #4371 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-02-24 16:17:38 +08:00
Li Fei1	fe3182ea05	hv: vPCI: add assign/deassign PCI device HC APIs Add assign/deassign PCI device hypercall APIs to assign a PCI device from SOS to post-launched VM or deassign a PCI device from post-launched VM to SOS. This patch is prepared for splitting passthrough PCI device from DM to HV. The old assign/deassign ptdev APIs will be discarded. Tracked-On: #4371 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-02-24 16:17:38 +08:00
Shuo A Liu	53de3a727c	hv: reset vcpu events in reset_vcpu On UEFI UP2 board, APs might execute HLT before SOS kernel INIT them. After the SOS kernel takes over, it will re-init the APs directly. The flows from HV perspective is like: HLT trap: wait_event(VCPU_EVENT_VIRTUAL_INTERRUPT) -> sleep_thread SOS kernel INIT, SIPI APs: pause_vcpu(ZOMBIE) -> sleep_thread -> reset_vcpu -> launch_vcpu -> wake_vcpu However, the last wake_vcpu will fail because the cpu event VCPU_EVENT_VIRTUAL_INTERRUPT had not got signaled. This patch will reset all vcpu events in reset_vcpu. If the thread was previously waiting for a event, its waiting status will be cleared and launch_vcpu will wake it to running. Tracked-On: #4402 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-02-23 16:27:57 +08:00
Yonghua Huang	fd4775d044	hv: rename VECTOR_XXX and XXX_IRQ Macros 1. Align the coding style for these MACROs 2. Align the values of fixed VECTORs Tracked-On: #4348 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2020-01-14 10:21:23 +08:00
Yonghua Huang	b90862921e	hv: rename the ACRN_DBG_XXX Refine this MACRO 'ACRN_DBG_XXX' to 'DBG_LEVEL_XXX' Tracked-On: #4348 Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>	2020-01-14 10:21:23 +08:00
Shuo A Liu	b59e5a870a	hv: Disable HLT and PAUSE-loop exiting emulation in lapic passthrough In lapic passthrough mode, it should passthrough HLT/PAUSE execution too. This patch disable their emulation when switch to lapic passthrough mode. Tracked-On: #4329 Tested-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com> Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-13 10:16:30 +08:00
Shuo A Liu	db708fc3e8	hv: rename is_completion_polling to is_polling_ioreq is_polling_ioreq is more straightforward. Rename it. Tracked-On: #4329 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-13 10:16:30 +08:00
Li Fei1	65ed6c3529	hv: vpci: trap PCIe ECAM access for SOS SOS will use PCIe ECAM to access PCIe external configuration space. HV should trap this access for security (now pre-launched VM doesn't want to support PCI ECAM; post-launched VM trap PCIe ECAM access in DM). Besides, update PCIe MMCONFIG region to be owned by hypervisor and expose and pass through PCI devices hidden by the platform BIOS to SOS. Tracked-On: #3475 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2020-01-07 16:05:30 +08:00
Shuo A Liu	4303ccb1a0	hv: HLT emulation in hypervisor HLT emulation is important to CPU resource maximum utilization. vcpu doing HLT means it is idle and can give up CPU proactively. Thus, we pause the vcpu thread in HLT emulation and resume it while event happens. When vcpu enter HLT, its vcpu thread will sleep, but the vcpu state is still 'Running'. VM ID PCPU ID VCPU ID VCPU ROLE VCPU STATE ===== ======= ======= ========= ========== 0 0 0 PRIMARY Running 0 1 1 SECONDARY Running Tracked-On: #4329 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-07 11:23:32 +08:00
Shuo A Liu	a8f6bdd479	hv: Add vlapic_has_pending_intr of apicv to check pending interrupts Sometimes HV wants to know if there are pending interrupts of one vcpu. Add .has_pending_intr interface in acrn_apicv_ops and return the pending interrupts status by check IRRs of apicv. Tracked-On: #4329 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-07 11:23:32 +08:00
Shuo A Liu	e3c303363b	hv: vcpu: wait and signal vcpu event support Introduce two kinds of events for each vcpu, VCPU_EVENT_IOREQ: for vcpu waiting for IO request completion VCPU_EVENT_VIRTUAL_INTERRUPT: for vcpu waiting for virtual interrupts events vcpu can wait for such events, and resume to run when the event get signalled. This patch also change IO request waiting/notifying to this way. Tracked-On: #4329 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-07 11:23:32 +08:00
Shuo A Liu	4115dd6241	hv: PAUSE-loop exiting support in hypervisor As we enabled cpu sharing, PAUSE-loop exiting can help vcpu to release its pcpu proactively. It's good for performance. VMX_PLE_GAP: upper bound on the amount of time between two successive executions of PAUSE in a loop. VMX_PLE_WINDOW: upper bound on the amount of time a guest is allowed to execute in a PAUSE loop Tracked-On: #4329 Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-07 11:23:32 +08:00
Victor Sun	bfecf30f32	HV: do not offline pcpu when lapic pt disabled In current code, wait_pcpus_offline() and make_pcpu_offline() are called by both shutdown_vm() and reset_vm(), but this is not needed when lapic_pt is not enabled for the vcpus of the VM. The patch merged offline pcpus part code into a common offline_lapic_pt_enabled_pcpus() api for shutdown_vm() and reset_vm() use and called only when lapic_pt is enabled. Tracked-On: #4325 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-06 15:35:08 +08:00
Binbin Wu	41a998fca3	hv: cr: handle control registers related to PCID 1. This patch passes-through CR4.PCIDE to guest VM. 2. This patch handles the invalidation of TLB and the paging-structure caches. According to SDM Vol.3 4.10.4.1, the following instructions invalidate entries in the TLBs and the paging-structure caches: - INVLPG: this instruction is passed-through to guest, no extra handling needed. - INVPCID: this instruction is passed-through to guest, no extra handling needed. - CR0.PG from 1 to 0: already handled by current code, change of CR0.PG will do EPT flush. - MOV to CR3: hypervisor doesn't trap this instruction, no extra handling needed. - CR4.PGE changed: already handled by current code, change of CR4.PGE will do EPT flush. - CR4.PCIDE from 1 to 0: this patch handles this case, will do EPT flush. - CR4.PAE changed: already handled by current code, change of CR4.PAE will do EPT flush. - CR4.SMEP from 1 to 0, already handled by current code, change of CR4.SMEP will do EPT flush. - Task switch: Task switch is not supported in VMX non-root mode. - VMX transitions: already handled by current code with the support of VPID. 3. This patch checks the validity of CR0, CR4 related to PCID feature. According to SDM Vol.3 4.10.1, CR4.PCIDE can be 1 only in IA-32e mode. - MOV to CR4 causes a general-protection exception (#GP) if it would change CR4.PCIDE from 0 to 1 and either IA32_EFER.LMA = 0 or CR3[11:0] ≠ 000H - MOV to CR0 causes a general-protection exception if it would clear CR0.PG to 0 while CR4.PCIDE = 1 Tracked-On: #4296 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-02 10:47:34 +08:00
Binbin Wu	4ae350a091	hv: vmcs: pass-through instruction INVPCID to VM According to SDM Vol.3 Section 25.3, behavior of the INVPCID instruction is determined first by the setting of the “enable INVPCID” VM-execution control: - If the “enable INVPCID” VM-execution control is 0, INVPCID causes an invalid-opcode exception (#UD). - If the “enable INVPCID” VM-execution control is 1, treatment is based on the setting of the “INVLPG exiting” VM-execution control: * If the “INVLPG exiting” VM-execution control is 0, INVPCID operates normally. * If the “INVLPG exiting” VM-execution control is 1, INVPCID causes a VM exit. In current implementation, hypervisor doesn't set “INVLPG exiting” VM-execution control, this patch sets “enable INVPCID” VM-execution control to 1 when the instruction is supported by physical cpu. If INVPCID is supported by physical cpu, INVPCID will not cause VM exit in VM. If INVPCID is not supported by physical cpu, INVPCID causes an #UD in VM. When INVPCID is passed-through to VM, According to SDM Vol.3 28.3.3.1, INVPCID instruction invalidates linear mappings and combined mappings. They are required to do so only for the current VPID. HV assigned a unique vpid for each vCPU, if guest uses wrong PCID, it would not affect other vCPUs. Tracked-On: #4296 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-02 10:47:34 +08:00
Binbin Wu	d330879ce5	hv: cpuid: expose PCID related capabilities to VMs Pass-through PCID related capabilities to VMs: - The support of PCID (CPUID.01H.ECX[17]) - The support of instruction INVPCID (CPUID.07H.EBX[10]) Tracked-On: #4296 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-02 10:47:34 +08:00
Binbin Wu	96331462b7	hv: vmcs: remove redundant check on vpid ACRN relies on the capability of VPID to avoid EPT flushes during VMX transitions. This capability is checked as a must have hardware capability, otherwise, ACRN will refuse to boot. Also, the current code has already made sure each vpid for a virtual cpu is valid. So, no need to check the validity of vpid for vcpu and enable VPID for vCPU by default. Tracked-On: #4296 Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2020-01-02 10:47:34 +08:00
Victor Sun	c6f7803f06	HV: restore lapic state and apic id upon INIT Per SDM 10.12.5.1 vol.3, local APIC should keep LAPIC state after receiving INIT. The local APIC ID register should also be preserved. Tracked-On: #4267 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-27 12:27:08 +08:00
Victor Sun	ab13228591	HV: ensure valid vcpu state transition The vcpu state machine transition should follow the rules below (old vcpu state --- operation --> new vcpu state): VCPU_OFFLINE --- create_vcpu --> VCPU_INIT; VCPU_INIT --- launch_vcpu --> VCPU_RUNNING; VCPU_RUNNING --- pause_vcpu --> VCPU_PAUSED; VCPU_PAUSED --- resume_vcpu --> VCPU_RUNNING; VCPU_RUNNING/PAUSED --- pause_vcpu --> VCPU_ZOMBIE; VCPU_INIT --- pause_vcpu --> VCPU_ZOMBIE; VCPU_ZOMBIE --- reset_vcpu --> VCPU_INIT; VCPU_ZOMBIE --- offline_vcpu --> VCPU_OFFLINE. Tracked-On: #4267 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-27 12:27:08 +08:00
Victor Sun	a5158e2c16	HV: refine reset_vcpu api The patch abstracts a vcpu_reset_internal() api for internal usage; the function does not touch any vcpu state transition and only does vcpu reset processing. It will be called by create_vcpu() and reset_vcpu(). The reset_vcpu() will act as a public api and should be called only when a vcpu receives INIT or on vm reset/resume from S3. It should not be called during shutdown_vm() or hcall_sos_offline_cpu(), so the patch removes reset_vcpu() from shutdown_vm() and hcall_sos_offline_cpu(). The patch also introduces a reset_mode enum so that vcpu and vlapic can do different context operations according to the reset mode. Tracked-On: #4267 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-27 12:27:08 +08:00
Victor Sun	d1a46b8289	HV: rename function of vlapic_xxx_write_handler Rename vlapic_xxx_write_handler() to vlapic_write_xxx() to make code more readable; Tracked-On: #4268 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-27 12:27:08 +08:00
Victor Sun	46ed0b1582	HV: correct apic lvt reset value Per SDM 10.4.7.1 vol3, the LVT registers should be reset to 0s except for the mask bits, which are set to 1s. In the current code, lvt_last[] has already been set to the correct value (i.e. 0x10000) in vlapic_reset() before vlapic->lvt_last[i] is forcibly set to 0U; the loop that sets vlapic->lvt_last[i] to 0 causes the LVT registers to read as zero after reset, which is non-compliant with the SDM. Tracked-On: #4266 Signed-off-by: Victor Sun <victor.sun@intel.com> Reviewed-by: Fei Li <fei1.li@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-27 12:27:08 +08:00
Yin Fengwei	f7df43e7cd	reset: detect highest severity guest dynamically For guest reset, a reset of the highest severity guest resets the system. There is a vm flag to call out the highest severity guest in a specific scenario, which is a static guest severity assignment. There are cases where the static highest severity guest is shut down and the highest severity role should be transferred to another guest. For example, in ISD scenario, if RTVM (static highest severity guest) is shut down, SOS should be the highest severity guest instead. The is_highest_severity_vm() is updated to detect the highest severity guest dynamically and to promote the highest severity guest reset to system reset. Also remove the GUEST_FLAG_HIGHEST_SEVERITY definition. Tracked-On: #4270 Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>	2019-12-23 15:15:09 +08:00
Yin Fengwei	bfa19e9104	pm: S5: update the system shutdown logical in ACRN For system S5, ACRN had the assumption that SOS shutdown will trigger system shutdown. So the system shutdown logic was: 1. Trap SOS shutdown 2. Wait for all other guests to shut down 3. Shut down the system The new logic is refined as: If all guests are shut down, shut down the whole system Tracked-On: #4270 Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>	2019-12-23 15:15:09 +08:00
Kaige Fu	5f9d1379bc	HV: Remove INIT signal notification related code We don't use INIT signal notification method now. This patch removes them. Tracked-On: #3886 Acked-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Kaige Fu <kaige.fu@intel.com>	2019-12-17 09:45:52 +08:00
Kaige Fu	6d1f63aef0	HV: Use NMI to replace INIT signal for lapic-pt VMs S5 We have implemented a new notification method using NMI. So replace the INIT notification method with the NMI one. Then we can remove INIT notification related code later. Tracked-On: #3886 Signed-off-by: Kaige Fu <kaige.fu@intel.com>	2019-12-17 09:45:52 +08:00
Kaige Fu	a13909cedc	HV: Use NMI-window exiting to address req missing issue There is a window where we may miss the current request in the notification period when the work flow between CPUx (the notifier) and CPUr (the notified CPU) is as follows: 1. CPUr handles pending requests 2. CPUx sets the request flag 3. CPUx sends an NMI to CPUr 4. CPUr handles the NMI 5. The vCPU on CPUr enters. So, this patch enables the NMI-window exiting to trigger the next vmexit once there is no "virtual-NMI blocking" after vCPU enter into VMX non-root mode. Then we can process the pending request on time. Tracked-On: #3886 Acked-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Kaige Fu <kaige.fu@intel.com>	2019-12-17 09:45:52 +08:00
Kaige Fu	40ba7e8686	HV: Don't make NMI injection req when notifying vCPU The NMI for notification should not be injected to guest. So, this patch drops NMI injection request when we use NMI to notify vCPUs. Meanwhile, ACRN doesn't support vNMI well and there is no well-designed way to check if the NMI is for notification or for guest now. So, we take all the NMIs as notification NMIs for hard rtvm temporarily. It means that the hard rtvm will never receive NMI with this patch applied. TODO: vNMI support is not ready yet. we will add it later. Tracked-On: #3886 Signed-off-by: Kaige Fu <kaige.fu@intel.com>	2019-12-17 09:45:52 +08:00
Kaige Fu	72f7f69c47	HV: Use NMI to kick lapic-pt vCPU's thread ACRN hypervisor needs to kick vCPU off VMX non-root mode to do some operations in hypervisor, such as interrupt/exception injection, EPT flush etc. For non lapic-pt vCPUs, we can use IPI to do so. But, it doesn't work for lapic-pt vCPUs as the IPI will be injected to VMs directly without vmexit. Without the way to kick the vCPU off VMX non-root mode to handle pending request on time, there may be fatal errors triggered. 1). Certain operation may not be carried out on time which may further lead to fatal errors. Taking the EPT flush request as an example, once we don't flush the EPT on time and the guest accesses the out-of-date EPT, fatal error happens. 2). ACRN now will send an IPI with vector 0xF0 to target vCPU to kick the vCPU off VMX non-root mode if it wants to do some operations on target vCPU. However, this way doesn't work for lapic-pt vCPUs. The IPI will be delivered to the guest directly without vmexit and the guest will receive an unexpected interrupt. Consequently, if the guest can't handle this interrupt properly, fatal error may happen. The NMI can be used as the notification signal to kick the vCPU off VMX non-root mode for lapic-pt vCPUs. So, this patch uses NMI as notification signal to address the above issues for lapic-pt vCPUs. Tracked-On: #3886 Acked-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Kaige Fu <kaige.fu@intel.com>	2019-12-17 09:45:52 +08:00
Shiqing Gao	3cee259583	hv: msr: remove redundant check in write_pat_msr Reserved bits in a 8-bit PAT field has been checked in pat_mem_type_invalid. Remove this redundant check "(PAT_FIELD_RSV_BITS & field) != 0UL" in write_pat_msr. Tracked-On: #1842 Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>	2019-12-16 14:32:42 +08:00
Mingqiang Chi	7f96465407	hv:remove need_cleanup flag in create_vm Remove this redundant flag. Tracked-On: #1842 Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-12 16:34:13 +08:00
Victor Sun	67ec1b7708	HV: expose port 0x64 read for SOS VM The port 0x64 is the status register of i8042 keyboard controller. When i8042 is defined as ACPI PnP device in BIOS, enforce returning 0xff in read handler would cause infinite loop when booting SOS VM, so expose the physical port read in this case; Tracked-On: #4228 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-12 13:51:24 +08:00
Victor Sun	a44c1c900c	HV: Kconfig: remove MAX_VCPUS_PER_VM in Kconfig In the current architecture, the maximum number of vCPUs per VM cannot exceed the number of pCPUs. Given that the MAX_PCPU_NUM macro is provided in board configurations, remove MAX_VCPUS_PER_VM from Kconfig and add a MAX_VCPUS_PER_VM macro that references MAX_PCPU_NUM directly. Tracked-On: #4230 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-12 13:49:28 +08:00
Victor Sun	ea3476d22d	HV: rename CONFIG_MAX_PCPU_NUM to MAX_PCPU_NUM rename the macro since MAX_PCPU_NUM could be parsed from board file and it is not a configurable item anymore. Tracked-On: #4230 Signed-off-by: Victor Sun <victor.sun@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-12 13:49:28 +08:00
Vijay Dhanraj	c8a4ca6c78	HV: Extend non-contiguous HPA for hybrid scenario This patch extends non-contiguous HPA allocations for pre-launched VMs in hybrid scenario. Tracked-On: #4217 Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com>	2019-12-11 10:12:46 +08:00
Li Fei1	c2c05a29da	hv: vlapic: kick targeted vCPU off if interrupt trigger mode has changed In APICv advanced mode, a targeted vCPU, running in non-root mode, may get outdated TMR and EOI exit bitmap if another vCPU sends an interrupt to it and the trigger mode of this interrupt has changed. This patch tries to kick the vCPU off to let it get the latest TMR and EOI exit bitmap when it enters non-root mode again if the trigger mode of the incoming interrupt has changed. Then fill the interrupt to PIR. Tracked-On: #4200 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2019-12-10 09:07:54 +08:00
Vijay Dhanraj	6e8b413689	HV: Add support to assign non-contiguous HPA regions for pre-launched VM On some platforms, HPA regions for Virtual Machine can not be contiguous because of E820 reserved type or PCI hole. In such cases, pre-launched VMs need to be assigned non-contiguous memory regions and this patch addresses it. To keep things simple, current design has the following assumptions, 1. HPA2 always will be placed after HPA1 2. HPA1 and HPA2 don’t share a single ve820 entry. (Create multiple entries if needed but not shared) 3. Only support 2 non-contiguous HPA regions (can extend at a later point for multiple non-contiguous HPA) Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Tracked-On: #4195 Acked-by: Anthony Xu <anthony.xu@intel.com>	2019-12-09 11:28:38 +08:00
Zide Chen	03a1b2a717	hypervisor: handle reboot from non-privileged pre-launched guests To handle reboot requests from pre-launched VMs that don't have GUEST_FLAG_HIGHEST_SEVERITY, we shut down the target VM explicitly rather than ignoring the requests. Tracked-On: #2700 Signed-off-by: Zide Chen <zide.chen@intel.com> Acked-by: Anthony Xu <anthony.xu@intel.com>	2019-12-09 11:27:32 +08:00
Li Fei1	da3ba68cb6	hv: remove corner case in ptirq_prepare_msix_remap ptirq_prepare_msix_remap was called no matter whether MSI/MSI-X was enabled or not, and zero was passed in the virtual MSI/MSI-X data field input parameter to indicate that MSI/MSI-X was disabled. However, it did almost nothing in this case. Now ptirq_prepare_msix_remap is called only when MSI/MSI-X is enabled. It doesn't need to check whether MSI/MSI-X is enabled or not by checking virtual MSI/MSI-X data field. Tracked-On: #3475 Signed-off-by: Li Fei1 <fei1.li@intel.com>	2019-12-05 16:43:22 +08:00
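
The vcpu state transition rules listed in commit ab13228591 can be sketched as a small lookup table. This is an illustration only, not the ACRN implementation: the state names come from the commit message, while `enum vcpu_op`, `struct transition`, `is_valid_transition()` and `try_transition()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names are taken from the commit message. */
enum vcpu_state { VCPU_OFFLINE, VCPU_INIT, VCPU_RUNNING, VCPU_PAUSED, VCPU_ZOMBIE };

/* Hypothetical operation tags, one per API named in the commit message. */
enum vcpu_op { OP_CREATE, OP_LAUNCH, OP_PAUSE, OP_RESUME, OP_RESET, OP_OFFLINE };

/* Hypothetical table entry: (old state, operation, new state). */
struct transition {
	enum vcpu_state from;
	enum vcpu_op op;
	enum vcpu_state to;
};

/* One entry per rule in the commit message. */
static const struct transition valid_transitions[] = {
	{ VCPU_OFFLINE, OP_CREATE,  VCPU_INIT    },
	{ VCPU_INIT,    OP_LAUNCH,  VCPU_RUNNING },
	{ VCPU_RUNNING, OP_PAUSE,   VCPU_PAUSED  },
	{ VCPU_PAUSED,  OP_RESUME,  VCPU_RUNNING },
	{ VCPU_RUNNING, OP_PAUSE,   VCPU_ZOMBIE  },
	{ VCPU_PAUSED,  OP_PAUSE,   VCPU_ZOMBIE  },
	{ VCPU_INIT,    OP_PAUSE,   VCPU_ZOMBIE  },
	{ VCPU_ZOMBIE,  OP_RESET,   VCPU_INIT    },
	{ VCPU_ZOMBIE,  OP_OFFLINE, VCPU_OFFLINE },
};

/* Return true if (from, op, to) matches one of the listed rules. */
bool is_valid_transition(enum vcpu_state from, enum vcpu_op op, enum vcpu_state to)
{
	size_t n = sizeof(valid_transitions) / sizeof(valid_transitions[0]);

	for (size_t i = 0; i < n; i++) {
		const struct transition *t = &valid_transitions[i];

		if ((t->from == from) && (t->op == op) && (t->to == to)) {
			return true;
		}
	}
	return false;
}

/* Apply the transition only if it is listed; otherwise leave *state untouched. */
bool try_transition(enum vcpu_state *state, enum vcpu_op op, enum vcpu_state to)
{
	assert(state != NULL);

	if (!is_valid_transition(*state, op, to)) {
		return false;
	}
	*state = to;
	return true;
}
```

In this sketch the pause operation carries its target state, since the commit message lists pause_vcpu as leading to either VCPU_PAUSED or VCPU_ZOMBIE.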

1161 Commits
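
The two PCID-related validity rules quoted in commit 41a998fca3 can be written as predicates on the register values. This is a minimal sketch under stated assumptions, not the code from that commit: the function names `cr4_write_pcid_gp()`, `cr0_write_pcid_gp()` and `pcid_checks_example()` are hypothetical, and only the bit positions (CR0.PG bit 31, CR4.PCIDE bit 17, IA32_EFER.LMA bit 10, PCID in CR3[11:0]) are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG        (1ULL << 31)  /* CR0.PG: paging enable */
#define CR4_PCIDE     (1ULL << 17)  /* CR4.PCIDE: PCID enable */
#define EFER_LMA      (1ULL << 10)  /* IA32_EFER.LMA: IA-32e mode active */
#define CR3_PCID_MASK 0xFFFULL      /* CR3[11:0] holds the PCID */

/*
 * Hypothetical helper: true if a MOV to CR4 should raise #GP because it
 * would change CR4.PCIDE from 0 to 1 while IA32_EFER.LMA = 0 or
 * CR3[11:0] != 000H.
 */
bool cr4_write_pcid_gp(uint64_t old_cr4, uint64_t new_cr4, uint64_t efer, uint64_t cr3)
{
	bool enabling = ((old_cr4 & CR4_PCIDE) == 0ULL) && ((new_cr4 & CR4_PCIDE) != 0ULL);

	return enabling && (((efer & EFER_LMA) == 0ULL) || ((cr3 & CR3_PCID_MASK) != 0ULL));
}

/*
 * Hypothetical helper: true if a MOV to CR0 should raise #GP because the
 * new value has CR0.PG = 0 while CR4.PCIDE = 1.
 */
bool cr0_write_pcid_gp(uint64_t new_cr0, uint64_t cur_cr4)
{
	return ((new_cr0 & CR0_PG) == 0ULL) && ((cur_cr4 & CR4_PCIDE) != 0ULL);
}

/* Usage example: enabling PCID is accepted only in IA-32e mode with PCID 0 in CR3. */
void pcid_checks_example(void)
{
	assert(!cr4_write_pcid_gp(0ULL, CR4_PCIDE, EFER_LMA, 0x1000ULL));
	assert(cr4_write_pcid_gp(0ULL, CR4_PCIDE, 0ULL, 0x1000ULL));
	assert(cr0_write_pcid_gp(0ULL, CR4_PCIDE));
}
```

A handler for a trapped CR write could call such a predicate first and inject #GP into the guest when it returns true; how the commit itself structures that path is not shown in this log.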