When a wrong BAR address is set for pci-vuart, the OS may assign a
new BAR address to it. The pci-vuart BAR can't be reprogrammed
because of its wrong fixed value. That may be because pci_vbar.fixed and
pci_vbar.type overlap in abstraction, pci_vbar.fixed
has a confusing name, and pci_vbar.type has PCIBAR_MEM64HI, which is not
really a type of PCI BAR.
So replace pci_vbar.type with pci_vbar.is_mem64hi, and change
pci_vbar.fixed to a union type with the new name pci_vbar.bar_type.
Tracked-On: #5491
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
We have trapped the #DB for split-lock emulation.
Only fault exceptions need the RIP to be retained.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xchg may also cause the #AC for split-lock check.
This patch adds this emulation.
1. Kick other vcpus of the guest to stop execution
if the guest has more than one vcpu.
2. Emulate the xchg instruction.
3. Notify other vcpus (if any) to restart execution.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds the split-lock emulation.
If an #AC is caused by an instruction with the LOCK prefix, then
emulate it; otherwise, inject it back as before.
1. Kick other vcpus of the guest to stop execution
and set the TF flag to have #DB if the guest has more
than one vcpu.
2. Skip over the LOCK prefix and resume the current
vcpu back to guest for execution.
3. Notify other vcpus to restart execution at the end
of handling the #DB since we have completed
the LOCK prefix instruction emulation.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Check hardware support for all features in CR4,
and hide bits from the guest via vcpuid if they're not supported
for the guest OS.
Tracked-On: #5586
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- The current code to virtualize CR0/CR4 is not
well designed, and hard to read.
This patch reshuffles the logic to make it clear
and classify those bits into PASSTHRU,
TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits.
Tracked-On: #5586
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
While the following two styles are both correct, the second one is simpler.
    bool is_level_triggered;
    1. if (is_level_triggered == true) {...}
    2. if (is_level_triggered) {...}
This patch cleans up the style in the hypervisor.
Tracked-On: #861
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Per the XSETBV instruction description in SDM Vol.2C,
if CR4.OSXSAVE[bit 18] = 0,
executing the "XSETBV" instruction generates a #UD exception.
Per SDM Vol.3C 25.1.1, the #UD exception has priority over VM exits,
so if a vCPU executes the "XSETBV" instruction when CR4.OSXSAVE[bit 18] = 0,
a VM exit won't happen.
However, the hv injects #GP if a vCPU executes the "XSETBV" instruction
when CR4.OSXSAVE[bit 18] = 0.
This is wrong behavior; this patch fixes the bug.
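A minimal sketch of the intended ordering, assuming a handler that has the guest's CR4 value at hand. The function name, the `operands_valid` flag and `VECTOR_NONE` are made up for illustration; vector numbers 6 (#UD) and 13 (#GP) are the architectural ones. It is not the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSXSAVE  (UINT64_C(1) << 18U) /* CR4.OSXSAVE, bit 18 */
#define VECTOR_UD    6U                   /* invalid opcode */
#define VECTOR_GP    13U                  /* general protection */
#define VECTOR_NONE  0xFFU                /* hypothetical "no exception" marker */

/*
 * Pick the exception to inject for a guest XSETBV.
 * The OSXSAVE check comes first, so #UD wins over any #GP condition.
 */
static uint32_t xsetbv_fault_vector(uint64_t guest_cr4, bool operands_valid)
{
	if ((guest_cr4 & CR4_OSXSAVE) == 0U) {
		return VECTOR_UD;
	}
	if (!operands_valid) {
		return VECTOR_GP;
	}
	return VECTOR_NONE;
}
```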
Tracked-On: #4020
Signed-off-by: Junming Liu <junming.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The memory BAR of the ivshmem device is 64-bit, so 2 BAR registers
are used. Counting one 32-bit MMIO BAR and one
32-bit vMSIX table BAR as well, the number of BARs "nr_bars" shall
be 4 instead of 3.
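The count can be checked with a few lines of arithmetic. The helper names are hypothetical; only the BAR widths and the resulting value come from the message above.

```c
#include <stdint.h>

/* Number of 32-bit BAR registers consumed by one BAR of the given width. */
static uint32_t bar_regs(uint32_t width_bits)
{
	return width_bits / 32U;
}

/* Sum the BAR registers the ivshmem device needs. */
static uint32_t ivshmem_nr_bars(void)
{
	return bar_regs(32U)    /* MMIO BAR */
	     + bar_regs(32U)    /* vMSIX table BAR */
	     + bar_regs(64U);   /* 64-bit memory BAR: low and high halves */
}
```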
Tracked-On: #5490
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
- fix bug in 'hcall_destroy_vdev()': the availability of
the vpci device shall be checked on 'target_vm'.
- refine 'vpci_update_one_vbar()' to avoid potential NULL
pointer access.
Tracked-On: #5490
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
It is possible for more than one vCPU to trigger shutdown on an RTVM.
We need to avoid entering VM_READY_TO_POWEROFF state again after the
RTVM has been paused or shut down.
Also, make sure an RTVM enters VM_READY_TO_POWEROFF state before it can
be paused.
v1 -> v2:
- rename to poweroff_if_rt_vm for better clarity
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Hypercall handlers for post-launched VMs automatically grab the vm_lock
in dispatch_sos_hypercall(). Remove the use of vm_lock inside the
handler.
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Currently, ACRN only supports shutdown when a triple fault happens, because ACRN
doesn't present/emulate virtual HW, i.e. port IO, to support shutdown. This
patch emulates a virtual shutdown component, and the vACPI method for the guest OS
to use.
Pre-launched VMs use ACPI reduced HW mode; intercept the virtual sleep control/status
registers for pre-launched VM shutdown.
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Like post-launched VMs, for pre-launched VMs, the ACPI reset register
is also fixed at 0xcf9 and the reset value is 0xE, so pre-launched VMs
now also use ACPI reset register for rebooting.
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
A VM may transition to VM_PAUSED state while its console is being used.
Jump back to the HV shell if this happens so the console does not appear
stuck.
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
More than one VM may request shutdown on the same pCPU before
shutdown_vm_from_idle() is called in the idle thread when pCPUs are
shared among VMs.
Use a per-pCPU bitmap to store all the VMIDs requesting shutdown.
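A minimal sketch of such a bitmap, assuming one 64-bit word per pCPU. The array, the `MAX_PCPUS` value and both helper names are hypothetical, and the sketch is single-threaded: real code would need atomic bit operations or a lock around these updates.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PCPUS 4U /* hypothetical value for the sketch */

/* One bit per VMID; bit N set means VM N asked to shut down on this pCPU. */
static uint64_t shutdown_vm_bitmap[MAX_PCPUS];

/* Record a shutdown request from vm_id on pcpu_id. */
static void request_vm_shutdown(uint16_t pcpu_id, uint16_t vm_id)
{
	if ((pcpu_id < MAX_PCPUS) && (vm_id < 64U)) {
		shutdown_vm_bitmap[pcpu_id] |= (UINT64_C(1) << vm_id);
	}
}

/* Pop the lowest pending VMID for pcpu_id; return false if none is pending. */
static bool next_shutdown_request(uint16_t pcpu_id, uint16_t *vm_id)
{
	uint16_t id;

	if (pcpu_id >= MAX_PCPUS) {
		return false;
	}
	for (id = 0U; id < 64U; id++) {
		uint64_t mask = UINT64_C(1) << id;

		if ((shutdown_vm_bitmap[pcpu_id] & mask) != 0U) {
			shutdown_vm_bitmap[pcpu_id] &= ~mask;
			*vm_id = id;
			return true;
		}
	}
	return false;
}
```

The idle thread would then loop on `next_shutdown_request()` until it returns false, so no request is lost when two VMs share a pCPU.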
v1 -> v2:
- use vm_lock to avoid a race on shutdown
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add two Kconfig pSRAM configs:
one for whether to enable the pSRAM on the platform or not;
another for whether to enable the pSRAM in the pre-launched RTVM,
if the pSRAM is enabled on the platform.
If we enable the pSRAM on the platform, we should remove the pSRAM EPT
mapping from the SOS to prevent it from flushing the pSRAM cache.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1. Modified the virtual e820 table for pre-launched VM. We added a
segment for pSRAM, and thus lowmem RAM is split into two parts.
Logic is added to deal with the split.
2. Added EPT mapping of pSRAM segment for pre-launched RTVM if it
uses pSRAM.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
pSRAM memory should be cacheable. However, it's neither RAM nor normal MMIO,
so we can't use an existing API to do the EPT mapping and set the EPT cache
attribute to WB for it. For now we assume that the SOS must assign the pSRAM area as
a whole and as a separate memory region whose base address is PSRAM_BASE_HPA.
If the hpa of the EPT mapping region is equal to PSRAM_BASE_HPA, we treat this
EPT mapping as being for pSRAM and change its cache attribute to WB.
Also fix a minor bug when the SOS traps out to emulate wbinvd when pSRAM is enabled.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Use ept_flush_leaf_page to emulate guest WBINVD when PTCM is enabled and skip
the pSRAM in ept_flush_leaf_page.
TODO: do we need to emulate WBINVD in HV side.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Rename hv_access_memory_region_update to ppt_clear_user_bit to
verb + object style.
Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Temporarily remove the NX bit of the PTCM binary in the page table during pSRAM
initialization:
1. Added a function ppt_set_nx_bit to temporarily remove/restore the NX bit of
a given area in the page table.
2. Temporarily remove the NX bit of the PTCM binary during pSRAM initialization to make
the PTCM code executable.
3. TODO: We may use an SMP call to flush the TLB and do pSRAM initialization on APs.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The added parse_ptct function will parse native ACPI PTCT table to
acquire information like pSRAM location/size/level and PTCM location,
and save them.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1.We added a function init_psram to initialize pSRAM as well as some definitions.
Both AP and BSP shall call init_psram to make sure pSRAM is initialized, which is
required by PTCM.
BSP:
To parse PTCT and find the entry of PTCM command function, then call PTCM ABI.
AP:
Wait until BSP has done the parsing work, then call the PTCM ABI.
Synchronization of AP and BSP is ensured, both inside and outside PTCM.
2. Added calls of init_psram in init_pcpu_post to initialize pSRAM in HV booting phase
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
According to 11.5.1 Cache Control Registers and Bits, Intel SDM Vol 3,
changing CR0.CD will not flush the cache to ensure memory coherency. So
it's not necessary to call wbinvd to flush the cache in the ACRN hypervisor.
That's what the guest should do.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Clean up the vpci structure in shutdown_vm to avoid using uninitialized data
after reboot.
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add create method for vmcs9900 vdev in hypercalls.
The destroy method of ivshmem is also suitable for other emulated vdevs,
so move it into hcall_destroy_vdev() for all emulated vdevs.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
If the vuart type is pci-vuart, then use the MSI interrupt.
Split the vuart_toggle_intr() control flow into vuart_trigger_level_intr() &
trigger_vmcs9900_msix(), because MSI is edge triggered and has no deassertion
operation. Only trigger MSI for pci-vuart when asserting the interrupt.
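The split can be sketched as below. This is an illustration only: `struct intr_sink`, `enum vuart_kind` and `vuart_signal()` are hypothetical stand-ins that record what would be delivered, not the functions named in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vuart flavors for the sketch. */
enum vuart_kind { VUART_LEGACY_PIO, VUART_PCI };

/* Hypothetical sink that records what the guest would observe. */
struct intr_sink {
	bool     line_asserted; /* current state of the level-triggered line */
	uint32_t msi_sent;      /* number of MSI messages delivered so far */
};

/*
 * Level-triggered path: follow both assert and deassert.
 * MSI path: edge triggered, so send a message on assert and do nothing
 * on deassert.
 */
static void vuart_signal(struct intr_sink *sink, enum vuart_kind kind, bool want_assert)
{
	if (kind == VUART_PCI) {
		if (want_assert) {
			sink->msi_sent++;
		}
	} else {
		sink->line_asserted = want_assert;
	}
}
```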
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
support pci-vuart type, and refine:
1.Rename init_vuart() to init_legacy_vuarts(), only init PIO type.
2.Rename deinit_vuart() to deinit_legacy_vuarts(), only deinit PIO type.
3.Move io handler code out of setup_vuart(), into init_legacy_vuarts()
4.add init_pci_vuart(), deinit_pci_vuart, for one pci vuart vdev.
and some change from requirement:
1.Increase MAX_VUART_NUM_PER_VM to 8.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
The vuart_read()/vuart_write() are coupled with PIO vuart type. Move
the non-type related code into vuart_read_reg()/vuart_write_reg(), so
that we can re-use them to handle MMIO request of pci-vuart type.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- Refactor pci_dev_c.py to insert device information per VM
- Add function to get unused vbdf from bus:dev.func 00:00.0 to 00:1F.7
Add pci devices variables to vm_configurations.c
- To pass the pci vuart information from the tool, add pci_dev_num and
pci_devs initialization by tool
- Change CONFIG_SOS_VM in hypervisor/include/arch/x86/vm_config.h to
match vm_configurations.c
Tracked-On: #5426
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
The new (1.8.17) release of doxygen is complaining about errors in the
doxygen comments that weren't reported by our current 1.8.13 release.
Let's fix these now. In a separate PR we'll also update some
configuration settings that will be obsolete, in preparation for moving
to this newer version.
[External_System_ID]ACRN-6774
Tracked-On: #5385
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
In a pre-launched VM, the GPA of the vmsix BAR which is used for vmsix
over msi is calculated/allocated by the acrn-config tool. The GPA
needs to be assigned to the vdev when the vdev is initialized. The
assignment is only needed for pre-launched VMs. For the SOS, the kernel
will reprogram the BAR base at startup. For post-launched VMs,
the BAR GPA will be assigned by the device model via hypercall.
Tracked-On: #5316
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When init_vmsix_on_msi is called during the initialization of a pt
device, the vmsix bar used for vmsix over msi has just been created. No
mapping/unmapping is done and pci_vdev_write_vbar should be called
instead of vdev_pt_write_vbar at that time. Currently the BAR mapping
is delayed until the OS sizes the BAR. Backing up vbar base_gpa to mmio_gpa
is not required here because it will be done later during BAR mapping.
Tracked-On: #5316
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- Since de-privilege boot is removed, we no longer need to save boot
context in boot time.
- cpu_primary_start_64 is not an entry for ACRN hypervisor any more,
and can be removed.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This patch enables doorbell feature for hv-land
ivshmem device to support interrupt notification
between VMs that use inter-VM(ivshmem) devices.
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This function can be used by other modules instead of hypercall
handling only, hence move it to vlapic.c
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- write_vmsix_cap_reg(): emulates vmsix cap writes.
write_pt_vmsix_cap_reg(): emulates msix cap write
for PT devices.
- rw_vmsix_table(): emulates vmsix table bar space access.
- vmsix_handle_table_mmio_access(): emulates the vmsix
bar space access only.
- pt_vmsix_handle_table_mmio_access(): emulates the vmsix
bar space access and remap msi entry for PT device if
write operation is executed.
- rename 'init_vmsix()' and 'deinit_vmsix()' to
'init_vmsix_pt()' and 'deinit_vmsix_pt()' respectively,
they're for PT devices only.
- remove below 2 functions,call
'pci_vdev_read_vcfg()' directly in cases they're used.
- 'read_vmsi_cap_reg()'
- 'read_vmsix_cap_reg()'
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vmsix.c originally covers ptdev case but ACRN hypervisor
need to support pure virtual PCI mediator, such as ivshmem
device in this patch set.
For better understanding the code changes from patch
perspective, split the changes to several small patches.
This patch moves most original vmsix code to pci_pt.c
as they're mixed with ptdev specific operations.
The subsequent patches will start the detail abstraction change.
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now ACRN supports direct boot mode, which could be SBL/ABL, or GRUB boot.
Thus the vboot wrapper layer can be removed and the direct boot functions
don't need to be wrapped in direct_boot.c:
- remove call to init_vboot(), and call e820_alloc_memory() directly at the
time when the trampoline buffer is actually needed.
- Similarly, call CPU_IRQ_ENABLE() instead of the wrapper init_vboot_irq().
- remove get_ap_trampoline_buf(), since the existing function
get_trampoline_start16_paddr() returns the exact same value.
- merge init_general_vm_boot_info() into init_vm_boot_info().
- remove vm_sw_loader pointer, and call direct_boot_sw_loader() directly.
- move get_rsdp_ptr() from vboot_wrapper.c to multiboot.c, and remove the
wrapper over two boot modes.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Since now we support direct boot only, we don't have to use FIRMWARE variable
to differentiate between sbl/GRUB and UEFI boot.
After this change:
- "FIRMWARE=sbl/uefi" should be removed from make commands.
- the firmware name is removed from the installed ACRN image. For example,
acrn.apl-up2.sbl.sdc.32.out will be changed to acrn.apl-up2.sdc.32.out.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
update the help message of config SCENARIO to set 2 standard
post-launched VMs for default hybrid_rt scenario in Kconfig.
Tracked-On: #5390
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
This is a bug fix that avoids multiple declarations of mem_regions
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move struct pt_intx_config vm0_pt_intx[] definition to pt_intx.c
so that vm_configurations.h/vm_configurations.c are consistent for different boards
Tracked-On: #5229
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Commit da81a0041d
"HV: add e820 ACPI entry for pre-launched VM" introduced an issue: the
base_hpa and remaining_hpa_size are also calculated on the entry of the 32-bit
PCI hole, which spans 0x80000000 to 0xffffffff, and that is incorrect.
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
On a PCI type HV uart, the bdf value is in a union together with
mmio_base_vaddr, so the value would be overwritten by mmio_base_addr
in uart16550_init(), causing is_pci_dbg_uart() to return a wrong value
and the uart to hang.
Tracked-On: #5288
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per PCI Firmware Specification Revision 3.0, 4.1.2. MCFG Table Description:
the Memory Mapped Enhanced Configuration Space Base Address Allocation Structure
assigns the Start Bus Number and the End Bus Number which can be decoded by the
Host Bridge. We should not access a PCI device whose bus number is outside
the range of [Start Bus Number, End Bus Number].
For ACRN, we should:
1. Not detect a PCI device whose bus number is outside the range of
[Start Bus Number, End Bus Number] of the MCFG ACPI Table.
2. Only trap the ECAM MMIO range: [MMCFG_BASE_ADDRESS, MMCFG_BASE_ADDRESS +
(End Bus Number - Start Bus Number + 1) * 0x100000) for SOS.
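The size formula in item 2 can be worked through as follows. The helper names are hypothetical; the 0x100000 multiplier is the one used above (1 MiB of configuration space per bus: 32 devices x 8 functions x 4 KiB). The sketch treats End Bus Number as inclusive, which is what the "+ 1" in the formula implies.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECAM_BYTES_PER_BUS UINT64_C(0x100000) /* 32 devices * 8 functions * 4 KiB */

/* Size in bytes of the ECAM window covering buses start_bus..end_bus inclusive. */
static uint64_t ecam_size(uint8_t start_bus, uint8_t end_bus)
{
	return ((uint64_t)end_bus - (uint64_t)start_bus + 1U) * ECAM_BYTES_PER_BUS;
}

/* True if the bus may be probed according to the MCFG entry. */
static bool bus_in_mcfg_range(uint8_t bus, uint8_t start_bus, uint8_t end_bus)
{
	return (bus >= start_bus) && (bus <= end_bus);
}
```

For an entry covering buses 0 to 255 this gives a 256 MiB window; for buses 0 to 63 it gives 64 MiB.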
Tracked-On: #5233
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
compile ACPI tables for pre-launched VMs to one binary when pre-build
hypervisor.
Tracked-On: #5266
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
The old method of build pre-launched VM vacpi by HV source code is deprecated,
so remove related source code;
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously we used a pre-defined structure as the vACPI table for pre-launched
VMs; the structure was initialized by HV code. Now change the method to use a
pre-loaded multiboot module instead. The module file will be generated by
the acrn-config tool and loaded to GPA 0x7ff00000; a hardcoded RSDP table at
GPA 0x000f2400 will point to the XSDT table, which is at GPA 0x7ff00080.
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the ACPI table was stored in F segment which might not be big
enough for a customized ACPI table, hence reserve 1MB space in pre-launched
VM e820 table to store the ACPI related data:
0x7ff00000 ~ 0x7ffeffff : ACPI Reclaim memory
0x7fff0000 ~ 0x7fffffff : ACPI NVS memory
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vpci: Add 0x45, which is the high-byte of device id of EHL,
to the enumeration array in vhostbridge.c. This is to fix the
problem that PCIe extended capabilities like SR-IOV cannot be
used on EHL.
Tracked-On: #5256
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Previously the min load_addr for the HV image was hard coded to 0x10000000 when
CONFIG_RELOC is enabled; now use CONFIG_HV_RAM_START as its preferred minimum
address, like the CONFIG_PHYSICAL_START setting does in the Linux kernel.
With this patch, we can offload the CONFIG_HV_RAM_START algorithm to
acrn-config or manually set it in scenario XML on some special boards.
Tracked-On: #5275
Signed-off-by: Victor Sun <victor.sun@intel.com>
When HV pass through the P2SB MMIO device to pre-launched VM, vgpio
device model traps MMIO access to the GPIO registers within P2SB so
that it can expose virtual IOAPIC pins to the VM in accordance with
the programmed mappings between gsi and vgsi.
Tracked-On: #5246
Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add the capability of forwarding specified physical IOAPIC interrupt
lines to pre-launched VMs as virtual IOAPIC interrupts. This is for the
sake of the certain MMIO pass-thru devices on EHL CRB which can support
only INTx interrupts.
Tracked-On: #5245
Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Kernel driver and ACPI ASL may access a platform hidden device
thru PIO, e.g., Intel ICH LPC driver. If the access is originated
in SOS or Pre-launched OS, vpci_pio_cfgdata_write/read should support
it.
This commit also reworks vpci_write_cfg/vpci_read_cfg to do the access
check and eliminates the access from post-launched VMs (that should be
handled by DM).
Tracked-On: #5257
Signed-off-by: Stanley Chang <stanley.chang@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BDF string can be parsed by the configuration tool. A 16bit WORD value with
format (B:8, D:5, F:3) can be passed from configuration to the
hypervisor directly to save some BDF string parse code.
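The 16-bit layout (B:8, D:5, F:3) can be sketched as below. The helper names are hypothetical; only the field widths come from the message.

```c
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits) and function (3 bits) into one word. */
static uint16_t bdf_pack(uint8_t bus, uint8_t dev, uint8_t func)
{
	return (uint16_t)(((uint32_t)bus << 8U) |
			  (((uint32_t)dev & 0x1FU) << 3U) |
			  ((uint32_t)func & 0x7U));
}

/* Field extractors for the packed word. */
static uint8_t bdf_bus(uint16_t bdf)  { return (uint8_t)(bdf >> 8U); }
static uint8_t bdf_dev(uint16_t bdf)  { return (uint8_t)((bdf >> 3U) & 0x1FU); }
static uint8_t bdf_func(uint16_t bdf) { return (uint8_t)(bdf & 0x7U); }
```

With this layout, 03:02.1 packs to 0x0311 and 00:1F.7 packs to 0x00FF.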
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
On EHL platform, we need to expose GPIO chassis interrupt to pre-launched VM
as INTx. Add related data structures so that they can be used in subsequent
commits.
Tracked-On: #5241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
On EHL platform, we need to pass through P2SB bridge to pre-launched VM.
Use pt_p2sb_bar to indicate whether to passthru p2sb bridge to pre-launched VM
or not.
Tracked-On: #5221
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
When trying to passthru a DRHD-ignored PCI device,
iommu_attach_device shall report success. Otherwise,
assign_vdev_pt_iommu_domain will result in an HV panic.
The same applies to the iommu_detach_device case.
Tracked-On: #5240
Signed-off-by: Stanley Chang <stanley.chang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some hypercalls to a target VM are only acceptable in some certain
states, else it impacts target VM. Add some restrictive status checks to
avoid that.
Tracked-On: #5208
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Virtual interrupt injection and memory mapping operations can impact
the target VM. By design, these types of operations from a lower severity VM
to a higher severity VM should be blocked by the hypervisor.
While the hypercalls are the interface between SOS VM and the
hypervisor, severity checks can be implemented at the beginning of
hypercalls needed.
Added severity checks in below hypercalls:
* hcall_set_vm_memory_regions()
* hcall_notify_ioreq_finish()
* hcall_set_irqline()
* hcall_inject_msi()
* hcall_write_protect_page()
Tracked-On: #5208
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
if device configuration vbdf is unassigned, then the corresponding
vdev will not be initialized, instead, the vdev will be initialized
by device model through hypercall.
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For ivshmem vdev creation, the vdev vBDF, vBARs, shared memory region
name and size are set by device model. The shared memory name and size
must be same as the corresponding device configuration which is configured
by offline tool.
v3: add a comment to the vbar_base member of the acrn_vm_pci_dev_config
structure that vbar_base is power-on default value
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add HC_CREATE_VDEV and HC_DESTROY_VDEV two hypercalls that are used to
create and destroy an emulated device(PCI device or legacy device) in hypervisor
v3: 1) change HC_CREATE_DEVICE and HC_DESTROY_DEVICE to HC_CREATE_VDEV
and HC_DESTROY_VDEV
2) refine code style
v4: 1) remove unnecessary parameter
2) add VM state check for HC_CREATE_VDEV and HC_DESTROY hypercalls
Tracked-On: #4853
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1.Modify clos_mask and mba_delay as a member of the union type.
2.Move HV_SUPPORTED_MAX_CLOS ,MAX_CACHE_CLOS_NUM_ENTRIES and
MAX_MBA_CLOS_NUM_ENTRIES to misc_cfg.h file.
Tracked-On: #5229
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
HV_SUPPORTED_MAX_CLOS:
This value represents the maximum CLOS that is allowed by ACRN hypervisor.
This value is set to be least common Max CLOS (CPUID.(EAX=0x10,ECX=ResID):EDX[15:0])
among all supported RDT resources in the platform. In other words, it is
min(maximum CLOS of L2, L3 and MBA). This is done in order to have consistent
CLOS allocations between all the RDT resources.
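The "least common Max CLOS" is a plain minimum over the per-resource values. A minimal sketch, assuming the per-resource maxima have already been read from CPUID; the function name and the sample values in the usage note are hypothetical.

```c
#include <stdint.h>

/*
 * Return the smallest max-CLOS value across all supported RDT resources
 * (for example L2, L3 and MBA). Returns 0 when no resource is supported.
 */
static uint16_t common_max_clos(const uint16_t *res_max_clos, uint32_t nr_res)
{
	uint16_t min_clos;
	uint32_t i;

	if (nr_res == 0U) {
		return 0U;
	}
	min_clos = res_max_clos[0];
	for (i = 1U; i < nr_res; i++) {
		if (res_max_clos[i] < min_clos) {
			min_clos = res_max_clos[i];
		}
	}
	return min_clos;
}
```

For example, if L2 reports 16, L3 reports 8 and MBA reports 12, the common value is 8.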
Tracked-On: #5229
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
The new board, EHL CRB, does not have a legacy port IO UART. Even the PCI UARTs
do not work due to a BIOS bug workaround (the BARs on LPSS PCI are reset
after the BIOS hands over control to the OS). For ACRN console usage, expose the
debug UART via an ACPI PnP device (accessed by MMIO) and add support in
hypervisor debug code.
Another special thing is that the register width of the UART of EHL CRB is
1 byte. Introduce reg_width for each struct console_uart.
Tracked-On: #4937
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
-- use an array to fast locate the hypercall handler
to replace switch case.
-- uniform hypercall handler as below:
int32_t (*handler)(sos_vm, target_vm, param1, param2)
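A minimal sketch of table-based dispatch with the uniform handler signature. The hypercall IDs, the two demo handlers, `DEMO_EINVAL` and `dispatch_hypercall()` are hypothetical; only the four-parameter handler shape comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

struct acrn_vm; /* opaque for this sketch */

/* Uniform handler type: (sos_vm, target_vm, param1, param2). */
typedef int32_t (*hc_handler_t)(struct acrn_vm *sos_vm, struct acrn_vm *target_vm,
				uint64_t param1, uint64_t param2);

#define DEMO_EINVAL 22 /* hypothetical error code */

/* Hypothetical hypercall IDs; slots 2 and 3 are left without a handler. */
enum { HC_DEMO_GET_VERSION = 0, HC_DEMO_ADD = 1, HC_DEMO_MAX = 4 };

static int32_t demo_get_version(struct acrn_vm *sos_vm, struct acrn_vm *target_vm,
				uint64_t param1, uint64_t param2)
{
	(void)sos_vm; (void)target_vm; (void)param1; (void)param2;
	return 1;
}

static int32_t demo_add(struct acrn_vm *sos_vm, struct acrn_vm *target_vm,
			uint64_t param1, uint64_t param2)
{
	(void)sos_vm; (void)target_vm;
	return (int32_t)(param1 + param2);
}

/* Handlers indexed by hypercall ID; unlisted slots are NULL. */
static const hc_handler_t hc_table[HC_DEMO_MAX] = {
	[HC_DEMO_GET_VERSION] = demo_get_version,
	[HC_DEMO_ADD] = demo_add,
};

/* Look the handler up by index instead of walking a switch statement. */
static int32_t dispatch_hypercall(uint32_t id, struct acrn_vm *sos_vm,
				  struct acrn_vm *target_vm, uint64_t param1, uint64_t param2)
{
	if ((id >= (uint32_t)HC_DEMO_MAX) || (hc_table[id] == NULL)) {
		return -DEMO_EINVAL;
	}
	return hc_table[id](sos_vm, target_vm, param1, param2);
}
```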
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Enhance the help text that accompanies the CONFIG_SCENARIO symbol in Kconfig
Tracked-On: #5203
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
2abbb99f6a ("hv: make thread status more accurate") introduced a
transition stage, marked as var be_blocking, between RUNNING->BLOCKED
of thread status. wake_thread() does not work in this transition stage
because it only checks thread->status.
Need to check thread->be_blocking as well in wake_thread(). When
wake_thread() happens in the transition stage, the previous sleep
operation is rolled back.
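The extra check can be sketched as below. This is an assumption-laden illustration, not the scheduler code: the struct, the status enum and `wake_thread_sketch()` are hypothetical, and real code would hold the scheduler lock and request a reschedule.

```c
#include <stdbool.h>

/* Hypothetical thread states for the sketch. */
enum thread_status { THREAD_RUNNING, THREAD_RUNNABLE, THREAD_BLOCKED };

struct thread_obj {
	enum thread_status status;
	bool be_blocking; /* set while the thread is between RUNNING and BLOCKED */
};

/*
 * Wake a thread. Returns true if a wake-up took effect.
 * A thread caught in the RUNNING->BLOCKED transition has its pending
 * sleep rolled back instead of being missed.
 */
static bool wake_thread_sketch(struct thread_obj *t)
{
	if (t->status == THREAD_BLOCKED) {
		t->status = THREAD_RUNNABLE;
		return true;
	}
	if (t->be_blocking) {
		t->be_blocking = false; /* cancel the sleep that has not completed yet */
		return true;
	}
	return false;
}
```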
Tracked-On: #5190
Fixes: 2abbb99f6a ("hv: make thread status more accurate")
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Replace pr_fatal with pr_info to reduce printing logs
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The ivshmem device defines four registers: Interrupt Mask, Interrupt
Status, IVPosition and Doorbell. The first two are unused and no emulation
is required. The latter two are used for interrupts and will be implemented
in the future.
This patch also introduces a new priv_data member for structure pci_vdev,
it can be used to find an ivshmem device through pci_vdev.
v2: refine code style
v3: 1) add @pre for ivshmem_mmio_handler function
2) refine code style
v4: 1) set ivshmem registers default value when vBAR mapping
2) change find_ivshmem_device to set_ivshmem_device
v5: 1) change set_ivshmem_device to find_and_set_ivshmem_device
2) add a ASSERT to check if the vdev->priv_data is set successfully
v6: change find_and_set_ivshmem_device to create_ivshmem_device
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Implement read_vdev_cfg/write_vdev_cfg operations for ivshmem device
v2: read_vdev_cfg/write_vdev_cfg always return zero, the ivshmem device
only emulated in HV.
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch introduces vpci_update_one_vbar API to simplify
vBAR mapping/unmapping when vBAR writing.
v2: refine commit message
v4: refine commit message
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ivshmem device supports two BARs, BAR 0 is used for inter-VM
notification mechanism, BAR 2 is used to provide shared memory
base address and size.
v4: check if the return value of get_shm_region function is NULL
v5: 1) change get_shm_region to find_shm_region
2) add print log when ivshmem device doesn't find memory region
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add an IVSHMEM region and the related configuration parameters in
hybrid_rt scenario on whl-ipc-i5. The size of the shared memory is
2M, and it is used for the communication between VM0 and VM2.
v6: rename shm name; remove unnecessary MACROs.
v7: rename MACRO for shm name; add unassigned vbdf for post-launched
VMs.
Tracked-On: #4853
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Ivshmem device is used for shared memory based communication between
pre-launched/post-launched VMs.
this patch implements ivshmem device configuration space initialization
and ivshmem device operation methods.
v2: introduce init_one_pcibar interface to simplify BAR initialization
operation of HV emulated PCI device.
v3: 1) due to init_one_pcibar API is only used for pre-launched VM vdevs
it can't be applied to all vdevs, so remove it.
2) move ivshmem BARs initialization to subsequent patch, this patch
only introduce ivshmem configuration space initialization.
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The ivshmem memory regions use the memory of the hypervisor and
they are continuous and page aligned.
this patch is used to initialize each memory region hpa.
v2: 1) if CONFIG_IVSHMEM_SHARED_MEMORY_ENABLED is not defined, the
entire code of ivshmem will not be compiled.
2) change ivshmem shared memory unit from byte to page to avoid
misconfiguration.
3) add ivshmem configuration and vm configuration references
v3: 1) change CONFIG_IVSHMEM_SHARED_MEMORY_ENABLED to CONFIG_IVSHMEM_ENABLED
2) remove the ivshmem configuration sample, offline tool provides default
ivshmem configuration.
3) refine code style.
v4: 1) make ivshmem_base 2M aligned.
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There's a corner case:
When we want to get the CPUID.01H:EDX value,
we may have the following code snippet:
    uint32_t unused, edx;
    cpuid_subleaf(0x1U, 0x0U, &unused, &unused, &unused, &edx);
while in cpuid_subleaf:
    *eax = leaf;
    *ecx = subleaf;
eax and ecx point to the same location, so
on entry to asm_cpuid its input values will be 0x0U and 0x0U,
but the expected input values are 0x1U and 0x0U.
This case will return CPUID.00H:EDX, which is the wrong answer.
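The aliasing problem can be reproduced without executing CPUID. In the sketch below, `fake_asm_cpuid()` is a hypothetical stand-in that only records the inputs it receives, and the "fixed" variant shows one possible way to avoid the problem (local copies for the inputs and outputs); it is not necessarily what the patch does.

```c
#include <stdint.h>

/* Inputs seen by the stand-in on its most recent call. */
static uint32_t last_leaf;
static uint32_t last_subleaf;

/* Hypothetical stand-in for asm_cpuid(): record inputs, return zeros. */
static void fake_asm_cpuid(uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
	last_leaf = *eax;
	last_subleaf = *ecx;
	*eax = 0U;
	*ebx = 0U;
	*ecx = 0U;
	*edx = 0U;
}

/* Buggy shape: inputs are staged in the caller's output pointers. */
static void cpuid_subleaf_buggy(uint32_t leaf, uint32_t subleaf,
				uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
	*eax = leaf;
	*ecx = subleaf; /* overwrites the leaf when ecx aliases eax */
	fake_asm_cpuid(eax, ebx, ecx, edx);
}

/* One possible fix: stage inputs in locals, copy results out afterwards. */
static void cpuid_subleaf_fixed(uint32_t leaf, uint32_t subleaf,
				uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
	uint32_t a = leaf, b = 0U, c = subleaf, d = 0U;

	fake_asm_cpuid(&a, &b, &c, &d);
	*eax = a;
	*ebx = b;
	*ecx = c;
	*edx = d;
}
```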
Tracked-On: #4526
Signed-off-by: Junming Liu <junming.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Fix the bug for "is_apl_platform" func.
"monitor_cap_buggy" is identical to "is_apl_platform", so remove it.
On apl platform:
1) ACRN doesn't use monitor/mwait instructions
2) ACRN disable GPU IOMMU
Tracked-On:#3675
Signed-off-by: Junming Liu <junming.liu@intel.com>
v3 -> v4:
Refine commit message and code style
1.
SDM Vol. 2A 3-211 states DisplayFamily = Extended_Family_ID + Family_ID
when Family_ID == 0FH.
So it should be family += ((eax >> 20U) & 0xffU) when Family_ID == 0FH.
2.
IF (Family_ID = 06H or Family_ID = 0FH)
THEN DisplayModel = (Extended_Model_ID << 4) + Model_ID;
while the previous code used this logic:
IF (DisplayFamily = 06H or DisplayFamily = 0FH)
Fix the calculation of display family and
display model according to the SDM definition.
3. Use variable names that distinguish Family ID/Display Family/Model ID/Display Model,
so the code is clearer and avoids such mistakes.
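As a rough illustration of the SDM rules quoted above, the following sketch decodes CPUID.01H:EAX into display family and display model. The struct and function names are made up for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded values. */
struct cpu_sig_sketch {
	uint32_t display_family;
	uint32_t display_model;
};

static struct cpu_sig_sketch decode_cpuid_01h_eax(uint32_t eax)
{
	uint32_t model_id = (eax >> 4U) & 0xfU;        /* EAX[7:4]   */
	uint32_t family_id = (eax >> 8U) & 0xfU;       /* EAX[11:8]  */
	uint32_t ext_model_id = (eax >> 16U) & 0xfU;   /* EAX[19:16] */
	uint32_t ext_family_id = (eax >> 20U) & 0xffU; /* EAX[27:20] */
	struct cpu_sig_sketch sig;

	/* DisplayFamily adds the extended family only when Family_ID == 0FH. */
	sig.display_family = family_id;
	if (family_id == 0xfU) {
		sig.display_family += ext_family_id;
	}

	/* The model rule is keyed on Family_ID, not on DisplayFamily. */
	sig.display_model = model_id;
	if ((family_id == 0x6U) || (family_id == 0xfU)) {
		sig.display_model += ext_model_id << 4U;
	}
	return sig;
}
```

For a sample value of 0x000506C9, this yields display family 06H and display model 5CH.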
Tracked-On:#3675
Signed-off-by: liujunming <junming.liu@intel.com>
Reviewed-by: Wu Xiangyang <xiangyang.wu@linux.intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch will move the VM configuration check to pre-build stage,
a test program will do the check for pre-defined VM configuration
data before making hypervisor binary. If test failed, the make
process will be aborted. So once the hypervisor binary is built
successfully or start to run, it means the VM configuration has
been sanitized.
The patch does not add any new VM configuration check function;
it just ports the original sanitize_vm_config() function from cpu.c
to static_checks.c with the changes below:
1. remove runtime rdt detection for clos check;
2. replace pr_err() from logmsg.h with printf() from stdio.h;
3. replace runtime call get_pcpu_nums() in ALL_CPUS_MASK macro
with static defined MAX_PCPU_NUM;
4. remove cpu_affinity check since pre-launched VM might share
pcpu with SOS VM;
The BOARD/SCENARIO parameter check and configuration folder check is
also moved to prebuild Makefile.
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Remove the sanitize_vm_config() function since the sanitizing
is moved to the pre-build process.
Once the hypervisor has booted, we assume all VM configurations are sanitized;
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The realpath function returns null when the directory or file does
not exist, so use the abspath function instead of realpath.
Tracked-On: #5146
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
-- move vm_state_lock to another place in the vm structure
to avoid the memory waste caused by page alignment.
-- remove the memset from create_vm
-- explicitly set max_emul_mmio_regions and vcpuid_entry_nr to 0
inside create_vm to avoid use without initialization.
-- rename max_emul_mmio_regions to nr_emul_mmio_regions
v1->v2:
add deinit_emul_io in shutdown_vm
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the CPU affinity of SOS VM was initialized at runtime during
the sanitize_vm_config() stage, following the policy that all physical CPUs
except those occupied by Pre-launched VMs belong to SOS_VM. Now change
the process so that SOS CPU affinity is initialized at build time,
with the assumption that its validity is guaranteed before runtime.
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously we had a complicated check mechanism on platform_acpi_info.h, which
is supposed to be generated by the acrn-config tool. Given that
all configurations should be generated by acrn-config before building the acrn
hypervisor, this check is not needed anymore.
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The SDC scenario configurations will not be validated so remove it from
build makefile;
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In MSI Capability Structure, bit 7 (64 bit address capable) of MSICTRL
is RO;
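A minimal sketch of treating a bit as read-only on a guest write: the old value is kept for the read-only bits and the guest value is taken for the rest. The helper name and the idea of passing a read-only mask are assumptions for illustration; only the bit 7 position comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 7 of MSI Message Control: 64-bit address capable (read-only). */
#define MSICTRL_64BIT_SKETCH 0x0080U

/* Hypothetical helper: merge a guest write into the current register value,
 * keeping every bit selected by ro_mask unchanged. */
static uint16_t msictrl_guest_write(uint16_t cur, uint16_t val, uint16_t ro_mask)
{
	uint16_t rw_mask = (uint16_t)~ro_mask;

	return (uint16_t)((cur & ro_mask) | (val & rw_mask));
}
```

A guest write can therefore neither set nor clear the 64-bit address capable bit, while other writable bits such as MSI Enable (bit 0) pass through.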
Tracked-On: #5125
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We only set the BLOCKED status in context switch_out, which means only
a running thread can be changed to BLOCKED, but a runnable thread cannot.
This leads to a dead loop in sleep_thread_sync.
To solve the problem, in sleep_thread, we set the status to BLOCKED
directly when the original thread status is RUNNABLE.
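The state handling described above might look roughly like the sketch below. It is a simplified model, not the scheduler code: the enum, struct and function names are invented here, and the be_blocking flag is assumed to behave as an "about to block at the next switch_out" marker.

```c
#include <assert.h>
#include <stdbool.h>

enum thread_status_sketch { STS_RUNNING, STS_RUNNABLE, STS_BLOCKED };

struct thread_sketch {
	enum thread_status_sketch status;
	bool be_blocking; /* assumed: set while a running thread waits for switch_out */
};

static void sleep_thread_sketch(struct thread_sketch *t)
{
	if (t->status == STS_RUNNING) {
		/* A running thread becomes BLOCKED only in switch_out. */
		t->be_blocking = true;
	} else if (t->status == STS_RUNNABLE) {
		/* A runnable thread never reaches switch_out, so block it here. */
		t->status = STS_BLOCKED;
	}
}

static void switch_out_sketch(struct thread_sketch *t)
{
	if (t->be_blocking) {
		t->status = STS_BLOCKED;
		t->be_blocking = false;
	} else {
		t->status = STS_RUNNABLE;
	}
}
```

Without the RUNNABLE branch, a caller polling for STS_BLOCKED on a runnable thread would spin forever, which is the dead loop the commit describes.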
Tracked-On: #5115
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
When a VM reads the pre-sriov header in the ECAP of a ptdev, only emulate the
read if SRIOV is hidden.
Writes to the pre-sriov header are ignored, so there is no need to fix writing.
Tracked-On: #5085
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
The old layout configuration source which located in:
hypervisor/arch/x86/configs/ is abandoned, remove it;
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
The make command is same as old configs layout:
under acrn-hypervisor folder:
make hypervisor BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]
under hypervisor folder:
make BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]
if BOARD/SCENARIO parameter is not specified, the default will be:
BOARD=nuc7i7dnb SCENARIO=industry
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There are 3 kinds of configurations in ACRN hypervisor source code: hypervisor
overall setting, per-board setting and scenario specific per-VM setting.
Currently Kconfig acts as hypervisor overall setting and its source is located at
"hypervisor/arch/x86/configs/$(BOARD).config"; Per-board configs are located at
"hypervisor/arch/x86/configs/$(BOARD)" folder; scenario specific per-VM configs
are located at "hypervisor/scenarios/$(SCENARIO)" folder.
This layout couples board configs and VM configs tightly.
The board specific Kconfig file and misc_cfg.h are shared by all scenarios, and
scenario specific pci_dev.c is shared by all boards. So the user has no way to
build hypervisor binaries for different scenarios on different boards with one
source code repo.
The patch will setup a new VM configurations layout as below:
misc/vm_configs
├── boards --> folder of supported boards
│ ├── <board_1> --> scenario-irrelevant board configs
│ │ ├── board.c --> C file of board configs
│ │ ├── board_info.h --> H file of board info
│ │ ├── pci_devices.h --> pBDF of PCI devices
│ │ └── platform_acpi_info.h --> native ACPI info
│ ├── <board_2>
│ ├── <board_3>
│ └── <board...>
└── scenarios --> folder of supported scenarios
├── <scenario_1> --> scenario specific VM configs
│ ├── <board_1> --> board specific VM configs for <scenario_1>
│ │ ├── <board_1>.config --> Kconfig for specific scenario on specific board
│ │ ├── misc_cfg.h --> H file of board specific VM configs
│ │ ├── pci_dev.c --> board specific VM pci devices list
│ │ └── vbar_base.h --> vBAR base info of VM PT pci devices
│ ├── <board_2>
│ ├── <board_3>
│ ├── <board...>
│ ├── vm_configurations.c --> C file of scenario specific VM configs
│ └── vm_configurations.h --> H file of scenario specific VM configs
├── <scenario_2>
├── <scenario_3>
└── <scenario...>
The new layout would decouple board configs and VM configs completely:
The boards folder stores kinds of supported boards info, each board folder
stores scenario-irrelevant board configs only, which could be totally got from
a physical platform and works for all scenarios;
The scenarios folder stores VM configs of kinds of working scenario. In each
scenario folder, besides the generic scenario specific VM configs, the board
specific VM configs would be put in an embedded board folder.
In the new layout, all config files are moved out of the hypervisor folder
into a separate folder. This makes the hypervisor LoC calculation more
precise, with the formula below:
typical LoC = LoC(hypervisor) + LoC(one vm_configs)
where
LoC(one vm_configs) = LoC(misc/vm_configs/boards/<board>)
+ LoC(misc/vm_configs/scenarios/<scenario>/<board>)
+ LoC(misc/vm_configs/scenarios/<scenario>/vm_configurations.c)
+ LoC(misc/vm_configs/scenarios/<scenario>/vm_configurations.h)
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vpci: inject physical PCIEXBAR to SOS vhostbridge in
order to fully emulate a full host bridge following HW spec
The vhostbridge we emulate currently is a "Celeron N3350/
Pentium N4200/Atom E3900 Series Host Bridge", which belongs to the
Apollo Lake SoC, but the emulation is incomplete, and
we need to implement a full vhostbridge following the HW spec.
This is a step-by-step process, and in this patch we fix
the emulation of the PCIEXBAR register (0x60) and thus solve
bug #6464.
-------#6464: SOS cannot make use of ECAM---------------
Generally, SOS will check the MMIO Base Addr in ACPI MCFG
table to confirm it is a reserved memory area. There will
be 3 methods to check:
1. Via E820 table
2. Via EFI runtime service
3. To check with the value in PCIEXBAR(0x60) of hostbridge
For SOS, method 2 is not feasible since no EFI runtime service
is available for SOS. On newer platforms like EHL/TGL, the
BIOS somehow doesn't reserve it in the native E820, so SOS will
try to use method 3 to verify. We should therefore inject the physical ECAM
into the vhostbridge; otherwise all 3 methods will fail, and SOS will
not make use of ECAM, with the result that SOS cannot use
PCIe Extended Capabilities like SR-IOV.
-------------------------------------------------------
TODO:
1. In the future, we may add one or more virtual hostbridges for CPUs that are incompatible in layout with the current one, according to HW specs
2. Besides PCIEXBAR(0x60), some other registers also need to be emulated more precisely rather than being treated as read-only and hard-coded; they will be fixed in future patches.
Tracked-On: #5056
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vpci: refine init_vhostbridge to be dword-aligned
Refine the hard-coded non-dword-aligned sentences in init_vhostbridge
to be dword-aligned to simplify the initialization operation
Tracked-On: #5056
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Jason Chen <jason.cj.chen@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To hide the CET feature from the guest VM completely, the MSR IA32_MSR_XSS also
needs to be intercepted because it contains the CET_U and CET_S feature bits
of xsaves/xrstors operations. Mask these two bits in IA32_MSR_XSS writes.
With IA32_MSR_XSS interception, member 'xss' of 'struct ext_context' can
be removed because it is duplicated with the MSR store array
'vcpu->arch.guest_msrs[]'.
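A small sketch of the masking step, assuming the architectural bit positions of CET_U (bit 11) and CET_S (bit 12) in IA32_XSS. The macro and function names are hypothetical, and whether the real handler silently masks or injects a fault is not stated in this message.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_XSS state-component bits for CET user and supervisor state. */
#define XSS_CET_U_SKETCH (UINT64_C(1) << 11)
#define XSS_CET_S_SKETCH (UINT64_C(1) << 12)

/* Hypothetical filter applied to a guest write of IA32_XSS: drop the CET
 * bits so the guest can never enable CET state management. */
static uint64_t filter_guest_xss(uint64_t guest_val)
{
	return guest_val & ~(XSS_CET_U_SKETCH | XSS_CET_S_SKETCH);
}
```

The filtered value is what would then be kept in the per-vcpu MSR store array mentioned above.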
Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Return-oriented programming (ROP), and similarly CALL/JMP-oriented
programming (COP/JOP), have been the prevalent attack methodologies for
stealth exploit writers targeting vulnerabilities in programs.
CET (Control-flow Enforcement Technology) provides the following
capabilities to defend against ROP/COP/JOP style control-flow subversion
attacks:
* Shadow stack: Return address protection to defend against ROP.
* Indirect branch tracking: Free branch protection to defend against
COP/JOP
The full support of CET for Linux kernel has not been merged yet. As the
first stage, hide CET from guest VM.
Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
On WHL platform, we need to pass through TPM to Secure pre-launched VM. In order
to do this, we need to add TPM2 ACPI Table and add TPM DSDT ACPI table to include
the _CRS.
Now we only support TPM 2.0 devices (TPM 1.2 devices are not supported). Besides,
the TPM must use Start Method 7 (Uses the Command Response Buffer Interface)
to notify the TPM 2.0 device that a command is available for processing.
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Use the ACPI_TABLE_HEADER macro to initialize the ACPI Table Header.
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Add mmio device pass through support for pre-launched VM.
When we pass through a MMIO device to pre-launched VM, we would remove its
resource from the SOS. Now these resources only include the MMIO regions.
Tracked-On: #5053
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add two hypercalls to support MMIO device pass through for post-launched VM.
And when we support MMIO pass through for pre-launched VM, we could re-use
the code in mmio_dev.c
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
During context switch in the hypervisor, xsave/xrstor are used to
save/restore the XSAVE area according to XCR0 and XSS. The legacy
region in the XSAVE area includes FPU and SSE, so we should make sure the
legacy region is saved during context switch. FPU in XCR0 is always
enabled according to the SDM.
For SSE, we enable it in XCR0 during context switch.
Tracked-On: #5062
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The kick_thread function is only used by kick_vcpu to kick a vcpu out of
non-root mode. Its implementation sends an IPI to the target CPU if the
target obj is running and the target PCPU is not the current one; for a
runnable obj, it just makes a reschedule request. So kick_thread
does not actually belong to the scheduler module; we can drop it and just do
the cpu notification in kick_vcpu.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu->running is duplicated with THREAD_STS_RUNNING status of thread
object. Introduce an API sleep_thread_sync(), which can utilize the
inner status of thread object, to do the sync sleep for zombie_vcpu().
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Update thread status after switch_in/switch_out.
2. Add 'be_blocking' to represent the intermediate state during
sleep_thread and switch_out. After switch_out, the thread status
update to THREAD_STS_BLOCKED.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- replace global hypercall lock with per-vm lock
-- add spinlock protection for vm & vcpu state change
v1-->v2:
change get_vm_lock/put_vm_lock parameter from vm_id to vm
move lock obtain before vm state check
move all lock from vmcall.c to hypercall.c
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Hide sriov capability of passthrough devices for VMs at init_vdev_pt().
And for post-launched VM, allow assign PF.
Tracked-On: #5041
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Support hiding the SRIOV extended capability for passthrough devices
Tracked-On: #5041
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
There are some devices (like Samsung NVMe SSD SM981/PM981 which has 33 MSIX tables)
which have more than 16 MSIX tables. Extend the default value to 64 to handle them.
Tracked-On: #4994
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Some OSes assume the platform must have the IOAPIC. For example:
Linux Kernel allocates IRQ force from GSI (0 if there's no PIC and IOAPIC) on x86.
And it thinks IRQ 0 is an architecture special IRQ, not for device driver. As a
result, the device driver may go wrong if the allocated IRQ is 0 for RTVM.
This patch exposes vIOAPIC to RTVM with LAPIC passthru even though the RTVM can't
use IOAPIC; it serves as a placeholder to fulfill the guest assumption.
Now that vIOAPIC is exposed to the guest unconditionally, the 'ready' field can be
removed since we do vIOAPIC initialization for each guest.
Tracked-On: #4691
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
replace spinlock_obtain/spinlock_release with spinlock_irqsave_obtain
and spinlock_irqrestore_release to avoid dead lock for uart module.
this uart lock may be accessed in ISR context like this path:
dispatch_interrupt->pr_err/pr_xxx or printf
->console_write->uart16550_puts
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Some fields of the MSI/MSI-X Capability never change
once they have been initialized, so there is no need to reset them while the vdev instance
is still in use. What needs to be reset are the fields which can be changed by the guest
at runtime.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
will follow this convention for spin lock initialization:
-- for simple global variable locks, use this style:
static spinlock_t xxx_spinlock = {.head = 0U, .tail = 0U,}
-- for the locks inside a data structure, need to call
spinlock_init to initialize.
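The two styles in this convention can be sketched as below. The spinlock_t layout with head and tail follows the initializer quoted above; the field width, the spinlock_init body, and the vm_sketch structure are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed ticket-lock layout, matching the .head/.tail initializer above. */
typedef struct {
	uint32_t head;
	uint32_t tail;
} spinlock_t;

/* Assumed body: a lock embedded in a structure is reset at init time. */
static inline void spinlock_init(spinlock_t *lock)
{
	(void)memset(lock, 0, sizeof(*lock));
}

/* Style 1: simple global lock, statically initialized. */
static spinlock_t uart_lock_sketch = {.head = 0U, .tail = 0U,};

/* Style 2: lock inside a data structure, initialized via spinlock_init(). */
struct vm_sketch {
	uint16_t vm_id;
	spinlock_t vm_lock;
};

static void vm_sketch_init(struct vm_sketch *vm, uint16_t vm_id)
{
	vm->vm_id = vm_id;
	spinlock_init(&vm->vm_lock);
}
```

The static form costs nothing at runtime, while the explicit call matters for structures that are not zero-filled before use.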
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
replace spinlock_obtain/spinlock_release with spinlock_irqsave_obtain
and spinlock_irqrestore_release to avoid dead lock for vpic module.
this vpic lock may be accessed in ISR context like this path:
dispatch_interrupt->do_softirq->softirq_handlers
->ptirq_softirq->ptirq_handle_intx->vpic_set_irqline
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
hv: hypercall: restrict the condition to assign/deassign a pci device to
a post-launched VM for safety
For the safety of post-launched VMs, pci devices assignments should
occur only when VM is being created (at VM_CREATED STATUS), and pci
devices de-assignment should occur only when VM is being created or
shutdown/reset (at VM_CREATED or VM_PAUSED status)
Tracked-On: #4995
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Signed-off-by: Wang Qian <qian1.wang@intel.com>
From the VT-d spec 8.3:
If a DRHD structure with INCLUDE_PCI_ALL flag Set is reported for a
Segment, it must be enumerated by BIOS after all other DRHD structures
for the same Segment.
However, some broken BIOS violate the rules. To bring up ACRN with them,
change the ASSERT to a permissive check to unblock the BIOS limitation.
Also, scan the DRHD list to find the one who has INCLUDE_PCI_ALL flag.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Replace dmar_iterate_tbl() by a direct for loop. Handle the
dmar_unit_cnt and handle_one_drhd() of each DRHD in the direct for loop.
Also tune some function definitions to save LOC.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
According to SDM 10.12.11, we can know this register is dedicated to the
purpose of sending self-IPIs with the intent of enabling a highly
optimized path for sending self-IPIs. Also sending the IPI via the Self
Interrupt Register ensures that interrupt is delivered to the processor
core. Specifically completion of the WRMSR instruction to the SELF IPI
register implies that the interrupt has been logged into the IRR.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently, not all platforms support posted interrupt processing of both
VT-x and VT-d. On EHL, VT-d doesn't support posted interrupt processing.
So in such scenario, is_pi_capable() in vcpu_handle_pi_notification()
will bypass the PIR pending bits check which might cause a self-NV-IPI
lost.
With commit "bf1ff8c98 (hv: Offload syncing PIR to vIRR to processor
hardware)", the syncing PIR to vIRR is postponed and it is handled by a
self-NV-IPI in the following VMEnter. The process looks like,
a) vcpu A accepts a virtual interrupt ->
1) ACRN_REQUEST_EVENT is set
2) corresponding bit in PIR is set
3) Posted Interrupt ON bit is set
b) vcpu A does virtual interrupt injection on resume path due to
the pending ACRN_REQUEST_EVENT ->
1) hypervisor disables host interrupt
2) ACRN_REQUEST_EVENT is cleared
3) a self-NV-IPI is sent via ICR of LAPIC.
4) IRR bit of the self-NV-IPI is set
c) (VM-ENTRY) vcpu A returns into non-root mode
1) host interrupt enable(by HW)
2) posted interrupt processing clears the ON bit, sync PIR to vIRR
3) deliver the virtual interrupt if guest rflags.IF=1
d) (VM-EXIT) vcpu A traps due to a instruction execution (e.g. HLT)
1) host interrupt disable(by HW)
2) hypervisor enable host interrupt
Above illustrates a normal process of the virtual interrupt injection
with cpu PI support. However, a failing case is observed. The failing
case is that the self-NV-IPI from b-3 is not accepted by the core until
a timing between d-1 and d-2. b-4 happening between d-1 and d-2 is
observed by debug trace. So the self-NV-IPI will be handled in root-mode
which cannot do the syncing PIR to vIRR processing. Due to the bug
described in the first paragraph, vcpu_handle_pi_notification() cannot
complete the virtual interrupt injection request. This patch fixes it by
removing the wrong check in vcpu_handle_pi_notification() because
vcpu_handle_pi_notification() only happens on platform with cpu PI
support.
Here is some cost data for sending an IPI via the LAPIC ICR register.
Normally, the number of cycles between the ICR write and the IRR being set is 140~260,
which is not accurate due to the MSR read overhead.
And from b-3 to c is about 560 cycles. So b-4 happens during this
period. But in the bad case, b-4 doesn't happen even when c is triggered.
The worst case I captured is that the ICR write to IRR set costs more
than 1900 cycles. Now, the best GUESS for the huge cost of an IPI via ICR is
the APIC bus arbitration (refer to SDM 10.6.3, 10.7 and Figure 10-17).
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We hit following build error when using gcc10:
arch/x86/page.c:240:48: error: array subscript is outside
array bounds of 'struct page[0][1]' [-Werror=array-bounds]
It happens with gcc10 on different Linux distributions.
Since ACRN depends on zero length arrays in
several places, we disable the zero length array warning with a
gcc option.
Tracked-On: #4810
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Wrap a function to do guest EPT flush. This function doesn't do the real EPT flush.
It just makes the EPT flush request; the real flush is done just before vcpu vmenter.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
-- remove unnecessary lock in pci_mmcfg_read_cfg and
pci_mmcfg_write_cfg since the mmio operation is atomic
if the offset is aligned with 1/2/4 bytes.
-- move pci_is_valid_access to pci.h
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
remove spin lock for micro code update since the guest
operating system will take lock
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
The commit 'HV: Config Splitlock Detection to be disable' allows
using CONFIG_ENFORCE_TURNOFF_AC to turn off splitlock #AC. If
CONFIG_ENFORCE_TURNOFF_AC is not set, splitlock #AC should be turned on.
Tracked-On: #4962
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Check bit 48 in IA32_VMX_BASIC MSR, if it is 1, return error, as we only
support Intel 64 architecture.
SDM:
Appendix A.1 BASIC VMX INFORMATION
Bit 48 indicates the width of the physical addresses that may be used for the
VMXON region, each VMCS, and data structures referenced by pointers in a
VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If
the bit is 0, these addresses are limited to the processor’s
physical-address width. If the bit is 1, these addresses are limited to
32 bits. This bit is always 0 for processors that support Intel 64
architecture.
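The check itself is a single bit test on the MSR value. In the sketch below the MSR read is left out and the 64-bit value is passed in directly; the function name and the -1 error code are placeholders, not the names used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 48 of IA32_VMX_BASIC: set when VMX structure addresses are limited
 * to 32 bits, which never happens on Intel 64 processors. */
#define VMX_BASIC_ADDR_32BIT_ONLY (UINT64_C(1) << 48)

/* Hypothetical check: returns 0 when the platform is usable, -1 otherwise. */
static int check_vmx_basic_addr_width(uint64_t vmx_basic)
{
	return ((vmx_basic & VMX_BASIC_ADDR_32BIT_ONLY) != 0U) ? -1 : 0;
}
```

A caller would pass the value read from the IA32_VMX_BASIC MSR and abort initialization on a non-zero return.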
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
We always assume the physical platform has XSAVE, and we always enable
XSAVE at the beginning, so there is no need to check OSXSAVE in the host.
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
As build variants for different boards and scenarios grow, users
might make mistakes with HV binary distributions. Checking the board/scenario info
in the log is the fastest way to know whether the binary matches. It
also helps developers confirm that they are debugging the correct
binary.
Tracked-On: #4946
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: refine pci_find_vdev with hash
1. Refined pci_find_vdev with BDF-hashing for better performance
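The message does not show the hash scheme, so the following is only one plausible shape: a small chained hash table keyed by the 16-bit BDF, using a multiplicative hash. The table size, hash constant, and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VDEV_HASH_BITS 6U
#define VDEV_HASH_SIZE (1U << VDEV_HASH_BITS)

struct vdev_sketch {
	uint16_t bdf;             /* bus[15:8] | dev[7:3] | func[2:0] */
	struct vdev_sketch *next; /* next entry in the same bucket */
};

static struct vdev_sketch *vdev_buckets[VDEV_HASH_SIZE];

/* Multiplicative hash: keep the top VDEV_HASH_BITS bits of the product. */
static uint32_t bdf_hash(uint16_t bdf)
{
	return ((uint32_t)bdf * 2654435761U) >> (32U - VDEV_HASH_BITS);
}

static void vdev_add(struct vdev_sketch *vdev)
{
	uint32_t idx = bdf_hash(vdev->bdf);

	vdev->next = vdev_buckets[idx];
	vdev_buckets[idx] = vdev;
}

/* Walk only the bucket the BDF hashes to, not the whole device list. */
static struct vdev_sketch *find_vdev_sketch(uint16_t bdf)
{
	struct vdev_sketch *vdev = vdev_buckets[bdf_hash(bdf)];

	while ((vdev != NULL) && (vdev->bdf != bdf)) {
		vdev = vdev->next;
	}
	return vdev;
}
```

Compared with a linear scan over every vdev, a lookup touches only the entries that collide in one bucket.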
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: refine pci_lookup_drhd_for_pbdf with hash
1. Added an auxiliary function pci_find_pdev using hash to find pdev
with pbdf, thus pci_lookup_drhd_for_pbdf will have a better performance
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: rename pci_pdev_array to pci_pdevs to make it clearer
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some passthrough devices require multiple MSI vectors, but don't
support MSI-X. Meanwhile, the Linux kernel doesn't support contiguous
vector allocation.
On a native platform, this issue can be mitigated by the IOMMU via interrupt
remapping. However, on ACRN, there is no vIOMMU.
vMSI-X on MSI emulation is one solution to mitigate this problem on ACRN.
This patch adds MSI-X emulation on top of the MSI capability.
For a device that needs MSI-X emulation, HV will hide the MSI capability
and present an MSI-X capability to the guest.
The guest driver may need to be modified to request MSI-X vectors.
For example:
ret = pci_alloc_irq_vectors(pdev, 1, STMMAC_MSI_VEC_MAX,
- PCI_IRQ_MSI);
+ PCI_IRQ_MSI | PCI_IRQ_MSIX);
To enable MSI-X emulation:
- 1. The device should be in the vmsix_on_msi_devs array.
- 2. The device should support MSI, but not MSI-X.
- 3. The MSI capability should support per-vector mask.
- 4. The device should have an unused BAR.
- 5. The device driver should not rely on PBA for functionality.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
dmar_reserve_irte is added to reserve N contiguous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.
The reserved IRTEs will not be freed.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For a ptirq_remapping_info entry, when building the IRTE:
- If the caller provides a valid IRTE, use that IRTE.
- If the caller doesn't provide a valid IRTE, allocate an IRTE when the
entry doesn't have a valid one; in this case, the IRTE will be freed
when the entry is freed.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be reused;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.
idx_out:
This parameter returns the actual index of the IRTE used. The caller needs to
check whether the returned value is valid or not.
Also this patch adds an internal function alloc_irte.
The function takes count as an input parameter to allocate contiguous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.
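A first-fit sketch of allocating count contiguous entries with the count restriction described above. The table size, the value of INVALID_IRTE_ID, the boolean usage map and the function names are all assumptions for illustration; the real allocator may track entries differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IRTE_SKETCH  64U      /* assumed table size for the example */
#define INVALID_IRTE_ID 0xFFFFU  /* assumed sentinel value */

static bool irte_used[NR_IRTE_SKETCH];

/* count must be 1, 2, 4, 8, 16 or 32. */
static bool irte_count_valid(uint16_t count)
{
	return (count != 0U) && (count <= 32U) && ((count & (count - 1U)) == 0U);
}

/* Returns the index of the first entry of a free run, or INVALID_IRTE_ID. */
static uint16_t alloc_irte_sketch(uint16_t count)
{
	uint16_t start, i;

	if (!irte_count_valid(count)) {
		return INVALID_IRTE_ID;
	}
	for (start = 0U; (uint32_t)start + count <= NR_IRTE_SKETCH; start++) {
		for (i = 0U; i < count; i++) {
			if (irte_used[start + i]) {
				break;
			}
		}
		if (i == count) {
			for (i = 0U; i < count; i++) {
				irte_used[start + i] = true;
			}
			return start;
		}
	}
	return INVALID_IRTE_ID;
}
```

The caller checks the result against INVALID_IRTE_ID, in the same spirit as the idx_out check described above.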
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The script only appends 'U' to int configs that have a range.
Add a range to MAX_IR_ENTRIES.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Some platforms still don't support 1GB large pages on the CPU side,
such as the Lakefield, TNT and EHL platforms, whose CPUs can't support
1GB large pages because of a silicon bug.
This patch relaxes this constraint to support more hardware platforms.
Note this patch doesn't relax the constraint on the IOMMU side.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This patch relaxes the hardware platform 1GB large page support constraint
on the CPU side.
A silicon bug on the Lakefield, TNT and EHL platforms means the CPU
can't support 1GB large pages. As a result, the assumption that a platform
which ACRN supports must support 1GB large pages on both the CPU side and the VT-d side
no longer holds.
This reverts commit f01aad7e to let trampoline execution use 2MB pages.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add the information needed to enable MSI-X emulation.
Only enable MSI-X emulation for the devices in the msix_emul_devs array.
Currently, only EHL has the need to enable MSI-x emulation for TSN
devices.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The acrn_mbi.mi_mmap_va should point to struct multiboot2_mmap_entry when
boot from multiboot2, which is different from struct multiboot_mmap when
boot from multiboot1. So we should handle mmap info separately for multiboot2.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Previously the VM kernel bootargs for pre-launched VMs and direct boot mode
of SOS VM are built-in hypervisor binary so end users have no way to change
it. Now we provide another option that the multiboot module string could be
used as bootargs also. This would bring convenience to end users when they
use GRUB as bootloader because the bootargs could be configurable in GRUB
menu.
The usage is if there is any string follows configured kernel_mod_tag in
module string, the string will be used as new kernel bootargs instead of
built-in kernel bootargs. If there is no string follows kernel_mod_tag,
then the built-in bootargs will be the default kernel bootargs.
Please note kernel_mod_tag must be the first word in module string in any
case, it is used to specify the module for which VM.
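The selection rule can be sketched as a small string helper. The function name and the use of a single space as the word separator are assumptions; the tag and bootargs strings in the usage note are made-up examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the bootargs to use for a module string.
 * - NULL if kernel_mod_tag is not the first word of the module string;
 * - the text after the tag if any non-space text follows it;
 * - the built-in bootargs otherwise. */
static const char *select_bootargs(const char *mod_str, const char *kernel_mod_tag,
		const char *builtin_bootargs)
{
	size_t tag_len = strlen(kernel_mod_tag);
	const char *p;

	if (strncmp(mod_str, kernel_mod_tag, tag_len) != 0) {
		return NULL;
	}
	p = mod_str + tag_len;
	if ((*p != '\0') && (*p != ' ')) {
		return NULL; /* the tag is only a prefix of a longer first word */
	}
	while (*p == ' ') {
		p++;
	}
	return (*p != '\0') ? p : builtin_bootargs;
}
```

For example, with tag "Linux_bzImage", the module string "Linux_bzImage root=/dev/sda3 rw" selects "root=/dev/sda3 rw", while "Linux_bzImage" alone falls back to the built-in bootargs.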
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Previously append_seed_arg() just filled the seed arg into the dest cmd buffer,
so rename the API to fill_seed_arg().
Since fill_seed_arg() will be called in SOS VM path only, the param of
bool vm_is_sos is not needed and will be replaced by dest buffer size.
The seed_args[] which used by fill_seed_arg() is pre-defined as all-zero,
so memset() is not needed in fill_seed_arg(), buffer pointer check
and strncpy_s() are not needed also.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a standard string api strncat_s() to replace merge_cmdline() to make code
more readable.
Another change is that the multiboot cmdline will be appended to the end of
configured SOS bootargs instead of the beginning, this would enable a feature
that some kernel cmdline parameter items can be overridden by the multiboot cmdline,
since the later one wins if the same parameter is configured twice in the kernel cmdline.
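A bounded concatenation in the spirit of strncat_s() could look like the sketch below. The signature, return values and error behavior are assumptions modeled on C11 Annex K conventions; the actual hypervisor API may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: append at most slen characters of src to the string in dest,
 * where dest is a buffer of dmax bytes. Returns 0 on success, -1 on a
 * constraint violation (dest is emptied when the result would not fit). */
static int strncat_s_sketch(char *dest, size_t dmax, const char *src, size_t slen)
{
	size_t dlen, n;

	if ((dest == NULL) || (src == NULL) || (dmax == 0U)) {
		return -1;
	}
	for (dlen = 0U; (dlen < dmax) && (dest[dlen] != '\0'); dlen++) {
	}
	if (dlen == dmax) {
		return -1; /* dest is not NUL-terminated within dmax */
	}
	for (n = 0U; (n < slen) && (src[n] != '\0'); n++) {
	}
	if (n >= (dmax - dlen)) {
		dest[0] = '\0'; /* no room for the text plus the terminator */
		return -1;
	}
	(void)memcpy(dest + dlen, src, n);
	dest[dlen + n] = '\0';
	return 0;
}
```

Appending the multiboot cmdline after the configured bootargs, for instance "console=ttyS0" followed by " console=ttyS1", places the overriding item last.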
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.4
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the strncpy_s function sets
s1[0] to '\0';
3. The strncpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
4. The function is implemented with memcpy_s() because the runtime-constraint
detection is almost the same.
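The contract above can be sketched as follows. sketch_strncpy_s() is an
illustration of rules 1-3 only, not the hypervisor's implementation; it
omits the RSIZE_MAX checks of Annex K and uses plain memcpy() for the copy:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the strncpy_s contract; returns 0 on success, -1 on violation. */
static int sketch_strncpy_s(char *s1, size_t s1max, const char *s2, size_t n)
{
    size_t len = 0U;
    uintptr_t d = (uintptr_t)s1;
    uintptr_t s = (uintptr_t)s2;

    if ((s1 == NULL) || (s1max == 0U)) {
        return -1;              /* nothing can safely be written */
    }
    if (s2 == NULL) {
        s1[0] = '\0';
        return -1;
    }
    /* Characters to copy: at most n, stopping at the terminator. */
    while ((len < n) && (s2[len] != '\0')) {
        len++;
    }
    /* The copy plus its NUL must fit, and the regions must not overlap. */
    if ((len >= s1max) || ((d < (s + len + 1U)) && (s < (d + len + 1U)))) {
        s1[0] = '\0';
        return -1;
    }
    memcpy(s1, s2, len);
    s1[len] = '\0';
    return 0;
}
```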
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.1
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the memcpy_s function stores
zeros in the first s1max characters of the object;
3. The memcpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
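A minimal sketch of these three rules, under the assumption that the
RSIZE_MAX checks of Annex K are left out. sketch_memcpy_s() is an
illustrative name, not the function added by this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the memcpy_s contract; returns 0 on success, -1 on violation. */
static int sketch_memcpy_s(void *s1, size_t s1max, const void *s2, size_t n)
{
    uintptr_t d = (uintptr_t)s1;
    uintptr_t s = (uintptr_t)s2;

    if (s1 == NULL) {
        return -1;              /* no destination to clear */
    }
    /* Violations: NULL source, n larger than dest, or overlapping regions. */
    if ((s2 == NULL) || (n > s1max) || ((d < (s + n)) && (s < (d + n)))) {
        memset(s1, 0, s1max);   /* rule 2: zero the first s1max bytes */
        return -1;
    }
    memcpy(s1, s2, n);
    return 0;
}
```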
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The multiboot2 cmdline is used as the hypervisor cmdline, so add parse logic
for the case where the hypervisor boots via the multiboot2 protocol.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously sanitize_multiboot_info() was called after init_debug_pre() because
debug messages can only be printed after the uart is initialized. On the other
hand, the multiboot cmdline needs to be parsed before init_debug_pre() because
the cmdline could override uart settings, and parsing it first ensures that
debug messages are printed successfully. As a result, multiboot info was
parsed in two stages.
This patch revises the multiboot parse logic: it splits the
sanitize_multiboot_info() API and uses the init_acrn_multiboot_info() API for
the early stage. Most of the multiboot info is initialized during this stage,
and no debug messages need to be printed. After the uart is initialized,
sanitize_multiboot_info() sanitizes the multiboot info and prints the needed
debug messages.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We define some functions to read fields of the CFG header registers. They are
not necessary, since calling pci_pdev_read_cfg directly is simple, so remove
them.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Don't hardcode install paths. Instead of hardcoding where binaries are
installed, add variables that installer can override.
Tracked-On: #4864
Signed-off-by: Chee Yang Lee <chee.yang.lee@intel.com>
Signed-off-by: Naveen Saini <naveen.kumar.saini@intel.com>
$(VERSION) should depend on changes to config.h. For example, when the RELEASE
parameter is changed in the make command, CONFIG_RELEASE needs to be updated
in the defconfig file, and then the message in version.h should be updated.
The patch also fixes a bug where a code path in make defconfig was never
triggered, because the shell treats [ ! -f $(KCONFIG_FILE) ] as false when
$(KCONFIG_FILE) is not specified (i.e. "$(KCONFIG_FILE)" == "").
Tracked-On: #2412
Signed-off-by: Victor Sun <victor.sun@intel.com>
Update efi bootloader image file path for Yocto rootfs in Kconfig.
Tracked-On: #4868
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Now Host Bridge and PCI Bridge can only be added to the SOS's
acrn_vm_pci_dev_config. So for a UOS, we always emulate the Host Bridge and
PCI Bridge and assign PCI devices to it; for the SOS, if it is the highest
severity VM, we assign the Host Bridge and PCI Bridge to it directly;
otherwise, we emulate them the same as for a UOS.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According to the PCI Code and ID Assignment Specification Revision 1.11, a PCI
device whose Base Class is 06h and Sub-Class is 00h is a Host bridge.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We should check whether a PCI device is a host bridge by its Base Class (06h)
and Sub-Class (00h).
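A minimal sketch of this check, assuming the caller has already read the
32-bit register at config offset 08h (Revision ID in bits 7:0, class code
in bits 31:8). The macro and function names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BASE_CLASS_BRIDGE  0x06U   /* Base Class 06h: bridge device */
#define SKETCH_SUB_CLASS_HOST     0x00U   /* Sub-Class 00h: host bridge */

/* class_rev: value of the 32-bit register at PCI config offset 08h. */
static bool is_host_bridge(uint32_t class_rev)
{
    uint8_t base_class = (uint8_t)(class_rev >> 24U);
    uint8_t sub_class = (uint8_t)(class_rev >> 16U);

    return (base_class == SKETCH_BASE_CLASS_BRIDGE) &&
           (sub_class == SKETCH_SUB_CLASS_HOST);
}
```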
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
A guest may write an MSI-X capability register with only the RW bits set. This
works well on native hardware, which makes sure RO register bits cannot be
overwritten. Software, however, needs more effort to achieve this. This patch
does so by defining a bit-based RW permission mapping. When a guest tries to
write an MSI-X Capability register, only the RW bits are modified in the vCFG
space.
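The bit-based permission idea can be sketched as a masked merge. The helper
name and the example mask are assumptions for illustration: the mask covers
the MSI-X Enable (bit 15) and Function Mask (bit 14) bits of Message
Control, while the Table Size field (bits 10:0) stays read-only:

```c
#include <assert.h>
#include <stdint.h>

/* Example RW mask for MSI-X Message Control: Enable and Function Mask. */
#define SKETCH_MSIX_MSGCTL_RW_MASK  0xC000U

/* Merge a guest write into the virtual register, touching RW bits only. */
static uint32_t masked_cfg_write(uint32_t old_val, uint32_t new_val,
                                 uint32_t rw_mask)
{
    return (old_val & ~rw_mask) | (new_val & rw_mask);
}
```

A guest write of 0xFFFF over a register holding 0x0007 then yields 0xC007:
the enable bits change and the table size is preserved.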
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
A guest may write an MSI capability register with only the RW bits set. This
works well on native hardware, which makes sure RO register bits cannot be
overwritten. Software, however, needs more effort to achieve this. This patch
does so by defining a bit-based RW permission mapping. When a guest tries to
write an MSI Capability register, only the RW bits are modified in the vCFG
space.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
As per the BWG, a delay should be provided between the
INIT IPI and the Startup IPI. Without the delay, hangs are observed
on certain platforms during the MP Init sequence. So set
a delay of 10us between asserting the INIT IPI and the Startup IPI.
Also, as per SDM section 10.7, the de-assert INIT IPI is
only used for Pentium and P6 processors. It is not applicable
to Pentium 4 and Xeon processors, so remove this sequence.
Tracked-On: #4835
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
shutdown_vm uses the guest flags when handling the physical
CPUs whose LAPIC is pass-through. If the flags are cleared first,
the related vCPUs and pCPUs cannot be switched to the correct state.
So move the clear action to after the flags are used.
Tracked-On: #4848
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
1. context_entry couldn't be NULL in iommu_attach_device since bus
number is checked before the call.
2. root_entry couldn't be NULL in iommu_detach_device since bus number
is checked before the call.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a function dmar_unit_valid to check whether the input dmar unit is
valid.
A valid dmar_unit must not be NULL, and its ignore flag must not be set.
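A minimal sketch of such a validity check. The structure below is a
simplified, hypothetical stand-in for the real dmar unit type, which is not
shown in this message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a dmar unit: only the ignore flag. */
struct dmar_unit_sketch {
    bool ignore;
};

/* Valid means: the pointer is non-NULL and the unit is not ignored. */
static bool dmar_unit_valid_sketch(const struct dmar_unit_sketch *unit)
{
    return (unit != NULL) && !unit->ignore;
}
```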
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Snoop control will not be turned on by hypervisor, delete snoop control
related code.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize root_table_addr/ir_table_addr of the dmar unit when registering the
dmar unit, so there is no need to check later whether they are initialized.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Though "pci_device_lock" & "logmsg_ctl.lock" are set to 0 when the
system does memory initialization, it is better to explicitly init
them before use.
2. Unify the usage of spinlock_init.
Tracked-On: #4827
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The mask value 0x3F was added to prevent out-of-range array access.
However, it should not be hardcoded.
Since the valid id allocated in ptirq_alloc_entry_id is no greater
than CONFIG_MAX_PT_IRQ_ENTRIES, array access cannot go out of range
without the mask.
So this patch removes the mask.
Also, use bitmap_clear_lock instead of bitmap_clear_nolock because
more than one core may access the same 64-bit variable.
Tracked-On: #4828
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Invalidating the cache by scanning and flushing the whole guest memory is
inefficient and might cause a long execution time for WBINVD emulation.
A long execution in the hypervisor might cause a vCPU-stuck phenomenon that
impacts Windows guest booting.
This patch introduces a workaround that pauses all other vCPUs in
the same VM during WBINVD emulation.
Tracked-On: #4703
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Improve the makefile to avoid building libs twice when running make
in the acrn-hypervisor/hypervisor directory to build the HV only.
2. For the debug/release library, select just one makefile to build.
Tracked-On: #2412
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Refine find_ptirq_entry to use hashing instead of walking the PTIRQ entries one by one.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
This patch adds a hash function to hash a 64-bit value.
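The message does not say which hash is used. As one plausible shape, the
sketch below uses multiplicative hashing with a 64-bit golden-ratio
constant and keeps the top `bits` bits; both the constant and the function
name are assumptions, not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed multiplier: 64-bit golden-ratio constant. */
#define SKETCH_GOLDEN_RATIO_64  0x61C8864680B583EBULL

/* Hash a 64-bit key into the range [0, 2^bits); bits must be 1..64. */
static uint64_t hash64_sketch(uint64_t key, uint32_t bits)
{
    assert((bits >= 1U) && (bits <= 64U));
    return (key * SKETCH_GOLDEN_RATIO_64) >> (64U - bits);
}
```

With bits = 6 the result indexes a 64-bucket table, so a lookup only walks
the entries in one bucket instead of the whole list.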
Tracked-On: #4550
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
An RTVM (with LAPIC PT) hangs at boot when maxcpus is
assigned a value less than the CPU number configured
in the hypervisor.
In this case, vlapic_state (per VM) is left in the TRANSITION
state after BSP boot, which blocks interrupts from being injected
into this UOS.
Tracked-On: #4803
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There's no need to look up the MSI ptirq entry by virtual SID any more, since
the MSI ptirq entry is removed before the device is assigned to a VM.
Now the logic of MSI interrupt remapping can be simplified as:
1. Add the MSI interrupt remap first;
2. If step 1 is already done, just do the remap part.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
In commit 0a7770cb, we removed the vm pointer from the vpci structure. So
there's no need for such a pre-condition, since vpci is embedded in the vm
structure: the vm can't be NULL once the vpci is not NULL.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
The existing code handles each VM separately when we deinit the vpci of a VM.
This is not necessary. This patch uses common handling for all VMs: we first
deassign the vdev from its (current) user, then give it back to its parent
user.
When we deassign the vdev from the (current) user, we de-initialize the
vMSI/vMSI-X remapping as well as the vMSI/vMSI-X data structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Now we can tell a device's status from the 'user' field:
---------------------------------------------------------------------------
           | NULL              | == vdev            | != NULL && != vdev
vdev->user | device is de-init | used by its own VM | assigned to another VM
---------------------------------------------------------------------------
So we don't need to modify the 'vpci' field accordingly.
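The three states in the table can be sketched as a small classifier. The
struct and enum below are hypothetical reductions used only for
illustration; the real vdev structure has many more fields:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced vdev: only the 'user' back-pointer. */
struct vdev_sketch {
    const struct vdev_sketch *user;
};

enum vdev_status_sketch {
    VDEV_DEINIT,            /* user == NULL */
    VDEV_USED_BY_OWN_VM,    /* user == vdev */
    VDEV_ASSIGNED_TO_OTHER  /* user != NULL && user != vdev */
};

static enum vdev_status_sketch vdev_status(const struct vdev_sketch *vdev)
{
    if (vdev->user == NULL) {
        return VDEV_DEINIT;
    }
    return (vdev->user == vdev) ? VDEV_USED_BY_OWN_VM : VDEV_ASSIGNED_TO_OTHER;
}
```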
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Add a new field 'parent_user' to record the parent user of the vdev, and rename
'new_owner' to 'user' to record the current user of the vdev, like:
-----------------------------------------------------------------------------------------------
vdev in    |   HV       |   pre-VM       |               SOS                   | post-VM
           |            |                |vdev used by SOS|vdev used by post-VM|
-----------------------------------------------------------------------------------------------
parent_user| NULL(HV)   |   NULL(HV)     |   NULL(HV)     |   NULL(HV)         | vdev in SOS
-----------------------------------------------------------------------------------------------
user       | vdev in HV | vdev in pre-VM |   vdev in SOS  |   vdev in post-VM  | vdev in post-VM
-----------------------------------------------------------------------------------------------
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
If HV relocation is enabled, either ACRN efi-stub or GRUB relocates
hypervisor image above HPA 256MB, thus we put hvlog and ramoops buffer
under 256MB to avoid conflict with hypervisor owned address.
This patch hardcodes these addresses:
0xa00000 - 0xdfffff: 4MiB for ramoops buffer
0xe00000 - 0xffffff: 2MiB for hvlog buffer
However, user can customize them to other addresses as long as it's under
256MB, available in host e820, and SOS bootarg "nokaslr" is not specified.
If HV relocation is disabled, need to make sure that these buffer
addresses are not between HV_RAM_START and HV_RAM_START + HV_RAM_SIZE.
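The two address rules can be sketched as below. The helper names are
hypothetical, and the sketch does not model the other conditions mentioned
above (presence in the host e820, the "nokaslr" bootarg):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MEM_256M  0x10000000ULL

/* True when [a, a + a_size) and [b, b + b_size) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t a_size,
                           uint64_t b, uint64_t b_size)
{
    return (a < (b + b_size)) && (b < (a + a_size));
}

/*
 * With relocation, the buffer must end at or below 256MB; without it,
 * the buffer must stay clear of [hv_start, hv_start + hv_size).
 */
static bool log_buf_addr_ok(uint64_t start, uint64_t size, bool relocation,
                            uint64_t hv_start, uint64_t hv_size)
{
    if (relocation) {
        return (start + size) <= SKETCH_MEM_256M;
    }
    return !ranges_overlap(start, size, hv_start, hv_size);
}
```

The hardcoded ramoops buffer is 0xa00000 + 0x400000 and the hvlog buffer
is 0xe00000 + 0x200000; both pass the relocation-enabled check.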
Tracked-On: #4760
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
For post-launched VMs, the configured CPU affinity could be different
from the actual running CPU affinity. The new field acrn_vm->cpu_affinity
records this difference, so that the CREATE_VM hypercall does not
overwrite the configured CPU affinity.
Rename cpu_affinity_bitmap in acrn_vm_config to cpu_affinity.
This is read-only at run time and never overwritten by acrn-dm.
Remove vm_config->vcpu_num, which means the number of vCPUs of the
configured CPU affinity. This is not to be confused with the actual
running vCPU number: vm->hw.created_vcpus.
Rename get_vm_bsp_pcpu_id() to get_configured_bsp_pcpu_id() for less
confusion.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN syncs PIR to vIRR in software in cases where the Posted
Interrupt notification happens while the pCPU is in root mode.
The sync can instead be done by processor hardware by sending a
posted interrupt notification vector.
This patch sends a self-IPI if there are interrupts pending in PIR;
the IPI is serviced by the logical processor at the next
VMEnter.
Tracked-On: #4777
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
CDP is an extension of CAT. It enables isolation and separate prioritization of
code and data fetches to the L2 or L3 cache in a software configurable manner,
depending on hardware support.
This commit adds a Kconfig switch "CDP_ENABLED" which depends on "RDT_ENABLED".
CDP will be enabled if the capability is available and "CDP_ENABLED" is selected.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This commit cleans up some RDT code, mainly including:
- remove the clos_mask and mba_delay validation check in setup_res_clos_msr(); the check will be done in pre-build;
- rename platform_clos_num to valid_clos_num, which is set to the minimal clos_max of all enabled RDT resources;
- init the platform_clos_array in the res_cap_info[] definition;
- remove the unnecessary return values and return value checks.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
An RDT resource is either CAT or MBA, so only one of struct rdt_cache and struct rdt_membw
is used at a time. They should be a union.
This commit merges struct rdt_cache and struct rdt_membw into a union res.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
#AC should normally be enabled for split-lock detection; however,
community developers may want to run ACRN on a buggy system.
In this case, CONFIG_ENFORCE_TURNOFF_AC can be used to turn off
#AC and let the guest run without it.
Tracked-On: #4765
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The virtual MSI information can be included in the ptirq_remapping_info
structure, so there's no need to pass another input parameter for this
purpose. So we can remove the ptirq_msi_info input.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We look up a PTIRQ entry only by SID, so the _by_sid suffix can be removed.
Also rename the functions to verb-object style.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For the return value of local_gpa2hpa, either INVALID_HPA or NULL
means that the EPT walk failed. The current code only takes care of
the NULL return and treats INVALID_HPA as a valid result.
In some cases (if the guest page table is filled with invalid memory
addresses), a guest could crash ACRN.
Add an INVALID_HPA return check as well.
Also add @pre assumptions for some gpa2hpa usages.
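A minimal sketch of checking both failure encodings. The value of the
failure marker and the toy translation function are assumptions made for
illustration; the real INVALID_HPA definition and EPT walk are not shown
in this message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed failure marker for the sketch. */
#define SKETCH_INVALID_HPA  (1ULL << 52U)

/* Toy translation: one mapped window; every other GPA fails the walk. */
static uint64_t toy_gpa2hpa(uint64_t gpa)
{
    const uint64_t gpa_base = 0x1000ULL;
    const uint64_t size = 0x1000ULL;
    const uint64_t hpa_base = 0x80000000ULL;

    if ((gpa >= gpa_base) && (gpa < (gpa_base + size))) {
        return hpa_base + (gpa - gpa_base);
    }
    return SKETCH_INVALID_HPA;
}

/* Both failure encodings must be rejected, not only the zero value. */
static bool hpa_is_valid(uint64_t hpa)
{
    return (hpa != 0ULL) && (hpa != SKETCH_INVALID_HPA);
}
```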
Tracked-On: #4730
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Assign PCPU0-1 to post-launched VM. The CPU affinity can be overridden
with the '--cpu_affinity' parameter of acrn-dm.
Tracked-On: #4641
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
When booting the ACRN hypervisor from grub multiboot, the HV is loaded at
CONFIG_HV_RAM_START since relocation is not supported in grub
multiboot1. CONFIG_HV_RAM_SIZE in the industry scenario takes
~330MB (0x14000000); unfortunately the efi memmap on NUC7i7DNB is
truncated at 0x6dba2000 although memory is still usable from 0x6dba2000. So
from grub's point of view, it cannot find a contiguous memory region from
0x6000000 to load the industry scenario. Per the efi memmap, there is a big
memory area available from 0x40400000, so setting CONFIG_HV_RAM_START to
0x41000000 is much safer for NUC7i7DNB.
Tracked-On: #4641
Signed-off-by: Victor Sun <victor.sun@intel.com>
The original stack used in CPU booting is ld_bss_end + 4KB,
which could be outside the RAM size limit defined in the link_ram file.
So add a dedicated stack space in the link_ram file and use it in
CPU booting.
Tracked-On: #4738
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According to SDM Vol 3, Chap 10.4.7.2 "Local APIC State After It Has Been Software Disabled",
the mask bits for all the LVT entries are set when the local APIC has been software disabled.
So there's no need to mask all the LVT entries one by one.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Most of the code in the if ... else branches is duplicated. Move it out of
the conditional statement.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
- add a new member cpu_affinity to struct acrn_create_vm, so that acrn-dm
is able to assign CPU affinity through HC_CREATE_VM hypercall.
- if vm_create.cpu_affinity is zero, hypervisor launches the VM with the
statically configured CPU affinity.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently the vcpu_affinity[] array fixes the vCPU to pCPU mapping.
The new cpu_affinity_bitmap doesn't explicitly specify this
mapping; instead, it implicitly assumes that vCPU0 maps to the pCPU
with the lowest pCPU ID, vCPU1 maps to the second lowest pCPU ID, and
so on.
This makes it possible for post-launched VM to run vCPUs on a subset of
these pCPUs only, and not all of them.
acrn-dm may launch post-launched VMs with the current approach: indicate
VM UUID and hypervisor launches all VCPUs from the PCPUs that are masked
in cpu_affinity_bitmap.
Also acrn-dm can choose to launch the VM on a subset of PCPUs that is
defined in cpu_affinity_bitmap. In this way, acrn-dm must specify the
subset of PCPUs in the CREATE_VM hypercall.
Additionally, with this change, a guest's vcpu_num can be easily calculated
from cpu_affinity_bitmap, so don't assign vcpu_num in vm_configuration.c.
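The implicit mapping and the derived vCPU count can be sketched as follows.
Both function names are hypothetical, and the bitmap is assumed to be 64
bits wide:

```c
#include <assert.h>
#include <stdint.h>

/* Number of vCPUs implied by the bitmap: one per set bit. */
static uint16_t vcpu_num_from_affinity(uint64_t bitmap)
{
    uint16_t num = 0U;

    while (bitmap != 0ULL) {
        bitmap &= (bitmap - 1ULL);  /* clear the lowest set bit */
        num++;
    }
    return num;
}

/* pCPU ID for a vCPU: the vcpu_id-th lowest set bit, or -1 if none. */
static int pcpu_of_vcpu(uint64_t bitmap, uint16_t vcpu_id)
{
    uint16_t seen = 0U;
    int pcpu;

    for (pcpu = 0; pcpu < 64; pcpu++) {
        if ((bitmap & (1ULL << pcpu)) != 0ULL) {
            if (seen == vcpu_id) {
                return pcpu;
            }
            seen++;
        }
    }
    return -1;
}
```

For a bitmap of 0xA (pCPU1 and pCPU3), the VM has two vCPUs: vCPU0 runs on
pCPU1 and vCPU1 runs on pCPU3.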
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The SCENARIO XML file already includes RELEASE or DEBUG info, so if RELEASE
is not specified in the make command, the Makefile should not override the
RELEASE info in the SCENARIO XML. If RELEASE is specified in the make command,
then the RELEASE info in the SCENARIO XML is overridden by the make command.
The patch also fixes an issue with getting the correct board defconfig when
building the hypervisor from TARGET_DIR.
Tracked-On: #4688
Signed-off-by: Victor Sun <victor.sun@intel.com>
The "idle=halt" parameter in the SOS cmdline is only needed when CPU sharing
is enabled; otherwise it impacts SOS power.
Tracked-On: #4329
Signed-off-by: Victor Sun <victor.sun@intel.com>
Add FADT table support to support the guest S5 setting.
According to the ACPI 6.3 spec, OSPM must ignore the DSDT and FACS fields if they're zero.
However, the Linux kernel does not seem to follow this rule; it still checks the DSDT.
So add an empty DSDT to satisfy it.
Tracked-On: #4623
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Remove sdc2 scenario since the VM launch requirement under this scenario
could be satisfied by industry scenario now;
Tracked-On: #4661
Signed-off-by: Victor Sun <victor.sun@intel.com>
In industry scenario, hypervisor will support 1 post-launched RT VM
and 1 post-launched kata VM and up to 5 post-launched standard VMs;
Tracked-On: #4661
Signed-off-by: Victor Sun <victor.sun@intel.com>
The patch enables CPU sharing feature by default, the default scheduler is
set to SCHED_BVT;
Tracked-On: #4661
Signed-off-by: Victor Sun <victor.sun@intel.com>
The current logic always puts hpa2 above GPA 4G, which is incorrect. The gpa
start of hpa2 must be set right after hpa1 when the hpa1 size is less than 2G;
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
On most boards the MCFG base is set to 0xe0000000, so modify this value in
platform_acpi_info.h for generic boards;
The description of ACPI_PARSE_ENABLED is modified also to match its usage.
Tracked-On: #4157
Signed-off-by: Victor Sun <victor.sun@intel.com>
CONFIG_MAX_KATA_VM_NUM is a scenario-specific configuration, so it is better
to put the macro in the scenario folder directly, instead of in a Kconfig item
in the Kconfig file, which should work for all scenarios;
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Basically, an ACRN scenario is a configuration name for a specific usage.
Given a scenario name, ACRN loads the corresponding VM configurations to build
the hypervisor. But customers might have their own scenario names; changing
the scenario type from choice to string is friendlier to them, since no
Kconfig source file change is needed.
With this change, CONFIG_$(SCENARIO) no longer exists in the kconfig file and
is replaced by CONFIG_SCENARIO, so the Makefile needs to be changed accordingly;
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
The pci_dev config settings of the SOS are always the same, so move the config
interface from vm_configurations.c to the CONFIG_SOS_VM macro;
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently the vm uuid and severity are initialized separately in the
vm_config struct, and developers need to set both items carefully;
otherwise the hypervisor will have trouble with the configurations.
Given that the vm loader_order/uuid and severity are tightly bound, the
patch merges these three settings into one macro so that developers
have a simple interface to configure in the vm_config struct.
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If the CPU has MSR_TEST_CTL, show an emulated one to the vCPU.
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If the CPU supports raising #AC for Splitlock Access, then enable this
feature on each CPU.
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When the destination of an atomic memory operation spans 2
cache lines, it is called a Splitlock Access. The LOCK# bus signal is
asserted for a splitlock access, which may lead to long latency. #AC
for Splitlock Access is a CPU feature that raises an alignment
check exception #AC(0) instead of asserting LOCK#, which is helpful
for detecting Splitlock Access.
This feature is enumerated by MSR(0xcf) IA32_CORE_CAPABILITIES[bit5].
Add helper function:
bool has_core_cap(uint32_t bitmask)
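Two small sketches of the ideas above: deciding whether an access spans
more than one cache line, and testing a capability bit. The 64-byte line
size is an assumption, and core_cap_has() takes an already-read MSR value
because the MSR read itself cannot be shown in portable C; it is not the
has_core_cap() helper added by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE      64ULL       /* assumed line size */
#define SKETCH_CORE_CAP_SPLIT_LOCK  (1U << 5U)  /* IA32_CORE_CAPABILITIES bit 5 */

/* True when [addr, addr + size) touches more than one cache line. */
static bool spans_cache_lines(uint64_t addr, uint64_t size)
{
    assert(size != 0ULL);
    return (addr / SKETCH_CACHE_LINE_SIZE) !=
           ((addr + size - 1ULL) / SKETCH_CACHE_LINE_SIZE);
}

/* Bit test on an MSR value that the caller has already read. */
static bool core_cap_has(uint32_t core_caps, uint32_t bitmask)
{
    return (core_caps & bitmask) != 0U;
}
```

An 8-byte locked access at 0x103C covers 0x103C-0x1043 and therefore
crosses the line boundary at 0x1040, while the same access at 0x1038 does
not.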
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Remove unnecessary state checks and
add pre-conditions for vcpu APIs.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Check the vm state in the hypercall APIs and
add pre-conditions for vm APIs.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently pause_vm is called inside shutdown_vm;
move it out of shutdown_vm to reduce coupling.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds a common function that initializes the physical
CPU for the second phase, to reduce redundant code.
Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
The hypervisor reports VM configuration information to the SOS, which can use
it to dynamically allocate VCPU affinity.
The Service OS can get the vm_configs in this order:
1. call the platform_info HC (with vm_configs_addr set to 0) to get max_vms and
vm_config_entry_size.
2. allocate memory for the acrn_vm_config array based on the number of VMs
and the entry size obtained in step 1.
3. call the platform_info HC again to collect the VM configurations.
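The three steps can be sketched from the caller's side as below. The
hypercall is replaced by a fake function and the structure is a reduced,
hypothetical view; only the field names max_vms, vm_config_entry_size and
vm_configs_addr come from this message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced platform info, for illustration only. */
struct platform_info_sketch {
    uint16_t max_vms;
    uint32_t vm_config_entry_size;
    void *vm_configs_addr;      /* NULL (0) on the first call */
};

/* Stand-in for the hypercall: reports sizes, fills the buffer if given. */
static int fake_hc_platform_info(struct platform_info_sketch *pi)
{
    pi->max_vms = 4U;
    pi->vm_config_entry_size = 32U;
    if (pi->vm_configs_addr != NULL) {
        memset(pi->vm_configs_addr, 0xAB, 4U * 32U);
    }
    return 0;
}

/* Returns a malloc'ed array of VM configs, or NULL; caller frees it. */
static void *fetch_vm_configs(size_t *total)
{
    struct platform_info_sketch pi = { 0U, 0U, NULL };
    void *buf;

    if (fake_hc_platform_info(&pi) != 0) {      /* step 1: query sizes */
        return NULL;
    }
    *total = (size_t)pi.max_vms * pi.vm_config_entry_size;
    buf = malloc(*total);                       /* step 2: allocate */
    if (buf == NULL) {
        return NULL;
    }
    pi.vm_configs_addr = buf;
    if (fake_hc_platform_info(&pi) != 0) {      /* step 3: collect */
        free(buf);
        return NULL;
    }
    return buf;
}
```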
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN disables Snoop Control in VT-d DMAR engines to simplify the
implementation. Also, since the snoop behavior of PCIE transactions
can be controlled by guest drivers, some devices may take advantage
of the NO_SNOOP_ATTRIBUTE of PCIE transactions for better performance
when snoop is not needed. Whether ACRN enables or disables Snoop
Control, the DMA operations of passthrough devices behave correctly
from the guests' point of view.
This patch cleans up all the snoop related code.
Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The i915 iommu doesn't support snoop, so it can't
access memory when the SNOOP bit of the Secondary Level page PTE (SL-PTE)
is set. This causes many undefined issues, such as an invisible cursor
in WaaG.
The current hv design uses EPT as the Secondary Level Page for the iommu, and
this patch removes the code that sets the SNOOP bit in both EPT-PTE and SL-PTE
to avoid errors.
And according to SDM 28.2.2, the SNOOP bit (11th bit) is ignored
by EPT, so this does not affect the CPU address translation.
Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
WaaG sends NMIs to all its cores during reboot. But currently, an
NMI cannot be injected into a vcpu that is in HLT state.
To fix the problem, wake up the target vcpu and inject the NMI through
the interrupt window.
Tracked-On: #4620
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Align the implementation to SDM Vol.3 4.1.1.
Also, this patch fixes a bug where the paging status was not checked first
in some cpu modes.
Tracked-On: #4628
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently, when CONFIG_RELO is enabled and an exception occurs, the
actual address does not match the MAP file.
This patch prints the delta between the actual load address
and CONFIG_HV_RAM_START.
Tracked-On: #4144
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Currently vlapic_build_id() uses vcpu_id to retrieve the lapic_id
per_cpu variable:
vlapic_id = per_cpu(lapic_id, vcpu->vcpu_id);
SOS vcpu_id may not be equal to pcpu_id, and in that case it runs into
problems. For example, if any pre-launched VMs are launched on PCPUs
whose IDs are smaller than any PCPU IDs that are used by the SOS.
This patch fixes the issue and simplifies the code to create or get
vapic_id by:
- assign vapic_id in create_vlapic(), which now takes pcpu_id as input
argument, and save it in the new field: vlapic->vapic_id, which will
never be changed.
- simplify vlapic_get_apicid() by returning the saved vapic_id directly.
- remove vlapic_build_id().
- vlapic_init() is only called once, merge it into vlapic_create().
Tracked-On: #4268
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Maintain a per-pCPU array of vCPUs (struct acrn_vcpu *vcpu_array[CONFIG_MAX_VM_NUM]).
One VM cannot have multiple vCPUs sharing one pCPU, so we can utilize this property
and use the containing VM's vm_id as the index to the vCPU array:
In create_vcpu(), we simply do:
per_cpu(vcpu_array, pcpu_id)[vm->vm_id] = vcpu;
In offline_vcpu():
per_cpu(vcpu_array, pcpuid_from_vcpu(vcpu))[vcpu->vm->vm_id] = NULL;
so basically we use the containing VM's vm_id as the index to the vCPU array,
as well as the index of posted interrupt IRQ/vector pair that are assigned
to this vCPU:
0: first vCPU and first posted interrupt IRQs/vector pair
(POSTED_INTR_IRQ/POSTED_INTR_VECTOR)
...
CONFIG_MAX_VM_NUM-1: last vCPU and last posted interrupt IRQs/vector pair
(POSTED_INTR_IRQ + CONFIG_MAX_VM_NUM - 1U)/(POSTED_INTR_VECTOR + CONFIG_MAX_VM_NUM - 1U)
In the posted interrupt handler, it will do the following:
Translate the IRQ into a zero based index of where the vCPU
is located in the vCPU list for current pCPU. Once the
vCPU is found, we wake up the waiting thread and record
this request as ACRN_REQUEST_EVENT.
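The indexing scheme can be sketched with a plain two-dimensional array in
place of the per_cpu() storage. All sizes, the base IRQ number, and the
struct are placeholder assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PCPU_NUM     4U      /* placeholder */
#define SKETCH_MAX_VM_NUM       8U      /* placeholder */
#define SKETCH_POSTED_INTR_IRQ  0x100U  /* placeholder base IRQ */

/* Hypothetical reduced vCPU: just the two IDs used for indexing. */
struct vcpu_sketch {
    uint16_t vm_id;
    uint16_t pcpu_id;
};

/* One slot per VM on each pCPU, since a VM has at most one vCPU per pCPU. */
static struct vcpu_sketch *vcpu_array[SKETCH_MAX_PCPU_NUM][SKETCH_MAX_VM_NUM];

static void track_vcpu(struct vcpu_sketch *vcpu)
{
    vcpu_array[vcpu->pcpu_id][vcpu->vm_id] = vcpu;
}

static void untrack_vcpu(const struct vcpu_sketch *vcpu)
{
    vcpu_array[vcpu->pcpu_id][vcpu->vm_id] = NULL;
}

/* Translate a posted-interrupt IRQ into the vCPU slot on this pCPU. */
static struct vcpu_sketch *vcpu_from_pi_irq(uint16_t pcpu_id, uint32_t irq)
{
    if ((irq < SKETCH_POSTED_INTR_IRQ) ||
        ((irq - SKETCH_POSTED_INTR_IRQ) >= SKETCH_MAX_VM_NUM)) {
        return NULL;
    }
    return vcpu_array[pcpu_id][irq - SKETCH_POSTED_INTR_IRQ];
}
```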
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
This is a preparation patch for adding support for VT-d PI
related vCPU scheduling.
ACRN does not support vCPU migration, one vCPU always runs on
the same pCPU, so PI's ndst is never changed after startup.
VCPUs of a VM won’t share the same pCPU. So the maximum possible number
of VCPUs that can run on a pCPU is CONFIG_MAX_VM_NUM.
Allocate unique Activation Notification Vectors (ANV) for each vCPU
that belongs to the same pCPU, the ANVs need only be unique within each
pCPU, not across all vCPUs. This reduces # of pre-allocated ANVs for
posted interrupts to CONFIG_MAX_VM_NUM, and enables ACRN to avoid
switching between active and wake-up vector values in the posted
interrupt descriptor on vCPU scheduling state changes.
A total of CONFIG_MAX_VM_NUM consecutive IRQs/vectors are reserved
for posted interrupts use.
The code first initializes vcpu->arch.pid.control.bits.nv dynamically
(will be added in subsequent patch), the other code shall use
vcpu->arch.pid.control.bits.nv instead of the hard-coded notification vectors.
Rename some functions:
apicv_post_intr --> apicv_trigger_pi_anv
posted_intr_notification --> handle_pi_notification
setup_posted_intr_notification --> setup_pi_notification
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Fill in posted interrupt fields (vector, pda, etc) and set mode to 1 to
enable VT-d PI (posted mode) for this ptdev.
If intr_src->pi_vcpu is 0, fall back to use the remapped mode.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Given the vcpumask, check if the IRQ is single destination
and return the destination vCPU if so. The address of the associated PI
descriptor for this vCPU can then be passed to dmar_assign_irte() to
set up the posted interrupt IRTE for this device.
For fixed mode interrupt delivery, all vCPUs listed in vcpumask should
service the interrupt requested. But VT-d PI cannot support multicast/broadcast
IRQs, it only supports single CPU destination. So the number of vCPUs
shall be 1 in order to handle IRQ in posted mode for this device.
Add pid_paddr to struct intr_source. If platform_caps.pi is true and
the IRQ is single-destination, pass the physical address of the destination
vCPU's PID to ptirq_build_physical_msi and dmar_assign_irte
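The single-destination test above can be sketched as a "exactly one bit set" check on the mask. The function name is_single_destination and its signature are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true and stores the vCPU index in *vcpu_id when exactly one
 * bit is set in vcpumask; returns false for empty or multi-bit masks.
 */
static bool is_single_destination(uint64_t vcpumask, uint16_t *vcpu_id)
{
	bool single = (vcpumask != 0ULL) &&
		      ((vcpumask & (vcpumask - 1ULL)) == 0ULL);

	if (single) {
		uint16_t idx = 0U;

		/* Locate the only set bit. */
		while ((vcpumask & 1ULL) == 0ULL) {
			vcpumask >>= 1U;
			idx++;
		}
		*vcpu_id = idx;
	}

	return single;
}
```

A caller would use posted mode only when this returns true, and stay in remapped mode otherwise.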
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Add platform_caps.c to maintain platform related information.
Set platform_caps.pi to true if all IOMMUs are posted interrupt capable, false
otherwise.
If lapic passthru is not configured and platform_caps.pi is true, the VM
may be able to use posted interrupts for a ptdev, if the ptdev's IRQ is
single-destination.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
PCI devices with 64-bit MMIO BARs and requiring large MMIO space
can be assigned with physical address range at the very high end of
platform supported physical address space.
This patch uses the board info for 64-bit MMIO window as programmed
by BIOS and constructs 1G page tables for the same.
As ACRN uses identity mapping from linear to physical address space,
physical addresses up to 48 bits (256 TB) can be supported.
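As a rough illustration of the arithmetic, the sketch below counts how many 1G pages cover a 64-bit MMIO window and checks that an address fits in the 48-bit space. The helper names and the sample window used in the tests are hypothetical; the real window comes from the board info.

```c
#include <stdint.h>

#define PAGE_SIZE_1G   (1ULL << 30)
#define MAX_PHYS_BITS  48U	/* limit stated in the commit message */

/*
 * Number of 1G pages needed to cover [start, end), after widening the
 * range outwards to 1G boundaries.
 */
static uint64_t num_1g_pages(uint64_t start, uint64_t end)
{
	uint64_t first = start & ~(PAGE_SIZE_1G - 1ULL);
	uint64_t last = (end + PAGE_SIZE_1G - 1ULL) & ~(PAGE_SIZE_1G - 1ULL);

	return (last - first) / PAGE_SIZE_1G;
}

/* Non-zero when addr lies inside the 48-bit (256 TB) address space. */
static int addr_is_mappable(uint64_t addr)
{
	return (addr >> MAX_PHYS_BITS) == 0ULL;
}
```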
Tracked-On: #4586
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add 64-bit MMIO window related MACROs to the supported board files
in the hypervisor source code.
Tracked-On: #4586
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
The EPT table can be changed concurrently by more than one vCPU.
This patch adds a lock to protect the add/modify/delete operations
from concurrent access by different vCPUs.
Tracked-On: #4253
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
The VMCS field is an embedded array for a vCPU. So there's no need to check for
NULL before use.
Tracked-On: #3813
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Conceptually, devices should be unregistered during shutdown in the reverse
order of their creation.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Customers might have a specific folder where they store their own configurations
for their customized scenario/board, so add a TARGET_DIR parameter to support
building the hypervisor with specified configurations.
The valid usages are: (target = all | hypervisor)
1. make <target>
2. make <target> KCONFIG_FILE=xxx [TARGET_DIR=xxx]
3. make <target> BOARD=xxx SCENARIO=xxx [TARGET_DIR=xxx]
4. make <target> BOARD_FILE=xxx SCENARIO_FILE=xxx [TARGET_DIR=xxx]
5. make <target> KCONFIG_FILE=xxx BOARD_FILE=xxx SCENARIO_FILE=xxx [TARGET_DIR=xxx]
If the TARGET_DIR parameter is not specified in the make command, the hypervisor will be
built with board configurations under hypervisor/arch/x86/configs/ and scenario
configurations under hypervisor/scenarios/. Moreover, these configurations will
be overwritten if BOARD/SCENARIO files are specified in the make command.
If the TARGET_DIR parameter is specified in the make command, the hypervisor will be built
with the configuration under that folder if no BOARD/SCENARIO files are specified.
When BOARD/SCENARIO files are given in the make command, TARGET_DIR is used
to store the configurations that the BOARD/SCENARIO files provide, i.e. configurations
in the TARGET_DIR folder will be overwritten.
Tracked-On: #4517
Signed-off-by: Victor Sun <victor.sun@intel.com>
When users use make parameters to specify BOARD and SCENARIO, there might
be some conflict because the KCONFIG_FILE/BOARD_FILE/SCENARIO_FILE parameters
also include BOARD/SCENARIO info. To simplify, we only allow the following valid
usages:
1. make <target>
2. make <target> KCONFIG_FILE=xxx
3. make <target> BOARD=xxx SCENARIO=xxx
4. make <target> BOARD_FILE=xxx SCENARIO_FILE=xxx
5. make <target> KCONFIG_FILE=xxx BOARD_FILE=xxx SCENARIO_FILE=xxx
In particular, for case 1, where no parameters are specified:
a. If the hypervisor/build/.config file generated by "make menuconfig"
   exists, the .config file will be loaded as KCONFIG_FILE,
   i.e. equal to: make <target> KCONFIG_FILE=hypervisor/build/.config
b. If the hypervisor/build/.config file does not exist,
   the default BOARD/SCENARIO will be loaded,
   i.e. equal to: make <target> BOARD=$(BOARD) SCENARIO=$(SCENARIO)
Tracked-On: #4517
Signed-off-by: Victor Sun <victor.sun@intel.com>
This commit allows the hypervisor to allocate cache per vCPU by assigning different CLOS
values to the vCPUs of the same VM.
For example, we could allocate different cache to the housekeeping core and the real-time core
of an RTVM in order to isolate the real-time core from housekeeping core interference via the cache hierarchy.
Tracked-On: #4566
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Chen, Zide <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
dmar_issue_qi_request currently uses a global variable, qi_status, which could
cause issues when dmar_issue_qi_request is called concurrently for different
DMAR units.
Use a local variable instead.
Tracked-On: #4535
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As ACRN prepares to support servers with large amounts of memory,
the current logic of allocating space for 4K pages of EPT at compile time
will increase the size of the .bss section of the ACRN binary.
Bootloaders could run into a situation where they cannot
find enough contiguous space to load the ACRN binary under 4GB,
which is typically heavily fragmented with E820 types Reserved,
ACPI data, 32-bit PCI hole etc.
This patch does the following:
1) Works only for "direct" mode of vboot.
2) Reserves space for 4K pages of EPT after boot, by parsing the
platform E820 table, for all types of VMs.
Size comparison:
w/o patch:
  Size of DRAM    Size of .bss
  48 GB           0xe1bbc98 (~226 MB)
  128 GB          0x222abc98 (~548 MB)
w/ patch:
  Size of DRAM    Size of .bss
  48 GB           0x1991c98 (~26 MB)
  128 GB          0x1a81c98 (~28 MB)
Tracked-On: #4563
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: acpi: changed the enum "acpi_dmar_type" to macros to pass the
MISRA-C check
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: renamed some internal data structs to make them more
understandable
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: renamed some static functions from dmar_verb to verb_dmar
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: corrected the return type of get_qi_queue and get_ir_table to
void *
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: removed is_host (always false) and is_tt_ept (always true) member
variables of struct iommu_domain and related codes since the values are
always determined.
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We can use container_of to get the vcpu structure pointer from vmtrr, so the vcpu
structure pointer is not needed in the vmtrr structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We can use container_of to get the vm structure pointer from vpic, so the vm
structure pointer is not needed in the vpic structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We can use container_of to get the vm structure pointer from vpci, so the vm
structure pointer is not needed in the vpci structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We can use container_of to get the vcpu/vm structure pointer from vlapic, so the vcpu/vm
structure pointer is not needed in the vlapic structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This function casts a member of a structure out to the containing structure,
so renaming it to container_of makes it more readable.
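For readers unfamiliar with the idiom, here is a minimal sketch of container_of and how it recovers the enclosing structure from an embedded member. The demo_vcpu and demo_vlapic structures are simplified, hypothetical stand-ins, and the macro shown is the common offsetof-based form rather than a copy of the hypervisor's definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Common offsetof-based form of the idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for the hypervisor structures. */
struct demo_vlapic {
	uint32_t apic_id;
};

struct demo_vcpu {
	uint16_t vcpu_id;
	struct demo_vlapic vlapic;	/* embedded member, not a pointer */
};

/* No back-pointer is stored in demo_vlapic; it is computed instead. */
static struct demo_vcpu *vlapic_to_vcpu(struct demo_vlapic *vlapic)
{
	return container_of(vlapic, struct demo_vcpu, vlapic);
}
```

This only works when the member is embedded by value in the outer structure, which is why the back-pointers can be dropped.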
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Extend union dmar_ir_entry to support VT-d posted interrupts.
Rename some fields of union dmar_ir_entry:
entry --> value
sw_bits --> avail
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Pass intr_src and dmar_ir_entry irte as pointers to dmar_assign_irte(),
which fixes the "Attempt to change parameter passed by value" MISRA C violation.
A few coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
CPU side posted interrupts only use bit 0 (ON) of the PI's 64-bit control
field; the other bits are don't care. This is not the case for VT-d posted
interrupts, so define more bit fields for the PI's 64-bit control.
Use bitmap functions to manipulate the bit fields atomically.
Some MISRA-C violation and coding style fixes.
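A sketch of what atomic manipulation of the 64-bit control word can look like, using C11 atomics in place of the hypervisor's own bitmap helpers. The bit positions (ON at bit 0, SN at bit 1, NV at bits 16-23, NDST at bits 32-63) follow my reading of the Intel VT-d posted interrupt descriptor layout and should be treated as an assumption here, as should the function names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_CTRL_ON_BIT      0U	/* Outstanding Notification */
#define PI_CTRL_SN_BIT      1U	/* Suppress Notification */
#define PI_CTRL_NV_SHIFT    16U	/* Notification Vector, 8 bits */
#define PI_CTRL_NDST_SHIFT  32U	/* Notification Destination, 32 bits */

/* Atomically set ON; returns true if ON was already set. */
static bool pi_test_and_set_on(_Atomic uint64_t *control)
{
	uint64_t old = atomic_fetch_or(control, 1ULL << PI_CTRL_ON_BIT);

	return (old & (1ULL << PI_CTRL_ON_BIT)) != 0ULL;
}

/* Read the notification vector field. */
static uint8_t pi_get_nv(_Atomic uint64_t *control)
{
	return (uint8_t)(atomic_load(control) >> PI_CTRL_NV_SHIFT);
}
```

The read-modify-write has to be atomic because hardware may update the same word while software is posting an interrupt.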
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The posted interrupt descriptor is more of a vmx/vmcs concept than a vlapic
concept. struct acrn_vcpu_arch stores the vmx/vmcs info, so put struct pi_desc
in struct acrn_vcpu_arch.
Remove the function apicv_get_pir_desc_paddr()
A few coding style/typo fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Rename struct vlapic_pir_desc to pi_desc
Rename struct member and local variable pir_desc to pid
pir=posted interrupt request, pi=posted interrupt
pid=posted interrupt descriptor
pir is part of pi descriptor, so it is better to use pi instead of pir
struct pi_desc will be moved to vmx.h in subsequent commit.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
- Since we no longer need to print error messages if copy_to/from_gpa()
  fails, in many cases we can simplify the function return handling.
  In the following example, the fix changes the 'ret' value from
  the original '-1' to the actual errno returned from copy_to_gpa(). This
  is valid. Ideally we may replace all '-1' with the actual errno.
  - if (copy_to_gpa() < 0) {
  -     pr_err("error messages");
  -     ret = -1;
  - } else {
  -     ret = 0;
  - }
  + ret = copy_to_gpa();
- In most cases, 'ret' is declared with a default value of 0 or -1, so the
  redundant assignment statements can be removed.
- Replace white spaces with tabs.
Tracked-On: #3854
Signed-off-by: Zide Chen <zide.chen@intel.com>
cpuid() can be replaced with cpuid_subleaf(), which is clearer.
Having both APIs makes the code harder to read.
Tracked-On: #4526
Signed-off-by: Li Fei1 <fei1.li@intel.com>
To support server platforms with more than 8 IO-APICs
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
For the SOS VM, when the target platform has multiple IO-APICs, there
should be an equal number of virtual IO-APICs.
This patch adds support for emulating multiple vIOAPICs per VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
MADT is used to specify the GSI base for each IO-APIC and the number of
interrupt pins per IO-APIC is programmed into Max. Redir. Entry register of
that IO-APIC.
On platforms with multiple IO-APICs, there can be holes in the GSI space.
For example, on a platform with 2 IO-APICs, the following configuration has
a hole (from 24 to 31) in the GSI space.
IO-APIC 1: GSI base - 0, number of pins - 24
IO-APIC 2: GSI base - 32, number of pins - 8
This patch also adjusts the size for variables used to represent the total
number of IO-APICs on the system from uint16_t to uint8_t as the ACPI MADT
uses only 8-bits to indicate the unique IO-APIC IDs.
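The hole in the example above can be made concrete with a small lookup that maps a GSI to an IO-APIC index and pin, and rejects GSIs that fall in a hole. The structure and function names are hypothetical; the table holds exactly the two-IO-APIC configuration from the example.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_ioapic_info {
	uint32_t gsi_base;	/* from the MADT */
	uint32_t nr_pins;	/* from the Max. Redir. Entry register */
};

/* The two-IO-APIC example from the text: GSIs 24..31 are a hole. */
static const struct demo_ioapic_info demo_ioapics[] = {
	{ .gsi_base = 0U,  .nr_pins = 24U },
	{ .gsi_base = 32U, .nr_pins = 8U },
};

/* Returns false when gsi is not backed by any IO-APIC pin. */
static bool gsi_to_ioapic_pin(uint32_t gsi, uint8_t *index, uint32_t *pin)
{
	uint8_t n = (uint8_t)(sizeof(demo_ioapics) / sizeof(demo_ioapics[0]));
	uint8_t i;

	for (i = 0U; i < n; i++) {
		const struct demo_ioapic_info *io = &demo_ioapics[i];

		if ((gsi >= io->gsi_base) && (gsi < (io->gsi_base + io->nr_pins))) {
			*index = i;
			*pin = gsi - io->gsi_base;
			return true;
		}
	}

	return false;
}
```

The index is a uint8_t here to mirror the message's point that 8 bits are enough to count IO-APICs.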
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
As ACRN prepares to support platforms with multiple IO-APICs,
GSI is a better way to represent physical and virtual INTx interrupt
source.
1) This patch replaces usage of "pin" with "gsi" wherever applicable
across the modules.
2) Converting a PIC pin to a gsi is trickier and needs to consider the usage of the
"Interrupt Source Override" structure in ACPI for the corresponding VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Changes the mmio handler data from that of the acrn_vm struct to
the acrn_vioapic.
Add nr_pins and base_addr to the acrn_vioapic data structure.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Reverts 538ba08c: hv:Add vpin to ptdev entry mapping for vpic/vioapic
ACRN uses an array of size per VM to store ptirq entries against the vIOAPIC pin
and an array of size per VM to store ptirq entries against the vPIC pin.
This is done to speed up "ptirq entry" lookup at runtime for Level triggered
interrupts in API ptirq_intx_ack used on EOI.
This patch switches the lookup for INTx interrupts to the API
ptirq_lookup_entry_by_sid.
This could add delay to processing EOI for level-triggered interrupts.
The trade-off is the space saved for array(s) of size CONFIG_MAX_IOAPIC_LINES with 8 bytes
per entry. On a server platform, ACRN needs to emulate multiple vIOAPICs for
the SOS VM, as many as the number of physical IO-APICs. ACRN would therefore need around
10 such arrays per VM.
Removes the need for "pic_pin" except in the APIs facing the hypercalls
hcall_set_ptdev_intr_info and hcall_reset_ptdev_intr_info.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
- Need to specify the load_addr in the multiboot2 address tag. GRUB needs
  it to correctly calculate the ACRN binary's load size if load_end_addr is
  a non-zero value.
- multiboot2 can be enabled if hypervisor relocation is disabled.
- Print the name of the boot loader. This might be helpful if the boot
  loader, e.g. GRUB, includes its version in the name string.
Tracked-On: #4441
Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Some PCI devices need a special handler for vendor-specific features or
capability CFG access. The Intel GPU is one of them. In order to keep the ACRN-HV
clean, we want to hand the quirk part of PCI CFG access to DM to handle.
To achieve this, we implement a per-device policy based on whether the device needs a quirk handler
for a VM: each device can be configured as a "quirk pass through device" or not. For a
"quirk pass through device", we handle the general part in HV and the quirk part
in DM. For a non "quirk pass through device", we handle everything in HV.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
There are some cases where the SOS (higher severity guest) needs to access the
PCI CFG space of a post-launched VM (lower severity guest):
1. The SR-IOV PF needs to reset the VF.
2. Some pass through devices still need DM to handle some quirks.
When a device is assigned to a UOS and is not in a zombie state, the SOS
is able to access it if and only if the SOS has higher severity than the UOS.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
To avoid information leakage, we need to ensure that the device is
inaccessible when it does not exist.
For an SR-IOV VF device that has been disabled, we perform the following operations.
1. Configuration space accesses return 0xFFFFFFFF
after the device state is set to zombie.
2. The BAR MMIO EPT mappings are removed, so accessing them causes
an EPT violation.
3. The device is detached from the IOMMU.
4. The IRQ pin and vector are released.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As pci_devices.h is included by <page.h>, we need to prepare pci_devices.h
for the nuc6cayh and apl-up2 boards.
Also, the #error info in generic/pci_devices.h should be removed, otherwise
the build will fail in the sdc/sdc2/industry scenarios.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
For a pre-launched VM, a region starting at PTDEV_HI_MMIO_START is used to hold
the 64-bit vBARs of PT devices whose addresses are higher than 4G. The region should
be located after all user memory space and be covered by the guest EPT address range.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ve820.c is now a common file in arch/x86/guest/, so move the function
create_sos_vm_e820() to this file to make the code structure clear.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Remove the unused per-board ve820.c, as arch/x86/guest/ve820.c is now common for
all boards.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hypervisor/arch/x86/configs/$(BOARD)/ve820.c is used to store pre-launched
VM specific e820 entries according to the customer's memory configuration.
It should be a scenario-based configuration, but we had to put it in the per-board
folder because of different board memory settings. This raises concerns
for customers about configuration organization.
Currently the file provides the same e820 layout for all pre-launched VMs, but
they should have different e820 layouts when their memory is configured differently.
Although we have the acrn-config tool to generate ve820.c automatically, it
is not friendly to modify a hardcoded ve820 layout manually, so the patch
changes the entry initialization method to calculate each entry item
in C code.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently ept_pages_info[] is initialized with the first element only, which forces
the VM with id 0 to use SOS EPT pages. This is incorrect for the logical partition and
hybrid scenarios. Considering SOS_RAM_SIZE and UOS_RAM_SIZE are configured
separately, we should use different EPT pages accordingly.
So, the PRE_VM_NUM/SOS_VM_NUM and MAX_POST_VM_NUM macros are introduced to
resolve this issue. The macros would be generated by the acrn-config tool when
users configure ACRN for their specific scenario.
One more thing: when UOS_RAM_SIZE is less than 2GB, the EPT address
range should be (4G + PLATFORM_HI_MMIO_SIZE).
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Change vmsi_read_cfg to read_vmsi_cfg; the same applies to the write function.
Change vmsix_read_cfg to read_vmsix_cfg; the same applies to the write function.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move CFG read/write function for PCI-compatible Configuration Mechanism from
debug/uartuart16550.c to hw/pci.c and rename CFG read/write function for
PCI-compatible Configuration Mechanism to pci_pio_read/write_cfg to align with
CFG read/write function pci_mmcfg_read/write_cfg for PCI Express Enhanced
Configuration Access Mechanism.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
In order to add GVT-D support, we need to pass through the stolen memory and opregion memory
to the post-launched VM. To implement this, we first reserve the GPA for the stolen memory
and opregion memory through the post-launched VM e820 table. Then we build the EPT mapping
between the GPA and the real HPA of the stolen memory and opregion memory. Finally, we need to
return the GPA to the post-launched VM if it wants to read the stolen memory and opregion
memory address, and prevent the post-launched VM from writing the stolen memory and opregion memory
address register for now.
We do the GPA reservation and GPA to HPA EPT mapping in ACRN-DM, and the stolen memory and
opregion memory CFG space register access emulation in ACRN-HV.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The hypervisor uses 2MB large pages, and if either CONFIG_HV_RAM_START or
CONFIG_HV_RAM_SIZE is not aligned to 2MB, ACRN won't boot. Add a static check
to avoid unexpected boot failures.
If CONFIG_RELOC is enabled, CONFIG_HV_RAM_START is not directly referred to
by the code, but it still causes problems because ld_text_end could be relocated
to an address that is not 2MB aligned, which makes mmu_modify_or_del() fail.
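A compile-time check of this kind can be sketched with C11 _Static_assert. The configuration values below are hypothetical placeholders, and the IS_2M_ALIGNED macro name is an assumption; the patch's actual check may be written differently.

```c
#define MEM_2M               0x200000UL
#define CONFIG_HV_RAM_START  0x10000000UL	/* hypothetical value */
#define CONFIG_HV_RAM_SIZE   0x02000000UL	/* hypothetical value */

#define IS_2M_ALIGNED(x)     (((x) & (MEM_2M - 1UL)) == 0UL)

/* The build fails here instead of the hypervisor failing at boot. */
_Static_assert(IS_2M_ALIGNED(CONFIG_HV_RAM_START),
	       "CONFIG_HV_RAM_START must be 2MB aligned");
_Static_assert(IS_2M_ALIGNED(CONFIG_HV_RAM_SIZE),
	       "CONFIG_HV_RAM_SIZE must be 2MB aligned");
```

Changing either value to something like 0x10001000UL would turn the misconfiguration into a compile error.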
Tracked-On: #4441
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
VM needs to check if it owns this device before deiniting it.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Change enable_vf/disable_vf to create_vfs/disable_vfs
Change base member of pci_vbar to base_gpa
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The vf_bdf is not initialized when invoking pci_pdev_read_cfg function.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We do not support the SR-IOV capability of a PF in a UOS for now, so we should
hide the SR-IOV capability if we pass through the PF to a UOS.
For now, we don't support assignment of a PF to a UOS.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Emulate the Device ID, Vendor ID and MSE (Memory Space Enable) bit in
configuration space for an assigned VF, and initialize the assigned VF's BARs.
The Device ID comes from the PF's SRIOV capability.
The Vendor ID comes from the PF's Vendor ID.
The PCI MSE bit is always set when a VM reads from an assigned VF.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If a VF instance is disabled, we don't remove the vdev instance;
we only mark the vdev as a zombie vdev instance, indicating that it
cannot be accessed anymore.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Change name find_vdev to find_available_vdev and add comments
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The VF BARs are initialized by its PF SRIOV capability
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Refine coding style to wrap msix map/unmap operations, clean up repeated
assignments for msix mmio_hpa and mmio_size.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add _v prefix for some function name to indicate this function wants to operate
on virtual CFG space or virtual BAR register.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Since there is no RTVM requirement for sdc2 scenario, replace uuid
495ae2e5-2603-4d64-af76-d4bc5a8ec0e5 which is dedicated to RTVM with
615db82a-e189-4b4f-8dbb-d321343e4ab3
Tracked-On: #4472
Signed-off-by: fuzhongl <fuzhong.liu@intel.com>
Reviewed-by: Sun Victor <victor.sun@intel.com>
Removed the pci_vdev_write_cfg_u8/u16/u32 APIs and only used
pci_vdev_write_cfg as the API for writing vdev's cfgdata
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In commit 127c73c3, we removed the strict check for adding a page table mapping. However,
we only replaced the ASSERT with pr_fatal in add_pte. This is not enough. We still need to advance
the virtual address by 4K when the page table mapping already exists, and check whether the virtual
address has passed the end of the virtual address region for this mapping. Otherwise, the complaint
will repeat up to 512 times.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add cfg_header_read_cfg and cfg_header_write_cfg to handle the first 64B
of the PCI configuration space (the CFG space header).
Only the Command and Status Registers are passed through;
only the Command and Status Registers and the Base Address Registers are writable.
In order to implement this, we add two types of bit mask per 4B register:
a pass through mask and a read-only mask. When a bit is set in the pass through mask,
that bit of the 4B register is passed through; otherwise, it is virtualized.
When a bit is set in the read-only mask, that bit of the 4B register is read-only;
otherwise, it's writable. We write to the physical CFG space or the virtual
CFG space based on whether the pass through mask bit is set or not.
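The two-mask scheme can be sketched as follows. All names here (demo_cfg_mask, demo_cfg_reg, demo_cfg_write, demo_cfg_read) are hypothetical, and the "phys" field merely stands in for an access to the physical CFG space; this is an illustration of the bit routing, not the patch's implementation.

```c
#include <stdint.h>

/* Hypothetical per-register (4-byte) masks. */
struct demo_cfg_mask {
	uint32_t pt_mask;	/* bit set: passed through to physical CFG space */
	uint32_t ro_mask;	/* bit set: read-only for the guest */
};

struct demo_cfg_reg {
	uint32_t virt;	/* virtual CFG space value */
	uint32_t phys;	/* stands in for the physical register */
};

/* Route each writable bit to the physical or the virtual copy. */
static void demo_cfg_write(struct demo_cfg_reg *reg,
			   const struct demo_cfg_mask *m, uint32_t val)
{
	uint32_t writable = ~m->ro_mask;
	uint32_t pt_bits = writable & m->pt_mask;
	uint32_t virt_bits = writable & ~m->pt_mask;

	reg->phys = (reg->phys & ~pt_bits) | (val & pt_bits);
	reg->virt = (reg->virt & ~virt_bits) | (val & virt_bits);
}

/* Merge pass-through bits from hardware with virtualized bits. */
static uint32_t demo_cfg_read(const struct demo_cfg_reg *reg,
			      const struct demo_cfg_mask *m)
{
	return (reg->phys & m->pt_mask) | (reg->virt & ~m->pt_mask);
}
```

Read-only bits are simply dropped on write, whichever copy they would otherwise have gone to.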
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
1. Rename DEFINE_IOAPIC_SID to DEFINE_INTX_SID, as the virtual source can
be IOAPIC or PIC.
2. Rename the src member of source_id.intx_id to ctlr to indicate the interrupt
controller.
3. Change the type of the src member of source_id.intx_id from uint32_t to an
enum with INTX_CTLR_IOAPIC and INTX_CTLR_PIC.
Tracked-On: #4447
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
For vpci_bridge it is better to write only the virtual configuration space,
so move the PCI bridge physical cfg write out to pci.c.
Also add some rules for configuring the PCI bridge.
Tracked-On: #3381
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
SRIOV needs ARI support, so enable it in HV if
the PCI bridge supports it.
TODO:
need to check that all the PCI devices under this bridge support ARI;
if not, it is better not to enable it, per the PCIe spec. That check will be
done when scanning PCI devices.
Tracked-On: #3381
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- Remove the limit on CONFIG_HV_RAM_SIZE, which is for the 2-VM scenario only;
  the default size from Kconfig can build scenarios with up to 5 VMs;
- Rename whl-ipc-i5_acpi_info.h to platform_acpi_info.h, since the former
  should be generated by the acrn-config tool;
- Add SOS related macros in misc.h, otherwise building scenarios that have
  an SOS VM would fail;
Tracked-On: #4463
Signed-off-by: Victor Sun <victor.sun@intel.com>
- Remove the .data and .text directives. We want to place all the boot data and
  text in the .entry section since the boot code is different from other code
  in terms of relocation fixup. With this change, the page tables are in the
  entry section now and it's aligned at 4KB.
- Regardless of whether CONFIG_MULTIBOOT2 is set, the 64-bit entry offset is
  fixed at 0x1200:
    0x00 -- 0x10: Multiboot1 header
    0x10 -- 0x88: Multiboot2 header if CONFIG_MULTIBOOT2 is set
    0x1000: start of entry section: cpu_primary_start_32
    0x1200: cpu_primary_start_64 (thanks to the '.org 0x200' directive)
            GDT tables
            initial page tables
            etc.
Tracked-On: #4441
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
This is to enable relocation for code32.
- RIP relative addressing is available in x86-64 only, so we manually add the
  relocation delta to the target symbols to fix up code32.
- Both code32 and code64 need to load the GDT, hence both need to fix up the GDT
  pointer. This patch declares a separate GDT pointer, cpu_primary64_gdt_ptr,
  for code64 to avoid double fixup.
- Manually fix up cpu_primary64_gdt_ptr in code64 rather than relying on relocate()
  to do that. Otherwise it's very confusing that symbols from the same file could
  be fixed up externally by relocate() or self-relocated.
- To make it clear, define a new symbol ld_entry_end representing the end of
  the boot code that needs manual fixup, and use this symbol in relocate()
  to filter out all symbols belonging to the entry sections.
Tracked-On: #4441
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
GRUB multiboot2 doesn't support relocation for ELF, which means it can't
load acrn.32.out to any address other than the one specified in the ELF
header. Thus we need to use the raw binary file acrn.bin, and add
address/entry address/relocatable tags to instruct the multiboot2 loader
how to load the raw binary.
Tracked-On: #4441
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
In direct boot mode, boot_context[] which is saved from cpu_primary_save_32()
is no longer used since commit 6beb34c3cb ("vm_load: update init gdt
preparation"). Thus, the call to it and the function itself can be removed.
Tracked-On: #4441
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reduce the use of similar APIs (particularly the name confusion) for
CFG space read/write.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Make the name of the functions more accurate
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add some assumptions and safety checks for PCIe ECAM:
1) ACRN only supports platforms with PCIe ECAM to access PCIe device CFG space;
2) ECAM must not be used to access PCIe device CFG space before
pci_switch_to_mmio_cfg_ops is called. (In the release version, ACRN doesn't support
the IO port mechanism. ECAM is the only way to access the PCIe device CFG space.)
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
One argument is missing for the function ptirq_alloc_entry.
This patch fixes the doc generation error.
Tracked-On: #3882
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
- change variable name from hpa to hva because in this function we are
dealing with hva, not hpa.
- can get the address of ld_text_end by directly referring to this symbol,
because relative addressing yields the correct hva, not the hva before
relocation.
Tracked-On: #4441
Signed-off-by: Zide Chen <zide.chen@intel.com>
The pci_read_cap and pci_read_ext_cap are used to enumerate PCI
legacy capability and extended capability.
Change the name pci_read_cap to pci_enumerate_cap
Change the name pci_read_ext_cap to pci_enumerate_ext_cap
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Fixed misspellings and rst formatting issues.
Added ptdev.h to the list of include file for doxygen
Tracked-On: #3882
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
Add doxygen style comments to ptdev public APIs.
Add these API descriptions to group acrn_passthrough.
Tracked-On: #3882
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
This patch updates board.c files for RDT MBA on existing
platforms. It also fixes the setting of the RDT flag in the WHL config file.
Tracked-On: #3725
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds RDT MBA support to detect, configure
and set up MBA throttle registers based on VM configuration.
Tracked-On: #3725
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Create new pdev and vdev structures for SRIOV VF device initialization.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a new parameter pf_vdev to the function vpci_init_vdev to support SRIOV
VF vdev initialization.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
init_one_dev_config is used to initialize an acrn_vm_pci_dev_config.
SRIOV needs an explicit acrn_vm_pci_dev_config to create a VF vdev, so
refine it to return the acrn_vm_pci_dev_config.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
An SRIOV VF physical device needs to be initialized when
VF_ENABLE is set, and its initialization
is the same as that of a standard PCIe physical device, so expose
init_pdev for SRIOV VF physical device initialization.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
SRIOV VF physical devices don't have BARs in their configuration space;
their BARs come from the VF_BAR registers of the SRIOV capability of the associated PF.
Add a vbars data structure to the pci_cap_sriov data structure to store
the SRIOV VF_BAR information, so that each VF's BARs can be initialized directly
from vbars instead of accessing the PF VF_BAR registers multiple times.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To support SRIOV capability initialization, add a new parameter
is_sriov_pf_vdev for init_vdev_pt function.
If parameter is_sriov_pf_vdev of function init_vdev_pt is true,
then function init_vdev_pt initializes the vdev's SRIOV capability.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The current code violates rule 88 S in MISRA-C, so move the xsaves and xrstors
assembly into individual functions.
Tracked-On: #4436
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
VF_ENABLE is one field of SRIOV capability that is used to create
or remove VF physical devices. If VF_ENABLE is set, hv can detect
if the VF physical devices are ready after waiting 100 ms.
v2: Add sanity check for writing NumVFs register, add precondition
and application constraints when VF_ENABLE is set and refine
code style.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce SRIOV capability field for pci_vdev and add SRIOV capability
interception entries.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Make the SRIOV-capable device invisible to SOS if there is
no room for all of its virtual functions.
v2: fix an issue where, if a PF has been dropped, the subsequent PF
is dropped too even if there is room for its VFs.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If the device has PCIe capability, walk all PCIe extended
capabilities for SRIOV discovery.
v2: avoid type casting and refine naming.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
It puts the new line in the wrong place, and the logs are confusing.
For example, for these entries:
mmap[0] - type: 1, base: 0x00000, length: 0x9800
mmap[1] - type: 2, base: 0x98000, length: 0x8000
mmap[2] - type: 3, base: 0xc0000, length: 0x4000
Currently it prints them in this way:
mmap table: 0 type: 0x1
Base: 0x0000000000000000 length: 0x0000000000098000
mmap table: 1 type: 0x2
Base: 0x0000000000098000 length: 0x0000000000008000
mmap table: 2 type: 0x3
Base: 0x00000000000c0000 length: 0x0000000000040000
With this fix, it looks like the following, and now it's of same style
with how prepare_sos_vm_memmap() logs ve820 tables.
mmap table: 0 type: 0x1
Base: 0x0000000000000000 length: 0x0000000000098000
mmap table: 1 type: 0x2
Base: 0x0000000000098000 length: 0x0000000000008000
mmap table: 2 type: 0x3
Base: 0x00000000000c0000 length: 0x0000000000040000
Tracked-On: #1842
Signed-off-by: Zide Chen <zide.chen@intel.com>
Add vpci bridge operations in the hypervisor, to prevent SOS mis-operations
from affecting other VMs' PCI devices.
Assumption: before the hypervisor boots, the physical PCI bridge shall be
configured correctly by the BIOS or another bootloader; the ACS (Access
Control Service) capability is configured by the BIOS so that the
devices under the bridge can be isolated and allocated to different VMs.
To simplify the emulation of the vpci bridge, set the following limitations:
1. expose all configuration space registers, but read-only
2. BIST is not supported; it is 0 by default
3. interrupts are not supported, including INTx and MSI.
TODO:
1. the configuration tool can select whether a PCI bridge is emulated or passed
through.
Open:
1. How does SOS reset a PCI device under the PCI bridge?
Tracked-On: #3381
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The init value for XCR0 and XSS should be the same with spec:
In SDM Vol1 13.3:
XCR0[0] is associated with x87 state (see Section 13.5.1). XCR0[0] is
always 1. The other bits in XCR0 are all 0 coming out of RESET.
The IA32_XSS MSR (with MSR index DA0H) is zero coming out of RESET.
The previous code tried to fix the xsave area leak to other VMs during the init
phase, but it introduced an error in Linux. Besides, it cannot avoid a
possible leak in the running phase. A better solution is needed.
Tracked-On: #4430
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The dedicated DMAR unit for Intel integrated GPU
shall be available on the physical platform.
So remove the assert and add application constraint
in handle_one_drhd func.
Tracked-On: #4405
Signed-off-by: Junming Liu <junming.liu@intel.com>
Reviewed-by: Wu Binbin <binbin.wu@intel.com>
Reviewed-by: Wu Xiangyang <xiangyang.wu@linux.intel.com>
This patch does the following,
1. Removes RDT code if CONFIG_RDT_ENABLED flag is
not set.
2. Set the CONFIG_RDT_ENABLED flag only on platforms
that support RDT so that build scripts will automatically
reflect the config.
Tracked-On: #3715
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch creates a generic infrastructure for
RDT resources instead of just L2 or L3 cache. This
patch also fixes L3 CAT config overwrite by L2 in
cases where both L2 and L3 CAT are supported.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There can be times when the user unknowingly enables the
CONFIG_CAT_ENABLED SW flag, but the hardware might
not support L3 or L2 CAT. In such a case software can
end up writing to the CAT MSRs, which can cause
undefined results. The patch fixes the issue by
enabling CAT only when both the HW and the software
(via CONFIG_CAT_ENABLED) support CAT.
The patch also addresses a typo in the "clos2prq_msr"
function name. It should be "clos2pqr_msr" instead.
PQR stands for platform QoS register.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Upcoming Intel platforms can support both L2 and L3 CAT,
but our current code only supports either L2 or L3 CAT.
So split the MSRs so that we can support allocation
for both L2 and L3.
This patch does the following,
1. splits programming of L2 and L3 cache resource
based on the resource ID.
2. Replace generic platform_clos_array struct with resource
specific struct in all the existing board.c files.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As part of the RDT CAT refactoring, the goal is to combine all RDT
specific features such as CAT under one module. So rename
RDT resource specific files such as cat.c/.h to the generic rdt.c/.h
files.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per Section 4.4 Speculation Barriers, in the
"Retpoline: A Branch Target Injection Mitigation" white paper,
"LFENCE instruction limits the speculative execution that
a processor implementation can perform around the LFENCE,
possibly impacting processor performance, but also creating
a tool with which to mitigate speculative-execution
side-channel attacks."
Tracked-On: #4424
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Initialize the EFI info of the ACRN mbi when booting with the multiboot2 protocol. With
this patch the hypervisor can get the host EFI info and pass it to the Linux zeropage,
which makes it possible for the guest Linux to boot in an EFI environment.
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize the module info and ACPI RSDP info of the ACRN mbi when booting with the
multiboot2 protocol. With this patch the SOS VM can be loaded successfully
with the correct ACPI RSDP.
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize the mmap info of the ACRN mbi when booting with the multiboot2 protocol.
With this patch the ACRN HV can boot from multiboot2.
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add multiboot2 header info in HV image so that bootloader could
recognize it.
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize and sanitize an ACRN specific multiboot info struct with the currently
supported multiboot1 in a very early boot stage, which brings the following
benefits:
- no need to do the hpa2hva conversion every time boot_regs is referenced;
- panic early if sanitizing the multiboot info fails, so there is no need to
check the multiboot info pointer/flags and panic later in the boot process;
- keep most code unchanged when multiboot2 support is introduced in the future;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch rearchitects the boot component header files by:
- moving multiboot.h from include/arch/x86/ to boot/include/ and keep
this header for multiboot1 protocol data struct only;
- moving multiboot related MACROs in cpu_primary.S to multiboot.h;
- creating an independent boot.h to store acrn specific boot information
for other files' reference;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- It is meaningless to enable the debug function in parse_hv_cmdline() because
the function runs at a very early stage and the uart has not been initialized at
that time, so remove this debug level definition;
- Rewrite parse_hv_cmdline() function to make it compliant with MISRA-C;
- Decouple uart16550 stuff from the Init.c module and let console.c handle it;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We hit a build issue if the ld version is 2.34:
error: PHDR segment not covered by LOAD segment
An issue was filed in the binutils bugzilla system:
https://sourceware.org/bugzilla/show_bug.cgi?id=25585
According to the ld maintainer's comment, this is not an issue in 2.34; it is
a fix for an issue in the old ld. He suggested adding the option
--no-dynamic-linker
to ld if we don't depend on the dynamic linker to load our binary.
Tracked-On: #4415
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Count down number will be decreased at each tick, when it comes to zero,
it will trigger reschedule.
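A minimal sketch of the per-tick countdown described above. The structure
countdown_sketch and the helper tick_and_check_resched() are hypothetical names
used only for illustration; they are not taken from the ACRN source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state: remaining ticks before a reschedule. */
struct countdown_sketch {
	uint64_t run_countdown;
};

/*
 * Called once per tick. Decrements the countdown (never below zero) and
 * returns true when it has reached zero, i.e. when the caller should
 * request a reschedule.
 */
static bool tick_and_check_resched(struct countdown_sketch *c)
{
	if (c->run_countdown > 0U) {
		c->run_countdown--;
	}
	return (c->run_countdown == 0U);
}
```

In this sketch the countdown saturates at zero, so a late tick after expiry
keeps reporting that a reschedule is due.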
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The pick_next function updates the virtual time parameters and returns
the vcpu thread with the earliest EVT. Calculate the count down number for
the picked vcpu thread; it is how many MCUs a thread can run before
the next reschedule occurs.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In the wakeup handler, the vcpu_thread object will be inserted into the
runqueue, and in the sleep handler, it will be removed from the queue.
vcpu_thread object is ordered by EVT (effective virtual time).
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add an init function for the bvt scheduler that creates a runqueue and a periodic
timer. The timer interval defaults to 1ms. The interval is the minimum
charging unit.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BVT (Borrowed virtual time) scheduler is used to schedule vCPUs on pCPU.
It has the concept of virtual time; the vCPU with the earliest virtual time is
dispatched first.
Main concepts:
tick timer:
a period tick is used to measure the physical time in units of MCU
(minimum charging unit).
runqueue:
thread in the runqueue is ordered by virtual time.
weight:
each thread receives a share of the pCPU in proportion to its
weight.
context switch allowance:
the physical time by which the current thread is allowed to advance
beyond the next runnable thread.
warp:
a thread with warp enabled will have a chance to subtract a value (Wi)
from its virtual time to achieve higher priority.
virtual time:
AVT: actual virtual time, advances in proportion to weight.
EVT: effective virtual time.
EVT <- AVT - ( warp ? Wi : 0 )
SVT: scheduler virtual time, the minimum AVT in the runqueue.
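The relation between AVT, warp and EVT listed above can be sketched in C as
follows. This is only an illustration of the formula and of "earliest EVT is
dispatched first"; the struct bvt_thread_sketch, its fields and both helper
names are hypothetical and do not reflect the actual ACRN data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping; field names are illustrative only. */
struct bvt_thread_sketch {
	int64_t avt;        /* actual virtual time */
	int64_t warp_value; /* Wi, subtracted from AVT while warp is enabled */
	bool warp_on;       /* whether warp is currently enabled */
};

/* EVT <- AVT - ( warp ? Wi : 0 ) */
static int64_t bvt_evt(const struct bvt_thread_sketch *t)
{
	return t->avt - (t->warp_on ? t->warp_value : 0);
}

/* Return the index of the thread with the earliest EVT, or -1 if n == 0. */
static int bvt_pick_earliest(const struct bvt_thread_sketch *threads, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0U; i < n; i++) {
		if ((best < 0) || (bvt_evt(&threads[i]) < bvt_evt(&threads[best]))) {
			best = (int)i;
		}
	}
	return best;
}
```

A linear scan is used here for brevity; a runqueue kept ordered by EVT, as the
text describes, would make the pick a matter of taking the head of the queue.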
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Rename BOOT_CPU_ID to BSP_CPU_ID
2. Replace hardcoded value with BSP_CPU_ID when
ID of BSP is referenced.
Tracked-On: #4420
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Now only PCI MSI-X BAR access needs dynamic register/unregister. Other handlers are never
unregistered once registered. So we don't need to hold the VM-level emul_mmio_lock
when we handle the MMIO access. Instead, we can use a finer-granularity lock in the
handler to protect the shared resource.
This patch fixes the deadlock that occurs when OVMF tries to size a BAR:
because OVMF uses ECAM to access the PCI configuration space, it first holds the VM's
emul_mmio_lock and then calls vpci_handle_mmconfig_access. When this tries to size a
BAR that is also an MSI-X Table BAR, it calls register_mmio_emulation_handler to
register the MSI-X Table BAR MMIO access handler. This causes a deadlock on
emul_mmio_lock.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Now that passthrough PCI device handling has been split from DM to HV, we can remove all the
unused passthrough PCI device code.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In this case, we could handle all the passthrough PCI devices in the ACRN hypervisor.
But we still need DM to initialize BAR resources and INTx for passthrough PCI
devices for post-launched VMs, since this information must be filled into the
ACPI tables. So
1. we add an HC vm_assign_pcidev to pass the extra information, replacing the old
vm_assign_ptdev.
2. we also remove HC vm_set_ptdev_msix_info since it can now be set by the post-launched
VM, same as SOS.
3. remove the vm_map_ptdev_mmio call for PTDev in DM since the ACRN hypervisor will handle these
BAR accesses.
4. most importantly, trap PCI configuration space access for PTDev in HV for
post-launched VMs and forward the virtual PCI device configuration space access to DM.
This patch doesn't do the cleanup. That will be done in the next patch.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add assign/deassign PCI device hypercall APIs to assign a PCI device from SOS to a
post-launched VM or deassign a PCI device from a post-launched VM to SOS. This patch
prepares for splitting passthrough PCI devices from DM to HV.
The old assign/deassign ptdev APIs will be discarded.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The previous fcf-protection fix broke the old gcc (older than
gcc 8 which is common on Ubuntu 18.04 and older distributions).
We only add fcf-protection=none for gcc8 and newer.
Tracked-On: #4358
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
apl-mrb needs to access the P2SB device, so add the 00:0d.0 P2SB device to
the whitelist of platform hidden PCI devices.
Tracked-On: #3475
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
To enable gvt-d, the GPU IOMMU needs to be allowed.
However, gvt-d hasn't been enabled on APL yet,
so let APL disable the GPU IOMMU.
v2 -> v3:
* let APL platforms disable GPU IOMMU.
Tracked-On: #4405
Signed-off-by: Junming Liu <junming.liu@intel.com>
Reviewed-by: Wu Binbin <binbin.wu@intel.com>
If one of the enabled VT-d DMAR units
doesn’t support snoop control,
then bit 11 of the leaf PTE of EPT is not set,
since the field is treated as reserved(0)
by VT-d hardware implementations
not supporting snoop control.
The GPU IOMMU doesn’t support snoop control, so
this patch adds an option to disable
iommu snoop control for gvt-d.
v2 -> v3:
* refine the MACRO name and description.
Tracked-On: #4405
Signed-off-by: Junming Liu <junming.liu@intel.com>
Reviewed-by: Wu Binbin <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
On the UEFI UP2 board, APs might execute HLT before the SOS kernel INITs them.
After the SOS kernel takes over, it re-inits the APs directly. The flow
from the HV perspective is like:
HLT trap:
wait_event(VCPU_EVENT_VIRTUAL_INTERRUPT) -> sleep_thread
SOS kernel INIT, SIPI APs:
pause_vcpu(ZOMBIE) -> sleep_thread
-> reset_vcpu
-> launch_vcpu -> wake_vcpu
However, the last wake_vcpu will fail because the cpu event
VCPU_EVENT_VIRTUAL_INTERRUPT has not been signaled.
This patch resets all vcpu events in reset_vcpu. If the thread was
previously waiting for an event, its waiting status is cleared and
launch_vcpu wakes it to run.
Tracked-On: #4402
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In platforms that support CAT, when it is enabled by ACRN, i.e.
IA32_resourceType_MASK_n registers are programmed with customized values,
it has impacts to the whole system.
The per guest flag GUEST_FLAG_CLOS_REQUIRED suggests that CAT may be
enabled in some guests, but not in others who don't have this flag,
which is conceptually incorrect.
This patch removes GUEST_FLAG_CLOS_REQUIRED, and adds a new Kconfig
entry CAT_ENABLED for CAT enabling. When it's enabled, platform_clos_array[]
defines a set of system-wide Class of Service (COS, or CLOS), and the
per guest vm_configs[].clos associates the guest with particular CLOS.
Tracked-On: #2462
Signed-off-by: Zide Chen <zide.chen@intel.com>
In some build env (Ubuntu 19.10 as example), gcc enabled the option
-fcf-protection by default. But this option is not compatible with
-mindirect-branch. Which could trigger following build error:
fail to build with gcc-9 [error: ‘-mindirect-branch’ and
‘-fcf-protection’ are not compatible]
-mindirect-branch is mandatory for retpoline mitigation and always
enabled for ACRN build. We disable -fcf-protection here for ACRN
build.
Tracked-On: #4358
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Wu Binbin <binbin.wu@intel.com>
Currently panic() and pr_xxx() statements before init_primary_pcpu_post()
won't be printed, which is inconvenient and misleading for debugging.
This patch makes pr_xxx() APIs work before init_pcpu_pre():
- clear .bss in init.c, which makes sense to clear .bss at the very beginning
of initialization code. Also this makes it possible to call init_logmsg()
before init_pcpu_pre().
- move parse_hv_cmdline() and uart16550_init(true) to init.c.
- refine ticks_to_us() to handle the case that it's called before
calibrate_tsc(). As a side effect, it prints "0us" in early pr_xxx() calls.
- call init_debug_pre() in init_primary_pcpu() and after this point,
both printf() and pr_xxx() APIs are available.
However, this patch doesn't address the issue that pr_xxx() could be called
on PCPUs that set_current_pcpu_id() hasn't been called, which implies that
the PCPU ID shown in early logs may not be accurate.
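The ticks_to_us() refinement mentioned above can be pictured with the following
sketch. It assumes, hypothetically, that the calibrated TSC frequency is kept in
a variable that is still 0 before calibrate_tsc() runs; tsc_khz_sketch and
ticks_to_us_sketch() are illustrative names, not the actual ACRN code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the calibrated TSC frequency in kHz.
 * 0 means "calibrate_tsc() has not run yet".
 */
static uint32_t tsc_khz_sketch;

/*
 * Convert TSC ticks to microseconds. Before calibration there is no valid
 * frequency, so return 0 instead of dividing by zero; this is why early
 * log lines would show "0us".
 */
static uint64_t ticks_to_us_sketch(uint64_t ticks)
{
	uint64_t us = 0UL;

	if (tsc_khz_sketch != 0U) {
		us = (ticks * 1000UL) / (uint64_t)tsc_khz_sketch;
	}
	return us;
}
```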
Tracked-On: #2987
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
It is better to initialize bdfs_from_drhds.pci_bdf_map_count
before it is passed to another function that does:
bdfs_from_drhds->pci_bdf_map_count++
Tracked-On: #3875
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
INVALID_BIT_INDEX has 16 bits only, which removes all pcpu_id that
is >= 16 from the destination mask.
Tracked-On: #4354
Signed-off-by: Zide Chen <zide.chen@intel.com>
1. Align the coding style for these MACROs
2. Align the values of fixed VECTORs
Tracked-On: #4348
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
In lapic passthrough mode, HLT/PAUSE execution should be passed through
too. This patch disables their emulation when switching to lapic passthrough mode.
Tracked-On: #4329
Tested-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Since a vcpu and its thread are two modules with different perspectives, each
of them has its own status. Dump both states for better understanding
of the system status.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
is_polling_ioreq is more straightforward. Rename it.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch checks the validity of 'vdev->pdev' to
ensure physical device is linked to 'vdev'.
This check avoids a potential hypervisor
crash when destroying a VM with crafted input.
Tracked-On: #4336
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
'param' is a BDF value instead of a GPA when the VHM driver
issues the following 2 hypercalls:
- HC_ASSIGN_PTDEV
- HC_DEASSIGN_PTDEV
This patch removes the related code in the hc_assign/deassign()
functions.
Tracked-On: #4334
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
SOS will use PCIe ECAM to access the PCIe extended configuration space. HV should trap this
access for security (for now, pre-launched VMs don't support PCI ECAM; post-launched
VMs trap PCIe ECAM access in DM).
Besides, update the PCIe MMCONFIG region to be owned by the hypervisor, and expose and pass through
the platform PCI devices hidden by BIOS to SOS.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Use the Enhanced Configuration Access Mechanism (MMIO) instead of the PCI-compatible
Configuration Mechanism (IO port) to access the PCIe Configuration Space.
The PCI-compatible Configuration Mechanism (IO port) access is still used for the UART in
the debug version.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This patch overwrites the idle driver of service OS for industry, sdc,
sdc2 scenarios. HLT will be used as the default idle action.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
HLT emulation is important for maximum utilization of CPU resources. A vcpu
executing HLT is idle and can give up the CPU proactively. Thus, we
pause the vcpu thread in HLT emulation and resume it when an event happens.
When a vcpu enters HLT, its vcpu thread sleeps, but the vcpu state is
still 'Running'.
VM ID    PCPU ID    VCPU ID    VCPU ROLE    VCPU STATE
=====    =======    =======    =========    ==========
  0         0          0       PRIMARY      Running
  0         1          1       SECONDARY    Running
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Sometimes HV wants to know whether a vcpu has pending interrupts.
Add a .has_pending_intr interface in acrn_apicv_ops that returns the pending
interrupt status by checking the IRRs of apicv.
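As a rough illustration of "checking the IRRs", the sketch below scans the
eight 32-bit Interrupt Request Registers of a local APIC (256 vectors in total)
for any set bit. The function name and the plain array argument are assumptions
made for this example; the real .has_pending_intr callback operates on ACRN's
vlapic structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 256 interrupt vectors are tracked in eight 32-bit IRR registers. */
#define IRR_REG_COUNT_SKETCH 8U

/* Return true if any vector is pending in the given IRR snapshot. */
static bool has_pending_intr_sketch(const uint32_t irr[IRR_REG_COUNT_SKETCH])
{
	bool pending = false;
	uint32_t i;

	for (i = 0U; i < IRR_REG_COUNT_SKETCH; i++) {
		if (irr[i] != 0U) {
			pending = true;
			break;
		}
	}
	return pending;
}
```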
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce two kinds of events for each vcpu,
VCPU_EVENT_IOREQ: for vcpu waiting for IO request completion
VCPU_EVENT_VIRTUAL_INTERRUPT: for vcpu waiting for virtual interrupts events
A vcpu can wait for such events and resume running when the
event gets signalled.
This patch also changes IO request waiting/notifying to this mechanism.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This simple event implementation only supports exclusive waiting
at any one time. It is mainly used by a thread that wants to wait for a specific event
to happen.
Thread A, which wants to wait for some event, calls
wait_event(struct sched_event *);
Thread B, which can signal the event, calls
signal_event(struct sched_event *);
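The exclusive-waiter behaviour can be modelled with a small state machine. The
sketch below is a simplified, single-threaded model with no real scheduler:
"sleeping" is just a flag, and all type and function names carry a _sketch
suffix because they are hypothetical and differ from the actual wait_event()
and signal_event() implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread object: only tracks whether it is blocked. */
struct thread_sketch {
	bool sleeping;
};

/* Hypothetical event: one pending flag and at most one waiter. */
struct sched_event_sketch {
	bool set;                     /* signalled while nobody was waiting */
	struct thread_sketch *waiter; /* the single (exclusive) waiting thread */
};

/*
 * Wait for the event. If it was already signalled, consume it and continue.
 * Otherwise record the caller as the waiter and mark it sleeping.
 * Returns true if the caller had to block.
 */
static bool wait_event_sketch(struct sched_event_sketch *ev, struct thread_sketch *self)
{
	bool blocked = false;

	if (ev->set) {
		ev->set = false;
	} else {
		ev->waiter = self;
		self->sleeping = true;
		blocked = true;
	}
	return blocked;
}

/* Signal the event: wake the waiter if there is one, else latch the signal. */
static void signal_event_sketch(struct sched_event_sketch *ev)
{
	if (ev->waiter != NULL) {
		ev->waiter->sleeping = false;
		ev->waiter = NULL;
	} else {
		ev->set = true;
	}
}
```

Latching the signal when nobody waits avoids a lost wakeup if thread B signals
before thread A starts waiting.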
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As we enabled cpu sharing, PAUSE-loop exiting can help vcpu
to release its pcpu proactively. It's good for performance.
VMX_PLE_GAP: upper bound on the amount of time between two successive
executions of PAUSE in a loop.
VMX_PLE_WINDOW: upper bound on the amount of time a guest is allowed to
execute in a PAUSE loop
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In current code, wait_pcpus_offline() and make_pcpu_offline() are called by
both shutdown_vm() and reset_vm(), but this is not needed when lapic_pt is
not enabled for the vcpus of the VM.
The patch merges the pcpu offline code into a common
offline_lapic_pt_enabled_pcpus() api for shutdown_vm() and reset_vm() to use, which is
called only when lapic_pt is enabled.
Tracked-On: #4325
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. This patch passes CR4.PCIDE through to the guest VM.
2. This patch handles the invalidation of the TLB and the paging-structure caches.
According to SDM Vol.3 4.10.4.1, the following instructions invalidate
entries in the TLBs and the paging-structure caches:
- INVLPG: this instruction is passed through to the guest, no extra handling needed.
- INVPCID: this instruction is passed through to the guest, no extra handling needed.
- CR0.PG from 1 to 0: already handled by current code, change of CR0.PG will do
EPT flush.
- MOV to CR3: hypervisor doesn't trap this instruction, no extra handling needed.
- CR4.PGE changed: already handled by current code, change of CR4.PGE will do EPT
flush.
- CR4.PCIDE from 1 to 0: this patch handles this case, will do EPT flush.
- CR4.PAE changed: already handled by current code, change of CR4.PAE will do EPT
flush.
- CR4.SMEP from 1 to 0: already handled by current code, change of CR4.SMEP will
do EPT flush.
- Task switch: Task switch is not supported in VMX non-root mode.
- VMX transitions: already handled by current code with the support of VPID.
3. This patch checks the validity of CR0 and CR4 related to the PCID feature.
According to SDM Vol.3 4.10.1, CR4.PCIDE can be 1 only in IA-32e mode.
- MOV to CR4 causes a general-protection exception (#GP) if it would change CR4.PCIDE
from 0 to 1 and either IA32_EFER.LMA = 0 or CR3[11:0] ≠ 000H
- MOV to CR0 causes a general-protection exception if it would clear CR0.PG to 0
while CR4.PCIDE = 1
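The two #GP conditions above can be written as predicates. The sketch below
uses the architectural bit positions (CR0.PG is bit 31, CR4.PCIDE is bit 17,
IA32_EFER.LMA is bit 10); the macro and function names are hypothetical and
are not the ones used in the ACRN source.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG_SKETCH    (UINT64_C(1) << 31) /* CR0.PG */
#define CR4_PCIDE_SKETCH (UINT64_C(1) << 17) /* CR4.PCIDE */
#define EFER_LMA_SKETCH  (UINT64_C(1) << 10) /* IA32_EFER.LMA */

/*
 * MOV to CR4 faults if it would change CR4.PCIDE from 0 to 1 while
 * IA32_EFER.LMA = 0 or CR3[11:0] != 000H.
 */
static bool cr4_write_faults_pcid(uint64_t old_cr4, uint64_t new_cr4,
				  uint64_t efer, uint64_t cr3)
{
	bool enabling = ((old_cr4 & CR4_PCIDE_SKETCH) == 0U) &&
			((new_cr4 & CR4_PCIDE_SKETCH) != 0U);

	return enabling &&
	       (((efer & EFER_LMA_SKETCH) == 0U) || ((cr3 & UINT64_C(0xFFF)) != 0U));
}

/* MOV to CR0 faults if it would leave CR0.PG = 0 while CR4.PCIDE = 1. */
static bool cr0_write_faults_pcid(uint64_t new_cr0, uint64_t cr4)
{
	return ((new_cr0 & CR0_PG_SKETCH) == 0U) &&
	       ((cr4 & CR4_PCIDE_SKETCH) != 0U);
}
```

A CR write handler could evaluate such predicates first and inject #GP into the
guest when they return true, before applying the new register value.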
Tracked-On: #4296
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According to SDM Vol.3 Section 25.3, behavior of the INVPCID
instruction is determined first by the setting of the “enable
INVPCID” VM-execution control:
- If the “enable INVPCID” VM-execution control is 0, INVPCID
causes an invalid-opcode exception (#UD).
- If the “enable INVPCID” VM-execution control is 1, treatment
is based on the setting of the “INVLPG exiting” VM-execution
control:
* If the “INVLPG exiting” VM-execution control is 0, INVPCID
operates normally.
* If the “INVLPG exiting” VM-execution control is 1, INVPCID
causes a VM exit.
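The decision tree quoted from the SDM can be condensed into one function. This
is only a restatement of the rules above; the enum and function names are
hypothetical and chosen for this example.

```c
#include <stdbool.h>

/* Possible results of a guest executing INVPCID (illustrative names). */
enum invpcid_outcome_sketch {
	INVPCID_RAISES_UD,
	INVPCID_RUNS_NORMALLY,
	INVPCID_CAUSES_VM_EXIT
};

/*
 * enable_invpcid: the "enable INVPCID" VM-execution control.
 * invlpg_exiting: the "INVLPG exiting" VM-execution control.
 */
static enum invpcid_outcome_sketch invpcid_outcome(bool enable_invpcid, bool invlpg_exiting)
{
	enum invpcid_outcome_sketch outcome;

	if (!enable_invpcid) {
		outcome = INVPCID_RAISES_UD;
	} else if (invlpg_exiting) {
		outcome = INVPCID_CAUSES_VM_EXIT;
	} else {
		outcome = INVPCID_RUNS_NORMALLY;
	}
	return outcome;
}
```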
In current implementation, hypervisor doesn't set “INVLPG exiting”
VM-execution control, this patch sets “enable INVPCID” VM-execution
control to 1 when the instruction is supported by physical cpu.
If INVPCID is supported by physical cpu, INVPCID will not cause VM
exit in VM.
If INVPCID is not supported by physical cpu, INVPCID causes an #UD
in VM.
When INVPCID is passed through to the VM, according to SDM Vol.3 28.3.3.1, the
INVPCID instruction invalidates linear mappings and combined mappings.
It is required to do so only for the current VPID.
HV assigns a unique VPID to each vCPU, so if a guest uses a wrong PCID,
it does not affect other vCPUs.
Tracked-On: #4296
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Pass-through PCID related capabilities to VMs:
- The support of PCID (CPUID.01H.ECX[17])
- The support of instruction INVPCID (CPUID.07H.EBX[10])
Tracked-On: #4296
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN relies on the capability of VPID to avoid EPT flushes during VMX transitions.
This capability is checked as a must have hardware capability, otherwise, ACRN will
refuse to boot.
Also, the current code has already made sure each vpid for a virtual cpu is valid.
So, no need to check the validity of vpid for vcpu and enable VPID for vCPU by default.
Tracked-On: #4296
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Before we assign a PT device to a post-launched VM, we should reset the PCI device
first. However, the ACRN hypervisor doesn't plan to support PCIe hot-plug and doesn't
support PCIe bridge Secondary Bus Reset. So the PT device must support FLR or PM
reset. This patch performs this check when assigning a PT device to a post-launched VM.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Since we now restore BAR values when writing the Command Register if necessary, we don't
need to trap FLR and do the BAR restore then.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
When a PCIe device does a Conventional Reset or FLR, almost all PCIe configurations and states are
lost. So we should save the configurations and states before the reset and restore
them after the reset. This is already done well by the BIOS or the guest. However, ACRN traps
these accesses and handles them properly for security. Almost all of these configurations and
states are eventually written to the physical configuration space, except for the BAR values
for now. So we should restore the BAR values. One way is to do the restore after
a reset of any type is detected, which would be too complex. Another way is to do the restore
when the BIOS or guest tries to write the Command Register. This works because:
1. The I/O Space Enable bit and Memory Space Enable bits in Command Register will reset
to zero.
2. Before BIOS or guest wants to enable these bits, the BAR couldn't be accessed.
3. So we could restore the BAR values before enable these bits if reset is detected.
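The three points above suggest a flow along the following lines. This is a
sketch under explicit assumptions: the structures, the six-BAR layout and the
reset heuristic (enable bits found cleared in the physical Command Register
while the guest tries to set them) are hypothetical simplifications, not the
actual ACRN implementation. The I/O Space Enable and Memory Space Enable bits
are bits 0 and 1 of the PCI Command Register.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_CMD_IO_EN_SKETCH  0x1U /* Command Register bit 0: I/O Space Enable */
#define PCI_CMD_MEM_EN_SKETCH 0x2U /* Command Register bit 1: Memory Space Enable */
#define PCI_BAR_COUNT_SKETCH  6U

/* Hypothetical model of the physical device registers. */
struct pdev_sketch {
	uint16_t command;
	uint32_t bar[PCI_BAR_COUNT_SKETCH];
};

/* Hypothetical virtual device holding the BAR values to restore. */
struct vdev_sketch {
	uint32_t saved_bar[PCI_BAR_COUNT_SKETCH];
	struct pdev_sketch *pdev;
};

/*
 * Handle a guest write to the Command Register. If the guest is turning on
 * I/O or memory decoding while both enable bits are clear in hardware, assume
 * a reset may have happened and restore the BARs before enabling decoding.
 * Returns true if the BARs were restored.
 */
static bool write_command_sketch(struct vdev_sketch *vdev, uint16_t new_cmd)
{
	const uint16_t enable = PCI_CMD_IO_EN_SKETCH | PCI_CMD_MEM_EN_SKETCH;
	struct pdev_sketch *pdev = vdev->pdev;
	bool restored = false;
	uint32_t i;

	if (((new_cmd & enable) != 0U) && ((pdev->command & enable) == 0U)) {
		for (i = 0U; i < PCI_BAR_COUNT_SKETCH; i++) {
			pdev->bar[i] = vdev->saved_bar[i];
		}
		restored = true;
	}
	pdev->command = new_cmd;
	return restored;
}
```

Because the BARs cannot be accessed while the enable bits are clear, restoring
them at this point happens before the guest can observe stale values.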
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
- target vm_id of vuart can't be un-defined VM, nor the VM itself.
- fix potential NULL pointer dereference in find_active_target_vuart()
Tracked-On: #3854
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per SDM Vol.3 Section 10.12.5.1, the local APIC should keep its LAPIC state after receiving
INIT. The local APIC ID register should also be preserved.
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch abstracts a vcpu_reset_internal() api for internal usage; the
function does not touch any vcpu state transition and only does the vcpu reset
processing. It is called by create_vcpu() and reset_vcpu().
reset_vcpu() acts as a public api and should be called
only when a vcpu receives INIT or on vm reset/resume from S3. It should not be
called from shutdown_vm() or hcall_sos_offline_cpu(), so the patch removes
reset_vcpu() from shutdown_vm() and hcall_sos_offline_cpu().
The patch also introduces a reset_mode enum so that vcpu and vlapic can do
different context operations according to the reset mode.
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Rename vlapic_xxx_write_handler() to vlapic_write_xxx() to make code more
readable;
Tracked-On: #4268
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some macros in lapic.h duplicate those in apicreg.h, and some macros are
never referenced. Remove them.
Tracked-On: #4268
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per SDM 10.4.7.1 vol3, the LVT registers should be reset to 0s except for the
mask bits, which are set to 1s.
In the current code, lvt_last[] has already been set to the correct value (i.e. 0x10000) in
vlapic_reset() before vlapic->lvt_last[i] is forcibly set to 0U. The loop
that sets vlapic->lvt_last[i] to 0 makes reads of the LVT registers
after reset return zero, which is non-compliant with the SDM.
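
As a minimal sketch of the expected reset value (not the actual vlapic code;
struct vlapic_lvt_sketch, NR_LVT and lvt_reset_sketch() are hypothetical), each
cached LVT entry should come out of reset with only the mask bit set:

```c
#include <stdint.h>

#define LVT_MASK_BIT  0x10000U  /* bit 16: interrupt masked */
#define NR_LVT        7U        /* hypothetical number of cached LVT entries */

struct vlapic_lvt_sketch {
	uint32_t lvt_last[NR_LVT];
};

/* Reset every cached LVT entry to all zeros except the mask bit. */
static void lvt_reset_sketch(struct vlapic_lvt_sketch *v)
{
	for (uint32_t i = 0U; i < NR_LVT; i++) {
		v->lvt_last[i] = LVT_MASK_BIT;
	}
}
```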
Tracked-On: #4266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Structure pci_vbar is used to define the virtual BAR rather than physical BAR.
It's better to name it pci_vbar.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There's no need to check which capabilities we care about at the very beginning. We can
do it later, step by step.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per ACPI 6.2 spec, chapter 5.2.5.2 "Finding the RSDP on UEFI Enabled Systems":
In Unified Extensible Firmware Interface (UEFI) enabled systems, a pointer to
the RSDP structure exists within the EFI System Table. The OS loader is provided
a pointer to the EFI System Table at invocation. The OS loader must retrieve the
pointer to the RSDP structure from the EFI System Table and convey the pointer
to OSPM, using an OS dependent data structure, as part of the hand off of
control from the OS loader to the OS.
So when ACRN boots in direct mode on a UEFI enabled system, the hypervisor might
fail to get the RSDP by searching the legacy EBDA or the 0xe0000~0xfffff region,
but it still has a chance to get the RSDP by searching the e820 ACPI reclaimable
region with some edk2 based BIOSes.
The patch searches for the RSDP in the e820 ACPI reclaimable region when it fails to get
the RSDP from the legacy region.
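
The fallback order can be sketched as below. This is an illustrative sketch,
not the hypervisor's implementation: scan_rsdp_sketch() and find_rsdp_sketch()
are hypothetical names, the regions are passed in as plain buffers, and the
16-byte scan step in the reclaimable region is an assumption carried over from
the legacy search. The signature "RSD PTR " and the zero checksum over the
first 20 bytes follow the ACPI RSDP definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG      "RSD PTR "
#define RSDP_SIG_LEN  8U
#define RSDP_V1_LEN   20U  /* bytes covered by the ACPI 1.0 checksum */

/* Scan a region on 16-byte boundaries for a checksum-valid RSDP. */
static const uint8_t *scan_rsdp_sketch(const uint8_t *base, size_t len)
{
	for (size_t off = 0U; (off + RSDP_V1_LEN) <= len; off += 16U) {
		const uint8_t *p = base + off;
		uint8_t sum = 0U;

		if (memcmp(p, RSDP_SIG, RSDP_SIG_LEN) != 0) {
			continue;
		}
		for (size_t i = 0U; i < RSDP_V1_LEN; i++) {
			sum = (uint8_t)(sum + p[i]);
		}
		if (sum == 0U) {
			return p;
		}
	}
	return NULL;
}

/* Try the legacy region first, then fall back to an ACPI reclaimable region. */
static const uint8_t *find_rsdp_sketch(const uint8_t *legacy, size_t legacy_len,
				       const uint8_t *reclaim, size_t reclaim_len)
{
	const uint8_t *rsdp = scan_rsdp_sketch(legacy, legacy_len);

	if (rsdp == NULL) {
		rsdp = scan_rsdp_sketch(reclaim, reclaim_len);
	}
	return rsdp;
}
```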
Tracked-On: #4301
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- commit 69152647 ("hv: Use virtual APIC IDs for Pre-launched VMs")
enables virtual APIC IDs for pre-launched VMs thus xapic_phys is no
longer needed to force guest xAPIC to work in physical destination mode.
- HVC is not available in logical partition mode and "console=hvc0" should
be removed from guest Linux bootargs.
Tracked-On: #3854
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
Add severity definitions for different scenarios. The static
guest severity is defined according to guest configurations.
Also add a sanity check to make sure the severity of all guests
is correct.
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
On guest reset, a reset of the highest severity guest resets the whole
system. There is a VM flag to call out the highest severity guest
in a specific scenario, which is a static guest severity assignment.
There are cases where the static highest severity guest is shut down
and the highest severity role should transfer to another guest.
For example, in the ISD scenario, if the RTVM (the static highest severity
guest) is shut down, the SOS should be the highest severity guest instead.
is_highest_severity_vm() is updated to detect the highest severity
guest dynamically, and a reset of the highest severity guest is promoted
to a system reset.
Also remove the GUEST_FLAG_HIGHEST_SEVERITY definition.
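
A dynamic check of this kind might look like the following sketch. It is an
assumption-laden illustration rather than the real is_highest_severity_vm():
struct vm_sketch, the running flag and the convention that a larger number
means higher severity are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct vm_sketch {
	uint32_t severity;  /* assumption: larger value means higher severity */
	bool running;       /* false once the guest has been shut down */
};

/* A VM has the highest severity if no other running VM outranks it. */
static bool is_highest_severity_vm_sketch(const struct vm_sketch *vms,
					  uint32_t nr_vms, uint32_t id)
{
	bool highest = true;

	for (uint32_t i = 0U; highest && (i < nr_vms); i++) {
		if ((i != id) && vms[i].running &&
		    (vms[i].severity > vms[id].severity)) {
			highest = false;
		}
	}
	return highest;
}
```

Because shut-down guests are skipped, the role moves to the next guest in
line as soon as the statically highest one is no longer running.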
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
For system S5, ACRN assumed that SOS shutdown triggers
system shutdown. So the system shutdown logic was:
1. Trap SOS shutdown
2. Wait for all other guests to shut down
3. Shut down the system
The new logic is refined as:
If all guests are shut down, shut down the whole system
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
The ACRN hypervisor should trap a guest doing a PCI AF FLR. Besides, it should save some state
before doing the FLR and restore it later (only the BAR values for now).
This patch traps guest writes to the Conventional PCI Advanced Features Control Register
if the device supports the Conventional PCI Advanced Features Capability, and
checks whether the guest wants to do a device AF FLR. If it does, call pdev_do_flr to do the job.
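
The check described above could be sketched like this. The helper name and its
parameters are hypothetical; the register layout (AF Control at offset 4 of the
capability, Initiate FLR in bit 0) follows the Conventional PCI Advanced
Features capability definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define AF_CTRL_OFFSET  0x4U  /* AF Control register, relative to the capability */
#define AF_CTRL_FLR     0x1U  /* Initiate FLR bit */

/*
 * Return true if a config write at 'offset' with value 'val' requests an AF FLR.
 * 'af_capoff' is the capability offset, or 0 if the device lacks the capability.
 */
static bool is_af_flr_write_sketch(uint32_t af_capoff, uint32_t offset, uint32_t val)
{
	return (af_capoff != 0U) &&
	       (offset == (af_capoff + AF_CTRL_OFFSET)) &&
	       ((val & AF_CTRL_FLR) != 0U);
}
```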
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The ACRN hypervisor should trap a guest doing a PCIe FLR. Besides, it should save some state
before doing the FLR and restore it later (only the BAR values for now).
This patch traps guest writes to the Device Control Register if the device
supports the PCI Express Capability, and checks whether the guest wants to do a device FLR.
If it does, call pdev_do_flr to do the job.
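
A rough sketch of the trap plus save/restore flow follows. In the PCI Express
Capability, FLR is initiated through bit 15 of the Device Control register at
offset 0x08 of the capability. Everything else here is hypothetical and for
illustration only: struct flr_dev_sketch, hw_flr_sketch() (a stand-in for the
real reset, modelled as clearing the BARs) and devctl_write_sketch() are not
the actual ACRN functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCIE_DEVCTL_OFFSET  0x8U     /* Device Control register offset in the capability */
#define PCIE_DEVCTL_FLR     0x8000U  /* Initiate Function Level Reset (bit 15) */
#define NR_BARS             6U

struct flr_dev_sketch {
	uint32_t pcie_capoff;   /* 0 if the device has no PCI Express Capability */
	uint32_t bar[NR_BARS];  /* stands in for the physical BAR registers */
};

/* Stand-in for the hardware reset: modelled here as clearing the BARs. */
static void hw_flr_sketch(struct flr_dev_sketch *dev)
{
	for (uint32_t i = 0U; i < NR_BARS; i++) {
		dev->bar[i] = 0U;
	}
}

/* Returns true if the write was handled as an FLR request. */
static bool devctl_write_sketch(struct flr_dev_sketch *dev, uint32_t offset, uint32_t val)
{
	uint32_t saved[NR_BARS];
	bool is_flr = (dev->pcie_capoff != 0U) &&
		      (offset == (dev->pcie_capoff + PCIE_DEVCTL_OFFSET)) &&
		      ((val & PCIE_DEVCTL_FLR) != 0U);

	if (is_flr) {
		/* Save the BARs, reset the function, then put the BARs back. */
		for (uint32_t i = 0U; i < NR_BARS; i++) {
			saved[i] = dev->bar[i];
		}
		hw_flr_sketch(dev);
		for (uint32_t i = 0U; i < NR_BARS; i++) {
			dev->bar[i] = saved[i];
		}
	}
	return is_flr;
}
```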
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The pointer variable 'start' should be checked against the NUL terminator
right after it is found not to point to a space character.
Otherwise, if the cmdline contains trailing whitespace, the pointer
variable 'end' ends up holding the wrong address right after the
terminator, and an address outside the cmdline string is dereferenced.
The parsing code has also been optimized and simplified.
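
The ordering of the checks can be illustrated with a small tokenizer sketch.
next_token_sketch() is a hypothetical helper, not the patched function; it only
shows the terminator being tested right after the leading spaces are skipped,
so trailing whitespace never leads past the end of the string.

```c
#include <stddef.h>

/*
 * Find the next space-delimited token starting at *cursor.
 * Returns a pointer to the token and stores its length in *len,
 * or returns NULL when only spaces (or nothing) remain.
 */
static const char *next_token_sketch(const char **cursor, size_t *len)
{
	const char *start = *cursor;
	const char *end;

	while (*start == ' ') {
		start++;
	}
	/* Check for the terminator before looking for the end of a token. */
	if (*start == '\0') {
		*cursor = start;
		return NULL;
	}
	end = start;
	while ((*end != ' ') && (*end != '\0')) {
		end++;
	}
	*len = (size_t)(end - start);
	*cursor = end;
	return start;
}
```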
Tracked-On: projectacrn#4250
Signed-off-by: Gary <gordon.king@intel.com>
We no longer use the INIT signal notification method. This patch
removes it.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
We have implemented a new notification method using NMI.
So replace the INIT notification method with the NMI one.
Then we can remove INIT notification related code later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
There is a window where we may miss the current request in the
notification period when the workflow is as follows:
   CPUx +                   + CPUr
        |                   |
        |                   +--+
        |                   |  | Handle pending req
        |                   <--+
        +--+                |
        |  | Set req flag   |
        <--+                |
        +------------------>---+
        |     Send NMI      |  | Handle NMI
        |                   <--+
        |                   |
        |                   |
        |                   +--> vCPU enter
        |                   |
        +                   +
So, this patch enables NMI-window exiting to trigger the next vmexit
as soon as there is no "virtual-NMI blocking" after the vCPU enters VMX non-root
mode. Then we can process the pending request in time.
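
The idea can be sketched as a pair of handlers. This is a simplified model
under stated assumptions, not the real vmexit code: struct vcpu_nmi_sketch and
both handler names are hypothetical, and proc_ctls is a plain variable standing
in for the primary processor-based VM-execution controls field in the VMCS,
where NMI-window exiting is bit 22.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROCBASED_NMI_WINDOW_EXITING  (1U << 22U)

struct vcpu_nmi_sketch {
	uint32_t proc_ctls;  /* stands in for the VMCS control field */
	bool req_pending;    /* request flag set by the notifying CPU */
	uint32_t handled;    /* number of requests processed so far */
};

/* On a notification NMI: ask for a vmexit right after the next VM entry. */
static void on_notification_nmi_sketch(struct vcpu_nmi_sketch *v)
{
	v->proc_ctls |= PROCBASED_NMI_WINDOW_EXITING;
}

/* On the NMI-window vmexit: disarm the control and process the request. */
static void on_nmi_window_exit_sketch(struct vcpu_nmi_sketch *v)
{
	v->proc_ctls &= ~PROCBASED_NMI_WINDOW_EXITING;
	if (v->req_pending) {
		v->req_pending = false;
		v->handled++;
	}
}
```

Even if the request flag is set after the pending requests were last
handled, the forced vmexit gives the vCPU another chance to see it.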
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
The NMI used for notification should not be injected into the guest. So,
this patch drops the NMI injection request when we use NMI
to notify vCPUs. Meanwhile, ACRN doesn't support vNMI well
and there is currently no well-designed way to check whether an NMI is
for notification or for the guest. So, we treat all NMIs as
notification NMIs for the hard RTVM temporarily. This means that the
hard RTVM will never receive an NMI with this patch applied.
TODO: vNMI support is not ready yet. We will add it later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
ACRN hypervisor needs to kick vCPU off VMX non-root mode to do some
operations in hypervisor, such as interrupt/exception injection, EPT
flush etc. For non lapic-pt vCPUs, we can use IPI to do so. But, it
doesn't work for lapic-pt vCPUs as the IPI will be injected to VMs
directly without vmexit.
Without a way to kick the vCPU off VMX non-root mode to handle pending
requests in time, fatal errors may be triggered.
1). Certain operations may not be carried out in time, which may further
lead to fatal errors. Taking the EPT flush request as an example, if we
don't flush the EPT in time and the guest accesses the out-of-date EPT,
a fatal error happens.
2). ACRN currently sends an IPI with vector 0xF0 to the target vCPU to kick it
off VMX non-root mode when it wants to do some operations on that vCPU.
However, this doesn't work for lapic-pt vCPUs. The IPI is delivered
to the guest directly without vmexit and the guest receives an unexpected
interrupt. Consequently, if the guest can't handle this interrupt properly,
a fatal error may happen.
The NMI can be used as the notification signal to kick the vCPU off VMX
non-root mode for lapic-pt vCPUs. So, this patch uses NMI as notification signal
to address the above issues for lapic-pt vCPUs.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Reserved bits in an 8-bit PAT field have already been checked in pat_mem_type_invalid.
Remove the redundant check "(PAT_FIELD_RSV_BITS & field) != 0UL" in
write_pat_msr.
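
To show why a single per-field check is enough, here is a minimal sketch. It
is not the actual pat_mem_type_invalid(); both function names are
hypothetical. It relies on the valid PAT memory type encodings being 0, 1, 4,
5, 6 and 7, so rejecting any value above 7 already covers the reserved upper
bits of each field.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an 8-bit PAT field is invalid: upper bits set, or encoding 2 or 3. */
static bool pat_field_invalid_sketch(uint64_t field)
{
	return ((field & ~(uint64_t)0x7U) != 0U) || ((field & 0x6U) == 0x2U);
}

/* Validate all eight fields of a 64-bit PAT MSR value. */
static bool pat_value_valid_sketch(uint64_t value)
{
	for (uint32_t i = 0U; i < 8U; i++) {
		uint64_t field = (value >> (i * 8U)) & 0xffU;

		if (pat_field_invalid_sketch(field)) {
			return false;
		}
	}
	return true;
}
```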
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>