From the VT-d spec 8.3:
If a DRHD structure with INCLUDE_PCI_ALL flag Set is reported for a
Segment, it must be enumerated by BIOS after all other DRHD structures
for the same Segment.
However, some broken BIOS violate the rules. To bring up ACRN with them,
change the ASSERT to a permissive check to unblock the BIOS limitation.
Also, scan the DRHD list to find the one who has INCLUDE_PCI_ALL flag.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Replace dmar_iterate_tbl() by a direct for loop. Handle the
dmar_unit_cnt and handle_one_drhd() of each DRHD in the direct for loop.
Also tune some function definitions to save LOC.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
According to SDM 10.12.11, we can know this register is dedicated to the
purpose of sending self-IPIs with the intent of enabling a highly
optimized path for sending self-IPIs. Also sending the IPI via the Self
Interrupt Register ensures that interrupt is delivered to the processor
core. Specifically completion of the WRMSR instruction to the SELF IPI
register implies that the interrupt has been logged into the IRR.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently, not all platforms support posted interrupt processing of both
VT-x and VT-d. On EHL, VT-d doesn't support posted interrupt processing.
So in such scenario, is_pi_capable() in vcpu_handle_pi_notification()
will bypass the PIR pending bits check which might cause a self-NV-IPI
lost.
With commit "bf1ff8c98 (hv: Offload syncing PIR to vIRR to processor
hardware)", the syncing PIR to vIRR is postponed and it is handled by a
self-NV-IPI in the following VMEnter. The process looks like,
a) vcpu A accepts a virtual interrupt ->
1) ACRN_REQUEST_EVENT is set
2) corresponding bit in PIR is set
3) Posted Interrupt ON bit is set
b) vcpu A does virtual interrupt injection on resume path due to
the pending ACRN_REQUEST_EVENT ->
1) hypervisor disables host interrupt
2) ACRN_REQUEST_EVENT is cleared
3) a self-NV-IPI is sent via ICR of LAPIC.
4) IRR bit of the self-NV-IPI is set
c) (VM-ENTRY) vcpu A returns into non-root mode
1) host interrupt enable(by HW)
2) posted interrupt processing clears the ON bit, sync PIR to vIRR
3) deliver the virtual interrupt if guest rflags.IF=1
d) (VM-EXIT) vcpu A traps due to a instruction execution (e.g. HLT)
1) host interrupt disable(by HW)
2) hypervisor enable host interrupt
Above illustrates a normal process of the virtual interrupt injection
with cpu PI support. However, a failing case is observed. The failing
case is that the self-NV-IPI from b-3 is not accepted by the core until
a timing between d-1 and d-2. b-4 happening between d-1 and d-2 is
observed by debug trace. So the self-NV-IPI will be handled in root-mode
which cannot do the syncing PIR to vIRR processing. Due to the bug
described in the first paragraph, vcpu_handle_pi_notification() cannot
succeed the virtual interrupt injection request. This patch fix it by
removing the wrong check in vcpu_handle_pi_notification() because
vcpu_handle_pi_notification() only happens on platform with cpu PI
support.
Here are some cost data for sending IPI via LAPIC ICR regsiter.
Normally, the cycles between ICR write and IRR got set is 140~260,
which is not accurate due to the MSR read overhead.
And from b-3 to c is about 560 cycles. So b-4 happens during this
period. But in bad case, b-4 doesn't happen even c is triggered.
The worse case i captured is that ICR write and IRR got set costs more
than 1900 cycles. Now, the best GUESS of the huge cost of IPI via ICR is
the ACPI bus arbitration(refer to SDM 10.6.3, 10.7 and Figure 10-17).
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We hit following build error when using gcc10:
arch/x86/page.c:240:48: error: array subscript is outside
array bounds of 'struct page[0][1]' [-Werror=array-bounds]
It happens with gcc10 on different Linux distributions.
Regarding the case that ACRN depends on zero length array in
sevaral places, we disable the zero length array warning by
gcc option.
Tracked-On: #4810
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Wrap a function to do guest ept flush. This function doesn't do real EPT flush.
It just make the EPT flush request and do the real flush just before vcpu vmenter.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
-- remove unnecessary lock in pci_mmcfg_read_cfg and
pci_mmcfg_write_cfg since the mmio operation is atomic
if the offest is aligned with 1/2/4 bytes.
-- move pci_is_valid_access to pci.h
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
remove spin lock for micro code update since the guest
operating system will take lock
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
The commit 'HV: Config Splitlock Detection to be disable' allows
using CONFIG_ENFORCE_TURNOFF_AC to turn off splitlock #AC. If
CONFIG_ENFORCE_TURNOFF_AC is not set, splitlock #AC should be turn on
Tracked-On: #4962
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Check bit 48 in IA32_VMX_BASIC MSR, if it is 1, return error, as we only
support Intel 64 architecture.
SDM:
Appendix A.1 BASIC VMX INFORMATION
Bit 48 indicates the width of the physical addresses that may be used for the
VMXON region, each VMCS, anddata structures referenced by pointers in a
VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If
the bit is 0, these addresses are limited to the processor’s
physical-address width.2 If the bit is 1, these addresses are limited to
32 bits. This bit is always 0 for processors that support Intel 64
architecture.
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
We always assume the physical platform has XSAVE, and we always enable
XSAVE at the beginning, so, no need to check the OXSAVE in host.
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
As build variants for different board and different scenario growing, users
might make mistake on HV binary distributions. Checking board/scenario info
from log would be the fastest way to know whether the binary matches. Also
it would be of benifit to developers for confirming the correct binary they
are debugging.
Tracked-On: #4946
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: refine pci_find_vdev with hash
1. Refined pci_find_vdev with BDF-hashing for better performance
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: refine pci_lookup_drhd_for_pbdf with hash
1. Added an auxiliary function pci_find_pdev using hash to find pdev
with pbdf, thus pci_lookup_drhd_for_pbdf will have a better performance
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: rename pci_pdev_array to pci_pdevs to make it clearer
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some passthrough devices require multiple MSI vectors, but don't
support MSI-X. In meanwhile, Linux kernel doesn't support continuous
vector allocation.
On native platform, this issue can be mitigated by IOMMU via interrupt
remapping. However, on ACRN, there is no vIOMMU.
vMSI-X on MSI emulation is one solution to mitigate this problem on ACRN.
This patch adds MSI-X emulation on MSI capability.
For the device needs to do MSI-X emulation, HV will hide MSI capability
and present MSI-X capability to guest.
The guest driver may need to modify to reqeust MSI-X vector.
For example:
ret = pci_alloc_irq_vectors(pdev, 1, STMMAC_MSI_VEC_MAX,
- PCI_IRQ_MSI);
+ PCI_IRQ_MSI | PCI_IRQ_MSIX);
To enable MSI-X emulation, the device should:
- 1. The device should be in vmsix_on_msi_devs array.
- 2. Support MSI, but don't support MSI-X.
- 3. MSI capability should support per-vector mask.
- 4. The device should have an unused BAR.
- 5. The device driver should not rely on PBA for functionality.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
dmar_reserve_irte is added to reserve N coutinuous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.
The reserved IRTEs will not be freed.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be resued;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.
idx_out:
This paramter return the actual index of IRTE used. The caller need to
check whether the return value is valid or not.
Also this patch adds an internal function alloc_irte.
The function takes count as input paramter to allocate continuous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Script only append 'U' for the config of int with a range.
Add a range to MAX_IR_ENTRIES.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
There're some platforms still doesn't support 1GB large page on CPU side.
Such as lakefield, TNT and EHL platforms on which have some silicon bug and
this case CPU don't support 1GB large page.
This patch tries to release this constrain to support more hardware platform.
Note this patch doesn't release the constrain on IOMMU side.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This patch tries to release hardware platform 1GB large page support constrain
on CPU side.
There're some silicon bug on lakefield, TNT and EHL platforms which cause CPU
couldn't support 1GB large page. As a result, the pre-assumption The platform
which ACRN supports must support 1GB large page on both CPU side and VTD side
is not true any more.
This reverts commit f01aad7e to let trampoline execution use 2MB pages.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The information needed to enable MSI-x emulation.
Only enable MSI-x emuation for the devices in msix_emul_devs array.
Currently, only EHL has the need to enable MSI-x emulation for TSN
devices.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The acrn_mbi.mi_mmap_va should point to struct multiboot2_mmap_entry when
boot from multiboot2, which is different from struct multiboot_mmap when
boot from multiboot1. So we should handle mmap info separately for multiboot2.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Previously the VM kernel bootargs for pre-launched VMs and direct boot mode
of SOS VM are built-in hypervisor binary so end users have no way to change
it. Now we provide another option that the multiboot module string could be
used as bootargs also. This would bring convenience to end users when they
use GRUB as bootloader because the bootargs could be configurable in GRUB
menu.
The usage is if there is any string follows configured kernel_mod_tag in
module string, the string will be used as new kernel bootargs instead of
built-in kernel bootargs. If there is no string follows kernel_mod_tag,
then the built-in bootargs will be the default kernel bootargs.
Please note kernel_mod_tag must be the first word in module string in any
case, it is used to specify the module for which VM.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Previously append_seed_arg() just do fill in seed arg to dest cmd buffer,
so rename the api name to fill_seed_arg().
Since fill_seed_arg() will be called in SOS VM path only, the param of
bool vm_is_sos is not needed and will be replaced by dest buffer size.
The seed_args[] which used by fill_seed_arg() is pre-defined as all-zero,
so memset() is not needed in fill_seed_arg(), buffer pointer check
and strncpy_s() are not needed also.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a standard string api strncat_s() to replace merge_cmdline() to make code
more readable.
Another change is that the multiboot cmdline will be appended to the end of
configured SOS bootargs instead of the beginning, this would enable a feature
that some kernel cmdline paramter items could be overriden by multiboot cmdline
since the later one would win if same parameters configured in kernel cmdline.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.4
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the strncpy_s function sets
s1[0] to '\0\;
3. The strncpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
4. The function is implemented with memcpy_s() because the runtime-constraint
detection is almost same.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.1
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the memcpy_s function stores
zeros in the first s1max characters of the object;
3. The memcpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The multiboot2 cmdline would be used as hypervisor cmdline, add parse logic
for the case that hypervisor boot from multiboot2 protocol.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously sanitize_multiboot_info() was called after init_debug_pre() because
the debug message can only print after uart is initialized. On the other hand,
multiboot cmdline need to be parsed before init_debug_pre() because the cmdline
could override uart settings and make sure debug message printed successfully.
This cause multiboot info was parsed in two stages.
The patch revise the multiboot parse logic that split sanitize_multiboot_info()
api and use init_acrn_multiboot_info() api for the early stage. The most of
multiboot info will be initialized during this stage and no debug message need
to be printed. After uart is initialized, the sanitize_multiboot_info() would
do sanitize multiboot info and print needed debug messages.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We define some functions to read some fields of the CFG header registers. We
could remove them since they're not necessary since calling pci_pdev_read_cfg
is simple.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Don't hardcode install paths. Instead of hardcoding where binaries are
installed, add variables that installer can override.
Tracked-On: #4864
Signed-off-by: Chee Yang Lee <chee.yang.lee@intel.com>
Signed-off-by: Naveen Saini <naveen.kumar.saini@intel.com>
The $(VERSION) should be depended on config.h change. For example, when RELEASE
parameter is changed in make commmand, CONFIG_RELEASE need to be updated in
defconfig file, and then message in version.h should be updated.
The patch also fix a bug that a code path in make defconfig never be triggered
because shell will treat [ ! -f $(KCONFIG_FILE) ] as false when $(KCONFIG_FILE)
is not specified. (i.e. "$(KCONFIG_FILE)" == "")
Tracked-On: #2412
Signed-off-by: Victor Sun <victor.sun@intel.com>
Update efi bootloader image file path for Yocto rootfs in Kconfig.
Tracked-On: #4868
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Now Host Bridge and PCI Bridge could only be added to SOS's acrn_vm_pci_dev_config.
So For UOS, we always emualte Host Bridge and PCI Bridge for it and assign PCI device
to it; for SOS, if it's the highest severity VM, we will assign Host Bridge and PCI
Bridge to it directly, otherwise, we will emulate them same as UOS.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According PCI Code and ID Assignment Specification Revision 1.11, a PCI device
whose Base Class is 06h and Sub-Class is 00h is a Host bridge.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We should check whether a PCI device is host bridge or not by Base Class (06h)
and Sub-Class (00h).
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Guest may write a MSI-X capability register with only RW bits setting on. This works
well on native since the hardware will make sure RO register bits could not over-write.
However, the software needs more efforts to achieve this. This patch does this by
defining a RW permission mapping base on bits. When a guest tries to write a MSI-X
Capability register, only modify the RW bits on vCFG space.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Guest may write a MSI capability register with only RW bits setting on. This works
well on native since the hardware will make sure RO register bits could not over-write.
However, the software needs more efforts to achieve this. This patch does this by
defining a RW permission mapping base on bits. When a guest tries to write a MSI
Capability register, only modify the RW bits on vCFG space.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
As per the BWG a delay should be provided between the
INIT IPI and Startup IPI. Without the delay observe hangs
on certain platforms during MP Init sequence. So Setting
a delay of 10us between assert INIT IPI and Startup IPI.
Also, as per SDM section 10.7 the the de-assert INIT IPI is
only used for Pentium and P6 processors. This is not applicable
for Pentium4 and Xeon processors so removing this sequence.
Tracked-On: #4835
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>