-- move vm_state_lock to other place in vm structure
to avoid the memory waste because of the page-aligned.
-- remove the memset from create_vm
-- explicitly set max_emul_mmio_regions and vcpuid_entry_nr to 0
inside create_vm to avoid use without initialization.
-- rename max_emul_mmio_regions to nr_emul_mmio_regions
v1->v2:
add deinit_emul_io in shutdown_vm
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The make command is same as old configs layout:
under acrn-hypervisor folder:
make hypervisor BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]
under hypervisor folder:
make BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]
if BOARD/SCENARIO parameter is not specified, the default will be:
BOARD=nuc7i7dnb SCENARIO=industry
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There are 3 kinds of configurations in ACRN hypervisor source code: hypervisor
overall setting, per-board setting and scenario specific per-VM setting.
Currently Kconfig act as hypervisor overall setting and its souce is located at
"hypervisor/arch/x86/configs/$(BOARD).config"; Per-board configs are located at
"hypervisor/arch/x86/configs/$(BOARD)" folder; scenario specific per-VM configs
are located at "hypervisor/scenarios/$(SCENARIO)" folder.
This layout brings issues that board configs and VM configs are coupled tightly.
The board specific Kconfig file and misc_cfg.h are shared by all scenarios, and
scenario specific pci_dev.c is shared by all boards. So the user have no way to
build hypervisor binary for different scenario on different board with one
source code repo.
The patch will setup a new VM configurations layout as below:
misc/vm_configs
├── boards --> folder of supported boards
│ ├── <board_1> --> scenario-irrelevant board configs
│ │ ├── board.c --> C file of board configs
│ │ ├── board_info.h --> H file of board info
│ │ ├── pci_devices.h --> pBDF of PCI devices
│ │ └── platform_acpi_info.h --> native ACPI info
│ ├── <board_2>
│ ├── <board_3>
│ └── <board...>
└── scenarios --> folder of supported scenarios
├── <scenario_1> --> scenario specific VM configs
│ ├── <board_1> --> board specific VM configs for <scenario_1>
│ │ ├── <board_1>.config --> Kconfig for specific scenario on specific board
│ │ ├── misc_cfg.h --> H file of board specific VM configs
│ │ ├── pci_dev.c --> board specific VM pci devices list
│ │ └── vbar_base.h --> vBAR base info of VM PT pci devices
│ ├── <board_2>
│ ├── <board_3>
│ ├── <board...>
│ ├── vm_configurations.c --> C file of scenario specific VM configs
│ └── vm_configurations.h --> H file of scenario specific VM configs
├── <scenario_2>
├── <scenario_3>
└── <scenario...>
The new layout would decouple board configs and VM configs completely:
The boards folder stores kinds of supported boards info, each board folder
stores scenario-irrelevant board configs only, which could be totally got from
a physical platform and works for all scenarios;
The scenarios folder stores VM configs of kinds of working scenario. In each
scenario folder, besides the generic scenario specific VM configs, the board
specific VM configs would be put in a embedded board folder.
In new layout, all configs files will be removed out of hypervisor folder and
moved to a separate folder. This would make hypervisor LoC calculation more
precisely with below fomula:
typical LoC = Loc(hypervisor) + Loc(one vm_configs)
which
Loc(one vm_configs) = Loc(misc/vm_configs/boards/<board>)
+ LoC(misc/vm_configs/scenarios/<scenario>/<board>)
+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.c
+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.h
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To hide CET feature from guest VM completely, the MSR IA32_MSR_XSS also
need to be intercepted because it comprises CET_U and CET_S feature bits
of xsave/xstors operations. Mask these two bits in IA32_MSR_XSS writing.
With IA32_MSR_XSS interception, member 'xss' of 'struct ext_context' can
be removed because it is duplicated with the MSR store array
'vcpu->arch.guest_msrs[]'.
Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Return-oriented programming (ROP), and similarly CALL/JMP-oriented
programming (COP/JOP), have been the prevalent attack methodologies for
stealth exploit writers targeting vulnerabilities in programs.
CET (Control-flow Enforcement Technology) provides the following
capabilities to defend against ROP/COP/JOP style control-flow subversion
attacks:
* Shadow stack: Return address protection to defend against ROP.
* Indirect branch tracking: Free branch protection to defend against
COP/JOP
The full support of CET for Linux kernel has not been merged yet. As the
first stage, hide CET from guest VM.
Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
On WHL platform, we need to pass through TPM to Secure pre-launched VM. In order
to do this, we need to add TPM2 ACPI Table and add TPM DSDT ACPI table to include
the _CRS.
Now we only support the TPM 2.0 device (TPM 1.2 device is not support). Besides,
the TPM must use Start Method 7 (Uses the Command Response Buffer Interface)
to notify the TPM 2.0 device that a command is available for processing.
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add mmio device pass through support for pre-launched VM.
When we pass through a MMIO device to pre-launched VM, we would remove its
resource from the SOS. Now these resources only include the MMIO regions.
Tracked-On: #5053
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add two hypercalls to support MMIO device pass through for post-launched VM.
And when we support MMIO pass through for pre-launched VM, we could re-use
the code in mmio_dev.c
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
During context switch in hypervisor, xsave/xrstore are used to
save/resotre the XSAVE area according to the XCR0 and XSS. The legacy
region in XSAVE area include FPU and SSE, we should make sure the
legacy region be saved during contex switch. FPU in XCR0 is always
enabled according to SDM.
For SSE, we enable it in XCR0 during context switch.
Tracked-On: #5062
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
kick_thread function is only used by kick_vcpu to kick vcpu out of
non-root mode, the implementation in it is sending IPI to target CPU if
target obj is running and target PCPU is not current one; while for
runnable obj, it will just make reschedule request. So the kick_thread
is not actually belong to scheduler module, we can drop it and just do
the cpu notification in kick_vcpu.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu->running is duplicated with THREAD_STS_RUNNING status of thread
object. Introduce an API sleep_thread_sync(), which can utilize the
inner status of thread object, to do the sync sleep for zombie_vcpu().
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Update thread status after switch_in/switch_out.
2. Add 'be_blocking' to represent the intermediate state during
sleep_thread and switch_out. After switch_out, the thread status
update to THREAD_STS_BLOCKED.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- replace global hypercall lock with per-vm lock
-- add spinlock protection for vm & vcpu state change
v1-->v2:
change get_vm_lock/put_vm_lock parameter from vm_id to vm
move lock obtain before vm state check
move all lock from vmcall.c to hypercall.c
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Support hide SRIOV extend capability for passthough device
Tracked-On: #5041
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Some OSes assume the platform must have the IOAPIC. For example:
Linux Kernel allocates IRQ force from GSI (0 if there's no PIC and IOAPIC) on x86.
And it thinks IRQ 0 is an architecture special IRQ, not for device driver. As a
result, the device driver may goes wrong if the allocated IRQ is 0 for RTVM.
This patch expose vIOAPIC to RTVM with LAPIC passthru even though the RTVM can't
use IOAPIC, it servers as a place holder to fullfil the guest assumption.
After vIOAPIC has exposed to guest unconditionally, the 'ready' field could be
removed since we do vIOAPIC initialization for each guest.
Tracked-On: #4691
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- remove unnecessary lock in pci_mmcfg_read_cfg and
pci_mmcfg_write_cfg since the mmio operation is atomic
if the offest is aligned with 1/2/4 bytes.
-- move pci_is_valid_access to pci.h
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Check bit 48 in IA32_VMX_BASIC MSR, if it is 1, return error, as we only
support Intel 64 architecture.
SDM:
Appendix A.1 BASIC VMX INFORMATION
Bit 48 indicates the width of the physical addresses that may be used for the
VMXON region, each VMCS, anddata structures referenced by pointers in a
VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If
the bit is 0, these addresses are limited to the processor’s
physical-address width.2 If the bit is 1, these addresses are limited to
32 bits. This bit is always 0 for processors that support Intel 64
architecture.
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
hv: pci: refine pci_find_vdev with hash
1. Refined pci_find_vdev with BDF-hashing for better performance
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: refine pci_lookup_drhd_for_pbdf with hash
1. Added an auxiliary function pci_find_pdev using hash to find pdev
with pbdf, thus pci_lookup_drhd_for_pbdf will have a better performance
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: pci: rename pci_pdev_array to pci_pdevs to make it clearer
Tracked-On: #4857
Signed-off-by: Wang Qian <qian1.wang@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some passthrough devices require multiple MSI vectors, but don't
support MSI-X. In meanwhile, Linux kernel doesn't support continuous
vector allocation.
On native platform, this issue can be mitigated by IOMMU via interrupt
remapping. However, on ACRN, there is no vIOMMU.
vMSI-X on MSI emulation is one solution to mitigate this problem on ACRN.
This patch adds MSI-X emulation on MSI capability.
For the device needs to do MSI-X emulation, HV will hide MSI capability
and present MSI-X capability to guest.
The guest driver may need to modify to reqeust MSI-X vector.
For example:
ret = pci_alloc_irq_vectors(pdev, 1, STMMAC_MSI_VEC_MAX,
- PCI_IRQ_MSI);
+ PCI_IRQ_MSI | PCI_IRQ_MSIX);
To enable MSI-X emulation, the device should:
- 1. The device should be in vmsix_on_msi_devs array.
- 2. Support MSI, but don't support MSI-X.
- 3. MSI capability should support per-vector mask.
- 4. The device should have an unused BAR.
- 5. The device driver should not rely on PBA for functionality.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
dmar_reserve_irte is added to reserve N coutinuous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.
The reserved IRTEs will not be freed.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be resued;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.
idx_out:
This paramter return the actual index of IRTE used. The caller need to
check whether the return value is valid or not.
Also this patch adds an internal function alloc_irte.
The function takes count as input paramter to allocate continuous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There're some platforms still doesn't support 1GB large page on CPU side.
Such as lakefield, TNT and EHL platforms on which have some silicon bug and
this case CPU don't support 1GB large page.
This patch tries to release this constrain to support more hardware platform.
Note this patch doesn't release the constrain on IOMMU side.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The information needed to enable MSI-x emulation.
Only enable MSI-x emuation for the devices in msix_emul_devs array.
Currently, only EHL has the need to enable MSI-x emulation for TSN
devices.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the VM kernel bootargs for pre-launched VMs and direct boot mode
of SOS VM are built-in hypervisor binary so end users have no way to change
it. Now we provide another option that the multiboot module string could be
used as bootargs also. This would bring convenience to end users when they
use GRUB as bootloader because the bootargs could be configurable in GRUB
menu.
The usage is if there is any string follows configured kernel_mod_tag in
module string, the string will be used as new kernel bootargs instead of
built-in kernel bootargs. If there is no string follows kernel_mod_tag,
then the built-in bootargs will be the default kernel bootargs.
Please note kernel_mod_tag must be the first word in module string in any
case, it is used to specify the module for which VM.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Previously append_seed_arg() just do fill in seed arg to dest cmd buffer,
so rename the api name to fill_seed_arg().
Since fill_seed_arg() will be called in SOS VM path only, the param of
bool vm_is_sos is not needed and will be replaced by dest buffer size.
The seed_args[] which used by fill_seed_arg() is pre-defined as all-zero,
so memset() is not needed in fill_seed_arg(), buffer pointer check
and strncpy_s() are not needed also.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a standard string api strncat_s() to replace merge_cmdline() to make code
more readable.
Another change is that the multiboot cmdline will be appended to the end of
configured SOS bootargs instead of the beginning, this would enable a feature
that some kernel cmdline paramter items could be overriden by multiboot cmdline
since the later one would win if same parameters configured in kernel cmdline.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.4
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the strncpy_s function sets
s1[0] to '\0\;
3. The strncpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
4. The function is implemented with memcpy_s() because the runtime-constraint
detection is almost same.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.1
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the memcpy_s function stores
zeros in the first s1max characters of the object;
3. The memcpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We define some functions to read some fields of the CFG header registers. We
could remove them since they're not necessary since calling pci_pdev_read_cfg
is simple.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Now Host Bridge and PCI Bridge could only be added to SOS's acrn_vm_pci_dev_config.
So For UOS, we always emualte Host Bridge and PCI Bridge for it and assign PCI device
to it; for SOS, if it's the highest severity VM, we will assign Host Bridge and PCI
Bridge to it directly, otherwise, we will emulate them same as UOS.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According PCI Code and ID Assignment Specification Revision 1.11, a PCI device
whose Base Class is 06h and Sub-Class is 00h is a Host bridge.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We should check whether a PCI device is host bridge or not by Base Class (06h)
and Sub-Class (00h).
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As per the BWG a delay should be provided between the
INIT IPI and Startup IPI. Without the delay observe hangs
on certain platforms during MP Init sequence. So Setting
a delay of 10us between assert INIT IPI and Startup IPI.
Also, as per SDM section 10.7 the the de-assert INIT IPI is
only used for Pentium and P6 processors. This is not applicable
for Pentium4 and Xeon processors so removing this sequence.
Tracked-On: #4835
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Snoop control will not be turned on by hypervisor, delete snoop control
related code.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Invalidate cache by scanning and flushing the whole guest memory is
inefficient which might cause long execution time for WBINVD emulation.
A long execution in hypervisor might cause a vCPU stuck phenomenon what
impact Windows Guest booting.
This patch introduce a workaround method that pausing all other vCPUs in
the same VM when do wbinvd emulation.
Tracked-On: #4703
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Refine find_ptirq_entry by hashing instead of walk each of the PTIRQ entries one by one.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
This patch adds hash function to hash 64bit value.
Tracked-On: #4550
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
RTVM (with lapic PT) boots hang when maxcpus is
assigned a value less than the CPU number configured
in hypervisor.
In this case, vlapic_state(per VM) is left in TRANSITION
state after BSP boot, which blocks interupts to be injected
to this UOS.
Tracked-On: #4803
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There's no need to look up MSI ptirq entry by virtual SID any more since the MSI
ptirq entry would be removed before the device is assigned to a VM.
Now the logic of MSI interrupt remap could simplify as:
1. Add the MSI interrupt remap first;
2. If step is already done, just do the remap part.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
The existing code do separately for each VM when we deinit vpci of a VM. This is
not necessary. This patch use the common handling for all VMs: we first deassign
it from the (current) user, then give it back to its parent user.
When we deassign the vdev from the (current) user, we would de-initialize the
vMSI/VMSI-X remapping, so does the vMSI/vMSI-X data structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>