Commit Graph

1973 Commits

Author SHA1 Message Date
Victor Sun
e792fa3d3c HV: nuc7i7dnb example of new VM configuratons layout
There are 3 kinds of configurations in ACRN hypervisor source code: hypervisor
overall setting, per-board setting and scenario specific per-VM setting.
Currently Kconfig act as hypervisor overall setting and its souce is located at
"hypervisor/arch/x86/configs/$(BOARD).config"; Per-board configs are located at
"hypervisor/arch/x86/configs/$(BOARD)" folder; scenario specific per-VM configs
are located at "hypervisor/scenarios/$(SCENARIO)" folder.

This layout brings issues that board configs and VM configs are coupled tightly.
The board specific Kconfig file and misc_cfg.h are shared by all scenarios, and
scenario specific pci_dev.c is shared by all boards. So the user have no way to
build hypervisor binary for different scenario on different board with one
source code repo.

The patch will setup a new VM configurations layout as below:

  misc/vm_configs
  ├── boards                         --> folder of supported boards
  │   ├── <board_1>                  --> scenario-irrelevant board configs
  │   │   ├── board.c                --> C file of board configs
  │   │   ├── board_info.h           --> H file of board info
  │   │   ├── pci_devices.h          --> pBDF of PCI devices
  │   │   └── platform_acpi_info.h   --> native ACPI info
  │   ├── <board_2>
  │   ├── <board_3>
  │   └── <board...>
  └── scenarios                      --> folder of supported scenarios
      ├── <scenario_1>               --> scenario specific VM configs
      │   ├── <board_1>              --> board specific VM configs for <scenario_1>
      │   │   ├── <board_1>.config   --> Kconfig for specific scenario on specific board
      │   │   ├── misc_cfg.h         --> H file of board specific VM configs
      │   │   ├── pci_dev.c          --> board specific VM pci devices list
      │   │   └── vbar_base.h        --> vBAR base info of VM PT pci devices
      │   ├── <board_2>
      │   ├── <board_3>
      │   ├── <board...>
      │   ├── vm_configurations.c    --> C file of scenario specific VM configs
      │   └── vm_configurations.h    --> H file of scenario specific VM configs
      ├── <scenario_2>
      ├── <scenario_3>
      └── <scenario...>

The new layout would decouple board configs and VM configs completely:

The boards folder stores kinds of supported boards info, each board folder
stores scenario-irrelevant board configs only, which could be totally got from
a physical platform and works for all scenarios;

The scenarios folder stores VM configs of kinds of working scenario. In each
scenario folder, besides the generic scenario specific VM configs, the board
specific VM configs would be put in a embedded board folder.

In new layout, all configs files will be removed out of hypervisor folder and
moved to a separate folder. This would make hypervisor LoC calculation more
precisely with below fomula:
	typical LoC = Loc(hypervisor) + Loc(one vm_configs)
which
	Loc(one vm_configs) = Loc(misc/vm_configs/boards/<board>)
		+ LoC(misc/vm_configs/scenarios/<scenario>/<board>)
		+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.c
		+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.h

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-24 16:16:06 +08:00
Shuo A Liu
112f02851c hv: Disable XSAVE-managed CET state of guest VM
To hide CET feature from guest VM completely, the MSR IA32_MSR_XSS also
need to be intercepted because it comprises CET_U and CET_S feature bits
of xsave/xstors operations. Mask these two bits in IA32_MSR_XSS writing.

With IA32_MSR_XSS interception, member 'xss' of 'struct ext_context' can
be removed because it is duplicated with the MSR store array
'vcpu->arch.guest_msrs[]'.

Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2020-07-23 20:15:57 +08:00
Shuo A Liu
ac598b0856 hv: Hide CET feature from guest VM
Return-oriented programming (ROP), and similarly CALL/JMP-oriented
programming (COP/JOP), have been the prevalent attack methodologies for
stealth exploit writers targeting vulnerabilities in programs.

CET (Control-flow Enforcement Technology) provides the following
capabilities to defend against ROP/COP/JOP style control-flow subversion
attacks:
 * Shadow stack: Return address protection to defend against ROP.
 * Indirect branch tracking: Free branch protection to defend against
   COP/JOP

The full support of CET for Linux kernel has not been merged yet. As the
first stage, hide CET from guest VM.

Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2020-07-23 20:15:57 +08:00
Li Fei1
5e605e0daf hv: vmcall: check vm id in dispatch_sos_hypercall
Check whether vm_id is valid in dispatch_sos_hypercall

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
1859727abc hv: vapci: add tpm2 support for pre-launched vm
On WHL platform, we need to pass through TPM to Secure pre-launched VM. In order
to do this, we need to add TPM2 ACPI Table and add TPM DSDT ACPI table to include
the _CRS.

Now we only support the TPM 2.0 device (TPM 1.2 device is not support). Besides,
the TPM must use Start Method 7 (Uses the Command Response Buffer Interface)
to notify the TPM 2.0 device that a command is available for processing.

Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
7971f34344 hv: vapci: refine acpi table header initialization
Using ACPI_TABLE_HEADER MACRO to initial the ACPI Table Header.

Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
acc69007e2 hv: mmio_dev: add mmio device pass through support
Add mmio device pass through support for pre-launched VM.
When we pass through a MMIO device to pre-launched VM, we would remove its
resource from the SOS. Now these resources only include the MMIO regions.

Tracked-On: #5053
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
baf77a79ad hv: mmio_dev: add hypercall to support mmio device pass through
Add two hypercalls to support MMIO device pass through for post-launched VM.
And when we support MMIO pass through for pre-launched VM, we could re-use
the code in mmio_dev.c

Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Conghui Chen
821c65b40c hv: fix possible SSE region mismatch issue
During context switch in hypervisor, xsave/xrstore are used to
save/resotre the XSAVE area according to the XCR0 and XSS. The legacy
region in XSAVE area include FPU and SSE, we should make sure the
legacy region be saved during contex switch. FPU in XCR0 is always
enabled according to SDM.
For SSE, we enable it in XCR0 during context switch.

Tracked-On: #5062
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-22 14:19:21 +08:00
Conghui Chen
53d4a7169b hv: remove kick_thread from scheduler module
kick_thread function is only used by kick_vcpu to kick vcpu out of
non-root mode, the implementation in it is sending IPI to target CPU if
target obj is running and target PCPU is not current one; while for
runnable obj, it will just make reschedule request. So the kick_thread
is not actually belong to scheduler module, we can drop it and just do
the cpu notification in kick_vcpu.

Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-22 13:38:41 +08:00
Conghui Chen
b6422f8985 hv: remove 'running' from vcpu structure
vcpu->running is duplicated with THREAD_STS_RUNNING status of thread
object. Introduce an API sleep_thread_sync(), which can utilize the
inner status of thread object, to do the sync sleep for zombie_vcpu().

Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-22 13:38:41 +08:00
Mingqiang Chi
aa89eb3541 hv:add per-vm lock for vm & vcpu state change
-- replace global hypercall lock with per-vm lock
-- add spinlock protection for vm & vcpu state change

v1-->v2:
   change get_vm_lock/put_vm_lock parameter from vm_id to vm
   move lock obtain before vm state check
   move all lock from vmcall.c to hypercall.c

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-20 11:22:17 +08:00
Yin Fengwei
fcec5a94be kconfig: extend the max msix table number to 64
There are some devices (like Samsung NVMe SSD SM981/PM981 which has 33 MSIX tables)
which have more than 16 MSIX tables. Extend the default value to 64 to handle them.

Tracked-On: #4994
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
2020-07-10 19:39:11 +08:00
Li Fei1
80c7da8f1c hv: vioapic: expose ioapic to guest unconditionally
Some OSes assume the platform must have the IOAPIC. For example:
Linux Kernel allocates IRQ force from GSI (0 if there's no PIC and IOAPIC) on x86.
And it thinks IRQ 0 is an architecture special IRQ, not for device driver. As a
result, the device driver may goes wrong if the allocated IRQ is 0 for RTVM.

This patch expose vIOAPIC to RTVM with LAPIC passthru even though the RTVM can't
use IOAPIC, it servers as a place holder to fullfil the guest assumption.

After vIOAPIC has exposed to guest unconditionally, the 'ready' field could be
removed since we do vIOAPIC initialization for each guest.

Tracked-On: #4691
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-10 19:33:46 +08:00
Mingqiang Chi
7751c7933d hv:unify spin_lock initialization
will follow this convention for spin lock initialization:
-- for simple global variable locks, use this style:
   static spinlock_t xxx_spinlock = {.head = 0U, .tail = 0U,}
-- for the locks inside a data structure, need to call
   spinlock_init to initialize.

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-07-02 09:40:29 +08:00
Shuo A Liu
025df6d44c hv: use SELF IPI Register for self IPI in X2APIC mode
According to SDM 10.12.11, we can know this register is dedicated to the
purpose of sending self-IPIs with the intent of enabling a highly
optimized path for sending self-IPIs. Also sending the IPI via the Self
Interrupt Register ensures that interrupt is delivered to the processor
core. Specifically completion of the WRMSR instruction to the SELF IPI
register implies that the interrupt has been logged into the IRR.

Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-28 10:33:22 +08:00
Shuo A Liu
0397cb7174 hv: Fix the interrupts lost issue with PI support
Currently, not all platforms support posted interrupt processing of both
VT-x and VT-d. On EHL, VT-d doesn't support posted interrupt processing.
So in such scenario, is_pi_capable() in vcpu_handle_pi_notification()
will bypass the PIR pending bits check which might cause a self-NV-IPI
lost.

With commit "bf1ff8c98 (hv: Offload syncing PIR to vIRR to processor
hardware)", the syncing PIR to vIRR is postponed and it is handled by a
self-NV-IPI in the following VMEnter. The process looks like,
a) vcpu A accepts a virtual interrupt ->
   1) ACRN_REQUEST_EVENT is set
   2) corresponding bit in PIR is set
   3) Posted Interrupt ON bit is set
b) vcpu A does virtual interrupt injection on resume path due to
   the pending ACRN_REQUEST_EVENT ->
   1) hypervisor disables host interrupt
   2) ACRN_REQUEST_EVENT is cleared
   3) a self-NV-IPI is sent via ICR of LAPIC.
   4) IRR bit of the self-NV-IPI is set
c) (VM-ENTRY) vcpu A returns into non-root mode
   1) host interrupt enable(by HW)
   2) posted interrupt processing clears the ON bit, sync PIR to vIRR
   3) deliver the virtual interrupt if guest rflags.IF=1
d) (VM-EXIT) vcpu A traps due to a instruction execution (e.g. HLT)
   1) host interrupt disable(by HW)
   2) hypervisor enable host interrupt

Above illustrates a normal process of the virtual interrupt injection
with cpu PI support. However, a failing case is observed. The failing
case is that the self-NV-IPI from b-3 is not accepted by the core until
a timing between d-1 and d-2. b-4 happening between d-1 and d-2 is
observed by debug trace. So the self-NV-IPI will be handled in root-mode
which cannot do the syncing PIR to vIRR processing. Due to the bug
described in the first paragraph, vcpu_handle_pi_notification() cannot
succeed the virtual interrupt injection request. This patch fix it by
removing the wrong check in vcpu_handle_pi_notification() because
vcpu_handle_pi_notification() only happens on platform with cpu PI
support.

Here are some cost data for sending IPI via LAPIC ICR regsiter.
Normally, the cycles between ICR write and IRR got set is 140~260,
which is not accurate due to the MSR read overhead.
And from b-3 to c is about 560 cycles. So b-4 happens during this
period. But in bad case, b-4 doesn't happen even c is triggered.
The worse case i captured is that ICR write and IRR got set costs more
than 1900 cycles. Now, the best GUESS of the huge cost of IPI via ICR is
the ACPI bus arbitration(refer to SDM 10.6.3, 10.7 and Figure 10-17).

Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-28 10:33:22 +08:00
Li Fei1
da7c2ba3e9 hv: ept: wrap a function to do guest ept flush
Wrap a function to do guest ept flush. This function doesn't do real EPT flush.
It just make the EPT flush request and do the real flush just before vcpu vmenter.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-06-22 16:25:03 +08:00
Mingqiang Chi
1b84741a56 rename vm_lock/vlapic_state in VM structure
rename:
   vlapic_state-->vlapic_mode
   vm_lock -->  vlapic_mode_lock
   check_vm_vlapic_state --> check_vm_vlapic_mode

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
Mingqiang Chi
d808031a04 remove spin lock for micro code update
remove spin lock for micro code update since the guest
operating system will take lock

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
Mingqiang Chi
ac65898f35 cleanup spin lock in vtd.c
move dm_unit->lock into dmar_issue_qi_request

Tracked-On: #4958

Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
Mingqiang Chi
67a7c355ec cleanup spin lock in irq.c
-- move exception_lock to dump.c
-- optimize the lock usage in request_irq

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
yuhong.tao@intel.com
aff687896b HV: Fix split-locked access detection is disabled by default
The commit 'HV: Config Splitlock Detection to be disable' allows
using CONFIG_ENFORCE_TURNOFF_AC to turn off splitlock #AC. If
CONFIG_ENFORCE_TURNOFF_AC is not set, splitlock #AC should be turn on

Tracked-On: #4962
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2020-06-19 09:22:58 +08:00
Conghui Chen
2a4c59db74 hv: add check for BASIC VMX INFORMATION
Check bit 48 in IA32_VMX_BASIC MSR, if it is 1, return error, as we only
support Intel 64 architecture.

SDM:
Appendix A.1 BASIC VMX INFORMATION

Bit 48 indicates the width of the physical addresses that may be used for the
VMXON region, each VMCS, anddata structures referenced by pointers in a
VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If
the bit is 0, these addresses are limited to the processor’s
physical-address width.2 If the bit is 1, these addresses are limited to
32 bits. This bit is always 0 for processors that support Intel 64
architecture.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Conghui Chen
906284eec8 hv: remove unnecessary debug symbols
remove unnecessary debug symbols.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Conghui Chen
f4292752b0 hv: remove check for OSXSAVE in host
We always assume the physical platform has XSAVE, and we always enable
XSAVE at the beginning, so, no need to check the OXSAVE in host.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Conghui Chen
53f74f18ac hv: remove repeated assignment
remove repeated assignment for vmcs_pa.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Victor Sun
ca9e98cc74 HV: add board and scenario info in log
As build variants for different board and different scenario growing, users
might make mistake on HV binary distributions. Checking board/scenario info
from log would be the fastest way to know whether the binary matches. Also
it would be of benifit to developers for confirming the correct binary they
are debugging.

Tracked-On: #4946

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-18 13:05:42 +08:00
Binbin Wu
da1788c9a3 hv: vtd: add an API to reserve continuous irtes
dmar_reserve_irte is added to reserve N coutinuous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.

The reserved IRTEs will not be freed.

Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-16 08:52:56 +08:00
Binbin Wu
7bfcc673a6 hv: ptirq: associate an irte with ptirq_remapping_info entry
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.

Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-16 08:52:56 +08:00
Binbin Wu
2fe4280cfa hv: vtd: add two paramters for dmar_assign_irte
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be resued;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.

idx_out:
This paramter return the actual index of IRTE used. The caller need to
check whether the return value is valid or not.

Also this patch adds an internal function alloc_irte.
The function takes count as input paramter to allocate continuous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-16 08:52:56 +08:00
Binbin Wu
9c52b353fa hv: kconfig: add a range for MAX_IR_ENTRIES
Script only append 'U' for the config of int with a range.
Add a range to MAX_IR_ENTRIES.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
2020-06-16 08:52:56 +08:00
Li Fei1
65e4a16e6a hv: mmu: release 1GB cpu side support constrain
There're some platforms still doesn't support 1GB large page on CPU side.
Such as lakefield, TNT and EHL platforms on which have some silicon bug and
this case CPU don't support 1GB large page.

This patch tries to release this constrain to support more hardware platform.

Note this patch doesn't release the constrain on IOMMU side.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-06-15 15:16:34 +08:00
Li Fei1
6e57553015 Revert "hv: Let trampoline execution use 1GB pages"
This patch tries to release hardware platform 1GB large page support constrain
on CPU side.

There're some silicon bug on lakefield, TNT and EHL platforms which cause CPU
couldn't support 1GB large page. As a result, the pre-assumption The platform
which ACRN supports must support 1GB large page on both CPU side and VTD side
is not true any more.

This reverts commit f01aad7e to let trampoline execution use 2MB pages.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-06-15 15:16:34 +08:00
Binbin Wu
c907a820df hv: config: add msix emulation support
The information needed to enable MSI-x emulation.
Only enable MSI-x emuation for the devices in msix_emul_devs array.
Currently, only EHL has the need to enable MSI-x emulation for TSN
devices.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-10 14:32:15 +08:00
Victor Sun
80262f0602 HV: rename append_seed_arg to fill_seed_arg
Previously append_seed_arg() just do fill in seed arg to dest cmd buffer,
so rename the api name to fill_seed_arg().

Since fill_seed_arg() will be called in SOS VM path only, the param of
bool vm_is_sos is not needed and will be replaced by dest buffer size.

The seed_args[] which used by fill_seed_arg() is pre-defined as all-zero,
so memset() is not needed in fill_seed_arg(), buffer pointer check
and strncpy_s() are not needed also.

Tracked-On: #4885

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-08 13:30:04 +08:00
Victor Sun
e254be150a HV: rewrite memcpy_s to be iso c11 compliant
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.1

1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the memcpy_s function stores
   zeros in the first s1max characters of the object;
3. The memcpy_s function returns zero if there was no runtime-constraint
   violation. Otherwise, a nonzero value is returned.

Tracked-On: #4885

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-08 13:30:04 +08:00
Victor Sun
c74b1941a0 HV: split sanitize_multiboot_info api
Previously sanitize_multiboot_info() was called after init_debug_pre() because
the debug message can only print after uart is initialized. On the other hand,
multiboot cmdline need to be parsed before init_debug_pre() because the cmdline
could override uart settings and make sure debug message printed successfully.
This cause multiboot info was parsed in two stages.

The patch revise the multiboot parse logic that split sanitize_multiboot_info()
api and use init_acrn_multiboot_info() api for the early stage. The most of
multiboot info will be initialized during this stage and no debug message need
to be printed. After uart is initialized, the sanitize_multiboot_info() would
do sanitize multiboot info and print needed debug messages.

Tracked-On: #4885

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-08 13:30:04 +08:00
Binbin Wu
65ec6f3f3b hv: vtd: fix potential dead loop if qi request timeout
Fix potential dead loop if qi request timeout.

Tracked-On: #4680
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
2020-06-05 05:31:16 +08:00
Wei Liu
fbb1dfa264 HV/Kconfig: update efi bootloader image file path for Kconfig
Update efi bootloader image file path for Yocto rootfs in Kconfig.

Tracked-On: #4868
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
2020-06-03 22:02:58 +08:00
Li Fei1
ae4fa40adc hv: vpci: hv: vpci: refine pci device assignment logic
Now Host Bridge and PCI Bridge could only be added to SOS's acrn_vm_pci_dev_config.
So For UOS, we always emualte Host Bridge and PCI Bridge for it and assign PCI device
to it; for SOS, if it's the highest severity VM, we will assign Host Bridge and PCI
Bridge to it directly, otherwise, we will emulate them same as UOS.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-03 22:00:43 +08:00
Li Fei1
b8f151a55f hv: pci: check whether a PCI device is host bridge or not by class
According PCI Code and ID Assignment Specification Revision 1.11, a PCI device
whose Base Class is 06h and Sub-Class is 00h is a Host bridge.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-03 22:00:43 +08:00
Li Fei1
0bd2daf1c5 hv: pci: remove host bridge BDF definition
We should check whether a PCI device is host bridge or not by Base Class (06h)
and Sub-Class (00h).

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-03 22:00:43 +08:00
Vijay Dhanraj
d03df0c7e2 HV: Fix MP Init sequence hang by adding a delay
As per the BWG a delay should be provided between the
INIT IPI and Startup IPI. Without the delay observe hangs
on certain platforms during MP Init sequence. So Setting
a delay of 10us between assert INIT IPI and Startup IPI.

Also, as per SDM section 10.7 the the de-assert INIT IPI is
only used for Pentium and P6 processors. This is not applicable
for Pentium4 and Xeon processors so removing this sequence.

Tracked-On: #4835
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-27 13:34:59 +08:00
Minggui Cao
8c090c71ca HV: fix bug to clear guest flags after it not used
in shutdown_vm, it uses guest flags when handling the phyiscal
CPUs whose LAPIC is pass-through. So if it is cleared first,
the related vCPUs and pCPUs can not be switched to correct state.

so move the clear action after the flags used.

Tracked-On: #4848
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
2020-05-27 11:35:47 +08:00
Binbin Wu
454bb14348 hv: vtd: remove some unnecessary check
1. context_entry couldn't be NULL in iommu_attach_device since bus
number is checked before the call.
2. root_entry couldn't be NULL in iommu_detach_device since bus number
is checked before the call.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-27 11:27:42 +08:00
Binbin Wu
e9901b3edd hv: vtd: add a function to check valid of dmar unit
Add a function dmar_unit_valid to check if the input dmar uint is valid
or not.

A valid dmar_unit should not be NULL, or ignore flag should not be set.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-27 11:27:42 +08:00
Binbin Wu
3009d9399f hv: vtd: cleanup snoop control related code
Snoop control will not be turned on by hypervisor, delete snoop control
related code.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-27 11:27:42 +08:00
Binbin Wu
a94c3ef763 hv: vtd: init DMAR/IR table address when register
Initialize root_table_addr/ir_table_addr of dmar uint when register the dmar uint.
So no need to check if they are initialzed or not later.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-27 11:27:42 +08:00
Shuo A Liu
9a15ea82ee hv: pause all other vCPUs in same VM when do wbinvd emulation
Invalidate cache by scanning and flushing the whole guest memory is
inefficient which might cause long execution time for WBINVD emulation.
A long execution in hypervisor might cause a vCPU stuck phenomenon what
impact Windows Guest booting.

This patch introduce a workaround method that pausing all other vCPUs in
the same VM when do wbinvd emulation.

Tracked-On: #4703
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-21 15:21:29 +08:00
Mingqiang Chi
f994b5ffaf hv:cleanup vcpu state
-- remove VCPU_PAUSED and resume_vcpu
-- remove vcpu->prev_state in vcpu structure
-- rename pause_vcpu to zombie_vcpu

Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-05-21 15:08:49 +08:00
Li Fei1
53af096726 hv: ptirq: refine find_ptirq_entry by hashing
Refine find_ptirq_entry by hashing instead of walk each of the PTIRQ entries one by one.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
2020-05-20 16:04:16 +08:00
Yonghua Huang
3391bffb27 hv:fix rtvm hang with maxcpus=0/1 in bootargs
RTVM (with lapic PT) boots hang when maxcpus is
 assigned a value less than the CPU number configured
 in hypervisor.

 In this case, vlapic_state(per VM) is left in TRANSITION
 state after BSP boot, which blocks interupts to be injected
 to this UOS.

Tracked-On: #4803
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-15 10:09:13 +08:00
Li Fei1
27a66acd0e hv: ptdev: refine look up MSI ptirq entry
There's no need to look up MSI ptirq entry by virtual SID any more since the MSI
ptirq entry would be removed before the device is assigned to a VM.

Now the logic of MSI interrupt remap could simplify as:
1. Add the MSI interrupt remap first;
2. If step is already done, just do the remap part.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
2020-05-13 14:31:01 +08:00
Zide Chen
1bc5c7ac5b hv/acrn-config/efi-stuf: assign hvlog and ramoops buffer address < 256MB
If HV relocation is enabled, either ACRN efi-stub or GRUB relocates
hypervisor image above HPA 256MB, thus we put hvlog and ramoops buffer
under 256MB to avoid conflict with hypervisor owned address.

This patch hardcodes these addresses:

0xa00000 - 0xdfffff: 4MiB for ramoops buffer
0xe00000 - 0xffffff: 2MiB for hvlog buffer

However, user can customize them to other addresses as long as it's under
256MB, available in host e820, and SOS bootarg "nokaslr" is not specified.

If HV relocation is disabled, need to make sure that these buffer
addresses are not between HV_RAM_START and HV_RAM_START + HV_RAM_SIZE.

Tracked-On: #4760
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2020-05-13 08:36:54 +08:00
Zide Chen
0a956c34c7 hv: add a new field cpu_affinity in struct acrn_vm
For post-launched VMs, the configured CPU affinity could be different
from the actual running CPU affinity. This new field acrn_vm->cpu_affinity
recognizes this difference so that it's possible that CREATE_VM
hypercall won't overwrite the configured CPU afifnity.

Change name cpu_affinity_bitmap in acrn_vm_config to cpu_affinity.
This is read-only in run time, never overwritten by acrn-dm.

Remove vm_config->vcpu_num, which means the number of vCPUs of the
configured CPU affinity. This is not to be confused with the actual
running vCPU number: vm->hw.created_vcpus.

Changed get_vm_bsp_pcpu_id() to get_configured_bsp_pcpu_id() for less
confusion.

Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-08 11:04:31 +08:00
Sainath Grandhi
bf1ff8c98f hv: Offload syncing PIR to vIRR to processor hardware
ACRN syncs PIR to vIRR in the software in cases the Posted
Interrupt notification happens while the pCPU is in root mode.

Sync can be achieved by processor hardware by sending a
posted interrupt notiification vector.

This patch sends a self-IPI, if there are interrupts pending in PIR,
which is serviced by the logical processor at the next
VMEnter

Tracked-On: #4777
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
2020-05-08 10:01:07 +08:00
Yan, Like
869ccb7ba8 HV: RDT: add CDP support in ACRN
CDP is an extension of CAT. It enables isolation and separate prioritization of
code and data fetches to the L2 or L3 cache in a software configurable manner,
depending on hardware support.

This commit adds a Kconfig switch "CDP_ENABLED" which depends on "RDT_ENABLED".
CDP will be enabled if the capability available and "CDP_ENABLED" is selected.

Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-08 08:50:13 +08:00
Yan, Like
277c668b04 HV: RDT: clean up RDT code
This commit makes some RDT code cleanup, mainling including:
- remove the clos_mask and mba_delay validation check in setup_res_clos_msr(), the check will be done in pre-build;
- rename platform_clos_num to valid_clos_num, which is set as the minimal clos_mas of all enabled RDT resouces;
- init the platform_clos_array in the res_cap_info[] definition;
- remove the unnecessary return values and return value check.

Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
2020-05-08 08:50:13 +08:00
Yan, Like
f774ee1fba HV: RDT: merge struct rdt_cache and rdt_membw in to a union
A RDT resource could be CAT or MBA, so only one of struct rdt_cache and struct rdt_membw
would be used at a time. They should be a union.
This commit merge struct rdt_cache and struct rdt_membw in to a union res.

Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com
2020-05-08 08:50:13 +08:00
yuhong.tao@intel.com
b0ae9cfa2b HV: Config Splitlock Detection to be disable
#AC should be normally enabled for slpitlock detection, however,
community developers may want to run ACRN on buggy system.
In this case, CONFIG_ENFORCE_TURNOFF_AC can be used to turn off the
 #AC, to let the guest run without #AC.

Tracked-On: #4765
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-05-07 09:35:22 +08:00
Li Fei1
0c6b3e57d6 hv: ptdev: minor refine about ptirq_build_physical_msi
The virtual MSI information could be included in ptirq_remapping_info structrue,
there's no need to pass another input paramater for this puepose. So we could
remove the ptirq_msi_info input.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-05-06 11:51:11 +08:00
Li Fei1
73335b7276 hv: ptirq: rename ptirq_lookup_entry_by_sid to find_ptirq_entry
We look up PTIRQ entru only by SID. So _by_sid could removed.
And refine function name to verb-obj style.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-05-06 11:51:11 +08:00
Yin Fengwei
68269a559f gpa2hva: add INVAVLID_HPA return value check
For return value of local_gpa2hpa, either INVALID_HPA or NULL
means the EPT walking failure. Current code only take care of
NULL return and leave INVALID_HPA as correct case.

In some cases (if guest page table is filled with invalid memory
address), it could crash ACRN from guest.

Add INVALID_HPA return check as well.
Also add @pre assumptions for some gpa2hpa usages.

Tracked-On: #4730
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
2020-05-06 11:29:30 +08:00
Wei Liu
22aecf83e0 HV: modify CONFIG_HV_RAM_START for NUC7i7DNB
When boot ACRN hypervisor from grub multiboot, HV will be loaded at
CONFIG_HV_RAM_START since relocation is not supported in grub
multiboot1. The CONFIG_HV_RAM_SIZE in industry scenario will take
~330MB(0x14000000), unfortunately the efi memmap on NUC7i7DNB is
truncated at 0x6dba2000 although it is still usable from 0x6dba2000. So
from grub point of view, it could not find a continuous memory from
0x6000000 to load industry scenario. Per efi memmap, there is a big
memory area available from 0x40400000, so put CONFIG_HV_RAM_START to
0x41000000 is much safe for NUC7i7DNB.

Tracked-On: #4641
Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-05-06 11:25:57 +08:00
Minggui Cao
691a0e2e56 HV: add a specific stack space used in CPU booting
The original stack used in CPU booting is: ld_bss_end + 4KB;
which could be out of the RAM size limit defined in link_ram file.

So add a specific stack space in link_ram file, and used in
CPU booting.

Tracked-On: #4738
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-29 13:56:40 +08:00
Li Fei1
4c733708bf hv: lapic: minor refine about init_lapic
According to SDM Vol 3, Chap 10.4.7.2 Local APIC State After It Has Been Software Disabled,
The mask bits for all the LVT entries are set when the local APIC has been software disabled.
So there's no need to mask all the LVT entries one by one.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-04-26 10:48:49 +08:00
Li Fei1
067b439e69 hv: irq: minor refine about structure idt_64_descriptor
The 'value' field in structure idt_64_descriptor is no one used. We could remove it.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-04-26 10:48:49 +08:00
Zide Chen
9150284ca7 hv: replace vcpu_affinity array with cpu_affinity_bitmap
Currently the vcpu_affinity[] array fixes the vCPU to pCPU mapping.
While the new cpu_affinity_bitmap doesn't explicitly sepcify this
mapping, instead, it implicitly assumes that vCPU0 maps to the pCPU
with lowest pCPU ID, vCPU1 maps to the second lowest pCPU ID, and
so on.

This makes it possible for post-launched VM to run vCPUs on a subset of
these pCPUs only, and not all of them.

acrn-dm may launch post-launched VMs with the current approach: indicate
VM UUID and hypervisor launches all VCPUs from the PCPUs that are masked
in cpu_affinity_bitmap.

Also acrn-dm can choose to launch the VM on a subset of PCPUs that is
defined in cpu_affinity_bitmap. In this way, acrn-dm must specify the
subset of PCPUs in the CREATE_VM hypercall.

Additionally, with this change, a guest's vcpu_num can be easily calculated
from cpu_affinity_bitmap, so don't assign vcpu_num in vm_configuration.c.

Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-23 09:38:54 +08:00
Li Fei1
113f2f1e35 hv: vacpi: add ioapic madt table
Add IOAPIC MADT table support so that guest could detect IOAPIC exist.

Tracked-On: #4623
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-04-22 08:42:19 +08:00
Li Fei1
1dccbdbaa2 hv: vapic: add mcfg table support
Add MCFG table support to allow guest access PCIe external CFG space by ECAM

Tracked-On: #4623
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-04-22 08:42:19 +08:00
Li Fei1
4eb3f5a0c7 hv: vacpi: add fadt table support
Add FADT table support to support guest S5 setting.

According to ACPI 6.3 Spec, OSPM must ignored the DSDT and FACS fields if them're zero.
However, Linux kernel seems not to abide by the protocol, it will check DSDT still.
So add an empty DSDT to meet it.

Tracked-On: #4623
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-04-22 08:42:19 +08:00
Victor Sun
e5561a5c71 HV: remove sdc2 scenario support
Remove sdc2 scenario since the VM launch requirement under this scenario
could be satisfied by industry scenario now;

Tracked-On: #4661

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-20 14:59:23 +08:00
Victor Sun
a90890e9c7 HV: support up to 7 post launched VMs for industry scenario
In industry scenario, hypervisor will support 1 post-launched RT VM
and 1 post-launched kata VM and up to 5 post-launched standard VMs;

Tracked-On: #4661

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-20 14:59:23 +08:00
Victor Sun
09212cf4b6 HV: Kconfig: enable CPU sharing by default
The patch enables CPU sharing feature by default, the default scheduler is
set to SCHED_BVT;

Tracked-On: #4661

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-20 14:59:23 +08:00
Sainath Grandhi
60c4ec0c59 hv: Wake up vCPU for interrupts from vPIC
Wake up vCPUs that are blocked upon interrupts from vPIC.

Tracked-On: #4664
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
2020-04-20 09:49:41 +08:00
Victor Sun
dfb947fe91 HV: fix wrong gpa start of hpa2 in ve820.c
The current logic puts hpa2 above GPA 4G always, which is incorrect. Need
to set gpa start of hpa2 right after hpa1 when hpa1 size is less then 2G;

Tracked-On: #4458

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-17 14:08:54 +08:00
Victor Sun
fe6407155f HV: set default MCFG base for generic board
On most board the MCFG base is set to 0xe0000000, so modify this value in
platform_acpi_info.h for generic boards;

The description of ACPI_PARSE_ENABLED is modified also to match its usage.

Tracked-On: #4157

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-17 13:49:58 +08:00
Victor Sun
3fd5fc9b51 Kconfig: remove MAX_KATA_VM_NUM
CONFIG_MAX_KATA_VM_NUM is a scenario specific configuration, so it is better
to put the MACRO in scenario folder directly, to instead the Kconfig item in
Kconfig file which should work for all scenarios;

Tracked-On: #4616

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-17 13:45:18 +08:00
Victor Sun
dba0591f72 Kconfig: change scenario variable type to string
Basicly ACRN scenario is a configuration name for specific usage. By giving
scenario name ACRN will load corresponding VM configurations to build the
hypervisor. But customer might have their own scenario name, change the
scenario type from choice to string is friendly to them since Kconfig source
file change will not be needed.

With this change, CONFIG_$(SCENARIO) will not exist in kconfig file and will
be instead of CONFIG_SCENARIO, so the Makefile need to be changed accordingly;

Tracked-On: #4616

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-04-17 13:45:18 +08:00
Victor Sun
55b50f408f HV: init vm uuid and severity in macro
Currently the vm uuid and severity is initilized separately in
vm_config struct, developer need to take care both items carefuly
otherwise hypervisor would have trouble with the configurations.

Given the vm loader_order/uuid and severity are binded tightly, the
patch merged these tree settings in one macro so that developer will
have a simple interface to configure in vm_config struct.

Tracked-On: #4616

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-17 13:45:18 +08:00
yuhong.tao@intel.com
7c80acee95 HV: emulate MSR_TEST_CTL
If CPU has MSR_TEST_CTL, show an emulaued one to VCPU

Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-17 09:53:59 +08:00
yuhong.tao@intel.com
dd3fa8ed75 HV: enable #AC for Splitlock Access
If CPU support rise #AC for Splitlock Access, then enable this
feature at each CPU.

Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-17 09:53:59 +08:00
yuhong.tao@intel.com
ea1bce0cbf HV: enumerate capability of #AC for Splitlock Access
When the destination of an atomic memory operation located in 2
cache lines, it is called a Splitlock Access. LOCK# bus signal is
asserted for splitlock access which may lead to long latency. #AC
for Splitlock Access is a CPU feature, it allows rise alignment
check exception #AC(0) instead of asserting LOCK#, that is helpful
to detect Splitlock Access.

This feature is enumerated by MSR(0xcf) IA32_CORE_CAPABILITIES[bit5]
Add helper function:
    bool has_core_cap(uint32_t bitmask)

Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-17 09:53:59 +08:00
Mingqiang Chi
f90100e382 hv: add pre-condition for vcpu APIs
remove unnecessary state check and
add pre-condition for vcpu APIs.

Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-16 21:59:03 +08:00
Jason Chen CJ
0584981c03 hv:add pre-condition for vm APIs
check the vm state in hypercall api,
add pre-condition for vm api.

Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-16 21:59:03 +08:00
Mingqiang Chi
fe929d0a10 hv: move out pause_vm from shutdown_vm
now it will call pause_vm in shutdown_vm,
move it out from shutdown_vm to reduce coupling.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-16 21:59:03 +08:00
Yonghua Huang
84eaf94ae6 hv: wrap a function to initialize pCPU for second phase
This patch wrapps a common function to initialize physical
 CPU for the second phase to reduce redundant code.

Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2020-04-16 14:02:29 +08:00
Xiaoguang Wu
d4f789f47e hv: iommu: remove snoop related code
ACRN disables Snoop Control in VT-d DMAR engines for simplifing the
implementation. Also, since the snoop behavior of PCIE transactions
can be controlled by guest drivers, some devices may take the advantage
of the NO_SNOOP_ATTRIBUTE of PCIE transactions for better performance
when snoop is not needed. No matter ACRN enables or disables Snoop
Control, the DMA operations of passthrough devices behave correctly
from guests' point of view.

This patch is used to clean all the snoop related code.

Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2020-04-16 08:40:17 +08:00
Xiaoguang Wu
b4f1e5aa85 hv: iommu: disable snoop bit in EPT-PTE/SL-PTE
Due to the fact that i915 iommu doesn't support snoop, hence it can't
access memory when the SNOOP bit of Secondary Level page PTE (SL-PTE)
is set, this will cause many undefined issues such as invisible cursor
in WaaG etc.

Current hv design uses EPT as Scondary Leval Page for iommu, and this
patch removes the codes of setting SNOOP bit in both EPT-PTE and SL-PTE
to avoid errors.

And according to SDM 28.2.2, the SNOOP bit (11th bit) will be ignored
by EPT, so it will not affect the CPU address translation.

Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2020-04-16 08:40:17 +08:00
Conghui Chen
84ad340898 hv: fix for waag 2 core reboot issue
Waag will send NMIs to all its cores during reboot. But currently,
NMI cannot be injected to vcpu which is in HLT state.
To fix the problem, need to wakeup target vcpu, and inject NMI through
interrupt-window.

Tracked-On: #4620
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-15 14:42:00 +08:00
Binbin Wu
597f7658fc hv: guest: fix bug in get_vcpu_paging_mode
Align the implementation to SDM Vol.3 4.1.1.
Also this patch fixed a bug that doesn't check paging status first
in some cpu mode.

Tracked-On: #4628
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-15 14:40:02 +08:00
Zide Chen
6040d8f6a2 hv: fix SOS vapic_id assignment issue
Currently vlapic_build_id() uses vcpu_id to retrieve the lapic_id
per_cpu variable:

  vlapic_id = per_cpu(lapic_id, vcpu->vcpu_id);

SOS vcpu_id may not equal to pcpu_id, and in that case it runs into
problems. For example, if any pre-launched VMs are launched on PCPUs
whose IDs are smaller than any PCPU IDs that are used by SOS.

This patch fixes the issue and simplify the code to create or get
vapic_id by:

- assign vapic_id in create_vlapic(), which now takes pcpu_id as input
  argument, and save it in the new field: vlapic->vapic_id, which will
  never be changed.
- simplify vlapic_get_apicid() by returning te saved vapid_id directly.
- remove vlapic_build_id().
- vlapic_init() is only called once, merge it into vlapic_create().

Tracked-On: #4268
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-15 14:34:15 +08:00
dongshen
00ad3863a1 hv: maintain a per-pCPU array of vCPUs and handle posted interrupt IRQs
Maintain a per-pCPU array of vCPUs (struct acrn_vcpu *vcpu_array[CONFIG_MAX_VM_NUM]),
one VM cannot have multiple vCPUs share one pcpu, so we can utilize this property
and use the containing VM's vm_id as the index to the vCPU array:

 In create_vcpu(), we simply do:
   per_cpu(vcpu_array, pcpu_id)[vm->vm_id] = vcpu;

 In offline_vcpu():
   per_cpu(vcpu_array, pcpuid_from_vcpu(vcpu))[vcpu->vm->vm_id] = NULL;

so basically we use the containing VM's vm_id as the index to the vCPU array,
as well as the index of posted interrupt IRQ/vector pair that are assigned
to this vCPU:
  0: first vCPU and first posted interrupt IRQs/vector pair
  (POSTED_INTR_IRQ/POSTED_INTR_VECTOR)
  ...
  CONFIG_MAX_VM_NUM-1: last vCPU and last posted interrupt IRQs/vector pair
  ((POSTED_INTR_IRQ + CONFIG_MAX_VM_NUM - 1U)/(POSTED_INTR_VECTOR + CONFIG_MAX_VM_NUM - 1U)

In the posted interrupt handler, it will do the following:
 Translate the IRQ into a zero based index of where the vCPU
 is located in the vCPU list for current pCPU. Once the
 vCPU is found, we wake up the waiting thread and record
 this request as ACRN_REQUEST_EVENT

Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2020-04-15 13:47:22 +08:00
dongshen
14fa9c563c hv: define posted interrupt IRQs/vectors
This is a preparation patch for adding support for VT-d PI
related vCPU scheduling.

ACRN does not support vCPU migration, one vCPU always runs on
the same pCPU, so PI's ndst is never changed after startup.

VCPUs of a VM won’t share same pCPU. So the maximum possible number
of VCPUs that can run on a pCPU is CONFIG_MAX_VM_NUM.

Allocate unique Activation Notification Vectors (ANV) for each vCPU
that belongs to the same pCPU, the ANVs need only be unique within each
pCPU, not across all vCPUs. This reduces # of pre-allocated ANVs for
posted interrupts to CONFIG_MAX_VM_NUM, and enables ACRN to avoid
switching between active and wake-up vector values in the posted
interrupt descriptor on vCPU scheduling state changes.

A total of CONFIG_MAX_VM_NUM consecutive IRQs/vectors are reserved
for posted interrupts use.

The code first initializes vcpu->arch.pid.control.bits.nv dynamically
(will be added in subsequent patch), the other code shall use
vcpu->arch.pid.control.bits.nv instead of the hard-coded notification vectors.

Rename some functions:
  apicv_post_intr --> apicv_trigger_pi_anv
  posted_intr_notification --> handle_pi_notification
  setup_posted_intr_notification --> setup_pi_notification

Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
2020-04-15 13:47:22 +08:00
dongshen
c2d350c5cc hv: enable VT-d PI for ptdev if intr_src->pid_addr is non-zero
Fill in posted interrupt fields (vector, pda, etc) and set mode to 1 to
enable VT-d PI (posted mode) for this ptdev.

If intr_src->pi_vcpu is 0, fall back to use the remapped mode.

Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
2020-04-15 13:47:22 +08:00
dongshen
f7be985a23 hv: check if the IRQ is intended for a single destination vCPU
Given the vcpumask, check if the IRQ is single destination
and return the destination vCPU if so, the address of associated PI
descriptor for this vCPU can then be passed to dmar_assign_irte() to
set up the posted interrupt IRTE for this device.

For fixed mode interrupt delivery, all vCPUs listed in vcpumask should
service the interrupt requested. But VT-d PI cannot support multicast/broadcast
IRQs, it only supports single CPU destination. So the number of vCPUs
shall be 1 in order to handle IRQ in posted mode for this device.

Add pid_paddr to struct intr_source. If platform_caps.pi is true and
the IRQ is single-destination, pass the physical address of the destination
vCPU's PID to ptirq_build_physical_msi and dmar_assign_irte

Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
2020-04-15 13:47:22 +08:00
dongshen
6496da7c56 hv: add function to check if using posted interrupt is possible for vm
Add platform_caps.c to maintain platform related information

Set platform_caps.pi to true if all iommus are posted interrupt capable, false
otherwise

If lapic passthru is not configured and platform_caps.pi is true, the vm
may be able to use posted interrupt for a ptdev, if the ptdev's IRQ is
single-destination

Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
2020-04-15 13:47:22 +08:00
Sainath Grandhi
47f883db30 hv: Hypervisor access to PCI devices with 64-bit MMIO BARs
PCI devices with 64-bit MMIO BARs and requiring large MMIO space
can be assigned with physical address range at the very high end of
platform supported physical address space.

This patch uses the board info for 64-bit MMIO window as programmed
by BIOS and constructs 1G page tables for the same.

As ACRN uses identity mapping from Linear to Physical address space
physical addresses upto 48 bit or 256TB can be supported.

Tracked-On: #4586
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-04-13 16:52:18 +08:00
Sainath Grandhi
1c21f747be hv: Add HI_MMIO_START and HI_MMIO_END macros to board files
Add 64-bit MMIO window related MACROs to the supported board files
in the hypervisor source code.

Tracked-On: #4586
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
2020-04-13 16:52:18 +08:00