Commit Graph

2242 Commits

Author SHA1 Message Date
Zide Chen
5379b14108 hv: nested: define VMCS shadow fields
Enable VMCS shadowing for most of the VMCS fields, so that execution of
the VMREAD or VMWRITE on these shadow VMCS fields from L1 hypervisor
won't cause VM exits, but read from or write to the shadow VMCS.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
863e58e539 hv: nested: define software layout for VMCS12 and helper functions
Software layout of VMCS12 data is a contract between L1 guest and L0
hypervisor to run a L2 guest.

ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor
by the VMPTRLD instructin. At the time of VMCLEAR, ACRN syncs the cached
VMCS12 back to L1 guest memory.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
f5744174b5 hv: nested: support for VMPTRLD emulation
This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches
the VMCS12 that is passed down from the VMPTRLD instruction, and merges it
with VMCS01 to create VMCS02 to run the nested VM.

- Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to
  flushes active but not current VMCS12s to L1 guest.
- ACRN creates VMCS02 to run nested VM based on VMCS12:
  1) copy VMCS12 from guest memory to the per vCPU cache VMCS12
  2) initialize VMCS02 revision ID and host-state area
  3) load shadow fields from cache VMCS12 to VMCS02
  4) enable VMCS shadowing before L1 Vm entry

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
0a1ac2f4a0 hv: nested: support for VMXOFF emulation
This patch implements the VMXOFF instruction. By issuing VMXOFF,
L1 guest Leaves VMX Operation.

- cleanup VCPU nested virtualization context states in VMXOFF handler.
- implement check_vmx_permission() to check permission for VMX operation
  for VMXOFF and other VMX instructions.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
3fdad3c6d1 hv: nested: check prerequisites to enter VMX operation
According to VMXON Instruction Reference, do the following checks in the
virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID
in VMXON region, etc.

Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD
exception if L1 hypervisor is not running in 64-bit mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
fc8f07e740 hv: nested: support for VMXON emulation
This patch emulates VMXON instruction. Basically checks some
prerequisites to enable VMX operation on L1 guest (next patch), and
prepares some virtual hardware environment in L0.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Tao Yuhong
b93d6b2ef0 HV: Fix mistake use stac() & clac()
The commit 2ab70f43e5
HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV

It is wrong in using stac()/clac()

Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-05-24 10:32:54 +08:00
Li Fei1
6077152c4b hv: vlapic: extend vlapic_x2apic_pt_icr_access to support more destination mode
Now guest would use `Destination Shorthand` to broadcast IPIs if there're more
than one destination. However, it is not supported when the guest is in LAPIC
passthru situation, and all active VCPUs are working in X2APIC mode. As a result,
the guest would not work properly since this kind broadcast IPIs was ignored
by ACRN. What's worse, ACRN Hypervisor would inject GP to the guest in this case.

This patch extend vlapic_x2apic_pt_icr_access to support more destination modes
(both `Physical` and `Logical`) and destination shorthand (`No Shorthand`, `Self`,
`All Including Self` and `All Excluding Self`).

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Li Fei1
a69e67b58b hv: vlapic: wrap a function to calculate destination vcpu mask by shorthand
1. Rename vlapic_calc_dest to vlapic_calc_dest_noshort
2. Remove vlapic_calc_dest_lapic_pt, use vlapic_calc_dest_noshort instead
3. Wrap vlapic_calc_dest to calculate destination vcpu mask according shorthand

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Tao Yuhong
2ab70f43e5 HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV
The accrss right of HV RAM can be changed to PAGE_USER (eg. trusty RAM
of post-launched VM). So before using clflush(or clflushopt) to flush
HV RAM cache, must allow explicit supervisor-mode data accesses to
user-mode pages. Otherwise, it may trigger page fault.

Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-05-21 09:20:46 +08:00
Liang Yi
6805510d77 hv/mod_timer: refine timer interface
1. do not allow external modules to touch internal field of a timer.
2. make timer mode internal, period_in_ticks will decide the mode.

API wise:
1. the "mode" parameter was taken out of initialize_timer().
2. a new function update_timer() was added to update the timeout and
   period fields.
3. the timer_expired() function was extended with an output parameter
   to return the remaining cycles before expiration.

Also, the "fire_tsc" field name of hv_timer was renamed to "timeout".
With the new API, however, this change should not concern user code.

Tracked-On: #5920

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
3547c9cd23 hv/mod_timer: make timer into an arch-independent module
x86/timer.[ch] was moved to the common directory largely unchanged.

x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.

Other than these two functions, the timer module is largely arch
agnostic.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
51204a8d11 hv/mod_timer: separate delay functions from the timer module
Modules that use udelay() should include "delay.h" explicitly.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
5a2b89b0a4 hv/mod_timer: split tsc handling code from timer.
Generalize and split basic cpu cycle/tick routines from x86/timer:
- Instead of rdstc(), use cpu_ticks() in generic code.
- Instead of get_tsc_khz(), use cpu_tickrate() in generic code.
- Include "common/ticks.h" instead of "x86/timer.h" in generic code.
- CYCLES_PER_MS is renamed to TICKS_PER_MS.

The x86 specific API rdstc() and get_tsc_khz(), as well as TSC_PER_MS
are still available in arch/x86/tsc.h but only for x86 specific usage.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Signed-off-by: Yi Liang <yi.liang@intel.com>
2021-05-18 16:43:28 +08:00
Yonghua Huang
00b3a28d5d hv: update RTCT parser to support RTCT version 2
RTCT has been updated to version 2,
  this patch updates hypervisor RTCT parser to support
  both version 1 and version 2 of RTCT.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
2021-05-17 17:19:11 +08:00
Yonghua Huang
9facbb43b3 config-tool: rename PSRARM to SSRAM
'psram' and 'PSRAM' are legacy names and replaced
  with 'ssram' and 'SSRAM' respectively.

Tracked-On: #6012
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Shuang Zheng <shuang.zheng@intel.com>
2021-05-17 14:31:42 +08:00
Zide Chen
c9982e8c7e hv: nested: setup emulated VMX MSRs
We emulated these MSRs:

- MSR_IA32_VMX_PINBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS2
- MSR_IA32_VMX_EXIT_CTLS
- MSR_IA32_VMX_ENTRY_CTLS
- MSR_IA32_VMX_BASIC: emulate VMCS revision ID, etc.
- MSR_IA32_VMX_MISC

For the following MSRs, we pass through the physical value to L1 guests:

- MSR_IA32_VMX_EPT_VPID_CAP
- MSR_IA32_VMX_VMCS_ENUM
- MSR_IA32_VMX_CR0_FIXED0
- MSR_IA32_VMX_CR0_FIXED1
- MSR_IA32_VMX_CR4_FIXED0
- MSR_IA32_VMX_CR4_FIXED1

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
4930992118 hv: nested: implement the framework for VMX MSR emulation
Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to
L1 guests if nested virtualization is enabled.
- If CONFIG_NVMX_ENABLED is set, these MSRs are included in
  emulated_guest_msrs[].
- otherwise, they are included in unsupported_msrs[].

In this way we can take advantage of the existing infrastructure to
emulate these MSRs.

Tracked-On: #5923
Spick igned-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
97df220f49 hv: vmsr: emulate IA32_FEATURE_CONTORL MSR for nested virtualization
In order to support nested virtualization, need to expose the "Enable VMX
outside SMX operation" bit to L1 hypervisor.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Yonghua Huang
e9870893a3 hv: rename some software SRAM local names
For simplification purpose, use 'ssram' instead of
 'software sram' for local names inside rtcm module.

Tracked-On: #6015
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 10:08:17 +08:00
Li Fei1
30febed0e1 hv: cache: wrap common APIs
Wrap three common Cache APIs:
- flush_invalidate_all_cache
- flush_cacheline
- flush_cache_range

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
77e64f6092 hv: tlb: wrap common APIs
Wrap two common TLB APIs: flush_tlb and flush_tlb_range.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
d94582389e hv: mmu: move arch specific parts into cpu.h
Move Cache/TLB arch specific parts into cpu.h
After this change, we should not expose arch specific parts out from mmu.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
d6362b6e0a hv: paging: rename ppt_set/clear_ATTR to set_paging_ATTR
Rename ppt_set/clear_(attribute) to set_paging_(attribute)

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Zide Chen
ccfdf9cdd7 hv: nested: enable nested virtualization
Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set:

- move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS
  so that CR4_VMXE is removed from cr4_reserved_bits_mask.
- force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE
  is able to be set.

Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED
is set.

Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests.

Constraints on ACRN VM.
- LAPIC passthrough should be enabled.
- use SCHED_NOOP scheduler.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Zide Chen
dd90eccc25 hv: move invvpid and invept helper code from mmu.c to mmu.h
moving invvpid and invept helper code from mmu.c to mmu.h, so that they
can be accessed by the nested virtualization code.

No logical changes.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Shuo A Liu
3fffa68665 hv: Support WAITPKG instructions in guest VM
TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause
a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2
is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2.

Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So
load corresponding vMSR value during context switch in of a vCPU.

Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.

Performance impact:
    MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
    MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.

Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-05-13 14:19:50 +08:00
dongshen
ebadf00de8 hv: some coding style fixes
Fix issues reported by checkpatch.pl

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-12 16:50:34 +08:00
Junjie Mao
ea4eadf0a5 hv: hypercalls: refactor permission-checking and dispatching logic
The current permission-checking and dispatching mechanism of hypercalls is
not unified because:

  1. Some hypercalls require the exact vCPU initiating the call, while the
     others only need to know the VM.
  2. Different hypercalls have different permission requirements: the
     trusty-related ones are enabled by a guest flag, while the others
     require the initiating VM to be the Service OS.

Without a unified logic it could be hard to scale when more kinds of
hypercalls are added later.

The objectives of this patch are as follows.

  1. All hypercalls have the same prototype and are dispatched by a unified
     logic.
  2. Permissions are checked by a unified logic without consulting the
     hypercall ID.

To achieve the first objective, this patch modifies the type of the first
parameter of hcall_* functions (which are the callbacks implementing the
hypercalls) from `struct acrn_vm *` to `struct acrn_vcpu *`. The
doxygen-style documentations are updated accordingly.

To achieve the second objective, this patch adds to `struct hc_dispatch` a
`permission_flags` field which specifies the guest flags that must ALL be
set for a VM to be able to invoke the hypercall. The default value (which
is 0UL) indicates that this hypercall is for SOS only. Currently only the
`permission_flag` of trusty-related hypercalls have the non-zero value
GUEST_FLAG_SECURE_WORLD_ENABLED.

With `permission_flag`, the permission checking logic of hypercalls is
unified as follows.

  1. General checks
     i. If the VM is neither SOS nor having any guest flag that allows
        certain hypercalls, it gets #UD upon executing the `vmcall`
        instruction.
    ii. If the VM is allowed to execute the `vmcall` instruction, but
        attempts to execute it in ring 1, 2 or 3, the VM gets #GP(0).
  2. Hypercall-specific checks
     i. If the hypercall is for SOS (i.e. `permission_flag` is 0), the
        initiating VM must be SOS and the specified target VM cannot be a
        pre-launched VM. Otherwise the hypercall returns -EINVAL without
        further actions.
    ii. If the hypercall requires certain guest flags, the initiating VM
        must have all the required flags. Otherwise the hypercall returns
        -EINVAL without further actions.
   iii. A hypercall with an unknown hypercall ID makes the hypercall
        returns -EINVAL without further actions.

The logic above is different from the current implementation in the
following aspects.

  1. A pre-launched VM now gets #UD (rather than #GP(0)) when it attempts
     to execute `vmcall` in ring 1, 2 or 3.
  2. A pre-launched VM now gets #UD (rather than the return value -EPERM)
     when it attempts to execute a trusty hypercall in ring 0.
  3. The SOS now gets the return value -EINVAL (rather than -EPERM) when it
     attempts to invoke a trusty hypercall.
  4. A post-launched VM with trusty support now gets the return value
     -EINVAL (rather than #UD) when it attempts to invoke a non-trusty
     hypercall or an invalid hypercall.

v1 -> v2:
 - Update documentation that describe hypercall behavior.
 - Fix Doxygen warnings

Tracked-On: #5924
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-12 13:43:41 +08:00
Liang Yi
688a41c290 hv: mod: do not use explicit arch name when including headers
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".

In other words, we are adopting the same practice in Linux kernel.

Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-08 11:15:46 +08:00
Li Fei1
f3327364c3 hv: mmu: fix a minor bug
We should only map [low32_max_ram, 4G) MMIO region as UC attribute,
not map [low32_max_ram, low32_max_ram + 4G) region as UC attribute.
Otherwise, the HV will complain [4G, low32_max_ram + 4G) region has
already mapped.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-29 08:57:13 +08:00
Shuo A Liu
dc88c2e397 hv: Save/restore MSR_IA32_CSTAR during context switch
Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while
Linux uses it rarely. Now vcpu context switch doesn't save/restore it.
Windows detects the change of the MSR and rises a exception.

Do the save/resotre MSR_IA32_CSTAR during context switch.

Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 11:21:52 +08:00
Jian Jun Chen
31b8b698ce hv: TLFS: Add tsc_offset support for reference time
TLFS spec defines that when a VM is created, the value of
HV_X64_MSR_TIME_REF_COUNT is set to zero. Now tsc_offset is not
supported properly, so guest get a drifted reference time.

This patch implements tsc_offset. tsc_scale and tsc_offset
are calculated when a VM is launched and are saved in
struct acrn_hyperv of struct acrn_vm.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Jian Jun Chen
b4312efbd7 hv: TLFS: inject #GP to guest VM for writing of read-only MSRs
TLFS spec defines that HV_X64_MSR_VP_INDEX and HV_X64_MSR_TIME_REF_COUNT
are read-only MSRs. Any attempt to write to them results in a #GP fault.

Fix the issue by returning error in handler hyperv_wrmsr() of MSRs
HV_X64_MSR_VP_INDEX/HV_X64_MSR_TIME_REF_COUNT emulation.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Jian Jun Chen
dd524d076d hv: TLFS: Setup hypercall page according to the vcpu mode
TLFS spec defines different hypercall ABIs for X86 and x64. Currently
x64 hypercall interface is not supported well.

Setup the hypercall interface page according to the vcpu mode.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Li Fei1
628bca5cad hv: pgtable: use new algo to calculate PPT/EPT_PD_PAGE_NUM
In order to support platform (such as Ander Lake) which physical address width
bits is 46, the current code need to reserve 2^16 PD page ((2^46) / (2^30)).
This is a complete waste of memory.

This patch would reserve PD page by three parts:
1. DRAM - may take PD_PAGE_NUM(CONFIG_PLATFORM_RAM_SIZE) PD pages at most;
2. low MMIO - may take PD_PAGE_NUM(MEM_1G << 2U) PD pages at most;
3. high MMIO - may takes (CONFIG_MAX_PCI_DEV_NUM * 6U) PD pages (may plus
PDPT entries if its size is larger than 1GB ) at most for:
(a) MMIO BAR size must be a power of 2 from 16 bytes;
(b) MMIO BAR base address must be power of two in size and are aligned with
its size.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1
053c09e764 hv: cpu_cap: PAW over 39 bits must support 1GB large page
The platform which physical-address width over 39 bits must support
1GB large page (Both MMU and VMX sides ). This could save lots of
page table pages for EPT MMIO mapping.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1
41e2d40d1f hv: e820: remove get_mem_range_info
No one uses get_mem_range_info to get the top/bottom/size of the physical memory.
We could get these informations by e820 table easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
3a465388d4 hv: guest: remove get_mem_range_info in prepare_sos_vm_memmap
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address of SOS. This assumes the platform must have
high memory space.

This patch removes the assumption. It will set high 64 bits max memory address of
SOS to 4G by default (Which means there's no 64 bits high memory), then update
the high 64 bits max memory address if the SOS really has high memory space.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
901e8c869e hv: vE820: calculate SOS memory size by vE820 tables
SOS's memory size could be calculated by its vE820 Tables easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
ad15053304 hv: mmu: remove get_mem_range_info in init_paging
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address. This assumes the platform must have high
memory space.

This patch calculates the high 64 bits max memory address according the e820 tables
and removes the assumption "The platform must have high memory space" by map the
low RAM region and high RAM region separately.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
6137347411 hv: smp: fix an isuue about SMP sync
Now BSP may launch VMs before APs have not done its initilization,
for example, sched_control for per-cpu. However, when we initilize
the vcpu thread data, it will access the object (scheduler) of the
sched_control of APs. As a result, it will trigger the PF.

This patch would waits each physical has done its initilization before
to continue to execute.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-21 10:54:48 +08:00
Li Fei1
5f281df548 hv: serializng: use mfence to ensure trampoline code was updated
Using the MFENCE to make sure trampoline code
has been updated (clflush) into memory beforing start APs.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-21 10:54:48 +08:00
Li Fei1
e049abb542 hv: vcpuid: hide new cpuid 0x1b/0x1f
Hide CPUID 0x1b (PCONFIG) and 0x1f (Extended Topology Enumeration Leaf)

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-20 13:28:44 +08:00
Li Fei1
31f48d12a2 hv: memory order: use mfence to strengthen the fast string operations order
Use MFENCE to strengthen the fast string operations execute order to ensure
all trampoline code was updated before flush it into the memory.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-20 13:28:44 +08:00
Yifan Liu
b80c388b52 hv: Hide HLAT to guest
For platform with HLAT (Hypervisor-managed Linear Address Translation)
capability, the hypervisor shall hide this feature to its guest.

This patch adds MSR_IA32_VMX_PROCBASED_CTLS3 MSR to unsupported MSR
list.

The presence of this MSR is determined by 1-setting of bit 49 of MSR
MSR_IA32_VMX_PROCBASED_CTLS. which is already in unsupported MSR list. [2]

Related documentations:
[1] Intel Architecture Instruction Set Extensions, version Feb 16, 2021,
Ch 6.12
[2] Intel KeyLocker Specification, Sept 2020, Ch 7.2

Tracked-On: #5895
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-07 13:47:47 +08:00
Li Fei1
d1ae797742 hv: pgtable: move sanitize_pte into pagetable.c
sanitize_pte is used to set page table entry to map to an sanitized page to
mitigate l1tf. It should belongs to pgtable module. So move it to pagetable.c

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
ef90bb6db3 hv:pgtable: rename lookup_address to pgtable_lookup_entry
lookup_address is used to lookup a pagetable entry by an address. So rename it
to pgtable_lookup_entry to indicate this clearly.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
36ddd87a09 hv: pgtable: remove alloc_ept_page
alloc_page/free_page should been called in pagetable module. In order to do this,
we add pgtable_create_root and pgtable_create_trusty_root to create PML4 page table
page for normal world and secure world.

After this done, no one uses alloc_ept_page. So remove it.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
ea701c63c7 hv: pgtable: add pgtable_create_trusty_root
Add pgtable_create_trusty_root to allocate a page for trusty PML4 page table page.
This function also copy PDPT entries from Normal world to Secure world.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
596c349600 hv: pgtable: add pgtable_create_root
Add pgtable_create_root to allocate a page for PMl4 page table page.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Li Fei1
eb52e2193a hv: pgtable: refine name for pgtable add/modify/del
Rename mmu_add to pgtable_add_map;
Rename mmu_modify_or_del to pgtable_modify_or_del_map.
And move these functions declaration into pgtable.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-29 13:28:55 +08:00
Liang Yi
33ef656462 hv/mod-irq: use arch specific header files
Requires explicit arch path name in the include directive.

The config scripts was also updated to reflect this change.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
df36da1b80 hv/mod_irq: do not include x86/irq.h in common/irq.h
Each .c file includes the arch specific irq header file (with full
path) by itself if required.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
741a208a02 hv/mod_irq: cleanup x86 lapic/ioapic header files
Declarations referenced nowhere else are moved into the c file.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
ff732cfb2a hv/mod_irq: move guest interrupt API out of x86/irq.h
A new x86/guest/virq.h head file now contains all guest
related interrupt handling API.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
798015876c hv/mod_irq: move NMI and exception handler out of x86/irq.c
Each of them now resides in a separate .c file.

Tracked-On: #5825

Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
3a50f949e1 hv/mod_irq: split irq.c into arch/x86/irq.c and common/irq.c
The common irq file is responsible for managing the central
irq_desc data structure and provides the following APIs for
host interrupt handling.
- init_interrupt()
- reserve_irq_num()
- request_irq()
- free_irq()
- set_irq_trigger_mode()
- do_irq()

API prototypes, constant and data structures belonging to common
interrupt handling are all moved into include/common/irq.h.

Conversely, the following arch specific APIs are added which are
called from the common code at various points:
- init_irq_descs_arch()
- setup_irqs_arch()
- init_interrupt_arch()
- free_irq_arch()
- request_irq_arch()
- pre_irq_arch()
- post_irq_arch()

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
c46e3c71ac hv/mod_irq: decouple irq number reservation from ioapic
This is done be adding irq_rsvd_bitmap as an auxiliary bitmap
besides irq_alloc_bitmap.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
038e0cae92 hv/mod_irq: split IRQ handling into common and arch specific parts
The common IRQ handling routine calls arch specific functions
pre_irq_arch() and post_irq_arch() before and after calling the
registered action function respectively.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
ac3e0a1718 hv/mod_irq: split irq initialization into common and arch specific parts
The common part initializes the global irq_desc data structure while the
arch specific part initialize the HW and its own irq data.

This is one of the preparation steps for spliting IRQ handling into common
and architecture specific parts.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
f3cae9e258 hv/mod_irq: hide arch specific data in irq_desc
Arch specific IRQ data is now an opaque pointer in irq_desc.

This is a preparation step for spliting IRQ handling into common
and architecture specific parts.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Li Fei1
9000381f34 hv: pgtable: move pgtable definition to pgtable.h
This patch moves pgtable definition to pgtable.h and include the proper
header file for page module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
0278a3f46e hv: pgatble: move the EPT page table related APIs to ept.c
Move the EPT page table related APIs to ept.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for EPT should defined in EPT module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
5c71ca456a hv: pgatble: move the MMU page table related APIs to mmu.c
Move the MMU page table related APIs to mmu.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for MMU should defined in MMU module.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
15d68675e9 hv: pgtable: separate common APIs for MMU/EPT
We would move the MMU page table related APIs to mmu.c and move the EPT related
APIs to EPT.c. The page table module only provides APIs to add/modify/delete/lookup
page table entry.

This patch separates common APIs and adds separate APIs of page table module
for MMU/EPT.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-11 13:48:52 +08:00
Li Fei1
80bd3ac02a hv: trusty: move post_uos_sworld_memory into vm.c
post_uos_sworld_memory are used for post-launched VM which support trusty.
It's more VM related. So move it definition into vm.c

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 13:48:52 +08:00
Yonghua Huang
1a011bd91b hv: disable guest MONITOR-WAIT support when SW SRAM is configured
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.

Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.

This patch disable ACRN guest MONITOR-WAIT support if software
SRAM is configured.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Yonghua Huang
ae43b2a847 hv: disable host MONITOR-WAIT support when SW SRAM is enabled
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.

Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.

This patch disable hypervisor(host) MONITOR-WAIT support and refine
software sram initializaion flow.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Yonghua Huang
ea44bb6c4d hv: wrap function to check software SRAM support
Below boolean function are defined in this patch:
 - is_software_sram_enabled() to check if SW SRAM
   feature is enabled or not.
 - set global variable 'is_sw_sram_initialized'
   to file static.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-11 09:42:44 +08:00
Li Fei1
768e483cd2 hv: pgtable: rename 'struct memory_ops' to 'struct pgtable'
The fields and APIs in old 'struct memory_ops' are used to add/modify/delete
page table (page or entry). So rename 'struct memory_ops' to 'struct pgtable'.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-10 11:42:13 +08:00
Li Fei1
ef98fa69ce hv: pgtable: remove get_default_access_right API
Use default_access_right field to replace get_default_access_right API.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-03-10 11:42:13 +08:00
Li Fei1
7c6a52037a refine ept_flush_leaf_page
Refine the logic how to skip the pSRAM region when flushing cache.

Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-03-03 14:44:25 +08:00
Li Fei1
1db32f4d03 hv: ept: build 4KB page mapping in EPT for code pages of rtvm
RTVM is enforced to use 4KB pages to mitigate CVE-2018-12207 and performance jitter,
which may be introduced by splitting large page into 4KB pages on demand. It works
fine in previous hardware platform where the size of address space for the RTVM is
relatively small. However, this is a problem when the platforms support 64 bits
high MMIO space, which could be super large and therefore consumes large # of
EPT page table pages.

This patch optimize it by using large page for purely data pages, such as MMIO spaces,
even for the RTVM.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-03 13:46:49 +08:00
Li Fei1
01b54241c6 hv: ept: only treak execution right for large pages
To mitigate the page size change MCE vulnerability (CVE-2018-12207), ACRN would
clear the execution permission in the EPT paging-structure entries for large pages
and then intercept an EPT execution-permission violation caused by an attempt to
execution an instruction in the guest.

However, the current code would clear the execution permission in the EPT paging-
structure entries for small pages too when we clearing the the execution permission
for large pages. This would trigger extra EPT violation VM exits.

This patch fix this issue.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-03 13:46:49 +08:00
Li Fei1
97a9c5151b kv: kconfig: remove some unused ram size kconfig
SOS_RAM_SIZE/UOS_RAM_SIZE Kconfig are only used to calculate how many pages we
should reserve for the VM EPT mapping.

Now we reserve pages for each VM EPT pagetable mapping by the PLATFORM_RAM_SIZE
not the VM RAM SIZE. This could simplify the reserve logic for us: not need to
take care variable corner cases. We could make assume we reserve enough pages
base on the VM could not use the resources beyond the platform hardware resources.

So remove these two unused VM ram size kconfig.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1
0579e2ee24 hv: page: add free_page
Add free_page to free page when unmap pagetable.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1
8d9f12f3b7 hv: page: use dynamic page allocation for pagetable mapping
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.

However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.

This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Li Fei1
5621fabbcb hv: memory: remove get_sworld_memory_base API
memory_ops structure will be changed to store page table related fields.
However, secure world memory base address is not one of them, it's VM
related. So save sworld_memory_base_hva in vm_arch structure directly.

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
2021-03-01 13:10:04 +08:00
Victor Sun
26abc82f3c HV: panic on 0 address when do e820_alloc_memory
Current memory allocation algorithm is to find the available address from
the highest possible address below max_address. If the function returns 0,
means all memory is used up and we have to put the resource at address 0,
this is dangerous for a running hypervisor.

Also returns 0 would make code logic very complicated, since memcpy_s()
doesn't support address 0 copy.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-02-26 16:38:32 +08:00
Victor Sun
2e72bb97e7 HV: refine acpi rsdp initialize interface
In previous code, the rsdp initialization is done in get_rsdp() api implicitly.
The function is called multiple times in following acpi table parsing functions
and the condition (rsdp == NULL) need to be added in each parsing function.
This is not needed since the panic would occur if rsdp is NULL when do acpi
initialization.

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-02-26 16:38:32 +08:00
Yonghua Huang
fdfd28b140 hv: unmap software region of pre-RTVM from Service VM EPT
Accessing to software SRAM region is not allowed when
 software SRAM is pass-thru to prelaunch RTVM.

 This patch removes software SRAM region from service VM
 EPT if it is enabled for prelaunch RTVM.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-02-25 09:35:31 +08:00
Tao Yuhong
50d8525618 HV: deny HV owned PCI bar access from SOS
This patch denies Service VM the access permission to device resources
owned by hypervisor.
HV may own these devices: (1) debug uart pci device for debug version
(2) type 1 pci device if have pre-launched VMs.
Current implementation exposes the mmio/pio resource of HV owned devices
to SOS, should remove them from SOS.

Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-02-03 14:01:23 +08:00
Tao Yuhong
6e7ce4a73f HV: deny pre-launched VM ptdev bar access from SOS
This patch denies Service VM the access permission to device
resources owned by pre-launched VMs.
Rationale:
 * Pre-launched VMs in ACRN are independent of service VM,
   and should be immune to attacks from service VM. However,
   current implementation exposes the bar resource of passthru
   devices to service VM for some reason. This makes it possible
   for service VM to crash or attack pre-launched VMs.
 * It is same for hypervisor owned devices.

NOTE:
 * The MMIO spaces pre-allocated to VFs are still presented to
  Service VM. The SR-IOV capable devices assigned to pre-launched
  VMs doesn't have the SR-IOV capability. So the MMIO address spaces
  pre-allocated by BIOS for VFs are not decoded by hardware and
  couldn't be enabled by guest. SOS may live with seeing the address
  space or not. We will revisit later.

Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 14:01:23 +08:00
Shuo A Liu
d4aaf99d86 hv: keylocker: Support keylocker backup MSRs for Guest VM
The logical processor scoped IWKey can be copied to or from a
platform-scope storage copy called IWKeyBackup. Copying IWKey to
IWKeyBackup is called ‘backing up IWKey’ and copying from IWKeyBackup to
IWKey is called ‘restoring IWKey’.

IWKeyBackup and the path between it and IWKey are protected against
software and simple hardware attacks. This means that IWKeyBackup can be
used to distribute an IWKey within the logical processors in a platform
in a protected manner.

Linux keylocker implementation uses this feature, so they are
introduced by this patch.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
38cd5b481d hv: keylocker: host keylocker iwkey context switch
Different vCPU may have different IWKeys. Hypervisor need do the iwkey
context switch.

This patch introduce a load_iwkey() function to do that. Switches the
host iwkey when the switch_in vCPU satisfies:
  1) keylocker feature enabled
  2) Different from the current loaded one.

Two opportunities to do the load_iwkey():
  1) Guest enables CR4.KL bit.
  2) vCPU thread context switch.

load_iwkey() costs ~600 cycles when do the load IWKey action.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
c11c07e0fe hv: keylocker: Support Key Locker feature for guest VM
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm. These keys are more valuable than what they guard. If stolen
once, the key can be repeatedly used even on another system and even
after vulnerability closed.

It also introduces a CPU-internal wrapping key (IWKey), which is a key-
encryption key to wrap AES keys into handles. While the IWKey is
inaccessible to software, randomizing the value during the boot-time
helps its value unpredictable.

Keylocker usage:
 - New “ENCODEKEY” instructions take original key input and returns HANDLE
   crypted by an internal wrap key (IWKey, init by “LOADIWKEY” instruction)
 - Software can then delete the original key from memory
 - Early in boot/software, less likely to have vulnerability that allows
   stealing original key
 - Later encrypt/decrypt can use the HANDLE through new AES KeyLocker
   instructions
 - Note:
      * Software can use original key without knowing it (use HANDLE)
      * HANDLE cannot be used on other systems or after warm/cold reset
      * IWKey cannot be read from CPU after it's loaded (this is the
        nature of this feature) and only 1 copy of IWKey inside CPU.

The virtualization implementation of Key Locker on ACRN is:
 - Each vCPU has a 'struct iwkey' to store its IWKey in struct
   acrn_vcpu_arch.
 - At initilization, every vCPU is created with a random IWKey.
 - Hypervisor traps the execution of LOADIWKEY (by 'LOADIWKEY exiting'
   VM-exectuion control) of vCPU to capture and save the IWKey if guest
   set a new IWKey. Don't support randomization (emulate CPUID to
   disable) of the LOADIWKEY as hypervisor cannot capture and save the
   random IWKey. From keylocker spec:
   "Note that a VMM may wish to enumerate no support for HW random IWKeys
   to the guest (i.e. enumerate CPUID.19H:ECX[1] as 0) as such IWKeys
   cannot be easily context switched. A guest ENCODEKEY will return the
   type of IWKey used (IWKey.KeySource) and thus will notice if a VMM
   virtualized a HW random IWKey with a SW specified IWKey."
 - In context_switch_in() of each vCPU, hypervisor loads that vCPU's
   IWKey into pCPU by LOADIWKEY instruction.
 - There is an assumption that ACRN hypervisor will never use the
   KeyLocker feature itself.

This patch implements the vCPU's IWKey management and the next patch
implements host context save/restore IWKey logic.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
4483e93bd1 hv: keylocker: Enable the tertiary VM-execution controls
In order for a VMM to capture the IWKey values of guests, processors
that support Key Locker also support a new "LOADIWKEY exiting"
VM-execution control in bit 0 of the tertiary processor-based
VM-execution controls.

This patch enables the tertiary VM-execution controls.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
e9247dbca0 hv: keylocker: Simulate CPUID of keylocker caps for guest VM
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm.

This patch emulates Keylocker CPUID leaf 19H to support Keylocker
feature for guest VM.

To make the hypervisor being able to manage the IWKey correctly, this
patch doesn't expose hardware random IWKey capability
(CPUID.0x19.ECX[1]) to guest VM.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-02-03 13:54:45 +08:00
Shuo A Liu
15c967ad34 hv: keylocker: Add CR4 bit CR4_KL as CR4_TRAP_AND_PASSTHRU_BITS
Bit19 (CR4_KL) of CR4 is CPU KeyLocker feature enable bit. Hypervisor
traps the bit's writing to track the keylocker feature on/off of guest.
While the bit is set by guest,
 - set cr4_kl_enabled to indicate the vcpu's keylocker feature enabled status
 - load vcpu's IWKey in host (will add in later patch)
While the bit is clear by guest,
 - clear cr4_kl_enabled

This patch trap and passthru the CR4_KL bit to guest for operation.

Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-02-03 13:54:45 +08:00
Li Fei1
94a980c923 hv: hypercall: prevent sos can touch hv/pre-launched VM resource
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.

This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.

Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang  <yonghua.huang@intel.com>
2021-02-02 16:55:40 +08:00
Yonghua Huang
8bec63a6ea hv: remove the hardcoding of Software SRAM GPA base
Currently, we hardcode the GPA base of Software SRAM
 to an address that is derived from TGL platform,
 as this GPA is identical with HPA for Pre-launch VM,
 This hardcoded address may not work on other platforms
 if the HPA bases of Software SRAM are different.

 Now, Offline tool configures above GPA based on the
 detection of Software SRAM on specific platform.

 This patch removes the hardcoding GPA of Software SRAM,
 and also renames MACRO 'SOFTWARE_SRAM_BASE_GPA' to
 'PRE_RTVM_SW_SRAM_BASE_GPA' to avoid confusing, as it
 is for Prelaunch VM only.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-30 13:41:02 +08:00
Yonghua Huang
c9ca23d268 hv: refine RTCM initialization code
- RTCM is initialized in hypervisor only
   if RTCM binaries are detected.
 - Remove address space of RTCM binary from
   Software SRAM region.
 - Refine parse_rtct() function, validity of
   ACPI RTCT table shall be checked by caller.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang
a6e666dbe7 hv: remove hardcoding of SW SRAM HPA base
Physical address to SW SRAM region maybe different
 on different platforms, this hardcoded address may
 result in address mismatch for SW SRAM operations.

 This patch removes above hardcoded address and uses
 the physical address parsed from native RTCT.

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang
a6420e8cfa hv: cleanup legacy terminologies in RTCM module
This patch updates below terminologies according
 to the latest TCC Spec:
  PTCT -> RTCT
  PTCM -> RTCM
  pSRAM -> Software SRAM

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-01-28 11:29:25 +08:00
Yonghua Huang
806f479108 hv: rename RTCM source files
'ptcm' and 'ptct' are legacy name according
   to the latest TCC spec, hence rename below files
   to avoid confusing:

  ptcm.c -> rtcm.c
  ptcm.h -> rtcm.h
  ptct.h -> rtct.h

Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-01-28 11:29:25 +08:00
Liang Yi
e8a76868c9 hv: modularization: remove global variable efiloader_sig.
Simplify multiboot API by removing the global variable efiloader_sig.
Replaced by constant at the use site.

Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
67926cee81 hv: modularization: remove include/boot.h.
Remove include/boot.h since it contains only assembly variables that
should only be accessed in arch/x86/init.c.

Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
1de396363f hv: modularization: avoid dependency of multiboot on zeropage.h.
Split off definition of "struct efi_info" into a separate header
file lib/efi.h.

Tracked-On: #5661
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
681688fbe4 hv: modularization: change of multiboot API.
The init_multiboot_info() and sanitize_multiboot_ifno() APIs now
require parameters instead of implicitly relying on global boot
variables.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
66599e0aa7 hv: modularization: multiboot
Calling sanitize_multiboot() from init.c instead of cpu.c.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
c23e557a18 hv: modularization: make parse_hv_cmdline() an internal function.
This way, we void exposing acrn_mbi as a global variable.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Liang Yi
8f9ec59a53 hv: modularization: cleanup boot.h
Move multiboot specific declarations from boot.h to multiboot.h.

Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-01-27 15:59:47 +08:00
Jie Deng
5c5d272358 hv: remove bitmap_clear_lock of split-lock after completing emulation
When "signal_event" is called, "wait_event" will actually not block.
So it is ok to remove this line.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-13 15:32:27 +08:00
Yin Fengwei
ef411d4ac3 hv: ptirq: Shouldn't change sid if intx irq mapping was added
Now, we use hash table to maintain intx irq mapping by using
the key generated from sid. So once the entry is added,we can
not update source ide any more. Otherwise, we can't locate the
entry with the key generated from new source ide.

For source id change, remove_remapping/add_remapping is used
instead of update source id directly if entry was added already.

Tracked-On: #5640
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-01-12 15:23:44 +08:00
Jie Deng
8aebf5526f hv: move split-lock logic into dedicated file
This patch move the split-lock logic into dedicated file
to reduce LOC. This may make the logic more clear.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-08 17:37:20 +08:00
Jie Deng
27d5711b62 hv: add a cache register for VMX_PROC_VM_EXEC_CONTROLS
This patch adds a cache register for VMX_PROC_VM_EXEC_CONTROLS
to avoid the frequent VMCS access.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-08 17:37:20 +08:00
Jie Deng
f291997811 hv: split-lock: using MTF instead of TF(#DB)
The TF is visible to guest which may be modified by
the guest, so it is not a safe method to emulate the
split-lock. While MTF is specifically designed for
single-stepping in x86/Intel hardware virtualization
VT-x technology which is invisible to the guest. Use MTF
to single step the VCPU during the emulation of split lock.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-08 17:37:20 +08:00
Jie Deng
6852438e3a hv: Support concurrent split-lock emulation on SMP.
For a SMP guest, split-lock check may happen on
multiple vCPUs simultaneously. In this case, one
vCPU at most can be allowed running in the
split-lock emulation window. And if the vCPU is
doing the emulation, it should never be blocked
in the hypervisor, it should go back to the guest
to execute the lock instruction immediately and
trap back to the hypervisor with #DB to complete the
split-lock emulation.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-01-08 17:37:20 +08:00
Li Fei1
0b18389d95 hv: vcpuid: expose mce feature to guest
Windows64 seems only support processor which has MCE (Machine Check Error)
feature.

Tracked-On: #5638
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-01-08 17:22:34 +08:00
Jie Deng
b14c32a110 hv: Retain RIP only for fault exception.
We have trapped the #DB for split-lock emulation.
Only fault exception need RIP being retained.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-12-31 11:12:33 +08:00
Jie Deng
977e862192 hv: Add split-lock emulation for xchg
xchg may also cause the #AC for split-lock check.
This patch adds this emulation.

 1. Kick other vcpus of the guest to stop execution
    if the guest has more than one vcpu.

 2. Emulate the xchg instruction.

 3. Notify other vcpus (if any) to restart execution.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-12-31 11:12:33 +08:00
Jie Deng
47e193a7bb hv: Add split-lock emulation for LOCK prefix instruction
This patch adds the split-lock emulation.
If a #AC is caused by instruction with LOCK prefix then
emulate it, otherwise, inject it back as it used to be.

 1. Kick other vcpus of the guest to stop execution
    and set the TF flag to have #DB if the guest has more
    than one vcpu.

 2. Skip over the LOCK prefix and resume the current
    vcpu back to guest for execution.

 3. Notify other vcpus to restart exception at the end
    of handling the #DB since we have completed
    the LOCK prefix instruction emulation.

Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-12-31 11:12:33 +08:00
Yonghua Huang
643bbcfe34 hv: check the availability of guest CR4 features
Check hardware support for all features in CR4,
 and hide bits from guest by vcpuid if they're not supported
 for guests OS.

Tracked-On: #5586
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-12-18 11:21:22 +08:00
Yonghua Huang
442fc30117 hv: refine virtualization flow for cr0 and cr4
- The current code to virtualize CR0/CR4 is not
   well designed, and hard to read.
   This patch reshuffle the logic to make it clear
   and classify those bits into PASSTHRU,
   TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits.

Tracked-On: #5586
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2020-12-18 11:21:22 +08:00
Yonghua Huang
08c42f91c9 hv: rename hypercall for hv-emulated device management
Coding style cleanup, use add/remove instead of create/destroy.

Tracked-On: #5586
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2020-12-07 16:25:17 +08:00
Shiqing Gao
6f10bd00bf hv: coding style clean-up related to Boolean
While following two styles are both correct, the 2nd one is simpler.
	bool is_level_triggered;
	1. if (is_level_triggered == true) {...}
	2. if (is_level_triggered) {...}

This patch cleans up the style in hypervisor.

Tracked-On: #861
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
2020-11-28 14:51:32 +08:00
Junming Liu
1cd932e568 hv: refine code style
refine code style

Tracked-On: #4020

Signed-off-by: Junming Liu <junming.liu@intel.com>
2020-11-26 12:56:28 +08:00
Junming Liu
56eb859ea4 hv: vmexit: refine xsetbv_vmexit_handler API
From SDM Vol.2C - XSETBV instruction description,
If CR4.OSXSAVE[bit 18] = 0,
execute "XSETBV" instruction will generate #UD exception.

From SDM Vol.3C 25.1.1,#UD exception has priority over VM exits,
So if vCPU execute "XSETBV" instruction when CR4.OSXSAVE[bit 18] = 0,
VM exits won't happen.

While hv inject #GP if vCPU execute "XSETBV" instruction
when CR4.OSXSAVE[bit 18] = 0.
It's a wrong behavior, this patch will fix the bug.

Tracked-On: #4020

Signed-off-by: Junming Liu <junming.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-26 12:56:28 +08:00
Peter Fang
68dc8d9f8f hv: pm: avoid duplicate shutdowns on RTVM
It is possible for more than one vCPUs to trigger shutdown on an RTVM.
We need to avoid entering VM_READY_TO_POWEROFF state again after the
RTVM has been paused or shut down.

Also, make sure an RTVM enters VM_READY_TO_POWEROFF state before it can
be paused.

v1 -> v2:
- rename to poweroff_if_rt_vm for better clarity

Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
2020-11-11 14:05:39 +08:00
dongshen
ca5683f78d hv: add support for shutdown for pre-launched VMs
Currently, ACRN only support shutdown when triple fault happens, because ACRN
doesn't present/emulate a virtual HW, i.e. port IO, to support shutdown. This
patch emulate a virtual shutdown component, and the vACPI method for guest OS
to use.

Pre-launched VM uses ACPI reduced HW mode, intercept the virtual sleep control/status
registers for pre-launched VMs shutdown

Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2020-11-04 10:33:31 +08:00
dongshen
8f79ceefbd hv: fix out-of-date comments related to pre-launched VMs rebooting
Like post-launched VMs, for pre-launched VMs, the ACPI reset register
is also fixed at 0xcf9 and the reset value is 0xE, so pre-launched VMs
now also use ACPI reset register for rebooting.

Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2020-11-04 10:33:31 +08:00
Peter Fang
70b1218952 hv: pm: support shutting down multiple VMs when pCPUs are shared
More than one VM may request shutdown on the same pCPU before
shutdown_vm_from_idle() is called in the idle thread when pCPUs are
shared among VMs.

Use a per-pCPU bitmap to store all the VMIDs requesting shutdown.

v1 -> v2:
- use vm_lock to avoid a race on shutdown

Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-04 10:33:31 +08:00
Li Fei1
c6f9404f55 hv: psram: add kconfig to enable psram
Add two Kconfig pSRAM config:
one for whether to enable the pSRAM on the platfrom or not;
another for if the pSRAM is enabled on the platform whether to enable
the pSRAM in the pre-launched RTVM.
If we enable the pSRAM on the platform, we should remove the pSRAM EPT
mapping from the SOS to prevent it could flush the pSRAM cache.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
2020-11-02 15:56:30 +08:00
Qian Wang
99ee76781f hv: pSRAM: add pSRAM support for pre-launched RTVM
1.Modified the virtual e820 table for pre-launched VM. We added a
segment for pSRAM, and thus lowmem RAM is split into two parts.
Logics are added to deal with the split.
2.Added EPT mapping of pSRAM segment for pre-launched RTVM if it
uses pSRAM.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-02 15:56:30 +08:00
Qian Wang
a557105e71 hv: ept: set EPT cache attribute to WB for pSRAM
pSRAM memory should be cachable. However, it's not a RAM or a normal MMIO,
so we can't use the an exist API to do the EPT mapping and set the EPT cache
attribute to WB for it. Now we assume that SOS must assign the PSRAM area as
a whole and as a separate memory region whose base address is PSRAM_BASE_HPA.
If the hpa of the EPT mapping region is equal to PSRAM_BASE_HPA, we think this
EPT mapping is for pSRAM, we change the EPT mapping cache attribute to WB.

And fix a minor bug when SOS trap out to emulate wbinvd when pSRAM is enabled.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-02 15:56:30 +08:00
Qian Wang
ca2aee225c hv: skip pSRAM for guest WBINVD emulation
Use ept_flush_leaf_page to emulate guest WBINVD when PTCM is enabled and skip
the pSRAM in ept_flush_leaf_page.
TODO: do we need to emulate WBINVD in HV side.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-02 10:29:43 +08:00
Li Fei1
f3067f5385 hv: mmu: rename hv_access_memory_region_update to ppt_clear_user_bit
Rename hv_access_memory_region_update to ppt_clear_user_bit to
verb + object style.

Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-02 10:29:43 +08:00
Li Fei1
35abee60d6 hv: pSRAM: temporarily remove NX bit of PTCM binary
Temporarily remove NX bit of PTCM binary in pagetable during pSRAM
initialization:
1.added a function ppt_set_nx_bit to temporarily remove/restore the NX bit of
a given area in pagetable.
2.Temporarily remove NX bit of PTCM binary during pSRAM initialization to make
PTCM codes executable.
3. TODO: We may use SMP call to flush TLB and do pSRAM initilization on APs.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-02 10:29:43 +08:00
Li Fei1
5fa816f921 hv: pSRAM: add PTCT parsing code
The added parse_ptct function will parse native ACPI PTCT table to
acquire information like pSRAM location/size/level and PTCM location,
and save them.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
2020-11-02 10:29:43 +08:00
Li Fei1
80121b8347 hv: pSRAM: add pSRAM initialization codes
1.We added a function init_psram to initialize pSRAM as well as some definitions.
Both AP and BSP shall call init_psram to make sure pSRAM is initialized, which is
required by PTCM.

BSP:
  To parse PTCT and find the entry of PTCM command function, then call PTCM ABI.
AP:
  Wait until BSP has done the parsing work, then call the PTCM ABI.

Synchronization of AP and BSP is ensured, both inside and outside PTCM.

2. Added calls of init_psram in init_pcpu_post to initialize pSRAM in HV booting phase

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
2020-11-02 10:29:43 +08:00
Qian Wang
77269c15c5 hv: vcr: remove wbinvd for CR0.CD emulation
According 11.5.1 Cache Control Registers and Bits, Intel SDM Vol 3,
change CR0.CD will not flush cache to insure memory coherency. So
it's not needed to call wbinvd to flush cache in ACRN Hypervisor.
That's what the guest should do.

Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-11-02 10:29:43 +08:00
Tao Yuhong
996e8f680c HV: pci-vuart support create vdev hcall
Add cteate method for vmcs9900 vdev in hypercalls.

The destroy method of ivshmem is also suitable for other emulated vdev,
move it into hcall_destroy_vdev() for all emulated vdevs

Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
2020-10-30 20:41:34 +08:00
Tao Yuhong
4120bd391a HV: decouple legacy vuart interface from acrn_vuart layer
support pci-vuart type, and refine:
1.Rename init_vuart() to init_legacy_vuarts(), only init PIO type.
2.Rename deinit_vuart() to deinit_legacy_vuarts(), only deinit PIO type.
3.Move io handler code out of setup_vuart(), into init_legacy_vuarts()
4.add init_pci_vuart(), deinit_pci_vuart, for one pci vuart vdev.

and some change from requirement:
1.Increase MAX_VUART_NUM_PER_VM to 8.

Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
2020-10-30 20:41:34 +08:00
Yang, Yu-chu
8c78590da7 acrn-config: refactor pci_dev_c.py and insert vuart device information
- Refactor pci_dev_c.py to insert devices information per VMs
- Add function to get unused vbdf form bus:dev.func 00:00.0 to 00:1F.7

Add pci devices variables to vm_configurations.c
- To pass the pci vuart information form tool, add pci_dev_num and
pci_devs initialization by tool
- Change CONFIG_SOS_VM in hypervisor/include/arch/x86/vm_config.h to
compromise vm_configurations.c

Tracked-On: #5426
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
2020-10-30 20:24:28 +08:00
Zide Chen
a776ccca94 hv: don't need to save boot context
- Since de-privilege boot is removed, we no longer need to save boot
  context in boot time.
- cpu_primary_start_64 is not an entry for ACRN hypervisor any more,
  and can be removed.

Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2020-10-29 10:05:05 +08:00
Yonghua Huang
3ea1ae1e11 hv: refine msi interrupt injection functions
1. refine the prototype of 'inject_msi_lapic_pt()'
 2. rename below function:
    - rename 'vlapic_intr_msi()' to 'vlapic_inject_msi()'
    - rename 'inject_msi_lapic_pt()' to
      'inject_msi_for_lapic_pt()'
    - rename 'inject_msi_lapic_virt()' to
      'inject_msi_for_non_lapic_pt()'

Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-10-26 08:44:13 +08:00
Yonghua Huang
012927d0bd hv: move function 'inject_msi_lapic_pt()' to vlapic.c
This function can be used by other modules instead of hypercall
 handling only, hence move it to vlapic.c

Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-10-26 08:44:13 +08:00
Zide Chen
9f2b35507a hv: remove UEFI_OS_LOADER_NAME from Kconfig
Since UEFI boot is no longer supported.

Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
2020-10-21 15:09:26 +08:00
Zide Chen
bebffb29fc hv: remove de-privilege boot mode support and remove vboot wrappers
Now ACRN supports direct boot mode, which could be SBL/ABL, or GRUB boot.
Thus the vboot wrapper layer can be removed and the direct boot functions
don't need to be wrapped in direct_boot.c:

- remove call to init_vboot(), and call e820_alloc_memory() directly at the
  time when the trampoline buffer is actually needed.
- Similarly, call CPU_IRQ_ENABLE() instead of the wrapper init_vboot_irq().
- remove get_ap_trampoline_buf(), since the existing function
  get_trampoline_start16_paddr() returns the exact same value.
- merge init_general_vm_boot_info() into init_vm_boot_info().
- remove vm_sw_loader pointer, and call direct_boot_sw_loader() directly.
- move get_rsdp_ptr() from vboot_wrapper.c to multiboot.c, and remove the
  wrapper over two boot modes.

Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
2020-10-21 15:09:26 +08:00
Shuang Zheng
13d39fda85 hv: update hybrid_rt with 2 post-launched VMs in Kconfig
update the help message of config SCENARIO to set 2 standard
post-launched VMs for default hybrid_rt scenario in Kconfig.

Tracked-On: #5390

Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
2020-10-14 14:00:45 +08:00
Victor Sun
c63899fc81 HV: correct hpa calculation for pre-launched VM
The commit of da81a0041d
"HV: add e820 ACPI entry for pre-launched VM" introduced a issue that the
base_hpa and remaining_hpa_size are also calculated on the entry of 32bit
PCI hole which from 0x80000000 to 0xffffffff, which is incorrect;

Tracked-On: #5266

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-09-15 09:45:10 +08:00
Li Fei1
a2fd8c5a9d pci: mcfg: limit device bus numbers which could access by ECAM
Per PCI Firmware Specification Revision 3.0, 4.1.2. MCFG Table Description:
Memory Mapped Enhanced Configuration Space Base Address Allocation Structure
assign the Start Bus Number and the End Bus Number which could decoded by the
Host Bridge. We should not access the PCI device which bus number outside of
the range of [Start Bus Number, End Bus Number).
For ACRN,  we should:
1. Don't detect PCI device which bus number outside the range of
[Start Bus Number, End Bus Number) of MCFG ACPI Table.
2. Only trap the ECAM MMIO size: [MMCFG_BASE_ADDRESS, MMCFG_BASE_ADDRESS +
(End Bus Number - Start Bus Number + 1) * 0x100000) for SOS.

Tracked-On: #5233

Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-09 09:31:56 +08:00
Victor Sun
2c0bc146ce HV: remove deprecated vacpi build method
The old method of build pre-launched VM vacpi by HV source code is deprecated,
so remove related source code;

Tracked-On: #5266

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-08 19:52:25 +08:00
Victor Sun
34547e1e19 HV: add acpi module support for pre-launched VM
Previously we use a pre-defined structure as vACPI table for pre-launched
VM, the structure is initialized by HV code. Now change the method to use a
pre-loaded multiboot module instead. The module file will be generated by
acrn-config tool and loaded to GPA 0x7ff00000, a hardcoded RSDP table at
GPA 0x000f2400 will point to the XSDT table which at GPA 0x7ff00080;

Tracked-On: #5266

Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-08 19:52:25 +08:00
Victor Sun
da81a0041d HV: add e820 ACPI entry for pre-launched VM
Previously the ACPI table was stored in F segment which might not be big
enough for a customized ACPI table, hence reserve 1MB space in pre-launched
VM e820 table to store the ACPI related data:

	0x7ff00000 ~ 0x7ffeffff : ACPI Reclaim memory
	0x7fff0000 ~ 0x7fffffff : ACPI NVS memory

Tracked-On: #5266

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-08 19:52:25 +08:00
Victor Sun
0461ac209f HV: set CONFIG_HV_RAM_START as min addr when RELOC enabled
Previously the min load_addr for HV image is hard coded to 0x10000000 when
CONFIG_RELOC is enabled, now use CONFIG_HV_RAM_START as its prefer minimum
address like setting of CONFIG_PHYSICAL_START do in Linux kernel.

With this patch, we can offload the CONFIG_HV_RAM_START algorithm to
acrn-config or manually set it in scenario XML on some special boards.

Tracked-On: #5275

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-09-07 15:03:53 +08:00
Nishioka, Toshiki
77fb21e98c hv: add vgpio device model support
When HV pass through the P2SB MMIO device to pre-launched VM, vgpio
device model traps MMIO access to the GPIO registers within P2SB so
that it can expose virtual IOAPIC pins to the VM in accordance with
the programmed mappings between gsi and vgsi.

Tracked-On: #5246

Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-07 14:52:02 +08:00
Nishioka, Toshiki
ba99984f69 hv: add INTx mapping for pre-launched VMs
Add the capability of forwarding specified physical IOAPIC interrupt
lines to pre-launched VMs as virtual IOAPIC interrupts. This is for the
sake of the certain MMIO pass-thru devices on EHL CRB which can support
only INTx interrupts.

Tracked-On: #5245

Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-07 14:52:02 +08:00
Shuo A Liu
d6b9682581 hv: debug: Convert PCI UART paramter from a BDF string to a hex value
BDF string can be parsed by the configuration tool. A 16bit WORD value with
format (B:8, D:5, F:3) can be passed from configuration to the
hypervisor directly to save some BDF string parse code.

Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-01 15:13:53 +08:00
Yonghua Huang
c03623f3fb hv[v2]: Remove deprecated term in vPIC submodule
This patch cleanup below deprecated terms:
  'master' -> 'primary'
  'slave'  -> 'secondary'

v2 update:
      Refine comments.

Tracked-On: #5249
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2020-09-01 09:30:08 +08:00
Stanley Chang
d55813e80b hv: passthru DHRD-ignored device
When trying to passthru a DHRD-ignored PCI device,
iommu_attach_device shall report success. Otherwise,
the assign_vdev_pt_iommu_domain will result in HV panic.
Same for iommu_detach_device case.

Tracked-On: #5240
Signed-off-by: Stanley Chang <stanley.chang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-09-01 09:29:25 +08:00
Yuan Liu
8a34cf03ca hv: add new hypercalls to create and destroy an emulated device in hypervisor
Add HC_CREATE_VDEV and HC_DESTROY_VDEV two hypercalls that are used to
create and destroy an emulated device(PCI device or legacy device) in hypervisor

v3: 1) change HC_CREATE_DEVICE and HC_DESTROY_DEVICE to HC_CREATE_VDEV
       and HC_DESTROY_VDEV
    2) refine code style

v4: 1) remove unnecessary parameter
    2) add VM state check for HC_CREATE_VDEV and HC_DESTROY hypercalls

Tracked-On: #4853

Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-28 16:53:12 +08:00
Wei Liu
29ac258134 acrn-config: code refactoring for CAT/MBA
1.Modify clos_mask and mba_delay as a member of the union type.
2.Move HV_SUPPORTED_MAX_CLOS ,MAX_CACHE_CLOS_NUM_ENTRIES and
MAX_MBA_CLOS_NUM_ENTRIES to misc_cfg.h file.

Tracked-On: #5229
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2020-08-28 16:44:06 +08:00
dongshen
a425730f64 acrn-config: rename MAX_PLATFORM_CLOS_NUM to HV_SUPPORTED_MAX_CLOS
HV_SUPPORTED_MAX_CLOS:
 This value represents the maximum CLOS that is allowed by ACRN hypervisor.
 This value is set to be least common Max CLOS (CPUID.(EAX=0x10,ECX=ResID):EDX[15:0])
 among all supported RDT resources in the platform. In other words, it is
 min(maximum CLOS of L2, L3 and MBA). This is done in order to have consistent
 CLOS allocations between all the RDT resources.

Tracked-On: #5229
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2020-08-28 16:44:06 +08:00
Yin Fengwei
d0e06c4f80 hv: debug: Enable MMIO UART support
New board, EHL CRB, does not have legacy port IO UART. Even the PCI UART
are not work due to BIOS's bug workaround(the BARs on LPSS PCI are reset
after BIOS hand over control to OS). For ACRN console usage, expose the
debug UART via ACPI PnP device (access by MMIO) and add support in
hypervisor debug code.

Another special thing is that register width of UART of EHL CRB is
1byte. Introduce reg_width for each struct console_uart.

Tracked-On: #4937
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2020-08-27 13:31:17 +08:00
Mingqiang Chi
53b11d1048 refine hypercall
-- use an array to fast locate the hypercall handler
   to replace switch case.
-- uniform hypercall handler as below:
   int32_t (*handler)(sos_vm, target_vm, param1, param2)

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
2020-08-26 14:55:24 +08:00
Geoffroy Van Cutsem
f8883f43e9 hv: enhance help text for the scenario option in Kconfig
Enhance the help text that accompanies the CONFIG_SCENARIO symbol in Kconfig

Tracked-On: #5203
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
2020-08-26 08:51:50 +08:00
Yuan Liu
d6f563c4eb hv: implement ivshmem memory regions initialization
The ivshmem memory regions use the memory of the hypervisor and
they are continuous and page aligned.

this patch is used to initialize each memory region hpa.

v2: 1) if CONFIG_IVSHMEM_SHARED_MEMORY_ENABLED is not defined, the
       entire code of ivshmem will not be compiled.
    2) change ivshmem shared memory unit from byte to page to avoid
       misconfiguration.
    3) add ivshmem configuration and vm configuration references

v3: 1) change CONFIG_IVSHMEM_SHARED_MEMORY_ENABLED to CONFIG_IVSHMEM_ENABLED
    2) remove the ivshmem configuration sample, offline tool provides default
       ivshmem configuration.
    3) refine code style.

v4: 1) make ivshmem_base 2M aligned.

Tracked-On: #4853

Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-19 15:06:15 +08:00
Junming Liu
3631a85c3c hv:cpu-caps:refine is_apl_platform func and clean up duplicated code
Fix the bug for "is_apl_platform" func.
"monitor_cap_buggy" is identical to "is_apl_platform", so remove it.

On apl platform:
1) ACRN doesn't use monitor/mwait instructions
2) ACRN disable GPU IOMMU

Tracked-On:#3675

Signed-off-by: Junming Liu <junming.liu@intel.com>
2020-08-14 10:08:50 +08:00
liujunming
538e7cf74d hv:cpu-caps:refine processor family and model info
v3 -> v4:
Refine commit message and code stype

1.
SDM Vol. 2A 3-211 states DisplayFamily = Extended_Family_ID + Family_ID
when Family_ID == 0FH.
So it should be family += ((eax >> 20U) & 0xffU) when Family_ID == 0FH.

2.
IF (Family_ID = 06H or Family_ID = 0FH)
	THEN DisplayModel = (Extended_Model_ID « 4) + Model_ID;
While previous code this logic:
IF (DisplayFamily = 06H or DisplayFamily = 0FH)

Fix the bug about calculation of display family and
display model according to SDM definition.

3. use variable name to distinguish Family ID/Display Family/Model ID/Display Model,
then the code is more clear to avoid some mistake

Tracked-On:#3675

Signed-off-by: liujunming <junming.liu@intel.com>
Reviewed-by: Wu Xiangyang <xiangyang.wu@linux.intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-14 10:08:50 +08:00
Victor Sun
b5dfe369da HV: move vm configuration check to pre-build time
This patch will move the VM configuration check to pre-build stage,
a test program will do the check for pre-defined VM configuration
data before making hypervisor binary. If test failed, the make
process will be aborted. So once the hypervisor binary is built
successfully or start to run, it means the VM configuration has
been sanitized.

The patch did not add any new VM configuration check function,
it just port the original sanitize_vm_config() function from cpu.c
to static_checks.c with below change:
  1. remove runtime rdt detection for clos check;
  2. replace pr_err() from logmsg.h with printf() from stdio.h;
  3. replace runtime call get_pcpu_nums() in ALL_CPUS_MASK macro
     with static defined MAX_PCPU_NUM;
  4. remove cpu_affinity check since pre-launched VM might share
     pcpu with SOS VM;

The BOARD/SCENARIO parameter check and configuration folder check is
also moved to prebuild Makefile.

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-12 10:21:17 +08:00
Victor Sun
8245145317 HV: remove sanitize_vm_config function
Remove function of sanitize_vm_config() since the processing of sanitizing
will be moved to pre-build process.

When hypervisor has booted, we assume all VM configurations is sanitized;

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-12 10:21:17 +08:00
Mingqiang Chi
a67a85c70d hv:refine vm & vcpu lock
-- move vm_state_lock to other place in vm structure
   to avoid the memory waste because of the page-aligned.
-- remove the memset from create_vm
-- explicitly set max_emul_mmio_regions and vcpuid_entry_nr to 0
   inside create_vm to avoid use without initialization.
-- rename max_emul_mmio_regions to nr_emul_mmio_regions
v1->v2:
   add deinit_emul_io in shutdown_vm

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-05 13:39:28 +08:00
Victor Sun
b9ad04d24d HV: add cpu affinity info for SOS VM
Previously the CPU affinity of SOS VM is initialized at runtime during
sanitize_vm_config() stage, follow the policy that all physical CPUs
except ocuppied by Pre-launched VMs are all belong to SOS_VM. Now change
the process that SOS CPU affinity should be initialized at build time
and has the assumption that its validity is guarenteed before runtime.

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-04 09:05:29 +08:00
Victor Sun
3acafd140f HV: Make: simplify acpi info header file check
Previously we have complicated check mechanism on platform_acpi_info.h which
is supposed to be generated by acrn-config tool, but given the reality that
all configurations should be generated by acrn-config before build acrn
hypervisor, this check is not needed anymore.

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-04 09:05:29 +08:00
Victor Sun
07ad37f436 HV: Make: remove sdc scenario build support
The SDC scenario configurations will not be validated so remove it from
build makefile;

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-08-04 09:05:29 +08:00
Victor Sun
62c87856ce HV: remove deprecated old layout configuration source
The old layout configuration source which located in:
hypervisor/arch/x86/configs/ is abandoned, remove it;

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
2020-07-24 16:16:06 +08:00
Victor Sun
a57a4fd7fb HV: Make: enable build for new configs layout
The make command is same as old configs layout:

under acrn-hypervisor folder:
	make hypervisor BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]

under hypervisor folder:
	make BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]

if BOARD/SCENARIO parameter is not specified, the default will be:
	BOARD=nuc7i7dnb SCENARIO=industry

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-24 16:16:06 +08:00
Victor Sun
e792fa3d3c HV: nuc7i7dnb example of new VM configuratons layout
There are 3 kinds of configurations in ACRN hypervisor source code: hypervisor
overall setting, per-board setting and scenario specific per-VM setting.
Currently Kconfig act as hypervisor overall setting and its souce is located at
"hypervisor/arch/x86/configs/$(BOARD).config"; Per-board configs are located at
"hypervisor/arch/x86/configs/$(BOARD)" folder; scenario specific per-VM configs
are located at "hypervisor/scenarios/$(SCENARIO)" folder.

This layout brings issues that board configs and VM configs are coupled tightly.
The board specific Kconfig file and misc_cfg.h are shared by all scenarios, and
scenario specific pci_dev.c is shared by all boards. So the user have no way to
build hypervisor binary for different scenario on different board with one
source code repo.

The patch will setup a new VM configurations layout as below:

  misc/vm_configs
  ├── boards                         --> folder of supported boards
  │   ├── <board_1>                  --> scenario-irrelevant board configs
  │   │   ├── board.c                --> C file of board configs
  │   │   ├── board_info.h           --> H file of board info
  │   │   ├── pci_devices.h          --> pBDF of PCI devices
  │   │   └── platform_acpi_info.h   --> native ACPI info
  │   ├── <board_2>
  │   ├── <board_3>
  │   └── <board...>
  └── scenarios                      --> folder of supported scenarios
      ├── <scenario_1>               --> scenario specific VM configs
      │   ├── <board_1>              --> board specific VM configs for <scenario_1>
      │   │   ├── <board_1>.config   --> Kconfig for specific scenario on specific board
      │   │   ├── misc_cfg.h         --> H file of board specific VM configs
      │   │   ├── pci_dev.c          --> board specific VM pci devices list
      │   │   └── vbar_base.h        --> vBAR base info of VM PT pci devices
      │   ├── <board_2>
      │   ├── <board_3>
      │   ├── <board...>
      │   ├── vm_configurations.c    --> C file of scenario specific VM configs
      │   └── vm_configurations.h    --> H file of scenario specific VM configs
      ├── <scenario_2>
      ├── <scenario_3>
      └── <scenario...>

The new layout would decouple board configs and VM configs completely:

The boards folder stores kinds of supported boards info, each board folder
stores scenario-irrelevant board configs only, which could be totally got from
a physical platform and works for all scenarios;

The scenarios folder stores VM configs of kinds of working scenario. In each
scenario folder, besides the generic scenario specific VM configs, the board
specific VM configs would be put in a embedded board folder.

In new layout, all configs files will be removed out of hypervisor folder and
moved to a separate folder. This would make hypervisor LoC calculation more
precisely with below fomula:
	typical LoC = Loc(hypervisor) + Loc(one vm_configs)
which
	Loc(one vm_configs) = Loc(misc/vm_configs/boards/<board>)
		+ LoC(misc/vm_configs/scenarios/<scenario>/<board>)
		+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.c
		+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.h

Tracked-On: #5077

Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-24 16:16:06 +08:00
Shuo A Liu
112f02851c hv: Disable XSAVE-managed CET state of guest VM
To hide CET feature from guest VM completely, the MSR IA32_MSR_XSS also
need to be intercepted because it comprises CET_U and CET_S feature bits
of xsave/xstors operations. Mask these two bits in IA32_MSR_XSS writing.

With IA32_MSR_XSS interception, member 'xss' of 'struct ext_context' can
be removed because it is duplicated with the MSR store array
'vcpu->arch.guest_msrs[]'.

Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2020-07-23 20:15:57 +08:00
Shuo A Liu
ac598b0856 hv: Hide CET feature from guest VM
Return-oriented programming (ROP), and similarly CALL/JMP-oriented
programming (COP/JOP), have been the prevalent attack methodologies for
stealth exploit writers targeting vulnerabilities in programs.

CET (Control-flow Enforcement Technology) provides the following
capabilities to defend against ROP/COP/JOP style control-flow subversion
attacks:
 * Shadow stack: Return address protection to defend against ROP.
 * Indirect branch tracking: Free branch protection to defend against
   COP/JOP

The full support of CET for Linux kernel has not been merged yet. As the
first stage, hide CET from guest VM.

Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2020-07-23 20:15:57 +08:00
Li Fei1
5e605e0daf hv: vmcall: check vm id in dispatch_sos_hypercall
Check whether vm_id is valid in dispatch_sos_hypercall

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
1859727abc hv: vapci: add tpm2 support for pre-launched vm
On WHL platform, we need to pass through TPM to Secure pre-launched VM. In order
to do this, we need to add TPM2 ACPI Table and add TPM DSDT ACPI table to include
the _CRS.

Now we only support the TPM 2.0 device (TPM 1.2 device is not support). Besides,
the TPM must use Start Method 7 (Uses the Command Response Buffer Interface)
to notify the TPM 2.0 device that a command is available for processing.

Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
7971f34344 hv: vapci: refine acpi table header initialization
Using ACPI_TABLE_HEADER MACRO to initial the ACPI Table Header.

Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
acc69007e2 hv: mmio_dev: add mmio device pass through support
Add mmio device pass through support for pre-launched VM.
When we pass through a MMIO device to pre-launched VM, we would remove its
resource from the SOS. Now these resources only include the MMIO regions.

Tracked-On: #5053
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Li Fei1
baf77a79ad hv: mmio_dev: add hypercall to support mmio device pass through
Add two hypercalls to support MMIO device pass through for post-launched VM.
And when we support MMIO pass through for pre-launched VM, we could re-use
the code in mmio_dev.c

Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-07-23 20:13:20 +08:00
Conghui Chen
821c65b40c hv: fix possible SSE region mismatch issue
During context switch in hypervisor, xsave/xrstore are used to
save/resotre the XSAVE area according to the XCR0 and XSS. The legacy
region in XSAVE area include FPU and SSE, we should make sure the
legacy region be saved during contex switch. FPU in XCR0 is always
enabled according to SDM.
For SSE, we enable it in XCR0 during context switch.

Tracked-On: #5062
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-22 14:19:21 +08:00
Conghui Chen
53d4a7169b hv: remove kick_thread from scheduler module
kick_thread function is only used by kick_vcpu to kick vcpu out of
non-root mode, the implementation in it is sending IPI to target CPU if
target obj is running and target PCPU is not current one; while for
runnable obj, it will just make reschedule request. So the kick_thread
is not actually belong to scheduler module, we can drop it and just do
the cpu notification in kick_vcpu.

Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-22 13:38:41 +08:00
Conghui Chen
b6422f8985 hv: remove 'running' from vcpu structure
vcpu->running is duplicated with THREAD_STS_RUNNING status of thread
object. Introduce an API sleep_thread_sync(), which can utilize the
inner status of thread object, to do the sync sleep for zombie_vcpu().

Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-22 13:38:41 +08:00
Mingqiang Chi
aa89eb3541 hv:add per-vm lock for vm & vcpu state change
-- replace global hypercall lock with per-vm lock
-- add spinlock protection for vm & vcpu state change

v1-->v2:
   change get_vm_lock/put_vm_lock parameter from vm_id to vm
   move lock obtain before vm state check
   move all lock from vmcall.c to hypercall.c

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-20 11:22:17 +08:00
Yin Fengwei
fcec5a94be kconfig: extend the max msix table number to 64
There are some devices (like Samsung NVMe SSD SM981/PM981 which has 33 MSIX tables)
which have more than 16 MSIX tables. Extend the default value to 64 to handle them.

Tracked-On: #4994
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
2020-07-10 19:39:11 +08:00
Li Fei1
80c7da8f1c hv: vioapic: expose ioapic to guest unconditionally
Some OSes assume the platform must have the IOAPIC. For example:
Linux Kernel allocates IRQ force from GSI (0 if there's no PIC and IOAPIC) on x86.
And it thinks IRQ 0 is an architecture special IRQ, not for device driver. As a
result, the device driver may goes wrong if the allocated IRQ is 0 for RTVM.

This patch expose vIOAPIC to RTVM with LAPIC passthru even though the RTVM can't
use IOAPIC, it servers as a place holder to fullfil the guest assumption.

After vIOAPIC has exposed to guest unconditionally, the 'ready' field could be
removed since we do vIOAPIC initialization for each guest.

Tracked-On: #4691
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-07-10 19:33:46 +08:00
Mingqiang Chi
7751c7933d hv:unify spin_lock initialization
will follow this convention for spin lock initialization:
-- for simple global variable locks, use this style:
   static spinlock_t xxx_spinlock = {.head = 0U, .tail = 0U,}
-- for the locks inside a data structure, need to call
   spinlock_init to initialize.

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-07-02 09:40:29 +08:00
Shuo A Liu
025df6d44c hv: use SELF IPI Register for self IPI in X2APIC mode
According to SDM 10.12.11, we can know this register is dedicated to the
purpose of sending self-IPIs with the intent of enabling a highly
optimized path for sending self-IPIs. Also sending the IPI via the Self
Interrupt Register ensures that interrupt is delivered to the processor
core. Specifically completion of the WRMSR instruction to the SELF IPI
register implies that the interrupt has been logged into the IRR.

Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-28 10:33:22 +08:00
Shuo A Liu
0397cb7174 hv: Fix the interrupts lost issue with PI support
Currently, not all platforms support posted interrupt processing of both
VT-x and VT-d. On EHL, VT-d doesn't support posted interrupt processing.
So in such scenario, is_pi_capable() in vcpu_handle_pi_notification()
will bypass the PIR pending bits check which might cause a self-NV-IPI
lost.

With commit "bf1ff8c98 (hv: Offload syncing PIR to vIRR to processor
hardware)", the syncing PIR to vIRR is postponed and it is handled by a
self-NV-IPI in the following VMEnter. The process looks like,
a) vcpu A accepts a virtual interrupt ->
   1) ACRN_REQUEST_EVENT is set
   2) corresponding bit in PIR is set
   3) Posted Interrupt ON bit is set
b) vcpu A does virtual interrupt injection on resume path due to
   the pending ACRN_REQUEST_EVENT ->
   1) hypervisor disables host interrupt
   2) ACRN_REQUEST_EVENT is cleared
   3) a self-NV-IPI is sent via ICR of LAPIC.
   4) IRR bit of the self-NV-IPI is set
c) (VM-ENTRY) vcpu A returns into non-root mode
   1) host interrupt enable(by HW)
   2) posted interrupt processing clears the ON bit, sync PIR to vIRR
   3) deliver the virtual interrupt if guest rflags.IF=1
d) (VM-EXIT) vcpu A traps due to a instruction execution (e.g. HLT)
   1) host interrupt disable(by HW)
   2) hypervisor enable host interrupt

Above illustrates a normal process of the virtual interrupt injection
with cpu PI support. However, a failing case is observed. The failing
case is that the self-NV-IPI from b-3 is not accepted by the core until
a timing between d-1 and d-2. b-4 happening between d-1 and d-2 is
observed by debug trace. So the self-NV-IPI will be handled in root-mode
which cannot do the syncing PIR to vIRR processing. Due to the bug
described in the first paragraph, vcpu_handle_pi_notification() cannot
succeed the virtual interrupt injection request. This patch fix it by
removing the wrong check in vcpu_handle_pi_notification() because
vcpu_handle_pi_notification() only happens on platform with cpu PI
support.

Here are some cost data for sending IPI via LAPIC ICR regsiter.
Normally, the cycles between ICR write and IRR got set is 140~260,
which is not accurate due to the MSR read overhead.
And from b-3 to c is about 560 cycles. So b-4 happens during this
period. But in bad case, b-4 doesn't happen even c is triggered.
The worse case i captured is that ICR write and IRR got set costs more
than 1900 cycles. Now, the best GUESS of the huge cost of IPI via ICR is
the ACPI bus arbitration(refer to SDM 10.6.3, 10.7 and Figure 10-17).

Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-28 10:33:22 +08:00
Li Fei1
da7c2ba3e9 hv: ept: wrap a function to do guest ept flush
Wrap a function to do guest ept flush. This function doesn't do real EPT flush.
It just make the EPT flush request and do the real flush just before vcpu vmenter.

Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2020-06-22 16:25:03 +08:00
Mingqiang Chi
1b84741a56 rename vm_lock/vlapic_state in VM structure
rename:
   vlapic_state-->vlapic_mode
   vm_lock -->  vlapic_mode_lock
   check_vm_vlapic_state --> check_vm_vlapic_mode

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
Mingqiang Chi
d808031a04 remove spin lock for micro code update
remove spin lock for micro code update since the guest
operating system will take lock

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
Mingqiang Chi
ac65898f35 cleanup spin lock in vtd.c
move dm_unit->lock into dmar_issue_qi_request

Tracked-On: #4958

Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
Mingqiang Chi
67a7c355ec cleanup spin lock in irq.c
-- move exception_lock to dump.c
-- optimize the lock usage in request_irq

Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2020-06-19 16:13:20 +08:00
yuhong.tao@intel.com
aff687896b HV: Fix split-locked access detection is disabled by default
The commit 'HV: Config Splitlock Detection to be disable' allows
using CONFIG_ENFORCE_TURNOFF_AC to turn off splitlock #AC. If
CONFIG_ENFORCE_TURNOFF_AC is not set, splitlock #AC should be turn on

Tracked-On: #4962
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2020-06-19 09:22:58 +08:00
Conghui Chen
2a4c59db74 hv: add check for BASIC VMX INFORMATION
Check bit 48 in IA32_VMX_BASIC MSR, if it is 1, return error, as we only
support Intel 64 architecture.

SDM:
Appendix A.1 BASIC VMX INFORMATION

Bit 48 indicates the width of the physical addresses that may be used for the
VMXON region, each VMCS, anddata structures referenced by pointers in a
VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If
the bit is 0, these addresses are limited to the processor’s
physical-address width.2 If the bit is 1, these addresses are limited to
32 bits. This bit is always 0 for processors that support Intel 64
architecture.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Conghui Chen
906284eec8 hv: remove unnecessary debug symbols
remove unnecessary debug symbols.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Conghui Chen
f4292752b0 hv: remove check for OSXSAVE in host
We always assume the physical platform has XSAVE, and we always enable
XSAVE at the beginning, so, no need to check the OXSAVE in host.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Conghui Chen
53f74f18ac hv: remove repeated assignment
remove repeated assignment for vmcs_pa.

Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
2020-06-18 14:05:56 +08:00
Victor Sun
ca9e98cc74 HV: add board and scenario info in log
As build variants for different board and different scenario growing, users
might make mistake on HV binary distributions. Checking board/scenario info
from log would be the fastest way to know whether the binary matches. Also
it would be of benifit to developers for confirming the correct binary they
are debugging.

Tracked-On: #4946

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-18 13:05:42 +08:00
Binbin Wu
da1788c9a3 hv: vtd: add an API to reserve continuous irtes
dmar_reserve_irte is added to reserve N coutinuous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.

The reserved IRTEs will not be freed.

Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-16 08:52:56 +08:00
Binbin Wu
7bfcc673a6 hv: ptirq: associate an irte with ptirq_remapping_info entry
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.

Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-16 08:52:56 +08:00
Binbin Wu
2fe4280cfa hv: vtd: add two paramters for dmar_assign_irte
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be resued;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.

idx_out:
This paramter return the actual index of IRTE used. The caller need to
check whether the return value is valid or not.

Also this patch adds an internal function alloc_irte.
The function takes count as input paramter to allocate continuous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.

Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2020-06-16 08:52:56 +08:00