Commit Graph

2099 Commits

Author SHA1 Message Date
Yonghua Huang
1a6ead9af5 hv: update RTCT ACPI table detecting
Signature of RTCT ACPI table maybe "PTCT"(v1) or "RTCT"(v2).
 and the MAGIC number in CRL header is also changed from "PTCM"
 to "RTCM".

 This patch refine the code to detect RTCT table for both
 v1 and v2.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
2021-06-01 08:22:20 +08:00
Li Fei1
0d5f12e281 hv: vlapic: a minor refine about vlapic_x2apic_pt_icr_access
In physical destination mode, the destination processor is specified by its
local APIC ID. When a CPU switch xAPIC Mode to x2APIC Mode or vice versa,
the local APIC ID is not changed. So a vcpu in x2APIC Mode could use physical
Destination Mode to send an IPI to another vcpu in xAPIC Mode by writing ICR.

This patch adds support for a vCPU A could write ICR to send IPI to another
vCPU B which is in different APIC mode.

Tracked-On: #5923
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-27 09:00:08 +08:00
dongshen
5e3c6ae941 hv: vcpuid: passthrough host CPUID leaf.0BH to guest VMs
Using physical APIC IDs as vLAPIC IDs for pre-Launched and post-launched VMs
is not sufficient to replicate the host CPU and cache topologies in guest VMs,
we also need to passthrough host CPUID leaf.0BH to guest VMs, otherwise,
guest VMs may see weird CPU topology.

Note that in current code, ACRN has already passthroughed host cache CPUID
leaf 04H to guest VMs

Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-26 11:23:06 +08:00
dongshen
f332ef15b2 hv: vlapic: use physical APIC IDs as vLAPIC IDs for pre-launched and post-launched VMs
In current code, ACRN uses physical APIC IDs as vLAPIC IDs for SOS,
and vCPU ids (contiguous) as vLAPIC IDs for pre-Launched and post-Launched VMs.

Using vCPU ids as vLAPIC IDs for pre-Launched and post-Launched VMs
would result in wrong CPU and cache topologies showing in the guest VMs,
and could adversely affect performance if the guest VM chooses to detect
CPU and cache topologies and optimize its behavior accordingly.

Uses physical APIC IDs as vLAPIC IDs (and related CPU/cache topology enumeration
CPUIDs passthrough) will replicate the host CPU and cache topologies in pre-Launched
and post-Launched VMs.

Tracked-On: #6020
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-26 11:23:06 +08:00
Zide Chen
6428ca8f5b hv: VMPTRLD and VMCLEAR VMCS with the common APIs
Remove the direct calls to exec_vmptrld() or exec_vmclear(), and replace
with the wrapper APIs load_va_vmcs() and clear_va_vmcs().

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-26 11:22:26 +08:00
Zide Chen
6d69058a9d hv: nested: support for VMREAD and VMWRITE emulation
This patch implements the VMREAD and VMWRITE instructions.

When L1 guest is running with an active VMCS12, the “VMCS shadowing”
VM-execution control is always set to 1 in VMCS01. Thus the possible
behavior of VMREAD or VMWRITE from L1 could be:

- It causes a VM exit to L0 if the bit corresponds to the target VMCS
  field in the VMREAD bitmap or VMWRITE bitmap is set to 1.
- It accesses the VMCS referenced by VMCS01 link pointer (VMCS02 in
  our case) if the above mentioned bit is set to 0.

This patch handles the VMREAD and VMWRITE VM exits in this way:

- on VMWRITE, it writes the desired VMCS value to the respective field
  in the cached VMCS12. For VMCS fields that need to be synced to VMCS02,
  sets the corresponding dirty flag.

- on VMREAD, it reads the desired VMCS value from the cached VMCS12.

Tracked-On: #5923
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
2bd269c11c hv: nested: support for VMCLEAR emulation
This patch is to emulate VMCLEAR instruction.

L1 hypervisor issues VMCLEAR on a VMCS12 whose state could be any of
these: active and current, active but not current, not yet VMPTRLDed.

To emulate the VMCLEAR instruction, ACRN sets the VMCS12 launch state to
"clear", and if L0 already cached this VMCS12, need to sync it back to
guest memory:

- sync shadow fields from shadow VMCS VMCS to cache VMCS12
- copy cache VMCS12 to L1 guest memory

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
5379b14108 hv: nested: define VMCS shadow fields
Enable VMCS shadowing for most of the VMCS fields, so that execution of
the VMREAD or VMWRITE on these shadow VMCS fields from L1 hypervisor
won't cause VM exits, but read from or write to the shadow VMCS.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
863e58e539 hv: nested: define software layout for VMCS12 and helper functions
Software layout of VMCS12 data is a contract between L1 guest and L0
hypervisor to run a L2 guest.

ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor
by the VMPTRLD instructin. At the time of VMCLEAR, ACRN syncs the cached
VMCS12 back to L1 guest memory.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
f5744174b5 hv: nested: support for VMPTRLD emulation
This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches
the VMCS12 that is passed down from the VMPTRLD instruction, and merges it
with VMCS01 to create VMCS02 to run the nested VM.

- Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to
  flushes active but not current VMCS12s to L1 guest.
- ACRN creates VMCS02 to run nested VM based on VMCS12:
  1) copy VMCS12 from guest memory to the per vCPU cache VMCS12
  2) initialize VMCS02 revision ID and host-state area
  3) load shadow fields from cache VMCS12 to VMCS02
  4) enable VMCS shadowing before L1 Vm entry

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
0a1ac2f4a0 hv: nested: support for VMXOFF emulation
This patch implements the VMXOFF instruction. By issuing VMXOFF,
L1 guest Leaves VMX Operation.

- cleanup VCPU nested virtualization context states in VMXOFF handler.
- implement check_vmx_permission() to check permission for VMX operation
  for VMXOFF and other VMX instructions.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
3fdad3c6d1 hv: nested: check prerequisites to enter VMX operation
According to VMXON Instruction Reference, do the following checks in the
virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID
in VMXON region, etc.

Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD
exception if L1 hypervisor is not running in 64-bit mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Zide Chen
fc8f07e740 hv: nested: support for VMXON emulation
This patch emulates VMXON instruction. Basically checks some
prerequisites to enable VMX operation on L1 guest (next patch), and
prepares some virtual hardware environment in L0.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-05-24 10:34:01 +08:00
Tao Yuhong
b93d6b2ef0 HV: Fix mistake use stac() & clac()
The commit 2ab70f43e5
HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV

It is wrong in using stac()/clac()

Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-05-24 10:32:54 +08:00
Li Fei1
6077152c4b hv: vlapic: extend vlapic_x2apic_pt_icr_access to support more destination mode
Now guest would use `Destination Shorthand` to broadcast IPIs if there're more
than one destination. However, it is not supported when the guest is in LAPIC
passthru situation, and all active VCPUs are working in X2APIC mode. As a result,
the guest would not work properly since this kind broadcast IPIs was ignored
by ACRN. What's worse, ACRN Hypervisor would inject GP to the guest in this case.

This patch extend vlapic_x2apic_pt_icr_access to support more destination modes
(both `Physical` and `Logical`) and destination shorthand (`No Shorthand`, `Self`,
`All Including Self` and `All Excluding Self`).

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Li Fei1
a69e67b58b hv: vlapic: wrap a function to calculate destination vcpu mask by shorthand
1. Rename vlapic_calc_dest to vlapic_calc_dest_noshort
2. Remove vlapic_calc_dest_lapic_pt, use vlapic_calc_dest_noshort instead
3. Wrap vlapic_calc_dest to calculate destination vcpu mask according shorthand

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-24 10:27:32 +08:00
Tao Yuhong
2ab70f43e5 HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV
The accrss right of HV RAM can be changed to PAGE_USER (eg. trusty RAM
of post-launched VM). So before using clflush(or clflushopt) to flush
HV RAM cache, must allow explicit supervisor-mode data accesses to
user-mode pages. Otherwise, it may trigger page fault.

Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
2021-05-21 09:20:46 +08:00
Liang Yi
6805510d77 hv/mod_timer: refine timer interface
1. do not allow external modules to touch internal field of a timer.
2. make timer mode internal, period_in_ticks will decide the mode.

API wise:
1. the "mode" parameter was taken out of initialize_timer().
2. a new function update_timer() was added to update the timeout and
   period fields.
3. the timer_expired() function was extended with an output parameter
   to return the remaining cycles before expiration.

Also, the "fire_tsc" field name of hv_timer was renamed to "timeout".
With the new API, however, this change should not concern user code.

Tracked-On: #5920

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
3547c9cd23 hv/mod_timer: make timer into an arch-independent module
x86/timer.[ch] was moved to the common directory largely unchanged.

x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.

Other than these two functions, the timer module is largely arch
agnostic.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
51204a8d11 hv/mod_timer: separate delay functions from the timer module
Modules that use udelay() should include "delay.h" explicitly.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-18 16:43:28 +08:00
Liang Yi
5a2b89b0a4 hv/mod_timer: split tsc handling code from timer.
Generalize and split basic cpu cycle/tick routines from x86/timer:
- Instead of rdstc(), use cpu_ticks() in generic code.
- Instead of get_tsc_khz(), use cpu_tickrate() in generic code.
- Include "common/ticks.h" instead of "x86/timer.h" in generic code.
- CYCLES_PER_MS is renamed to TICKS_PER_MS.

The x86 specific API rdstc() and get_tsc_khz(), as well as TSC_PER_MS
are still available in arch/x86/tsc.h but only for x86 specific usage.

Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Signed-off-by: Yi Liang <yi.liang@intel.com>
2021-05-18 16:43:28 +08:00
Yonghua Huang
00b3a28d5d hv: update RTCT parser to support RTCT version 2
RTCT has been updated to version 2,
  this patch updates hypervisor RTCT parser to support
  both version 1 and version 2 of RTCT.

Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
2021-05-17 17:19:11 +08:00
Yonghua Huang
9facbb43b3 config-tool: rename PSRARM to SSRAM
'psram' and 'PSRAM' are legacy names and replaced
  with 'ssram' and 'SSRAM' respectively.

Tracked-On: #6012
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Shuang Zheng <shuang.zheng@intel.com>
2021-05-17 14:31:42 +08:00
Zide Chen
c9982e8c7e hv: nested: setup emulated VMX MSRs
We emulated these MSRs:

- MSR_IA32_VMX_PINBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS
- MSR_IA32_VMX_PROCBASED_CTLS2
- MSR_IA32_VMX_EXIT_CTLS
- MSR_IA32_VMX_ENTRY_CTLS
- MSR_IA32_VMX_BASIC: emulate VMCS revision ID, etc.
- MSR_IA32_VMX_MISC

For the following MSRs, we pass through the physical value to L1 guests:

- MSR_IA32_VMX_EPT_VPID_CAP
- MSR_IA32_VMX_VMCS_ENUM
- MSR_IA32_VMX_CR0_FIXED0
- MSR_IA32_VMX_CR0_FIXED1
- MSR_IA32_VMX_CR4_FIXED0
- MSR_IA32_VMX_CR4_FIXED1

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
4930992118 hv: nested: implement the framework for VMX MSR emulation
Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to
L1 guests if nested virtualization is enabled.
- If CONFIG_NVMX_ENABLED is set, these MSRs are included in
  emulated_guest_msrs[].
- otherwise, they are included in unsupported_msrs[].

In this way we can take advantage of the existing infrastructure to
emulate these MSRs.

Tracked-On: #5923
Spick igned-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Zide Chen
97df220f49 hv: vmsr: emulate IA32_FEATURE_CONTORL MSR for nested virtualization
In order to support nested virtualization, need to expose the "Enable VMX
outside SMX operation" bit to L1 hypervisor.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 19:05:21 +08:00
Yonghua Huang
e9870893a3 hv: rename some software SRAM local names
For simplification purpose, use 'ssram' instead of
 'software sram' for local names inside rtcm module.

Tracked-On: #6015
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-16 10:08:17 +08:00
Li Fei1
30febed0e1 hv: cache: wrap common APIs
Wrap three common Cache APIs:
- flush_invalidate_all_cache
- flush_cacheline
- flush_cache_range

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
77e64f6092 hv: tlb: wrap common APIs
Wrap two common TLB APIs: flush_tlb and flush_tlb_range.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
d94582389e hv: mmu: move arch specific parts into cpu.h
Move Cache/TLB arch specific parts into cpu.h
After this change, we should not expose arch specific parts out from mmu.h

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Li Fei1
d6362b6e0a hv: paging: rename ppt_set/clear_ATTR to set_paging_ATTR
Rename ppt_set/clear_(attribute) to set_paging_(attribute)

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-05-14 09:18:00 +08:00
Zide Chen
ccfdf9cdd7 hv: nested: enable nested virtualization
Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set:

- move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS
  so that CR4_VMXE is removed from cr4_reserved_bits_mask.
- force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE
  is able to be set.

Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED
is set.

Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests.

Constraints on ACRN VM.
- LAPIC passthrough should be enabled.
- use SCHED_NOOP scheduler.

Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Zide Chen
dd90eccc25 hv: move invvpid and invept helper code from mmu.c to mmu.h
moving invvpid and invept helper code from mmu.c to mmu.h, so that they
can be accessed by the nested virtualization code.

No logical changes.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-13 16:16:30 +08:00
Shuo A Liu
3fffa68665 hv: Support WAITPKG instructions in guest VM
TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause
a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2
is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2.

Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So
load corresponding vMSR value during context switch in of a vCPU.

Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.

Performance impact:
    MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
    MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.

Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-05-13 14:19:50 +08:00
dongshen
ebadf00de8 hv: some coding style fixes
Fix issues reported by checkpatch.pl

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-05-12 16:50:34 +08:00
Junjie Mao
ea4eadf0a5 hv: hypercalls: refactor permission-checking and dispatching logic
The current permission-checking and dispatching mechanism of hypercalls is
not unified because:

  1. Some hypercalls require the exact vCPU initiating the call, while the
     others only need to know the VM.
  2. Different hypercalls have different permission requirements: the
     trusty-related ones are enabled by a guest flag, while the others
     require the initiating VM to be the Service OS.

Without a unified logic it could be hard to scale when more kinds of
hypercalls are added later.

The objectives of this patch are as follows.

  1. All hypercalls have the same prototype and are dispatched by a unified
     logic.
  2. Permissions are checked by a unified logic without consulting the
     hypercall ID.

To achieve the first objective, this patch modifies the type of the first
parameter of hcall_* functions (which are the callbacks implementing the
hypercalls) from `struct acrn_vm *` to `struct acrn_vcpu *`. The
doxygen-style documentations are updated accordingly.

To achieve the second objective, this patch adds to `struct hc_dispatch` a
`permission_flags` field which specifies the guest flags that must ALL be
set for a VM to be able to invoke the hypercall. The default value (which
is 0UL) indicates that this hypercall is for SOS only. Currently only the
`permission_flag` of trusty-related hypercalls have the non-zero value
GUEST_FLAG_SECURE_WORLD_ENABLED.

With `permission_flag`, the permission checking logic of hypercalls is
unified as follows.

  1. General checks
     i. If the VM is neither SOS nor having any guest flag that allows
        certain hypercalls, it gets #UD upon executing the `vmcall`
        instruction.
    ii. If the VM is allowed to execute the `vmcall` instruction, but
        attempts to execute it in ring 1, 2 or 3, the VM gets #GP(0).
  2. Hypercall-specific checks
     i. If the hypercall is for SOS (i.e. `permission_flag` is 0), the
        initiating VM must be SOS and the specified target VM cannot be a
        pre-launched VM. Otherwise the hypercall returns -EINVAL without
        further actions.
    ii. If the hypercall requires certain guest flags, the initiating VM
        must have all the required flags. Otherwise the hypercall returns
        -EINVAL without further actions.
   iii. A hypercall with an unknown hypercall ID makes the hypercall
        returns -EINVAL without further actions.

The logic above is different from the current implementation in the
following aspects.

  1. A pre-launched VM now gets #UD (rather than #GP(0)) when it attempts
     to execute `vmcall` in ring 1, 2 or 3.
  2. A pre-launched VM now gets #UD (rather than the return value -EPERM)
     when it attempts to execute a trusty hypercall in ring 0.
  3. The SOS now gets the return value -EINVAL (rather than -EPERM) when it
     attempts to invoke a trusty hypercall.
  4. A post-launched VM with trusty support now gets the return value
     -EINVAL (rather than #UD) when it attempts to invoke a non-trusty
     hypercall or an invalid hypercall.

v1 -> v2:
 - Update documentation that describe hypercall behavior.
 - Fix Doxygen warnings

Tracked-On: #5924
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-05-12 13:43:41 +08:00
Liang Yi
688a41c290 hv: mod: do not use explicit arch name when including headers
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".

In other words, we are adopting the same practice in Linux kernel.

Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-08 11:15:46 +08:00
Li Fei1
f3327364c3 hv: mmu: fix a minor bug
We should only map [low32_max_ram, 4G) MMIO region as UC attribute,
not map [low32_max_ram, low32_max_ram + 4G) region as UC attribute.
Otherwise, the HV will complain [4G, low32_max_ram + 4G) region has
already mapped.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-29 08:57:13 +08:00
Shuo A Liu
dc88c2e397 hv: Save/restore MSR_IA32_CSTAR during context switch
Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while
Linux uses it rarely. Now vcpu context switch doesn't save/restore it.
Windows detects the change of the MSR and rises a exception.

Do the save/resotre MSR_IA32_CSTAR during context switch.

Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 11:21:52 +08:00
Jian Jun Chen
31b8b698ce hv: TLFS: Add tsc_offset support for reference time
TLFS spec defines that when a VM is created, the value of
HV_X64_MSR_TIME_REF_COUNT is set to zero. Now tsc_offset is not
supported properly, so guest get a drifted reference time.

This patch implements tsc_offset. tsc_scale and tsc_offset
are calculated when a VM is launched and are saved in
struct acrn_hyperv of struct acrn_vm.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Jian Jun Chen
b4312efbd7 hv: TLFS: inject #GP to guest VM for writing of read-only MSRs
TLFS spec defines that HV_X64_MSR_VP_INDEX and HV_X64_MSR_TIME_REF_COUNT
are read-only MSRs. Any attempt to write to them results in a #GP fault.

Fix the issue by returning error in handler hyperv_wrmsr() of MSRs
HV_X64_MSR_VP_INDEX/HV_X64_MSR_TIME_REF_COUNT emulation.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Jian Jun Chen
dd524d076d hv: TLFS: Setup hypercall page according to the vcpu mode
TLFS spec defines different hypercall ABIs for X86 and x64. Currently
x64 hypercall interface is not supported well.

Setup the hypercall interface page according to the vcpu mode.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
Li Fei1
628bca5cad hv: pgtable: use new algo to calculate PPT/EPT_PD_PAGE_NUM
In order to support platform (such as Ander Lake) which physical address width
bits is 46, the current code need to reserve 2^16 PD page ((2^46) / (2^30)).
This is a complete waste of memory.

This patch would reserve PD page by three parts:
1. DRAM - may take PD_PAGE_NUM(CONFIG_PLATFORM_RAM_SIZE) PD pages at most;
2. low MMIO - may take PD_PAGE_NUM(MEM_1G << 2U) PD pages at most;
3. high MMIO - may takes (CONFIG_MAX_PCI_DEV_NUM * 6U) PD pages (may plus
PDPT entries if its size is larger than 1GB ) at most for:
(a) MMIO BAR size must be a power of 2 from 16 bytes;
(b) MMIO BAR base address must be power of two in size and are aligned with
its size.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1
053c09e764 hv: cpu_cap: PAW over 39 bits must support 1GB large page
The platform which physical-address width over 39 bits must support
1GB large page (Both MMU and VMX sides ). This could save lots of
page table pages for EPT MMIO mapping.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-22 14:35:57 +08:00
Li Fei1
41e2d40d1f hv: e820: remove get_mem_range_info
No one uses get_mem_range_info to get the top/bottom/size of the physical memory.
We could get these informations by e820 table easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
3a465388d4 hv: guest: remove get_mem_range_info in prepare_sos_vm_memmap
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address of SOS. This assumes the platform must have
high memory space.

This patch removes the assumption. It will set high 64 bits max memory address of
SOS to 4G by default (Which means there's no 64 bits high memory), then update
the high 64 bits max memory address if the SOS really has high memory space.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
901e8c869e hv: vE820: calculate SOS memory size by vE820 tables
SOS's memory size could be calculated by its vE820 Tables easily.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
ad15053304 hv: mmu: remove get_mem_range_info in init_paging
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address. This assumes the platform must have high
memory space.

This patch calculates the high 64 bits max memory address according the e820 tables
and removes the assumption "The platform must have high memory space" by map the
low RAM region and high RAM region separately.

Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
2021-04-21 14:00:44 +08:00
Li Fei1
6137347411 hv: smp: fix an isuue about SMP sync
Now BSP may launch VMs before APs have not done its initilization,
for example, sched_control for per-cpu. However, when we initilize
the vcpu thread data, it will access the object (scheduler) of the
sched_control of APs. As a result, it will trigger the PF.

This patch would waits each physical has done its initilization before
to continue to execute.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-21 10:54:48 +08:00
Li Fei1
5f281df548 hv: serializng: use mfence to ensure trampoline code was updated
Using the MFENCE to make sure trampoline code
has been updated (clflush) into memory beforing start APs.

Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
2021-04-21 10:54:48 +08:00