Commit Graph

2202 Commits

Author SHA1 Message Date
dongshen
dcafcadaf9 hv: rename some C preprocessor macros
Rename some C preprocessor macros:
  NUM_GUEST_MSRS --> NUM_EMULATED_MSRS
  CAT_MSR_START_INDEX --> FLEXIBLE_MSR_INDEX
  NUM_VCAT_MSRS --> NUM_CAT_MSRS
  NUM_VCAT_L2_MSRS --> NUM_CAT_L2_MSRS
  NUM_VCAT_L3_MSRS --> NUM_CAT_L3_MSRS

Tracked-On: #5917
Signed-off-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
dongshen
c0d95558c1 hv: vCAT: propagate vCBM to other vCPUs that share cache with vcpu
Implement the propagate_vcbm() function:
  Set vCBM to to all the vCPUs that share cache with vcpu
  to mimic hardware CAT behavior

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
dongshen
a7014f4654 hv: vCAT: implementing the vCAT MSRs write handler
Implement the write_vcbm() function to handle the
MSR_IA32_type_MASK_n vCBM MSRs write request

Call write_vclosid() to handle MSR_IA32_PQR_ASSOC MSR write request

Several vCAT P2V (physical to virtual) and V2P (virtual to physical)
mappings exist:

   struct acrn_vm_config *vm_config = get_vm_config(vm_id)

   max_pcbm = vm_config->max_type_pcbm (type: l2 or l3)
   mask_shift = ffs64(max_pcbm)

   vclosid = vmsr - MSR_IA32_type_MASK_0
   pclosid = vm_config->pclosids[vclosid]

   pmsr = MSR_IA32_type_MASK_0 + pclosid
   pcbm = vcbm << mask_shift
   vcbm = pcbm >> mask_shift

   Where
   MSR_IA32_type_MASK_n: L2 or L3 mask msr address for CLOSIDn, from
   0C90H through 0D8FH (inclusive).

   max_pcbm: a bitmask that selects all the physical cache ways assigned to the VM

   vclosid: virtual CLOSID, always starts from 0

   pclosid: corresponding physical CLOSID for a given vclosid

   vmsr: virtual msr address, passed to vCAT handlers by the
   caller functions rdmsr_vmexit_handler()/wrmsr_vmexit_handler()

   pmsr: physical msr address

   vcbm: virtual CBM, passed to vCAT handlers by the
   caller functions rdmsr_vmexit_handler()/wrmsr_vmexit_handler()

   pcbm: physical CBM

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
dongshen
3ab50f2ef5 hv: vCAT: implementing the vCAT MSRs read handlers
Implement the read_vcbm() and read_vclosid() functions to handle the MSR_IA32_PQR_ASSOC
and MSR_IA32_type_MASK_n vCAT MSRs read request.

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
dongshen
be855d2352 hv: vCAT: expose CAT capabilities to vCAT-enabled VM
Expose CAT feature to vCAT VM by reporting the number of
cache ways/CLOSIDs via the 04H/10H cpuid instructions, so that the
VM can take advantage of CAT to prioritize and partition cache
resource for its own tasks.

Add the vcat_pcbm_to_vcbm() function to map pcbm to vcbm

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
dongshen
77ae989379 hv: vCAT: initialize vCAT MSRs during vmcs init
Initialize vCBM MSRs

Initialize vCLOSID MSR

Add some vCAT functions:
 Retrieve max_vcbm and max_pcbm
 Check if vCAT is configured or not for the VM
 Map vclosid to pclosid
 write_vclosid: vCLOSID MSR write handler
 write_vcbm: vCBM MSR write handler

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
Fei Li
1905ed6124 hv: vMSR: minor fix about rdmsr_vmexit_handler
Specifying a reserved or unimplemented MSR address in ECX for rdmsr will cause a
general protection exception. In this case, we should not change the contents of
registers EDX:EAX.

Tracked-On: #4550
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-10-27 08:23:43 +08:00
dongshen
39461ef9dd hv: vCAT: initialize the emulated_guest_msrs array for CAT msrs during platform initialization
Initialize the emulated_guest_msrs[] array at runtime for
MSR_IA32_type_MASK_n and MSR_IA32_PQR_ASSOC msrs, there is no good
way to do this initialization statically at build time

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-26 11:48:27 +08:00
dongshen
cb2bb78b6f hv/config_tools: amend the struct acrn_vm_config to make it compatible with vCAT
For vCAT, it may need to store more than MAX_VCPUS_PER_VM of closids,
change clos in vm_config.h to a pointer to accommodate this situation

Rename clos to pclosids

pclosids now is a pointer to an array of physical CLOSIDs that is defined
in vm_configurations.c by vmconfig. The number of elements in the array
must be equal to the value given by num_pclosids

Add max_type_pcbm (type: l2 or l3) to struct acrn_vm_config, which stores a bitmask
that selects/covers all the physical cache ways assigned to the VM

Change vmsr.c to accommodate this amended data structure

Change the config-tools to generate vm_configurations.c, and fill in the num_closids
and clos pointers based on the information from the scenario file.

Now vm_configurations.c.xsl generates all the clos related code so remove the same
code from misc_cfg.h.xsl.

Examples:

  Scenario file:

  <RDT>
    <RDT_ENABLED>y</RDT_ENABLED>
    <CDP_ENABLED>n</CDP_ENABLED>
    <VCAT_ENABLED>y</VCAT_ENABLED>
    <CLOS_MASK>0x7ff</CLOS_MASK>
    <CLOS_MASK>0x7ff</CLOS_MASK>
    <CLOS_MASK>0x7ff</CLOS_MASK>
    <CLOS_MASK>0xff800</CLOS_MASK>
    <CLOS_MASK>0xff800</CLOS_MASK>
    <CLOS_MASK>0xff800</CLOS_MASK>
    <CLOS_MASK>0xff800</CLOS_MASK>
    <CLOS_MASK>0xff800</CLOS_MASK>
  /RDT>

  <vm id="0">
   <guest_flags>
     <guest_flag>GUEST_FLAG_VCAT_ENABLED</guest_flag>
   </guest_flags>
   <clos>
     <vcpu_clos>3</vcpu_clos>
     <vcpu_clos>4</vcpu_clos>
     <vcpu_clos>5</vcpu_clos>
     <vcpu_clos>6</vcpu_clos>
     <vcpu_clos>7</vcpu_clos>
   </clos>
  </vm>

  <vm id="1">
   <clos>
     <vcpu_clos>1</vcpu_clos>
     <vcpu_clos>2</vcpu_clos>
   </clos>
  </vm>

 vm_configurations.c (generated by config-tools) with the above vCAT config:

  static uint16_t vm0_vcpu_clos[5U] = {3U, 4U, 5U, 6U, 7U};
  static uint16_t vm1_vcpu_clos[2U] = {1U, 2U};

  struct acrn_vm_config vm_configs[CONFIG_MAX_VM_NUM] = {
  {
  .guest_flags = (GUEST_FLAG_VCAT_ENABLED),
  .pclosids = vm0_vcpu_clos,
  .num_pclosids = 5U,
  .max_l3_pcbm = 0xff800U,
  },
  {
  .pclosids = vm1_vcpu_clos,
  .num_pclosids = 2U,
  },
  };

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-26 11:48:27 +08:00
Yonghua Huang
c8e2060d37 hv: unmap IOMMU register pages from service VM EPT
IOMMU hardware resource is owned by hypervisor, while
 IOMMU capability is reported to service VM in its ACPI
 table. In this case, Service VM may access IOMMU hardware
 resource, which is not expected.

 This patch unmaps all Intel IOMMU register pages for service VM EPT.

Tracked-On: #6677
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-22 09:31:10 +08:00
Fei Li
6c5bf4a642 hv: enhance e820_alloc_memory could allocate memory than 4G
Enhance e820_alloc_memory could allocate memory than 4G.

Tracked-On: #5830
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-10-14 15:04:36 +08:00
Fei Li
df7ffab441 hv: remove CONFIG_HV_RAM_SIZE
It's difficult to configure CONFIG_HV_RAM_SIZE properly at once. This patch
not only remove CONFIG_HV_RAM_SIZE, but also we use ld linker script to
dynamically get the size of HV RAM size.

Tracked-On: #6663
Signed-off-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-14 15:04:36 +08:00
Zide Chen
e48962faa6 hv: optimize run_vcpu() for nested
This patch implements a separate path for L2 VMEntry in run_vcpu(),
which has several benefits:

- keep run_vcpu() clean, to reduce the number of is_vcpu_in_l2_guest()
  statements:
  - current code has three is_vcpu_in_l2_guest() already.
  - supposed to have another 2 statement so that nested VMEntry won't
    hit the "Starting vCPU" and "vCPU launched" pr_info() and a few
    other statements in the VM launch path.

- save few other things in run_vcpu() that are not needed for nested.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-13 15:55:31 +08:00
Zide Chen
89bbc44962 hv: inject external interrupts only if LAPIC is not passthru
Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-08 09:18:34 +08:00
Zide Chen
228b052fdb hv: operations on vcpu->reg_cached/reg_updated don't need LOCK prefix
In run time, one vCPU won't read or write a register on other vCPUs,
thus we don't need the LOCK prefixed instructions on reg_cached and
reg_updated.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-08 09:11:10 +08:00
Zide Chen
2b683f8f5b hv: call vcpu_inject_exception() only when ACRN_REQUEST_EXCP is set
move the bitmap test call out of vcpu_inject_exception(), then we call
the expensive bitmap_test_and_clear_lock() only pending_req_bits is
non-zero and call vcpu_inject_exception() only if needed.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-07 20:48:43 +08:00
Zide Chen
f801ba4ed7 hv: update guest RIP only if vcpu->arch.inst_len is non zero
In very large number of VM extis, the VM-exit instruction length could be
zero, and it's no need to update VMX_GUEST_RIP.

Some examples:

- all external interrupt VM exits in non LAPIC passthru setup.
- for all the nested VM-exits that are reflecting to L1 hypervisor.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:47:07 +08:00
Zide Chen
b7e9a68923 hv: code cleanup in run_vcpu()
- wrap a new function exec_vmentry() to reduce code duplication.
- remove exec_vmread(VMX_GUEST_RSP) since ACRN doesn't need to know
  guest RSP in run time.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:47:07 +08:00
Zide Chen
ee12daff84 hv: nested: refine vmcs12_read/write_field APIs
Change "uint64_t vmcs_hva" to "void *vmcs_hva" in the input argument,
list, so that no type casting is needed when calling them from pointers.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:45:34 +08:00
Liu,Junming
4105ca2cb4 hv: deny the launch of VM if pass-thru PIO bar isn't identical mapping
In current design, when pass-thru dev,
for the PIO bar, need to ensure the guest PIO start address
equals to host PIO start address.
Then set the VMCS io bitmap to pass-thru the corresponding
port io to guest for performance.

ACRN-DM and acrn-config should ensure the identical mapping of PIO bar.
If ACRN-DM or acrn-config failed to achieve this,
we should deny the launch of VM

Tracked-On: #6508

Signed-off-by: Liu,Junming <junming.liu@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2021-09-28 08:49:01 +08:00
Victor Sun
28824c1e74 HV: init e820 before init paging
In the commit of 4e1deab3d9, we changed the
init sequence that init paging first and then init e820 because we worried
about the efi memory map could be beyond 4GB space on some platform.

After we double checked multiboot2 spec, when system boot from multiboot2
protocol, the efi memory map info will be embedded in multiboot info so it
is guaranteed that the efi memory map must be under 4GB space. Consider that
the page table will be allocated in free memory space in future, we have
to change the init sequence back that init e820 first and then init paging.

If we need to support other boot protocol in future that the efi memory map
might be put beyond 4GB, we could have below options:
	1. Request bootloader put efi memory map below 4GB;
	2. Call EFI_BOOT_SERVICES.GetMemoryMap() before ExitBootServices();
	3. Enable a early 64bit page table to get the efi memory map only;

Tracked-On: #5626

Signed-off-by: Victor Sun <victor.sun@intel.com>
2021-09-27 09:03:15 +08:00
Zide Chen
a62dd6ad8a hv: nested: fixed vmxoff_vmexit_handler() issue
In VMXOFF vmexit handler, it's supposed to remove VMCS shadowing.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-26 08:49:35 +08:00
Zide Chen
45b036e028 hv: nested: enable multiple active VMCS12 support
This patch changes the size of vvmcs[] array from 1 to
PER_VCPU_ACTIVE_VVMCS_NUM, and actually enables multiple active VMCS12
support in ACRN.  The basic operations:

- if L1 VMPTRLDs a VMCS12 without previously VMCLEAR the current
  VMCS12, ACRN no longer unconditionally flushes the current VMCS12
  back to L1.  Instead, it tries to keep both the current and the newly
  loaded VMCS12 in the nested->vvmcs[] array, unless:

- if there is no more available vvmcs[] entry, ACRN flushes one active
  VMCS12 to make room for this new VMCS12.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-26 08:49:35 +08:00
Mingqiang Chi
f39c882359 hv:change log level for check_vmx_ctrl
Some processors don't support VMX_PROCBASED_CTLS_TERTIARY bit
and VMX_PROCBASED_CTLS2_UWAIT_PAUSE bit in MSRs
(IA32_VMX_PROCBASED_CTLS & IA32_VMX_PROCBASED_CTLS2),
HV will output error log which will cause confusion,
change the log level from pr_err to pr_info.

Tracked-On: #6397

Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
2021-09-24 10:17:19 +08:00
Jie Deng
064fd7647f hv: add priority based scheduler
This patch adds a new priority based scheduler to support
vCPU scheduling based on their pre-configured priorities.
A vCPU can be running only if there is no higher priority
vCPU running on the same pCPU.

Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-09-24 09:32:18 +08:00
Zide Chen
94cbe909ee hv: irq: identical vector mapping if LAPIC passthough
In local APIC passthrough case, when devices triggered a INTx interrupt, this
interrupt would be delivered to vCPU directly. For this case, need to set the
virtual vector in
the 'Interrupt Vector' field of physical IOxAPIC I/O REDIRECTION TABLE REGISTER
(bits 7:0) and 'Vector' field of vt-d Interrupt Remapping Table Entry (IRTE)
for Remapped Interrupts.

Assumption:
(a) IOAPIC pins won't be shared between LAPIC PT guest and other guests;
(b) The guest would not trigger this IRQ before it switched to x2 APIC mode.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-18 09:42:44 +08:00
Mingqiang Chi
db98f01b6e add vmx capability check
check some essential vmx capablility,
will panic if processor doesn't support it.

Tracked-On: #6584

Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-18 08:44:30 +08:00
dongshen
08d4517431 hv: fix bugs in RDT's CDP code
In current RDT code, if CDP is configured, L2/L3 resources' num_closids calculation
is wrong:
res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U) + 1U;

Should be:
res_cap_info[res].num_closids = (uint16_t)((edx & 0xffffU) >> 1U + 1) >> 1U;

Aslo, in order to enable CDP system-wide, need to enable the CDP bit (bit 0) on all pcpus,
not just on pcpu 0.

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-09-17 16:29:05 +08:00
dongshen
f4cdbba0bd hv: some cosmetic fixes to rdt.c/rdt.h
Rename the clos_max field in struct rdt_info to num_closids

Rename variable valid_clos_num to common_num_closids and make it static

Tracked-On: #5917
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
2021-09-17 16:29:05 +08:00
Zide Chen
0466d7055f hv: nested: move the VMCS12 dirty flags to struct acrn_vvmcs
These dirty flags are supposed to be per VMCS12, so move them from the
per vCPU acrn_nested struct to the newly added acrn_vvmcs struct.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
4e54c3880b hv: nested: remove vcpu->arch.nested.current_vmcs12_ptr
This variable represents the L1 GPA of the current VMCS12.  But it's
no longer needed in the multiple active VMCS12 case, which uses the
following variables for this purpose.

- nested->current_vvmcs refers to the vvmcs[] entry which contains the
  cached current VMCS12, its associated VMCS02, and other context info.

- nested->current_vvmcs->vmcs12_gpa refers to the L1 GPA of this
  current VMCS12.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
799a4d332a hv: nested: initial implementation of struct acrn_vvmcs
Add an array of struct acrn_vvmcs to struct acrn_nested, so it is
possible to cache multiple active VMCS12s.

This patch declares the size of this array to 1, meaning that there is
only one active VMCS12.  This is to minimize the logical code changes.

Add pointer current_vvmcs to struct acrn_nested, which refers to the
current vvmcs[] entry.  In this patch, if any VMCS12 is active, it
always points to vvmcs[0].

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
cf697e753d hv: nested: some API signature changes
No any logical changes, this patch is preparing for multiple active
VMCS12 support.

- currently it's easy to get the vmcs12 pointer from the vcpu pointer.
  In multiple active vmcs12 case, we need to explicitly add "struct
  acrn_vmcs12 *vmcs12" to certain APIs' input argument list, in order to
  get the desired vmcs12 pointer.

- merge flush_current_vmcs12() into clear_vmcs02() for multiple reasons:
  a) it's called only once; b) we don't wrap the opposite operation
  (loading vmcs12) in an API; c) this API has simple and clear logic.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-17 10:58:43 +08:00
Zide Chen
e9eb72d319 hv: nested: flush L2 VPID only when it could conflict with L1 VPIDs
By changing the way to assign L1 VPID from bottom-up to top-down,
the possibilities for VPID conflicts between L1 and L2 guests are
small.

Then we can flush VPID just in case of conflicting.

Tracked-On: #6289
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-16 09:26:10 +08:00
Zide Chen
1ab65825ba hv: nested: merge gpa_field_dirty and control_field_dirty flag
In run time, it's rare for L1 to write to the intercepted non host-state
VMCS fields, and using multiple dirty flags is not necessary.

This patch uses one single dirty flag to manage all non host-state VMCS
fields.  This helps to simplify current code and in the future we may
not need to declare new dirty flags when we intercept more VMCS fields.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-13 15:50:01 +08:00
Zide Chen
6376d5a0d3 hv: nested: fix bug in syncing EPTP from VMCS12 to VMCS02
Currently vmptrld_vmexit_handler() doesn't sync VMX_EPT_POINTER_FULL
from vmcs12 to vmcs02, instead it sets gpa_field_dirty and relies on
nested_vmentry() to sync EPTP in next nested VMentry.

This creates readability issue since all other intercepted VMCS fields
are synced in sync_vmcs12_to_vmcs02().  Another issue is that other
VMCS fields managed by gpa_field_dirty are repeatedly synced in both
vmptrld and nested vmentry handler.

This patch moves get_nept_desc() ahead of sync_vmcs12_to_vmcs02(), such
that shadow_eptp is allocated before sync_vmcs12_to_vmcs02() which
can sync EPTP properly.

BTW, in nested_vmexit_handler(), don't need to read from VMCS to get
the exit reason, since vcpu->arch.exit_reason has it already.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-13 15:50:01 +08:00
Zide Chen
11c2f3eabb hv: check bitmap before calling bitmap_test_and_clear_lock()
The locked btr instruction is expensive.  This patch changes the
logic to ensure that the bitmap is non-zero before executing
bitmap_test_and_clear_lock().

The VMX transition time gets significant improvement.  SOS running
on TGL, the CPUID roundtrip reduces from ~2400 cycles to ~2000 cycles.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-02 16:09:33 +08:00
Zide Chen
7cde4a8d40 hv: initialize host IA32_PAT MSR
Currently ACRN assumes firmware setup IA32_PAT correctly.  This patch
explicitly initializes host IA32_PAT MSR according to ISDM Table 11-12.
Memory Type Setting of PAT Entries Following a Power-up or Reset.

ACRN creates host page tables based on PAT0 (WB) and PAT3 (UC).

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-09-02 09:15:39 +08:00
Zide Chen
aeb3690b6f hv: simplify is_lapic_pt_enabled()
is_lapic_pt_enabled() is called at least twice in one loop of the vCPU
thread, and it's called in vmexit_handler() frequently if LAPIC is not
pass-through.  Thus the efficiency of this function has direct
impact to the system performance.

Since the LAPIC mode is not changed in run time, we don't have to
calculate it on the fly in is_lapic_pt_enabled().

BTW, removed the unused lapic_mask from struct acrn_vcpu_arch.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-26 09:52:10 +08:00
Shiqing Gao
d90dbc0d91 hv: check the capability of XSAVES/XRSTORS instructions before execution
For platforms that do not support XSAVES/XRSTORS instructions, like QEMU,
executing these instructions causes #UD.
This patch adds the check before the execution of XSAVES/XRSTORS instructions.

It also refines the logic inside rstore_xsave_area for the following reason:
If XSAVES/XRSTORS instructions are supported, restore XSAVE area if any of the
following conditions is met:
 1. "vcpu->launched" is false (state initialization for guest)
 2. "vcpu->arch.xsave_enabled" is true (state restoring for guest)

 * Before vCPU is launched, condition 1 is satisfied.
 * After vCPU is launched, condition 2 is satisfied because
   is_valid_xsave_combination() guarantees that "vcpu->arch.xsave_enabled"
   is consistent with pcpu_has_cap(X86_FEATURE_XSAVES).
Therefore, the check against "vcpu->launched" and "vcpu->arch.xsave_enabled"
can be eliminated here.

Tracked-On: #6481

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-08-26 09:42:23 +08:00
Zide Chen
cbf3825140 hv: Pass-through IA32_TSC_AUX MSR to L1 guest
Use an unused MSR on host to save ACRN pcpu ID and avoid saving and
restoring TSC AUX MSR on VMX transitions.

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
2021-08-26 09:25:54 +08:00
Yifan Liu
d33c76f701 hv: quirks: SMBIOS passthrough for prelaunched-VM
This feature is guarded under config CONFIG_SECURITY_VM_FIXUP, which
by default should be disabled.

This patch passthrough native SMBIOS information to prelaunched VM.
SMBIOS table contains a small entry point structure and a table, of which
the entry point structure will be put in 0xf0000-0xfffff region in guest
address space, and the table will be put in the ACPI_NVS region in guest
address space.

v2 -> v3:
uuid_is_equal moved to util.h as inline API
result -> pVendortable, in function efi_search_guid
recalc_checksum -> generate_checksum
efi_search_smbios -> efi_search_smbios_eps
scan_smbios_eps -> mem_search_smbios_eps
EFI GUID definition kept

Tracked-On: #6320
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-26 09:24:50 +08:00
Yifan Liu
975ff33e01 hv: Move uuid_is_equal to util.h
This patch moves uuid_is_equal from vm_config.c to util.h as inline API.

Tracked-On: #6320
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2021-08-26 09:24:50 +08:00
Zide Chen
6d7eb6d7b6 hv: emulate IA32_EFER and adjust Load EFER VMX controls
This helps to improve performance:

- Don't need to execute VMREAD in vcpu_get_efer(), which is frequently
  called.

- VMX_EXIT_CTLS_SAVE_EFER can be removed from VM-Exit Controls.

- If the value of IA32_EFER MSR is identical between the host and guest
  (highly likely), adjust the VMX controls not to load IA32_EFER on
  VMExit and VMEntry.

It's convenient to continue use the exiting vcpu_s/get_efer() APIs,
other than the common vcpu_s/get_guest_msr().

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-24 11:16:53 +08:00
Liang Yi
2b3620de7d hv: mask off LA57 in cpuid
Mask off support of 57-bit linear addresses and five-level paging.

ICX-D has LA57 but ACRN doesn't support 5-level paging yet.

Tracked-On: #6357
Signed-off-by: Liang Yi <yi.liang@intel.com>
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
2021-08-20 11:02:21 +08:00
Shiqing Gao
651d44432c hv: initialize the XSAVE related processor state for guest
If SOS is using kernel 5.4, hypervisor got panic with #GP.

Here is an example on KBL showing how the panic occurs when kernel 5.4 is used:
Notes:
 * Physical MSR_IA32_XSS[bit 8] is 1 when physical CPU boots up.
 * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is initialized to 0.

Following thread switches would happen at run time:
1. idle thread -> vcpu thread
   context_switch_in happens and rstore_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is false as vcpu is not launched yet
   and init_vmcs is not called yet (where xsave_enabled is set to true).
   Thus, physical MSR_IA32_XSS is not updated with the value of guest MSR_IA32_XSS.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 1.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.

2. vcpu thread -> idle thread
   context_switch_out happens and save_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is true. Processor state is saved
   to memory with XSAVES instruction. As physical MSR_IA32_XSS[bit 8] is 1,
   ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is set to 1 after the execution
   of XSAVES instruction.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 1.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.
    * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1.

3. idle thread -> vcpu thread
   context_switch_in happens and rstore_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is true. Physical MSR_IA32_XSS is
   updated with the value of guest MSR_IA32_XSS, which is 0.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 0.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.
    * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1.

   Processor state is restored from memory with XRSTORS instruction afterwards.
   According to SDM Vol1 13.12 OPERATION OF XRSTORS, a #GP occurs if XCOMP_BV
   sets a bit in the range 62:0 that is not set in XCR0 | IA32_XSS.
   So, #GP occurs once XRSTORS instruction is executed.

Such issue does not happen with kernel 5.10. Because kernel 5.10 writes to
MSR_IA32_XSS during initialization, while kernel 5.4 does not do such write.
Once guest writes to MSR_IA32_XSS, it would be trapped to hypervisor, then,
physical MSR_IA32_XSS and the value of MSR_IA32_XSS in vcpu->arch.guest_msrs
are updated with the value specified by guest. So, in the point 2 above,
correct processor state is saved. And #GP would not happen in the point 3.

This patch initializes the XSAVE related processor state for guest.
If vcpu is not launched yet, the processor state is initialized according to
the initial value of vcpu_get_guest_msr(vcpu, MSR_IA32_XSS), ectx->xcr0,
and ectx->xs_area. With this approach, the physical processor state is
consistent with the one presented to guest.

Tracked-On: #6434

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Li Fei1 <fei1.li@intel.com>
2021-08-20 09:46:09 +08:00
Zide Chen
2e6cf2b85b hv: nested: fix bugs in init_vmx_msrs()
Currently init_vmx_msrs() emulates same value for the IA32_VMX_xxx_CTLS
and IA32_VMX_TRUE_xxx_CTLS MSRs.

But the value of physical MSRs could be different between the pair,
and we need to adjust the emulated value accordingly.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-20 09:40:50 +08:00
Zide Chen
ad37553873 hv: nested: redundant permission check on nested_vmentry()
check_vmx_permission() is called in vmresume_vmexit_handler() and
vmlaunch_vmexit_handler() already.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-20 08:14:40 +08:00
Zhou, Wu
53f6720d13 HV: Combine the acpi loading fucntion to one place
Remove the acpi loading function from elf_loader, rawimage_loaer and
bzimage_loader, and call it together in vm_sw_loader.

Now the vm_sw_loader's job is not just loading sw, so we rename it to
prepare_os_image.

Tracked-On: #6323

Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00
Victor Sun
3124018917 HV: vm_load: rename vboot_info.h to vboot.h
vboot_info.h declares vm loader function also, so rename the file name to
vboot.h;

Tracked-On: #6323

Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-19 20:00:45 +08:00