Commit Graph

235 Commits

Yifan Liu
81b78d0464 hv: vcpu: Move kick_vcpu and vcpu_make_request to common scope
Adjust the kick_vcpu logic and move it to common scope.
Also move vcpu_make_request to common scope, and add the
vcpu_has_pending_request and vcpu_take_request helpers.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
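
A minimal sketch of what the moved helpers could look like in common vcpu.c; pending_req and the helper names come from the message above, while the exact bitmap-call signatures are assumptions:

    /* sketch only, not the verbatim implementation */
    void vcpu_make_request(struct acrn_vcpu *vcpu, uint16_t eventid)
    {
        bitmap_set(eventid, &vcpu->pending_req);  /* publish the request bit */
        kick_vcpu(vcpu);                          /* pull the target out of guest mode */
    }

    bool vcpu_has_pending_request(const struct acrn_vcpu *vcpu, uint16_t eventid)
    {
        return bitmap_test(eventid, &vcpu->pending_req);
    }

    bool vcpu_take_request(struct acrn_vcpu *vcpu, uint16_t eventid)
    {
        /* atomically clear the bit and report whether it was set */
        return bitmap_test_and_clear(eventid, &vcpu->pending_req);
    }
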
Yifan Liu
c86fa2e2e2 hv: vcpu: Move some helpers to common vcpu.c
Some arch-independent helpers can be moved to common vcpu.c.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
Yifan Liu
c526809125 hv: vcpu: Move zombie_vcpu to common scope
Remove the extraneous new_state argument.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
Yifan Liu
a870bac7f8 hv: vcpu: Move reset_vcpu to common scope
Move reset_vcpu to common scope. The original x86 reset_vcpu
takes an extra parameter to distinguish a normal reset from an
INIT reset.

The common API hides this detail and lets arch-specific code
handle it.

This patch also renames x86 specific vcpu_reset_internal to
x86_vcpu_reset_internal.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
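
A sketch of the resulting split; reset_vcpu and x86_vcpu_reset_internal are named in the message, while arch_reset_vcpu and the reset-mode constant are assumptions:

    /* common/vcpu.c: the common API carries no reset mode */
    void reset_vcpu(struct acrn_vcpu *vcpu)
    {
        /* common per-vcpu state reset ... */
        arch_reset_vcpu(vcpu);
    }

    /* arch/x86/.../vcpu.c: the renamed internal helper keeps the mode */
    void x86_vcpu_reset_internal(struct acrn_vcpu *vcpu, enum reset_mode mode)
    {
        /* x86 register/VMCS state initialization for 'mode' */
    }

    void arch_reset_vcpu(struct acrn_vcpu *vcpu)
    {
        x86_vcpu_reset_internal(vcpu, POWER_ON_RESET);
    }
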
Yifan Liu
1a5bc2aae1 hv: vcpu: Move launch_vcpu to common scope
Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
Yifan Liu
62d07897e2 hv: vcpu: Move offline_vcpu to common vcpu.c and rename to destroy_vcpu
destroy_vcpu: call arch_deinit_vcpu, then do the common deinit.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
Yifan Liu
70bcf024da hv: vcpu: Move vcpu_set_state to common scope
Movement only.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
Yifan Liu
223ccc5711 hv: x86: Move update_vm_vlapic_state out of vcpu_set_state
Updating the vlapic state essentially updates a per-VM variable
holding the vlapic mode. That update should NOT happen on each and
every vcpu state change. Consider a VM whose vcpus are all in X2APIC
mode except the last one, which is in the middle of transitioning to
X2APIC. While the HV is emulating the transition, request processing
fails and we zombie this vcpu, which would incorrectly set the
vlapic_mode to X2APIC.

Updating the vlapic mode should be confined to the following cases:
1. when the guest changes the APIC mode
2. when the guest receives SIPI/INIT

Below we also show that the resulting logic is equivalent to the
previous behavior.

update_vm_vlapic_state is called in vcpu state transitioning functions:
offline_vcpu, zombie_vcpu, reset_vcpu, launch_vcpu.

launch_vcpu:
launch_vcpu is called in two places: vBSP launch and vAP launch.
vBSP launch does not need to update the vlapic state because
vm->arch_vm.vlapic_mode defaults to XAPIC_MODE (set in create_vm).
vAP launch is handled by this patch.

reset_vcpu:
reset_vcpu is called in two places: INIT_RESET and VM reset.
INIT_RESET is handled by this patch. VM reset does not need to call
update_vm_vlapic_state because reset_vm manually sets the mode back
to the XAPIC default.

zombie_vcpu:
As stated above, zombie_vcpu should NOT change the vlapic mode,
because zombifying a vcpu is transparent to the guest. It is only
called to pause the vcpu thread.

offline_vcpu:
offline_vcpu is called in two places: shutdown_vm and the hypercall
to offline Service VM vcpus. In the first case the mode doesn't
matter, as the VM is being destroyed. In the second case, the
Service VM is already in either XAPIC or X2APIC mode, and offlining
vcpus does not change that mode (so no update is needed).

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
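
An illustrative sketch of the two remaining call sites; only update_vm_vlapic_state and launch_vcpu are named above, the surrounding functions are hypothetical:

    /* case 1: the guest changes APIC mode, e.g. an IA32_APIC_BASE write */
    static void handle_apicbase_write(struct acrn_vcpu *vcpu, uint64_t new_val)
    {
        /* ... emulate the MSR write, possibly switching xAPIC/x2APIC ... */
        update_vm_vlapic_state(vcpu->vm);
    }

    /* case 2: a vAP is brought up via INIT/SIPI */
    static void start_vap(struct acrn_vcpu *vcpu)
    {
        launch_vcpu(vcpu);
        update_vm_vlapic_state(vcpu->vm);
    }

    /* zombie_vcpu()/offline_vcpu() no longer touch the vlapic mode */
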
Yifan Liu
134c5f6ab7 hv: vcpu: Move create_vcpu to common vcpu.c
Move the vcpu API create_vcpu to common scope.

* Break create_vcpu into common vcpu init plus arch_init_vcpu
  for arch-specific initialization.
* Move vcpu_thread to arch-specific code and rename it to
  arch_vcpu_thread.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
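
A sketch of the split, reusing the pre-existing create_vcpu signature; alloc_vcpu_struct is a hypothetical helper:

    int32_t create_vcpu(uint16_t pcpu_id, struct acrn_vm *vm,
                        struct acrn_vcpu **rtn_vcpu_handle)
    {
        struct acrn_vcpu *vcpu = alloc_vcpu_struct(vm);  /* hypothetical */

        /* common init: ids, state machine, per-vcpu events, thread object */
        vcpu->vm = vm;
        vcpu->state = VCPU_INIT;
        *rtn_vcpu_handle = vcpu;

        /* arch-specific init; on x86 this sets up the VMCS and uses
         * arch_vcpu_thread (formerly vcpu_thread) as the thread entry */
        return arch_init_vcpu(vcpu);
    }
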
Yifan Liu
688741074f hv: vm: Move vm common parts under common/vm.h (data structure)
This commit moves struct acrn_vm under the common header vm.h and
moves some x86-specific members of struct acrn_vm into arch_vm. It
focuses on struct cleanup only; API cleanup will come in a future
patch series.

The members moved into arch_vm are:
e820_entry_num
e820_entries
wire_mode
wbinvd_lock
vlapic_mode_lock
vcpuid_entry_nr
vcpuid_level
vcpuid_xlevel
vcpuid_entries
reset_control
pm
sworld_control
sworld_snapshot
intr_inject_delay_delta

Moved to common vm.h:
ept_lock -> rename to stg2pt_lock
ept_pgtable -> rename to stg2_pgtable
nworld_eptp -> rename to root_stg2ptp
emul_mmio_lock
nr_emul_mmio_regions
emul_mmio
emul_pio

To avoid circular dependency, some in-header helpers are also moved into
common vm.h.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
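
A reduced sketch of the resulting layout; the renamed fields follow the list above, while types and omitted members are elided:

    /* include/common/vm.h */
    struct acrn_vm {
        /* ... */
        spinlock_t stg2pt_lock;          /* was ept_lock */
        struct pgtable stg2_pgtable;     /* was ept_pgtable */
        void *root_stg2ptp;              /* was nworld_eptp */
        spinlock_t emul_mmio_lock;
        uint16_t nr_emul_mmio_regions;
        /* emul_mmio, emul_pio ... */
        struct acrn_vm_arch arch_vm;     /* e820_*, wire_mode, vcpuid_*, pm,
                                          * sworld_*, etc. now live here */
    };
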
Yifan Liu
cf91e66ac0 hv: vcpu: Move vcpu common parts under common/vcpu.h (data structure)
This commit cleans up struct acrn_vcpu. vcpu API cleanup will be in
future patch series.

Create a common vcpu.h hosting struct acrn_vcpu, and move some
x86-specific members of struct acrn_vcpu into struct acrn_vcpu_arch.
These members include:

reg_cached
reg_updated
inst_ctxt

pending_req moves the other way, from arch to common, and the
maximum number of events (VCPU_EVENT_NUM) is replaced by
MAX_VCPU_EVENT_NUM.

To avoid circular dependency, some in-header helpers are moved into
vcpu.c with only prototypes being declared inside header.

Tracked-On: #8830
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang Yu1 <yu1.wang@intel.com>
2025-10-30 13:30:32 +08:00
Jian Jun Chen
0222bb0fc1 hv: multi-arch: move {arch_}get_random_value to random.c
Tracked-On: #8834
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2025-10-29 17:45:44 +08:00
Xue Bosheng
d8970404e3 hv: move stack_frame out of vcpu
stack_frame is not only for the vcpu thread; host threads need it
too, so move stack_frame out of the vcpu file.

Tracked-On: #8812
Signed-off-by: Xue Bosheng <bosheng.xue@intel.com>
Reviewed-by: Yifan Liu  <yifan1.liu@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2025-09-30 12:31:07 +08:00
Xue Bosheng
e278b38ec4 hv: move percpu delivery mode and idle mode from common to x86
Delivery mode and idle mode are x86-specific per-cpu data, so move
them from common to the x86 arch. Also rename mode_to_idle to
idle_mode and mode_to_kick_pcpu to kick_pcpu_mode.

Tracked-On: #8812
Signed-off-by: Xue Bosheng <bosheng.xue@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2025-09-30 12:31:07 +08:00
Haoyu Tang
a226b5f0ec hv: multi-arch reconstruct bits library
Extract the common interface into include/lib/bits.h and dispatch
to the arch-specific implementation.
Re-implement the unlocked functions in C in the common library.
Rename bitmap*lock() to bitmap*() and bitmap*nolock() to
bitmap*non_atomic().

Tracked-On: #8803
Signed-off-by: Haoyu Tang <haoyu.tang@intel.com>
Reviewed-by: Yifan Liu  <yifan1.liu@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2025-09-22 10:52:06 +08:00
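
A sketch of the reshaped interface under these renames; the exact prototypes are assumptions:

    /* include/lib/bits.h */
    void bitmap_set(uint16_t nr_arg, volatile uint64_t *addr);  /* atomic; arch asm */

    static inline void bitmap_set_non_atomic(uint16_t nr_arg, volatile uint64_t *addr)
    {
        /* unlocked variant, now plain C in the common library */
        *addr |= (1UL << (nr_arg & 0x3FU));
    }
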
hangliu1
41f04b4e7a HV: abstract percpu structure from x86 arch
Move the x86 architecture-dependent per-cpu data into a separate
structure and embed it in per_cpu_region. Callers access
architecture-dependent members using the 'arch.' prefix.

v2->v3:
move whose_iwkey, profiling_info and tsc_suspend to x86
v1->v2:
rebased on latest repo

Tracked-On: #8801
Signed-off-by: hangliu1 <hang1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Reviewed-by: Liu, Yifan1 <yifan1.liu@intel.com>
Reviewed-by: Chen, Jian Jun <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
2025-09-19 15:04:55 +08:00
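
A sketch of the abstraction; tsc_suspend and the access pattern come from the message, the remaining fields are illustrative:

    struct arch_per_cpu {
        uint64_t tsc_suspend;       /* moved here in v3 along with whose_iwkey
                                     * and profiling_info */
    };

    struct per_cpu_region {
        uint64_t softirq_pending;   /* arch-independent members stay here */
        struct arch_per_cpu arch;   /* embedded arch-dependent part */
    } __aligned(PAGE_SIZE);

    /* usage: get_cpu_var(arch.tsc_suspend) = rdtsc(); */
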
Zhangwei6
ddfcb8c3fc hv: enable thermal lvt interrupt
This patch fetches the thermal LVT interrupt and propagates it
to a VM.

At this stage we only support the case where a single VM governs
thermal, and we pass the hardware thermal interrupt to that VM.

First, we register the handler for the thermal LVT interrupt; its
vector is THERMAL_VECTOR and the handler is thermal_irq_handler().

Then, when a thermal interrupt occurs, the handler sets the
SOFTIRQ_THERMAL bit of softirq_pending. This bit triggers the
thermal_softirq() function, which injects the virtual thermal
interrupt into the VM.

Tracked-On: #8595

Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-05-16 09:40:32 +08:00
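
A sketch of the flow; thermal_irq_handler, SOFTIRQ_THERMAL, and thermal_softirq are named above, while fire_softirq, get_running_vcpu, and vlapic_set_local_intr are assumptions modeled on existing ACRN-style helpers:

    static void thermal_irq_handler(void)
    {
        /* interrupt context: just flag the softirq */
        fire_softirq(SOFTIRQ_THERMAL);
    }

    static void thermal_softirq(uint16_t pcpu_id)
    {
        struct acrn_vcpu *vcpu = get_running_vcpu(pcpu_id);

        if (vcpu != NULL) {
            /* inject the virtual thermal LVT interrupt into the governing VM */
            vlapic_set_local_intr(vcpu->vm, vcpu->vcpu_id, APIC_LVT_THERMAL);
        }
    }
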
Qiang Zhang
6a1d91c740 hv: sched: Add sched_params struct for thread parameters
Abstract the scheduler configuration data for vCPU threads and
other hypervisor threads into a sched_params structure, which is
used to initialize per-thread scheduler private data. The
sched_params for vCPU threads comes from the vm_config generated
by the config tools, while other hypervisor threads must supply
it explicitly.

Tracked-On: #8500
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-18 16:26:05 +08:00
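
A sketch of the structure; the field names are illustrative, since the message only fixes the structure's name and role:

    struct sched_params {
        uint32_t prio;          /* e.g. consumed by a priority-based scheduler */
        uint16_t bvt_weight;    /* e.g. consumed by the BVT scheduler */
    };

vCPU threads would get this filled in from the config-tool-generated vm_config, while hypervisor-internal threads (idle, etc.) pass one explicitly when their thread object is initialized.
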
Wu Zhou
9a6e940849 hv: signal_event after make_request
make_request sets the request bit, and signal_event wakes the vcpu
thread. If signal_event comes first, the target vCPU may go back to
sleep before processing the request bit.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-15 11:52:40 +08:00
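
A sketch of the resulting order at a notification site (the event index name is an assumption):

    /* set the request bit first, then wake the vCPU thread */
    vcpu_make_request(vcpu, ACRN_REQUEST_EVENT);
    signal_event(&vcpu->events[VCPU_EVENT_VIRTUAL_INTERRUPT]);

    /* reversed, the vCPU could wake, find no pending bit yet, and go
     * back to sleep before the bit is set */
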
Wu Zhou
064be1e3e6 hv: support halt in hv idle
When all vCPU threads on one pCPU are put to sleep (e.g., when all
guests execute HLT), the hv schedules to the idle thread. Currently
the idle thread executes PAUSE, which does not enter any C-state and
consumes a lot of power. This patch adds support for HLT in the idle
thread.

When we switch to HLT, we have to make sure that any event that can
wake a vCPU can also wake the pCPU. Those events are either generated
by a local interrupt or issued by another pCPU followed by an IPI
kick.

Each of them has an interrupt involved, so they are also able to wake
the halted pCPU. The exception: if the pCPU has just scheduled to the
idle thread but has not yet halted, the interrupt could be missed.

sleep-------schedule to idle------IRQ ON---HLT--(kick missed)
                                         ^
                              wake---kick|

This window must be protected. This is done by a safe-halt mechanism
leveraging the STI instruction's delay effect (same as Linux).

vCPUs with lapic_pt, or an hv built with CONFIG_KEEP_IRQ_DISABLED=y,
do not allow interrupts in root mode, so they could never wake from
HLT (an INIT kick does not wake HLT in root mode either). They should
continue using PAUSE in idle.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-15 11:52:40 +08:00
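
A sketch of the safe-halt primitive described above: the STI interrupt shadow defers interrupt delivery until HLT has started executing, so a kick landing in the window cannot be lost:

    static inline void safe_halt(void)
    {
        /* interrupts are recognized only once HLT is executing, closing
         * the schedule-to-idle ... HLT window shown in the diagram */
        asm volatile ("sti; hlt" : : : "memory");
    }
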
Minggui Cao
bc4c773cf8 hv: add param to control INIT used to kick pCPU
By default, a notification IPI is used to kick a sharing pCPU and
INIT is used to kick a partitioned pCPU. If the USE_INIT_IPI flag is
passed to the hypervisor, only INIT is used to kick pCPUs.

Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-26 13:28:02 +08:00
Minggui Cao
2c140addaf hv: use kick-mode in per-cpu to control kick pCPU
The INIT signal has been used to kick a partitioned pCPU (e.g., for
an RTVM whose LAPIC is passed through), while a notification IPI is
used to kick a sharing pCPU.

Add mode_to_kick_pcpu to the per-cpu region to control the way a
pCPU is kicked.

Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-26 13:28:02 +08:00
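
A sketch of how the per-cpu field could steer the kick path; the DEL_MODE_* constant and the send helpers are assumptions:

    void kick_pcpu(uint16_t pcpu_id)
    {
        if (per_cpu(mode_to_kick_pcpu, pcpu_id) == DEL_MODE_INIT) {
            send_single_init(pcpu_id);                     /* partitioned pCPU */
        } else {
            send_single_ipi(pcpu_id, NOTIFY_VCPU_VECTOR);  /* sharing pCPU */
        }
    }
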
Ziheng Li
eb8bcb06b3 Update copyright year range in code headers
Modified the copyright year range in code, and corrected
"int32_tel" to "Intel" in "hypervisor/include/debug/profiling.h"
and "hypervisor/include/debug/profiling_internal.h".

Tracked-On: #7559
Signed-off-by: Ziheng Li <ziheng.li@intel.com>
2022-07-15 11:48:35 +08:00
Jiang, Yanting
599894e571 Fix: write xmm registers correctly
The movdqu instruction moves an unaligned double quadword (128
bits) into or out of an XMM register.

This patch uses pointers as input parameters of the function
write_xmm_0_2() so that a 128-bit value is read from a 64-bit
array for each XMM register.

Tracked-On: #7380
Reviewed-by: Fei Li <fei1.li@intel.com>
Signed-off-by: Jiang, Yanting <yanting.jiang@intel.com>
2022-05-06 10:29:33 +08:00
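
A sketch of the fixed function: passing pointers lets movdqu read the full unaligned 128 bits (two uint64_t elements) per XMM register:

    static inline void write_xmm_0_2(const uint64_t *xmm0_ptr,
                                     const uint64_t *xmm1_ptr,
                                     const uint64_t *xmm2_ptr)
    {
        asm volatile ("movdqu (%0), %%xmm0\n\t"
                      "movdqu (%1), %%xmm1\n\t"
                      "movdqu (%2), %%xmm2"
                      :
                      : "r" (xmm0_ptr), "r" (xmm1_ptr), "r" (xmm2_ptr)
                      : "memory");
    }
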
Geoffroy Van Cutsem
8b16be9185 Remove "All rights reserved" string headers
Many of the license and Intel copyright headers include the "All rights
reserved" string. It is not relevant in the context of the BSD-3-Clause
license that the code is released under. This patch removes those strings
throughout the code (hypervisor, devicemodel and misc).

Tracked-On: #7254
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
2022-04-06 13:21:02 +08:00
Minggui Cao
3b1deda0eb hv: revert NMI notification by INIT signal
NMI was used to notify a LAPIC-PT RTVM, kicking its CPU into the
hypervisor. But NMI may also be needed by system facilities such as
the PMU (Performance Monitoring Unit). So use the INIT signal as the
partitioned-CPU notification mechanism, replacing the injected NMI.

Also remove the now-unused NMI notification code.

Tracked-On: #6966
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
2022-03-10 14:34:33 +08:00
Mingqiang Chi
b6b69f2178 hv: fix violations of coding guideline C-FN-16
The coding guideline rule C-FN-16 requires that 'Mixed-use of
C code and assembly code in a single function shall not be allowed',
this patch wraps inline assembly to inline functions.

Tracked-On: #6776
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>

v1-->v2:
    use inline functions for read/write XMM registers
2022-01-13 08:29:02 +08:00
Mingqiang Chi
3555aae4ac hv:remove 2 bits vmx capability check
Remove the is_valid_xsave_combination API and assume the hardware
or QEMU guarantees a consistent configuration: XSAVE supported on
the CPU side together with XSAVE_XRSTR on the VMX side, or neither.
An offline tool will be added for the QEMU platform to keep users
from choosing wrong XSAVE configurations.
Remove the VMX_PROCBASED_CTLS2_XSVE_XRSTR check for the same reason.
As for VMX_PROCBASED_CTLS2_PAUSE_LOOP, ACRN currently panics when
run over QEMU, so remove it from the essential checks and just print
an error when setting this bit on hardware without the capability.

v1-v2:
  remove is_valid_xsave_combination

Tracked-On: #6584
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
2021-12-13 08:52:52 +08:00
Shiqing Gao
7bbd17ce80 hv: initialize and save/restore IA32_TSC_AUX MSR for guest
Commit cbf3825 "hv: Pass-through IA32_TSC_AUX MSR to L1 guest"
lets the guest own the physical MSR IA32_TSC_AUX and no longer
handles this MSR in the hypervisor.
If multiple vCPUs share the same pCPU, when one vCPU reads MSR
IA32_TSC_AUX, it may get the value set by another vCPU.

To fix this issue, this patch does:
 - initialize the MSR content to 0 for the given vCPU, which is consistent with
   the value specified in SDM Vol3 "Table 9-1. IA-32 and Intel 64 Processor
   States Following Power-up, Reset, or INIT"
 - save/restore the MSR content for the given vCPU during context switch

v1 -> v2:
 * According to Table 9-1, the content of the IA32_TSC_AUX MSR is unchanged
   following INIT; v2 updates the initialization logic so that the content for
   the vCPU is consistent with the SDM.

Tracked-On: #6799
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-11-12 09:30:12 +08:00
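
A sketch of the two parts, assuming the usual msr_read/msr_write and vcpu_g/set_guest_msr helpers:

    /* at vCPU init: SDM Vol3 Table 9-1 gives 0 after power-up/reset */
    vcpu_set_guest_msr(vcpu, MSR_IA32_TSC_AUX, 0UL);

    /* context switch out: snapshot the value the guest left behind */
    vcpu_set_guest_msr(vcpu, MSR_IA32_TSC_AUX, msr_read(MSR_IA32_TSC_AUX));

    /* context switch in: install this vCPU's value on the pCPU */
    msr_write(MSR_IA32_TSC_AUX, vcpu_get_guest_msr(vcpu, MSR_IA32_TSC_AUX));
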
dongshen
dcafcadaf9 hv: rename some C preprocessor macros
Rename some C preprocessor macros:
  NUM_GUEST_MSRS --> NUM_EMULATED_MSRS
  CAT_MSR_START_INDEX --> FLEXIBLE_MSR_INDEX
  NUM_VCAT_MSRS --> NUM_CAT_MSRS
  NUM_VCAT_L2_MSRS --> NUM_CAT_L2_MSRS
  NUM_VCAT_L3_MSRS --> NUM_CAT_L3_MSRS

Tracked-On: #5917
Signed-off-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-28 19:12:29 +08:00
Zide Chen
e48962faa6 hv: optimize run_vcpu() for nested
This patch implements a separate path for L2 VMEntry in run_vcpu(),
which has several benefits:

- keep run_vcpu() clean, reducing the number of is_vcpu_in_l2_guest()
  statements:
  - the current code already has three is_vcpu_in_l2_guest() calls.
  - it would otherwise need another two so that a nested VMEntry won't
    hit the "Starting vCPU" and "vCPU launched" pr_info() calls and a
    few other statements in the VM launch path.

- save a few other things in run_vcpu() that are not needed for nested.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-10-13 15:55:31 +08:00
Zide Chen
228b052fdb hv: operations on vcpu->reg_cached/reg_updated don't need LOCK prefix
At run time, one vCPU never reads or writes a register of another
vCPU, so we don't need the LOCK-prefixed instructions on reg_cached
and reg_updated.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-08 09:11:10 +08:00
Zide Chen
f801ba4ed7 hv: update guest RIP only if vcpu->arch.inst_len is non zero
In a large number of VM exits, the VM-exit instruction length is
zero, and there is no need to update VMX_GUEST_RIP.

Some examples:

- all external interrupt VM exits in non LAPIC passthru setup.
- for all the nested VM-exits that are reflecting to L1 hypervisor.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:47:07 +08:00
Zide Chen
b7e9a68923 hv: code cleanup in run_vcpu()
- wrap a new function exec_vmentry() to reduce code duplication.
- remove exec_vmread(VMX_GUEST_RSP) since ACRN doesn't need to know
  guest RSP in run time.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-10-07 20:47:07 +08:00
Jie Deng
064fd7647f hv: add priority based scheduler
This patch adds a new priority based scheduler to support
vCPU scheduling based on their pre-configured priorities.
A vCPU can be running only if there is no higher priority
vCPU running on the same pCPU.

Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
2021-09-24 09:32:18 +08:00
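
A sketch of the pick-next rule under this policy; the runqueue layout and Linux-style list iteration are assumptions:

    struct prio_thread_data {
        struct list_head node;          /* runqueue linkage */
        uint32_t prio;                  /* pre-configured priority */
        struct thread_object *thread;
    };

    /* return the highest-priority runnable thread, or NULL for idle */
    static struct thread_object *prio_pick_next(struct list_head *runqueue)
    {
        struct prio_thread_data *d, *best = NULL;

        list_for_each_entry(d, runqueue, node) {
            if ((best == NULL) || (d->prio > best->prio)) {
                best = d;
            }
        }
        return (best != NULL) ? best->thread : NULL;
    }
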
Zide Chen
e9eb72d319 hv: nested: flush L2 VPID only when it could conflict with L1 VPIDs
By changing L1 VPID assignment from bottom-up to top-down, VPID
conflicts between L1 and L2 guests become unlikely.

Then we only need to flush the VPID when a conflict actually occurs.

Tracked-On: #6289
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-16 09:26:10 +08:00
Zide Chen
11c2f3eabb hv: check bitmap before calling bitmap_test_and_clear_lock()
The locked btr instruction is expensive. This patch changes the
logic to ensure that the bitmap is non-zero before executing
bitmap_test_and_clear_lock().

The VMX transition time improves significantly. With SOS running on
TGL, the CPUID roundtrip drops from ~2400 cycles to ~2000 cycles.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-09-02 16:09:33 +08:00
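
A sketch of the pattern in the request-handling path (request and field names per the ACRN sources of that time):

    void acrn_handle_pending_request(struct acrn_vcpu *vcpu)
    {
        uint64_t *pending = &vcpu->arch.pending_req;

        /* a plain load guards the expensive locked btr inside
         * bitmap_test_and_clear_lock() */
        if (*pending != 0UL) {
            if (bitmap_test_and_clear_lock(ACRN_REQUEST_EXCP, pending)) {
                /* ... handle the exception request ... */
            }
            /* ... same pattern for the remaining request bits ... */
        }
    }
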
Zide Chen
aeb3690b6f hv: simplify is_lapic_pt_enabled()
is_lapic_pt_enabled() is called at least twice per iteration of the
vCPU thread loop, and it is called frequently in vmexit_handler() if
the LAPIC is not passed through. Thus the efficiency of this function
has a direct impact on system performance.

Since the LAPIC mode does not change at run time, we don't have to
compute it on the fly in is_lapic_pt_enabled().

Also remove the unused lapic_mask from struct acrn_vcpu_arch.

Tracked-On: #6289
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-08-26 09:52:10 +08:00
Shiqing Gao
d90dbc0d91 hv: check the capability of XSAVES/XRSTORS instructions before execution
For platforms that do not support XSAVES/XRSTORS instructions, like QEMU,
executing these instructions causes #UD.
This patch adds the check before the execution of XSAVES/XRSTORS instructions.

It also refines the logic inside rstore_xsave_area for the following reason:
If XSAVES/XRSTORS instructions are supported, restore XSAVE area if any of the
following conditions is met:
 1. "vcpu->launched" is false (state initialization for guest)
 2. "vcpu->arch.xsave_enabled" is true (state restoring for guest)

 * Before vCPU is launched, condition 1 is satisfied.
 * After vCPU is launched, condition 2 is satisfied because
   is_valid_xsave_combination() guarantees that "vcpu->arch.xsave_enabled"
   is consistent with pcpu_has_cap(X86_FEATURE_XSAVES).
Therefore, the check against "vcpu->launched" and "vcpu->arch.xsave_enabled"
can be eliminated here.

Tracked-On: #6481

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2021-08-26 09:42:23 +08:00
Zide Chen
6d7eb6d7b6 hv: emulate IA32_EFER and adjust Load EFER VMX controls
This helps to improve performance:

- Don't need to execute VMREAD in vcpu_get_efer(), which is frequently
  called.

- VMX_EXIT_CTLS_SAVE_EFER can be removed from VM-Exit Controls.

- If the value of IA32_EFER MSR is identical between the host and guest
  (highly likely), adjust the VMX controls not to load IA32_EFER on
  VMExit and VMEntry.

It's convenient to continue using the existing vcpu_s/get_efer()
APIs rather than the common vcpu_s/get_guest_msr().

Tracked-On: #6289
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
2021-08-24 11:16:53 +08:00
Shiqing Gao
651d44432c hv: initialize the XSAVE related processor state for guest
If the SOS uses kernel 5.4, the hypervisor panics with #GP.

Here is an example on KBL showing how the panic occurs when kernel 5.4 is used:
Notes:
 * Physical MSR_IA32_XSS[bit 8] is 1 when physical CPU boots up.
 * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is initialized to 0.

The following thread switches happen at run time:
1. idle thread -> vcpu thread
   context_switch_in happens and rstore_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is false as vcpu is not launched yet
   and init_vmcs is not called yet (where xsave_enabled is set to true).
   Thus, physical MSR_IA32_XSS is not updated with the value of guest MSR_IA32_XSS.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 1.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.

2. vcpu thread -> idle thread
   context_switch_out happens and save_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is true. Processor state is saved
   to memory with XSAVES instruction. As physical MSR_IA32_XSS[bit 8] is 1,
   ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is set to 1 after the execution
   of XSAVES instruction.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 1.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.
    * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1.

3. idle thread -> vcpu thread
   context_switch_in happens and rstore_xsave_area is called.
   At this moment, vcpu->arch.xsave_enabled is true. Physical MSR_IA32_XSS is
   updated with the value of guest MSR_IA32_XSS, which is 0.

   States at this point:
    * Physical MSR_IA32_XSS[bit 8] is 0.
    * vcpu_get_guest_msr(vcpu, MSR_IA32_XSS)[bit 8] is 0.
    * ectx->xs_area.xsave_hdr.hdr.xcomp_bv[bit 8] is 1.

   Processor state is restored from memory with XRSTORS instruction afterwards.
   According to SDM Vol1 13.12 OPERATION OF XRSTORS, a #GP occurs if XCOMP_BV
   sets a bit in the range 62:0 that is not set in XCR0 | IA32_XSS.
   So, #GP occurs once XRSTORS instruction is executed.

Such an issue does not happen with kernel 5.10, because kernel 5.10
writes to MSR_IA32_XSS during initialization while kernel 5.4 does
not. Once the guest writes to MSR_IA32_XSS, it is trapped to the
hypervisor; then the physical MSR_IA32_XSS and the value of
MSR_IA32_XSS in vcpu->arch.guest_msrs are updated with the value
specified by the guest. So, in point 2 above, the correct processor
state is saved, and the #GP does not happen in point 3.

This patch initializes the XSAVE-related processor state for the
guest. If the vcpu is not launched yet, the processor state is
initialized according to the initial value of
vcpu_get_guest_msr(vcpu, MSR_IA32_XSS), ectx->xcr0, and
ectx->xs_area. With this approach, the physical processor state is
consistent with the one presented to the guest.

Tracked-On: #6434

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Li Fei1 <fei1.li@intel.com>
2021-08-20 09:46:09 +08:00
Fei Li
20061b7c39 hv: remove xsave dependence
ACRN can run without the XSAVE capability, so remove the XSAVE
dependency to support more (hardware or virtual) platforms.

Tracked-On: #6287
Signed-off-by: Fei Li <fei1.li@intel.com>
2021-08-10 16:36:15 +08:00
Shuo A Liu
107cae316a hv: dm: Use new ioctl ACRN_IOCTL_SET_VCPU_REGS
struct acrn_set_vcpu_regs	->	struct acrn_vcpu_regs
struct acrn_vcpu_regs		->	struct acrn_regs
IC_SET_VCPU_REGS		->	ACRN_IOCTL_SET_VCPU_REGS

Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-07-15 11:53:54 +08:00
Zide Chen
22d225663f hv: nested: update run_vcpu() function for nested case
Since the L2 guest vCPU mode and VPID are managed by the L1
hypervisor, we can skip this handling in run_vcpu().

Note that we must not cache L2 registers in struct acrn_vcpu.

Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-06-03 15:23:25 +08:00
Shuo A Liu
3fffa68665 hv: Support WAITPKG instructions in guest VM
Executing the TPAUSE, UMONITOR, or UMWAIT instructions in a guest VM
causes a #UD if "enable user wait and pause" (bit 26) of
VMX_PROCBASED_CTLS2 is not set. To fix this issue, set bit 26 of
VMX_PROCBASED_CTLS2.

Besides, these WAITPKG instructions use MSR_IA32_UMWAIT_CONTROL, so
load the corresponding vMSR value during context-switch-in of a vCPU.

Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.

Performance impact:
    MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
    MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.

Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
2021-05-13 14:19:50 +08:00
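
A sketch of the two changes; VMX_PROCBASED_CTLS2_UWAIT_PAUSE is an assumed macro name for bit 26:

    /* in VMCS init: enable "user wait and pause" */
    uint32_t value32 = exec_vmread32(VMX_PROC_VM_EXEC_CONTROLS2);

    value32 |= VMX_PROCBASED_CTLS2_UWAIT_PAUSE;   /* bit 26 */
    exec_vmwrite32(VMX_PROC_VM_EXEC_CONTROLS2, value32);

    /* on context-switch-in: load the guest's UMWAIT control */
    msr_write(MSR_IA32_UMWAIT_CONTROL,
              vcpu_get_guest_msr(vcpu, MSR_IA32_UMWAIT_CONTROL));
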
Liang Yi
688a41c290 hv: mod: do not use explicit arch name when including headers
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".

In other words, we adopt the same practice as the Linux kernel.

Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-05-08 11:15:46 +08:00
Shuo A Liu
dc88c2e397 hv: Save/restore MSR_IA32_CSTAR during context switch
Both Windows and Linux guests use the MSR MSR_IA32_CSTAR, though
Linux uses it rarely. Currently the vcpu context switch doesn't
save/restore it, so Windows detects the change of the MSR and
raises an exception.

Save/restore MSR_IA32_CSTAR during context switch.

Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 11:21:52 +08:00
Jian Jun Chen
31b8b698ce hv: TLFS: Add tsc_offset support for reference time
TLFS spec defines that when a VM is created, the value of
HV_X64_MSR_TIME_REF_COUNT is set to zero. Currently tsc_offset is
not supported properly, so the guest gets a drifted reference time.

This patch implements tsc_offset. tsc_scale and tsc_offset
are calculated when a VM is launched and are saved in
struct acrn_hyperv of struct acrn_vm.

Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2021-04-23 10:48:07 +08:00
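
A sketch of the reference-time computation this enables (TLFS 100 ns units); the hyperv field placement follows the message, while the 128-bit multiply is illustrative:

    /* HV_X64_MSR_TIME_REF_COUNT reads as time elapsed since VM creation */
    static uint64_t hyperv_tsc_ref_count(struct acrn_vm *vm)
    {
        uint64_t tsc = rdtsc();

        /* ref = ((tsc * tsc_scale) >> 64) + tsc_offset; tsc_offset is
         * chosen at launch so that the count starts at zero */
        return (uint64_t)(((__uint128_t)tsc * vm->hyperv.tsc_scale) >> 64U)
               + vm->hyperv.tsc_offset;
    }
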
Liang Yi
33ef656462 hv/mod-irq: use arch specific header files
Requires explicit arch path name in the include directive.

The config scripts was also updated to reflect this change.

Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00
Liang Yi
df36da1b80 hv/mod_irq: do not include x86/irq.h in common/irq.h
Each .c file includes the arch-specific irq header file (with its
full path) by itself if required.

Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
2021-03-24 11:38:14 +08:00