change the input parameter from vcpu to eptp in order to let this api
more generic, no need to care normal world or secure world.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Currently, the clos id of the cpu cores in vmx root mode is the same as non-root mode.
For RTVM, if hypervisor share the same clos id with non-root mode, the cacheline may
be polluted due to the hypervisor code execution when vmexit.
The patch adds hv_clos in vm_configurations.c
Hypervisor initializes clos setting according to hv_clos during physical cpu cores initialization.
For RTVM, MSR auto load/store areas are used to switch different settings for VMX root/non-root
mode for RTVM.
Tracked-On: #2462
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
-- remove some unnecessary includes
-- fix a typo
-- remove unnecessary void before launch_vms
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Fix tsc_deadline issue by trapping TSC_DEADLINE msr write if VMX_TSC_OFFSET is not 0.
Because there is an assupmtion in the ACRN vART design that pTSC_Adjust and vTSC_Adjust are
both 0.
We can leave the TSC_DEADLINE write pass-through without correctness issue becuase there is
no offset between the pTSC and vTSC, and there is no write to vTSC or vTSC_Adjust write observed
in the RTOS so far.
This commit fix the potential correctness issue, but the RT performance will be badly affected
if vTSC or vTSC_Adjust was not zero, which we will address if such case happened.
Tracked-On: #3636
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now that ACPI is enabled for pre-launched VMs, we can remove all mptable code.
Tracked-On: #3601
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Statically define the per vm RSDP/XSDT/MADT ACPI template tables in vacpi.c,
RSDP/XSDT tables are copied to guest physical memory after checksum is
calculated. For MADT table, first fix up process id/lapic id in its lapic
subtable, then the MADT table's checksum is calculated before it is copies to
guest physical memory.
Add 8-bit checksum function in util.h
Tracked-On: #3601
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
EPT tables are shared by MMU and IOMMU.
Some IOMMUs don't support page-walk coherency, the cpu cache of EPT entires
should be flushed to memory after modifications, so that the modifications
are visible to the IOMMUs.
This patch adds a new interface to flush the cache of modified EPT entires.
There are different implementations for EPT/PPT entries:
- For PPT, there is no need to flush the cpu cache after update.
- For EPT, need to call iommu_flush_cache to make the modifications visible
to IOMMUs.
Tracked-On: #3607
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
-- move 'RFLAGS_AC' to cpu.h
-- move 'VMX_SUPPORT_UNRESTRICTED_GUEST' to msr.h
and rename it to 'MSR_IA32_MISC_UNRESTRICTED_GUEST'
-- move 'get_vcpu_mode' to vcpu.h
-- remove deadcode 'vmx_eoi_exit()'
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
move some data structures and APIs related host reset
from vm_reset.c to pm.c, these are not related with guest.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Now, we use native gdt saved in boot context for guest and assume
it could be put to same address of guest. But it may not be true
after the pre-launched VM is introduced. The gdt for guest could
be overwritten by guest images.
This patch make 32bit protect mode boot not use saved boot context.
Insteadly, we use predefined vcpu_regs value for protect guest to
initialize the guest bsp registers and copy pre-defined gdt table
to a safe place of guest memory to avoid gdt table overwritten by
guest images.
Tracked-On: #3532
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently, 'flags' is defined and set but never be used
in the flow of handling i/o request after then.
Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In the definition of port i/o handler, struct acrn_vm * pointer
is redundant as input, as context of acrn_vm is aleady linked
in struct acrn_vcpu * by vcpu->vm, 'vm' is not required as input.
this patch removes argument '*vm' from 'io_read_fn_t' &
'io_write_fn_t', use '*vcpu' for them instead.
Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Check whether the address area pointed by the guest
cr3 is valid or not before loading pdptrs. Inject #GP(0)
to guest if there are any invalid cases.
Tracked-On: #3572
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
fix violations touched below:
1.Cast operation on a constant value
2.signed/unsigned implicity conversion
3.return value unused.
V1->V2:
1.bitmap api will return boolean type, not need to check "!= 0", deleted.
2.The behaves ~(uint32_t)X and (uint32_t)~X are not defined in ACRN hypervisor Coding Guidelines,
removed the change of it.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch moves vmx_rdmsr_pat/vmx_wrmsr_pat from vmcs.c to vmsr.c,
so that these two functions would become internal functions inside
vmsr.c.
This approach improves the modularity.
v1 -> v2:
* remove 'vmx_rdmsr_pat'
* rename 'vmx_wrmsr_pat' with 'write_pat_msr'
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add emulated PCI device configure for SOS to prepare for add support for customizing
special pci operations for each emulated PCI device.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Create an iommu domain for all guest in vpci_init no matter if there's a PTDev
in it.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Align SOS pci device configure with pre-launched VM and filter pre-launched VM's
PCI PT device from SOS pci device configure.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
For non-trusty hypercalls, HV should inject #GP(0) to vCPU if they are
from non-ring0 or inject #UD if they are from ring0 of non-SOS. Also
we should not modify RAX of vCPU for these invalid vmcalls.
Tracked-On: #3497
Signed-off-by: Victor Sun <victor.sun@intel.com>
In some case, guest need to get more information under virtual environment,
like guest capabilities. Basically this could be done by hypercalls, but
hypercalls are designed for trusted VM/SOS VM, We need a machenism to report
these information for normal VMs. In this patch, vCPUID leaf 0x40000001 will
be used to satisfy this needs that report some extended information for guest
by CPUID.
Tracked-On: #3498
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The policy of vART is that software in native can run in
VM too. And in native side, the relationship between the
ART hardware and TSC is:
pTSC = (pART * M) / N + pAdjust
The vART solution is:
- Present the ART capability to guest through CPUID leaf
15H for M/N which identical to the physical values.
- PT devices see the pART (vART = pART).
- Guest expect: vTSC = vART * M / N + vAdjust.
- VMCS.OFFSET = vTSC - pTSC = vAdjust - pAdjust.
So to support vART, we should do the following:
1. if vAdjust and vTSC are changed by guest, we should change
VMCS.OFFSET accordingly.
2. Make the assumption that the pAjust is never touched by ACRN.
For #1, commit "a958fea hv: emulate IA32_TSC_ADJUST MSR" has implementation
it. And for #2, acrn never touch pAdjust.
--
v2 -> v3:
- Add comment when handle guest TSC_ADJUST and TSC accessing.
- Initialize the VMCS.OFFSET = vAdjust - pAdjust.
v1 -> v2
Refine commit message to describe the whole vART solution.
Tracked-On: #3501
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After "commit f0e1c5e init vcpu host stack when reset vcpu", SOS resume form S3
wants to schedule to vcpu_thread not the point where SOS enter S3. So we should
schedule to idel first then reschedule to execute vcpu_thread.
Tracked-On: #3387
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
PMC is hidden from guest and hypervisor should
inject UD to guest when 'rdpmc' vmexit.
Tracked-On: #3453
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Per SDM, writing 0 to MSR_IA32_MCG_STATUS is allowed, HV should not
return -EACCES on this case;
Tracked-On: #3454
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'invept' is not expected in guest and hypervisor should
inject UD when 'invept' VM exit happens.
Tracked-On: #3444
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Otherwise, the previous local variables in host stack is not reset.
Tracked-On: #3387
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
softirq shouldn't be bounded to vcpu thread. One issue for this
is shell (based on timer) can't work if we don't start any guest.
This change also is trying best to make softirq handler running
with irq enabled.
Also update the irq disable/enabel in vmexit handler to align
with the usage in vcpu_thread.
Tracked-On: #3387
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Because we depend on guest OS to switch x2apic mode to enable lapic pass-thru, vlapic is working at the early stage of booting, eg: in virtual boot loader.
After lapic pass-thru enabled, no interrupt should be injected via vlapic any more.
This commit resets the vlapic to clear the pending status and adds ptapic_ops to enforce that no more interrupt accepted/injected via vlapic.
Tracked-On: #3227
Signed-off-by: Yan, Like <like.yan@intel.com>
This commit adds ops to vlapic structure, and add an *ops parameter to vlapic_reset().
At vlapic reset, the ops is set to the global apicv_ops, and may be assigned
to other ops later.
Tracked-On: #3227
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vCPU schedule state change is under schedule lock protection. So there's no need
to be atomic.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For pre-launched VMs and SOS, vCPUs are created on BSP one by one; For post-launched VMs,
vCPUs are created under vmm_hypercall_lock protection. So vcpu_create is called sequentially.
Operation in vcpu_create don't need to be atomic.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
For per-vCPU, EOI exit bitmap is a global parameter which should set or clear
atomically since there's no lock to protect this critical variable.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Now sched_object and sched_context are protected by scheduler_lock. There's no
chance to use runqueue_lock to protect schedule runqueue if we have no plan to
support schedule migration.
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
'pcpu_id' should be less than CONFIG_MAX_PCPU_NUM,
else 'per_cpu_data' will overflow. This commit fixes
this potential overflow issue.
Tracked-On: #3397
Signed-off-by: Tianhua Sun <tianhuax.s.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
ACRN Coding guidelines requires two different types pointer can't
convert to each other, except void *.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
ACRN Coding guidelines requires type conversion shall be explicity.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In almost case, vLAPIC will only be accessed by the related vCPU. There's no
synchronization issue in this case. However, other vCPUs could deliver interrupts
to the current vCPU, in this case, the IRR (for APICv base situation) or PIR
(for APICv advanced situation) and TMR for both cases could be accessed by more
than one vCPUS simultaneously. So operations on IRR or PIR should be atomical
and visible to other vCPUs immediately. In another case, vLAPIC could be accessed
by another vCPU when create vCPU or reset vCPU which could be supposed to be
consequently.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
cancel_event_injection is not need any more if we do 'scheudle' prior to
acrn_handle_pending_request. Commit "921288a6672: hv: fix interrupt
lost when do acrn_handle_pending_request twice" bring 'schedule'
forward, so remove cancel_event_injection related stuff.
Tracked-On: #3374
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ACRN Coding guidelines requires no dead code.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is available
in various CPU internal buffers.
1. Mitigation on ACRN:
1) Microcode update is required.
2) Clear CPU internal buffers (store buffer, load buffer and
load port) if current CPU is affected by MDS, when VM entry
to avoid any information leakage to guest thru above buffers.
3) Mitigation is not needed if ARCH_CAP_MDS_NO bit (bit5)
is set in IA32_ARCH_CAPABILITIES MSR (10AH), in this case,
current processor is no affected by MDS vulnerability, in other
cases mitigation for MDS is required.
2. Methods to clear CPU buffers (microcode update is required):
1) L1D cache flush
2) VERW instruction
Either of above operations will trigger clearing all
CPU internal buffers if this CPU is affected by MDS.
Above mechnism is enumerated by:
CPUID.(EAX=7H, ECX=0):EDX[MD_CLEAR=10].
3. Mitigation details on ACRN:
if (processor is affected by MDS)
if (processor is not affected by L1TF OR
L1D flush is not launched on VM Entry)
execute VERW instruction when VM entry.
endif
endif
4. Referrence:
Deep Dive: Intel Analysis of Microarchitectural Data Sampling
https://software.intel.com/security-software-guidance/insights/
deep-dive-intel-analysis-microarchitectural-data-sampling
Deep Dive: CPUID Enumeration and Architectural MSRs
https://software.intel.com/security-software-guidance/insights/
deep-dive-cpuid-enumeration-and-architectural-msrs
Tracked-On: #3317
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
1. reset polarity of ptirq_remapping_info to zero.
this help to set correct initial pin state, and fix the interrupt lost issue
when assign a ptirq to uos.
2. since vioapic_generate_intr relys on rte, we should build rte before
generating an interrput, this fix the redundant interrupt.
Tracked-On: #3362
Signed-off-by: Cai Yulong <yulongc@hwtc.com.cn>
According to SDM, xsetbv writes the contents of registers EDX:EAX into the 64-bit
extended control register (XCR) specified in the ECX register. (On processors
that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.)
In current code, RCX is checked, should ingore the high-order 32bits.
Tracked-On: #3360
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
The current implement will trigger shutdown vm request on the BSP VCPU on the VM,
not the VCPU will trap out because triple fault. However, if the BSP VCPU on the VM
is handling another IO emulation, it may overwrite the triple fault IO request on
the vhm_request_buffer in function acrn_insert_request. The atomic operation of
get_vhm_req_state can't guarantee the vhm_request_buffer will not access by another
IO request if it is not running on the corresponding VCPU. So it should trigger
triple fault shutdown VM IO request on the VCPU which trap out because of triple
fault exception.
Besides, rt_vm_pm1a_io_write will do the right thing which we shouldn't do it in
triple_fault_shutdown_vm.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
According to SDM, bit N (physical address width) to bit 63 should be masked when calculate
host page frame number.
Currently, hypervisor doesn't set any of these bits, so gpa2hpa can work as expectd.
However, any of these bit set, gpa2hpa return wrong value.
Hypervisor never sets bit N to bit 51 (reserved bits), for simplicity, just mask bit 52 to bit 63.
Tracked-On: #3352
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
ACRN coding guideline requires function shall have only one return entry.
Fix it.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu is never scan because of scan tool will be crashed!
After modulization, the vcpu can be scaned by the scan tool.
Clean up the violations in vcpu.c.
Fix the violations:
1.No brackets to then/else.
2.Function return value not checked.
3.Signed/unsigned coversion without cast.
V1->V2:
change the type of "vcpu->arch.irq_window_enabled" to bool.
V2->V3:
add "void *" prefix on the 1st parameter of memset.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The sbuf is allocated for each pcpu by hypercall from SOS. Before launch
Guest OS, the script will offline cpus, which will trigger vcpu reset and
then reset sbuf pointer. But sbuf only initiate once by SOS, so these
cpus for Guest OS has no sbuf to use. Thus, when run 'acrntrace' on SOS,
there is no trace data for Guest OS.
To fix the issue, only reset the sbuf for SOS.
Tracked-On: #3335
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
The current implement will cache each ISR vector in ISR vector stack and do
ISR vector stack check when updating PPR. However, there is no need to do this
because:
1) We will not touch vlapic->isrvec_stk[0] except doing vlapic_reset:
So we don't need to do vlapic->isrvec_stk[0] check.
2) We only deliver higher priority interrupt from IRR to ISR:
So we don't need to check whether vlapic->isrvec_stk interrupts is always increasing.
3) There're only 15 different priority interrupt, It will not happened that more that
15 interrupts could been delivered to ISR:
So we don't need to check whether vlapic->isrvec_stk_top will larger than
ISRVEC_STK_SIZE which is 16.
This patch try to remove ISR vector stack and use isrv to cache the vector number for
the highest priority bit that is set in the ISR.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>