In some cases, a guest needs more information about the virtual environment,
such as guest capabilities. Normally this could be done through hypercalls, but
hypercalls are designed for trusted VMs / the SOS VM, so we need a mechanism to
report this information to normal VMs. In this patch, CPUID leaf 0x40000001 is
used to satisfy this need by reporting extended information to the guest via
CPUID.
Tracked-On: #3498
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
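A minimal sketch (in guest context) of querying the new leaf; the leaf number
comes from this patch, but the meaning assigned to the returned registers here
is an assumption for illustration:

    #include <stdint.h>
    #include <stdio.h>

    static inline void cpuid_leaf(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                                  uint32_t *ecx, uint32_t *edx)
    {
        __asm__ volatile("cpuid"
                         : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
                         : "a"(leaf), "c"(0U));
    }

    int main(void)
    {
        uint32_t eax, ebx, ecx, edx;

        /* 0x40000001 is the extended-information leaf added by this patch */
        cpuid_leaf(0x40000001U, &eax, &ebx, &ecx, &edx);
        printf("extended guest info (eax): 0x%08x\n", eax);
        return 0;
    }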
Currently the set of Px/Cx-supported SoCs listed in cpu_state_tbl.c is limited,
and building a huge state table database to support Px/Cx on other SoCs is not
a wise option. This patch gives an alternative solution: build a board-specific
CPU state table in board.c, which can be auto-generated by an offline tool, so
that CPU Px/Cx can be enabled on a customer board.
The hypervisor searches cpu_state_tbl[] first; if no match is found there, it
checks board_cpu_state_tbl. If no matching CPU state table is found, Px/Cx will
not be supported.
Tracked-On: #3477
Signed-off-by: Victor Sun <victor.sun@intel.com>
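A hedged sketch of the lookup order described above; the struct layout and
helper name are illustrative, not the exact hypervisor symbols:

    #include <stddef.h>
    #include <string.h>

    struct cpu_state_table {
        char model_name[64];
        /* Px/Cx state data would follow here */
    };

    /* generic built-in tables (cpu_state_tbl.c) and the board-specific
     * table that the offline tool generates into board.c */
    static struct cpu_state_table cpu_state_tbl[2];
    static struct cpu_state_table board_cpu_state_tbl;

    static const struct cpu_state_table *find_cpu_state_tbl(const char *model)
    {
        size_t i;

        /* 1. search the built-in table list first */
        for (i = 0U; i < sizeof(cpu_state_tbl) / sizeof(cpu_state_tbl[0]); i++) {
            if (strcmp(cpu_state_tbl[i].model_name, model) == 0) {
                return &cpu_state_tbl[i];
            }
        }

        /* 2. fall back to the board-specific table */
        if (strcmp(board_cpu_state_tbl.model_name, model) == 0) {
            return &board_cpu_state_tbl;
        }

        /* 3. no match: Px/Cx stays unsupported */
        return NULL;
    }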
After "commit f0e1c5e init vcpu host stack when reset vcpu", SOS resume form S3
wants to schedule to vcpu_thread not the point where SOS enter S3. So we should
schedule to idel first then reschedule to execute vcpu_thread.
Tracked-On: #3387
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Per SDM Vol. 2, MONITOR uses the RAX register to set up the address monitored
by hardware, while MWAIT uses RAX/RCX as the hints for the state the processor
will enter. It is incorrect to use the same value for MONITOR and MWAIT; the
ECX of MWAIT specifies the optional extensions.
At the same time, the value at the monitored address must be checked against
the expected value before entering MWAIT. Otherwise a lockup is possible.
V1->V2: Add asm wrappers for monitor/mwait to avoid mixing inline assembly
into wait_sync_change.
V2->V3: Remove the unnecessary line break in asm_monitor/asm_mwait.
Follow Fei's comment to remove the MWAIT ECX hint setting that treats
interrupts as break events; we only need to check whether the value of
psync_change is already the expected one.
Tracked-On: #3442
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
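A sketch of the wrappers and the re-check described above; the names follow
the changelog (asm_monitor/asm_mwait/wait_sync_change), but the exact
signatures in the tree may differ:

    #include <stdint.h>

    static inline void asm_monitor(volatile const uint64_t *addr,
                                   uint64_t ecx, uint64_t edx)
    {
        /* MONITOR: RAX = address to arm, ECX = extensions, EDX = hints */
        __asm__ volatile("monitor" : : "a"(addr), "c"(ecx), "d"(edx));
    }

    static inline void asm_mwait(uint64_t eax, uint64_t ecx)
    {
        /* MWAIT: EAX = hints, ECX = extensions (0: no break on interrupt) */
        __asm__ volatile("mwait" : : "a"(eax), "c"(ecx));
    }

    /* re-check the monitored value before mwait: the expected write may
     * have landed between monitor and mwait, which would otherwise lock up */
    static void wait_sync_change(volatile const uint64_t *sync, uint64_t wake_sync)
    {
        while (*sync != wake_sync) {
            asm_monitor(sync, 0UL, 0UL);
            if (*sync != wake_sync) {
                asm_mwait(0UL, 0UL);
            }
        }
    }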
When HPA/HVA translation is needed before init_paging, we need hpa2hva_early
and hva2hpa_early, since init_paging may modify hva2hpa so that the mapping is
no longer an identity mapping.
Tracked-On: #2987
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
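A minimal sketch of the early variants, assuming the identity mapping that
holds before init_paging:

    #include <stdint.h>

    static inline void *hpa2hva_early(uint64_t hpa)
    {
        /* identity mapping before init_paging: HVA == HPA */
        return (void *)hpa;
    }

    static inline uint64_t hva2hpa_early(const void *hva)
    {
        return (uint64_t)hva;
    }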
Enable the UART as early as possible to make debugging easier. After this we
can use printf to output information to the UART. As for the pr_xxx APIs, they
start to work once init_logmsg is called.
Tracked-On: #2987
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
This is a follow-up patch to fix the coding style issue introduced by commit
c2d25aafb889ade954af8795df2405a94024d860: an unmodified pointer should be
defined as const.
Also addressed one comment from Fei to use reversed function calls in
vpci_init_pt_dev and vpci_deinit_pt_dev.
Tracked-On: #3241
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
softirq shouldn't be bound to the vcpu thread. One issue with this is that the
shell (based on a timer) can't work if we don't start any guest. This change
also tries its best to make the softirq handler run with IRQs enabled.
Also update the IRQ disable/enable in the vmexit handler to align with the
usage in vcpu_thread.
Tracked-On: #3387
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This commit adds ops to the vlapic structure and adds an *ops parameter to
vlapic_reset(). At vlapic reset, ops is set to the global apicv_ops; it may be
assigned to other ops later.
Tracked-On: #3227
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
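A sketch of the shape of this change; the callback list and struct layout are
illustrative, not the exact tree contents:

    #include <stdint.h>
    #include <stdbool.h>

    struct acrn_vlapic;

    struct vlapic_ops {
        void (*accept_intr)(struct acrn_vlapic *vlapic, uint32_t vector, bool level);
        /* ... further mode-specific callbacks ... */
    };

    extern const struct vlapic_ops apicv_ops;   /* global default */

    /* at reset, vlapic->ops is set to the ops argument (apicv_ops by
     * default); callers may later re-assign other ops */
    void vlapic_reset(struct acrn_vlapic *vlapic, const struct vlapic_ops *ops);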
When initializing a secondary pcpu, passing INVALID_CPU_ID as the parameter of
init_pcpu_pre() looks weird, so change the parameter type to bool to represent
whether the pcpu is the BSP or an AP.
Tracked-On: #3420
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
On the x86 architecture, a word/doubleword/quadword-aligned read/write on its
natural boundary is atomic, so we may remove the atomic load/store. As for
atomic set/clear, using bitmap_set/bitmap_clear seems more reasonable. After
replacing them all, we can remove the atomic set/clear too.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
vCPU schedule state changes are under schedule lock protection, so there's no
need for them to be atomic.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The ACRN coding guidelines require type conversions to be explicit. However,
there's no need for one in this case since we can return bool directly.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now sched_object and sched_context are protected by scheduler_lock. There's no
need to use runqueue_lock to protect the schedule runqueue if we have no plan
to support schedule migration.
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Also use vhostbridge for both the SOS and pre-launched VMs.
Tracked-On: #3241
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
It will be used to define the PCI vdev's own ops, and high-level APIs will
call these ops instead of invoking device-specific functions directly.
Tracked-On: #3241
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In the current design, the device model passes a VM UUID to create a VM, and
the hypervisor checks whether the UUID matches one in the VM configurations.
Kata containers maintain a few UUIDs for ACRN to launch VMs with, so the
hypervisor needs to add these UUIDs to the VM configurations for Kata to run.
In the hypercall hcall_get_platform_info(), the hypervisor reports the maximum
number of Kata containers it supports. This patch adds a Kconfig option to
indicate the maximum number of Kata containers that the SOS can support.
At the current stage, only one Kata container is supported by the SOS in the
SDC scenario, so add one UUID for the Kata container in the SDC VM
configuration. If we want to support Kata in other scenarios in the future, we
can follow the example of this patch.
Tracked-On: #3402
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The ACRN coding guidelines require that pointers of two different types not be
converted to each other, except via void *.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The ACRN coding guidelines require that pointers of two different types not be
converted to each other, except via void *.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
The ACRN coding guidelines require type conversions to be explicit.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In most cases, the vLAPIC is only accessed by its related vCPU, and there is
no synchronization issue in that case. However, other vCPUs can deliver
interrupts to the current vCPU; in that case, the IRR (for the APICv basic
situation) or the PIR (for the APICv advanced situation), and the TMR in both
cases, can be accessed by more than one vCPU simultaneously. So operations on
the IRR or PIR should be atomic and visible to other vCPUs immediately. In
another case, the vLAPIC can be accessed by another vCPU at vCPU creation or
vCPU reset, which are supposed to happen sequentially.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
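A minimal sketch of the atomic requirement on IRR/PIR, using GCC atomics for
illustration (the hypervisor uses its own bitmap helpers):

    #include <stdint.h>

    /* a remote vCPU posts a vector into the PIR/IRR word that the owning
     * vCPU may be scanning at the same time, so the RMW must be atomic
     * and immediately visible */
    static inline void vlapic_set_vector(volatile uint64_t *pir, uint32_t vector)
    {
        __atomic_fetch_or(&pir[vector >> 6U], 1UL << (vector & 0x3FU),
                          __ATOMIC_SEQ_CST);
    }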
Although vhm_req status can be updated by the HV and the DM on different CPUs,
each only changes vhm_req status when it detects that the status has been
updated by the other, so vhm_req status will not be misconfigured. However,
before the HV sets vhm_req status to REQ_STATE_PENDING, the filled vhm_req
buffer must be visible to the DM. Add a write memory barrier to guarantee this.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
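A minimal sketch of the ordering requirement, with a simplified request layout
(the REQ_STATE_* value and fields are illustrative):

    #include <stdint.h>

    #define REQ_STATE_PENDING  1U

    struct vhm_request {
        uint32_t type;
        uint64_t data;
        volatile uint32_t processed;   /* REQ_STATE_* */
    };

    static void post_request(struct vhm_request *req, uint32_t type, uint64_t data)
    {
        /* fill the request buffer first */
        req->type = type;
        req->data = data;

        /* write memory barrier: the payload must be visible to the DM
         * before the status flips to PENDING */
        __asm__ volatile("sfence" ::: "memory");

        req->processed = REQ_STATE_PENDING;
    }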
The hypervisor exposes the mitigation technique for Speculative Store
Bypass (SSB) to guests and allows a guest to determine whether to enable
the SSBD mitigation by providing direct guest access to IA32_SPEC_CTRL.
Before that, the hypervisor should check SSB mitigation support on the
underlying processor; this patch adds that capability check.
Tracked-On: #3385
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
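A sketch of such a capability check; the helper names are illustrative, while
the CPUID bit follows Intel's documentation (CPUID.(EAX=7,ECX=0):EDX[31]
enumerates SSBD support):

    #include <stdbool.h>
    #include <stdint.h>

    #define CPUID_EDX_SSBD  (1U << 31)   /* CPUID.(EAX=7,ECX=0):EDX[31] */

    static inline void cpuid_subleaf(uint32_t leaf, uint32_t subleaf,
                                     uint32_t *a, uint32_t *b,
                                     uint32_t *c, uint32_t *d)
    {
        __asm__ volatile("cpuid"
                         : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                         : "a"(leaf), "c"(subleaf));
    }

    static bool check_ssbd_support(void)
    {
        uint32_t eax, ebx, ecx, edx;

        cpuid_subleaf(7U, 0U, &eax, &ebx, &ecx, &edx);
        return (edx & CPUID_EDX_SSBD) != 0U;
    }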
The ACRN coding guidelines require that a parameter be declared const when it
is not modified in its function or in recursive function calls.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
cancel_event_injection is not needed any more if we do 'schedule' prior to
acrn_handle_pending_request. Commit 921288a6672 ("hv: fix interrupt
lost when do acrn_handle_pending_request twice") brought 'schedule'
forward, so remove the cancel_event_injection related stuff.
Tracked-On: #3374
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The ACRN coding guidelines require no dead code.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data available in
various CPU internal buffers.
1. Mitigation on ACRN:
1) A microcode update is required.
2) Clear the CPU internal buffers (store buffer, fill buffer and
load port) on VM entry if the current CPU is affected by MDS,
to avoid any information leakage to the guest through the above buffers.
3) Mitigation is not needed if the ARCH_CAP_MDS_NO bit (bit 5)
is set in the IA32_ARCH_CAPABILITIES MSR (10AH); in this case
the current processor is not affected by the MDS vulnerability.
In all other cases the MDS mitigation is required.
2. Methods to clear the CPU buffers (microcode update required):
1) L1D cache flush
2) VERW instruction
Either of the above operations triggers clearing of all
CPU internal buffers if the CPU is affected by MDS.
This mechanism is enumerated by:
CPUID.(EAX=7H, ECX=0):EDX[MD_CLEAR=10].
3. Mitigation details on ACRN:
if (processor is affected by MDS)
    if (processor is not affected by L1TF OR
        L1D flush is not launched on VM entry)
        execute the VERW instruction on VM entry.
    endif
endif
4. Reference:
Deep Dive: Intel Analysis of Microarchitectural Data Sampling
https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-microarchitectural-data-sampling
Deep Dive: CPUID Enumeration and Architectural MSRs
https://software.intel.com/security-software-guidance/insights/deep-dive-cpuid-enumeration-and-architectural-msrs
Tracked-On: #3317
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
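A sketch of the VERW usage only (ACRN's actual VM-entry path differs); the
segment selector value is an assumption, any writable data-segment selector
works:

    #include <stdint.h>

    static inline void verw_buffer_overwrite(void)
    {
        uint16_t ds = 0x10U;   /* assumed writable data-segment selector */

        /* with the MD_CLEAR microcode, VERW overwrites the store buffer,
         * fill buffers and load ports as a side effect */
        __asm__ volatile("verw %0" : : "m"(ds) : "cc");
    }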
The ACRN hypervisor always prints the CPU microcode update
warning message on the KBL NUC platform, even after
the BIOS has been updated to the latest version.
'check_cpu_security_cap()' returns false if there is
no ARCH_CAPABILITIES MSR support on the current platform,
but this MSR may not be available on some platforms.
This patch removes that pre-condition.
Tracked-On: #3317
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
Rename vbdf to bdf for the following reasons:
Use the same coding style as struct pci_pdev, which uses bdf instead of pbdf.
pci_vdev already implies that its bdf is virtual, so there is no need to
prefix bdf with v (the prefix is redundant).
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Enable 64-bit bar emulation: if the pbar is of type PCIBAR_MEM64, the vbar
will also be of type PCIBAR_MEM64 instead of PCIBAR_MEM32.
With the 64-bit bar emulation code in place, we can remove enum pci_bar_type
from struct pci_bar, as the bar type can be derived from struct pci_bar's reg
member using the pci_get_bar_type function.
Rename functions:
pci_base_from_size_mask --> git_size_masked_bar_base
Remove unused functions.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
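A sketch of deriving the bar type from the raw register value, as described
above; the bit layout follows the PCI spec, the enum mirrors the changelog's
names:

    #include <stdint.h>

    enum pci_bar_type {
        PCIBAR_NONE,
        PCIBAR_IO_SPACE,
        PCIBAR_MEM32,
        PCIBAR_MEM64,
    };

    static enum pci_bar_type pci_get_bar_type(uint32_t val)
    {
        enum pci_bar_type type = PCIBAR_NONE;

        if ((val & 0x1U) != 0U) {
            type = PCIBAR_IO_SPACE;       /* bit 0: I/O space indicator */
        } else {
            switch (val & 0x6U) {         /* bits 2:1: memory bar type */
            case 0x0U:
                type = PCIBAR_MEM32;
                break;
            case 0x4U:
                type = PCIBAR_MEM64;
                break;
            default:
                break;
            }
        }
        return type;
    }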
The current implementation triggers the shutdown-VM request on the BSP vCPU of
the VM, not on the vCPU that trapped out because of the triple fault. However,
if the BSP vCPU of the VM is handling another I/O emulation, it may overwrite
the triple fault I/O request in the vhm_request_buffer in acrn_insert_request.
The atomic operation in get_vhm_req_state can't guarantee that the
vhm_request_buffer won't be accessed by another I/O request if it is not
running on the corresponding vCPU. So the triple fault shutdown-VM I/O request
should be triggered on the vCPU that trapped out because of the triple fault
exception.
Besides, rt_vm_pm1a_io_write already does the right thing, so we shouldn't do
it again in triple_fault_shutdown_vm.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Since the spinlock ptdev_lock is used to protect the ptdev entry, there is no
need to use an atomic operation to protect the ptdev active flag; besides, the
atomic operation was wrongly being used to protect the ptdev entry. Also
refine the active flag's data type to bool.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
According to the SDM, bit N (the physical address width) through bit 63 should
be masked when calculating the host page frame number.
Currently the hypervisor doesn't set any of these bits, so gpa2hpa works as
expected. However, if any of these bits were set, gpa2hpa would return a wrong
value.
The hypervisor never sets bit N through bit 51 (reserved bits), so for
simplicity just mask bit 52 through bit 63.
Tracked-On: #3352
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
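A minimal sketch of the masking, assuming 4-KByte pages; the constant keeps
bits 51:12 and drops bits 63:52 as the changelog describes:

    #include <stdint.h>

    #define PFN_MASK  0x000FFFFFFFFFF000UL   /* bits 51:12 */

    static inline uint64_t pte_to_hpa(uint64_t pte)
    {
        /* drop bits 63:52 (ignored/reserved) and the low attribute bits */
        return pte & PFN_MASK;
    }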
Factor 2 functions out of existing code:
pci_base_from_size_mask
vdev_pt_remap_mem_vbar
Use vbar in place of vdev->bar[idx] by setting vbar to &vdev->bar[idx].
Change base to uint64_t to accommodate 64-bit MMIO bar size masking in
subsequent commits.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
At this point, uint64_t base in struct pci_bar is not used by any code, so we
can remove it.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Define/use variables in place of code to improve readability:
Define a new local variable struct pci_bar *vbar, and use vbar-> in place of vdev->bar[idx].
Define a new local variable uint64_t vbar_base in init_vdev_pt.
Rename uint64_t vbar[PCI_BAR_COUNT] of struct acrn_vm_pci_ptdev_config to uint64_t vbar_base[PCI_BAR_COUNT].
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Remember the previously mapped/registered vbar base, for the following
reasons:
register_mmio_emulation_handler() will throw an error if the same addr_lo is
already registered.
We are going to remove the base member from struct pci_bar, so we cannot use
vdev->bar[idx].base in the code any more.
In subsequent commits, we will assume vdev_pt_remap_generic_mem_vbar() is
called after a new vbar base is set, mainly because of the 64-bit MMIO bar
handling, so we need a separate bar_base_mapped[] array to track the
previously mapped vbar bases.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
vcpu.c was never scanned because it crashed the scan tool! After
modularization, vcpu.c can be scanned by the scan tool.
Clean up the violations in vcpu.c. Violations fixed:
1. No brackets around then/else.
2. Function return value not checked.
3. Signed/unsigned conversion without a cast.
V1->V2:
change the type of "vcpu->arch.irq_window_enabled" to bool.
V2->V3:
add a "void *" cast on the 1st parameter of memset.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The sbuf is allocated for each pcpu by a hypercall from the SOS. Before
launching a guest OS, the launch script offlines CPUs, which triggers a vcpu
reset and then resets the sbuf pointer. But the sbuf is only initialized once
by the SOS, so the CPUs assigned to the guest OS have no sbuf to use. Thus,
when running 'acrntrace' on the SOS, there is no trace data for the guest OS.
To fix the issue, only reset the sbuf for the SOS.
Tracked-On: #3335
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Use nr_bars instead of PCI_BAR_COUNT to check bar access offsets.
While a normal PCI device has at most 6 bars, a PCI bridge only has 2 bars;
so for a normal PCI device, PCI cfg offsets 0x10-0x24 are for bar access,
but for a PCI bridge, only 0x10-0x14 are for bar access (0x18-0x24 are
for other accesses).
Rename function:
pci_bar_access --> is_bar_offset
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
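A minimal sketch of the renamed check, assuming the standard 0x10 base of the
bar window:

    #include <stdbool.h>
    #include <stdint.h>

    #define PCIR_BARS  0x10U

    /* a cfg-space offset is a bar access only inside the device's actual
     * bar window: 6 bars for a normal device, 2 for a bridge */
    static bool is_bar_offset(uint32_t nr_bars, uint32_t offset)
    {
        return (offset >= PCIR_BARS) && (offset < (PCIR_BARS + (nr_bars << 2U)));
    }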
nr_bars in struct pci_pdev is used to store the actual number of bars
(6 for a normal PCI device and 2 for a PCI bridge); nr_bars will be used in
subsequent patches.
Use uint32_t for bar-related variables (bar index, etc.) to unify the
bar-related code (no casting between uint32_t and uint8_t).
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
union pci_bar uses bit fields and follows the PCI bar spec definition of the
bar flags portion and base address; this keeps the same hardware format for
the vbar register. The base/type members of union pci_bar are still kept to
minimize the code changes in one patch; they will be removed in subsequent
patches.
Define a pci_pdev_get_bar_base() function to extract the bar base address from
a 32-bit raw bar value.
Define a utility function pci_get_bar_type() to extract the bar type from a
raw bar value to simplify code, as this function will be used in multiple
places later on: it can be called on the reg->value stored in struct pci_bar
to derive the bar type.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
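An illustrative layout for the memory-bar case; the field widths follow the
PCI spec, the field names are assumptions:

    #include <stdint.h>

    union pci_bar_reg {
        uint32_t value;                /* raw hardware format */
        struct {
            uint32_t is_io    : 1;     /* 0 = memory bar */
            uint32_t type     : 2;     /* 00b = 32-bit, 10b = 64-bit */
            uint32_t prefetch : 1;
            uint32_t base     : 28;    /* upper bits of the base address */
        } mem;
    };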
Add a get_offset_of_caplist() function to return the capability offset based
on the header type:
For a normal PCI device or bridge, the capability offset is at offset 0x34.
For cardbus, the capability offset is at offset 0x14.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
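A minimal sketch of that dispatch; the offsets come straight from the PCI spec
(0x34 for type 0/1 headers, 0x14 for cardbus), the macro name is an
assumption:

    #include <stdint.h>

    #define PCIM_HDRTYPE_CARDBUS  0x02U

    static uint32_t get_offset_of_caplist(uint8_t hdr_type)
    {
        return (hdr_type == PCIM_HDRTYPE_CARDBUS) ? 0x14U : 0x34U;
    }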
find_pci_pdev is not used any more, remove it.
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The current implementation caches each in-service vector in the ISR vector
stack and checks the ISR vector stack when updating the PPR. However, there is
no need to do this because:
1) We never touch vlapic->isrvec_stk[0] except in vlapic_reset:
So we don't need the vlapic->isrvec_stk[0] check.
2) We only deliver a higher-priority interrupt from the IRR to the ISR:
So we don't need to check whether the vlapic->isrvec_stk interrupts are always increasing.
3) There are only 15 different interrupt priorities, so it can never happen
that more than 15 interrupts are delivered to the ISR:
So we don't need to check whether vlapic->isrvec_stk_top can exceed
ISRVEC_STK_SIZE, which is 16.
This patch removes the ISR vector stack and uses isrv to cache the vector
number of the highest-priority bit set in the ISR.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
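A minimal sketch of deriving isrv without the stack: scan the eight 32-bit ISR
registers top-down for the highest set vector (the register layout mirrors the
local APIC, the helper name is an assumption):

    #include <stdint.h>

    static uint32_t vlapic_find_isrv(const uint32_t isr[8])
    {
        int32_t i;

        for (i = 7; i >= 0; i--) {
            if (isr[i] != 0U) {
                /* 31 - clz yields the highest set bit within the word */
                return ((uint32_t)i << 5U) +
                       (31U - (uint32_t)__builtin_clz(isr[i]));
            }
        }
        return 0U;   /* no in-service interrupt */
    }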
Remove a couple of run-time ASSERTs in the ioapic module by checking the
number of interrupt pins per IO-APIC against the configured MAX_IOAPIC_LINES
in the initialization flow.
Also remove the need for two macros specifying the maximum number of
interrupt lines per IO-APIC, and add a config item MAX_IOAPIC_LINES for the
same purpose.
Tracked-On: #3299
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
has_rt_vm walks through all VMs to check the RT VM flag; if there is no RT VM
it returns false, otherwise it returns true.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
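A hedged sketch of the walk; CONFIG_MAX_VM_NUM and the helper functions stand
in for the real config/VM APIs:

    #include <stdbool.h>
    #include <stdint.h>

    #define CONFIG_MAX_VM_NUM  8U   /* assumed build-time VM count */

    struct acrn_vm;
    extern struct acrn_vm *get_vm_from_vmid(uint16_t vm_id);
    extern bool is_rt_vm(const struct acrn_vm *vm);

    bool has_rt_vm(void)
    {
        uint16_t vm_id;
        bool ret = false;

        for (vm_id = 0U; vm_id < CONFIG_MAX_VM_NUM; vm_id++) {
            if (is_rt_vm(get_vm_from_vmid(vm_id))) {
                ret = true;
                break;
            }
        }
        return ret;
    }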
The ept_flush_leaf_page API is used to flush the address space covered by an
EPT page entry; a user can use it together with walk_ept_mr to flush a VM's
address space.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The walk_ept_table API is used to walk through the EPT table to find all
present pages; the user gets each page entry and its size from the
walk_ept_table callback.
get_ept_entry is used to get the EPT pointer of the VM: if the current
context of the VM is the secure world, it returns the secure world EPT
pointer; otherwise it returns the normal world EPT pointer.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
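A sketch of the walk/callback shape described above; the exact prototypes in
the tree may differ:

    #include <stdint.h>

    struct acrn_vm;

    /* invoked for every present leaf entry with its mapping size */
    typedef void (*pge_handler)(uint64_t *pgentry, uint64_t size);

    void walk_ept_table(struct acrn_vm *vm, pge_handler cb);
    void ept_flush_leaf_page(uint64_t *pgentry, uint64_t size);

    /* example callback: flush the VM address space one leaf at a time */
    static void flush_leaf(uint64_t *pgentry, uint64_t size)
    {
        ept_flush_leaf_page(pgentry, size);
    }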
flush_address_space is used to flush an address space using the clflushopt
instruction.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
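A minimal sketch of a clflushopt loop over a range, assuming 64-byte cache
lines; the trailing fence is needed because clflushopt is weakly ordered:

    #include <stdint.h>

    #define CACHE_LINE_SIZE  64UL

    static void flush_address_space(void *addr, uint64_t size)
    {
        uint64_t n;

        for (n = 0UL; n < size; n += CACHE_LINE_SIZE) {
            __asm__ volatile("clflushopt (%0)"
                             : : "r"((char *)addr + n) : "memory");
        }
        __asm__ volatile("sfence" ::: "memory");
    }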