dmar_reserve_irte is added to reserve N coutinuous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.
The reserved IRTEs will not be freed.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be resued;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.
idx_out:
This paramter return the actual index of IRTE used. The caller need to
check whether the return value is valid or not.
Also this patch adds an internal function alloc_irte.
The function takes count as input paramter to allocate continuous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There're some platforms still doesn't support 1GB large page on CPU side.
Such as lakefield, TNT and EHL platforms on which have some silicon bug and
this case CPU don't support 1GB large page.
This patch tries to release this constrain to support more hardware platform.
Note this patch doesn't release the constrain on IOMMU side.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The information needed to enable MSI-x emulation.
Only enable MSI-x emuation for the devices in msix_emul_devs array.
Currently, only EHL has the need to enable MSI-x emulation for TSN
devices.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously append_seed_arg() just do fill in seed arg to dest cmd buffer,
so rename the api name to fill_seed_arg().
Since fill_seed_arg() will be called in SOS VM path only, the param of
bool vm_is_sos is not needed and will be replaced by dest buffer size.
The seed_args[] which used by fill_seed_arg() is pre-defined as all-zero,
so memset() is not needed in fill_seed_arg(), buffer pointer check
and strncpy_s() are not needed also.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a standard string api strncat_s() to replace merge_cmdline() to make code
more readable.
Another change is that the multiboot cmdline will be appended to the end of
configured SOS bootargs instead of the beginning, this would enable a feature
that some kernel cmdline paramter items could be overriden by multiboot cmdline
since the later one would win if same parameters configured in kernel cmdline.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now Host Bridge and PCI Bridge could only be added to SOS's acrn_vm_pci_dev_config.
So For UOS, we always emualte Host Bridge and PCI Bridge for it and assign PCI device
to it; for SOS, if it's the highest severity VM, we will assign Host Bridge and PCI
Bridge to it directly, otherwise, we will emulate them same as UOS.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According PCI Code and ID Assignment Specification Revision 1.11, a PCI device
whose Base Class is 06h and Sub-Class is 00h is a Host bridge.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As per the BWG a delay should be provided between the
INIT IPI and Startup IPI. Without the delay observe hangs
on certain platforms during MP Init sequence. So Setting
a delay of 10us between assert INIT IPI and Startup IPI.
Also, as per SDM section 10.7 the the de-assert INIT IPI is
only used for Pentium and P6 processors. This is not applicable
for Pentium4 and Xeon processors so removing this sequence.
Tracked-On: #4835
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Snoop control will not be turned on by hypervisor, delete snoop control
related code.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Invalidate cache by scanning and flushing the whole guest memory is
inefficient which might cause long execution time for WBINVD emulation.
A long execution in hypervisor might cause a vCPU stuck phenomenon what
impact Windows Guest booting.
This patch introduce a workaround method that pausing all other vCPUs in
the same VM when do wbinvd emulation.
Tracked-On: #4703
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
RTVM (with lapic PT) boots hang when maxcpus is
assigned a value less than the CPU number configured
in hypervisor.
In this case, vlapic_state(per VM) is left in TRANSITION
state after BSP boot, which blocks interupts to be injected
to this UOS.
Tracked-On: #4803
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There's no need to look up MSI ptirq entry by virtual SID any more since the MSI
ptirq entry would be removed before the device is assigned to a VM.
Now the logic of MSI interrupt remap could simplify as:
1. Add the MSI interrupt remap first;
2. If step is already done, just do the remap part.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
Now we could know a device status by 'user' filed, like
---------------------------------------------------------------------------
| NULL | == vdev | != NULL && != vdev
vdev->user | device is de-init | used by itself VM | assigned to another VM
---------------------------------------------------------------------------
So we don't need to modify 'vpci' field accordingly.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
For post-launched VMs, the configured CPU affinity could be different
from the actual running CPU affinity. This new field acrn_vm->cpu_affinity
recognizes this difference so that it's possible that CREATE_VM
hypercall won't overwrite the configured CPU afifnity.
Change name cpu_affinity_bitmap in acrn_vm_config to cpu_affinity.
This is read-only in run time, never overwritten by acrn-dm.
Remove vm_config->vcpu_num, which means the number of vCPUs of the
configured CPU affinity. This is not to be confused with the actual
running vCPU number: vm->hw.created_vcpus.
Changed get_vm_bsp_pcpu_id() to get_configured_bsp_pcpu_id() for less
confusion.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
CDP is an extension of CAT. It enables isolation and separate prioritization of
code and data fetches to the L2 or L3 cache in a software configurable manner,
depending on hardware support.
This commit adds a Kconfig switch "CDP_ENABLED" which depends on "RDT_ENABLED".
CDP will be enabled if the capability available and "CDP_ENABLED" is selected.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This commit makes some RDT code cleanup, mainling including:
- remove the clos_mask and mba_delay validation check in setup_res_clos_msr(), the check will be done in pre-build;
- rename platform_clos_num to valid_clos_num, which is set as the minimal clos_mas of all enabled RDT resouces;
- init the platform_clos_array in the res_cap_info[] definition;
- remove the unnecessary return values and return value check.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
A RDT resource could be CAT or MBA, so only one of struct rdt_cache and struct rdt_membw
would be used at a time. They should be a union.
This commit merge struct rdt_cache and struct rdt_membw in to a union res.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com
The virtual MSI information could be included in ptirq_remapping_info structrue,
there's no need to pass another input paramater for this puepose. So we could
remove the ptirq_msi_info input.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Most code in the if ... else is duplicated. We could put it out of the
conditional statement.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Currently the vcpu_affinity[] array fixes the vCPU to pCPU mapping.
While the new cpu_affinity_bitmap doesn't explicitly sepcify this
mapping, instead, it implicitly assumes that vCPU0 maps to the pCPU
with lowest pCPU ID, vCPU1 maps to the second lowest pCPU ID, and
so on.
This makes it possible for post-launched VM to run vCPUs on a subset of
these pCPUs only, and not all of them.
acrn-dm may launch post-launched VMs with the current approach: indicate
VM UUID and hypervisor launches all VCPUs from the PCPUs that are masked
in cpu_affinity_bitmap.
Also acrn-dm can choose to launch the VM on a subset of PCPUs that is
defined in cpu_affinity_bitmap. In this way, acrn-dm must specify the
subset of PCPUs in the CREATE_VM hypercall.
Additionally, with this change, a guest's vcpu_num can be easily calculated
from cpu_affinity_bitmap, so don't assign vcpu_num in vm_configuration.c.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The parameter of "idle=halt" for SOS cmdline is only needed when cpu sharing
is enabled, otherwise it will impact SOS power.
Tracked-On: #4329
Signed-off-by: Victor Sun <victor.sun@intel.com>
The pci_dev config settings of SOS are same so move the config interface
from vm_configurations.c to CONFIG_SOS_VM macro;
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently the vm uuid and severity is initilized separately in
vm_config struct, developer need to take care both items carefuly
otherwise hypervisor would have trouble with the configurations.
Given the vm loader_order/uuid and severity are binded tightly, the
patch merged these tree settings in one macro so that developer will
have a simple interface to configure in vm_config struct.
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If CPU has MSR_TEST_CTL, show an emulaued one to VCPU
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If CPU support rise #AC for Splitlock Access, then enable this
feature at each CPU.
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When the destination of an atomic memory operation located in 2
cache lines, it is called a Splitlock Access. LOCK# bus signal is
asserted for splitlock access which may lead to long latency. #AC
for Splitlock Access is a CPU feature, it allows rise alignment
check exception #AC(0) instead of asserting LOCK#, that is helpful
to detect Splitlock Access.
This feature is enumerated by MSR(0xcf) IA32_CORE_CAPABILITIES[bit5]
Add helper function:
bool has_core_cap(uint32_t bitmask)
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
remove unnecessary state check and
add pre-condition for vcpu APIs.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
check the vm state in hypercall api,
add pre-condition for vm api.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN disables Snoop Control in VT-d DMAR engines for simplifing the
implementation. Also, since the snoop behavior of PCIE transactions
can be controlled by guest drivers, some devices may take the advantage
of the NO_SNOOP_ATTRIBUTE of PCIE transactions for better performance
when snoop is not needed. No matter ACRN enables or disables Snoop
Control, the DMA operations of passthrough devices behave correctly
from guests' point of view.
This patch is used to clean all the snoop related code.
Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Waag will send NMIs to all its cores during reboot. But currently,
NMI cannot be injected to vcpu which is in HLT state.
To fix the problem, need to wakeup target vcpu, and inject NMI through
interrupt-window.
Tracked-On: #4620
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently vlapic_build_id() uses vcpu_id to retrieve the lapic_id
per_cpu variable:
vlapic_id = per_cpu(lapic_id, vcpu->vcpu_id);
SOS vcpu_id may not equal to pcpu_id, and in that case it runs into
problems. For example, if any pre-launched VMs are launched on PCPUs
whose IDs are smaller than any PCPU IDs that are used by SOS.
This patch fixes the issue and simplify the code to create or get
vapic_id by:
- assign vapic_id in create_vlapic(), which now takes pcpu_id as input
argument, and save it in the new field: vlapic->vapic_id, which will
never be changed.
- simplify vlapic_get_apicid() by returning te saved vapid_id directly.
- remove vlapic_build_id().
- vlapic_init() is only called once, merge it into vlapic_create().
Tracked-On: #4268
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Maintain a per-pCPU array of vCPUs (struct acrn_vcpu *vcpu_array[CONFIG_MAX_VM_NUM]),
one VM cannot have multiple vCPUs share one pcpu, so we can utilize this property
and use the containing VM's vm_id as the index to the vCPU array:
In create_vcpu(), we simply do:
per_cpu(vcpu_array, pcpu_id)[vm->vm_id] = vcpu;
In offline_vcpu():
per_cpu(vcpu_array, pcpuid_from_vcpu(vcpu))[vcpu->vm->vm_id] = NULL;
so basically we use the containing VM's vm_id as the index to the vCPU array,
as well as the index of posted interrupt IRQ/vector pair that are assigned
to this vCPU:
0: first vCPU and first posted interrupt IRQs/vector pair
(POSTED_INTR_IRQ/POSTED_INTR_VECTOR)
...
CONFIG_MAX_VM_NUM-1: last vCPU and last posted interrupt IRQs/vector pair
((POSTED_INTR_IRQ + CONFIG_MAX_VM_NUM - 1U)/(POSTED_INTR_VECTOR + CONFIG_MAX_VM_NUM - 1U)
In the posted interrupt handler, it will do the following:
Translate the IRQ into a zero based index of where the vCPU
is located in the vCPU list for current pCPU. Once the
vCPU is found, we wake up the waiting thread and record
this request as ACRN_REQUEST_EVENT
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
This is a preparation patch for adding support for VT-d PI
related vCPU scheduling.
ACRN does not support vCPU migration, one vCPU always runs on
the same pCPU, so PI's ndst is never changed after startup.
VCPUs of a VM won’t share same pCPU. So the maximum possible number
of VCPUs that can run on a pCPU is CONFIG_MAX_VM_NUM.
Allocate unique Activation Notification Vectors (ANV) for each vCPU
that belongs to the same pCPU, the ANVs need only be unique within each
pCPU, not across all vCPUs. This reduces # of pre-allocated ANVs for
posted interrupts to CONFIG_MAX_VM_NUM, and enables ACRN to avoid
switching between active and wake-up vector values in the posted
interrupt descriptor on vCPU scheduling state changes.
A total of CONFIG_MAX_VM_NUM consecutive IRQs/vectors are reserved
for posted interrupts use.
The code first initializes vcpu->arch.pid.control.bits.nv dynamically
(will be added in subsequent patch), the other code shall use
vcpu->arch.pid.control.bits.nv instead of the hard-coded notification vectors.
Rename some functions:
apicv_post_intr --> apicv_trigger_pi_anv
posted_intr_notification --> handle_pi_notification
setup_posted_intr_notification --> setup_pi_notification
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Given the vcpumask, check if the IRQ is single destination
and return the destination vCPU if so, the address of associated PI
descriptor for this vCPU can then be passed to dmar_assign_irte() to
set up the posted interrupt IRTE for this device.
For fixed mode interrupt delivery, all vCPUs listed in vcpumask should
service the interrupt requested. But VT-d PI cannot support multicast/broadcast
IRQs, it only supports single CPU destination. So the number of vCPUs
shall be 1 in order to handle IRQ in posted mode for this device.
Add pid_paddr to struct intr_source. If platform_caps.pi is true and
the IRQ is single-destination, pass the physical address of the destination
vCPU's PID to ptirq_build_physical_msi and dmar_assign_irte
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Add platform_caps.c to maintain platform related information
Set platform_caps.pi to true if all iommus are posted interrupt capable, false
otherwise
If lapic passthru is not configured and platform_caps.pi is true, the vm
may be able to use posted interrupt for a ptdev, if the ptdev's IRQ is
single-destination
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
EPT table can be changed concurrently by more than one vcpus.
This patch add a lock to protect the add/modify/delete operations
from different vcpus concurrently.
Tracked-On: #4253
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
This commit allows hypervisor to allocate cache to vcpu by assigning different clos
to vcpus of a same VM.
For example, we could allocate different cache to housekeeping core and real-time core
of an RTVM in order to isolate the interference of housekeeping core via cache hierarchy.
Tracked-On: #4566
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Chen, Zide <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As ACRN prepares to support servers with large amounts of memory
current logic to allocate space for 4K pages of EPT at compile time
will increase the size of .bss section of ACRN binary.
Bootloaders could run into a situation where they cannot
find enough contiguous space to load ACRN binary under 4GB,
which is typically heavily fragmented with E820 types Reserved,
ACPI data, 32-bit PCI hole etc.
This patch does the following
1) Works only for "direct" mode of vboot
2) reserves space for 4K pages of EPT, after boot by parsing
platform E820 table, for all types of VMs.
Size comparison:
w/o patch
Size of DRAM Size of .bss
48 GB 0xe1bbc98 (~226 MB)
128 GB 0x222abc98 (~548 MB)
w/ patch
Size of DRAM Size of .bss
48 GB 0x1991c98 (~26 MB)
128 GB 0x1a81c98 (~28 MB)
Tracked-On: #4563
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: removed is_host (always false) and is_tt_ept (always true) member
variables of struct iommu_domain and related codes since the values are
always determined.
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We could use container_of to get vcpu structure pointer from vmtrr. So vcpu
structure pointer is no need in vmtrr structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We could use container_of to get vcpu/vm structure pointer from vlapic. So vcpu/vm
structure pointer is no need in vlapic structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Exend union dmar_ir_entry to support VT-d posted interrupts.
Rename some fields of union dmar_ir_entry:
entry --> value
sw_bits --> avail
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Pass intr_src and dmar_ir_entry irte as pointers to dmar_assign_irte(),
which fixes the "Attempt to change parameter passed by value" MISRA C violation.
A few coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
For CPU side posted interrupts, it only uses bit 0 (ON) of the PI's 64-bit control
, other bits are don't care. This is not the case for VT-d posted
interrupts, define more bit fields for the PI's 64-bit control.
Use bitmap functions to manipulate the bit fields atomically.
Some MISRA-C violation and coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The posted interrupt descriptor is more of a vmx/vmcs concept than a vlapic
concept. struct acrn_vcpu_arch stores the vmx/vmcs info, so put struct pi_desc
in struct acrn_vcpu_arch.
Remove the function apicv_get_pir_desc_paddr()
A few coding style/typo fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Rename struct vlapic_pir_desc to pi_desc
Rename struct member and local variable pir_desc to pid
pir=posted interrupt request, pi=posted interrupt
pid=posted interrupt descriptor
pir is part of pi descriptor, so it is better to use pi instead of pir
struct pi_desc will be moved to vmx.h in subsequent commit.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The cupid() can be replaced with cupid_subleaf, which is more clear.
Having both APIs makes reading difficult.
Tracked-On: #4526
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For SOS VM, when the target platform has multiple IO-APICs, there
should be equal number of virtual IO-APICs.
This patch adds support for emulating multiple vIOAPICs per VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
MADT is used to specify the GSI base for each IO-APIC and the number of
interrupt pins per IO-APIC is programmed into Max. Redir. Entry register of
that IO-APIC.
On platforms with multiple IO-APICs, there can be holes in the GSI space.
For example, on a platform with 2 IO-APICs, the following configuration has
a hole (from 24 to 31) in the GSI space.
IO-APIC 1: GSI base - 0, number of pins - 24
IO-APIC 2: GSI base - 32, number of pins - 8
This patch also adjusts the size for variables used to represent the total
number of IO-APICs on the system from uint16_t to uint8_t as the ACPI MADT
uses only 8-bits to indicate the unique IO-APIC IDs.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
As ACRN prepares to support platforms with multiple IO-APICs,
GSI is a better way to represent physical and virtual INTx interrupt
source.
1) This patch replaces usage of "pin" with "gsi" whereever applicable
across the modules.
2) PIC pin to gsi is trickier and needs to consider the usage of
"Interrupt Source Override" structure in ACPI for the corresponding VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Reverts 538ba08c: hv:Add vpin to ptdev entry mapping for vpic/vioapic
ACRN uses an array of size per VM to store ptirq entries against the vIOAPIC pin
and an array of size per VM to store ptirq entries against the vPIC pin.
This is done to speed up "ptirq entry" lookup at runtime for Level triggered
interrupts in API ptirq_intx_ack used on EOI.
This patch switches the lookup API for INTx interrupts to the API,
ptirq_lookup_entry_by_sid
This could add delay to processing EOI for Level triggered interrupts.
Trade-off here is space saved for array/s of size CONFIG_MAX_IOAPIC_LINES with 8 bytes
per data. On a server platform, ACRN needs to emulate multiple vIOAPICs for
SOS VM, same as the number of physical IO-APICs. Thereby ACRN would need around
10 such arrays per VM.
Removes the need of "pic_pin" except for the APIs facing the hypercalls
hcall_set_ptdev_intr_info, hcall_reset_ptdev_intr_info
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
There're some cases the SOS (higher severity guest) needs to access the
post-launched VM (lower severity guest) PCI CFG space:
1. The SR-IOV PF needs to reset the VF
2. Some pass through device still need DM to handle some quirk.
In the case a device is assigned to a UOS and is not in a zombie state, the SOS
is able to access, if and only if the SOS has higher severity than the UOS.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For a pre-launched VM, a region from PTDEV_HI_MMIO_START is used to store
64bit vBARs of PT devices which address is high than 4G. The region should
be located after all user memory space and be coverd by guest EPT address.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ve820.c is a common file in arch/x86/guest/ now, so move function of
create_sos_vm_e820() to this file to make code structure clear;
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hypervisor/arch/x86/configs/$(BOARD)/ve820.c is used to store pre-launched
VM specific e820 entries according to memory configuration of customer.
It should be a scenario based configurations but we had to put it in per
board foler because of different board memory settings. This brings concerns
to customer on configuration orgnization.
Currently the file provides same e820 layout for all pre-launched VMs, but
they should have different e820 when their memory are configured differently.
Although we have acrn-config tool to generate ve802.c automatically, it
is not friendly to modify hardcoded ve820 layout manually, so the patch
changes the entries initialization method by calculating each entry item
in C code.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently ept_pages_info[] is initialized with first element only that force
VM of id 0 using SOS EPT pages. This is incorrect for logical partition and
hybrid scenario. Considering SOS_RAM_SIZE and UOS_RAM_SIZE are configured
separately, we should use different ept pages accordingly.
So, the PRE_VM_NUM/SOS_VM_NUM and MAX_POST_VM_NUM macros are introduced to
resolve this issue. The macros would be generated by acrn-config tool when
user configure ACRN for their specific scenario.
One more thing, that when UOS_RAM_SIZE is less then 2GB, the EPT address
range should be (4G + PLATFORM_HI_MMIO_SIZE).
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
VM needs to check if it owns this device before deiniting it.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Renames DEFINE_IOAPIC_SID with DEFINE_INTX_SID as the virtual source can
be IOAPIC or PIC
2. Rename the src member of source_id.intx_id to ctlr to indicate interrupt
controller
2. Changes the type of src member of source_id.intx_id from uint32_t to
enum with INTX_CTLR_IOAPIC and INTX_CTLR_PIC
Tracked-On: #4447
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
This is to enable relocation for code32.
- RIP relative addressing is available in x86-64 only so we manually add
relocation delta to the target symbols to fixup code32.
- both code32 and code64 need to load GDT hence both need to fixup GDT
pointer. This patch declares separate GDT pointer cpu_primary64_gdt_ptr
for code64 to avoid double fixup.
- manually fixup cpu_primary64_gdt_ptr in code64, but not rely on relocate()
to do that. Otherwise it's very confusing that symbols from same file could
be fixed up externally by relocate() or self-relocated.
- to make it clear, define a new symbol ld_entry_end representing the end of
the boot code that needs manually fixup, and use this symbol in relocate()
to filter out all symbols belong to the entry sections.
Tracked-On: #4441
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
This patch adds RDT MBA support to detect, configure and
and setup MBA throttle registers based on VM configuration.
Tracked-On: #3725
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The init_one_dev_config is used to initialize a acrn_vm_pci_dev_config
SRIOV needs a explicit acrn_vm_pci_dev_config to create a VF vdev,so
refine it to return acrn_vm_pci_dev_config.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current code avoid the rule 88 S in MISRA-C, so move xsaves and xrstors
assembler to individual functions.
Tracked-On: #4436
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Make the SRIOV-Capable device invisible from SOS if there is
no room for its all virtual functions.
v2: fix a issue that if a PF has been dropped, the subsequent PF
will be dropped too even there is room for its VFs.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The init value for XCR0 and XSS should be the same with spec:
In SDM Vol1 13.3:
XCR0[0] is associated with x87 state (see Section 13.5.1). XCR0[0] is
always 1. The other bits in XCR0 are all 0 coming out of RESET.
The IA32_XSS MSR (with MSR index DA0H) is zero coming out of RESET.
The previous code try to fix the xsave area leak to other VMs during init
phase, but bring the error to linux. Besides, it cannot avoid the
possible leak in running phase. Need find a better solution.
Tracked-On: #4430
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
is not set
This patch does the following,
1. Removes RDT code if CONFIG_RDT_ENABLED flag is
not set.
2. Set the CONFIG_RDT_ENABLED flag only on platforms
that support RDT so that build scripts will automatically
reflect the config.
Tracked-On: #3715
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
cache configuration.
This patch creates a generic infrastructure for
RDT resources instead of just L2 or L3 cache. This
patch also fixes L3 CAT config overwrite by L2 in
cases where both L2 and L3 CAT are supported.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There can be times when user unknowinlgy enables
CONFIG_CAT_ENBALED SW flag, but the hardware might
not support L3 or L2 CAT. In such case software can
end up writing to the CAT MSRs which can cause
undefined results. The patch fixes the issue by
enabling CAT only when both HW as well software
via the CONFIG_CAT_ENABLED supports CAT.
The patch also address typo with "clos2prq_msr"
function name. It should be "clos2pqr_msr" instead.
PQR stands for platform qos register.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Upcoming intel platforms can support both L2 and L3
but our current code only supports either L2 or L3 CAT.
So split the MSRs so that we can support allocation
for both L2 and L3.
This patch does the following,
1. splits programming of L2 and L3 cache resource
based on the resource ID.
2. Replace generic platform_clos_array struct with resource
specific struct in all the existing board.c files.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As part of rdt cat refactoring, goal is to combine all rdt
specific features such as CAT under one module. So renaming
rdt resouce specific files such as cat.c/.h to generic rdt.c/.h
files.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize efi info of acrn mbi when boot from multiboot2 protocol, with
this patch hypervisor could get host efi info and pass it to Linux zeropage,
then make guest Linux possible to boot with efi environment;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch re-arch boot component header files by:
- moving multiboot.h from include/arch/x86/ to boot/include/ and keep
this header for multiboot1 protocol data struct only;
- moving multiboot related MACROs in cpu_primary.S to multiboot.h;
- creating an independent boot.h to store acrn specific boot information
for other files' reference;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BVT (Borrowed virtual time) scheduler is used to schedule vCPUs on pCPU.
It has the concept of virtual time, vCPU with earliset virtual time is
dispatched first.
Main concepts:
tick timer:
a period tick is used to measure the physcial time in units of MCU
(minimum charing unit).
runqueue:
thread in the runqueue is ordered by virtual time.
weight:
each thread receives a share of the pCPU in proportion to its
weight.
context switch allowance:
the physcial time by which the current thread is allowed to advance
beyond the next runnable thread.
warp:
a thread with warp enabled will have a change to minus a value (Wi)
from virtual time to achieve higher priority.
virtual time:
AVT: actual virtual time, advance in proportional to weight.
EVT: effective virtual time.
EVT <- AVT - ( warp ? Wi : 0 )
SVT: scheduler virtual time, the minimum AVT in the runqueue.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Rename BOOT_CPU_ID to BSP_CPU_ID
2. Repace hardcoded value with BSP_CPU_ID when
ID of BSP is referenced.
Tracked-On: #4420
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Now we split passthrough PCI device from DM to HV, we could remove all the passthrough
PCI device unused code.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To enable gvt-d,need to allow the GPU IOMMU.
While gvt-d hasn't been enabled on APL yet,
so let APL disable GPU IOMMU.
v2 -> v3:
* let APL platforms disable GPU IOMMU.
Tracked-On: #4405
Signed-off-by: Junming Liu <junming.liu@intel.com>
Reviewed-by: Wu Binbin <binbin.wu@intel.com>
In platforms that support CAT, when it is enabled by ACRN, i.e.
IA32_resourceType_MASK_n registers are programmed with customized values,
it has impacts to the whole system.
The per guest flag GUEST_FLAG_CLOS_REQUIRED suggests that CAT may be
enabled in some guests, but not in others who don't have this flag,
which is conceptually incorrect.
This patch removes GUEST_FLAG_CLOS_REQUIRED, and adds a new Kconfig
entry CAT_ENABLED for CAT enabling. When it's enabled, platform_clos_array[]
defines a set of system-wide Class of Service (COS, or CLOS), and the
per guest vm_configs[].clos associates the guest with particular CLOS.
Tracked-On: #2462
Signed-off-by: Zide Chen <zide.chen@intel.com>
1. Align the coding style for these MACROs
2. Align the values of fixed VECTORs
Tracked-On: #4348
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
is_polling_ioreq is more straightforward. Rename it.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Sometimes HV wants to know if there are pending interrupts of one vcpu.
Add .has_pending_intr interface in acrn_apicv_ops and return the pending
interrupts status by check IRRs of apicv.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce two kinds of events for each vcpu,
VCPU_EVENT_IOREQ: for vcpu waiting for IO request completion
VCPU_EVENT_VIRTUAL_INTERRUPT: for vcpu waiting for virtual interrupts events
vcpu can wait for such events, and resume to run when the
event get signalled.
This patch also change IO request waiting/notifying to this way.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Since we restore BAR values when writing Command Register if necessary. We don't
need to trap FLR and do the BAR restore then.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Per SDM 10.12.5.1 vol.3, local APIC should keep LAPIC state after receiving
INIT. The local APIC ID register should also be preserved.
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch abstract a vcpu_reset_internal() api for internal usage, the
function would not touch any vcpu state transition and just do vcpu reset
processing. It will be called by create_vcpu() and reset_vcpu().
The reset_vcpu() will act as a public api and should be called
only when vcpu receive INIT or vm reset/resume from S3. It should not be
called when do shutdown_vm() or hcall_sos_offline_cpu(), so the patch remove
reset_vcpu() in shutdown_vm() and hcall_sos_offline_cpu().
The patch also introduced reset_mode enum so that vcpu and vlapic could do
different context operation according to different reset mode;
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some MACROs in lapic.h are duplicated with apicreg.h, and some MACROs are
never referenced, remove them.
Tracked-On: #4268
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add severity definitions for different scenarios. The static
guest severity is defined according to guest configurations.
Also add sanity check to make sure the severity for all guests
are correct.
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
For system S5, ACRN had assumption that SOS shutdown will trigger
system shutdown. So the system shutdown logical is:
1. Trap SOS shutdown
2. Wait for all other guest shutdown
3. Shutdown system
The new logical is refined as:
If all guest is shutdown, shutdown whole system
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
ACRN hypervisor should trap guest doing PCIe FLR. Besides, it should save some status
before doing the FLR and restore them later, only BARs values for now.
This patch will trap guest Device Capabilities Register write operation if the device
supports PCI Express Capability and check whether it wants to do device FLR. If it does,
call pdev_do_flr to do the job.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We don't use INIT signal notification method now. This patch
removes them.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
There is a window where we may miss the current request in the
notification period when the work flow is as the following:
CPUx + + CPUr
| |
| +--+
| | | Handle pending req
| <--+
+--+ |
| | Set req flag |
<--+ |
+------------------>---+
| Send NMI | | Handle NMI
| <--+
| |
| |
| +--> vCPU enter
| |
+ +
So, this patch enables the NMI-window exiting to trigger the next vmexit
once there is no "virtual-NMI blocking" after vCPU enter into VMX non-root
mode. Then we can process the pending request on time.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
The NMI for notification should not be inject to guest. So,
this patch drops NMI injection request when we use NMI
to notify vCPUs. Meanwhile, ACRN doesn't support vNMI well
and there is no well-designed way to check if the NMI is
for notification or for guest now. So, we take all the NMIs as
notificaton NMI for hard rtvm temporarily. It means that the
hard rtvm will never receive NMI with this patch applied.
TODO: vNMI support is not ready yet. we will add it later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Reserved bits in a 8-bit PAT field has been checked in pat_mem_type_invalid.
Remove this redundant check "(PAT_FIELD_RSV_BITS & field) != 0UL" in
write_pat_msr.
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
This patch adds a helper function send_single_nmi. The fisrt caller
will soon come with the following patch.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
This patch installs a NMI handler in acrn IDT to handle
NMIs out of dispatch_exception.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
In current architecutre, the maximum vCPUs number per VM could not
exceed the pCPUs number. Given the MAX_PCPU_NUM macro is provided
in board configurations, so remove the MAX_VCPUS_PER_VM from Kconfig
and add a macro of MAX_VCPUS_PER_VM to reference MAX_PCPU_NUM directly.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
rename the macro since MAX_PCPU_NUM could be parsed from board file and
it is not a configurable item anymore.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
remove 'guest_init_pml4' and 'tmp_pg_array' in vm_arch
since they are not used.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
IO sensitive Round-robin scheduler aim to schedule threads with
round-robin policy. Meanwhile, we also enhance it with some fairness
configuration, such as thread will be scheduled out without properly
timeslice. IO request on thread will be handled in high priority.
This patch only add a skeleton for the sched_iorr scheduler.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
On some platforms, HPA regions for Virtual Machine can not be
contiguous because of E820 reserved type or PCI hole. In such
cases, pre-launched VMs need to be assigned non-contiguous memory
regions and this patch addresses it.
To keep things simple, current design has the following assumptions,
1. HPA2 always will be placed after HPA1
2. HPA1 and HPA2 don’t share a single ve820 entry.
(Create multiple entries if needed but not shared)
3. Only support 2 non-contiguous HPA regions (can extend
at a later point for multiple non-contiguous HPA)
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4195
Acked-by: Anthony Xu <anthony.xu@intel.com>
With cpu-sharing enabled, there are more than 1 vcpu on 1 pcpu, so the
smp_call handler should switch the vmcs to the target vcpu's vmcs. Then
get the info.
dump_vcpu_reg and dump_guest_mem should run on certain vmcs, otherwise,
there will be #GP error.
Renaming:
vcpu_dumpreg -> dump_vcpu_reg
switch_vmcs -> load_vmcs
Tracked-On: #4178
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Deterministic is important for RTVM. The mitigation for MCE on
Page Size Change converts a large page to 4KB pages runtimely during
the vmexit triggered by the instruction fetch in the large page.
These vmexits increase nondeterminacy, which should be avoided for RTVM.
This patch builds 4KB page mapping in EPT for RTVM to avoid these vmexits.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Only apply the software workaround on the models that might be
affected by MCE on page size change. For these models that are
known immune to the issue, the mitigation is turned off.
Atom processors are not afftected by the issue.
Also check the CPUID & MSR to check whether the model is immune to the issue:
CPU is not vulnerable when both CPUID.(EAX=07H,ECX=0H).EDX[29] and
IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] are 1.
Other cases not listed above, CPU may be vulnerable.
This patch also changes MACROs for MSR IA32_ARCH_CAPABILITIES bits to UL instead of U
since the MSR is 64bit.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After changing init_vmcs to smp call approach and do it before
launch_vcpu, it could work with noop scheduler. On real sharing
scheudler, it has problem.
pcpu0 pcpu1 pcpu1
vmBvcpu0 vmAvcpu1 vmBvcpu1
vmentry
init_vmcs(vmBvcpu1) vmexit->do_init_vmcs
corrupt current vmcs
vmentry fail
launch_vcpu(vmBvcpu1)
This patch mark a event flag when request vmcs init for specific vcpu. When
it is running and checking pending events, will do init_vmcs firstly.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xsave area:
legacy region: 512 bytes
xsave header: 64 bytes
extended region: < 3k bytes
So, pre-allocate 4k area for xsave. Use certain instruction to save or
restore the area according to hardware xsave feature set.
Tracked-On: #4166
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ptirq_msix_remap doesn't do the real remap, that's the vmsi_remap and vmsix_remap_entry
does. ptirq_msix_remap only did the preparation.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We add new member pci_pdev.drhd_idx associating the DRHD
(IOMMU) with this pdev, and a method to convert a pbdf of a device to
this index by searching the pdev list.
Partial patch: drhd_index initialization handled in subsequent patch.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
add shell command to support dump dump guest memory
e.g.
dump_guest_mem vm_id, gva, length
Tracked-On: #4144
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In 64-bit mode, processor pushes SS and RSP onto stack unconditionally.
Also when dumping the exception info, it makes more sense to dump
the RSP at the point of interrupt, rather than the RSP after pushing
context (including GPRs)
Tracked-On: #4102
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Issue description:
-----------------
Machine Check Error on Page Size Change
Instruction fetch may cause machine check error if page size
and memory type was changed without invalidation on some
processors[1][2]. Malicious guest kernel could trigger this issue.
This issue applies to both primary page table and extended page
tables (EPT), however the primary page table is controlled by
hypervisor only. This patch mitigates the situation in EPT.
Mitigation details:
------------------
Implement non-execute huge pages in EPT.
This patch series clears the execute permission (bit 2) in the
EPT entries for large pages. When EPT violation is triggered by
guest instruction fetch, hypervisor converts the large page to
smaller 4 KB pages and restore the execute permission, and then
re-execute the guest instruction.
The current patch turns on the mitigation by default.
The follow-up patches will conditionally turn on/off the feature
per processor model.
[1] Refer to erratum KBL002 in "7th Generation Intel Processor
Family and 8th Generation Intel Processor Family for U Quad Core
Platforms Specification Update"
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/7th-gen-core-family-spec-update.pdf
[2] Refer to erratum SKL002 in "6th Generation Intel Processor
Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
MISRA C requires specified bounds for arrays declaration, previous declaration
of platform_clos_array in board.h does not meet the requirement.
Tracked-On: #3987
Signed-off-by: Victor Sun <victor.sun@intel.com>
The DMAR info is board specific so move the structure definition to board.c.
As a configruation file, the whole board.c could be generated by acrn-config
tool for each board.
Please note we only provide DMAR info MACROs for nuc7i7dnb board. For other
boards, ACPI_PARSE_ENABLED must be set to y in Kconfig to let hypervisor parse
DMAR info, or use acrn-config tool to generate DMAR info MACROs if user won't
enable ACPI parse code for FuSa consideration.
The patch also moves the function of get_dmar_info() to vtd.c, so dmar_info.c
could be removed.
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The value of CONFIG_MAX_IOMMU and MAX_DRHDS are identical to DRHD_COUNT
which defined in platform ACPI table, so remove CONFIG_MAX_IOMMU_NUM
from Kconfig and link these three MACROs together.
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now the e820 structure store ACRN HV memory layout, not the physical memory layout.
Rename e820 to hv_hv_e820 to show this explicitly.
Tracked-On: #4007
Signed-off-by: Li Fei1 <fei1.li@intel.com>
After adding PCI BAR remap support, mmio_node may unregister when there's others
access it. This patch add a lock to protect mmio_node access.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Since guest could re-program PCI device MSI-X table BAR, we should add mmio
emulation handler unregister.
However, after add unregister_mmio_emulation_handler API, emul_mmio_regions
is no longer accurate. Just replace it with max_emul_mmio_regions which records
the max index of the emul_mmio_node.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
In theory, guest could re-program PCI BAR address to any address. However, ACRN
hypervisor only support [0, top_address_space) EPT memory mapping. So we need to
check whether the PCI BAR re-program address is within this scope.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
kick means to notify one thread_object. If the target thread object is
running, send a IPI to notify it; if the target thread object is
runnable, make reschedule on it.
Also add kick_vcpu API in vcpu layer to notify vcpu.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch decouple some scheduling logic and abstract into a scheduler.
Then we have scheduler, schedule framework. From modulization
perspective, schedule framework provides some APIs for other layers to
use, also interact with scheduler through scheduler interaces.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- The default behaviors of PIO & MMIO handlers are same
for all VMs, no need to expose dedicated APIs to register
default hanlders for SOS and prelaunched VM.
Tracked-On: #3904
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently the parameter of init_ept_mem_ops is
'struct acrn_vm *vm' for this api,change it to
'struct memory_ops *mem_ops' and 'vm_id' to avoid
the reversed dependency, page.c is hardware layer and vm structure
is its upper-layer stuff.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
To support cpu sharing, multiple vcpu can run on same pcpu. We need do
necessary vcpu context switch. This patch add below actions in context
switch.
1) fxsave/fxrstor;
2) save/restore MSRs: MSR_IA32_STAR, MSR_IA32_LSTAR,
MSR_IA32_FMASK, MSR_IA32_KERNEL_GS_BASE;
3) switch vmcs.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu sharing enabled, per_cpu vcpu cannot work properly as we might
has multiple vcpus running on one pcpu.
Add a schedule API sched_get_current to get current thread_object on
specific pcpu, also add a vcpu API get_running_vcpu to get corresponding
vcpu of the thread_object.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu sharing enabled, we will map acrn_vcpu to thread_object
in scheduling. From modulization perspective, we'd better hide the
pcpu_id in acrn_vcpu and move it to thread_object.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Two time related synthetic MSRs are implemented in this patch. Both of
them are partition wide MSR.
- HV_X64_MSR_TIME_REF_COUNT is read only and it is used to return the
partition's reference counter value in 100ns units.
- HV_X64_MSR_REFERENCE_TSC is used to set/get the reference TSC page,
a sequence number, an offset and a multiplier are defined in this
page by hypervisor and guest OS can use them to calculate the
normalized reference time since partition creation, in 100ns units.
Tracked-On: #3831
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This patch implements the minimum set of TLFS functionality. It
includes 6 vCPUID leaves and 3 vMSRs.
- 0x40000001 Hypervisor Vendor-Neutral Interface Identification
- 0x40000002 Hypervisor System Identity
- 0x40000003 Hypervisor Feature Identification
- 0x40000004 Implementation Recommendations
- 0x40000005 Hypervisor Implementation Limits
- 0x40000006 Implementation Hardware Features
- HV_X64_MSR_GUEST_OS_ID Reporting the guest OS identity
- HV_X64_MSR_HYPERCALL Establishing the hypercall interface
- HV_X64_MSR_VP_INDEX Retrieve the vCPU ID from hypervisor
Tracked-On: #3832
Signed-off-by: wenwumax <wenwux.ma@intel.com>
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
PRMRR related MSRs need to be configured by platform BIOS / bootloader.
These settings are not allowed to be changed by guest.
VMs currently have no requirement to access these MSRs even when vSGX is enabled.
So, this patch disables PRMRR related MSRs in VM.
Tracked-On: #3739
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
--remove unnecessary includes
--remove unnecssary forward-declaration for 'struct vhm_request'
Tracked-On: #861
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Add this vcpu_affinity[] for each VM to indicate the assignment policy.
With it, pcpu_bitmap is not needed, so remove it from vm_config.
Instead, vcpu_affinity is a must for each VM.
This patch also add some sanitize check of vcpu_affinity[]. Here are
some rules:
1) only one bit can be set for each vcpu_affinity of vcpu.
2) two vcpus in same VM cannot be set with same vcpu_affinity.
3) vcpu_affinity cannot be set to the pcpu which used by pre-launched VM.
v4: config SDC with CONFIG_MAX_KATA_VM_NUM
v5: config SDC with CONFIG_MAX_PCPU_NUM
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
There is plan that define each VM configuration statically in HV and let
DM just do VM creating and destroying. So DM need get vcpu_num
information when VM creating.
This patch return the vcpu_num via the API param. And also initial the
VMs' cpu_num for existing scenarios.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Sometimes we need know the number of 1 in one bitmap. This patch provide
a inline function bitmap_weight for it.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
it will panic if phys_cpu_num > CONFIG_MAX_PCPU_NUM
during init_pcpu_pre,after that no need to check it again.
Tracked-On: #861
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
TSC would be reset to 0 when enter suspend state on some platform.
This will fail the secure timer checking in secure world because
secure world leverage the TSC as source of secure timer which should
be increased monotonously.
This patch save/restore TSC in host suspend/resume path to guarantee
the mono increasing TSC.
Note: There should no timer setup before TSC resumed.
Tracked-On: #3697
Signed-off-by: Qi Yadong <yadong.qi@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now the structures(run_context & ext_context) are defined
in vcpu.h,and they are used in the lower-layer modules(wakeup.S),
this patch move down the structures from vcpu.h to cpu.h
to avoid reversed dependency.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now the structures(union source & struct intr_source) are defined
in ptdev.h,they are used in vtd.c and assign.c,
vtd is the hardware layer and ptdev is the upper-layer module
from the modularization perspective, this patch move down
these structures to avoid reversed dependency.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Using per_cpu list to record ptdev interrupts is more reasonable than
recording them per-vm. It makes dispatching such interrupts more easier
as we now do it in softirq which happens following interrupt context of
each pcpu.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
From modulization perspective, it's not suitable to put pcpu and vm
related request operations in schedule. So move them to pcpu and vm
module respectively. Also change need_offline return value to bool.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Now, we have assumption that SOS control whether the platform
should enter S5 or not. So when SOS tries enter S5, we just
forward the S5 request to native port which make sure platform
S5 is totally aligned with SOS S5.
With higher serverity guest introduced,this assumption is not
true any more. We need to extend the platform S5 process to
handle higher severity guest:
- For DM launched RTVM, we need to make sure these guests
is off before put the whole platfrom to S5.
- For pre-launched VM, there are two cases:
* if os running in it support S5, we wait for guests off.
* if os running in it doesn't support S5, we expect it
will invoke one hypercall to notify HV to shutdown it.
NOTE: this case is not supported yet. Will add it in the
future.
Tracked-On: #3564
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
pcpu_active_bitmap was read continuously in wait_pcpus_offline(),
acrn_vcpu->running was read continuously in pause_vcpu(),
add volatile keyword to ensure that such accesses are not
optimised away by the complier.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
change the input parameter from vcpu to eptp in order to let this api
more generic, no need to care normal world or secure world.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Currently, the clos id of the cpu cores in vmx root mode is the same as non-root mode.
For RTVM, if hypervisor share the same clos id with non-root mode, the cacheline may
be polluted due to the hypervisor code execution when vmexit.
The patch adds hv_clos in vm_configurations.c
Hypervisor initializes clos setting according to hv_clos during physical cpu cores initialization.
For RTVM, MSR auto load/store areas are used to switch different settings for VMX root/non-root
mode for RTVM.
Tracked-On: #2462
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
-- remove some unnecessary includes
-- fix a typo
-- remove unnecessary void before launch_vms
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
ACRN HV is designed/implemented with "invariant TSC" capability, which wasn't checked at boot time.
This commit adds the "invairant TSC" detection, ACRN fails to boot if there wasn't "invariant TSC" capability.
Tracked-On: #3636
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now that ACPI is enabled for pre-launched VMs, we can remove all mptable code.
Tracked-On: #3601
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Statically define the per vm RSDP/XSDT/MADT ACPI template tables in vacpi.c,
RSDP/XSDT tables are copied to guest physical memory after checksum is
calculated. For MADT table, first fix up process id/lapic id in its lapic
subtable, then the MADT table's checksum is calculated before it is copies to
guest physical memory.
Add 8-bit checksum function in util.h
Tracked-On: #3601
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
EPT tables are shared by MMU and IOMMU.
Some IOMMUs don't support page-walk coherency, the cpu cache of EPT entires
should be flushed to memory after modifications, so that the modifications
are visible to the IOMMUs.
This patch adds a new interface to flush the cache of modified EPT entires.
There are different implementations for EPT/PPT entries:
- For PPT, there is no need to flush the cpu cache after update.
- For EPT, need to call iommu_flush_cache to make the modifications visible
to IOMMUs.
Tracked-On: #3607
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
VT-d shares the EPT tables as the second level translation tables.
For the IOMMUs that don't support page-walk coherecy, cpu cache should
be flushed for the IOMMU EPT entries that are modified.
For the current implementation, EPT tables for translating from GPA to HPA
for EPT/IOMMU are not modified after VM is created, so cpu cache invlidation is
done once per VM before starting execution of VM.
However, this may be changed, runtime EPT modification is possible.
When cpu cache of EPT entries is invalidated when modification, there is no need
invalidate cpu cache globally per VM.
This patch exports iommu_flush_cache for EPT entry cache invlidation operations.
- IOMMUs share the same copy of EPT table, cpu cache should be flushed if any of
the IOMMU active doesn't support page-walk coherency.
- In the context of ACRN, GPA to HPA mapping relationship is not changed after
VM created, skip flushing iotlb to avoid potential performance penalty.
Tracked-On: #3607
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
-- move 'RFLAGS_AC' to cpu.h
-- move 'VMX_SUPPORT_UNRESTRICTED_GUEST' to msr.h
and rename it to 'MSR_IA32_MISC_UNRESTRICTED_GUEST'
-- move 'get_vcpu_mode' to vcpu.h
-- remove deadcode 'vmx_eoi_exit()'
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
move some data structures and APIs related host reset
from vm_reset.c to pm.c, these are not related with guest.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Now, we use native gdt saved in boot context for guest and assume
it could be put to same address of guest. But it may not be true
after the pre-launched VM is introduced. The gdt for guest could
be overwritten by guest images.
This patch make 32bit protect mode boot not use saved boot context.
Insteadly, we use predefined vcpu_regs value for protect guest to
initialize the guest bsp registers and copy pre-defined gdt table
to a safe place of guest memory to avoid gdt table overwritten by
guest images.
Tracked-On: #3532
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
fix violations touched below:
1.Cast operation on a constant value
2.signed/unsigned implicity conversion
3.return value unused.
V1->V2:
1.bitmap api will return boolean type, not need to check "!= 0", deleted.
2.The behaves ~(uint32_t)X and (uint32_t)~X are not defined in ACRN hypervisor Coding Guidelines,
removed the change of it.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch moves vmx_rdmsr_pat/vmx_wrmsr_pat from vmcs.c to vmsr.c,
so that these two functions would become internal functions inside
vmsr.c.
This approach improves the modularity.
v1 -> v2:
* remove 'vmx_rdmsr_pat'
* rename 'vmx_wrmsr_pat' with 'write_pat_msr'
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add a field (vdev_ops) in struct acrn_vm_pci_dev_config to configure a PCI CFG
operation for an emulated PCI device. Use pci_pt_dev_ops for PCI_DEV_TYPE_PTDEV
by default if there's no such configure.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add emulated PCI device configure for SOS to prepare for add support for customizing
special pci operations for each emulated PCI device.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Align SOS pci device configure with pre-launched VM and filter pre-launched VM's
PCI PT device from SOS pci device configure.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
pci_dev_config in VM configure stores all the PCI devices for a VM. Besides PT
devices, there're other type devices, like virtual host bridge. So rename ptdev
to pci_dev for these configure.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In some case, guest need to get more information under virtual environment,
like guest capabilities. Basically this could be done by hypercalls, but
hypercalls are designed for trusted VM/SOS VM, We need a machenism to report
these information for normal VMs. In this patch, vCPUID leaf 0x40000001 will
be used to satisfy this needs that report some extended information for guest
by CPUID.
Tracked-On: #3498
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently the Px Cx supported SoCs which listed in cpu_state_tbl.c is limited,
and it is not a wise option to build a huge state table data base to support
Px/Cx for other SoCs. This patch give a alternative solution that build a board
specific cpu state table in board.c which could be auto-generated by offline
tool, then the CPU Px/Cx of customer board could be enabled;
Hypervisor will search the cpu state table in cpu_state_tbl[] first, if not
found then go check board_cpu_state_tbl. If no matched cpu state table is found
then Px/Cx will not be supported;
Tracked-On: #3477
Signed-off-by: Victor Sun <victor.sun@intel.com>
After "commit f0e1c5e init vcpu host stack when reset vcpu", SOS resume form S3
wants to schedule to vcpu_thread not the point where SOS enter S3. So we should
schedule to idel first then reschedule to execute vcpu_thread.
Tracked-On: #3387
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Based on SDM Vol2 the monitor uses the RAX register to setup the address
monitored by HW. The mwait uses the rax/rcx as the hints that the process
will enter. It is incorrect that the same value is used for monitor/mwait.
The ecx in mwait specifies the optional externsions.
At the same time it needs to check whether the the value of monitored addr
is already expected before entering mwait. Otherwise it will have possible
lockup.
V1->V2: Add the asm wrappper of monitor/mwait to avoid the mixed usage of
inline assembly in wait_sync_change
v2-v3: Remove the unnecessary line break in asm_monitor/asm_mwait.
Follow Fei's comment to remove the mwait ecx hint setting that
treats the interrupt as break event. It only needs to check whether the
value of psync_change is already expected.
Tracked-On: #3442
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
When need hpa and hva translation before init_paging, we need hpa2hva_early and
hva2hpa_early since init_paging may modify hva2hpa to not be identical mapping.
Tracked-On: #2987
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
softirq shouldn't be bounded to vcpu thread. One issue for this
is shell (based on timer) can't work if we don't start any guest.
This change also is trying best to make softirq handler running
with irq enabled.
Also update the irq disable/enabel in vmexit handler to align
with the usage in vcpu_thread.
Tracked-On: #3387
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This commit adds ops to vlapic structure, and add an *ops parameter to vlapic_reset().
At vlapic reset, the ops is set to the global apicv_ops, and may be assigned
to other ops later.
Tracked-On: #3227
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When initialize secondary pcpu, pass INVALID_CPU_ID as param of init_pcpu_pre()
looks weird, so change the param type to bool to represent whether the pcpu is
a BSP or AP.
Tracked-On: #3420
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In x86 architecture, word/doubleword/quadword aligned read/write on its boundary
is atomic, so we may remove atomic load/store.
As for atomic set/clear, use bitmap_set/claer seems more reasonable. After replace
them all, we could remove them too.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
vCPU schedule state change is under schedule lock protection. So there's no need
to be atomic.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN Coding guidelines requires two different types pointer can't
convert to each other, except void *.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN Coding guidelines requires two different types pointer can't
convert to each other, except void *.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
ACRN Coding guidelines requires type conversion shall be explicity.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In almost case, vLAPIC will only be accessed by the related vCPU. There's no
synchronization issue in this case. However, other vCPUs could deliver interrupts
to the current vCPU, in this case, the IRR (for APICv base situation) or PIR
(for APICv advanced situation) and TMR for both cases could be accessed by more
than one vCPUS simultaneously. So operations on IRR or PIR should be atomical
and visible to other vCPUs immediately. In another case, vLAPIC could be accessed
by another vCPU when create vCPU or reset vCPU which could be supposed to be
consequently.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
In spite of vhm_req status could be updated in HV and DM on different CPUs, they
only change vhm_req status when they detect vhm_req status has been updated by
each other. So vhm_req status will not been misconfigured. However, before HV
sets vhm_req status to REQ_STATE_PENDING, vhm_req buffer filling should be visible
to DM. Add a write memory barrier to guarantee this.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Hypervisor exposes mitigation technique for Speculative
Store Bypass(SSB) to guests and allows a guest to determine
whether to enable SSBD mitigation by providing direct guest
access to IA32_SPEC_CTRL.
Before that, hypervisor should check the SSB mitigation support
on underlying processor, this patch is to add this capability check.
Tracked-On: #3385
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
cancel_event_injection is not need any more if we do 'scheudle' prior to
acrn_handle_pending_request. Commit "921288a6672: hv: fix interrupt
lost when do acrn_handle_pending_request twice" bring 'schedule'
forward, so remove cancel_event_injection related stuff.
Tracked-On: #3374
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ACRN Coding guidelines requires no dead code.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is available
in various CPU internal buffers.
1. Mitigation on ACRN:
1) Microcode update is required.
2) Clear CPU internal buffers (store buffer, load buffer and
load port) if current CPU is affected by MDS, when VM entry
to avoid any information leakage to guest thru above buffers.
3) Mitigation is not needed if ARCH_CAP_MDS_NO bit (bit5)
is set in IA32_ARCH_CAPABILITIES MSR (10AH), in this case,
current processor is no affected by MDS vulnerability, in other
cases mitigation for MDS is required.
2. Methods to clear CPU buffers (microcode update is required):
1) L1D cache flush
2) VERW instruction
Either of above operations will trigger clearing all
CPU internal buffers if this CPU is affected by MDS.
Above mechnism is enumerated by:
CPUID.(EAX=7H, ECX=0):EDX[MD_CLEAR=10].
3. Mitigation details on ACRN:
if (processor is affected by MDS)
if (processor is not affected by L1TF OR
L1D flush is not launched on VM Entry)
execute VERW instruction when VM entry.
endif
endif
4. Referrence:
Deep Dive: Intel Analysis of Microarchitectural Data Sampling
https://software.intel.com/security-software-guidance/insights/
deep-dive-intel-analysis-microarchitectural-data-sampling
Deep Dive: CPUID Enumeration and Architectural MSRs
https://software.intel.com/security-software-guidance/insights/
deep-dive-cpuid-enumeration-and-architectural-msrs
Tracked-On: #3317
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
ACRN hypervisor always print CPU microcode update
warning message on KBL NUC platform, even after
BIOS was updated to the latest.
'check_cpu_security_cap()' returns false if
no ARCH_CAPABILITIES MSR support on current platform,
but this MSR may not be available on some platforms.
This patch is to remove this pre-condition.
Tracked-On: #3317
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
The current implement will trigger shutdown vm request on the BSP VCPU on the VM,
not the VCPU will trap out because triple fault. However, if the BSP VCPU on the VM
is handling another IO emulation, it may overwrite the triple fault IO request on
the vhm_request_buffer in function acrn_insert_request. The atomic operation of
get_vhm_req_state can't guarantee the vhm_request_buffer will not access by another
IO request if it is not running on the corresponding VCPU. So it should trigger
triple fault shutdown VM IO request on the VCPU which trap out because of triple
fault exception.
Besides, rt_vm_pm1a_io_write will do the right thing which we shouldn't do it in
triple_fault_shutdown_vm.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
According to SDM, bit N (physical address width) to bit 63 should be masked when calculate
host page frame number.
Currently, hypervisor doesn't set any of these bits, so gpa2hpa can work as expectd.
However, any of these bit set, gpa2hpa return wrong value.
Hypervisor never sets bit N to bit 51 (reserved bits), for simplicity, just mask bit 52 to bit 63.
Tracked-On: #3352
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Define/Use variable in place of code to improve readability:
Define new local variable struct pci_bar *vbar, and use vbar-> in place of vdev->bar[idx].
Define new local variable uint64_t vbar_base in init_vdev_pt
Rename uint64_t vbar[PCI_BAR_COUNT] of struct acrn_vm_pci_ptdev_config to uint64_t vbar_base[PCI_BAR_COUNT]
Tracked-On: #3241
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu is never scan because of scan tool will be crashed!
After modulization, the vcpu can be scaned by the scan tool.
Clean up the violations in vcpu.c.
Fix the violations:
1.No brackets to then/else.
2.Function return value not checked.
3.Signed/unsigned coversion without cast.
V1->V2:
change the type of "vcpu->arch.irq_window_enabled" to bool.
V2->V3:
add "void *" prefix on the 1st parameter of memset.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The current implement will cache each ISR vector in ISR vector stack and do
ISR vector stack check when updating PPR. However, there is no need to do this
because:
1) We will not touch vlapic->isrvec_stk[0] except doing vlapic_reset:
So we don't need to do vlapic->isrvec_stk[0] check.
2) We only deliver higher priority interrupt from IRR to ISR:
So we don't need to check whether vlapic->isrvec_stk interrupts is always increasing.
3) There're only 15 different priority interrupt, It will not happened that more that
15 interrupts could been delivered to ISR:
So we don't need to check whether vlapic->isrvec_stk_top will larger than
ISRVEC_STK_SIZE which is 16.
This patch try to remove ISR vector stack and use isrv to cache the vector number for
the highest priority bit that is set in the ISR.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Remove couple of run-time ASSERTs in ioapic module by checking for the
number of interrupt pins per IO-APICs against the configured MAX_IOAPIC_LINES
in the initialization flow.
Also remove the need for two MACROs specifying the max. number of
interrupt lines per IO-APIC and add a config item MAX_IOAPIC_LINES for the
same.
Tracked-On: #3299
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The has_rt_vm walk through all VMs to check RT VM flag and if
there is no any RT VM, then return false otherwise return true.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
The ept_flush_leaf_page API is used to flush address space
from a ept page entry, user can use it to match walk_ept_mr to
flush VM address space.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The walk_ept_table API is used to walk through EPT table for getting
all of present pages, user can get each page entry and its size
from the walk_ept_table callback.
The get_ept_entry is used to getting EPT pointer of the vm, if current
context of mv is secure world, return secure world EPT pointer, otherwise
return normal world EPT pointer.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
flush_address_space is used to flush address space by clflushopt instruction.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
CLFLUSHOPT is used to invalidate from every level of the cache hierarchy
in the cache coherence domain the cache line that contains the linear
address specified with memory operand. If that cache line contains
modified date at any level of the cache hierarchy, that data is written
back to memory.
If the platform does not support CLFLUSHOPT instruction, boot will fail.
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The current implement doesn't clear which access type we support for
APIC-Access VM Exit:
1) linear access for an instruction fetch
-- APIC-access page is mapped as UC which doesn't support fetch
2) linear access (read or write) during event delivery
-- Which is not happened in normal case except the guest went wrong, such as,
set the IDT table in APIC-access page. In this case, we don't need to support.
3) guest-physical access during event delivery;
guest-physical access for an instruction fetch or during instruction execution
-- Do we plan to support enable APIC in real mode ? I don't think so.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This patch introduces check_vm_vlapic_state API instead of is_lapic_pt_enabled
to check if all the vCPUs of a VM are using x2APIC mode and LAPIC
pass-through is enabled on all of them.
When the VM is in VM_VLAPIC_TRANSITION or VM_VLAPIC_DISABLED state,
following conditions apply.
1) For pass-thru MSI interrupts, interrupt source is not programmed.
2) For DM emulated device MSI interrupts, interrupt is not delivered.
3) For IPIs, it will work only if the sender and destination are both in x2APIC mode.
Tracked-On: #3253
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch introduces vLAPIC state for a VM. The VM vLAPIC state can
be one of the following
* VM_VLAPIC_X2APIC - All the vCPUs/vLAPICs (Except for those in Disabled mode) of this
VM use x2APIC mode
* VM_VLAPIC_XAPIC - All the vCPUs/vLAPICs (Except for those in Disabled mode) of this
VM use xAPIC mode
* VM_VLAPIC_DISABLED - All the vCPUs/vLAPICs of this VM are in Disabled mode
* VM_VLAPIC_TRANSITION - Some of the vCPUs/vLAPICs of this VM (Except for those in Disabled mode)
are in xAPIC and the others in x2APIC
Upon a vCPU updating the IA32_APIC_BASE MSR to switch LAPIC mode, this
API is called to sync the vLAPIC state of the VM. Upon VM creation and reset,
vLAPIC state is set to VM_VLAPIC_XAPIC, as ACRN starts the vCPUs vLAPIC in
XAPIC mode.
Tracked-On: #3253
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
is_xapic_enabled API returns true if vLAPIC is in xAPIC mode. In
all other cases, it returns false.
Tracked-On: #3253
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Remove static and inline attributes to the API is_x2apic_enabled
and declare a prototype in vlapic.h. Also fix the check performed on guest
APICBASE_MSR value to query vLAPIC mode.
Tracked-On: #3253
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
On SDC scenario, SOS VM id is fixed to 0 so some hypercalls from guest
are using hardcoded "0" to represent SOS VM, this would bring issues
for HYBRID scenario which SOS VM id is non-zero.
Now introducing a new VM id concept for DM/VHM hypercall APIs, that
return a relative VM id which is from SOS view when create VM for post-
launched VMs. DM/VHM could always treat their own vm id is "0". When they
make hypercalls, hypervisor will convert the VM id to the absolute id
when dispatch the hypercalls.
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
According to SDM vol1 13.3:
Write 1 to reserved bit of XCR0 will trigger GP.
This patch make ACRN behavior align with SDM definition.
Tracked-On: #3239
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The VM IDs which is high or equal then CONFIG_MAX_VM_NUM are all
invalid VM IDs, the MACRO has never been referenced in code, so
remove it;
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The vcpu num could be calculated based on pcpu_bitmap when prepare_vcpu()
is done, so remove this redundant configuration item;
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Zephyr kernel is stripped ram image, its entry and load address
are explicitly defined in vm configurations, hypervisor will load
Zephyr directly based on these configurations.
Currently we only support boot Zephyr from protected mode.
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Previously multiboot mods[0] is designed for kernel module for all
pre-launched VMs including SOS VM, and mods[0].mm_string is used
to store kernel cmdline. This design could not satisfy the requirement
of hybrid mode scenarios that each VM might use their own kernel image
also ramdisk image. To resolve this problem, we will use a tag in
mods mm_string field to specify the module type. If the tag could
be matched with os_config of VM configurations, the corresponding
module would be loaded;
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Different kernel has different load method, it should be configurable
in vm configurations;
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The guest OS of ACRN will not be limited to Linux, so refine the struct
of sw_linux to more generic sw_module_info. Currently bootargs and ramdisk
are only supported modules but we can include more modules in future;
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Present SGX related MSRs to guest if SGX is supported.
- MSR_IA32_SGXLEPUBKEYHASH0 ~ MSR_IA32_SGXLEPUBKEYHASH3:
SGX Launch Control is not supported, so these MSRs are read only.
- MSR_IA32_SGX_SVN_STATUS:
read only
- MSR_IA32_FEATURE_CONTROL:
If SGX is support in VM, opt-in SGX in this MSR.
- MSR_SGXOWNEREPOCH0 ~ MSR_SGXOWNEREPOCH1:
The two MSRs' scope is package level, not allow guest to change them.
Still leave them in unsupported_msrs array.
Tracked-On: #3179
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If sgx is supported in guest, present SGX capabilities to guest.
There will be only one EPC section presented to guest, even if EPC
memory for a guest is from muiltiple physcial EPC sections.
Tracked-On: #3179
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add EPC information in vm configuration structure.
EPC information contains the EPC base and size allocated to a VM.
Tracked-On: #3179
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Get the platform EPC resource and partiton the EPC resource for VMs
according to VM configurations.
Don't support sgx capability in SOS VM.
init_sgx is called during platform bsp initialization.
If init_sgx() fails, consider it as configuration error, panic the system.
init_sgx() fails if one of the following happens when at least one VM requests
EPC resource if no enough EPC resource for all VMs.
No further check if sgx is not supported by platform or not opted-in in BIOS,
just disable SGX support for VMs.
Tracked-On: #3179
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
VM Name length is restricted to 32 characters. kata creates
a VM name with GUID added as a part of VM name making it around
80 characters. So increasing this size to 128.
v1->v2:
It turns out that MAX_VM_OS_NAME_LEN usage in DM and HV are for
different use cases. So removing the macro from acrn_common.h.
Definied macro MAX_VMNAME_LEN for DM purposes in dm.h. Retaining
original macron name MAX_VM_OS_NAME_LEN for HV purposes but defined
in vm_config.h.
Tracked-On: #3138
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Define a static mptable array and each VM could index its vmptable by
vm id, then mptable is not needed in vm configurations;
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
At the time the guest is switching to X2APIC mode, different VCPUs in the
same VM could expect the setting of the per VM msr_bitmap differently,
which could cause problems.
Considering different approaches to address this issue:
1) only guest BSP can update msr_bitmap when switching to X2APIC.
2) carefully re-write the update_msr_bitmap_x2apic_xxx() functions to
make sure any bit in the bitmap won't be toggled by the VCPUs.
3) make msr_bitmap as per VCPU.
We chose option 3) because it's simple and clean, though it takes more
memory than other options.
BTW, need to remove const modifier from update_msr_bitmap_x2apic_xxx()
functions to get it compiled.
Tracked-On: #3166
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
- add the GUEST_FLAG_HIGHEST_SEVERITY flag to indicate that the guest has
privilege to reboot the host system.
- this flag is statically assigned to guest(s) in vm_configurations.c in
different scenarios.
- implement reset_host() function to reset the host. First try the ACPI
reset register if available, then try the 0xcf9 PIO.
Tracked-On: #3145
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN supports LAPIC emulation for guests using x86 APICv. When guest OS/BIOS
switches from xAPIC to x2APIC mode of operation, ACRN also supports switching
froom LAPIC emulation to LAPIC passthrough to guest. User/developer needs to
configure GUEST_FLAG_LAPIC_PASSTHROUGH for guest_flags in the corresponding
VM's config for ACRN to enable LAPIC passthrough.
This patch does the following
1)Fixes a bug in the abovementioned feature. For a guest that is
configured with GUEST_FLAG_LAPIC_PASSTHROUGH, during the time period guest is
using xAPIC mode of LAPIC, virtual interrupts are not delivered. This can be
manifested as guest hang when it does not receive virtual timer interrupts.
2)ACRN exposes physical topology via CPUID leaf 0xb to LAPIC PT VMs. This patch
removes that condition and exposes virtual topology via CPUID leaf 0xb.
Tracked-On: #3136
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
This patch removes hard-coded interfaces in .rst files and
refers to the definition via doxygen style comments.
This patch mainly focus on Hypervisor part.
Other parts will be covered in seperate patches.
Tracked-On: #1595
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
the pcpu just write its own vmcs, not need spinlock.
and the arch.lock not used other places, remove it too.
Tracked-On: #3130
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For pre-launched VMs, currently we set all vbars to 0s initially in
bar emulation code, guest OS will reprogram the bars when it sees the bars are uninited (0s).
We consider this is not the right solution, change to populate the
vbars (to non zero valid pci hole address) based on the vbar base
addresses predefined in vm_config.
Store a pointer to acrn_vm_pci_ptdev_config in struct pci_vdev
Tracked-On: #3022
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
now only SOS need decide boot with de-privilege or direct boot mode, while
for other pre-launched VMs, they should use direct boot mode.
this patch merge boot/guest/direct_boot_info.c &
boot/guest/deprivilege_boot_info.c into boot/guest/vboot_info.c,
and change init_direct_vboot_info() function name to init_general_vm_boot_info().
in init_vm_boot_info(), depend on get_sos_boot_mode(), SOS may choose to init
vm boot info by setting the vm_sw_loader to deprivilege specific one; for SOS
using DIRECT_BOOT_MODE and all other VMS, they will use general_sw_loader as
vm_sw_loader and go through init_general_vm_boot_info() for virtual boot vm
info filling.
this patch also move spurious handler initilization for de-privilege mode from
boot/guest/deprivilege_boot.c to boot/guest/vboot_info.c, and just set it in
deprivilege sw_loader before irq enabling.
Changes to be committed:
modified: Makefile
modified: arch/x86/guest/vm.c
modified: boot/guest/deprivilege_boot.c
deleted: boot/guest/deprivilege_boot_info.c
modified: boot/guest/direct_boot.c
renamed: boot/guest/direct_boot_info.c -> boot/guest/vboot_info.c
modified: boot/guest/vboot_wrapper.c
modified: boot/include/guest/deprivilege_boot.h
modified: boot/include/guest/direct_boot.h
modified: boot/include/guest/vboot.h
new file: boot/include/guest/vboot_info.h
modified: common/vm_load.c
modified: include/arch/x86/guest/vm.h
Tracked-On: #1842
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
remove some unnecessary includes,
some can cause reverse dependency.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
modified: acpi_parser/acpi_ext.c
modified: acpi_parser/dmar_parse.c
modified: boot/acpi_base.c
modified: boot/guest/direct_boot_info.c
modified: include/arch/x86/per_cpu.h
Handle the PIO reset register that is defined in host ACPI:
Parse host FADT table to get the host reset register info, and emulate
it for Service OS:
- return all '1' for guest reads because the read behavior is not defined
in ACPI.
- ignore guest writes with the reset value to stop it from resetting host;
if guest writes other values, passthru it to hardware in case the reset
register supports other functionalities.
Tracked-On: #2700
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch implements triple fault vmexit handler and base on VM types:
- post-launched VMs: shutdown_target_vm() injects S5 PIO write to request
DM to shut down the target VM.
- pre-launched VMs: shut down the guest.
- SOS: similarly, but shut down all the non real-time post-launched VMs that
depend to SOS before shutting down SOS.
Tracked-On: #2700
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- post-launched RTVM: intercept both PIO ports so that hypervisor has a
chance to set VM_POWERING_OFF flag.
- all other type of VMs: deny these 2 ports from guest access so that
guests are not able to reset host.
Tracked-On: #2700
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For pre-launched VMs and SOS, VM shutdown should not be executed in the
current VM context.
- implement NEED_SHUTDOWN_VM request so that the BSP of the target VM can shut
down the guest in idle thread.
- implement shutdown_vm_from_idle() to shut down target VM.
Tracked-On: #2700
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously we use Kconfig of DMAR_PARSE_ENABLED to choose pre-defined DMAR info
or parse it at runtime, at the same time we use MACRO of CONFIG_CONSTANT_ACPI
to decide whether parse PM related ACPI info at runtime. This looks redundant
so use a unified ACPI_PARSE_ENABLED Kconfig to replace them.
Tracked-On: #3107
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
now this MACRO is used in atomic.h and bits.h,
move it from cpu.h to atomic.h to avoid
reverse dependency(i.e. from lower layer to upper one)
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
This patch removes the dynamic memory allocation in dmar_parse.c.
v1 -> v2:
- rename 'const_dmar.c' to 'dmar_info.c' and move it to
'boot' directory
- add CONFIG_DMAR_PARSE_ENABLED check for function declaration
Tracked-On: #861
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Add MSR_IA32_MISC_ENABLE to emulated_guest_msrs to enable the emulation.
Init MSR_IA32_MISC_ENABLE for guest.
Tracked-On: #2834
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According to SDM Vol4 2.1, modify vcpuid according to msr ia32_misc_enable:
- Clear CPUID.01H: ECX[3] if guest disabled monitor/mwait.
- Clear CPUID.80000001H: EDX[20] if guest set XD Bit Disable.
- Limit the CPUID leave maximum value to 2 if guest set Limit CPUID MAXVal.
Tracked-On: #2834
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Guest MSR_IA32_MISC_ENABLE read simply returns the value set by guest.
Guest MSR_IA32_MISC_ENABLE write:
- Clear EFER.NXE if MSR_IA32_MISC_ENABLE_XD_DISABLE set.
- MSR_IA32_MISC_ENABLE_MONITOR_ENA:
Allow guest to control this feature when HV doesn't use this feature and hw has no bug.
vcpuid update according to the change of the msr will be covered in following patch.
Tracked-On: #2834
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Merge two parts of different definitions for MSR_IA32_MISC_ENABLE fields.
- use the prefix "MSR_IA32_" to align with others
- Change MSR_IA32_MISC_ENABLE_XD to MSR_IA32_MISC_ENABLE_XD_DISABLE to
align the meaning of the filed since it is "XD bit disable"
Use UL instead of U as the filed bit mask because MSR_IA32_MISC_ENABLE is 64-bit.
Tracked-On: #2834
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
currently, ACRN hypervisor can either boot from sbl/abl or uefi, that's
why we have different firmware method under bsp & boot dirs.
but the fact is that we actually have two different operations based on
different guest boot mode:
1. de-privilege-boot: ACRN hypervisor will boot VM0 in the same context as
native(before entering hypervisor) - it means hypervisor will co-work with
ACRN UEFI bootloader, restore the context env and de-privilege this env
to VM0 guest.
2. direct-boot: ACRN hypervisor will directly boot different pre-launched
VM(including SOS), it will setup guest env by pre-defined configuration,
and prepare guest kernel image, ramdisk which fetch from multiboot modules.
this patch is trying to:
- rename files related with firmware, change them to guest vboot related
- restruct all guest boot stuff in boot & bsp dirs into a new boot/guest dir
- use de-privilege & direct boot to distinguish two different boot operations
this patch is pure file movement, the rename of functions based on old assumption will
be in the following patch.
Changes to be committed:
modified: ../efi-stub/Makefile
modified: ../efi-stub/boot.c
modified: Makefile
modified: arch/x86/cpu.c
modified: arch/x86/guest/vm.c
modified: arch/x86/init.c
modified: arch/x86/irq.c
modified: arch/x86/trampoline.c
modified: boot/acpi.c
renamed: bsp/cmdline.c -> boot/cmdline.c
renamed: bsp/firmware_uefi.c -> boot/guest/deprivilege_boot.c
renamed: boot/uefi/uefi_boot.c -> boot/guest/deprivilege_boot_info.c
renamed: bsp/firmware_sbl.c -> boot/guest/direct_boot.c
renamed: boot/sbl/multiboot.c -> boot/guest/direct_boot_info.c
renamed: bsp/firmware_wrapper.c -> boot/guest/vboot_wrapper.c
modified: boot/include/acpi.h
renamed: bsp/include/firmware_uefi.h -> boot/include/guest/deprivilege_boot.h
renamed: bsp/include/firmware_sbl.h -> boot/include/guest/direct_boot.h
renamed: bsp/include/firmware.h -> boot/include/guest/vboot.h
modified: include/arch/x86/multiboot.h
Tracked-On: #1842
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Replace the vm state VM_STATE_INVALID to VM_POWERED_OFF.
Also replace is_valid_vm() with is_poweroff_vm().
Add API is_created_vm() to identify VM created state.
Tracked-On: #3082
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>