We could use container_of to get vcpu structure pointer from vmtrr. So vcpu
structure pointer is no need in vmtrr structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We could use container_of to get vm structure pointer from vpic. So vm
structure pointer is no need in vpic structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We could use container_of to get vm structure pointer from vpci. So vm
structure pointer is no need in vpci structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We could use container_of to get vcpu/vm structure pointer from vlapic. So vcpu/vm
structure pointer is no need in vlapic structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This function casts a member of a structure out to the containing structure.
So rename to container_of is more readable.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Exend union dmar_ir_entry to support VT-d posted interrupts.
Rename some fields of union dmar_ir_entry:
entry --> value
sw_bits --> avail
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Pass intr_src and dmar_ir_entry irte as pointers to dmar_assign_irte(),
which fixes the "Attempt to change parameter passed by value" MISRA C violation.
A few coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
For CPU side posted interrupts, it only uses bit 0 (ON) of the PI's 64-bit control
, other bits are don't care. This is not the case for VT-d posted
interrupts, define more bit fields for the PI's 64-bit control.
Use bitmap functions to manipulate the bit fields atomically.
Some MISRA-C violation and coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The posted interrupt descriptor is more of a vmx/vmcs concept than a vlapic
concept. struct acrn_vcpu_arch stores the vmx/vmcs info, so put struct pi_desc
in struct acrn_vcpu_arch.
Remove the function apicv_get_pir_desc_paddr()
A few coding style/typo fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Rename struct vlapic_pir_desc to pi_desc
Rename struct member and local variable pir_desc to pid
pir=posted interrupt request, pi=posted interrupt
pid=posted interrupt descriptor
pir is part of pi descriptor, so it is better to use pi instead of pir
struct pi_desc will be moved to vmx.h in subsequent commit.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The cupid() can be replaced with cupid_subleaf, which is more clear.
Having both APIs makes reading difficult.
Tracked-On: #4526
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For SOS VM, when the target platform has multiple IO-APICs, there
should be equal number of virtual IO-APICs.
This patch adds support for emulating multiple vIOAPICs per VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
MADT is used to specify the GSI base for each IO-APIC and the number of
interrupt pins per IO-APIC is programmed into Max. Redir. Entry register of
that IO-APIC.
On platforms with multiple IO-APICs, there can be holes in the GSI space.
For example, on a platform with 2 IO-APICs, the following configuration has
a hole (from 24 to 31) in the GSI space.
IO-APIC 1: GSI base - 0, number of pins - 24
IO-APIC 2: GSI base - 32, number of pins - 8
This patch also adjusts the size for variables used to represent the total
number of IO-APICs on the system from uint16_t to uint8_t as the ACPI MADT
uses only 8-bits to indicate the unique IO-APIC IDs.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
As ACRN prepares to support platforms with multiple IO-APICs,
GSI is a better way to represent physical and virtual INTx interrupt
source.
1) This patch replaces usage of "pin" with "gsi" whereever applicable
across the modules.
2) PIC pin to gsi is trickier and needs to consider the usage of
"Interrupt Source Override" structure in ACPI for the corresponding VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Changes the mmio handler data from that of the acrn_vm struct to
the acrn_vioapic.
Add nr_pins and base_addr to the acrn_vioapic data structure.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Reverts 538ba08c: hv:Add vpin to ptdev entry mapping for vpic/vioapic
ACRN uses an array of size per VM to store ptirq entries against the vIOAPIC pin
and an array of size per VM to store ptirq entries against the vPIC pin.
This is done to speed up "ptirq entry" lookup at runtime for Level triggered
interrupts in API ptirq_intx_ack used on EOI.
This patch switches the lookup API for INTx interrupts to the API,
ptirq_lookup_entry_by_sid
This could add delay to processing EOI for Level triggered interrupts.
Trade-off here is space saved for array/s of size CONFIG_MAX_IOAPIC_LINES with 8 bytes
per data. On a server platform, ACRN needs to emulate multiple vIOAPICs for
SOS VM, same as the number of physical IO-APICs. Thereby ACRN would need around
10 such arrays per VM.
Removes the need of "pic_pin" except for the APIs facing the hypercalls
hcall_set_ptdev_intr_info, hcall_reset_ptdev_intr_info
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
There're some PCI devices need special handler for vendor-specical feature or
capability CFG access. The Intel GPU is one of them. In order to keep the ACRN-HV
clean, we want to throw the qurik part of PCI CFG asccess to DM to handle.
To achieve this, we implement per-device policy base on whether it needs quirk handler
for a VM: each device could configure as "quirk pass through device" or not. For a
"quirk pass through device", we will handle the general part in HV and the quirk part
in DM. For a non "quirk pass through device", we will handle all the part in HV.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
There're some cases the SOS (higher severity guest) needs to access the
post-launched VM (lower severity guest) PCI CFG space:
1. The SR-IOV PF needs to reset the VF
2. Some pass through device still need DM to handle some quirk.
In the case a device is assigned to a UOS and is not in a zombie state, the SOS
is able to access, if and only if the SOS has higher severity than the UOS.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For a pre-launched VM, a region from PTDEV_HI_MMIO_START is used to store
64bit vBARs of PT devices which address is high than 4G. The region should
be located after all user memory space and be coverd by guest EPT address.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ve820.c is a common file in arch/x86/guest/ now, so move function of
create_sos_vm_e820() to this file to make code structure clear;
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hypervisor/arch/x86/configs/$(BOARD)/ve820.c is used to store pre-launched
VM specific e820 entries according to memory configuration of customer.
It should be a scenario based configurations but we had to put it in per
board foler because of different board memory settings. This brings concerns
to customer on configuration orgnization.
Currently the file provides same e820 layout for all pre-launched VMs, but
they should have different e820 when their memory are configured differently.
Although we have acrn-config tool to generate ve802.c automatically, it
is not friendly to modify hardcoded ve820 layout manually, so the patch
changes the entries initialization method by calculating each entry item
in C code.
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently ept_pages_info[] is initialized with first element only that force
VM of id 0 using SOS EPT pages. This is incorrect for logical partition and
hybrid scenario. Considering SOS_RAM_SIZE and UOS_RAM_SIZE are configured
separately, we should use different ept pages accordingly.
So, the PRE_VM_NUM/SOS_VM_NUM and MAX_POST_VM_NUM macros are introduced to
resolve this issue. The macros would be generated by acrn-config tool when
user configure ACRN for their specific scenario.
One more thing, that when UOS_RAM_SIZE is less then 2GB, the EPT address
range should be (4G + PLATFORM_HI_MMIO_SIZE).
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move CFG read/write function for PCI-compatible Configuration Mechanism from
debug/uartuart16550.c to hw/pci.c and rename CFG read/write function for
PCI-compatible Configuration Mechanism to pci_pio_read/write_cfg to align with
CFG read/write function pci_mmcfg_read/write_cfg for PCI Express Enhanced
Configuration Access Mechanism.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
In order to add GVT-D support, we need pass through stolen memory and opregion memroy
to the post-launched VM. To implement this, we first reserve the GPA for stolen memory
and opregion memory through post-launched VM e820 table. Then we would build EPT mapping
between the GPA and the stolen memory and opregion memory real HPA. The last, we need to
return the GPA to post-launched VM if it wants to read the stolen memory and opregion
memory address and prevent post-launched VM to write the stolen memory and opregion memory
address register for now.
We do the GPA reserve and GPA to HPA EPT mapping in ACRN-DM and the stolen memory and
opregion memory CFG space register access emulation in ACRN-HV.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
VM needs to check if it owns this device before deiniting it.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Change enable_vf/disable_vf to create_vfs/disable_vfs
Change base member of pci_vbar to base_gpa
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Emulate Device ID, Vendor ID and MSE(Memory Space Enable) bit in
configuration space for an assigned VF, initialize assgined VF Bars.
The Device ID comes from PF's SRIOV capability
The Vendor ID comes from PF's Vendor ID
The PCI MSE bit always be set when VM reads from an assigned VF.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add cfg_header_read_cfg and cfg_header_write_cfg to handle the 1st 64B
CFG Space header PCI configuration space.
Only Command and Status Registers are pass through;
Only Command and Status Registers and Base Address Registers are writable.
In order to implement this, we add two type bit mask for per 4B register:
pass through mask and read-only mask. When pass through bit mask is set, this
means this bit of this 4B register is pass through, otherwise, it is virtualized;
When read-only mask is set, this means this bit of this 4B register is read-only,
otherwise, it's writable. We should write it to physical CFG space or virtual
CFG space base on whether the pass through bit mask is set or not.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
1. Renames DEFINE_IOAPIC_SID with DEFINE_INTX_SID as the virtual source can
be IOAPIC or PIC
2. Rename the src member of source_id.intx_id to ctlr to indicate interrupt
controller
2. Changes the type of src member of source_id.intx_id from uint32_t to
enum with INTX_CTLR_IOAPIC and INTX_CTLR_PIC
Tracked-On: #4447
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
For SRIOV needs ARI support, so enable it in HV if
the PCI bridge support it.
TODO:
need check all the PCI devices under this bridge can support ARI,
if not, it is better not enable it as PCIe spec. That check will be
done when scanning PCI devices.
Tracked-On: #3381
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This is to enable relocation for code32.
- RIP relative addressing is available in x86-64 only so we manually add
relocation delta to the target symbols to fixup code32.
- both code32 and code64 need to load GDT hence both need to fixup GDT
pointer. This patch declares separate GDT pointer cpu_primary64_gdt_ptr
for code64 to avoid double fixup.
- manually fixup cpu_primary64_gdt_ptr in code64, but not rely on relocate()
to do that. Otherwise it's very confusing that symbols from same file could
be fixed up externally by relocate() or self-relocated.
- to make it clear, define a new symbol ld_entry_end representing the end of
the boot code that needs manually fixup, and use this symbol in relocate()
to filter out all symbols belong to the entry sections.
Tracked-On: #4441
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
One argument is missing for the function ptirq_alloc_entry.
This patch fixes the doc generation error.
Tracked-On: #3882
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Fixed misspellings and rst formatting issues.
Added ptdev.h to the list of include file for doxygen
Tracked-On: #3882
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
Add doxygen style comments to ptdev public APIs.
Add these API descriptions to group acrn_passthrough.
Tracked-On: #3882
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
This patch adds RDT MBA support to detect, configure and
and setup MBA throttle registers based on VM configuration.
Tracked-On: #3725
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a new parameter pf_vdev for function vpci_init_vdev to support SRIOV
VF vdev initializaiton.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The init_one_dev_config is used to initialize a acrn_vm_pci_dev_config
SRIOV needs a explicit acrn_vm_pci_dev_config to create a VF vdev,so
refine it to return acrn_vm_pci_dev_config.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Due to SRIOV VF physical device needs to be initialized when
VF_ENABLE is set and a SRIOV VF physical device initialization
is same with standard PCIe physical device, so expose the
init_pdev for SRIOV VF physical device initialization.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
All SRIOV VF physical devices don't have bars in configuration space,
they are from the VF associated PF's VF_BAR registers of SRIOV capability.
Adding a vbars data structure in pci_cap_sriov data structure to store
SRIOV VF_BAR information, so that each VF bars can be initialized directly
through the vbars instead multiple accessing of the PF VF_BAR registers.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current code avoid the rule 88 S in MISRA-C, so move xsaves and xrstors
assembler to individual functions.
Tracked-On: #4436
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
VF_ENABLE is one field of SRIOV capability that is used to create
or remove VF physical devices. If VF_ENABLE is set, hv can detect
if the VF physical devices are ready after waiting 100 ms.
v2: Add sanity check for writing NumVFs register, add precondition
and application constraints when VF_ENABLE is set and refine
code style.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce SRIOV capability field for pci_vdev and add SRIOV capability
interception entries.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Make the SRIOV-Capable device invisible from SOS if there is
no room for its all virtual functions.
v2: fix a issue that if a PF has been dropped, the subsequent PF
will be dropped too even there is room for its VFs.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
if the device has PCIe capability, walks all PCIe extended
capabilities for SRIOV discovery.
v2: avoid type casting and refine naming.
Tracked-On: #4433
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
add vpci bridge operations in hypervisor, to avoid SOS mis-operations
to affect other VM's PCI devices.
assumption: before hypervisor bootup, the physical pci-bridge shall be
configured correctly by BIOS or other bootloader; for ACS (Access
Control Service) capability, it is configured by BIOS to support the
devices under it to be isolated and allocated to different VMs.
to simplify the emulations of vpci bridge, set limitations as following:
1. expose all configure space registers, but readonly
2. BIST not support; by default is 0
3. not support interrupt, including INTx and MSI.
TODO:
1. configure tool can select whether a PCI bridge is emulated or pass
through.
Open:
1. SOS how to reset PCI device under the PCI bridge?
Tracked-On: #3381
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The init value for XCR0 and XSS should be the same with spec:
In SDM Vol1 13.3:
XCR0[0] is associated with x87 state (see Section 13.5.1). XCR0[0] is
always 1. The other bits in XCR0 are all 0 coming out of RESET.
The IA32_XSS MSR (with MSR index DA0H) is zero coming out of RESET.
The previous code try to fix the xsave area leak to other VMs during init
phase, but bring the error to linux. Besides, it cannot avoid the
possible leak in running phase. Need find a better solution.
Tracked-On: #4430
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
is not set
This patch does the following,
1. Removes RDT code if CONFIG_RDT_ENABLED flag is
not set.
2. Set the CONFIG_RDT_ENABLED flag only on platforms
that support RDT so that build scripts will automatically
reflect the config.
Tracked-On: #3715
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
cache configuration.
This patch creates a generic infrastructure for
RDT resources instead of just L2 or L3 cache. This
patch also fixes L3 CAT config overwrite by L2 in
cases where both L2 and L3 CAT are supported.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There can be times when user unknowinlgy enables
CONFIG_CAT_ENBALED SW flag, but the hardware might
not support L3 or L2 CAT. In such case software can
end up writing to the CAT MSRs which can cause
undefined results. The patch fixes the issue by
enabling CAT only when both HW as well software
via the CONFIG_CAT_ENABLED supports CAT.
The patch also address typo with "clos2prq_msr"
function name. It should be "clos2pqr_msr" instead.
PQR stands for platform qos register.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Upcoming intel platforms can support both L2 and L3
but our current code only supports either L2 or L3 CAT.
So split the MSRs so that we can support allocation
for both L2 and L3.
This patch does the following,
1. splits programming of L2 and L3 cache resource
based on the resource ID.
2. Replace generic platform_clos_array struct with resource
specific struct in all the existing board.c files.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As part of rdt cat refactoring, goal is to combine all rdt
specific features such as CAT under one module. So renaming
rdt resouce specific files such as cat.c/.h to generic rdt.c/.h
files.
Tracked-On: #3715
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize efi info of acrn mbi when boot from multiboot2 protocol, with
this patch hypervisor could get host efi info and pass it to Linux zeropage,
then make guest Linux possible to boot with efi environment;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch re-arch boot component header files by:
- moving multiboot.h from include/arch/x86/ to boot/include/ and keep
this header for multiboot1 protocol data struct only;
- moving multiboot related MACROs in cpu_primary.S to multiboot.h;
- creating an independent boot.h to store acrn specific boot information
for other files' reference;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BVT (Borrowed virtual time) scheduler is used to schedule vCPUs on pCPU.
It has the concept of virtual time, vCPU with earliset virtual time is
dispatched first.
Main concepts:
tick timer:
a period tick is used to measure the physcial time in units of MCU
(minimum charing unit).
runqueue:
thread in the runqueue is ordered by virtual time.
weight:
each thread receives a share of the pCPU in proportion to its
weight.
context switch allowance:
the physcial time by which the current thread is allowed to advance
beyond the next runnable thread.
warp:
a thread with warp enabled will have a change to minus a value (Wi)
from virtual time to achieve higher priority.
virtual time:
AVT: actual virtual time, advance in proportional to weight.
EVT: effective virtual time.
EVT <- AVT - ( warp ? Wi : 0 )
SVT: scheduler virtual time, the minimum AVT in the runqueue.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Rename BOOT_CPU_ID to BSP_CPU_ID
2. Repace hardcoded value with BSP_CPU_ID when
ID of BSP is referenced.
Tracked-On: #4420
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Now only PCI MSI-X BAR access need dynamic register/unregister. Others don't need
unregister once it's registered. So we don't need to lock the vm level emul_mmio_lock
when we handle the MMIO access. Instead, we could use finer granularity lock in the
handler to ptotest the shared resource.
This patch fixed the dead lock issue when OVMF try to size the BAR size:
Becasue OVMF use ECAM to access the PCI configuration space, it will first hold vm
emul_mmio_lock, then calls vpci_handle_mmconfig_access. While this tries to size a
BAR which is also a MSI-X Table BAR, it will call register_mmio_emulation_handler to
register the MSI-X Table BAR MMIO access handler. This will causes the emul_mmio_lock
dead lock.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Now we split passthrough PCI device from DM to HV, we could remove all the passthrough
PCI device unused code.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add assign/deassign PCI device hypercall APIs to assign a PCI device from SOS to
post-launched VM or deassign a PCI device from post-launched VM to SOS. This patch
is prepared for spliting passthrough PCI device from DM to HV.
The old assign/deassign ptdev APIs will be discarded.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
To enable gvt-d,need to allow the GPU IOMMU.
While gvt-d hasn't been enabled on APL yet,
so let APL disable GPU IOMMU.
v2 -> v3:
* let APL platforms disable GPU IOMMU.
Tracked-On: #4405
Signed-off-by: Junming Liu <junming.liu@intel.com>
Reviewed-by: Wu Binbin <binbin.wu@intel.com>
In platforms that support CAT, when it is enabled by ACRN, i.e.
IA32_resourceType_MASK_n registers are programmed with customized values,
it has impacts to the whole system.
The per guest flag GUEST_FLAG_CLOS_REQUIRED suggests that CAT may be
enabled in some guests, but not in others who don't have this flag,
which is conceptually incorrect.
This patch removes GUEST_FLAG_CLOS_REQUIRED, and adds a new Kconfig
entry CAT_ENABLED for CAT enabling. When it's enabled, platform_clos_array[]
defines a set of system-wide Class of Service (COS, or CLOS), and the
per guest vm_configs[].clos associates the guest with particular CLOS.
Tracked-On: #2462
Signed-off-by: Zide Chen <zide.chen@intel.com>
1. Align the coding style for these MACROs
2. Align the values of fixed VECTORs
Tracked-On: #4348
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
is_polling_ioreq is more straightforward. Rename it.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'param' is BDF value instead of GPA when VHM driver
issues below 2 hypercalls:
- HC_ASSIGN_PTEDEV
- HC_DEASSIGN_PTDEV
This patch is to remove related code in hc_assign/deassign()
functions.
Tracked-On: #4334
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
SOS will use PCIe ECAM access PCIe external configuration space. HV should trap this
access for security(Now pre-launched VM doesn't want to support PCI ECAM; post-launched
VM trap PCIe ECAM access in DM).
Besides, update PCIe MMCONFIG region to be owned by hypervisor and expose and pass through
platform hide PCI devices by BIOS to SOS.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Use Enhanced Configuration Access Mechanism (MMIO) instead of PCI-compatible
Configuration Mechanism (IO port) to access PCIe Configuration Space
PCI-compatible Configuration Mechanism (IO port) access is used for UART in
debug version.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Sometimes HV wants to know if there are pending interrupts of one vcpu.
Add .has_pending_intr interface in acrn_apicv_ops and return the pending
interrupts status by check IRRs of apicv.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce two kinds of events for each vcpu,
VCPU_EVENT_IOREQ: for vcpu waiting for IO request completion
VCPU_EVENT_VIRTUAL_INTERRUPT: for vcpu waiting for virtual interrupts events
vcpu can wait for such events, and resume to run when the
event get signalled.
This patch also change IO request waiting/notifying to this way.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This simple event implemention can only support exclusive waiting
at same time. It mainly used by thread who want to wait for special event
happens.
Thread A who want to wait for some events calls
wait_event(struct sched_event *);
Thread B who can give the event signal calls
signal_event(struct sched_event *);
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Before we assign a PT device to post-launched VM, we should reset the PCI device
first. However, ACRN hypervisor doesn't plan to support PCIe hot-plug and doesn't
support PCIe bridge Secondary Bus Reset. So the PT device must support FLR or PM
reset. This patch do this check when assigning a PT device to post-launched VM.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Since we restore BAR values when writing Command Register if necessary. We don't
need to trap FLR and do the BAR restore then.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
When PCIe does Conventinal Reset or FLR, almost PCIe configurations and states will
lost. So we should save the configurations and states before do the reset and restore
them after the reset. This was done well by BIOS or Guest now. However, ACRN will trap
these access and handle them properly for security. Almost of these configurations and
states will be written to physical configuration space at last except for BAR values
for now. So we should do the restore for BAR values. One way is to do restore after
one type reset is detected. This will be too complex. Another way is to do the restore
when BIOS or guest tries to write the Command Register. This could work because:
1. The I/O Space Enable bit and Memory Space Enable bits in Command Register will reset
to zero.
2. Before BIOS or guest wants to enable these bits, the BAR couldn't be accessed.
3. So we could restore the BAR values before enable these bits if reset is detected.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Per SDM 10.12.5.1 vol.3, local APIC should keep LAPIC state after receiving
INIT. The local APIC ID register should also be preserved.
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch abstract a vcpu_reset_internal() api for internal usage, the
function would not touch any vcpu state transition and just do vcpu reset
processing. It will be called by create_vcpu() and reset_vcpu().
The reset_vcpu() will act as a public api and should be called
only when vcpu receive INIT or vm reset/resume from S3. It should not be
called when do shutdown_vm() or hcall_sos_offline_cpu(), so the patch remove
reset_vcpu() in shutdown_vm() and hcall_sos_offline_cpu().
The patch also introduced reset_mode enum so that vcpu and vlapic could do
different context operation according to different reset mode;
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some MACROs in lapic.h are duplicated with apicreg.h, and some MACROs are
never referenced, remove them.
Tracked-On: #4268
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Structure pci_vbar is used to define the virtual BAR rather than physical BAR.
It's better to name as pci_vbar.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add severity definitions for different scenarios. The static
guest severity is defined according to guest configurations.
Also add sanity check to make sure the severity for all guests
are correct.
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
For guest reset, if the highest severity guest reset will reset
system. There is vm flag to call out the highest severity guest
in specific scenario which is a static guest severity assignment.
There is case that the static highest severity guest is shutdown
and the highest severity guest should be transfer to other guest.
For example, in ISD scenario, if RTVM (static highest severity
guest) is shutdown, SOS should be highest severity guest instead.
The is_highest_severity_vm() is updated to detect highest severity
guest dynamically. And promote the highest severity guest reset
to system reset.
Also remove the GUEST_FLAG_HIGHEST_SEVERITY definition.
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
For system S5, ACRN had assumption that SOS shutdown will trigger
system shutdown. So the system shutdown logical is:
1. Trap SOS shutdown
2. Wait for all other guest shutdown
3. Shutdown system
The new logical is refined as:
If all guest is shutdown, shutdown whole system
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
ACRN hypervisor should trap guest doing PCI AF FLR. Besides, it should save some status
before doing the FLR and restore them later, only BARs values for now.
This patch will trap guest Conventional PCI Advanced Features Control Register write
operation if the device supports Conventional PCI Advanced Features Capability and
check whether it wants to do device AF FLR. If it does, call pdev_do_flr to do the job.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
ACRN hypervisor should trap guest doing PCIe FLR. Besides, it should save some status
before doing the FLR and restore them later, only BARs values for now.
This patch will trap guest Device Capabilities Register write operation if the device
supports PCI Express Capability and check whether it wants to do device FLR. If it does,
call pdev_do_flr to do the job.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We don't use INIT signal notification method now. This patch
removes them.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
We have implemented a new notification method using NMI.
So replace the INIT notification method with the NMI one.
Then we can remove INIT notification related code later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
There is a window where we may miss the current request in the
notification period when the work flow is as the following:
CPUx + + CPUr
| |
| +--+
| | | Handle pending req
| <--+
+--+ |
| | Set req flag |
<--+ |
+------------------>---+
| Send NMI | | Handle NMI
| <--+
| |
| |
| +--> vCPU enter
| |
+ +
So, this patch enables the NMI-window exiting to trigger the next vmexit
once there is no "virtual-NMI blocking" after vCPU enter into VMX non-root
mode. Then we can process the pending request on time.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
The NMI for notification should not be inject to guest. So,
this patch drops NMI injection request when we use NMI
to notify vCPUs. Meanwhile, ACRN doesn't support vNMI well
and there is no well-designed way to check if the NMI is
for notification or for guest now. So, we take all the NMIs as
notificaton NMI for hard rtvm temporarily. It means that the
hard rtvm will never receive NMI with this patch applied.
TODO: vNMI support is not ready yet. we will add it later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
ACRN hypervisor needs to kick vCPU off VMX non-root mode to do some
operations in hypervisor, such as interrupt/exception injection, EPT
flush etc. For non lapic-pt vCPUs, we can use IPI to do so. But, it
doesn't work for lapic-pt vCPUs as the IPI will be injected to VMs
directly without vmexit.
Without the way to kick the vCPU off VMX non-root mode to handle pending
request on time, there may be fatal errors triggered.
1). Certain operation may not be carried out on time which may further
lead to fatal errors. Taking the EPT flush request as an example, once we
don't flush the EPT on time and the guest access the out-of-date EPT,
fatal error happens.
2). ACRN now will send an IPI with vector 0xF0 to target vCPU to kick the vCPU
off VMX non-root mode if it wants to do some operations on target vCPU.
However, this way doesn't work for lapic-pt vCPUs. The IPI will be delivered
to the guest directly without vmexit and the guest will receive a unexpected
interrupt. Consequently, if the guest can't handle this interrupt properly,
fatal error may happen.
The NMI can be used as the notification signal to kick the vCPU off VMX
non-root mode for lapic-pt vCPUs. So, this patch uses NMI as notification signal
to address the above issues for lapic-pt vCPUs.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Reserved bits in a 8-bit PAT field has been checked in pat_mem_type_invalid.
Remove this redundant check "(PAT_FIELD_RSV_BITS & field) != 0UL" in
write_pat_msr.
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
This patch adds a helper function send_single_nmi. The fisrt caller
will soon come with the following patch.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
This patch installs a NMI handler in acrn IDT to handle
NMIs out of dispatch_exception.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
In current architecutre, the maximum vCPUs number per VM could not
exceed the pCPUs number. Given the MAX_PCPU_NUM macro is provided
in board configurations, so remove the MAX_VCPUS_PER_VM from Kconfig
and add a macro of MAX_VCPUS_PER_VM to reference MAX_PCPU_NUM directly.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
rename the macro since MAX_PCPU_NUM could be parsed from board file and
it is not a configurable item anymore.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
remove 'guest_init_pml4' and 'tmp_pg_array' in vm_arch
since they are not used.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Add yield support for schedule, which can give up pcpu proactively.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
IO sensitive Round-robin scheduler aim to schedule threads with
round-robin policy. Meanwhile, we also enhance it with some fairness
configuration, such as thread will be scheduled out without properly
timeslice. IO request on thread will be handled in high priority.
This patch only add a skeleton for the sched_iorr scheduler.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>