This patch updates kconfig to support server platforms
for increased number of VCPUs per VM and PT IRQ number.
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4196
On some platforms, HPA regions for Virtual Machine can not be
contiguous because of E820 reserved type or PCI hole. In such
cases, pre-launched VMs need to be assigned non-contiguous memory
regions and this patch addresses it.
To keep things simple, current design has the following assumptions,
1. HPA2 always will be placed after HPA1
2. HPA1 and HPA2 don’t share a single ve820 entry.
(Create multiple entries if needed but not shared)
3. Only support 2 non-contiguous HPA regions (can extend
at a later point for multiple non-contiguous HPA)
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4195
Acked-by: Anthony Xu <anthony.xu@intel.com>
To handle reboot requests from pre-launched VMs that don't have
GUEST_FLAG_HIGHEST_SEVERITY, we shutdown the target VM explicitly
other than ignoring them.
Tracked-On: #2700
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
ptirq_prepare_msix_remap was called no matter whether MSI/MSI-X was enabled or not
and it passed zero to input parameter virtual MSI/MSI-X data field to indicate
MSI/MSI-X was disabled. However, it barely did nothing on this case.
Now ptirq_prepare_msix_remap is called only when MSI/MSI-X is enabled. It doesn't
need to check whether MSI/MSI-X is enabled or not by checking virtual MSI/MSI-X
data field.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Do vMSI-X remap only when Mask Bit in Vector Control Register for MSI-X Table Entry
is unmask.
The previous implementation also has two issues:
1. It only check whether Message Control Register for MSI-X has been modified when
guest writes MSI-X CFG space at Message Control Register offset.
2. It doesn't really disable MSI-X when guest wants to disable MSI-X.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
1. disable physical MSI before writing the virtual MSI CFG space
2. do the remap_vmsi if the guest wants to enable MSI or update MSI address or data
3. disable INTx and enable MSI after step 2.
The previous Message Control check depends on the guest write MSI Message Control
Register at the offset of Message Control Register. However, the guest could access
this register at the offset of MSI Capability ID register. This patch remove this
constraint. Also, The previous implementation didn't really disable MSI when guest
wanted to disable MSI.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
It's meaningless to sleep a non-running vcpu. Add a state check before
sleep the thread object of the vcpu.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now, we support schedule inplace. And with cpu sharing, there might be
multi vcpu running on same pcpu. Reschedule request will happen when
switch the running vcpu. If the current vcpu is polling on the IO
completion, it need to be scheduled back to the polling point.
In the polling path, construct a loop for polling, and do schedule in the
loop if needed.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu-sharing enabled, there are more than 1 vcpu on 1 pcpu, so the
smp_call handler should switch the vmcs to the target vcpu's vmcs. Then
get the info.
dump_vcpu_reg and dump_guest_mem should run on certain vmcs, otherwise,
there will be #GP error.
Renaming:
vcpu_dumpreg -> dump_vcpu_reg
switch_vmcs -> load_vmcs
Tracked-On: #4178
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a header field in acrnlog message to indicate the current
running thread.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We care more about leaf and subleaf of cpuid than vcpu_id.
So, this patch changes the cpuid trace-entry to trace the leaf
and subleaf of this cpuid vmexit.
Tracked-On: #4175
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
PMU is hidden from any guest, UD is expected when guest
try to execute 'rdpmc' instruction.
this patch sets 'RDPMC exiting' in Processorbased
VM-execution control.
Tracked-On: #3453
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Deterministic is important for RTVM. The mitigation for MCE on
Page Size Change converts a large page to 4KB pages runtimely during
the vmexit triggered by the instruction fetch in the large page.
These vmexits increase nondeterminacy, which should be avoided for RTVM.
This patch builds 4KB page mapping in EPT for RTVM to avoid these vmexits.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a option MCE_ON_PSC_WORKAROUND_DISABLED to disable the software
workaround for the issue Machine Check Error on Page Size Change.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Only apply the software workaround on the models that might be
affected by MCE on page size change. For these models that are
known immune to the issue, the mitigation is turned off.
Atom processors are not afftected by the issue.
Also check the CPUID & MSR to check whether the model is immune to the issue:
CPU is not vulnerable when both CPUID.(EAX=07H,ECX=0H).EDX[29] and
IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] are 1.
Other cases not listed above, CPU may be vulnerable.
This patch also changes MACROs for MSR IA32_ARCH_CAPABILITIES bits to UL instead of U
since the MSR is 64bit.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After changing init_vmcs to smp call approach and do it before
launch_vcpu, it could work with noop scheduler. On real sharing
scheudler, it has problem.
pcpu0 pcpu1 pcpu1
vmBvcpu0 vmAvcpu1 vmBvcpu1
vmentry
init_vmcs(vmBvcpu1) vmexit->do_init_vmcs
corrupt current vmcs
vmentry fail
launch_vcpu(vmBvcpu1)
This patch mark a event flag when request vmcs init for specific vcpu. When
it is running and checking pending events, will do init_vmcs firstly.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The default PCI mmcfg base is stored in ACPI MCFG table, when
CONFIG_ACPI_PARSE_ENABLED is set, acpi_fixup() function will
parse and fix up the platform mmcfg base in ACRN boot stage;
when it is not set, platform mmcfg base will be initialized to
DEFAULT_PCI_MMCFG_BASE which generated by acrn-config tool;
Please note we will not support platform which has multiple PCI
segment groups.
Tracked-On: #4157
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Starting with TSC_DEADLINE msr interception disabled, the virtual TSC_DEADLINE msr is always 0.
When the interception is enabled, need to sync the physical TSC_DEADLINE value to virtual TSC_DEADLINE.
When the interception is disabled, there are 2 cases:
- if the timer hasn't expired, sync virtual TSC_DEADLINE to physical TSC_DEADLINE, to make the guest read the same tsc_deadline
as it writes. This may change when the timer actually trigger.
- if the timer has expired, write 0 to the virtual TSC_DEADLINE.
Tracked-On: #4162
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When write to virtual TSC_DEADLINE, if virtual TSC_ADJUST is not zero:
- when guest intends to disarm the tsc_deadline timer, should not arm the timer falsely;
- when guest intends to arm the tsc_deadline timer, should not disarm the timer falsely.
When read from virtual TSC_DEADLINE, if virtual TSC_ADJUST is not zero:
- if physical TSC_DEADLINE is not zero, return the virtual TSC_DEADLINE value;
- if physical TSC_DEADLINE is zero which means it's not armed (automatically disarmed after
timer triggered), return 0 and reset the virtual TSC_DEADLINE.
Tracked-On: #4162
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xsave area:
legacy region: 512 bytes
xsave header: 64 bytes
extended region: < 3k bytes
So, pre-allocate 4k area for xsave. Use certain instruction to save or
restore the area according to hardware xsave feature set.
Tracked-On: #4166
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Name vmsi and vmsix function with verb-object style:
For external APIs, using MODULE_NAME_verb-object style;
For internal APIs, using verb-object style.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
ptirq_msix_remap doesn't do the real remap, that's the vmsi_remap and vmsix_remap_entry
does. ptirq_msix_remap only did the preparation.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add a Kconfig parameter called UEFI_OS_LOADER_NAME to hold the Service VM EFI
bootloader to be run by the ACRN hypervisor. A new string manipulation function
to convert from (char *) to (CHAR16 *) has been added to facilitate the
implementation.
The default value is set to systemd-boot (bootloaderx64.efi)
Tracked-On: #2793
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Major changes:
1. Correct handling of device multi-function capability
We only check function zero for this feature. If it has it, we continue
looking at all remaining functions, ignoring those with invalid vendors.
The PCI spec says we are not to probe beyond function zero if it does
not exist or indicates it is not a multi-function device.
2a. Walk *ALL* buses in the PCI space, however,
Before walking the PCI hierarchy, post-processed ACPI DMAR info is parsed
and a map is created between all device-scopes across all DRHDs and the
corresponding IOMMU index.
This map is used at the time of walking the PCI hierarchy. If a BDF that
ACRN is currently working on, is found in the above-mentioned map, the
BDF device is mapped to the corresponding DRHD in the map.
If the BDF were a bridge type, realized with "Header Type" in config space,
the BDF device along with all its downstream devices are mapped to the
corresponding DRHD in the map.
To avoid walking previously visited buses, we maintain a bitmap that
stores which bus is walked when we handle Bridge type devices.
Once ACPI information is included into ACRN about the PCI-Express Root
Complexes / PCI Host Bridges, we can avoid the final loop which probes
all remainder buses, and instead jump to the next Host Bridge bus.
From prior patches, init_pdev returns the pdev structure it created to
the caller. This allows us to complete initialization by updating its
drhd_idx to the correct DRHD.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
On server platforms, DMAR DRHD device scope entries may contain PCI
bridges.
Bridges in the DRHD device scope indicate this IOMMU translates for all
devices on the hierarchy below that bridge.
ACRN is unaware of bridge types in the device scope, and adds these
directly to its internal representation of a DRHD. When looking up a BDF
within these DRHD entries, device_to_dmaru assumes all entries are
Endpoints, comparing BDF to BDF. Thus device to DMAR unit fails, because
it treats a bridge as an Endpoint type.
This change leverages prior patches by converting a BDF to the
associated device DRHD index, and uses that index to obtain the correct
DRHD state.
Handling a bridge in other ways may require maintaining a bus list for
each, or replacing each bridge in the dev scope with a set of all device
BDFs underneath it. Server platforms can have hundreds of PCI devices,
thus making the device scope artificially large is unwieldy.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ACRN does not support multiple PCI segments in its current form.
But VT-d module uses segment info in its interfaces and
hardcodes it to 0.
This patch cleans up everything related to segment to avoid
ambiguity.
Tracked-On: #4134
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
In later patches we use information from DMAR tables to guide discovery
and initialization of PCI devices.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
We add new member pci_pdev.drhd_idx associating the DRHD
(IOMMU) with this pdev, and a method to convert a pbdf of a device to
this index by searching the pdev list.
Partial patch: drhd_index initialization handled in subsequent patch.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add some encapsulation of utilities which read PCI header space using
wrapper functions. Also contain verification of PCI vendor to its own
function, rather than having hard-coded integrals exposed among other
code.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
add shell command to support dump dump guest memory
e.g.
dump_guest_mem vm_id, gva, length
Tracked-On: #4144
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
now this api assumes the guest OS is 64 bits,
this patch remove this api and will replace it
with dumping guest memory.
Tracked-On: #4144
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The $(BOARD)_acpi_info.h is generated by acrn-config tool, remove this
header in make clean would cause failure when user finish configuring
in webUI and start to make acrn-hypervisor by the command
"make hypervisor BOARD=xxx SCENARIO=yyy" because we mandatory do make
clean before making hypervisor.
The patch replace the file removal with a warning string to hint user
to check the file validity.
Tracked-On: #3779
Signed-off-by: Victor Sun <victor.sun@intel.com>
call vuart_register_io_handler function, when the parameter vuart_idx is greater
than or equal to 2, print the vuart index value which will not register the vuart.
Tracked-On: #4072
Signed-off-by: Jidong Xia <xiajidong@cmss.chinamobile.com>
After reshuffle pci_bar structrue we could write ~0U not BAR size mask to BAR
configuration space directly when do BAR sizing. In this case, we could know whether
the value in BAR configuration space is a valid base address. As a result, we could
do BAR re-programming whenever we want.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The current code declare pci_bar structure following the PCI bar spec. However,
we could not tell whether the value in virtual BAR configuration space is valid
base address base on current pci_bar structure. We need to add more fields which
are duplicated instances of the vBAR information. Basides these fields which will
added, bar_base_mapped is another duplicated instance of the vBAR information.
This patch try to reshuffle the pci_bar structure to declare pci_bar structure
following the software implement benefit not the PCI bar spec.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The current do PCI IO BAR remap in vdev_pt_allow_io_vbar. This patch split this
function into vdev_pt_deny_io_vbar and vdev_pt_allow_io_vbar. vdev_pt_deny_io_vbar
removes the old IO port mapping, vdev_pt_allow_io_vbar add the new IO port mapping.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
In 64-bit mode, processor pushes SS and RSP onto stack unconditionally.
Also when dumping the exception info, it makes more sense to dump
the RSP at the point of interrupt, rather than the RSP after pushing
context (including GPRs)
Tracked-On: #4102
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The *.mk files under misc/acrn-config/library are all rules for hypervisor
makefiles only, so move these files to hypervisor/scripts/makefile/ folder.
The folder of acrn-config/library/ will be used to store python script lib only.
Tracked-On: #3779
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Terry Zou <terry.zou@intel.com>
The board specific $(BOARD)_acpi_info.h is generated by acrn-config tool,
we should clean it up before build hypervisor, otherwise the file could be
referenced by next build process if no config XMLs is specified.
Tracked-On: #3779
Signed-off-by: Victor Sun <victor.sun@intel.com>
Issue description:
-----------------
Machine Check Error on Page Size Change
Instruction fetch may cause machine check error if page size
and memory type was changed without invalidation on some
processors[1][2]. Malicious guest kernel could trigger this issue.
This issue applies to both primary page table and extended page
tables (EPT), however the primary page table is controlled by
hypervisor only. This patch mitigates the situation in EPT.
Mitigation details:
------------------
Implement non-execute huge pages in EPT.
This patch series clears the execute permission (bit 2) in the
EPT entries for large pages. When EPT violation is triggered by
guest instruction fetch, hypervisor converts the large page to
smaller 4 KB pages and restore the execute permission, and then
re-execute the guest instruction.
The current patch turns on the mitigation by default.
The follow-up patches will conditionally turn on/off the feature
per processor model.
[1] Refer to erratum KBL002 in "7th Generation Intel Processor
Family and 8th Generation Intel Processor Family for U Quad Core
Platforms Specification Update"
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/7th-gen-core-family-spec-update.pdf
[2] Refer to erratum SKL002 in "6th Generation Intel Processor
Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Currently make hypervisor will depend on a $(BOARD).config file to load
board defconfig which triggered by oldconfig process, this will block
make from XMLs for a new board because $(BOARD).config never exist.
This requires us to patch configuration for new board earlier than make
oldconfig.
Tracked-On: #4067
Signed-off-by: Victor Sun <victor.sun@intel.com>
In non-64-bit mode, CS segment base address should be considered when
determining the linear address of the vcpu's instruction pointer. Use
vie_calculate_gla() for instruction address translation which also takes
care of 64-bit mode.
Tracked-On: #4064
Signed-off-by: Peter Fang <peter.fang@intel.com>