On some platforms, HPA regions for Virtual Machine can not be
contiguous because of E820 reserved type or PCI hole. In such
cases, pre-launched VMs need to be assigned non-contiguous memory
regions and this patch addresses it.
To keep things simple, current design has the following assumptions,
1. HPA2 always will be placed after HPA1
2. HPA1 and HPA2 don’t share a single ve820 entry.
(Create multiple entries if needed but not shared)
3. Only support 2 non-contiguous HPA regions (can extend
at a later point for multiple non-contiguous HPA)
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4195
Acked-by: Anthony Xu <anthony.xu@intel.com>
1. disable physical MSI before writing the virtual MSI CFG space
2. do the remap_vmsi if the guest wants to enable MSI or update MSI address or data
3. disable INTx and enable MSI after step 2.
The previous Message Control check depends on the guest write MSI Message Control
Register at the offset of Message Control Register. However, the guest could access
this register at the offset of MSI Capability ID register. This patch remove this
constraint. Also, The previous implementation didn't really disable MSI when guest
wanted to disable MSI.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
With cpu-sharing enabled, there are more than 1 vcpu on 1 pcpu, so the
smp_call handler should switch the vmcs to the target vcpu's vmcs. Then
get the info.
dump_vcpu_reg and dump_guest_mem should run on certain vmcs, otherwise,
there will be #GP error.
Renaming:
vcpu_dumpreg -> dump_vcpu_reg
switch_vmcs -> load_vmcs
Tracked-On: #4178
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Deterministic is important for RTVM. The mitigation for MCE on
Page Size Change converts a large page to 4KB pages runtimely during
the vmexit triggered by the instruction fetch in the large page.
These vmexits increase nondeterminacy, which should be avoided for RTVM.
This patch builds 4KB page mapping in EPT for RTVM to avoid these vmexits.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Only apply the software workaround on the models that might be
affected by MCE on page size change. For these models that are
known immune to the issue, the mitigation is turned off.
Atom processors are not afftected by the issue.
Also check the CPUID & MSR to check whether the model is immune to the issue:
CPU is not vulnerable when both CPUID.(EAX=07H,ECX=0H).EDX[29] and
IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] are 1.
Other cases not listed above, CPU may be vulnerable.
This patch also changes MACROs for MSR IA32_ARCH_CAPABILITIES bits to UL instead of U
since the MSR is 64bit.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After changing init_vmcs to smp call approach and do it before
launch_vcpu, it could work with noop scheduler. On real sharing
scheudler, it has problem.
pcpu0 pcpu1 pcpu1
vmBvcpu0 vmAvcpu1 vmBvcpu1
vmentry
init_vmcs(vmBvcpu1) vmexit->do_init_vmcs
corrupt current vmcs
vmentry fail
launch_vcpu(vmBvcpu1)
This patch mark a event flag when request vmcs init for specific vcpu. When
it is running and checking pending events, will do init_vmcs firstly.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The default PCI mmcfg base is stored in ACPI MCFG table, when
CONFIG_ACPI_PARSE_ENABLED is set, acpi_fixup() function will
parse and fix up the platform mmcfg base in ACRN boot stage;
when it is not set, platform mmcfg base will be initialized to
DEFAULT_PCI_MMCFG_BASE which generated by acrn-config tool;
Please note we will not support platform which has multiple PCI
segment groups.
Tracked-On: #4157
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xsave area:
legacy region: 512 bytes
xsave header: 64 bytes
extended region: < 3k bytes
So, pre-allocate 4k area for xsave. Use certain instruction to save or
restore the area according to hardware xsave feature set.
Tracked-On: #4166
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ptirq_msix_remap doesn't do the real remap, that's the vmsi_remap and vmsix_remap_entry
does. ptirq_msix_remap only did the preparation.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Major changes:
1. Correct handling of device multi-function capability
We only check function zero for this feature. If it has it, we continue
looking at all remaining functions, ignoring those with invalid vendors.
The PCI spec says we are not to probe beyond function zero if it does
not exist or indicates it is not a multi-function device.
2a. Walk *ALL* buses in the PCI space, however,
Before walking the PCI hierarchy, post-processed ACPI DMAR info is parsed
and a map is created between all device-scopes across all DRHDs and the
corresponding IOMMU index.
This map is used at the time of walking the PCI hierarchy. If a BDF that
ACRN is currently working on, is found in the above-mentioned map, the
BDF device is mapped to the corresponding DRHD in the map.
If the BDF were a bridge type, realized with "Header Type" in config space,
the BDF device along with all its downstream devices are mapped to the
corresponding DRHD in the map.
To avoid walking previously visited buses, we maintain a bitmap that
stores which bus is walked when we handle Bridge type devices.
Once ACPI information is included into ACRN about the PCI-Express Root
Complexes / PCI Host Bridges, we can avoid the final loop which probes
all remainder buses, and instead jump to the next Host Bridge bus.
From prior patches, init_pdev returns the pdev structure it created to
the caller. This allows us to complete initialization by updating its
drhd_idx to the correct DRHD.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
We add new member pci_pdev.drhd_idx associating the DRHD
(IOMMU) with this pdev, and a method to convert a pbdf of a device to
this index by searching the pdev list.
Partial patch: drhd_index initialization handled in subsequent patch.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add some encapsulation of utilities which read PCI header space using
wrapper functions. Also contain verification of PCI vendor to its own
function, rather than having hard-coded integrals exposed among other
code.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
add shell command to support dump dump guest memory
e.g.
dump_guest_mem vm_id, gva, length
Tracked-On: #4144
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The current code declare pci_bar structure following the PCI bar spec. However,
we could not tell whether the value in virtual BAR configuration space is valid
base address base on current pci_bar structure. We need to add more fields which
are duplicated instances of the vBAR information. Basides these fields which will
added, bar_base_mapped is another duplicated instance of the vBAR information.
This patch try to reshuffle the pci_bar structure to declare pci_bar structure
following the software implement benefit not the PCI bar spec.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
In 64-bit mode, processor pushes SS and RSP onto stack unconditionally.
Also when dumping the exception info, it makes more sense to dump
the RSP at the point of interrupt, rather than the RSP after pushing
context (including GPRs)
Tracked-On: #4102
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Issue description:
-----------------
Machine Check Error on Page Size Change
Instruction fetch may cause machine check error if page size
and memory type was changed without invalidation on some
processors[1][2]. Malicious guest kernel could trigger this issue.
This issue applies to both primary page table and extended page
tables (EPT), however the primary page table is controlled by
hypervisor only. This patch mitigates the situation in EPT.
Mitigation details:
------------------
Implement non-execute huge pages in EPT.
This patch series clears the execute permission (bit 2) in the
EPT entries for large pages. When EPT violation is triggered by
guest instruction fetch, hypervisor converts the large page to
smaller 4 KB pages and restore the execute permission, and then
re-execute the guest instruction.
The current patch turns on the mitigation by default.
The follow-up patches will conditionally turn on/off the feature
per processor model.
[1] Refer to erratum KBL002 in "7th Generation Intel Processor
Family and 8th Generation Intel Processor Family for U Quad Core
Platforms Specification Update"
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/7th-gen-core-family-spec-update.pdf
[2] Refer to erratum SKL002 in "6th Generation Intel Processor
Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
MISRA C requires specified bounds for arrays declaration, previous declaration
of platform_clos_array in board.h does not meet the requirement.
Tracked-On: #3987
Signed-off-by: Victor Sun <victor.sun@intel.com>
The DMAR info is board specific so move the structure definition to board.c.
As a configruation file, the whole board.c could be generated by acrn-config
tool for each board.
Please note we only provide DMAR info MACROs for nuc7i7dnb board. For other
boards, ACPI_PARSE_ENABLED must be set to y in Kconfig to let hypervisor parse
DMAR info, or use acrn-config tool to generate DMAR info MACROs if user won't
enable ACPI parse code for FuSa consideration.
The patch also moves the function of get_dmar_info() to vtd.c, so dmar_info.c
could be removed.
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The value of CONFIG_MAX_IOMMU and MAX_DRHDS are identical to DRHD_COUNT
which defined in platform ACPI table, so remove CONFIG_MAX_IOMMU_NUM
from Kconfig and link these three MACROs together.
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now the e820 structure store ACRN HV memory layout, not the physical memory layout.
Rename e820 to hv_hv_e820 to show this explicitly.
Tracked-On: #4007
Signed-off-by: Li Fei1 <fei1.li@intel.com>
After adding PCI BAR remap support, mmio_node may unregister when there's others
access it. This patch add a lock to protect mmio_node access.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Concurrent access on PCI device may happened if UOS try to access PCI configuration
space on different vCPUs through IO port. This patch just adds a global PCI lock for
each VM to prevent the concurrent access.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Refine PCI CONFIG_ADDRESS Register definition as its physical layout.
In this case, we could read/write PCI CONFIG_ADDRESS Register atomically.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Since guest could re-program PCI device MSI-X table BAR, we should add mmio
emulation handler unregister.
However, after add unregister_mmio_emulation_handler API, emul_mmio_regions
is no longer accurate. Just replace it with max_emul_mmio_regions which records
the max index of the emul_mmio_node.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
In theory, guest could re-program PCI BAR address to any address. However, ACRN
hypervisor only support [0, top_address_space) EPT memory mapping. So we need to
check whether the PCI BAR re-program address is within this scope.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
kick means to notify one thread_object. If the target thread object is
running, send a IPI to notify it; if the target thread object is
runnable, make reschedule on it.
Also add kick_vcpu API in vcpu layer to notify vcpu.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After moving softirq to following interrupt path, softirq handler might
break in the schedule spinlock context and try to grab the lock again,
then deadlock.
Disable interrupt with schedule spinlock context.
For the IRQ disable/restore operations:
CPU_INT_ALL_DISABLE(&rflag)
CPU_INT_ALL_RESTORE(rflag)
each takes 50~60 cycles.
renaming: get_schedule_lock -> obtain_schedule_lock
Tracked-On: #3813
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch decouple some scheduling logic and abstract into a scheduler.
Then we have scheduler, schedule framework. From modulization
perspective, schedule framework provides some APIs for other layers to
use, also interact with scheduler through scheduler interaces.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To get pcpu_id from sched_control quickly and easier.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
- The default behaviors of PIO & MMIO handlers are same
for all VMs, no need to expose dedicated APIs to register
default hanlders for SOS and prelaunched VM.
Tracked-On: #3904
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently the parameter of init_ept_mem_ops is
'struct acrn_vm *vm' for this api,change it to
'struct memory_ops *mem_ops' and 'vm_id' to avoid
the reversed dependency, page.c is hardware layer and vm structure
is its upper-layer stuff.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Let init thread end with run_idle_thread(), then idle thread take over and
start to do scheduling.
Change enter_guest_mode() to init_guest_mode() as run_idle_thread() is removed
out of it. Also add run_thread() in schedule module to run
thread_object's thread loop directly.
rename: switch_to_idle -> run_idle_thread
Tracked-On: #3813
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
sleep one thread_object means to prevent it from being scheduled.
wake one thread_object is an opposite operation of sleep.
This patch also add notify_mode in thread_object to indicate how to
deliver the request.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now, we have three valid status for thread_object:
THREAD_STS_RUNNING,
THREAD_STS_RUNNABLE,
THREAD_STS_BLOCKED.
This patch also provide several helpers to check the thread's status and
a status set wrapper function.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
schedule infrastructure is per pcpu, so move its initialization to each
pcpu's initialization.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To support cpu sharing, multiple vcpu can run on same pcpu. We need do
necessary vcpu context switch. This patch add below actions in context
switch.
1) fxsave/fxrstor;
2) save/restore MSRs: MSR_IA32_STAR, MSR_IA32_LSTAR,
MSR_IA32_FMASK, MSR_IA32_KERNEL_GS_BASE;
3) switch vmcs.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu sharing enabled, per_cpu vcpu cannot work properly as we might
has multiple vcpus running on one pcpu.
Add a schedule API sched_get_current to get current thread_object on
specific pcpu, also add a vcpu API get_running_vcpu to get corresponding
vcpu of the thread_object.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu sharing enabled, we will map acrn_vcpu to thread_object
in scheduling. From modulization perspective, we'd better hide the
pcpu_id in acrn_vcpu and move it to thread_object.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Two time related synthetic MSRs are implemented in this patch. Both of
them are partition wide MSR.
- HV_X64_MSR_TIME_REF_COUNT is read only and it is used to return the
partition's reference counter value in 100ns units.
- HV_X64_MSR_REFERENCE_TSC is used to set/get the reference TSC page,
a sequence number, an offset and a multiplier are defined in this
page by hypervisor and guest OS can use them to calculate the
normalized reference time since partition creation, in 100ns units.
Tracked-On: #3831
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This patch implements the minimum set of TLFS functionality. It
includes 6 vCPUID leaves and 3 vMSRs.
- 0x40000001 Hypervisor Vendor-Neutral Interface Identification
- 0x40000002 Hypervisor System Identity
- 0x40000003 Hypervisor Feature Identification
- 0x40000004 Implementation Recommendations
- 0x40000005 Hypervisor Implementation Limits
- 0x40000006 Implementation Hardware Features
- HV_X64_MSR_GUEST_OS_ID Reporting the guest OS identity
- HV_X64_MSR_HYPERCALL Establishing the hypercall interface
- HV_X64_MSR_VP_INDEX Retrieve the vCPU ID from hypervisor
Tracked-On: #3832
Signed-off-by: wenwumax <wenwux.ma@intel.com>
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Currently we are using a 1:1 mapping logic for pcpu:vcpu. So don't need
a runqueue for it. Removing it as preparation work to abstract scheduler
framework.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
PRMRR related MSRs need to be configured by platform BIOS / bootloader.
These settings are not allowed to be changed by guest.
VMs currently have no requirement to access these MSRs even when vSGX is enabled.
So, this patch disables PRMRR related MSRs in VM.
Tracked-On: #3739
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
--remove unnecessary includes
--remove unnecssary forward-declaration for 'struct vhm_request'
Tracked-On: #861
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
The MSI Message Address and Message Data have no valid data after Power-ON. So
there's no need to initialize them by reading the data from physical PCI configuration
space.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
- update the function argument type to union
Declaring argument as pointer is not necessary since it
only does the comparison.
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
As we introduced vcpu_affinity[] to assign vcpus to different pcpus, the
old policy and functions are not needed. Remove them.
Tracked-On: #3663
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>