Add assign/deassign PCI device hypercall APIs to assign a PCI device from SOS to
post-launched VM or deassign a PCI device from post-launched VM to SOS. This patch
is prepared for spliting passthrough PCI device from DM to HV.
The old assign/deassign ptdev APIs will be discarded.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
To enable gvt-d,need to allow the GPU IOMMU.
While gvt-d hasn't been enabled on APL yet,
so let APL disable GPU IOMMU.
v2 -> v3:
* let APL platforms disable GPU IOMMU.
Tracked-On: #4405
Signed-off-by: Junming Liu <junming.liu@intel.com>
Reviewed-by: Wu Binbin <binbin.wu@intel.com>
In platforms that support CAT, when it is enabled by ACRN, i.e.
IA32_resourceType_MASK_n registers are programmed with customized values,
it has impacts to the whole system.
The per guest flag GUEST_FLAG_CLOS_REQUIRED suggests that CAT may be
enabled in some guests, but not in others who don't have this flag,
which is conceptually incorrect.
This patch removes GUEST_FLAG_CLOS_REQUIRED, and adds a new Kconfig
entry CAT_ENABLED for CAT enabling. When it's enabled, platform_clos_array[]
defines a set of system-wide Class of Service (COS, or CLOS), and the
per guest vm_configs[].clos associates the guest with particular CLOS.
Tracked-On: #2462
Signed-off-by: Zide Chen <zide.chen@intel.com>
1. Align the coding style for these MACROs
2. Align the values of fixed VECTORs
Tracked-On: #4348
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
is_polling_ioreq is more straightforward. Rename it.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'param' is BDF value instead of GPA when VHM driver
issues below 2 hypercalls:
- HC_ASSIGN_PTEDEV
- HC_DEASSIGN_PTDEV
This patch is to remove related code in hc_assign/deassign()
functions.
Tracked-On: #4334
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
SOS will use PCIe ECAM access PCIe external configuration space. HV should trap this
access for security(Now pre-launched VM doesn't want to support PCI ECAM; post-launched
VM trap PCIe ECAM access in DM).
Besides, update PCIe MMCONFIG region to be owned by hypervisor and expose and pass through
platform hide PCI devices by BIOS to SOS.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Use Enhanced Configuration Access Mechanism (MMIO) instead of PCI-compatible
Configuration Mechanism (IO port) to access PCIe Configuration Space
PCI-compatible Configuration Mechanism (IO port) access is used for UART in
debug version.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Sometimes HV wants to know if there are pending interrupts of one vcpu.
Add .has_pending_intr interface in acrn_apicv_ops and return the pending
interrupts status by check IRRs of apicv.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce two kinds of events for each vcpu,
VCPU_EVENT_IOREQ: for vcpu waiting for IO request completion
VCPU_EVENT_VIRTUAL_INTERRUPT: for vcpu waiting for virtual interrupts events
vcpu can wait for such events, and resume to run when the
event get signalled.
This patch also change IO request waiting/notifying to this way.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This simple event implemention can only support exclusive waiting
at same time. It mainly used by thread who want to wait for special event
happens.
Thread A who want to wait for some events calls
wait_event(struct sched_event *);
Thread B who can give the event signal calls
signal_event(struct sched_event *);
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Before we assign a PT device to post-launched VM, we should reset the PCI device
first. However, ACRN hypervisor doesn't plan to support PCIe hot-plug and doesn't
support PCIe bridge Secondary Bus Reset. So the PT device must support FLR or PM
reset. This patch do this check when assigning a PT device to post-launched VM.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Since we restore BAR values when writing Command Register if necessary. We don't
need to trap FLR and do the BAR restore then.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
When PCIe does Conventinal Reset or FLR, almost PCIe configurations and states will
lost. So we should save the configurations and states before do the reset and restore
them after the reset. This was done well by BIOS or Guest now. However, ACRN will trap
these access and handle them properly for security. Almost of these configurations and
states will be written to physical configuration space at last except for BAR values
for now. So we should do the restore for BAR values. One way is to do restore after
one type reset is detected. This will be too complex. Another way is to do the restore
when BIOS or guest tries to write the Command Register. This could work because:
1. The I/O Space Enable bit and Memory Space Enable bits in Command Register will reset
to zero.
2. Before BIOS or guest wants to enable these bits, the BAR couldn't be accessed.
3. So we could restore the BAR values before enable these bits if reset is detected.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Per SDM 10.12.5.1 vol.3, local APIC should keep LAPIC state after receiving
INIT. The local APIC ID register should also be preserved.
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The patch abstract a vcpu_reset_internal() api for internal usage, the
function would not touch any vcpu state transition and just do vcpu reset
processing. It will be called by create_vcpu() and reset_vcpu().
The reset_vcpu() will act as a public api and should be called
only when vcpu receive INIT or vm reset/resume from S3. It should not be
called when do shutdown_vm() or hcall_sos_offline_cpu(), so the patch remove
reset_vcpu() in shutdown_vm() and hcall_sos_offline_cpu().
The patch also introduced reset_mode enum so that vcpu and vlapic could do
different context operation according to different reset mode;
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some MACROs in lapic.h are duplicated with apicreg.h, and some MACROs are
never referenced, remove them.
Tracked-On: #4268
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Structure pci_vbar is used to define the virtual BAR rather than physical BAR.
It's better to name as pci_vbar.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add severity definitions for different scenarios. The static
guest severity is defined according to guest configurations.
Also add sanity check to make sure the severity for all guests
are correct.
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
For guest reset, if the highest severity guest reset will reset
system. There is vm flag to call out the highest severity guest
in specific scenario which is a static guest severity assignment.
There is case that the static highest severity guest is shutdown
and the highest severity guest should be transfer to other guest.
For example, in ISD scenario, if RTVM (static highest severity
guest) is shutdown, SOS should be highest severity guest instead.
The is_highest_severity_vm() is updated to detect highest severity
guest dynamically. And promote the highest severity guest reset
to system reset.
Also remove the GUEST_FLAG_HIGHEST_SEVERITY definition.
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
For system S5, ACRN had assumption that SOS shutdown will trigger
system shutdown. So the system shutdown logical is:
1. Trap SOS shutdown
2. Wait for all other guest shutdown
3. Shutdown system
The new logical is refined as:
If all guest is shutdown, shutdown whole system
Tracked-On: #4270
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
ACRN hypervisor should trap guest doing PCI AF FLR. Besides, it should save some status
before doing the FLR and restore them later, only BARs values for now.
This patch will trap guest Conventional PCI Advanced Features Control Register write
operation if the device supports Conventional PCI Advanced Features Capability and
check whether it wants to do device AF FLR. If it does, call pdev_do_flr to do the job.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
ACRN hypervisor should trap guest doing PCIe FLR. Besides, it should save some status
before doing the FLR and restore them later, only BARs values for now.
This patch will trap guest Device Capabilities Register write operation if the device
supports PCI Express Capability and check whether it wants to do device FLR. If it does,
call pdev_do_flr to do the job.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We don't use INIT signal notification method now. This patch
removes them.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
We have implemented a new notification method using NMI.
So replace the INIT notification method with the NMI one.
Then we can remove INIT notification related code later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
There is a window where we may miss the current request in the
notification period when the work flow is as the following:
CPUx + + CPUr
| |
| +--+
| | | Handle pending req
| <--+
+--+ |
| | Set req flag |
<--+ |
+------------------>---+
| Send NMI | | Handle NMI
| <--+
| |
| |
| +--> vCPU enter
| |
+ +
So, this patch enables the NMI-window exiting to trigger the next vmexit
once there is no "virtual-NMI blocking" after vCPU enter into VMX non-root
mode. Then we can process the pending request on time.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
The NMI for notification should not be inject to guest. So,
this patch drops NMI injection request when we use NMI
to notify vCPUs. Meanwhile, ACRN doesn't support vNMI well
and there is no well-designed way to check if the NMI is
for notification or for guest now. So, we take all the NMIs as
notificaton NMI for hard rtvm temporarily. It means that the
hard rtvm will never receive NMI with this patch applied.
TODO: vNMI support is not ready yet. we will add it later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
ACRN hypervisor needs to kick vCPU off VMX non-root mode to do some
operations in hypervisor, such as interrupt/exception injection, EPT
flush etc. For non lapic-pt vCPUs, we can use IPI to do so. But, it
doesn't work for lapic-pt vCPUs as the IPI will be injected to VMs
directly without vmexit.
Without the way to kick the vCPU off VMX non-root mode to handle pending
request on time, there may be fatal errors triggered.
1). Certain operation may not be carried out on time which may further
lead to fatal errors. Taking the EPT flush request as an example, once we
don't flush the EPT on time and the guest access the out-of-date EPT,
fatal error happens.
2). ACRN now will send an IPI with vector 0xF0 to target vCPU to kick the vCPU
off VMX non-root mode if it wants to do some operations on target vCPU.
However, this way doesn't work for lapic-pt vCPUs. The IPI will be delivered
to the guest directly without vmexit and the guest will receive a unexpected
interrupt. Consequently, if the guest can't handle this interrupt properly,
fatal error may happen.
The NMI can be used as the notification signal to kick the vCPU off VMX
non-root mode for lapic-pt vCPUs. So, this patch uses NMI as notification signal
to address the above issues for lapic-pt vCPUs.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Reserved bits in a 8-bit PAT field has been checked in pat_mem_type_invalid.
Remove this redundant check "(PAT_FIELD_RSV_BITS & field) != 0UL" in
write_pat_msr.
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
This patch adds a helper function send_single_nmi. The fisrt caller
will soon come with the following patch.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
This patch installs a NMI handler in acrn IDT to handle
NMIs out of dispatch_exception.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
In current architecutre, the maximum vCPUs number per VM could not
exceed the pCPUs number. Given the MAX_PCPU_NUM macro is provided
in board configurations, so remove the MAX_VCPUS_PER_VM from Kconfig
and add a macro of MAX_VCPUS_PER_VM to reference MAX_PCPU_NUM directly.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
rename the macro since MAX_PCPU_NUM could be parsed from board file and
it is not a configurable item anymore.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
remove 'guest_init_pml4' and 'tmp_pg_array' in vm_arch
since they are not used.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Add yield support for schedule, which can give up pcpu proactively.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
IO sensitive Round-robin scheduler aim to schedule threads with
round-robin policy. Meanwhile, we also enhance it with some fairness
configuration, such as thread will be scheduled out without properly
timeslice. IO request on thread will be handled in high priority.
This patch only add a skeleton for the sched_iorr scheduler.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
On some platforms, HPA regions for Virtual Machine can not be
contiguous because of E820 reserved type or PCI hole. In such
cases, pre-launched VMs need to be assigned non-contiguous memory
regions and this patch addresses it.
To keep things simple, current design has the following assumptions,
1. HPA2 always will be placed after HPA1
2. HPA1 and HPA2 don’t share a single ve820 entry.
(Create multiple entries if needed but not shared)
3. Only support 2 non-contiguous HPA regions (can extend
at a later point for multiple non-contiguous HPA)
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4195
Acked-by: Anthony Xu <anthony.xu@intel.com>
1. disable physical MSI before writing the virtual MSI CFG space
2. do the remap_vmsi if the guest wants to enable MSI or update MSI address or data
3. disable INTx and enable MSI after step 2.
The previous Message Control check depends on the guest write MSI Message Control
Register at the offset of Message Control Register. However, the guest could access
this register at the offset of MSI Capability ID register. This patch remove this
constraint. Also, The previous implementation didn't really disable MSI when guest
wanted to disable MSI.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
With cpu-sharing enabled, there are more than 1 vcpu on 1 pcpu, so the
smp_call handler should switch the vmcs to the target vcpu's vmcs. Then
get the info.
dump_vcpu_reg and dump_guest_mem should run on certain vmcs, otherwise,
there will be #GP error.
Renaming:
vcpu_dumpreg -> dump_vcpu_reg
switch_vmcs -> load_vmcs
Tracked-On: #4178
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Deterministic is important for RTVM. The mitigation for MCE on
Page Size Change converts a large page to 4KB pages runtimely during
the vmexit triggered by the instruction fetch in the large page.
These vmexits increase nondeterminacy, which should be avoided for RTVM.
This patch builds 4KB page mapping in EPT for RTVM to avoid these vmexits.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Only apply the software workaround on the models that might be
affected by MCE on page size change. For these models that are
known immune to the issue, the mitigation is turned off.
Atom processors are not afftected by the issue.
Also check the CPUID & MSR to check whether the model is immune to the issue:
CPU is not vulnerable when both CPUID.(EAX=07H,ECX=0H).EDX[29] and
IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] are 1.
Other cases not listed above, CPU may be vulnerable.
This patch also changes MACROs for MSR IA32_ARCH_CAPABILITIES bits to UL instead of U
since the MSR is 64bit.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After changing init_vmcs to smp call approach and do it before
launch_vcpu, it could work with noop scheduler. On real sharing
scheudler, it has problem.
pcpu0 pcpu1 pcpu1
vmBvcpu0 vmAvcpu1 vmBvcpu1
vmentry
init_vmcs(vmBvcpu1) vmexit->do_init_vmcs
corrupt current vmcs
vmentry fail
launch_vcpu(vmBvcpu1)
This patch mark a event flag when request vmcs init for specific vcpu. When
it is running and checking pending events, will do init_vmcs firstly.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The default PCI mmcfg base is stored in ACPI MCFG table, when
CONFIG_ACPI_PARSE_ENABLED is set, acpi_fixup() function will
parse and fix up the platform mmcfg base in ACRN boot stage;
when it is not set, platform mmcfg base will be initialized to
DEFAULT_PCI_MMCFG_BASE which generated by acrn-config tool;
Please note we will not support platform which has multiple PCI
segment groups.
Tracked-On: #4157
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xsave area:
legacy region: 512 bytes
xsave header: 64 bytes
extended region: < 3k bytes
So, pre-allocate 4k area for xsave. Use certain instruction to save or
restore the area according to hardware xsave feature set.
Tracked-On: #4166
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ptirq_msix_remap doesn't do the real remap, that's the vmsi_remap and vmsix_remap_entry
does. ptirq_msix_remap only did the preparation.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>