Issue description:
-----------------
Machine Check Error on Page Size Change
Instruction fetch may cause machine check error if page size
and memory type was changed without invalidation on some
processors[1][2]. Malicious guest kernel could trigger this issue.
This issue applies to both primary page table and extended page
tables (EPT), however the primary page table is controlled by
hypervisor only. This patch mitigates the situation in EPT.
Mitigation details:
------------------
Implement non-execute huge pages in EPT.
This patch series clears the execute permission (bit 2) in the
EPT entries for large pages. When EPT violation is triggered by
guest instruction fetch, hypervisor converts the large page to
smaller 4 KB pages and restore the execute permission, and then
re-execute the guest instruction.
The current patch turns on the mitigation by default.
The follow-up patches will conditionally turn on/off the feature
per processor model.
[1] Refer to erratum KBL002 in "7th Generation Intel Processor
Family and 8th Generation Intel Processor Family for U Quad Core
Platforms Specification Update"
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/7th-gen-core-family-spec-update.pdf
[2] Refer to erratum SKL002 in "6th Generation Intel Processor
Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
Tracked-On: #4120
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Cacheline is flushed on EPT entry change, no need to invalidate cache globally
when VM created per VM.
Tracked-On: #4120
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
EPT tables are shared by MMU and IOMMU.
Some IOMMUs don't support page-walk coherency, the cpu cache of EPT entires
should be flushed to memory after modifications, so that the modifications
are visible to the IOMMUs.
This patch adds a new interface to flush the cache of modified EPT entires.
There are different implementations for EPT/PPT entries:
- For PPT, there is no need to flush the cpu cache after update.
- For EPT, need to call iommu_flush_cache to make the modifications visible
to IOMMUs.
Tracked-On: #4120
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
VT-d shares the EPT tables as the second level translation tables.
For the IOMMUs that don't support page-walk coherecy, cpu cache should
be flushed for the IOMMU EPT entries that are modified.
For the current implementation, EPT tables for translating from GPA to HPA
for EPT/IOMMU are not modified after VM is created, so cpu cache invlidation is
done once per VM before starting execution of VM.
However, this may be changed, runtime EPT modification is possible.
When cpu cache of EPT entries is invalidated when modification, there is no need
invalidate cpu cache globally per VM.
This patch exports iommu_flush_cache for EPT entry cache invlidation operations.
- IOMMUs share the same copy of EPT table, cpu cache should be flushed if any of
the IOMMU active doesn't support page-walk coherency.
- In the context of ACRN, GPA to HPA mapping relationship is not changed after
VM created, skip flushing iotlb to avoid potential performance penalty.
Tracked-On: #4120
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
AP trampoline code should be accessile to hypervisor only,
Unmap this memory region from service VM's EPT mapping
for security reason..
Tracked-On: #4091
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
1. Print warning message instead of panic when
the caller try to modify the attribute for
memory region or delete memory region that
are not present.
2. To avoid above warning message for memory region
below 1M,its attribute may be updated by Service
VM when updating MTTR setting.
Tracked-On: #4091
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'pcpu_id' should be less than CONFIG_MAX_PCPU_NUM,
else 'per_cpu_data' will overflow. This commit fixes
this potential overflow issue.
Tracked-On: #3397
Signed-off-by: Tianhua Sun <tianhuax.s.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Possible buffer overflow will happen in vlapic_set_tmr()
and vlapic_update_ppr(),this path is to fix them.
Tracked-On: #1252
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The 'boot_params' and 'entry' might be dereferenced after they were
positively checked for NULL. Refine checking logic to fix the issue.
Tracked-On: #2979
Signed-off-by: Qi Yadong <yadong.qi@intel.com>
Acked-by: Zhu Bing <bing.zhu@intel.com>
In the presence of SOS, ACRN uses fallback_iommu_domain which is the same
used by SOS, to assign domain to devices during ACRN init. Also it uses
fallback_iommu_domain when DM requests ACRN to remove device from UOS domain.
This patch changes the design of assign/remove_iommu_device to avoid the
concept of fallback_iommu_domain and its setup. This way ACRN can commonly
treat pre-launched VMs bringup w.r.t. IOMMU domain creation.
Tracked-On: #2965
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
In the current design, logic partition scenario is supported
on KBL NUC i7 since there is no related configuration and
no the cooresponding boot loader supporting.
The boot loader supporting is done in the previous patch.
Add some configurations such physical PCI devices information,
virtual e820 table etc for KBL NUC i7 to enable logical
partition scenario.
In the logical partition of KBL NUC i7, there are two
pre-launched VM, this pre-launched VM doesn't support
local APIC passthrough now. The hypervisor is booted through
GRUB.
TODO: In future, Local APIC passthrough and some real time
fetures are needed for the logic partition scenario of KBL
NUC i7.
V5-->V6:
Update "Tracked-On"
Tracked-On: #2944
Signed-off-by: Xiangyang Wu <xiangyang.wu@linux.intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
add new hypercall get platform information,
such as physical CPU number.
Tracked-On: #2538
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN builds mptable for pre-launched VMs. It uses CONFIG_PARTITION_MODE
to compile mptable source code and related support. This patch removes
the macro and checks if the type of VM is pre-launched to build mptable.
Tracked-On: #2941
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently VM id of NORMAL_VM is allocated dymatically, we need to make
VM id statically for FuSa compliance.
This patch will pre-configure UUID for all VMs, then NORMAL_VM could
get its VM id/configuration from vm_configs array by indexing the UUID.
If UUID collisions is found in vm configs array, HV will refuse to
load the VM;
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Return true if vm configs is sanitized successfully, otherwise return false;
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The code mixed the usage on term of UUID and GUID, now use UUID to make
code more consistent, also will use lowercase (i.e. uuid) in variable name
definition.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1) In x2apic mode, when read ICR, we want to read a 64-bits value.
2) In x2apic mode, write self-IPI will trap out through MSR write when VID isn't enabled.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
We could call vlapic API directly, remove vlapic_rdmsr/wrmsr to make things easier.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Now the io_emul.c is relates with arch,io_req.c is common,
move some APIs from io_emul.c to io_req.c as common like these APIs:
register_pio/mmio_emulation_handler
dm_emulate_pio/mmio_complete
pio_default_read/write
mmio_default_access_handler
hv_emulate_pio/mmio etc
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move ‘emul_pio[]/default_io_read/default_io_write’
from struct vm_arch to struct acrn_vm
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- this api is related with arch_x86, then move to x86 folder
-- rename 'set_vhm_vector' to 'set_vhm_notification_vector'
-- rename 'acrn_vhm_vector' to 'acrn_vhm_notification_vector'
-- add an API 'get_vhm_notification_vector'
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For Pre-launched VMs, ACRN uses mptable for reporting APIC IDs to guest OS.
In current code, ACRN uses physical LAPIC IDs for vLAPIC IDs.
This patch is to let ACRN use vCPU id for vLAPIC IDs and also report the same
when building mptable. ACRN should still use physical LAPIC IDs for SOS
because host ACPI tables are passthru to SOS.
Tracked-On: #2934
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Now the MAX supported VM number is defined explicitly for each scenario,
so move this config from Kconfig to VM configuration.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously we use unified vm_config.c for all scenarios and use MACROs
for each configuration items, then the initialization of vm_configs[]
becomes more complicated when definition of MACROs increase, so change
the coding style that all configurable items could be explicitly shown in
vm_configuration.c to make code more readable.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As vector re-mapping is enabled for pre-launched/partition mode VMs,
there is no more need for separate interrupt routine i.e.
partition_mode_dispatch_interrupt.
Tracked-On: #2879
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
For pre-launched VMs MSI/MSI-x configuration writes are not intercepted by ACRN.
It is pass-thru and interrupts land in ACRN and the guest vector is injected into
the VM's vLAPIC. With this patch, ACRN intercepts MSI/MSI-x config writes and take
the code path to remap interrupt vector/APIC ID as it does for SOS/UOS.
Tracked-On: #2879
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
This patch mainly does the following:
- Replace prefix RT_VM_ with VIRTUAL_.
- Remove the check of "addr != RT_VM_PM1A_CNT_ADDR" as the handler is specific for this addr.
- Add comments about the meaning of return value.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Intel SDM Vol3 23.8 says:
The INIT signal is blocked whenever a logical processor is in VMX root operation.
It is not blocked in VMX nonroot operation. Instead, INITs cause VM exits
So, there is no side-effect to send INIT signal regardless of pcpu active status.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
All if . . else if constructs shall be
terminated with an else statement.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com
The pt_dev.c in board folder is replaced by the one in scenarios folder,
so remove them.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
After using get_vm_from_vmid(), vm pointer is always not NULL. But there are still many NULL pointer checks.
This commit replaced the NULL vm pointer check with a validation check which checks the vm status.
In addition, NULL check for pointer returned by get_sos_vm() and get_vm_config() is removed.
Tracked-On: #2520
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The CLOS is initialized to 0 for each scenarios. User could modify this
configuration in its vm_configurations.h;
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In this scenario, hypervisor will run two logical partition VMs.
Please note that the Kconfig of Hypervisor mode will be removed
gradually. In current Kconfig setting, the CONFIG_PARTITION_MODE
is still kept for now for back-compatibility.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the vm_configs[] is defined separately for sharing mode and
partition mode, but the concept of hypervisor mode will be removed. Instead
we will introduce scenario Kconfig for hypervisor to load different vm
configurations.
SDC(Software Defined Cockpit) is a typical scenario that ACRN supported
so we introduce this scenario for previously sharing mode and move its
configurations to scenarios/sdc folder. The configuration could be used
for all boards reference.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Use MACROs in pt_dev.c to replace straight-forward BDF numbers. The
pt devices for each VM will be chosen from Board specific PCI devices
list which defined in pci_devices.h;
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add the default handlers for PIO and MMIO access which returns all
FFs on read and discards write. These default handlers are registered
when SOS VM or pre-launched VM is created.
v3 -> v4:
- use single layer if in hv_emulate_pio
- change the implementation of pio_default_read
v2 -> v3:
- use runtime vm type instead of CONFIG_PARTITION_MODE
- revise the pio/mmio emulation functions
- revise the pio/mmio default read functions according to MISRA C
- revise the commit message
v1 -> v2:
- add default handlers members in struct acrn_vm and add interfaces
to register default handlers for PIO and MMIO.
Tracked-On: #2860
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When RTVM is trying to poweroff by itself, we use INIT to
kick vCPUs off the non-root mode.
For RTVM, only if vm state equal VM_POWERING_OFF, we take action to pause
the vCPUs with INIT signal. Otherwise, we will reject the pause request.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch makes make_reschedule_request support for kicking
off vCPU using INIT.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This API is only for kick vcpu out of non-root mode when
RTVM poweroff by itself. And the first caller will soon come
along with the next patch.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We set the vm state as VM_POWERING_OFF when RTVM is trying to poweroff by itself.
We will check it when trying to pause vCPUs of RTVM. Only if vm state equal to
VM_POWERING_OFF, we take action to pause the vCPUs of RTVM. Otherwise, we will
reject the pause request.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
The virtual pm port of RTVM is intercepted by HV. But the HV needs to inform the DM as well.
So we will forward the virtual S5 request to DM too
The handler in HV just set the pm state flag (VM_POWERING_OFF) which indicate that the RTVM is powering
off by itself. Meanwhile, there are data resources in VHM and DM should be released once we handle the PM
of RTVM in HV. So, return to DM to go through the entire VM destroy cycles to release the resources.
During the cycles, the DM will try to pause vm through hypercall. In the hypercall handler in HV, we will
check the pm state flag. If it is set, pause all the vCPUs of the vm. Otherwise, reject the request.
In this way, we can make sure that RTVM can only trigger its s5 by itself. All
other S5 request from external will be rejected.
Here is sequence chart of RTVM s5.
poweroff
+-----------+ +----------+ +-----------+ +----------+
| vBSP | | vAPs | | HV | | DM |
+-----+-----+ +----------+ +-----+-----+ +-----+----+
| | | |
| Stop all other cpus | | |
+----------------------------+ | |
| | |Disable LAPIC | |
| +<-+ | |
| | | |
| +--+ | |
| | |HLT in | |
| All other cpus stopped | |non-root mode | |
+----------------------------+ | |
| Call ACPI method to enter s5 | |
+-------------------------+---------------------> | |
| | Set s5 flag | |
| | <---------------------+ |
| | APs paused | Re-inject IOREQ TO DM
| | +-------------------> +-------------------> +
| | | Pause VM |
| | Check S5 flag: | <-------------------+
| | - If set, pause vm | VM paused |
| | - If no, reject | +-----------------> +--+
| | | Destroy VM | |Deinit works
| | | <--------------------<-+
| | | VM destroyed |
| | | +-----------------> |
+ + + +
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
This patch makes io_read_fn_t return true or false instead of void.
Returning true means that the handler in HV process the request completely.
Returning false means that we need to re-inject the request to DM after
processing it in HV.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
This patch makes io_write_fn_t return true or false instead of void.
Returning true means that the handler in HV process the request completely.
Returning false means that we need to re-inject the request to DM after
processing it in HV.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch checks if the GUEST_FLAG_RT is set when GUEST_FLAG_LAPIC_PASSTHROUGH is set.
If GUEST_FLAG_RT is not set while GUEST_FLAG_LAPIC_PASSTHROUGH is set, we will refuse
to boot the VM.
Meanwhile, this patch introduces a new API is_rt_vm.
Tracked-On: #2865
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In hypervisor fuzzing test, hypervisor will hang
if issuing HV_VM_SET_MEMORY_REGIONS hypercall after
target VM is destroyed.
this patch is to fix above vulnerability.
Tracked-On: #2849
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Since we always enable "Use TPR shadow", so operate on TPR will not
trigger VM exit. So remove these APIs.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>