In current architecutre, the maximum vCPUs number per VM could not
exceed the pCPUs number. Given the MAX_PCPU_NUM macro is provided
in board configurations, so remove the MAX_VCPUS_PER_VM from Kconfig
and add a macro of MAX_VCPUS_PER_VM to reference MAX_PCPU_NUM directly.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
rename the macro since MAX_PCPU_NUM could be parsed from board file and
it is not a configurable item anymore.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The initialization of "dmar_unit->gcmd" shall be done via reading from
Global Status Register rather than Global Command Register.
Rationale:
According to Chapter 10.4.4 Global Command Register in VT-d spec, Global Command
Register is a write-only register to control remapping hardware.
Global Status Register is the corresponding read-only register to report remapping
hardware status.
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
For now, we set NOOP scheduler as default. User can choose IORR scheduler as needed.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In APICv advanced mode, an targeted vCPU, running in non-root mode, may get outdated
TMR and EOI exit bitmap if another vCPU sends an interrupt to it if the trigger mode
of this interrupt has changed.
This patch try to kick vCPU off to let it get the latest TMR and EOI exit bitmap when
it enters non-root mode again if new coming interrupt trigger mode has changed. Then
fill the interrupt to PIR.
Tracked-On: #4200
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This patch updates kconfig to support server platforms
for increased number of VCPUs per VM and PT IRQ number.
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4196
On some platforms, HPA regions for Virtual Machine can not be
contiguous because of E820 reserved type or PCI hole. In such
cases, pre-launched VMs need to be assigned non-contiguous memory
regions and this patch addresses it.
To keep things simple, current design has the following assumptions,
1. HPA2 always will be placed after HPA1
2. HPA1 and HPA2 don’t share a single ve820 entry.
(Create multiple entries if needed but not shared)
3. Only support 2 non-contiguous HPA regions (can extend
at a later point for multiple non-contiguous HPA)
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tracked-On: #4195
Acked-by: Anthony Xu <anthony.xu@intel.com>
To handle reboot requests from pre-launched VMs that don't have
GUEST_FLAG_HIGHEST_SEVERITY, we shutdown the target VM explicitly
other than ignoring them.
Tracked-On: #2700
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
ptirq_prepare_msix_remap was called no matter whether MSI/MSI-X was enabled or not
and it passed zero to input parameter virtual MSI/MSI-X data field to indicate
MSI/MSI-X was disabled. However, it barely did nothing on this case.
Now ptirq_prepare_msix_remap is called only when MSI/MSI-X is enabled. It doesn't
need to check whether MSI/MSI-X is enabled or not by checking virtual MSI/MSI-X
data field.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
It's meaningless to sleep a non-running vcpu. Add a state check before
sleep the thread object of the vcpu.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu-sharing enabled, there are more than 1 vcpu on 1 pcpu, so the
smp_call handler should switch the vmcs to the target vcpu's vmcs. Then
get the info.
dump_vcpu_reg and dump_guest_mem should run on certain vmcs, otherwise,
there will be #GP error.
Renaming:
vcpu_dumpreg -> dump_vcpu_reg
switch_vmcs -> load_vmcs
Tracked-On: #4178
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We care more about leaf and subleaf of cpuid than vcpu_id.
So, this patch changes the cpuid trace-entry to trace the leaf
and subleaf of this cpuid vmexit.
Tracked-On: #4175
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
PMU is hidden from any guest, UD is expected when guest
try to execute 'rdpmc' instruction.
this patch sets 'RDPMC exiting' in Processorbased
VM-execution control.
Tracked-On: #3453
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Deterministic is important for RTVM. The mitigation for MCE on
Page Size Change converts a large page to 4KB pages runtimely during
the vmexit triggered by the instruction fetch in the large page.
These vmexits increase nondeterminacy, which should be avoided for RTVM.
This patch builds 4KB page mapping in EPT for RTVM to avoid these vmexits.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a option MCE_ON_PSC_WORKAROUND_DISABLED to disable the software
workaround for the issue Machine Check Error on Page Size Change.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Only apply the software workaround on the models that might be
affected by MCE on page size change. For these models that are
known immune to the issue, the mitigation is turned off.
Atom processors are not afftected by the issue.
Also check the CPUID & MSR to check whether the model is immune to the issue:
CPU is not vulnerable when both CPUID.(EAX=07H,ECX=0H).EDX[29] and
IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] are 1.
Other cases not listed above, CPU may be vulnerable.
This patch also changes MACROs for MSR IA32_ARCH_CAPABILITIES bits to UL instead of U
since the MSR is 64bit.
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After changing init_vmcs to smp call approach and do it before
launch_vcpu, it could work with noop scheduler. On real sharing
scheudler, it has problem.
pcpu0 pcpu1 pcpu1
vmBvcpu0 vmAvcpu1 vmBvcpu1
vmentry
init_vmcs(vmBvcpu1) vmexit->do_init_vmcs
corrupt current vmcs
vmentry fail
launch_vcpu(vmBvcpu1)
This patch mark a event flag when request vmcs init for specific vcpu. When
it is running and checking pending events, will do init_vmcs firstly.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The default PCI mmcfg base is stored in ACPI MCFG table, when
CONFIG_ACPI_PARSE_ENABLED is set, acpi_fixup() function will
parse and fix up the platform mmcfg base in ACRN boot stage;
when it is not set, platform mmcfg base will be initialized to
DEFAULT_PCI_MMCFG_BASE which generated by acrn-config tool;
Please note we will not support platform which has multiple PCI
segment groups.
Tracked-On: #4157
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Starting with TSC_DEADLINE msr interception disabled, the virtual TSC_DEADLINE msr is always 0.
When the interception is enabled, need to sync the physical TSC_DEADLINE value to virtual TSC_DEADLINE.
When the interception is disabled, there are 2 cases:
- if the timer hasn't expired, sync virtual TSC_DEADLINE to physical TSC_DEADLINE, to make the guest read the same tsc_deadline
as it writes. This may change when the timer actually trigger.
- if the timer has expired, write 0 to the virtual TSC_DEADLINE.
Tracked-On: #4162
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When write to virtual TSC_DEADLINE, if virtual TSC_ADJUST is not zero:
- when guest intends to disarm the tsc_deadline timer, should not arm the timer falsely;
- when guest intends to arm the tsc_deadline timer, should not disarm the timer falsely.
When read from virtual TSC_DEADLINE, if virtual TSC_ADJUST is not zero:
- if physical TSC_DEADLINE is not zero, return the virtual TSC_DEADLINE value;
- if physical TSC_DEADLINE is zero which means it's not armed (automatically disarmed after
timer triggered), return 0 and reset the virtual TSC_DEADLINE.
Tracked-On: #4162
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xsave area:
legacy region: 512 bytes
xsave header: 64 bytes
extended region: < 3k bytes
So, pre-allocate 4k area for xsave. Use certain instruction to save or
restore the area according to hardware xsave feature set.
Tracked-On: #4166
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ptirq_msix_remap doesn't do the real remap, that's the vmsi_remap and vmsix_remap_entry
does. ptirq_msix_remap only did the preparation.
Tracked-On: #3475
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add a Kconfig parameter called UEFI_OS_LOADER_NAME to hold the Service VM EFI
bootloader to be run by the ACRN hypervisor. A new string manipulation function
to convert from (char *) to (CHAR16 *) has been added to facilitate the
implementation.
The default value is set to systemd-boot (bootloaderx64.efi)
Tracked-On: #2793
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
On server platforms, DMAR DRHD device scope entries may contain PCI
bridges.
Bridges in the DRHD device scope indicate this IOMMU translates for all
devices on the hierarchy below that bridge.
ACRN is unaware of bridge types in the device scope, and adds these
directly to its internal representation of a DRHD. When looking up a BDF
within these DRHD entries, device_to_dmaru assumes all entries are
Endpoints, comparing BDF to BDF. Thus device to DMAR unit fails, because
it treats a bridge as an Endpoint type.
This change leverages prior patches by converting a BDF to the
associated device DRHD index, and uses that index to obtain the correct
DRHD state.
Handling a bridge in other ways may require maintaining a bus list for
each, or replacing each bridge in the dev scope with a set of all device
BDFs underneath it. Server platforms can have hundreds of PCI devices,
thus making the device scope artificially large is unwieldy.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ACRN does not support multiple PCI segments in its current form.
But VT-d module uses segment info in its interfaces and
hardcodes it to 0.
This patch cleans up everything related to segment to avoid
ambiguity.
Tracked-On: #4134
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
In later patches we use information from DMAR tables to guide discovery
and initialization of PCI devices.
Tracked-On: #4134
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Issue description:
-----------------
Machine Check Error on Page Size Change
Instruction fetch may cause machine check error if page size
and memory type was changed without invalidation on some
processors[1][2]. Malicious guest kernel could trigger this issue.
This issue applies to both primary page table and extended page
tables (EPT), however the primary page table is controlled by
hypervisor only. This patch mitigates the situation in EPT.
Mitigation details:
------------------
Implement non-execute huge pages in EPT.
This patch series clears the execute permission (bit 2) in the
EPT entries for large pages. When EPT violation is triggered by
guest instruction fetch, hypervisor converts the large page to
smaller 4 KB pages and restore the execute permission, and then
re-execute the guest instruction.
The current patch turns on the mitigation by default.
The follow-up patches will conditionally turn on/off the feature
per processor model.
[1] Refer to erratum KBL002 in "7th Generation Intel Processor
Family and 8th Generation Intel Processor Family for U Quad Core
Platforms Specification Update"
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/7th-gen-core-family-spec-update.pdf
[2] Refer to erratum SKL002 in "6th Generation Intel Processor
Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
Tracked-On: #4101
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
In non-64-bit mode, CS segment base address should be considered when
determining the linear address of the vcpu's instruction pointer. Use
vie_calculate_gla() for instruction address translation which also takes
care of 64-bit mode.
Tracked-On: #4064
Signed-off-by: Peter Fang <peter.fang@intel.com>
MISRA C requires specified bounds for arrays declaration, previous declaration
of platform_clos_array in board.h does not meet the requirement.
Tracked-On: #3987
Signed-off-by: Victor Sun <victor.sun@intel.com>
Remove redundant DMAR MACROs for given platform_acpi_info files because
CONFIG_ACPI_PARSE_ENABLED is enabled for all boards by default. The DMAR
info for nuc7i7dnb is kept as reference in the case that ACPI_PARSE_ENABLED
is not set in Kconfig.
As DMAR info is not provided for apl-mrb, the platform_acpi_info.h under
apl-mrb config folder is meaningless, so also remove this file and let
hypervisor parse ACPI for apl-mrb;
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The DMAR info is board specific so move the structure definition to board.c.
As a configruation file, the whole board.c could be generated by acrn-config
tool for each board.
Please note we only provide DMAR info MACROs for nuc7i7dnb board. For other
boards, ACPI_PARSE_ENABLED must be set to y in Kconfig to let hypervisor parse
DMAR info, or use acrn-config tool to generate DMAR info MACROs if user won't
enable ACPI parse code for FuSa consideration.
The patch also moves the function of get_dmar_info() to vtd.c, so dmar_info.c
could be removed.
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The value of CONFIG_MAX_IOMMU and MAX_DRHDS are identical to DRHD_COUNT
which defined in platform ACPI table, so remove CONFIG_MAX_IOMMU_NUM
from Kconfig and link these three MACROs together.
Tracked-On: #3977
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hypcall_id has a type of uint64_t and should use 'llx' as
formatting flag instead of '%d'. Otherwise, we will get a
confusing error log when not-allowed hypercall occurs.
Without this patch:
[96707209us][cpu=1][sev=3][seq=2386]:hypercall -2147483548 is only allowed from SOS_VM!
With this patch:
[84613395us][cpu=1][sev=3][seq=2136]:hypercall 0x80000064 is only allowed from SOS_VM!
So, we can figure out which not-allowed hypercall has been triggered more conveniently.
BTW, this patch adds hypcall_id which triggered from non-ring0 into error log.
Tracked-On: #4012
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Now the default board memory size is 16 GB. However, ACRN support more and more boards
which may have memory size large than 16 GB. This patch try to filter e820 table which
is over top address space.
Tracked-On: #4007
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Now the e820 structure store ACRN HV memory layout, not the physical memory layout.
Rename e820 to hv_hv_e820 to show this explicitly.
Tracked-On: #4007
Signed-off-by: Li Fei1 <fei1.li@intel.com>
AP trampoline code should be accessible
to hypervisor only, this patch is to unmap
this region from service VM's EPT for security
reason.
Tracked-On: #3992
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Print warning message instead of ASSERT when
the caller try to modify the attribute for
memory region that is not present.
2. To avoid above warning message for memory region
below 1M,its attribute may be updated by Service
VM when updating MTTR setting.
Tracked-On: #3992
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We should use INIT signal to notify the vcpu threads when
powering off the hard RTVM. To achive this, we should set
the vcpu->thread_obj.notify_mode as SCHED_NOTIFY_INIT.
Patch (27163df9 hv: sched: add sleep/wake for thread object)
tries to set the notify_mode according `is_lapic_pt_enabled(vcpu)`
in function prepare_vcpu. But at this point, the is_lapic_pt_enabled(vcpu)
will always return false. Consequently, it will set notify_mode
as SCHED_NOTIFY_IPI. Then leads to the failure of powering off
hard RTVM.
This patch fixes it by:
- Initialize the notify_mode as SCHED_NOTIFY_IPI in prepare_vcpu.
- Set notify_mode as SCHED_NOTIFY_INIT after guest is trying to
enable x2apic mode of passthru lapic.
Tracked-On: #3975
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
After adding PCI BAR remap support, mmio_node may unregister when there's others
access it. This patch add a lock to protect mmio_node access.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Since guest could re-program PCI device MSI-X table BAR, we should add mmio
emulation handler unregister.
However, after add unregister_mmio_emulation_handler API, emul_mmio_regions
is no longer accurate. Just replace it with max_emul_mmio_regions which records
the max index of the emul_mmio_node.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
In theory, guest could re-program PCI BAR address to any address. However, ACRN
hypervisor only support [0, top_address_space) EPT memory mapping. So we need to
check whether the PCI BAR re-program address is within this scope.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN currently uses 2MB large pages in the page tables setup
for trampoline code and data. This patch lets ACRN use 1GB large
pages instead.
When it comes to fixing symbols in trampoline code, fixing pointers
in PDPT is no more needed as PDPT PTEs contain Physical Address.
Tracked-On: #3899
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
kick means to notify one thread_object. If the target thread object is
running, send a IPI to notify it; if the target thread object is
runnable, make reschedule on it.
Also add kick_vcpu API in vcpu layer to notify vcpu.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch decouple some scheduling logic and abstract into a scheduler.
Then we have scheduler, schedule framework. From modulization
perspective, schedule framework provides some APIs for other layers to
use, also interact with scheduler through scheduler interaces.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- The default behaviors of PIO & MMIO handlers are same
for all VMs, no need to expose dedicated APIs to register
default hanlders for SOS and prelaunched VM.
Tracked-On: #3904
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently the parameter of init_ept_mem_ops is
'struct acrn_vm *vm' for this api,change it to
'struct memory_ops *mem_ops' and 'vm_id' to avoid
the reversed dependency, page.c is hardware layer and vm structure
is its upper-layer stuff.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Let init thread end with run_idle_thread(), then idle thread take over and
start to do scheduling.
Change enter_guest_mode() to init_guest_mode() as run_idle_thread() is removed
out of it. Also add run_thread() in schedule module to run
thread_object's thread loop directly.
rename: switch_to_idle -> run_idle_thread
Tracked-On: #3813
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
sleep one thread_object means to prevent it from being scheduled.
wake one thread_object is an opposite operation of sleep.
This patch also add notify_mode in thread_object to indicate how to
deliver the request.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
schedule infrastructure is per pcpu, so move its initialization to each
pcpu's initialization.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To support cpu sharing, multiple vcpu can run on same pcpu. We need do
necessary vcpu context switch. This patch add below actions in context
switch.
1) fxsave/fxrstor;
2) save/restore MSRs: MSR_IA32_STAR, MSR_IA32_LSTAR,
MSR_IA32_FMASK, MSR_IA32_KERNEL_GS_BASE;
3) switch vmcs.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu sharing enabled, per_cpu vcpu cannot work properly as we might
has multiple vcpus running on one pcpu.
Add a schedule API sched_get_current to get current thread_object on
specific pcpu, also add a vcpu API get_running_vcpu to get corresponding
vcpu of the thread_object.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With cpu sharing enabled, we will map acrn_vcpu to thread_object
in scheduling. From modulization perspective, we'd better hide the
pcpu_id in acrn_vcpu and move it to thread_object.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu thread's stack shouldn't follow reset_vcpu to reset.
There is also a bug here:
while vcpu B thread set vcpu->running to false, other vcpu A thread
will treat the vcpu B is paused while it has not been switch out
completely, then reset_vcpu will reset the vcpu B thread's stack and
corrupt its running context.
This patch will remove the vcpu thread's stack reset from reset_vcpu.
With the change, we need do init_vmcs between vcpu startup address be
settled and scheduled in. And switch_to_idle() is not needed anymore
as S3 thread's stack will not be reset.
Tracked-On: #3813
Signed-off-by: Fengwei Yin <fengwei.yin@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Two time related synthetic MSRs are implemented in this patch. Both of
them are partition wide MSR.
- HV_X64_MSR_TIME_REF_COUNT is read only and it is used to return the
partition's reference counter value in 100ns units.
- HV_X64_MSR_REFERENCE_TSC is used to set/get the reference TSC page,
a sequence number, an offset and a multiplier are defined in this
page by hypervisor and guest OS can use them to calculate the
normalized reference time since partition creation, in 100ns units.
Tracked-On: #3831
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This patch implements the minimum set of TLFS functionality. It
includes 6 vCPUID leaves and 3 vMSRs.
- 0x40000001 Hypervisor Vendor-Neutral Interface Identification
- 0x40000002 Hypervisor System Identity
- 0x40000003 Hypervisor Feature Identification
- 0x40000004 Implementation Recommendations
- 0x40000005 Hypervisor Implementation Limits
- 0x40000006 Implementation Hardware Features
- HV_X64_MSR_GUEST_OS_ID Reporting the guest OS identity
- HV_X64_MSR_HYPERCALL Establishing the hypercall interface
- HV_X64_MSR_VP_INDEX Retrieve the vCPU ID from hypervisor
Tracked-On: #3832
Signed-off-by: wenwumax <wenwux.ma@intel.com>
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Consider the following case when TPR shadow is used with vlapic
basic mode:
1) 2 interrupts are pending in vlapic. INTa's priority > TPR and
INTb's priority <= TPR.
2) TPR threshold is set to zero and INTa is injected to guest.
3) Guest set TPR to the priority of INTa.
4) EOI of INTa. PPR is updated to TPR which equals INTa's priority.
INTb cannot be injected because its priority <= PPR.
5) Guest set TPR to zero. Because TPR threshold is still zero, there is
no TPR threshold vmexit. But since both TPR and ISRV are zero at
this time, the PPR is zero as well. INTb still cannot be injected.
This is a bug.
By adding vcpu_make_request(vlapic->vcpu, ACRN_REQUEST_EVENT) in EOI,
TPR threshold will be updated before vm_resume.
Tracked-On: #3795
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently we are using a 1:1 mapping logic for pcpu:vcpu. So don't need
a runqueue for it. Removing it as preparation work to abstract scheduler
framework.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
PRMRR related MSRs need to be configured by platform BIOS / bootloader.
These settings are not allowed to be changed by guest.
VMs currently have no requirement to access these MSRs even when vSGX is enabled.
So, this patch disables PRMRR related MSRs in VM.
Tracked-On: #3739
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
--remove unnecessary includes
--remove unnecssary forward-declaration for 'struct vhm_request'
Tracked-On: #861
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
When a VM is configured with LAPIC PT mode and its vCPU is in x2APIC
mode, the corresponding pCPU needs to be reset during VM shutdown/reset
as its physical LAPIC was used by its guest.
This commit fixes an issue where this reset never happens.
is_lapic_pt_enabled() needs to be called before reset_vcpu() to be able
to correctly reflect a vCPU's APIC mode.
A vCPU with LAPIC PT mode but in xAPIC mode does not require such reset,
since its physical LAPIC was not touched by its guest directly.
v2 -> v3:
- refine edge case detection logic
v1 -> v2:
- use a separate function to return the bitmap of LAPIC PT enabled pCPUs
Tracked-On: #3708
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Jack Ren <jack.ren@intel.com>
it uses builtin function(__builtin_popcountl)in bitmap_weight(),
it will use the 'popcnt' instruction,
this patch enable 'popcnt' instruction support in Makefile
Tracked-On: #3663
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
- update the function argument type to union
Declaring argument as pointer is not necessary since it
only does the comparison.
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
To enable static configuration of different scenarios, we configure VMs
in HV code and prepare all nesserary resources for this VM in create VM
hypercall. It means when we create one VM through hypercall, HV will
read all its configuration and run it automatically.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As we introduced vcpu_affinity[] to assign vcpus to different pcpus, the
old policy and functions are not needed. Remove them.
Tracked-On: #3663
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add this vcpu_affinity[] for each VM to indicate the assignment policy.
With it, pcpu_bitmap is not needed, so remove it from vm_config.
Instead, vcpu_affinity is a must for each VM.
This patch also add some sanitize check of vcpu_affinity[]. Here are
some rules:
1) only one bit can be set for each vcpu_affinity of vcpu.
2) two vcpus in same VM cannot be set with same vcpu_affinity.
3) vcpu_affinity cannot be set to the pcpu which used by pre-launched VM.
v4: config SDC with CONFIG_MAX_KATA_VM_NUM
v5: config SDC with CONFIG_MAX_PCPU_NUM
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
There is plan that define each VM configuration statically in HV and let
DM just do VM creating and destroying. So DM need get vcpu_num
information when VM creating.
This patch return the vcpu_num via the API param. And also initial the
VMs' cpu_num for existing scenarios.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
MAX_PCPU_NUM is different on various BOARDs. So we move the generic
definition from Kconfig to each board's config header file.
Tracked-On: #3663
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Now, we create vcpus while VM being created in hypervisor. The
create vcpu hypercall will not be used any more. For compatbility,
keep the hypercall HC_CREATE_VCPU do nothing.
v4: Don't remove HC_CREATE_VCPU hypercall, let it do nothing.
Tracked-On: #3663
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
it will panic if phys_cpu_num > CONFIG_MAX_PCPU_NUM
during init_pcpu_pre,after that no need to check it again.
Tracked-On: #861
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Add " with acrn-config" tag in build info when user build hypervisor with
acrn-config xmls would be helpful to identify the hypervisor configuration
in current build is from acrn-config xml or from source code.
Tracked-On: #3602
Signed-off-by: Victor Sun <victor.sun@intel.com>
TSC would be reset to 0 when enter suspend state on some platform.
This will fail the secure timer checking in secure world because
secure world leverage the TSC as source of secure timer which should
be increased monotonously.
This patch save/restore TSC in host suspend/resume path to guarantee
the mono increasing TSC.
Note: There should no timer setup before TSC resumed.
Tracked-On: #3697
Signed-off-by: Qi Yadong <yadong.qi@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reserve memory for hv sbuf to avoid its possible overwriting on kernel memory.
For apl-up2, move hv_log address to 0x5de00000 to avoid possible conflict with
HV_RAM which start from 0x5e000000;
For nuc6cayh, move HV_RAM_START to 0x20000000 to avoid possible conflict with
hv_log which start from 0x1fe00000;
Tracked-On: #3533
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Now the structures(run_context & ext_context) are defined
in vcpu.h,and they are used in the lower-layer modules(wakeup.S),
this patch move down the structures from vcpu.h to cpu.h
to avoid reversed dependency.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now the structures(union source & struct intr_source) are defined
in ptdev.h,they are used in vtd.c and assign.c,
vtd is the hardware layer and ptdev is the upper-layer module
from the modularization perspective, this patch move down
these structures to avoid reversed dependency.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Using per_cpu list to record ptdev interrupts is more reasonable than
recording them per-vm. It makes dispatching such interrupts more easier
as we now do it in softirq which happens following interrupt context of
each pcpu.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
From modulization perspective, it's not suitable to put pcpu and vm
related request operations in schedule. So move them to pcpu and vm
module respectively. Also change need_offline return value to bool.
Tracked-On: #3663
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Now, we have assumption that SOS control whether the platform
should enter S5 or not. So when SOS tries enter S5, we just
forward the S5 request to native port which make sure platform
S5 is totally aligned with SOS S5.
With higher serverity guest introduced,this assumption is not
true any more. We need to extend the platform S5 process to
handle higher severity guest:
- For DM launched RTVM, we need to make sure these guests
is off before put the whole platfrom to S5.
- For pre-launched VM, there are two cases:
* if os running in it support S5, we wait for guests off.
* if os running in it doesn't support S5, we expect it
will invoke one hypercall to notify HV to shutdown it.
NOTE: this case is not supported yet. Will add it in the
future.
Tracked-On: #3564
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
do_acpi_s3 actually not limit to do s3 operation. It depends on
the paramters pm1a_cnt_val and pm1b_cnt_val. It could be s3/s5.
Update the function name from xx_s3 to xx_sx.
Tracked-On: #3564
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently nuc7i7dnb board is using default platform acpi info file so causes
S3/S5 not working properly.
This patch updates the correct ACPI info for nuc7i7dnb board.
Tracked-On: #3609
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
When we support PCI MSI-X table BAR remapping, we may re-delete the MSI-X table BAR
region. This patch removes strict check for deleting page table mapping.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
The current implement only do "only add a page table mapping for a region when
it's not mapped" check when this page table entry is a PTE entry. However, it
need to do this check for PDPTE entry and PDE entry too.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
pcpu_active_bitmap was read continuously in wait_pcpus_offline(),
acrn_vcpu->running was read continuously in pause_vcpu(),
add volatile keyword to ensure that such accesses are not
optimised away by the complier.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
We did following to do platform reset:
1. Try ACPI reset first if it's available
2. Then try 0xcf9 reset method
3. if 2 fails, try keyboard reset method
This introduces some timing concern which needs be handled carefully.
We change it by following:
assume the platforms which ACRN could be run on must support either
ACPI reset or 0xcf9 reset. And simplify platform reset operation
a little bit:
If ACPI reset register is generated
try ACPI reset
else
try 0xcf9 reset method
Tracked-On: #3609
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
change the input parameter from vcpu to eptp in order to let this api
more generic, no need to care normal world or secure world.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Currently, the clos id of the cpu cores in vmx root mode is the same as non-root mode.
For RTVM, if hypervisor share the same clos id with non-root mode, the cacheline may
be polluted due to the hypervisor code execution when vmexit.
The patch adds hv_clos in vm_configurations.c
Hypervisor initializes clos setting according to hv_clos during physical cpu cores initialization.
For RTVM, MSR auto load/store areas are used to switch different settings for VMX root/non-root
mode for RTVM.
Tracked-On: #2462
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
-- remove some unnecessary includes
-- fix a typo
-- remove unnecessary void before launch_vms
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Fix tsc_deadline issue by trapping TSC_DEADLINE msr write if VMX_TSC_OFFSET is not 0.
Because there is an assupmtion in the ACRN vART design that pTSC_Adjust and vTSC_Adjust are
both 0.
We can leave the TSC_DEADLINE write pass-through without correctness issue becuase there is
no offset between the pTSC and vTSC, and there is no write to vTSC or vTSC_Adjust write observed
in the RTOS so far.
This commit fix the potential correctness issue, but the RT performance will be badly affected
if vTSC or vTSC_Adjust was not zero, which we will address if such case happened.
Tracked-On: #3636
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN HV is designed/implemented with "invariant TSC" capability, which wasn't checked at boot time.
This commit adds the "invairant TSC" detection, ACRN fails to boot if there wasn't "invariant TSC" capability.
Tracked-On: #3636
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ROOT_ENTRY_LOWER_CTP_MASK shall be (0xFFFFFFFFFFFFFUL << ROOT_ENTRY_LOWER_CTP_POS)
rather than (0xFFFFFFFFFFFFFUL).
Rationale:
CTP is bits 63:12 in a root entry according to Chapter 9.1 Root Entry in
VT-d spec.
Similarly, update ROOT_ENTRY_LOWER_PRESENT_MASK to keep the coding style
consistent.
CTX_ENTRY_UPPER_DID_MASK shall be (0xFFFFUL << CTX_ENTRY_UPPER_DID_POS)
rather than (0x3FUL << CTX_ENTRY_UPPER_DID_POS).
Rationale:
DID is bits 87:72 in a context entry according to Chapter 9.3 Context
Entry in VT-d spec. It takes 16 bits rather than 6 bits.
Tracked-On: #3626
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now that ACPI is enabled for pre-launched VMs, we can remove all mptable code.
Tracked-On: #3601
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Statically define the per vm RSDP/XSDT/MADT ACPI template tables in vacpi.c,
RSDP/XSDT tables are copied to guest physical memory after checksum is
calculated. For MADT table, first fix up process id/lapic id in its lapic
subtable, then the MADT table's checksum is calculated before it is copies to
guest physical memory.
Add 8-bit checksum function in util.h
Tracked-On: #3601
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Cacheline is flushed on EPT entry change, no need to invalidate cache globally
when VM created per VM.
Tracked-On: #3607
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
EPT tables are shared by MMU and IOMMU.
Some IOMMUs don't support page-walk coherency, the cpu cache of EPT entires
should be flushed to memory after modifications, so that the modifications
are visible to the IOMMUs.
This patch adds a new interface to flush the cache of modified EPT entires.
There are different implementations for EPT/PPT entries:
- For PPT, there is no need to flush the cpu cache after update.
- For EPT, need to call iommu_flush_cache to make the modifications visible
to IOMMUs.
Tracked-On: #3607
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
VT-d shares the EPT tables as the second level translation tables.
For the IOMMUs that don't support page-walk coherecy, cpu cache should
be flushed for the IOMMU EPT entries that are modified.
For the current implementation, EPT tables for translating from GPA to HPA
for EPT/IOMMU are not modified after VM is created, so cpu cache invlidation is
done once per VM before starting execution of VM.
However, this may be changed, runtime EPT modification is possible.
When cpu cache of EPT entries is invalidated when modification, there is no need
invalidate cpu cache globally per VM.
This patch exports iommu_flush_cache for EPT entry cache invlidation operations.
- IOMMUs share the same copy of EPT table, cpu cache should be flushed if any of
the IOMMU active doesn't support page-walk coherency.
- In the context of ACRN, GPA to HPA mapping relationship is not changed after
VM created, skip flushing iotlb to avoid potential performance penalty.
Tracked-On: #3607
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Reviewed-by: Anthony Xu <anthony.xu@intel.com>
-- move 'RFLAGS_AC' to cpu.h
-- move 'VMX_SUPPORT_UNRESTRICTED_GUEST' to msr.h
and rename it to 'MSR_IA32_MISC_UNRESTRICTED_GUEST'
-- move 'get_vcpu_mode' to vcpu.h
-- remove deadcode 'vmx_eoi_exit()'
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
move some data structures and APIs related host reset
from vm_reset.c to pm.c, these are not related with guest.
Tracked-On: #1842
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The vCOM2 of each VM is designed for VM communication, one VM could send
command or request to another VM through this channel. The feature will
be used for system S3/S5 implementation.
On Hybird scenario, vCOM2 of pre-launched VM will connect to vCOM2 of SOS_VM;
On Industry scenario, vCOM2 of post-launched RTVM will connect to vCOM2 of
SOS_VM.
Tracked-On: #3602
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
The settings of SOS VM COM1 which is used for console is board specific,
and this result in SOS VM COM2 which used for VM communication is also
board specific, so move the configure method from Kconfig to board configs
folder. The MACRO definition will be handled by acrn-config tool in future.
Tracked-On: #3602
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Set sos root device of apl-up2 to mmcblk0p3 and let UP2 uefi variant
and sbl variant share one config for now.
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
Now, we use native gdt saved in boot context for guest and assume
it could be put to same address of guest. But it may not be true
after the pre-launched VM is introduced. The gdt for guest could
be overwritten by guest images.
This patch make 32bit protect mode boot not use saved boot context.
Insteadly, we use predefined vcpu_regs value for protect guest to
initialize the guest bsp registers and copy pre-defined gdt table
to a safe place of guest memory to avoid gdt table overwritten by
guest images.
Tracked-On: #3532
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently, 'flags' is defined and set but never be used
in the flow of handling i/o request after then.
Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In the definition of port i/o handler, struct acrn_vm * pointer
is redundant as input, as context of acrn_vm is aleady linked
in struct acrn_vcpu * by vcpu->vm, 'vm' is not required as input.
this patch removes argument '*vm' from 'io_read_fn_t' &
'io_write_fn_t', use '*vcpu' for them instead.
Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Check whether the address area pointed by the guest
cr3 is valid or not before loading pdptrs. Inject #GP(0)
to guest if there are any invalid cases.
Tracked-On: #3572
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
fix violations touched below:
1.Cast operation on a constant value
2.signed/unsigned implicity conversion
3.return value unused.
V1->V2:
1.bitmap api will return boolean type, not need to check "!= 0", deleted.
2.The behaves ~(uint32_t)X and (uint32_t)~X are not defined in ACRN hypervisor Coding Guidelines,
removed the change of it.
Tracked-On: #861
Signed-off-by: Huihuang Shi <huihuang.shi@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch moves vmx_rdmsr_pat/vmx_wrmsr_pat from vmcs.c to vmsr.c,
so that these two functions would become internal functions inside
vmsr.c.
This approach improves the modularity.
v1 -> v2:
* remove 'vmx_rdmsr_pat'
* rename 'vmx_wrmsr_pat' with 'write_pat_msr'
Tracked-On: #1842
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add a field (vdev_ops) in struct acrn_vm_pci_dev_config to configure a PCI CFG
operation for an emulated PCI device. Use pci_pt_dev_ops for PCI_DEV_TYPE_PTDEV
by default if there's no such configure.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add emulated PCI device configure for SOS to prepare for add support for customizing
special pci operations for each emulated PCI device.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Create an iommu domain for all guest in vpci_init no matter if there's a PTDev
in it.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Align SOS pci device configure with pre-launched VM and filter pre-launched VM's
PCI PT device from SOS pci device configure.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
For non-trusty hypercalls, HV should inject #GP(0) to vCPU if they are
from non-ring0 or inject #UD if they are from ring0 of non-SOS. Also
we should not modify RAX of vCPU for these invalid vmcalls.
Tracked-On: #3497
Signed-off-by: Victor Sun <victor.sun@intel.com>
In current code, the timer_list for per cpu can be accessed both in
vmexit and softirq handler. There is a case that, the timer_list is
modifying in vmexit, but an interrupt occur, the timer_list is also
modified in softirq handler. So the time_list may in unpredictable
state. In some platforms, the hv console may hang as its timer handler
is not invoked because of the corruption for timer_list.
So, to fix the issue, disable the interrupt before modifying the
timer_list.
Tracked-On: #3512
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
In some case, guest need to get more information under virtual environment,
like guest capabilities. Basically this could be done by hypercalls, but
hypercalls are designed for trusted VM/SOS VM, We need a machenism to report
these information for normal VMs. In this patch, vCPUID leaf 0x40000001 will
be used to satisfy this needs that report some extended information for guest
by CPUID.
Tracked-On: #3498
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The policy of vART is that software in native can run in
VM too. And in native side, the relationship between the
ART hardware and TSC is:
pTSC = (pART * M) / N + pAdjust
The vART solution is:
- Present the ART capability to guest through CPUID leaf
15H for M/N which identical to the physical values.
- PT devices see the pART (vART = pART).
- Guest expect: vTSC = vART * M / N + vAdjust.
- VMCS.OFFSET = vTSC - pTSC = vAdjust - pAdjust.
So to support vART, we should do the following:
1. if vAdjust and vTSC are changed by guest, we should change
VMCS.OFFSET accordingly.
2. Make the assumption that the pAjust is never touched by ACRN.
For #1, commit "a958fea hv: emulate IA32_TSC_ADJUST MSR" has implementation
it. And for #2, acrn never touch pAdjust.
--
v2 -> v3:
- Add comment when handle guest TSC_ADJUST and TSC accessing.
- Initialize the VMCS.OFFSET = vAdjust - pAdjust.
v1 -> v2
Refine commit message to describe the whole vART solution.
Tracked-On: #3501
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The CONFIG_BOARD value in defconfig should match with Makefile, otherwise
the build might be failed in some condition.
Tracked-On: #2291
Signed-off-by: Victor Sun <victor.sun@intel.com>
Currently the Px Cx supported SoCs which listed in cpu_state_tbl.c is limited,
and it is not a wise option to build a huge state table data base to support
Px/Cx for other SoCs. This patch give a alternative solution that build a board
specific cpu state table in board.c which could be auto-generated by offline
tool, then the CPU Px/Cx of customer board could be enabled;
Hypervisor will search the cpu state table in cpu_state_tbl[] first, if not
found then go check board_cpu_state_tbl. If no matched cpu state table is found
then Px/Cx will not be supported;
Tracked-On: #3477
Signed-off-by: Victor Sun <victor.sun@intel.com>
After "commit f0e1c5e init vcpu host stack when reset vcpu", SOS resume form S3
wants to schedule to vcpu_thread not the point where SOS enter S3. So we should
schedule to idel first then reschedule to execute vcpu_thread.
Tracked-On: #3387
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
When monitor/mwait is not supported, it still uses the inline assembly in
wait_sync_change. As it is not allowed based on MISRA-C, the asm wrapper
is used for pause scenario in wait_sync_change.
Tracked-On: #3442
Suggested-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Based on SDM Vol2 the monitor uses the RAX register to setup the address
monitored by HW. The mwait uses the rax/rcx as the hints that the process
will enter. It is incorrect that the same value is used for monitor/mwait.
The ecx in mwait specifies the optional externsions.
At the same time it needs to check whether the the value of monitored addr
is already expected before entering mwait. Otherwise it will have possible
lockup.
V1->V2: Add the asm wrappper of monitor/mwait to avoid the mixed usage of
inline assembly in wait_sync_change
v2-v3: Remove the unnecessary line break in asm_monitor/asm_mwait.
Follow Fei's comment to remove the mwait ecx hint setting that
treats the interrupt as break event. It only needs to check whether the
value of psync_change is already expected.
Tracked-On: #3442
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
When need hpa and hva translation before init_paging, we need hpa2hva_early and
hva2hpa_early since init_paging may modify hva2hpa to not be identical mapping.
Tracked-On: #2987
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
1) Using printf to warn if platform ram size configuration is wrong.
2) Using printf to warn if the platform is not supported by ACRN hypervisor.
Tracked-On: #2987
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Enable uart as early as possible to make things easier for debugging.
After this we could use printf to output information to the uart. As for
pr_xxx APIs, they start to work when init_logmsg is called.
Tracked-On: #2987
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
PMC is hidden from guest and hypervisor should
inject UD to guest when 'rdpmc' vmexit.
Tracked-On: #3453
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Per SDM, writing 0 to MSR_IA32_MCG_STATUS is allowed, HV should not
return -EACCES on this case;
Tracked-On: #3454
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The vbar info which hard-coded in scenarios/logical_partition/pt_dev.c
is board specific actually, so move these information to
arch/x86/configs/$(CONFIG_BOARD)/pci_devices.h.
Please be aware that the memory range of vBAR should exactly match with
the e820 layout of VM.
Tracked-On: #3214
Signed-off-by: Victor Sun <victor.sun@intel.com>
NUC7i7BNH is not a board name but a product name of KBL NUC, and it is
outdated to support LOGICAL_PARTITION scenario and HYBRID scenario.
NUC7i7DNH is the product name of KBL NUC that ACRN currently supported,
but its official board name is NUC7i7DNB, so change the folder name from
"nuc7i7bnh" to "nuc7i7dnb" under arch/x86/configs/.
Please refer more details on below documentation:
Intel® NUC Board/Kit NUC7i7DN Technical Product Specification
Tracked-On: #3446
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Xiangyang Wu <xiangyang.wu@linux.intel.com>
'invept' is not expected in guest and hypervisor should
inject UD when 'invept' VM exit happens.
Tracked-On: #3444
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Otherwise, the previous local variables in host stack is not reset.
Tracked-On: #3387
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
softirq shouldn't be bounded to vcpu thread. One issue for this
is shell (based on timer) can't work if we don't start any guest.
This change also is trying best to make softirq handler running
with irq enabled.
Also update the irq disable/enabel in vmexit handler to align
with the usage in vcpu_thread.
Tracked-On: #3387
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Because we depend on guest OS to switch x2apic mode to enable lapic pass-thru, vlapic is working at the early stage of booting, eg: in virtual boot loader.
After lapic pass-thru enabled, no interrupt should be injected via vlapic any more.
This commit resets the vlapic to clear the pending status and adds ptapic_ops to enforce that no more interrupt accepted/injected via vlapic.
Tracked-On: #3227
Signed-off-by: Yan, Like <like.yan@intel.com>
This commit adds ops to vlapic structure, and add an *ops parameter to vlapic_reset().
At vlapic reset, the ops is set to the global apicv_ops, and may be assigned
to other ops later.
Tracked-On: #3227
Signed-off-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per community requirement;up to three post-launched VM might be
needed for some automotive SDC system, so add SDC2 scenario to
satisfy this requirement.
Tracked-On: #3429
Signed-off-by: fuzhongl <fuzhong.liu@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
When initialize secondary pcpu, pass INVALID_CPU_ID as param of init_pcpu_pre()
looks weird, so change the param type to bool to represent whether the pcpu is
a BSP or AP.
Tracked-On: #3420
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vCPU schedule state change is under schedule lock protection. So there's no need
to be atomic.
Tracked-On: #1842
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>