Thermal events are delivered through lapic thermal LVT. Currently
ACRN does not support delivering those interrupts to guests by
virtual lapic. They need to be virtualized to provide guests some
thermal management abilities. Currently we just hide thermal
lvt from guests, including:
1. Thermal LVT:
There is no way to hide thermal LVT from guests. But we need do
something to make sure no interrupt can be actually trigered:
- skip thermal LVT in vlapic_trigger_lvt()
- trap-and-emulate thermal LVT in lapic-pt mode
2. As We have plan to introduce virtualization of thermal monitor in the
future, we use a vm flag GUEST_FLAG_VTM which is default 0 to control
the access to it. So that it can help enabling VTM in the future.
Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When VHWP enabled, return 0 and px_cnt = 0 on ACRN_PMCMD_GET_PX_CNT,
so that DM will write _CPC to guests' ACPI.
Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Changes made by this patch includes:
1. Emulate HWP and pstate MSRs/CPUIDs. Those are exposed to guest when
the GUEST_FLAG_VHWP is set:
- CPUID[6].EAX[7,9,10]: MSR_IA32_PM_ENABLE(enabled by hv, always read
1), MSR_IA32_HWP_CAPABILITIES, MSR_IA32_HWP_REQUEST,
MSR_IA32_HWP_STATUS,
- CPUID[6].ECX[0]: MSR_IA32_MPERF, MSR_IA32_APERF
- MSR_IA32_PERF_STATUS(read as base frequency when not owning pCPU)
- MSR_IA32_PERF_CTL(ignore writes)
2. Always hide HWP interrupt and package control MSRs/CPUIDs:
- CPUID[6].EAX[8]: MSR_IA32_HWP_INTERRUPT(currently ACRN is not able
to deliver thermal LVT virtual interrupt to guests)
- CPUID[6].EAX[11,22]: MSR_IA32_HWP_REQUEST_PKG, MSR_IA32_HWP_CTL
Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently CPU frequency control is hidden to guests, and controlled
by hypervisor. While it is sufficient in most cases, some guest OS may
still need CPU performance info to make multi-core scheduling decisions.
This is seen on Linux kernel, which uses HWP highest performance level
as CPU core's priority in multi-core scheduling (CONFIG_SCHED_MC_PRIO).
Enabling this kernel feature could improve performance as single thread
workloads are scheduled on the highest performance cores. This is
significantly useful for guests with hybrid cores.
The concept is to expose performance interface to guest who exclusively
owns pCPU assigned to it. So that Linux guest can load intel_pstate
driver which will then provide the kernel with each core's schedule
priority.
Intel_pstate driver also relies on CONFIG_ACPI_CPPC_LIB to implement
this mechanic, this means we also need to provide ACPI _CPC in DM.
This patch sets up a guest flag GUEST_FLAG_VHWP to indicate whether
the guest can have VHWP feature.
Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch refactors partition_epc() to make the code easier to
understand, also fixes the maybe-uninitialized warning for gcc-13.
Initializing 'vm_config' to get_vm_config(0) is okay here as scenario
validator ensures CONFIG_MAX_VM_NUM to be always larger than 0.
Tracked-On: #8413
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
This patch emulates the PLATFORM_INFO MSR in hypervisor to make it
only visible to Service VM, and only processor ratios (bit 15:8,
47:40 and 55:48) and sample part bit (27) are exponsed. This is
intended to prevent Service VM from changing processor parameters
like turbo ratio.
Tracked-On: #8406
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
When assigning a physical interrupt to a Post-launched VM, if it has
been assigned to ServiceVM, we should remove that mapping first to reset
ioapic pin state and rte, and build new mapping for the Post-launched
VM.
Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ptirq_remapping_info records which physical interrupt is mapped to
the virtual interrupt in a VM. As we need to knonw whether a physical
sid has been mapped to a VM and whether a virtual sid in a VM has been
used, there should be two hash tables to link and iterate
ptirq_remapping_info:
- One is used to lookup from physical sid, linking phys_link.
- The other is used to lookup from virtual sid in a VM, linking virt_link
Without this patch, phys_link or virt_link from different
ptirq_remapping_info was linked by one hash list head if they got the
same hash value, as shown in following diagram.
When looking for a ptirq_remapping_info from physical sid, the original
code took all hash list node as phys_link and failed to get
ptirq_remapping_info linked with virt_link and later references
to its members are wrong.
The same problem also occurred when looking for a ptirq_remapping_info
from virtual sid and vm.
---------- <- hash table
|hlist_head| --actual ptirq_remapping_info address
---------- --------- / --used as ptirq_remapping_info
| ... | |phys_link| ___/
---------- --------- --------- / ---------
|hlist_head| -> |phys_link| <-> |virt_link| <-> |phys_link|
---------- --------- --------- ---------
| ... | |virt_link| | other | |virt_link|
---------- --------- | members | ---------
| other | --------- | other |
| members | | members |
--------- ---------
Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
to enable early print output by pci uart, which has mmio address
above 4G, add the early pagetable map for the MMIO range.
we make an assumption that only map it with 2MB page, since platform
with 39bit memory width have no 1G huge page feature and we cannot
get the capacity at runtime, since it is a very early code stage.
v2->v1:
1. add hva2hpa_early
2. add process when *pdpte is not present
Signed-off-by: hangliu1 <hang1.liu@linux.intel.com>
Reviewed-by: fei1.li <fei1.li@intel.com>
Tracked-On: #6690
The version info is mainly used to tell the user when and where the binary is
compiled and built, this will change the hv version format.
The hv follows the format:
major.minor-stable/unstable-remote_branch-acrn-commit_date-commit_id-dirty
DBG/REL(tag-current_commit_id) scenario@board build by author date.
The '(tag-current_commit_id)' is optional, it exits only when there are
tags for current commit.
e.g.
with tag:
ACRN:\>version
HV: 3.1-stable-release_3.1-2022-09-27-11:15:42-7fad37e02-dirty DBG(tag: v3.1)
scenario3.1@my_desk_3.1 build by zhangwei 2022-11-16 07:02:37
without tag:
ACRN:\>version
HV: 3.2-unstable-master-2022-11-16-14:34:49-11f53d849-dirty DBG
scenario3.1@my_desk_3.1 build by zhangwei 2022-11-16 06:49:44
Tracked-On #8303
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
When CPU support MONITOR/MWAIT, OS prefer to use it enter
deeper C-state.
Now ACRN pass through MONITOR/MWAIT to guest.
For vCPUs (ie vCPU A and vCPU B) share a pCPU, if vCPU A uses MWait to enter C state,
vCPU B could run only after the time slice of vCPU A is expired. This time slice of
vCPU A is gone to waste.
For Local APIC pass-through (used for RTVM), the guest pay more attention to
timeliness than power saving.
So this patch hides MONITOR/MWAIT by:
1. Clear vCPUID.05H, vCPUID.01H:ECX.[bit 3] and
MSR_IA32_MISC_ENABLE_MONITOR_ENA to tell the guest VM's vCPU
does not support MONITOR/MAIT.
2. Enable MSR_IA32_MISC_ENABLE_MONITOR_ENA bit for
MSR_IA32_MISC_ENABLE inject 'GP'.
3. Trap instruction 'MONITOR' and 'MWAIT' and inject 'UD'.
4. Clear vCPUID.07H:ECX.[bit 5] to hide 'UMONITOR/UMWAIT'.
5. Clear "enable user wait and pause" VM-execution control, so
UMONITOR/MWAIT causes an 'UD'.
Tracked-On: #8253
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Exclude "GUEST_FLAG_PMU_PASSTHROUGH" from DM_OWNED_GUEST_FLAG_MASK
in case device model rewrite the value in release mode, reserve it
in debug mode.
Signed-off-by: hangliu1 <hang1.liu@linux.intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On:#6690
The current code would write 'BAR address & size_maks' into PCIe virtual
BAR before updating the virtual BAR's base address when guest writing a
PCIe device's BAR. If the size of a PCIe device's BAR is larger than 4G,
the low 32 bits size_mask for this 64 bits BAR is zero. When ACRN updating
the virtual BAR's base address, the low 32 bits sizing information would
be lost.
This patch saves whether a BAR writing is sizing or not before updating the
virtual BAR's base address.
Tracked-On: #8267
Signed-off-by: Fei Li <fei1.li@intel.com>
The targets "clean" and "distclean" are special as they do not need BOARD
or SCENARIO from users. This patch stops the guesswork of BOARD and
SCENARIO if any of those two targets are specified.
It is assumed that "clean" or "distclean" is not invoked with other targets
at the same time.
Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
In the current config.mk board and scenario XMLs are only copied to the
build directory when they do not exist. That prevents users from using XML
files they have edited, probably to fix previously reported validation
errors, for a rebuild unless the former build directory is totally removed.
This patch adds the user-given paths to XML files (if they exist) as
dependencies of the copied files in the build directory, so that users can
now provide new board and/or scenario XMLs to an existing build to
automatically trigger a complete rebuild.
Building without explicitly specifying BOARD and SCENARIO is still
supported if a build directory already exists. In that case the copied
board and scenario XMLs will not be modified.
Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
ACRN config tools generate source files, scripts and binaries based on
users' inputs in the configurations files, i.e., board and scenario
XMLs. Those generation activities all assume that users' inputs are
properly validated, so that all assumptions they have on such inputs are
hold.
Unfortunately, not all dependencies on validated user inputs are explicitly
stated in the Makefiles today. That will cause random error messages
(typically a Python stack trace) to be printed along with validation errors
when an invalid scenario XML is given, and such messages confuse users
about the root causes of the failure.
This patch decouples scenario validation from genconf.sh and make that step
as a separate target that depends on the board and scenario XMLs. One
timestamp file is generated only when the validation succeeds so that
targets requiring validated XMLs can refer that file in its dependency list
to make it explicit.
Dependencies of the following targets are also updated accordingly:
* $(HV_ALLOCATION_XML), which is the allocation XML including static
allocation results, now also depends on validated XMLs.
* $(HV_CONFIG_TIMESTAMP), which records when the config source files are
generated, now also depends on validated XMLs.
* $(SERIAL_CONF), which is the serial conf for the service VM, depends on
the allocation XML.
By the way, the missing dependency on HV_CONFIG_DIR for touching
HV_DIFFCONFIG_LIST is added to fix the "file not found" issue.
Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Phony targets are mostly for recipes that are expected to be invoked
directly from the command line as a target and will always be executed. As
a result, it is in most cases not appropriate for a real file target to to
depend on a phony one, as that means the file will always be regenerated.
In the Makefile today, however, dependencies on phony targets are common
and cause the hypervisor to be fully rebuilt regardless of whether an
existing build exists or not.
This patch cleans up the following phony targets which are not meant to be
targets from the command line.
- pre_build: This target has three outputs, namely the prebuild checker,
the ACPI tables for pre-launched VMs and the serial.conf. It is split
into three targets, one for each output.
- headers: This target is an alias of dynamically-generated header
files. It is replaced with a variable so that targets depending on
"header" now depends on the actual header files generated.
With this patch, make will only rebuild modified files and targets
depending on them directly or indirectly.
Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
The target "default" in hypervisor/Makefile is just an alias of the target
"all" in order to make "all" being the default goal. This patch replaces
that duplicate target and specifies the default goal by defining the
variable .DEFAULT_GOAL instead.
Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Now ACRN would traps MSI-X Table Structure access and does MSI-X interrupt
remapping for pass-thru PCIe devices. ACRN does this trap by unmmapping the
address ranges where the MSI-X Table Structure locates in granularity of 4K
pages. So there may have other registers (non-MSI-X structures) in these
trapped pages
However, the guest may access these registers (non-MSI-X structures) in these
trapped pages, which needs to be forwarded to the physical device. This patch
forwards the access to real hardware for pass-thru PCIe devices.
Tracked-On: #8255
Signed-off-by: Fei Li <fei1.li@intel.com>
When guest traps and wants to access a mmio region, the ACRN hypervisor
doesn't know the mmio size the guest wants to access until the trap
happens. In this case, ACRN should switch the mmio size and then call
mmio_read8/16/32/64 or mmio_write8/16/32/64 in each trap place.
This patch wrap the mmio read/write with a parameter to assign the mmio
size.
Tracked-On: #8255
Signed-off-by: Fei Li <fei1.li@intel.com>
The current code uses a predefined sworld memory array to reserve memory
for trusty VMs, and assume all post launched VMs are trusty VM which is
not correct.
This patch statically reserved memory just for trusty VMs and save 16M
memory for every non trusty VM.
Tracked-On: #6690
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The design of ACRN CPU performance management is to let hardware
do the autonomous frequency selection(or set to a fixed value),
and remove guest's ability to control CPU frequency.
This patch is to implement the CPU frequency initializer, which will
setup CPU frequency base on the performance policy type.
Two performance policy types are provided for user to choose from:
- 'Performance': CPU runs at its CPU runs at its maximum frequency.
Enable hardware autonomous frequency selection if HWP is presented.
- 'Nominal': CPU runs at its guaranteed frequency.
The policy type is passed to hypervisor through boot parameter, as
either 'cpu_perf_policy=Nominal' or 'cpu_perf_policy=Performance'.
The default type is 'Performance'.
Both HWP and ACPI p-state are supported. HWP is the first choice, for
it provides hardware autonomous frequency selection, while keeps
frequency transaction time low.
Two functions are added to the hypervisor to call:
- init_frequency_policy(): called by BSP at start up time. It processes
the boot parameters, and enables HWP if it is presented.
- apply_frequency_policy(): called after init_frequency_policy().
It applies initial CPU frequency policy setting for each core. It
uses a set of frequency limits data struct to quickly decide what the
highest/nominal frequency is. The frequency limits are generated by
config-tools.
The hypervisor will not be governing CPU frequency after initial policy
is applied.
Cores running RTVMs are fixed to nominal/guaranteed frequency, to get
more certainty in latency. This is done by setting the core's frequency
limits to highest=lowest=nominal in config-tools.
Tracked-On: #8168
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For an IO request, hv will check if it was registered in asyncio desc
list. If yes, put the corresponding fd to the shared buffer. If the
shared buffer is full, yield the vcpu and try again later.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add hypercall to add/remove asyncio request info. Hv will record the
info in a list, and when a new ioreq is come, hv will check if it is
in the asyncio list, if yes, queue the fd to asyncio buffer.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current IO emulation is synchronous. The user VM need to wait for the
completion of the the I/O request before return. But Virtio Spec
introduces introduces asynchronous IO with a new register in MMIO/PIO
space named NOTIFY, to be used for FE driver to notify BE driver, ACRN
hypervisor can emulate this register by sending a notification to vCPU
in Service VM side. This way, FE side can resume to work without waiting
for the full completion of BE side response.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Extend sbuf hypercall to support other kinds of share buffer.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
sbuf is now only used for debug purpose, but later, it will be used as a
common interfaces. So, move the sbuf related code out of the debug directory.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Improve SMP call to support ACRN shell to operate RTVM.
before, the RTVM CPU can't be kicked off by notification IPI,
so some shell commands can't support it, like rdmsr/wrmsr,
memory/registers dump. So INIT will be used for RTVM, which
LAPIC is pass-thru.
Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
By default, notification IPI used to kick sharing pCPU, INIT
used to kick partition pCPU. If USE_INIT_IPI flag is passed to
hypervisor, only INIT will be used to kick pCPU.
Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
INIT signal has been used to kick off the partitioned pCPU, like RTVM,
whose LAPIC is pass-through. notification IPI is used to kick off
sharing pCPU.
Add mode_to_kick_pcpu in per-cpu to control the way of kicking
pCPU.
Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The access to PCI config_space is handled in HV for Passthrough pci
devices. And it also provides one mechanism to forward cfg_access of
some registers to DM. For example: the opregion reg for GPU device.
This patch tried to add the support of emulated PCI ROM bar for the
device. And it doesn't handle the phys PCI ROM bar of phys PCI devices.
At the same time the rom firmware is provided in DM and pci rom bar_reg
is also emulated in DM, this leverages the quirk mechanism so that the
access to PCI rom bar_reg is forwarded to DM.
Tracked-On: #8175
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Wang Yu <yu1.wang@intel.com>
The design of ACRN CPU performance management is to let hardware
do the autonomous frequency selection(or set to a fixed value),
and remove guest's ability to control CPU frequency.
This patch is to remove guest's ability to control CPU frequency by
removing the guests' HWP/EIST CPUIDs and blocking the related MSR
accesses. Including:
- Remove CPUID.06H:EAX[7..11] (HWP)
- Remove CPUID.01H:ECX[7] (EIST)
- Inject #GP(0) upon accesses to MSR_IA32_PM_ENABLE,
MSR_IA32_HWP_CAPABILITIES, MSR_IA32_HWP_REQUEST,
MSR_IA32_HWP_STATUS, MSR_IA32_HWP_INTERRUPT,
MSR_IA32_HWP_REQUEST_PKG
- Emulate MSR_IA32_PERF_CTL. Value written to MSR_IA32_PERF_CTL
is just stored for reading. This is like how the native
environment would behavior when EIST is disabled from BIOS.
- Emulate MSR_IA32_PERF_STATUS by filling it with base frequency
state. This is consistent with Windows, which displays current
frequency as base frequency when running in VM.
- Hide the IA32_MISC_ENABLE bit 16 (EIST enable) from guests.
This bit is dependent to CPUID.01H:ECX[7] according to SDM.
- Remove CPID.06H:ECX[0] (hardware coordination feedback)
- Inject #GP(0) upon accesses to IA32_MPERF, IA32_APERF
Also DM do not need to generate _PSS/_PPC for post-launched VMs
anymore. This is done by letting hypercall HC_PM_GET_CPU_STATE sub
command ACRN_PMCMD_GET_PX_CNT and ACRN_PMCMD_GET_PX_DATA return (-1).
Tracked-On: #8168
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
HPET is used to calibrate the tsc frequency if system fails to
get the accurate frequency from CPUID 0x15. But on some platforms
(for example: the emulated ACRN on QEMU) HPET is not started
by default, which causes the failure of calibration TSC by HPET.
Tracked-On: #8113
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In order to improve DX, 'DISABLED' style configurator settings are
changed to 'ENABLED' style. The meaning of the MACROs are reversed,
so in the hv code we have to change '#ifndef' -> '#ifdef' or
'#ifdef' -> '#ifndef'.
Including:
- MCE_ON_PSC_DISABLED -> MCE_ON_PSC_ENABLED
- ENFORCE_TURNOFF_AC -> SPLIT_LOCK_DETECTION_ENABLED
- ENFORCE_TURNOFF_GP -> UC_LOCK_DETECTION_ENABLED
Tracked-On: #7661
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
BVT schedule rule:
When a new thread is wakeup and added to runqueue, it will get the
smallest avt (svt) from runqueue to initiate its avt. If the svt is
smaller than it's avt, it will keep the original avt. With the svt, it
can prevent a thread from claiming an excessive share of CPU after
sleepting for a long time.
For the reboot issue, when the VM is reboot, it means a new vcpu thread
is wakeup, but at this time, the Service VM's vcpu thread is blocked,
and removed from the runqueue, and the runqueue is empty, so the svt is
0. The new vcpu thread will get avt=0. avt=0 means very high priority,
and can run for a very long time until it catch up with other thread's
avt in runqueue.
At this time, when Service VM's vcpu thread wakeup, it will check the
svt, but the svt is very small, so will not update it's avt according to
the rule, thus has a very low priority and cannot be scheduled.
To fix it, update svt in pick_next handler to make sure svt is align
with the avt of the first obj in runqueue.
Tracked-On: #7944
Signed-off-by: Conghui <conghui.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
This function is registered as PCI MMIO configuration
access handler, which processes PCI configuration access
request from ACRN guest hence the inputs shall be validated
to avoid potential hypervisor crash when handling inputs
from malicious guests.
Tracked-On: #7902
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. make memcpy_erms as a public API; add a new one
memcpy_erms_backwards, which supports to copy data from tail to head.
2. improve to use right/left/home/end key to move cursor, and support
delete/backspace key to modify current input command.
Tracked-On: #7931
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. buffer history commands.
2. support up/down key to select history buffered commands
Tracked-On: #7931
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When LAPIC timer is working in oneshot or periodic mode, OS uses
initial counter register/current counter register to program
a timer. Both initial counter and current counter depend on the
LAPIC frequency. ACRN emulated vLAPIC timer based on the TSC.
vLAPIC freq is the same as TSC freq.
Tracked-On: #7876
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
For platform that supports RRSBA (Restricted Return Stack Buffer
Alternate), using retpoline may not be sufficient to guard against branch
history injection or intra-mode branch target injection. RRSBA must
be disabled to prevent CPUs from using alternate predictors for RETs.
Quoting Intel CVE-2022-0001/CVE-2022-0002:
Where software is using retpoline as a mitigation for BHI or intra-mode BTI,
and the processor both enumerates RRSBA and enumerates RRSBA_DIS controls,
it should disable this behavior.
...
Software using retpoline as a mitigation for BHI or intra-mode BTI should use
these new indirect predictor controls to disable alternate predictors for RETs.
See: https://www.intel.com/content/www/us/en/developer/articles/technical/
software-security-guidance/technical-documentation/branch-history-injection.html
Tracked-On: #7907
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
TLFS defined 2 vMSRs which can be used by Windows guest to get the
TSC/APIC frequencies from hypervisor. This patch adds the support
of HV_X64_MSR_TSC_FREQUENCY/HV_X64_MSR_APIC_FREQUENCY vMSRS whose
availability is exposed by CPUID.0x40000003:EAX[bit11] and EDX[bit8].
v1->v2:
- revise commit message to highlight that the changes are for WaaG
Tracked-On: #7876
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
On some platforms CPUID.0x15:ECX is zero and CPUID.0x16 can
only return the TSC frequency in MHZ which is not accurate.
For example the TSC frequency obtained by CPUID.0x16 is 2300
MHZ and the TSC frequency calibrated by HPET is 2303.998 MHZ
which is much closer to the actual TSC frequency 2304.000 MHZ.
This patch adds the support of using HPET to calibrate TSC
when HPET is available and CPUID.0x15:ECX is zero.
v3->v4:
- move calc_tsc_by_hpet into hpet_calibrate_tsc
v2->v3:
- remove the NULL check in hpet_init
- remove ""& 0xFFFFFFFFU" in tsc_read_hpet
- add comment for the counter wrap in the low 32 bits in
calc_tsc_by_hpet
- use a dedicated function for hpet_calibrate_tsc
v1->v2:
- change native_calibrate_tsc_cpuid_0x15/0x16 to
native_calculate_tsc_cpuid_0x15/0x16
- move hpet_init to BSP init
- encapsulate both HPET and PIT calibration to one function
- revise the commit message with an example"
Tracked-On: #7876
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Modified the copyright year range in code, and corrected "int32_tel"
into "Intel" in two "hypervisor/include/debug/profiling.h" and
"hypervisor/include/debug/profiling_internal.h".
Tracked-On: #7559
Signed-off-by: Ziheng Li <ziheng.li@intel.com>
As mentioned in previous patch, wbinvd utilizes the vcpu_make_request
and signal_event call pair to stall other vcpus. Due to the fact that
these two calls are not thread-safe, we need to avoid concurrent call to
this API pair.
This patch adds wbinvd lock to serialize wbinvd emulation.
Tracked-On: #7887
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Commit d575edf79a changes the internal
implementation of wait_event and signal_event to use a counter instead
of a boolean value.
The background was:
ACRN utilizes vcpu_make_request and signal_event pair to shoot down
other vcpus and let them wait for signals. vcpu_make_request eventually
leads to target vcpu calling wait_event.
However vcpu_make_request/signal_event pair was not thread-safe,
and concurrent calls of this pair of API could lead to problems.
One such example is the concurrent wbinvd emulation, where vcpus may
concurrently issue vcpu_make_request/signal_event to synchronize wbinvd
emulation.
d575edf commit uses a counter in internal implementation of
wait_event/signal_event to avoid data races.
However by using a counter, the wait/signal pair now carries semantics of
semaphores instead of events. Semaphores require caller to carefully
plan their calls instead of multiply signaling any number of times to the same
event, which deviates from the original "event" semantics.
This patch changes the API implementation back to boolean-based, and
re-resolve the issue of concurrent wbinvd in next patch.
This also partially reverts commit 10963b04d1,
which was introduced because of the d575edf.
Tracked-On: #7887
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To avoid hardcoding the minimum "iasl" version in multiple places, IASL_MIN_VER
is defined in the top-level Makefile and is passed to config_tools.
This patch verifies "iasl" version against IASL_MIN_VER directly in
config_tools.
Tracked-On: #7880
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
At build time (on the *dev* machine), config_tools depends on "iasl" to
generate the binary of ACPI tables for pre-launched VMs.
This patch does:
- pass ASL_COMPILER to config_tools
By default, ASL_COMPILER is initialized by "which iasl" at build time.
User could override it by specifying ASL_COMPILER as the build option,
like below:
make BOARD=xxxx SCENARIO=yyyy ASL_COMPILER=/usr/local/bin/iasl
- use ASL_COMPILER as the path to the "iasl" compiler in config_tools
v1 -> v2:
- add a check to make sure ASL_COMPILER is initialized to a value
Tracked-On: #7880
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
When CONFIG_MULTIBOOT2 is disabled, 'create_service_vm_efi_mmap_desc' is
unused and build fails because [-Werror=unused-function] is set.
boot/guest/bzimage_loader.c:188:17: error: 'create_service_vm_efi_mmap_desc' defined but not used [-Werror=unused-function]
188 | static uint16_t create_service_vm_efi_mmap_desc(struct acrn_vm *vm, struct efi_memory_desc *efi_mmap_desc)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Tracked-On: #7634
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Updates to the TSC members and maintainer list
* Update the Technical Steering Committee (TSC) members following the election
of Junjie Mao as the new chair
* Update Thomas Gleixner's email address since he joined Intel Corporation
* Delete the MAINTAINERS files in the 'hypervisor' and 'devicemodel' folders
as they are obsolete and unused.
Tracked-On: #7706
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
In 'get_ptdev_info()', variables 'bdf' and 'vbdf' are
16bits in size but their base addresses are converted
to 32bit pointers when calling 'get_entry_info()'.
This mismatch causes insufficient space when storing
to memory pointer by 'bdf' or 'vbdf' in 'get_entry_info()',
where those pointers are regarded as 32bits width memory.
This patch refines definition of 'get_entry_info()'.
Tracked-On: #7547
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
If multiple control fields in GCMD_REG register need to be modified, software
must serialize the modifications through multiple writes to this register.
So one-shot bits (bits 30-29, 27 and 24) in gcmd should not been set.
Otherwise, other control field may be written to GCMD_REG at the same time
with one-shot bit (Clearing one-shot bit has no effect, software sets this field
would set/update this control field used by hardware).
Tracked-On: #7381
Signed-off-by: Fei Li <fei1.li@intel.com>
There is an issue of calculate 2^n roundup of CONFIG_MAX_PT_IRQ_ENTRIES,
and the code style is very ugly when we use macro to fix it.
So this patch move MAX_IR_ENTRIES to offline tool which could do align
check and calculate it automatically.
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
dm/vrtc.c:565:33: error: 'current' may be used uninitialized in this
function.[-Werror=maybe-uninitialized]
Move the local variable definition into one code block to avoid warning.
Tracked-on: #7511
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Current code limit the MAX vUART number to 8 which is not enough for
Service VM which should config S5 UART for each user VM.
We could count how many vUARTs we need by offline tool, so remove the
define of MAX_VUART_NUM_PER_VM to offline tool is a simple and accurate
way to allocate vUARTs.
Tracked-On: #8782
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
The comparison "ctx->md_info == NULL" (if ctx is not NULL) will always evaluate
as 'false' for the address of 'hmac_ctx' will never be NULL.
This patch remove this unnecessary check.
Tracked-On: #7453
Signed-off-by: Fei Li <fei1.li@intel.com>
For physical RTC is monotonic growth, ensure vRTC monotonicity.
Periodical calibration and physical RTC modification may have
impact. Check it before reading
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
For Service VM modify physical rtc time and vrtc will calibrate
time by read physical rtc. So when Service VM modify physical
time, calibrate all vrtc.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
For TSC's precision (20~100ppm) is lower than physical RTC (less than 20ppm),
vRTC need to be calibrated by physical RTC. A timer tiggers calibration for
vRTC every 3 hours. This can improve efficiency because physical RTC can be
read once and calibrate all vRTC.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
VRTC for hv used to ignore writes to vRTC register.
This patch add time modification to vRTC.
Add base RTC time and TSC offset to get the current time. Convert
current time to vRTC register values (`struct rtcdev`). Then
modify a register, and calculate a new time. Get RTC offset by
substrcting new time from current time.
Thereafter, add RTC offset also when get current time. Then user
can get the modified time.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Service VM write physical RTC register.
Both RTC time modify and Configuration register is available.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
During setting RTC time, driver will halt RTC update. So support
bit 7 of reg_b. When it set to 0, time will be updated. And
when it's set to 1, rtc will keep the second it was set.
In the process of getting RTC time, driver sets alarm interrupt,
waits 1s, and get alarm interrupt flag. So support alarm interrupt
flag update. If alarm interrupt is enabled (bit 5, reg_b set to 1),
interrupt flag register will be set (bit 7 & bit 5 of reg_c) at
appropriate time.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Add support for hour format.
Bit 1 of register B indicates the hour byte format. When it's 0,
twelve-hour mode is selected. Then the seventh bit of hour register
presents AM as 0 and PM as 1.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Add judging of the vRTC data mode. If BCD data mode is inuse,
make conversion of data.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Current code would read physical RTC register and return it directly to guest.
This patch would read a base physical RTC time and a base physical TSC time
at initialize stage. Then when guest tries to read vRTC time, ACRN HV would
read the real TSC time and use the TSC offset to calculate the real RTC time.
This patch only support BIN data mode and 24 hour mode.
BCD data mode and 12 hour mode will add in other patch.
The accuracy of clock provided by this patch is limited by TSC, and will
be improved in a following patch also.
Tracked-On: #7440
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
when SSRAM regions are assigned to service VM
to support virtulization of SSRAM for post-launched
RTVMs, service VM need to access all SSRAM regions
for management, typically, service VM does data
cleanup in SSRAM region when it is reclaimed from
a shutdown RTVM.
This patch update memory type from UC(by default)
to WB, else SSARM region will be evicted when access
from guest happens.
Tracked-On: #7425
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
When booting prelaunch RTVM with SSRAM enabled, we need to delete the
SSRAM region that is used by prelaunch RTVM from Service VM EPT mapping.
If it is not used, or it is not fully used, the SSRAM or the rest SSRAM
should be in Service VM map.
But current code has a issue that it always deletes all SSRAM region
from Service VM EPT, even when no SSRAM is enabled for prelaunch RTVM.
This could cause the post RTVM with SSRAM boot failure, as DM checks and
removes SSRAM region from Service VM EPT during post RTVM setup.
Changing get_software_sram_size() to PRE_RTVM_SW_SRAM_MAX_SIZE could
solve the issue, as PRE_RTVM_SW_SRAM_MAX_SIZE is the SSRAM size that
prelaunch RTVM actually uses.
Tracked-On: #7401
Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
The movdqu instruction moves unaligned double quadword (128 bit)
contained in XMM registers.
This patch uses pointers as input parameters of the function
write_xmm_0_2() to get 128-bit value from 64-bit array for each XMM
register.
Tracked-On: #7380
Reviewed-by: Fei Li <fei1.li@intel.com>
Signed-off-by: Jiang, Yanting <yanting.jiang@intel.com>
When enabling SRIOV capability for a PF in Service VM, ACRN Hypervisor
should add VF BARs mapping for PF since PF's firmware would access these
BARs to do initialization for VFs when it's first created.
Tracked-On: #4433
Signed-off-by: Fei Li <fei1.li@intel.com>
In spite of Table Size in MSI-X Message Control Register [Bits 10:0] masks as
RO (Register bits are read-only and cannot be altered by software), In Spec
PCIe 6.0, Chap 6.1.4.2 MSI-X Configuration "Depending upon system software
policy, system software, device driver software, or each at different times or
environments may configure a Function’s MSI-X Capability and table structures
with suitable vectors."
This patch just pass through MSI-X Control Register field to guest.
Tracked-On: #7275
Signed-off-by: Fei Li <fei1.li@intel.com>
Since CAT support for hybrid platform is landed, let's remove some old declarations
which are no longer used.
Tracked-On: #6690
Signed-off-by: Tw <wei.tan@intel.com>
Ignore the "scenario" and "board" field in <scenario>.xml:
<acrn-config board="whl-ipc-i5" scenario="shared">
Tracked-On: #7345
Signed-off-by: Conghui <conghui.chen@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
1. Support absolute path for scenario file.
2. Use the scenario xml file name as scenario name, but if it is
'scenario.xml', use the upper level directory name.
e.g.
SCENARIO=<pathxxx>/shared/scenario.xml
Then scenario name would be 'shared'.
3. Change 'realpath' to 'abspath' as we should keep the original path
for scenario file even it is a link file. This will make sure the
scenario name is always consistent with file set in 'SCENARIO='.
Tracked-On: #7345
Signed-off-by: Conghui <conghui.chen@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
The current code only supports 2 HPA regions per VM.
This patch extended ACRN to support 2+ HPA regions per VM, to use host
memory better if it is scatted among multiple regions.
This patch uses an array to describe the hpa region for the VM, and
change the logic of ve820 to support multiple regions.
This patch dependent on the config tool and GPA SSRAM change
Tracked-On: #6690
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
When CPUID executes with EAX set to 1AH, the processor returns information about hybrid capabilities.
This information is percpu related, and should be obtained directly from the physical cpu.
Tracked-On: #6899
Signed-off-by: Tw <wei.tan@intel.com>
Page table entry present check is page table type
specific and static, e.g. just need to check bit0
of page entry for entries of MMU page table and
bit2~bit0 for EPT page table case. hence no need to
check it by callback function every time.
This patch remove 'pgentry_present' callback field and
add a new bitmask field for this page entry present check.
It can get better performance especially when this
check is executed frequently.
Tracked-On: #7327
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Reviewed-by: Yu Wang <yu1.wang@intel.com>
When CPUID executes with EAX set to 02H, the processor returns information about cache and TLB information.
This information is percpu related, and should be obtained directly from the physical cpu.
BTW, this patch is backported from v2.7 branch.
Tracked-On: #6931
Signed-off-by: Tw <wei.tan@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
This patch is to eliminate a code scan warning.
p_elf_header32 was given a value when it was declared, but later it was
given the same value again. Just remove the later one.
Tracked-On: #7318
Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Now interrupt vector in ACRN hypervisor is maintained as global variable, not
per-CPU variable. If there're more PCI devices, the physical interrupt vectors
are not enough most likely.
This patch would not allocate physical interrupt vector for MSI/MSI-X vectors
if interrupt posting could been used to inject the MSI/MSI-X interrupt to
a VM directly.
Tracked-On: #7275
Signed-off-by: Fei Li <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Using the SSRAM area size extracted by config_tools, the patch changes
the hard-coded GPA SSRAM area size to its actual size, so that
pre-launched VMs can support large(>8MB) SSRAM area.
When booting service VM, the SSRAM area has to be removed from Service
VM's mem space, because they are passed-through to the pre-rt VM. The
code was bugged since it was using the SSRAM area's GPA in the pre-rt
VM. Changed it to GPA in Service VM.
Tracked-On: #7212
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
On hybrid platform(e.g. ADL), there may be multiple instances of same level caches for different type of processors,
The current design only supports one global `rdt_info` for each RDT resource type.
In order to support hybrid platform, this patch introduce `rdt_ins` to represents the "instance".
Also, the number of `rdt_info` is dynamically generated by config-tool to match with physical board.
Tracked-On: projectacrn#6690
Signed-off-by: Tw <wei.tan@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
As RDT related information will be offered by config-tool dynamically,
and HV is just a consumer of that. So there's no need to do this detection
at startup anymore.
Tracked-On: projectacrn#6690
Signed-off-by: Tw <wei.tan@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Many of the license and Intel copyright headers include the "All rights
reserved" string. It is not relevant in the context of the BSD-3-Clause
license that the code is released under. This patch removes those strings
throughout the code (hypervisor, devicemodel and misc).
Tracked-On: #7254
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
remove board and scenario attributes dependency for new configuration.
To do:
will remove board and scenario attributes in all scenario XML files
and update the upgrader.py after the new configuration works.
Tracked-On: #6690
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Now ACRN Hypervisor only support one PCI Segment, this patch add this check.
This patch also fix a small bug: it would trigger false error "DRHD with
INCLUDE_PCI_ALL flag is NOT the last one".
Tracked-On: #5907
Signed-off-by: Fei Li <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Now HI_MMIO_xxx is duplicate with MMIO64_xxx. This patch replace HI_MMIO_xxx
with MMIO64_xxx.
Tracked-On: #6011
Signed-off-by: Fei Li <fei1.li@intel.com>
Unify the handling of host/guest MSR area in VMCS. Remove the emum value
as the element index when there are a few of MSRs in host/guest area.
Because the index could be changed if one element not used. So, use a
variable to save the index which will be used.
Tracked-On: #6966
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Instead of using a Boolean variable indicating whether a build is for debug
or release, it is more intuitive to specify the build types as "debug" or
"release".
This patch converts the config item RELEASE to BUILD_TYPE which takes
"debug" or "release" as of now.
The generated header and makefile still uses RELEASE, and the command line
option RELEASE=<y or n> is also preserved.
Tracked-On: #6690
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Today the scripts that populate default values and validate scenarios have
different command-line interfaces: the former requires an XML schema as
input (which is cumbersome in most cases), while the latter always infer
where the XML schema is (which is inflexible).
This patch unifies the command line options of those scripts as follows:
- The scenario XML is always a required positional argument.
- The output file path (if any) is an optional positional argument.
- The schema XML file is an optional long option. When not specified, the
scripts will always use the one under the misc/config_tools/schema
directory.
Also, this patch makes the validator.py executable, as is done to other
executable scripts in the repo.
Tracked-On: #6690
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Requirement: in CPU partition VM (RTVM), vtune or perf can be used to
sample hotspot code path to tune the RT performance, It need support
PMU/PEBS (Processor Event Based Sampling). Intel TCC asks for it, too.
It exposes PEBS related capabilities/features and MSRs to CPU
partition VM, like RTVM. PEBS is a part of PMU. Also PEBS needs
DS (Debug Store) feature to support. So DS is exposed too.
Limitation: current it just support PEBS feature in VM level, when CPU
traps to HV, the performance counter will stop. Perf global control
MSR is used to do this work. So, the counters shall be close to native.
Tracked-On: #6966
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Add a flag: GUEST_FLAG_PMU_PASSTHROUGH to indicate if
PMU (Performance Monitor Unit) is passthrough to guest VM.
Tracked-On: #6966
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
NMI is used to notify LAPIC-PT RTVM, to kick its CPU into hypervisor.
But NMI could be used by system devices, like PMU (Performance Monitor
Unit). So use INIT signal as the partition CPU notification function, to
replace injecting NMI.
Also remove unused NMI as notification related code.
Tracked-On: #6966
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Now the vept module uses a mixture of nept and vept, it's better to
refine it.
So this patch rename nept to vept and simplify the interface of vept
init module.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Remove the fact that a default BOARD and SCENARIO are used in case there was
none provided by the user, nor any available from a previous build. Up until
now, if that was the case, a build was triggered using a default set of BOARD
and SCENARIO values. The 'make' command will now error out asking the user to
specify those parameters.
Tracked-On: #7112
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
This patch adds ENOTTY and ENOSYS to indicate undefined and obsoleted
request hyercall respectively, and uses ENOTTY as error code for undefined
hypercall instead of EINVAL to consistent with the ACRN kernel's return
value.
Tracked-On: #7029
Signed-off-by: Wen Qian <qian.wen@intel.com>
Signed-off-by: Li Fei <fei1.li@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
The coding guideline rule C-ST-04 requires that
a 'if' statement followed by one or more 'else if'
statement shall be terminated by an 'else' statement
which contains either appropriate action or a comment.
Tracked-On: #6776
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Now the vept table was allocate dynamically, but the table size of vept
was calculated by the CONFIG_PLATFORM_RAM_SIZE which was predefined by
config tool.
It's not complete change and can't support single binary for different
boards/platforms.
So this patch will replace the CONFIG_PLATFORM_RAM_SIZE and get the
top ram size from hv_E820 interface for vept.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@linux.intel.com>
Now the EPT module use predefined parameter "CONFIG_PLATFORM_RAM_SIZE"
to calculate the ept table size.
After change the EPT table to dynamic allocate to support single binary
for different boards/platforms, the ept table size should dynamic
calculate too.
So this patch replace CONFIG_PLATFORM_RAM_SIZE by the hv_e820_ram_size
to get the RAM info on run time.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@linux.intel.com>
CONFIG_PLATFORM_RAM_SIZE is predefined by config tool and mmu use it to
calculate the table size and predefine the ppt table.
This patch will change the ppt to allocate dynamically and get the table
size by the hv_e820_ram_size interface which could get the RAM
info on run time and replace the CONFIG_PLATFORM_RAM_SIZE.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@linux.intel.com>
The e820 module could get the RAM info on run time, but the RAM size
and MAX address was limited by CONFIG_PLATFORM_RAM_SIZE which was
predefined by config tool.
Current solution can't support single binary for different boards or
platforms and the CONFIG_PLATFORM_RAM_SIZE can't matching the RAM size
if user have not update config tools setting after the device changed.
So this patch remove the CONFIG_PLATFORM_RAM_SIZE and calculate ram
size on run time.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@linux.intel.com>
The 'serial.conf' file need to be put in /etc/, but it is currently being
installed in $(libdir)/acrn/. We therefore ask the user to manually copy that
file over to the /etc/ folder. This patch fixes that by installing
'serial.conf' directly there.
Tracked-On: #7107
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Sometimes the memory to be allocated is not at the end of an entry,
that means we have to break one enty into 2 smaller entries, there
are two ways to add the new entry to hv_e820, adds to the end or
insert it.
The initial e820 table is ordered, that's why the e820_alloc_memory
interface asssum all entries was sorted, but add new entry to the
end will break the orde of hv_e820.
So we use insert_e820_entry to replace the add_e820_entry, the new
interfeac will keep the orde and users do not need sort again after
alloc region
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei<chenli.wei@linux.intel.com>
HC_GET_PLATFORM_INFO hypercall is not supported anymore,
hence to remove related function and data structure definition.
Tracked-On: #6690
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Current mmu assum the high memory start from 4G,it's not true for some
platform.
The map logic use "high64_max_ram - 4G" to calculate the high ram size
without any check,it's an issue when the platform have no high memory.
So this patch add high64_min_ram variable to calculate the min address
of high memory and check the high64_min_ram to fix the previou issue.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@linux.intel.com>
Today the XML validation logic is embedded in scenario_cfg_gen.py which is
highly entangled with the Python-internal representation of
configurations. Such representation is used by the current configurator,
but will soon be obsolete when the new configurator is introduced.
In order to avoid unnecessary work on this internal representation when we
refine the schema of scenario XML files, this patch separates the
validation logic into a new script which can either be used from the
command line or imported in other Python-based applications. At build time
this script will be used instead to validate the XML files given by users.
This change makes it easier to refine the current configuration items for
better developer experience.
Migration of existing checks in scenario_cfg_gen.py to XML schema will be
done by a following series.
v2 -> v3:
* Keep Invoking asl_gen.py to generate vACPI tables for pre-launched VMs.
v1 -> v2:
* Remove "all rights reserved" from the license header
* Upgrade the severity of the message indicating lack of xmlschema as
error according to our latest definitions of log severities, as
validation violations could indicate build time or boot time failures.
Tracked-On: #6690
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Today we assume the paths to the build directory do not contain the
character `@` and build the sed commands on top of this
assumption. However, there is no guarantee that this assumption holds.
This patch changes the separating character in sed pattern replacing
commands back to slash ('/') and escape the slashes in the replacements to
make the commands work. That gives more flexibility to the paths where
users can put their configuration files and build the project.
Tracked-On: #6691
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
The current code would inject GP to guest, when there's no IWKeyBackup,
and the guest tried to write MSR MSR_IA32_COPY_PLATFORM_TO_LOCAL(0xd92)
to copy IWKeyBackup for the platform to the IWKey for this logical processor.
This patch fixes it by adjusting the code logic, and it'll do nothing
instead of inject GP if no valid IWKeyBackup.
This patch alse add checking for the value being written to avoid setting
reserved MSR bits.
Tracked-On: #7018
Signed-off-by: Wen Qian <qian.wen@intel.com>
Signed-off-by: Li Fei <fei1.li@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Now the multiboot modules memory have not reserve,it's an issue if
these memory alloc and write before VM start.
Incorrect allocation of multiboot modules memory will cause VM lost
data or start faild.
So we find these modules memory range and reserve these memory from
e820 entry.
All these memory will realloc to VM which own them before the vm start.
Tracked-On: #6690
Acked-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
The coding guideline rule C-FN-16 requires that 'Mixed-use of
C code and assembly code in a single function shall not be allowed',
this patch wraps inline assembly to inline functions.
Tracked-On: #6776
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
v1-->v2:
use inline functions for read/write XMM registers
Rename `CONFIG_IOMMU_BUS_NUM` to `ACFG_MAX_PCI_BUS_NUM`. Configure tool
will calculate `ACFG_MAX_PCI_BUS_NUM` base on the max pci num which is
used by VF. So user needn't care about `ACFG_MAX_PCI_BUS_NUM`, and memory
will be used resonable.
Tracked-On: #6942
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
remove is_valid_xsave_combination api,
assume the hardware or QEMU can guarantee that support
XSAVE on CPU side and XSAVE_XRSTR on VMX side or not.
will add offline-tool in QEMU platform to avoid the user
use wrong XSAVE configurations.
remov check VMX_PROCBASED_CTLS2_XSVE_XRSTR based on the above reason.
for VMX_PROCBASED_CTLS2_PAUSE_LOOP, now it will panic
if run ACRN over QEMU, here remove it from essential check,
and it will print error information when set this bit
if there is no the hardware capability.
v1-v2:
remove is_valid_xsave_combination
Tracked-On: #6584
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This patch adds an option CONFIG_KEEP_IRQ_DISABLED to hv (default n) and
config-tool so that when this option is 'y', all interrupts in hv root
mode will be permanently disabled.
With this option to be 'y', all interrupts received in root mode will be
handled in external interrupt vmexit after next VM entry. The postpone
latency is negligible. This new configuration is a requirement from x86
TEE's secure/non-secure interrupt flow support. Many race conditions can be
avoided when keeping IRQ off.
v5:
Rename CONFIG_ACRN_KEEP_IRQ_DISABLED to CONFIG_KEEP_IRQ_DISABLED
v4:
Change CPU_IRQ_ENABLE/DISABLE to
CPU_IRQ_ENABLE_ON_CONFIG/DISABLE_ON_CONFIG and guard them using
CONFIG_ACRN_KEEP_IRQ_DISABLED
v3:
CONFIG_ACRN_DISABLE_INTERRUPT -> CONFIG_ACRN_KEEP_IRQ_DISABLED
Add more comment in commit message
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
"idle=halt " should be avoided in REE since we have to
keep the interrupt always masked in root mode.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Previous upstreamed patches handles the secure/non-secure interrupts in
handle_x86_tee_int. However there is a corner case in which there might
be unhandled secure interrupts (in a very short time window) when TEE
yields vCPU. For this case we always make sure that no secure interrupts
are pending in TEE's vlapic before scheduling REE.
Also in previous patches, if non-secure interrupt comes when TEE is
handling its secure interrupts, hypervisor injects a predefined vector
into TEE's vlapic. TEE does not consume this vector in secure interrupt
handling routine so it stays in vIRR, but it should be cleared because the
actual interrupt will be consumed in REE after VM Entry.
v3:
Fix comments on interrupt priority
v2:
Add comments explaining the priority of secure/non-secure interrupts
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
The TEE_NOTIFICATION_VECTOR can sometimes be confused with TEE's PI
notification vector. So rename it to TEE_FIXED_NONSECURE_VECTOR for
better readability.
No logic change.
v3:
Add more comments in commit message.
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Sometimes HV would like to know if there are specific interrupt
pending in vIRR, and clears them if necessary (such as in x86_tee case).
This patch adds two APIs: get_next_pending_intr and clear_pending_intr.
This patch also moves the inline api prio() from
vlapic.c to vlapic.h
v3:
Remove apicv_get_next_pending_intr and apicv_clear_pending_intr
and use vlapic_get_next_pending_intr and vlapic_clear_pending_intr
directly.
v2:
get_pending_intr -> get_next_pending_intr
apicv_basic/advanced_clear_pending_intr -> apicv_clear_pending_intr
apicv_basic/advanced_get_pending_intr -> apicv_get_next_pending_intr
has_pending_intr kept
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
In pci_enumerate_ext_cap we assume the extended capability linked lists
are always legal and correct, which might not be true when there was a
faulty hardware. This patch adds checks (time to live) to guard against malformed
extended capability linked lists.
v2:
Add error printing when node_limit <= 0.
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Though REE VM has its load order to be Service_VM, it does not offer
services as Service VM does. The only hypercalls allowed for REE are the
ones with GUEST_FLAG_REE.
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
This patch wraps the check of GUEST_FLAG_TEE/REE into functions
is_tee_vm/is_ree_vm for readability. No logic changes.
Tracked-On: #6571
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
The CONFIG_LOG_DESTINATION parameter selects where the logging messages
send to,serial console or memory or npk device MMIO region.
Now we want to remove it and check the loglevel of each channel,close the
output when the loglevel is ZERO.
Tracked-On: #6934
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Currently, in RTVM with multi vCPUs, lapic pass through is
configured, each vCPU works in x2apic mode. When one vCPU sends
IPI to all other vCPUs through writes ICR register with virtual
value 0x00000000000c00f8, this ICR writting will be intercepted,
the hypervisor passes destination shorthand field 11B (All Excluding
Self) in the virtual ICR value into physical ICR value during IPI
emulation, this IPI will be sent to each physical CPU core
in the platform according to 10.6.1 Interrupt Command Register (ICR),
Vol 3, SDM.
One vCPU in User VM with lapic pass through configuration can
send IPI with destination shorthand (10B or 11B) and any vector
(such as NMI or reboot vector) to other vCPUs, this IPI will sent
other VMs in the platform by hypervisor, this interference may
cause other VMs hang.
In this patch, set "Destination Shorthand" field of the
ICR value as 00B (No Shorthand) since the emulation is done
through sending IPI to each VCPU in dmask one by one.
Tracked-On: #6908
Signed-off-by: Xiangyang Wu <xiangyang.wu@intel.com>
Reviewed-by: Chen, Jason CJ <jason.cj.chen@intel.com>
This patch introduces stateful VM which represents a VM that has its own
internal state such as a file cache, and adds a check before system
shutdown to make sure that stateless VM does not block system shutdown.
Tracked-On: #6571
Signed-off-by: Wang Yu <yu1.wang@intel.com>
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently the MACRO(DM_OWNED_GUEST_FLAG_MASK) is generated
by config-tool. It's unnecessary to generate by tool since
it is fixed, the config-tool will remove this MACRO and
move it to vm_config.h
Tracked-On: #6366
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Variable should not have a prefix of '_' per MISRA C standard. The patch
removes the prefix for _ld_ram_start and _ld_ram_end.
Tracked-On: #6885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
In previous commit df7ffab441
the CONFIG_HV_RAM_SIZE was removed and the hv_ram_size was calculated in
link script by following formula:
ld_ram_size = _ld_ram_end - _ld_ram_start ;
but _ld_ram_start is a relative address in boot section whereas _ld_ram_end
is a absolute address in global, the mix operation cause hv_ram_size is
incorrect when HV binary is relocated.
The patch fix this issue by getting _ld_ram_start and _ld_ram_end respectively
and calculated at runtime.
Tracked-On: #6885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Avoid failing hypercalls by returning 0 for empty PX and CX tables on
HC_PM_GET_CPU_STATE/PMCMD_GET_PX_CNT and
HC_PM_GET_CPU_STATE/PMCMD_GET_CX_CNT.
Tracked-On: #6848
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@opensource.tttech-industrial.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Secure interrupt (interrupt belongs to TEE) comes
when TEE vcpu is running, the interrupt will be
injected to TEE directly. But when REE vcpu is running
at that time, we need to switch to TEE for handling.
Non-Secure interrupt (interrupt belongs to REE) comes
when REE vcpu is running, the interrupt will be injected
to REE directly. But when TEE vcpu is running at that time,
we need to inject a predefined vector to TEE for notification
and continue to switch back to TEE for running.
To sum up, when secure interrupt comes, switch to TEE
immediately regardless of whether REE is running or not;
when non-Secure interrupt comes and TEE is running,
just notify the TEE and keep it running, TEE will switch
to REE on its own initiative after completing its work.
Tracked-On: projectacrn#6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
This patch implements the following x86_tee hypercalls,
- HC_TEE_VCPU_BOOT_DONE
- HC_SWITCH_EE
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
This patch adds the x86_tee hypercall interfaces.
- HC_TEE_VCPU_BOOT_DONE
This hypercall is used to notify the hypervisor that the TEE VCPU Boot
is done, so that we can sleep the corresponding TEE VCPU. REE will be
started at the last time this hypercall is called by TEE.
- HC_SWITCH_EE
For REE VM, it uses this hypercall to request TEE service.
For TEE VM, it uses this hypercall to switch back to REE
when it completes the REE service.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
TEE is a secure VM which has its own partitioned resources while
REE is a normal VM which owns the rest of platform resources.
The TEE, as a secure world, it can see the memory of the REE
VM, also known as normal world, but not the other way around.
But please note, TEE and REE can only see their own devices.
So this patch does the following things:
1. go through physical e820 table, to ept add all system memory entries.
2. remove hv owned memory.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Given an e820, this API creates an identical memmap for specified
e820 memory type, EPT memory cache type and access right.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Add a configuration to support companion VM.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Add two VM flags for x86_tee. GUEST_FLAG_TEE for TEE VM,
GUEST_FLAG_REE for normal rich VM.
Tracked-On: #6571
Signed-off-by: Jie Deng <jie.deng@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
The HV will be built failed with below compiler message:
common/efi_mmap.c: In function ‘init_efi_mmap_entries’:
common/efi_mmap.c:41:11: error: unused variable ‘efi_memdesc_nr’
[-Werror=unused-variable]
uint32_t efi_memdesc_nr = uefi_info->memmap_size / uefi_info->memdesc_size;
^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The root cause is ASSERT() api is for DEBUG only so efi_memdesc_nr is not used
in RELEASE code.
The patch fix this issue by removing efi_memdesc_nr declaration;
Tracked-On: #6834
Signed-off-by: Victor Sun <victor.sun@intel.com>
Since the UUID is not a *must* set parameter for the standard post-launched
VM which doesn't depend on any static VM configuration. We can remove
the KATA related code from hypervisor as it belongs to such VM type.
v2-->v3:
separate the struce acrn_platform_info change of devicemodel
v1-->v2:
update the subject and commit msg
Tracked-On:#6685
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
With current arch design the UUID is used to identify ACRN VMs,
all VM configurations must be deployed with given UUIDs at build time.
For post-launched VMs, end user must use UUID as acrn-dm parameter
to launch specified user VM. This is not friendly for end users
that they have to look up the pre-configured UUID before launching VM,
and then can only launch the VM which its UUID in the pre-configured UUID
list,otherwise the launch will fail.Another side, VM name is much straight
forward for end user to identify VMs, whereas the VM name defined
in launch script has not been passed to hypervisor VM configuration
so it is not consistent with the VM name when user list VM
in hypervisor shell, this would confuse user a lot.
This patch will resolve these issues by removing UUID as VM identifier
and use VM name instead:
1. Hypervisor will check the VM name duplication during VM creation time
to make sure the VM name is unique.
2. If the VM name passed from acrn-dm matches one of pre-configured
VM configurations, the corresponding VM will be launched,
we call it static configured VM.
If there is no matching found, hypervisor will try to allocate one
unused VM configuration slot for this VM with given VM name and get it
run if VM number does not reach CONFIG_MAX_VM_NUM,
we will call it dynamic configured VM.
3. For dynamic configured VMs, we need a guest flag to identify them
because the VM configuration need to be destroyed
when it is shutdown or creation failed.
v7->v8:
-- rename is_static_vm_configured to is_static_configured_vm
-- only set DM owned guest_flags in hcall_create_vm
-- add check dynamic flag in get_unused_vmid
v6->v7:
-- refine get_vmid_by_name, return the first matching vm_id
-- the GUEST_FLAG_STATIC_VM is added to identify the static or
dynamic VM, the offline tool will set this flag for
all the pre-defined VMs.
-- only clear name field for dynamic VM instead of clear entire
vm_config
Tracked-On: #6685
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Victor Sun<victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The CONFIG_MAX_IR_ENTRIES and CONFIG_MAX_PT_IRQ_ENTRIES are separate
configuration items, and they can be configured through configuration tool
When the number of PT irq entries are more than IR entries, then some
passthrough devices' irqs may failed to be protected by interrupt
remapping or automatically injected by post-interrupt mechanism.
And it waste memory if the CONFIG_MAX_IR_ENTRIES is larger.
This patch replace the CONFIG_MAX_IR_ENTRIES to MAX_IR_ENTRIES and
enforce it align to CONFIG_PT_IRQ_ENTRIES and round up to > 2^n as the
IRTA_REG spec.This way can enforce all PT irqs works with IR or PI
mechanism.
Tracked-On: #6745
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
e820_alloc_memory requires 4k alignment, so conversion to size is
encapsulated in the function. And then the pre-condition of
`size_arg` is removed.
Tracked-On: #6805
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The CONFIG_LOW_RAM_SIZE is used to describe the size of trampoline code
that is never changed. And it totally confused user to configure it.
This patch hard code it to 1MB and remove the macro for configuration.
In the trampoline related code, use ld_trampoline_end and
ld_trampoline_start symbol to calculate the real size.
Tracked-On: #6805
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Commit cbf3825 "hv: Pass-through IA32_TSC_AUX MSR to L1 guest"
lets guest own the physical MSR IA32_TSC_AUX and does not handle this MSR
in the hypervisor.
If multiple vCPUs share the same pCPU, when one vCPU reads MSR IA32_TSC_AUX,
it may get the value set by other vCPUs.
To fix this issue, this patch does:
- initialize the MSR content to 0 for the given vCPU, which is consistent with
the value specified in SDM Vol3 "Table 9-1. IA-32 and Intel 64 Processor
States Following Power-up, Reset, or INIT"
- save/restore the MSR content for the given vCPU during context switch
v1 -> v2:
* According to Table 9-1, the content of IA32_TSC_AUX MSR is unchanged
following INIT, v2 updates the initialization logic so that the content for
vCPU is consistent with SDM.
Tracked-On: #6799
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Generate serial configuration file for service VM according
to scenario file and vUART ports base address allocated by
config tool.
Currently, some non-standard serial ports are emulated in
hypervisor and will be used to do communication between service
VM and user VM, so need to generate serial configuration file
to configure these serial ports for service VM.
v1-->v2:
Fix some type issues
Refine script code format
Tracked-On: #6652
Signed-off-by: Xiangyang Wu <xiangyang.wu@intel.com>
This patch moves the ssram area in ve820 tab, and reunites the
hpa1_low_part1/2 areas. The ve820 building code is refined.
before:
|<---low_1M--->|
|<---hpa1_low_part1--->|
|<---SSRAM--->|
|<---hpa1_low_part2--->|
|<---GPU_OpRegion--->|
|<---ACPI DATA--->|
|<---ACPI NVS--->|
---2G---
after:
|<---low_1M--->|
|<---hpa_low--->|
|<---SSRAM--->|
|<---GPU_OpRegion--->|
|<---ACPI DATA--->|
|<---ACPI NVS--->|
---2G---
The SSRAM area's address is described in the ACPI's RTCT/PTCT
table. To simplify the SSRAM implementation, SSRAM area was
identical mapped to GPA, and resulted in the divition of hpa_low.
Then the ve820 building logic became too complicated.
Now we managed to edit the guest's RTCT/PTCT table by offline
tools in the former patch, so we can move the guest's SSRAM
area, and reunite the hpa_low areas again.
After doing this, this patch rewrites the ve820 building code
in a much simpler way.
Tracked-On: #6674
Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
The ve820 table' hpa1_low area is divided into two parts, which
is making the code too complicated and causing problems. Moving
the entries that divides the hpa1_low could make things easier.
This patch moves the GPU OpRegion to the tail area of 2G,
consecutive to the acpi data/nvs area.
before:
|<---low_1M--->|
|<---hpa1_low_part1--->|
|<---SSRAM--->|
|<---GPU_OpRegion--->|
|<---hpa1_low_part2--->|
|<---ACPI DATA--->|
|<---ACPI NVS--->|
---2G---
after:
|<---low_1M--->|
|<---hpa1_low_part1--->|
|<---SSRAM--->|
|<---hpa1_low_part2--->|
|<---GPU_OpRegion--->|
|<---ACPI DATA--->|
|<---ACPI NVS--->|
---2G---
Tracked-On: #6674
Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
The length of the ACPI data entry in ve820 tab was 960K, while the
ACPI file is 1MB. It causes the ACPI file copy failed due to reserved
ACPI regions in ve820 table is not enough when loading pre-launched
VMs. This patch changes ACPI data area to 1MB to fix the problem.
And the ACPI data length was missed when calculating
ENTRY_HPA1_LOW_PART2 length. Fixed here too.
Also adds some refinement to the hard-coded ACPI base/addr definations
Tracked-On: #6674
Signed-off-by: Zhou, Wu <wu.zhou@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>