Commit Graph

3506 Commits

Author SHA1 Message Date
Haiwei Li
d6fe8b0892 hv: cpuid: make leaf 0x6 per-cpu in hybrid architecture
Leaf 0x6 returns thermal and power management information. In
hybrid architecture, P-cores and E-cores have different information.

Add leaf 0x6 to per-cpu list in hybrid architecture and handle specific
cpuid access.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Haiwei Li
59a8cc4c28 hv: cpuid: make leaf 0x4 per-cpu in hybrid architecture
Leaf 0x4 returns deterministic cache parameters for each level. In
hybrid architecture, P-cores and E-cores have different cache
information.

Add leaf 0x4 to per-cpu list in hybrid architecture and handle specific
cpuid access.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Haiwei Li
f7506424e4 hv: cpuid: refactor per-cpu leaves definition
CPUID returns processor identification and feature information.
Different pcpus may return different infos. That is, the info is
per-cpu.

In hybrid architecture, per-cpu leaf is different from the previous. So
introduce a struct percpu_cpuids to indicate the per-cpu leaf. struct
percpu_cpuids will consist of two parts: generic percpu leaves and
hybrid related percpu leaves.

This patch is just to add generic percpu leaves.

Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
2024-05-28 11:02:56 +08:00
Xin Zhang
7edf800f16 Expose CPUID leaf 0x1f to guest with patched x2APIC ID
CPUID leaf 1f is preferred superset of leaf 0b, currently ACRN exposes
leaf 0b but leaf 1f is empty so the 2 leaves mismatch, and so
application will follow the SDM to check 1f first.

Tracked-On: #8608
Signed-off-by: Xin Zhang <xin.x.zhang@intel.com>
2024-05-28 11:02:56 +08:00
Zhangwei6
ddfcb8c3fc hv: enable thermal lvt interrupt
This patch can fetch the thermal lvt irq and propagate
it to VM.

At this stage we support the case that there is only one VM
governing thermal. And we pass the hardware thermal irq to this VM.

First, we register the handler for thermal lvt interrupt, its irq
vector is THERMAL_VECTOR and the handler is thermal_irq_handler().

Then, when a thermal irq occurs, it flags the SOFTIRQ_THERMAL bit
of softirq_pending, This bit triggers the thermal_softirq() function.
And this function will inject the virtual thermal irq to VM.

Tracked-On: #8595

Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-05-16 09:40:32 +08:00
Zhangwei6
78243c3f49 hv: expose thermal MSRs to VM.
In this phase, we only use one VM to control thermal.
So we make thermal MSRs readable and writable by this VM.

This VM is flagged with GUEST_FLAG_VTM, and can
read/write thermal MSRs.
For the VMs not flagged with GUEST_FLAG_VTM,
can only read these thermal MSRs to get current status.

Tracked-On: #8595
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-05-16 09:40:32 +08:00
Zhang Chen
946a927dcb hv: sched: Fix scheduler priority issue
Fix build issue.

Tracked-On: #8586

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-05-08 14:52:23 +08:00
Qiang Zhang
629808c767 debug: vuart: fix interrupt ID for data receiving
When RX FIFO is not empty and Receive Data Available interrupt is
enabled, vUART should report a Receive Data Available (IIR_RXRDY) in IIR
instead of a Timeout Interrupt Pending (IIR_RXTOUT).

Tracked-On: #8583
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-04-25 15:00:09 +08:00
Qiang Zhang
c623e11125 debug: vuart: add guest break key support
The break key (key value 0x0) was used as switch key from guest serial
to hv console and guest serial could not receive break key. This blocked
some guest debugging features like KGDB/KDB, sysrq, etc.

This patch leverages escape sequence "<escape> + <break>" to send break to
guest and "<escape> + e" to switch from guest serial to hv console.

Tracked-On: #8583
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-04-25 15:00:09 +08:00
Yi Sun
e0d03b27d0 hv: return error code for default case in hcall_vm_intr_monitor
In hcall_vm_intr_monitor(), the default case for intr_hdr->cmd is a
wrong case. So, it should return error code back. But it returns success
code 0 in current codes.

Tracked-On: #8580
Reviewed-by: Fei Li <fei1.li@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
2024-04-23 15:58:36 +08:00
Yonghua Huang
7bcd9d783e hv: refine set_fs_base() function
Leave canary of stack protector untouched on pCPU
 if it has been initialized, instead of generating a new one.

Tracked-On: #8577
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2024-04-23 11:00:43 +08:00
Yonghua Huang
d5d21fdc1b hv: fix potential NULL pointer dereferrence in ivshmem.c
secure coding fix.

Tracked-On: #8566
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-04-10 12:20:54 +08:00
Yonghua Huang
ddfe218747 hv: fill region ID to hv-land ivshmem PCI config space
1) region ID shall be configured by user via config tool.
   2) region ID is programmed to "Subsystem ID" of PCI config space.
   2) "Subsystem Vendor ID" is harded coded as 0x8086

Tracked-On: #8566
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-03-28 14:34:38 +08:00
Yonghua Huang
a7a6732580 config_tools: support IVSHMEM devices region ID configuration
This patch adds ivshmem region ID configuration support when user
   configure ACRN IVSHMEM devices via ACRN config tool, this ID provides
   VMs with a stable identification of multiple shared memory regions.

   Also add logic to generate launch script with region ID configured
   as below:
   `add_virtual_device  8 ivshmem hv:/shm_region_0,256,1`

Tracked-On: #8566
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-03-28 14:34:38 +08:00
Jiaqing Zhao
f6bb15c85c hv: mmu: intiialize ppt_page_pool.bitmap in allocate_ppt_pages()
ppt_page_pool.bitmap should be zero-initialized. Also fixes the wrong
indention in allocate_ppt_pages().

Tracked-On: #8559
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2024-03-25 09:57:08 +08:00
Jiaqing Zhao
997bdc4843 hv: configure hv console default output in scenario file
Add a new option CONSOLE_VM in scenario to set the default vm to be
outputted in hv console, when it is not set, acrn console will be
used (current behavior). This is intended for debugging vm boot issues.

Tracked-On: #8518
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-03-25 09:52:30 +08:00
Wu Zhou
a052cda8e4 hv: console reads all input chars in one poll
For uart console, some control keys are defined as byte sequences,
such as:
 * up arrow - 0x1b/0x5b/0x41
 * F8 - 0x1b/0x5b/0x31/0x39/0x7e

Currently hv console only read one char per poll.
When guest vuart console is active, those byte sequences may not be sent
to guest vuart in good timing due to the poll interval. Thus control keys
such as up/down can not be used in shell or vim.

The solution is to read all input chars in one poll, so that control keys
can be received by guest OS properly.

Tracked-On: #8564
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-03-12 15:26:58 +08:00
Wu Zhou
925e3d95b4 hv: add max_len for sbuf_put param
sbuf_put copies sbuf->ele_size of data, and puts into ring. Currently
this function assumes that data size from caller is no less than
sbuf->ele_size.

But as sbuf->ele_size is usually setup by some sources outside of the HV
(e.g., the service VM), it is not meant to be trusted. So caller should
provide the max length of the data for safety reason. sbuf_put() will
return UINT32_MAX if max_len of data is less than element size.

Additionally, a helper function sbuf_put_many() is added for putting
multiple entries.

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-02-20 11:52:02 +08:00
Wu Zhou
29b3d03ac7 hv: vm_event: send event on triple fault handler
In the triple fault handler, post-launched VMs are instantly turned
off. Now a vm event is generated simultaneously. So that
developers can capture the event and decide what to do with it. (e.g.,
logging and populating diagnostics, or poweroff VM)

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-02-01 17:01:31 +08:00
Wu Zhou
ab63fe1a92 hv: vm_event: send RTC change event in hv vRTC
This patch adds support for HV vrtc vm_event.

RTC change event is sent upon each date/time reg write. Those events
will be handled in DM. DM will try to emit an RTC change event(to
Libvirt) based on its strategy. Only support post-launched VMs.

The DM event handler has already implemented the rtc chanage event.
Those events will be processed the same way as vrtc events from DM vrtc.

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-02-01 17:01:31 +08:00
Wu Zhou
262a48f346 dm: vm_event: add support for RTC change event
When a guest OS performs an RTC change action, we wish this event be
captured by developers, and then they can decide what to do with it.
(e.g., whether to change physical RTC)

There are some facts that makes RTC change event a bit complicated:
- There are 7 RTC date/time regs (year, month…). They can only be
  updated one by one.
- RTC time is not reliable before date/time update is finished.
- Guests can update RTC date/time regs in any order.
- Guests may update RTC date/time regs during either RTC halted or not
  halted.

A single date/time update event is not reliable. We have to wait for
the guest to finish the update process. So the DM's event handler
sets up a timer, and wait for some time (1 second). If no more change
happens befor the timer expires, we can conclude that the RTC
change has been done. Then the rtc change event is emitted.

This logic of event handler can be used to process HV vrtc time change
event too.

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
2024-02-01 17:01:31 +08:00
Wu Zhou
d9ccf1ccb2 dm: vm_event: create vm_event thread
This patch creates a thread for vm_event delivery. The thread uses epoll
to poll event notifications, then read out the msg data queued in sbuf.
An event handler is called upon success receiving. Both HV and DM event
sources share the same process.

Also vm_event tx API for DM event source is added in this patch.

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
2024-02-01 17:01:31 +08:00
Wu Zhou
581ec58fbb hv: vm_event: create vm_event support
This patch creates vm_event support in HV, including:
1. Create vm_event data type.
2. Add vm_event sbuf and its initializer. The sbuf will be allocated by
   DM in Service VM. Its page address will then be share to HV through
   hypercall.
3. Add an API to send the HV generated event.

Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2024-02-01 17:01:31 +08:00
Muhammad Qasim Abdul Majeed
3be3b394ad hypervisor: Fix spelling and grammar mistakes.
Tracked-On: #8533
Signed-off-by: Muhammad Qasim Abdul Majeed <qasim.majeed20@gmail.com>
2023-10-24 11:10:47 +08:00
Muhammad Qasim Abdul Majeed
ce96ef6bae hypervisor: Fix spelling and grammar mistakes.
Tracked-On: #8533
Signed-off-by: Muhammad Qasim Abdul Majeed <qasim.majeed20@gmail.com>
2023-10-23 16:45:28 +08:00
Wu Zhou
bbe8e254cf hv: support multi function ivshmem device
Currently ivshmem device can only be configurated as single function
device(bdf.f = 0) on bus 0. This greatly limits the number of ivshmem
devices we can create. This patch is to enable multiple function bit in
HEADER_TYPE config register, so that we can create many more ivshmem
devices by using different function numbers on one bus:dev.

The multi function device bit is to be set on ivshmem devices whose function
number equls 0. PCI spec describe it as: ‘When Set, indicates that the
Device may contain multiple Functions, but not necessarily.’, So if this
dev is the only one on the bus:dev, it is still OK.

Tracked-On: #8520
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-27 16:46:20 +08:00
Qiang Zhang
79b91b339b hv: sched: add four parameters for BVT scheduler
Per BVT (Borrowed Virtual Time) scheduler design, following per thread
parameters are required to tune scheduling behaviour.
- weight
  The time sharing of a thread on CPU.
- warp
  Boost value of virtual time of a thread (time borrowed from future) to
  reduce Effective Virtual Time to prioritize the thread.
- warp_limit
  Max warp time in one warp.
- unwarp_period
  Min unwarp time after a warp.

As of now, only weight is in use to tune virtual time ratio of VCPU
threads from different VMs. Others parameters are for future extension.

Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-18 16:26:05 +08:00
Qiang Zhang
04a4f31d28 config: add four per-vm bvt parameters
Add four per-vm bvt parameters as the initial bvt parameter values for
vCPU threads.
- bvt_weight
  The time sharing of a thread on CPU.
- bvt_warp_value
  Boost value of virtual time of a thread (time borrowed from future) to
  reduce Effective Virtual Time to prioritize the thread.
- bvt_warp_limit
  Max warp time in one warp.
- bvt_unwarp_period
  Min unwarp time after a warp.

Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-18 16:26:05 +08:00
Qiang Zhang
6a1d91c740 hv: sched: Add sched_params struct for thread parameters
Abstract out schedulers config data for vCPU threads and other hypervisor
threads to sched_params structure. And it's used to initialize per
thread scheduler private data. The sched_params for vCPU threads come
from vm_config generated by config tools while other hypervisor threads
need give them explicitly.

Tracked-On: #8500
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-18 16:26:05 +08:00
Qiang Zhang
c000a3f70b hv: add clamp macro for convenience
Add clamp macro to clamp a value within a range.

Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-18 16:26:05 +08:00
Wu Zhou
9a6e940849 hv: signal_event after make_request
make_request sets the request bit, and signal_event wakes the vcpu
thread. If we signal_event comes first, the target vCPU has a chance to
sleep again before processing the request bit.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-15 11:52:40 +08:00
Wu Zhou
064be1e3e6 hv: support halt in hv idle
When all vCPU threads on one pCPU are put to sleep (e.g., when all
guests execute HLT), hv would schedule to idle thread. Currently the
idle thread executes PAUSE which does not enter any c-state and consumes
a lot of power. This patch is to support HLT in the idle thread.

When we switch to HLT, we have to make sure events that would wake a
vCPU must also be able to wake the pCPU. Those events are either
generated by local interrupt or issued by other pCPUs followed by an
ipi kick.

Each of them have an interrupt involved, so they are also able to wake
the halted pCPU. Except when the pCPU has just scheduled to idle thread
but not yet halted, interrupts could be missed.

sleep-------schedule to idle------IRQ ON---HLT--(kick missed)
                                         ^
                              wake---kick|

This areas should be protected. This is done by a safe halt
mechanism leveraging STI instruction’s delay effect (same as Linux).

vCPUs with lapic_pt or hv with CONFIG_KEEP_IRQ_DISABLED=y does not allow
interrupts in root mode, so they could never wake from HLT (INIT kick
does not wake HLT in root mode either). They should continue using PAUSE
in idle.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-15 11:52:40 +08:00
Wu Zhou
64d999e703 hv: switch to dynamic timer in bvt scheduler
When bvt scheduler picks up a thread to run, it sets up a counter
‘run_countdown’ to determine how many ticks it should remain running.
Then the timer will decrease run_countdown by 1 on every 1000Hz tick
interrupt, until it reaches 0. The tick interrupt consumes a lot of
power during idle (if we are using HLT in idle thread).

This patch is to switch the 1000 HZ timer to a dynamic one, which only
interrupt on run_countdown expires.

Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-13 08:30:27 +08:00
Yonghua Huang
252ba0b047 hv: allocate ivshmem shared memory from E820
ACRN boot fails when size of ivshmem device is
 configured to 512MB, this patch allocates this memory
 from E820 table instead of reserving in hypervisor.

Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-12 13:52:48 +08:00
Yonghua Huang
791019edc5 hv: define a MACRO to indicate maximum memory size
~0UL is widely used to specify the maximum memory size
 when calling e820_alloc_memory(), this patch to define
 a MACRO for it to avoid using this magic number.

Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-09-12 13:52:48 +08:00
Qiang Zhang
a4a73b5aac HV: emulate dummy multi-function dev in Service VM
For a pdev which allocated to prelaunched VM or owned by HV, we need to check
whether it is a multifuction dev at function 0. If yes we have to emulate a
dummy function dev in Service VM, otherwise the sub-function devices will be
lost in guest OS pci probe process.

Tracked-On: #8492
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
2023-09-11 16:13:16 +08:00
Qiang Zhang
bf653d277b HV: init one dev config with service vm config param
When we do init_all_dev_config() in pci.c, the pdevs added to pci dev_config
will be exposed to Service VM or passthru to prelauched VM. The original code
would find service VM config in every pci pdev init loop, this is unnecessary
and definitely impact performance. Here we generate Service VM config pointer
with config tool so that init_one_dev_config() could refer service VM config
directly.

Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
2023-09-11 16:13:16 +08:00
Qiang Zhang
deccb22ea8 hv: rename is_allocated_to_prelaunched_vm to allocate_to_prelaunched_vm
Rename is_allocated_to_prelaunched_vm to allocate_to_prelaunched_vm as
it not only checks whether the PCI device is allocated to a Pre-launched
VM but also associate it with Pre-launched VM's dev_config.

Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-11 16:13:16 +08:00
Qiang Zhang
aebc16e9e5 hv: fix Service VM EPT mapping upper bound
On some platforms, the last e820 entry may not be of type E820_TYPE_RAM,
such as E820_TYPE_ACPI_NVS which may also be used by Service VM.
So we need take all e820 entry types into account when finding the upper
bound of Service VM EPT mapping.

Tracked-On: #8495
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
2023-09-05 11:09:46 +08:00
Muhammad Qasim Abdul Majeed
a457e65619 doc: Fix spelling and typo mistakes.
Tracked-On: #8488

Signed-off-by: Muhammad Qasim Abdul Majeed <qasim.majeed20@gmail.com>
2023-09-05 09:34:21 +08:00
Zhang Chen
c6eda313f9 hypervisor/ivshmem: Add check to prevent malicious BAR0 opts
The ivshmem spec define the BAR0 offset > 16 are reserved.
So ACRN need ignore all operation when offset out of range.

Tracked-On: #8487

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-16 10:12:03 +08:00
Zhang Chen
45382dca4b hypervisor/ivshmem: Add ivshmem_dev_lock to protect deinit
Add the ivshmem_dev_lock to protect races between release
and allocations.

Tracked-On: ##8486

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-16 10:12:03 +08:00
Zhang Chen
c9c0d2167b hypervisor/ivshmem: Fix ivshmem ivposition loss issue
ACRN sets the ivposition to the VM_ID in ivshmem_server_bind_peer().
This value should be saved in the ivshmem_device until unbind.
It is wrong to clear ivs_dev->mmio in the ivshmem_vbar_map(),
Instead, it should clear the ivshmem_device structure in the
create_ivshmem_device to ensure the same initial states
after VM reboot case.

Tracked-On: #8485

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <Junjie.mao@intel.com>
2023-08-16 10:12:03 +08:00
Jiaqing Zhao
7bfbdf04b8 doc: remove '@return None' for void functions
doxygen will warn that documented return type is found for functions
that does not return anything in 1.9.4 or later versions. 'None' is
not a special keyword in doxyge, it will recognize it as description
to the return value that does not exist in void functions.

Tracked-On: #8425
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-03 14:56:29 -07:00
Wu Zhou
8af2c263db hv: disable HFI and ITD for guests
The Hardware Feedback Interface (HFI) and Intel® Thread Director (ITD)
features require OS to provide a physical page address to
IA32_HW_FEEDBACK_PTR. Then the hardware will update the processor
information to the page address. The issue is that guest VM will program
its GPA to that MSR, causing great risk of tempering memory.

So HFI and ITD should be made invisible to guests, until we provide
proper virtulization of those features.

Tracked-On: #8463
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-01 14:57:23 +08:00
Jiaqing Zhao
3cc1127ae9 hv: fix undefined reference to nested_vmexit_handler() in vmexit.c
arch/x86/guest/nested.c, where nested_vmexit_handler() is defined, is
only compiled when CONFIG_NVMX_ENABLED is enabled. Define a dummy
function in include/arch/x86/asm/guest/nested.h to fix the undefined
reference linker error.

Tracked-On: #8465
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-08-01 14:22:14 +08:00
Wu Zhou
fa97e32917 hv: bugfix: skip invalid ffs64 return value
ffs64() returns INVALID_BIT_INDEX (0xffffU) when it tries to deal with
zero input value. This may happen In calculate_logical_dest_mask() when
the guest tries to write some illegal destination IDs to MSI config
registers of a pt-device. The ffs64() return value is used as per_cpu
array index, and it would cause a page fault.

This patch adds protection to the per_cpu array, making this function
return zero on illegal value. As in logical destination's definition, a
zero logical designation would point to no CPU.

Fixes: 89d11d91e
Tracked-On: #8454
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-07-14 17:38:16 +08:00
Wu Zhou
89d11d91e2 hv: bugfix: fix the ptdev irq destination issue
According to SDM Vol3 11.12.10, in x2APIC mode, Logical Destination has
two parts:
  - Cluster ID (LDR[31:16])
  - Logical ID (LDR[15:0])
Cluster ID is a numerical address, while Logical ID is a 16bit mask. We
can only use Logical ID to address multi destinations within a Cluster.

So we can't just 'or' all the Logical Destination in LDR registers to
get one mask for all target pCPUs. This would get a wrong destination
mask if the target Destinations are from different Clusters.

For example in ADL/RPL x2APIC LDRs for core 2-5 are 0x10001 0x10100
0x20001 0x20100. If we 'or' them together, we would get a Logical
Destination of 0x30101, which points to core 6 and another core.
If core 6 is running a RTVM, then the irq is unable to get to
core 2-5, causing the guest on core 2-5 driver fail.

Guests working in xAPIC mode may use 'Flat Model' to select an
arbitrary list of CPUs as its irq destination. HV may not be able to
include them all when transfering to physical destinations, because
the HW is working in x2APIC mode and can only use 'Cluster Model'.

There would be no perfect fix for this issue. This patch is a simple
fix, by just keep the first Cluster of all target Logical Destinations.

Tracked-On: #8435
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-07-05 17:41:16 +08:00
Jiaqing Zhao
e5d46dcc7d hv: vpci: ignore PCI I/O BAR with non-zero upper 16 bits
On x86 platform, the upper 16 bit of I/O BAR should be initialized to
zero by BIOS. Howerever, some buggy BIOS still programs the upper 16
bits to non-zero, which causes error in check_pt_dev_pio_bars(). Since
I/O BAR reprogramming by VM is currently unsupported, this patch
ignores such I/O BARs when creating vpci devices to make VM boot.

Tracked-On: #8373
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-26 14:40:57 +08:00
Jiaqing Zhao
e92320cf56 hv: allow non-service vm to read MSR_PLATFORM_INFO (CEh)
Guests bootloader may read MSR_PLATFORM_INFO to get TSC frequency for
time measurement, so injecting #GP on read may crash the vm on boot.
This patch emulates MSR_PLATFORM_INFO with 0, same behavior in kvm, to
tell the guest it's a virtualized environment.

Tracked-On: #8406
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2023-06-19 17:45:11 +08:00
Wu Zhou
db83648a8d hv: hide thermal interface from guests
Thermal events are delivered through lapic thermal LVT. Currently
ACRN does not support delivering those interrupts to guests by
virtual lapic. They need to be virtualized to provide guests some
thermal management abilities. Currently we just hide thermal
lvt from guests, including:

1. Thermal LVT:
There is no way to hide thermal LVT from guests. But we need do
something to make sure no interrupt can be actually trigered:
  - skip thermal LVT in vlapic_trigger_lvt()
  - trap-and-emulate thermal LVT in lapic-pt mode

2. As We have plan to introduce virtualization of thermal monitor in the
future, we use a vm flag GUEST_FLAG_VTM which is default 0 to control
the access to it. So that it can help enabling VTM in the future.

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-15 20:36:44 +08:00
Wu Zhou
972cdeb318 hv: Write _CPC to guests' ACPI when VHWP is enabled
When VHWP enabled, return 0 and px_cnt = 0 on ACRN_PMCMD_GET_PX_CNT,
so that DM will write _CPC to guests' ACPI.

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-09 10:06:42 +08:00
Wu Zhou
c5d019b836 hv: emulate cpuids and MSRs for VHWP
Changes made by this patch includes:
1. Emulate HWP and pstate MSRs/CPUIDs. Those are exposed to guest when
   the GUEST_FLAG_VHWP is set:
    - CPUID[6].EAX[7,9,10]: MSR_IA32_PM_ENABLE(enabled by hv, always read
      1), MSR_IA32_HWP_CAPABILITIES, MSR_IA32_HWP_REQUEST,
      MSR_IA32_HWP_STATUS,
    - CPUID[6].ECX[0]: MSR_IA32_MPERF, MSR_IA32_APERF
    - MSR_IA32_PERF_STATUS(read as base frequency when not owning pCPU)
    - MSR_IA32_PERF_CTL(ignore writes)
2. Always hide HWP interrupt and package control MSRs/CPUIDs:
    - CPUID[6].EAX[8]: MSR_IA32_HWP_INTERRUPT(currently ACRN is not able
      to deliver thermal LVT virtual interrupt to guests)
    - CPUID[6].EAX[11,22]: MSR_IA32_HWP_REQUEST_PKG, MSR_IA32_HWP_CTL

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-09 10:06:42 +08:00
Wu Zhou
2edf141047 hv: add VHWP guest flag and its helper func
Currently CPU frequency control is hidden to guests, and controlled
by hypervisor. While it is sufficient in most cases, some guest OS may
still need CPU performance info to make multi-core scheduling decisions.
This is seen on Linux kernel, which uses HWP highest performance level
as CPU core's priority in multi-core scheduling (CONFIG_SCHED_MC_PRIO).
Enabling this kernel feature could improve performance as single thread
workloads are scheduled on the highest performance cores. This is
significantly useful for guests with hybrid cores.

The concept is to expose performance interface to guest who exclusively
owns pCPU assigned to it. So that Linux guest can load intel_pstate
driver which will then provide the kernel with each core's schedule
priority.

Intel_pstate driver also relies on CONFIG_ACPI_CPPC_LIB to implement
this mechanic, this means we also need to provide ACPI _CPC in DM.

This patch sets up a guest flag GUEST_FLAG_VHWP to indicate whether
the guest can have VHWP feature.

Tracked-On: #8414
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
2023-06-09 10:06:42 +08:00
Jiaqing Zhao
e4e25f076c hv: sgx: refactor partition_epc()
This patch refactors partition_epc() to make the code easier to
understand, also fixes the maybe-uninitialized warning for gcc-13.

Initializing 'vm_config' to get_vm_config(0) is okay here as scenario
validator ensures CONFIG_MAX_VM_NUM to be always larger than 0.

Tracked-On: #8413
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2023-06-06 15:22:19 +08:00
Jiaqing Zhao
770cf8c434 hv: emulate MSR_PLATFORM_INFO (17h)
This patch emulates the PLATFORM_INFO MSR in hypervisor to make it
only visible to Service VM, and only processor ratios (bit 15:8,
47:40 and 55:48) and sample part bit (27) are exponsed. This is
intended to prevent Service VM from changing processor parameters
like turbo ratio.

Tracked-On: #8406
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
2023-05-30 15:10:05 +08:00
Qiang Zhang
fcb8e9bb2d ptirq: Fix INTx assignment for Post-launched VM
When assigning a physical interrupt to a Post-launched VM, if it has
been assigned to ServiceVM, we should remove that mapping first to reset
ioapic pin state and rte, and build new mapping for the Post-launched
VM.

Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2023-04-13 12:24:57 +08:00
Qiang Zhang
bf9341844a ptirq: Fix ptirq hash tables
ptirq_remapping_info records which physical interrupt is mapped to
the virtual interrupt in a VM. As we need to knonw whether a physical
sid has been mapped to a VM and whether a virtual sid in a VM has been
used, there should be two hash tables to link and iterate
ptirq_remapping_info:
- One is used to lookup from physical sid, linking phys_link.
- The other is used to lookup from virtual sid in a VM, linking virt_link

Without this patch, phys_link or virt_link from different
ptirq_remapping_info was linked by one hash list head if they got the
same hash value, as shown in following diagram.
When looking for a ptirq_remapping_info from physical sid, the original
code took all hash list node as phys_link and failed to get
ptirq_remapping_info linked with virt_link and later references
to its members are wrong.
The same problem also occurred when looking for a ptirq_remapping_info
from virtual sid and vm.

 ---------- <- hash table
|hlist_head|                                --actual ptirq_remapping_info address
 ----------                      --------- /    --used as ptirq_remapping_info
|    ...   |                    |phys_link| ___/
 ----------      ---------       --------- /     ---------
|hlist_head| -> |phys_link| <-> |virt_link| <-> |phys_link|
 ----------      ---------       ---------       ---------
|    ...   |    |virt_link|     | other   |     |virt_link|
 ----------      ---------      | members |      ---------
                | other   |      ---------      | other   |
                | members |                     | members |
                 ---------                       ---------

Tracked-On: #8370
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2023-04-13 12:24:57 +08:00
hangliu1
bb2118d98b hv: support for pci uart with high mmio
to enable early print output by pci uart, which has mmio address
above 4G, add the early pagetable map for the MMIO range.

we make an assumption that only map it with 2MB page, since platform
with 39bit memory width have no 1G huge page feature and we cannot
get the capacity at runtime, since it is a very early code stage.

v2->v1:
1. add hva2hpa_early
2. add process when *pdpte is not present

Signed-off-by: hangliu1 <hang1.liu@linux.intel.com>
Reviewed-by: fei1.li <fei1.li@intel.com>
Tracked-On: #6690
2022-11-21 16:50:05 +08:00
Zhangwei6
38f910e4fb hv: change the version format
The version info is mainly used to tell the user when and where the binary is
compiled and built, this will change the hv version format.

The hv follows the format:
major.minor-stable/unstable-remote_branch-acrn-commit_date-commit_id-dirty
DBG/REL(tag-current_commit_id) scenario@board build by author date.
The '(tag-current_commit_id)' is optional, it exits only when there are
tags for current commit.
e.g.
with tag:
ACRN:\>version
HV: 3.1-stable-release_3.1-2022-09-27-11:15:42-7fad37e02-dirty DBG(tag: v3.1)
scenario3.1@my_desk_3.1 build by zhangwei 2022-11-16 07:02:37
without tag:
ACRN:\>version
HV: 3.2-unstable-master-2022-11-16-14:34:49-11f53d849-dirty DBG
scenario3.1@my_desk_3.1 build by zhangwei 2022-11-16 06:49:44

Tracked-On #8303
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
2022-11-21 09:45:26 +08:00
Yuanyuan Zhao
0a4c76357e hv: hide mwait from guest.
When CPU support MONITOR/MWAIT, OS prefer to use it enter
deeper C-state.

Now ACRN pass through MONITOR/MWAIT to guest.

For vCPUs (ie vCPU A and vCPU B) share a pCPU, if vCPU A uses MWait to enter C state,
vCPU B could run only after the time slice of vCPU A is expired. This time slice of
vCPU A is gone to waste.

For Local APIC pass-through (used for RTVM), the guest pay more attention to
timeliness than power saving.

So this patch hides MONITOR/MWAIT by:
    1. Clear vCPUID.05H, vCPUID.01H:ECX.[bit 3] and
    MSR_IA32_MISC_ENABLE_MONITOR_ENA to tell the guest VM's vCPU
    does not support MONITOR/MAIT.
    2. Enable MSR_IA32_MISC_ENABLE_MONITOR_ENA bit for
    MSR_IA32_MISC_ENABLE inject 'GP'.
    3. Trap instruction 'MONITOR' and 'MWAIT' and inject 'UD'.
    4. Clear vCPUID.07H:ECX.[bit 5] to hide 'UMONITOR/UMWAIT'.
    5. Clear  "enable user wait and pause" VM-execution control, so
    UMONITOR/MWAIT causes an 'UD'.

Tracked-On: #8253
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@linux.intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2022-11-04 18:55:52 +08:00
hangliu1
f144e8089c HV: remove rewrite of PMU guest flag in acrn dm
Exclude "GUEST_FLAG_PMU_PASSTHROUGH" from DM_OWNED_GUEST_FLAG_MASK
in case device model rewrite the value in release mode, reserve it
in debug mode.

Signed-off-by: hangliu1 <hang1.liu@linux.intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On:#6690
2022-11-02 15:50:30 +08:00
Fei Li
ab4e19d0be hv: vPCI: fix large bar base update
The current code would write 'BAR address & size_maks' into PCIe virtual
BAR before updating the virtual BAR's base address when guest writing a
PCIe device's BAR. If the size of a PCIe device's BAR is larger than 4G,
the low 32 bits size_mask for this 64 bits BAR is zero. When ACRN updating
the virtual BAR's base address, the low 32 bits sizing information would
be lost.

This patch saves whether a BAR writing is sizing or not before updating the
virtual BAR's base address.

Tracked-On: #8267
Signed-off-by: Fei Li <fei1.li@intel.com>
2022-10-28 05:55:20 +08:00
Junjie Mao
e96937aa89 Makefile: do not try to figure out BOARD and SCENARIO for cleanup
The targets "clean" and "distclean" are special as they do not need BOARD
or SCENARIO from users. This patch stops the guesswork of BOARD and
SCENARIO if any of those two targets are specified.

It is assumed that "clean" or "distclean" is not invoked with other targets
at the same time.

Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2022-10-26 14:09:44 +08:00
Junjie Mao
64cbc706b7 Makefile: retake XMLs from users if they are updated
In the current config.mk board and scenario XMLs are only copied to the
build directory when they do not exist. That prevents users from using XML
files they have edited, probably to fix previously reported validation
errors, for a rebuild unless the former build directory is totally removed.

This patch adds the user-given paths to XML files (if they exist) as
dependencies of the copied files in the build directory, so that users can
now provide new board and/or scenario XMLs to an existing build to
automatically trigger a complete rebuild.

Building without explicitly specifying BOARD and SCENARIO is still
supported if a build directory already exists. In that case the copied
board and scenario XMLs will not be modified.

Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2022-10-26 14:09:44 +08:00
Junjie Mao
314f3004c7 Makefile: make dependencies on validated configurations explicit
ACRN config tools generate source files, scripts and binaries based on
users' inputs in the configurations files, i.e., board and scenario
XMLs. Those generation activities all assume that users' inputs are
properly validated, so that all assumptions they have on such inputs are
hold.

Unfortunately, not all dependencies on validated user inputs are explicitly
stated in the Makefiles today. That will cause random error messages
(typically a Python stack trace) to be printed along with validation errors
when an invalid scenario XML is given, and such messages confuse users
about the root causes of the failure.

This patch decouples scenario validation from genconf.sh and make that step
as a separate target that depends on the board and scenario XMLs. One
timestamp file is generated only when the validation succeeds so that
targets requiring validated XMLs can refer that file in its dependency list
to make it explicit.

Dependencies of the following targets are also updated accordingly:

  * $(HV_ALLOCATION_XML), which is the allocation XML including static
    allocation results, now also depends on validated XMLs.

  * $(HV_CONFIG_TIMESTAMP), which records when the config source files are
    generated, now also depends on validated XMLs.

  * $(SERIAL_CONF), which is the serial conf for the service VM, depends on
    the allocation XML.

By the way, the missing dependency on HV_CONFIG_DIR for touching
HV_DIFFCONFIG_LIST is added to fix the "file not found" issue.

Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2022-10-26 14:09:44 +08:00
Junjie Mao
cdf7796a62 Makefile: clean up unnecessary phony targets
Phony targets are mostly for recipes that are expected to be invoked
directly from the command line as a target and will always be executed. As
a result, it is in most cases not appropriate for a real file target to to
depend on a phony one, as that means the file will always be regenerated.

In the Makefile today, however, dependencies on phony targets are common
and cause the hypervisor to be fully rebuilt regardless of whether an
existing build exists or not.

This patch cleans up the following phony targets which are not meant to be
targets from the command line.

  - pre_build: This target has three outputs, namely the prebuild checker,
    the ACPI tables for pre-launched VMs and the serial.conf. It is split
    into three targets, one for each output.

  - headers: This target is an alias of dynamically-generated header
    files. It is replaced with a variable so that targets depending on
    "header" now depends on the actual header files generated.

With this patch, make will only rebuild modified files and targets
depending on them directly or indirectly.

Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2022-10-26 14:09:44 +08:00
Junjie Mao
0e8ce66af9 Makefile: specify default goal using the variable .DEFAULT_GOAL
The target "default" in hypervisor/Makefile is just an alias of the target
"all" in order to make "all" being the default goal. This patch replaces
that duplicate target and specifies the default goal by defining the
variable .DEFAULT_GOAL instead.

Tracked-On: #8259
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
2022-10-26 14:09:44 +08:00
Fei Li
7c940207d2 hv: vpci: fix pass-thru pcie device may access MSI-X BAR
Now ACRN would traps MSI-X Table Structure access and does MSI-X interrupt
remapping for pass-thru PCIe devices. ACRN does this trap by unmmapping the
address ranges where the MSI-X Table Structure locates in granularity of 4K
pages. So there may have other registers (non-MSI-X structures) in these
trapped pages

However, the guest may access these registers (non-MSI-X structures) in these
trapped pages, which needs to be forwarded to the physical device. This patch
forwards the access to real hardware for pass-thru PCIe devices.

Tracked-On: #8255
Signed-off-by: Fei Li <fei1.li@intel.com>
2022-10-26 01:02:20 +08:00
Fei Li
8d734dd915 hv: pci: use mmio_read/write directly
Use mmio_read/write directly.

Tracked-On: #8255
Signed-off-by: Fei Li <fei1.li@intel.com>
2022-10-26 01:02:20 +08:00
Fei Li
65454730de hv:io: wrap mmio read/write
When guest traps and wants to access a mmio region, the ACRN hypervisor
doesn't know the mmio size the guest wants to access until the trap
happens. In this case, ACRN should switch the mmio size and then call
mmio_read8/16/32/64 or mmio_write8/16/32/64 in each trap place.

This patch wrap the mmio read/write with a parameter to assign the mmio
size.

Tracked-On: #8255
Signed-off-by: Fei Li <fei1.li@intel.com>
2022-10-26 01:02:20 +08:00
Conghui
fbe1c39a9e hv: fix compile issue in release mode
fix compile issue for sbuf in release mode.

Tracked-On: #8241
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Fei Li <fei1.li@intel.com>
2022-10-21 14:24:02 +08:00
Chenli Wei
dcb0f05efc hv: refine the sworld memory allocate
The current code uses a predefined sworld memory array to reserve memory
for trusty VMs, and assume all post launched VMs are trusty VM which is
not correct.

This patch statically reserved memory just for trusty VMs and save 16M
memory for every non trusty VM.

Tracked-On: #6690
Signed-off-by: Chenli Wei <chenli.wei@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
2022-10-19 15:58:25 +08:00
Wu Zhou
fbdc2774af hv: add ACRN CPU frequency initializer
The design of ACRN CPU performance management is to let hardware
do the autonomous frequency selection(or set to a fixed value),
and remove guest's ability to control CPU frequency.

This patch is to implement the CPU frequency initializer, which will
setup CPU frequency base on the performance policy type.

Two performance policy types are provided for user to choose from:
  - 'Performance': CPU runs at its CPU runs at its maximum frequency.
    Enable hardware autonomous frequency selection if HWP is presented.
  - 'Nominal': CPU runs at its guaranteed frequency.

The policy type is passed to hypervisor through boot parameter, as
either 'cpu_perf_policy=Nominal' or 'cpu_perf_policy=Performance'.
The default type is 'Performance'.

Both HWP and ACPI p-state are supported. HWP is the first choice, for
it provides hardware autonomous frequency selection, while keeps
frequency transaction time low.

Two functions are added to the hypervisor to call:
  - init_frequency_policy(): called by BSP at start up time. It processes
    the boot parameters, and enables HWP if it is presented.
  - apply_frequency_policy(): called after init_frequency_policy().
    It applies initial CPU frequency policy setting for each core. It
    uses a set of frequency limits data struct to quickly decide what the
    highest/nominal frequency is. The frequency limits are generated by
    config-tools.

The hypervisor will not be governing CPU frequency after initial policy
is applied.

Cores running RTVMs are fixed to nominal/guaranteed frequency, to get
more certainty in latency. This is done by setting the core's frequency
limits to highest=lowest=nominal in config-tools.

Tracked-On: #8168
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-10-08 11:13:21 +08:00
Conghui
17f94605f0 hv: dispatch asyncio request
For an IO request, hv will check if it was registered in asyncio desc
list. If yes, put the corresponding fd to the shared buffer. If the
shared buffer is full, yield the vcpu and try again later.

Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-27 10:26:42 +08:00
Conghui
4c79354798 hv: add hypercall to register asyncio
Add hypercall to add/remove asyncio request info. Hv will record the
info in a list, and when a new ioreq is come, hv will check if it is
in the asyncio list, if yes, queue the fd to asyncio buffer.

Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-27 10:26:42 +08:00
Conghui
12bfa98a37 hv: support asyncio request
Current IO emulation is synchronous. The user VM need to wait for the
completion of the the I/O request before return. But Virtio Spec
introduces introduces asynchronous IO with a new register in MMIO/PIO
space named NOTIFY, to be used for FE driver to notify BE driver, ACRN
hypervisor can emulate this register by sending a notification to vCPU
in Service VM side. This way, FE side can resume to work without waiting
for the full completion of BE side response.

Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-27 10:26:42 +08:00
Conghui
9cf9606e56 hv: extend sbuf hypercall
Extend sbuf hypercall to support other kinds of share buffer.

Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-27 10:26:42 +08:00
Conghui
efb01db779 hv: change sbuf to a common infrastructure
sbuf is now only used for debug purpose, but later, it will be used as a
common interfaces. So, move the sbuf related code out of the debug directory.

Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-27 10:26:42 +08:00
Minggui Cao
6d4ca4b3a1 hv: improve smp call to support debugging RTVM
Improve SMP call to support ACRN shell to operate RTVM.
before, the RTVM CPU can't be kicked off by notification IPI,
so some shell commands can't support it, like rdmsr/wrmsr,
memory/registers dump. So INIT will be used for RTVM, which
LAPIC is pass-thru.

Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-26 13:28:02 +08:00
Minggui Cao
bc4c773cf8 hv: add param to control INIT used to kick pCPU
By default, notification IPI used to kick sharing pCPU, INIT
used to kick partition pCPU. If USE_INIT_IPI flag is passed to
hypervisor, only INIT will be used to kick pCPU.

Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-26 13:28:02 +08:00
Minggui Cao
2c140addaf hv: use kick-mode in per-cpu to control kick pCPU
INIT signal has been used to kick off the partitioned pCPU, like RTVM,
whose LAPIC is pass-through. notification IPI is used to kick off
sharing pCPU.

Add mode_to_kick_pcpu in per-cpu to control the way of kicking
pCPU.

Tracked-On: #8207
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-26 13:28:02 +08:00
Zhao Yakui
d0720096b0 ACRN:HV:VPCI: Forward access of PCI ROM bar_reg to DM for passthru device
The access to PCI config_space is handled in HV for Passthrough pci
devices. And it also provides one mechanism to forward cfg_access of
some registers to DM. For example: the opregion reg for GPU device.

This patch tried to add the support of emulated PCI ROM bar for the
device. And it doesn't handle the phys PCI ROM bar of phys PCI devices.
At the same time the rom firmware is provided in DM and pci rom bar_reg
is also emulated in DM, this leverages the quirk mechanism so that the
access to PCI rom bar_reg is forwarded to DM.

Tracked-On: #8175
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Wang Yu <yu1.wang@intel.com>
2022-09-23 18:12:01 +08:00
Wu Zhou
6a430de814 hv: remove CPU frequency control from guests
The design of ACRN CPU performance management is to let hardware
do the autonomous frequency selection(or set to a fixed value),
and remove guest's ability to control CPU frequency.

This patch is to remove guest's ability to control CPU frequency by
removing the guests' HWP/EIST CPUIDs and blocking the related MSR
accesses. Including:
  - Remove CPUID.06H:EAX[7..11] (HWP)
  - Remove CPUID.01H:ECX[7] (EIST)
  - Inject #GP(0) upon accesses to MSR_IA32_PM_ENABLE,
    MSR_IA32_HWP_CAPABILITIES, MSR_IA32_HWP_REQUEST,
    MSR_IA32_HWP_STATUS, MSR_IA32_HWP_INTERRUPT,
    MSR_IA32_HWP_REQUEST_PKG
  - Emulate MSR_IA32_PERF_CTL. Value written to MSR_IA32_PERF_CTL
    is just stored for reading. This is like how the native
    environment would behavior when EIST is disabled from BIOS.
  - Emulate MSR_IA32_PERF_STATUS by filling it with base frequency
    state. This is consistent with Windows, which displays current
    frequency as base frequency when running in VM.
  - Hide the IA32_MISC_ENABLE bit 16 (EIST enable) from guests.
    This bit is dependent to CPUID.01H:ECX[7] according to SDM.
  - Remove CPID.06H:ECX[0] (hardware coordination feedback)
  - Inject #GP(0) upon accesses to IA32_MPERF, IA32_APERF

Also DM do not need to generate _PSS/_PPC for post-launched VMs
anymore. This is done by letting hypercall HC_PM_GET_CPU_STATE sub
command ACRN_PMCMD_GET_PX_CNT and ACRN_PMCMD_GET_PX_DATA return (-1).

Tracked-On: #8168
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-21 03:48:58 +08:00
Jian Jun Chen
1bf984890b hv: tsc: start HPET counter before calibration
HPET is used to calibrate the tsc frequency if system fails to
get the accurate frequency from CPUID 0x15. But on some platforms
(for example: the emulated ACRN on QEMU) HPET is not started
by default, which causes the failure of calibration TSC by HPET.

Tracked-On: #8113
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-09-15 03:14:01 +08:00
Wu Zhou
8c5bb8b471 hv: change 'DISABLED' settings to 'ENABLED'
In order to improve DX, 'DISABLED' style configurator settings are
changed to 'ENABLED' style. The meaning of the MACROs are reversed,
so in the hv code we have to change '#ifndef' -> '#ifdef' or
'#ifdef' -> '#ifndef'.

Including:
  - MCE_ON_PSC_DISABLED -> MCE_ON_PSC_ENABLED
  - ENFORCE_TURNOFF_AC -> SPLIT_LOCK_DETECTION_ENABLED
  - ENFORCE_TURNOFF_GP -> UC_LOCK_DETECTION_ENABLED

Tracked-On: #7661
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
2022-08-17 09:23:33 +08:00
Conghui
51e6dc5864 hv: sched: fix bug when reboot vm
BVT schedule rule:
When a new thread is wakeup and added to runqueue, it will get the
smallest avt (svt) from runqueue to initiate its avt. If the svt is
smaller than it's avt, it will keep the original avt. With the svt, it
can prevent a thread from claiming an excessive share of CPU after
sleepting for a long time.

For the reboot issue, when the VM is reboot, it means a new vcpu thread
is wakeup, but at this time, the Service VM's vcpu thread is blocked,
and removed from the runqueue, and the runqueue is empty, so the svt is
0. The new vcpu thread will get avt=0. avt=0 means very high priority,
and can run for a very long time until it catch up with other thread's
avt in runqueue.
At this time, when Service VM's vcpu thread wakeup, it will check the
svt, but the svt is very small, so will not update it's avt according to
the rule, thus has a very low priority and cannot be scheduled.

To fix it, update svt in pick_next handler to make sure svt is align
with the avt of the first obj in runqueue.

Tracked-On: #7944
Signed-off-by: Conghui <conghui.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
2022-08-05 02:39:54 +08:00
Yonghua Huang
95a938e50a hv: validate inputs in vpci_mmio_cfg_access
This function is registered as PCI MMIO configuration
  access handler, which processes PCI configuration access
  request from ACRN guest hence the inputs shall be validated
  to avoid potential hypervisor crash when handling inputs
  from malicious guests.

Tracked-On: #7902
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-07-29 10:30:08 +08:00
Minggui Cao
83164d6030 hv: shell: improve console to modify input easier
1. make memcpy_erms as a public API; add a new one
  memcpy_erms_backwards, which supports to copy data from tail to head.

  2. improve to use right/left/home/end key to move cursor, and support
delete/backspace key to modify current input command.

Tracked-On: #7931
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-07-28 23:31:43 +08:00
Minggui Cao
d5b2c82156 hv: shell: improve console to buffer history cmds
1. buffer history commands.
  2. support up/down key to select history buffered commands

Tracked-On: #7931
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-07-28 23:31:43 +08:00
Jian Jun Chen
22a302599a hv: tlfs: fix the incorrect vLAPIC freq MSR
When LAPIC timer is working in oneshot or periodic mode, OS uses
initial counter register/current counter register to program
a timer. Both initial counter and current counter depend on the
LAPIC frequency. ACRN emulated vLAPIC timer based on the TSC.
vLAPIC freq is the same as TSC freq.

Tracked-On: #7876
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
2022-07-26 05:53:19 +08:00
Yifan Liu
4f4da08490 hv: cve hotfix: Disable RRSBA on platform using retpoline
For platform that supports RRSBA (Restricted Return Stack Buffer
Alternate), using retpoline may not be sufficient to guard against branch
history injection or intra-mode branch target injection. RRSBA must
be disabled to prevent CPUs from using alternate predictors for RETs.

Quoting Intel CVE-2022-0001/CVE-2022-0002:

Where software is using retpoline as a mitigation for BHI or intra-mode BTI,
and the processor both enumerates RRSBA and enumerates RRSBA_DIS controls,
it should disable this behavior.
...
Software using retpoline as a mitigation for BHI or intra-mode BTI should use
these new indirect predictor controls to disable alternate predictors for RETs.

See: https://www.intel.com/content/www/us/en/developer/articles/technical/
 software-security-guidance/technical-documentation/branch-history-injection.html

Tracked-On: #7907
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
2022-07-22 09:38:41 +08:00
Jian Jun Chen
c88860250e hv: tlfs: add tlfs TSC freq MSR support for WaaG
TLFS defined 2 vMSRs which can be used by Windows guest to get the
TSC/APIC frequencies from hypervisor. This patch adds the support
of HV_X64_MSR_TSC_FREQUENCY/HV_X64_MSR_APIC_FREQUENCY vMSRS whose
availability is exposed by CPUID.0x40000003:EAX[bit11] and EDX[bit8].

v1->v2:
- revise commit message to highlight that the changes are for WaaG

Tracked-On: #7876
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2022-07-18 16:15:29 +08:00
Jian Jun Chen
97a2919138 hv: tsc: calibrate TSC by HPET
On some platforms CPUID.0x15:ECX is zero and CPUID.0x16 can
only return the TSC frequency in MHZ which is not accurate.
For example the TSC frequency obtained by CPUID.0x16 is 2300
MHZ and the TSC frequency calibrated by HPET is 2303.998 MHZ
which is much closer to the actual TSC frequency 2304.000 MHZ.
This patch adds the support of using HPET to calibrate TSC
when HPET is available and CPUID.0x15:ECX is zero.

v3->v4:
  - move calc_tsc_by_hpet into hpet_calibrate_tsc

v2->v3:
  - remove the NULL check in hpet_init
  - remove ""& 0xFFFFFFFFU" in tsc_read_hpet
  - add comment for the counter wrap in the low 32 bits in
    calc_tsc_by_hpet
  - use a dedicated function for hpet_calibrate_tsc

v1->v2:
  - change native_calibrate_tsc_cpuid_0x15/0x16 to
    native_calculate_tsc_cpuid_0x15/0x16
  - move hpet_init to BSP init
  - encapsulate both HPET and PIT calibration to one function
  - revise the commit message with an example"

Tracked-On: #7876
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
2022-07-17 16:48:47 +08:00
Ziheng Li
eb8bcb06b3 Update copyright year range in code headers
Modified the copyright year range in code, and corrected "int32_tel"
into "Intel" in two "hypervisor/include/debug/profiling.h" and
"hypervisor/include/debug/profiling_internal.h".

Tracked-On: #7559
Signed-off-by: Ziheng Li <ziheng.li@intel.com>
2022-07-15 11:48:35 +08:00
Yifan Liu
05460f151a hv: Serialize WBINVD using wbinvd_lock
As mentioned in previous patch, wbinvd utilizes the vcpu_make_request
and signal_event call pair to stall other vcpus. Due to the fact that
these two calls are not thread-safe, we need to avoid concurrent call to
this API pair.

This patch adds wbinvd lock to serialize wbinvd emulation.

Tracked-On: #7887
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-07-14 09:05:37 +08:00
Yifan Liu
745e70fb06 hv: Change sched_event back to boolean-based implementation
Commit d575edf79a changes the internal
implementation of wait_event and signal_event to use a counter instead
of a boolean value.

The background was:
ACRN utilizes vcpu_make_request and signal_event pair to shoot down
other vcpus and let them wait for signals. vcpu_make_request eventually
leads to target vcpu calling wait_event.

However vcpu_make_request/signal_event pair was not thread-safe,
and concurrent calls of this pair of API could lead to problems.
One such example is the concurrent wbinvd emulation, where vcpus may
concurrently issue vcpu_make_request/signal_event to synchronize wbinvd
emulation.

d575edf commit uses a counter in internal implementation of
wait_event/signal_event to avoid data races.

However by using a counter, the wait/signal pair now carries semantics of
semaphores instead of events. Semaphores require caller to carefully
plan their calls instead of multiply signaling any number of times to the same
event, which deviates from the original "event" semantics.

This patch changes the API implementation back to boolean-based, and
re-resolve the issue of concurrent wbinvd in next patch.

This also partially reverts commit 10963b04d1,
which was introduced because of the d575edf.

Tracked-On: #7887
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
2022-07-14 09:05:37 +08:00
Shiqing Gao
3eb1237db3 config_tools: verify "iasl" version against IASL_MIN_VER
To avoid hardcoding the minimum "iasl" version in multiple places, IASL_MIN_VER
is defined in the top-level Makefile and is passed to config_tools.

This patch verifies "iasl" version against IASL_MIN_VER directly in
config_tools.

Tracked-On: #7880

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
2022-07-13 14:01:01 +08:00
Shiqing Gao
3f0fae81b2 config_tools: use ASL_COMPILER as the path to the "iasl" compiler
At build time (on the *dev* machine), config_tools depends on "iasl" to
generate the binary of ACPI tables for pre-launched VMs.

This patch does:
- pass ASL_COMPILER to config_tools
  By default, ASL_COMPILER is initialized by "which iasl" at build time.
  User could override it by specifying ASL_COMPILER as the build option,
  like below:
    make BOARD=xxxx SCENARIO=yyyy ASL_COMPILER=/usr/local/bin/iasl

- use ASL_COMPILER as the path to the "iasl" compiler in config_tools

v1 -> v2:
 - add a check to make sure ASL_COMPILER is initialized to a value

Tracked-On: #7880

Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
2022-07-13 14:01:01 +08:00
Qiang Zhang
95c4d18423 hv: compile out unused function if CONFIG_MULTIBOOT2 is disabled
When CONFIG_MULTIBOOT2 is disabled, 'create_service_vm_efi_mmap_desc' is
unused and build fails because [-Werror=unused-function] is set.

boot/guest/bzimage_loader.c:188:17: error: 'create_service_vm_efi_mmap_desc' defined but not used [-Werror=unused-function]
  188 | static uint16_t create_service_vm_efi_mmap_desc(struct acrn_vm *vm, struct efi_memory_desc *efi_mmap_desc)
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Tracked-On: #7634
Signed-off-by: Qiang Zhang <qiang4.zhang@linux.intel.com>
2022-07-07 11:25:31 +08:00