Some cpuids will return invalid values on hybrid platform because of the
error in the pointer arithmetic. Add `(void *)` before
`cpu_cpuids.leaves`.
Leaf 0x14 is used to report Intel Processor Trace Enumeration and varies
between P-cores and E-cores on hybrid platform. So add it to
`hybrid_leaves`.
Tracked-On: #8608
Fixes: 59a8cc4c2 ("hv: cpuid: make leaf 0x4 per-cpu in hybrid architecture")
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
In HV, cpuid uses the lower 32 bits of rax\rbx\rcx\rdx registers to pass parameters,
But the software does not clear the upper 32-bit registers, if the guest
uses 64-bit variables to pass parameters to cpuid,guest will use rax\rbx\rcx\rdx,
not eax\ebx\ecx\edx, the previous value of the high 32 registers will affect the guest.
Tracked-On: #8605
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: andi6 <andi6@xiaomi.com>
Virtio legacy device (ver < 1.0) uses a single PIO for all virtqueues.
Notifications from different virtqueues are implemented by writing
virtqueue index to the PIO. Writing different values to the same addr
needs to be mapped to different eventfds by asyncio. This is called
data match feature of asyncio.
v3 -> v4:
* Update the definition of `struct asyncio_desc`
Use `struct acrn_asyncio_info` inside it, instaed of defining the duplicated
fileds.
* Update `add_asyncio` to use `memcpy_s` rather than assigning all the fields
using 5 assignment statements.
* Update `asyncio_is_conflict` for coding style
120-character line is sufficient to write all conditions.
* Update the checks related to `wildcard`
Because we require every conditional clause to have a Boolean type
in the coding guideline.
v2 -> v3:
No change
v1 -> v2:
No change
Tracked-On: #8612
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
The create function of hv-emulated device must check the return value
of vpci_init_vdev() as it returns NULL pointer on failure, and that
function should be called atomically.
Also, the destory function should deinit the vpci devices created to
prevent resource leak.
Tracked-On: #8590
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
In devicemodel, a passthrough device is deassigned and then assigned to
guest on guest reboot. Each time hypervisor allocates a new pci_vdev
structure to keep its info. As it was stored in a statically-allocated
array, it will eventually use up all slots, resulting both resource
leak and out-of-bounds access.
Fix it by clearing the corresponding vdev structure on device deassign,
thus a bitmap is introduced to track the usage, replacing the existing
array count.
Tracked-On: #8590
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
P-cores and E-cores accessing leaf 0x2U/0x14U/0x16U/0x18U/0x1A/0x1C/0x80000006U
will have different information in hybrid architecture.
So add them to per-cpu list in hybrid architecture and directly return
the physical value.
Note: 0x14U is hided and return 0.
Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Leaf 0x6 returns thermal and power management information. In
hybrid architecture, P-cores and E-cores have different information.
Add leaf 0x6 to per-cpu list in hybrid architecture and handle specific
cpuid access.
Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
Leaf 0x4 returns deterministic cache parameters for each level. In
hybrid architecture, P-cores and E-cores have different cache
information.
Add leaf 0x4 to per-cpu list in hybrid architecture and handle specific
cpuid access.
Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
CPUID returns processor identification and feature information.
Different pcpus may return different infos. That is, the info is
per-cpu.
In hybrid architecture, per-cpu leaf is different from the previous. So
introduce a struct percpu_cpuids to indicate the per-cpu leaf. struct
percpu_cpuids will consist of two parts: generic percpu leaves and
hybrid related percpu leaves.
This patch is just to add generic percpu leaves.
Tracked-On: #8608
Signed-off-by: Haiwei Li <haiwei.li@intel.com>
CPUID leaf 1f is preferred superset of leaf 0b, currently ACRN exposes
leaf 0b but leaf 1f is empty so the 2 leaves mismatch, and so
application will follow the SDM to check 1f first.
Tracked-On: #8608
Signed-off-by: Xin Zhang <xin.x.zhang@intel.com>
This patch can fetch the thermal lvt irq and propagate
it to VM.
At this stage we support the case that there is only one VM
governing thermal. And we pass the hardware thermal irq to this VM.
First, we register the handler for thermal lvt interrupt, its irq
vector is THERMAL_VECTOR and the handler is thermal_irq_handler().
Then, when a thermal irq occurs, it flags the SOFTIRQ_THERMAL bit
of softirq_pending, This bit triggers the thermal_softirq() function.
And this function will inject the virtual thermal irq to VM.
Tracked-On: #8595
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
In this phase, we only use one VM to control thermal.
So we make thermal MSRs readable and writable by this VM.
This VM is flagged with GUEST_FLAG_VTM, and can
read/write thermal MSRs.
For the VMs not flagged with GUEST_FLAG_VTM,
can only read these thermal MSRs to get current status.
Tracked-On: #8595
Signed-off-by: Zhangwei6 <wei6.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When RX FIFO is not empty and Receive Data Available interrupt is
enabled, vUART should report a Receive Data Available (IIR_RXRDY) in IIR
instead of a Timeout Interrupt Pending (IIR_RXTOUT).
Tracked-On: #8583
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
The break key (key value 0x0) was used as switch key from guest serial
to hv console and guest serial could not receive break key. This blocked
some guest debugging features like KGDB/KDB, sysrq, etc.
This patch leverages escape sequence "<escape> + <break>" to send break to
guest and "<escape> + e" to switch from guest serial to hv console.
Tracked-On: #8583
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
In hcall_vm_intr_monitor(), the default case for intr_hdr->cmd is a
wrong case. So, it should return error code back. But it returns success
code 0 in current codes.
Tracked-On: #8580
Reviewed-by: Fei Li <fei1.li@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Leave canary of stack protector untouched on pCPU
if it has been initialized, instead of generating a new one.
Tracked-On: #8577
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
1) region ID shall be configured by user via config tool.
2) region ID is programmed to "Subsystem ID" of PCI config space.
2) "Subsystem Vendor ID" is harded coded as 0x8086
Tracked-On: #8566
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch adds ivshmem region ID configuration support when user
configure ACRN IVSHMEM devices via ACRN config tool, this ID provides
VMs with a stable identification of multiple shared memory regions.
Also add logic to generate launch script with region ID configured
as below:
`add_virtual_device 8 ivshmem hv:/shm_region_0,256,1`
Tracked-On: #8566
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
ppt_page_pool.bitmap should be zero-initialized. Also fixes the wrong
indention in allocate_ppt_pages().
Tracked-On: #8559
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Add a new option CONSOLE_VM in scenario to set the default vm to be
outputted in hv console, when it is not set, acrn console will be
used (current behavior). This is intended for debugging vm boot issues.
Tracked-On: #8518
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
For uart console, some control keys are defined as byte sequences,
such as:
* up arrow - 0x1b/0x5b/0x41
* F8 - 0x1b/0x5b/0x31/0x39/0x7e
Currently hv console only read one char per poll.
When guest vuart console is active, those byte sequences may not be sent
to guest vuart in good timing due to the poll interval. Thus control keys
such as up/down can not be used in shell or vim.
The solution is to read all input chars in one poll, so that control keys
can be received by guest OS properly.
Tracked-On: #8564
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
sbuf_put copies sbuf->ele_size of data, and puts into ring. Currently
this function assumes that data size from caller is no less than
sbuf->ele_size.
But as sbuf->ele_size is usually setup by some sources outside of the HV
(e.g., the service VM), it is not meant to be trusted. So caller should
provide the max length of the data for safety reason. sbuf_put() will
return UINT32_MAX if max_len of data is less than element size.
Additionally, a helper function sbuf_put_many() is added for putting
multiple entries.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
In the triple fault handler, post-launched VMs are instantly turned
off. Now a vm event is generated simultaneously. So that
developers can capture the event and decide what to do with it. (e.g.,
logging and populating diagnostics, or poweroff VM)
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
This patch adds support for HV vrtc vm_event.
RTC change event is sent upon each date/time reg write. Those events
will be handled in DM. DM will try to emit an RTC change event(to
Libvirt) based on its strategy. Only support post-launched VMs.
The DM event handler has already implemented the rtc chanage event.
Those events will be processed the same way as vrtc events from DM vrtc.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When a guest OS performs an RTC change action, we wish this event be
captured by developers, and then they can decide what to do with it.
(e.g., whether to change physical RTC)
There are some facts that makes RTC change event a bit complicated:
- There are 7 RTC date/time regs (year, month…). They can only be
updated one by one.
- RTC time is not reliable before date/time update is finished.
- Guests can update RTC date/time regs in any order.
- Guests may update RTC date/time regs during either RTC halted or not
halted.
A single date/time update event is not reliable. We have to wait for
the guest to finish the update process. So the DM's event handler
sets up a timer, and wait for some time (1 second). If no more change
happens befor the timer expires, we can conclude that the RTC
change has been done. Then the rtc change event is emitted.
This logic of event handler can be used to process HV vrtc time change
event too.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch creates a thread for vm_event delivery. The thread uses epoll
to poll event notifications, then read out the msg data queued in sbuf.
An event handler is called upon success receiving. Both HV and DM event
sources share the same process.
Also vm_event tx API for DM event source is added in this patch.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch creates vm_event support in HV, including:
1. Create vm_event data type.
2. Add vm_event sbuf and its initializer. The sbuf will be allocated by
DM in Service VM. Its page address will then be share to HV through
hypercall.
3. Add an API to send the HV generated event.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently ivshmem device can only be configurated as single function
device(bdf.f = 0) on bus 0. This greatly limits the number of ivshmem
devices we can create. This patch is to enable multiple function bit in
HEADER_TYPE config register, so that we can create many more ivshmem
devices by using different function numbers on one bus:dev.
The multi function device bit is to be set on ivshmem devices whose function
number equls 0. PCI spec describe it as: ‘When Set, indicates that the
Device may contain multiple Functions, but not necessarily.’, So if this
dev is the only one on the bus:dev, it is still OK.
Tracked-On: #8520
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Per BVT (Borrowed Virtual Time) scheduler design, following per thread
parameters are required to tune scheduling behaviour.
- weight
The time sharing of a thread on CPU.
- warp
Boost value of virtual time of a thread (time borrowed from future) to
reduce Effective Virtual Time to prioritize the thread.
- warp_limit
Max warp time in one warp.
- unwarp_period
Min unwarp time after a warp.
As of now, only weight is in use to tune virtual time ratio of VCPU
threads from different VMs. Others parameters are for future extension.
Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Add four per-vm bvt parameters as the initial bvt parameter values for
vCPU threads.
- bvt_weight
The time sharing of a thread on CPU.
- bvt_warp_value
Boost value of virtual time of a thread (time borrowed from future) to
reduce Effective Virtual Time to prioritize the thread.
- bvt_warp_limit
Max warp time in one warp.
- bvt_unwarp_period
Min unwarp time after a warp.
Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Abstract out schedulers config data for vCPU threads and other hypervisor
threads to sched_params structure. And it's used to initialize per
thread scheduler private data. The sched_params for vCPU threads come
from vm_config generated by config tools while other hypervisor threads
need give them explicitly.
Tracked-On: #8500
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
make_request sets the request bit, and signal_event wakes the vcpu
thread. If we signal_event comes first, the target vCPU has a chance to
sleep again before processing the request bit.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When all vCPU threads on one pCPU are put to sleep (e.g., when all
guests execute HLT), hv would schedule to idle thread. Currently the
idle thread executes PAUSE which does not enter any c-state and consumes
a lot of power. This patch is to support HLT in the idle thread.
When we switch to HLT, we have to make sure events that would wake a
vCPU must also be able to wake the pCPU. Those events are either
generated by local interrupt or issued by other pCPUs followed by an
ipi kick.
Each of them have an interrupt involved, so they are also able to wake
the halted pCPU. Except when the pCPU has just scheduled to idle thread
but not yet halted, interrupts could be missed.
sleep-------schedule to idle------IRQ ON---HLT--(kick missed)
^
wake---kick|
This areas should be protected. This is done by a safe halt
mechanism leveraging STI instruction’s delay effect (same as Linux).
vCPUs with lapic_pt or hv with CONFIG_KEEP_IRQ_DISABLED=y does not allow
interrupts in root mode, so they could never wake from HLT (INIT kick
does not wake HLT in root mode either). They should continue using PAUSE
in idle.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When bvt scheduler picks up a thread to run, it sets up a counter
‘run_countdown’ to determine how many ticks it should remain running.
Then the timer will decrease run_countdown by 1 on every 1000Hz tick
interrupt, until it reaches 0. The tick interrupt consumes a lot of
power during idle (if we are using HLT in idle thread).
This patch is to switch the 1000 HZ timer to a dynamic one, which only
interrupt on run_countdown expires.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
ACRN boot fails when size of ivshmem device is
configured to 512MB, this patch allocates this memory
from E820 table instead of reserving in hypervisor.
Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
~0UL is widely used to specify the maximum memory size
when calling e820_alloc_memory(), this patch to define
a MACRO for it to avoid using this magic number.
Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
For a pdev which allocated to prelaunched VM or owned by HV, we need to check
whether it is a multifuction dev at function 0. If yes we have to emulate a
dummy function dev in Service VM, otherwise the sub-function devices will be
lost in guest OS pci probe process.
Tracked-On: #8492
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
When we do init_all_dev_config() in pci.c, the pdevs added to pci dev_config
will be exposed to Service VM or passthru to prelauched VM. The original code
would find service VM config in every pci pdev init loop, this is unnecessary
and definitely impact performance. Here we generate Service VM config pointer
with config tool so that init_one_dev_config() could refer service VM config
directly.
Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Rename is_allocated_to_prelaunched_vm to allocate_to_prelaunched_vm as
it not only checks whether the PCI device is allocated to a Pre-launched
VM but also associate it with Pre-launched VM's dev_config.
Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
On some platforms, the last e820 entry may not be of type E820_TYPE_RAM,
such as E820_TYPE_ACPI_NVS which may also be used by Service VM.
So we need take all e820 entry types into account when finding the upper
bound of Service VM EPT mapping.
Tracked-On: #8495
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
The ivshmem spec define the BAR0 offset > 16 are reserved.
So ACRN need ignore all operation when offset out of range.
Tracked-On: #8487
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Add the ivshmem_dev_lock to protect races between release
and allocations.
Tracked-On: ##8486
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
ACRN sets the ivposition to the VM_ID in ivshmem_server_bind_peer().
This value should be saved in the ivshmem_device until unbind.
It is wrong to clear ivs_dev->mmio in the ivshmem_vbar_map(),
Instead, it should clear the ivshmem_device structure in the
create_ivshmem_device to ensure the same initial states
after VM reboot case.
Tracked-On: #8485
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <Junjie.mao@intel.com>
doxygen will warn that documented return type is found for functions
that does not return anything in 1.9.4 or later versions. 'None' is
not a special keyword in doxyge, it will recognize it as description
to the return value that does not exist in void functions.
Tracked-On: #8425
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>