Device model uses polling mode to fetch the virtio blk request in RTVM.
When RTVM brings up with io uring, the threads handling io uring and vq are
not same, which would cause competition. To fix this issue, device
model should handle vq and io uring in the same thread to avoid conflict.
Tracked-On: #8737
Signed-off-by: YuanXin-Intel <xin.yuan@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
Prior to this patch, one single iothread instance is created and initialized
in the `main` function. This single iothread monitors all the registered fds
and handles all the corresponding requests. It leads to the limited flexibility
of the iothread support.
To improve the flexibility of the iothread support, this patch does:
- add the support of multiple iothread instances.
`iothread_create` is introduced to create a certain number of iothread
instances. It shall be called at first by each virtual device owner (such as
virtio-blk BE) on initialization phase. Then, `iothread_add` can be called
to add the to be monitored fd to the specified iothread.
- update virtio-blk BE to let the acrn-dm option `iothread` accept a number
as the number of iothread instances to be created.
If `iothread` is contained in the parameters, but the number is not specified,
one iothread instance would be created by default.
Examples to specify the number of iothread instances:
1. Create 2 iothread instances
`add_virtual_device 9 virtio-blk iothread=2,mq=2,/dev/nvme1n1,writeback,aio=io_uring`
2. Create 1 iothread instances (by default)
`add_virtual_device 9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback,aio=io_uring`
- update virtio-blk BE to separate the request handling of different virtqueues
to different iothreads.
The request from one or more virtqueues can be handled in one iothread.
The mapping between virtqueues and iothreads is based on round robin.
v1 -> v2:
* add a mutex to protect the free ioctx slot allocation
Tracked-On: #8612
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
io_uring is a high-performance asynchronous I/O framework, primarily designed
to improve the efficiency of input and output (I/O) operations in user-space
applications.
This patch enables io_uring in block_if module. It utilizes the interfaces
provided by the user-space library `liburing` to interact with io_uring
in kernel-space.
To build the acrn-dm with io_uring support, `liburing-dev` package needs to be
installed. For example, it can be installed like below in Ubuntu 22.04.
sudo apt install liburing-dev
In order to support both the thread pool mechanism and the io_uring mechanism,
an acrn-dm option `aio` is introduced. By default, thread pool mechanism is
selected.
- Example to use io_uring:
`add_virtual_device 9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback,aio=io_uring`
- Example to use thread pool:
`add_virtual_device 9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback,aio=threads`
- Example to use thread pool (by default):
`add_virtual_device 9 virtio-blk iothread,mq=2,/dev/nvme1n1,writeback`
v2 -> v3:
* Update iothread_handler
- Use the unified eventfd interfaces to read the counter value of the
ioeventfd.
- Remove the while loop to read the ioeventfd. It is not necessary
because one read would reset the counter value to 0.
* Update iou_submit_sqe to return an error code
The caller of iou_submit_sqe shall check the return value.
If there is NO available submission queue entry in the submission queue,
need to break the while loop. Request can only be submitted when SQE is
available.
v1 -> v2:
* move the logic of reading out ioeventfd from iothread.c to virtio.c, because
it is specific to the virtqueue handling.
Tracked-On: #8612
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
ACRN virtio devices are using a per device mutex to protect the
concurrent operations on the device's PIO/MMIO. This introduces
big contention in fast IO hence downgrades the IO performance,
for example virtio-blk with asyncio enabled. This patch introduces
per queue mutex to relieve such issues. Currently the per queue
mutex is only used in the asycio path when iothread is enabled.
Tracked-On: #8612
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
ACRN_IOEVENTFD_FLAG_ASYNCIO is not set when unregister ioeventfd
in the current implementation which will cause the old asyncio_desc
will be remained in hypervisor link list when switching from OVMF to
kernel.
Tracked-On: #8612
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
doxygen will warn that documented return type is found for functions
that does not return anything in 1.9.4 or later versions. 'None' is
not a special keyword in doxyge, it will recognize it as description
to the return value that does not exist in void functions.
Tracked-On: #8425
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Add the new parameter for register ioevent function, let the vhost
vq and viothread vq can share the register ioevent common API.
Tracked-On: #8323
Signed-off-by: Liu Long <long.liu@linux.intel.com>
Reviewed-by: Conghui <conghui.chen@intel.com>
Add a new flag in ioeventfd ioctl to support asyncio. After that, the IO
request will be processed in asyncio path by kernel and hypervisor.
Tracked-On: #8209
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
Fix the bug in iothread handler, the event should be read out so that the
next epoll_wait not return directly as the fd can still readable.
Tracked-On: #8181
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
Add ioeventfd and iothread to virtio framework. When a virtio device
claim to support iothread, virtio framework will register a ioeventfd
and add it to iothread's epoll. After that, the new notify will come
through the iothread instead of the vcpu thread. The notify handler will
be called to process the request.
Tracked-On: #7940
Signed-off-by: Conghui <conghui.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
The virtio feature bits is 64bits but the access will be split
as two times 32bits access. This patch fixes the bug which overwrites
another side 32bits, and causes feature bits are lost.
Tracked-On: #7456
Signed-off-by: Liu Long <long.liu@linux.intel.com>
Reviewed-by: Conghui <conghui.chen@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
Legacy VGA & VBE interface as a common interface is supported by
many legacy and modern OS. Many installer of OS distribution use
this interface to display the GUI of installer when setup a refresh
new installation on bare-metal. Besides, Windows OS always use this
interface to display it's BSOD, recovery mode & safe mode GUI. It
is need because Windows don't include virtio-gpu driver as their
in-box driver, VGA interface will be used before the virtio-gpu
driver been installed.
To be compatiable with the PCI bar layout of legacy VGA, the layout
is refined to meet with the requirement of legacy VGA and modern
virtio-gpu.
BAR0: VGA Framebuffer memory, 16 MB in size.
BAR2: MMIO Space
[0x0000~0x03ff] EDID data blob
[0x0400~0x041f] VGA ioports registers
[0x0500~0x0516] bochs display interface registers
[0x1000~0x17ff] Virtio common configuration registers
[0x1800~0x1fff] Virtio ISR state registers
[0x2000~0x2fff] Virtio device configuration registers
[0x3000~0x3fff] Virtio notification registers
BAR4: MSI/MSI-X
BAR5: Virtio port io
Tracked-On: #7210
Signed-off-by: Sun Peng <peng.p.sun@linux.intel.com>
Reviewed-by: Zhao, yakui <yakui.zhao@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
paddr_guest2host can return NULL, but code paths in virtio
are not checking the return value.
_vq_record() initializes iov_base pointer using paddr_guest2host()
but there is nothing in the flow that checks for NULL.
Chane _vq_record to return -1 in case the address translation
has failed.
Tracked-On: #5452
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
Add Null pointer check in init vq ring and add vq ring descriptor
check in case cause Nullpointer exception.
Tracked-On: #5355
Signed-off-by: Liu Long <long.liu@intel.com>
Add Null pointer check in init vq ring and add vq ring descriptor
check in case cause Nullpointer exception.
Tracked-On: #5355
Signed-off-by: Liu Long <long.liu@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
We can only call these callbacks when they are not NULL.
Tracked-On: #5342
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Wang, Yu1 <yu1.wang@intel.com>
use acrn-dm logger function instread of fprintf,
this helps the stability testing log capture.
Tracked-On: #4098
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Cao Minggui <minggui.cao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
In vxworks, virtio-console FE driver only initiate 2 virtqueues, but BE
creates 2+ virtqueues for it. So the rest of the virtqueues are not
initiated. vq->used->flags cannot be used directly without any
condition.
Tracked-On: #3203
Signed-off-by: Gao Junhao <junhao.gao@intel.com>
Reviewed-by: Yu Wang <yu1.wang@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
Make sure VQ_ALLOC is visible only after vq is completely initialized.
This ensures vq_ring_ready() is reliable when it returns true.
Tracked-On: #2763
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Without memory barrier, the change of used ring index could not
immediately detected by FE, this would bring some problems.
For virtio-blk FE driver, when it receives an interrupt, and confirms the
used ring index has changed, it will first set ring flags with
VRING_AVAIL_F_NO_INTERRUPT, then get buffer from virtqueue, after
process this request, it will mask VRING_AVAIL_F_NO_INTERRUPT, and get
used ring index again before return. If used ring changes, it will
process it. At the same time, BE will read this flags before each notify,
if VRING_AVAIL_F_NO_INTERRUPT was set, BE will not inject interrupt.
Without memory barrier, before FE mask VRING_AVAIL_F_NO_INTERRUPT, BE
has finished notify without interrupt, then FE mask
VRING_AVAIL_F_NO_INTERRUPT, and get used ring index but failed (index
has changed from BE side). FE will return from interrupt handler
function, and wait for next interrupt which was not injected by BE. Thus,
this will cause kernel hung.
Tracked-On: #2732
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Wang Yu <yu1.wang@intel.com>
It is possible for multiple timeouts to occur in one mevent epoll
iteration. Providing the number of timer expirations to the timer
callback handlers can be useful. E.g., this could improve emulation of
timing-sensitive hardware components.
Tracked-On: #2319
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Anthony Xu <anthony.xu@intel.com>
Reuse linux common virtio header file and remove the repetitive
definition.
Tracked-On: #2145
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
Device trap has great impact on latency of real time (RT) tasks.
This patch provide a virtio poll mode to avoid trap.
According to the virtio spec, backend devices can declare the
notification is not needed so that frontend will never trap.
This means the backends make commitment to the frontends they have a
poll mechanism which don’t need any frontends notification.
This patch uses a periodic timer to give backends pseudo notifications
so that drive them processing data in their virtqueues. People should
choose a appropriate notification peroid interval to use this poll
mode. Too big interval may cause virtqueue processing latency while
too small interval may cause high SOS CPU usage. The suggested interval
is between 100us to 1ms.
The poll mode is not enabled by default and traditional trap
notification mode will be used. To use poll mode for RT with interval
1ms. You can add following acrn-dm parameter.
--virtio_poll 1000000
Tracked-On: #1956
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
It is preferred to state the absence of a return value explicitly in the
doxygen-stile comments. Currently there are different styles of doing this,
including:
@return None
@return NULL
@return void
@return N/A
This patch unifies the above with `@return None`.
Tracked-On: #1595
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
movb is used for registers STATUS and CFGGENERATION whose size is 1
byte. Previously hv cannot report the correct MMIO trap size for
movb and virtio hard coded their size to 4 as a workaround. hv fixed
movb instruction emulation and MMIO size can be reported correctly.
This patch removes those workaround.
commit 9df8790ffc ("hv: Fix two minor issues in instruction emulation code")
Tracked-On: #1449
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Yin Fengwei <fengwei.yin@intel.com>
Some virtio ring structures and virtio feature bits are using the
same name/definition as those in kernel header files(linux/
virtio_ring.h, linux/virtio_config.h). Kernel header files must
be included to perform ioctls to support vhost. There are
compiling errors due to duplicated definitions. In this patch
the following renamings are done:
VRING_DESC_F_NEXT -> ACRN_VRING_DESC_F_NEXT
VRING_DESC_F_WRITE -> ACRN_VRING_DESC_F_WRITE
VRING_DESC_F_INDIRECT -> ACRN_VRING_DESC_F_INDIRECT
VRING_AVAIL_F_NO_INTERRUPT -> ACRN_VRING_AVAIL_F_NO_INTERRUPT
VRING_USED_F_NO_NOTIFY -> ACRN_VRING_USED_F_NO_NOTIFY
VIRTIO_F_NOTIFY_ON_EMPTY -> ACRN_VIRTIO_F_NOTIFY_ON_EMPTY
VIRTIO_RING_F_INDIRECT_DESC -> ACRN_VIRTIO_RING_F_INDIRECT_DESC
VIRTIO_RING_F_EVENT_IDX -> ACRN_VIRTIO_RING_F_EVENT_IDX
VIRTIO_F_VERSION_1 -> ACRN_VIRTIO_F_VERSION_1
vring_avail -> virtio_vring_avail
vring_used -> virtio_vring_used
vring_size -> virtio_vring_size
Tracked-On: #1329
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
currently, each virtio device has their own virtio_ops implementation.
Take virtio-blk for example:
static struct virtio_ops virtio_blk_ops = {
"virtio_blk",
1,
sizeof(struct virtio_blk_config),
virtio_blk_reset,
virtio_blk_notify,
virtio_blk_cfgread,
virtio_blk_cfgwrite,
NULL,
NULL,
VIRTIO_BLK_S_HOSTCAPS,
};
If start DM with two virtio-blk, this global variable will be
assigined to two virtio-blk instances. Changing hv_caps for one
instance will affect others. But different instances may need
different capabilities.
To support this requirement, we suggest to move hv_caps to
virtio_base structure, and each instance can return their own
capabilities.
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
The VIRTIO_PCI_CAP_PCI_CFG capability creates an alternative access
method to the common configuration, notification, ISR and device-
specific configuration regions.
To access a device region, the driver writes into the capability
structure (ie. within the PCI configuration space) as follows:
- The driver sets the BAR to access by writing to cap.bar
- The driver sets the size of the access by writing 1, 2 or 4 to
cap.length
- The driver sets the offset within the BAR by writing to cap.offset
At that point, pci_cfg_data will provide a window of size cap.length
into the given cap.bar at offset cap.offset.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
When the device has experienced an error from which it cannot
re-cover, DEVICE_NEEDS_RESET is set to the device status register
and a config change intr is sent to the guest driver.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Virtio modern changed the virtqueue cofiguration precedures. GPA
of descriptor table, available ring and used ring are written to
common configuration registers separately. A final write to
Q_ENABLE register triggered initialization of the virtqueue on
the backend device.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
virtio_notify_cfg_write is called when guest driver performs virtqueue
kick by writing the notificaiton register of the virtqueue.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Registers in the isr configuration region are read-only.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch implements the read/write callbacks for the registers in the
device-specific region. This region is implemented in the modern MMIO
bar.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch implements the read/write callbacks for the registers in the
common configuration region. This region is implemented in the modern
MMIO bar.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch implements the generic PCI barread/barwrite callbacks.
Specific barread/barwrite interfaces are called based on the baridx.
Virtio legacy devices, transitional devices and modern devices can
be handled in an unified way.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
virtio_set_modern_bar is an external interface that backend virtio
driver can call to initialize the PCI capabilities and PCI bars
defined in the virtio 1.0 spec.
The following are done in the function:
- 5 PCI capabilities are added to the PCI configuration space of the
virtio PCI device. (common/isr/device_specific/notify/cfg_access)
- A 64-bit MMIO bar is allocated to accommodate the registers defined
in the 4 PCI capabilities. (cfg_access capability does not require
MMIO.)
- If use_notify_pio is true, a PIO notify capability is added to the
PCI configuration space and a PIO bar is allocated for it
accordingly.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Struct virtio_base and struct virtio_vq_info are expanded to support
virtio 1.0 framework. The BAR layouts of virtio legacy/transitional/
modern are introduced as well.
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Hao Li <hao.l.li@intel.com>
Reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>