.. _sriov_virtualization:

SR-IOV Virtualization
=====================

SR-IOV (Single Root Input/Output Virtualization) isolates PCIe devices so
that VMs can reach performance close to bare-metal levels. SR-IOV consists
of two basic units: the PF (Physical Function), which supports the SR-IOV
PCIe extended capability and manages the entire physical device; and the VF
(Virtual Function), a "lightweight" PCIe function that is passed through to
VMs.

For details, refer to Chapter 9 of PCI-SIG's `PCI Express Base Specification Revision 4.0, Version 1.0 <https://pcisig.com/pci-express-architecture-configuration-space-test-specification-revision-40-version-10>`_.

SR-IOV Architectural Overview
-----------------------------

.. figure:: images/sriov-image1.png
   :align: center
   :name: SR-IOV-architecture-overview

   SR-IOV Architectural Overview

- **SI** - A System Image known as a VM.

- **VI** - A Virtualization Intermediary known as a hypervisor.

- **SR-PCIM** - A Single Root PCI Manager; it is a software entity for
  SR-IOV management.

- **PF** - A PCIe Function that supports the SR-IOV capability
  and is accessible to an SR-PCIM, a VI, or an SI.

- **VF** - A “light-weight” PCIe Function that is directly accessible by an
  SI.

SR-IOV Extended Capability
--------------------------

The SR-IOV Extended Capability defined here is a PCIe extended
capability that must be implemented in each PF device that supports the
SR-IOV feature. This capability is used to describe and control a PF’s
SR-IOV Capabilities.

.. figure:: images/sriov-image2.png
   :align: center
   :name: SR-IOV-extended-capability

   SR-IOV Extended Capability

- **PCIe Extended Capability ID** - 0010h.

- **SR-IOV Capabilities** - VF Migration-Capable and ARI-Capable.

- **SR-IOV Control** - Enable/Disable VFs; VF migration state query.

- **SR-IOV Status** - VF Migration Status.

- **InitialVFs** - Indicates to the SR-PCIM the number of VFs that are
  initially associated with the PF.

- **TotalVFs** - Indicates the maximum number of VFs that can be
  associated with the PF.

- **NumVFs** - Controls the number of VFs that are visible. *NumVFs* <=
  *InitialVFs* = *TotalVFs*.

- **Function Link Dependency** - The field used to describe
  dependencies between PFs. VF dependencies are the same as the
  dependencies of their associated PFs.

- **First VF Offset** - A constant that defines the Routing ID
  offset of the first VF that is associated with the PF that contains
  this Capability structure.

- **VF Stride** - Defines the Routing ID offset from one VF to the
  next one for all VFs associated with the PF that contains this
  Capability structure.

- **VF Device ID** - The field that contains the Device ID that should be
  presented for every VF to the SI.

- **Supported Page Sizes** - The field that indicates the page sizes
  supported by the PF.

- **System Page Size** - The field that defines the page size the system
  will use to map the VFs’ memory addresses. Software must set the
  value of the *System Page Size* to one of the page sizes set in the
  *Supported Page Sizes* field.

- **VF BARs** - Fields that must define the VF’s Base Address
  Registers (BARs). These fields behave as normal PCI BARs.

- **VF Migration State Array Offset** - Register that contains a
  PF BAR relative pointer to the VF Migration State Array.

- **VF Migration State Array** - Located using the VF Migration
  State Array Offset register of the SR-IOV Capability block.

For details, refer to the *PCI Express Base Specification Revision 4.0, Version 1.0 Chapter 9.3.3*.

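
The *First VF Offset* and *VF Stride* fields described above determine where
each VF appears on the bus. The following is a minimal sketch of that Routing
ID arithmetic, not ACRN code. The function names are hypothetical, and the
PF address, offset, and stride values are illustrative assumptions chosen for
the example.

.. code-block:: python

   def vf_routing_id(pf_rid: int, first_vf_offset: int, vf_stride: int,
                     vf_index: int) -> int:
       """Return the 16-bit Routing ID of VF number vf_index (1-based)."""
       if vf_index < 1:
           raise ValueError("VF numbering starts at 1")
       return (pf_rid + first_vf_offset + (vf_index - 1) * vf_stride) & 0xFFFF


   def format_bdf(rid: int) -> str:
       """Format a Routing ID as bus:device.function (non-ARI layout)."""
       bus = (rid >> 8) & 0xFF
       device = (rid >> 3) & 0x1F
       function = rid & 0x7
       return f"{bus:02x}:{device:02x}.{function}"


   # Assumed values: PF at 6d:00.0, First VF Offset 0x80, VF Stride 2.
   pf_rid = 0x6D00
   for n in (1, 2):
       print(n, format_bdf(vf_routing_id(pf_rid, 0x80, 2, n)))

With these assumed values, the first VF lands at ``6d:10.0`` and the second
at ``6d:10.2``.
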
SR-IOV Architecture In ACRN
---------------------------

.. figure:: images/sriov-image3.png
   :align: center
   :name: SR-IOV-architecure-in-acrn

   SR-IOV Architecture in ACRN

1. The hypervisor detects an SR-IOV capable PCIe device during the physical
   PCI device enumeration phase.

2. The hypervisor intercepts accesses to the PF's SR-IOV capability and
   decides whether to enable or disable VF devices based on the
   *VF\_ENABLE* state. All read/write requests for a PF device are passed
   through to the PF physical device.

3. The hypervisor waits for 100 ms after *VF\_ENABLE* is set and then
   initializes the VF devices. An SR-IOV VF device differs from a normal
   passthrough device in physical device detection, BAR initialization, and
   MSI-X initialization. The hypervisor uses the *Subsystem Vendor ID* to
   detect the SR-IOV VF physical device instead of the *Vendor ID*, since no
   valid *Vendor ID* exists for the SR-IOV VF physical device. The VF BARs
   are initialized from the associated PF's SR-IOV capability, not from the
   PCI standard BAR registers. The MSI-X mapping base address also comes
   from the PF's SR-IOV capability, not from the PCI standard BAR registers.

SR-IOV Passthrough VF Architecture In ACRN
------------------------------------------

.. figure:: images/sriov-image4.png
   :align: center
   :name: SR-IOV-vf-passthrough

   SR-IOV VF Passthrough Architecture In ACRN

1. The SR-IOV VF device must be bound to the pci-stub driver instead of the
   vendor-specific VF driver before it is passed through.

2. The user configures the ``acrn-dm`` boot parameter with the passthrough
   SR-IOV VF device. When the User VM starts, ``acrn-dm`` invokes a
   hypercall to set the *vdev-VF0* device in the User VM.

3. The hypervisor emulates *Device ID/Vendor ID* and *Memory Space Enable
   (MSE)* in the configuration space for an assigned SR-IOV VF device. The
   assigned VF *Device ID* comes from its associated PF's capability. The
   *Vendor ID* is the same as the PF's *Vendor ID*, and the *MSE* is always
   set when reading the SR-IOV VF device's *CONTROL* register.

4. The vendor-specific VF driver in the target VM probes the assigned SR-IOV
   VF device.

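
The emulation in step 3 can be pictured as a small filter on 32-bit
configuration reads. The sketch below is a toy model rather than the
hypervisor's actual code: the function name and its parameters are
hypothetical. The register offsets and the MSE bit position follow the
standard PCI configuration header, and the IDs in the usage note are the ones
shown in the usage guide in this document.

.. code-block:: python

   PCI_VENDOR_DEVICE_OFFSET = 0x00  # Vendor ID (low 16 bits), Device ID (high 16 bits)
   PCI_COMMAND_OFFSET = 0x04        # Command register (low 16 bits of this dword)
   PCI_COMMAND_MSE = 1 << 1         # Memory Space Enable bit


   def emulated_vf_read32(offset: int, hw_value: int,
                          pf_vendor_id: int, vf_device_id: int) -> int:
       """Return the value a User VM would see for a 32-bit config read.

       hw_value is what the VF physical device returned for this offset.
       """
       if offset == PCI_VENDOR_DEVICE_OFFSET:
           # Vendor ID is taken from the PF, Device ID from the PF's
           # SR-IOV capability (VF Device ID field).
           return (vf_device_id << 16) | pf_vendor_id
       if offset == PCI_COMMAND_OFFSET:
           # MSE always reads as set for an assigned VF.
           return hw_value | PCI_COMMAND_MSE
       return hw_value

For example, with a PF *Vendor ID* of ``0x8086`` and a *VF Device ID* of
``0x10ca``, a read at offset 0 yields ``0x10ca8086`` regardless of what the
VF hardware returns.
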
SR-IOV Initialization Flow
--------------------------

.. figure:: images/sriov-image5.png
   :align: center
   :name: SR-IOV-init-flow

   SR-IOV Initialization Flow

When an SR-IOV capable device is initialized, all accesses to the
configuration space are passed through to the physical device directly.
The Service VM can identify all capabilities of the device from the SR-IOV
extended capability and then create a *sysfs* node for SR-IOV management.

SR-IOV VF Enable Flow
---------------------

.. figure:: images/sriov-image6.png
   :align: center
   :width: 900px
   :name: SR-IOV-enable-flow

   SR-IOV VF Enable Flow

The application enables *n* VF devices via an SR-IOV PF device *sysfs* node.
The hypervisor intercepts all SR-IOV capability accesses and checks the
*VF\_ENABLE* state. If *VF\_ENABLE* is set, the hypervisor creates *n*
virtual devices after 100 ms so that the VF physical devices have enough time
to be created. The Service VM waits 100 ms and then accesses only the first
VF device's configuration space, including *Class Code, Revision ID,
Subsystem Vendor ID, Subsystem ID*. The Service VM uses the first VF device's
information to initialize subsequent VF devices.

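
From the application's side, the enable flow starts with a write to the PF's
*sysfs* node. The following sketch shows one way a Service VM tool might do
that. The helper name ``set_num_vfs`` is hypothetical. It takes the PF's
*sysfs* device directory as a parameter, and writing to a real node requires
root privileges.

.. code-block:: python

   from pathlib import Path


   def set_num_vfs(device_dir: str, n: int) -> int:
       """Request n VFs through the PF's sysfs node; return the value written."""
       dev = Path(device_dir)
       total = int((dev / "sriov_totalvfs").read_text().strip())
       if not 0 <= n <= total:
           raise ValueError(f"n={n} is outside 0..{total} (TotalVFs)")
       node = dev / "sriov_numvfs"
       current = int(node.read_text().strip())
       if current != 0 and n not in (0, current):
           # Linux refuses to change one non-zero VF count to another,
           # so disable the existing VFs first.
           node.write_text("0")
       node.write_text(str(n))
       return n

A call such as ``set_num_vfs("/sys/class/net/enp109s0f0/device", 2)`` would
request two VFs on the PF used in the usage guide. Passing ``0`` disables
the VFs.
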
SR-IOV VF Disable Flow
----------------------

.. figure:: images/sriov-image7.png
   :align: center
   :name: SR-IOV-disable-flow

   SR-IOV VF Disable Flow

The application disables SR-IOV VF devices by writing zero to the SR-IOV PF
device *sysfs* node. The hypervisor intercepts all SR-IOV capability
accesses and checks the *VF\_ENABLE* state. If *VF\_ENABLE* is clear, the
hypervisor makes the VF virtual devices invisible to the Service VM so that
all accesses to VF devices return 0xFFFFFFFF as an error. The VF physical
devices are removed within 1 s of *VF\_ENABLE* being cleared.

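
The "invisible" behavior can be illustrated with a toy model of a VF virtual
device. This is only a sketch of the read path described above. The class
and function names are hypothetical and do not correspond to ACRN data
structures.

.. code-block:: python

   PCI_CONFIG_ERROR = 0xFFFFFFFF  # value returned for reads of an absent device


   class VfVdev:
       """Toy model of one VF virtual device as seen by the Service VM."""

       def __init__(self, config: dict):
           self.config = config  # offset -> 32-bit register value
           self.visible = True

       def read32(self, offset: int) -> int:
           if not self.visible:
               return PCI_CONFIG_ERROR
           return self.config.get(offset, 0)


   def on_vf_enable_cleared(vdevs: list) -> None:
       """Hide every VF virtual device once VF_ENABLE is observed clear."""
       for vdev in vdevs:
           vdev.visible = False

After ``on_vf_enable_cleared`` runs, every ``read32`` call on the affected
devices returns ``0xFFFFFFFF``, whatever the register held before.
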
SR-IOV VF Assignment Policy
---------------------------

.. figure:: images/sriov-image8.png
   :align: center
   :name: SR-IOV-vf-assignment

   SR-IOV VF Assignment

1. All SR-IOV PF devices are managed by the Service VM.

2. Currently, the SR-IOV PF cannot be passed through to the User VM.

3. All VFs can be passed through to the User VM, but we do not recommend
   passing them through to high-privilege VMs because the PF device may
   impact the assigned VFs' functionality and stability.

SR-IOV Usage Guide In ACRN
--------------------------

We use the Intel 82576 NIC as an example in the following instructions. We
only support LaaG (Linux as a Guest).

1. Ensure that the 82576 VF driver is compiled into the User VM kernel
   (set *CONFIG\_IGBVF=y* in the kernel configuration).

#. When the Service VM boots up, the ``lspci -v`` command indicates
   that the Intel 82576 NIC devices have SR-IOV capability and their PF
   drivers are ``igb``.

   .. figure:: images/sriov-image9.png
      :align: center
      :name: 82576-pf

      82576 SR-IOV PF devices

#. Run the ``echo n > /sys/class/net/enp109s0f0/device/sriov_numvfs``
   command in the Service VM to enable *n* VF devices for the first PF
   device (``enp109s0f0``). The number *n* can't be more than *TotalVFs*,
   which is returned by the command
   ``cat /sys/class/net/enp109s0f0/device/sriov_totalvfs``.
   Here we use *n = 2* as an example.

   .. figure:: images/sriov-image10.png
      :align: center
      :name: 82576-vf

      82576 SR-IOV VF devices

   .. figure:: images/sriov-image11.png
      :align: center
      :name: 82576-vf-nic

      82576 SR-IOV VF NIC

#. Pass through an SR-IOV VF device to the guest.

   a. Unbind the igbvf driver in the Service VM and bind the VF device to
      the pci-stub driver.

      i. ``modprobe pci_stub``

      ii. ``echo "8086 10ca" > /sys/bus/pci/drivers/pci-stub/new_id``

      iii. ``echo "0000:6d:10.0" > /sys/bus/pci/devices/0000:6d:10.0/driver/unbind``

      iv. ``echo "0000:6d:10.0" > /sys/bus/pci/drivers/pci-stub/bind``

   b. Add the SR-IOV VF device parameter (``-s X,passthru,6d/10/0``) in
      the launch User VM script.

      .. figure:: images/sriov-image12.png
         :align: center
         :name: 82576-nic-passthru

         Configure 82576 NIC as a Passthrough Device

   c. Boot the User VM.

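
The passthrough steps above follow a fixed pattern for any VF address, so
they can be generated from the VF's PCI address. The sketch below only builds
the command strings and the ``acrn-dm`` parameter; it does not run anything.
Both helper names are hypothetical, and the slot ``X`` is the same
placeholder used in step b.

.. code-block:: python

   def pci_stub_commands(bdf: str, vendor_id: str, device_id: str) -> list:
       """Return the shell commands that rebind a VF to the pci-stub driver."""
       return [
           "modprobe pci_stub",
           f'echo "{vendor_id} {device_id}" > /sys/bus/pci/drivers/pci-stub/new_id',
           f'echo "{bdf}" > /sys/bus/pci/devices/{bdf}/driver/unbind',
           f'echo "{bdf}" > /sys/bus/pci/drivers/pci-stub/bind',
       ]


   def passthru_option(bdf: str, slot: str = "X") -> str:
       """Build the acrn-dm parameter from a domain:bus:device.function address."""
       _domain, bus, devfn = bdf.split(":")
       device, function = devfn.split(".")
       return f"-s {slot},passthru,{bus}/{device}/{function}"


   for command in pci_stub_commands("0000:6d:10.0", "8086", "10ca"):
       print(command)
   print(passthru_option("0000:6d:10.0"))

For the VF at ``0000:6d:10.0``, the generated strings match the commands in
step a and the parameter in step b.
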
SR-IOV Limitations In ACRN
--------------------------

1. The SR-IOV migration feature is not supported.

2. If an SR-IOV PF device is detected during the enumeration phase, but
   not enough room exists for all of its VF devices, the PF device will be
   dropped. The *MAX_PCI_DEV_NUM* ACRN configuration option sets the maximum
   number of PCI devices that the platform supports. Make sure
   *MAX_PCI_DEV_NUM* is more than the number of all PCI devices, including
   the total SR-IOV VF devices.
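
The sizing rule in the second limitation is a simple count. The sketch below
shows that check under stated assumptions: the function name is hypothetical,
the device counts in the usage note are made-up numbers, and the comparison
is strict because the text above asks for *MAX_PCI_DEV_NUM* to be more than
the total.

.. code-block:: python

   def fits_in_max_pci_dev_num(max_pci_dev_num: int, pci_device_count: int,
                               pf_total_vfs: list) -> bool:
       """Check that MAX_PCI_DEV_NUM exceeds all PCI devices plus all VFs.

       pci_device_count counts the physical PCI devices (PFs included);
       pf_total_vfs holds the TotalVFs value of each SR-IOV PF.
       """
       required = pci_device_count + sum(pf_total_vfs)
       return max_pci_dev_num > required

For example, a platform with 40 PCI devices and two PFs that each report a
*TotalVFs* of 7 needs *MAX_PCI_DEV_NUM* to be greater than 54.
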