diff --git a/doc/developer-guides/images/network-virt-arch.png b/doc/developer-guides/images/network-virt-arch.png
new file mode 100644
index 000000000..6079adf28
Binary files /dev/null and b/doc/developer-guides/images/network-virt-arch.png differ
diff --git a/doc/developer-guides/images/network-virt-sos-infrastruct.png b/doc/developer-guides/images/network-virt-sos-infrastruct.png
new file mode 100644
index 000000000..fc9a8222a
Binary files /dev/null and b/doc/developer-guides/images/network-virt-sos-infrastruct.png differ
diff --git a/doc/developer-guides/index.rst b/doc/developer-guides/index.rst
index 6f02567bb..7e7e2827d 100644
--- a/doc/developer-guides/index.rst
+++ b/doc/developer-guides/index.rst
@@ -29,6 +29,7 @@ specific areas within the ACRN hypervisor system.
    GVT-g-porting.rst
    security-hld.rst
    watchdog-hld.rst
+   network-virt-hld.rst
 
 Contributing to the project
 ***************************
diff --git a/doc/developer-guides/network-virt-hld.rst b/doc/developer-guides/network-virt-hld.rst
new file mode 100644
index 000000000..d4ae9260e
--- /dev/null
+++ b/doc/developer-guides/network-virt-hld.rst
@@ -0,0 +1,556 @@
+.. _net-virt-hld:
+
+Network Virtualization
+######################
+
+Introduction
+************
+
+Virtio-net is the para-virtualization solution used in ACRN for
+networking. The ACRN device model emulates virtual NICs for the UOS, and
+the frontend virtio network driver in the UOS drives these virtual NICs,
+following the virtio specification. (Refer to :ref:`introduction` and
+:ref:`virtio-hld` for background introductions to ACRN and Virtio.)
+
+Supported Features Notes
+************************
+
+Here are some notes about virtio-net support in ACRN:
+
+- Legacy devices are supported; modern devices are not supported
+- Two virtqueues are used in virtio-net: an RX queue and a TX queue
+  (see the sketch after this list)
+- Indirect descriptors are supported
+- A TAP backend is supported
+- The control queue is not supported
+- NIC multi-queue is not supported
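+
+To make the two-virtqueue model concrete, here is a simplified sketch of
+how a frontend hands buffers to the device using the standard Linux
+virtqueue API. It is illustrative only (the helpers ``my_send_frame``
+and ``my_post_rx_buffer`` are not part of the actual virtio-net driver),
+but these are the calls the frontend ultimately relies on:
+
+.. code-block:: c
+
+   #include <linux/errno.h>
+   #include <linux/gfp.h>
+   #include <linux/scatterlist.h>
+   #include <linux/virtio.h>
+
+   /* Illustrative only: enqueue one outgoing frame on the TX virtqueue. */
+   static int my_send_frame(struct virtqueue *tx_vq, void *frame, unsigned int len)
+   {
+       struct scatterlist sg;
+       int err;
+
+       sg_init_one(&sg, frame, len);
+       /* Expose the buffer to the backend ... */
+       err = virtqueue_add_outbuf(tx_vq, &sg, 1, frame, GFP_ATOMIC);
+       if (err)
+           return err; /* e.g. -ENOSPC when the ring is full */
+       /* ... then notify it; this register write is what traps into the hypervisor. */
+       virtqueue_kick(tx_vq);
+       return 0;
+   }
+
+   /* Illustrative only: post one empty buffer on the RX virtqueue. */
+   static int my_post_rx_buffer(struct virtqueue *rx_vq, void *buf, unsigned int len)
+   {
+       struct scatterlist sg;
+
+       sg_init_one(&sg, buf, len);
+       return virtqueue_add_inbuf(rx_vq, &sg, 1, buf, GFP_ATOMIC);
+   }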
+
+Network Virtualization Architecture
+***********************************
+
+ACRN's network virtualization architecture is shown in
+:numref:`net-virt-arch` below. It illustrates the network virtualization
+components that must cooperate for the UOS to send data to, and receive
+data from, the outside world.
+
+.. figure:: images/network-virt-arch.png
+   :align: center
+   :width: 900px
+   :name: net-virt-arch
+
+   Network Virtualization Architecture
+
+(The green components are parts of the ACRN solution, while the gray
+components are parts of the Linux kernel.)
+
+Let's explore these components further.
+
+SOS/UOS Network Stack:
+   This is the standard Linux TCP/IP stack, one of the most feature-rich
+   TCP/IP implementations available.
+
+virtio-net Frontend Driver:
+   This is the standard driver in the Linux kernel for virtual Ethernet
+   devices. The driver matches devices with PCI vendor ID 0x1AF4 and PCI
+   device ID 0x1000 (for legacy devices, as in our case) or 0x1041 (for
+   modern devices). The virtual NIC supports two virtqueues, one for
+   transmitting packets and the other for receiving packets. The frontend
+   driver places empty buffers into one virtqueue for receiving packets,
+   and enqueues outgoing packets into the other virtqueue for
+   transmission. The size of each virtqueue is 1024, configurable in the
+   virtio-net backend driver.
+
+ACRN Hypervisor:
+   The ACRN hypervisor is a type 1 hypervisor, running directly on the
+   bare-metal hardware, and suitable for a variety of IoT and embedded
+   device solutions. It fetches and analyzes the trapped guest
+   instructions, puts the decoded information into the shared page as an
+   IOREQ, and notifies or interrupts the VHM module in the SOS for
+   processing.
+
+VHM Module:
+   The Virtio and Hypervisor Service Module (VHM) is a kernel module in
+   the Service OS (SOS) acting as a middle layer to support the device
+   model and the hypervisor. The VHM forwards an IOREQ to the virtio-net
+   backend driver for processing.
+
+ACRN Device Model and virtio-net Backend Driver:
+   The ACRN Device Model (DM) gets an IOREQ from the shared page and
+   calls the virtio-net backend driver to process the request. The
+   backend driver receives the data in a shared virtqueue and sends it to
+   the TAP device.
+
+Bridge and Tap Device:
+   Bridge and tap are standard virtual network infrastructure components.
+   They play an important role in communication among the SOS, the UOS,
+   and the outside world.
+
+IGB Driver:
+   IGB is the Linux kernel driver for the physical Network Interface Card
+   (NIC), responsible for sending data to and receiving data from the
+   physical NIC.
+
+The virtual network card (NIC) is implemented as a virtio legacy device
+in the ACRN device model (DM). It is registered as a PCI virtio device
+to the guest OS (UOS) and uses the standard virtio-net driver in the
+Linux kernel (the guest kernel must be built with
+``CONFIG_VIRTIO_NET=y``).
+
+The virtio-net backend in the DM forwards the data received from the
+frontend to the TAP device, then from the TAP device to the bridge, and
+finally from the bridge to the physical NIC driver, and vice versa for
+data returning from the NIC to the frontend.
+
+ACRN Virtio-Network Calling Stack
+*********************************
+
+The various components of ACRN network virtualization are shown in the
+architecture diagram in :numref:`net-virt-arch` above. In this section,
+we use UOS data transmission (TX) and reception (RX) examples to explain
+step by step how these components work together to implement ACRN
+network virtualization.
+
+Initialization in Device Model
+==============================
+
+virtio_net_init
+---------------
+
+- Present a virtual PCI-based NIC to the frontend
+- Set up control plane callbacks
+- Set up data plane callbacks, including TX and RX
+- Set up the TAP backend (a sketch of this step follows the list)
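+
+Setting up the TAP backend essentially means opening ``/dev/net/tun``
+and attaching it to an existing tap device such as ``acrn_tap0``. The
+following user-space sketch is illustrative only (the helper
+``my_open_tap`` is an assumption, not the actual device model code), but
+it shows the core system calls involved:
+
+.. code-block:: c
+
+   #include <fcntl.h>
+   #include <string.h>
+   #include <unistd.h>
+   #include <sys/ioctl.h>
+   #include <net/if.h>
+   #include <linux/if_tun.h>
+
+   /* Illustrative only: open a tap device (e.g. acrn_tap0) and return its fd. */
+   static int my_open_tap(const char *devname)
+   {
+       struct ifreq ifr;
+       int fd = open("/dev/net/tun", O_RDWR);
+
+       if (fd < 0)
+           return -1;
+
+       memset(&ifr, 0, sizeof(ifr));
+       ifr.ifr_flags = IFF_TAP | IFF_NO_PI;  /* raw Ethernet frames, no extra header */
+       strncpy(ifr.ifr_name, devname, IFNAMSIZ - 1);
+
+       if (ioctl(fd, TUNSETIFF, &ifr) < 0) { /* attach to the named tap device */
+           close(fd);
+           return -1;
+       }
+       return fd;  /* read()/write() on this fd now carry Ethernet frames */
+   }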
+
+Initialization in virtio-net Frontend Driver
+============================================
+
+virtio_pci_probe
+----------------
+
+- Construct the virtio device from the virtual PCI device and register
+  it on the virtio bus
+
+virtio_dev_probe --> virtnet_probe --> init_vqs
+-----------------------------------------------
+
+- Register the network driver
+- Set up the shared virtqueues
+
+ACRN UOS TX FLOW
+================
+
+The following shows the ACRN UOS network TX flow, using TCP as an
+example, tracing the flow through each layer:
+
+UOS TCP Layer
+-------------
+
+.. code-block:: c
+
+   tcp_sendmsg -->
+   tcp_sendmsg_locked -->
+   tcp_push_one -->
+   tcp_write_xmit -->
+   tcp_transmit_skb -->
+
+UOS IP Layer
+------------
+
+.. code-block:: c
+
+   ip_queue_xmit -->
+   ip_local_out -->
+   __ip_local_out -->
+   dst_output -->
+   ip_output -->
+   ip_finish_output -->
+   ip_finish_output2 -->
+   neigh_output -->
+   neigh_resolve_output -->
+
+UOS MAC Layer
+-------------
+
+.. code-block:: c
+
+   dev_queue_xmit -->
+   __dev_queue_xmit -->
+   dev_hard_start_xmit -->
+   xmit_one -->
+   netdev_start_xmit -->
+   __netdev_start_xmit -->
+
+UOS MAC Layer virtio-net Frontend Driver
+----------------------------------------
+
+.. code-block:: c
+
+   start_xmit --> // virtual NIC driver xmit in virtio_net
+   xmit_skb -->
+   virtqueue_add_outbuf --> // add the out buffer to the shared virtqueue
+   virtqueue_add -->
+
+   virtqueue_kick --> // notify the backend
+   virtqueue_notify -->
+   vp_notify -->
+   iowrite16 --> // trap here, the HV gets notified first
+
+ACRN Hypervisor
+---------------
+
+.. code-block:: c
+
+   vmexit_handler --> // vmexit because VMX_EXIT_REASON_IO_INSTRUCTION
+   pio_instr_vmexit_handler -->
+   emulate_io --> // the ioreq can't be processed in the HV, forward it to the VHM
+   acrn_insert_request_wait -->
+   fire_vhm_interrupt --> // interrupt the SOS, the VHM gets notified
+
+VHM Module
+----------
+
+.. code-block:: c
+
+   vhm_intr_handler --> // VHM interrupt handler
+   tasklet_schedule -->
+   io_req_tasklet -->
+   acrn_ioreq_distribute_request --> // the ioreq can't be processed in the VHM, forward it to the DM
+   acrn_ioreq_notify_client -->
+   wake_up_interruptible --> // wake up the DM to handle the ioreq
+
+ACRN Device Model / virtio-net Backend Driver
+---------------------------------------------
+
+.. code-block:: c
+
+   handle_vmexit -->
+   vmexit_inout -->
+   emulate_inout -->
+   pci_emul_io_handler -->
+   virtio_pci_write -->
+   virtio_pci_legacy_write -->
+   virtio_net_ping_txq --> // wake up the TX thread to process, then return
+   virtio_net_tx_thread --> // this is the TX thread
+   virtio_net_proctx --> // call the corresponding backend (tap) to process
+   virtio_net_tap_tx -->
+   writev --> // write the data to the tap device
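+
+The final ``writev`` above is an ordinary vectored write on the tap file
+descriptor that was opened during initialization. A minimal sketch,
+illustrative only (not the actual device model code; error handling
+omitted):
+
+.. code-block:: c
+
+   #include <sys/uio.h>
+
+   /* Illustrative only: hand one Ethernet frame from the shared virtqueue
+    * buffer to the kernel tap device; it then enters the SOS bridge. */
+   static ssize_t my_tap_tx(int tapfd, void *frame, size_t len)
+   {
+       struct iovec iov = {
+           .iov_base = frame,
+           .iov_len  = len,
+       };
+
+       return writev(tapfd, &iov, 1);
+   }
+
+The RX direction is symmetric: when the tap fd becomes readable, the
+backend reads the frame into buffers supplied through the RX virtqueue
+and then injects an interrupt into the UOS, as shown in the RX flow
+below.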
+
+SOS TAP Device Forwarding
+-------------------------
+
+.. code-block:: c
+
+   do_writev -->
+   vfs_writev -->
+   do_iter_write -->
+   do_iter_readv_writev -->
+   call_write_iter -->
+   tun_chr_write_iter -->
+   tun_get_user -->
+   netif_receive_skb -->
+   netif_receive_skb_internal -->
+   __netif_receive_skb -->
+   __netif_receive_skb_core -->
+
+SOS Bridge Forwarding
+---------------------
+
+.. code-block:: c
+
+   br_handle_frame -->
+   br_handle_frame_finish -->
+   br_forward -->
+   __br_forward -->
+   br_forward_finish -->
+   br_dev_queue_push_xmit -->
+
+SOS MAC Layer
+-------------
+
+.. code-block:: c
+
+   dev_queue_xmit -->
+   __dev_queue_xmit -->
+   dev_hard_start_xmit -->
+   xmit_one -->
+   netdev_start_xmit -->
+   __netdev_start_xmit -->
+
+SOS MAC Layer IGB Driver
+------------------------
+
+.. code-block:: c
+
+   igb_xmit_frame --> // IGB physical NIC driver xmit function
+
+ACRN UOS RX FLOW
+================
+
+The following shows the ACRN UOS network RX flow, using TCP as an
+example. Let's start with the receipt of a device interrupt. (Note that
+the hypervisor gets notified first when an interrupt arrives, even in
+passthrough cases.)
+
+Hypervisor Interrupt Dispatch
+-----------------------------
+
+.. code-block:: c
+
+   vmexit_handler --> // vmexit because VMX_EXIT_REASON_EXTERNAL_INTERRUPT
+   external_interrupt_vmexit_handler -->
+   dispatch_interrupt -->
+   common_handler_edge -->
+   ptdev_interrupt_handler -->
+   ptdev_enqueue_softirq --> // the interrupt is delivered in the bottom-half softirq
+
+Hypervisor Interrupt Injection
+------------------------------
+
+.. code-block:: c
+
+   do_softirq -->
+   ptdev_softirq -->
+   vlapic_intr_msi --> // inject the interrupt into the SOS
+
+   start_vcpu --> // VM entry here; pending interrupts are processed
+
+SOS MAC Layer IGB Driver
+------------------------
+
+.. code-block:: c
+
+   do_IRQ -->
+   ...
+   igb_msix_ring -->
+   igb_poll -->
+   napi_gro_receive -->
+   napi_skb_finish -->
+   netif_receive_skb_internal -->
+   __netif_receive_skb -->
+   __netif_receive_skb_core -->
+
+SOS Bridge Forwarding
+---------------------
+
+.. code-block:: c
+
+   br_handle_frame -->
+   br_handle_frame_finish -->
+   br_forward -->
+   __br_forward -->
+   br_forward_finish -->
+   br_dev_queue_push_xmit -->
+
+SOS MAC Layer
+-------------
+
+.. code-block:: c
+
+   dev_queue_xmit -->
+   __dev_queue_xmit -->
+   dev_hard_start_xmit -->
+   xmit_one -->
+   netdev_start_xmit -->
+   __netdev_start_xmit -->
+
+SOS MAC Layer TAP Driver
+------------------------
+
+.. code-block:: c
+
+   tun_net_xmit --> // notify and wake up the reader process
+
+ACRN Device Model / virtio-net Backend Driver
+---------------------------------------------
+
+.. code-block:: c
+
+   virtio_net_rx_callback --> // invoked when the tap fd becomes readable
+   virtio_net_tap_rx --> // read the data from the tap, fill the virtqueue, inject an interrupt into the UOS
+   vq_endchains -->
+   vq_interrupt -->
+   pci_generate_msi -->
+
+VHM Module
+----------
+
+.. code-block:: c
+
+   vhm_dev_ioctl --> // process the ioctl and issue a hypercall to inject the interrupt
+   hcall_inject_msi -->
+
+ACRN Hypervisor
+---------------
+
+.. code-block:: c
+
+   vmexit_handler --> // vmexit because VMX_EXIT_REASON_VMCALL
+   vmcall_vmexit_handler -->
+   hcall_inject_msi --> // inject the interrupt into the UOS
+   vlapic_intr_msi -->
+
+UOS MAC Layer virtio_net Frontend Driver
+----------------------------------------
+
+.. code-block:: c
+
+   vring_interrupt --> // virtio-net frontend driver interrupt handler
+   skb_recv_done --> // registered by virtnet_probe --> init_vqs --> virtnet_find_vqs
+   virtqueue_napi_schedule -->
+   __napi_schedule -->
+   virtnet_poll -->
+   virtnet_receive -->
+   receive_buf -->
+
+UOS MAC Layer
+-------------
+
+.. code-block:: c
+
+   napi_gro_receive -->
+   napi_skb_finish -->
+   netif_receive_skb_internal -->
+   __netif_receive_skb -->
+   __netif_receive_skb_core -->
+
+UOS IP Layer
+------------
+
+.. code-block:: c
+
+   ip_rcv -->
+   ip_rcv_finish -->
+   dst_input -->
+   ip_local_deliver -->
+   ip_local_deliver_finish -->
+
+UOS TCP Layer
+-------------
+
+.. code-block:: c
+
+   tcp_v4_rcv -->
+   tcp_v4_do_rcv -->
+   tcp_rcv_established -->
+   tcp_data_queue -->
+   tcp_queue_rcv -->
+   __skb_queue_tail -->
+
+   sk->sk_data_ready --> // the application gets notified
+
+How to Use
+**********
+
+The network infrastructure shown in :numref:`net-virt-infra` needs to be
+prepared in the SOS before we start. We need to create a bridge and at
+least one tap device (two tap devices are needed for a dual virtual
+NIC), and attach the physical NIC and the tap device(s) to the bridge.
+
+.. figure:: images/network-virt-sos-infrastruct.png
+   :align: center
+   :width: 900px
+   :name: net-virt-infra
+
+   Network Infrastructure in SOS
+
+You can use standard Linux commands (e.g., ``ip`` and ``brctl``) to
+create this network; an example is shown after the list below. In our
+case, we use systemd to create the network automatically by default. You
+can check the files with the ``50-`` prefix in the SOS
+``/usr/lib/systemd/network/`` directory:
+
+- ``50-acrn.netdev``
+- ``50-acrn.network``
+- ``50-acrn_tap0.netdev``
+- ``50-eth.network``
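+
+If you want to create the same topology manually instead, ``ip``
+commands along the following lines will do it. This is an illustrative
+sketch rather than the ACRN-provided configuration; the bridge and tap
+names match :numref:`net-virt-infra`, and ``enp3s0`` stands in for the
+physical NIC name on your system:
+
+.. code-block:: none
+
+   # create the bridge and a tap device
+   ip link add name acrn-br0 type bridge
+   ip tuntap add dev acrn_tap0 mode tap
+
+   # attach the physical NIC and the tap device to the bridge
+   ip link set enp3s0 master acrn-br0
+   ip link set acrn_tap0 master acrn-br0
+
+   # bring the devices up
+   ip link set enp3s0 up
+   ip link set acrn_tap0 up
+   ip link set acrn-br0 up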
+
+When the SOS is started, run ``ifconfig`` to show the devices created by
+this systemd configuration:
+
+.. code-block:: none
+
+   acrn-br0  Link encap:Ethernet  HWaddr B2:50:41:FE:F7:A3
+             inet addr:10.239.154.43  Bcast:10.239.154.255  Mask:255.255.255.0
+             inet6 addr: fe80::b050:41ff:fefe:f7a3/64 Scope:Link
+             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
+             RX packets:226932 errors:0 dropped:21383 overruns:0 frame:0
+             TX packets:14816 errors:0 dropped:0 overruns:0 carrier:0
+             collisions:0 txqueuelen:1000
+             RX bytes:100457754 (95.8 Mb)  TX bytes:83481244 (79.6 Mb)
+
+   acrn_tap0 Link encap:Ethernet  HWaddr F6:A7:7E:52:50:C6
+             UP BROADCAST MULTICAST  MTU:1500  Metric:1
+             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
+             collisions:0 txqueuelen:1000
+             RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
+
+   enp3s0    Link encap:Ethernet  HWaddr 98:4F:EE:14:5B:74
+             inet6 addr: fe80::9a4f:eeff:fe14:5b74/64 Scope:Link
+             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
+             RX packets:279174 errors:0 dropped:0 overruns:0 frame:0
+             TX packets:69923 errors:0 dropped:0 overruns:0 carrier:0
+             collisions:0 txqueuelen:1000
+             RX bytes:107312294 (102.3 Mb)  TX bytes:87117507 (83.0 Mb)
+             Memory:82200000-8227ffff
+
+   lo        Link encap:Local Loopback
+             inet addr:127.0.0.1  Mask:255.0.0.0
+             inet6 addr: ::1/128 Scope:Host
+             UP LOOPBACK RUNNING  MTU:65536  Metric:1
+             RX packets:16 errors:0 dropped:0 overruns:0 frame:0
+             TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
+             collisions:0 txqueuelen:1000
+             RX bytes:1216 (1.1 Kb)  TX bytes:1216 (1.1 Kb)
+
+Run ``brctl show`` to see the bridge ``acrn-br0`` and the attached
+devices:
+
+.. code-block:: none
+
+   bridge name     bridge id            STP enabled     interfaces
+
+   acrn-br0        8000.b25041fef7a3    no              acrn_tap0
+                                                        enp3s0
+
+Add a PCI slot to the device model (``acrn-dm``) command line (the MAC
+address is optional):
+
+.. code-block:: none
+
+   -s 4,virtio-net,<tap_name>,[mac=<XX:XX:XX:XX:XX:XX>]
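+
+For example, to attach the ``acrn_tap0`` device created above (the slot
+number and tap name here are illustrative; adjust them to your setup):
+
+.. code-block:: none
+
+   -s 4,virtio-net,acrn_tap0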
+
+When the UOS is launched, run ``ifconfig`` to check the network.
+``enp0s4`` is the virtual NIC created by acrn-dm:
+
+.. code-block:: none
+
+   enp0s4    Link encap:Ethernet  HWaddr 00:16:3E:39:0F:CD
+             inet addr:10.239.154.186  Bcast:10.239.154.255  Mask:255.255.255.0
+             inet6 addr: fe80::216:3eff:fe39:fcd/64 Scope:Link
+             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
+             RX packets:140 errors:0 dropped:8 overruns:0 frame:0
+             TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
+             collisions:0 txqueuelen:1000
+             RX bytes:110727 (108.1 Kb)  TX bytes:4474 (4.3 Kb)
+
+   lo        Link encap:Local Loopback
+             inet addr:127.0.0.1  Mask:255.0.0.0
+             inet6 addr: ::1/128 Scope:Host
+             UP LOOPBACK RUNNING  MTU:65536  Metric:1
+             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
+             collisions:0 txqueuelen:1000
+             RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
+
+Performance Estimation
+**********************
+
+We've introduced the network virtualization solution in ACRN, from the
+top-level architecture down to the detailed TX and RX flows. Currently,
+the control plane and the data plane are both processed in the ACRN
+device model, which introduces some overhead. This is not a bottleneck
+for NICs of 1000Mbit and below: virtualized network bandwidth can come
+very close to native bandwidth. For high-speed NICs (e.g., 10Gb and
+above), it is necessary to separate the data plane from the control
+plane; vhost can be used for that acceleration. For most IoT scenarios,
+processing in user space is simple and reasonable.
diff --git a/doc/substitutions.txt b/doc/substitutions.txt
index 82ef0d288..531005937 100644
--- a/doc/substitutions.txt
+++ b/doc/substitutions.txt
@@ -1,6 +1,6 @@
 .. |br| raw:: html
 
 .. force a line break in HTML output (blank lines needed here)
-
+
 
 .. These are replacement strings for non-ASCII characters used within the project
    using the same name as the html entity names (e.g., &copy;) for that character