diff --git a/.gitignore b/.gitignore
index 94cf62aeb..f670f4ead 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,4 @@
 doxygen
 _build
-devicemodel
-hypervisor
+*.bak
+*.sav
diff --git a/hypervisor_primer/index.rst b/hypervisor_primer/index.rst
deleted file mode 100644
index 6e058fee5..000000000
--- a/hypervisor_primer/index.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-.. _hypervisor_primer:
-
-Hypervisor Developer Primer
-###########################
-
-This Developer Primer introduces the fundamental components and
-virtualization technology used by this open source reference hypervisor
-stack. Code level documentation and additional details can be found by
-consulting the :ref:`hypercall_apis` documentation and the source code
-in GitHub.
-
-The Hypervisor acts as a host with full control of the processor(s) and
-the hardware (physical memory, interrupt management and I/O). It
-provides the Guest OS with an abstraction of a virtual processor,
-allowing the guest to think it is executing directly on a logical
-processor.
-
-.. _source tree structure:
-
-Source Tree Structure
-*********************
-
-blah blah
diff --git a/index.rst b/index.rst
index 7c97a2832..dd8a3e010 100644
--- a/index.rst
+++ b/index.rst
@@ -27,7 +27,7 @@ Sections
    introduction/index.rst
    hardware.rst
    getting_started/index.rst
-   hypervisor_primer/index.rst
+   primer/index.rst
    release_notes.rst
    contribute.rst
    api/index.rst
diff --git a/primer/images/primer-dma-address-mapping.png b/primer/images/primer-dma-address-mapping.png
new file mode 100644
index 000000000..377f40c06
Binary files /dev/null and b/primer/images/primer-dma-address-mapping.png differ
diff --git a/primer/images/primer-host-gdt.png b/primer/images/primer-host-gdt.png
new file mode 100644
index 000000000..27a61fca1
Binary files /dev/null and b/primer/images/primer-host-gdt.png differ
diff --git a/primer/images/primer-hypervisor-interrupt.png b/primer/images/primer-hypervisor-interrupt.png
new file mode 100644
index 000000000..f175d8007
Binary files /dev/null and b/primer/images/primer-hypervisor-interrupt.png differ
diff --git a/primer/images/primer-mem-layout.png b/primer/images/primer-mem-layout.png
new file mode 100644
index 000000000..edc0f7686
Binary files /dev/null and b/primer/images/primer-mem-layout.png differ
diff --git a/primer/images/primer-pirq-routing.png b/primer/images/primer-pirq-routing.png
new file mode 100644
index 000000000..c88827b95
Binary files /dev/null and b/primer/images/primer-pirq-routing.png differ
diff --git a/primer/images/primer-pv-mapping.png b/primer/images/primer-pv-mapping.png
new file mode 100644
index 000000000..74dda4d13
Binary files /dev/null and b/primer/images/primer-pv-mapping.png differ
diff --git a/primer/images/primer-sos-ept-mapping.png b/primer/images/primer-sos-ept-mapping.png
new file mode 100644
index 000000000..8e60e6737
Binary files /dev/null and b/primer/images/primer-sos-ept-mapping.png differ
diff --git a/primer/images/primer-symmetric-io.png b/primer/images/primer-symmetric-io.png
new file mode 100644
index 000000000..ddafaef81
Binary files /dev/null and b/primer/images/primer-symmetric-io.png differ
diff --git a/primer/images/primer-uos-ept-mapping.png b/primer/images/primer-uos-ept-mapping.png
new file mode 100644
index 000000000..ff3e20ea2
Binary files /dev/null and b/primer/images/primer-uos-ept-mapping.png differ
diff --git a/primer/images/primer-virtio-net.png b/primer/images/primer-virtio-net.png
new file mode 100644
index 000000000..15d98a42e
Binary files /dev/null and b/primer/images/primer-virtio-net.png differ
diff --git a/primer/index.rst b/primer/index.rst
new file mode 100644
index 000000000..581da1274
--- /dev/null
+++ b/primer/index.rst
@@ -0,0 +1,903 @@
+.. _primer:
+
+Developer Primer
+################
+
+This Developer Primer introduces the fundamental components of ACRN and
+the virtualization technology used by this open source reference stack.
+Code-level documentation and additional details can be found by
+consulting the :ref:`acrn_apis` documentation and the `source code in
+GitHub`_.
+
+.. _source code in GitHub: https://github.com/projectacrn
+
+The ACRN Hypervisor acts as a host with full control of the processor(s)
+and the hardware (physical memory, interrupt management, and I/O). It
+provides the User OS with an abstraction of a virtual platform, allowing
+the guest to behave as if it were executing directly on a logical
+processor.
+
+.. _source tree structure:
+
+Source Tree Structure
+*********************
+
+Understanding the structure of the ACRN hypervisor and ACRN device model
+source trees is helpful for locating the code associated with a
+particular hypervisor or device emulation feature. The ACRN hypervisor
+and ACRN device model source trees provide the following top-level
+directories:
+
+ACRN hypervisor source tree
+===========================
+
+**arch/x86/**
+   Hypervisor architecture code: the x86-specific source files for
+   running the hypervisor, covering CPU, memory, interrupt, and VMX
+   handling.
+
+**boot/**
+   Boot code, mainly ACPI-related.
+
+**bsp/**
+   Board support package, used to support the Intel NUC with UEFI.
+
+**common/**
+   Common hypervisor source files, including the VM hypercall
+   definitions, VM main loop, and VM software loader.
+
+**debug/**
+   Debug-related source files, which are not compiled into the release
+   version; mainly the console, UART, logmsg, and shell support.
+
+**include/**
+   Include files for all public APIs (doxygen comments in these source
+   files are used to generate the :ref:`acrn_apis` documentation).
+
+**lib/**
+   Runtime service libraries.
+
+ACRN Device Model source tree
+=============================
+
+**core/**
+   ACRN Device Model core logic (main loop, SOS interface, etc.)
+
+**hw/**
+   Hardware emulation code, with the following subdirectories:
+
+   **acpi/**
+      ACPI table generator.
+
+   **pci/**
+      PCI devices, including VBS-Us (virtio backend drivers in
+      user-space).
+
+   **platform/**
+      Platform devices such as the UART and keyboard.
+
+**include/**
+   Include files for all public APIs (doxygen comments in these source
+   files are used to generate the :ref:`acrn_apis` documentation).
+
+**samples/**
+   Sample code and configuration files.
+
+ACRN documentation source tree
+==============================
+
+Project ACRN documentation is written using the reStructuredText markup
+language (.rst file extension) with Sphinx extensions, and processed
+using Sphinx to create a formatted stand-alone website (the one you're
+reading now). Developers can view this content either in its raw form
+as .rst markup files in the acrn-documentation repo, or they can
+generate the HTML content and view it with a web browser directly on a
+workstation, which is useful when contributing documentation to the
+project.
+
+**api/**
+   ReST files for API document generation
+
+**custom-doxygen/**
+   Customization files for the doxygen-generated HTML output. (While we
+   can generate it, we currently don't include the doxygen HTML output;
+   we use the doxygen XML output to feed the Sphinx generation process.)
+
+**getting_started/**
+   ReST files and images for the Getting Started Guide
+
+**primer/**
+   ReST files and images for the Developer Primer
+
+**images/**
+   Image files not specific to a document (logos, and such)
+
+**introduction/**
+   ReST files and images for the Introduction to Project ACRN
+
+**scripts/**
+   Files used to assist building the documentation set
+
+**static/**
+   Sphinx folder for extras added to the generated output (such as
+   custom CSS additions)
+
+CPU virtualization
+******************
+
+The ACRN hypervisor uses static partitioning of the physical CPU cores,
+providing each User OS a virtualized environment containing at least one
+statically assigned physical CPU core. The CPUID features of a
+partitioned physical core are the same as the native CPU features. CPU
+power management (Cx/Px) is managed by the User OS.
+
+The supported Intel |reg| NUC platform (see :ref:`hardware`) has a CPU
+with four cores. The Service OS is assigned one core, and the other
+three cores are assigned to the User OS. ``XSAVE`` and ``XRSTOR``
+instructions (used to perform a full save/restore of the extended state
+in the processor to/from memory) are currently not supported in the
+User OS. (The kernel boot parameters must specify ``noxsave``.)
+Processor core sharing among User OSes is planned for a future release.
+
+The following sections introduce CPU virtualization related
+concepts and technologies.
+
+Host GDT
+========
+
+The ACRN hypervisor initializes the host Global Descriptor Table (GDT),
+used to define the characteristics of the various memory areas during
+program execution. Code Segment ``CS:0x8`` and Data Segment ``DS:0x10``
+are configured as hypervisor selectors, with their settings in the host
+GDT as shown in :numref:`host-gdt`:
+
+.. figure:: images/primer-host-gdt.png
+   :align: center
+   :name: host-gdt
+
+   Host GDT
+
+Host IDT
+========
+
+The ACRN hypervisor installs interrupt gates for both exceptions and
+vectors. Because interrupt gates are used, delivering an exception or
+interrupt automatically disables interrupts. The
+``HOST_GDT_RING0_CODE_SEL`` selector is used in the host IDT table.
+
+Guest SMP Booting
+=================
+
+The Bootstrap Processor (BSP) vCPU for the User OS boots into x64 long
+mode directly, while the Application Processor (AP) vCPUs boot into
+real mode. The virtualized Local Advanced Programmable Interrupt
+Controller (vLAPIC) for the User OS in the hypervisor emulates the
+INIT/STARTUP signals.
+
+The AP vCPUs belonging to the User OS begin in an infinite loop, waiting
+for an INIT signal. Once the User OS issues a Startup IPI (SIPI) signal
+to another vCPU, the vLAPIC traps the request, resets the target vCPU,
+and then goes through the ``INIT->STARTUP#1->STARTUP#2`` cycle to boot
+the vCPUs for the User OS.
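+
+The SIPI semantics here follow the standard x86 convention: the 8-bit
+SIPI vector selects the real-mode page where the target vCPU starts
+executing. The following sketch illustrates how a vLAPIC could handle a
+trapped ICR write; it is our own simplified illustration with assumed
+names, not the ACRN source:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   #define APIC_DELMODE_INIT    0x5U   /* ICR delivery mode: INIT    */
+   #define APIC_DELMODE_STARTUP 0x6U   /* ICR delivery mode: STARTUP */
+
+   /* Assumed vCPU model, for illustration only. */
+   struct vcpu {
+       int      waiting_for_sipi;
+       uint64_t rip;                   /* guest entry point          */
+   };
+
+   /* Called when the vLAPIC traps an ICR write from the guest BSP. */
+   static void vlapic_handle_icr(struct vcpu *target, uint32_t icr_lo)
+   {
+       uint32_t mode   = (icr_lo >> 8) & 0x7U;  /* bits 10:8 of ICR */
+       uint8_t  vector = (uint8_t)(icr_lo & 0xFFU);
+
+       switch (mode) {
+       case APIC_DELMODE_INIT:
+           target->waiting_for_sipi = 1;        /* reset; wait for SIPI */
+           break;
+       case APIC_DELMODE_STARTUP:
+           if (target->waiting_for_sipi) {
+               /* x86 SIPI convention: real-mode entry at vector << 12
+                * (e.g. vector 0x10 starts the AP at 0x10000).
+                */
+               target->rip = (uint64_t)vector << 12;
+               target->waiting_for_sipi = 0;
+           }
+           break;
+       default:
+           break;
+       }
+   }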
+
+VMX configuration
+=================
+
+The ACRN hypervisor uses the Virtual Machine Extensions (VMX)
+configuration shown in :numref:`VMX_MSR` below. (These configuration
+settings may change in the future, according to virtualization
+policies.)
+
+.. table:: VMX Configuration
+   :align: center
+   :widths: auto
+   :name: VMX_MSR
+
+   +----------------------------------+--------------+-------------------------------------+
+   | **VMX MSR**                      | **Bits**     | **Description**                     |
+   +==================================+==============+=====================================+
+   | **MSR_IA32_VMX_PINBASED_CTLS**   | Bit0 set     | Enable external IRQ VM Exit         |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit6 set     | Enable HV pre-40ms preemption timer |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit7 clear   | Posted interrupts not supported     |
+   +----------------------------------+--------------+-------------------------------------+
+   | **MSR_IA32_VMX_PROCBASED_CTLS**  | Bit25 set    | Enable I/O bitmap                   |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit28 set    | Enable MSR bitmap                   |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit19,20 set | Enable CR8 store/load               |
+   +----------------------------------+--------------+-------------------------------------+
+   | **MSR_IA32_VMX_PROCBASED_CTLS2** | Bit1 set     | Enable EPT                          |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit7 set     | Allow guest real mode               |
+   +----------------------------------+--------------+-------------------------------------+
+   | **MSR_IA32_VMX_EXIT_CTLS**       | Bit15        | Acknowledge interrupt on VM Exit    |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit18,19     | MSR IA32_PAT save/load              |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit20,21     | MSR IA32_EFER save/load             |
+   |                                  +--------------+-------------------------------------+
+   |                                  | Bit9         | 64-bit mode after VM Exit           |
+   +----------------------------------+--------------+-------------------------------------+
+
+CPUID and Guest TSC calibration
+===============================
+
+User OS accesses to CPUID are trapped by the ACRN hypervisor; however,
+the hypervisor passes most of the native CPUID information through to
+the guest, except for the virtualized CPUID leaf 0x1 (to provide a fake
+x86_model).
+
+The Time Stamp Counter (TSC) is a 64-bit register present on all x86
+processors that counts the number of cycles since reset. The ACRN
+hypervisor also virtualizes ``MSR_PLATFORM_INFO`` and
+``MSR_ATOM_FSB_FREQ``, which the guest uses for TSC calibration.
+
+RDTSC/RDTSCP
+============
+
+User OS vCPU executions of ``RDTSC`` and ``RDTSCP``, and reads of
+``MSR_IA32_TSC_AUX``, do not cause a VM Exit to the hypervisor. Thus the
+vCPU ID provided by ``MSR_IA32_TSC_AUX`` can be changed by the User OS.
+
+The ``RDTSCP`` instruction is widely used by the ACRN hypervisor to
+identify the current CPU (and read the current value of the processor's
+time-stamp counter). Because there is no VM Exit for the
+``MSR_IA32_TSC_AUX`` register, the hypervisor saves and restores the
+``MSR_IA32_TSC_AUX`` value on every VM Exit and VM Enter. Until the
+hypervisor has restored the host CPU ID, it must not use the ``RDTSCP``
+instruction, because it would return the vCPU ID instead of the host
+CPU ID.
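+
+As a reminder of why a single instruction suffices here, ``RDTSCP``
+returns both the TSC value and ``MSR_IA32_TSC_AUX`` atomically. The
+following stand-alone sketch (our illustration, not ACRN source) shows
+the mechanics:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   /* RDTSCP returns the 64-bit TSC in EDX:EAX and the current value
+    * of MSR_IA32_TSC_AUX (set up as a CPU ID by the OS or hypervisor)
+    * in ECX -- all in a single instruction.
+    */
+   static inline uint64_t rdtscp(uint32_t *cpu_id)
+   {
+       uint32_t lo, hi, aux;
+
+       __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
+       *cpu_id = aux;   /* meaning depends on who programmed TSC_AUX */
+       return ((uint64_t)hi << 32) | lo;
+   }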
+
+CR Register virtualization
+==========================
+
+Guest CR8 accesses cause a VM Exit and are emulated in the hypervisor,
+allowing the vLAPIC to update its PPR register. Guest accesses to CR3
+do not cause a VM Exit.
+
+MSR BITMAP
+==========
+
+In the ACRN hypervisor, only these model-specific registers (MSRs) are
+intercepted and emulated:
+
+**MSR_IA32_TSC_DEADLINE**
+   emulated to support guest TSC-deadline timer programming
+
+**MSR_PLATFORM_INFO**
+   emulated to present a fake x86 model
+
+**MSR_ATOM_FSB_FREQ**
+   provides the CPU frequency directly via this MSR to avoid TSC
+   calibration
+
+I/O BITMAP
+==========
+
+All User OS I/O port accesses are trapped into the ACRN hypervisor by
+default. Most of the Service OS I/O port accesses are not trapped into
+the ACRN hypervisor, allowing the Service OS direct access to the
+hardware port.
+
+The Service OS I/O trap policy is:
+
+**0x3F8/0x3FC**
+   trapped, for the vUART emulated inside the hypervisor (SOS only)
+
+**0x20/0xA0/0x460**
+   trapped, for vPIC emulation in the hypervisor
+
+**0xCF8/0xCFC**
+   trapped, for hypervisor PCI device interception
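+
+Conceptually, the I/O bitmap referenced by the VMX controls is a
+64K-bit map with one bit per port; a set bit makes accesses to that
+port exit to the hypervisor. A minimal sketch of the bookkeeping (our
+own illustration, not the ACRN source):
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   #define IO_PORT_COUNT 0x10000U           /* 64K I/O ports */
+
+   /* One bit per port; a set bit means "trap accesses to this port". */
+   static uint8_t io_bitmap[IO_PORT_COUNT / 8U];
+
+   static void trap_io_port(uint16_t port)
+   {
+       io_bitmap[port >> 3] |= (uint8_t)(1U << (port & 7U));
+   }
+
+   static bool is_io_port_trapped(uint16_t port)
+   {
+       return (io_bitmap[port >> 3] & (1U << (port & 7U))) != 0U;
+   }
+
+   /* The SOS trap policy described in this section. */
+   static void setup_sos_io_policy(void)
+   {
+       trap_io_port(0x3F8); trap_io_port(0x3FC);  /* vUART      */
+       trap_io_port(0x20);  trap_io_port(0xA0);   /* vPIC       */
+       trap_io_port(0x460);                       /* vPIC       */
+       trap_io_port(0xCF8); trap_io_port(0xCFC);  /* PCI config */
+   }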
+
+Exceptions
+==========
+
+The User OS handles its exceptions inside the VM, including page
+faults, #GP, etc. #MC and #DB exceptions cause a VM Exit to the ACRN
+hypervisor console.
+
+Memory virtualization
+*********************
+
+The ACRN hypervisor provides memory virtualization by using a static
+partition of system memory. Each virtual machine owns its own contiguous
+partition of memory, with the Service OS staying in lower memory and the
+User OS instances in high memory. (High memory is memory that is not
+permanently mapped in the kernel address space, while low memory is
+always mapped, so you can access it in the kernel simply by
+dereferencing a pointer.) In future implementations, this will evolve to
+utilize EPT/VT-d.
+
+ACRN hypervisor memory is not visible to any User OS. In the ACRN
+hypervisor, there are a few kinds of memory accesses that need to work
+efficiently:
+
+- ACRN hypervisor access to host memory
+- vCPU (per VM) access to guest memory
+- vCPU (per VM) access to host memory
+- vCPU (per VM) access to MMIO memory
+
+The rest of this section introduces how these kinds of memory accesses
+are managed. It gives an overview of the physical memory layout,
+paravirtualization (MMU) memory mapping in the hypervisor and VMs, and
+host-guest Extended Page Table (EPT) memory mapping for each VM.
+
+Physical Memory Layout
+======================
+
+An example physical memory layout for the Service OS and User OS is
+shown in :numref:`primer-mem-layout` below:
+
+.. figure:: images/primer-mem-layout.png
+   :align: center
+   :name: primer-mem-layout
+
+   Memory Layout
+
+The Service OS accepts the whole e820 table (all usable memory address
+ranges not reserved for use by the BIOS) after the hypervisor memory has
+been filtered out. From the SOS's point of view, it takes control of all
+available physical memory not used by the hypervisor (or BIOS),
+including the User OS memory. Each User OS's memory is allocated from
+(high) SOS memory, and the User OS controls only that section of
+memory.
+
+Some of the physical memory of a 32-bit machine needs to be sacrificed
+by making it hidden, so memory-mapped I/O (MMIO) devices have room to
+communicate. This creates an MMIO hole: VMs access some ranges of MMIO
+addresses directly to communicate with devices, or they may need the
+hypervisor to trap some ranges of MMIO to do device emulation. This
+access control is done through EPT mapping.
+
+PV (MMU) Memory Mapping in the Hypervisor
+=========================================
+
+.. figure:: images/primer-pv-mapping.png
+   :align: center
+   :name: primer-pv-mapping
+
+   ACRN Hypervisor PV Mapping Example
+
+The ACRN hypervisor is trusted and can access and control all system
+memory, as shown in :numref:`primer-pv-mapping`. Because the hypervisor
+is running in protected mode, an MMU page table must be prepared for its
+PV translation. To simplify things, the PV translation page table is set
+up as a 1:1 mapping. Some MMIO range mappings could be removed if they
+are not needed. This PV page table is created when the hypervisor memory
+is first initialized.
+
+PV (MMU) Memory Mapping in VMs
+==============================
+
+As mentioned earlier, the primary vCPU starts to run in protected mode
+when its VM is started. But before it begins, a temporary PV (MMU) page
+table must be prepared.
+
+This page table is a 1:1 mapping for 4 GB, and it only lives for a short
+time when the vCPU first runs. After the vCPU starts to run its kernel
+image (for example, Linux\*), the kernel creates its own PV page tables,
+after which the temporary page table becomes obsolete.
+
+Host-Guest (EPT) Memory Mapping
+===============================
+
+The VMs (both SOS and UOS) need to create an Extended Page Table (EPT)
+to access the host physical memory based on their guest physical
+memory. The guest VMs also need to set MMIO traps to trigger EPT
+violations for device emulation (such as the IOAPIC and LAPIC). This
+memory layout is shown in :numref:`primer-sos-ept-mapping`:
+
+.. figure:: images/primer-sos-ept-mapping.png
+   :align: center
+   :name: primer-sos-ept-mapping
+
+   SOS EPT Mapping Example
+
+The SOS takes control of all the host physical memory space: its EPT
+mapping covers almost all of the host memory, except the memory reserved
+for the hypervisor (HV) and a few MMIO trap ranges for IOAPIC and LAPIC
+emulation. The guest-to-host mapping for the SOS is 1:1.
+
+.. figure:: images/primer-uos-ept-mapping.png
+   :align: center
+   :name: primer-uos-ept-mapping
+
+   UOS EPT Mapping Example
+
+For the UOS, however, the memory EPT mapping is linear but with an
+offset (as shown in :numref:`primer-uos-ept-mapping`). The MMIO hole is
+left unmapped so that all MMIO accesses from the UOS are trapped (and
+emulated in the device model). To support pass-through devices in the
+future, some MMIO range mappings may be added.
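+
+A linear-with-offset mapping makes the UOS address translation trivial
+outside the MMIO hole. The sketch below illustrates the idea only; the
+base and hole values are assumptions for illustration, not ACRN's
+actual layout:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   #define UOS_MEM_BASE  0x100000000UL  /* assumed host-side offset */
+   #define MMIO_HOLE_LO  0x80000000UL   /* assumed MMIO hole range  */
+   #define MMIO_HOLE_HI  0x100000000UL
+
+   #define INVALID_HPA   (~0UL)
+
+   /* Translate a UOS guest-physical address to a host-physical
+    * address. The MMIO hole is left unmapped, so accesses there
+    * cause EPT violations and are forwarded for device emulation.
+    */
+   static uint64_t uos_gpa_to_hpa(uint64_t gpa)
+   {
+       if ((gpa >= MMIO_HOLE_LO) && (gpa < MMIO_HOLE_HI))
+           return INVALID_HPA;
+       return gpa + UOS_MEM_BASE;
+   }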
+
+Graphic mediation
+*****************
+
+Intel |reg| Graphics Virtualization Technology -g (Intel |reg| GVT-g)
+provides GPU sharing capability to multiple VMs by using a mediated
+pass-through technique. This allows a VM to access performance-critical
+I/O resources (usually partitioned) directly, without intervention from
+the hypervisor in most cases.
+
+Privileged operations from this VM are trapped and emulated, to provide
+secure isolation among VMs. The hypervisor must ensure that no
+vulnerability is exposed when assigning performance-critical resources
+to each VM. When a performance-critical resource cannot be partitioned,
+a scheduler must be implemented (either in software or hardware) to
+allow time-based sharing among multiple VMs. In this case, the device
+must allow the hypervisor to save and restore the hardware state
+associated with the shared resource, either through direct I/O register
+read/write (when there is no software-invisible state) or through a
+device-specific context save/restore mechanism (when there is
+software-invisible state).
+
+In the initial release of Project ACRN, graphic mediation is not
+enabled; it is planned for a future release.
+
+I/O emulation
+*************
+
+The I/O path is explained in the :ref:`ACRN-io-mediator` section of the
+:ref:`introduction`. The following sections provide an introduction to
+device assignment management and the PIO/MMIO trap flow.
+
+Device Assignment Management
+============================
+
+The ACRN hypervisor provides the device assignment management. Since the
+hypervisor owns all native vectors and IRQs, there must be a mapping
+table to relate each Guest IRQ/vector to a Host IRQ/vector. Currently we
+assign all devices to VM0 except the UART.
+
+If a PCI device (with MSI/MSI-X) is assigned to a Guest, the User OS
+will program the PCI config space and set the guest vector for this
+device. A hypercall, ``CWP_VM_PCI_MSIX_FIXUP``, is provided. Once the
+guest programs the guest vector, the User OS may call this hypercall to
+notify the ACRN hypervisor. The hypervisor allocates a host vector,
+creates a guest-host mapping relation, and replaces the guest vector
+with a real native vector for the device:
+
+**PCI MSI/MSI-X**
+   PCI Message Signalled Interrupts (MSI/MSI-X) from devices can be
+   triggered from a hypercall when a guest programs vectors. All PCI
+   devices are programmed with real vectors allocated by the
+   hypervisor.
+
+**PCI/INTx**
+   Device assignment is triggered when the guest programs the virtual
+   Advanced I/O Programmable Interrupt Controller (vIOAPIC) Redirection
+   Table Entries (RTE).
+
+**Legacy**
+   Legacy devices are assigned to VM0.
+
+User OS device assignment is similar to the above, except the User OS
+doesn't call the hypercall itself. Instead, guest programming of the PCI
+configuration space is trapped into the Device Model, and the Device
+Model may issue the hypercall to notify the hypervisor that the guest
+vector is changing.
+
+Currently, two types of I/O emulation are supported: MMIO and PORTIO
+trap handling. MMIO emulation is triggered by an EPT violation VM Exit
+only. If an EPT misconfiguration VM Exit occurs, the hypervisor halts
+the system. (Because the hypervisor sets up all EPT page table mappings
+at the beginning of the Guest boot, there should not be an EPT
+misconfiguration.)
+
+There are multiple places where I/O emulation can happen: in the ACRN
+hypervisor, in the Service OS kernel VHM module, or in the Service OS
+user-land ACRN Device Model.
+
+PIO/MMIO trap Flow
+==================
+
+Here is a description of the PIO/MMIO trap flow:
+
+1. Instruction decoder: get the Guest Physical Address (GPA) from the
+   VM Exit, going through the gla2gpa() page walker if necessary.
+
+2. Emulate the instruction. Here the hypervisor performs an address
+   range check to see if the hypervisor is interested in this I/O
+   port or MMIO GPA access.
+
+3. The hypervisor emulates only the vLAPIC, vIOAPIC, vPIC, and vUART
+   (the vUART for the Service OS only). Any other emulation requests
+   are forwarded to the SOS for handling. The vCPU raising the I/O
+   request will halt until this I/O request is processed successfully.
+   An IPI is sent to vCPU0 of the SOS to notify it that an I/O request
+   is waiting for service.
+
+4. The Service OS VHM module takes the I/O request and dispatches it to
+   one of multiple clients. These clients could be the SOS kernel-space
+   VBS-K, MPT, or the user-land Device Model. The VHM I/O request
+   server selects a default fallback client responsible for handling
+   any I/O request not handled by the other clients. (The Device Model
+   is the default fallback client.) Each client needs to register its
+   I/O range or specific PCI bus/device/function (BDF) numbers.
+   If an I/O request falls into a client's range, the I/O request
+   server sends the request to that client.
+
+5. Multiple clients: the fallback client (the Device Model in
+   user-land), the VBS-K client, and the MPT client.
+   Once the I/O request emulation completes, the client updates the
+   request status and notifies the hypervisor through a hypercall. The
+   hypervisor picks up that request, does any necessary cleanup, and
+   resumes the Guest vCPU.
+
+Most I/O emulation tasks are done by the SOS CPU, and the requests come
+from UOS vCPUs.
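+
+The following sketch summarizes the client dispatch described in steps
+4 and 5. It is our own simplified illustration; the actual VHM
+structures and interfaces in ACRN differ:
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* Each client registers an I/O port range; requests that match no
+    * registered range go to the fallback client (the user-land Device
+    * Model).
+    */
+   struct io_client {
+       const char *name;
+       uint16_t    port_start;
+       uint16_t    port_end;    /* inclusive */
+       void      (*handle)(uint16_t port, int is_write, uint32_t *val);
+   };
+
+   static struct io_client clients[8];
+   static size_t num_clients;
+   static struct io_client *fallback_client;   /* Device Model */
+
+   static void dispatch_io_request(uint16_t port, int is_write,
+                                   uint32_t *val)
+   {
+       for (size_t i = 0; i < num_clients; i++) {
+           struct io_client *c = &clients[i];
+
+           if ((port >= c->port_start) && (port <= c->port_end)) {
+               c->handle(port, is_write, val);
+               return;
+           }
+       }
+       fallback_client->handle(port, is_write, val);
+   }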
+
+Virtual interrupt
+*****************
+
+All interrupts received by the User OS come from virtual interrupts
+injected by a vLAPIC, vIOAPIC, or vPIC. All device emulation is done
+inside the SOS user-space device model. However, for performance
+reasons, the vLAPIC, vIOAPIC, and vPIC devices are emulated inside the
+ACRN hypervisor directly. From the guest's point of view, the vPIC uses
+Virtual Wire Mode via the vIOAPIC.
+
+The symmetric I/O mode is shown in :numref:`primer-symmetric-io`:
+
+.. figure:: images/primer-symmetric-io.png
+   :align: center
+   :name: primer-symmetric-io
+
+   Symmetric I/O Mode
+
+**Kernel boot param with vPIC**
+   add ``maxcpus=0`` to have the User OS use the PIC
+
+**Kernel boot param with vIOAPIC**
+   add ``maxcpus=1`` (any value other than "0"); the User OS will use
+   the IOAPIC and keep IOAPIC pin2 as the source of PIC interrupts
+
+Virtual LAPIC
+=============
+
+The LAPIC (Local Advanced Programmable Interrupt Controller) is
+virtualized for the SOS and UOS. The vLAPIC is currently emulated by a
+Guest MMIO trap on the GPA address range 0xFEE00000 - 0xFEF00000 (1 MB).
+The ACRN hypervisor will support APICv and posted interrupts in a
+future release.
+
+The vLAPIC provides the same features as a native LAPIC:
+
+- Mask/unmask vectors
+- Inject virtual vectors (level or edge trigger mode) to a vCPU
+- Notify the vIOAPIC of EOI processing
+- Provide the TSC timer service
+- Support CR8 updates of the TPR
+- INIT/STARTUP handling
+
+Virtual IOAPIC
+==============
+
+A vIOAPIC is emulated by the hypervisor when the Guest accesses the
+MMIO GPA range 0xFEC00000 - 0xFEC01000. The vIOAPIC for the SOS matches
+the same pin numbers as the native hardware IOAPIC. The vIOAPIC for the
+UOS provides only 24 pins. When a vIOAPIC pin is asserted, the vIOAPIC
+calls vLAPIC APIs to inject the vector into the Guest.
+
+Virtual PIC
+===========
+
+A vPIC is required for TSC calibration. Normally the UOS boots with a
+vIOAPIC. A vPIC is a source of external interrupts to the Guest. On
+every VM Exit, the hypervisor checks whether there are pending external
+PIC interrupts.
+
+Virtual Interrupt Injection
+===========================
+
+The source of virtual interrupts comes from either the Device Model or
+from assigned devices:
+
+**SOS assigned devices**
+   Because we assign all devices to the SOS directly, whenever a
+   device's physical interrupt arrives, we inject the corresponding
+   virtual interrupt into the SOS via the vLAPIC/vIOAPIC. In this case,
+   the SOS doesn't use the vPIC and doesn't have emulated devices.
+
+**UOS assigned devices**
+   Only PCI devices are assigned to the UOS, and virtual interrupt
+   injection follows the same path as for the SOS. A virtual interrupt
+   injection operation is triggered when a device's physical interrupt
+   is triggered.
+
+**UOS emulated devices**
+   The Device Model (user-land) is responsible for the interrupt
+   lifecycle management of UOS emulated devices. The Device Model knows
+   when an emulated device needs to assert a virtual IOAPIC/PIC pin or
+   needs to send a virtual MSI vector to the Guest. This logic is
+   entirely handled by the Device Model.
+
+:numref:`primer-hypervisor-interrupt` shows how the hypervisor handles
+interrupt processing and pending interrupts (``acrn_do_intr_process``):
+
+.. figure:: images/primer-hypervisor-interrupt.png
+   :align: center
+   :name: primer-hypervisor-interrupt
+
+   Hypervisor Interrupt handler
+
+There are many cases where the Guest RFLAGS.IF is cleared and interrupts
+are disabled, so the hypervisor checks whether the Guest IRQ window is
+available before injection. An NMI is an unmasked interrupt and is
+injected regardless of the guest IRQ window status. If the current IRQ
+window is not available, the hypervisor enables
+``MSR_IA32_VMX_PROCBASED_CTLS_IRQ_WIN`` (PROCBASED_CTRL.bit[2]) and
+does a VM Enter directly; the injection is then performed on the next
+VM Exit, once the Guest issues STI (Guest RFLAGS.IF=1).
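+
+A schematic of that injection decision follows. This is illustrative
+pseudologic with assumed helper names (``vmcs_get_guest_rflags`` and
+friends), not the ACRN implementation:
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   #define RFLAGS_IF               (1UL << 9)  /* interrupt enable   */
+   #define PROCBASED_CTRL_IRQ_WIN  (1U << 2)   /* IRQ-window exiting */
+
+   /* Assumed helpers standing in for real VMCS accessors. */
+   extern uint64_t vmcs_get_guest_rflags(void);
+   extern void vmcs_set_proc_ctrl_bits(uint32_t bits);
+   extern void inject_vector(uint8_t vector);
+
+   static void try_inject_external_irq(uint8_t vector)
+   {
+       bool irq_window_open =
+           ((vmcs_get_guest_rflags() & RFLAGS_IF) != 0UL);
+
+       if (irq_window_open) {
+           inject_vector(vector);
+       } else {
+           /* Ask the CPU to VM Exit as soon as the guest can accept
+            * an interrupt (e.g. after STI); inject on that exit.
+            */
+           vmcs_set_proc_ctrl_bits(PROCBASED_CTRL_IRQ_WIN);
+       }
+   }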
+
+VT-x and VT-d
+*************
+
+Since 2006, Intel CPUs have supported hardware-assisted virtualization
+(the VT-x instructions), where the CPU itself traps specific guest
+instructions and register accesses directly into the VMM, without the
+need for binary translation (and modification) of the guest operating
+system. Guest operating systems can be run natively without
+modification, although it is common to still install
+virtualization-aware para-virtualized drivers into the guests to
+improve functionality. One common example is access to storage via
+emulated SCSI devices.
+
+Intel CPUs and chipsets support various Virtualization Technology (VT)
+features, such as VT-x and VT-d. Physical events on the platform, such
+as physical device interrupts, trigger CPU **VM Exits** (a trap into
+the VMM) so those events can be handled.
+
+In the ACRN hypervisor design, VT-d can be used for DMA remapping,
+providing address translation and isolation.
+:numref:`primer-dma-address-mapping` is an example of address
+translation:
+
+.. figure:: images/primer-dma-address-mapping.png
+   :align: center
+   :name: primer-dma-address-mapping
+
+   DMA address mapping
+
+Hypercall
+*********
+
+The ACRN hypervisor currently supports fewer than a dozen
+:ref:`hypercall_apis` and VHM upcall APIs to support the necessary VM
+management, I/O request distribution, and guest memory mappings. The
+hypervisor and Service OS (SOS) reserve vector 0xF4 for hypervisor
+notification to the SOS. This upcall is necessary whenever device
+emulation is required by the SOS. The upcall vector 0xF4 is injected
+into SOS vCPU0.
+
+Refer to the :ref:`acrn_apis` documentation for details.
+
+Device emulation
+****************
+
+The ACRN Device Model emulates different kinds of platform devices,
+such as the RTC, LPC, UART, PCI devices, and the virtio block device.
+The most important thing about device emulation is handling the I/O
+requests from different devices. An I/O request could be a PIO, MMIO,
+or PCI CFG SPACE access. For example:
+
+- a CMOS RTC device may access 0x70/0x71 PIO to get the CMOS time,
+- a GPU PCI device may access its MMIO or PIO BAR space to complete
+  its framebuffer rendering, or
+- the bootloader may access PCI devices' CFG SPACE for BAR
+  reprogramming.
+
+The ACRN Device Model injects interrupts/MSIs into its front-end
+devices when necessary as well; for example, an RTC device needs to get
+its ALARM interrupt, and a PCI device with MSI capability needs to get
+its MSI. The Device Model also provides a PIRQ routing mechanism for
+platform devices.
+
+Virtio Devices
+**************
+
+This section introduces the virtio devices supported by ACRN.
+Currently, all the back-end (BE) virtio drivers are implemented using
+the virtio APIs, and the front-end (FE) drivers reuse the standard
+Linux front-end virtio drivers.
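+
+All of these devices exchange data through virtqueues built on the
+standard virtio ring. For reference, the descriptor layout defined by
+the virtio specification is shown below; the FE driver fills these in,
+and the BE device in the Device Model walks them:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   #define VRING_DESC_F_NEXT  1   /* buffer continues via `next`  */
+   #define VRING_DESC_F_WRITE 2   /* device writes to this buffer */
+
+   /* Virtio ring descriptor (per the virtio specification). Each
+    * virtqueue entry points at a guest buffer; descriptors can be
+    * chained through the `next` field.
+    */
+   struct vring_desc {
+       uint64_t addr;    /* guest-physical address of the buffer */
+       uint32_t len;     /* buffer length in bytes               */
+       uint16_t flags;   /* VRING_DESC_F_* flags                 */
+       uint16_t next;    /* index of the next chained descriptor */
+   };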
+
+Virtio-rnd
+==========
+
+The virtio-rnd entropy device supplies high-quality randomness for
+guest use. The virtio device ID of the virtio-rnd device is 4, and it
+supports one virtqueue of 64 entries (configurable in the source code).
+No feature bits are defined.
+
+When the FE driver requires random bytes, the BE device places bytes of
+random data onto the virtqueue.
+
+To launch the virtio-rnd device, you can use the following command:
+
+.. code-block:: bash
+
+   ./acrn-dm -A -m 1168M \
+     -s 0:0,hostbridge \
+     -s 1,virtio-blk,./uos.img \
+     -s 2,virtio-rnd \
+     -k bzImage \
+     -B "root=/dev/vda rw rootwait noxsave maxcpus=0 nohpet \
+        console=hvc0 no_timer_check ignore_loglevel \
+        log_buf_len=16M consoleblank=0 tsc=reliable" vm1
+
+To verify the result on the user OS side, you can use the following
+command:
+
+.. code-block:: bash
+
+   od /dev/random
+
+Virtio-blk
+==========
+
+The virtio-blk device is a simple virtual block device. The FE driver
+places read, write, and other requests onto the virtqueue, so that the
+BE driver can process them accordingly.
+
+The virtio device ID of the virtio-blk is 2, and it supports one
+virtqueue with 64 entries, configurable in the source code. The feature
+bits supported by the BE device are as follows:
+
+**VTBLK_F_SEG_MAX (bit 2)**
+   the maximum number of segments in a request is given in ``seg_max``.
+
+**VTBLK_F_BLK_SIZE (bit 6)**
+   the block size of the disk is given in ``blk_size``.
+
+**VTBLK_F_FLUSH (bit 9)**
+   the cache flush command is supported.
+
+**VTBLK_F_TOPOLOGY (bit 10)**
+   the device exports information on optimal I/O alignment.
+
+To use the virtio-blk device, use the following command:
+
+.. code-block:: bash
+
+   ./acrn-dm -A -m 1168M \
+     -s 0:0,hostbridge \
+     -s 1,virtio-blk,./uos.img \
+     -k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
+        nohpet console=hvc0 no_timer_check ignore_loglevel \
+        log_buf_len=16M consoleblank=0 tsc=reliable" vm1
+
+To verify the result, you should expect the user OS to boot
+successfully.
+
+Virtio-net
+==========
+
+The virtio-net device is a virtual Ethernet device. The virtio device
+ID of the virtio-net is 1. The virtio-net device supports two
+virtqueues, one for transmitting packets and the other for receiving
+packets. The FE driver places empty buffers onto one virtqueue for
+receiving packets, and enqueues outgoing packets onto the other
+virtqueue for transmission. Currently the size of each virtqueue is
+1000, configurable in the source code.
+
+To access the external network from the user OS, an L2 virtual switch
+should be created in the service OS, and the BE driver is bound to a
+tap/tun device attached to the L2 virtual switch. See
+:numref:`primer-virtio-net`:
+
+.. figure:: images/primer-virtio-net.png
+   :align: center
+   :name: primer-virtio-net
+
+   Accessing external network from User OS
+
+Currently the feature bits supported by the BE device are:
+
+**VIRTIO_NET_F_MAC (bit 5)**
+   the device has a given MAC address.
+
+**VIRTIO_NET_F_MRG_RXBUF (bit 15)**
+   the BE driver can merge receive buffers.
+
+**VIRTIO_NET_F_STATUS (bit 16)**
+   the configuration status field is available.
+
+**VIRTIO_F_NOTIFY_ON_EMPTY (bit 24)**
+   the device will issue an interrupt if it runs out of available
+   descriptors on a virtqueue.
+
+To enable the virtio-net device, use the following command:
+
+.. code-block:: bash
+
+   ./acrn-dm -A -m 1168M \
+     -s 0:0,hostbridge \
+     -s 1,virtio-blk,./uos.img \
+     -s 2,virtio-net,tap0 \
+     -k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
+        nohpet console=hvc0 no_timer_check ignore_loglevel \
+        log_buf_len=16M consoleblank=0 tsc=reliable" vm1
+
+To verify the correctness of the device, the external network should be
+accessible from the user OS.
+
+Virtio-console
+==============
+
+The virtio-console device is a simple device for data input and output.
+The virtio device ID of the virtio-console device is 3. A device can
+have from one to 16 ports. Each port has a pair of input and output
+virtqueues used to communicate information between the FE and BE
+drivers. Currently the size of each virtqueue is 64, configurable in
+the source code.
+
+Similar to the virtio-net device, the two virtqueues specific to a port
+are a transmit virtqueue and a receive virtqueue. The FE driver places
+empty buffers onto the receive virtqueue for incoming data, and
+enqueues outgoing characters onto the transmit virtqueue.
+
+Currently the feature bits supported by the BE device are:
+
+**VTCON_F_SIZE (bit 0)**
+   the configuration columns and rows are valid.
+
+**VTCON_F_MULTIPORT (bit 1)**
+   the device supports multiple ports, and control virtqueues will be
+   used.
+
+**VTCON_F_EMERG_WRITE (bit 2)**
+   the device supports emergency write.
+
+Virtio-console supports redirecting guest output to various backend
+devices, including stdio/pty/tty. Users can follow the syntax below to
+specify which backend to use:
+
+.. code-block:: none
+
+   virtio-console,[@]stdio|tty|pty:portname[=portpath][,[@]stdio|tty|pty:portname[=portpath]]
+
+For example, to use stdio as a virtio-console backend, use the
+following command:
+
+.. code-block:: bash
+
+   ./acrn-dm -A -m 1168M \
+     -s 0:0,hostbridge \
+     -s 1,virtio-blk,./uos.img \
+     -s 3,virtio-console,@stdio:stdio_port \
+     -k bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
+        nohpet console=hvc0 no_timer_check ignore_loglevel \
+        log_buf_len=16M consoleblank=0 tsc=reliable" vm1
+
+Then the user can log into the user OS:
+
+.. code-block:: bash
+
+   Ubuntu 17.04 xubuntu hvc0
+   xubuntu login: root
+   Password:
+
+To use pty as a virtio-console backend, use the following command:
+
+.. code-block:: bash
+
+   ./acrn-dm -A -m 1168M \
+     -s 0:0,hostbridge \
+     -s 1,virtio-blk,./uos.img \
+     -s 2,virtio-net,tap0 \
+     -s 3,virtio-console,@pty:pty_port \
+     -k ./bzImage -B "root=/dev/vda rw rootwait noxsave maxcpus=0 \
+        nohpet console=hvc0 no_timer_check ignore_loglevel \
+        log_buf_len=16M consoleblank=0 tsc=reliable" vm1 &
+
+When the ACRN-DM boots the User OS successfully, a log similar to the
+one below is shown:
+
+.. code-block:: none
+
+   **************************************************************
+   virt-console backend redirected to /dev/pts/0
+   **************************************************************
+
+You can then use the following command to log into the User OS:
+
+.. code-block:: bash
+
+   minicom -D /dev/pts/0
+
+or
+
+.. code-block:: bash
+
+   screen /dev/pts/0
diff --git a/static/acrn-custom.css b/static/acrn-custom.css
index 463e68c68..310775310 100644
--- a/static/acrn-custom.css
+++ b/static/acrn-custom.css
@@ -18,11 +18,17 @@
   color: rgba(255,255,255,1);
 }
 
+/* add some space before the figure caption */
 p.caption {
 # border-top: 1px solid;
   margin-top: 1em;
 }
 
+/* add a colon after the figure/table number (before the caption) */
+span.caption-number::after {
+  content: ": ";
+}
+
 /* make .. hlist:: tables fill the page */
 table.hlist {
   width: 95% !important;