Merge pull request #9084 from amshinde/document-intel-gpu-vfio

docs: Document Intel Discrete GPUs usage with Kata
2025-09-02 09:24:35 +00:00 · 2024-04-16 16:17:03 -07:00
parent c26dad8fe5 973a15332a
commit af3b19ed18
6 changed files with 283 additions and 7 deletions
--- a/docs/hypervisors.md
+++ b/docs/hypervisors.md
@@ -60,7 +60,7 @@ are available, their default values and how each setting can be used.
 | [ACRN] | [`configuration-acrn.toml`](../src/runtime/config/configuration-acrn.toml.in) | `acrn` | | | | |
 | [Cloud Hypervisor] | [`configuration-clh.toml`](../src/runtime/config/configuration-clh.toml.in) | `clh` | | [`configuration-cloud-hypervisor.toml`](../src/runtime-rs/config/configuration-cloud-hypervisor.toml.in) | `cloud-hypervisor` | |
 | [Firecracker] | [`configuration-fc.toml`](../src/runtime/config/configuration-fc.toml.in) | `fc` | | | | |
-| [QEMU] | [`configuration-qemu.toml`](../src/runtime/config/configuration-qemu.toml.in) | `qemu` | yes | [`configuration-qemu.toml`](../src/runtime-rs/config/configuration-qemu.toml.in) | `qemu` | |
+| [QEMU] | [`configuration-qemu.toml`](../src/runtime/config/configuration-qemu.toml.in) | `qemu` | yes | [`configuration-qemu.toml`](../src/runtime-rs/config/configuration-qemu-runtime-rs.toml.in) | `qemu` | |
 | [`Dragonball`] | | | | [`configuration-dragonball.toml`](../src/runtime-rs/config/configuration-dragonball.toml.in) | `dragonball` | yes |
 | [StratoVirt] | [`configuration-stratovirt.toml`](../src/runtime/config/configuration-stratovirt.toml.in) | `stratovirt` | | | | |

--- a/docs/use-cases/GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/GPU-passthrough-and-Kata.md
@@ -2,5 +2,5 @@

 Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:

- [Intel](Intel-GPU-passthrough-and-Kata.md)
+- [Intel Discrete GPUs](Intel-Discrete-GPU-passthrough-and-Kata.md)/[Intel Integrated GPUs](Intel-GPU-passthrough-and-Kata.md)
 - [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)
--- a/docs/use-cases/Intel-Discrete-GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/Intel-Discrete-GPU-passthrough-and-Kata.md
@@ -0,0 +1,274 @@
+# Using Intel Discrete GPU device with Kata Containers
+
+This guide covers passing Intel Discrete GPUs to Kata Containers.
+These include the Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series.
+For integrated GPUs, refer to [Integrate-Intel-GPUs-with-Kata](Intel-GPU-passthrough-and-Kata.md).
+
+> **Note:** These instructions are for a system that has an x86_64 CPU.
+
+An Intel Discrete GPU can be passed to a Kata Container using either GPU passthrough
+or SR-IOV passthrough.
+
+In GPU passthrough mode, an entire physical GPU is directly assigned to one VM.
+In this mode, the GPU is accessed exclusively by the Intel driver running in
+the VM to which it is assigned. The GPU is not shared among VMs.
+
+In SR-IOV mode, a virtual GPU instance is passed to a virtual machine.
+Multiple virtual GPU instances can be carved out of a single physical GPU
+and passed to different VMs, allowing the GPU to be shared.
+
+| Technology | Description |
+|-|-|
+| GPU passthrough | Physical GPU assigned to a single VM |
+| SR-IOV passthrough | Physical GPU shared by multiple VMs |
+
+## Hardware Requirements
+
+Intel GPUs Recommended for Virtualization:
+
+- Intel® Data Center GPU Max Series (`Ponte Vecchio`)
+- Intel® Data Center GPU Flex Series (`Arctic Sound-M`)
+- Intel® Data Center GPU Arc Series 
+
+The following steps outline the workflow for using an Intel Graphics device with Kata Containers.
+
+## Host BIOS requirements
+
+Hardware such as the Intel Max and Flex Series requires larger PCI BARs.
+
+For large-BAR devices, enable MMIO mapping above the 4 GB address space in the PCI configuration of the BIOS.
+
+Some hardware vendors use a different name for this setting in the BIOS, such as:
+
+- Above 4GB Decoding
+- Memory Hole for PCI MMIO
+- Memory Mapped I/O above 4GB
+
+## Host Kernel Requirements
+
+For device passthrough to work with the Max and Flex Series, an out-of-tree kernel driver is required.
+
+For Ubuntu 22.04 server, follow these instructions to install the out-of-tree GPU driver:
+```bash
+$ sudo apt update
+$ sudo apt install -y gpg-agent wget
+$ wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
+    sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
+$ source /etc/os-release
+$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${VERSION_CODENAME}/lts/2350 unified" | \
+    sudo tee /etc/apt/sources.list.d/intel-gpu-${VERSION_CODENAME}.list
+$ sudo apt update
+$ sudo apt install -y linux-headers-"$(uname -r)" flex bison intel-fw-gpu intel-i915-dkms xpu-smi
+$ sudo reboot
+```
+For other distributions, refer to [DGPU-docs](https://dgpu-docs.intel.com/driver/installation.html).
+
+You can also install the driver from source, which is maintained at [intel-gpu-i915-backports](https://github.com/intel-gpu/intel-gpu-i915-backports).
+Detailed instructions are available at: <https://github.com/intel-gpu/intel-gpu-i915-backports/blob/backport/main/docs/README_ubuntu.md>.
+
+Below are the steps for installing the driver from source on an Ubuntu 22.04 LTS system:
+```bash
+$ export I915_BRANCH="backport/main"
+$ git clone -b ${I915_BRANCH} --depth 1 https://github.com/intel-gpu/intel-gpu-i915-backports.git
+$ cd intel-gpu-i915-backports/
+$ sudo apt install -y dkms make debhelper devscripts build-essential flex bison mawk
+$ sudo apt install -y linux-headers-"$(uname -r)" linux-image-unsigned-"$(uname -r)"
+$ make i915dkmsdeb-pkg
+```
+The `make` command above creates a Debian package in the parent folder: `intel-i915-dkms_<release version>.<kernel-version>.deb`.
+Install the package with:
+```bash
+$ sudo dpkg -i intel-i915-dkms_<release version>.<kernel-version>.deb
+$ sudo reboot
+```
+
+Additionally, verify that the following kernel configs are enabled for your host kernel:
+```
+CONFIG_VFIO
+CONFIG_VFIO_IOMMU_TYPE1
+CONFIG_VFIO_PCI
+```
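+
+The check above can be scripted. The following is a minimal sketch, assuming the host kernel
+config is available at `/boot/config-$(uname -r)`, which is common on Ubuntu but not guaranteed
+on every distribution. The function name `check_vfio_config` is hypothetical and not part of Kata.
+
+```bash
+#!/usr/bin/env bash
+# Report whether each VFIO option is built in (=y) or built as a module (=m).
+# The default config path is an assumption; pass another path as the first argument.
+check_vfio_config() {
+    local config_file="${1:-/boot/config-$(uname -r)}"
+    local opt missing=0
+    for opt in CONFIG_VFIO CONFIG_VFIO_IOMMU_TYPE1 CONFIG_VFIO_PCI; do
+        if grep -Eq "^${opt}=(y|m)$" "$config_file"; then
+            echo "$opt: enabled"
+        else
+            echo "$opt: MISSING"
+            missing=1
+        fi
+    done
+    # Non-zero exit status if any option is missing.
+    return "$missing"
+}
+```
+
+Call `check_vfio_config` with no arguments to inspect the running kernel, or pass an explicit
+config file path.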
+
+## Host kernel command line
+
+Your host kernel needs to be booted with `intel_iommu=on` and `i915.enable_iaf=0` on the kernel command
+line. The example below additionally sets `iommu=pt` and `i915.max_vfs=63`.
+
+1. Run the following to change the kernel command line using grub:
+```bash
+$ sudo vim /etc/default/grub
+```
+
+2. Append the following parameters to the end of the `GRUB_CMDLINE_LINUX_DEFAULT` value:
+
+`intel_iommu=on iommu=pt i915.max_vfs=63 i915.enable_iaf=0`
+
+3. Update grub as per OS distribution:
+
+For Ubuntu:
+```bash
+$ sudo update-grub
+```
+
+For CentOS/RHEL:
+```bash
+$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
+```
+
+4. Reboot the system
+```bash
+$ sudo reboot
+```
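+
+After the reboot, it can help to confirm that the parameters actually took effect. The sketch
+below reads `/proc/cmdline` and checks for the two required parameters. The function name
+`check_cmdline` is hypothetical, and the optional file argument exists only so the check can be
+tried against a saved command line.
+
+```bash
+#!/usr/bin/env bash
+# Verify that each required parameter appears as a whole token on the kernel command line.
+check_cmdline() {
+    local cmdline_file="${1:-/proc/cmdline}"
+    local cmdline param missing=0
+    cmdline="$(cat "$cmdline_file")"
+    for param in intel_iommu=on i915.enable_iaf=0; do
+        # Pad with spaces so that only exact, space-delimited tokens match.
+        case " $cmdline " in
+            *" $param "*) echo "$param: present" ;;
+            *) echo "$param: MISSING"; missing=1 ;;
+        esac
+    done
+    return "$missing"
+}
+```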
+
+## Install and configure Kata Containers
+
+To use this feature, you need Kata version 1.3.0 or above.
+Follow the [Kata Containers setup instructions](../install/README.md)
+to install the latest version of Kata.
+
+To use large-BAR devices (for example, NVIDIA Tesla P100), you need Kata version 1.11.0 or above.
+
+To pass a GPU to a Kata Container, enable the `hotplug_vfio_on_root_bus`
+option in the Kata `configuration.toml` file as shown below.
+
+```bash
+$ sudo sed -i -e 's/^# *\(hotplug_vfio_on_root_bus\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml
+```
+
+Make sure you are using the `q35` machine type by verifying `machine_type = "q35"` is
+set in the `configuration.toml`. Make sure `pcie_root_port` is set to a positive value.
+
+After making the above changes, configuration in the `configuration.toml` should look like this:
+```
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = true
+pcie_root_port = 1
+```
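+
+The three settings can also be verified with a small script. This is a sketch under the
+assumption that the active configuration is the file edited above; installations that use a
+different `configuration.toml` need to pass its path. The function name `check_kata_gpu_config`
+is hypothetical.
+
+```bash
+#!/usr/bin/env bash
+# Check the settings this guide relies on; commented-out lines do not count as set.
+check_kata_gpu_config() {
+    local cfg="${1:-/usr/share/defaults/kata-containers/configuration.toml}"
+    local rc=0
+    grep -Eq '^[[:space:]]*machine_type[[:space:]]*=[[:space:]]*"q35"' "$cfg" \
+        || { echo 'machine_type is not "q35"'; rc=1; }
+    grep -Eq '^[[:space:]]*hotplug_vfio_on_root_bus[[:space:]]*=[[:space:]]*true' "$cfg" \
+        || { echo 'hotplug_vfio_on_root_bus is not true'; rc=1; }
+    # pcie_root_port must be a positive integer.
+    grep -Eq '^[[:space:]]*pcie_root_port[[:space:]]*=[[:space:]]*[1-9][0-9]*' "$cfg" \
+        || { echo 'pcie_root_port is not a positive value'; rc=1; }
+    return "$rc"
+}
+```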
+
+## GPU passthrough with Kata Containers
+
+Use the following steps to pass an Intel discrete GPU to a Kata Container:
+
+1. Find the Bus-Device-Function (BDF) for the GPU device:
+
+   ```
+   $ sudo lspci -nn -D | grep Display
+   ```
+
+   Run the previous command to determine the BDF of the GPU device on the host.<br/>
+   In this example, PCI address `0000:29:00.0` is assigned to the hardware GPU device.<br/>
+   This BDF is used in the following steps to unbind the GPU device from its host driver.<br/>
+
+2. Find the IOMMU group for the GPU device:
+
+   ```bash
+   $ BDF="0000:29:00.0"
+   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
+   /sys/kernel/iommu_groups/27
+   ```
+
+   The previous output shows that the GPU belongs to IOMMU group 27.
+
+3. Bind the GPU to the `vfio-pci` device driver:
+
+   ```bash
+   $ BDF="0000:29:00.0"
+   $ DEV="/sys/bus/pci/devices/$BDF"
+   $ echo "vfio-pci" | sudo tee "$DEV"/driver_override
+   $ echo $BDF | sudo tee "$DEV"/driver/unbind
+   $ echo "$BDF" | sudo tee "/sys/bus/pci/drivers_probe"
+   ```
+
+   After you run the previous commands, the GPU is bound to the `vfio-pci` driver.<br/>
+   A new device node named after the IOMMU group number is created under `/dev/vfio`:
+
+   ```bash
+   $ ls -l /dev/vfio
+     total 0
+     crw------- 1   root root  241,   0 May 18 15:38 27
+     crw-rw-rw- 1 root root  10, 196 May 18 15:37 vfio
+   ```
+
+   Later, to return the device to the standard driver, clear the
+   `driver_override` and re-probe the device, for example:
+
+   ```bash
+   $ echo | sudo tee "$DEV/driver_override"
+   $ echo "$BDF" | sudo tee "$DEV/driver/unbind"
+   $ echo "$BDF" | sudo tee /sys/bus/pci/drivers_probe
+   ```
+
+4. Start a Kata container with the GPU device:
+
+   ```bash
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device "/dev/vfio/27" --rm -t "docker.io/library/archlinux:latest" arch uname -r
+   ```
+
+   Run `lspci` within the container to verify the GPU device is seen in the list of
+   the PCI devices. Note the vendor-device id of the GPU ("8086:0bd5") in the `lspci` output.
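+
+VFIO assigns devices per IOMMU group, so it is worth checking which other devices, if any, share
+the group found in step 2. The sketch below lists them. The function name
+`list_iommu_group_devices` and the optional `sysfs_root` argument are hypothetical; the latter
+exists only so the function can be tried against a mock directory tree.
+
+```bash
+#!/usr/bin/env bash
+# List every PCI device that shares an IOMMU group with the given BDF.
+list_iommu_group_devices() {
+    local bdf="$1"
+    local sysfs_root="${2:-/sys}"
+    local group_dir
+    # Resolve the iommu_group symlink to the group directory.
+    group_dir="$(readlink -e "$sysfs_root/bus/pci/devices/$bdf/iommu_group")" || return 1
+    ls "$group_dir/devices"
+}
+```
+
+For example, `list_iommu_group_devices "0000:29:00.0"` prints one BDF per line. If it prints more
+than the GPU itself, the other endpoints in the group typically also need to be bound to `vfio-pci`.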
+
+## SR-IOV mode for Intel Discrete GPUs
+
+Use the following steps to pass an Intel Graphics device in SR-IOV mode to a Kata Container:
+
+1. Find the BDF for the GPU device:
+
+   ```sh
+   $ sudo lspci -nn -D | grep Display
+     0000:29:00.0 Display controller [0380]: Intel Corporation Ponte Vecchio 1T [8086:0bd5] (rev 2f)
+     0000:3a:00.0 Display controller [0380]: Intel Corporation Ponte Vecchio 1T [8086:0bd5] (rev 2f)
+     0000:9a:00.0 Display controller [0380]: Intel Corporation Ponte Vecchio 1T [8086:0bd5] (rev 2f)
+     0000:ca:00.0 Display controller [0380]: Intel Corporation Ponte Vecchio 1T [8086:0bd5] (rev 2f)
+   ```
+
+   Run the previous command to find the BDF of each GPU device on the host.
+   This example uses the GPU with PCI address `0000:3a:00.0` to create GPU SR-IOV interfaces.
+
+2. Carve out SR-IOV slices for the GPU:
+
+   List the total number of possible SR-IOV virtual interfaces for the GPU:
+
+   ```bash
+   $ BDF="0000:3a:00.0"
+   $ cat  "/sys/bus/pci/devices/$BDF/sriov_totalvfs"
+   63
+   ``` 
+
+   Create SR-IOV interfaces for the GPU:
+   ```sh
+   $ echo 4 | sudo tee /sys/bus/pci/devices/$BDF/sriov_numvfs
+     4
+   $ sudo lspci | grep Display
+     29:00.0 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     3a:00.0 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     3a:00.1 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     3a:00.2 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     3a:00.3 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     3a:00.4 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     9a:00.0 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+     ca:00.0 Display controller: Intel Corporation Ponte Vecchio 1T (rev 2f)
+   ```
+   The above output shows the SR-IOV interfaces created for the GPU.
+
+3. Find the IOMMU group for the GPU SR-IOV interface (VGPU):
+
+   ```bash
+   $ BDF="0000:3a:00.1"
+   $ readlink -e "/sys/bus/pci/devices/$BDF/iommu_group"
+     /sys/kernel/iommu_groups/437
+   $ ls -l /dev/vfio
+     total 0
+     crw-------   1 root root  241,   0 May 18 11:30 437
+     crw-rw-rw- 1 root root  10, 196 May 18 11:29 vfio
+   ```
+
+   Now you can use the device node `/dev/vfio/437` on the `ctr` command line to pass
+   the VGPU to a Kata Container.
+
+4. Start a Kata container with the GPU device:
+
+   ```bash
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/437 --rm -t "docker.io/library/archlinux:latest" arch uname -r
+   ```
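+
+The `/dev/vfio/<group>` node for a virtual function only exists once that function is bound to
+`vfio-pci`. If the node is missing, the bind sequence from the GPU passthrough section can be
+applied to the VF as well. The following is a sketch of that sequence as a function. The name
+`bind_to_vfio` and the optional `sysfs_root` argument are hypothetical, the latter present only
+for trying the function against a mock tree. On a real host it must run as root.
+
+```bash
+#!/usr/bin/env bash
+# Bind one PCI function (physical or virtual) to vfio-pci.
+bind_to_vfio() {
+    local bdf="$1"
+    local sysfs_root="${2:-/sys}"
+    local dev="$sysfs_root/bus/pci/devices/$bdf"
+
+    # Reject malformed addresses such as "0000:3a:00:1" (colon instead of dot).
+    if [[ ! "$bdf" =~ ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$ ]]; then
+        echo "invalid BDF: $bdf" >&2
+        return 1
+    fi
+    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }
+
+    echo "vfio-pci" > "$dev/driver_override"
+    # Unbind only if a driver currently owns the device.
+    if [ -e "$dev/driver/unbind" ]; then
+        echo "$bdf" > "$dev/driver/unbind"
+    fi
+    echo "$bdf" > "$sysfs_root/bus/pci/drivers_probe"
+}
+```
+
+For example, as root, `bind_to_vfio "0000:3a:00.1"` binds the first virtual function created in
+step 2.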
--- a/tests/cmd/check-spelling/data/acronyms.txt
+++ b/tests/cmd/check-spelling/data/acronyms.txt
@@ -121,3 +121,4 @@ gic
 msr
 cpuid
 pio
+gpu
--- a/tests/cmd/check-spelling/data/main.txt
+++ b/tests/cmd/check-spelling/data/main.txt
@@ -132,4 +132,5 @@ upcall
 Upcall
 ioctl/A
 struct/A # struct in Rust
-Struct/A
+Struct/A
+intel
--- a/tests/cmd/check-spelling/kata-dictionary.dic
+++ b/tests/cmd/check-spelling/kata-dictionary.dic
@@ -1,4 +1,4 @@
-384
+386
 ACPI/AB
 ACS/AB
 API/AB
@@ -146,7 +146,7 @@ SVG/AB
 SaaS/B
 SemaphoreCI/B
 StratoVirt/B
-Struct/A#
+Struct/A
 Sysbench/B
 TBD/AB
 TDX
@@ -222,7 +222,6 @@ deliverable/AB
 deploy
 dev
 devicemapper/B
-devmapper
 dialer
 dialog/A
 dind/B
@@ -242,6 +241,7 @@ gRPC/AB
 gic
 golang/B
 goroutine/AB
+gpu
 hostname/AB
 hotplug/ACD
 howto/AB
@@ -254,6 +254,7 @@ implementor/A
 init/AB
 initramfs/AB
 initrd/AB
+intel
 ioctl/A
 iodepth/A
 ioengine/A
@@ -334,7 +335,6 @@ serverless
 signoff/A
 snapcraft/B
 snapd/B
-snapshotters
 stalebot/B
 startup
 stderr/AB