diff --git a/use-cases/GPU-passthrough-and-Kata.md b/use-cases/GPU-passthrough-and-Kata.md index 565f0d6e2c..493c947c95 100644 --- a/use-cases/GPU-passthrough-and-Kata.md +++ b/use-cases/GPU-passthrough-and-Kata.md @@ -1,274 +1,6 @@ -# Using Intel GPU device with Kata Containers +# Using GPUs with Kata Containers -- [Using Intel GPU device with Kata Containers](#using-intel-gpu-device-with-kata-containers) - - [Hardware Requirements](#hardware-requirements) - - [Host Kernel Requirements](#host-kernel-requirements) - - [Install and configure Kata Containers](#install-and-configure-kata-containers) - - [Build Kata Containers kernel with GPU support](#build-kata-containers-kernel-with-gpu-support) - - [GVT-d with Kata Containers](#gvt-d-with-kata-containers) - - [GVT-g with Kata Containers](#gvt-g-with-kata-containers) +Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information: -An Intel Graphics device can be passed to a Kata Containers container using GPU -passthrough (Intel GVT-d) as well as GPU mediated passthrough (Intel GVT-g). - -Intel GVT-d (one VM to one physical GPU) also named as Intel-Graphics-Device -passthrough feature is one flavor of graphics virtualization approach. -This flavor allows direct assignment of an entire GPU to a single user, -passing the native driver capabilities through the hypervisor without any limitations. - -Intel GVT-g (multiple VMs to one physical GPU) is a full GPU virtualization solution -with mediated pass-through.
-A virtual GPU instance is maintained for each VM, with part of performance critical -resources, directly assigned. The ability to run a native graphics driver inside a -VM without hypervisor intervention in performance critical paths, achieves a good -balance among performance, feature, and sharing capability. - -| Technology | Description | Behaviour | Detail | -|-|-|-|-| -| Intel GVT-d | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation | -| Intel GVT-g | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough | - -## Hardware Requirements - - - For client platforms, 5th generation Intel® Core Processor Graphics or higher are required. - - For server platforms, E3_v4 or higher Xeon Processor Graphics are required. - -The following steps outline the workflow for using an Intel Graphics device with Kata. - -## Host Kernel Requirements - -The following configurations need to be enabled on your host kernel: - -``` -CONFIG_VFIO_IOMMU_TYPE1=m -CONFIG_VFIO=m -CONFIG_VFIO_PCI=m -CONFIG_VFIO_MDEV=m -CONFIG_VFIO_MDEV_DEVICE=m -CONFIG_DRM_I915_GVT=m -CONFIG_DRM_I915_GVT_KVMGT=m -``` - -Your host kernel needs to be booted with `intel_iommu=on` on the kernel command -line. - -## Install and configure Kata Containers - -To use this feature, you need Kata version 1.3.0 or above. -Follow the [Kata Containers setup instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md) -to install the latest version of Kata. - -In order to pass a GPU to a Kata Container, you need to enable the `hotplug_vfio_on_root_bus` -configuration in the Kata `configuration.toml` file as shown below. - -``` -$ sudo sed -i -e 's/^# *\(hotplug_vfio_on_root_bus\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml -``` - -Make sure you are using the `pc` machine type by verifying `machine_type = "pc"` is -set in the `configuration.toml`. - -## Build Kata Containers kernel with GPU support - -The default guest kernel installed with Kata Containers does not provide GPU support. -To use an Intel GPU with Kata Containers, you need to build a kernel with the necessary -GPU support. - -The following i915 kernel config options need to be enabled: -``` -CONFIG_DRM=y -CONFIG_DRM_I915=y -CONFIG_DRM_I915_USERPTR=y -``` - -Build the Kata Containers kernel with the previous config options, using the instructions -described in [Building Kata Containers kernel](https://github.com/kata-containers/packaging/tree/master/kernel). -For further details on building and installing guest kernels, see [the developer guide](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#install-guest-kernel-images). - -## GVT-d with Kata Containers - -Use the following steps to pass an Intel Graphics device in GVT-d mode with Kata: - -1. Find the Bus-Device-Function (BDF) for GPU device: - - ``` - $ sudo lspci -nn -D | grep Graphics - 0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09) - ``` - - Run the previous command to determine the BDF for the GPU device on host.
- From the previous output, PCI address `0000:00:02.0` is assigned to the hardware GPU device.
- This BDF is used later to unbind the GPU device from the host.
- "8086 1616" is the device ID of the hardware GPU device. It is used later to - rebind the GPU device to `vfio-pci` driver. - -2. Find the IOMMU group for the GPU device: - - ``` - $ BDF="0000:00:02.0" - $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group - /sys/kernel/iommu_groups/1 - ``` - - The previous output shows that the GPU belongs to IOMMU group 1. - -3. Unbind the GPU: - - ``` - $ echo $BDF | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind - ``` - -4. Bind the GPU to the `vfio-pci` device driver: - - ``` - $ sudo modprobe vfio-pci - $ echo 8086 1616 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id - $ echo $BDF | sudo tee --append /sys/bus/pci/drivers/vfio-pci/bind - ``` - - After you run the previous commands, the GPU is bound to `vfio-pci` driver.
- A new directory with the IOMMU group number is created under `/dev/vfio`: - - ``` - $ ls -l /dev/vfio - total 0 - crw------- 1 root root 241, 0 May 18 15:38 1 - crw-rw-rw- 1 root root 10, 196 May 18 15:37 vfio - ``` - -5. Start a Kata container with GPU device: - - ``` - $ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/1 -v /dev:/dev debian /bin/bash - ``` - - Run `lspci` within the container to verify the GPU device is seen in the list of - the PCI devices. Note the vendor-device id of the GPU ("8086:1616") in the `lspci` output. - - ``` - $ lspci -nn -D - 0000:00:00.0 Class [0600]: Device [8086:1237] (rev 02) - 0000:00:01.0 Class [0601]: Device [8086:7000] - 0000:00:01.1 Class [0101]: Device [8086:7010] - 0000:00:01.3 Class [0680]: Device [8086:7113] (rev 03) - 0000:00:02.0 Class [0604]: Device [1b36:0001] - 0000:00:03.0 Class [0780]: Device [1af4:1003] - 0000:00:04.0 Class [0100]: Device [1af4:1004] - 0000:00:05.0 Class [0002]: Device [1af4:1009] - 0000:00:06.0 Class [0200]: Device [1af4:1000] - 0000:00:0f.0 Class [0300]: Device [8086:1616] (rev 09) - ``` - - Additionally, you can access the device node for the graphics device: - - ``` - $ ls /dev/dri - card0 renderD128 - ``` - -## GVT-g with Kata Containers - -For GVT-g, you append `i915.enable_gvt=1` in addition to `intel_iommu=on` -on your host kernel command line and then reboot your host. - -Use the following steps to pass an Intel Graphics device in GVT-g mode to a Kata Container: - -1. Find the BDF for GPU device: - - ``` - $ sudo lspci -nn -D | grep Graphics - 0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09) - ``` - - Run the previous command to find out the BDF for the GPU device on host. - The previous output shows PCI address "0000:00:02.0" is assigned to the GPU device. - -2. Choose the MDEV (Mediated Device) type for VGPU (Virtual GPU): - - For background on `mdev` types, please follow this [kernel documentation](https://github.com/torvalds/linux/blob/master/Documentation/vfio-mediated-device.txt). - - * List out the `mdev` types for the VGPU: - - ``` - $ BDF="0000:00:02.0" - - $ ls /sys/devices/pci0000:00/$BDF/mdev_supported_types - i915-GVTg_V4_1 i915-GVTg_V4_2 i915-GVTg_V4_4 i915-GVTg_V4_8 - ``` - - * Inspect the `mdev` types and choose one that fits your requirement: - - ``` - $ cd /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8 && ls - available_instances create description device_api devices - - $ cat description - low_gm_size: 64MB - high_gm_size: 384MB - fence: 4 - resolution: 1024x768 - weight: 2 - - $ cat available_instances - 7 - ``` - - The output of file `description` represents the GPU resources that are - assigned to the VGPU with specified MDEV type.The output of file `available_instances` - represents the remaining amount of VGPUs you can create with specified MDEV type. - -3. Create a VGPU: - - * Generate a UUID: - - ``` - $ gpu_uuid=$(uuid) - ``` - - * Write the UUID to the `create` file under the chosen `mdev` type: - - ``` - $ echo $(gpu_uuid) | sudo tee /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/create - ``` - -4. Find the IOMMU group for the VGPU: - - ``` - $ ls -la /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/devices/${gpu_uuid}/iommu_group - lrwxrwxrwx 1 root root 0 May 18 14:35 devices/bbc4aafe-5807-11e8-a43e-03533cceae7d/iommu_group -> ../../../../kernel/iommu_groups/0 - - $ ls -l /dev/vfio - total 0 - crw------- 1 root root 241, 0 May 18 11:30 0 - crw-rw-rw- 1 root root 10, 196 May 18 11:29 vfio - ``` - - The IOMMU group "0" is created from the previous output.
- Now you can use the device node `/dev/vfio/0` in docker command line to pass - the VGPU to a Kata Container. - -5. Start Kata container with GPU device enabled: - - ``` - $ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/0 -v /dev:/dev debian /bin/bash - $ lspci -nn -D - 0000:00:00.0 Class [0600]: Device [8086:1237] (rev 02) - 0000:00:01.0 Class [0601]: Device [8086:7000] - 0000:00:01.1 Class [0101]: Device [8086:7010] - 0000:00:01.3 Class [0680]: Device [8086:7113] (rev 03) - 0000:00:02.0 Class [0604]: Device [1b36:0001] - 0000:00:03.0 Class [0780]: Device [1af4:1003] - 0000:00:04.0 Class [0100]: Device [1af4:1004] - 0000:00:05.0 Class [0002]: Device [1af4:1009] - 0000:00:06.0 Class [0200]: Device [1af4:1000] - 0000:00:0f.0 Class [0300]: Device [8086:1616] (rev 09) - ``` - - BDF "0000:00:0f.0" is assigned to the VGPU device. - - Additionally, you can access the device node for the graphics device: - - ``` - $ ls /dev/dri - card0 renderD128 - ``` +- [Intel](Intel-GPU-passthrough-and-Kata.md) +- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md) diff --git a/use-cases/Intel-GPU-passthrough-and-Kata.md b/use-cases/Intel-GPU-passthrough-and-Kata.md new file mode 100644 index 0000000000..b16c89ddf6 --- /dev/null +++ b/use-cases/Intel-GPU-passthrough-and-Kata.md @@ -0,0 +1,295 @@ +# Using Intel GPU device with Kata Containers + +- [Using Intel GPU device with Kata Containers](#using-intel-gpu-device-with-kata-containers) + - [Hardware Requirements](#hardware-requirements) + - [Host Kernel Requirements](#host-kernel-requirements) + - [Install and configure Kata Containers](#install-and-configure-kata-containers) + - [Build Kata Containers kernel with GPU support](#build-kata-containers-kernel-with-gpu-support) + - [GVT-d with Kata Containers](#gvt-d-with-kata-containers) + - [GVT-g with Kata Containers](#gvt-g-with-kata-containers) + +An Intel Graphics device can be passed to a Kata Containers container using GPU +passthrough (Intel GVT-d) as well as GPU mediated passthrough (Intel GVT-g). + +Intel GVT-d (one VM to one physical GPU) also named as Intel-Graphics-Device +passthrough feature is one flavor of graphics virtualization approach. +This flavor allows direct assignment of an entire GPU to a single user, +passing the native driver capabilities through the hypervisor without any limitations. + +Intel GVT-g (multiple VMs to one physical GPU) is a full GPU virtualization solution +with mediated pass-through.
+A virtual GPU instance is maintained for each VM, with part of performance critical +resources, directly assigned. The ability to run a native graphics driver inside a +VM without hypervisor intervention in performance critical paths, achieves a good +balance among performance, feature, and sharing capability. + +| Technology | Description | Behaviour | Detail | +|-|-|-|-| +| Intel GVT-d | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation | +| Intel GVT-g | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough | + +## Hardware Requirements + + - For client platforms, 5th generation Intel® Core Processor Graphics or higher are required. + - For server platforms, E3_v4 or higher Xeon Processor Graphics are required. + +The following steps outline the workflow for using an Intel Graphics device with Kata. + +## Host Kernel Requirements + +The following configurations need to be enabled on your host kernel: + +``` +CONFIG_VFIO_IOMMU_TYPE1=m +CONFIG_VFIO=m +CONFIG_VFIO_PCI=m +CONFIG_VFIO_MDEV=m +CONFIG_VFIO_MDEV_DEVICE=m +CONFIG_DRM_I915_GVT=m +CONFIG_DRM_I915_GVT_KVMGT=m +``` + +Your host kernel needs to be booted with `intel_iommu=on` on the kernel command +line. + +## Install and configure Kata Containers + +To use this feature, you need Kata version 1.3.0 or above. +Follow the [Kata Containers setup instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md) +to install the latest version of Kata. + +In order to pass a GPU to a Kata Container, you need to enable the `hotplug_vfio_on_root_bus` +configuration in the Kata `configuration.toml` file as shown below. + +``` +$ sudo sed -i -e 's/^# *\(hotplug_vfio_on_root_bus\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml +``` + +Make sure you are using the `pc` machine type by verifying `machine_type = "pc"` is +set in the `configuration.toml`. + +## Build Kata Containers kernel with GPU support + +The default guest kernel installed with Kata Containers does not provide GPU support. +To use an Intel GPU with Kata Containers, you need to build a kernel with the necessary +GPU support. + +The following i915 kernel config options need to be enabled: +``` +CONFIG_DRM=y +CONFIG_DRM_I915=y +CONFIG_DRM_I915_USERPTR=y +``` + +Build the Kata Containers kernel with the previous config options, using the instructions +described in [Building Kata Containers kernel](https://github.com/kata-containers/packaging/tree/master/kernel). +For further details on building and installing guest kernels, see [the developer guide](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#install-guest-kernel-images). + +There is an easy way to build a guest kernel that supports Intel GPU: +``` +## Build guest kernel with https://github.com/kata-containers/packaging/tree/master/kernel + +# Prepare (download guest kernel source, generate .config) +$ ./build-kernel.sh -g intel -f setup + +# Build guest kernel +$ ./build-kernel.sh -g intel build + +# Install guest kernel +$ sudo -E ./build-kernel.sh -g intel install +/usr/share/kata-containers/vmlinux-intel-gpu.container -> vmlinux-5.4.15-70-intel-gpu +/usr/share/kata-containers/vmlinuz-intel-gpu.container -> vmlinuz-5.4.15-70-intel-gpu +``` + +Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`. +``` +kernel = "/usr/share/kata-containers/vmlinuz-intel-gpu.container" +``` + +## GVT-d with Kata Containers + +Use the following steps to pass an Intel Graphics device in GVT-d mode with Kata: + +1. Find the Bus-Device-Function (BDF) for GPU device: + + ``` + $ sudo lspci -nn -D | grep Graphics + 0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09) + ``` + + Run the previous command to determine the BDF for the GPU device on host.
+ From the previous output, PCI address `0000:00:02.0` is assigned to the hardware GPU device.
+ This BDF is used later to unbind the GPU device from the host.
+ "8086 1616" is the device ID of the hardware GPU device. It is used later to + rebind the GPU device to `vfio-pci` driver. + +2. Find the IOMMU group for the GPU device: + + ``` + $ BDF="0000:00:02.0" + $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group + /sys/kernel/iommu_groups/1 + ``` + + The previous output shows that the GPU belongs to IOMMU group 1. + +3. Unbind the GPU: + + ``` + $ echo $BDF | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind + ``` + +4. Bind the GPU to the `vfio-pci` device driver: + + ``` + $ sudo modprobe vfio-pci + $ echo 8086 1616 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id + $ echo $BDF | sudo tee --append /sys/bus/pci/drivers/vfio-pci/bind + ``` + + After you run the previous commands, the GPU is bound to `vfio-pci` driver.
+ A new directory with the IOMMU group number is created under `/dev/vfio`: + + ``` + $ ls -l /dev/vfio + total 0 + crw------- 1 root root 241, 0 May 18 15:38 1 + crw-rw-rw- 1 root root 10, 196 May 18 15:37 vfio + ``` + +5. Start a Kata container with GPU device: + + ``` + $ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/1 -v /dev:/dev debian /bin/bash + ``` + + Run `lspci` within the container to verify the GPU device is seen in the list of + the PCI devices. Note the vendor-device id of the GPU ("8086:1616") in the `lspci` output. + + ``` + $ lspci -nn -D + 0000:00:00.0 Class [0600]: Device [8086:1237] (rev 02) + 0000:00:01.0 Class [0601]: Device [8086:7000] + 0000:00:01.1 Class [0101]: Device [8086:7010] + 0000:00:01.3 Class [0680]: Device [8086:7113] (rev 03) + 0000:00:02.0 Class [0604]: Device [1b36:0001] + 0000:00:03.0 Class [0780]: Device [1af4:1003] + 0000:00:04.0 Class [0100]: Device [1af4:1004] + 0000:00:05.0 Class [0002]: Device [1af4:1009] + 0000:00:06.0 Class [0200]: Device [1af4:1000] + 0000:00:0f.0 Class [0300]: Device [8086:1616] (rev 09) + ``` + + Additionally, you can access the device node for the graphics device: + + ``` + $ ls /dev/dri + card0 renderD128 + ``` + +## GVT-g with Kata Containers + +For GVT-g, you append `i915.enable_gvt=1` in addition to `intel_iommu=on` +on your host kernel command line and then reboot your host. + +Use the following steps to pass an Intel Graphics device in GVT-g mode to a Kata Container: + +1. Find the BDF for GPU device: + + ``` + $ sudo lspci -nn -D | grep Graphics + 0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09) + ``` + + Run the previous command to find out the BDF for the GPU device on host. + The previous output shows PCI address "0000:00:02.0" is assigned to the GPU device. + +2. Choose the MDEV (Mediated Device) type for VGPU (Virtual GPU): + + For background on `mdev` types, please follow this [kernel documentation](https://github.com/torvalds/linux/blob/master/Documentation/driver-api/vfio-mediated-device.rst). + + * List out the `mdev` types for the VGPU: + + ``` + $ BDF="0000:00:02.0" + + $ ls /sys/devices/pci0000:00/$BDF/mdev_supported_types + i915-GVTg_V4_1 i915-GVTg_V4_2 i915-GVTg_V4_4 i915-GVTg_V4_8 + ``` + + * Inspect the `mdev` types and choose one that fits your requirement: + + ``` + $ cd /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8 && ls + available_instances create description device_api devices + + $ cat description + low_gm_size: 64MB + high_gm_size: 384MB + fence: 4 + resolution: 1024x768 + weight: 2 + + $ cat available_instances + 7 + ``` + + The output of file `description` represents the GPU resources that are + assigned to the VGPU with specified MDEV type.The output of file `available_instances` + represents the remaining amount of VGPUs you can create with specified MDEV type. + +3. Create a VGPU: + + * Generate a UUID: + + ``` + $ gpu_uuid=$(uuid) + ``` + + * Write the UUID to the `create` file under the chosen `mdev` type: + + ``` + $ echo $(gpu_uuid) | sudo tee /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/create + ``` + +4. Find the IOMMU group for the VGPU: + + ``` + $ ls -la /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/devices/${gpu_uuid}/iommu_group + lrwxrwxrwx 1 root root 0 May 18 14:35 devices/bbc4aafe-5807-11e8-a43e-03533cceae7d/iommu_group -> ../../../../kernel/iommu_groups/0 + + $ ls -l /dev/vfio + total 0 + crw------- 1 root root 241, 0 May 18 11:30 0 + crw-rw-rw- 1 root root 10, 196 May 18 11:29 vfio + ``` + + The IOMMU group "0" is created from the previous output.
+ Now you can use the device node `/dev/vfio/0` in docker command line to pass + the VGPU to a Kata Container. + +5. Start Kata container with GPU device enabled: + + ``` + $ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/0 -v /dev:/dev debian /bin/bash + $ lspci -nn -D + 0000:00:00.0 Class [0600]: Device [8086:1237] (rev 02) + 0000:00:01.0 Class [0601]: Device [8086:7000] + 0000:00:01.1 Class [0101]: Device [8086:7010] + 0000:00:01.3 Class [0680]: Device [8086:7113] (rev 03) + 0000:00:02.0 Class [0604]: Device [1b36:0001] + 0000:00:03.0 Class [0780]: Device [1af4:1003] + 0000:00:04.0 Class [0100]: Device [1af4:1004] + 0000:00:05.0 Class [0002]: Device [1af4:1009] + 0000:00:06.0 Class [0200]: Device [1af4:1000] + 0000:00:0f.0 Class [0300]: Device [8086:1616] (rev 09) + ``` + + BDF "0000:00:0f.0" is assigned to the VGPU device. + + Additionally, you can access the device node for the graphics device: + + ``` + $ ls /dev/dri + card0 renderD128 + ``` diff --git a/use-cases/Nvidia-GPU-passthrough-and-Kata.md b/use-cases/Nvidia-GPU-passthrough-and-Kata.md new file mode 100644 index 0000000000..8e37e97ad2 --- /dev/null +++ b/use-cases/Nvidia-GPU-passthrough-and-Kata.md @@ -0,0 +1,313 @@ +# Using Nvidia GPU device with Kata Containers + +- [Using Nvidia GPU device with Kata Containers](#using-nvidia-gpu-device-with-kata-containers) + - [Hardware Requirements](#hardware-requirements) + - [Host BIOS Requirements](#host-bios-requirements) + - [Host Kernel Requirements](#host-kernel-requirements) + - [Install and configure Kata Containers](#install-and-configure-kata-containers) + - [Build Kata Containers kernel with GPU support](#build-kata-containers-kernel-with-gpu-support) + - [Nvidia GPU pass-through mode with Kata Containers](#nvidia-gpu-pass-through-mode-with-kata-containers) + - [Nvidia vGPU mode with Kata Containers](#nvidia-vgpu-mode-with-kata-containers) + - [Install Nvidia Driver in Kata Containers](#install-nvidia-driver-in-kata-containers) + - [References](#references) + + +An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough +(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode).  + +Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM, +bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed +exclusively by the Nvidia driver running in the VM to which it is assigned. +The GPU is not shared among VMs. + +Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous, +direct access to a single physical GPU, using the same Nvidia graphics drivers that are +deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs +with unparalleled graphics performance, compute performance, and application compatibility, +together with the cost-effectiveness and scalability brought about by sharing a GPU +among multiple workloads. + +| Technology | Description | Behaviour | Detail | +| --- | --- | --- | --- | +| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation | +| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough | + +## Hardware Requirements +Nvidia GPUs Recommended for Virtualization: + +- Nvidia Tesla (T4, M10, P6, V100 or newer) +- Nvidia Quadro RTX 6000/8000 + +## Host BIOS Requirements + +Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m +``` +$ lspci -s 04:00.0 -vv | grep Region + Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M] + Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G + Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M] +``` + +For large BARs devices, MMIO mapping above 4G address space should be `enabled` +in the PCI configuration of the BIOS. + +Some hardware vendors use different name in BIOS, such as: + +- Above 4G Decoding +- Memory Hole for PCI MMIO +- Memory Mapped I/O above 4GB + +The following steps outline the workflow for using an Nvidia GPU with Kata. + +## Host Kernel Requirements +The following configurations need to be enabled on your host kernel: + +- `CONFIG_VFIO` +- `CONFIG_VFIO_IOMMU_TYPE1` +- `CONFIG_VFIO_MDEV` +- `CONFIG_VFIO_MDEV_DEVICE` +- `CONFIG_VFIO_PCI` + +Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line. + +## Install and configure Kata Containers +To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above. +Follow the [Kata Containers setup instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md) +to install the latest version of Kata. + +The following configuration in the Kata `configuration.toml` file as shown below can work: +``` +machine_type = "pc" + +hotplug_vfio_on_root_bus = true +``` + +To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above. + +The following configuration in the Kata `configuration.toml` file as shown below can work: + +Hotplug for PCI devices by `shpchp` (Linux's SHPC PCI Hotplug driver): +``` +machine_type = "q35" + +hotplug_vfio_on_root_bus = false +``` + +Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver): +``` +machine_type = "q35" + +hotplug_vfio_on_root_bus = true +pcie_root_port = 1 +``` + +## Build Kata Containers kernel with GPU support +The default guest kernel installed with Kata Containers does not provide GPU support. +To use an Nvidia GPU with Kata Containers, you need to build a kernel with the +necessary GPU support. + +The following kernel config options need to be enabled: +``` +# Support PCI/PCIe device hotplug (Required for large BARs device) +CONFIG_HOTPLUG_PCI_PCIE=y +CONFIG_HOTPLUG_PCI_SHPC=y + +# Support for loading modules (Required for load Nvidia drivers) +CONFIG_MODULES=y +CONFIG_MODULE_UNLOAD=y + +# Enable the MMIO access method for PCIe devices (Required for large BARs device) +CONFIG_PCI_MMCONFIG=y +``` + +The following kernel config options need to be disabled: +``` +# Disable Open Source Nvidia driver nouveau +# It conflicts with Nvidia official driver +CONFIG_DRM_NOUVEAU=n +``` +> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default. +It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts. + + +Build the Kata Containers kernel with the previous config options, +using the instructions described in [Building Kata Containers kernel](https://github.com/kata-containers/packaging/tree/master/kernel). +For further details on building and installing guest kernels, +see [the developer guide](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#install-guest-kernel-images). + +There is an easy way to build a guest kernel that supports Nvidia GPU: +``` +## Build guest kernel with https://github.com/kata-containers/packaging/tree/master/kernel + +# Prepare (download guest kernel source, generate .config) +$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup + +# Build guest kernel +$ ./build-kernel.sh -v 4.19.86 -g nvidia build + +# Install guest kernel +$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install +/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu +/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu +``` + +To build Nvidia Driver in Kata container, `kernel-devel` is required. +This is a way to generate rpm packages for `kernel-devel`: +``` +$ cd kata-linux-4.19.86-68 +$ make rpm-pkg +Output RPMs: +~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm +``` +> **Note**: +> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer. +> - Run `make deb-pkg` to build the deb package. + +Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`. +``` +kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container" +``` + +## Nvidia GPU pass-through mode with Kata Containers +Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata: + +1. Find the Bus-Device-Function (BDF) for GPU device on host: + ``` + $ sudo lspci -nn -D | grep -i nvidia + 0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1) + 0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1) + ``` + > PCI address `0000:04:00.0` is assigned to the hardware GPU device. + > `10de:15f8` is the device ID of the hardware GPU device. + +2. Find the IOMMU group for the GPU device: + ``` + $ BDF="0000:04:00.0" + $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group + /sys/kernel/iommu_groups/45 + ``` + The previous output shows that the GPU belongs to IOMMU group 45. + +3. Check the IOMMU group number under `/dev/vfio`: + ``` + $ ls -l /dev/vfio + total 0 + crw------- 1 root root 248, 0 Feb 28 09:57 45 + crw------- 1 root root 248, 1 Feb 28 09:57 54 + crw-rw-rw- 1 root root 10, 196 Feb 28 09:57 vfio + ``` + +4. Start a Kata container with GPU device: + ``` + $ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/45 centos /bin/bash + ``` + +5. Run `lspci` within the container to verify the GPU device is seen in the list + of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output. + ``` + $ lspci -nn -D | grep '10de:15f8' + 0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1) + ``` + +6. Additionally, you can check the PCI BARs space of ​​the Nvidia GPU device in the container: + ``` + $ lspci -s 01:01.0 -vv | grep Region + Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M] + Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G] + Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M] + ``` + > **Note**: If you see a message similar to the above, the BAR space of the Nvidia + > GPU has been successfully allocated. + +## Nvidia vGPU mode with Kata Containers + +Nvidia vGPU is a licensed product on all supported GPU boards. A software license +is required to enable all vGPU features within the guest VM. + +> **Note**: There is no suitable test environment, so it is not written here. + + +## Install Nvidia Driver in Kata Containers +Download the official Nvidia driver from +[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx), +for example `NVIDIA-Linux-x86_64-418.87.01.run`. + +Install the `kernel-devel`(generated in the previous steps) for guest kernel: +``` +$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm +``` + +Here is an example to extract, compile and install Nvidia driver: +``` +## Extract +$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x + +## Compile and install (It will take some time) +$ cd NVIDIA-Linux-x86_64-418.87.01 +$ sudo ./nvidia-installer -a -q --ui=none \ + --no-cc-version-check \ + --no-opengl-files --no-install-libglvnd \ + --kernel-source-path=/usr/src/kernels/`uname -r` +``` + +Or just run one command line: +``` +$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \ + --no-cc-version-check \ + --no-opengl-files --no-install-libglvnd \ + --kernel-source-path=/usr/src/kernels/`uname -r` +``` + +To view detailed logs of the installer: +``` +$ tail -f /var/log/nvidia-installer.log +``` + +Load Nvidia driver module manually +``` +# Optional(generate modules.dep and map files for Nvidia driver) +$ sudo depmod + +# Load module +$ sudo modprobe nvidia-drm + +# Check module +$ lsmod | grep nvidia +nvidia_drm 45056 0 +nvidia_modeset 1093632 1 nvidia_drm +nvidia 18202624 1 nvidia_modeset +drm_kms_helper 159744 1 nvidia_drm +drm 364544 3 nvidia_drm,drm_kms_helper +i2c_core 65536 3 nvidia,drm_kms_helper,drm +ipmi_msghandler 49152 1 nvidia +``` + + +Check Nvidia device status with `nvidia-smi` +``` +$ nvidia-smi +Tue Mar 3 00:03:49 2020 ++-----------------------------------------------------------------------------+ +| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 | +|-------------------------------+----------------------+----------------------+ +| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | +| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | +|===============================+======================+======================| +| 0 Tesla P100-PCIE... Off | 00000000:01:01.0 Off | 0 | +| N/A 27C P0 25W / 250W | 0MiB / 16280MiB | 0% Default | ++-------------------------------+----------------------+----------------------+ + ++-----------------------------------------------------------------------------+ +| Processes: GPU Memory | +| GPU PID Type Process name Usage | +|=============================================================================| +| No running processes found | ++-----------------------------------------------------------------------------+ + +``` + +## References + +- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli) +- https://gitlab.com/nvidia/container-images/driver/-/tree/master +- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers-(Beta)