Merge pull request #615 from Jimmy-Xu/add-nvidia-gpu-use-case

use-cases: Add documentation for using Nvidia GPU with Kata
Xu Wang 2020-03-20 23:29:17 +08:00 committed by GitHub
commit bc22bb8d7d
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 612 additions and 272 deletions


@ -1,274 +1,6 @@
# Using GPUs with Kata Containers
Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:
- [Intel](Intel-GPU-passthrough-and-Kata.md)
- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)


@ -0,0 +1,295 @@
# Using Intel GPU device with Kata Containers
- [Using Intel GPU device with Kata Containers](#using-intel-gpu-device-with-kata-containers)
- [Hardware Requirements](#hardware-requirements)
- [Host Kernel Requirements](#host-kernel-requirements)
- [Install and configure Kata Containers](#install-and-configure-kata-containers)
- [Build Kata Containers kernel with GPU support](#build-kata-containers-kernel-with-gpu-support)
- [GVT-d with Kata Containers](#gvt-d-with-kata-containers)
- [GVT-g with Kata Containers](#gvt-g-with-kata-containers)
An Intel Graphics device can be passed to a Kata Containers container using GPU
passthrough (Intel GVT-d) as well as GPU mediated passthrough (Intel GVT-g).
Intel GVT-d (one VM to one physical GPU), also known as the Intel Graphics Device
passthrough feature, is one approach to graphics virtualization.
It assigns an entire GPU directly to a single user and passes the native driver
capabilities through the hypervisor without any limitations.
Intel GVT-g (multiple VMs to one physical GPU) is a full GPU virtualization solution
with mediated passthrough.<br/>
A virtual GPU instance is maintained for each VM, with some performance-critical
resources assigned directly. Because a native graphics driver runs inside the VM
without hypervisor intervention in performance-critical paths, GVT-g achieves a good
balance among performance, features, and sharing capability.
| Technology | Description | Behaviour | Detail |
|-|-|-|-|
| Intel GVT-d | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| Intel GVT-g | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
## Hardware Requirements
- For client platforms, 5th generation Intel® Core Processor Graphics or higher is required.
- For server platforms, E3_v4 or higher Xeon Processor Graphics is required.
The following steps outline the workflow for using an Intel Graphics device with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
```
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO=m
CONFIG_VFIO_PCI=m
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_DRM_I915_GVT=m
CONFIG_DRM_I915_GVT_KVMGT=m
```
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
line.
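
The kernel options listed above can be checked before going further. The following is a minimal sketch, not part of the Kata tooling: `check_kernel_config` is a hypothetical helper, and the config file location (`/boot/config-$(uname -r)`) is an assumption that varies between distributions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each option is set to y or m in a
# kernel config file. Returns non-zero if any option is missing.
# Usage: check_kernel_config CONFIG_FILE OPTION...
check_kernel_config() {
    local config_file="$1"; shift
    local opt missing=0
    for opt in "$@"; do
        if grep -Eq "^${opt}=(y|m)$" "$config_file"; then
            echo "ok ${opt}"
        else
            echo "missing ${opt}"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (the config path is an assumption):
# check_kernel_config "/boot/config-$(uname -r)" \
#     CONFIG_VFIO CONFIG_VFIO_IOMMU_TYPE1 CONFIG_VFIO_PCI \
#     CONFIG_VFIO_MDEV CONFIG_VFIO_MDEV_DEVICE \
#     CONFIG_DRM_I915_GVT CONFIG_DRM_I915_GVT_KVMGT
```

An option that is built in (`=y`) rather than built as a module (`=m`) is also reported as present.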
## Install and configure Kata Containers
To use this feature, you need Kata version 1.3.0 or above.
Follow the [Kata Containers setup instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md)
to install the latest version of Kata.
In order to pass a GPU to a Kata Container, you need to enable the `hotplug_vfio_on_root_bus`
configuration in the Kata `configuration.toml` file as shown below.
```
$ sudo sed -i -e 's/^# *\(hotplug_vfio_on_root_bus\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml
```
Make sure you are using the `pc` machine type by verifying `machine_type = "pc"` is
set in the `configuration.toml`.
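
To verify both settings after editing, the active values can be read back from the file. This is a rough sketch under stated assumptions: `kata_config_value` is a hypothetical helper, it ignores commented-out lines, and if a key appears in several sections it prints only the last occurrence.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the active (uncommented) value of a key in a
# TOML-style configuration file.
# Usage: kata_config_value FILE KEY
kata_config_value() {
    local file="$1" key="$2"
    sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*(.*)$/\1/p" "$file" | tail -n 1
}

# Example calls (path taken from the sed command above):
# kata_config_value /usr/share/defaults/kata-containers/configuration.toml machine_type
# kata_config_value /usr/share/defaults/kata-containers/configuration.toml hotplug_vfio_on_root_bus
```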
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU support.
To use an Intel GPU with Kata Containers, you need to build a kernel with the necessary
GPU support.
The following i915 kernel config options need to be enabled:
```
CONFIG_DRM=y
CONFIG_DRM_I915=y
CONFIG_DRM_I915_USERPTR=y
```
Build the Kata Containers kernel with the previous config options, using the instructions
described in [Building Kata Containers kernel](https://github.com/kata-containers/packaging/tree/master/kernel).
For further details on building and installing guest kernels, see [the developer guide](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports Intel GPU:
```
## Build guest kernel with https://github.com/kata-containers/packaging/tree/master/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -g intel -f setup
# Build guest kernel
$ ./build-kernel.sh -g intel build
# Install guest kernel
$ sudo -E ./build-kernel.sh -g intel install
/usr/share/kata-containers/vmlinux-intel-gpu.container -> vmlinux-5.4.15-70-intel-gpu
/usr/share/kata-containers/vmlinuz-intel-gpu.container -> vmlinuz-5.4.15-70-intel-gpu
```
Before using the new guest kernel, update the `kernel` parameter in `configuration.toml`:
```
kernel = "/usr/share/kata-containers/vmlinuz-intel-gpu.container"
```
## GVT-d with Kata Containers
Use the following steps to pass an Intel Graphics device in GVT-d mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device:
```
$ sudo lspci -nn -D | grep Graphics
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09)
```
Run the previous command to determine the BDF for the GPU device on host.<br/>
From the previous output, PCI address `0000:00:02.0` is assigned to the hardware GPU device.<br/>
This BDF is used later to unbind the GPU device from the host.<br/>
`8086 1616` is the vendor ID and device ID pair of the hardware GPU device. It is used
later to bind the GPU device to the `vfio-pci` driver.
2. Find the IOMMU group for the GPU device:
```
$ BDF="0000:00:02.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
/sys/kernel/iommu_groups/1
```
The previous output shows that the GPU belongs to IOMMU group 1.
3. Unbind the GPU:
```
$ echo $BDF | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind
```
4. Bind the GPU to the `vfio-pci` device driver:
```
$ sudo modprobe vfio-pci
$ echo 8086 1616 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
$ echo $BDF | sudo tee --append /sys/bus/pci/drivers/vfio-pci/bind
```
After you run the previous commands, the GPU is bound to the `vfio-pci` driver.<br/>
A new device node named after the IOMMU group number is created under `/dev/vfio`:
```
$ ls -l /dev/vfio
total 0
crw------- 1 root root 241, 0 May 18 15:38 1
crw-rw-rw- 1 root root 10, 196 May 18 15:37 vfio
```
5. Start a Kata container with GPU device:
```
$ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/1 -v /dev:/dev debian /bin/bash
```
Run `lspci` within the container to verify the GPU device is seen in the list of
the PCI devices. Note the vendor-device id of the GPU ("8086:1616") in the `lspci` output.
```
$ lspci -nn -D
0000:00:00.0 Class [0600]: Device [8086:1237] (rev 02)
0000:00:01.0 Class [0601]: Device [8086:7000]
0000:00:01.1 Class [0101]: Device [8086:7010]
0000:00:01.3 Class [0680]: Device [8086:7113] (rev 03)
0000:00:02.0 Class [0604]: Device [1b36:0001]
0000:00:03.0 Class [0780]: Device [1af4:1003]
0000:00:04.0 Class [0100]: Device [1af4:1004]
0000:00:05.0 Class [0002]: Device [1af4:1009]
0000:00:06.0 Class [0200]: Device [1af4:1000]
0000:00:0f.0 Class [0300]: Device [8086:1616] (rev 09)
```
Additionally, you can access the device node for the graphics device:
```
$ ls /dev/dri
card0 renderD128
```
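
Steps 2 and 5 above can be tied together by deriving the `/dev/vfio` node from the BDF instead of reading the group number by eye. This is an illustrative sketch only: `vfio_node_for_bdf` and the `SYSFS_ROOT` override are hypothetical names, and the override exists purely so the function can be exercised without real hardware.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the /dev/vfio node that corresponds to the IOMMU
# group of a PCI device. Fails if the device has no iommu_group link.
# Usage: vfio_node_for_bdf BDF
vfio_node_for_bdf() {
    local bdf="$1" sysfs_root="${SYSFS_ROOT:-/sys}"
    local link="${sysfs_root}/bus/pci/devices/${bdf}/iommu_group"
    [ -L "$link" ] || return 1
    # The link target ends in the group number, for example .../iommu_groups/1
    echo "/dev/vfio/$(basename "$(readlink "$link")")"
}

# Example call, reusing the BDF from step 1:
# sudo docker run -it --runtime=kata-runtime --rm \
#     --device "$(vfio_node_for_bdf 0000:00:02.0)" -v /dev:/dev debian /bin/bash
```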
## GVT-g with Kata Containers
For GVT-g, append `i915.enable_gvt=1` in addition to `intel_iommu=on`
to your host kernel command line and then reboot the host.
Use the following steps to pass an Intel Graphics device in GVT-g mode to a Kata Container:
1. Find the BDF for GPU device:
```
$ sudo lspci -nn -D | grep Graphics
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09)
```
Run the previous command to find out the BDF for the GPU device on host.
The previous output shows PCI address "0000:00:02.0" is assigned to the GPU device.
2. Choose the MDEV (Mediated Device) type for VGPU (Virtual GPU):
For background on `mdev` types, please follow this [kernel documentation](https://github.com/torvalds/linux/blob/master/Documentation/driver-api/vfio-mediated-device.rst).
* List out the `mdev` types for the VGPU:
```
$ BDF="0000:00:02.0"
$ ls /sys/devices/pci0000:00/$BDF/mdev_supported_types
i915-GVTg_V4_1 i915-GVTg_V4_2 i915-GVTg_V4_4 i915-GVTg_V4_8
```
* Inspect the `mdev` types and choose one that fits your requirement:
```
$ cd /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8 && ls
available_instances create description device_api devices
$ cat description
low_gm_size: 64MB
high_gm_size: 384MB
fence: 4
resolution: 1024x768
weight: 2
$ cat available_instances
7
```
The `description` file lists the GPU resources that are assigned to a VGPU of the
specified MDEV type. The `available_instances` file gives the number of VGPUs of
that MDEV type that you can still create.
3. Create a VGPU:
* Generate a UUID:
```
$ gpu_uuid=$(uuid)
```
* Write the UUID to the `create` file under the chosen `mdev` type:
```
$ echo ${gpu_uuid} | sudo tee /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/create
```
4. Find the IOMMU group for the VGPU:
```
$ ls -la /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/devices/${gpu_uuid}/iommu_group
lrwxrwxrwx 1 root root 0 May 18 14:35 devices/bbc4aafe-5807-11e8-a43e-03533cceae7d/iommu_group -> ../../../../kernel/iommu_groups/0
$ ls -l /dev/vfio
total 0
crw------- 1 root root 241, 0 May 18 11:30 0
crw-rw-rw- 1 root root 10, 196 May 18 11:29 vfio
```
The previous output shows that the VGPU belongs to IOMMU group 0.<br/>
Now you can use the device node `/dev/vfio/0` on the `docker` command line to pass
the VGPU to a Kata Container.
5. Start Kata container with GPU device enabled:
```
$ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/0 -v /dev:/dev debian /bin/bash
$ lspci -nn -D
0000:00:00.0 Class [0600]: Device [8086:1237] (rev 02)
0000:00:01.0 Class [0601]: Device [8086:7000]
0000:00:01.1 Class [0101]: Device [8086:7010]
0000:00:01.3 Class [0680]: Device [8086:7113] (rev 03)
0000:00:02.0 Class [0604]: Device [1b36:0001]
0000:00:03.0 Class [0780]: Device [1af4:1003]
0000:00:04.0 Class [0100]: Device [1af4:1004]
0000:00:05.0 Class [0002]: Device [1af4:1009]
0000:00:06.0 Class [0200]: Device [1af4:1000]
0000:00:0f.0 Class [0300]: Device [8086:1616] (rev 09)
```
BDF "0000:00:0f.0" is assigned to the VGPU device.
Additionally, you can access the device node for the graphics device:
```
$ ls /dev/dri
card0 renderD128
```
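
When choosing an `mdev` type in step 2, it can help to see the remaining instance count of every type at once rather than inspecting each directory in turn. The following is a small sketch; `list_mdev_types` is a hypothetical helper, not an existing tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<mdev type> <available instances>" for every
# type directory under a GPU's mdev_supported_types directory.
# Usage: list_mdev_types TYPES_DIR
list_mdev_types() {
    local types_dir="$1" type_dir
    for type_dir in "$types_dir"/*/; do
        # Skip entries without a readable available_instances file.
        [ -r "${type_dir}available_instances" ] || continue
        printf '%s %s\n' "$(basename "$type_dir")" "$(cat "${type_dir}available_instances")"
    done
}

# Example call, using the path from step 2:
# list_mdev_types /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types
```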


@ -0,0 +1,313 @@
# Using Nvidia GPU device with Kata Containers
- [Using Nvidia GPU device with Kata Containers](#using-nvidia-gpu-device-with-kata-containers)
- [Hardware Requirements](#hardware-requirements)
- [Host BIOS Requirements](#host-bios-requirements)
- [Host Kernel Requirements](#host-kernel-requirements)
- [Install and configure Kata Containers](#install-and-configure-kata-containers)
- [Build Kata Containers kernel with GPU support](#build-kata-containers-kernel-with-gpu-support)
- [Nvidia GPU pass-through mode with Kata Containers](#nvidia-gpu-pass-through-mode-with-kata-containers)
- [Nvidia vGPU mode with Kata Containers](#nvidia-vgpu-mode-with-kata-containers)
- [Install Nvidia Driver in Kata Containers](#install-nvidia-driver-in-kata-containers)
- [References](#references)
An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode). 
In Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
exclusively by the Nvidia driver running in the VM to which it is assigned.
The GPU is not shared among VMs.
Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
direct access to a single physical GPU, using the same Nvidia graphics drivers that are
deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
with unparalleled graphics performance, compute performance, and application compatibility,
together with the cost-effectiveness and scalability brought about by sharing a GPU
among multiple workloads.
| Technology | Description | Behaviour | Detail |
| --- | --- | --- | --- |
| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
## Hardware Requirements
Nvidia GPUs Recommended for Virtualization:
- Nvidia Tesla (T4, M10, P6, V100 or newer)
- Nvidia Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BAR window, for example, the Nvidia Tesla P100 and K40m:
```
$ lspci -s 04:00.0 -vv | grep Region
Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
```
For devices with large BARs, MMIO mapping above the 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Hardware vendors use different names for this option in the BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
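
To spot such devices quickly, the `Region` lines of `lspci -vv` can be filtered by size. This is a rough heuristic sketch, not an official check: `large_bars` is a hypothetical helper, and it simply treats any BAR whose size is reported in `G` or `T` units as large.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `lspci -vv` output on stdin and print one summary
# line for each memory BAR whose size is reported in G or T units.
large_bars() {
    sed -n -E 's/.*Region ([0-9]+): Memory at ([0-9a-f]+) .*\[size=([0-9]+[GT])\].*/region \1 at 0x\2 size \3/p'
}

# Example call, using the device address from the output above:
# lspci -s 04:00.0 -vv | large_bars
```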
The following steps outline the workflow for using an Nvidia GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
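
Whether the running host was actually booted with this parameter can be confirmed from `/proc/cmdline`. A minimal sketch follows; `cmdline_has_param` is a hypothetical helper, and the optional second argument exists only so the function can be tried against a sample file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel command line contains exactly
# the given parameter (whole-word match, not a substring match).
# Usage: cmdline_has_param PARAM [CMDLINE_FILE]
cmdline_has_param() {
    local param="$1" cmdline_file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -Fxq -- "$param"
}

# Example call:
# cmdline_has_param intel_iommu=on && echo "IOMMU parameter present"
```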
## Install and configure Kata Containers
To use devices without large BARs (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
Follow the [Kata Containers setup instructions](https://github.com/kata-containers/documentation/blob/master/install/README.md)
to install the latest version of Kata.
The following configuration in the Kata `configuration.toml` file can be used:
```
machine_type = "pc"
hotplug_vfio_on_root_bus = true
```
To use devices with large BARs (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
One of the following configurations in the Kata `configuration.toml` file can be used.
Hotplug for PCI devices using `shpchp` (Linux's SHPC PCI Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices using `pciehp` (Linux's PCIe Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU support.
To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
necessary GPU support.
The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (required for large BAR devices)
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_HOTPLUG_PCI_SHPC=y
# Support for loading modules (required to load the Nvidia drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (required for large BAR devices)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```
# Disable the open source Nvidia driver nouveau
# It conflicts with the official Nvidia driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
> It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
Build the Kata Containers kernel with the previous config options,
using the instructions described in [Building Kata Containers kernel](https://github.com/kata-containers/packaging/tree/master/kernel).
For further details on building and installing guest kernels,
see [the developer guide](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports Nvidia GPU:
```
## Build guest kernel with https://github.com/kata-containers/packaging/tree/master/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 4.19.86 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
```
To build the Nvidia driver in a Kata container, `kernel-devel` is required.
One way to generate the `kernel-devel` rpm package is:
```
$ cd kata-linux-4.19.86-68
$ make rpm-pkg
Output RPMs:
~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
```
> **Note**:
> - `kernel-devel` should be installed in the Kata container before running the Nvidia driver installer.
> - Run `make deb-pkg` to build the deb package.
Before using the new guest kernel, update the `kernel` parameter in `configuration.toml`:
```
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## Nvidia GPU pass-through mode with Kata Containers
Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device on host:
```
$ sudo lspci -nn -D | grep -i nvidia
0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
```
> PCI address `0000:04:00.0` is assigned to the hardware GPU device.
> `10de:15f8` is the vendor ID and device ID pair of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```
$ BDF="0000:04:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
/sys/kernel/iommu_groups/45
```
The previous output shows that the GPU belongs to IOMMU group 45.
3. Check the IOMMU group number under `/dev/vfio`:
```
$ ls -l /dev/vfio
total 0
crw------- 1 root root 248, 0 Feb 28 09:57 45
crw------- 1 root root 248, 1 Feb 28 09:57 54
crw-rw-rw- 1 root root 10, 196 Feb 28 09:57 vfio
```
4. Start a Kata container with GPU device:
```
$ sudo docker run -it --runtime=kata-runtime --rm --device /dev/vfio/45 centos /bin/bash
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
```
$ lspci -nn -D | grep '10de:15f8'
0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
```
6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
```
$ lspci -s 01:01.0 -vv | grep Region
Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
```
> **Note**: If you see a message similar to the above, the BAR space of the Nvidia
> GPU has been successfully allocated.
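
The group node checked in step 3 only appears under `/dev/vfio` once the GPU is bound to the `vfio-pci` driver on the host. As a quick sanity check before starting the container, the bound driver can be read from sysfs. This is an illustrative sketch: `pci_driver_of` and the `SYSFS_ROOT` override are hypothetical names, and the override exists only to allow testing without a GPU.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of the kernel driver that a PCI device
# is currently bound to. Fails if the device is not bound to any driver.
# Usage: pci_driver_of BDF
pci_driver_of() {
    local bdf="$1" sysfs_root="${SYSFS_ROOT:-/sys}"
    local link="${sysfs_root}/bus/pci/devices/${bdf}/driver"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Example call, using the BDF from step 1 (expected to print vfio-pci on a
# host prepared for passthrough):
# pci_driver_of 0000:04:00.0
```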
## Nvidia vGPU mode with Kata Containers
Nvidia vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM.
> **Note**: This mode is not documented here because no suitable test environment was available.
## Install Nvidia Driver in Kata Containers
Download the official Nvidia driver from
[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
for example `NVIDIA-Linux-x86_64-418.87.01.run`.
Install the `kernel-devel` package (generated in the previous steps) for the guest kernel:
```
$ sudo rpm -ivh kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
```
Here is an example of extracting, compiling, and installing the Nvidia driver:
```
## Extract
$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
## Compile and install (It will take some time)
$ cd NVIDIA-Linux-x86_64-418.87.01
$ sudo ./nvidia-installer -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
Alternatively, run a single command:
```
$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
To view detailed logs of the installer:
```
$ tail -f /var/log/nvidia-installer.log
```
Load the Nvidia driver module manually:
```
# Optional: generate modules.dep and map files for the Nvidia driver
$ sudo depmod
# Load module
$ sudo modprobe nvidia-drm
# Check module
$ lsmod | grep nvidia
nvidia_drm 45056 0
nvidia_modeset 1093632 1 nvidia_drm
nvidia 18202624 1 nvidia_modeset
drm_kms_helper 159744 1 nvidia_drm
drm 364544 3 nvidia_drm,drm_kms_helper
i2c_core 65536 3 nvidia,drm_kms_helper,drm
ipmi_msghandler 49152 1 nvidia
```
Check the Nvidia device status with `nvidia-smi`:
```
$ nvidia-smi
Tue Mar 3 00:03:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:01:01.0 Off | 0 |
| N/A 27C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers-(Beta)