From dd4bd7f47109e40b893422ba40610074e44bb0e4 Mon Sep 17 00:00:00 2001
From: Zvonko Kaiser
Date: Fri, 18 Mar 2022 03:40:26 -0700
Subject: [PATCH] doc: Added initial doc update for NV GPUs

Fixed rpm vs deb references
Update to the shell portion

Fixes #3379

Signed-off-by: Zvonko Kaiser
---
 docs/use-cases/GPU-passthrough-and-Kata.md    |   2 +-
 .../NVIDIA-GPU-passthrough-and-Kata.md        | 372 ++++++++++++++++++
 .../Nvidia-GPU-passthrough-and-Kata.md        | 293 --------------
 3 files changed, 373 insertions(+), 294 deletions(-)
 create mode 100644 docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
 delete mode 100644 docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md

diff --git a/docs/use-cases/GPU-passthrough-and-Kata.md b/docs/use-cases/GPU-passthrough-and-Kata.md
index 493c947c95..4f00ec52ee 100644
--- a/docs/use-cases/GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/GPU-passthrough-and-Kata.md
@@ -3,4 +3,4 @@ Kata Containers supports passing certain GPUs from the host into the container.
 Select the GPU vendor for detailed information:
 
 - [Intel](Intel-GPU-passthrough-and-Kata.md)
-- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)
+- [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)
diff --git a/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md b/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
new file mode 100644
index 0000000000..32943b8d32
--- /dev/null
+++ b/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
@@ -0,0 +1,372 @@
+# Using NVIDIA GPU device with Kata Containers
+
+An NVIDIA GPU device can be passed to a Kata Containers container using GPU
+passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
+(NVIDIA vGPU mode).
+
+In NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to
+one VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the
+GPU is accessed exclusively by the NVIDIA driver running in the VM to which it
+is assigned. The GPU is not shared among VMs.
+
+NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have
+simultaneous, direct access to a single physical GPU, using the same NVIDIA
+graphics drivers that are deployed on non-virtualized operating systems. By
+doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance,
+compute performance, and application compatibility, together with the
+cost-effectiveness and scalability brought about by sharing a GPU among multiple
+workloads. A vGPU can be either time-sliced or Multi-Instance GPU (MIG)-backed
+with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
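+
+For example, assuming the NVIDIA driver and `nvidia-smi` are already installed
+on the host, one quick way to see which GPUs are present and whether MIG is
+currently enabled on them is the sketch below (the MIG query reports `[N/A]` on
+GPUs or drivers without MIG support):
+
+```sh
+$ nvidia-smi -L
+# mig.mode.current shows "Enabled"/"Disabled", or [N/A] without MIG support
+$ nvidia-smi --query-gpu=name,mig.mode.current --format=csv
+```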
+
+| Technology | Description | Behavior | Detail |
+| --- | --- | --- | --- |
+| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
+| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
+| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
+
+## Hardware Requirements
+
+NVIDIA GPUs Recommended for Virtualization:
+
+- NVIDIA Tesla (T4, M10, P6, V100 or newer)
+- NVIDIA Quadro RTX 6000/8000
+
+## Host BIOS Requirements
+
+Some hardware requires a larger PCI BAR window, for example, NVIDIA Tesla P100
+or K40m:
+
+```sh
+$ lspci -s d0:00.0 -vv | grep Region
+        Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
+        Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
+        Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
+```
+
+For large BARs devices, MMIO mapping above the 4G address space should be
+`enabled` in the PCI configuration of the BIOS.
+
+Some hardware vendors use a different name for this setting in the BIOS, such as:
+
+- Above 4G Decoding
+- Memory Hole for PCI MMIO
+- Memory Mapped I/O above 4GB
+
+If you are using a GPU based on the Ampere architecture or later, SR-IOV
+additionally needs to be enabled for the vGPU use-case.
+
+The following steps outline the workflow for using an NVIDIA GPU with Kata.
+
+## Host Kernel Requirements
+
+The following configurations need to be enabled on your host kernel:
+
+- `CONFIG_VFIO`
+- `CONFIG_VFIO_IOMMU_TYPE1`
+- `CONFIG_VFIO_MDEV`
+- `CONFIG_VFIO_MDEV_DEVICE`
+- `CONFIG_VFIO_PCI`
+
+Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
+line.
+
+## Install and configure Kata Containers
+
+To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
+version 1.3.0 or above. Follow the [Kata Containers setup
+instructions](../install/README.md) to install the latest version of Kata.
+
+To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
+version 1.11.0 or above.
+
+Depending on the device, one of the following configurations in the Kata
+`configuration.toml` file, as shown below, is required:
+
+Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
+Hotplug driver):
+
+```sh
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = false
+```
+
+Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
+driver):
+
+```sh
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = true
+pcie_root_port = 1
+```
+
+## Build Kata Containers kernel with GPU support
+
+The default guest kernel installed with Kata Containers does not provide GPU
+support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
+with the necessary GPU support.
+
+The following kernel config options need to be enabled:
+
+```sh
+# Support PCI/PCIe device hotplug (Required for large BARs device)
+CONFIG_HOTPLUG_PCI_PCIE=y
+
+# Support for loading modules (Required for loading NVIDIA drivers)
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+
+# Enable the MMIO access method for PCIe devices (Required for large BARs device)
+CONFIG_PCI_MMCONFIG=y
+```
+
+The following kernel config options need to be disabled:
+
+```sh
+# Disable the open source NVIDIA driver nouveau
+# It conflicts with the official NVIDIA driver
+CONFIG_DRM_NOUVEAU=n
+```
+
+> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
+> It is worth checking that it is not enabled in your kernel configuration to
+> prevent any conflicts.
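+
+As a quick sanity check, you can confirm these settings in the generated guest
+kernel `.config` (a minimal sketch; it assumes your current directory is the
+guest kernel source tree prepared with `build-kernel.sh` in the next step):
+
+```sh
+# All four options should be printed with "=y"
+$ grep -E '^CONFIG_(HOTPLUG_PCI_PCIE|MODULES|MODULE_UNLOAD|PCI_MMCONFIG)=' .config
+# Prints the fallback message when nouveau is disabled
+$ grep '^CONFIG_DRM_NOUVEAU=' .config || echo "CONFIG_DRM_NOUVEAU is not set"
+```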
+
+Build the Kata Containers kernel with the previous config options, using the
+instructions described in [Building Kata Containers
+kernel](../../tools/packaging/kernel). For further details on building and
+installing guest kernels, see [the developer
+guide](../Developer-Guide.md#install-guest-kernel-images).
+
+There is an easy way to build a guest kernel that supports NVIDIA GPUs:
+
+```sh
+## Build guest kernel with ../../tools/packaging/kernel
+
+# Prepare (download guest kernel source, generate .config)
+$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup
+
+# Build guest kernel
+$ ./build-kernel.sh -v 5.15.23 -g nvidia build
+
+# Install guest kernel
+$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
+```
+
+To build the NVIDIA driver in a Kata container, the `linux-headers` packages are
+required. This is one way to generate the deb packages for `linux-headers`:
+
+> **Note**:
+> Run `make rpm-pkg` to build the rpm package.
+> Run `make deb-pkg` to build the deb package.
+
+```sh
+$ cd kata-linux-5.15.23-89
+$ make deb-pkg
+```
+
+Before using the new guest kernel, please update the `kernel` parameter in
+`configuration.toml`:
+
+```sh
+kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
+```
+
+## NVIDIA GPU pass-through mode with Kata Containers
+
+Use the following steps to pass an NVIDIA GPU device in pass-through mode with
+Kata:
+
+1. Find the Bus-Device-Function (BDF) of the GPU device on the host:
+
+   ```sh
+   $ sudo lspci -nn -D | grep -i nvidia
+   0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
+   ```
+
+   > PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
+   > `10de:20b9` is the device ID of the hardware GPU device.
+
+2. Find the IOMMU group for the GPU device:
+
+   ```sh
+   $ BDF="0000:d0:00.0"
+   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
+   /sys/kernel/iommu_groups/192
+   ```
+
+   The previous output shows that the GPU belongs to IOMMU group 192. The next
+   step is to bind the GPU to the VFIO-PCI driver:
+
+   ```sh
+   $ BDF="0000:d0:00.0"
+   $ DEV="/sys/bus/pci/devices/$BDF"
+   $ echo "vfio-pci" > $DEV/driver_override
+   $ echo $BDF > $DEV/driver/unbind
+   $ echo $BDF > /sys/bus/pci/drivers_probe
+   # To return the device to the standard driver, we simply clear the
+   # driver_override and reprobe the device, ex:
+   $ echo > $DEV/driver_override
+   $ echo $BDF > $DEV/driver/unbind
+   $ echo $BDF > /sys/bus/pci/drivers_probe
+   ```
+
+3. Check the IOMMU group number under `/dev/vfio`:
+
+   ```sh
+   $ ls -l /dev/vfio
+   total 0
+   crw------- 1 zvonkok zvonkok 243,   0 Mar 18 03:06 192
+   crw-rw-rw- 1 root    root     10, 196 Mar 18 02:27 vfio
+   ```
+
+4. Start a Kata container with the GPU device:
+
+   ```sh
+   # You may need to `modprobe vhost-vsock` if you get
+   # host system doesn't support vsock: stat /dev/vhost-vsock
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
+   ```
+
+5. Run `lspci` within the container to verify the GPU device is seen in the list
+   of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the
+   `lspci` output:
+
+   ```sh
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
+   ```
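+
+   If the device was passed through successfully, the output should contain a
+   line similar to the following (the guest PCI address is only illustrative;
+   `02:00.0` matches the example used in the next step):
+
+   ```sh
+   02:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
+   ```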
+
+6. Additionally, you can check the PCI BAR space of the NVIDIA GPU device in the
+   container:
+
+   ```sh
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
+   ```
+
+   > **Note**: If the command lists the BAR regions, the BAR space of the NVIDIA
+   > GPU has been successfully allocated.
+
+## NVIDIA vGPU mode with Kata Containers
+
+NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
+is required to enable all vGPU features within the guest VM.
+
+> **TODO**: Will follow up with instructions
+
+## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
+
+Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image)
+on how to create a rootfs base image for a distribution of your choice. This is
+going to be used as a base for an NVIDIA-enabled guest OS. Use the `EXTRA_PKGS`
+variable to install all the packages needed to compile the drivers. Also copy
+the kernel development packages from the previous `make deb-pkg` into
+`$ROOTFS_DIR`.
+
+```sh
+export EXTRA_PKGS="gcc make curl gnupg"
+```
+
+With `$ROOTFS_DIR` exported in the previous step, we can now install all the
+needed parts in the guest OS. In this case we have an Ubuntu-based rootfs.
+
+First of all, mount the special filesystems into the rootfs:
+
+```sh
+$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
+$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
+$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
+$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
+$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
+```
+
+Now we can enter the `chroot`:
+
+```sh
+$ sudo chroot ${ROOTFS_DIR}
+```
+
+Inside the rootfs we install the drivers and the toolkit to enable easy creation
+of GPU containers with Kata. This rootfs can also be used for any other
+container, not only for GPU workloads.
+
+As a prerequisite, install the copied kernel development packages:
+
+```sh
+$ sudo dpkg -i *.deb
+```
+
+Get the driver run file. Since we need to build the driver against a kernel that
+is not running on the host, we need the ability to specify the exact version the
+driver is built against. Use the kernel version from the NVIDIA guest kernel
+build (`5.15.23-nvidia-gpu`):
+
+```sh
+$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
+$ chmod +x NVIDIA-Linux-x86_64-510.54.run
+# Extract the source files so we can run the installer with arguments
+$ ./NVIDIA-Linux-x86_64-510.54.run -x
+$ cd NVIDIA-Linux-x86_64-510.54
+$ ./nvidia-installer -k 5.15.23-nvidia-gpu
+```
+
+With the drivers installed, we need to install the toolkit, which takes care of
+providing the right bits inside the container:
+
+```sh
+$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+$ apt update
+$ apt install nvidia-container-toolkit
+```
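+
+Once the toolkit is installed, you can leave the `chroot` and unmount the
+special filesystems again before building the rootfs image (a minimal sketch
+that simply mirrors the mounts above; do any remaining cleanup inside the
+rootfs first):
+
+```sh
+$ exit   # leave the chroot
+$ sudo umount ${ROOTFS_DIR}/dev/pts ${ROOTFS_DIR}/dev ${ROOTFS_DIR}/tmp \
+              ${ROOTFS_DIR}/proc ${ROOTFS_DIR}/sys
+```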
+
+Create the hook execution file for Kata and make sure it is executable:
+
+```sh
+# Content of $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
+
+#!/bin/bash -x
+
+/usr/bin/nvidia-container-toolkit -debug $@
+```
+
+As a last step one can do some cleanup of files or package caches. Build the
+rootfs and configure it for use with Kata according to the development guide.
+
+Enable the `guest_hook_path` in Kata's `configuration.toml`:
+
+```sh
+guest_hook_path = "/usr/share/oci/hooks"
+```
+
+Having built an NVIDIA-enabled rootfs and kernel, we can now run any GPU
+container without installing the drivers into the container. Check the NVIDIA
+device status with `nvidia-smi`:
+
+```sh
+$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
+Fri Mar 18 10:36:59 2022
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  NVIDIA A30X         Off  | 00000000:02:00.0 Off |                    0 |
+| N/A   38C    P0    67W / 230W |      0MiB / 24576MiB |      0%      Default |
+|                               |                      |             Disabled |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+```
+
+As a last step one can remove the additional packages and files that were added
+to the `$ROOTFS_DIR` to keep it as small as possible.
+
+## References
+
+- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
+- https://gitlab.com/nvidia/container-images/driver/-/tree/master
+- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers
diff --git a/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md b/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md
deleted file mode 100644
index 58b36c02d7..0000000000
--- a/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md
+++ /dev/null
@@ -1,293 +0,0 @@
-# Using Nvidia GPU device with Kata Containers
-
-An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
-(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode).
-
-Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
-bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
-exclusively by the Nvidia driver running in the VM to which it is assigned.
-The GPU is not shared among VMs.
-
-Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
-direct access to a single physical GPU, using the same Nvidia graphics drivers that are
-deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
-with unparalleled graphics performance, compute performance, and application compatibility,
-together with the cost-effectiveness and scalability brought about by sharing a GPU
-among multiple workloads.
-
-| Technology | Description | Behaviour | Detail |
-| --- | --- | --- | --- |
-| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
-| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
-
-## Hardware Requirements
-Nvidia GPUs Recommended for Virtualization:
-
-- Nvidia Tesla (T4, M10, P6, V100 or newer)
-- Nvidia Quadro RTX 6000/8000
-
-## Host BIOS Requirements
-
-Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m
-```
-$ lspci -s 04:00.0 -vv | grep Region
-        Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
-        Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
-        Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
-```
-
-For large BARs devices, MMIO mapping above 4G address space should be `enabled`
-in the PCI configuration of the BIOS.
-
-Some hardware vendors use different name in BIOS, such as:
-
-- Above 4G Decoding
-- Memory Hole for PCI MMIO
-- Memory Mapped I/O above 4GB
-
-The following steps outline the workflow for using an Nvidia GPU with Kata.
-
-## Host Kernel Requirements
-The following configurations need to be enabled on your host kernel:
-
-- `CONFIG_VFIO`
-- `CONFIG_VFIO_IOMMU_TYPE1`
-- `CONFIG_VFIO_MDEV`
-- `CONFIG_VFIO_MDEV_DEVICE`
-- `CONFIG_VFIO_PCI`
-
-Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
-
-## Install and configure Kata Containers
-To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
-Follow the [Kata Containers setup instructions](../install/README.md)
-to install the latest version of Kata.
-
-To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
-
-The following configuration in the Kata `configuration.toml` file as shown below can work:
-
-Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
-```
-machine_type = "q35"
-
-hotplug_vfio_on_root_bus = false
-```
-
-Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver):
-```
-machine_type = "q35"
-
-hotplug_vfio_on_root_bus = true
-pcie_root_port = 1
-```
-
-## Build Kata Containers kernel with GPU support
-The default guest kernel installed with Kata Containers does not provide GPU support.
-To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
-necessary GPU support.
-
-The following kernel config options need to be enabled:
-```
-# Support PCI/PCIe device hotplug (Required for large BARs device)
-CONFIG_HOTPLUG_PCI_PCIE=y
-
-# Support for loading modules (Required for load Nvidia drivers)
-CONFIG_MODULES=y
-CONFIG_MODULE_UNLOAD=y
-
-# Enable the MMIO access method for PCIe devices (Required for large BARs device)
-CONFIG_PCI_MMCONFIG=y
-```
-
-The following kernel config options need to be disabled:
-```
-# Disable Open Source Nvidia driver nouveau
-# It conflicts with Nvidia official driver
-CONFIG_DRM_NOUVEAU=n
-```
-> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
-It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
-
-
-Build the Kata Containers kernel with the previous config options,
-using the instructions described in [Building Kata Containers kernel](../../tools/packaging/kernel).
-For further details on building and installing guest kernels,
-see [the developer guide](../Developer-Guide.md#install-guest-kernel-images).
-
-There is an easy way to build a guest kernel that supports Nvidia GPU:
-```
-## Build guest kernel with ../../tools/packaging/kernel
-
-# Prepare (download guest kernel source, generate .config)
-$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
-
-# Build guest kernel
-$ ./build-kernel.sh -v 4.19.86 -g nvidia build
-
-# Install guest kernel
-$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
-/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
-/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
-```
-
-To build Nvidia Driver in Kata container, `kernel-devel` is required.
-This is a way to generate rpm packages for `kernel-devel`:
-```
-$ cd kata-linux-4.19.86-68
-$ make rpm-pkg
-Output RPMs:
-~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
-```
-> **Note**:
-> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer.
-> - Run `make deb-pkg` to build the deb package.
-
-Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`.
-```
-kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
-```
-
-## Nvidia GPU pass-through mode with Kata Containers
-Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
-
-1. Find the Bus-Device-Function (BDF) for GPU device on host:
-   ```
-   $ sudo lspci -nn -D | grep -i nvidia
-   0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
-   0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
-   ```
-   > PCI address `0000:04:00.0` is assigned to the hardware GPU device.
-   > `10de:15f8` is the device ID of the hardware GPU device.
-
-2. Find the IOMMU group for the GPU device:
-   ```
-   $ BDF="0000:04:00.0"
-   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
-   /sys/kernel/iommu_groups/45
-   ```
-   The previous output shows that the GPU belongs to IOMMU group 45.
-
-3. Check the IOMMU group number under `/dev/vfio`:
-   ```
-   $ ls -l /dev/vfio
-   total 0
-   crw------- 1 root root 248,   0 Feb 28 09:57 45
-   crw------- 1 root root 248,   1 Feb 28 09:57 54
-   crw-rw-rw- 1 root root  10, 196 Feb 28 09:57 vfio
-   ```
-
-4. Start a Kata container with GPU device:
-   ```
-   $ sudo docker run -it --runtime=kata-runtime --cap-add=ALL --device /dev/vfio/45 centos /bin/bash
-   ```
-
-5. Run `lspci` within the container to verify the GPU device is seen in the list
-   of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
-   ```
-   $ lspci -nn -D | grep '10de:15f8'
-   0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
-   ```
-
-6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
-   ```
-   $ lspci -s 01:01.0 -vv | grep Region
-   Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
-   Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
-   Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
-   ```
-   > **Note**: If you see a message similar to the above, the BAR space of the Nvidia
-   > GPU has been successfully allocated.
-
-## Nvidia vGPU mode with Kata Containers
-
-Nvidia vGPU is a licensed product on all supported GPU boards. A software license
-is required to enable all vGPU features within the guest VM.
-
-> **Note**: There is no suitable test environment, so it is not written here.
-
-
-## Install Nvidia Driver in Kata Containers
-Download the official Nvidia driver from
-[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
-for example `NVIDIA-Linux-x86_64-418.87.01.run`.
-
-Install the `kernel-devel`(generated in the previous steps) for guest kernel:
-```
-$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm
-```
-
-Here is an example to extract, compile and install Nvidia driver:
-```
-## Extract
-$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
-
-## Compile and install (It will take some time)
-$ cd NVIDIA-Linux-x86_64-418.87.01
-$ sudo ./nvidia-installer -a -q --ui=none \
-    --no-cc-version-check \
-    --no-opengl-files --no-install-libglvnd \
-    --kernel-source-path=/usr/src/kernels/`uname -r`
-```
-
-Or just run one command line:
-```
-$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
-    --no-cc-version-check \
-    --no-opengl-files --no-install-libglvnd \
-    --kernel-source-path=/usr/src/kernels/`uname -r`
-```
-
-To view detailed logs of the installer:
-```
-$ tail -f /var/log/nvidia-installer.log
-```
-
-Load Nvidia driver module manually
-```
-# Optional(generate modules.dep and map files for Nvidia driver)
-$ sudo depmod
-
-# Load module
-$ sudo modprobe nvidia-drm
-
-# Check module
-$ lsmod | grep nvidia
-nvidia_drm             45056  0
-nvidia_modeset       1093632  1 nvidia_drm
-nvidia              18202624  1 nvidia_modeset
-drm_kms_helper        159744  1 nvidia_drm
-drm                   364544  3 nvidia_drm,drm_kms_helper
-i2c_core               65536  3 nvidia,drm_kms_helper,drm
-ipmi_msghandler        49152  1 nvidia
-```
-
-
-Check Nvidia device status with `nvidia-smi`
-```
-$ nvidia-smi
-Tue Mar  3 00:03:49 2020
-+-----------------------------------------------------------------------------+
-| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
-|-------------------------------+----------------------+----------------------+
-| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
-| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
-|===============================+======================+======================|
-|   0  Tesla P100-PCIE...  Off  | 00000000:01:01.0 Off |                    0 |
-| N/A   27C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
-+-------------------------------+----------------------+----------------------+
-
-+-----------------------------------------------------------------------------+
-| Processes:                                                       GPU Memory |
-|  GPU       PID   Type   Process name                             Usage      |
-|=============================================================================|
-|  No running processes found                                                 |
-+-----------------------------------------------------------------------------+
-
-```
-
-## References
-
-- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
-- https://gitlab.com/nvidia/container-images/driver/-/tree/master
-- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers