From dd4bd7f47109e40b893422ba40610074e44bb0e4 Mon Sep 17 00:00:00 2001
From: Zvonko Kaiser
Date: Fri, 18 Mar 2022 03:40:26 -0700
Subject: [PATCH] doc: Added initial doc update for NV GPUs

Fixed rpm vs deb references
Update to the shell portion

Fixes #3379

Signed-off-by: Zvonko Kaiser
---
 docs/use-cases/GPU-passthrough-and-Kata.md    |   2 +-
 .../NVIDIA-GPU-passthrough-and-Kata.md        | 372 ++++++++++++++++++
 .../Nvidia-GPU-passthrough-and-Kata.md        | 293 --------------
 3 files changed, 373 insertions(+), 294 deletions(-)
 create mode 100644 docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
 delete mode 100644 docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md

diff --git a/docs/use-cases/GPU-passthrough-and-Kata.md b/docs/use-cases/GPU-passthrough-and-Kata.md
index 493c947c95..4f00ec52ee 100644
--- a/docs/use-cases/GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/GPU-passthrough-and-Kata.md
@@ -3,4 +3,4 @@ Kata Containers supports passing certain GPUs from the host into the container.
 Select the GPU vendor for detailed information:
 
 - [Intel](Intel-GPU-passthrough-and-Kata.md)
-- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)
+- [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)
diff --git a/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md b/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
new file mode 100644
index 0000000000..32943b8d32
--- /dev/null
+++ b/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
@@ -0,0 +1,372 @@
+# Using NVIDIA GPU device with Kata Containers
+
+An NVIDIA GPU device can be passed to a Kata Containers container using GPU
+passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
+(NVIDIA vGPU mode).
+
+In NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to
+one VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the
+GPU is accessed exclusively by the NVIDIA driver running in the VM to which it
+is assigned. The GPU is not shared among VMs.
+
+NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have
+simultaneous, direct access to a single physical GPU, using the same NVIDIA
+graphics drivers that are deployed on non-virtualized operating systems. By
+doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance,
+compute performance, and application compatibility, together with the
+cost-effectiveness and scalability brought about by sharing a GPU among multiple
+workloads. A vGPU can be either time-sliced or Multi-Instance GPU (MIG)-backed
+with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
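+
+For example, assuming the NVIDIA driver and `nvidia-smi` are already installed
+on the host, one quick way to see which GPUs are present and whether MIG is
+currently enabled on them is the sketch below (the MIG query reports `[N/A]` on
+GPUs or drivers without MIG support):
+
+```sh
+$ nvidia-smi -L
+# mig.mode.current shows "Enabled"/"Disabled", or [N/A] without MIG support
+$ nvidia-smi --query-gpu=name,mig.mode.current --format=csv
+```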
+
+| Technology | Description | Behavior | Detail |
+| --- | --- | --- | --- |
+| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
+| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
+| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
+
+## Hardware Requirements
+
+NVIDIA GPUs Recommended for Virtualization:
+
+- NVIDIA Tesla (T4, M10, P6, V100 or newer)
+- NVIDIA Quadro RTX 6000/8000
+
+## Host BIOS Requirements
+
+Some hardware requires a larger PCI BAR window, for example, NVIDIA Tesla P100
+or K40m:
+
+```sh
+$ lspci -s d0:00.0 -vv | grep Region
+        Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
+        Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
+        Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
+```
+
+For large BARs devices, MMIO mapping above the 4G address space should be
+`enabled` in the PCI configuration of the BIOS.
+
+Some hardware vendors use a different name for this setting in the BIOS, such as:
+
+- Above 4G Decoding
+- Memory Hole for PCI MMIO
+- Memory Mapped I/O above 4GB
+
+If you are using a GPU based on the Ampere architecture or later, SR-IOV
+additionally needs to be enabled for the vGPU use-case.
+
+The following steps outline the workflow for using an NVIDIA GPU with Kata.
+
+## Host Kernel Requirements
+
+The following configurations need to be enabled on your host kernel:
+
+- `CONFIG_VFIO`
+- `CONFIG_VFIO_IOMMU_TYPE1`
+- `CONFIG_VFIO_MDEV`
+- `CONFIG_VFIO_MDEV_DEVICE`
+- `CONFIG_VFIO_PCI`
+
+Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
+line.
+
+## Install and configure Kata Containers
+
+To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
+version 1.3.0 or above. Follow the [Kata Containers setup
+instructions](../install/README.md) to install the latest version of Kata.
+
+To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
+version 1.11.0 or above.
+
+Depending on the device, one of the following configurations in the Kata
+`configuration.toml` file, as shown below, is required:
+
+Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
+Hotplug driver):
+
+```sh
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = false
+```
+
+Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
+driver):
+
+```sh
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = true
+pcie_root_port = 1
+```
+
+## Build Kata Containers kernel with GPU support
+
+The default guest kernel installed with Kata Containers does not provide GPU
+support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
+with the necessary GPU support.
+
+The following kernel config options need to be enabled:
+
+```sh
+# Support PCI/PCIe device hotplug (Required for large BARs device)
+CONFIG_HOTPLUG_PCI_PCIE=y
+
+# Support for loading modules (Required for loading NVIDIA drivers)
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+
+# Enable the MMIO access method for PCIe devices (Required for large BARs device)
+CONFIG_PCI_MMCONFIG=y
+```
+
+The following kernel config options need to be disabled:
+
+```sh
+# Disable the open source NVIDIA driver nouveau
+# It conflicts with the official NVIDIA driver
+CONFIG_DRM_NOUVEAU=n
+```
+
+> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
+> It is worth checking that it is not enabled in your kernel configuration to
+> prevent any conflicts.
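+
+As a quick sanity check, you can confirm these settings in the generated guest
+kernel `.config` (a minimal sketch; it assumes your current directory is the
+guest kernel source tree prepared with `build-kernel.sh` in the next step):
+
+```sh
+# All four options should be printed with "=y"
+$ grep -E '^CONFIG_(HOTPLUG_PCI_PCIE|MODULES|MODULE_UNLOAD|PCI_MMCONFIG)=' .config
+# Prints the fallback message when nouveau is disabled
+$ grep '^CONFIG_DRM_NOUVEAU=' .config || echo "CONFIG_DRM_NOUVEAU is not set"
+```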
+
+Build the Kata Containers kernel with the previous config options, using the
+instructions described in [Building Kata Containers
+kernel](../../tools/packaging/kernel). For further details on building and
+installing guest kernels, see [the developer
+guide](../Developer-Guide.md#install-guest-kernel-images).
+
+There is an easy way to build a guest kernel that supports NVIDIA GPUs:
+
+```sh
+## Build guest kernel with ../../tools/packaging/kernel
+
+# Prepare (download guest kernel source, generate .config)
+$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup
+
+# Build guest kernel
+$ ./build-kernel.sh -v 5.15.23 -g nvidia build
+
+# Install guest kernel
+$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
+```
+
+To build the NVIDIA driver in a Kata container, the `linux-headers` packages are
+required. This is one way to generate the deb packages for `linux-headers`:
+
+> **Note**:
+> Run `make rpm-pkg` to build the rpm package.
+> Run `make deb-pkg` to build the deb package.
+
+```sh
+$ cd kata-linux-5.15.23-89
+$ make deb-pkg
+```
+
+Before using the new guest kernel, please update the `kernel` parameter in
+`configuration.toml`:
+
+```sh
+kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
+```
+
+## NVIDIA GPU pass-through mode with Kata Containers
+
+Use the following steps to pass an NVIDIA GPU device in pass-through mode with
+Kata:
+
+1. Find the Bus-Device-Function (BDF) of the GPU device on the host:
+
+   ```sh
+   $ sudo lspci -nn -D | grep -i nvidia
+   0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
+   ```
+
+   > PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
+   > `10de:20b9` is the device ID of the hardware GPU device.
+
+2. Find the IOMMU group for the GPU device:
+
+   ```sh
+   $ BDF="0000:d0:00.0"
+   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
+   /sys/kernel/iommu_groups/192
+   ```
+
+   The previous output shows that the GPU belongs to IOMMU group 192. The next
+   step is to bind the GPU to the VFIO-PCI driver:
+
+   ```sh
+   $ BDF="0000:d0:00.0"
+   $ DEV="/sys/bus/pci/devices/$BDF"
+   $ echo "vfio-pci" > $DEV/driver_override
+   $ echo $BDF > $DEV/driver/unbind
+   $ echo $BDF > /sys/bus/pci/drivers_probe
+   # To return the device to the standard driver, we simply clear the
+   # driver_override and reprobe the device, ex:
+   $ echo > $DEV/driver_override
+   $ echo $BDF > $DEV/driver/unbind
+   $ echo $BDF > /sys/bus/pci/drivers_probe
+   ```
+
+3. Check the IOMMU group number under `/dev/vfio`:
+
+   ```sh
+   $ ls -l /dev/vfio
+   total 0
+   crw------- 1 zvonkok zvonkok 243,   0 Mar 18 03:06 192
+   crw-rw-rw- 1 root    root     10, 196 Mar 18 02:27 vfio
+   ```
+
+4. Start a Kata container with the GPU device:
+
+   ```sh
+   # You may need to `modprobe vhost-vsock` if you get
+   # host system doesn't support vsock: stat /dev/vhost-vsock
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
+   ```
+
+5. Run `lspci` within the container to verify the GPU device is seen in the list
+   of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the
+   `lspci` output:
+
+   ```sh
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
+   ```
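+
+   If the device was passed through successfully, the output should contain a
+   line similar to the following (the guest PCI address is only illustrative;
+   `02:00.0` matches the example used in the next step):
+
+   ```sh
+   02:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
+   ```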
+
+6. Additionally, you can check the PCI BAR space of the NVIDIA GPU device in the
+   container:
+
+   ```sh
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
+   ```
+
+   > **Note**: If the command lists the BAR regions, the BAR space of the NVIDIA
+   > GPU has been successfully allocated.
+
+## NVIDIA vGPU mode with Kata Containers
+
+NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
+is required to enable all vGPU features within the guest VM.
+
+> **TODO**: Will follow up with instructions
+
+## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
+
+Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image)
+on how to create a rootfs base image for a distribution of your choice. This is
+going to be used as a base for an NVIDIA-enabled guest OS. Use the `EXTRA_PKGS`
+variable to install all the packages needed to compile the drivers. Also copy
+the kernel development packages from the previous `make deb-pkg` into
+`$ROOTFS_DIR`.
+
+```sh
+export EXTRA_PKGS="gcc make curl gnupg"
+```
+
+With `$ROOTFS_DIR` exported in the previous step, we can now install all the
+needed parts in the guest OS. In this case we have an Ubuntu-based rootfs.
+
+First of all, mount the special filesystems into the rootfs:
+
+```sh
+$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
+$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
+$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
+$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
+$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
+```
+
+Now we can enter the `chroot`:
+
+```sh
+$ sudo chroot ${ROOTFS_DIR}
+```
+
+Inside the rootfs we install the drivers and the toolkit to enable easy creation
+of GPU containers with Kata. This rootfs can also be used for any other
+container, not only for GPU workloads.
+
+As a prerequisite, install the copied kernel development packages:
+
+```sh
+$ sudo dpkg -i *.deb
+```
+
+Get the driver run file. Since we need to build the driver against a kernel that
+is not running on the host, we need the ability to specify the exact version the
+driver is built against. Use the kernel version from the NVIDIA guest kernel
+build (`5.15.23-nvidia-gpu`):
+
+```sh
+$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
+$ chmod +x NVIDIA-Linux-x86_64-510.54.run
+# Extract the source files so we can run the installer with arguments
+$ ./NVIDIA-Linux-x86_64-510.54.run -x
+$ cd NVIDIA-Linux-x86_64-510.54
+$ ./nvidia-installer -k 5.15.23-nvidia-gpu
+```
+
+With the drivers installed, we need to install the toolkit, which takes care of
+providing the right bits inside the container:
+
+```sh
+$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+$ apt update
+$ apt install nvidia-container-toolkit
+```
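+
+Once the toolkit is installed, you can leave the `chroot` and unmount the
+special filesystems again before building the rootfs image (a minimal sketch
+that simply mirrors the mounts above; do any remaining cleanup inside the
+rootfs first):
+
+```sh
+$ exit   # leave the chroot
+$ sudo umount ${ROOTFS_DIR}/dev/pts ${ROOTFS_DIR}/dev ${ROOTFS_DIR}/tmp \
+              ${ROOTFS_DIR}/proc ${ROOTFS_DIR}/sys
+```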
+
+Create the hook execution file for Kata and make sure it is executable:
+
+```sh
+# Content of $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
+
+#!/bin/bash -x
+
+/usr/bin/nvidia-container-toolkit -debug $@
+```
+
+As a last step one can do some cleanup of files or package caches. Build the
+rootfs and configure it for use with Kata according to the development guide.
+
+Enable the `guest_hook_path` in Kata's `configuration.toml`:
+
+```sh
+guest_hook_path = "/usr/share/oci/hooks"
+```
+
+Having built an NVIDIA-enabled rootfs and kernel, we can now run any GPU
+container without installing the drivers into the container. Check the NVIDIA
+device status with `nvidia-smi`:
+
+```sh
+$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
+Fri Mar 18 10:36:59 2022
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  NVIDIA A30X         Off  | 00000000:02:00.0 Off |                    0 |
+| N/A   38C    P0    67W / 230W |      0MiB / 24576MiB |      0%      Default |
+|                               |                      |             Disabled |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+```
+
+As a last step one can remove the additional packages and files that were added
+to the `$ROOTFS_DIR` to keep it as small as possible.
+
+## References
+
+- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
+- https://gitlab.com/nvidia/container-images/driver/-/tree/master
+- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers
diff --git a/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md b/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md
deleted file mode 100644
index 58b36c02d7..0000000000
--- a/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md
+++ /dev/null
@@ -1,293 +0,0 @@
-# Using Nvidia GPU device with Kata Containers
-
-An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
-(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode).
-
-Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
-bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
-exclusively by the Nvidia driver running in the VM to which it is assigned.
-The GPU is not shared among VMs.
-
-Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
-direct access to a single physical GPU, using the same Nvidia graphics drivers that are
-deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
-with unparalleled graphics performance, compute performance, and application compatibility,
-together with the cost-effectiveness and scalability brought about by sharing a GPU
-among multiple workloads.
-
-| Technology | Description | Behaviour | Detail |
-| --- | --- | --- | --- |
-| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
-| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
-
-## Hardware Requirements
-Nvidia GPUs Recommended for Virtualization:
-
-- Nvidia Tesla (T4, M10, P6, V100 or newer)
-- Nvidia Quadro RTX 6000/8000
-
-## Host BIOS Requirements
-
-Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m
-```
-$ lspci -s 04:00.0 -vv | grep Region
-        Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
-        Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
-        Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
-```
-
-For large BARs devices, MMIO mapping above 4G address space should be `enabled`
-in the PCI configuration of the BIOS.
-
-Some hardware vendors use different name in BIOS, such as:
-
-- Above 4G Decoding
-- Memory Hole for PCI MMIO
-- Memory Mapped I/O above 4GB
-
-The following steps outline the workflow for using an Nvidia GPU with Kata.
-
-## Host Kernel Requirements
-The following configurations need to be enabled on your host kernel:
-
-- `CONFIG_VFIO`
-- `CONFIG_VFIO_IOMMU_TYPE1`
-- `CONFIG_VFIO_MDEV`
-- `CONFIG_VFIO_MDEV_DEVICE`
-- `CONFIG_VFIO_PCI`
-
-Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
-
-## Install and configure Kata Containers
-To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
-Follow the [Kata Containers setup instructions](../install/README.md)
-to install the latest version of Kata.
-
-To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
-
-The following configuration in the Kata `configuration.toml` file as shown below can work:
-
-Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
-```
-machine_type = "q35"
-
-hotplug_vfio_on_root_bus = false
-```
-
-Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver):
-```
-machine_type = "q35"
-
-hotplug_vfio_on_root_bus = true
-pcie_root_port = 1
-```
-
-## Build Kata Containers kernel with GPU support
-The default guest kernel installed with Kata Containers does not provide GPU support.
-To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
-necessary GPU support.
-
-The following kernel config options need to be enabled:
-```
-# Support PCI/PCIe device hotplug (Required for large BARs device)
-CONFIG_HOTPLUG_PCI_PCIE=y
-
-# Support for loading modules (Required for load Nvidia drivers)
-CONFIG_MODULES=y
-CONFIG_MODULE_UNLOAD=y
-
-# Enable the MMIO access method for PCIe devices (Required for large BARs device)
-CONFIG_PCI_MMCONFIG=y
-```
-
-The following kernel config options need to be disabled:
-```
-# Disable Open Source Nvidia driver nouveau
-# It conflicts with Nvidia official driver
-CONFIG_DRM_NOUVEAU=n
-```
-> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
-It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
-
-
-Build the Kata Containers kernel with the previous config options,
-using the instructions described in [Building Kata Containers kernel](../../tools/packaging/kernel).
-For further details on building and installing guest kernels,
-see [the developer guide](../Developer-Guide.md#install-guest-kernel-images).
-
-There is an easy way to build a guest kernel that supports Nvidia GPU:
-```
-## Build guest kernel with ../../tools/packaging/kernel
-
-# Prepare (download guest kernel source, generate .config)
-$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
-
-# Build guest kernel
-$ ./build-kernel.sh -v 4.19.86 -g nvidia build
-
-# Install guest kernel
-$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
-/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
-/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
-```
-
-To build Nvidia Driver in Kata container, `kernel-devel` is required.
-This is a way to generate rpm packages for `kernel-devel`:
-```
-$ cd kata-linux-4.19.86-68
-$ make rpm-pkg
-Output RPMs:
-~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
-```
-> **Note**:
-> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer.
-> - Run `make deb-pkg` to build the deb package.
-
-Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`.
-```
-kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
-```
-
-## Nvidia GPU pass-through mode with Kata Containers
-Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
-
-1. Find the Bus-Device-Function (BDF) for GPU device on host:
-   ```
-   $ sudo lspci -nn -D | grep -i nvidia
-   0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
-   0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
-   ```
-   > PCI address `0000:04:00.0` is assigned to the hardware GPU device.
-   > `10de:15f8` is the device ID of the hardware GPU device.
-
-2. Find the IOMMU group for the GPU device:
-   ```
-   $ BDF="0000:04:00.0"
-   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
-   /sys/kernel/iommu_groups/45
-   ```
-   The previous output shows that the GPU belongs to IOMMU group 45.
-
-3. Check the IOMMU group number under `/dev/vfio`:
-   ```
-   $ ls -l /dev/vfio
-   total 0
-   crw------- 1 root root 248,   0 Feb 28 09:57 45
-   crw------- 1 root root 248,   1 Feb 28 09:57 54
-   crw-rw-rw- 1 root root  10, 196 Feb 28 09:57 vfio
-   ```
-
-4. Start a Kata container with GPU device:
-   ```
-   $ sudo docker run -it --runtime=kata-runtime --cap-add=ALL --device /dev/vfio/45 centos /bin/bash
-   ```
-
-5. Run `lspci` within the container to verify the GPU device is seen in the list
-   of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
-   ```
-   $ lspci -nn -D | grep '10de:15f8'
-   0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
-   ```
-
-6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
-   ```
-   $ lspci -s 01:01.0 -vv | grep Region
-   Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
-   Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
-   Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
-   ```
-   > **Note**: If you see a message similar to the above, the BAR space of the Nvidia
-   > GPU has been successfully allocated.
-
-## Nvidia vGPU mode with Kata Containers
-
-Nvidia vGPU is a licensed product on all supported GPU boards. A software license
-is required to enable all vGPU features within the guest VM.
-
-> **Note**: There is no suitable test environment, so it is not written here.
-
-
-## Install Nvidia Driver in Kata Containers
-Download the official Nvidia driver from
-[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
-for example `NVIDIA-Linux-x86_64-418.87.01.run`.
-
-Install the `kernel-devel`(generated in the previous steps) for guest kernel:
-```
-$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm
-```
-
-Here is an example to extract, compile and install Nvidia driver:
-```
-## Extract
-$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
-
-## Compile and install (It will take some time)
-$ cd NVIDIA-Linux-x86_64-418.87.01
-$ sudo ./nvidia-installer -a -q --ui=none \
-    --no-cc-version-check \
-    --no-opengl-files --no-install-libglvnd \
-    --kernel-source-path=/usr/src/kernels/`uname -r`
-```
-
-Or just run one command line:
-```
-$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
-    --no-cc-version-check \
-    --no-opengl-files --no-install-libglvnd \
-    --kernel-source-path=/usr/src/kernels/`uname -r`
-```
-
-To view detailed logs of the installer:
-```
-$ tail -f /var/log/nvidia-installer.log
-```
-
-Load Nvidia driver module manually
-```
-# Optional(generate modules.dep and map files for Nvidia driver)
-$ sudo depmod
-
-# Load module
-$ sudo modprobe nvidia-drm
-
-# Check module
-$ lsmod | grep nvidia
-nvidia_drm             45056  0
-nvidia_modeset       1093632  1 nvidia_drm
-nvidia              18202624  1 nvidia_modeset
-drm_kms_helper        159744  1 nvidia_drm
-drm                   364544  3 nvidia_drm,drm_kms_helper
-i2c_core               65536  3 nvidia,drm_kms_helper,drm
-ipmi_msghandler        49152  1 nvidia
-```
-
-
-Check Nvidia device status with `nvidia-smi`
-```
-$ nvidia-smi
-Tue Mar  3 00:03:49 2020
-+-----------------------------------------------------------------------------+
-| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
-|-------------------------------+----------------------+----------------------+
-| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
-| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
-|===============================+======================+======================|
-|   0  Tesla P100-PCIE...  Off  | 00000000:01:01.0 Off |                    0 |
-| N/A   27C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
-+-------------------------------+----------------------+----------------------+
-
-+-----------------------------------------------------------------------------+
-| Processes:                                                       GPU Memory |
-|  GPU       PID   Type   Process name                             Usage      |
-|=============================================================================|
-|  No running processes found                                                 |
-+-----------------------------------------------------------------------------+
-
-```
-
-## References
-
-- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
-- https://gitlab.com/nvidia/container-images/driver/-/tree/master
-- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers