# Using NVIDIA GPU device with Kata Containers

An NVIDIA GPU device can be passed to a Kata Containers container using GPU
passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
(NVIDIA `vGPU` mode).

In NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to one
VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
is accessed exclusively by the NVIDIA driver running in the VM to which it is
assigned. The GPU is not shared among VMs.

NVIDIA Virtual GPU (`vGPU`) enables multiple virtual machines (VMs) to have
simultaneous, direct access to a single physical GPU, using the same NVIDIA
graphics drivers that are deployed on non-virtualized operating systems. By
doing this, NVIDIA `vGPU` provides VMs with unparalleled graphics performance,
compute performance, and application compatibility, together with the
cost-effectiveness and scalability brought about by sharing a GPU among multiple
workloads. A `vGPU` can be either time-sliced or Multi-Instance GPU (MIG)-backed
with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).

| Technology | Description | Behavior | Detail |
| --- | --- | --- | --- |
| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |

## Hardware Requirements

NVIDIA GPUs Recommended for Virtualization:

- NVIDIA Tesla (T4, M10, P6, V100 or newer)
- NVIDIA Quadro RTX 6000/8000

## Host BIOS Requirements

Some hardware requires a larger PCI BARs window, for example, NVIDIA Tesla P100,
K40m:

```sh
$ lspci -s d0:00.0 -vv | grep Region
Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
```

For large BARs devices, MMIO mapping above the 4G address space should be `enabled`
in the PCI configuration of the BIOS.

Some hardware vendors use a different name in BIOS, such as:

- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB

If you are using a GPU based on the Ampere architecture or later, SR-IOV
additionally needs to be enabled in the BIOS for the `vGPU` use-case.

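You can also verify on the host that the GPU actually exposes the SR-IOV
capability. This is a minimal check, not part of the original workflow; adjust
the BDF to match your device:

```sh
# Look for the "Single Root I/O Virtualization (SR-IOV)" capability on the GPU
$ sudo lspci -s d0:00.0 -vv | grep -i "single root i/o virtualization"
```
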
The following steps outline the workflow for using an NVIDIA GPU with Kata.

## Host Kernel Requirements

The following configuration options need to be enabled on your host kernel:

- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`

Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
line.

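A quick way to confirm both requirements on a running host is sketched below;
it assumes the kernel config is available at `/boot/config-$(uname -r)`, which
is distribution dependent:

```sh
# Check the required VFIO options in the host kernel config
$ grep -E "CONFIG_VFIO(_IOMMU_TYPE1|_MDEV|_MDEV_DEVICE|_PCI)?=" /boot/config-$(uname -r)

# Confirm the IOMMU is enabled on the kernel command line
$ grep -o "intel_iommu=on" /proc/cmdline
```
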
## Install and configure Kata Containers

To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
version 1.3.0 or above. Follow the [Kata Containers setup
instructions](../install/README.md) to install the latest version of Kata.

To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
version 1.11.0 or above.

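You can check the installed runtime version before proceeding (a simple sanity
check; the binary name and path depend on how Kata was installed):

```sh
# Print the installed Kata Containers runtime version
$ kata-runtime --version
```
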
The following configuration in the Kata `configuration.toml` file can be used:

Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
Hotplug driver):

```sh
machine_type = "q35"

hotplug_vfio_on_root_bus = false
```

Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
driver):

```sh
machine_type = "q35"

hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```

## Build Kata Containers kernel with GPU support

The default guest kernel installed with Kata Containers does not provide GPU
support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
with the necessary GPU support.

The following kernel config options need to be enabled:

```sh
# Support PCI/PCIe device hotplug (required for large BARs devices)
CONFIG_HOTPLUG_PCI_PCIE=y

# Support for loading modules (required for loading the NVIDIA drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y

# Enable the MMIO access method for PCIe devices (required for large BARs devices)
CONFIG_PCI_MMCONFIG=y
```

The following kernel config options need to be disabled:

```sh
# Disable the open source NVIDIA driver nouveau
# It conflicts with the official NVIDIA driver
CONFIG_DRM_NOUVEAU=n
```

> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
> It is worth checking that it is not enabled in your kernel configuration to
> prevent any conflicts.

Build the Kata Containers kernel with the previous config options, using the
instructions described in [Building Kata Containers
kernel](../../tools/packaging/kernel). For further details on building and
installing guest kernels, see [the developer
guide](../Developer-Guide.md#install-guest-kernel-images).

There is an easy way to build a guest kernel that supports NVIDIA GPUs:

```sh
## Build guest kernel with ../../tools/packaging/kernel

# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup

# Build guest kernel
$ ./build-kernel.sh -v 5.15.23 -g nvidia build

# Install guest kernel
$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
```

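Before installing, you can double-check that the generated guest kernel config
contains the required options. This is a sketch; the name of the kernel source
directory depends on the kernel and config versions `build-kernel.sh` selected
(here the `kata-linux-5.15.23-89` directory used in the next step):

```sh
# Verify hotplug, module and MMCONFIG support, and that nouveau is disabled
$ grep -E "CONFIG_HOTPLUG_PCI_PCIE=|CONFIG_MODULES=|CONFIG_MODULE_UNLOAD=|CONFIG_PCI_MMCONFIG=|CONFIG_DRM_NOUVEAU" kata-linux-5.15.23-89/.config
```
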
To build the NVIDIA driver in a Kata container, the `linux-headers` packages are
required. This is a way to generate the deb packages for `linux-headers`:

> **Note**:
> Run `make rpm-pkg` to build the rpm package.
> Run `make deb-pkg` to build the deb package.

```sh
$ cd kata-linux-5.15.23-89
$ make deb-pkg
```

Before using the new guest kernel, please update the `kernel` parameter in
`configuration.toml`:

```sh
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```

## NVIDIA GPU pass-through mode with Kata Containers

Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:

1. Find the Bus-Device-Function (BDF) for the GPU device on the host:

   ```sh
   $ sudo lspci -nn -D | grep -i nvidia
   0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
   ```

   > PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
   > `10de:20b9` is the device ID of the hardware GPU device.

2. Find the IOMMU group for the GPU device:

   ```sh
   $ BDF="0000:d0:00.0"
   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
   ```

   The IOMMU group number is the last component of the reported path; in this
   example the GPU belongs to IOMMU group 192. The next step is to bind the GPU
   to the VFIO-PCI driver.

   ```sh
   $ BDF="0000:d0:00.0"
   $ DEV="/sys/bus/pci/devices/$BDF"
   $ echo "vfio-pci" > $DEV/driver_override
   $ echo $BDF > $DEV/driver/unbind
   $ echo $BDF > /sys/bus/pci/drivers_probe

   # To return the device to the standard driver, we simply clear the
   # driver_override and reprobe the device, e.g.:
   $ echo > $DEV/driver_override
   $ echo $BDF > $DEV/driver/unbind
   $ echo $BDF > /sys/bus/pci/drivers_probe
   ```

3. Check the IOMMU group number under `/dev/vfio`:

   ```sh
   $ ls -l /dev/vfio
   total 0
   crw------- 1 zvonkok zvonkok 243, 0 Mar 18 03:06 192
   crw-rw-rw- 1 root root 10, 196 Mar 18 02:27 vfio
   ```

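   If you prefer to derive the group number programmatically instead of reading
   it off the listing, a minimal sketch (assuming `$BDF` is still set from the
   previous step) is:

   ```sh
   # Resolve the IOMMU group number for the GPU and check its VFIO device node
   $ GROUP=$(basename $(readlink -e /sys/bus/pci/devices/$BDF/iommu_group))
   $ ls -l /dev/vfio/$GROUP
   ```
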
4. Start a Kata container with the GPU device:

   ```sh
   # You may need to `modprobe vhost-vsock` if you get
   # host system doesn't support vsock: stat /dev/vhost-vsock
   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
   ```

5. Run `lspci` within the container to verify the GPU device is seen in the list
   of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the `lspci` output.

   ```sh
   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
   ```

6. Additionally, you can check the PCI BARs space of the NVIDIA GPU device in the container:

   ```sh
   $ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
   ```

   > **Note**: If the `Region` lines show assigned memory ranges (similar to the
   > host output shown in the BIOS requirements section), the BAR space of the
   > NVIDIA GPU has been successfully allocated.

## NVIDIA vGPU mode with Kata Containers

NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM. The NVIDIA vGPU Manager
needs to be installed on the host to configure GPUs in vGPU mode. See [NVIDIA Virtual GPU Software Documentation v14.0 through 14.1](https://docs.nvidia.com/grid/14.0/) for more details.

### NVIDIA vGPU time-sliced

In time-sliced mode, the GPU is not partitioned: the workload uses the whole GPU
and shares access to the GPU engines, and processes are scheduled in series. The
best-effort scheduler is the default and can be exchanged for other scheduling
policies; see the documentation referenced above for how to do that.

Beware: if you had `MIG` enabled before, disable `MIG` on the GPU if you want
to use `time-sliced` `vGPU`.

```sh
$ sudo nvidia-smi -mig 0
```

Enable the virtual functions for the physical GPU in the `sysfs` file system.

```sh
$ sudo /usr/lib/nvidia/sriov-manage -e 0000:41:00.0
```

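To confirm that the virtual functions were enabled, you can read the standard
SR-IOV attribute of the physical function (a quick check, not part of the
original steps):

```sh
# Number of virtual functions currently enabled on the physical GPU
$ cat /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
```
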
Get the `BDF` of an available virtual function on the GPU, and choose one for the
following steps.

```sh
$ cd /sys/bus/pci/devices/0000:41:00.0/
$ ls -l | grep virtfn
```

#### List all available vGPU instances

The following shell snippet walks the `sysfs` hierarchy and prints only the vGPU
instances that are available, i.e. those that can still be created.

```sh
# 00.0 is usually the physical function (PF) of the device; the virtual
# functions have the function number in the BDF incremented by some value,
# so e.g. the very first VF is 0000:41:00.4

cd /sys/bus/pci/devices/0000:41:00.0/

for vf in $(ls -d virtfn*)
do
    BDF=$(basename $(readlink -f $vf))
    for md in $(ls -d $vf/mdev_supported_types/*)
    do
        AVAIL=$(cat $md/available_instances)
        NAME=$(cat $md/name)
        DIR=$(basename $md)

        if [ $AVAIL -gt 0 ]; then
            echo "| BDF | INSTANCES | NAME | DIR |"
            echo "+--------------+-----------+----------------+------------+"
            printf "| %12s |%10d |%15s | %10s |\n\n" "$BDF" "$AVAIL" "$NAME" "$DIR"
        fi

    done
done
```

If there are available instances, you will get output like the following (for
the first VF). Beware that the output is highly dependent on the GPU you have;
if there is no output, check again whether `MIG` is really disabled.

```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-4C | nvidia-692 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-8C | nvidia-693 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-10C | nvidia-694 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-16C | nvidia-695 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-20C | nvidia-696 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-40C | nvidia-697 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-80C | nvidia-698 |
```

Change to the `mdev_supported_types` directory for the virtual function on which
you want to create the `vGPU`. Taking the first output as an example:

```sh
$ cd virtfn0/mdev_supported_types/nvidia-692
$ UUIDGEN=$(uuidgen)
$ sudo bash -c "echo $UUIDGEN > create"
```

Confirm that the `vGPU` was created. You should see the `UUID` pointing to a
subdirectory of the `sysfs` space.

```sh
$ ls -l /sys/bus/mdev/devices/
```

Get the `IOMMU` group number and verify there is a `VFIO` device created to use
with Kata.

```sh
$ ls -l /sys/bus/mdev/devices/*/
$ ls -l /dev/vfio
```

Use the `VFIO` device created in the same way as in the pass-through use-case.
Beware that the guest needs the NVIDIA guest drivers, so you need to build a new
guest `OS` image (see the section on installing the NVIDIA driver and toolkit in
the guest OS below).

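If you need to destroy the `vGPU` again later (for example, to create a
different type), the standard `mdev` `remove` attribute can be used. This is a
minimal sketch, assuming `$UUIDGEN` still holds the UUID used at creation time:

```sh
# Remove the mediated device backing the vGPU
$ sudo bash -c "echo 1 > /sys/bus/mdev/devices/$UUIDGEN/remove"
```
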
### NVIDIA vGPU MIG-backed

We are not going into detail about what `MIG` is; briefly, it is a technology to
partition the hardware into independent GPU instances with guaranteed quality of
service. For more details see the [NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).

First enable `MIG` mode for the GPU. Depending on the platform you are running,
a reboot may be necessary; some platforms support a GPU reset instead.

```sh
$ sudo nvidia-smi -mig 1
```

If the platform supports a GPU reset you can run the following command;
otherwise you will get a warning telling you to reboot the server.

```sh
$ sudo nvidia-smi --gpu-reset
```

By default the driver provides a number of profiles that users can opt in to
when configuring the MIG feature.

```sh
$ sudo nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles: |
| GPU Name ID Instances Memory P2P SM DEC ENC |
| Free/Total GiB CE JPEG OFA |
|=============================================================================|
| 0 MIG 1g.10gb 19 7/7 9.50 No 14 0 0 |
| 1 0 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 1g.10gb+me 20 1/1 9.50 No 14 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 0 MIG 2g.20gb 14 3/3 19.50 No 28 1 0 |
| 2 0 0 |
+-----------------------------------------------------------------------------+
...
```

Create the GPU instances that correspond to the `vGPU` types of the `MIG-backed`
`vGPUs` that you will create; see [NVIDIA A100 PCIe 80GB Virtual GPU Types](https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-a100-pcie-80gb).

```sh
# MIG 1g.10gb --> vGPU A100D-1-10C
$ sudo nvidia-smi mig -cgi 19
```

List the GPU instances and get the GPU instance id to create the compute
instance.

```sh
$ sudo nvidia-smi mig -lgi         # list the created GPU instances
$ sudo nvidia-smi mig -cci -gi 9   # each GPU instance can have several compute
                                   # instances. Instance -> Workload
```

Verify that the compute instances were created within the GPU instance:

```sh
$ nvidia-smi
... snip ...
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 9 0 0 | 0MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 4095MiB | | |
+------------------+----------------------+-----------+-----------------------+
... snip ...
```

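You can also list the compute instances directly as a complementary check (an
optional step, not required for the rest of the workflow):

```sh
# List the compute instances created inside the GPU instances
$ sudo nvidia-smi mig -lci
```
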
We can use the [snippet](#list-all-available-vgpu-instances) from before to list
the available `vGPU` instances, this time `MIG-backed`:

```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 |GRID A100D-1-10C | nvidia-699 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.5 | 1 |GRID A100D-1-10C | nvidia-699 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:01.6 | 1 |GRID A100D-1-10C | nvidia-699 |
... snip ...
```

Repeat the steps after the [snippet](#list-all-available-vgpu-instances) listing
to create the corresponding `mdev` device, and use the guest `OS` created in the
previous section with `time-sliced` `vGPUs`.

## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS

Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
rootfs base image for a distribution of your choice. This is going to be used as
a base for an NVIDIA-enabled guest OS. Use the `EXTRA_PKGS` variable to install
all the packages needed to compile the drivers. Also copy the kernel development
packages from the previous `make deb-pkg` into `$ROOTFS_DIR`, as shown below.

```sh
export EXTRA_PKGS="gcc make curl gnupg"
```

With `$ROOTFS_DIR` exported in the previous step, we can now install all the
needed parts in the guest OS. In this case, we have an Ubuntu-based rootfs.

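Copy the kernel development packages produced earlier by `make deb-pkg` into the
rootfs so they can be installed from inside the `chroot` later. This is a
sketch; the exact location and names of the generated `.deb` files depend on
where you ran the kernel build:

```sh
# Run this from the directory containing the generated packages
# (make deb-pkg places them in the parent of the kernel source tree)
$ sudo cp linux-*.deb ${ROOTFS_DIR}/
```
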
First of all, mount the special filesystems into the rootfs:

```sh
$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
```

Now we can enter the `chroot`:

```sh
$ sudo chroot ${ROOTFS_DIR}
```

Inside the rootfs we are going to install the drivers and the toolkit to enable
easy creation of GPU containers with Kata. This rootfs can also be used for any
other container, not only for GPU workloads.

As a prerequisite, install the copied kernel development packages:

```sh
$ sudo dpkg -i *.deb
```

Get the driver run file. Since we need to build the driver against a kernel that
is not running on the host, we need the ability to specify the exact version the
driver should be built against. Use the kernel version that was used for building
the NVIDIA guest kernel (`5.15.23-nvidia-gpu`).

```sh
$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
$ chmod +x NVIDIA-Linux-x86_64-510.54.run
# Extract the source files so we can run the installer with arguments
$ ./NVIDIA-Linux-x86_64-510.54.run -x
$ cd NVIDIA-Linux-x86_64-510.54
$ ./nvidia-installer -k 5.15.23-nvidia-gpu
```

With the drivers installed, we need to install the NVIDIA Container Toolkit,
which takes care of providing the right bits to the container.

```sh
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt update
$ apt install nvidia-container-toolkit
```

Create the hook execution file for Kata:

```
# Content of $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh

#!/bin/bash -x

/usr/bin/nvidia-container-toolkit -debug $@
```

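OCI hooks are executed as programs by the runtime, so the script needs to be
executable inside the rootfs (a small additional step not shown above; the path
matches the hook file just created):

```sh
# Run on the host after leaving the chroot, or drop the $ROOTFS_DIR prefix
# if still inside it
$ sudo chmod +x ${ROOTFS_DIR}/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
```
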
As the last step, you can clean up files or package caches. Build the
rootfs and configure it for use with Kata according to the development guide.

Enable the `guest_hook_path` in Kata's `configuration.toml`:

```sh
guest_hook_path = "/usr/share/oci/hooks"
```

Having built an NVIDIA-enabled rootfs and kernel, we can now run any GPU container
without installing the drivers into the container. Check the NVIDIA device status
with `nvidia-smi`:

```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
Fri Mar 18 10:36:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54 Driver Version: 510.54 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A30X Off | 00000000:02:00.0 Off | 0 |
| N/A 38C P0 67W / 230W | 0MiB / 24576MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```

As the last step, remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.

## References

- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers