docs: Update vGPU use-case

Now that #4213 is merged we need updated documentation for the vGPU
time-sliced and vGPU MIG-backed modes.

Fixes: #4343

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

parent d157f9b71e
commit 6d0ff901ab

@@ -2,20 +2,20 @@

An NVIDIA GPU device can be passed to a Kata Containers container using GPU
passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
(NVIDIA `vGPU` mode).

In NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to one
VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
is accessed exclusively by the NVIDIA driver running in the VM to which it is
assigned. The GPU is not shared among VMs.

NVIDIA Virtual GPU (`vGPU`) enables multiple virtual machines (VMs) to have
simultaneous, direct access to a single physical GPU, using the same NVIDIA
graphics drivers that are deployed on non-virtualized operating systems. By
doing this, NVIDIA `vGPU` provides VMs with unparalleled graphics performance,
compute performance, and application compatibility, together with the
cost-effectiveness and scalability brought about by sharing a GPU among multiple
workloads. A `vGPU` can be either time-sliced or Multi-Instance GPU (MIG)-backed
with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).

| Technology | Description | Behavior | Detail |

@@ -46,14 +46,14 @@ $ lspci -s d0:00.0 -vv | grep Region

For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.

Some hardware vendors use a different name in BIOS, such as:

- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB

If one is using a GPU based on the Ampere architecture or later, SR-IOV
additionally needs to be enabled for the `vGPU` use-case.
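
To check whether SR-IOV is available and enabled for the GPU, one can inspect
the standard SR-IOV attributes in `sysfs`. This is only a sketch; the BDF
`0000:41:00.0` is an example, substitute your GPU's BDF.

```sh
# Total number of VFs the device supports, and how many are currently enabled
$ cat /sys/bus/pci/devices/0000:41:00.0/sriov_totalvfs
$ cat /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
```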

The following steps outline the workflow for using an NVIDIA GPU with Kata.

@@ -154,7 +154,7 @@ $ ./build-kernel.sh -v 5.15.23 -g nvidia build
$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
```

To build the NVIDIA driver in a Kata container, `linux-headers` are required.
This is a way to generate deb packages for `linux-headers`:

> **Note**:

@@ -177,7 +177,7 @@ kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"

Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:

1. Find the Bus-Device-Function (BDF) for the GPU device on the host:

   ```sh
   $ sudo lspci -nn -D | grep -i nvidia

@@ -219,7 +219,7 @@ Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:
   crw-rw-rw- 1 root root 10, 196 Mar 18 02:27 vfio
   ```

4. Start a Kata container with the GPU device:

   ```sh
   # You may need to `modprobe vhost-vsock` if you get

@@ -246,9 +246,228 @@ Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:
## NVIDIA vGPU mode with Kata Containers

NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM. The NVIDIA vGPU manager
needs to be installed on the host to configure GPUs in vGPU mode. See [NVIDIA Virtual GPU Software Documentation v14.0 through 14.1](https://docs.nvidia.com/grid/14.0/) for more details.
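
As a rough sketch, on a KVM-based host the vGPU manager ships as a `.run`
package; the file name below is only an example, use the one from your vGPU
software release.

```sh
# Hypothetical file name, taken from a 510.x vGPU release for KVM
$ sudo sh ./NVIDIA-Linux-x86_64-510.47.03-vgpu-kvm.run
$ nvidia-smi    # verify the host driver loaded
```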

### NVIDIA vGPU time-sliced

In time-sliced mode, the GPU is not partitioned; the workload uses the
whole GPU and shares access to the GPU engines. Processes are scheduled in
series. The best-effort scheduler is the default and can be exchanged for
other scheduling policies; see the documentation above for how to do that.
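
As an illustration, NVIDIA's vGPU documentation describes selecting the
scheduling policy through the `RmPVMRL` registry key of the `nvidia` kernel
module; the values and the `modprobe.d` file name below are examples, check the
documentation linked above for your release.

```sh
# 0x00 = best effort (default), 0x01 = equal share, 0x11 = fixed share
$ echo 'options nvidia NVreg_RegistryDwords="RmPVMRL=0x01"' | \
    sudo tee /etc/modprobe.d/nvidia-vgpu.conf
# Reload the nvidia module (or reboot) for the change to take effect
```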

Beware: if you had `MIG` enabled before, disable `MIG` on the GPU if you want
to use `time-sliced` `vGPU`.

```sh
$ sudo nvidia-smi -mig 0
```

Enable the virtual functions for the physical GPU in the `sysfs` file system.

```sh
$ sudo /usr/lib/nvidia/sriov-manage -e 0000:41:00.0
```

Get the `BDF` of an available virtual function on the GPU, and choose one for the
following steps.

```sh
$ cd /sys/bus/pci/devices/0000:41:00.0/
$ ls -l | grep virtfn
```
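
The output should contain symlinks similar to the following (illustrative; the
VF BDFs depend on your system):

```sh
lrwxrwxrwx 1 root root 0 Mar 18 02:27 virtfn0 -> ../0000:41:00.4
lrwxrwxrwx 1 root root 0 Mar 18 02:27 virtfn1 -> ../0000:41:00.5
```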

#### List all available vGPU instances

The following shell snippet walks the `sysfs` and prints only the instances
that are available, i.e. that can be created.

```sh
# 00.0 is often the PF of the device; the VFs will have the function part of
# the BDF incremented by some value, so e.g. the very first VF is 0000:41:00.4

cd /sys/bus/pci/devices/0000:41:00.0/

for vf in $(ls -d virtfn*)
do
    BDF=$(basename $(readlink -f $vf))
    for md in $(ls -d $vf/mdev_supported_types/*)
    do
        AVAIL=$(cat $md/available_instances)
        NAME=$(cat $md/name)
        DIR=$(basename $md)

        if [ $AVAIL -gt 0 ]; then
            echo "| BDF | INSTANCES | NAME | DIR |"
            echo "+--------------+-----------+----------------+------------+"
            printf "| %12s |%10d |%15s | %10s |\n\n" "$BDF" "$AVAIL" "$NAME" "$DIR"
        fi
    done
done
```

If there are available instances you get something like this (for the first VF).
Beware that the output is highly dependent on the GPU you have; if there is no
output, check again whether `MIG` is really disabled.

```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 |  GRID A100D-4C | nvidia-692 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 |  GRID A100D-8C | nvidia-693 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 | GRID A100D-10C | nvidia-694 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 | GRID A100D-16C | nvidia-695 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 | GRID A100D-20C | nvidia-696 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 | GRID A100D-40C | nvidia-697 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 | GRID A100D-80C | nvidia-698 |
```

Change to the `mdev_supported_types` directory for the virtual function on which
you want to create the `vGPU`. Taking the first output as an example:

```sh
$ cd virtfn0/mdev_supported_types/nvidia-692
$ UUIDGEN=$(uuidgen)
$ sudo bash -c "echo $UUIDGEN > create"
```

Confirm that the `vGPU` was created. You should see the `UUID` pointing to a
subdirectory of the `sysfs` space.

```sh
$ ls -l /sys/bus/mdev/devices/
```
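
The listing should resemble the following; the `UUID` and the target path are
illustrative.

```sh
lrwxrwxrwx 1 root root 0 Mar 18 02:27 aa618089-8b16-4d01-a136-25a0f3c73123 -> ../../../devices/pci0000:40/0000:40:01.1/0000:41:00.4/aa618089-8b16-4d01-a136-25a0f3c73123
```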

Get the `IOMMU` group number and verify there is a `VFIO` device created to use
with Kata.

```sh
$ ls -l /sys/bus/mdev/devices/*/
$ ls -l /dev/vfio
```
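
To extract just the group number for a given `mdev` device, one can resolve its
`iommu_group` symlink; the `UUID` and the group number `71` below are examples.

```sh
$ GROUP=$(basename $(readlink -f /sys/bus/mdev/devices/aa618089-8b16-4d01-a136-25a0f3c73123/iommu_group))
$ echo $GROUP
71
$ ls -l /dev/vfio/$GROUP
```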

Use the `VFIO` device created in the same way as in the pass-through use-case.
Beware that the guest needs the NVIDIA guest drivers, so one would need to build
a new guest `OS` image.
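
For illustration, assuming Docker is configured with the Kata runtime and the
`VFIO` group from the previous step is `71`, starting a container with the
`vGPU` could look like:

```sh
$ sudo docker run -it --runtime=kata-runtime --rm \
    --device /dev/vfio/71 ubuntu /bin/bash
```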

### NVIDIA vGPU MIG-backed

We're not going into detail about what `MIG` is, but briefly, it is a technology to
partition the hardware into independent instances with guaranteed quality of
service. For more details see [NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).

First enable `MIG` mode for a GPU; depending on the platform you're running,
a reboot may be necessary. Some platforms support GPU reset.

```sh
$ sudo nvidia-smi -mig 1
```

If the platform supports a GPU reset one can run the following; otherwise you
will get a warning to reboot the server.

```sh
$ sudo nvidia-smi --gpu-reset
```

The driver per default provides a number of profiles that users can opt in to
when configuring the `MIG` feature.

```sh
$ sudo nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
...
```

Create the GPU instances that correspond to the `vGPU` types of the `MIG-backed`
`vGPUs` that you will create; see [NVIDIA A100 PCIe 80GB Virtual GPU Types](https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-a100-pcie-80gb).

```sh
# MIG 1g.10gb --> vGPU A100D-1-10C
$ sudo nvidia-smi mig -cgi 19
```

List the GPU instances and get the GPU instance ID to create the compute
instance.

```sh
$ sudo nvidia-smi mig -lgi          # list the created GPU instances
$ sudo nvidia-smi mig -cci -gi 9    # each GPU instance can have several compute
                                    # instances. Instance -> Workload
```

Verify that the compute instances were created within the GPU instance.

```sh
$ nvidia-smi
... snip ...
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    9   0   0  |      0MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  4095MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
... snip ...
```

We can use the [snippet](#list-all-available-vgpu-instances) from before to list
the available `vGPU` instances, this time `MIG-backed`.

```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 |         1 |GRID A100D-1-10C | nvidia-699 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.5 |         1 |GRID A100D-1-10C | nvidia-699 |

| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:01.6 |         1 |GRID A100D-1-10C | nvidia-699 |
... snip ...
```

Repeat the steps after the [snippet](#list-all-available-vgpu-instances) listing
to create the corresponding `mdev` device and use the guest `OS` created in the
previous section with `time-sliced` `vGPUs`.
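
As an illustration, with the first `MIG-backed` VF from the listing above the
`mdev` creation mirrors the `time-sliced` steps; the `virtfn0` and `nvidia-699`
paths are taken from the example output and may differ on your system.

```sh
$ cd /sys/bus/pci/devices/0000:41:00.0/virtfn0/mdev_supported_types/nvidia-699
$ sudo bash -c "echo $(uuidgen) > create"
```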

## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS

@@ -263,7 +482,7 @@ export EXTRA_PKGS="gcc make curl gnupg"
```

Having the `$ROOTFS_DIR` exported in the previous step, we can now install all the
needed parts in the guest OS. In this case, we have an Ubuntu based rootfs.

First of all, mount the special filesystems into the rootfs:

@@ -281,9 +500,9 @@ Now we can enter `chroot`
$ sudo chroot ${ROOTFS_DIR}
```

Inside the rootfs one is going to install the drivers and toolkit to enable the
easy creation of GPU containers with Kata. We can also use this rootfs for any
other container, not specifically only for GPUs.

As a prerequisite, install the copied kernel development packages:

@@ -304,6 +523,7 @@ $ ./NVIDIA-Linux-x86_64-510.54.run -x
$ cd NVIDIA-Linux-x86_64-510.54
$ ./nvidia-installer -k 5.15.23-nvidia-gpu
```

Having the drivers installed, we need to install the toolkit, which will take care
of providing the right bits into the container.

@@ -325,7 +545,7 @@ Create the hook execution file for Kata:
/usr/bin/nvidia-container-toolkit -debug $@
```

As the last step one can do some cleanup of files or package caches. Build the
rootfs and configure it for use with Kata according to the development guide.

Enable the `guest_hook_path` in Kata's `configuration.toml`

@@ -334,7 +554,7 @@ Enable the `guest_hook_path` in Kata's `configuration.toml`
guest_hook_path = "/usr/share/oci/hooks"
```

One has built an NVIDIA rootfs and kernel, and now we can run any GPU container
without installing the drivers into the container. Check the NVIDIA device status
with `nvidia-smi`:

@@ -362,7 +582,7 @@ Fri Mar 18 10:36:59 2022
+-----------------------------------------------------------------------------+
```

As the last step one can remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.

## References