mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-07-25 12:43:23 +00:00
e2e_node: use upstream gpu installer
The current GPU installer was built in 2017, from source that no longer
exists in Kubernetes ([adding commit][1]). The image was built on 2017-06-13.
Unfortunately, this installer no longer appears to work. When debugging
on the same node type as used by test-infra, it failed to build the
driver as the kernel sha was no longer available.
This led to needing to find a new way to install the GPU drivers. The
smallest logical change was switching to [cos-gpu-installer][2]. There is a
newer version of this available on [googlesource][3] that I have not yet
tested, as it's not clear what the state of the project is and I couldn't
find docs outside of the source itself.
We install to the same location as before to avoid needing extra
downstream changes. There are a couple of weird issues here, however,
such as needing to run the container twice to correctly update the
ld cache.
[1]: 1e77594958/cluster/gce/gci/nvidia-gpus/Dockerfile
[2]: https://github.com/GoogleCloudPlatform/cos-gpu-installer
[3]: https://cos.googlesource.com/cos/tools/+/refs/heads/master/src/cmd/cos_gpu_installer/
parent b9565beef0
commit 0cc8af82a1
@@ -2,7 +2,18 @@
 
 runcmd:
 - modprobe configs
-- docker run -v /dev:/dev -v /home/kubernetes/bin/nvidia:/rootfs/nvidia -v /etc/os-release:/rootfs/etc/os-release -v /proc/sysrq-trigger:/sysrq -e BASE_DIR=/rootfs/nvidia --privileged k8s.gcr.io/cos-nvidia-driver-install@sha256:cb55c7971c337fece62f2bfe858662522a01e43ac9984a2dd1dd5c71487d225c
+# Set up the installation target and make it executable
+- mkdir -p /home/kubernetes/bin/nvidia
+- mount --bind /home/kubernetes/bin/nvidia /home/kubernetes/bin/nvidia
+- mount -o remount,exec /home/kubernetes/bin/nvidia
+# Compile and install the nvidia driver (precompiled driver installation currently fails)
+- docker run --net=host --pid=host -v /dev:/dev -v /:/root -v /home/kubernetes/bin/nvidia:/usr/local/nvidia -e NVIDIA_INSTALL_DIR_HOST=/home/kubernetes/bin/nvidia -e NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia -e NVIDIA_DRIVER_VERSION=460.91.03 --privileged gcr.io/cos-cloud/cos-gpu-installer:latest
+# Run the installer again, as on the first try it doesn't detect the libnvidia-ml.so
+# on the second attempt we detect it and update the ld cache.
+- docker run --net=host --pid=host -v /dev:/dev -v /:/root -v /home/kubernetes/bin/nvidia:/usr/local/nvidia -e NVIDIA_INSTALL_DIR_HOST=/home/kubernetes/bin/nvidia -e NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia -e NVIDIA_DRIVER_VERSION=460.91.03 --privileged gcr.io/cos-cloud/cos-gpu-installer:latest
+# Remove build containers. They're very large.
+- docker rm -f $(docker ps -aq)
+# Standard installation proceeds
 - mount /tmp /tmp -o remount,exec,suid
 - usermod -a -G docker jenkins
 - mkdir -p /var/lib/kubelet
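The diff runs the installer twice unconditionally. As a hedged sketch of an alternative (not part of this commit), the second run could be made conditional: rerun the installer until libnvidia-ml.so shows up in the ld cache, with a cap on attempts. The function names `run_installer`, `lib_in_ld_cache`, and `install_with_retry` are hypothetical, and checking with `ldconfig -p` on the host is an assumption about where the installer's cache update becomes visible.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from the commit.

NVIDIA_DIR=/home/kubernetes/bin/nvidia

# Same docker invocation as in the diff, wrapped in a function.
run_installer() {
  docker run --net=host --pid=host -v /dev:/dev -v /:/root \
    -v "$NVIDIA_DIR":/usr/local/nvidia \
    -e NVIDIA_INSTALL_DIR_HOST="$NVIDIA_DIR" \
    -e NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia \
    -e NVIDIA_DRIVER_VERSION=460.91.03 \
    --privileged gcr.io/cos-cloud/cos-gpu-installer:latest
}

# Assumption: the host's ldconfig can print the cache the installer updated.
lib_in_ld_cache() {
  ldconfig -p | grep -q 'libnvidia-ml\.so'
}

# Run the installer up to $1 times, stopping as soon as the library
# is visible in the ld cache. Fails if the installer itself fails or
# the library never appears.
install_with_retry() {
  max_attempts=$1
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    run_installer || return 1
    if lib_in_ld_cache; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Calling `install_with_retry 2` would match the behaviour described in the commit message when the first run misses the library, while skipping the second run if the first one happens to succeed.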
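The bind mount followed by `remount,exec` in the diff is there so that binaries under the install directory can be executed. A minimal sketch for checking that the remount took effect, assuming `findmnt` from util-linux is available on the node image; `has_noexec` and `mount_is_exec` are hypothetical helper names, not something the commit adds.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the commit.

# Succeeds if the comma-separated mount options in $1 include "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the mount point $1 is mounted without noexec.
# Returns 2 if findmnt cannot find the mount point.
mount_is_exec() {
  opts=$(findmnt -no OPTIONS "$1") || return 2
  ! has_noexec "$opts"
}
```

For example, `mount_is_exec /home/kubernetes/bin/nvidia || echo "install dir is not executable"` could run right after the remount step.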