Hi,
When I was checking the versions of CUDA and PyTorch in the docker container that is used in pets-prize-challenge-runtime, I noticed that the version numbers returned through various interfaces seem to be inconsistent. This topic is somewhat similar to a past question here.
CUDA 11.0 and cuDNN / NVIDIA driver versions
- PyTorch: 1.12.1.post201
- CUDA version by PyTorch: 11.2
- CUDA version by nvcc: V12.0.76
- CUDA runtime version: 11.0.x
I used the commands shown at the bottom of this message to collect the version information.
These version numbers should be consistent.
The last part of the PyTorch version number should correspond to a CUDA version, as in 1.12.1+cu116. However, post201 does not indicate any CUDA version.
The reason for this inconsistency seems to lie in the choice of the base image and the Python modules.
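The suffix convention described above can be checked with a small helper. This is only a sketch: the function name cuda_from_torch_version is hypothetical, and it assumes that a "+cuXYZ" suffix encodes the CUDA version with the last digit as the minor number (cu116 meaning 11.6). A version string without such a suffix, like 1.12.1.post201, yields None.

```python
import re
from typing import Optional


def cuda_from_torch_version(version: str) -> Optional[str]:
    """Return the CUDA version encoded in a '+cuXYZ' suffix, or None if absent."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumed convention: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


for v in ("1.12.1+cu116", "1.12.1+cu102", "1.12.1.post201"):
    print(v, "->", cuda_from_torch_version(v))
```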
In Dockerfile
FROM nvidia/cuda:11.0.3-base-ubuntu20.04
In environment-gpu.yml
- nvidia::cuda-nvcc=12.0.76
- pytorch-gpu=1.12.1
The exact binary package of PyTorch is selected automatically by conda from the conda-forge channel, and it ended up choosing 1.12.1.post201, which seems to support CUDA 11.2. However, that differs from the CUDA bundled in the base image (CUDA 11.0.x).
Separately, a different version of nvcc is installed.
I am not sure whether this combination is still compatible at this moment, but at least I can say that this is not a popular or "standard" configuration. According to the official PyTorch website, the PyTorch 1.12.1 binary packages are meant to be used with CUDA 10.2, 11.3, or 11.6. These are distributed in the pytorch channel.
As a result, other PyTorch-related modules in the ecosystem assume the combinations of CUDA and PyTorch versions listed above.
I was trying to use PyG (torch-geometric, a graph neural network library), but I could not get it to work in the current runtime environment because it does not support the combination of CUDA 11.0 and PyTorch 1.12.1. PyG installs successfully, but it raises an error at runtime.
Is there any way to change the CUDA version to 10.2, 11.3, or 11.6? I think I can submit a pull request that changes the base image and updates the versions of the related packages accordingly.
(I know it is almost too late to propose this …)
I used the following commands to collect the version numbers within the container.
$ conda run -n condaenv python -c "import torch; print(torch.__version__)"
$ conda run -n condaenv python -c "import torch; print(torch.version.cuda)"
$ conda run -n condaenv nvcc -V
$ ls -la /usr/local/cuda-11.0/targets/x86_64-linux/lib/
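The same checks could also be gathered in one script, which makes the mismatch easier to spot. The following is a rough sketch, not part of the runtime repository: all function names are hypothetical, and it assumes that nvcc -V prints a line of the form "Cuda compilation tools, release 12.0, V12.0.76". It compares only the major.minor part of each version and skips any tool that is not available.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract 'major.minor' from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def major_minor(version: Optional[str]) -> Optional[str]:
    """Reduce a version such as '11.2' or '11.0.3' to 'major.minor'."""
    if not version:
        return None
    return ".".join(version.split(".")[:2])


def collect_cuda_versions() -> Dict[str, Optional[str]]:
    """Collect the CUDA version reported by PyTorch and by nvcc, if present."""
    found: Dict[str, Optional[str]] = {"torch": None, "nvcc": None}
    try:
        import torch  # optional: left as None when PyTorch is not installed
        found["torch"] = major_minor(torch.version.cuda)
    except ImportError:
        pass
    if shutil.which("nvcc"):
        result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
        found["nvcc"] = parse_nvcc_release(result.stdout)
    return found


def is_consistent(found: Dict[str, Optional[str]]) -> bool:
    """True when all versions that could be determined agree."""
    known = {v for v in found.values() if v is not None}
    return len(known) <= 1


if __name__ == "__main__":
    versions = collect_cuda_versions()
    print(versions)
    print("consistent" if is_consistent(versions) else "inconsistent")
```

Saved under a name of your choice (for example check_versions.py, a hypothetical file name), it can be run inside the container with: conda run -n condaenv python check_versions.py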