I’ve been trying to run make test-submission, but it fails with:
could not select device driver "" with capabilities: [[gpu]]
I have reinstalled my CUDA driver (12.4.1), nvidia-utils (550.120-1), as well as nvidia-container-toolkit (1.16.1-3) but nothing so far seems to work.
So I’m wondering if the problem is my CUDA version, because the runtime GitHub repo states that CUDA 11 is required. Can someone confirm this for me? I’d really appreciate it.
System Information:
Operating System: Manjaro Linux
KDE Plasma Version: 6.1.5
KDE Frameworks Version: 6.6.0
Qt Version: 6.7.2
Kernel Version: 6.11.2-4-MANJARO (64-bit)
Graphics Platform: X11
GPU: Nvidia RTX 2060 Super
Based on my understanding of CUDA drivers and the NVIDIA Container Toolkit, they should be backwards compatible: the relatively recent versions you have installed on the host should support an older CUDA runtime library in the container, such as 11.8.
Can you provide more logs or more information about what is writing out the error message that you’ve shown?
Can you also confirm that the image you are using is the GPU version of the image? Is it a locally built image, or is it from make pull? When you run make test-submission, I believe it should print out the name of the image before it starts the container.
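If it helps narrow things down, here is a rough sketch of a check that is independent of the competition image: ask Docker to run nvidia-smi inside a plain CUDA 11.8 container. The helper name gpu_smoke_test and the RUN switch are made up for this post, and the nvidia/cuda:11.8.0-base-ubuntu22.04 tag is my assumption of a suitable public base image, so substitute whatever CUDA 11 image you prefer. By default it only prints the command so you can inspect it first.

```shell
# Hypothetical helper for this thread, not part of the runtime repo.
# Builds a GPU smoke-test command against a CUDA 11.8 base image.
# The default image tag is an assumption; pass another image as $1 if needed.
gpu_smoke_test() {
    image="${1:-nvidia/cuda:11.8.0-base-ubuntu22.04}"
    cmd="docker run --rm --gpus all $image nvidia-smi"
    if [ "${RUN:-0}" = "1" ]; then
        # Actually start the container (requires Docker and the NVIDIA toolkit).
        $cmd
    else
        # Dry run: only show what would be executed.
        echo "$cmd"
    fi
}

gpu_smoke_test
```

Running it as RUN=1 gpu_smoke_test (after defining the function) starts the container. If that prints the usual nvidia-smi table, the host driver and toolkit are probably fine with a CUDA 11 runtime; if it shows the same "could not select device driver" message, the problem is likely in the Docker/toolkit setup rather than in the competition image.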
The fact that you did not get a GPU error when using the pulled cdcnarratives.azurecr.io/cdc-narratives-competition:gpu-latest image is good. It means that something specific to the first case is not working, but that your overall setup should be fine.
Since you have a local image built, it’ll default to using that when you use the Makefile commands. You’ll need to use the SUBMISSION_IMAGE environment variable to specify a different image, like:
SUBMISSION_IMAGE=cdcnarratives.azurecr.io/cdc-narratives-competition:gpu-latest make test-submission
(This is a long command, so make sure you grab the whole line.)
Alternatively, you can delete your local image.
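In case the VAR=value command form is unfamiliar: the variable is set only for that one command and is gone afterwards. A small sketch of that behavior, where the sh -c child process is just a stand-in for make test-submission and local-default is a placeholder I made up for "whatever the Makefile would pick on its own":

```shell
# Stand-in for `make test-submission` (hypothetical): a child process that
# reports which image name it sees in its environment.
report_image='echo "image: ${SUBMISSION_IMAGE:-local-default}"'

# One-off override: the variable exists only for this single command.
SUBMISSION_IMAGE=cdcnarratives.azurecr.io/cdc-narratives-competition:gpu-latest sh -c "$report_image"

# A later command does not see it, unless SUBMISSION_IMAGE was exported
# earlier in the same shell session.
sh -c "$report_image"
```

So if you want every make command in a session to use the pulled image, you would need to export SUBMISSION_IMAGE once, or prefix each command as shown earlier.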
You’re getting a different error because the image expects several mounted directories, which your sudo docker run command does not provide. They are set up by these lines that you see in the make test-submission printout: