I recently submitted my code, which had been thoroughly tested and was working locally on a GPU. However, upon running a smoke test, I encountered an error: RuntimeError: cuDNN error: CUDNN_STATUS_ARCH_MISMATCH.
The environment packages for my current submission are as follows:
cuda-version: 11.2 (hb11dac2_2, conda-forge)
cudatoolkit: 11.2.2 (hbe64b41_11, conda-forge)
cudnn: 8.8.0.121 (h0800d71_0, conda-forge)
In contrast, the environment for my previously submitted code, which worked successfully a few days ago, had these versions:
cudatoolkit: 11.2.2 (hbe64b41_11, conda-forge)
cudnn: 8.4.1.50 (hed8a83a_0, conda-forge)
Is it possible that the cudnn update from 8.4.1 to 8.8.0 is causing this error? I’d appreciate any guidance on how to resolve the issue.
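To compare what the local and submission environments actually load at runtime, a small report like the one below can help. This is a minimal sketch that assumes the code uses PyTorch (the thread does not state the framework); `describe_gpu_stack` is a hypothetical helper name, and it returns empty values when PyTorch is not installed.

```python
import importlib.util


def describe_gpu_stack() -> dict:
    """Collect the CUDA/cuDNN versions and GPU capability that PyTorch sees.

    Hypothetical helper for illustration; assumes PyTorch is the framework in use.
    """
    info = {"torch": None, "cuda": None, "cudnn": None, "gpu": None, "capability": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built against, or None
    info["cudnn"] = torch.backends.cudnn.version()  # integer version, or None if unavailable
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        # Compute capability as a (major, minor) tuple
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


report = describe_gpu_stack()
for key, value in report.items():
    print(f"{key}: {value}")
```

CUDNN_STATUS_ARCH_MISMATCH generally indicates that the loaded cuDNN build does not support the architecture of the GPU it is running on, so the `cudnn` and `capability` values are the ones worth comparing between the two environments.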
We are also experiencing the same issue when submitting. Running the code locally (via test-submission) works without issue in a number of different setups.
Hi @emily, this PR shouldn’t be a problem since it just adds additional dependencies to the same runtime environment and we are all able to execute our code locally. I would guess that an extra addition of specific CUDNN package/version via conda would resolve the issue.
Disabling the cuDNN backend could also help, e.g.:
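A minimal sketch of this suggestion, assuming the submission uses PyTorch (the error message format and MinkowskiEngine both point that way, but the thread does not confirm it). `torch.backends.cudnn.enabled` is PyTorch's global cuDNN switch; `disable_cudnn` is a hypothetical wrapper that does nothing when PyTorch is absent.

```python
import importlib.util


def disable_cudnn() -> bool:
    """Turn off PyTorch's cuDNN backend; return True if the flag was set.

    Hypothetical helper for illustration; assumes PyTorch is the framework in use.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch not installed, nothing to change

    import torch

    # With cuDNN disabled, PyTorch falls back to its non-cuDNN kernels
    torch.backends.cudnn.enabled = False
    return True


changed = disable_cudnn()
print(f"cuDNN disabled: {changed}")
```

This needs to run before any model or convolution is executed. It works around the mismatch rather than fixing it, and the fallback kernels are usually slower.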
Hi @amanolis, I reverted that PR so that submissions would go back to working in the meantime. That PR is a recent change that could have affected the cudnn version.
"we are all able to execute our code locally"
Can you clarify this? Do you mean that the image with MinkowskiEngine added worked when you ran it locally? It would be very surprising if that worked but then submissions through the DrivenData platform failed.
"I would guess that an extra addition of specific CUDNN package/version via conda would resolve the issue."
Very possible. Feel free to test this out and submit a PR to the runtime repo.
Okay, things should be good to go now. We’ve pinned the cudnn version to 8.4. Thanks again for flagging this and for your patience as we implemented a fix!
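For anyone who wants to confirm the pin from inside their own code, here is a minimal sketch. It assumes the cuDNN 8.x integer encoding (major*1000 + minor*100 + patch), which is the form returned by calls such as PyTorch's `torch.backends.cudnn.version()`; `decode_cudnn_version` and `matches_pin` are hypothetical helper names.

```python
def decode_cudnn_version(raw: int) -> tuple:
    """Split a cuDNN 8.x version integer (major*1000 + minor*100 + patch) into parts."""
    major, rest = divmod(raw, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def matches_pin(raw: int, pin: tuple = (8, 4)) -> bool:
    """Return True when the decoded (major, minor) equals the pinned version."""
    return decode_cudnn_version(raw)[:2] == pin


print(decode_cudnn_version(8401))  # (8, 4, 1)
print(matches_pin(8401))           # True
print(matches_pin(8800))           # False
```

Here 8401 corresponds to the 8.4.1 build from the working environment and 8800 to the 8.8.0 build that produced the error.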
Edit: MinkowskiEngine has been restored to the runtime repo as well.