RuntimeError: cuDNN error: CUDNN_STATUS_ARCH_MISMATCH

Hello,

I recently submitted my code, which had been thoroughly tested and was working locally on a GPU. However, upon running a smoke test, I encountered an error: RuntimeError: cuDNN error: CUDNN_STATUS_ARCH_MISMATCH.

The environment packages for my current submission are as follows:

  • cuda-version: 11.2 (hb11dac2_2, conda-forge)
  • cudatoolkit: 11.2.2 (hbe64b41_11, conda-forge)
  • cudnn: 8.8.0.121 (h0800d71_0, conda-forge)

In contrast, the environment for my previously submitted code, which worked successfully a few days ago, had these versions:

  • cudatoolkit: 11.2.2 (hbe64b41_11, conda-forge)
  • cudnn: 8.4.1.50 (hed8a83a_0, conda-forge)

Is it possible that the update of cudnn is causing this error? I’d appreciate any guidance on how to resolve the issue.
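
One way to confirm which cuDNN build is actually loaded at runtime is to print what PyTorch reports and compare it between the working and failing environments. The following is only a minimal sketch, assuming PyTorch is the framework in use; the function name `cudnn_report` and the dictionary keys are illustrative and not part of any library:

```python
import importlib.util


def cudnn_report():
    """Collect the CUDA/cuDNN details PyTorch sees, if PyTorch is installed."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    report["cudnn_available"] = torch.backends.cudnn.is_available()
    # Integer such as 8401 for cuDNN 8.4.1, or None if cuDNN is unavailable.
    report["cudnn_version"] = torch.backends.cudnn.version()
    report["gpu_available"] = torch.cuda.is_available()
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in cudnn_report().items():
        print(f"{key}: {value}")
```

Running this both locally and inside the submission container would show whether the cuDNN version or the GPU's compute capability differs between the two.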

I am also facing a similar error now, although I have submitted successfully before without changing my setup.


We are also experiencing the same issue when submitting. Running the code locally (via test-submission) works without issue in a number of different setups.

Thanks all for letting us know. This is most likely due to a recent PR to the runtime repo.

We’ll revert that change now to ensure that submissions which previously worked still work. Let us know if you’re still having issues.

Hi @emily, this PR shouldn’t be a problem since it just adds additional dependencies to the same runtime environment and we are all able to execute our code locally. I would guess that an extra addition of specific CUDNN package/version via conda would resolve the issue.

Disabling the cuDNN backend could also help, e.g.:

torch.backends.cudnn.enabled = False
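
If turning cuDNN off globally is too broad, it can also be disabled only around the failing call. This is a hedged sketch rather than a recommended fix: it uses PyTorch's `torch.backends.cudnn.flags` context manager, while the helper names `cudnn_disabled` and `run_without_cudnn` are hypothetical. Without cuDNN, PyTorch falls back to its native kernels, which may be slower.

```python
import contextlib

try:
    import torch
except ImportError:  # allows the sketch to be imported without PyTorch
    torch = None


def cudnn_disabled():
    """Return a context manager that disables cuDNN inside its block."""
    if torch is None:
        return contextlib.nullcontext()
    # flags() restores the previous settings on exit. Note that it also
    # applies its defaults for benchmark/deterministic inside the block.
    return torch.backends.cudnn.flags(enabled=False)


def run_without_cudnn(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) with cuDNN disabled for the duration."""
    with cudnn_disabled():
        return fn(*args, **kwargs)
```

For example, `run_without_cudnn(model, batch)` would run a single forward pass without cuDNN, where `model` and `batch` stand in for whatever the submission uses.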

Hi @amanolis, I reverted that PR so that submissions would go back to working in the meantime. That PR is a recent change that could have affected the cudnn version.

"we are all able to execute our code locally"

Can you clarify this? Do you mean that using the image with the MinkowskiEngine added worked? It would be very surprising if that worked but then submissions through the DrivenData platform failed.

"I would guess that an extra addition of specific CUDNN package/version via conda would resolve the issue."

Very possible. Feel free to test this out and submit a PR to the runtime repo.

Update: our first attempt has not fixed the issue. We are still looking into the cause. Thanks for your patience.

I ran into the same issue a few minutes ago. Has the rollback been completed?

Okay, things should be good to go now. We’ve pinned the cudnn version to 8.4. Thanks again for flagging this and for your patience as we implemented a fix!
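
For anyone who wants their submission to fail fast if the cuDNN version drifts again, a small guard can compare the loaded version against the pin. This is a sketch under one stated assumption: it decodes the integer using the cuDNN 8.x encoding (major*1000 + minor*100 + patch), so it would need adjusting for other major versions. The helper names are hypothetical.

```python
def decode_cudnn_version(code):
    """Split an integer such as 8401 into (major, minor, patch).

    Assumes the cuDNN 8.x encoding: major*1000 + minor*100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def matches_pin(code, pin=(8, 4)):
    """Return True if the decoded version starts with the pinned components."""
    return decode_cudnn_version(code)[: len(pin)] == tuple(pin)
```

With PyTorch installed and cuDNN available, the check could be used as `matches_pin(torch.backends.cudnn.version())`. Note that `version()` returns `None` when cuDNN is unavailable, so that case needs handling first.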

Edit: MinkowskiEngine has been restored to the runtime repo as well.