vLLM + CUDA mismatch

Just a quick note after seeing this error pop up in the environment while trying to perform a smoke test:

```
ImportError: undefined symbol: _ZN3c104cuda9SetDeviceEi
```

This error means vLLM was built against a different PyTorch/CUDA combination than the one actually installed. Do your vLLM, PyTorch, and CUDA versions all match?
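A quick way to confirm a mismatch before digging into symbol errors is to compare the installed distribution versions directly. A minimal sketch using only the standard library (the package names are the usual PyPI distribution names):

```python
# Report which of the relevant distributions are installed, and at what version.
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("torch", "vllm")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found

for pkg, ver in installed_versions().items():
    print(f"{pkg}: {ver or 'not installed'}")
```

PyTorch wheels encode their CUDA build in the local version tag (e.g. `2.9.0+cu126`), so the torch version string alone is often enough to spot the conflict with the vLLM wheel.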

@amorarc Thanks for flagging this. After investigating, we have decided that we won’t be supporting vLLM in the competition runtime. The runtime has been updated.

vLLM is tightly coupled to specific CUDA and PyTorch versions, and the available wheels are not compatible with our CUDA 12.6 / PyTorch 2.9.0 environment. We recommend using a different accelerated LLM inference framework if you would like to use one.


The vLLM framework is fully compatible with PyTorch 2.9.0 and CUDA 12.6. The current errors are likely due to the outdated vLLM 0.2.5 pinned at commit 433db53. To ensure proper model acceleration and stability, I kindly request that the organizers upgrade vLLM to a version between 0.14.0 and 0.16.0.

@decem Thanks for this suggestion! If you’re able, please submit a PR to the runtime repository with the proposed vLLM upgrade and we’ll be happy to take a look and review it.

My local hardware is a Blackwell-architecture GPU, which does not support CUDA 12.6, so I’m afraid I won’t be able to submit a correct pull request.

Yeah, I’ve run into the same issue. Updating the eval environment’s CUDA version to 12.8 would be nice so that people on Blackwell can run an identical container, although it sounds like that might not be possible. I’ve just been using a slightly different Docker definition to run things locally.
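For anyone else on Blackwell, a local variant might look roughly like this. The base image tag and the idea of swapping only the CUDA layer are assumptions on my part, not the official runtime definition:

```dockerfile
# Hypothetical local override -- NOT the official competition image.
# Swaps the CUDA 12.6 base for 12.8 so Blackwell GPUs can run the container.
FROM nvidia/cuda:12.8.0-runtime-ubuntu22.04

# The rest of the setup would mirror the official runtime Dockerfile.
RUN apt-get update && apt-get install -y python3 python3-pip
```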


@cszc Hi! I’ve just submitted **PR #5** to re-enable vLLM (v0.14.0) support.

Contrary to my previous concern, I managed to verify this configuration on an RTX 4090 with CUDA 12.6. While I understand the competition runtime uses A100 GPUs, the software stack (CUDA/PyTorch) is now aligned, and the ImportError / symbol issues are resolved on my end.

I’ve addressed the dependency conflicts between vLLM, NeMo (protobuf), and cuda-python using specific uv overrides, as detailed in the PR’s changelog. Could you please run a final smoke test on your A100 environment? If it passes, this will restore high-performance inference support for all participants.

Thank you for your time and for reviewing the PR!


@cszc I have a significant update on PR #5.

While running further validation with the competition’s `pytest tests/test_imports.py` suite in my CUDA 12.6 environment, I identified and fixed a critical protobuf version conflict that would have broken wandb and NeMo.
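The import checks boil down to a few lines. This is only a sketch of the idea, not the actual `tests/test_imports.py`, and the module list is illustrative:

```python
# Smoke-test that each module imports cleanly; collect the ones that don't.
import importlib

def failing_imports(modules):
    """Return the subset of module names that raise ImportError on import."""
    failures = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            failures.append(name)
    return failures

# Illustrative list; the real suite covers the models named below.
print(failing_imports(["torch", "torchaudio", "vllm"]))
```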

The Issue:

  • vLLM metadata requests protobuf>=6.30.0 (resolving to 7.x), but NeMo and wandb use pre-generated code only compatible with the protobuf 5.x API. Version 6.x+ introduced breaking changes in the generated code format.
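The constraint above can be expressed as a simple guard. This is a sketch; the 5.x bound comes from this thread, not from the libraries’ own metadata, and the helper name is mine:

```python
# NeMo/wandb ship pre-generated protobuf code that needs a 5.x runtime;
# check an installed protobuf version string against that expectation.
def protobuf_runtime_ok(version_string, required_major=5):
    """True if the runtime major version matches what the generated code expects."""
    return int(version_string.split(".")[0]) == required_major
```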

The Fix:

  • I’ve updated the uv override for protobuf to `protobuf>=5.29.5,<5.30`.
  • This satisfies NeMo’s hard constraints and ensures wandb works correctly.
  • Note: vLLM inference works perfectly with protobuf 5.29.x (the 6.x requirement is only for its gRPC server features, which aren’t used in this runtime).
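In `pyproject.toml`, the override described above would look roughly like this; the exact section layout in the runtime repo is an assumption, so see PR #5 for the actual change:

```toml
# Hypothetical placement of the uv override pinning protobuf to 5.29.x.
[tool.uv]
override-dependencies = [
    "protobuf>=5.29.5,<5.30",
]
```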

Verification Results:

All 9 core tests now pass successfully in the CUDA 12.6 environment:

  • test_pytorch, test_torchaudio, test_whisper
  • test_canary_qwen, test_granite_speech, test_phi_4_multimodal
  • test_parakeet_tdt, test_wav2vec, test_qwen3_asr

The PR is now fully “battle-tested” and ready for your review. Could you please help verify it on your A100 infrastructure?

Thank you!
