@amorarc Thanks for flagging this. After investigating, we have decided that we won’t be supporting vLLM in the competition runtime. The runtime has been updated.
vLLM is tightly coupled to specific CUDA and PyTorch versions, and the available wheels are not compatible with our CUDA 12.6 / PyTorch 2.9.0 environment. We recommend using a different accelerated LLM runtime if you would like to use one.
The vLLM framework is fully compatible with PyTorch 2.9.0 and CUDA 12.6. The current errors are likely due to the outdated vLLM 0.2.5 installed in commit 433db53. To ensure proper model acceleration and stability, I kindly request that the organizers upgrade vLLM to a version between 0.14.0 and 0.16.0.
@decem Thanks for this suggestion! If you’re able, please submit a PR to the runtime repository with the proposed vLLM upgrade and we’ll be happy to take a look and review it.
Yeah, I’ve run into the same issue. Updating the eval environment’s CUDA version to 12.8 would be nice so that people using Blackwell can run an identical container, although it sounds like that might not be possible. I’ve just been using a slightly different Docker definition to run things locally.
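For anyone else doing the same, a local-only variant can amount to just swapping the base image (the tag below is an assumption — check which `nvidia/cuda` tags actually exist on Docker Hub):

```dockerfile
# Local-only variant: CUDA 12.8 base so Blackwell GPUs work.
# The official competition runtime stays on CUDA 12.6.
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

# ...rest identical to the competition Dockerfile...
```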
@cszc Hi! I’ve just submitted **PR #5** to re-enable vLLM (v0.14.0) support.
Contrary to my previous concern, I managed to verify this configuration on an RTX 4090 with CUDA 12.6. While I understand the competition runtime uses A100 GPUs, the software stack (CUDA/PyTorch) is now aligned, and the ImportError / symbol issues are resolved on my end.
I’ve addressed the dependency conflicts between vLLM, NeMo (protobuf), and cuda-python using specific uv overrides, as detailed in the PR’s changelog. Could you please run a final smoke test on your A100 environment? If it passes, this will restore high-performance inference support for all participants.
While performing further validation with the competition’s `pytest tests/test_imports.py` in my CUDA 12.6 environment, I identified and fixed a critical protobuf version conflict that would have broken wandb and NeMo.
**The Issue:**
vLLM’s metadata requests `protobuf>=6.30.0` (resolving to 7.x), but NeMo and wandb ship pre-generated code that is only compatible with the protobuf 5.x API; version 6.x+ introduced breaking changes in the generated-code format.
**The Fix:**
I’ve updated the uv override for protobuf to `>=5.29.5,<5.30`.
This satisfies NeMo’s hard constraints and ensures wandb works correctly.
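For anyone applying the same pin locally, the override in `pyproject.toml` looks roughly like this (a sketch — the exact section layout in the runtime repo may differ):

```toml
[tool.uv]
# Force protobuf 5.29.x despite vLLM's metadata asking for >=6.30.0;
# the 6.x requirement only covers gRPC server features unused here.
override-dependencies = [
    "protobuf>=5.29.5,<5.30",
]
```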
**Note:** vLLM inference works perfectly with protobuf 5.29.x (the 6.x requirement is only for its gRPC server features, which aren’t used in this runtime).
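As a quick sanity check that an installed protobuf falls inside the pinned range, something like this works (pure stdlib; `protobuf_in_range` and `parse` are hypothetical helpers, and the naive parser assumes plain `X.Y.Z` versions without rc/dev suffixes):

```python
from importlib.metadata import version  # e.g. version("protobuf")

def parse(v: str) -> tuple[int, ...]:
    """Turn '5.29.5' into (5, 29, 5) for tuple comparison."""
    return tuple(int(part) for part in v.split("."))

def protobuf_in_range(installed: str, lo: str = "5.29.5", hi: str = "5.30") -> bool:
    """True if lo <= installed < hi under tuple ordering."""
    return parse(lo) <= parse(installed) < parse(hi)

# In a test this would be: assert protobuf_in_range(version("protobuf"))
print(protobuf_in_range("5.29.5"))   # True
print(protobuf_in_range("6.30.0"))   # False
```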
**Verification Results:**
All 9 core tests now pass successfully in the CUDA 12.6 environment: