Whisper Licensing Terms

Hi all,

Just wanted to check whether these pretrained models are allowed for prize eligibility, since the licensing is a bit confusing:

Distil-Whisper: I noticed it was trained on TED-LIUM (CC-BY-NC-ND 3.0), which has a Non-Commercial clause. Does this disqualify it from prize eligibility under the external data rules?

Whisper large-v3: The model weights are MIT-licensed, but the training data is web-scraped with unclear provenance. Is this sufficient for prize eligibility, or does the training data need to be traceable?

Thank you!

Hello - yes, that is correct. As stated in our External Data and Models section, “To be eligible for prizes, any external data or pre-trained models used must be licensed so that the resulting model can be released for broad use, in and beyond the competition, including for commercial purposes (no NC, CC NC, or CC BY-NC licenses).”
