Our team has started developing ML pipelines for the competition and would appreciate clarification on a couple of rule interpretations:
1. Pre-trained models and “external data”: Are pre-trained models (e.g., ImageNet-trained CNNs, CLIP, pre-trained transformers) considered “external data” under the competition rules? Using such models is standard practice in modern ML, but they are technically trained on external datasets not provided by the competition.
2. Open source model restrictions: Are there any restrictions on using large open source multimodal models (e.g., Llama, Qwen)? While these meet the open source license requirement, they can require substantial compute, which might create advantages based on hardware access rather than methodology.
Would appreciate any guidance on these points to ensure we’re all interpreting the rules consistently!
Per the competition rules, external data is not allowed. However, participants can use pre-trained computer vision models and language models as long as they (1) were freely and openly available in that form at the start of the competition and (2) were trained in a way that is compliant with the data use agreement for the challenge. Please also ensure that your own use of these models complies with the data use agreement for this challenge.
The rules also state that “any software required to generate the solution must also be available under an open source license that does not prohibit free commercial use”. The Llama license appears to be open source and commercially unrestricted (except for extremely large organizations) and is permissible for this competition, but certain Qwen models appear to restrict commercial use more significantly. All OSI-approved open source licenses are considered free and open source for this competition. If you are unclear whether the license for a specific model you wish to use qualifies, please follow up about that license specifically.
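As a practical starting point (not part of the official guidance), one way to surface a model's self-reported license before asking about it is to query its Hugging Face Hub metadata. This is only an illustrative sketch, assuming the huggingface_hub package is installed; the repo IDs below are examples, and a Hub license tag is not a substitute for reading the actual license text or checking it against the OSI list:

```python
# Illustrative only: list the self-reported license tags for a few candidate models.
# A missing or unusual tag means you should check the model card and license text directly.
from huggingface_hub import model_info

CANDIDATE_MODELS = [  # example repo IDs, not an endorsement of any specific model
    "Qwen/Qwen2.5-7B-Instruct",
    "openai/clip-vit-base-patch32",
]

for repo_id in CANDIDATE_MODELS:
    info = model_info(repo_id)
    # The Hub exposes the declared license as a tag of the form "license:<identifier>".
    licenses = [t.split(":", 1)[1] for t in info.tags if t.startswith("license:")]
    print(f"{repo_id}: {licenses or 'no license tag found'}")
```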