Pre-trained models and external data

Are pre-trained models and external data allowed?

Well, about the external data:

Unless otherwise expressly stated on the Competition Website, Participants must not use data other than the Data to develop and test their models and Submissions.

But… pre-trained models use external data, so…


It is ok to use pretrained models as long as the model and the weights can be released under an open source license. We have added this note to the problem description.

Do we have to publish which external models or data we are using? Or only in the case of winning?

Per the problem description, as long as the model and weights can be released under an open source license, you do not need to share which pretrained models you are using ahead of time.


I feel that allowing pre-trained models while disallowing external data might be a bit tricky. I assume that somebody could take any dataset (available for commercial use) and heavily overfit some neural network on it, then release the network under an open license (in some very hidden place that isn’t even available for search indexing, or just one day before the deadline). That way, it’s possible to use any dataset as long as it is “wrapped” as a pre-trained model.

I’m not sure what rule could be added to ensure that this type of model won’t be used.


@itdxer - maybe that is OK in the eyes of the hosts. Ultimately, their aim is to have the best possible model for the task requested, and what better way than potentially training a model on some closed-source industry data? As a bonus, ML hobbyists and energy enthusiasts gain access to an open-source pre-trained SOTA model.

I guess, on the other hand, it might seem unfair to those who do not have access to this data. However, there is plenty of open source data available to train it on.