Using external data

I realize the rules state that we cannot use external data. However, given that successfully completing the first track is a prerequisite for the second track, it seems that maybe we can?

This is stated in the Innovation Track:

Interpretability & biological groundedness: Explain how your model makes its predictions in a way that would be useful to a biologist or forensic scientist. Do the features it considers important make biological sense? How well does your model integrate with experiments, or other biological data sources that might provide further information?

I can think of good approaches that could incorporate other biological data. However, it is harder for me to think of an approach where I am not allowed to use this data up front but could simply add biological data later on. It seems to me that those are just different approaches for different models. If I design a model that /does/ incorporate biological data on the first pass, I can think of reasonable ways to add to that data in the future to make the model more robust.

So, can we use other data sources in this competition? If the ultimate goal is robustness and safety of engineered biological devices, this seems like an unnecessary barrier.

@cszc I see you are a moderator. Could you clarify where external data is allowed to be used?

@mmcguffi @Chavdar Thanks for this question. External data may be used in the Innovation Track for the purposes of showcasing the capabilities of the model post-training:

While the model submitted to the Innovation Track should be trained only on the data provided for the Prediction Track, you are welcome to use other data to illustrate the capabilities of that model in your Innovation Track report.

We’ve added some language on the page to clarify. If you have other ideas for ways to incorporate external data into your model, you are welcome to describe these in your report.