Hi,
First off, I’d like to thank DrivenData for organizing the contest in such a way that making a mistake in the Hindcast stage doesn’t preclude prizes in later stages, provided there was a successful code execution run, of course.
I wanted to share a problem I’ve run up against: the Microsoft Planetary Computer data. My score is a little better with some of that data (judging from my local scoring), but it’s unwieldy and very slow to collect for all 26 sites. So slow, in fact, that the bare minimum of data I need to improve my score can take slightly over 30 minutes to prepare on its own. I expect any benefit to my score will be offset by losses in the efficiency category when it comes to the overall prize metrics, and runs randomly failing throughout the Forecast stage because of prediction runtime issues isn’t an acceptable risk. Since dropping otherwise useful data for that reason seems contrary to the spirit of the challenge, I wanted to raise the issue and make sure I’m not misunderstanding something myself.
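To give a concrete sense of what that data collection looks like, here’s a minimal sketch of the per-site STAC query pattern I have in mind (the collection name, bounding box, and date range are placeholders for illustration only, not my actual pipeline); one search plus asset downloads per site per issue date adds up quickly across 26 sites:

```python
# Illustrative only: standard Planetary Computer STAC access pattern.
import planetary_computer
import pystac_client

# Open the Planetary Computer STAC API; the modifier signs asset URLs
# so they can actually be downloaded.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

def search_site(bbox, date_range, collection="modis-10A2-061"):
    """Search one site's bounding box for items in a date range.

    bbox:       (min_lon, min_lat, max_lon, max_lat)
    date_range: e.g. "2023-01-01/2023-03-01"
    collection: placeholder collection id for this example
    """
    search = catalog.search(
        collections=[collection],
        bbox=bbox,
        datetime=date_range,
    )
    return list(search.items())
```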
To be clear, the data is such a pain to work with, and the gain in prediction accuracy is so meager, that I’ve simply cut it out of my current submissions. But again, that seems contrary to the spirit of what we’re doing.
So to sum up: is the efficiency portion of the Overall prize metrics judged with respect to the Forecast stage runtime? Or will it be judged with respect to a subsequent efficiency evaluation, meaning that only the 10% from the Forecast stage predictions is what counts toward the Overall prize metrics?