How will evaluation criteria be combined?

Could you share more details about how the criteria (forecast skill, rigor, …) will be combined into one score? You provided the weighting, but it’s not transparent how each criterion is brought to the same scale as the others.

Even for a criterion with a quantifiable scale, such as “Forecast Skill (Hindcast cross-validation) (30%)”, there are many options. Will the points for forecast skill be based on ranking? Will only the top X spots get points? Will the pinball score be used directly somehow, or measured relative to some state-of-the-art score?
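To make these options concrete, here is a minimal sketch of how a pinball loss could be mapped onto a common 0 to 1 scale in each case. Everything in it is an assumption for illustration: the function names, the `top_k` cutoff, and the `baseline_loss` reference are hypothetical, and nothing here reflects how the organizers will actually score submissions.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss for a single quantile level; lower is better."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(quantile * diff, (quantile - 1) * diff)))

def rank_points(losses, top_k=None):
    """Hypothetical option: points by rank, scaled to (0, 1].
    If top_k is given, only the top_k lowest losses receive points."""
    losses = np.asarray(losses, dtype=float)
    n = len(losses)
    ranks = losses.argsort().argsort()  # 0 = lowest (best) loss
    points = (n - ranks).astype(float)
    if top_k is not None:
        points[ranks >= top_k] = 0.0
    return points / n

def skill_vs_baseline(losses, baseline_loss):
    """Hypothetical option: skill relative to a reference loss,
    1 - loss / baseline, clipped to [0, 1]."""
    losses = np.asarray(losses, dtype=float)
    return np.clip(1.0 - losses / baseline_loss, 0.0, 1.0)

# Three made-up leaderboard losses, scored under each option.
losses = [3.0, 1.0, 2.0]
print(rank_points(losses))             # rank-based, everyone gets points
print(rank_points(losses, top_k=1))    # only the best entry gets points
print(skill_vs_baseline(losses, 4.0))  # relative to an assumed reference loss of 4.0
```

Under rank-based scoring a tiny improvement matters only if it changes the ordering, while under a baseline-relative score every increment counts in proportion, which is why the choice affects the efficiency tradeoff.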

I’m asking for a very pragmatic reason: I’m trying to decide whether I should sacrifice model efficiency to squeeze out a few extra points on the leaderboard.

Are there any more details that you can share at this time about how the evaluation criteria will be combined?

Hi @kurisu,

Thanks for your patience. We’re not able to provide any further specific details at this time.

However, the tradeoff between efficiency and forecast skill you mention sounds relevant to the evaluation of your model, and we encourage you to include results about it in your report. Judges will take all reported results into account during the evaluation process.

@jayqi,

I believe integrity demands that I make this comment now rather than at a later date: these contests are a huge investment of time (take it from me). Competing always means sacrificing other opportunities in order to prioritize the best one(s). I’ve enjoyed the contest, and I’m currently very pleased that I chose to compete rather than give up this opportunity in favor of another (~1st place). Still, the rules have made it all but impossible to make plans. The work involved with these things already borders on exploitative, given how easy it is to walk away with nothing. Making business decisions impossible on top of that risk feels, to be perfectly frank, like adding insult to injury.

Just food for thought from someone who’s done something like a dozen of these for various government agencies. Thank you for taking the time to read my opinion, and best of luck to all.