How will evaluation criteria be combined?

Could you share more details about how the criteria (forecast skill, rigor, …) will be combined into one score? You've provided the weightings, but it isn't clear how each criterion is brought onto the same scale as the others.
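
To make the question concrete, here is a rough sketch of one hypothetical way the criteria *could* be put on a common scale and combined (min-max normalization per criterion, then a weighted sum). The criterion names, weights, and numbers are placeholders of mine, not anything taken from the rules:

```python
# Hypothetical illustration only: one way criteria could be brought onto a
# common scale (min-max normalization per criterion) and combined by weight.
# Criterion names and weights below are placeholders, not the official rules.

def normalize(scores):
    """Map raw scores for one criterion onto [0, 1] across all submissions."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def combined_score(per_criterion_scores, weights):
    """Weighted sum of normalized criterion scores for each submission."""
    criteria = list(weights)
    normalized = {c: normalize(per_criterion_scores[c]) for c in criteria}
    n_submissions = len(next(iter(per_criterion_scores.values())))
    return [
        sum(weights[c] * normalized[c][i] for c in criteria)
        for i in range(n_submissions)
    ]

# Toy example: three submissions scored on two criteria (placeholder weights).
raw = {"forecast_skill": [0.82, 0.75, 0.90], "rigor": [3.0, 4.5, 4.0]}
print(combined_score(raw, {"forecast_skill": 0.30, "rigor": 0.70}))
```

Whether it's something like this, a rank-based scheme, or something else entirely makes a real difference to how I should prioritize my work.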

Even for the criterion with a quantifiable scale, “Forecast Skill (Hindcast cross-validation) (30%)”, there are many options. Will the score/points/grades for forecast skill be based on ranking? Do only the top X spots get points? Will the pinball score be used directly somehow? Or will it be measured relative to some state-of-the-art score?
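
For reference, here is a minimal sketch of the pinball (quantile) loss I'm referring to, plus one *possible* way to turn it into a skill score relative to a reference model. The quantile levels, the baseline comparison, and the toy numbers are illustrative assumptions of mine, not the official scoring rules:

```python
# Minimal sketch of the pinball (quantile) loss, plus one possible way to
# express it as a skill score relative to a reference model.
# Quantile levels and the baseline comparison are illustrative assumptions.

def pinball_loss(y_true, y_pred, q):
    """Average pinball loss for a single quantile level q in (0, 1)."""
    losses = [
        q * (yt - yp) if yt >= yp else (1 - q) * (yp - yt)
        for yt, yp in zip(y_true, y_pred)
    ]
    return sum(losses) / len(losses)

def mean_pinball(y_true, preds_by_quantile):
    """Average the pinball loss over all predicted quantile levels."""
    return sum(
        pinball_loss(y_true, preds, q) for q, preds in preds_by_quantile.items()
    ) / len(preds_by_quantile)

def skill_vs_baseline(model_loss, baseline_loss):
    """One option: skill as relative improvement over a reference model."""
    return 1.0 - model_loss / baseline_loss

# Toy example with three observations and two quantile levels.
y = [10.0, 12.0, 9.0]
model = {0.1: [8.0, 10.0, 7.5], 0.9: [12.5, 14.0, 11.0]}
baseline = {0.1: [7.0, 9.0, 6.0], 0.9: [14.0, 16.0, 13.0]}
print(skill_vs_baseline(mean_pinball(y, model), mean_pinball(y, baseline)))
```

Each of these options would reward quite different behavior, which is why I'm asking.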

I’m asking for a very pragmatic reason: I’m trying to decide whether I should sacrifice model efficiency to squeeze out a few extra points on the leaderboard.

Are there any more details that you can share at this time about how the evaluation criteria will be combined?

Hi @kurisu,

Thanks for your patience. We’re not able to provide any further specific details at this time.

However, the tradeoff you mention between efficiency and forecast skill sounds relevant to the evaluation of your model, and we encourage you to include results on it in your report. Judges will take all reported results into account during the evaluation process.