Could you share more details about how the criteria (forecast skill, rigor, …) will be combined into one score? You provided the weighting, but it's not clear how each criterion is mapped onto a common scale before the weights are applied.
Even for a criterion with a quantifiable scale, such as "Forecast Skill (hindcast cross-validation) (30%)", there are many options. Will the points for forecast skill be based on ranking, with only the top X spots earning points? Will the pinball score be used directly in some way? Or will it be normalized against some state-of-the-art reference score?
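For context, by "pinball score" I mean the standard quantile (pinball) loss. A minimal sketch of how I understand it (pure Python, function names are my own):

```python
def pinball_loss(y_true, y_pred, quantile):
    """Pinball (quantile) loss for a single observation.

    Penalizes under-prediction by `quantile` per unit of error and
    over-prediction by `1 - quantile` per unit of error.
    """
    diff = y_true - y_pred
    return max(quantile * diff, (quantile - 1) * diff)

def mean_pinball(y_true, y_pred, quantile):
    """Average pinball loss over paired observations."""
    return sum(pinball_loss(t, p, quantile)
               for t, p in zip(y_true, y_pred)) / len(y_true)

# At the 0.9 quantile, under-prediction is penalized more heavily:
print(pinball_loss(10.0, 8.0, 0.9))  # under-prediction: 0.9 * 2 = 1.8
print(pinball_loss(8.0, 10.0, 0.9))  # over-prediction: 0.1 * 2 = 0.2
```

My question is what happens after this raw score is computed: is it ranked, binned into point tiers, or rescaled against a reference forecast?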
I'm asking for a very pragmatic reason: I'm trying to decide whether to sacrifice model efficiency to squeeze a few extra points out of the leaderboard.
Are there any more details that you can share at this time about how the evaluation criteria will be combined?