Could you share more details about how the criteria (forecast skill, rigor, …) will be combined into one score? You provided the weighting, but it's not clear how each criterion is mapped onto a common scale before the weights are applied.
Even for a criterion with a quantifiable scale, such as "Forecast Skill (hindcast cross-validation) (30%)", there are many options. Will the points for forecast skill be based on ranking, with only the top X spots earning points? Will the pinball score be used directly in some way? Or will it be normalized against some state-of-the-art reference score?
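For context, by "pinball score" I mean the standard quantile (pinball) loss. A minimal sketch of how I understand it (pure Python, function names are my own):

```python
def pinball_loss(y_true, y_pred, quantile):
    """Pinball (quantile) loss for a single observation.

    Penalizes under-prediction by `quantile` per unit of error and
    over-prediction by `1 - quantile` per unit of error.
    """
    diff = y_true - y_pred
    return max(quantile * diff, (quantile - 1) * diff)

def mean_pinball(y_true, y_pred, quantile):
    """Average pinball loss over paired observations."""
    return sum(pinball_loss(t, p, quantile)
               for t, p in zip(y_true, y_pred)) / len(y_true)

# At the 0.9 quantile, under-prediction is penalized more heavily:
print(pinball_loss(10.0, 8.0, 0.9))  # under-prediction: 0.9 * 2 = 1.8
print(pinball_loss(8.0, 10.0, 0.9))  # over-prediction: 0.1 * 2 = 0.2
```

My question is what happens after this raw score is computed: is it ranked, binned into point tiers, or rescaled against a reference forecast?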
I'm asking for a very pragmatic reason: I'm trying to decide whether to sacrifice model efficiency to squeeze a few extra points out of the leaderboard.
Are there any more details that you can share at this time about how the evaluation criteria will be combined?