We are shown the best public Averaged Mean Quantile Loss on the leaderboard. For the final results of the Hindcast Stage, will there also be a private leaderboard, or does the current leaderboard already show all actual results for the Hindcast Stage from the odd years between 2005 and 2023 (after the minor corrections that should be added by November 28)?
Hi @progin,
There is no private leaderboard for this challenge.
However, the current leaderboard is for the Development Arena; the Hindcast Stage will use the Evaluation Arena leaderboard (not yet available) as an input to prizes. In the Evaluation Arena you will submit code that performs inference, rather than submitting the predictions themselves.
Assuming participants are able to reproduce their scores with their submitted code, you may see the same scores on the Evaluation Arena leaderboard.
Please note that, as documented, participants will also be asked to submit a model report, and prizes will be awarded based on both leaderboard performance and a qualitative evaluation of the model report.
Hi @jayqi,
Thank you for the clarification. Then, since we can already see how our models affect the final Hindcast Stage score, suppose I've added a feature to my model that improves my (seemingly reliable) CV score by a significant margin, say 10 on the Averaged Mean Quantile Loss, but makes the leaderboard worse. Should I drop the feature, or keep it even though it worsens the final score? On the one hand, it makes the LB worse and I'm not sure whether justifying its use in the report will carry more weight than the LB score; on the other hand, it seems right to use a feature that makes the model more robust rather than overfit to the LB.
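To make the comparison concrete, here is a minimal sketch of the kind of CV metric I'm describing. It assumes the metric is the pinball (quantile) loss averaged over quantile levels 0.10, 0.50, and 0.90, possibly up to a constant scaling factor; the quantile levels and exact definition here are my assumptions, so check the competition documentation for the official formula.

```python
import numpy as np

# Assumed quantile levels for the competition metric (not confirmed here).
QUANTILES = (0.10, 0.50, 0.90)

def mean_quantile_loss(y_true, y_pred, q):
    """Mean pinball loss for a single quantile level q."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def averaged_mean_quantile_loss(y_true, preds_by_quantile):
    """Average the mean pinball loss over all quantile levels.

    preds_by_quantile maps a quantile level to predictions aligned with y_true.
    """
    return float(np.mean([
        mean_quantile_loss(y_true, preds_by_quantile[q], q) for q in QUANTILES
    ]))

if __name__ == "__main__":
    # Toy example: true seasonal volumes and one model's quantile predictions.
    y_true = [900.0, 1200.0, 750.0]
    preds = {
        0.10: [700.0, 1000.0, 600.0],
        0.50: [880.0, 1150.0, 760.0],
        0.90: [1100.0, 1400.0, 950.0],
    }
    print(averaged_mean_quantile_loss(y_true, preds))
```

In my setup, I would compute this per CV fold, with and without the candidate feature, and compare the averaged fold scores against the leaderboard change.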
Hi @progin,
The Hindcast Evaluation Arena launched today with additional details about submissions and evaluation that you may find helpful.
Per the evaluation criteria, solutions will be evaluated for their forecast skill based on the quantile score of their submission (i.e., leaderboard score). Solutions will also be considered for their rigor, which will be evaluated based on the model report. As the model report format instructions note, you are encouraged to discuss features that you experimented with and why you did or did not include them.
You may also want to consider later parts of the competition, in particular the Overall Prize evaluation. For the Overall Prize, we will ask participants to submit an expanded model report, and we also plan to require cross-validation results. That will be an opportunity for you to discuss features that perform more robustly in a cross-validation setting. Further details about these submissions are not yet available; please keep an eye out for more on this soon.