Rankings for the hindcast stage

I seem to have overlooked the last sentence in the following paragraph, which is from Competition: Water Supply Forecast Rodeo: Development Arena – does this mean it’s possible to rank in a prize winning position, but not receive a payment because of the judges?

Hindcast Stage Prizes

October–December 2023

Test your model against historical ground truth data! Train your models and submit code. Your code will be executed to perform inference on a held-out test set. Prizes will be awarded based on a combination of your leaderboard performance and an evaluation of your model report by a panel of judges.

I’m just going to assume that it’s reserved solely for cases where the person legitimately shouldn’t have won, then (e.g. constructing the hold out set from publicly available data and training on it specifically or something like that).

Since nobody knows everything and the only way to see all perspectives is to walk in each other’s shoes, I would like to mention that I’ve found one thing repeatedly getting in my way during my participation in this challenge: personally I’m having difficulty clearly defining the specific data points that I’m required to predict. For example, I ended up noticing that during the hindcast stage, we’re apparently only required to predict one out of the 2-3 month season (season length varies by site). We’re literally given 2/3 of the data, and need only predict the remaining 1/3. While that’s lovely and much less work for me, I’d like to start in on my final solution for when I’ll have to predict all those months when they’re a full 6 months out. To do that, I’d like to be able to write code that I know will be of use for the task, but the data_read API is still being finalized. I sincerely hope my pointing this out is taken in the spirit with which it’s meant – believe me, I of all people know that there’s always something to stay late working on.

Thanks for hosting the challenge, folks. I’m glad to be part of it.

Hi @mmiron,

does this mean it’s possible to rank in a prize winning position, but not receive a payment because of the judges?

I’m just going to assume that it’s reserved solely for cases where the person legitimately shouldn’t have won, then (e.g. constructing the hold out set from publicly available data and training on it specifically or something like that).

The Hindcast Stage prizes (and also the Overall prizes later) are explicitly awarded based on an evaluation process that incorporates both leaderboard performance and evaluation of a model report by judges together. The leaderboard will be shown to give people an idea of their quantitative performance, but you should not think about ranks on the leaderboard as “prize winning positions”.

Forecast skill (i.e., the quantitative metrics shown on the leaderboard) will be the largest part of the evaluation criteria for solutions, but it is not the only evaluation criterion. More details about the evaluation criteria will be shared soon when the Evaluation Arena opens for testing.

The judging process in this competition exists because the task of seasonal water supply forecasting involves a very small set of ground truth data. In the Hindcast Stage, with 10 years for 26 sites, there are only 260 unique ground truth values. Because of this limitation in statistical rigor, solutions will additionally be evaluated qualitatively based on additional considerations of technical merit.

Solutions which violate challenge requirements like training on test data will be disqualified, and that is separate from the judging evaluation criteria.

personally I’m having difficulty clearly defining the specific data points that I’m required to predict. For example, I ended up noticing that during the hindcast stage, we’re apparently only required to predict one out of the 2-3 month season (season length varies by site). We’re literally given 2/3 of the data, and need only predict the remaining 1/3.

This is not a correct framing. Your task is to predict the seasonal water supply using only data from before the issue date. So even though we provide historical naturalized flow data for April, May, and June, you cannot use those months' data as a feature if they don't fall before a given issue date. For example, a forecast with an issue date of March 15 may only use naturalized flow observations from February and earlier as features. Please carefully review the no future data requirement.
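To make the rule concrete, here is a minimal sketch of filtering monthly naturalized flow by issue date. It is an illustration only, not challenge code: the function name `usable_monthly_flows`, the `(year, month)` dictionary layout, and the flow values are all hypothetical, and it assumes a monthly observation counts as available only once its month has fully ended.

```python
from datetime import date


def usable_monthly_flows(monthly_flows, issue_date):
    """Keep only monthly observations whose month ended before issue_date.

    monthly_flows: dict mapping (year, month) -> flow value (hypothetical layout).
    issue_date: datetime.date on which the forecast is issued.
    """
    usable = {}
    for (year, month), value in monthly_flows.items():
        # First day of the following month, i.e. the day after this month ends.
        if month == 12:
            month_end_exclusive = date(year + 1, 1, 1)
        else:
            month_end_exclusive = date(year, month + 1, 1)
        # Assumption: a month is usable only if it is complete by the issue date.
        if month_end_exclusive <= issue_date:
            usable[(year, month)] = value
    return usable


# Made-up placeholder values, not real observations.
flows = {(2020, 1): 10.0, (2020, 2): 12.5, (2020, 3): 30.1, (2020, 4): 55.0}
features = usable_monthly_flows(flows, date(2020, 3, 15))
print(sorted(features))  # [(2020, 1), (2020, 2)]
```

With a March 15 issue date, only January and February survive the filter, which matches the example above; March and April are dropped even though they are present in the historical data.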

but the data_read API is still being finalized

The data_reading code in the runtime repository is supplemental sample code that is provided to help participants get started. It is not an API for the challenge in the sense that participants are required to use it in their solutions. You should not wait for or depend on data_reading code for a data source to be added if you want to use that data source in your modeling.

The data_download code in the runtime repository does represent an API for the challenge, and you should make sure that you use it to download data in the same format. Data download code for approved data sources is mostly complete, and upcoming changes will primarily be for requested data sources that get approved.

Thank you for being so thorough in addressing my questions; is there any additional information available about how ranks will be calculated? I.e. the actual formula that will be used to determine who receives a payment.

You have my apologies for being impatient, but without knowing the specific criteria that I’ll be judged by, it’s impossible to decide whether continuing is worth the investment of time. Has the formula not been decided on yet…?

Hi @mmiron,

The evaluation criteria and the model report requirements will be released next week. Thanks for your patience.

Hi @mmiron,

The Hindcast Stage Evaluation Arena launched today. You can see the website here. In particular, see the evaluation criteria and the model report requirements.