Retro-scoring, or a big single-week impact?

@Vervan @FBykov what is the intuition behind overfitting to Ground Measurement stations, if you had to guess (or if you found something in the data)? They actually appear to be more geographically dispersed than the other observations in the train labels.

  • Are they strategically located in high SWE areas, which might lead a model to overpredict SWE when generalizing to non-Ground Measurement stations?
  • Is it something to do with them being point measurements as opposed to measurements averaged over an entire 1 km^2 grid cell?
  • Is it some subtle measurement difference attributable to the way ASO captures data compared to the SNOTEL technology?

I think the precision of ASO data strongly depends on the type of the Earth's surface: ASO errors are small for smooth surfaces, such as croplands and pastures, but ASO does not work well for kurums (rock fields), shrubs, and forests (especially evergreens). The airplane cannot see the snow in the crevices between rocks or among the roots, so the ASO data for non-smooth surfaces have a negative bias.

Hello,
First of all congratulations to the winners!

Second, I would like to ask the host of the challenge for insight into the big leaderboard change on the 7th of March. I have already read that the ground truth is going to be published, but I would like to understand where I went wrong so that I can improve.

Thanks
ironbar

I believed we deserved an explanation after devoting half a year to the challenge. Three weeks without an answer proved me wrong. :frowning:

We take your concerns about the accuracy of evaluation seriously, especially since this competition was a big commitment for all who participated.

I have looked into the scoring, in particular the large changes in scores that occurred on 3/7. I’m happy to say that scoring worked exactly as intended. The reason for the large shift in scores on 3/7 is essentially what Emily already mentioned:

I’ll add: scores from 2/28 were determined using a total of ~400 ground truth measurements; scores from 3/7 were determined using over 4,000 ground truth measurements. At the end of the competition, we scored using over 42,000 measurements. Looking back now that the competition is closed, we know that 3/7 added more new sites than any other single week in the competition.

In addition, prior to 3/7 all of the ground truth was from ground-based measurements. The updates for that week consisted mostly of flight-based measurements. As FBykov suggests, those datasets could have systematic differences. The goal of the competition was to make a model that performs well regardless of the measurement source.
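For intuition, here is a small synthetic sketch (all numbers below are made up, not taken from the competition data) of how expanding the evaluation set with measurements from a differently-biased source can shift RMSE noticeably even though the model itself has not changed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model with ~0.5 RMSE against ground-based sites (assumed numbers).
n_ground, n_flight = 400, 3600
truth_ground = rng.uniform(0, 20, n_ground)              # SWE values, made up
pred_ground = truth_ground + rng.normal(0, 0.5, n_ground)

# Flight-based sites added later, assumed here to carry a systematic offset
# relative to the ground-based labels the model was tuned against.
truth_flight = rng.uniform(0, 20, n_flight)
pred_flight = truth_flight + rng.normal(0, 0.5, n_flight) + 2.0  # +2 bias (assumed)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

print("RMSE, ground-based sites only:", rmse(truth_ground, pred_ground))
print("RMSE, after adding flight-based sites:",
      rmse(np.concatenate([truth_ground, truth_flight]),
           np.concatenate([pred_ground, pred_flight])))
```

With these assumed numbers the score jumps from roughly 0.5 to roughly 2, purely because the new measurements dominate the evaluation set and differ systematically from the old ones.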

The change on 3/7 was not a result of missing HRRR data. As Emily mentioned, the data update from 3/7 primarily added ground truth observations to the 2/17 column.

As an additional check, we manually calculated RMSE for two submissions for 2/28 and 3/7: one submission that saw a large decrease in score on 3/7 and one that saw a large increase. We were able to perfectly replicate the scores observed on the leaderboards for the respective weeks.
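For anyone who wants to repeat that kind of check against their own submission, here is a minimal sketch. The file names, the `cell_id` index column, and the date-named prediction columns are assumptions for illustration, not the exact files released for the competition:

```python
import numpy as np
import pandas as pd

def weekly_rmse(submission_csv: str, labels_csv: str, week: str) -> float:
    """RMSE for one prediction date, scored only on cells that have ground truth."""
    sub = pd.read_csv(submission_csv, index_col="cell_id")
    labels = pd.read_csv(labels_csv, index_col="cell_id")
    # Align predictions and labels on cell_id; keep only cells with a label.
    merged = sub[[week]].join(labels[[week]], lsuffix="_pred", rsuffix="_true").dropna()
    return float(np.sqrt(np.mean((merged[f"{week}_pred"] - merged[f"{week}_true"]) ** 2)))

# Example usage (hypothetical file and column names):
# print(weekly_rmse("submission.csv", "labels_snapshot_2022_03_07.csv", "2022-02-17"))
```

Scoring each weekly snapshot this way, with only the ground truth available at that time, reproduces the kind of week-to-week shifts discussed above.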

We’re still verifying the winners, validating solutions, and making final decisions. We will post more information about data releases as soon as possible after the validation process is complete.


Hi all – I’m excited to share that the winning solutions, write-ups, and model reports are all available on GitHub. Check out the winners repo!

You can also read about the winning solutions in the “Meet the Winners” blog post.

Finally, we have released the real-time evaluation dataset used for final scoring. Please keep in mind that this represents the ground truth data that was available at the end of the challenge.

All of these links are available on the competition results page.

Thanks again to everyone who participated in this challenge and made it a success!