Questions about the competition

Hello, I have some questions about the competition that I was hoping you could clarify:

  • The Development Stage section states that SWE values are available for 44K grid cells for model training. However, “train_labels.csv” contains 10878 cells. The same goes for “submission_format.csv” (36K grid cells are mentioned vs. 9066 in the CSV). Why are the figures so different?
  • It is not fully clear to me how the RMSE is calculated. Do you take predicted and real SWE of all dates and grids and calculate the RMSE? My main confusion comes from this sentence: “y tilde” is the estimated Dst values for t0 and t+1. What is “Dst”, and what do t0 and t+1 represent?
  • You mention that “You may only use data up through the day of estimation”. Does this mean that for predicting SWE of week 2013-01-08, we can only use data prior to 2013-01-08 (we would be predicting the next week’s SWE)? Or can we use data up to 2013-01-14, which is the end of the week?
  • I guess the answer is no, but can we use ground truth SWE values of a grid cell prior to the week of estimation as features?
  • Similar to the previous question, can we use model SWE predictions of a grid cell prior to the week of estimation as features?

I think the second and third questions arise from the fact that I am not sure whether you need to predict future grid cell’s SWE given past values of ground measures, or if you have to predict grid cell’s SWE of a week given ground measurements of that same week.

Thank you in advance!

Hi @Galeros93 - Thanks for the questions.

  • Many cells do not have ground truth values for every week. See this thread for context.
  • This was misquoted. The definition has been updated on that page, which should clear it up.
  • The estimates can use data up through the day of estimation. So if you are estimating SWE on 2013-01-08, you can use data up through 2013-01-08.
  • Right, the answer is no. Ground truth SWE values can be used as features only if they are in the ground_measures_features list, and those do not cover the cells to be predicted.
  • Yes, you can use your own generated estimations for grid cells in prior weeks, as long as you are adhering to all other rules in how these are generated. As mentioned in the challenge, SWE is a cumulative process.

Hope that helps!
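
For readers who are still unsure about the RMSE question, here is a minimal sketch of one plausible reading: a single RMSE pooled over every (cell, date) pair that has a ground truth value. This pooling is an assumption, not something confirmed in this thread, and the function name `pooled_rmse` and the toy data are hypothetical. The competition page remains the authoritative definition.

```python
import numpy as np
import pandas as pd


def pooled_rmse(y_true: pd.DataFrame, y_pred: pd.DataFrame) -> float:
    """RMSE over every (cell, date) pair that has a ground truth value.

    Assumption: both frames are indexed by cell_id with one column per date.
    """
    # Align predictions to the label layout (same cells, same date columns).
    y_pred = y_pred.reindex(index=y_true.index, columns=y_true.columns)
    # Ignore pairs with no ground truth, since they cannot be scored.
    mask = y_true.notna().to_numpy()
    errors = (y_pred.to_numpy() - y_true.to_numpy())[mask]
    return float(np.sqrt(np.mean(errors ** 2)))


# Toy data (made up): cell_b has no label on 2013-01-01.
labels = pd.DataFrame(
    {"2013-01-01": [1.0, np.nan], "2013-01-08": [2.0, 4.0]},
    index=pd.Index(["cell_a", "cell_b"], name="cell_id"),
)
preds = pd.DataFrame(
    {"2013-01-01": [1.0, 9.0], "2013-01-08": [4.0, 4.0]},
    index=labels.index,
)
score = pooled_rmse(labels, preds)
print(score)
```

In this toy case only three pairs are scored, because the prediction for the unlabeled pair is masked out.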

Thank you @glipstein. Please allow me to dig into a couple of points:

  • Many cells do not have ground truth values for every week. See this thread for context.

Counting the non-null SWE values in “train_labels.csv”, I still cannot get any number close to 44K samples. Concretely, I get 91490 values. Why do you think this may be happening? I leave my code below for reproducibility:

import pandas as pd

# One row per cell, one column per date; count the non-null SWE labels.
train_labels = pd.read_csv("train_labels.csv")
print(train_labels.set_index("cell_id").notna().sum().sum())
  • The estimates can use data up through the day of estimation. So if you are estimating SWE on 2013-01-08, you can use data up through 2013-01-08.

Sorry, let me ask this again using weeks, as the challenge indicates, so we remove all ambiguity. So for example, if I have to predict SWE for week 2013-01-15 on a grid cell, I can only use ground measurements (and/or satellite images, weather data, etc) of week 2013-01-08 and previous weeks (2013-01-01, etc), correct?

Thanks again for your answers!

Hi @Galeros93, I understand the opposite. If you are making predictions for 2013-01-08, you can also use ground measurements from 2013-01-08. If my understanding is correct, then this is not a forecasting problem but a problem of creating high-resolution data from low-resolution inputs (although it could be a mixture of both, depending on when the ground measurements are taken).

Hi all -

However, “train_labels.csv” contains 10878 cells.

Ah I see. That number has been updated on the page (you are correct, this is 11K).
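
The two figures discussed above measure different things: the number of cells is the number of rows, while the number of labels is the count of non-null (cell, date) values. A minimal sketch of the distinction, using a made-up three-cell table in place of the real “train_labels.csv” (the cell names and values are hypothetical; only the wide layout with a cell_id column is taken from the code earlier in the thread):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_labels.csv:
# one row per cell, one column per weekly date, NaN where no label exists.
train_labels = pd.DataFrame(
    {
        "cell_id": ["cell_a", "cell_b", "cell_c"],
        "2013-01-01": [1.2, np.nan, 0.4],
        "2013-01-08": [1.5, 2.1, np.nan],
    }
).set_index("cell_id")

# Number of distinct grid cells (the "11K"-style figure).
n_cells = len(train_labels)

# Number of labeled (cell, date) pairs (the "91490"-style figure).
n_values = int(train_labels.notna().sum().sum())

# How many weeks each cell actually has a label for.
values_per_cell = train_labels.notna().sum(axis=1)

print(n_cells, n_values)
print(values_per_cell)
```

Because many cells lack a value for some weeks, the label count is smaller than cells times dates but still larger than the cell count.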

I can only use ground measurements (and/or satellite images, weather data, etc) of week 2013-01-08 and previous weeks (2013-01-01, etc)

The latest response is right. The SWE values are daily measures that are made once a week, so a prediction on 2013-01-08 is for SWE on that day. Does that clear this up?

I’m not completely sure: by “the latest response”, are you referring to my response? So my response is correct?

@ironbar Yes, that is referring to your response. Just to clarify, the ground measures that are allowed are from different sites than the cells you’ll be submitting predictions for. You are probably already aware of this from the website.

Good luck!

Thanks, @glipstein, and @ironbar. I understand now. For predicting any SWE on, let’s say, day 2013-01-08, I can use ground measurements (and weather data, satellite images, etc) of that very same day (and previous days) to make the prediction.
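
The rule summarized above (use data up through the day of estimation, inclusive) can be sketched as a simple column filter. This is only an illustration under stated assumptions: the wide layout with one column per date, the station names, the values, and the helper `available_on` are all hypothetical.

```python
import pandas as pd

# Hypothetical ground measurements: one row per station, one column per date.
ground_measures = pd.DataFrame(
    {
        "2013-01-01": [3.0, 5.0],
        "2013-01-08": [3.5, 5.5],
        "2013-01-15": [4.0, 6.0],
    },
    index=pd.Index(["station_1", "station_2"], name="station_id"),
)


def available_on(measures: pd.DataFrame, estimation_date: str) -> pd.DataFrame:
    """Keep only the columns dated on or before the estimation date."""
    dates = pd.to_datetime(measures.columns)
    return measures.loc[:, dates <= pd.Timestamp(estimation_date)]


# For an estimate on 2013-01-08, the 2013-01-08 column is allowed,
# but the 2013-01-15 column is not.
usable = available_on(ground_measures, "2013-01-08")
print(list(usable.columns))
```

Applying the same cutoff to every other time-indexed input (weather, satellite imagery) would keep the feature set consistent with the rule.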