Hello, I have some questions about the competition that I was hoping you could clarify:
- The Development Stage section states that SWE values are available for 44K grid cells for model training. However, “train_labels.csv” contains 10878 cells. The same goes for “submission_format.csv” (36K grid cells are mentioned vs. 9066 in the CSV). Why are the figures so different?
- It is not fully clear to me how the RMSE is calculated. Do you take the predicted and actual SWE for all dates and grid cells and calculate a single RMSE? My main confusion comes from this sentence: “y tilde” is the estimated Dst values for t0 and t+1. What is “Dst”, and what do t0 and t+1 represent?
- You mention that “You may only use data up through the day of estimation”. Does this mean that when predicting SWE for the week of 2013-01-08, we can only use data prior to 2013-01-08 (so we would be predicting the next week’s SWE)? Or can we use data up to 2013-01-14, which is the end of that week?
- I guess the answer is no, but can we use a grid cell’s ground truth SWE values from before the week of estimation as features?
- Similar to the previous question, can we use a model’s SWE predictions for a grid cell from before the week of estimation as features?
I think the second and third questions arise from the fact that I am not sure whether we need to predict a grid cell’s future SWE given past values of ground measures, or whether we have to predict a grid cell’s SWE for a week given ground measurements from that same week.
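To make the RMSE question concrete, here is a minimal sketch of the reading I am currently assuming: one RMSE pooled over every (grid cell, date) pair, skipping entries with no label. This is only my interpretation, not the official metric; the function name `pooled_rmse`, the array layout, and the numbers are all made up for illustration.

```python
import numpy as np


def pooled_rmse(y_true, y_pred):
    """RMSE over all (grid cell, date) pairs, ignoring missing labels.

    Assumption: missing ground truth is stored as NaN and is skipped.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)  # keep only entries that have a label
    return float(np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2)))


# Made-up values: rows are grid cells, columns are weekly dates.
actual = np.array([[1.0, 2.0], [np.nan, 4.0]])
predicted = np.array([[1.5, 2.0], [3.0, 3.0]])

score = pooled_rmse(actual, predicted)
print(score)
```

If the metric is instead computed per date or per grid cell and then averaged, the result would generally differ, so it would help to know which one is used.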
Thank you in advance!