It seems that 2,880 of the 18,130 cells have strange coordinates (blue square in the top-left part of the image below). These cells are all in the training data, in the region labelled “other”, and contain around 10% of the SWE labels. Is this normal?
I mapped this and got the following image, which doesn’t include the square you have at the top left.
Moreover, I wasn’t able to find data in the training set outside of the states listed in the metadata.
Maybe download the dataset again and double check if it’s still there?
I displayed the 700 sites provided in ground_measures_*_feature.csv as you did (triangles), but also the 18,130 cells from grid_cells.geojson (small dots). The problem appears with the latter file.
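For anyone who wants to reproduce this check without plotting, here is a minimal sketch that flags cells falling outside a bounding box. It assumes each feature in grid_cells.geojson is a simple Polygon with a `cell_id` property; both the property name and the rough western-US bounding box are my assumptions, and the `example` data below is made up for illustration.

```python
import json

# Rough bounding box for the western US (assumed values, adjust as needed):
# (min_lon, min_lat, max_lon, max_lat)
WESTERN_US_BBOX = (-125.0, 31.0, -102.0, 49.0)


def cells_outside_bbox(feature_collection, bbox):
    """Return the ids of cells that have at least one vertex outside bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    outside = []
    for feature in feature_collection["features"]:
        # Assumes Polygon geometry: coordinates[0] is the outer ring,
        # and each point is [lon, lat] as in standard GeoJSON.
        ring = feature["geometry"]["coordinates"][0]
        inside = all(
            min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
            for lon, lat in (point[:2] for point in ring)
        )
        if not inside:
            # "cell_id" is an assumed property name.
            outside.append(feature["properties"]["cell_id"])
    return outside


def square(lon, lat, size=0.01):
    """Build a closed square ring starting at (lon, lat)."""
    return [[
        [lon, lat], [lon + size, lat], [lon + size, lat + size],
        [lon, lat + size], [lon, lat],
    ]]


# Made-up example: cell "a" is in California, cell "b" is far to the north-west.
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"cell_id": "a"},
         "geometry": {"type": "Polygon", "coordinates": square(-120.0, 38.0)}},
        {"type": "Feature", "properties": {"cell_id": "b"},
         "geometry": {"type": "Polygon", "coordinates": square(-140.0, 55.0)}},
    ],
}

flagged = cells_outside_bbox(example, WESTERN_US_BBOX)
print(flagged)

# With the real file, something like this should work:
# with open("grid_cells.geojson") as f:
#     flagged = cells_outside_bbox(json.load(f), WESTERN_US_BBOX)
```

If the count of flagged cells differs between two downloads of the file, that would point to a stale or corrupted copy rather than a real feature of the dataset.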
In the submission_format.csv file there are 9,066 cells. Looking at the locations of these cells (red dots below, as opposed to the blue dots and the 700 sites), we can see that a large concentration of them is in California and Colorado. Is it normal that the challenge metric favors these two states? This question is linked to the Problem clarification thread and to what the test data will look like.
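To put a number on that concentration, here is a rough sketch that computes the share of submission cells per region. It assumes submission_format.csv has a `cell_id` column and that a cell-to-region mapping can be built separately (for example from grid_cells.geojson); the column name, the mapping, and the region names in the example are all assumptions for illustration.

```python
import csv
import io
from collections import Counter


def region_share(submission_csv, cell_regions):
    """Return the fraction of submission rows that fall in each region.

    submission_csv: an open text file (or any iterable of CSV lines)
                    with a "cell_id" column (assumed name).
    cell_regions:   dict mapping cell_id -> region label (hypothetical helper data).
    """
    reader = csv.DictReader(submission_csv)
    counts = Counter(
        cell_regions.get(row["cell_id"], "unknown") for row in reader
    )
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


# Made-up example data: four cells, two regions plus "other".
example_csv = io.StringIO("cell_id,2022-01-13\nc1,\nc2,\nc3,\nc4,\n")
example_regions = {"c1": "region_a", "c2": "region_a", "c3": "region_b", "c4": "other"}

shares = region_share(example_csv, example_regions)
print(shares)

# With the real file, something like this should work:
# with open("submission_format.csv", newline="") as f:
#     shares = region_share(f, cell_regions)
```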
@simon.jegou Thanks for your note and helpful visual. The geographic variation you are seeing is expected. You may choose to use or filter out specific dates and geographies as you see fit when training your model. Keep in mind that while you should produce estimates for every grid cell in submission_format.csv, you will only be evaluated on a subset of these values.