Grid IDs that are in submission_format but not in train_labels

bryanyahir03 · March 2, 2022, 7:58pm

I was doing some simple analysis on both files and noticed that there are 68 unique grid_id values in submission_format but only 66 unique grid_id values in train_labels. The two grid IDs missing from train_labels are 7F1D1 and WZNCR.

Are we supposed to train a model without those grid IDs?
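
For anyone who wants to reproduce the comparison, here is a minimal sketch of the check described above. It assumes both files are CSVs with a column named `grid_id` (the column name is an assumption), and it uses tiny made-up in-memory tables in place of the real submission_format and train_labels files. The helper names are hypothetical.

```python
import csv
import io


def unique_grid_ids(csv_file, column="grid_id"):
    """Return the set of distinct values found in `column` of an open CSV file."""
    return {row[column] for row in csv.DictReader(csv_file)}


def missing_from_train(submission_file, train_file, column="grid_id"):
    """Return the sorted IDs that appear in the submission format but not in the labels."""
    submission_ids = unique_grid_ids(submission_file, column)
    train_ids = unique_grid_ids(train_file, column)
    return sorted(submission_ids - train_ids)


# Made-up stand-ins for the real files; the IDs and values are illustrative only.
submission_csv = io.StringIO(
    "datetime,grid_id,value\n"
    "2021-01-01,AAAAA,0\n"
    "2021-01-01,BBBBB,0\n"
    "2021-01-01,CCCCC,0\n"
)
train_csv = io.StringIO(
    "datetime,grid_id,value\n"
    "2020-01-01,AAAAA,1.5\n"
    "2020-01-02,AAAAA,2.0\n"
    "2020-01-01,CCCCC,3.1\n"
)

missing = missing_from_train(submission_csv, train_csv)
print(missing)
```

With the real data, the same two functions can be called on files opened with `open(path, newline="")`.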

cszc · March 2, 2022, 9:13pm

Hi @bryanyahir03 - thanks for the observation! Yes, for the NO2 track there are two grid ids in the submission_format that are not present in train_labels. Models should be able to generalize to new unseen grid cells.
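
One way to check this locally, offered only as a rough sketch and not as an official validation scheme, is to hold out whole grid cells during validation so that the model is scored on grid IDs it never saw in training. The `grid_id` key, the `split_by_grid` helper, and the sample rows below are all assumptions for illustration.

```python
import random


def split_by_grid(rows, holdout_fraction=0.2, seed=0):
    """Split rows so that no grid_id appears in both the training and validation sets."""
    grid_ids = sorted({row["grid_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(grid_ids)
    # Hold out at least one whole grid cell.
    n_holdout = max(1, round(len(grid_ids) * holdout_fraction))
    holdout = set(grid_ids[:n_holdout])
    train = [row for row in rows if row["grid_id"] not in holdout]
    valid = [row for row in rows if row["grid_id"] in holdout]
    return train, valid


# Made-up rows; real rows would come from train_labels.
rows = [
    {"grid_id": grid_id, "value": value}
    for grid_id, value in [
        ("AAAAA", 1.0), ("AAAAA", 1.2), ("BBBBB", 2.0), ("CCCCC", 3.0),
        ("CCCCC", 3.3), ("DDDDD", 4.0), ("EEEEE", 5.0),
    ]
]

train_rows, valid_rows = split_by_grid(rows)
train_ids = {row["grid_id"] for row in train_rows}
valid_ids = {row["grid_id"] for row in valid_rows}
print(sorted(valid_ids))
```

Scoring on the held-out cells gives a rough idea of how a model might behave on grid IDs that appear only in submission_format.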

bryanyahir03 · March 3, 2022, 4:59am

Thanks for your reply,

I have another question about the following statement:

winning solutions must be able to produce predictions for the same grid cells on a single new day.

Will the satellite data for the new date be available in the S3 bucket, or do we need to develop a model that downloads the data from the original source?

cszc · March 3, 2022, 5:32am

The satellite data already hosted on S3 will remain hosted on S3 for new dates. Your final submission should include any code needed to retrieve ancillary data or other satellite data needed for a new date.
