Can we use location, grid_id as features

adityakumarsinha · March 8, 2022, 4:36pm

Hi all,
To be sure: can we use location as a feature for training and inference, or build a separate model for each location? Is that acceptable?
Similarly, can we use grid_ids, or a generic encoding of grid_id derived from the data, as features?
A quick response will help me focus on the right path.
Entirely different grid_ids in the submission would have ruled this out altogether, but most of the grid IDs are present; in fact, all of them are in the case of the PM track.

Regards
Aditya

cszc · March 8, 2022, 5:39pm

To be sure: can we use location as a feature for training and inference, or build a separate model for each location? Is that acceptable?

Yes, you can create separate models for different locations or use location as a feature.

Similarly, can we use grid_ids, or a generic encoding of grid_id derived from the data, as features?

Yes, but be aware that there are grid IDs present in the NO2 test set that are not present in the NO2 train set. Any solution should be able to adapt to missing features, including grid IDs.
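
As a rough illustration of adapting to grid IDs that appear only at test time, one option is to build the encoding from the training grid IDs alone and reserve a dedicated value for anything unseen. This is only a sketch, not an official recommendation: the names `fit_grid_vocabulary`, `encode_grid_id`, and `UNKNOWN_INDEX`, and the sample IDs, are hypothetical, and no labels are involved.

```python
# Hypothetical sketch: label-free integer encoding of grid IDs with a
# reserved index for IDs that were never seen during training.
UNKNOWN_INDEX = 0  # assumption: 0 is reserved for unseen grid IDs


def fit_grid_vocabulary(train_grid_ids):
    """Map each distinct training grid ID to a positive integer index."""
    vocab = {}
    for grid_id in train_grid_ids:
        if grid_id not in vocab:
            vocab[grid_id] = len(vocab) + 1  # indices start at 1
    return vocab


def encode_grid_id(grid_id, vocab):
    """Return the learned index, or UNKNOWN_INDEX for an unseen grid ID."""
    return vocab.get(grid_id, UNKNOWN_INDEX)


# Made-up grid IDs, for illustration only.
vocab = fit_grid_vocabulary(["A1", "B2", "A1"])
seen_code = encode_grid_id("A1", vocab)    # ID present in training
unseen_code = encode_grid_id("ZZ", vocab)  # ID present only at test time
```

A model trained with such an encoding never sees the reserved index during training unless some training rows are deliberately mapped to it, so it may still be worth relying on other features (for example, location) for unseen grid cells.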

adityakumarsinha · March 8, 2022, 5:53pm

But I think NO2 uses altogether different data.
One more question: the rules mention that training data should not be used in inference. Suppose I create certain features that are computed on historical data from the last few days (not labels). Since the 2021 test data comes right after the 2020 training data, it is possible that January 2021 features would use data from December, which is in the training period.
I am confused about whether such time-series features, which can overlap train and test, can be used.

cszc · March 8, 2022, 5:56pm

I believe that question has already been answered here:

Please let me know if you need additional clarification.

adityakumarsinha · March 8, 2022, 6:07pm

I think the rule is clear for labels.
It also suggests that I can use satellite data from the training period in inference, such as the average of the green band over the last three days.
I hope I understood correctly.
The data is actually a mix of time-series and standalone data, and at the very least I need to adapt my thinking to the exact competition rules.
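
To make the kind of feature discussed here concrete, below is a minimal sketch of a trailing three-day average computed from satellite values only, never from labels. It assumes a hypothetical layout in which one grid cell's values are stored in a dict keyed by date, and it assumes "last three days" means the three days strictly before the prediction day. The name `trailing_mean` and the sample numbers are made up, and whether such a feature is permitted still depends on the competition rules.

```python
from datetime import date, timedelta


def trailing_mean(values_by_day, target_day, window_days=3):
    """Mean of a feature over the window_days days before target_day.

    Only days strictly before target_day are used, and days with no
    observation are skipped. Returns None if the window is empty.
    """
    window = []
    for offset in range(1, window_days + 1):
        day = target_day - timedelta(days=offset)
        if day in values_by_day:
            window.append(values_by_day[day])
    if not window:
        return None
    return sum(window) / len(window)


# Made-up green-band values for one grid cell, spanning the year boundary.
green_band = {
    date(2020, 12, 30): 0.25,
    date(2020, 12, 31): 0.5,
    date(2021, 1, 1): 0.75,
}

# Feature for 2 January 2021 draws on two December 2020 observations.
feature = trailing_mean(green_band, date(2021, 1, 2))
```

In this example the January feature is built partly from December satellite observations, which is exactly the train/test overlap raised above; no label values enter the calculation.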
