Can we use location and grid_id as features?

Hi All
To be sure: can we use location as a feature for training and inference, or create a separate model for each location? Is that acceptable?
Similarly, can we use grid_ids, or a generic encoding of grid_id derived from the data, as features?
A quick response would help me focus on the right path.
Had the submission format omitted grid_ids altogether, this question would not arise, but most of the grid IDs are present; in fact, all of them are in the case of the PM track.

Regards
Aditya

To be sure: can we use location as a feature for training and inference, or create a separate model for each location? Is that acceptable?

Yes, you can create separate models for different locations or use location as a feature.
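The two options can be sketched as follows. This is a minimal illustration, not the organizers' recommended setup: the row layout (`location`, `features`, `target` keys), the placeholder location names, and the toy per-location "model" (just the mean target) are all assumptions made to keep the example self-contained.

```python
from collections import defaultdict
from statistics import mean


def one_hot_location(rows, locations):
    """Option 1: one shared model, with location appended as one-hot features.

    `locations` is the fixed, ordered list of locations seen in training.
    """
    encoded = []
    for row in rows:
        indicator = [1.0 if row["location"] == loc else 0.0 for loc in locations]
        encoded.append(row["features"] + indicator)
    return encoded


def fit_per_location(rows):
    """Option 2: one (toy) model per location.

    The "model" is just the mean target of that location's rows; it stands in
    for whatever regressor you would actually train on each subset.
    """
    targets_by_location = defaultdict(list)
    for row in rows:
        targets_by_location[row["location"]].append(row["target"])
    return {loc: mean(ys) for loc, ys in targets_by_location.items()}


def predict_per_location(models, row):
    # Route the row to the model fitted on its own location.
    return models[row["location"]]


# Hypothetical toy rows; "loc_a" and "loc_b" are placeholder location names.
train = [
    {"location": "loc_a", "features": [0.3], "target": 80.0},
    {"location": "loc_a", "features": [0.5], "target": 100.0},
    {"location": "loc_b", "features": [0.2], "target": 20.0},
]
locations = sorted({row["location"] for row in train})
print(one_hot_location(train, locations))
# [[0.3, 1.0, 0.0], [0.5, 1.0, 0.0], [0.2, 0.0, 1.0]]
models = fit_per_location(train)
print(predict_per_location(models, {"location": "loc_b", "features": [0.4]}))
# 20.0
```

One common trade-off: separate models only see their own location's rows, while a shared model with a location feature can pool information across locations. Note that `predict_per_location` raises a `KeyError` for a location with no trained model.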

Similarly, can we use grid_ids, or a generic encoding of grid_id derived from the data, as features?

Yes, but be aware that there are grid IDs in the NO2 test set that are not present in the NO2 training set. Any solution should be able to handle missing features, including grid IDs.
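Since some NO2 test grid IDs never appear in training, any grid_id encoding needs a fallback. One common approach, sketched below under assumed names (the helper functions, the `UNKNOWN_INDEX` convention, and the placeholder grid IDs are all hypothetical), is to index the training grid IDs and map anything unseen to a reserved index.

```python
import random

UNKNOWN_INDEX = 0  # reserved for grid IDs never seen in training


def build_grid_vocab(train_grid_ids):
    """Map each grid_id seen in training to an integer index starting at 1."""
    return {gid: i for i, gid in enumerate(sorted(set(train_grid_ids)), start=1)}


def encode_grid_ids(grid_ids, vocab):
    # Unseen IDs (e.g. grids that only appear in a test set) map to
    # UNKNOWN_INDEX instead of raising a KeyError.
    return [vocab.get(gid, UNKNOWN_INDEX) for gid in grid_ids]


def mask_some_ids(indices, rate, seed=0):
    """Randomly replace a fraction of training indices with UNKNOWN_INDEX.

    Optional: this way the model also sees the "unknown grid" case while
    training, so it learns to fall back on the remaining features.
    """
    rng = random.Random(seed)
    return [UNKNOWN_INDEX if rng.random() < rate else idx for idx in indices]


# Placeholder grid IDs; real IDs come from the competition data.
vocab = build_grid_vocab(["grid_b", "grid_a", "grid_b"])
print(encode_grid_ids(["grid_a", "grid_c"], vocab))  # [1, 0]
train_indices = mask_some_ids(encode_grid_ids(["grid_a", "grid_b"], vocab), rate=0.1)
```

The masking step is optional; the idea is that if the model never sees the reserved index during training, its behavior on unseen grids is essentially untested.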


But I think the NO2 track uses altogether different data.
One more question: the rules mention that training data should not be used during inference. Suppose I create features computed from historical data over the last few days (not labels). Since the 2021 test data immediately follows the 2020 training data, January 2021 features could use data from December 2020, which falls in the training period.
I'm unsure whether such time-series features, whose windows overlap train and test, can be used.
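To see exactly which test rows are affected, one can check whether each trailing window starts on or before the last training date. The sketch below assumes, hypothetically, that training ends on 2020-12-31 and test begins on 2021-01-01; with a lookback of N days, only the first N - 1 test days reach into the training period. It says nothing about whether such features are allowed; that is a question for the rules.

```python
from datetime import date, timedelta


def window_start(day, lookback_days):
    """First date of a trailing window of `lookback_days` days ending on `day` (inclusive)."""
    return day - timedelta(days=lookback_days - 1)


def overlaps_training(day, lookback_days, train_end):
    """True if the trailing window for `day` reaches back to or before `train_end`."""
    return window_start(day, lookback_days) <= train_end


# Hypothetical boundary: assume training ends on 2020-12-31 and test starts on 2021-01-01.
train_end = date(2020, 12, 31)
for offset in range(4):
    day = date(2021, 1, 1) + timedelta(days=offset)
    print(day, overlaps_training(day, lookback_days=3, train_end=train_end))
# 2021-01-01 True
# 2021-01-02 True
# 2021-01-03 False
# 2021-01-04 False
```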

I believe that question has already been answered here:

Please let me know if you need additional clarification.


I think the rule is clear for labels.
It also suggests that I can use satellite data from the training period during inference, such as the average of the green band over the last three days.
I hope I understood correctly.
Actually, the data is a mix of time-series and standalone data, and in any case I need to adapt my thinking to the exact competition rules.
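A minimal sketch of such a feature, assuming satellite observations arrive as hypothetical `(grid_id, date, value)` tuples for one band (the grid ID and values below are made up): the trailing mean uses only satellite values, never labels, and is computed over train and test dates pooled together, so early January features can draw on late December observations. Whether that is permitted should be confirmed against the competition rules.

```python
from collections import defaultdict
from datetime import date, timedelta


def trailing_band_mean(observations, days=3):
    """Mean of a satellite band over the trailing `days` calendar days, per grid.

    `observations` holds (grid_id, date, value) tuples of satellite values only
    (no labels), pooled across the train and test periods so that early test
    dates can look back into the training period. Days with no observation are
    simply skipped. Returns {(grid_id, date): mean}.
    """
    by_grid = defaultdict(list)
    for grid_id, day, value in observations:
        by_grid[grid_id].append((day, value))

    features = {}
    for grid_id, series in by_grid.items():
        for day, _ in series:
            start = day - timedelta(days=days - 1)
            window = [v for d, v in series if start <= d <= day]
            features[(grid_id, day)] = sum(window) / len(window)
    return features


observations = [
    ("grid_a", date(2020, 12, 30), 1.0),
    ("grid_a", date(2020, 12, 31), 2.0),
    ("grid_a", date(2021, 1, 1), 3.0),
    ("grid_a", date(2021, 1, 3), 6.0),  # no observation on 2021-01-02
]
feats = trailing_band_mean(observations)
print(feats[("grid_a", date(2021, 1, 1))])  # 2.0
print(feats[("grid_a", date(2021, 1, 3))])  # 4.5
```

The inner filter is quadratic in the number of observations per grid, which is fine for a sketch; for large data, a sorted series with a sliding window (or a time-based `rolling` window in pandas) would be the usual choice.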