To be sure: can we use location as a feature for training and inference, or build a separate model per location? Is that acceptable?
Similarly, can we use grid_ids, or a generic encoding of grid_id derived from the data, as features?
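To make the grid_id question concrete, here is a minimal sketch of the kind of encoding I have in mind, assuming pandas. The frames, the grid_id values and the `grid_code` column name are made up for illustration; whether using such a feature is permitted is exactly what I am asking.

```python
import pandas as pd

# Hypothetical train/test frames; the grid_id values are invented for illustration.
train = pd.DataFrame({"grid_id": ["A1", "B2", "A1", "C3"]})
test = pd.DataFrame({"grid_id": ["B2", "Z9"]})

# Learn the integer codes from the training grid_ids only.
categories = pd.Index(train["grid_id"].unique())

# get_indexer returns the position of each grid_id in `categories`,
# and -1 for any grid_id that was never seen during training.
train["grid_code"] = categories.get_indexer(train["grid_id"])
test["grid_code"] = categories.get_indexer(test["grid_id"])

print(train)
print(test)
```

A grid_id that appears only in the submission set gets the code -1 here, so the model would have to handle that value as "unknown location".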
A quick response would help me focus on the right path.
Altogether different grid_ids in the submission would have removed this question entirely, but most of the grid_ids are present; in fact, all of them are in the case of the PM track.
But I think the NO2 track is on altogether different data.
One more question: the rules mention that training data should not be used in inference. Suppose I create certain features that are computed on historical data from the last few days (not labels). Since the 2021 test data comes right after the 2020 training data, it is possible that January 2021 features would use data from December, which falls in the training period.
I am confused about whether such time-series features, which can overlap train and test, can be used.
I think the rule is clear for labels.
It also suggests that I can use satellite data from the training period at inference, like the average of the green band over the last three days.
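As a rough sketch of the feature I mean, assuming pandas and one satellite reading per grid cell per day: the column names (`green_band`, `green_mean_3d`) and all values below are hypothetical, and no label column is involved.

```python
import pandas as pd

# Hypothetical daily satellite readings for one grid cell, spanning the
# train/test boundary (December 2020 into January 2021). Values are made up.
df = pd.DataFrame({
    "grid_id": ["A"] * 5,
    "date": pd.to_datetime([
        "2020-12-29", "2020-12-30", "2020-12-31", "2021-01-01", "2021-01-02",
    ]),
    "green_band": [10.0, 20.0, 30.0, 40.0, 50.0],
})

df = df.sort_values(["grid_id", "date"]).reset_index(drop=True)

# Mean of the green band over the current day and the two preceding days,
# computed separately for each grid cell (assumes one row per day).
df["green_mean_3d"] = (
    df.groupby("grid_id")["green_band"]
      .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)

# Rows that belong to the test period.
test_rows = df[df["date"] >= pd.Timestamp("2021-01-01")]
print(test_rows)
```

In this toy data the value for 2021-01-01 is built from the readings of 2020-12-30 and 2020-12-31 as well, which is the train/test overlap I am asking about.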
I hope I understood correctly.
Actually, the data is a mix of time-series and standalone data, and at the very least I need to adapt my thinking to the exact competition rules.