The rules state that interpolation of data is not permitted. They illustrate this with an example: if we are asked for a March 2016 forecast, we cannot use the year 2016 as a predictor.
Does this mean that we have to use only the given variables and cannot engineer new features?
Or have I misunderstood something?
Thank you.
Interpolation in this context means using the future to predict the past, which is why it is not a valid tool for this problem.
Creating new variables is a separate concern from interpolation. Feature engineering can, and probably should, be used as long as the input data for those new features does not include the future.
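To make the distinction concrete, here is a minimal sketch of an engineered feature that only looks backward. It is my own illustration, not taken from the competition materials: the function name `add_lag_feature`, the data layout, and the sample values are all hypothetical.

```python
from datetime import date, timedelta


def add_lag_feature(series, lag_days):
    """Return {day: value observed lag_days earlier, or None if unavailable}.

    `series` maps a date to the consumption observed on that date.
    Every feature value comes strictly from the past, never from the
    day being predicted or from any later day.
    """
    lagged = {}
    for day in series:
        lagged[day] = series.get(day - timedelta(days=lag_days))
    return lagged


# Hypothetical daily consumption values, for illustration only.
consumption = {
    date(2015, 1, 29): 100.0,
    date(2015, 1, 30): 110.0,
    date(2015, 1, 31): 105.0,
}
lag_1 = add_lag_feature(consumption, lag_days=1)
```

A centered rolling average, by contrast, would mix in values from after the day being predicted, which is the kind of interpolation the rules forbid.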
Oh, okay. Thanks for the clarification.
‘Note: Weather data is available for test periods under the assumption that reasonably accurate forecasts will be available to algorithms at the time that we are attempting to make predictions about the future.’
Does this mean we are allowed to use the given weather data in the prediction period to predict the energy consumption?
For example:
If we have to predict consumption from 01.02.2015 until 01.03.2015, are we allowed to use the weather data from 01.02.2015 to 01.03.2015 as well? Or are we only allowed to use weather data up to 31.01.2015?
You may use all of the weather data that is made available to you as part of the competition.
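As a rough sketch of what that could look like for the example above: each day in the forecast window gets its own weather reading, while consumption is taken only from before the window starts. The function `build_test_features`, the field names, and the numbers are hypothetical and only meant to illustrate the idea.

```python
from datetime import date, timedelta


def build_test_features(window_start, n_days, weather, consumption):
    """Build one feature row per day of the forecast window."""
    # Latest consumption reading strictly before the forecast window.
    past_days = [d for d in consumption if d < window_start]
    last_known = consumption[max(past_days)] if past_days else None

    rows = []
    for offset in range(n_days):
        day = window_start + timedelta(days=offset)
        rows.append({
            "day": day,
            # Weather inside the window is provided, so it may be used.
            "temperature": weather.get(day),
            # Consumption inside the window is the target, so only the
            # last value observed before the window is used.
            "last_known_consumption": last_known,
        })
    return rows


# Hypothetical values, for illustration only.
weather = {date(2015, 2, 1): 3.5, date(2015, 2, 2): 4.0}
consumption = {date(2015, 1, 30): 110.0, date(2015, 1, 31): 105.0}
rows = build_test_features(date(2015, 2, 1), 2, weather, consumption)
```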