Forecasting: Weather Data Has Little Overlap with Train/Submit SiteId-Timestamps

Please let me know if I have an outdated weather file (‘weather.csv’). Is there a corrected download somewhere?

The problem I observe is that 5/6 of the ‘weather.csv’ data falls outside the time range of the corresponding Train & Submit data.

An example:

Site 304 has 2 ForecastIds, which begin (train) and end (submission) at:

ForecastId 6972, 2015-03-04 00:00:00    through    ForecastId 6973, 2016-09-21 00:00:00

However, the weather data ('weather.csv') for Site 304 contains the Timestamp '2017-12-19 21:00:00', which is more than a year after that range ends.

The timestamps are clearly not aligned: 5/6 of the weather data does not overlap with the Train & Submit time range.
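
For anyone who wants to check this on their own copy, here is a rough sketch of how the out-of-range share for one site could be measured. The function name and the sample timestamps are made up for illustration (only the Site 304 range and the '2017-12-19 21:00:00' value come from the data described above), so treat it as a starting point rather than the exact check I ran.

```python
from datetime import datetime

def fraction_outside(weather_times, start, end):
    """Return the share of weather timestamps that fall outside [start, end]."""
    if not weather_times:
        return 0.0
    outside = sum(1 for t in weather_times if t < start or t > end)
    return outside / len(weather_times)

# Train/submit range for Site 304 as quoted in the post.
start = datetime(2015, 3, 4, 0, 0, 0)
end = datetime(2016, 9, 21, 0, 0, 0)

# Illustrative weather timestamps; only the last one is taken from the post.
sample = [
    datetime(2015, 6, 1, 12, 0, 0),
    datetime(2016, 1, 1, 0, 0, 0),
    datetime(2017, 12, 19, 21, 0, 0),
]

print(fraction_outside(sample, start, end))
```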


You’ll need to resample the weather timestamps and sometimes nudge them by a few minutes (e.g., 15) to get them to match the train/submit data. There’s also often more than one weather reading per site per timestamp, so you’ll need to pick the best/closest reading first (otherwise joins get really exciting, in a bad way). I got about 80% coverage after doing this and then moved on to the next steps.
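
Roughly, the two steps look like the sketch below. It is plain Python so it runs anywhere and is not the exact code I used. The tuple layout, the `distance` field used to pick the "closest" reading, and the function names are all assumptions for illustration, so adjust them to whatever columns your weather file actually has.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def dedupe_closest(readings):
    """Keep one reading per (site_id, timestamp): the one with the smallest distance.

    `readings` holds (site_id, timestamp, distance, temperature) tuples.
    The `distance` field is an assumed tie-breaker, not a confirmed column.
    """
    best = {}
    for site_id, ts, distance, temp in readings:
        key = (site_id, ts)
        if key not in best or distance < best[key][0]:
            best[key] = (distance, temp)
    return best

def align(targets, readings, tolerance=timedelta(minutes=15)):
    """Attach the nearest weather value within `tolerance` to each (site_id, timestamp) target."""
    # Build sorted per-site lists of times and values from the deduplicated readings.
    by_site = {}
    for (site_id, ts), (_, temp) in sorted(dedupe_closest(readings).items()):
        times, temps = by_site.setdefault(site_id, ([], []))
        times.append(ts)
        temps.append(temp)

    aligned = []
    for site_id, ts in targets:
        times, temps = by_site.get(site_id, ([], []))
        i = bisect_left(times, ts)
        # Only the readings just before and just after the target can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        match = None
        if candidates:
            j = min(candidates, key=lambda k: abs(times[k] - ts))
            if abs(times[j] - ts) <= tolerance:
                match = temps[j]
        aligned.append((site_id, ts, match))
    return aligned

def coverage(aligned):
    """Share of targets that received a weather value."""
    if not aligned:
        return 0.0
    return sum(1 for _, _, value in aligned if value is not None) / len(aligned)
```

If you are working in pandas, `pandas.merge_asof` with `by`, `tolerance` and `direction="nearest"` covers the matching step, as long as both frames are sorted on the timestamp column and the duplicates have been dropped first.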