
40 % of test temperatures missing


The problem description tab states:
“In the test set, varying amounts of historical consumption and temperature data are given for each series, ranging from 1 day to 2 weeks. The temperature data may contain a small portion of wrong / missing values.”
However, I found that roughly 40 % of the test temperatures are missing, which is high enough to contradict that statement.
Can anyone confirm this statistic?

Thank you in advance.


In cold_start_test.csv, 296 unique series_id values have no temperature information at all.

In other words, about 47 % (296 out of 625) of the unique series_id values have missing temperature values. That sounds like a huge number of missing temperatures to me. I guess the description was wrong.
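
For anyone who wants to check both figures themselves, here is a rough sketch. Note that the two numbers in this thread measure different things: the share of rows with a missing temperature, and the number of series that have no temperature reading at all. The column name `temperature` is an assumption on my part (only `series_id` and the file name are mentioned above), the helper `temperature_missing_stats` is just an illustrative name, and the sample rows are made up, not competition data.

```python
import csv
import io
from collections import defaultdict


def temperature_missing_stats(rows, id_col="series_id", temp_col="temperature"):
    """Return (share of rows with missing temperature,
    number of series with no temperature at all,
    total number of series).

    Column names are assumptions; adjust them to the real file header.
    """
    total_rows = 0
    missing_rows = 0
    # series_id -> True once at least one temperature reading was seen
    has_temperature = defaultdict(bool)
    for row in rows:
        total_rows += 1
        value = (row.get(temp_col) or "").strip()
        # Treat empty cells and a literal "nan" as missing
        is_missing = value == "" or value.lower() == "nan"
        missing_rows += is_missing
        has_temperature[row[id_col]] |= not is_missing
    series_without = sum(1 for seen in has_temperature.values() if not seen)
    row_share = missing_rows / total_rows if total_rows else 0.0
    return row_share, series_without, len(has_temperature)


# Tiny synthetic sample, only to show the two statistics side by side
sample = io.StringIO(
    "series_id,timestamp,temperature\n"
    "1,2017-01-01 00:00:00,5.0\n"
    "1,2017-01-01 01:00:00,\n"
    "2,2017-01-01 00:00:00,\n"
    "2,2017-01-01 01:00:00,\n"
)
row_share, series_without, n_series = temperature_missing_stats(csv.DictReader(sample))
print(f"{row_share:.0%} of rows missing; "
      f"{series_without}/{n_series} series have no temperature")
```

To run it on the real data, open cold_start_test.csv with `open(..., newline="")` and pass `csv.DictReader(f)` to the function instead of the synthetic sample.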

Yes, this is not a “small” amount. We’ve updated the documentation, thanks.