Hi, can someone explain what “true mean consumption over the prediction window under consideration” means in the metric description? Thanks!
Take for example series_id = 100454. Each measurement of consumption in cold_start_test.csv is made hourly, and you have to predict the consumption for a week. As you can see in submission_format.csv, the prediction_window says daily, which means that the number you insert there is the consumption on that day:
7021 100454 2016-03-22 00:00:00 14.805556 0 daily
Of course, you don’t know that number. Let’s say that the model predicts a consumption of 124568 watt-hours, but the real number is 541268 watt-hours. The number 541268 is the true mean consumption over the prediction window under consideration.
As you may know by now, the measurements (in consumption_train.csv) are almost always (I’d say always) made hourly. So in this case, on 2016-03-22, the model may predict a consumption for every hour (24 in total), but you have to sum those 24 values and insert the total there:
7021 100454 2016-03-22 00:00:00 14.805556 124568 daily
Yes, I said you have to sum rather than take the mean, because “we include a prediction_window column to help indicate what level of temporal aggregation we want you to predict.” As I read it, aggregation means sum.
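To make the summing step concrete, here is a minimal sketch, assuming the reading above (a daily prediction_window means the daily total) is right. The function name and the flat 100.0 per-hour values are made up for illustration; they are not from the competition data.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sum_hourly_to_daily(hourly_predictions):
    """Sum (timestamp, value) hourly predictions into one total per calendar day."""
    daily_totals = defaultdict(float)
    for timestamp, value in hourly_predictions:
        # Group by calendar date, dropping the hour of day.
        daily_totals[timestamp.date()] += value
    return dict(daily_totals)


# Hypothetical model output: 24 hourly predictions for 2016-03-22,
# each set to 100.0 purely as a placeholder value.
start = datetime(2016, 3, 22)
hourly = [(start + timedelta(hours=h), 100.0) for h in range(24)]

# One number per day; this is what would go in the daily row of the submission.
daily = sum_hourly_to_daily(hourly)
print(daily)
```

For a weekly prediction_window the same idea would apply, just grouping by week instead of by date.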
I hope I am not mistaken about this.
Thank you so much! @rosgori, very clear.