I’m not sure I understood how the metric works.
Hourly true value: 50, prediction: 40
nmae = abs(40-50)*((24/24)/50)
which results in 0.2.
This seems reasonable.
If I multiply these values by 24 and treat them as daily values:
Daily true value: 1200, prediction: 960
nmae = abs(960-1200)*((24/7)/1200)
which results in 0.6857143.
Is this correct?
The daily error seems off to me.
This is how I understand the metric to work.
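A minimal Python sketch reproducing the two single-prediction calculations above. The helper name `nmae_term` is hypothetical; it only evaluates abs(pred - true) * c / mu for one prediction, with c = 24/24 for hourly and 24/7 for daily, exactly as in the formulas in this post.

```python
def nmae_term(true, pred, c, mu):
    """Weighted, normalized absolute error for a single prediction (hypothetical helper)."""
    return abs(pred - true) * c / mu

# Hourly: error of 10 on a true value of 50, weight 24/24.
hourly = nmae_term(true=50, pred=40, c=24 / 24, mu=50)
# Daily: the same values multiplied by 24, weight 24/7.
daily = nmae_term(true=1200, pred=960, c=24 / 7, mu=1200)

print(round(hourly, 7))  # 0.2
print(round(daily, 7))   # 0.6857143
```

The relative error is 20% in both cases; the only difference between the two terms is the factor 24/7.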
It becomes even odder when you look at weekly model errors. My errors are on the order of 0.9 to 2.5 for many of the series in my test sets. Thus, we are essentially making weekly errors 12 times bigger than they should be!
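A quick sketch converting the weekly numbers back to relative errors. It assumes the weekly weight is 24/2 = 12 (two weekly predictions per series, following the same 24/count pattern as 24/24 and 24/7); under that assumption a per-prediction weekly term is just 12 times the relative error. The names `WEEKLY_WEIGHT` and `relative_error_from_term` are hypothetical.

```python
WEEKLY_WEIGHT = 24 / 2  # assumed: 2 weekly predictions per series, weight = 24 / count

def relative_error_from_term(term, weight):
    """Invert term = weight * |pred - true| / mu to recover |pred - true| / mu."""
    return term / weight

for term in (0.9, 2.5):
    rel = relative_error_from_term(term, WEEKLY_WEIGHT)
    print(f"weekly term {term} -> relative error {rel:.1%}")
```

Under this assumption, weekly terms of 0.9 and 2.5 correspond to relative errors of roughly 7.5% and 20.8%.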
To me, it would have made more sense to weight these so that the weekly errors are 12x more influential, not 12x bigger. This could have been done by taking an equal number of samples of hourly, daily and weekly predictions, i.e. sampling them 1x, 24/7x and 12x respectively.
It would be interesting to know why this method was not used in favour of the current NMAE.
I think this somehow works differently.
It’s quite weird that the NMAE on the leaderboard (LB) is around 0.4 when those weekly errors are penalized so harshly.
I’m pretty sure I got something wrong in the way I understood it.
I think the idea is to make each series equally important. In your example, taking into account that there are 24 predictions for hourly and 7 for daily,
mae1 = 24*10*(24/24)/50 = 4.8 and mae2 = 7*240*(24/7)/1200 = 4.8.
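A short sketch of the per-series view described in this reply: the weighted terms of each series are summed over all of its predictions, so a series with fewer predictions gets a proportionally larger weight. The function name `series_contribution` is hypothetical, and treating the per-series sum as the unit of "importance" is an assumption based on this reply.

```python
def series_contribution(abs_errors, c, mu):
    """Sum of weighted, normalized absolute errors over one series' predictions."""
    return sum(err * c / mu for err in abs_errors)

# Hourly series: 24 predictions, each off by 10, mean true value 50, weight 24/24.
hourly_total = series_contribution([10] * 24, c=24 / 24, mu=50)
# Daily series: 7 predictions, each off by 240, mean true value 1200, weight 24/7.
daily_total = series_contribution([240] * 7, c=24 / 7, mu=1200)

print(round(hourly_total, 6), round(daily_total, 6))  # 4.8 4.8
```

Because the weight is 24 divided by the number of predictions, every series carries a total weight of 24, so two series with the same relative error contribute the same amount.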
I’m not sure that is correct.
1 - the value you got is 4.8, whereas the “correct” metric in this case should be 0.2 (|40-50|/50 is a 20% error).
2 - in your calculation you simply multiplied each side by 24 or 7; removing these factors gets you to the same result.
3 - simply calculating abs(true-pred)/true, which is a plain relative error rather than a regular MAE, gives this dataset a value of 0.2, which seems to me the logical result.
Could you please give some insight on this?
What is the expected NMAE on this set of data?
As far as I understand, you don’t use the true hourly value. Instead, you use the average of all the hours in the prediction window (the 24 hours).
Probably the best way to clarify the doubts is releasing the code implementation of the metric.
@bull, please clarify the metric; it is essential.
The calculation is made directly on the same format as the submission format for the competition. This means that hourly is compared to hourly, daily to daily, and weekly to weekly. The aggregated values for consumption in daily and weekly predictions are sums, not averages.
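A minimal sketch of the stated convention that daily and weekly values are sums, not averages, of the finer-grained consumption. It assumes a plain list of hourly readings whose length is a multiple of the block size; the helper `aggregate_sums` is hypothetical.

```python
def aggregate_sums(values, block):
    """Sum consecutive, non-overlapping blocks of `block` values."""
    if len(values) % block != 0:
        raise ValueError("length must be a multiple of the block size")
    return [sum(values[i:i + block]) for i in range(0, len(values), block)]

hourly = [50.0] * (24 * 14)          # two weeks of constant hourly consumption
daily = aggregate_sums(hourly, 24)   # 14 daily sums of 1200.0
weekly = aggregate_sums(daily, 7)    # 2 weekly sums of 8400.0

print(daily[0], weekly)  # 1200.0 [8400.0, 8400.0]
```

With sums, a constant hourly value of 50 corresponds to a daily value of 1200, which matches the daily figure used in the first post.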
The weights, as mentioned by @adilism, are used to make sure each series is equally important when we take the overall mean across series, even if fewer predictions are made for that particular series.
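Putting the pieces together, here is a minimal sketch of the metric as described in this thread. It is a reading of the thread, not the official implementation, and it assumes: (a) each series' normalizer is the mean of its true values over the prediction window, (b) the weight is 24 divided by the number of predictions in that series (24/24, 24/7, 24/2), and (c) the final score is the mean of the weighted terms over all predictions from all series. The function name `nmae` and the input layout are hypothetical.

```python
from statistics import mean

def nmae(series):
    """series: list of (true_values, predicted_values) pairs, one per series.

    Assumed metric: mean over all predictions of |y - y_hat| * c / mu,
    with c = 24 / len(true_values) and mu = mean(true_values).
    """
    terms = []
    for true_values, predicted_values in series:
        if len(true_values) != len(predicted_values):
            raise ValueError("true and predicted lengths differ")
        c = 24 / len(true_values)      # 1 for hourly, 24/7 for daily, 12 for weekly
        mu = mean(true_values)         # normalizer: mean true value of this series
        terms.extend(abs(y - y_hat) * c / mu
                     for y, y_hat in zip(true_values, predicted_values))
    return sum(terms) / len(terms)

example = [
    ([50] * 24, [40] * 24),        # hourly series from the first post
    ([1200] * 7, [960] * 7),       # daily series from the first post
]
print(round(nmae(example), 4))     # 0.3097
```

Under these assumptions both series contribute 4.8 to the sum, and dividing by all 31 predictions gives 9.6 / 31. If the official implementation averages differently, the number will differ; releasing the reference code would settle it.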
What happens if the true mean consumption is zero? Can we assume that this will not happen for any of the series_ids in the test sets?
There are potentially training examples with zero means over a whole day. It was just bad luck on my part that I selected this day to test on and hit a division by zero.
@bull, can you confirm this, please? Just to be super clear.
If actual mean is zero, it will be clipped to a very small number instead of dividing by zero. However, there are no examples where the value is exactly zero in the test set.
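A minimal sketch of the guard described in this answer: the normalizer is clipped to a small positive floor before dividing. The floor value `EPS` and the helper `safe_normalizer` are assumptions; the answer does not say which small number is used.

```python
EPS = 1e-8  # assumed floor; the actual clipping value is not stated

def safe_normalizer(true_values):
    """Mean of the true values, clipped away from zero to avoid division by zero."""
    mu = sum(true_values) / len(true_values)
    return max(mu, EPS)

print(safe_normalizer([0.0] * 24))   # 1e-08
print(safe_normalizer([50.0] * 24))  # 50.0
```

Since the test set has no exactly-zero means, this clipping would only matter for local validation on training data, like the zero-consumption day mentioned above.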