Hi,
I’m not sure I understand how the metric works.

E.g.:

Hourly True value: 50
Prediction: 40

nmae = abs(40-50)*((24/24)/50)
Which results in 0.2
This seems reasonable

If I multiply these values and treat them as Daily values:
Daily True Value: 1200
Prediction: 960
nmae = abs(960-1200)*((24/7)/1200)
Which results in 0.6857143
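
The two calculations above can be reproduced with a short Python sketch. The function name `weighted_abs_error` and its parameter names are made up for illustration; only the weights (24/24 and 24/7) and the values come from the example, and this is not claimed to be the competition's scoring code.

```python
def weighted_abs_error(actual, predicted, weight):
    """Absolute error, scaled by a frequency weight and divided by the actual value."""
    return abs(predicted - actual) * weight / actual


# Hourly example: true value 50, prediction 40, weight 24/24
hourly = weighted_abs_error(50, 40, 24 / 24)

# Same values summed to a daily total: true 1200, prediction 960, weight 24/7
daily = weighted_abs_error(1200, 960, 24 / 7)

print(round(hourly, 7), round(daily, 7))
```

The relative error is 20% in both cases; the only thing that differs per prediction is the weight.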

It becomes even more odd when you look at weekly model errors. My errors are in the order of 0.9 - 2.5 for many of the series in my test sets. Thus, we are essentially making weekly errors 12x worse than they should be!!

To me, it would have made more sense to weight these in a way such that the weekly errors are 12x more influential, not 12x bigger. This could have been done by taking an equal number of samples of hourly, daily and weekly, 1x, 24/7x and 12x respectively.

It would be interesting to know why this method was not used, in favour of the current NMAE.

I think this somehow works differently.
It’s quite weird that the NMAE on the LB is around 0.4 if those weekly errors really are penalized so harshly.
I’m pretty sure I got something wrong in the way I understood it.

I think the idea is to make each series equally important. In your example, taking into account that there are 24 predictions for hourly and 7 for daily, mae1 = 24 * 10 * (24/24) / 50 = 4.8 and mae2 = 7 * (24 * 10) * (24/7) / (24 * 50) = 4.8
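
A minimal sketch of that per-series total, in case it helps. The helper `series_total` is hypothetical. The hourly and daily rows use the numbers from this thread; the weekly row is an assumption (2 predictions per series with weight 24/2 = 12, which matches the 12x figure mentioned earlier but is not spelled out here).

```python
def series_total(n_predictions, abs_error, weight, actual):
    """Sum of weighted, normalized absolute errors over one series,
    assuming every prediction in the series has the same error."""
    return n_predictions * abs_error * weight / actual


hourly_total = series_total(24, 10, 24 / 24, 50)          # 24 hourly predictions
daily_total = series_total(7, 24 * 10, 24 / 7, 24 * 50)   # 7 daily predictions

# ASSUMPTION: 2 weekly predictions, weight 24/2, values are 7-day sums
weekly_total = series_total(2, 7 * 24 * 10, 24 / 2, 7 * 24 * 50)

print(hourly_total, daily_total, weekly_total)
```

Under these assumptions all three totals come out the same (about 4.8), which is the sense in which the weights would make each series count equally.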

1 - the value you got is 4.8, whereas the “correct” metric in this case should be 0.2 (an error of 10 on a true value of 50 is a 20% error)
2 - in your calculation you simply multiplied both sides by 24 or 7; simply removing these factors gets you to the same result.
3 - simply calculating abs(true - pred) / true, which is just the absolute error normalized by the true value, gives this dataset 0.2, which seems to me the logical result.

@bull
Could you please give some insight on this?
What is the expected NMAE on this set of data?

The calculation is made directly on the same format as the submission format for the competition. This means that hourly is compared to hourly, daily to daily, and weekly to weekly. The aggregated values for consumption in daily and weekly predictions are sums, not averages.

The weights, as mentioned by @adilism, are used to make sure each series is equally important when we take the overall mean across series, even if there are fewer predictions made for that particular series.

If the actual mean is zero, it will be clipped to a very small number instead of dividing by zero. However, there are no examples where the value is exactly zero in the test set.
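
For illustration, the clipping could look roughly like the sketch below. The function name `normalized_error` and the threshold `eps=1e-9` are assumptions; the actual "very small number" used by the scoring code is not stated in this thread.

```python
def normalized_error(abs_error, weight, actual_mean, eps=1e-9):
    """Weighted absolute error divided by the series mean.

    ASSUMPTION: a mean below `eps` is replaced by `eps` so that we never
    divide by zero. The real threshold is not given in the thread.
    """
    denominator = max(actual_mean, eps)
    return abs_error * weight / denominator


# Normal case: the mean is well above the threshold, so nothing is clipped.
regular = normalized_error(10, 24 / 24, 50)

# Degenerate case: a zero mean no longer raises ZeroDivisionError.
clipped = normalized_error(10, 24 / 24, 0.0)

print(regular, clipped)
```

With a zero mean the result is very large rather than undefined, so clipping only avoids the crash; it does not make such a series score well.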