Just Making Sure Data Is Correct

Hi!
Don’t want to be annoying or anything, just making sure the data is correct.
I spotted some places where a specific hour has ~10X the consumption of all other hours for the same day and series.
E.g.:
RowID 108926
All hours of 18/8/2016 for series 103634, except 2PM, have a mean consumption of 45k.
2PM has a VERY high consumption of 440K,
which is ~10X more than all other hours.
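
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how such hours could be flagged. It is not the exact script I ran: the `(series_id, timestamp, consumption)` row layout, the function name `flag_hourly_spikes`, and the `ratio` threshold are all assumptions for illustration, and the sample rows are synthetic values that only mimic the case above (a flat 45k with one 440k reading), not the real competition data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def flag_hourly_spikes(rows, ratio=5.0):
    """Flag readings far above the other hours of the same series and day.

    rows: iterable of (series_id, timestamp, consumption) tuples
          (hypothetical layout, adapt to the real columns).
    ratio: how many times the median of the other hours a reading must
           exceed to be flagged (an arbitrary illustrative threshold).
    """
    # Group readings by (series, calendar day).
    groups = defaultdict(list)
    for series_id, timestamp, consumption in rows:
        groups[(series_id, timestamp.date())].append((timestamp, consumption))

    flagged = []
    for (series_id, _day), readings in groups.items():
        for i, (timestamp, consumption) in enumerate(readings):
            # Baseline is the median of every other hour in the group,
            # so the candidate spike does not inflate its own baseline.
            others = [c for j, (_, c) in enumerate(readings) if j != i]
            if not others:
                continue
            baseline = median(others)
            if baseline > 0 and consumption > ratio * baseline:
                flagged.append((series_id, timestamp, consumption, baseline))
    return flagged


# Synthetic day: 45k every hour except 2PM, which is 440k.
rows = [
    (103634, datetime(2016, 8, 18, hour), 440_000.0 if hour == 14 else 45_000.0)
    for hour in range(24)
]
spikes = flag_hourly_spikes(rows)
for series_id, timestamp, consumption, baseline in spikes:
    print(series_id, timestamp, f"{consumption / baseline:.1f}x the other hours")
```

I used the median of the other hours rather than the mean so that a second spike on the same day would not hide the first one, but that is a design choice, not something the data description prescribes.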

Sometimes competitions have some kind of data issue that is found too late, and a lot of competitors’ hours are wasted.
Could you please verify and let us know it’s OK and not some kind of glitch in the matrix?

Thanks!

Hi @AmirH, you might want to check your preprocessing. The values you cite for that series look as expected:

Hi Bull,
Thanks for your response.

The picture actually does show the issue I was talking about: the first of the 4 parts of your plot shows the very low values with one single peak. That’s the data I meant.

It just looked very odd that one specific hour of the day would have 10X the consumption of all other hours.

These spikes are reflections of the data as collected. As with all real data, it is possible that some outliers are anomalies and some are real data. Effective methods will have to take this into consideration. :+1:
