Just to make sure the data is correct

AmirH · September 5, 2018, 4:14pm

Hi!
Don’t want to be annoying or anything, just making sure the data is correct.
I spotted some places where a single hour has 10X the consumption of every other hour of the same day in the same series.
For example:
RowID 108926
In series 103634 on 18/8/2016, all hours except 2PM have a mean consumption of 45K.
2PM has a VERY high consumption of 440K, which is ~10X more than all the other hours.
This behaviour does not happen in this series on any other day.
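Below is a minimal pandas sketch for finding other hours like this one. It assumes the `consumption_train.csv` columns used elsewhere in this thread (`series_id`, `timestamp`, `consumption`). `flag_hour_spikes` and its `ratio` parameter are hypothetical names. Each hour is compared with the median of the same series on the same day. With 24 readings a single spike barely moves the median, so that median is a reasonable stand-in for "all other hours". Since 440K / 45K ≈ 9.8, a strict 10x cut-off would just miss the case above, so the default threshold of 5 is an arbitrary, looser choice.

```python
import pandas as pd


def flag_hour_spikes(df: pd.DataFrame, ratio: float = 5.0) -> pd.DataFrame:
    """Return rows whose hourly consumption is at least `ratio` times the
    median hourly consumption of the same series on the same day.

    Assumes columns `series_id`, `timestamp` (parsed as datetime) and
    `consumption`; the default threshold is an arbitrary illustration.
    """
    out = df.copy()
    # Midnight of each reading's day, used as the "same day" key
    out["date"] = out["timestamp"].dt.normalize()
    day_median = out.groupby(["series_id", "date"])["consumption"].transform("median")
    # Days with a zero median give NaN here, so they are never flagged
    out["ratio_to_day_median"] = out["consumption"] / day_median.where(day_median > 0)
    return out[out["ratio_to_day_median"] >= ratio]
```

For example, `flag_hour_spikes(train[train.series_id == 103634])`, where `train` is `consumption_train.csv` read with `parse_dates=['timestamp']`, would list the candidate spike hours for that series.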

Sometimes competitions have a data issue that is found too late, and a lot of competitors' hours are lost.
Could you please just verify and let us know it’s OK and not some kind of glitch in the matrix?

Thanks!

rosgori · September 6, 2018, 3:24am

So everybody can see it, let's plot that:

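import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the training data, parsing timestamps so they plot on a time axis
train = pd.read_csv('consumption_train.csv', parse_dates=['timestamp'])

# Line plot of the full history of the series in question
series = train[train.series_id == 103634]
sns.relplot(x="timestamp", y="consumption", data=series, kind='line', aspect=2.5)
plt.legend(['id = 103_634'])

plt.show()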

You're referring to series_id = 103634 in the train file. Do we see anything weird?

iandzindo · September 28, 2018, 6:35pm

I thought the same thing, but a scatter plot of consumption against hour of day should clear this up.

It's obvious from the scatter plot that there is a daily cycle. Relatively little power is consumed from 7 PM to 7 AM, but consumption jumps to huge values (over 1 million Wh) during working hours, from 8 AM to 6 PM.
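The same daily pattern can also be checked numerically by grouping one series on hour of day. This is a sketch, assuming the `consumption_train.csv` columns from the plotting code above; `hourly_profile` is a hypothetical helper, not part of any library.

```python
import pandas as pd


def hourly_profile(df: pd.DataFrame, series_id: int) -> pd.DataFrame:
    """Median and maximum consumption of one series for each hour of day (0-23).

    Assumes columns `series_id`, `timestamp` (parsed as datetime) and
    `consumption`, as in consumption_train.csv.
    """
    one = df[df["series_id"] == series_id]
    hour_of_day = one["timestamp"].dt.hour
    return (
        one.groupby(hour_of_day)["consumption"]
        .agg(["median", "max"])
        .rename_axis("hour")
    )
```

Calling `hourly_profile(train, 103634)` should show the working-hour rows standing well above the night-time rows if the pattern described above holds. Passing `train.timestamp.dt.hour` and `train.consumption` to `plt.scatter` gives the scatter plot itself.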

So I would say this sudden jump in consumption is not a glitch, since values as high as 440K Wh seem to be fairly common in this distribution.
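To put a number on "fairly common", one option is the share of working-hour readings at or above 440K. This is a minimal sketch, assuming the same columns as above and the 8 AM to 6 PM window mentioned in this post (hours 8 to 18 inclusive); `share_at_or_above` is a hypothetical helper.

```python
import pandas as pd


def share_at_or_above(df: pd.DataFrame, series_id: int, threshold: float,
                      start_hour: int = 8, end_hour: int = 18) -> float:
    """Fraction of one series' readings whose hour of day lies in
    [start_hour, end_hour] (inclusive) and whose consumption is >= threshold.

    Returns NaN if the series has no readings in that window.
    """
    one = df[df["series_id"] == series_id]
    hour_of_day = one["timestamp"].dt.hour
    window = one[(hour_of_day >= start_hour) & (hour_of_day <= end_hour)]
    if window.empty:
        return float("nan")
    # Mean of a boolean Series is the fraction of True values
    return float((window["consumption"] >= threshold).mean())
```

For example, `share_at_or_above(train, 103634, 440_000)` returns a value between 0 and 1. A share well above zero would support treating the 440K reading as ordinary working-hour consumption rather than a glitch.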

Related topics

Topic	Category	Replies	Views	Activity
Just Making Sure Data Is Correct		3	638	September 5, 2018
4th place solution	Cold Start Energy Forecasting	4	1285	January 10, 2019
Preprocessing Data to smooth outliers	Cold Start Energy Forecasting	0	658	October 19, 2018
Question about metric NMAE	Cold Start Energy Forecasting	2	901	September 4, 2018
Does anybody have some baseline models in Python?	Cold Start Energy Forecasting	0	696	October 3, 2018