Preprocessing Data to smooth outliers

Hi,
I did some preprocessing on the data to smooth out several kinds of outliers.
Attached are two CSV files, one for test and one for train, each with two new columns: New_Consumption and New_HourlyRate.
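
If you want a quick look at how much the smoothing changed before training anything, something like the sketch below might help. It counts the rows where New_Consumption differs from the original value and reports the largest change. The original column name `consumption` is an assumption on my part (adjust it to whatever the file actually uses), `summarize_changes` is just an illustrative helper, and the three sample rows are made up so the snippet runs on its own.

```python
import csv
import io


def summarize_changes(rows, original="consumption", smoothed="New_Consumption"):
    """Count rows where the smoothed value differs from the original one."""
    total = 0
    changed = 0
    largest = 0.0
    for row in rows:
        try:
            old = float(row[original])
            new = float(row[smoothed])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        total += 1
        diff = abs(new - old)
        if diff > 0:
            changed += 1
            largest = max(largest, diff)
    return {"rows": total, "changed": changed, "largest_change": largest}


# Made-up sample standing in for one of the attached files.
# The "consumption" header is an assumed name for the original column.
sample = io.StringIO(
    "consumption,New_Consumption\n"
    "10.0,10.0\n"
    "500.0,12.0\n"
    "11.0,11.0\n"
)
summary = summarize_changes(csv.DictReader(sample))
print(summary)
```

For the real data, replace `sample` with a file opened via `open(path, newline="")` on the downloaded CSV.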

I’m curious whether using these two columns instead of the original consumption gives you better results. I built a non-ML solution, so I’d like to see whether a NN responds to the smoothed data the same way.
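
One way to make that comparison fair is to keep the forecasting method and the scoring fixed and change only the input column. The sketch below uses a naive persistence forecast (next value = current value) as a stand-in for a real model; it is not the method I used, and it is not a NN. The two series are made up, and scoring against the original values is an assumption you may want to change.

```python
def persistence_mae(inputs, targets):
    """Mean absolute error when the forecast for step t+1 is inputs[t]."""
    errors = [abs(targets[t + 1] - inputs[t]) for t in range(len(targets) - 1)]
    return sum(errors) / len(errors)


# Made-up series: the original has one spike, the smoothed copy does not.
original = [10.0, 11.0, 500.0, 12.0, 13.0]
smoothed = [10.0, 11.0, 12.0, 12.0, 13.0]

# Same target and same metric for both runs; only the input column differs.
mae_original_input = persistence_mae(original, original)
mae_smoothed_input = persistence_mae(smoothed, original)

print("original input:", mae_original_input)
print("smoothed input:", mae_smoothed_input)
```

With a real model, the same pattern applies: train once on each input column, then score both runs against the same held-out target.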

The exact calculations are too complicated to list here, since I did them in Power BI rather than in Python code that I could easily share.
If the columns produce good results, I can take the time to document the steps as bullet points.

Test Data:
https://drive.google.com/file/d/1jnD0c81LHEYeG5etaxNEDgAypWxWCe_w/view?usp=sharing

Train Data:
https://drive.google.com/file/d/1mBzezT6FFf3P1DvsFSYrdB8YhJSFNtX6/view?usp=sharing