
4th place solution

Hello.

This post is a short description of my final approach in this competition. I tried a lot of different ideas, which I’m going to describe in detail in a paper/blog post on Medium, but 90% of the success is due to the following:

Preprocessing
Min-max scaling within each ‘series_id’. Delete all targets that have more than 4 constant values in a row (I guess those are missing values that were replaced by the median). Fill missing temperatures with the hourly mean / monthly mean, using only train data.
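
A minimal sketch of the first two preprocessing steps, in plain Python. The row layout (dicts with a "consumption" key), the function names and the `max_run` parameter are assumptions made for illustration, not the author's code. The constant-run mask is meant to be applied to one series at a time.

```python
from collections import defaultdict


def min_max_scale_by_series(rows):
    """Scale `consumption` to [0, 1] separately for each series_id."""
    bounds = defaultdict(lambda: [float("inf"), float("-inf")])
    for r in rows:
        b = bounds[r["series_id"]]
        b[0] = min(b[0], r["consumption"])
        b[1] = max(b[1], r["consumption"])
    scaled = []
    for r in rows:
        lo, hi = bounds[r["series_id"]]
        span = (hi - lo) or 1.0  # constant series: avoid division by zero
        scaled.append({**r, "consumption": (r["consumption"] - lo) / span})
    return scaled


def constant_run_mask(values, max_run=4):
    """Return a keep-mask that is False for every value belonging to a run
    of identical consecutive values longer than `max_run`."""
    keep = [True] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the list or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start > max_run:
                for j in range(start, i):
                    keep[j] = False
            start = i
    return keep
```

For example, in `[1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4]` the five consecutive 2s are masked out, while the four consecutive 3s are kept.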

Features
Categorical features from the timestamp: year, month, day, day of year, hour of year, day of week, hour of day. These categories were transformed with a sin/cos transformation and used as numerical features.
Additional features: is_day_off, is_next_day_off, type of building (sum of day_off columns as string), series_id as category.
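
The sin/cos transformation maps a periodic value onto the unit circle, so that, for instance, hour 23 ends up next to hour 0. A small sketch, assuming these particular periods and feature names (both are illustrative choices, and only a subset of the listed timestamp features is shown):

```python
import math
from datetime import datetime


def cyclical(value, period):
    """Encode a periodic value as a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def timestamp_features(ts):
    """Build sin/cos features for a few timestamp components."""
    parts = {
        # name: (zero-based value, assumed period)
        "month": (ts.month - 1, 12),
        "day_of_week": (ts.weekday(), 7),
        "hour_of_day": (ts.hour, 24),
        "day_of_year": (ts.timetuple().tm_yday - 1, 366),
    }
    feats = {}
    for name, (value, period) in parts.items():
        feats[name + "_sin"], feats[name + "_cos"] = cyclical(value, period)
    return feats


features = timestamp_features(datetime(2018, 1, 1, 6))
```

Both the sine and the cosine are needed: with the sine alone, two different hours (for example 3 and 9) would get the same value.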

Validation
10-fold StratifiedKFold, stratified on series_id.
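
One way to set up such a split with scikit-learn is to pass series_id as the stratification label, so every fold holds a proportional share of rows from each series. This is a sketch on toy data; the array contents, `shuffle=True` and `random_state=0` are assumptions, not settings from the post.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 3 hypothetical series with 20 rows each.
series_id = np.repeat(["s1", "s2", "s3"], 20)
X = np.arange(len(series_id)).reshape(-1, 1)  # placeholder feature matrix

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# series_id is passed as `y`, i.e. as the label to stratify on.
folds = list(skf.split(X, series_id))

for train_idx, valid_idx in folds:
    # Each validation fold contains rows from every series.
    valid_series = set(series_id[valid_idx])
```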

Models
Feed-forward neural network (FF NN): 5 layers, 512 neurons in each layer, ReLU activation. Adam optimizer, MAE loss (MAPE and RMSE gave worse results).
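
To make the shape of the network and the three losses concrete, here is a NumPy sketch of a forward pass only (no training, no Adam). It assumes that "5 layers" means five hidden layers followed by a single linear output, and it uses He initialisation; both are assumptions, as are all function names.

```python
import numpy as np


def init_mlp(n_inputs, n_hidden=512, n_layers=5, n_outputs=1, seed=0):
    """Create weights for n_layers hidden layers plus a linear output."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_hidden] * n_layers + [n_outputs]
    # He initialisation, a common choice for ReLU layers (assumption).
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU activation
    w, b = params[-1]
    return (h @ w + b).ravel()


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined if y_true has zeros."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))
```

In practice the same architecture would be built and trained in a deep learning framework; the sketch only shows how the layers stack and how the candidate losses differ.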

Fun stuff
Two weeks ago I found a bug in my code: I was predicting the consumption 24-336 hours backward. It ruined the daily and weekly forecasts, but the hourly ones were OK (I was in the top 10 with a score of 3307). After fixing this bug, I moved to 1st place with a gap to 2nd place, but it seems to me that I slightly overfitted the leaderboard, and after the shake-up I ended up in 4th place.

Congrats to the winners, and thank you all for the competition. Good luck in the next events!


Thanks Denis - I very much look forward to your blog. Please link it back here when you have published it!!

Thank you for sharing! Looking forward to the blog post on Medium.

Thanks @DenisVorotyntsev. We are eagerly waiting for your blog.

Post about my approach: https://towardsdatascience.com/cold-start-energy-predictions-d3971b1803e