What approaches did people use?

mlearn · August 1, 2016, 7:22am

This was a fun competition but unfortunately I didn’t have enough time to fully engage with it. What worked?

I’d have loved to have had time to try a recurrent neural network on this data but instead I only managed what I’d call a “benchmark” style solution. I used ranger in R (with default parameters) to build regression random forests for each target based on the provided features combined with those from T+2, T+1, T-1 and T-2. I rescaled the responses to sum to 1. I was surprised this did so well (rank 14).
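A minimal Python sketch of the neighbour-feature idea and the final rescaling. It assumes the per-second feature rows of one sequence are stacked in time order in a NumPy array. The function names are hypothetical, and the ranger forests are not reproduced (scikit-learn's `RandomForestRegressor`, fitted once per target column, would be a rough Python stand-in). Edge rows get NaN, which may need imputing depending on the model. Apply it per sequence so neighbours never cross a sequence boundary.

```python
import numpy as np


def add_neighbour_features(X, offsets=(-2, -1, 1, 2)):
    """Append the features of row t+k (for each non-zero k in offsets) to row t.

    X holds one sequence's per-second feature rows in time order.
    Slots that fall outside the sequence are filled with NaN.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    blocks = [X]
    for k in offsets:
        shifted = np.full((n, d), np.nan)
        if k > 0:
            shifted[:-k] = X[k:]   # row t takes row t+k (a later second)
        elif k < 0:
            shifted[-k:] = X[:k]   # row t takes row t+k (an earlier second)
        else:
            raise ValueError("offsets must be non-zero")
        blocks.append(shifted)
    return np.hstack(blocks)


def rescale_rows(P, eps=1e-12):
    """Rescale each row of per-target predictions so it sums to 1."""
    P = np.clip(np.asarray(P, dtype=float), 0.0, None)  # guard against negative outputs
    return P / np.maximum(P.sum(axis=1, keepdims=True), eps)


# Usage: 3 seconds with 2 features each -> 2 * (1 + 4) = 10 columns.
X = np.arange(6, dtype=float).reshape(3, 2)
Z = add_neighbour_features(X)
```

Column order is the original features, then T-2, T-1, T+1 and T+2.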

Look forward to hearing more about this all at ECML. How does the ECML session work?

sonam · August 1, 2016, 7:36am

Man, we should have teamed up.

I generated features ranging from std, mean, median, etc. to the root mean square of the acceleration. A similar set of features from T-1 helped nicely. Rolling max, mean, etc. on acceleration.

Min-max features (max - abs(min)).

RMS and acceleration features were nice.
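For anyone wanting to reproduce features like these, here is a NumPy sketch of the window statistics, including RMS and the `max - abs(min)` feature. The window length and function names are assumptions for illustration, since the post does not give the exact windows used. `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_stats(x):
    """Summary features for one window of a 1-D signal (e.g. one accelerometer axis)."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(),
        "std": x.std(),
        "median": np.median(x),
        "rms": np.sqrt(np.mean(x ** 2)),     # root mean square
        "minmax": x.max() - abs(x.min()),    # the "max - abs(min)" feature
    }


def rolling_features(x, window=3):
    """Rolling max and mean over consecutive windows of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    w = sliding_window_view(x, window)       # shape (len(x) - window + 1, window)
    return w.max(axis=1), w.mean(axis=1)
```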

I used features from an MLP/CNN layer, which helped, but I dropped them in the end because of overfitting…

Finally, a few tricks that helped:
Discard all probabilities below 0.05 and add their residual mass to the highest probability for that sample.

Take the top 5-6 submissions and average across them.

Both tricks together gave around 0.25 improvement.
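A sketch of the two post-processing tricks, assuming predictions are a NumPy array with one row per sample and one column per class. The 0.05 threshold comes from the post; the function names and the order of application (average first, then threshold) are assumptions.

```python
import numpy as np


def threshold_probs(P, threshold=0.05):
    """Zero out small probabilities and give their mass to each row's top class."""
    P = np.asarray(P, dtype=float).copy()
    rows = np.arange(P.shape[0])
    top = P.argmax(axis=1)
    small = P < threshold
    small[rows, top] = False                       # never discard the top class itself
    residual = np.where(small, P, 0.0).sum(axis=1)
    P[small] = 0.0
    P[rows, top] += residual                       # move the discarded mass to the top class
    return P


def average_submissions(subs):
    """Element-wise mean of several prediction matrices with identical shape."""
    return np.mean(np.stack(subs), axis=0)
```

Because the removed mass is added back to the top class, each row keeps its original sum, and averaging rows that sum to 1 also yields rows that sum to 1.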

All with ExtraTreesClassifier. The entropy criterion worked better than Gini.

Couldn’t get XGBoost working until the end; Extra Trees outperformed it. One-vs-all also helped.
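A scikit-learn sketch of the setup described: ExtraTreesClassifier with the entropy criterion, wrapped in a one-vs-all classifier. The post does not say how the multi-column targets were turned into classifier labels, so taking the argmax of each target row is an assumption here, as are the hyperparameters and the synthetic data.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.multiclass import OneVsRestClassifier


def make_model(n_estimators=200, random_state=0):
    """One-vs-all wrapper around Extra Trees with the entropy split criterion."""
    base = ExtraTreesClassifier(
        n_estimators=n_estimators, criterion="entropy", random_state=random_state
    )
    return OneVsRestClassifier(base)


# Synthetic stand-in data: 300 seconds, 5 features, 4 activity classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
Y_soft = rng.dirichlet(np.ones(4), size=300)  # rows of class probabilities
y = Y_soft.argmax(axis=1)                     # assumption: hard labels via argmax

model = make_model(n_estimators=50).fit(X, y)
proba = model.predict_proba(X[:5])            # one column per class, rows normalised
```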

Local cross-validation scores were off by 0.3.

Most of the work happened in the last week.
But I should have teamed up with the top rankers; I would have learned much more.

Score ~0.152

sonam · August 1, 2016, 7:41am

Did anyone use wavelet features, or other frequency-domain features?

aakansh9 · August 2, 2016, 7:58am

@sonam

Discard all probs less than 0.05 add their residuals to the highest prob for that sample.

How much did that improve the score?

My feature set was basic statistics (quantiles, min, max, mean) of the measurements, or of functions of the measurements, from all sensors in one-second windows.
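A sketch of per-second window statistics, assuming each sensor stream arrives as timestamps in seconds plus values, and that a window is the set of samples sharing the same whole second. The quantile levels and function name are illustrative assumptions.

```python
import numpy as np


def per_second_stats(t, x, quantiles=(0.25, 0.5, 0.75)):
    """Min, max, mean and quantiles of x for each whole second of timestamps t."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    sec = np.floor(t).astype(int)             # window index = whole second
    rows = {}
    for s in np.unique(sec):
        v = x[sec == s]
        rows[int(s)] = [v.min(), v.max(), v.mean(), *np.quantile(v, quantiles)]
    return rows
```

For "functions of measurements", the same function can be applied to a derived signal such as the acceleration magnitude.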

I did customise xgboost. My best single model gave 0.165. Then shifting those predictions by 1, 2, 3 and 4 seconds and including them as features reduced it to around 0.15. Stacking a couple of GBlinear (xgboost), GBtree (xgboost), ET and RF models drove it to 0.142. That’s when I tried thresholding the predictions and got to 0.140 (public score).
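A sketch of the prediction-shifting step, assuming first-level predictions are stored as a (seconds × classes) NumPy array in time order. Training-set predictions used as features should be out-of-fold, otherwise the second-level model sees leaked labels. The contiguous-block folds, the `fit_predict` callback, and shifting in both directions are assumptions, since the post does not specify them.

```python
import numpy as np


def out_of_fold(X, Y, fit_predict, n_folds=5):
    """Out-of-fold predictions using contiguous time blocks as folds.

    fit_predict(X_train, Y_train, X_pred) -> predictions for X_pred (any model).
    """
    X, Y = np.asarray(X), np.asarray(Y, dtype=float)
    n = len(X)
    folds = (np.arange(n) * n_folds) // n     # contiguous blocks limit leakage between neighbours
    oof = np.zeros((n, Y.shape[1]))
    for k in range(n_folds):
        test = folds == k
        oof[test] = fit_predict(X[~test], Y[~test], X[test])
    return oof


def shift_rows(P, k):
    """Row t gets row t+k of P (NaN where that row does not exist)."""
    out = np.full(P.shape, np.nan)
    if k > 0:
        out[:-k] = P[k:]
    elif k < 0:
        out[-k:] = P[:k]
    else:
        out[:] = P
    return out


def shifted_prediction_features(P, shifts=(1, 2, 3, 4)):
    """Predictions plus copies shifted forward and backward by each offset."""
    P = np.asarray(P, dtype=float)
    forward = [shift_rows(P, s) for s in shifts]
    backward = [shift_rows(P, -s) for s in shifts]
    return np.hstack([P] + forward + backward)
```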

Looking forward to hearing the winning solutions.

Also, are there two sets of prizes, ECML workshop prizes and DrivenData/AARP prizes? If so, will they be awarded to the same top 3 teams?

sonam · August 2, 2016, 8:21am

With a threshold of 0.05, a 0.0005 improvement in LB score.

aakansh9 · August 2, 2016, 8:42am

You mean 0.0025? (Post must be at least 20 characters. Lame.)

sonam · August 2, 2016, 8:43am

Ah yes… it was a typo.

simonkamronn · August 3, 2016, 7:11am

For the acceleration data I used a recurrent autoencoder (trained on both test and training data) to generate an embedding for every second. For the rest of the modalities I just took mean and std. I then stacked the vectors from each modality and used a bi-directional RNN with LSTM cells across the entire sequence.

The issue with this approach was clearly overfitting.
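A sketch of how per-second vectors from several modalities might be stacked into one sequence for a bi-directional RNN. It assumes each non-accelerometer modality has been resampled to a fixed number of samples per second and takes the autoencoder embedding as given; the LSTM itself is not shown, and all names are hypothetical.

```python
import numpy as np


def mean_std(x):
    """Per-second mean and std of one modality; x has shape (seconds, samples, channels)."""
    return np.concatenate([x.mean(axis=1), x.std(axis=1)], axis=1)


def build_sequence(accel_embedding, other_modalities):
    """Stack per-second vectors from every modality into one (seconds, features) matrix.

    accel_embedding: (seconds, emb_dim), e.g. from a recurrent autoencoder.
    other_modalities: list of arrays shaped (seconds, samples, channels).
    """
    parts = [np.asarray(accel_embedding, dtype=float)]
    parts += [mean_std(np.asarray(m, dtype=float)) for m in other_modalities]
    return np.concatenate(parts, axis=1)
```

The resulting matrix is one sequence of feature vectors, the shape a bi-directional LSTM layer would consume (with a batch axis added).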

sonam · August 3, 2016, 1:34pm

Nice. I couldn’t get a plain LSTM working.
How much did it score?

simonkamronn · August 3, 2016, 1:57pm

Just above 18 on the public set but around 12-14 on the test set, depending on the random split. So not very good, but I suspect more external data and perhaps some feature engineering would help. Alas, not enough time in the end.

bikash · August 11, 2016, 8:07am

Could someone share an example of RF or xgboost as used in this competition? I want to see how to deal with target.csv, since the target has multiple columns.

aakansh9 · August 14, 2016, 11:11am

@bikash I used these custom functions for CV and train with xgboost - https://gist.github.com/aakansh9/14dc322ae72ff144a311de5a955ac6aa . Code is not so optimized but it did work.
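One common way (not necessarily what the linked gist does) to feed multi-column probabilistic targets to a classifier is to expand each sample into one weighted row per class. A sketch, assuming each target row holds class probabilities; the function name is hypothetical.

```python
import numpy as np


def expand_soft_labels(X, Y, min_prob=0.0):
    """Turn soft targets (rows of class probabilities) into weighted hard-label rows.

    Every (sample, class) pair with probability > min_prob becomes one training
    row whose label is that class and whose sample weight is that probability.
    """
    X = np.asarray(X)
    Y = np.asarray(Y, dtype=float)
    rows, labels = np.nonzero(Y > min_prob)
    return X[rows], labels, Y[rows, labels]
```

The resulting arrays can go to any learner that accepts sample weights, for example `ExtraTreesClassifier.fit(X, y, sample_weight=w)` in scikit-learn or `xgboost.DMatrix(X, label=y, weight=w)` with the `multi:softprob` objective. The alternative from earlier in the thread is one regressor per target column with the outputs rescaled to sum to 1.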
