1st Place Solution

My best solution is an ensemble of 11 models; it scored 0.2482 on the public leaderboard and 0.2539 on the private leaderboard.

  • 3 Logistic Regression
  • 1 Logistic Regression with bagging
  • 1 Random Forest
  • 6 Gradient Boosting using xgboost

But using only 4 of those models in the ensemble, I can still score 0.2483 public and 0.2541 private.

  • 1 Logistic Regression
  • 1 Random Forest
  • 2 Gradient Boosting using xgboost

  • I did only a little feature engineering.
  • For all features, I treated 0 as an NA/null level.
  • For some models I performed feature selection.
  • All models were trained using 4-fold cross-validation, so I had a good estimate of their performance.
  • The ensemble was built by training a second-level xgboost model on the cross-validated predictions of all first-level models.
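The two-level stack described above can be sketched roughly as follows. This is only an illustrative sketch, not my exact pipeline: the data, model choices, and parameters are placeholders, and scikit-learn's `GradientBoostingClassifier` stands in for xgboost so the example is self-contained.

```python
# Hedged sketch of a two-level stack: out-of-fold (cross-validated) predictions
# from the first-level models become the features for a second-level booster.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Placeholder data standing in for the competition dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

first_level = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(n_estimators=100, random_state=0),  # stand-in for xgboost
]

# Out-of-fold class-1 probabilities for each base model: every training row
# is predicted by a model that never saw it, which avoids leakage at level 2.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=cv, method="predict_proba")[:, 1]
    for m in first_level
])

# Second-level booster trained on the stacked out-of-fold predictions.
stacker = GradientBoostingClassifier(n_estimators=100, random_state=0)
stacker.fit(meta_features, y)
```

The key design point is that the second level only ever sees cross-validated predictions; training it on in-fold predictions would leak the labels and overfit badly.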

I will provide more details in the model documentation and post it here.



Really looking forward to seeing the code to learn from it, in particular the evaluation (log loss) code; that's where I'm stuck at the moment.
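For reference, the multiclass log loss used in Kaggle-style evaluation can be computed in a few lines. This is a generic sketch of the standard metric, not the competition's official scoring code; the function name and clipping epsilon are my own choices.

```python
import numpy as np

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Average multiclass logarithmic loss.

    y_true: integer class labels, shape (n_samples,)
    y_prob: predicted class probabilities, shape (n_samples, n_classes)
    """
    y_prob = np.clip(y_prob, eps, 1 - eps)       # avoid log(0)
    y_prob /= y_prob.sum(axis=1, keepdims=True)  # renormalize each row
    n = y_prob.shape[0]
    # Only the predicted probability of the true class enters the loss.
    return -np.log(y_prob[np.arange(n), y_true]).mean()

# A perfect prediction gives a loss near 0; guessing uniformly over
# 3 classes gives log(3) ~ 1.0986.
loss = multiclass_log_loss(np.array([0, 1]),
                           np.array([[1.0, 0.0, 0.0],
                                     [0.0, 1.0, 0.0]]))
```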

Funny how my ensemble of LR, RF, and XGB hasn't helped me get closer to 0.2500 public. I was stuck at 0.253, and the XGB and RF ensemble got me to 0.2526 public.

Looking forward to having a look at your model documentation, @giba.

Congratulations on your win! Thank you for the interview, it was a very interesting read.

Can you please explain what you mean by "6 XGBoost models"? Are they trained with different random seeds or different parameters?
If different parameters, can you suggest which parameters are best to vary in XGBoost before ensembling?