My best solution is an ensemble of 11 models; it scored 0.2482 on the public leaderboard and 0.2539 on the private leaderboard.
- 3 Logistic Regression
- 1 Logistic Regression with Bagging
- 1 Random Forest
- 6 Gradient Boost using xgboost
But using only 4 of those models in the ensemble, I can score 0.2483 public and 0.2541 private:
- 1 Logistic Regression
- 1 Random Forest
- 2 Gradient Boost using xgboost
- I did little feature engineering.
- For all features I treated 0 as an NA/null level.
- For some models I did feature selection.
- All models were trained using 4-fold cross-validation, so I have a good estimate of their performance.
- The ensemble was done by training a second level of xgboost on all the cross-validated predictions of the first level.
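The two-level approach described above (out-of-fold predictions from 4-fold cross-validation, then a second-level model trained on them) can be sketched roughly as follows. This is a minimal illustration, not the actual competition code: it uses toy data, default parameters, and sklearn's `GradientBoostingClassifier` as a stand-in for xgboost at the second level.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.metrics import log_loss

# Toy data standing in for the competition features.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# First-level models (illustrative choices mirroring the post's model types).
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),  # stand-in for xgboost
]

# Collect out-of-fold predictions with 4-fold cross-validation,
# so every row gets a prediction from a model that never saw it in training.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
oof = np.zeros((len(X), len(base_models)))

for m_idx, model in enumerate(base_models):
    for train_idx, val_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        oof[val_idx, m_idx] = model.predict_proba(X[val_idx])[:, 1]

# Second level: a gradient-boosting model trained on the stacked
# cross-validated predictions (xgboost in the original solution).
stacker = GradientBoostingClassifier(random_state=0)
stacker.fit(oof, y)
print(log_loss(y, stacker.predict_proba(oof)))
```

The key point is that the second-level model is fit only on cross-validated (out-of-fold) first-level predictions; training it on in-sample predictions would leak label information and make the ensemble look better than it is.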
I will provide more details in the model documentation and then post it here.
Gilberto