Just to share some knowledge (I finished around rank 200, so I'm not sure it will be useful to anyone): I got a really good boost at one point by eliminating low-entropy columns, which took my logloss from 0.24 to 0.18. My rationale was that those columns carried little information and thus would contribute little to the classifier (XGBoost). I think the same result could be achieved with what Sagol suggested, recursive feature elimination based on `feature_importances_`. I also added some features, but they did not improve the model significantly: counting the members of each household, adding the mean, std, and median of the numeric columns, and normalizing them. Rough sketches of all three steps are below.
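For anyone curious, here is a minimal sketch of what the low-entropy filter could look like. The threshold value, the `target` column name, and the toy data are all hypothetical; in practice you would pick the cutoff by cross-validation:

```python
import pandas as pd
from scipy.stats import entropy

def column_entropy(series: pd.Series) -> float:
    """Shannon entropy (bits) of a column's empirical value distribution."""
    probs = series.value_counts(normalize=True)
    return entropy(probs, base=2)

# Toy frame: "b" is nearly constant, so its entropy is close to zero.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3, 1, 2],
    "b": [0, 0, 0, 0, 0, 0, 0, 1],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

THRESHOLD = 0.6  # hypothetical cutoff; tune it on validation
entropies = df.drop(columns=["target"]).apply(column_entropy)
low_entropy_cols = entropies[entropies < THRESHOLD].index.tolist()
df_reduced = df.drop(columns=low_entropy_cols)
print("dropped:", low_entropy_cols)  # -> dropped: ['b']
```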
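And for the alternative Sagol mentioned, a simple version of recursive elimination is just a loop that refits, scores, and drops the weakest feature each round. This is only a sketch on synthetic data, and the stopping rule is hypothetical (normally you would stop when the CV score starts degrading):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=8, random_state=0)
features = np.arange(X.shape[1])

while len(features) > 5:  # hypothetical stopping point
    model = XGBClassifier(n_estimators=100, max_depth=4)
    model.fit(X[:, features], y)
    score = cross_val_score(model, X[:, features], y,
                            scoring="neg_log_loss", cv=3).mean()
    # Drop the feature with the lowest importance in this round's model.
    weakest = features[np.argmin(model.feature_importances_)]
    print(f"{len(features)} features, logloss={-score:.4f}, dropping {weakest}")
    features = features[features != weakest]
```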
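The extra features I tried would look roughly like this. The schema (a `household_id` grouping column plus numeric columns) and the column names are illustrative, and I computed the mean/std/median row-wise across the numeric columns; per-household aggregates would be a reasonable variant:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical schema: one row per individual, grouped by household.
df = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2],
    "income": [100.0, 120.0, 80.0, 300.0, 310.0],
    "age": [34.0, 31.0, 6.0, 55.0, 53.0],
})
numeric_cols = ["income", "age"]

# Household size as a feature.
df["household_size"] = df.groupby("household_id")["household_id"].transform("size")

# Row-wise mean / std / median over the numeric columns.
df["row_mean"] = df[numeric_cols].mean(axis=1)
df["row_std"] = df[numeric_cols].std(axis=1)
df["row_median"] = df[numeric_cols].median(axis=1)

# Normalize the numeric columns (fit the scaler on train data only).
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```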
I stopped submitting about a month ago, so the model could probably be improved quite a bit, especially since I did not do any stacking. Thank you all for sharing!