Just to share some knowledge (I finished around rank 200, so I'm not sure it will be useful to anyone): I got a really good boost at one point by eliminating low-entropy columns, which took my logloss from 0.24 to 0.18. My rationale was that those columns carried little information and thus would contribute little to the classifier (XGBoost). I think the same result could be achieved with what Sagol suggested, recursive feature elimination based on `feature_importances_`. I also added some features, but they did not improve the model significantly: counting the members of each household, adding the mean, std, and median of the numeric columns, and normalizing them. Rough sketches of all three steps are below.
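For anyone curious, here is a minimal sketch of what the low-entropy filter could look like. The threshold value, the `target` column name, and the toy data are all hypothetical; in practice you would pick the cutoff by cross-validation:

```python
import pandas as pd
from scipy.stats import entropy

def column_entropy(series: pd.Series) -> float:
    """Shannon entropy (bits) of a column's empirical value distribution."""
    probs = series.value_counts(normalize=True)
    return entropy(probs, base=2)

# Toy frame: "b" is nearly constant, so its entropy is close to zero.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3, 1, 2],
    "b": [0, 0, 0, 0, 0, 0, 0, 1],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

THRESHOLD = 0.6  # hypothetical cutoff; tune it on validation
entropies = df.drop(columns=["target"]).apply(column_entropy)
low_entropy_cols = entropies[entropies < THRESHOLD].index.tolist()
df_reduced = df.drop(columns=low_entropy_cols)
print("dropped:", low_entropy_cols)  # -> dropped: ['b']
```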
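And for the alternative Sagol mentioned, a simple version of recursive elimination is just a loop that refits, scores, and drops the weakest feature each round. This is only a sketch on synthetic data, and the stopping rule is hypothetical (normally you would stop when the CV score starts degrading):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=8, random_state=0)
features = np.arange(X.shape[1])

while len(features) > 5:  # hypothetical stopping point
    model = XGBClassifier(n_estimators=100, max_depth=4)
    model.fit(X[:, features], y)
    score = cross_val_score(model, X[:, features], y,
                            scoring="neg_log_loss", cv=3).mean()
    # Drop the feature with the lowest importance in this round's model.
    weakest = features[np.argmin(model.feature_importances_)]
    print(f"{len(features)} features, logloss={-score:.4f}, dropping {weakest}")
    features = features[features != weakest]
```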
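The extra features I tried would look roughly like this. The schema (a `household_id` grouping column plus numeric columns) and the column names are illustrative, and I computed the mean/std/median row-wise across the numeric columns; per-household aggregates would be a reasonable variant:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical schema: one row per individual, grouped by household.
df = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2],
    "income": [100.0, 120.0, 80.0, 300.0, 310.0],
    "age": [34.0, 31.0, 6.0, 55.0, 53.0],
})
numeric_cols = ["income", "age"]

# Household size as a feature.
df["household_size"] = df.groupby("household_id")["household_id"].transform("size")

# Row-wise mean / std / median over the numeric columns.
df["row_mean"] = df[numeric_cols].mean(axis=1)
df["row_std"] = df[numeric_cols].std(axis=1)
df["row_median"] = df[numeric_cols].median(axis=1)

# Normalize the numeric columns (fit the scaler on train data only).
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```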
I stopped submitting about a month ago, so the model could probably be improved quite a bit, especially since I did not do any stacking. Thank you all for sharing!