Share the knowledge

authman · March 1, 2018, 3:55pm

catboost has been recommended by a bunch of high-performing Russian Kaggle grandmasters. Also, it’s renowned for being able to deal with categorical variables (and all that they entail) out of the box, with little or no preprocessing.

sagol · March 1, 2018, 4:31pm

catboost is very slow :(

sagol · March 1, 2018, 4:35pm

result = xgboost * 0.4 + catboost * 0.4 + lightgbm * 0.2
Sometimes catboost can be useful for categorical features.
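
For anyone who wants to try the blend above, here is a minimal sketch of a weighted average over per-model probabilities. The weights come from the formula above; the `blend` helper and the probability values are made up for illustration, and in practice each list would hold one model's predictions on the same rows.

```python
def blend(preds_by_model, weights):
    """Weighted average of per-model probability lists (same row order)."""
    total = sum(weights.values())
    n_rows = len(next(iter(preds_by_model.values())))
    return [
        sum(weights[name] * preds_by_model[name][i] for name in weights) / total
        for i in range(n_rows)
    ]


# Made-up probabilities for three rows, one list per model.
preds = {
    "xgboost": [0.10, 0.80, 0.55],
    "catboost": [0.20, 0.70, 0.45],
    "lightgbm": [0.15, 0.90, 0.50],
}
weights = {"xgboost": 0.4, "catboost": 0.4, "lightgbm": 0.2}

blended = blend(preds, weights)
print(blended)
```

Dividing by the sum of the weights keeps the result a valid probability even if the weights are later changed and no longer add up to 1.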

ezetl · March 1, 2018, 7:12pm

Just to share some knowledge (I ended at rank 200, so I am not sure it will be useful to anyone): I got a really good boost at some point by eliminating low-entropy columns. I went from 0.24 to 0.18 logloss. My rationale was that all those columns carried little information and thus would not contribute to the classifier (XGBoost). I think the same results could be achieved with what Sagol said (recursive feature elimination based on feature_importances). I added some features, but they did not improve the model significantly: counting the members in the household, adding the mean, std and median of numeric columns, and also normalizing them.
I think I stopped submitting around a month ago, so the model could be really improved, especially because I did not do any kind of stacking. Thank you all for sharing!
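
A rough sketch of what dropping low-entropy columns could look like, assuming discrete (categorical or binary) columns stored as plain lists. The helper names, the toy data and the 0.1-bit threshold are all made up for illustration and are not the values used in the competition; continuous columns would need binning first.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def drop_low_entropy(columns, threshold=0.1):
    """Keep only the columns whose entropy reaches the (hypothetical) threshold."""
    return {
        name: values
        for name, values in columns.items()
        if column_entropy(values) >= threshold
    }


# Toy data: one constant column, one nearly constant flag, one balanced column.
columns = {
    "constant": ["a"] * 100,
    "rare_flag": [0] * 99 + [1],
    "balanced": [0, 1] * 50,
}

kept = drop_low_entropy(columns)
print(sorted(kept))
```

A constant column has zero entropy and a 99/1 split has about 0.08 bits, so both fall below the threshold here, while the balanced column has exactly 1 bit and is kept.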

priya.ana · March 2, 2018, 2:48am

Thank you so much for your replies. I did not get below 0.2, so there is definitely lots for me to learn. I would love to see some code snippets if possible. Thanks again, I will try some of these measures in my code and see if I get some improvement.

bull · March 2, 2018, 6:57pm

Amazing work everyone, thanks for participating! Really great to see the collaboration and knowledge sharing that happened on the forums.

Once we’ve reviewed the winning submissions, we’ll make the code available on our GitHub repository for competition winners as well:

Thanks again to all!

siddhant · March 3, 2018, 12:08am

Reading the comments made me realize how hard some people have worked on the dataset. Amazing work, people! My final rank is 122, but my solution is really simple: XGBoost tuned using hyperopt. I only spent a couple of days on this problem. If anybody is interested, here is my solution on GitHub.

Gillesvdw · March 3, 2018, 7:58am

Catboost can automatically deal with categorical features and has really good default hyper-parameters. My baseline, which was catboost with the default parameters on the hhold data, scored 0.1746.

payback · March 3, 2018, 10:13am

I want to share my solution, so feel free to drop me a line!

RGama · March 4, 2018, 11:54pm

Hello everyone,

Great competition!

Our solution is an ensemble of models built using gradient boosting (lightgbm) and neural networks (keras).

We tried to take into account the only interpretable feature – hhold_size – when normalizing the features created from the individual hhold members' data.
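
One possible reading of this normalization, sketched below, is to aggregate an individual-level indicator per household and divide by hhold_size to get a per-capita value. This is an assumption for illustration only; the helper name, the household ids and the toy values are hypothetical and the actual features are not described here.

```python
from collections import defaultdict


def per_capita_feature(individual_rows, hhold_size):
    """Sum an individual-level value per household, then divide by hhold_size.

    individual_rows: iterable of (household id, value) pairs, one per member.
    hhold_size: dict mapping household id to its reported size.
    """
    totals = defaultdict(float)
    for hhold_id, value in individual_rows:
        totals[hhold_id] += value
    return {h: totals[h] / size for h, size in hhold_size.items()}


# Toy data: a binary indicator for each household member.
individual_rows = [("h1", 1), ("h1", 0), ("h1", 1), ("h1", 0), ("h2", 1)]
hhold_size = {"h1": 4, "h2": 1}

shares = per_capita_feature(individual_rows, hhold_size)
print(shares)
```

Dividing by household size turns raw counts into shares, so a large household and a small one with the same composition end up with the same feature value.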
The most challenging part was feature selection. We did this using a couple of techniques. The most successful one was to fit a model to the core group of features and the group of features we wanted to add/test. We then evaluated the effect that a random permutation on each individual feature had on the predictions of that model. After going through every feature, we removed the ones for which we registered a score improvement.
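
The permutation check described above can be sketched roughly as follows. This is a minimal illustration and not our actual code: the function names, the toy model and the toy data are made up, and we assume log loss is measured on held-out rows. A feature is flagged when shuffling it lowers the log loss, i.e. the score improves without it.

```python
import math
import random


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def permutation_deltas(predict, X, y, n_repeats=5, seed=0):
    """Mean change in log loss after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = log_loss(y, predict(X))
    deltas = {}
    for j in range(len(X[0])):
        changes = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            changes.append(log_loss(y, predict(X_perm)) - base)
        deltas[j] = sum(changes) / n_repeats
    return deltas


def features_to_drop(deltas):
    """Features whose shuffling lowered the loss (score improved without them)."""
    return [j for j, d in deltas.items() if d < 0]


def toy_predict(X):
    """Hypothetical fitted model: trusts feature 0 and, wrongly, feature 1."""
    return [1 / (1 + math.exp(-(4 * x0 - 2 + 1.5 * (x1 - 0.5)))) for x0, x1 in X]


# Toy held-out data: feature 0 equals the label, feature 1 is misleading.
y = [1] * 10 + [0] * 10
X = [[label, 1 - label] for label in y]

deltas = permutation_deltas(toy_predict, X, y)
print(deltas, features_to_drop(deltas))
```

Averaging over several shuffles reduces the noise of a single permutation; the number of repeats used here is arbitrary.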

The cross-validation scores of our best submission were:
A: 0.2517962 (20-fold cv)
B: 0.1726869 (20-fold cv)
C: 0.0154211 (5-fold cv)

Final: 0.1466347
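
The final figure agrees, to seven decimal places, with the unweighted mean of the three scores above. A quick check (the variable names are just for illustration):

```python
# Cross-validation scores as reported above.
cv_scores = {"A": 0.2517962, "B": 0.1726869, "C": 0.0154211}

# Unweighted mean across A, B and C.
final = sum(cv_scores.values()) / len(cv_scores)
print(round(final, 7))
```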

We will give more details on our final write-up.

RGama and hugoguh
(the Ag100 team)
