Thought it would be useful to share the non-winning solutions as well, so here's my solution, done in collaboration with @MasterAwk. Link to the PDF of the report here (we chose to work on this for a class project, hence the pretty report). We didn't have much time to spend on this, so it's a fairly rough attempt. Anyway, a summary is given below:
Result: Rank 34, 0.261 public leaderboard score, 0.266 private leaderboard score
Models (didn’t do any ensembling!):
- Gradient boosting machine (this worked best; a sketch of both models follows this list)
- Random forest
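Neither model is specified beyond its name here (the details are in the report), so this is just a minimal sketch of both, assuming scikit-learn; the hyperparameters and toy data are placeholders, not our actual setup:

```python
# Minimal sketch of the two models, assuming scikit-learn; random_state and
# the toy data below are illustrative placeholders, not our actual setup.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))      # placeholder feature matrix
y_train = rng.integers(0, 2, size=100)   # placeholder labels

gbm = GradientBoostingClassifier(random_state=0)  # this one scored best for us
rf = RandomForestClassifier(random_state=0)

gbm.fit(X_train, y_train)
rf.fit(X_train, y_train)
```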
Feature engineering:
- Number of numeric features with missing values for each woman
- Number of ordinal features with missing values for each woman
- Number of categorical features with missing values for each woman (a sketch of these count features follows this list)
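These count features are straightforward to compute; here's a sketch assuming pandas, with hypothetical column names standing in for the real survey variables:

```python
# Count-of-missing features per variable type, assuming pandas; the column
# names here are hypothetical stand-ins for the real survey variables.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0],          # numeric
    "education_level": [2, 1, np.nan],    # ordinal
    "region": ["north", None, "south"],   # categorical
})
numeric_cols = ["age"]
ordinal_cols = ["education_level"]
categorical_cols = ["region"]

# One row-wise missing-value count per variable type.
df["n_missing_numeric"] = df[numeric_cols].isna().sum(axis=1)
df["n_missing_ordinal"] = df[ordinal_cols].isna().sum(axis=1)
df["n_missing_categorical"] = df[categorical_cols].isna().sum(axis=1)
```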
Feature selection:
- Features whose proportion of missing values in the training set exceeded a certain cut-off were dropped, since missing-value imputation is hardly meaningful for such features. A cut-off of 90% worked best; the rule is sketched below.
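In pandas terms, the cut-off rule might look something like this (the toy data is illustrative; only the 0.9 cut-off is from our report):

```python
# Drop features whose training-set missing proportion exceeds the cut-off;
# the toy DataFrame is illustrative, only the 0.9 cut-off is from our report.
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "mostly_missing": [np.nan] * 19 + [1.0],  # 95% missing -> dropped
    "mostly_present": np.arange(20.0),        # 0% missing  -> kept
})

cutoff = 0.9
missing_frac = train.isna().mean()                 # per-column NaN proportion
keep = missing_frac.index[missing_frac <= cutoff]  # columns to retain
train = train[keep]
```

The same column selection should of course be applied to the test set as well.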
Missing value imputation:
- For numeric features, missing values were set to 0.
- For ordinal features, missing values were set to -1, since the lowest category for each ordinal feature is coded as either 0 or 1 in the training set.
- For categorical features, a new category “missing” was introduced and missing values were set to this category instead (all three rules are sketched after this list).
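As a sketch of the three rules, again assuming pandas with hypothetical column names:

```python
# The three imputation rules, assuming pandas; column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [1200.0, np.nan],        # numeric
    "education_level": [3.0, np.nan],  # ordinal, lowest level coded 0 or 1
    "region": ["north", None],         # categorical
})

df["income"] = df["income"].fillna(0)                     # numeric -> 0
df["education_level"] = df["education_level"].fillna(-1)  # ordinal -> -1
df["region"] = df["region"].fillna("missing")             # categorical -> new level
```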
What we should have done:
- More models + ensembling
- Cross validation (a rough sketch combining both ideas is given below)
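For what it's worth, here's roughly what that could have looked like with scikit-learn: a soft-voting ensemble of the two models, evaluated with 5-fold cross-validation. All choices here are illustrative, not something we actually ran:

```python
# Hypothetical ensembling + cross-validation we should have done; assuming
# scikit-learn, with illustrative data, models, and fold count.
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # placeholder feature matrix
y = rng.integers(0, 2, size=200)  # placeholder labels

# Average the two models' predicted probabilities ("soft" voting).
ensemble = VotingClassifier(
    estimators=[("gbm", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft",
)
scores = cross_val_score(ensemble, X, y, cv=5)  # 5-fold CV
print(scores.mean(), scores.std())
```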
Of course, sharing your solutions + any feedback would be welcome!