DrivenData’s Predict Blood Donations
Feature engineering did not improve scores in most cases. Scaling was applied only for algorithms that required it. Hyper-parameters were tuned with GridSearchCV, a brute-force search scored by stratified 10-fold cross-validation.
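As a rough illustration of that setup, here is a minimal sketch of a scaled model tuned by GridSearchCV with stratified 10-fold cross-validation. The synthetic data, the choice of LogisticRegression, the parameter grid, and the neg_log_loss scoring are all assumptions made for the example; the post does not list the grids or scorers that were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the contest data (assumption: 4 numeric features,
# binary target), used only so the sketch runs on its own.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

# Scaling lives inside the pipeline so it is re-fit on each training fold
# and never sees the fold being scored.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid; the values actually searched are not given in the text.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# Stratified 10-fold CV; every grid point is fitted and scored on each fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="neg_log_loss", cv=cv)
search.fit(X, y)

best_params = search.best_params_
best_logloss = -search.best_score_  # flip the sign back to a plain log loss
print(best_params, round(best_logloss, 4))
```

Keeping the scaler inside the pipeline is a design choice rather than something the post states: it avoids leaking statistics from the validation fold into the fitted scaler.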
leaderboard_score is the contest score for predictions on the unknown test set; lower is better. Camel-case model names refer to scikit-learn models; lower-case names were hand-crafted in some way.
Simple logistic regression did quite well; it seems odd that bagging and boosting both reduced its performance. In general, though, ensembling did improve performance.
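One simple form of ensembling is to average the predicted probabilities of several fitted models. The sketch below shows the idea on synthetic data; the three base models, the train/test split, and the data itself are assumptions chosen for illustration, not the models or ensembles used in the contest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption), split into train and held-out parts.
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Hypothetical base models; any classifier with predict_proba would do.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "boosting": GradientBoostingClassifier(random_state=1),
    "naive_bayes": GaussianNB(),
}

# Fit each model and keep its predicted probability of the positive class.
probas = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    probas[name] = model.predict_proba(X_test)[:, 1]

# The ensemble prediction is the unweighted mean of the model probabilities.
avg_proba = np.mean(list(probas.values()), axis=0)

for name, p in probas.items():
    print(name, round(log_loss(y_test, p), 4))
print("average", round(log_loss(y_test, avg_proba), 4))
```

Because log loss is convex in the predicted probability, the averaged prediction scores no worse than the mean of the individual log losses. It can still be worse than the single best model, which is consistent with logistic regression holding its own here.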
A number of statistics were recorded for each model from 10-fold CV predictions of the training data:
accuracy: the proportion correctly predicted
logloss: the sklearn.metrics.log_loss
AUC: the area under the ROC curve
f1: the harmonic mean of precision and recall
mu: the average over 100 cross-validated scores with permutations
std: the standard deviation over 100 cross-validated scores with permutations
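The statistics listed above could be computed along the following lines. This is a sketch under stated assumptions: the data are synthetic, the model is a plain LogisticRegression, and mu and std are interpreted as the mean and standard deviation of accuracy over 10 reshuffled repeats of 10-fold CV (100 scores). The post does not say exactly how the 100 permuted scores were produced, so that part in particular is a guess.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, log_loss, roc_auc_score
from sklearn.model_selection import (RepeatedStratifiedKFold, StratifiedKFold,
                                     cross_val_predict, cross_val_score)

# Synthetic stand-in for the training data (assumption).
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=2)
model = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: each row is predicted by a model that did not
# see it during fitting.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)  # 0.5 threshold is an assumption

stats = {
    "accuracy": accuracy_score(y, pred),
    "logloss": log_loss(y, proba),
    "AUC": roc_auc_score(y, proba),
    "f1": f1_score(y, pred),
}

# Assumed reading of mu/std: 10 splits x 10 reshuffled repeats = 100 scores.
repeated = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=2)
scores = cross_val_score(model, X, y, cv=repeated, scoring="accuracy")
stats["mu"] = scores.mean()
stats["std"] = scores.std()

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```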
Starting with all of these statistics as predictors of leaderboard_score, R’s step function produced the following model:
Call:
lm(formula = leaderboard_score ~ mu + std, data = score_data,
    na.action = na.omit)

Residuals:
     Min       1Q   Median       3Q      Max
-0.18728 -0.05472 -0.03539  0.02082  0.42898

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   25.722      2.962   8.685 3.09e-07 ***
mu           -33.089      3.897  -8.490 4.11e-07 ***
std          -60.589      7.857  -7.711 1.35e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1499 on 15 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.8311,    Adjusted R-squared:  0.8086
F-statistic: 36.91 on 2 and 15 DF,  p-value: 1.61e-06
Possibly std is a stand-in for the variance term of statistical learning’s bias-variance trade-off.
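To make the fitted coefficients easier to read, the sketch below evaluates the regression as a prediction rule and computes the change in predicted leaderboard_score for a small change in each predictor. The coefficients are copied from the lm summary; the function names and the 0.01 step size are illustrative choices, not part of the original analysis.

```python
# Coefficients copied from the lm summary of leaderboard_score ~ mu + std.
INTERCEPT = 25.722
COEF_MU = -33.089
COEF_STD = -60.589


def predicted_leaderboard_score(mu: float, std: float) -> float:
    """Evaluate the fitted linear model at the given mu and std."""
    return INTERCEPT + COEF_MU * mu + COEF_STD * std


def score_change(d_mu: float = 0.0, d_std: float = 0.0) -> float:
    """Predicted change in leaderboard_score for given changes in mu and std."""
    return COEF_MU * d_mu + COEF_STD * d_std


# Effect of a 0.01 increase in each predictor, holding the other fixed.
effect_mu = score_change(d_mu=0.01)
effect_std = score_change(d_std=0.01)
print(f"{effect_mu:.5f} {effect_std:.5f}")  # -0.33089 -0.60589
```

Both coefficients are negative, so among the models in the fit, a higher mean CV score and a larger spread both went with a lower (better) leaderboard score. The fit rests on only 18 models, so predictions far from the observed mu and std values should not be trusted.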
The work is available on GitHub and BitBucket. (Only GitHub permits the viewing of IPython notebooks.)