
What's your strategy?


I think you have to scale your data; then the results would be better than before.


Hi all!
This is my first post… my code is on
My score is 0.4269 with very few lines of code.
Preprocessing: check for NA, outliers, multicollinearity
Feature engineering: some, check the code
Strategy: XGBoost and H2O AutoML
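For anyone new to those preprocessing checks, here is a minimal sketch in Python/pandas. The column names and values are made up for illustration, but they mirror this dataset, where total volume is a fixed multiple of the donation count (hence the multicollinearity):

```python
import pandas as pd

# Toy frame: total_volume is exactly 250 * num_donations, as in the
# blood-donation data, so the two columns are perfectly collinear.
df = pd.DataFrame({
    "months_since_last": [2, 4, 2, 23, 1],
    "num_donations": [50, 13, 16, 2, 50],
    "total_volume": [12500, 3250, 4000, 500, 12500],
})

# 1. Missing values
na_counts = df.isna().sum()

# 2. Simple outlier flag: values more than 3 standard deviations from the mean
z = (df - df.mean()) / df.std()
outliers = (z.abs() > 3).any(axis=1)

# 3. Multicollinearity via the correlation matrix: a perfectly linear
#    relationship shows up as a correlation of 1.0
corr = df.corr()
print(round(corr.loc["num_donations", "total_volume"], 6))  # 1.0
```

A column pair with correlation 1.0 carries no extra information, so one of the two can be dropped before modeling.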


Hi all!
My score: 0.4350

Model used: vanilla logistic regression with 10-fold cross validation using caret in R

Pre-processing: remove total volume (100% correlation with number of donations)

Feature engineering: added a new_donor variable (set if months since last donation = months since first donation). I tried adding other variables, like frequency (average months between donations) and interactions between the existing variables, but they didn't seem to improve performance much.
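The new_donor flag described above is a one-liner. A sketch in Python/pandas (the post itself uses R; the column names follow the post's formula and the values here are invented):

```python
import pandas as pd

# A donor is "new" when their first and last donation happened in the
# same month, i.e. months-since-last equals months-since-first.
df = pd.DataFrame({
    "mo_last":  [2, 4, 2],
    "mo_first": [2, 28, 35],
})
df["new_donor"] = (df["mo_last"] == df["mo_first"]).astype(int)
print(df["new_donor"].tolist())  # [1, 0, 0]
```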

I have a question for anyone using the logLoss metric in caret. Do you get a negative logLoss? It's weird; I thought it should be at least 0, but that doesn't seem to be the case.

My code:
bd_train <- trainControl(method = "repeatedcv", number = 10, repeats = 3, savePredictions = TRUE, classProbs = TRUE, summaryFunction = mnLogLoss)

model_bd1 <- train(donated ~ mo_last + no_donation + mo_first + new_donor, data = blood_donation, method = "glm", family = "binomial", trControl = bd_train, metric = "logLoss")
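On the sign question: log loss computed on probabilities in (0, 1) is always >= 0, because -log(p) >= 0 whenever p <= 1. A quick self-contained check (in Python rather than R):

```python
import math

def log_loss(y_true, p_pred):
    """Binary log loss: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))  # ~0.28, never negative
```

So if a tool reports a negative value for this metric, the sign flip comes from the tooling's convention, not from the metric's definition.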


Hi all! This is my first ever hands-on project since completing the DataCamp Data Scientist track :)

So… rank 109 / 0.4349
Python (in PyCharm) with Keras; a deep learning model composed of a BatchNorm layer and three Dense layers.
The model reaches around 0.5 loss and roughly 0.72 accuracy.

Dropout layers did not improve much.
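The architecture described above can be sketched in Keras roughly as follows. The layer widths and activations are assumptions (the post doesn't give them); only the structure, a BatchNorm layer followed by three Dense layers, comes from the post:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sketch: 4 input features (as in this dataset), normalized
# first, then three fully connected layers ending in a sigmoid probability.
model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.BatchNormalization(),
    layers.Dense(32, activation="relu"),    # width is an assumption
    layers.Dense(16, activation="relu"),    # width is an assumption
    layers.Dense(1, activation="sigmoid"),  # P(donated)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

With binary cross-entropy as the training loss, the validation loss is directly comparable to the competition's log loss metric.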