
What's your strategy?


#21

I think you have to scale your data; then the results should be better than before.
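For what it's worth, a minimal sketch of what "scaling" usually means here (z-score standardization with NumPy; the feature values below are made up):

```python
import numpy as np

# Hypothetical feature matrix: rows are donors, columns are features
# such as months since last donation and number of donations.
X = np.array([[2.0, 50.0],
              [4.0, 13.0],
              [23.0, 2.0]])

# Z-score standardization: subtract the column mean and divide by the
# column standard deviation, so every feature has mean 0 and std 1.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```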


#22

Hi all!
This is my first post… my code is at https://github.com/Payback80/drivendata_blood_donation
My score is 0.4269 with very few lines of code.
Preprocessing: check for NA, outliers, multicollinearity
Feature engineering: some, check the code
Strategy: xgboost and H2O AutoML
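For readers who want to try the same preprocessing checks, here is a minimal pandas sketch (column names and values are my assumptions, not the actual competition file):

```python
import pandas as pd

# Hypothetical frame mimicking the blood-donation data; in the real
# dataset, total volume is 250 cc per donation, so it is perfectly
# collinear with the number of donations.
df = pd.DataFrame({
    "months_since_last": [2, 0, 1, 2],
    "num_donations": [50, 13, 16, 20],
    "total_volume": [12500, 3250, 4000, 5000],
    "months_since_first": [98, 28, 35, 45],
})

# 1. Check for missing values per column.
print(df.isna().sum())

# 2. Check for multicollinearity via the correlation matrix;
# total_volume vs num_donations shows r == 1.
print(df.corr())
```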


#23

Hi all!
My score: 0.4350

Model used: vanilla logistic regression with 10-fold cross validation using caret in R

Pre-processing: remove total volume (100% correlation with number of donations)
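In pandas terms, this step might look like the following (a sketch with made-up values; in the real data total volume is 250 cc times the number of donations):

```python
import pandas as pd

# Hypothetical columns; total_volume is a constant multiple of
# num_donations, so the two are perfectly correlated and one is redundant.
df = pd.DataFrame({
    "num_donations": [50, 13, 16],
    "total_volume": [12500, 3250, 4000],
})

# Confirm the perfect correlation, then drop the redundant column.
assert abs(df["total_volume"].corr(df["num_donations"]) - 1.0) < 1e-9
df = df.drop(columns=["total_volume"])
print(list(df.columns))  # ['num_donations']
```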

Feature engineering: added a new_donor variable (set when months since last donation = months since first donation). I tried adding other variables, like frequency (average months between donations) and interactions between the existing variables, but they didn't seem to improve performance much.
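The new_donor idea translates directly to pandas (column names here are assumptions; the post's own code is in R):

```python
import pandas as pd

# Hypothetical columns: a donor whose first and last donation happened
# in the same month has donated only once so far.
df = pd.DataFrame({
    "months_since_last": [2, 4, 11],
    "months_since_first": [2, 28, 11],
})

# new_donor flag: months since last donation equals months since first.
df["new_donor"] = (df["months_since_last"]
                   == df["months_since_first"]).astype(int)
print(df["new_donor"].tolist())  # [1, 0, 1]
```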

I have a question for anyone using the logLoss metric in caret. Do you get a negative logLoss? It's weird; I thought it should be greater than 0, but that doesn't seem to be the case.

My code:
bd_train <- trainControl(method = "repeatedcv", number = 10, repeats = 3, savePredictions = TRUE, classProbs = TRUE, summaryFunction = mnLogLoss)

model_bd1 <- train(donated ~ mo_last + no_donation + mo_first + new_donor, data = blood_donation, method = "glm", family = "binomial", trControl = bd_train, metric = "logLoss")
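As a sanity check on the question above: binary log loss, as usually defined, is nonnegative by construction. A minimal NumPy check of the definition itself (not caret's internals; the labels and probabilities below are made up):

```python
import numpy as np

def log_loss(y_true, p):
    """Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(p, 1e-15, 1 - 1e-15)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.4])
print(log_loss(y, p))  # always >= 0; smaller is better
```

Every term y*log(p) + (1-y)*log(1-p) is the log of a probability, hence at most 0, so the negated mean is at least 0.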


#24

Hi all! This is my first ever hands-on project since completing the DataCamp data scientist track :)

So… rank 109 / 0.4349
Python (in PyCharm) with Keras; the deep learning model comprises a BatchNorm layer and three Dense layers.
The model achieves around 0.5 loss and roughly 0.72 accuracy.

Adding Dropout layers did not improve it much.
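For the curious, the described architecture might be sketched in Keras roughly like this (unit counts, activations, and the 4-feature input are my assumptions, not the poster's actual values):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the described model: BatchNorm followed by three Dense
# layers, ending in a sigmoid for the donation probability.
model = keras.Sequential([
    layers.Input(shape=(4,)),               # 4 features in the donation data
    layers.BatchNormalization(),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of donating
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```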