What's your strategy?

I think you should scale your data; the results should then be better than before.
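A minimal sketch of what scaling could look like in base R, in case it helps. The data frames and their values are made up for illustration, and the column names are only assumed. The idea is to learn the centering and scaling parameters on the training set and reuse them on new data.

```r
# Hypothetical training data; names and values are made up for illustration.
train <- data.frame(
  mo_last     = c(2, 0, 1, 4, 16),
  no_donation = c(50, 13, 16, 20, 1),
  mo_first    = c(98, 28, 35, 45, 16)
)

# Standardize each column to mean 0 and standard deviation 1.
train_scaled <- scale(train)

# Keep the parameters learned from the training set.
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Apply the same parameters to new data so both sets share one scale.
test <- data.frame(
  mo_last     = c(3, 10),
  no_donation = c(5, 2),
  mo_first    = c(40, 12)
)
test_scaled <- scale(test, center = centers, scale = scales)
```

Scaling tends to matter for distance-based or gradient-based models such as neural networks; tree-based models like xgboost are usually much less sensitive to it.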

Hi all!
This is my first post. My code is at https://github.com/Payback80/drivendata_blood_donation
My score is 0.4269 with very few lines of code.
Preprocessing: checked for NAs, outliers, and multicollinearity
Feature engineering: some; check the code
Strategy: XGBoost and H2O AutoML

Hi all!
My score: 0.4350

Model used: vanilla logistic regression with 10-fold cross validation using caret in R

Pre-processing: remove total volume (100% correlation with number of donations)

Feature engineering: added a new_donor variable (true when months since last donation equals months since first donation). I also tried adding other variables, such as frequency (average months between donations) and interactions between the existing variables, but they didn't seem to improve performance much.
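A rough sketch of how these two features could be derived in base R. The rows are made up, the column names follow the model formula in this post, and the frequency formula is only one possible reading of "average months between donations", so treat it as an assumption.

```r
# Hypothetical rows; column names follow the model formula in this post.
blood_donation <- data.frame(
  mo_last     = c(2, 4, 11, 23),
  no_donation = c(10, 1, 3, 1),
  mo_first    = c(40, 4, 35, 23)
)

# new_donor: 1 when the last donation is also the first one.
blood_donation$new_donor <- as.integer(
  blood_donation$mo_last == blood_donation$mo_first
)

# frequency: months between first and last donation, divided by the number
# of donations (assumed definition; it evaluates to 0 for new donors).
blood_donation$frequency <- with(
  blood_donation,
  (mo_first - mo_last) / no_donation
)
```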

I have a question for anyone using the logLoss metric in caret. Do you get a negative logLoss? It's weird: I thought it should always be greater than 0, but that doesn't seem to be the case.

My code:
library(caret)

bd_train <- trainControl(method = "repeatedcv", number = 10, repeats = 3, savePredictions = TRUE, classProbs = TRUE, summaryFunction = mnLogLoss)

model_bd1 <- train(donated ~ mo_last + no_donation + mo_first + new_donor, data = blood_donation, method = "glm", family = "binomial", trControl = bd_train, metric = "logLoss")

Hi all! This is my first ever hands-on project since completing the DataCamp data scientist track :)

So… rank 109 / 0.4349
Python (in PyCharm) with Keras: a deep learning model comprising a BatchNorm layer and three Dense layers.
The model reaches a loss of around 0.5 and an accuracy of about 0.72.

Dropout layers did not improve much.

Hello,
I’m new to this competition. I’ve just begun to solve it. I only have RStudio, and I’m unable to find a package to calculate Log Loss. Any recommendation for free software that I could use would be much appreciated.

Thank you!

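logloss <- function(z, y, eps = 1e-13) {
  # z: actual values (0 or 1)
  # y: predicted probabilities
  # eps: small positive constant; clipping to [eps, 1 - eps] avoids log(0),
  #      which would give infinite results

  y <- pmin(pmax(y, eps), 1 - eps)

  # Each term inside the mean is <= 0, so negating gives a non-negative loss
  -mean(z * log(y) + (1 - z) * log(1 - y))
}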
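As a rough sanity check on a log loss function like the one above, it can help to compare a few known cases. The sketch below uses its own compact helper (the name binary_logloss is hypothetical, chosen only for this example) and made-up labels and predictions.

```r
# Compact binary log loss, written only for this check (hypothetical name).
binary_logloss <- function(actual, predicted, eps = 1e-13) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Made-up labels for illustration.
actual <- c(1, 0, 0, 1)

# Uninformative baseline: predicting 0.5 for everyone gives log(2), about 0.693.
baseline <- binary_logloss(actual, rep(0.5, 4))

# Confident and correct predictions push the loss toward 0.
good <- binary_logloss(actual, c(0.9, 0.1, 0.2, 0.8))

# Hard 0/1 predictions that are wrong get clipped by eps instead of giving Inf.
wrong <- binary_logloss(actual, c(0, 1, 1, 0))
```

With this definition the loss is never negative, and a score below log(2) means the model does better than always predicting 0.5.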

Hello,

Did you make any progress? I am teaching a new course and would like to give my students real examples to work on. We are using RStudio Cloud for R and Google Colab for Python.

Best regards,