I tried both sklearn's cross_val_score and my own CV function using StratifiedKFold. With both, I get scores that are really different from the LB, without any trend I can notice…
For example: I tried a logistic regression with no feature engineering. On CV I get ~0.3, but then ~3 on the LB. -------- here, CV is much lower than LB
However, I tried a lightgbm model and got a CV score that is lower than the LB by ~0.1. -------- CV is slightly lower than LB
Then, still using the same lightgbm parameters, I removed some features, applied some transforms, etc., and saw little change in CV but a ~2-fold improvement on the leaderboard over the same model with all the features. It seems to be all over the place. -------- CV is much higher than LB.
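For reference, this is roughly the cross_val_score path I'm using (a minimal sketch, not my exact code; X and y stand in for the competition data, swapped for synthetic data here so it runs):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# placeholder data: X, y would really be the competition features/target
X, y = make_classification(n_samples=1000, n_informative=10, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# pipeline so the scaler is re-fit inside each training fold
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# sklearn scorers are "higher is better", hence neg_log_loss
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_log_loss")
print(f"CV log loss: {-scores.mean():.4f} (+/- {scores.std():.4f})")
```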
If you could point out any problems with my approach, please let me know.
- load data
- StratifiedKFold to split the data
- Preprocess/normalize etc. on the current fold only (I also tried doing this before splitting the data, but wasn't sure if that would cause a data leak; see the sketch after this list)
- Train, Predict, and store those predictions
- calculate LogLoss on those preds
- take the mean of all three countries' log losses to get the final mean log loss (tried a weighted average as well, but it doesn't make a big difference)
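For the manual version, here's a minimal sketch of the per-fold loop for one country (assuming a binary target per country; data loading is omitted and the `datasets` dict in the usage is a hypothetical placeholder). Fitting the scaler on the training fold only is how I tried to avoid the leak mentioned above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

def country_cv_log_loss(X, y, n_splits=5, seed=42):
    """Out-of-fold log loss for one country, preprocessing fit per training fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros(len(y))
    for train_idx, val_idx in skf.split(X, y):
        # fit the scaler on the training fold only, so nothing leaks from the val fold
        scaler = StandardScaler().fit(X[train_idx])
        model = LogisticRegression(max_iter=1000)
        model.fit(scaler.transform(X[train_idx]), y[train_idx])
        # store out-of-fold predicted probabilities for the positive class
        oof[val_idx] = model.predict_proba(scaler.transform(X[val_idx]))[:, 1]
    return log_loss(y, oof)

# hypothetical usage: `datasets` maps each country to its (X, y) arrays
# datasets = {"A": (X_a, y_a), "B": (X_b, y_b), "C": (X_c, y_c)}
# scores = {c: country_cv_log_loss(X, y) for c, (X, y) in datasets.items()}
# print("mean log loss across countries:", np.mean(list(scores.values())))
```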