I’ve split the labelled data into train/validation/test sets. I can produce models that generalise well to the test set I’ve created, but they perform noticeably worse on the (unlabelled) competition data. The increase in loss from my test results to the competition results is ~40% (i.e. from about 0.4 to about 0.56).
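For reference, my split is roughly along these lines (a minimal sketch using scikit-learn; the CSV path and the "label" column name are just illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")  # labelled data; file name is illustrative

# Hold out 20% for test, then 20% of the remainder for validation,
# stratifying on the label so class proportions stay comparable.
train_val, test = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
train, val = train_test_split(
    train_val, test_size=0.2, stratify=train_val["label"], random_state=42
)
```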
Is anyone else seeing the same issue? Is there a fundamental difference between the labelled and unlabelled data (e.g. taken from different geographical locations) that I’m missing?
Turns out there was a bug in my code - I thought I was using only verified examples, but the unverified ones had slipped in somehow! I’ve re-trained my model now and the losses are far more similar, plus the predicted class distribution on the competition data seems much closer to that of the original dataset.
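In case anyone else hits the same thing, the fix was roughly the following (a sketch only - the "manually_verified" flag, file names, and "label" column are from my copy of the metadata, so adjust for yours):

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Keep only the rows flagged as manually verified before training.
train = train[train["manually_verified"] == 1]

# Sanity check after re-training: does the predicted class distribution on
# the competition data roughly match the training label distribution?
preds = pd.read_csv("my_submission.csv")  # hypothetical submission file with a "label" column
train_dist = train["label"].value_counts(normalize=True)
pred_dist = preds["label"].value_counts(normalize=True)
print(pd.concat([train_dist, pred_dist], axis=1, keys=["train", "predicted"]).round(3))
```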
It seems I am still facing this issue. After reaching a loss of 0.5, my cross-validation and LB scores differ noticeably (e.g. CV = 0.43, LB = 0.68).