I’ve split the labelled data into train/validation/test datasets. I can produce models that generalise well to the test set I’ve created, but they perform worse on the (unlabelled) competition data. The increase in loss from my test results to the competition results is ~40% (i.e. from 0.4 to about 0.56).
Is anyone else having this same issue? Is there a fundamental difference between the labelled and unlabelled data (e.g. taken from different geographical locations) that I’m missing?
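One way I could check for a distribution difference myself is adversarial validation: train a classifier to distinguish labelled rows from competition rows, and if it scores well above 0.5 AUC, the two sets differ in feature space. A minimal sketch (the feature arrays here are synthetic stand-ins, not the real competition data):

```python
# Adversarial validation: label rows by which dataset they came from,
# then see how well a classifier can tell the two datasets apart.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-ins for the real feature matrices; the competition set is
# given a deliberate mean shift to simulate a distribution difference.
X_labelled = rng.normal(0.0, 1.0, size=(500, 10))
X_competition = rng.normal(0.5, 1.0, size=(500, 10))

X = np.vstack([X_labelled, X_competition])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 0 = labelled, 1 = competition

clf = RandomForestClassifier(n_estimators=100, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"adversarial AUC: {auc:.3f}")
# AUC near 0.5 suggests the sets are indistinguishable;
# AUC near 1.0 suggests a strong distribution shift.
```

If the AUC is high, inspecting the classifier’s feature importances points at which features shift between the two sets.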