Large difference between validation loss and test loss

My validation cross-entropy loss is ~0.0013, but my test score on the leaderboard is ~0.06. My validation set is 20% of the training data and is generated with the Dataset utility provided in the competition benchmark. Why is there such a large discrepancy between the validation loss and the test loss? Is anyone else facing the same issue?

Hi @prajwalkr, I’d check the usual culprits: overfitting, and a validation sample that is not representative of the test data. Let us know if those are handled and there is still a large mismatch.
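For the second point, a quick check like the sketch below can show whether the split is skewed. This assumes you can pull the class label of every video in each split into plain lists; `train_labels` and `val_labels` are placeholders for however you store them.

```python
from collections import Counter

def label_distribution(labels):
    """Return the relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# train_labels and val_labels are hypothetical lists with one class label
# per video, taken from whatever split you are currently using.
train_dist = label_distribution(train_labels)
val_dist = label_distribution(val_labels)

# Flag classes whose share differs noticeably between the two splits.
for cls in sorted(set(train_dist) | set(val_dist)):
    t, v = train_dist.get(cls, 0.0), val_dist.get(cls, 0.0)
    if abs(t - v) > 0.01:  # more than one percentage point apart
        print(f"class {cls}: train {t:.3f} vs val {v:.3f}")
```

If many classes show up here, the local validation loss will not track the leaderboard well no matter how large the split is.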

Hi @bull, thank you for your reply. As per your suggestion, I increased the validation split from 20% to 40%. I now get a validation loss of ~0.002 but still score about 0.07 on the leaderboard.

I also wanted to clarify one specific aspect that might explain this discrepancy. So far, I have ignored the existence of multiple classes in a video, since such videos are very rare in the training data. Is the frequency of videos with multiple classes much higher in the test set?
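By "very rare" I mean a count along the lines of the sketch below; `video_labels` is just a placeholder for however the benchmark stores the per-video class lists.

```python
# Hypothetical: video_labels maps each training video id to the list of
# classes annotated for it; adapt to the actual label structure.
multi = sum(1 for labels in video_labels.values() if len(labels) > 1)
total = len(video_labels)
print(f"{multi}/{total} videos ({100 * multi / total:.2f}%) have more than one class")
```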

Further, do you see any other reason for this difference between validation and test loss?

Hi, I have also seen a difference between the local validation loss and the leaderboard, but in my case it was around 25%, not 35 times.

My validation loss is actually a bit higher than the LB score, but I don’t use the Dataset utility. Does it do proper stratified train/valid splitting?

@dmytro, that is precisely why I felt I was doing something very wrong.

@thinline72 Yes, the Dataset utility does do proper stratified train/validation splitting.
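If you want to double-check against the utility, a plain stratified split can be reproduced with scikit-learn as in the sketch below; `samples` and `labels` are placeholders for the video ids and their classes.

```python
from sklearn.model_selection import train_test_split

# Hypothetical: samples is a list of video ids and labels the matching class
# for each video. stratify=labels keeps class proportions equal in both splits
# (each class needs at least two examples for this to work).
train_ids, val_ids, train_y, val_y = train_test_split(
    samples, labels, test_size=0.2, stratify=labels, random_state=42
)
```

Comparing the class frequencies of `val_y` against the utility's validation split is an easy way to confirm they behave the same.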