Large difference between validation loss and test loss

My validation cross-entropy loss is ~0.0013, but my test score on the leaderboard is ~0.06. My validation set is 20% of the training data and is generated with the Dataset utility provided in the competition benchmark. Why is there such a large discrepancy between the validation loss and the test loss? Is anyone else facing the same issue?

Hi @prajwalkr, I’d check the usual culprits: overfitting, and a validation sample that is not representative of the test data. Let us know if those are handled and there is still a large mismatch.
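For the second point, a quick check like the sketch below can show whether the split is skewed. This assumes you can pull the class label of every video in each split into plain lists; `train_labels` and `val_labels` are placeholders for however you store them.

```python
from collections import Counter

def label_distribution(labels):
    """Return the relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# train_labels and val_labels are hypothetical lists with one class label
# per video, taken from whatever split you are currently using.
train_dist = label_distribution(train_labels)
val_dist = label_distribution(val_labels)

# Flag classes whose share differs noticeably between the two splits.
for cls in sorted(set(train_dist) | set(val_dist)):
    t, v = train_dist.get(cls, 0.0), val_dist.get(cls, 0.0)
    if abs(t - v) > 0.01:  # more than one percentage point apart
        print(f"class {cls}: train {t:.3f} vs val {v:.3f}")
```

If many classes show up here, the local validation loss will not track the leaderboard well no matter how large the split is.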

Hi @bull, thank you for your reply. As per your suggestion, I increased the validation split from 20% to 40%. I now get a validation loss of ~0.002 but still score about 0.07 on the leaderboard.

I also wanted to clarify one specific aspect that might explain this discrepancy. So far, I have ignored the existence of multiple classes in a video, since such videos are very rare in the training data. Is the frequency of videos with multiple classes much higher in the test set?
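By "very rare" I mean a count along the lines of the sketch below; `video_labels` is just a placeholder for however the benchmark stores the per-video class lists.

```python
# Hypothetical: video_labels maps each training video id to the list of
# classes annotated for it; adapt to the actual label structure.
multi = sum(1 for labels in video_labels.values() if len(labels) > 1)
total = len(video_labels)
print(f"{multi}/{total} videos ({100 * multi / total:.2f}%) have more than one class")
```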

Further, do you see any other reason for this difference between validation and test loss?

Hi, I have also seen a difference between the local validation loss and the leaderboard, but in my case it was around 25%, not 35 times.

My validation loss is actually a bit higher than the LB score, but I don’t use the Dataset utility. Does it do proper stratified train/valid splitting?

@dmytro, that is precisely why I felt I was doing something very wrong.

@thinline72 Yes, the Dataset utility does do proper stratified train/validation splitting.
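If you want to double-check against the utility, a plain stratified split can be reproduced with scikit-learn as in the sketch below; `samples` and `labels` are placeholders for the video ids and their classes.

```python
from sklearn.model_selection import train_test_split

# Hypothetical: samples is a list of video ids and labels the matching class
# for each video. stratify=labels keeps class proportions equal in both splits
# (each class needs at least two examples for this to work).
train_ids, val_ids, train_y, val_y = train_test_split(
    samples, labels, test_size=0.2, stratify=labels, random_state=42
)
```

Comparing the class frequencies of `val_y` against the utility's validation split is an easy way to confirm they behave the same.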