I wanted to make this thread so that contenders can share their CV vs LB scores.
In fact I have seen a very poor correlation between CV and LB, so I wondered whether the same has happened to you.
Here are a few examples of my CV vs LB score (note that my CV is computed using the out of fold predictions of a 5 fold cross validation):
CV 0.390 → LB 0.4364
CV 0.376 → LB 0.4361
CV 0.360 → LB 0.4462
CV 0.377 → LB 0.4627
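For what it's worth, the Pearson correlation of those four pairs is easy to check in a few lines of plain Python:

```python
import math

# CV/LB pairs listed above
cv = [0.390, 0.376, 0.360, 0.377]
lb = [0.4364, 0.4361, 0.4462, 0.4627]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson(cv, lb))  # comes out slightly negative on these four pairs
```

On these numbers the coefficient is actually slightly negative: a better CV did not mean a better LB at all.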
As you can see, the correlation is very poor. What about yours?
So far, mine have been fairly correlated, with LB only 0.02-0.03 higher than CV. I'm using metadata and a very simple 2-layer conv with attention on the patches, plus an MLP head that classifies the attention scores together with the embedded metadata.
Mine is similar to yours, with nearly no correlation. Log loss is not well suited to this case. Suppose the public or private dataset has 260 samples in total and we make a single confident false negative: predicting 0.01 for a positive instance contributes about 0.018 to the average loss all by itself.
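To make that arithmetic concrete (the 260-sample size is the assumption from above):

```python
import math

n = 260                      # assumed total sample size of the public/private set
p_for_true_positive = 0.01   # one confident wrong prediction on a positive instance

# Log loss contribution of that single sample to the dataset average
contribution = -math.log(p_for_true_positive) / n
print(round(contribution, 3))  # -> 0.018
```

So one badly miscalibrated sample moves the score by roughly the same margin that separates many teams on the leaderboard.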
This is a simulation from out-of-fold predictions, assuming 1/3 of the test data is on the public leaderboard and 2/3 on the private one. I randomly select a test set of the same size as the real one:
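A minimal sketch of that simulation, assuming `oof_preds` and `oof_labels` hold the out-of-fold predictions and true labels (toy stand-ins are generated here so the snippet runs on its own; the 260-sample test size is also an assumption):

```python
import math
import random

random.seed(0)

n_test = 260  # assumed size of the real hidden test set

# Toy stand-ins for real out-of-fold predictions and labels
oof_labels = [random.random() < 0.5 for _ in range(3000)]
oof_preds = [min(max(0.7 * y + 0.3 * random.random(), 1e-6), 1 - 1e-6)
             for y in oof_labels]

def log_loss(labels, preds):
    """Mean binary log loss."""
    return -sum(math.log(p) if y else math.log(1 - p)
                for y, p in zip(labels, preds)) / len(labels)

public_scores, private_scores = [], []
for _ in range(1000):
    idx = random.sample(range(len(oof_labels)), n_test)
    split = n_test // 3  # 1/3 public, 2/3 private
    pub, priv = idx[:split], idx[split:]
    public_scores.append(log_loss([oof_labels[i] for i in pub],
                                  [oof_preds[i] for i in pub]))
    private_scores.append(log_loss([oof_labels[i] for i in priv],
                                   [oof_preds[i] for i in priv]))

# The spread shows how noisy a ~87-sample public split really is
print(min(public_scores), max(public_scores))
```

Even with the same model and the same predictions, the public score jumps around substantially from one random split to the next, which is exactly the CV/LB mismatch being discussed.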
We achieved pretty high ROC AUC and accuracy scores locally, but we ran into exactly the same problem: there is no strong correlation between the local CV fold scores and the public leaderboard scores under the log loss metric. The root cause is that the distribution of the public test set is unknown, which only matters because of the log loss metric; it wouldn't be an issue with most other metrics. It is quite likely that fitting predictions to that distribution yields much better log loss scores than letting the model learn meaningful signal.
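To illustrate that last point with a toy calculation: under log loss, a constant prediction that merely matches the (guessed) test-set prior can beat a prediction that carries no distribution information at all. The 0.3 base rate below is purely my assumption:

```python
import math

def constant_log_loss(p, base_rate):
    """Mean log loss of always predicting p when a fraction base_rate of labels are 1."""
    return -(base_rate * math.log(p) + (1 - base_rate) * math.log(1 - p))

test_base_rate = 0.3  # hypothetical positive rate of the hidden test set

uninformed = constant_log_loss(0.5, test_base_rate)    # always predict 0.5
prior_fitted = constant_log_loss(0.3, test_base_rate)  # always predict the prior
print(uninformed, prior_fitted)  # the prior-fitted constant scores strictly better
```

No signal has been learned in the second case, yet the score improves, which is why leaderboard log loss can reward distribution fitting over modeling.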