Data Quality in the Test Set


I have a question about the Data Quality of the Test Set.
The Problem Description Page mentions possible False Negatives in the Ground Truth Data, but that is not the only issue occurring in the Ground Truth of the Training Set.

Do we have to assume that the Test Set Kelp Ground Truth has the same structure as the Training Set's, including all its shortcomings?

Thank you very much and best regards

Hi @DavidEis, thanks for the thoughtful question. Yes, you should assume that the test set has the same distribution as the training set, including errors.