Data Quality in the Test Set

DavidEis · February 2, 2024, 2:49pm

Hi,

I have a question about the Data Quality of the Test Set.
The Problem Description Page mentions possible False Negatives in the Ground Truth Data, but that is not the only issue in the Ground Truth of the Training Set.

Do we have to assume that the Test Set Kelp Ground Truth has the same structure as the Training Set's, including all its shortcomings?

Thank you very much and best regards
David

cszc · February 2, 2024, 4:16pm

Hi @DavidEis, thanks for the thoughtful question. Yes, you should assume that the test set has the same distribution as the training set, including errors.
