Train and test data consistency

Hello, @kwetstone. Could you please confirm whether the test data used for the final evaluation has undergone the same level of cleaning and quality control as the training data? Thanks.

Hi @qahn,

Yes, the final test data has been processed in the exact same way as the training data. Let us know if you have any more questions!

You’re seeing a CV/LB gap, right? Same here.

Same here. I am seeing a ~10% delta between CV and LB, which is a bit concerning to me.

I found that the training dataset contains a noticeable number of rows that appear to be misclassified, especially in the WeaponType1 column (e.g. drowning misclassified as fall, or unknown classified as sharp instrument). Should we assume the same misclassifications in the public/private test data, or has it undergone more scrutiny?
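For anyone who wants to surface rows like these for review, here is a minimal sketch. Only the WeaponType1 column name comes from this thread; the free-text column name ("Narrative"), the label spellings, and the keyword map are all assumptions for illustration and would need to be adapted to the actual data.

```python
# Hypothetical map: assigned label -> keywords in the free text that
# suggest a different label. Label spellings and keywords are assumptions.
CONFLICT_KEYWORDS = {
    "Fall": ["drown"],
}


def flag_suspect_rows(rows, label_col="WeaponType1", text_col="Narrative"):
    """Return (row_index, label, keyword) for rows whose text conflicts with the label.

    `rows` is a list of dicts, e.g. from csv.DictReader.
    `text_col` is a hypothetical column name, not confirmed by the dataset.
    """
    suspects = []
    for i, row in enumerate(rows):
        label = row.get(label_col, "")
        text = row.get(text_col, "").lower()
        for keyword in CONFLICT_KEYWORDS.get(label, []):
            if keyword in text:
                suspects.append((i, label, keyword))
                break  # one hit per row is enough to flag it
    return suspects


# Toy rows for illustration only, not taken from the competition data.
example_rows = [
    {"WeaponType1": "Fall", "Narrative": "Victim drowned in a lake."},
    {"WeaponType1": "Fall", "Narrative": "Victim fell from a ladder."},
]
suspects = flag_suspect_rows(example_rows)
```

A keyword match only marks a row as worth a manual look; it does not prove the label is wrong.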

Hi @cyong, could you point us to a few of those rows? You can assume that the train and test sets have undergone the same level of data processing.

One question: is the test set split into public and private sets, and is the current leaderboard based on the public one? I can’t find the public/private ratio anywhere.