Hello, @kwetstone. Could you please confirm whether the test data used for the final evaluation has undergone the same level of cleaning and quality control as the training data? Thanks.
Hi @qahn,
Yes, the final test data has been processed in the exact same way as the training data. Let us know if you have any more questions!
You’re seeing a CV/LB gap, right? Same here.
Same here. I am seeing a ~10% delta, which is a bit concerning to me.
I did find that the training dataset has a noticeable number of rows that appear to be misclassified, especially in the WeaponType1 column (e.g. drowning misclassified as Fall, or Unknown classified as Sharp Instrument). Should we assume the same misclassifications are present in the public/private test set, or has it undergone more scrutiny?
Hi @cyong, could you point us to a few of those rows? You can assume that the train and test sets have undergone the same level of data processing.
One question: the test set is split into public/private sets, and the current leaderboard is based only on the public one? I can’t find the public/private ratio anywhere.
Hi @MPWARE, that is correct - the leaderboard is based on the public portion of the test set, and there is a separate private portion that is withheld. We don’t release any additional information about the split; our advice is to create the best solution possible that doesn’t overfit the public leaderboard.
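As a concrete way to do that, you can lean on local cross-validation rather than on the public score. Below is a minimal sketch using scikit-learn, assuming a train.csv with a free-text narrative column and the WeaponType1 target; the path, column names, and the TF-IDF baseline model are placeholders rather than anything from the competition materials.

```python
# Minimal local-validation sketch: 5-fold stratified CV scored with micro-averaged F1.
# Assumptions (not from the competition materials): a train.csv with a free-text
# "narrative" column and the "WeaponType1" target, plus a simple TF-IDF + logistic
# regression pipeline as a stand-in for whatever model you actually use.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv")  # placeholder path
X, y = train["narrative"], train["WeaponType1"]

scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, va_idx) in enumerate(skf.split(X, y)):
    model = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
    model.fit(X.iloc[tr_idx], y.iloc[tr_idx])
    fold_f1 = f1_score(y.iloc[va_idx], model.predict(X.iloc[va_idx]), average="micro")
    scores.append(fold_f1)
    print(f"fold {fold}: micro-F1 = {fold_f1:.4f}")

print(f"mean CV micro-F1 = {sum(scores) / len(scores):.4f}")
```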
Hi @ishashah,
btkc and cgfg are cases of drowning based on what the notes say, yet they are classified as Fall and Unknown respectively.
dsni is classified as Sharp Instrument, but based on the notes it should be Unknown as well.
I have more examples if needed.
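For anyone who wants to surface more candidates like these, a simple keyword scan over the notes is enough. Here is a rough sketch; the path, the uid/narrative column names, the keyword list, and the label strings are all illustrative assumptions rather than the actual schema:

```python
# Rough sketch of a keyword scan for rows whose notes mention drowning but whose
# label is something else. The path, the "uid"/"narrative" column names, the
# keyword list, and the "Drowning" label string are illustrative assumptions.
import pandas as pd

train = pd.read_csv("train.csv")  # placeholder path

drowning_terms = ["drown", "submerged", "found in the water"]
pattern = "|".join(drowning_terms)

mask = (
    train["narrative"].str.contains(pattern, case=False, na=False)
    & (train["WeaponType1"] != "Drowning")
)
suspects = train.loc[mask, ["uid", "WeaponType1", "narrative"]]
print(f"{len(suspects)} rows mention drowning but carry a different WeaponType1 label")
print(suspects.head())
```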
Hi @cyong, thanks for flagging these cases for us. Since this dataset is human-coded, a minimal level of human error is present. Because this is real data from the CDC that has already undergone a few rounds of review, we won’t be updating it unless there is a systematic error throughout the data.
Thanks for the note, and please don’t hesitate to bring up any other concerns you have throughout the competition.
Thanks @ishashah for the clarification. I do want to raise that, in my opinion, an entire class is coded inconsistently, and I would appreciate further clarification. In WeaponType1, the Blunt Instrument class has 4 note examples, and I would classify those 4 notes as Fall, Fall, Vehicle, and Vehicle. Knowing that Blunt Instrument is also a minority class in the test set, the effect is not huge, but any clarification would be appreciated.
Hi @cyong, thank you for pointing out these specific cases. We will certainly flag these instances of miscoding to the CDC.
Given that these cases are a small proportion of all records, we will be keeping them as-is for the duration of the competition. Since the performance metric uses a micro-averaged F1 score for the categorical variables, these rare cases will have a very small impact on performance. Please let us know if you have any other comments!
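To put a rough number on the metric point above: for a single-label categorical target, micro-averaged F1 reduces to overall accuracy, so a handful of miscoded records moves the score by only their share of the data. A toy illustration with synthetic labels (not drawn from the competition data):

```python
# Toy illustration (synthetic labels, not the competition data): for a single-label
# categorical target, micro-averaged F1 equals overall accuracy, so flipping 4 labels
# out of 10,000 changes the score by exactly 4/10,000 = 0.0004.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
classes = ["Firearm", "Sharp Instrument", "Blunt Instrument", "Fall", "Drowning", "Unknown"]
y_true = rng.choice(classes, size=10_000, p=[0.50, 0.20, 0.02, 0.08, 0.05, 0.15])

# Start from a perfect prediction, then miscode 4 non-Fall records as Fall.
y_pred = y_true.copy()
y_pred[np.where(y_true != "Fall")[0][:4]] = "Fall"

print(f1_score(y_true, y_pred, average="micro"))  # 0.9996
```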