Question about the Data

I’ve noticed several cases where the narrative text doesn’t match the encoding in some of the columns. For example, the narrative for uid ‘afih’ states that the victim was kidnapped by their father and abused for 5 years, but the ‘AbusedAsChild’ and ‘InterpersonalViolenceVictim’ columns are both labeled ‘0’ (didn’t occur). Does this mean that there wasn’t concrete evidence that these crimes were the direct cause of the suicide? Or are the columns incorrectly labeled? Or are there columns documenting this that we don’t have? I could see this causing pretty significant issues for both modeling and extraction.


We have similar observations.

Hi @HEgan and @iwyw,

Thanks for flagging this case for us. Because this dataset is human-coded, a small amount of human error is present. This is real data from the CDC that has undergone a few rounds of review, so we won’t be updating it unless there is a systematic error throughout the data.

In this case, the 0 value of InterpersonalViolenceVictim is correct, since the violence did not occur within the last month. Many of the other variables are also coded based on specific time parameters defined in the NVDRS coding manual, which is a good reference for looking up these more nuanced definitions.
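For anyone who wants to gauge how often a narrative mentions something that a column codes as 0, here is a rough Python sketch of a keyword screen. It is only an illustration under assumptions: the `narrative` column name, the keyword patterns, and the toy rows are all hypothetical, and only `uid`, `AbusedAsChild`, and `InterpersonalViolenceVictim` come from this thread. A flagged row is not necessarily an error, since the label may be 0 because of a time window in the coding manual.

```python
import re

# Hypothetical keyword patterns per label column (assumption, not from the
# NVDRS coding manual); adjust them after reading the manual's definitions.
KEYWORDS = {
    "AbusedAsChild": re.compile(r"\b(abus\w*|kidnap\w*|molest\w*)\b", re.IGNORECASE),
    "InterpersonalViolenceVictim": re.compile(r"\b(assault\w*|abus\w*|kidnap\w*)\b", re.IGNORECASE),
}


def flag_mismatches(rows, text_col="narrative"):
    """Return (uid, column) pairs where the text matches a keyword but the label is 0."""
    flagged = []
    for row in rows:
        text = row.get(text_col, "")
        for col, pattern in KEYWORDS.items():
            if int(row[col]) == 0 and pattern.search(text):
                flagged.append((row["uid"], col))
    return flagged


# Toy rows invented for illustration; they are not taken from the dataset.
rows = [
    {"uid": "aaaa", "narrative": "V was abused by a relative years ago.",
     "AbusedAsChild": 0, "InterpersonalViolenceVictim": 0},
    {"uid": "bbbb", "narrative": "V had a history of depression.",
     "AbusedAsChild": 0, "InterpersonalViolenceVictim": 0},
    {"uid": "cccc", "narrative": "V was assaulted last week.",
     "AbusedAsChild": 0, "InterpersonalViolenceVictim": 1},
]

print(flag_mismatches(rows))
```

Treat the output as a list of candidates to check by hand against the coding manual's time parameters, not as a count of labeling errors.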

Please don’t hesitate to bring up any other concerns you have throughout the competition. Good luck!