Question about the Data

HEgan · October 9, 2024, 5:33pm

I’ve noticed there are several circumstances where the narrative text doesn’t match the encoding in some of the columns. For example, uid ‘afih’ states that the victim was kidnapped by their father and abused for 5 years, but the ‘AbusedAsChild’ and ‘InterpersonalViolenceVictim’ columns are both labeled as ‘0’ (didn’t occur). Does this mean that there wasn’t concrete evidence that these crimes were the direct cause of the suicide? Or are the columns incorrectly labeled? Or are there columns that document this that we don’t have? I could see this causing pretty significant issues in modeling algorithms and extraction.

iwyw · October 9, 2024, 8:35pm

We have similar observations.

ishashah · October 10, 2024, 2:13pm

Hi @HEgan and @iwyw,

Thanks for flagging this case for us. Because this dataset is human-coded, a small amount of human error is present. This is real data from the CDC that has undergone a few rounds of review, so we won’t be updating it unless there is a systematic error throughout the data.

In this case, the 0 value for InterpersonalViolenceVictim is correct, since the violence did not occur within the last month. Many of the other variables are also coded based on specific time parameters defined in the NVDRS coding manual, which is a good reference for looking up these more nuanced definitions.

Please don’t hesitate to bring up any other concerns you have throughout the competition. Good luck!
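
For anyone who wants to gauge how often a narrative and a coded column disagree, here is a minimal sketch of a keyword check. It is only a rough screen: the `narrative` field name, the keyword pattern, and the toy rows are all assumptions made up for illustration, and a flagged row is not necessarily a coding error, since the NVDRS coding manual (including its time windows) defines each variable.

```python
import re

# Hypothetical keyword pattern, for illustration only. The NVDRS coding
# manual, not this list, defines what counts toward each variable.
KEYWORDS = {
    "AbusedAsChild": re.compile(r"\b(abuse|abused|molested|kidnapped)\b", re.IGNORECASE),
}


def flag_mismatches(rows, column, pattern, text_field="narrative"):
    """Return uids whose narrative matches the pattern while the column is coded 0."""
    return [
        row["uid"]
        for row in rows
        if int(row[column]) == 0 and pattern.search(row[text_field])
    ]


# Made-up toy rows; "narrative" is an assumed field name, not the real schema.
rows = [
    {"uid": "aaaa", "narrative": "V was abused by a relative for years.", "AbusedAsChild": 0},
    {"uid": "bbbb", "narrative": "V had a history of depression.", "AbusedAsChild": 0},
    {"uid": "cccc", "narrative": "V was abused as a child.", "AbusedAsChild": 1},
]

flagged = flag_mismatches(rows, "AbusedAsChild", KEYWORDS["AbusedAsChild"])
print(flagged)
```

Rows flagged this way are candidates for manual review against the coding manual, not confirmed label errors.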
