Thank you to everyone who made midpoint submissions! The overall quality of submissions was impressive, and we appreciated participants’ thoughtfulness, clarity, and hard work. Midpoint submissions were reviewed by both the CDC and DrivenData, and already show promise for strengthening youth mental health research.
We are still finalizing the selection of midpoint bonus prize recipients. Winners will be notified shortly.
In the meantime, we are sharing some key takeaways and guidance for putting together your final submissions. This post discusses both what made submissions especially strong and some common pitfalls to avoid.
1. Anchor on a hypothesis
The strongest submissions were driven by a specific hypothesis about useful new information to extract from the narratives. In some cases the hypothesis was based on existing literature; in others, it was informed by personal experience. A hypothesis-driven approach contributes to a higher score on the insight component of the evaluation criteria.
Keep in mind that the CDC has already investigated trends in the existing standard variables, and performed basic analysis such as clustering. Analysis of the data and existing standard variables should be mostly driven by an idea for new variables, rather than the other way around.
2. Keep the data private!
Remember that you cannot upload competition data to any third party services or APIs that retain the data, per the external data rules. Participants can use external models, but must do so by loading open-source model weights.
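For example, here is a minimal sketch of the compliant pattern using the Hugging Face transformers library: open-source weights are downloaded once and all inference happens on your machine. The model shown is just one example of an open checkpoint, and the text is a placeholder.

```python
# Compliant pattern: open-source weights run locally, so narrative text
# is never sent to a third-party service that could retain it.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # example open checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Placeholder text only; never paste real narratives into hosted tools.
inputs = tokenizer("Example placeholder text.", return_tensors="pt")
logits = model(**inputs).logits
```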
Narratives have been anonymized for use in the competition, but still capture extremely personal and private information. To engage with these experiences respectfully as a data scientist, make sure not to share the data or enable unintended uses.
3. Build a pipeline
The best submissions tested multiple approaches and found creative ways to combine techniques into a multi-method analysis pipeline. Think critically about the strengths and weaknesses of different pre-trained models, techniques, or methods, and how they could work together in new ways. Submissions that did this, and explained why specific models or approaches were chosen, scored better on the technical novelty evaluation component.
The very best submissions not only combined multiple techniques into one pipeline, but documented that pipeline well. Make it easy for a reader to understand the full flow of your analysis from start to finish. For example, consider including a technical diagram as a visual aid.
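As a rough illustration (not a prescribed approach), the sketch below combines a cheap rule-based screen with a locally loaded zero-shot model into one extraction function. The candidate variable, keywords, and model choice are all hypothetical.

```python
import re

from transformers import pipeline

# Stage 1: a fast rule-based screen using hypothetical keywords.
KEYWORDS = re.compile(r"\b(evict\w*|homeless\w*|housing)\b", re.IGNORECASE)

# Stage 2: a locally loaded open-source zero-shot classifier.
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def extract_housing_instability(narrative: str) -> bool:
    """Hypothetical new variable: does the narrative mention housing instability?"""
    if not KEYWORDS.search(narrative):
        return False  # the cheap stage filters out clear negatives
    result = zero_shot(narrative, candidate_labels=["housing instability", "other"])
    return result["labels"][0] == "housing instability"
```

Pairing a transparent rule-based stage with a model-based stage like this also makes the pipeline easy to diagram and the rationale for each component easy to explain.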
4. Accurately scope your analysis
Include suggestions for future work that the CDC could explore to carry your ideas forward. Your submission can fit into the context of a broader research goal while taking on only part of that goal.
We know that time and resources are limited, and participants cannot explore every possible line of inquiry. Don’t be afraid to note limitations in what you were able to accomplish.
5. Approach behaviors with nuance
The impact of any specific behavior on mental health is nuanced, and rarely just positive or negative. Submissions were scored higher when they avoided making value judgements.
Remember, your submission does not need to draw definitive causal conclusions. The primary goal is to extract information that will be useful to researchers working to protect youth mental health.
6. Use thoughtful visuals
Use well-developed visuals that support your narrative. Make sure each visual is clearly documented, so readers know exactly what it captures. The best visuals helped to tell a story — for example, diagrams of an analysis pipeline.
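As a small sketch of the kind of documentation we mean, the chart below carries its own title, axis label, and an explicit note about what was counted. The counts are invented placeholders, not competition results.

```python
import matplotlib.pyplot as plt

# Hypothetical summary counts; replace with your own aggregated results.
labels = ["Variable present", "Variable absent"]
counts = [120, 380]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, counts)
ax.set_title("Narratives flagged by the proposed variable (placeholder data)")
ax.set_ylabel("Number of narratives")
fig.tight_layout()
fig.savefig("proposed_variable_counts.png", dpi=200)
```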
7. Check all existing variables
Make sure your suggested new variable(s) are not already being extracted. The competition data includes only a subset of the existing standard variables. For the full list, reference the National Violent Death Reporting System Coding Manual.
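A simple way to sanity-check this is a set comparison between your proposed names and the full variable list transcribed from the coding manual. The variable names below are illustrative only.

```python
# Hypothetical check: compare proposed variables against the full list of
# existing NVDRS variables (transcribed from the coding manual).
existing_variables = {
    "DepressedMood", "MentalHealthProblem", "SchoolProblem",  # ...full list here
}
proposed_variables = {"HousingInstability", "SocialMediaConflict"}

already_covered = proposed_variables & existing_variables
genuinely_new = proposed_variables - existing_variables
print(f"Already covered: {sorted(already_covered)}")
print(f"Genuinely new: {sorted(genuinely_new)}")
```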