Thank you to everyone who made midpoint submissions! The overall quality of submissions was impressive, and we appreciated participants’ thoughtfulness, clarity, and hard work. Midpoint submissions were reviewed by both the CDC and DrivenData, and already show promise for strengthening youth mental health research.
We are still finalizing the selection of midpoint bonus prize recipients. Winners will be notified shortly.
In the meantime, we are sharing some key takeaways and guidance for putting together your final submissions. This post discusses both what made submissions especially strong and some common pitfalls to avoid.
1. Anchor on a hypothesis
The strongest submissions were driven by a specific hypothesis about useful new information to extract from the narratives. In some cases the hypothesis was based on existing literature; in others, it was informed by personal experience. A hypothesis-driven approach contributes to a higher score on the insight component of the evaluation criteria.
Keep in mind that the CDC has already investigated trends in the existing standard variables, and performed basic analysis such as clustering. Analysis of the data and existing standard variables should be mostly driven by an idea for new variables, rather than the other way around.
2. Keep the data private!
Remember that you cannot upload competition data to any third party services or APIs that retain the data, per the external data rules. Participants can use external models, but must do so by loading open-source model weights.
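For example, here is a minimal sketch of the compliant pattern using the Hugging Face transformers library: open-source weights are downloaded once and all inference happens on your machine. The model shown is just one example of an open checkpoint, and the text is a placeholder.

```python
# Compliant pattern: open-source weights run locally, so narrative text
# is never sent to a third-party service that could retain it.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # example open checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Placeholder text only; never paste real narratives into hosted tools.
inputs = tokenizer("Example placeholder text.", return_tensors="pt")
logits = model(**inputs).logits
```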
Narratives have been anonymized for use in the competition, but still capture extremely personal and private information. To engage with these experiences respectfully as a data scientist, make sure not to share the data or enable unintended uses.
3. Build a pipeline
The best submissions tested multiple approaches and found creative ways to combine techniques into a multi-method analysis pipeline. Think critically about the strengths and weaknesses of different pre-trained models, techniques, or methods, and how they could work together in new ways. Submissions that did this, and explained why specific models or approaches were chosen, scored better on the technical novelty evaluation component.
The very best submissions not only combined multiple techniques into one pipeline, but documented that pipeline well. Make it easy for a reader to understand the full flow of your analysis from start to finish. For example, consider including a technical diagram as a visual aid.
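As a rough illustration (not a prescribed approach), the sketch below combines a cheap rule-based screen with a locally loaded zero-shot model into one extraction function. The candidate variable, keywords, and model choice are all hypothetical.

```python
import re

from transformers import pipeline

# Stage 1: a fast rule-based screen using hypothetical keywords.
KEYWORDS = re.compile(r"\b(evict\w*|homeless\w*|housing)\b", re.IGNORECASE)

# Stage 2: a locally loaded open-source zero-shot classifier.
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def extract_housing_instability(narrative: str) -> bool:
    """Hypothetical new variable: does the narrative mention housing instability?"""
    if not KEYWORDS.search(narrative):
        return False  # the cheap stage filters out clear negatives
    result = zero_shot(narrative, candidate_labels=["housing instability", "other"])
    return result["labels"][0] == "housing instability"
```

Pairing a transparent rule-based stage with a model-based stage like this also makes the pipeline easy to diagram and the rationale for each component easy to explain.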
4. Accurately scope your analysis
Include suggestions for future work that the CDC could explore to carry your ideas forward. Your submission can fit into the context of a broader research goal while taking on only part of that goal.
We know that time and resources are limited, and participants cannot explore every possible line of inquiry. Don’t be afraid to note limitations in what you were able to accomplish.
5. Approach behaviors with nuance
The impact of any specific behavior on mental health is nuanced, and rarely just positive or negative. Submissions were scored higher when they avoided making value judgements.
Remember, your submission does not need to draw definitive causal conclusions. The primary goal is to extract information that will be useful to researchers working to protect youth mental health.
6. Use thoughtful visuals
Use well-developed visuals that support your narrative. Make sure each visual is clearly documented, so readers know exactly what it captures. The best visuals helped to tell a story — for example, diagrams of an analysis pipeline.
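As a small sketch of the kind of documentation we mean, the chart below carries its own title, axis label, and an explicit note about what was counted. The counts are invented placeholders, not competition results.

```python
import matplotlib.pyplot as plt

# Hypothetical summary counts; replace with your own aggregated results.
labels = ["Variable present", "Variable absent"]
counts = [120, 380]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, counts)
ax.set_title("Narratives flagged by the proposed variable (placeholder data)")
ax.set_ylabel("Number of narratives")
fig.tight_layout()
fig.savefig("proposed_variable_counts.png", dpi=200)
```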
7. Check all existing variables
Make sure your suggested new variable(s) are not already being extracted. The competition data includes only a subset of the existing standard variables. For the full list, reference the National Violent Death Reporting System Coding Manual.
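A simple way to sanity-check this is a set comparison between your proposed names and the full variable list transcribed from the coding manual. The variable names below are illustrative only.

```python
# Hypothetical check: compare proposed variables against the full list of
# existing NVDRS variables (transcribed from the coding manual).
existing_variables = {
    "DepressedMood", "MentalHealthProblem", "SchoolProblem",  # ...full list here
}
proposed_variables = {"HousingInstability", "SocialMediaConflict"}

already_covered = proposed_variables & existing_variables
genuinely_new = proposed_variables - existing_variables
print(f"Already covered: {sorted(already_covered)}")
print(f"Genuinely new: {sorted(genuinely_new)}")
```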