Clarification about Model Features

We have posted a new announcement to both the Acoustic Track and Social Determinants Track to provide some additional clarity around what is permitted for model features. The full text of that announcement is copied below for awareness. Please feel free to reach out with any questions!

Thanks for everyone’s engagement and participation so far! Based on the questions we’ve received, we wanted to provide some more clarity about what is and isn’t allowed when developing features for model training and inference. Solutions should adhere to the following rules:

  • Participants may annotate the provided training data, as long as the annotations are included with solutions to enable reproduction and do not overfit to the test set.
  • Participants may not add any manual annotations to the provided test data. Eligible solutions need to be able to run on test samples automatically using the test data as provided.
  • Each test data sample should be processed independently during inference, without using information from other cases in the test set. As a result, running the model training code with the same training data but a different test set (or no test data at all) should produce the same model weights and fitted feature parameters (see the sketch below this list).

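For teams wondering what this looks like in practice, here is a minimal, hypothetical sketch of a compliant workflow. It assumes a scikit-learn-style pipeline, and the function and variable names are made up for illustration: all feature parameters and model weights are fit on the training data alone, and each test row is scored on its own.

```python
# Minimal illustrative sketch only -- not an official template.
# Assumes a scikit-learn-style workflow; names here are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train(train_features: pd.DataFrame, train_labels: pd.Series):
    """Fit feature parameters and model weights from the training data only.

    The fitted model is identical whatever test data is (or is not) supplied.
    """
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(train_features.values, train_labels.values)
    return model


def predict(model, test_features: pd.DataFrame) -> pd.Series:
    """Score each test sample independently; no statistics are computed across the test set."""
    preds = [
        model.predict_proba(row.values.reshape(1, -1))[0, 1]  # one sample at a time; assumes binary labels
        for _, row in test_features.iterrows()
    ]
    return pd.Series(preds, index=test_features.index)
```

The key point the sketch illustrates is that nothing learned at inference time depends on the test set as a whole, so swapping in a different test set leaves the trained model unchanged.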
For example, in the Acoustic Track, you are welcome to manually label training data with features such as language and task. However, these annotations may not be added to the test data. Your inference pipeline should be able to automatically run on unseen data, without any hand labeling.

Similarly, in the Social Determinants Track, your model should only use the provided feature data as input. For example, you are not permitted to use a person’s reported 2016 composite score to predict their 2021 score.

The problem descriptions in the Acoustic Track and Social Determinants Track have been updated to reflect these clarifications.

The challenge rules are in place to promote fair competition and useful solutions. If you are unsure whether your solution meets the competition rules or have other questions, please feel free to post in the forum or reach out to us at info@drivendata.org. Furthermore, if you believe an existing submission does not meet these requirements, please let us know so that we can remove it.