Is cross track data usage allowed?

wdong · November 11, 2024, 3:12pm

It appears to me that the novel variables track contains some different labels, and proper use of those labels might help improve the score on the automated abstraction track. Could the organizers please clarify?

kwetstone · November 12, 2024, 2:24pm

@wdong The data for the novel variables track does indeed include more standard variables than the automated abstraction track.

The goal of the automated abstraction track is to predict a set of key standard variables based solely on the narratives, without reference to other standard variables. Per the code submission page, the only features available in the code execution runtime to generate predictions are the narratives. The additional standard variables from the novel variables track will not be available.

Does that answer your question? I’m not sure exactly what needs clarification.

bamps53 · November 13, 2024, 11:37am

@kwetstone
Even so, we could still use them as training data. Would that be allowed?

wdong · November 13, 2024, 4:24pm

I understand that the labels won’t be available at inference time. Hypothetically, one could use the extra labels for training only.

kwetstone · November 13, 2024, 5:13pm

@bamps53 @wdong Cross-track data would be allowed during training. Since it is part of the same competition group, it would not be considered external data.

Note that the Novel Variables track includes the same training sample as the Automated Abstraction track; there are no additional cases. The extra standard variables in the Novel Variables track all reflect information in the narratives, so they do not contain new information.
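
As a rough illustration of "training only" use, here is a minimal sketch. It assumes the two tracks' label tables share a case ID; the column names (`uid`, `narrative`, `TargetVar`, `ExtraVar`) and the rows are hypothetical and are not the competition's actual schema. Because both tracks cover the same cases, the join adds columns rather than rows, and the inference step reads nothing but the narrative.

```python
# Hypothetical column names and toy rows; not the competition's real schema.

def merge_training_labels(abstraction_rows, novel_rows, key="uid"):
    """Attach novel-track labels to abstraction-track rows (training only)."""
    novel_by_id = {row[key]: row for row in novel_rows}
    merged = []
    for row in abstraction_rows:
        combined = dict(row)
        extra = novel_by_id.get(row[key], {})
        for name, value in extra.items():
            # Skip the ID and any column the abstraction track already has.
            if name not in combined:
                combined[f"aux_{name}"] = value
        merged.append(combined)
    return merged


def inference_features(row):
    """At prediction time, only the narrative text may be used."""
    return {"narrative": row["narrative"]}


abstraction = [
    {"uid": "a1", "narrative": "text one", "TargetVar": 1},
    {"uid": "a2", "narrative": "text two", "TargetVar": 0},
]
novel = [
    {"uid": "a1", "ExtraVar": 0},
    {"uid": "a2", "ExtraVar": 1},
]

train = merge_training_labels(abstraction, novel)
```

The `aux_` columns could then serve as auxiliary training targets, while anything that runs in the code execution runtime would go through something like `inference_features`.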

wdong · November 14, 2024, 3:12pm

This answers the question. Thanks!
