It appears to me that the novel variables track contains some additional labels, and proper use of those labels might help improve scores in the automated abstraction track. Could the organizers please clarify?
@wdong The data for the novel variables track does indeed include more standard variables than the automated abstraction track.
The goal of the automated abstraction track is to predict a set of key standard variables based solely on the narratives, without reference to other standard variables. Per the code submission page, the only features available in the code execution runtime to generate predictions are the narratives. The additional standard variables from the novel variables track will not be available.
Does that answer your question? I’m not sure exactly what needs clarification.
@kwetstone
Even so, could we still use them as training data? Would that be allowed?
I understand that the labels won’t be available at inference time. Hypothetically one could use the extra labels for training only.
@bamps53 @wdong Cross-track data would be allowed during training. Since it is part of the same competition group, it would not be considered external data.
Note that the Novel Variables track includes the same training sample as the Automated Abstraction track – there are no additional cases. The extra standard variables in the Novel Variables track all reflect information in the narratives, so they do not contain new information.
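For anyone wondering what "training only" use of the cross-track labels could look like, here is a minimal sketch. It assumes both tracks' training files share a case ID; the IDs, column names (`key_var`, `extra_var`), and values are all hypothetical placeholders, not the real schema.

```python
# Hypothetical stand-ins for the two tracks' training files, keyed by case ID.
# All IDs, column names, and values are made up for illustration.
abstraction_rows = {
    "c1": {"narrative": "narrative text one", "key_var": 1},
    "c2": {"narrative": "narrative text two", "key_var": 0},
}
novel_rows = {
    "c1": {"extra_var": 0},
    "c2": {"extra_var": 1},
}

KEY_TARGETS = ["key_var"]    # variables scored in the Automated Abstraction track
AUX_TARGETS = ["extra_var"]  # extra standard variables from the Novel Variables track


def build_training_set(abstraction, novel):
    """Join the two tracks on case ID; extra labels become auxiliary targets."""
    examples = []
    for case_id, row in abstraction.items():
        targets = {name: row[name] for name in KEY_TARGETS}
        extra = novel.get(case_id, {})
        # Only add auxiliary targets that actually exist for this case.
        targets.update({name: extra[name] for name in AUX_TARGETS if name in extra})
        # The narrative is the only input feature, matching what is
        # available in the code execution runtime.
        examples.append((row["narrative"], targets))
    return examples


train = build_training_set(abstraction_rows, novel_rows)
```

The model input stays the narrative alone, so nothing about inference changes; the extra variables only add targets during training. Whether those auxiliary targets help is an open question, since they are derived from the same narratives.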
This answers the question. Thanks!