Out-of-sample in private testing

It is mentioned that: "Code from prize-eligible solutions will be collected and run against an out-of-sample verification set to ensure that models can perform comparably on unseen data."

For private testing, which of the following is correct?

  1. The test samples will be drawn from the 1314 labs in the training data (i.e. classification methods such as the RF benchmark code are applicable and sufficient).

  2. The test samples may come from labs not seen in training (i.e. we would need metric learning or a nearest-neighbors method; the BLAST benchmark code is applicable). A sketch contrasting the two readings follows this list.
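
For context, here is a minimal sketch of the difference between the two readings, using scikit-learn. The feature matrices, label arrays, and model settings below are hypothetical placeholders for illustration, not the actual RF or BLAST benchmark code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: X_train/y_train stand in for the labeled training
# set, X_new for the out-of-sample verification set.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 64))
y_train = rng.integers(0, 1314, size=1000)  # lab IDs seen in training
X_new = rng.random((10, 64))

# Reading 1: closed-set classification. Every verification sample is
# assumed to come from one of the 1314 known labs, so a plain
# classifier (in the spirit of the RF benchmark) suffices.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
probs = clf.predict_proba(X_new)  # probabilities over labs seen in training

# Reading 2: open-set retrieval. Verification samples may come from
# labs not in training, so each sample is matched against its nearest
# training examples instead (the BLAST benchmark works in this
# retrieval spirit).
nn = NearestNeighbors(n_neighbors=5).fit(X_train)
dist, idx = nn.kneighbors(X_new)
neighbor_labs = y_train[idx]  # vote or score over the retrieved labs
```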

Also, will the submitted code be run for inference only, or for both training and inference?


Hi @hengcherkeng - Thanks for checking on this; however, we will not be releasing any information about the verification process or data set. Your label predictions are still limited to the columns in the submission format.

I have another question about the eligibility/verification process. I see at least two ways to "run against an out-of-sample" set:

  • We ship a full pipeline (process/code), and the out-of-sample data takes part in the process from the very beginning of the pipeline: for example, clustering that could require all data to be mixed together.

  • We ship the full pipeline/code, but you run only the prediction part (the trained model) against the out-of-sample data, and the training data no longer intervenes directly.

I would like to be sure we are not missing any restriction we need to know about.

Many thanks

Hi @xiaomen - The trained model that is submitted will be used to make predictions on new data, similar to how it would be used in the future. The models won’t be re-trained on new data.
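
In other words, verification appears to be a pure load-and-predict step. A minimal sketch of that pattern, assuming a scikit-learn-style model serialized with joblib (the file names and paths here are hypothetical, not the actual verification harness):

```python
import joblib
import pandas as pd

# Training time (done by the participant before submission): the
# fitted model is serialized alongside the submitted code, e.g.
# joblib.dump(clf, "model.joblib")

# Verification time (run by the organizers): the model is only loaded
# and applied to the unseen data -- no re-training occurs.
clf = joblib.load("model.joblib")
X_new = pd.read_csv("verification_features.csv")  # hypothetical path
preds = pd.DataFrame(clf.predict_proba(X_new), columns=clf.classes_)
preds.to_csv("submission.csv", index=False)
```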
