Out-of-sample in private testing

It is mentioned that: "Code from prize-eligible solutions will be collected and run against an out-of-sample verification set to ensure that models can perform comparably on unseen data."

For private testing, which of the following is correct?

  1. The test samples will be drawn from the 1314 labs in the training data (i.e. classification methods such as the RF benchmark code are applicable and sufficient).

  2. The test samples may come from labs not seen in training (i.e. we would need metric learning or a nearest-neighbors method; the BLAST benchmark code is applicable). A sketch contrasting the two readings follows this list.
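
For context, here is a minimal sketch of the difference between the two readings, using scikit-learn. The feature matrices, label arrays, and model settings below are hypothetical placeholders for illustration, not the actual RF or BLAST benchmark code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: X_train/y_train stand in for the labeled training
# set, X_new for the out-of-sample verification set.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 64))
y_train = rng.integers(0, 1314, size=1000)  # lab IDs seen in training
X_new = rng.random((10, 64))

# Reading 1: closed-set classification. Every verification sample is
# assumed to come from one of the 1314 known labs, so a plain
# classifier (in the spirit of the RF benchmark) suffices.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
probs = clf.predict_proba(X_new)  # probabilities over labs seen in training

# Reading 2: open-set retrieval. Verification samples may come from
# labs not in training, so each sample is matched against its nearest
# training examples instead (the BLAST benchmark works in this
# retrieval spirit).
nn = NearestNeighbors(n_neighbors=5).fit(X_train)
dist, idx = nn.kneighbors(X_new)
neighbor_labs = y_train[idx]  # vote or score over the retrieved labs
```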

Also, will the submitted code be run for inference only, or for both training and inference?


Hi @hengcherkeng - Thanks for checking on this; however, we will not be releasing any information about the verification process or data set. Your label predictions are still limited to the columns in the submission format.

I have another question about the eligibility/verification process. I see at least two ways to "run against an out-of-sample" set:

  • We ship a full pipeline (process/code), and the out-of-sample data takes part in the process from the very beginning of the pipeline: for example, clustering that could require all data to be mixed together.

  • We ship the full pipeline/code, but you run only the prediction part (the trained model) against the out-of-sample data, and the training data no longer intervenes directly.

I would like to be sure we are not missing any restriction we need to know about.

Many thanks

Hi @xiaomen - The trained model that is submitted will be used to make predictions on new data, similar to how it would be used in the future. The models won’t be re-trained on new data.
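
In other words, verification appears to be a pure load-and-predict step. A minimal sketch of that pattern, assuming a scikit-learn-style model serialized with joblib (the file names and paths here are hypothetical, not the actual verification harness):

```python
import joblib
import pandas as pd

# Training time (done by the participant before submission): the
# fitted model is serialized alongside the submitted code, e.g.
# joblib.dump(clf, "model.joblib")

# Verification time (run by the organizers): the model is only loaded
# and applied to the unseen data -- no re-training occurs.
clf = joblib.load("model.joblib")
X_new = pd.read_csv("verification_features.csv")  # hypothetical path
preds = pd.DataFrame(clf.predict_proba(X_new), columns=clf.classes_)
preds.to_csv("submission.csv", index=False)
```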
