Is unsupervised/self-supervised learning on test images allowed?

I am thinking of using transformer-based masked image modeling (i.e. masked image token prediction) for self-supervised learning (SSL) on both the train and test images, e.g. MAE pre-training of a ViT. A minimal sketch of the idea is below.

Is this allowed in this competition?
(I note that some DrivenData competitions allow SSL and some do not; there is no clear rule on this.)
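For concreteness, here is a minimal sketch of what I mean by masked image modeling (all names, shapes, and hyperparameters are hypothetical, and it omits positional embeddings and other details of the real MAE): randomly replace patch tokens with a learned mask token, then train the network to reconstruct the masked pixels.

```python
import torch
import torch.nn as nn

PATCH, DIM, MASK_RATIO = 16, 128, 0.75  # assumed toy hyperparameters

class TinyMAE(nn.Module):
    """Toy masked-image-modeling model (not the official MAE code)."""
    def __init__(self, in_ch=3):
        super().__init__()
        # Patchify via a strided conv: one token per 16x16 patch.
        self.embed = nn.Conv2d(in_ch, DIM, kernel_size=PATCH, stride=PATCH)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(DIM, PATCH * PATCH * in_ch)  # predict raw pixels

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)      # (B, N, DIM)
        B, N, _ = tokens.shape
        mask = torch.rand(B, N, device=x.device) < MASK_RATIO  # True = masked
        tokens = torch.where(
            mask[..., None], self.mask_token.expand(B, N, DIM), tokens
        )
        return self.head(self.encoder(tokens)), mask

# One pretraining step on a dummy batch: the loss only covers masked patches.
model = TinyMAE()
imgs = torch.randn(8, 3, 256, 256)
pred, mask = model(imgs)
target = (
    imgs.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)  # (B, C, 16, 16, p, p)
        .permute(0, 2, 3, 1, 4, 5)
        .reshape(8, -1, PATCH * PATCH * 3)
)
loss = ((pred - target) ** 2)[mask].mean()
loss.backward()
```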

Hi @hengcherkeng,

Thanks for your question. Per the competition rules, any use of the test set, whether for pseudo-labeling, self-supervised learning, or otherwise, is not permitted:

Unless otherwise specified on the Competition Website, for the purposes of quantitative evaluation of Submissions, Participants agree to process each test data sample independently without the use of information from other cases in the test set. By default, this precludes using information gathered across multiple test samples during training, for instance through pseudo labeling. Eligible Submissions and models must be able to run inference on new test data automatically, without retraining the model.

Note: Here, each test data sample corresponds to a patch, rather than a pixel, so you are allowed to use other pixels within that patch to generate a prediction for a single pixel.

Thanks for the reply. I would like to clarify:

" test data sample corresponds to a patch,"

I think each test sample refers to a chip id.
Each chip id has multiple 256x256 patches (over different months), and we use the multiple patches of each chip id to predict a single AGBM map, along these lines:
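(A purely hypothetical sketch; `AGBMNet`, the band count, and the number of months are made up for illustration.)

```python
import torch
import torch.nn as nn

MONTHS, BANDS = 12, 4  # assumed: 12 monthly patches per chip id, 4 bands each

class AGBMNet(nn.Module):
    """Fuse all monthly patches of one chip id into a single AGBM map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(MONTHS * BANDS, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),  # one AGBM value per pixel
        )

    def forward(self, x):  # x: (B, MONTHS * BANDS, 256, 256)
        return self.net(x)

# One test "sample" = one chip id = a stack of its monthly 256x256 patches.
chip = torch.randn(1, MONTHS * BANDS, 256, 256)  # dummy chip
agbm = AGBMNet()(chip)                           # (1, 1, 256, 256): one AGBM map
```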


On a side note:

“Eligible Submissions and models must be able to run inference on new test data automatically, without retraining the model.”

Actually, pseudo-labeling is not quite the same as SSL (one takes its supervision from a model's predictions, the other from the data itself). Further, SSL-trained models can also run inference on new test samples without retraining.
Maybe DrivenData can update its rules … it is now common practice to use a masked-token SSL approach with transformers for better performance. Technology has advanced very fast in the past few years.

This is an interesting question. I agree with @hengcherkeng here that self-supervised techniques could be very useful, and that these would naturally (outside of this comp, perhaps) be applied to new data (i.e. the test data). It's not the same as training on the test data with ground truth, and if done right it shouldn't lead to leakage. But I'm grateful that @hengcherkeng asked this question. I'll not use the test data for SSL pretraining!

You can still use SSL on the train data; it will still improve your results. (E.g. you can Google "masked image modeling" for both transformers and CNNs.)

Completely agree :slight_smile: I will definitely use SSL on the training data. I suspect that the majority of my training will be self-supervised, with only a small amount of finetuning at the end, roughly like the sketch below. At least, that is my intention going into this… hopefully it will work :slight_smile: haha
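(A rough, self-contained sketch of that pretrain-then-finetune flow; the tiny backbone, file path, and hyperparameters are all placeholders, not competition code.)

```python
import torch
import torch.nn as nn

# Stand-in for a backbone that was SSL-pretrained on *train* images only.
encoder = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU())

# Phase 1 (done elsewhere): SSL pretraining on training images, then e.g.
# torch.save(encoder.state_dict(), "ssl_encoder.pt")

# Phase 2: short supervised finetune with a small regression head on top.
head = nn.Conv2d(64, 1, 1)  # per-pixel AGBM regression
model = nn.Sequential(encoder, head)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(2, 4, 256, 256)  # dummy labeled training batch
y = torch.randn(2, 1, 256, 256)  # dummy AGBM targets
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```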

Hi @hengcherkeng,

Thanks for the recommendation; we will take it into account in future competitions. To be clear, the rules permit SSL on the training data but not on the test data. These rules are in place to mitigate the potential for overfitting.
