Is unsupervised/self-supervised learning on test images allowed?

I am thinking of using transformer-based masked image modeling (i.e. masked image token prediction) for self-supervised learning (SSL) on both the train and test images, e.g. MAE pre-training of a ViT. A minimal sketch of the idea is below.

Is this allowed in this competition?
(I note that some DrivenData competitions allow SSL and some do not; there is no clear rule on this.)
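For concreteness, here is a minimal sketch of what I mean by masked image modeling (all names, shapes, and hyperparameters are hypothetical, and it omits positional embeddings and other details of the real MAE): randomly replace patch tokens with a learned mask token, then train the network to reconstruct the masked pixels.

```python
import torch
import torch.nn as nn

PATCH, DIM, MASK_RATIO = 16, 128, 0.75  # assumed toy hyperparameters

class TinyMAE(nn.Module):
    """Toy masked-image-modeling model (not the official MAE code)."""
    def __init__(self, in_ch=3):
        super().__init__()
        # Patchify via a strided conv: one token per 16x16 patch.
        self.embed = nn.Conv2d(in_ch, DIM, kernel_size=PATCH, stride=PATCH)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(DIM, PATCH * PATCH * in_ch)  # predict raw pixels

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)      # (B, N, DIM)
        B, N, _ = tokens.shape
        mask = torch.rand(B, N, device=x.device) < MASK_RATIO  # True = masked
        tokens = torch.where(
            mask[..., None], self.mask_token.expand(B, N, DIM), tokens
        )
        return self.head(self.encoder(tokens)), mask

# One pretraining step on a dummy batch: the loss only covers masked patches.
model = TinyMAE()
imgs = torch.randn(8, 3, 256, 256)
pred, mask = model(imgs)
target = (
    imgs.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)  # (B, C, 16, 16, p, p)
        .permute(0, 2, 3, 1, 4, 5)
        .reshape(8, -1, PATCH * PATCH * 3)
)
loss = ((pred - target) ** 2)[mask].mean()
loss.backward()
```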

Hi @hengcherkeng,

Thanks for your question. Per the competition rules, any use of the test set, whether for pseudo-labeling, self-supervised learning, or otherwise, is not permitted:

Unless otherwise specified on the Competition Website, for the purposes of quantitative evaluation of Submissions, Participants agree to process each test data sample independently without the use of information from other cases in the test set. By default, this precludes using information gathered across multiple test samples during training, for instance through pseudo labeling. Eligible Submissions and models must be able to run inference on new test data automatically, without retraining the model.

Note: Here, each test data sample corresponds to a patch, rather than a pixel, so you are allowed to use other pixels within that patch to generate a prediction for a single pixel.

Thanks for the reply. I would like to clarify:

" test data sample corresponds to a patch,"

I think each test sample refers to a chip id.
Each chip id has multiple 256x256 patches (over different months), and we use the multiple patches of each chip id to predict a single AGBM map, along these lines:
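(A purely hypothetical sketch; `AGBMNet`, the band count, and the number of months are made up for illustration.)

```python
import torch
import torch.nn as nn

MONTHS, BANDS = 12, 4  # assumed: 12 monthly patches per chip id, 4 bands each

class AGBMNet(nn.Module):
    """Fuse all monthly patches of one chip id into a single AGBM map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(MONTHS * BANDS, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),  # one AGBM value per pixel
        )

    def forward(self, x):  # x: (B, MONTHS * BANDS, 256, 256)
        return self.net(x)

# One test "sample" = one chip id = a stack of its monthly 256x256 patches.
chip = torch.randn(1, MONTHS * BANDS, 256, 256)  # dummy chip
agbm = AGBMNet()(chip)                           # (1, 1, 256, 256): one AGBM map
```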


On a side note:

“Eligible Submissions and models must be able to run inference on new test data automatically, without retraining the model.”

Actually, pseudo-labeling is not quite the same as SSL (one takes its supervision from a model's predictions, the other from the data itself). Further, SSL-trained models can also run inference on new test samples without retraining.
Maybe DrivenData can update its rules … it is now common practice to use a masked-token SSL approach with transformers for better performance. Technology has advanced very fast in the past few years.

This is an interesting question. I agree with @hengcherkeng here that self-supervised techniques could be very useful, and that these would naturally (outside of this comp, perhaps) be applied to new data (i.e. the test data). It's not the same as training on the test data with ground truth, and if done right it shouldn't lead to leakage. But I'm grateful that @hengcherkeng asked this question. I'll not use the test data for SSL pretraining!

You can still use SSL on the train data; it will still improve your results. (E.g. you can Google "masked image modeling" for both transformers and CNNs.)

Completely agree :slight_smile: I will definitely use SSL on the training data. I suspect that the majority of my training will be self-supervised, with only a small amount of finetuning at the end, roughly like the sketch below. At least, that is my intention going into this… hopefully it will work :slight_smile: haha
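(A rough, self-contained sketch of that pretrain-then-finetune flow; the tiny backbone, file path, and hyperparameters are all placeholders, not competition code.)

```python
import torch
import torch.nn as nn

# Stand-in for a backbone that was SSL-pretrained on *train* images only.
encoder = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU())

# Phase 1 (done elsewhere): SSL pretraining on training images, then e.g.
# torch.save(encoder.state_dict(), "ssl_encoder.pt")

# Phase 2: short supervised finetune with a small regression head on top.
head = nn.Conv2d(64, 1, 1)  # per-pixel AGBM regression
model = nn.Sequential(encoder, head)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(2, 4, 256, 256)  # dummy labeled training batch
y = torch.randn(2, 1, 256, 256)  # dummy AGBM targets
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```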

Hi @hengcherkeng,

Thanks for the recommendation; we will take it into account in future competitions. To be clear, the rules permit SSL on the training data but not on the test data. These rules are in place to mitigate the potential for overfitting.
