Phase 2 Submissions Cheating?

In my opinion, the purpose of creating an AI model is to let the model do the work automatically. If you have a model and a test set, label that test set (by human or machine), and then feed the labeled test set back into the model before running it on that same test set, it clearly defeats the purpose of creating an AI model. A model that has been fed the labeled test set will yield a high accuracy score, because it already has an idea of the results it is being tested on. My point is that an AI model should not have any knowledge of the test set (it should learn only from the dev and training sets), because that is the purpose of a test. In the real world, a model has no idea or clue about the test it will face (the social media posts, text and images, that a user will submit). If a model needs to feed itself labeled test data before it can predict accurately, then in my opinion it is a failed model.

This is my first competition and I really learned a lot. Thanks, Driven Data and Facebook AI, for this opportunity. Congratulations to everyone!


@ipr999 In this competition pseudo-labeling wasn’t allowed, which was a choice of the organizers.
As competitions are good for learning, here is something new for you to learn:
Pseudo-labeling is not cheating! It is a valid machine learning technique used both in academia and in real-world scenarios. In pseudo-labeling you take the model you trained on the training data (with a good validation score) and use it to predict the test set. You then add the predicted test set to the training data and keep iterating (a rough sketch is shown below). This has several advantages; one of them is that it acts as a form of regularization, which can reduce overfitting.
A search of arXiv will return hundreds of papers describing the advantages of pseudo-labeling.
If you search on Google, you’ll find good explanations of what it is and of its advantages.
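To make the idea concrete, here is a minimal sketch of one pseudo-labeling round. It uses scikit-learn with a logistic regression just as a stand-in model; the arrays X_train, y_train, X_test and the confidence threshold are placeholders for illustration, not a specific solution from this competition. Note that only the model's own predictions are used, never the true test labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_round(X_train, y_train, X_test, threshold=0.9):
    """One round of pseudo-labeling: train, predict the unlabeled test
    set, keep confident predictions, add them to training, retrain."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Predict the unlabeled test set and keep only confident predictions.
    probs = model.predict_proba(X_test)
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    keep = confidence >= threshold

    # Add the confidently pseudo-labeled examples to the training data
    # and retrain; in practice you would iterate this several times.
    X_aug = np.vstack([X_train, X_test[keep]])
    y_aug = np.concatenate([y_train, pseudo_labels[keep]])
    model.fit(X_aug, y_aug)
    return model
```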

Thanks for the info. But I don’t get the logic of adding the test set to training when we have already been provided with 9,540 samples (dev_seen + dev_unseen + training). Are these samples not enough to create a good model? Do we still need to resort to pseudo-labeling given that we have a large number of samples from the dev and training sets?

@ipr999 9,540 samples is definitely not enough to train a good model, and in this case it is effectively even fewer because dev_seen and dev_unseen are almost completely the same. That is actually one of the main problems with this competition: there isn’t enough training data.
The models have so many parameters that they can easily memorize (overfit) the dataset instead of learning how to generalize. You can see that after some epochs the network starts overfitting to the training set (the training loss is almost 0); a rough sketch of how to catch this is below.
In machine learning, more distinct data is always better.
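As an illustration of what I mean by watching for overfitting, here is a hedged PyTorch-style sketch that tracks validation loss and stops early when it stops improving. The model, data loaders, loss function, and patience value are all placeholders, not anyone's actual training setup.

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader, loss_fn,
                              optimizer, max_epochs=20, patience=3):
    """Stop training when validation loss stops improving, the usual sign
    that the model has started memorizing the training set."""
    best_val, best_state, bad_epochs = float("inf"), None, 0

    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()

        # Evaluate on held-out data; the training loss alone can go to ~0
        # while the validation loss gets worse (overfitting).
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item()
                           for x, y in val_loader) / len(val_loader)

        if val_loss < best_val:
            best_val = val_loss
            best_state = copy.deepcopy(model.state_dict())
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break

    model.load_state_dict(best_state)
    return model
```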

Just to give you an example of the sizes used to train the standard models, whose weights are used when you choose to fine-tune a pre-trained model:

  • COCO dataset has 200K+ images
  • ImageNet dataset has 14M images

Of course those problems have many more classes, so they naturally need bigger datasets, but even if you calculate images per class you’ll see that they have plenty more images than this competition.


If I can still remember my basic statistics, 9,540 samples would be considered an appropriate sample size, but of course you are right that the larger the sample size the better. I respect your opinion about the sample size, but I think we can still create a good model with 9,540 samples, as long as we apply new ways to dissect the image and text data and convert them into new data; in my case I created 9,396 text samples and 29,778 image samples.
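Just to give a rough idea of the kind of expansion I mean on the image side, here is a hedged sketch using torchvision transforms to turn each original image into a few extra training variants. The specific transforms, the number of variants, and the helper function are purely illustrative, not my actual pipeline.

```python
from PIL import Image
from torchvision import transforms

# Hypothetical augmentation pipeline: random crops and color jitter
# produce slightly different copies of each original image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

def expand_image(path, n_variants=3):
    """Return the original image plus n_variants augmented copies."""
    img = Image.open(path).convert("RGB")
    return [img] + [augment(img) for _ in range(n_variants)]
```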

I think you are absolutely right. In fact, our team members had exactly the same questions in mind. We noted that one of the competitors entered and posted a “super-human” entry for both phase 1 and phase 2 on the same day (close to the end of the competition) and stayed there ever since. This looked highly suspicious to us, but to maintain the decorum of the competition we never complained!

I think it should definitely be scrutinised. In fact, leading experts from industry and academia should be invited to review the entries. It seemed a little odd to me that FB declared the winners based on the leaderboard numbers alone!

Hi all – The leaderboard has been updated based on a review of the top eligible scores. Thanks for your patience during this process. Don’t forget, you can hear from the prize finalists at the competition session at NeurIPS this coming Friday, and all winning solutions will be shared openly with a final announcement of the results in the coming weeks.

Thanks to everyone for all your hard work on this challenge.
