Hello to all the participants of the Meta Video Similarity Challenge! It’s exciting to see so many of you pushing the boundaries of performance across both tracks of this challenge.
As we move closer to the end of Phase 1, I wanted to remind you all that for your submissions to be eligible for Phase 2 of the competition, you must include a code execution component in your submission containing code capable of running inference on a subset of videos in the test dataset. For more details, please see the code submission format page for each track (descriptor track link, matching track link).
Please continue to post any questions or clarifications you have here, and good luck!
@chrisk-dd Thank you for this competition. It is a great opportunity to learn new things.
I have a question regarding the Matching Track – evaluation and submission.
From the competition details: “For your Phase 1 submissions, you will submit a list of predicted matches for all videos in the query test set, as well as code capable of generating predicted matches for an unseen query video and database.
In Phase 2, you will have access to approximately 8,000 new query videos, which may or may not contain content derived from videos in the Phase 1 test reference corpus.”
According to the baseline implementation provided on GitHub (vsc2022/baseline.md at main · facebookresearch/vsc2022 · GitHub), query and reference test descriptors are required to perform matching and produce the predictions in full_matches.csv (the sscd_baseline.py script). But the pseudocode version of the main.py script focuses only on a subset of query video ids. It seems I am missing something, but how can the matches be obtained without the reference videos? Do we have access to the reference test videos from main.py on your compute cluster in the Matching Track?
What is the proper way to implement the baseline matching in main.py? Do we have to calculate query and reference descriptors together on your compute cluster, as in the Descriptor Track, before performing matching? Is it okay (reasonable) to pre-calculate descriptors for all videos in the reference test set and provide them in the submission zip file (so as not to exceed the 140 min limit)? Or is a better (viable?) approach to build a matching-based model locally with the training reference videos and use it on your compute cluster to decide whether a query test video is derived from a video in the reference set?
And, to be clear, in Phase 2 we should expect that our algorithm will only see a new set of query videos, right? The reference test videos will not be updated in this case, right?
Hey @igla! Thanks for your great questions! I hope I can provide some clarity about what we expect you to submit for the Matching Track.
You are correct that the main.py pseudocode script focuses on generating matches for only a subset of query videos. This is because we expect you to pre-calculate and attach to your submission any data you need in order to be able to conduct matching on this subset of test query videos, including descriptors or other information about the reference set.
In the Descriptor Track, the need to attach pre-computed data on the reference set was a bit clearer because we explicitly require you to submit reference video descriptors. Because the Matching Track is more open-ended, we don’t explicitly require you to submit these descriptors. However, if you are adapting the baseline solution, you would want to do exactly that: attach your reference descriptors to your submission (much as you might have done for the Descriptor Track) and use them to conduct matching with the descriptors you generate for the test query videos.
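For instance, here is a rough sketch (my own illustration, not part of the provided baseline code) of pre-computing reference descriptors locally and saving them to a single file that you would then bundle into your submission zip. The `extract_descriptor` callable is a placeholder for whatever embedding model you actually use:

```python
from pathlib import Path
from typing import Callable

import numpy as np


def save_reference_descriptors(
    reference_dir: str,
    extract_descriptor: Callable[[Path], np.ndarray],
    out_path: str = "reference_descriptors.npz",
) -> None:
    """Pre-compute one descriptor array per reference video and save them all
    to a single .npz file that can be bundled into the submission zip."""
    descriptors = {}
    for video_path in sorted(Path(reference_dir).glob("*.mp4")):
        # extract_descriptor stands in for your model (e.g. an SSCD-style frame
        # embedder) and is assumed to return a (num_frames, dim) float32 array.
        descriptors[video_path.stem] = extract_descriptor(video_path)
    # Each reference video id becomes a named array inside the archive.
    np.savez(out_path, **descriptors)
```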
In other words, to your questions:
“Is it okay to pre-calculate descriptors for all videos in the reference test set and provide them in the submission zip file? Or to build a matching-based model locally with the training reference videos and use it on your compute cluster to decide whether a query test video is derived from the reference set?”
I think the answer to both of these questions is “yes.” It is okay for you to pre-calculate reference descriptors on the test set and provide them to your code submission, and it is okay for you to train a matching-based model with train reference videos locally and use that model to conduct inference in the code execution environment.
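To make that concrete on the code execution side, a main.py-style sketch (again just an illustration, not the official script) might load the bundled reference descriptors and produce scored matches for the given subset of query videos as shown below. Note that the output columns here are simplified and omit the temporal localization information the Matching Track expects; please check the code submission format page for the exact schema, and treat the threshold as a hypothetical value:

```python
from pathlib import Path
from typing import Callable, Iterable

import numpy as np
import pandas as pd


def generate_matches(
    query_video_paths: Iterable[Path],
    extract_descriptor: Callable[[Path], np.ndarray],
    reference_npz: str = "reference_descriptors.npz",
    score_threshold: float = 0.5,
) -> pd.DataFrame:
    """Score each query video against every pre-computed reference descriptor
    and keep pairs whose similarity clears a (hypothetical) threshold."""
    reference = np.load(reference_npz)  # file shipped inside the submission zip
    # L2-normalise the reference descriptors once up front.
    ref_descriptors = {
        ref_id: reference[ref_id]
        / np.linalg.norm(reference[ref_id], axis=1, keepdims=True)
        for ref_id in reference.files
    }
    rows = []
    for query_path in query_video_paths:
        q = extract_descriptor(query_path)                # (num_frames, dim)
        q = q / np.linalg.norm(q, axis=1, keepdims=True)  # L2-normalise
        for ref_id, r in ref_descriptors.items():
            # Max frame-to-frame cosine similarity as a simple match score.
            score = float((q @ r.T).max())
            if score >= score_threshold:
                rows.append(
                    {"query_id": query_path.stem, "ref_id": ref_id, "score": score}
                )
    return pd.DataFrame(rows, columns=["query_id", "ref_id", "score"])
```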
“And, to be clear, in Phase 2 we should expect that our algorithm will only see a new set of query videos, right? The reference test videos will not be updated in this case, right?”
Yes, you are correct. You will only see new query videos in Phase 2.
I hope this helps answer your questions - please follow up if there’s anything I’ve missed!