20k+ null reference_id in public_ground_truth?

Most query images in public_ground_truth.csv don’t have a reference_id.
Is the same expected for the test set (the other 25k images)?
Thank you

Hi henrique! Welcome to the challenge, and thanks for the question.

You are correct in noting that most query images in the public ground truth do not have a reference id. These are the “distractor” queries.

There are a total of 50K query images, of which 10K have matches in the reference set, while the remaining 40K do not. See below a relevant section of the “2021 Image Similarity Dataset and Challenge” paper:

Development query image set (Phase I): 10,000 images from the reference set mixed with 40,000 distractor images that are not part of the reference set, that have been edited in various ways. Distractor image queries have no matching counterpart in the set of 1 million reference images.
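
If you want to check the split in your own copy of the file, here is a minimal sketch. It assumes the CSV has `query_id` and `reference_id` columns and that distractor queries have an empty `reference_id`. The helper name `split_queries` and the sample rows are made up for illustration and are not part of the challenge code.

```python
import csv
import io


def split_queries(csv_file):
    """Return (matched, distractors) lists of query ids from a ground-truth CSV."""
    matched, distractors = [], []
    for row in csv.DictReader(csv_file):
        # Assumption: an empty reference_id means "no match in the reference set".
        if (row.get("reference_id") or "").strip():
            matched.append(row["query_id"])
        else:
            distractors.append(row["query_id"])
    return matched, distractors


# Tiny made-up sample standing in for public_ground_truth.csv.
sample = io.StringIO(
    "query_id,reference_id\n"
    "Q00001,R000123\n"
    "Q00002,\n"
    "Q00003,\n"
    "Q00004,R000456\n"
    "Q00005,\n"
)
matched, distractors = split_queries(sample)
print(len(matched), "matched,", len(distractors), "distractors")
```

To run it on the real file, pass `open("public_ground_truth.csv", newline="")` instead of the sample. With the 10K/40K split quoted above, about 80% of queries are distractors, so roughly 20K empty values among 25K rows would be in line with that proportion.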

I hope this helps - good luck!
