Performance metric calculated incorrectly?

This might be a super silly question, but here I go…

I was playing around with the quickstart file.
When I ran the original script as-is, I got an mAP of “0.038531746031746035”.
Then I changed the number of predictions per query from 20 to 30 by editing the 13th line in “submission_quickstart/main.py” like this:

result_images = database_image_ids[database_image_ids != query_image_id][:30].tolist()

The metric then improved to “0.06402116402116402”, which surprised me.
Is this expected behavior?
I feel that “score_submissions.py” should either raise an error when more than 20 predictions are provided for a single query, or the metric calculation should only use each query’s 20 highest-scoring predictions.
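
For illustration, here is a minimal sketch of that second option. It assumes the submission is loaded as a pandas DataFrame with query_id, database_image_id, and score columns; those names are my assumption and may not match the actual submission format:

```python
# Minimal sketch (not the competition's scoring code) of keeping only each
# query's 20 highest-scoring predictions before computing mAP.
# Assumed columns: query_id, database_image_id, score.
import pandas as pd

def truncate_to_top_k(submission: pd.DataFrame, k: int = 20) -> pd.DataFrame:
    """Keep only the k highest-scoring predictions per query."""
    return (
        submission.sort_values("score", ascending=False)
        .groupby("query_id", group_keys=False)
        .head(k)
    )

# Toy example: 30 rows for one query get cut down to 20 before scoring.
toy = pd.DataFrame({
    "query_id": ["q1"] * 30,
    "database_image_id": [f"img{i}" for i in range(30)],
    "score": [1.0 - i * 0.01 for i in range(30)],
})
assert len(truncate_to_top_k(toy)) == 20
```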

Thanks!

Hi @mizuhirosuzuki,

The provided score_submissions.py is very lightweight and only does a small amount of validation. It is not intended to be exhaustive. You are correct that it does not check for a limit on the number of predictions.

The scoring in the platform does more extensive validation, including the prediction limit, valid values, missing queries, etc. If you make a submission to the platform with more than 20 predictions for any query, that submission will fail validation.
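
To give a sense of what that looks like, here is a rough sketch of those kinds of checks (prediction-count limit and missing queries). This is not the platform’s actual validation code, and the column names and the expected_query_ids input are assumptions:

```python
# Rough sketch of submission validation: enforce a per-query prediction limit
# and make sure every expected query has predictions.
# Assumed column: query_id.
import pandas as pd

def validate_submission(submission: pd.DataFrame,
                        expected_query_ids: set,
                        limit: int = 20) -> None:
    # Reject any query with more than `limit` predicted rows.
    counts = submission.groupby("query_id").size()
    over_limit = counts[counts > limit]
    if not over_limit.empty:
        raise ValueError(
            f"{len(over_limit)} queries exceed the {limit}-prediction limit"
        )
    # Reject submissions that are missing predictions for any expected query.
    missing = expected_query_ids - set(submission["query_id"])
    if missing:
        raise ValueError(f"Missing predictions for {len(missing)} queries")
```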

Please see the submission format section of the competition documentation for guidance on what your submission needs to look like.

Thank you for asking this question. If something seems confusing to a lot of competitors, we may update the documentation or the provided resources to help clarify things.
