Custom descriptor_eval.py and retrieval evaluation requirements

I have two questions about the retrieval evaluation requirements for the descriptor track:

  1. Can we customize descriptor_eval.py, for example by using the provided SSCD score_normalization from vsc/baseline/sscd_baseline.py? If we do something like this, how does the scoring work? Does the runtime use the provided code to evaluate the uploaded embeddings and also execute main.py separately? Or does it use some default code to evaluate the uploaded embeddings and just run main.py separately to verify that it executes?

  2. Assuming we can customize evaluation, can we do additional computation after computing the embeddings (as part of evaluation)? In my case, I would like to compute embeddings and then combine query and reference embeddings with some (very cheap) function. This would output meta-embeddings that could be fed to, e.g., _global_threshold_knn_search as normal; a rough sketch of what I mean follows this list. The downside is that the meta-embeddings would exist per query-reference pair, and I’m not sure whether that violates the “one embedding per second” requirement. I would consider it part of the retrieval lookup function. Any guidance on this?
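To make the idea concrete, here is a rough sketch of the kind of cheap pairwise combination I have in mind (the names and the combination function are purely illustrative, not actual challenge code):

```python
import numpy as np

def combine(query_emb: np.ndarray, ref_emb: np.ndarray) -> np.ndarray:
    """Cheap pairwise combination producing one 'meta-embedding' per
    query-reference pair (purely illustrative)."""
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_emb / np.linalg.norm(ref_emb)
    return q * r  # e.g. element-wise product of normalized embeddings

# 10 query descriptors and 20 reference descriptors (one per second)
queries = np.random.rand(10, 512).astype(np.float32)
refs = np.random.rand(20, 512).astype(np.float32)

# The combined output is per query-reference pair, not per second,
# which is what raises the "one embedding per second" question.
meta = np.stack([combine(q, r) for q in queries for r in refs])
print(meta.shape)  # (200, 512)
```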

Hey @simple_machine-

Does the runtime use the provided code to evaluate the uploaded embeddings and also execute main.py separately? Or does it use some default code to evaluate the uploaded embeddings and just run main.py separately to verify that it executes?

The runtime runs the entrypoint.sh shell script, which first attempts to run the provided main.py, if it exists, to generate a subset of the query descriptors, and then runs descriptor_eval.py to evaluate this subset against the corresponding subset of the ground truth. It then runs descriptor_eval.py on the entire set of submitted query and reference descriptors.
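Roughly, the flow looks like this (Python used here only for illustration; the actual entrypoint.sh is a shell script, and the file names and arguments below are placeholders rather than the exact ones used):

```python
import subprocess
from pathlib import Path

# Illustrative sketch of the entrypoint flow only; file names and
# command-line arguments are placeholders.

if Path("main.py").exists():
    # 1. Run the submitted code to regenerate descriptors for a subset
    #    of the query videos.
    subprocess.run(["python", "main.py"], check=True)

    # 2. Evaluate that regenerated subset against the matching subset
    #    of the ground truth.
    subprocess.run(
        ["python", "descriptor_eval.py",
         "--query_features", "subset_query_descriptors.npz",
         "--ref_features", "reference_descriptors.npz",
         "--ground_truth", "subset_ground_truth.csv"],
        check=True,
    )

# 3. Evaluate the full set of submitted query and reference descriptors.
subprocess.run(
    ["python", "descriptor_eval.py",
     "--query_features", "query_descriptors.npz",
     "--ref_features", "reference_descriptors.npz",
     "--ground_truth", "ground_truth.csv"],
    check=True,
)
```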

Because retrieval must be run in the same way for all participants, it is not possible to customize descriptor_eval.py.
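For intuition only, the fixed retrieval step is conceptually an exhaustive similarity search with a single global score threshold applied across all query-reference pairs. The snippet below is a simplified illustration of that idea, not the actual descriptor_eval.py code:

```python
import numpy as np

def global_threshold_search(query: np.ndarray, ref: np.ndarray, threshold: float):
    """Conceptual sketch of global-threshold retrieval (not descriptor_eval.py).

    Returns (query_idx, ref_idx, score) for every descriptor pair whose
    inner-product similarity exceeds one threshold shared by all queries.
    """
    scores = query @ ref.T                   # all pairwise similarities
    qi, ri = np.nonzero(scores > threshold)  # one global cutoff, not per-query
    return qi, ri, scores[qi, ri]
```

The resulting candidate pairs are then scored against the ground truth in the same way for every submission.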

Can we do additional computation after computing the embeddings (as part of evaluation)?

I believe this is answered above, as it is not possible to modify descriptor_eval.py. In addition, please note the independence criterion in the rules:

Submitted descriptors for a video may not make use of other videos (query or reference) in the test set.

Please let me know if you have any additional questions or clarifications!

-Chris

Thanks, that answers everything!