Hello, thanks for bringing this copy detection competition to us! I'm really looking forward to taking part.
But I ran into a problem when running make test-submission in the drivendataorg/meta-vsc-descriptor-runtime repository on GitHub (the containerized runtime for the Descriptor Track of the Meta Video Similarity Competition).
Here is the main part of the log in submission/log.txt:
Validating submission...
+ conda run --no-capture-output -n condaenv python /opt/validation.py --query_features subset_query_descriptors.npz --ref_features reference_descriptors.npz --query_metadata /data/query_metadata.csv --ref_metadata /data/reference_metadata.csv --subset /data/query_subset.csv
+ echo 'Running similarity search to generate subset rankings for scoring...'
Running similarity search to generate subset rankings for scoring...
+ conda run --no-capture-output -n condaenv python /opt/descriptor_eval.py --query_features subset_query_descriptors.npz --ref_features reference_descriptors.npz --candidates_output subset_rankings.csv
2022-12-19 07:23:52 INFO Starting Descriptor level eval
2022-12-19 07:23:52 INFO Loaded 83 query features
2022-12-19 07:23:52 INFO Loaded 40318 ref features
2022-12-19 07:23:52 INFO Performing search for 99600 nearest vectors
2022-12-19 07:23:54 INFO too many results 12255872 > 199200, scaling back radius
2022-12-19 07:23:54 INFO too many results 328200 > 199200, scaling back radius
2022-12-19 07:23:55 INFO too many results 237113 > 199200, scaling back radius
2022-12-19 07:23:55 INFO too many results 213381 > 199200, scaling back radius
2022-12-19 07:23:55 INFO search done in 2.421 s + 0.244 s, total 161285 results, end threshold 14.4867
2022-12-19 07:23:58 INFO Got 96303 unique video pairs.
2022-12-19 07:23:58 INFO Limiting to 2075 highest score pairs.
2022-12-19 07:23:58 INFO Storing candidates to subset_rankings.csv
+ echo '... finished'
... finished
+ echo '... finished'
... finished
+ echo 'Running similarity search to generate rankings for scoring...'
Running similarity search to generate rankings for scoring...
+ conda run --no-capture-output -n condaenv python /opt/descriptor_eval.py --query_features query_descriptors.npz --ref_features reference_descriptors.npz --candidates_output full_rankings.csv
2022-12-19 07:24:01 INFO Starting Descriptor level eval
2022-12-19 07:24:01 INFO Loaded 8295 query features
2022-12-19 07:24:02 INFO Loaded 40318 ref features
2022-12-19 07:24:02 INFO Performing search for 9954000 nearest vectors
2022-12-19 07:24:05 INFO too many results 36767616 > 19908000, scaling back radius
2022-12-19 07:24:09 INFO too many results 23277797 > 19908000, scaling back radius
2022-12-19 07:24:11 INFO too many results 20995243 > 19908000, scaling back radius
2022-12-19 07:24:17 INFO too many results 39909330 > 19908000, scaling back radius
2022-12-19 07:24:20 INFO too many results 20196043 > 19908000, scaling back radius
2022-12-19 07:24:24 INFO too many results 19990746 > 19908000, scaling back radius
2022-12-19 07:24:47 INFO too many results 39617011 > 19908000, scaling back radius
2022-12-19 07:25:16 INFO too many results 20004222 > 19908000, scaling back radius
2022-12-19 07:25:30 INFO search done in 79.957 s + 7.992 s, total 12054860 results, end threshold 14.8545
/tmp/tmp76jdk3kh: line 3: 73 Killed python /opt/descriptor_eval.py --query_features query_descriptors.npz --ref_features reference_descriptors.npz --candidates_output full_rankings.csv
ERROR conda.cli.main_run:execute(49): `conda run python /opt/descriptor_eval.py --query_features query_descriptors.npz --ref_features reference_descriptors.npz --candidates_output full_rankings.csv` failed. (See above for error)
It looks like the subset was processed without problems, but processing the full set failed: the descriptor_eval.py process was killed right after the search finished. I can't see a more detailed error log. Could you please take a look?
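
In case it helps to compare the two runs, here is a minimal Python sketch that summarizes each descriptor_eval.py run from the log text. The function name summarize_runs and the dictionary keys are my own (hypothetical), and the patterns are based only on the log lines pasted above, so they may not match other versions of the runtime. The "Killed" line suggests the process received SIGKILL; I suspect it ran out of memory, but I have not confirmed that.

```python
import re


def summarize_runs(log_text):
    """Return one summary dict per descriptor_eval.py run found in the log.

    Hypothetical helper: the patterns below are taken from the log lines
    in this post, not from any documented log format.
    """
    runs = []
    current = None
    for line in log_text.splitlines():
        # Each run starts with this INFO line.
        if "INFO Starting Descriptor level eval" in line:
            current = {
                "query": None,      # number of query features loaded
                "ref": None,        # number of reference features loaded
                "retries": 0,       # count of "too many results" lines
                "results": None,    # total results reported by the search
                "killed": False,    # True if the shell reported "Killed"
            }
            runs.append(current)
            continue
        if current is None:
            continue

        loaded = re.search(r"Loaded (\d+) (query|ref) features", line)
        total = re.search(r"total (\d+) results", line)
        if loaded:
            current[loaded.group(2)] = int(loaded.group(1))
        elif "too many results" in line:
            current["retries"] += 1
        elif total:
            current["results"] = int(total.group(1))
        elif re.search(r"\bKilled\b", line):
            current["killed"] = True
    return runs


# Example usage (path as in this post):
# with open("submission/log.txt") as f:
#     for run in summarize_runs(f.read()):
#         print(run)
```

On the log above, this reports 83 query features for the first run and 8295 for the second, with only the second run marked as killed.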