I'm new here, dumb questions

How can I compete if I don’t have infrastructure, only Google Colab? Is there another option?

Thank you :slight_smile:

Hi @diegoto,

Welcome! Can you elaborate on what you mean by not having infrastructure? Does this mean you don’t have access to a GPU (besides Google Colab), or you don’t have a machine that you can use to test your submission using the Docker container?

Having access to a GPU is not required in order to make a submission. You may be able to develop your solution in Google Colab and then copy your files into a zip file to submit on the platform. Note that you are not required to submit the code component of the submission (main.py script) in order to have your submission scored, although this is required before the end of Phase 1 in order to be prize-eligible. If it makes things easier, you could just submit the .csv for the Matching Track, or .npz descriptors for the Descriptor Track.
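
For example, if your files are already written to disk in Colab, something like the following would produce a zip you can upload (the file names are just placeholders – use whatever your track requires):

from zipfile import ZipFile, ZIP_DEFLATED

# Bundle submission files into a zip from a Colab notebook.
# File names below are placeholders, not the required names.
with ZipFile("submission.zip", "w", compression=ZIP_DEFLATED) as zf:
    zf.write("query_descriptors.npz")      # Descriptor Track example
    zf.write("reference_descriptors.npz")
    # zf.write("main.py")                  # optional code component (needed for prize eligibility)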

Having a machine with at least 12GB of free space and the other prerequisites installed will make it much easier to test your solution locally, although in theory you could do without it.

If you let us know more about what resources are available to you, perhaps we or other participants can provide more ideas.

Mike


Hi @mike-dd,

Yes, my local machine is not very useful, and Google Colab does not support Docker. Also, Google Colab does not have enough disk space. What can I do to compete? I have tried using subsets of the dataset and tricks to run Docker on Google Colab, but it is very tiring and I have not achieved anything.

Thank you for your help,

Diego

Hi @diegoto,

Sorry for the slow reply during the holidays here.

Also, Google Colab does not have enough disk space. What can I do to compete?

Have you tried uploading competition data to Google Drive and mounting that onto the Google Colab virtual machine? I have not personally done this but it seems like it should work. See docs here.
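
For reference, mounting Drive in a notebook cell looks roughly like this (the data path is just a placeholder):

from google.colab import drive
import os

# Mount your Google Drive at /content/drive (Colab will prompt for authorization).
drive.mount("/content/drive")

# Example only: list competition files you previously uploaded to Drive.
print(os.listdir("/content/drive/MyDrive/competition_data"))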

My other suggestion would be to focus initially on producing a submission with the prediction csv or descriptor npz files (for the matching or descriptor tracks, respectively). Don’t worry about the main.py script for now – you can get to this later once you’re satisfied with the rest of the submission.

I hope this helps. If other participants have used Google Colab for this or other competitions, please feel free to add your own comments.

Mike

Hi @mike-dd,

I have other dumb questions.

I would like to understand more clearly how the creation of the npz files works. Could you please show a more extreme example? For example, what happens if there are several embeddings per video? How would you modify the format to handle the correspondence of what belongs to each video, i.e., should there be a list of timestamps and embeddings per video? I am confused about that. Also, how would you calculate the metric taking that into account?

I am not using the code you have available, so it would be very useful if you could share a Python script to get results from the npz files.

I hope I can upload a result :sweat:

Thank you,

Diego

Hi @diegoto,

We’re happy to try to answer these questions.

You’ve probably already seen the example here for how to create the npz files. If you want multiple embeddings per video, you would have repeated video ids for the videos with multiple embeddings and distinct timestamp intervals for each embedding. So if you have multiple embeddings for video Q20000, something like:

qry_video_ids = [20000, 20000, 20001, ...]  
qry_timestamps = [[0.0, 1.1], [1.1, 2.2], [0.3, 1.4], ...]
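
Putting that together, a rough sketch of saving such arrays to an .npz might look like the following. Please treat the key names and descriptor dimension as placeholders and check the linked example for the exact format expected by the evaluation script:

import numpy as np

# Two embeddings for video 20000, one for video 20001 (illustrative values only).
qry_video_ids = np.array([20000, 20000, 20001])                    # repeated id => multiple embeddings
qry_timestamps = np.array([[0.0, 1.1], [1.1, 2.2], [0.3, 1.4]])    # [start, end] interval per embedding
qry_descriptors = np.random.rand(3, 512).astype(np.float32)        # one row per embedding; 512 is just an example

np.savez(
    "query_descriptors.npz",
    video_ids=qry_video_ids,     # key names here are placeholders, not the official spec
    timestamps=qry_timestamps,
    features=qry_descriptors,
)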

The metric is computed using the maximum of the pairwise inner-product similarities between descriptor embeddings for each video pair.
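
As a rough illustration of that aggregation (not the official evaluation code – use the script linked below for that):

import numpy as np

# Descriptors for one query video (2 embeddings) and one reference video (3 embeddings).
q = np.random.rand(2, 512).astype(np.float32)
r = np.random.rand(3, 512).astype(np.float32)

# All pairwise inner products, then take the maximum as the similarity for this video pair.
pairwise = q @ r.T              # shape (2, 3)
video_pair_score = pairwise.max()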

To evaluate your submission locally, I’d encourage you to look into using the evaluation script provided by Meta here. This same script is also included as a submodule for the runtime repository, with more documentation here. This is the official evaluation methodology and we don’t really support other approaches.

I hope this helps. Feel free to keep the questions coming. They likely help others as well.

Hi @mike-dd,

Thanks for your answers.

I have other questions. Reading the runtime specs, I see it says the following: “The submission must complete execution in less than 150 minutes.” Does this apply to the whole test set (~48,000 query + reference videos), or only to the query test set (~8,000 query videos)?

Regarding the introduction to the code submission format: when running the main.py file, what does the restriction of 10 seconds per query mean? ~800 query videos? So, ~8,000 query videos in 100 seconds? And what is the relation to the restriction that execution completes in less than 150 minutes?

Thank you.

Hi @diegoto,

The sentence below refers to your main.py script, which will only run on the test subset of ~800 videos.

The submission must complete execution in less than 150 minutes (ten seconds per query for ~800 query videos plus overhead for the similarity search)

So: 150 minutes ≈ 800 videos × 10 seconds (≈ 133 minutes) + overhead for the similarity search.

We run this just to measure computational costs of your solution (and btw it’s not required in order to have your solution scored – only in order to be prize-eligible).

We don’t run your main.py script on the ~8,000 query videos. You just need to include the descriptors for the ~8,000 query videos (and 40,000 reference videos) in your descriptors .npz files.

Hope this helps!