My logs are stuck at “Running submission with Python”. Not even a print appears. Is something wrong with the machines running the code?
Hi @Voltaire, thanks for the note!
This appears to be an issue with your code. If I run your submission using the GPU Docker image on my local machine, I see the same hanging behavior.
I couldn’t find an easy fix for you, but if I comment out the line `from utils import *`, the submission prints logs (and eventually errors). So the hang seems to be somewhere in the dependency imports. I couldn’t track down why it is hanging, but there are a few different ways you `import torch`, and some potentially circular imports as well.
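If it helps narrow things down, here is a minimal sketch of one way to see where an import gets stuck. It is not something I ran against your submission, and the helper name `probe_import` is made up for illustration. It imports a module in a child Python process and uses the standard-library `faulthandler` watchdog to dump the stack and exit if the import has not finished within a timeout.

```python
import subprocess
import sys
from typing import Tuple


def probe_import(module_name: str, timeout_s: float = 20.0) -> Tuple[bool, str]:
    """Import `module_name` in a child interpreter guarded by a watchdog.

    Returns (ok, stderr_text). `ok` is True only if the import completed.
    If the import is still running after `timeout_s` seconds, faulthandler
    writes the stack of every thread to stderr and terminates the child,
    so `stderr_text` shows which import line was executing.
    If the import raises instead, `stderr_text` holds the normal traceback.
    """
    # Code executed by the child; with `-c`, sys.argv[1:] are our extra args.
    child_code = (
        "import faulthandler, importlib, sys\n"
        "faulthandler.dump_traceback_later(float(sys.argv[2]), exit=True)\n"
        "importlib.import_module(sys.argv[1])\n"
        "faulthandler.cancel_dump_traceback_later()\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, module_name, str(timeout_s)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Run from the submission directory so that `utils` resolves to your file.
    ok, err = probe_import("utils", timeout_s=20.0)
    print("import finished" if ok else "import hung or failed:")
    print(err)
```

Run from the directory containing `utils.py`, the dumped stack should point at the innermost import that was still executing when the timeout fired, which is usually a good place to start looking. Running `python -X importtime -c "import utils"` is another option if you want per-module import timings instead.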
Since the problem reproduces for me locally in my test environment, I don’t think this is a problem with our cloud resources. I recommend that you cancel your job (I canceled it for you) and get things running locally before submitting again.
Hope that helps!
Hi @bull, thanks for the help, I was able to get it working again.
Best,
Philip