We were trying to use deepsparse for inference (installing it offline when the submission would run) and saw that it took 4+ seconds to run while locally using the same docker image and restrictions --memory=4G and --cpus=3 we saw that we could process 1 sample in 1.4s. The latter would give us enough time to try out our model, but since it is 4+ we could not finish in time.
What could be the reason for the large difference in inference time?
Here is a non-exhaustive list of things that might be different between your local setup and the competition runtime:
The particular CPU processor hardware may not be as powerful as what you have on your own machine. You can find details and a link to references about the virtual machine we’re using documented here.
For the competition runtime, we’re loading the images from an object store container (Azure Blob Storage) mounted as a data volume on the container. This may have different read speeds from what you see when reading from the local file system on your machine.
Another tip is to make sure you’ve carefully checked the logs from your run for whether there are any signs that something might not be working correctly.