IMPORTANT: How to speed up Inference & Queue time

@Max_Schaefer

  1. I got a CUDA out of memory error — would reducing the batch_size help fix this error?
  2. I trained with a batch_size of 8 — would testing with a different batch_size cause an error?

Full error:

```
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 11.17 GiB total capacity; 10.54 GiB already allocated; 99.88 MiB free; 10.67 GiB reserved in total by PyTorch)
```
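For what it's worth: the inference batch_size is independent of the training batch_size — the model processes one batch at a time, so you can usually shrink the batch at test time to fit GPU memory without retraining or causing an error (wrapping inference in `torch.no_grad()` also saves memory by skipping gradient buffers). A minimal sketch of the batching idea, using plain Python lists as stand-in data — the `samples` and the batch sizes here are made-up illustrations, not from the original post:

```python
def batched(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Trained with batch_size=8, but running inference with batch_size=2
# to dodge the OOM error -- only the number of forward passes changes,
# not the per-sample results.
samples = list(range(10))
batches = list(batched(samples, 2))
# Each batch would then be passed to model(batch) inside a
# `with torch.no_grad():` block to avoid allocating gradient memory.
```

The last partial batch may be smaller than `batch_size`, which is fine for standard layers in eval mode.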