Hi! I am trying to submit the Whisper medium model just to check the submission process and get an initial idea of the metrics. I get "timed out after running for too long", and I just wanted to confirm whether the original model is indeed too slow for the challenge, or whether there is something else wrong with my submission. I essentially copied the parakeet example and replaced the model. Everything is running on CUDA and the batch size is 16. Any info would be helpful, thank you!!!
Try increasing the batch size; otherwise the model's latency may simply not be low enough.
@apatakis Thanks for your question! It sounds like the model and current hyperparameters are likely too large/slow for the runtime constraints. We encourage you to experiment with various techniques to reduce inference time, such as faster model variants, more efficient batching and chunking strategies, and decoding settings. Good luck!
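In case it helps others: the chunking strategy mentioned above can be sketched roughly like this. This is a minimal illustration, not the official starter code; the helper name `chunk_audio` and its defaults are made up for the example.

```python
import numpy as np

def chunk_audio(samples: np.ndarray, sr: int,
                chunk_s: float = 30.0, overlap_s: float = 1.0) -> list:
    """Split a mono waveform into fixed-length chunks with a small overlap,
    so each chunk fits the model's context window and speech at chunk
    boundaries isn't cut mid-word."""
    size = int(chunk_s * sr)                 # samples per chunk
    step = int((chunk_s - overlap_s) * sr)   # hop between chunk starts
    return [samples[i:i + size] for i in range(0, len(samples), step)]
```

Chunks can then be transcribed in batches and the overlapping text merged, which is usually much faster than feeding the full recording through the model in one pass.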
My submissions 305009 and 305010 generated final output but didn't go through scoring. Could you check whether there were any issues on your end?
@cszc any help? The submission keeps timing out. Could you check if there's any issue on your end?
Hi, thanks for replying! After increasing the batch size a lot and restricting the max generated tokens, I was able to complete the transcription in ~1h.
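For anyone hitting the same timeout, the two changes look roughly like this. This is a sketch, not my exact submission code; the `batches` helper and the decoding kwargs are illustrative (the kwarg names mirror Hugging Face-style `generate` options, so adjust to whatever framework you use).

```python
def batches(items, batch_size=64):
    """Yield successive fixed-size batches of audio chunks; larger
    batches amortize per-call overhead on the GPU."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Capping generation length per chunk bounds worst-case decode time.
# These are example values, not tuned settings:
decode_kwargs = {"max_new_tokens": 128, "num_beams": 1}
```

The main win for me was the token cap: without it, a noisy chunk can make the decoder generate until it hits the model's hard limit.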
@oknaitik mine went through with no issue today. I had a smoke test submission a few days ago that failed at scoring.
Hi @oknaitik - I checked our logs and don’t see any issues on our end. However, your logs indicate that inference is running slightly over the 5-minute smoke test limit.
I’d recommend experimenting with the ways mentioned above to improve inference efficiency and bring it under the limit.