Smoke test utterances

Hi,

I was wondering whether you could share the exact utterances used in the smoke dataset, along with the corresponding scoring script.

I ran my model locally on the utterances in smoke_test_submission_format and evaluated the results using metric/score.py from the provided GitHub repository. However, the WER I obtained locally differed dramatically from the WER reported by the cloud smoke test.

Having access to the exact smoke test utterances and scoring setup would help me determine whether the discrepancy is due to an environment mismatch or an issue with my own model.

Thank you very much for your help.

Hi @jialuli - The exact utterance IDs are shared in the “Smoke test submission format” file on the data download pages. That and metric/score.py should give you everything you need to replicate the score locally. Good luck!
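For anyone cross-checking a local score, here is a minimal sketch of a corpus-level WER computation restricted to a fixed set of utterance IDs. It is not the repository's metric/score.py, and it applies no text normalization; the function name `word_error_rate` and the dict-of-strings inputs keyed by utterance ID are assumptions made for illustration. Gaps between two WER numbers often come from scoring a different set of utterances or from different normalization (casing, punctuation), so an independent check like this may help narrow down the cause.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word-level edits / total reference words.

    Both arguments are dicts mapping utterance ID -> transcript string
    (a hypothetical format, not the one metric/score.py expects).
    Only IDs present in `references` are scored; a missing hypothesis
    is treated as an empty string, so every reference word is a deletion.
    """
    total_edits = 0
    total_ref_words = 0
    for utt_id, ref_text in references.items():
        ref = ref_text.split()
        hyp = hypotheses.get(utt_id, "").split()

        # Word-level Levenshtein distance, keeping one row at a time.
        prev = list(range(len(hyp) + 1))
        for i, ref_word in enumerate(ref, start=1):
            curr = [i]
            for j, hyp_word in enumerate(hyp, start=1):
                substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
                deletion = prev[j] + 1
                insertion = curr[j - 1] + 1
                curr.append(min(substitution, deletion, insertion))
            prev = curr

        total_edits += prev[-1]
        total_ref_words += len(ref)

    return total_edits / total_ref_words if total_ref_words else 0.0


if __name__ == "__main__":
    # Toy transcripts, invented purely to exercise the function.
    refs = {"utt1": "the cat sat", "utt2": "hello world"}
    hyps = {"utt1": "the cat sat down", "utt2": "hello"}
    print(word_error_rate(refs, hyps))
```

If this sketch and metric/score.py disagree on the same inputs, the difference is most likely in normalization or aggregation rather than in the model output itself.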