Smoke test data source

Good evening,
I apologize if this is naive, but am I allowed to know whether the smoke test data is a subset of the 77k test utterances or just a subset of the training data?

Thanks,

Denis

Hi @ooousay,

You can find this information in the documentation of the smoke test data:

In the smoke test runtime, data/ contains 9,000 audio files from the training set.

Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Word Track

In the smoke test runtime, data/ contains 3,000 audio files from the training set.

Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track

You can also check the “Smoke test submission format” file on each track's data download page for the specific utterance IDs of the smoke test audio files.
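If you want to confirm this yourself, a quick set comparison of the utterance IDs should do it. The snippet below is only a rough sketch: it assumes both files can be read as CSV with a column named `utterance_id`, which is my guess and not something stated in the documentation, and the IDs in it are made up. Adjust the reader and column name to whatever the real files use.

```python
import csv
import io


def read_ids(handle, column="utterance_id"):
    """Collect the values of one column from a CSV file-like object.

    The column name is a hypothetical default, not taken from the
    competition documentation.
    """
    return {row[column] for row in csv.DictReader(handle)}


def ids_outside_training(smoke_ids, train_ids):
    """Return the smoke test IDs that are absent from the training IDs, sorted."""
    return sorted(set(smoke_ids) - set(train_ids))


# Tiny in-memory stand-ins for the real files (made-up IDs).
smoke_csv = io.StringIO("utterance_id\nU_001\nU_002\n")
train_csv = io.StringIO("utterance_id\nU_001\nU_002\nU_003\n")

missing = ids_outside_training(read_ids(smoke_csv), read_ids(train_csv))
print(missing)  # an empty list means every smoke test ID is also a training ID
```

With the real files, you would replace the `io.StringIO` objects with `open(...)` calls on the downloaded smoke test submission format file and your training metadata.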
