Good evening,
I apologize if this is naive, but am I allowed to know whether the smoke test data is a subset of the 77k test utterances or just a subset of the training data?
Thanks,
Denis
Hi @ooousay,
You can find this information in the documentation of the smoke test data:
In the smoke test runtime, data/ contains 9,000 audio files from the training set.
— Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Word Track
In the smoke test runtime, data/ contains 3,000 audio files from the training set.
— Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track
You can also take a look at the “Smoke test submission format” file on the data download page for each track for the specific utterance IDs for the smoke test audio files.
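If you want to confirm this yourself, a check along these lines should work. This is only a rough sketch: the column name `utterance_id` and the assumption that both files are CSVs are guesses on my part, not something taken from the competition docs, and the IDs below are made-up stand-ins for the real files.

```python
import csv
import io


def read_ids(csv_file, id_column="utterance_id"):
    """Collect the values of one column from an open CSV file into a set.

    The column name "utterance_id" is an assumption; adjust it to the real header.
    """
    return {row[id_column] for row in csv.DictReader(csv_file)}


def ids_missing_from_train(smoke_ids, train_ids):
    """Return the smoke test IDs that do not appear among the training IDs."""
    return set(smoke_ids) - set(train_ids)


# Tiny in-memory stand-ins for the real files (made-up IDs, illustration only).
smoke_csv = io.StringIO("utterance_id\nU_001\nU_002\n")
train_csv = io.StringIO("utterance_id\nU_001\nU_002\nU_003\n")

missing = ids_missing_from_train(read_ids(smoke_csv), read_ids(train_csv))
# An empty set means every smoke test ID is also a training ID.
print(len(missing))
```

With the real files, you would replace the two `io.StringIO` objects with `open(...)` calls on the smoke test submission format file and on whatever file lists your training utterance IDs.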