Smoke test data source

ooousay · February 14, 2026, 12:43am

Good evening,
I apologize if this is naive, but am I allowed to know whether the smoke test data is a subset of the 77k test utterances or just a subset of the training data?

Thanks,

Denis

jayqi · February 14, 2026, 1:59am

Hi @ooousay,

You can find this information in the documentation of the smoke test data:

In the smoke test runtime, data/ contains 9,000 audio files from the training set.

— Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Word Track

In the smoke test runtime, data/ contains 3,000 audio files from the training set.

— Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track

You can also take a look at the “Smoke test submission format” file on the data download page for each track for the specific utterance IDs for the smoke test audio files.
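For anyone who wants to confirm this locally, here is a minimal Python sketch that checks whether every smoke test utterance ID also appears in the training data. It rests on assumptions not stated in this thread: that both files are JSON Lines, that the ID field is named utterance_id, and the file names in the usage comment are hypothetical. Adjust all of these to match the actual downloads.

```python
import json
from pathlib import Path


def load_utterance_ids(path, id_field="utterance_id"):
    """Collect IDs from a JSON Lines file (one JSON object per line).

    The field name "utterance_id" is an assumption; check the real file.
    """
    ids = set()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                ids.add(json.loads(line)[id_field])
    return ids


def split_by_membership(smoke_ids, train_ids):
    """Return (smoke IDs found in training set, smoke IDs not found)."""
    smoke_ids, train_ids = set(smoke_ids), set(train_ids)
    return smoke_ids & train_ids, smoke_ids - train_ids


# Usage (hypothetical file names, replace with the files you downloaded):
# smoke = load_utterance_ids("smoke_test_submission_format.jsonl")
# train = load_utterance_ids("train_transcripts.jsonl")
# found, missing = split_by_membership(smoke, train)
# print(len(found), "in training set;", len(missing), "not found")
```

If the second set comes back empty, all smoke test utterances are drawn from the training set, which is what the documentation quoted above says.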
