Question about Smoke Test Dataset and WER Calculation

huix.c · March 9, 2026, 10:26pm

Hi,

Is the smoke test evaluated on the audio files that are available for download from the competition website? I was only able to find a little over 2,000 audio files.

I used those files together with the provided score.py script to calculate the WER locally, but the result is quite different from the WER shown on the website. Is this expected, or could it be that I downloaded the wrong dataset?
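One possible reason a local WER differs from the leaderboard is that it was computed over a different set of utterances. WER is usually computed at corpus level (total word edits divided by total reference words), so scoring a subset can give a very different number. The snippet below is a minimal sketch of that standard definition. It is not a copy of the provided score.py, which may also normalize text (casing, punctuation) before scoring, and the function names are hypothetical.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: dict[str, str], hyps: dict[str, str]) -> float:
    """Corpus-level WER over the utterance IDs in refs (hypothetical helper).

    A missing hypothesis is treated as an empty transcript.
    """
    errors = 0
    total = 0
    for utt_id, ref_text in refs.items():
        ref = ref_text.split()
        hyp = hyps.get(utt_id, "").split()
        errors += word_edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("no reference words to score")
    return errors / total


refs = {"a": "the cat sat", "b": "hello world"}
hyps = {"a": "the cat sat", "b": "hello"}
print(corpus_wer(refs, hyps))              # 0.2
print(corpus_wer({"a": refs["a"]}, hyps))  # 0.0
```

With the same hypotheses, scoring both utterances gives 1 error over 5 words (0.2), while scoring only utterance "a" gives 0.0, so a local result on a partial set is not directly comparable to a score over the full set.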

cszc · March 10, 2026, 5:49pm

Hi @huix.c,

The smoke test is evaluated on audio files from the training data. The training data comprise two corpora: one hosted by DrivenData and the other hosted on TalkBank. Instructions for accessing the TalkBank data are available on the data download page. You'll need both datasets to reconstruct the smoke test data.
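As a rough check that a reconstructed smoke test set is complete, the transcripts from both corpora can be merged and compared against the smoke-test utterance IDs. This is a minimal sketch assuming each corpus has already been loaded into a dict mapping utterance ID to transcript. The actual file formats and ID conventions of the DrivenData and TalkBank releases are not described in this thread, and the function names and example IDs are hypothetical.

```python
def merge_corpora(*corpora: dict[str, str]) -> dict[str, str]:
    """Combine utterance-ID -> transcript maps, rejecting conflicting duplicates."""
    merged: dict[str, str] = {}
    for corpus in corpora:
        for utt_id, text in corpus.items():
            if utt_id in merged and merged[utt_id] != text:
                raise ValueError(f"conflicting transcripts for {utt_id!r}")
            merged[utt_id] = text
    return merged


def missing_utterances(smoke_ids: list[str], merged: dict[str, str]) -> list[str]:
    """Smoke-test IDs that have no transcript in the merged corpora, sorted."""
    return sorted(set(smoke_ids).difference(merged))


drivendata = {"u1": "the cat sat"}  # hypothetical IDs and transcripts
talkbank = {"u2": "hello world"}    # hypothetical IDs and transcripts
merged = merge_corpora(drivendata, talkbank)
print(missing_utterances(["u1", "u2", "u3"], merged))  # ['u3']
```

An empty result suggests every smoke-test utterance has a transcript available; a non-empty one points to data that still needs to be downloaded.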

Good luck!

oknaitik · March 11, 2026, 7:52am

Can you review the pull request please?

oknaitik · March 12, 2026, 3:03pm

@cszc Did your team get a chance to review the PRs?

cszc · March 12, 2026, 9:31pm

@oknaitik Sorry for the delay. I was able to review and deploy today.
