Clarification on rules

adengit · March 25, 2026, 8:32pm

I'd like some clarification on the transcription rules. I thought we were meant to output the intended word (if the child says ‘Pasketti’, it becomes ‘Spaghetti’) but keep grammatical errors intact (‘Goed’ stays ‘Goed’).

I see a transcription with the term “‘em” (“we flipped ‘em at different angles”) and I want to clarify: in cases like ‘em’ vs. ‘them’, we’re targeting the intended word, right? Is there a clear set of rules for transcription, covering what to normalize and what to keep?

cszc · March 26, 2026, 3:51pm

Hi @adengit - thanks for the question! You are right - in general, the model should output the intended word, i.e. “them” instead of “em”. A description of the labels is included in the Problem Description. As stated there:

While efforts have been made to apply consistent normalization, the data were collected from multiple sources with differing annotation protocols, and some variation in labeling should be expected.

adengit · March 26, 2026, 7:03pm

Thank you! In general, is the test set consistent in this respect (i.e., does it follow the intended word)?

Thank you so much!

cszc · March 26, 2026, 8:07pm

Yes, as much as possible, we’ve tried to transcribe the intended word in the test set, but again, there might be some variation.
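
For anyone who wants to apply this convention to their own model output, here is a minimal sketch of a post-processing step that maps reduced or mispronounced forms to the intended word while leaving grammatical errors untouched. The mapping `REDUCED_FORMS` and the function `normalize_transcript` are hypothetical illustrations built only from the examples in this thread; they are not the challenge's official normalization, and the labels themselves may vary as noted above.

```python
# Hypothetical lookup table built from the examples in this thread.
# It is NOT the challenge's official normalization list.
REDUCED_FORMS = {
    "'em": "them",
    "pasketti": "spaghetti",
}


def normalize_transcript(text: str) -> str:
    """Map reduced or mispronounced forms to the intended word.

    Words that are not in the table, including grammatical errors
    such as "goed", are passed through unchanged.
    """
    # Unify curly apostrophes so that both ‘em and 'em hit the same key.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    # Whitespace tokenization only: a token with attached punctuation
    # (for example "'em,") will not match and is left as-is.
    tokens = text.split()
    return " ".join(REDUCED_FORMS.get(tok.lower(), tok) for tok in tokens)


print(normalize_transcript("we flipped \u2018em at different angles"))
print(normalize_transcript("I goed home"))
```

A plain lookup table keeps the behavior easy to inspect, but it only covers forms you list explicitly; mispronunciations in particular are open-ended, so in practice the intended word would mostly have to come from the model itself.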
