Clarification on rules

I wanted to ask for clarification on the transcription rules. I thought we were meant to output the intended word (if the child says ‘pasketti’, it becomes ‘spaghetti’) but keep grammatical errors intact (‘goed’ stays ‘goed’).

I see a transcription containing “‘em” (“we flipped ‘em at different angles”), and I want to check: for forms like this, the target is ‘them’ rather than ‘em’, right? Is there a clear set of rules for what to normalize and what to keep as spoken?

Hi @adengit, thanks for the question! You are right: in general, the model should output the intended word, i.e. “them” instead of “‘em”. A description of the labels is included in the Problem Description. As stated there:

While efforts have been made to apply consistent normalization, the data were collected from multiple sources with differing annotation protocols, and some variation in labeling should be expected.

Thank you! In general, is the test set consistent on this (i.e., does it always use the intended word)?

Thank you so much!

Yes. We’ve tried to transcribe the intended word in the test set as consistently as possible, but again, there may be some variation.