I wanted to ask for clarification on the transcription rules. I thought we were meant to output the intended word (if the child says ‘Pasketti’, it becomes ‘Spaghetti’) but keep grammatical errors intact (‘Goed’ stays ‘Goed’).
I see a transcription containing “‘em” (“we flipped ‘em at different angles”) and I want to clarify - for cases like this, the target is ‘them’ rather than ‘em, right? Is there a clear set of rules for transcription, including what to normalize and what to keep as spoken?
Hi @adengit - thanks for the question! You are right - in general, the model should output the intended word, e.g. “them” instead of “‘em”. A description of the labels is included in the Problem Description. As stated there:
While efforts have been made to apply consistent normalization, the data were collected from multiple sources with differing annotation protocols, and some variation in labeling should be expected.
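For anyone who wants to apply the same convention to their own model outputs, here is a minimal sketch of the idea in Python. It is not the official normalization: the `INTENDED_WORD` table and the `normalize_transcript` function are hypothetical names, and the table contains only the examples mentioned in this thread. Grammatical errors such as "goed" are deliberately left out of the table so they pass through unchanged.

```python
# Hypothetical mapping built only from the examples in this thread;
# it is NOT the competition's official normalization table.
INTENDED_WORD = {
    "'em": "them",
    "pasketti": "spaghetti",
}


def normalize_transcript(text: str) -> str:
    """Replace reduced or mispronounced forms with the intended word.

    Words that are not in INTENDED_WORD (including grammatical errors
    such as "goed") are returned unchanged.
    """
    # Treat curly apostrophes as straight ones so that ‘em and ’em match 'em.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    tokens = text.split()
    # Look up each whitespace-separated token case-insensitively.
    return " ".join(INTENDED_WORD.get(tok.lower(), tok) for tok in tokens)


print(normalize_transcript("we flipped ‘em at different angles"))
# prints: we flipped them at different angles
```

This sketch only matches whole whitespace-separated tokens, so a form followed directly by punctuation (for example "pasketti.") would not be replaced. Given the note above about differing annotation protocols, any such table should be treated as approximate.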