Just a heads up: the submission window has evidently closed (3 hours early). The training data that was released first (not Talkbank/Childes) was incredibly corrupt - many transcriptions had extra words appended or didn’t match the audio, and I found 68+ cases where the transcription was 1 or 2 words but the actual audio contained 12+ words (so the WER for the individual sample is often 12+). I didn’t make a public post because I truly didn’t know if it was a joke, part of the challenge, or what. To get to ~0.11 WER on my local val set (which I can’t submit), I ended up needing to clean up the first training set a ton. Curious if anyone else did this too. A staggering number of the transcriptions weren’t just “different transcription philosophies” but completely incorrect (there were IPA characters too, btw).
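For anyone wanting to run a similar check, here is a minimal sketch of how the "1 or 2 word transcription vs. 12+ word audio" cases could be flagged. It is not the exact cleanup described above. It assumes you already have a hypothesis transcript per clip from some reasonably trusted ASR model, and the names `sample_wer`, `flag_suspect`, and the thresholds are made up for illustration.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sample_wer(reference, hypothesis):
    """Per-sample WER: word edits divided by the number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        # Assumption: an empty reference counts as infinitely wrong
        # unless the hypothesis is empty too.
        return float("inf") if hyp_words else 0.0
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


def flag_suspect(samples, max_ref_words=2, min_hyp_words=12):
    """Return (id, wer) for clips whose label is tiny but whose hypothesis is long.

    `samples` is an iterable of (sample_id, reference, hypothesis) tuples.
    The thresholds mirror the numbers in the post and are only a starting point.
    """
    flagged = []
    for sample_id, reference, hypothesis in samples:
        n_ref = len(reference.split())
        n_hyp = len(hypothesis.split())
        if n_ref <= max_ref_words and n_hyp >= min_hyp_words:
            flagged.append((sample_id, sample_wer(reference, hypothesis)))
    return flagged
```

Because WER divides by the reference length, a one-word label against a long utterance easily gives a per-sample WER above 12, which is why those clips stand out. Anything flagged this way still needs a listen, since the hypothesis model can be wrong too.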
I think clarity from the organizers on whether the test data uses IPA would have been good. The vocal exercises, which sometimes used IPA and sometimes didn’t, were a bit of an issue in the training set, and that inconsistency kind of encourages probing to see whether the test set is consistent. Oh well, I still enjoyed the competition.
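On the IPA point, a rough way to see how many training transcripts contain IPA symbols is to scan for characters in the Unicode IPA Extensions block (U+0250 to U+02AF). This is only a sketch with hypothetical helper names, and it undercounts: plenty of IPA symbols (æ, ŋ, θ, or plain Latin letters) live outside that block.

```python
def ipa_chars(text):
    """Characters of `text` that fall in the Unicode IPA Extensions block."""
    return [ch for ch in text if 0x0250 <= ord(ch) <= 0x02AF]


def split_by_ipa(transcripts):
    """Split transcripts into (with_ipa, without_ipa) lists.

    Heuristic only: a transcript counts as IPA if it has at least one
    character from the IPA Extensions block.
    """
    with_ipa, without_ipa = [], []
    for t in transcripts:
        (with_ipa if ipa_chars(t) else without_ipa).append(t)
    return with_ipa, without_ipa
```

Running this on the vocal-exercise clips alone would show how mixed the labelling is in the training set, without needing to probe the test set.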