Can I expect only these phonetic characters or more?

I collected the phonetic characters from the main and extra datasets and ran them through the scoring script to check whether they are valid (they are).
Here's what I found:

phonetic_chars count
297923
i 111898
n 91048
ɑ 77628
ə 74327
t 71591
d 70436
ɪ 66453
s 59844
ɹ 53693
ɛ 49516
w 49466
k 47502
æ 46985
ʌ 45928
ʊ 40687
b 38208
ɚ 36146
l 35914
m 34477
u 33463
o 33289
ð 31564
e 31039
h 30903
f 29044
p 27322
g 25821
ː 24677
z 24175
j 18394
ɔ 15661
ŋ 12252
v 10185
θ 9506
ʃ 9022
ʧ 8903
ɫ 8134
ɾ 6571
ʤ 6301
ʔ 5670
ɐ 2174
ʝ 436
ʁ 179
c 176
ʒ 161
x 154
ɬ 117
ç 104
ɟ 100
χ 17
r 2
  • Can I expect only these characters, or should I expect more? If I need to expect another character, what kind of character should it be? Or maybe I need to remove or replace some of them.

    I understand that the distribution may vary, but I need to know what kinds of characters I should expect. This is very important for everyone, I think. It’s about the format we should expect for the future.
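For anyone who wants to reproduce a count like the one above, here is a minimal sketch. It assumes the transcriptions are already loaded as plain Python strings; the `sample` list, the function names, and the `known_chars` argument are all hypothetical and not part of the competition code or the scoring script.

```python
from collections import Counter


def count_phonetic_chars(transcriptions):
    """Count every Unicode code point across an iterable of transcription strings."""
    counts = Counter()
    for text in transcriptions:
        counts.update(text)  # a str is iterated one code point at a time
    return counts


def unexpected_chars(transcriptions, known_chars):
    """Return the characters that appear in the data but not in known_chars."""
    seen = {ch for text in transcriptions for ch in text}
    return seen - set(known_chars)


# Hypothetical sample; replace with the real transcription column.
sample = ["wæbɪt", "ɹæbɪt"]
counts = count_phonetic_chars(sample)
for ch, n in counts.most_common():
    print(ch, n)

# Characters present in the sample but missing from an assumed known inventory.
print(unexpected_chars(sample, known_chars="wæbɪt"))
```

Running `unexpected_chars` on new data against the inventory above is a cheap way to notice a character that was not in the training sets.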

Please like this post if you find it useful.

Only valid IPA characters as specified in the scoring script will be found in either the phonetic training or test set.

In the phonetic track, are we supposed to predict the exact phonemes? For example, a child with a speech impediment says “Wabbit” (/wæbɪt/) instead of “Rabbit” (/ɹæbɪt/). We should predict “Rabbit” (/ɹæbɪt/) here, right?

@oknaitik Your task is to predict the former, /wæbɪt/, i.e. the actual speech sounds or phones, not the phonemes.

From the problem description:

The ground truth labels for the Phonetic Track are normalized phonetic transcriptions of individual utterances using the International Phonetic Alphabet (IPA), with a one-to-one mapping between Unicode characters and phones. Each transcription captures the full sequence of speech sounds in the corresponding audio clip and may include substitutions, omissions, or non-standard productions that are typically ignored in word-level ASR.
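To make the one-to-one mapping concrete, here is a small sketch that treats each Unicode code point as one symbol and compares the two transcriptions from the example above. The helper names are hypothetical, and treating the length mark (U+02D0) as its own symbol is an assumption based on it being listed separately in the character counts; the scoring script remains the authority on how symbols are handled.

```python
def to_symbols(transcription):
    """Split a transcription into symbols, assuming one Unicode code point per symbol."""
    return list(transcription)


def differing_positions(produced, canonical):
    """Return (index, produced_symbol, canonical_symbol) for equal-length transcriptions."""
    a, b = to_symbols(produced), to_symbols(canonical)
    if len(a) != len(b):
        raise ValueError("this simple comparison needs equal-length transcriptions")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


produced = "wæbɪt"   # what the child actually said
canonical = "ɹæbɪt"  # the dictionary pronunciation of "rabbit"

print(to_symbols(produced))
print(differing_positions(produced, canonical))

# The length mark is a separate code point, so "i" + U+02D0 splits into two symbols.
print(to_symbols("i\u02d0"))
```

The target for the phonetic track is the produced sequence, so the label here starts with `w`, not `ɹ`.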

@cszc Could you please confirm whether our models should also generalize to real-world noisy conditions for the phonetic track?

@cszc @hannahmoro Kindly reply to this as well.

All available information about the phonetic track test set distribution is provided in the problem description. We encourage participants to focus on building robust, generalizable models rather than overfitting to specific conditions.