Are we allowed to do this?
Did we even need noise augmentation on the training data for the phonetic task? I suspect the Talkbank+Drivendata data distribution was similar to the private set, given how close my LB scores were to clean-val-cer.
What about the word track? For me, CV and LB never matched.
Never bothered because training was expensive.
I did the phonetic track and had an eval set that was 50-50 talkbank and DD. It ended up within 0.01 CER of the blind set.
0.01 as in 1% or 0.01% CER?
Used any noise augmentation? I couldn’t decide whether augmentation was even needed for this track, or to what degree.
0.01 CER, my bad. Tried all kinds of augmentation and nothing really worked.
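For anyone confused by the fraction vs. percent question above, here is a minimal sketch of how CER is usually computed: total character edits divided by total reference characters. It assumes the score is reported as a fraction, in which case 0.01 CER would be 1%. The function names `edit_distance` and `cer` are my own, and the competition scorer may normalize text differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER as a fraction: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


# One wrong character out of 100 reference characters.
score = cer(["a" * 100], ["a" * 99 + "b"])
print(f"CER = {score:.4f} (fraction) = {score * 100:.2f}%")
```

Under that assumption, a 0.01 gap between a local eval set and the blind set means roughly one extra character error per 100 reference characters.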