New Tutorial: Finetuning Wav2Vec2 with Hugging Face Transformers for the Phonetic Track

cszc · March 11, 2026, 5:42pm

Hi Solvers,

We just published another reference implementation! This tutorial is for the Children’s Speech Recognition Challenge: Phonetic Track.

In this in-depth blog post and companion repo, we walk through how to fine-tune Wav2Vec2 using the Hugging Face Transformers library. The solution results in a 0.3460 IPA-CER on the public leaderboard.
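
For readers new to the metric, here is a minimal sketch of a character error rate over IPA strings. It assumes IPA-CER is the total edit distance divided by the total number of reference characters; the official scorer may normalize or tokenize differently. The function names `edit_distance` and `ipa_cer` are illustrative and do not come from the tutorial or the competition code.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def ipa_cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: summed edits over summed reference length.

    Assumption: each code point counts as one character, so a combining
    diacritic is scored separately from its base symbol.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substitution (æ -> a) out of three reference characters.
    print(ipa_cer(["kæt"], ["kat"]))
```

Under this definition lower is better, and a perfect transcription scores 0.0.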

In the tutorial, we:

Demonstrate how to load and explore the data.
Provide a basic framework for building a model.
Walk through how to package your work correctly for submission.

Whether you’re just getting started or looking to benchmark your approach, this should give you a strong foundation to build on. Read the full post here: Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track

And if you haven’t already, you can check out the Word Track tutorial here: https://drivendata.co/blog/child-asr-word-benchmark

Happy modeling!
