New Tutorial: Finetuning Wav2Vec2 with Hugging Face Transformers for the Phonetic Track

Hi Solvers,

We just published another reference implementation! This tutorial is for the Children’s Speech Recognition Challenge: Phonetic Track.

In this in-depth blog post and companion repo, we walk through how to fine-tune Wav2Vec2 using the Hugging Face Transformers library. The solution scores 0.3460 IPA-CER on the public leaderboard.
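For anyone new to the metric, here is a minimal sketch of a character error rate (CER) over IPA strings: the edit distance between prediction and reference, divided by the reference length, so lower is better. This is not the official scoring script. The function names are hypothetical, and two details are assumptions: characters are counted as Unicode code points with no text normalization, and edits are pooled across all utterances before dividing. Check the competition page for the exact definition.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference character
            insertion = curr[j - 1] + 1  # extra character in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_cer(refs: list[str], hyps: list[str]) -> float:
    """Total edits divided by total reference characters (assumed pooling)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

For example, `corpus_cer(["kæt", "dɔg"], ["kat", "dɔ"])` counts one substitution and one deletion over six reference characters. Note that IPA diacritics written as combining marks count as separate characters under this code-point assumption.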

In the tutorial, we:

  1. Demonstrate how to load and explore the data.
  2. Provide a basic framework for building a model.
  3. Walk through how to package your work correctly for submission.

Whether you’re just getting started or looking to benchmark your approach, this should give you a strong foundation to build on. Read the full post here: Competition: On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track

And if you haven’t already, you can check out the Word Track tutorial here: https://drivendata.co/blog/child-asr-word-benchmark

Happy modeling!
