Me Ranting over this Competition

Fine-tuning nemo-asr models in this competition is a nightmare.

  1. Compute is a joke: 16GB on a P100 is barely enough to breathe. I frequently run into OOM crashes mid-run. It’s hard to iterate fast and climb the LB when you’re just babysitting memory usage.
  2. Dependency Hell: I wanted to use the native NeMo Hydra configs and PEFT frameworks, but Kaggle’s Python version makes it a total mess. I was stuck debugging environment conflicts for a good amount of time, only to give up and move to HF-style code.
  3. The Validation Gap: My val_wer looks great, but the test set feels like a completely different distribution. The delta makes me feel like I’m shooting in the dark, and sometimes like giving up entirely.
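On the val_wer point: it’s worth sanity-checking the local metric with a minimal, dependency-free word-level edit-distance implementation, so at least the number itself isn’t in question. A small sketch (the `wer` helper and the example transcripts here are illustrative, not the competition’s scoring code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))  # dp[j] = distance(ref[:i], hyp[:j])
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (free if tokens match)
            prev = cur
    return dp[-1]

def wer(refs, hyps):
    """Corpus-level WER: total word edits / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words

print(wer(["the cat sat on the mat"], ["the cat sit on mat"]))  # 2 edits / 6 words ≈ 0.333
```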

<rant_over>


Quite true, the LB seems to come from an entirely different distribution. I’ve tried (and am still trying) multiple local validation strategies, but the delta remains, and most of the time even the trend doesn’t correlate.
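One strategy that sometimes shrinks that delta for ASR is making local folds speaker-disjoint, so no validation speaker ever appears in training. A minimal sketch, assuming utterance metadata with a (hypothetical) `speaker` field:

```python
import random
from collections import defaultdict

def speaker_disjoint_split(rows, val_frac=0.2, seed=0):
    """Split utterances so no speaker appears in both train and val."""
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row)
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)       # deterministic shuffle of speakers
    n_val = max(1, int(len(speakers) * val_frac))
    val_speakers = speakers[:n_val]
    train = [r for s in speakers[n_val:] for r in by_speaker[s]]
    val = [r for s in val_speakers for r in by_speaker[s]]
    return train, val

# Illustrative usage, e.g. rows like:
# rows = [{"audio": "a1.wav", "text": "hello there", "speaker": "spk01"}, ...]
```

This mirrors what `sklearn.model_selection.GroupKFold` does with speaker IDs as groups, just without the dependency.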

Thanks for sharing your thoughts!

Well, I got LB and local to align for the phonetic track, but for the word track my model fine-tuned on the train data was much worse than the official baseline…

If Gezi cannot get his CV-LB aligned, there’s no hope for the rest of us! :upside_down_face:

Hi @oknaitik - We totally hear your frustrations - and thank you for being such an active and helpful presence in the forums!

A quick reminder to everyone: if you’re using external infrastructure, please make sure it complies with the competition rules. Competition data must remain private and encrypted, and you may not use services that retain or train on the data. It is the participants’ responsibility to ensure the data is treated in accordance with the rules.

On the modeling side, NeMo fine-tuning can definitely be finicky. We recently shared a reference solution that you may find useful.

And re: the validation gap - as noted in the problem description, the test set includes out-of-sample data. We encourage solvers to focus on approaches that generalize well across speakers, recording conditions, and speech types.


Long Save & Commit queues, some reporting sub-50s, have become a prominent issue and are being frequently reported on Kaggle’s forum. I’m literally waiting 1-2 hours for every run.

There’s another global crisis going on - P100 compute.