Solution postings

I'm currently preparing the solution write-up for the organizers. I also plan to publish a short solution write-up on arXiv after that (I will post the link in this thread if you're interested).

The code will be posted on the DrivenData GitHub a little later, as in previous competitions.

My solution is based on the two libraries I prepared and posted earlier. Solution details:

  1. The best model for me was a 3D DenseNet121 with input shape (96, 128, 128, 3) and a batch size of 6. I used a large dropout of 0.5 at the classification layers to prevent overfitting. I started from ImageNet weights, which I converted for the 3D variants of the nets (see the model sketch after this list).
  2. I trained only on the ROI part of the videos, which I extracted using the code from the forum: Python code to find the roi - #4 by Shanka (ROI sketch below).
  3. Batches were generated in a 25% (stalled == 1) / 75% (stalled == 0) proportion (sampler sketch below).
  4. I validated using MCC at the beginning and then switched to ROC AUC. My validation score was around 0.96-0.98 ROC AUC (metric snippet below).
  5. I started with the micro dataset and then increased the number of videos used to ~50K (using all available stalled == 1 samples).
  6. The last trick, which allowed me to increase the score from 0.82 to 0.86 on the public LB, was to fine-tune only on tier1 data (it looks like the test set may contain only tier1 data; filter snippet below).
  7. I applied augmentations with the volumentations library, which I reworked a little to increase speed and add some more useful augmentations (augmentation sketch below).
  8. I used 5-fold cross-validation. My local validation MCC wasn't identical to the LB score, but the direction was similar: improving locally gave better results on the LB (CV split sketch below).
  9. Loss function: binary cross-entropy. Optimizer: AdamAccumulate (Adam with gradient accumulation; see the compile note in the model sketch below).
  10. I chose the threshold (THR) for converting probabilities to binary outputs using the leaderboard (so there was some chance of overfitting to the LB). I found that the optimal number of stalled videos in the test set was around 600-700 (threshold sketch below).
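
For illustration, here is a minimal sketch of the model head from item 1. `build_densenet121_3d` is a hypothetical stand-in for the 3D DenseNet121 backbone with converted ImageNet weights (it is not a real API), the head layout is an assumption based on the description, and plain Adam is shown in place of AdamAccumulate.

```python
# Hypothetical sketch of the classification head (item 1).
# build_densenet121_3d is a stand-in for the 3D DenseNet121 backbone
# with converted ImageNet weights; it is not a real API.
import tensorflow as tf
from tensorflow.keras import layers, models

def attach_head(backbone: tf.keras.Model) -> tf.keras.Model:
    x = layers.GlobalAveragePooling3D()(backbone.output)
    x = layers.Dropout(0.5)(x)  # large dropout to prevent overfitting
    out = layers.Dense(1, activation='sigmoid')(x)
    model = models.Model(backbone.input, out)
    # Item 9: binary cross-entropy loss; the author used AdamAccumulate
    # (a gradient-accumulation variant of Adam), plain Adam shown here.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss='binary_crossentropy',
                  metrics=[tf.keras.metrics.AUC(name='auc')])
    return model

# Usage (hypothetical backbone constructor):
# backbone = build_densenet121_3d(input_shape=(96, 128, 128, 3))
# model = attach_head(backbone)
```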
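
A rough sketch of the ROI extraction idea from item 2: the ROI in the videos is marked by an orange outline, so one can threshold orange pixels and crop to their bounding box. The HSV bounds here are illustrative assumptions, not the exact values from the linked forum code.

```python
# Illustrative ROI crop (item 2): find the orange outline that marks
# the ROI and crop to its bounding box. HSV bounds are rough
# assumptions, not the exact values from the forum code.
import cv2
import numpy as np

def find_roi(frame_bgr: np.ndarray) -> tuple:
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (5, 100, 100), (25, 255, 255))  # orange-ish hue
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        h, w = frame_bgr.shape[:2]
        return 0, 0, w, h  # fall back to the full frame
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1

# x0, y0, x1, y1 = find_roi(frame); roi = frame[y0:y1, x0:x1]
```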
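
A minimal sketch of the 25/75 batch sampling from item 3, assuming simple random sampling with replacement within each class:

```python
# Class-proportioned batch sampling (item 3): ~25% stalled per batch.
import numpy as np

def sample_batch_indices(labels, batch_size=6, pos_frac=0.25, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n_pos = max(1, round(batch_size * pos_frac))
    idx = np.concatenate([rng.choice(pos, n_pos),
                          rng.choice(neg, batch_size - n_pos)])
    rng.shuffle(idx)
    return idx

# labels = np.array([...0/1 stalled flags...])
# batch_videos = videos[sample_batch_indices(labels)]
```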
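
The validation metrics from item 4 are standard scikit-learn calls; a toy example:

```python
# Validation metrics (item 4): MCC early on, ROC AUC later.
import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score

y_true = np.array([0, 1, 0, 1, 0, 0])               # toy labels
y_prob = np.array([0.1, 0.9, 0.3, 0.6, 0.2, 0.4])   # toy predictions

print('ROC AUC:', roc_auc_score(y_true, y_prob))
print('MCC    :', matthews_corrcoef(y_true, (y_prob > 0.5).astype(int)))
```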
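
For item 6, fine-tuning on tier1 data is just a metadata filter. A sketch, assuming the training metadata exposes a boolean `tier1` column (the column and file names are assumptions):

```python
# Keep only tier1 videos for the fine-tuning stage (item 6).
# The 'tier1' column name and file name are assumptions.
import pandas as pd

meta = pd.read_csv('train_metadata.csv')
tier1_meta = meta[meta['tier1']]  # fine-tune only on these videos
print(len(tier1_meta), 'tier1 videos')
```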
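
An illustrative 3D augmentation pipeline for item 7. The Compose/call pattern follows the volumentations README; the particular transforms and parameters are assumptions, not the exact pipeline used.

```python
# Illustrative 3D augmentation pipeline with volumentations (item 7).
# Transforms and parameters are assumptions, not the exact pipeline.
from volumentations import Compose, Rotate, GaussianNoise

def get_augmentation():
    return Compose([
        Rotate((-15, 15), (0, 0), (0, 0), p=0.5),  # rotate around depth axis
        GaussianNoise(var_limit=(0, 5), p=0.2),
    ], p=1.0)

aug = get_augmentation()
# volume: numpy array of shape (depth, height, width, channels)
# augmented = aug(image=volume)['image']
```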
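
A sketch of the 5-fold split from item 8; stratifying on the stalled label is an assumption (the post only says 5-fold):

```python
# 5-fold cross-validation split (item 8), stratified on the label.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.random.randint(0, 2, size=1000)  # stand-in for stalled flags
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(
        skf.split(np.zeros(len(labels)), labels)):
    print(f'fold {fold}: {len(train_idx)} train / {len(valid_idx)} valid')
```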
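
For item 10, a sketch of picking THR so that a target number of test videos (around 600-700) is predicted as stalled:

```python
# Threshold selection sketch (item 10): pick THR so the predicted
# number of stalled test videos lands in the 600-700 range.
import numpy as np

def threshold_for_count(test_probs: np.ndarray, n_stalled: int = 650) -> float:
    # Probability of the n-th highest-scoring video becomes the cutoff.
    return float(np.sort(test_probs)[::-1][n_stalled - 1])

# thr = threshold_for_count(test_probs)
# preds = (test_probs >= thr).astype(int)
```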