Hi guys,
With the competition almost over and no further attempts left, here’s an overview of my solution. The source is open source and available on GitHub at https://github.com/antorsae/fish (competition: Identify Fish Challenge, https://www.drivendata.org/competitions/48/identify-fish-challenge/).
STEP 1. Take each video and create new videos cropped to 384x384, centered on the zone of interest (where the fish action happens) and rotated so the fish/ruler is horizontal, so they end up like this:
(INPUT vs. OUTPUT example videos)
This part is implemented here: https://github.com/antorsae/fish/blob/master/boat.py
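To give an idea of the per-frame transform, here’s a minimal sketch (not the actual boat.py code), assuming the zone-of-interest centre and the rotation angle have already been estimated for each video:

```python
# Minimal sketch of the Step 1 per-frame transform: rotate the frame so the
# fish/ruler is horizontal, then crop a 384x384 window around an assumed
# precomputed centre of the zone of interest.
import cv2

CROP = 384

def rotate_and_crop(frame, center_xy, angle_deg):
    """frame: HxWx3 uint8; center_xy: (cx, cy) of the zone of interest;
    angle_deg: rotation that makes the fish/ruler horizontal.
    Assumes the 384x384 window fits inside the rotated frame."""
    cx, cy = center_xy
    # Rotate around the zone-of-interest centre.
    M = cv2.getRotationMatrix2D((cx, cy), angle_deg, 1.0)
    rotated = cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
    # Crop a fixed 384x384 window centred on (cx, cy).
    x0, y0 = int(cx - CROP // 2), int(cy - CROP // 2)
    return rotated[y0:y0 + CROP, x0:x0 + CROP]

# Usage: read each video with cv2.VideoCapture, apply rotate_and_crop to every
# frame and write the result with cv2.VideoWriter.
```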
STEP 2. Take all the videos created in step 1 and build frame-by-frame predictions of length and fish species, including a new species_no_fish class for when there’s no fish.
Here’s an example of the generated predictions for the training data:
row_id,video_id,frame,length,species_fourspot,species_grey sole,species_other,species_plaice,species_summer,species_windowpane,species_winter,species_no_fish
0,00WK7DR6FyPZ5u3A,0,157.63180541992188,1.7653231099146183e-09,1.0,3.6601498720756354e-08,3.1204638162307674e-07,2.3616204103404925e-08,5.9193606460894443e-08,1.3712766033791013e-08,6.261254969358587e-12
1,00WK7DR6FyPZ5u3A,1,162.67236328125,1.2406671523468304e-10,1.0,2.6489321847122937e-09,1.1546971734333056e-07,9.599543382421416e-10,4.391315311380595e-09,5.5750559724288e-10,1.2707434930703254e-13
2,00WK7DR6FyPZ5u3A,2,155.6374969482422,8.897231285054374e-10,0.9999971389770508,1.4144226270218496e-06,1.3051650284978678e-06,1.1586666737173346e-08,8.389072547743126e-08,3.659388880805636e-08,9.07268617872381e-12
…
This is implemented, again with ResNet-152 and transfer learning, classifying the 7 species plus a new one (no fish) and regressing the length (only propagating gradients when there’s an actual fish). Oversampling is used to balance the classes. Implementation: https://github.com/antorsae/fish/blob/master/fish.py
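For illustration, here’s roughly what that multi-task setup looks like as a PyTorch sketch (the repo may be organised quite differently; the point is the masked length loss, so gradients only flow for frames that contain a fish):

```python
# Sketch of the Step 2 multi-task idea: ResNet-152 backbone, one head for the
# 8 classes (7 species + no_fish) and one for length; the length loss is
# masked so it only contributes when the frame actually contains a fish.
import torch
import torch.nn as nn
import torchvision

class FishNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        backbone = torchvision.models.resnet152(pretrained=True)  # transfer learning
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.cls_head = nn.Linear(2048, num_classes)  # species + no_fish
        self.len_head = nn.Linear(2048, 1)            # fish length

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.cls_head(f), self.len_head(f).squeeze(1)

def loss_fn(logits, length_pred, species_target, length_target, is_fish):
    cls_loss = nn.functional.cross_entropy(logits, species_target)
    # Only propagate length gradients for frames with an actual fish.
    if is_fish.any():
        len_loss = nn.functional.mse_loss(length_pred[is_fish], length_target[is_fish])
    else:
        len_loss = logits.new_zeros(())
    return cls_loss + len_loss
```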
STEP 3. Initially I wanted to take the output of step 2 and build an RNN to predict the fish sequence. I made a few attempts to get it working: using a CTC loss to predict the frames where the fish changes, or generating proxy signals (a square wave from the fish number mod 2 trained with binary cross-entropy, or a sawtooth wave trained with MSE), but none of them worked. I believe the CTC approach would have worked with more effort.
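Just to illustrate the proxy signals (a toy example, not code from the repo): given the per-frame fish index of a training video, the square-wave and sawtooth targets could be built like this:

```python
# Toy illustration of the proxy targets: a square wave (fish index mod 2, for
# BCE) and a sawtooth that ramps 0->1 within each fish (for MSE).
import numpy as np

def proxy_targets(fish_index_per_frame):
    idx = np.asarray(fish_index_per_frame)
    square = (idx % 2).astype(float)          # flips every time a new fish appears
    saw = np.zeros(len(idx), dtype=float)
    for i in np.unique(idx):
        frames = np.where(idx == i)[0]
        saw[frames] = np.linspace(0.0, 1.0, len(frames))  # ramp within each fish
    return square, saw

# e.g. fish_index_per_frame = [0,0,0,1,1,2,2,2] gives
# square = [0,0,0,1,1,0,0,0] and a sawtooth ramping inside each segment.
```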
So, with just one day left, I explored the frame-by-frame predictions (https://github.com/antorsae/fish/blob/master/playground-sequence-prediction-thresholding.ipynb) and settled on some basic thresholding to generate the fish sequences: https://github.com/antorsae/fish/blob/master/generate_submission.py
Not bad, but I think far worse than what a decent RNN-based approach would have achieved.
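For reference, the thresholding boils down to something like this sketch (the actual generate_submission.py is more involved; the threshold value here is just an assumed placeholder):

```python
# Rough sketch of the thresholding idea: treat each contiguous run of frames
# where P(no_fish) is below a threshold as one fish, and summarise that run's
# species (majority) and length (peak).
import pandas as pd

THRESH = 0.5  # assumed value; in practice tuned in the notebook

def frames_to_fish(df):
    """df: per-frame predictions for one video, as in the CSV above."""
    is_fish = df["species_no_fish"] < THRESH
    # Label each contiguous run of frames with an incrementing id.
    run_id = (is_fish != is_fish.shift(fill_value=False)).cumsum()
    species_cols = [c for c in df.columns
                    if c.startswith("species_") and c != "species_no_fish"]
    fish = []
    for _, run in df[is_fish].groupby(run_id[is_fish]):
        fish.append({
            "species": run[species_cols].mean().idxmax(),  # dominant species in the run
            "length": run["length"].max(),                 # peak length estimate
            "start_frame": int(run["frame"].iloc[0]),
        })
    return fish
```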
LESSONS LEARNT
Since the most important metric was sequence accuracy, I should have focused on end-to-end RNN (or 3D CNN) sequence prediction that takes the frame-by-frame CNN features, not just the frame-by-frame probabilities for the 8 classes.
Any comments or suggestions would be greatly appreciated. I really enjoyed this competition!