Hi guys,
With the competition almost over and no further attempts on my part, here's an overview of my solution. The code is open source and available at https://github.com/antorsae/fish (competition page: https://www.drivendata.org/competitions/48/identify-fish-challenge/).
STEP 1. Take each video and create a new video cropped to 384x384, centered on the zone of interest (where the fish action is happening) and rotated so that the fish and the ruler are always horizontal.
This part is implemented here: https://github.com/antorsae/fish/blob/master/boat.py
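To make the geometry of step 1 concrete, here is a minimal NumPy sketch of a rotated 384x384 crop. It is not the code in boat.py: the function names, the idea of deriving the angle from two ruler endpoints, and the nearest-neighbour sampling are all assumptions for illustration, and it takes the zone center and angle as already known.

```python
import math
import numpy as np

def ruler_angle_deg(p0, p1):
    """Angle in degrees of the line p0 -> p1, points given as (x, y) pixels."""
    return math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0]))

def rotate_and_crop(frame, center_xy, angle_deg, size=384):
    """Sample a size x size window centered on center_xy, rotated so that a
    line at angle_deg in the source frame becomes horizontal in the output.
    Nearest-neighbour sampling; pixels outside the frame are set to 0."""
    cx, cy = center_xy
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Output pixel offsets (rows, cols) relative to the crop center.
    dy, dx = np.mgrid[0:size, 0:size] - size // 2
    # Map every output offset back to source coordinates.
    src_x = np.rint(cx + cos_t * dx - sin_t * dy).astype(int)
    src_y = np.rint(cy + sin_t * dx + cos_t * dy).astype(int)
    h, w = frame.shape[:2]
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros((size, size) + frame.shape[2:], dtype=frame.dtype)
    out[inside] = frame[src_y[inside], src_x[inside]]
    return out
```

With an angle of 0 this reduces to a plain centered crop; in practice a library routine with proper interpolation would be a better choice than nearest-neighbour sampling.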
STEP 2. Take all the videos created in step 1 and build frame-by-frame predictions of fish length and species, adding a new class, species_no_fish, for frames where there is no fish.
Here’s an example of the generated predictions for the training data:
row_id,video_id,frame,length,species_fourspot,species_grey sole,species_other,species_plaice,species_summer,species_windowpane,species_winter,species_no_fish
This is implemented, again with ResNet-152 and transfer learning, by classifying the 7 species plus the new no-fish class and regressing the length (propagating length gradients only when the frame contains an actual fish). Oversampling is used to balance the classes. Implementation: https://github.com/antorsae/fish/blob/master/fish.py
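The two ideas in step 2 (a length loss that ignores no-fish frames, and oversampling to balance classes) can be sketched in NumPy as below. This is an illustration rather than the code in fish.py: the function names are hypothetical, the no-fish class is assumed to sit at index 7 (matching its position as the last species column in the header above), and MSE is an assumed choice for the length loss.

```python
import numpy as np

NO_FISH = 7  # assumed index of the extra "no fish" class (last column)

def multitask_loss(class_probs, pred_length, true_class, true_length):
    """Cross-entropy over the 8 classes plus MSE on length.
    The length term only counts frames labelled as an actual fish, so
    no-fish frames contribute no gradient to the length output."""
    n = len(true_class)
    ce = -np.log(class_probs[np.arange(n), true_class] + 1e-12).mean()
    is_fish = true_class != NO_FISH
    if is_fish.any():
        mse = ((pred_length[is_fish] - true_length[is_fish]) ** 2).mean()
    else:
        mse = 0.0
    return ce, mse

def oversampling_weights(labels, n_classes=8):
    """Per-sample sampling probabilities that give every class present in
    labels the same total probability mass."""
    counts = np.bincount(labels, minlength=n_classes)
    w = 1.0 / counts[labels]
    return w / w.sum()
```

The weights can be passed as the `p` argument of `np.random.choice` to draw a class-balanced batch of sample indices.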
STEP 3. Initially I wanted to take the output of step 2 and build an RNN to predict the fish sequence. I made several attempts to make it work: using a CTC loss to predict the frame where the fish changes, or generating proxy signals (a square wave given by the fish number mod 2, trained with binary cross-entropy loss, or a sawtooth wave trained with MSE loss), but none of them worked. I believe that with more work the CTC approach would have worked.
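For readers wondering what those proxy targets look like, here is a small dependency-free sketch that derives both from a per-frame fish number. The square wave follows the description above (fish number mod 2); the exact shape of the second signal is not specified, so the ramp from 0 towards 1 within each fish is purely an assumption, as is the function name.

```python
def proxy_signals(fish_number):
    """fish_number: per-frame running fish count, e.g. [1, 1, 1, 2, 2, 3].
    Returns (square, saw) target lists of the same length."""
    # Square wave: flips between 1 and 0 every time the fish changes.
    square = [n % 2 for n in fish_number]

    # Sawtooth (assumed shape): restarts at 0 on each new fish and ramps
    # linearly towards 1 over the frames belonging to that fish.
    saw = []
    i = 0
    while i < len(fish_number):
        j = i
        while j < len(fish_number) and fish_number[j] == fish_number[i]:
            j += 1
        run = j - i
        saw.extend(k / run for k in range(run))
        i = j
    return square, saw
```

For example, `proxy_signals([1, 1, 2, 2])` gives the square wave `[1, 1, 0, 0]` and the ramp `[0.0, 0.5, 0.0, 0.5]`.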
So, with just 1 day left, I explored the frame-by-frame predictions (https://github.com/antorsae/fish/blob/master/playground-sequence-prediction-thresholding.ipynb) and settled on some basic thresholding to generate the fish sequences: https://github.com/antorsae/fish/blob/master/generate_submission.py
Not bad, but I think way worse than what a decent RNN-based approach would have achieved.
Since the most important metric was sequence accuracy, I should have focused on end-to-end sequence prediction with an RNN (or a 3D CNN) fed with the frame-by-frame CNN features, and not just the frame-by-frame probabilities for the 8 classes.
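As a rough idea of what "basic thresholding" can mean here, the sketch below turns per-frame probabilities into a fish sequence. It is not the logic in generate_submission.py: the function name, the 0.5 threshold, the minimum run length of 3 frames and the summed-probability species vote are all hypothetical choices made for illustration.

```python
SPECIES = ["fourspot", "grey sole", "other", "plaice",
           "summer", "windowpane", "winter"]

def frames_to_sequence(frame_probs, fish_threshold=0.5, min_frames=3):
    """frame_probs: one row per frame with 7 species probabilities followed
    by the no-fish probability. Returns (fish_number per frame, species per
    fish), where fish_number 0 means no fish."""
    # 1) Find runs of consecutive frames where 1 - P(no_fish) is high.
    runs, start = [], None
    for i, row in enumerate(frame_probs):
        has_fish = (1.0 - row[-1]) >= fish_threshold
        if has_fish and start is None:
            start = i
        elif not has_fish and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(frame_probs)))

    # 2) Drop very short runs, number the rest, and vote on the species
    #    by summing each species probability over the run.
    fish_number = [0] * len(frame_probs)
    species = []
    for start, end in runs:
        if end - start < min_frames:
            continue
        totals = [sum(frame_probs[i][k] for i in range(start, end))
                  for k in range(len(SPECIES))]
        species.append(SPECIES[totals.index(max(totals))])
        for i in range(start, end):
            fish_number[i] = len(species)
    return fish_number, species
```

Dropping short runs is one simple way to suppress single-frame false positives, at the cost of missing fish that are only visible for a couple of frames.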
Any comments or suggestions would be greatly appreciated. I really enjoyed this competition!