
Visualization of solution

I made a short video visualizing how my method works:


Congrats! How did you detect fish numbers?

I trained an XGBoost classifier which predicts for each frame whether it contains a fish or not. As features it uses information obtained from the neural nets about the current frame, the N previous frames, and the N next frames (N=7 in my final solution). Using these predictions I count the fish, filling “no fish” gaps with the closest fish number.
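For illustration, the counting and gap-filling steps described above could be sketched like this (my interpretation, not the author's actual code; I assume each run of consecutive "fish" frames is one fish, and that gaps are filled from the nearest preceding labelled frame):

```python
def count_fish(frame_preds):
    """Count fish from per-frame 'has fish' predictions (0/1 per frame).
    Each contiguous run of 1s is treated as one fish sighting."""
    runs = 0
    prev = 0
    for p in frame_preds:
        if p == 1 and prev == 0:  # a new run of fish frames starts here
            runs += 1
        prev = p
    return runs

def fill_no_fish_gaps(fish_numbers):
    """Fill 'no fish' frames (None) with the closest preceding fish number;
    leading gaps take the first known number (one possible reading of
    'fill gaps with closest fish number')."""
    filled = list(fish_numbers)
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    # back-fill any leading None frames from the first known value
    first_known = next((v for v in filled if v is not None), None)
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = first_known
        else:
            break
    return filled
```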


Hi ZfTurbo,
thanks for the video and congratulations on 2nd place :thumbsup:

I was wondering which features you predicted with the neural nets and fed into XGBoost?
Just the probability of either fish species and of no fish at all?
Also, how did you make the length measurement?

  1. I used the predictions from each classification neural net (3 in total) as features for XGBoost. There were 8 classes (7 fish types + no fish) for each net. I also used some features from the segmentation net (UNET), such as the sum of probabilities and the maximum probability for the frame.
  2. For length I used the width, height, or diagonal of the bounding box (I chose which one based on the boat id). The bounding box was obtained from the UNET segmentation for the current frame. I also trained an XGBoost model to predict length, but it had the same quality as my initial algorithm.
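The bounding-box length measurement could be sketched roughly like this (the 0.5 threshold, the function name, and the zero fallback for empty masks are my assumptions, not taken from the solution):

```python
import numpy as np

def length_from_mask(mask, threshold=0.5, measure="diagonal"):
    """Estimate fish length from a UNET probability mask.
    mask: (H, W) array of segmentation probabilities.
    measure: 'width', 'height', or 'diagonal' (chosen per boat id
    in the solution). Returns 0.0 if nothing is above the threshold."""
    ys, xs = np.where(mask > threshold)
    if len(xs) == 0:
        return 0.0
    w = xs.max() - xs.min() + 1   # bounding-box width in pixels
    h = ys.max() - ys.min() + 1   # bounding-box height in pixels
    if measure == "width":
        return float(w)
    if measure == "height":
        return float(h)
    return float(np.hypot(w, h))  # bounding-box diagonal
```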

I am really curious :slight_smile:

What did you feed into the XGBoost for length prediction?

Also, I had a lot of problems with computation cost and memory size (mainly because of crappy hardware), so I downscaled the images quite a bit. To what size did you scale the images fed to the conv nets?
And may I ask what hardware you were using?

> I am really curious :slight_smile:
> What did you feed into the XGBoost for length prediction?

I fed the “no fish” class from each classification neural net, plus the length calculated by the default method. All features were added for the current frame and some neighbouring frames. I didn’t experiment much with it, but I feel I’d need to add more features from the UNET classifier, or lengths obtained from bboxes at different thresholds (I only used the single-threshold length).
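Assembling one such feature row might look like this (a sketch of the idea only; the array shapes, edge-clamping behaviour, and names are my assumptions):

```python
import numpy as np

def make_length_features(no_fish_probs, default_lengths, i, n=7):
    """Build one XGBoost feature row for frame i: the 'no fish' probability
    from each classification net plus the default-method length, for the
    current frame and its n neighbours on each side.
    no_fish_probs: (num_frames, num_nets) array.
    default_lengths: (num_frames,) array.
    Out-of-range neighbours are clamped to the edge frame."""
    num_frames = len(default_lengths)
    feats = []
    for j in range(i - n, i + n + 1):
        j = min(max(j, 0), num_frames - 1)  # clamp at video boundaries
        feats.extend(no_fish_probs[j])
        feats.append(default_lengths[j])
    return np.array(feats)
```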

> Also, I had a lot of problems with computation cost and memory size (mainly because of crappy hardware), so I downscaled the images quite a bit. To what size did you scale the images fed to the conv nets?
> And may I ask what hardware you were using?

I have several 1080 Ti and 1080 GPUs available. I created my own version of UNET specifically for this problem; it takes input at the original size, 1280x720. For classification I used the default input sizes of the pretrained nets, e.g. 224x224 (ResNet, DenseNet) and 299x299 (Inception). That’s fine, since I fed them small crops taken from the region of interest extracted with UNET.
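The cropping step could look roughly like this (the padding, threshold, and nearest-neighbour resize are my assumptions; in practice something like cv2.resize would likely be used instead of the manual resize):

```python
import numpy as np

def crop_roi(image, mask, threshold=0.5, pad=16, out_size=224):
    """Crop the region of interest found by the UNET mask and resize it
    to the classifier's input size.
    image: (H, W, 3) frame; mask: (H, W) probability map.
    Returns None when no pixel is above the threshold (no fish region)."""
    ys, xs = np.where(mask > threshold)
    if len(xs) == 0:
        return None
    h, w = mask.shape
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad + 1, h)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad + 1, w)
    crop = image[y0:y1, x0:x1]
    # nearest-neighbour resize to out_size x out_size via index arrays
    ry = np.arange(out_size) * crop.shape[0] // out_size
    rx = np.arange(out_size) * crop.shape[1] // out_size
    return crop[ry][:, rx]
```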