Many thanks to the DrivenData team, the Radiant Earth Foundation team, and all organizers and sponsors for this great challenge! Congratulations and thanks to all participants! It was really hard to get to the top.
Special thanks to everyone involved in data creation and preparation. The high quality of the dataset makes it invaluable for any kind of research.
My solution is an ensemble of several different architectures, trained with different backbones and different depths of historical frames. Local cross-validation uses a 5-fold GroupKFold split grouped by storm.
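The storm-grouped split can be sketched with scikit-learn's GroupKFold. The storm identifiers and wind speeds below are made-up toy data; the point is that all frames of one storm land in the same fold, so validation is never contaminated by near-duplicate frames of a storm seen in training.

```python
# Minimal sketch of the 5-fold GroupKFold split grouped by storm.
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 10 samples from 5 storms (2 frames per storm).
storm_ids = ["storm_a", "storm_a", "storm_b", "storm_b",
             "storm_c", "storm_c", "storm_d", "storm_d",
             "storm_e", "storm_e"]
X = list(range(len(storm_ids)))               # stand-in for image sequences
y = [40, 45, 60, 65, 30, 35, 80, 85, 50, 55]  # stand-in wind speeds

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, y, groups=storm_ids)):
    val_storms = {storm_ids[i] for i in val_idx}
    train_storms = {storm_ids[i] for i in train_idx}
    # No storm appears in both the train and validation parts of a fold.
    assert val_storms.isdisjoint(train_storms)
    print(fold, sorted(val_storms))
```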
Architectures:
CNN
CNN-3D
CNN + LSTM (unidirectional, without attention)
CNN + Transformer (1-layer, 6-layers, 12-layers)
CNN + ConvLSTM2D
Backbones:
ResNet (50, 101, 152)
SE-ResNet (50)
DenseNet (121)
Inception
Xception
EfficientNet (B0, B3, B4, B7)
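As a rough illustration of the CNN+LSTM family described above (a sketch, not the competition code), the model below encodes each historical frame with a per-frame CNN, feeds the sequence of embeddings to a unidirectional LSTM, and regresses wind speed from the last hidden state. The tiny convolutional encoder stands in for the actual ResNet50 backbone, and all layer sizes here are illustrative.

```python
import torch
import torch.nn as nn

class CNNLSTMRegressor(nn.Module):
    """Sketch of a CNN+LSTM wind-speed regressor.

    A per-frame CNN encoder (a pretrained ResNet50 in the real model;
    a tiny stand-in here) produces one embedding per historical frame,
    a unidirectional LSTM summarizes the sequence, and a linear head
    predicts a single wind speed from the last hidden state.
    """
    def __init__(self, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for ResNet50
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):                        # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))    # encode frames independently
        feats = feats.view(b, t, -1)             # regroup into sequences
        out, _ = self.lstm(feats)                # (batch, time, hidden_dim)
        return self.head(out[:, -1])             # predict from last timestep

model = CNNLSTMRegressor()
frames = torch.randn(2, 24, 1, 96, 96)           # 24 historical frames
pred = model(frames)
print(pred.shape)  # torch.Size([2, 1])
```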
I used the default resolution of 366x366, augmentations (flips and rotations), and the corresponding TTAs. The best single model is a CNN+LSTM with a ResNet50 backbone trained on 24 consecutive historical frames. In my experiments, deeper historical data did not substantially improve the score but did contribute to the ensemble. I trained everything with the Adam optimizer. Batch sizes and learning rates vary per model, with the aim of using the largest batch that fits in memory. The learning rate schedule is a simple reduction on plateau.
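The flip/rotation TTA can be sketched as averaging predictions over the eight dihedral transforms of each image. This is a common scheme matching the flip and rotation augmentations; whether the solution used all eight variants is an assumption.

```python
import numpy as np

def dihedral_transforms(img):
    """Yield the 8 flip/rotation variants of a square image (H, W)."""
    for k in range(4):
        rot = np.rot90(img, k)   # rotate by k * 90 degrees
        yield rot
        yield np.fliplr(rot)     # and its horizontal flip

def predict_with_tta(predict_fn, img):
    """Average a model's predictions over all flip/rotation variants."""
    preds = [predict_fn(t) for t in dihedral_transforms(img)]
    return float(np.mean(preds))

# Toy 'model': mean pixel intensity. It is invariant to flips and
# rotations, so TTA reproduces the single-image prediction exactly.
img = np.arange(16, dtype=float).reshape(4, 4)
print(float(img.mean()))                         # 7.5
print(predict_with_tta(lambda x: x.mean(), img))  # 7.5
```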