A simple CNN with image size 224 gave good results.
Experiments showed that larger backbones did not give better results, probably because the images do not contain that many distinctive features.
Increasing the image size nevertheless gave better results when combined with higher dropout for the simple CNN.
Adding an LSTM or GRU on top of large images would then require several GPUs to process the batches.
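To make that setup concrete, here is a minimal sketch of the kind of simple CNN wind-speed regressor at 224x224 with a dropout head. The channel counts, single-channel input, and dropout rate are my own illustrative assumptions, not the poster's exact configuration:

```python
# Hypothetical small CNN regressor for 224x224 satellite frames.
# All layer sizes and the dropout rate are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, dropout: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # 224 -> 112
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 112 -> 56
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(), # 56 -> 28
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),   # higher dropout reportedly helped at larger image sizes
            nn.Linear(128, 1),     # scalar wind-speed prediction
        )

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)

model = SimpleCNN(dropout=0.5)
pred = model(torch.randn(4, 1, 224, 224))  # batch of 4 single-channel images
```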
“It would be great if we somehow could see how traditional methods like SATCON and ADT would have performed on both test sets as a reference”
If the organizers allow submissions after the competition, one could try submitting. But it seems that the submit button becomes disabled after DrivenData competitions end.
I don't think that's possible, as these methods require additional inputs as far as I know. I was more hoping that the organizers could provide the additional benchmark.
It seems that the top teams have revealed their solutions. Here is mine:
4th place in the private ranking
image backbone: resnet18D
ensemble of the base models
Basically, the base models use the following network architecture (a rough sketch is given after the list of variants below):
embed the image into a 512-dim vector
concatenate with a timestamp embedding (can be just the timestamp itself or a sin+cos embedding)
for the encoder, the past wind speed history is input as well
1a. resnet18D-224 LSTM: 2-layer bi-LSTM encoder, 2-layer LSTM decoder (2x fold)
1b. resnet18D-256 LSTM: same as above
2a. resnet18D-224 transformer: 2-layer multi-head dot-product attention (MHA) encoder, 2-layer MHA decoder (2x fold)
2b. resnet18D-256 transformer: same as above (2x fold)
3a. resnet18D-224 transformer: similar to 2a but with some minor changes in how the image and timestamp features are concatenated (2x fold)
4a. resnet18D-224 transformer: similar to 2a but with other minor changes in how the image and timestamp features are concatenated (2x fold)
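Here is a minimal sketch of the base model in its LSTM form (variant 1a), assuming timm's resnet18d backbone. The sin+cos timestamp embedding, the hidden sizes, and the single-channel input are illustrative assumptions; the exact concatenation details differ between the variants above:

```python
# Sketch of the encoder-decoder base model (LSTM variant). Sizes are assumptions.
import math
import torch
import torch.nn as nn
import timm

def timestamp_embedding(t, dim=16, max_period=10000.0):
    """sin+cos embedding of a (B, T) tensor of timestamps (e.g. in hours)."""
    freqs = torch.exp(-math.log(max_period) * torch.arange(dim // 2) / (dim // 2)).to(t.device)
    ang = t.unsqueeze(-1) * freqs                       # (B, T, dim/2)
    return torch.cat([ang.sin(), ang.cos()], dim=-1)    # (B, T, dim)

class WindSpeedSeq2Seq(nn.Module):
    def __init__(self, t_dim=16, hidden=256):
        super().__init__()
        # resnet18d with num_classes=0 returns pooled 512-dim features
        self.backbone = timm.create_model("resnet18d", pretrained=True,
                                          num_classes=0, in_chans=1)
        # encoder sees image features + timestamp embedding + past wind speed
        self.encoder = nn.LSTM(512 + t_dim + 1, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # decoder sees image features + timestamp embedding only
        self.decoder = nn.LSTM(512 + t_dim, 2 * hidden, num_layers=2,
                               batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def embed(self, imgs):                               # imgs: (B, T, 1, H, W)
        b, t = imgs.shape[:2]
        return self.backbone(imgs.flatten(0, 1)).view(b, t, -1)

    def forward(self, past_imgs, past_t, past_ws, fut_imgs, fut_t):
        enc_in = torch.cat([self.embed(past_imgs),
                            timestamp_embedding(past_t),
                            past_ws.unsqueeze(-1)], dim=-1)
        _, (h, c) = self.encoder(enc_in)
        # fold the bidirectional encoder states into the decoder's hidden size
        h = h.view(2, 2, *h.shape[1:]).permute(0, 2, 1, 3).flatten(2)
        c = c.view(2, 2, *c.shape[1:]).permute(0, 2, 1, 3).flatten(2)
        dec_in = torch.cat([self.embed(fut_imgs),
                            timestamp_embedding(fut_t)], dim=-1)
        out, _ = self.decoder(dec_in, (h.contiguous(), c.contiguous()))
        return self.head(out).squeeze(-1)                # (B, T_future) wind speeds
```

The ensemble mentioned above would then combine the folds and variants, e.g. by averaging their predictions, though the exact combination scheme is not spelled out in the post.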
During training, I use batch sizes of 8 to 32. Each batch element is a sequence of 32 images, which are randomly sampled in time (i.e. they can be 0.5, 1, or even up to 8 hrs apart). The history length is also randomly sampled; it can be from 0 to 31 frames.
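A rough sketch of that sampling scheme, assuming frames roughly 0.5 h apart and a constant gap within each sequence (the gaps could also vary frame to frame; the storm bookkeeping is my own assumption about the data layout):

```python
# Hypothetical temporal sampling: 32-frame sequences with random gaps and a
# random history/target split, as described above.
import random

def sample_sequence(storm_frames, seq_len=32, min_step=1, max_step=16):
    """storm_frames: time-ordered frame indices for one storm, ~0.5 h apart."""
    assert len(storm_frames) >= seq_len, "storm track too short"
    # cap the step so the sequence fits inside this storm's track
    max_feasible = (len(storm_frames) - 1) // (seq_len - 1)
    step = random.randint(min_step, min(max_step, max_feasible))  # 0.5 h .. 8 h gaps
    start = random.randint(0, len(storm_frames) - 1 - (seq_len - 1) * step)
    seq = [storm_frames[start + i * step] for i in range(seq_len)]
    history = random.randint(0, seq_len - 1)   # 0..31 frames given to the encoder
    return seq[:history], seq[history:]        # (encoder frames, decoder targets)
```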