Some baseline performance numbers for your reference:

pure (ensemble) image CNN: public LB ~0.88
CNN+LSTM: ~0.72
CNN+LSTM+transformer: ~0.68

Using a smaller ImageNet backbone is better, e.g. resnet18D or resnet34.
Using a smaller image size is better, e.g. 256 or 224.

Past temporal images and wind speeds carry more information than a single image.
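
For illustration, here is a minimal sketch of that input setup with PyTorch and timm (`resnet18d` is a real timm model name; the single-band input, history length, and shapes are my assumptions):

```python
import timm
import torch

# Small ImageNet backbone at a small image size. num_classes=0 strips the
# classifier head so the model returns pooled 512-dim feature vectors.
backbone = timm.create_model("resnet18d", pretrained=False,
                             num_classes=0, in_chans=1)

# A short history of past images plus the matching past wind speeds.
T = 8                                   # hypothetical history length
images = torch.randn(T, 1, 224, 224)    # (time, channels, height, width)
wind_history = torch.randn(T, 1)        # one past wind speed per frame

feats = backbone(images)                          # (T, 512) per-frame features
feats = torch.cat([feats, wind_history], dim=1)   # (T, 513) image + wind speed
# A temporal model (LSTM/transformer) can consume `feats` as a sequence,
# which carries more signal than the single most recent frame alone.
```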

2 Likes

A simple CNN with image size 224 gave good results.
Experiments showed that larger backbones did not give better results, probably because the features in the images are not that numerous.
Increasing the image size nevertheless gave better results when dropout was also increased for the simple CNN.
Adding an LSTM or GRU on top of large images would then require several GPUs to process the batches.
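
For concreteness, one way such a simple CNN regressor with a dropout head might look (a sketch under my own assumptions; the backbone choice, drop rate, and layer sizes are illustrative, not the poster's exact model):

```python
import timm
import torch
import torch.nn as nn

class SimpleCnnRegressor(nn.Module):
    """Single-image wind-speed regressor: small backbone + dropout + linear head."""

    def __init__(self, drop_rate: float = 0.5):
        super().__init__()
        self.backbone = timm.create_model("resnet18", pretrained=False,
                                          num_classes=0, in_chans=1)
        self.dropout = nn.Dropout(drop_rate)  # raised along with the image size
        self.head = nn.Linear(512, 1)         # regress a single wind speed

    def forward(self, x):
        return self.head(self.dropout(self.backbone(x))).squeeze(1)

model = SimpleCnnRegressor()
preds = model(torch.randn(4, 1, 224, 224))  # image size 224, as in the post
print(preds.shape)                           # torch.Size([4])
```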

1 Like

Great work heng! I have followed you since the Severstal competition and you never disappoint :wink:

“using smaller imagenet backbone is better” Totally agree. I tried resnet18D and resnet34, and resnet18 performed better.

“using smaller image size is better” Not in my case: I also tried a smaller image size of 224, but the RMSE got worse.

Looking forward to having a look at your code and finding out my mistakes.

By the way, my best model was a CNN + BiLSTM + LSTM, at ~7.12 (RMSE).

I am now preparing a blog post with the full details of my implementation code and experiment results. Will let you know once it is ready!

4 Likes

It would be great if we somehow could see how traditional methods like SATCON and ADT would have performed on both test sets, as a reference.

“It would be great if we somehow could see how traditional methods like SATCON and ADT would have performed on both test sets as a reference”

If the organizers allow submissions after the competition, then one could try and submit. But it seems that the submit button becomes disabled after the competition on DrivenData.

I don't think that's possible, as these methods require additional input as far as I know. I was rather hoping that the organizers could provide the additional benchmark.

It would be nice if the organizers at least made the test “wind speed” values publicly available for further refinement of our models.

1 Like

Oh good! I'm new at this and was wondering if my approach was at all correct. Looking forward to reading your blog post, Heng!

Seems that the top teams have revealed their solutions. Here is mine:

4th place in the private ranking

  • image backbone: resnet18D
  • ensemble of the base models

Basically, each base model uses the following network architecture:

  • embed the image into 512-dim vectors
  • concatenate with a timestamp embedding (can be just the timestamp itself or a sin+cos embedding)
  • for the encoder, the past history wind speed is input as well

1a. resnet18D-224 LSTM: 2-layer bi-LSTM for encoder, 2-layer LSTM for decoder (2xfold)
1b. resnet18D-256 LSTM: same as above
2a. resnet18D-224 transformer: 2-layer multi-head dot-product attention (MHA) for encoder, 2-layer MHA for decoder (2xfold)
2b. resnet18D-256 transformer: same as above (2xfold)
3a. resnet18D-224 transformer: similar to 2a but with some minor changes to how the image and timestamp features are concatenated (2xfold)
4a. resnet18D-224 transformer: similar to 2a but with other minor changes to how the image and timestamp features are concatenated (2xfold)
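
A minimal sketch of how such a base model could be wired, using the LSTM variant (1a) as the example (the sin+cos timestamp embedding, hidden sizes, and all names here are my assumptions, not the actual competition code):

```python
import timm
import torch
import torch.nn as nn

def timestamp_embed(t: torch.Tensor) -> torch.Tensor:
    # sin+cos embedding of a (batch, seq) timestamp tensor -> (batch, seq, 2)
    return torch.stack([torch.sin(t), torch.cos(t)], dim=-1)

class CnnLstmSeq2Seq(nn.Module):
    """resnet18D image embedding + timestamp embedding; 2-layer bi-LSTM
    encoder (with past wind speeds), 2-layer LSTM decoder. A sketch only."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.cnn = timm.create_model("resnet18d", pretrained=False,
                                     num_classes=0, in_chans=1)  # 512-dim output
        # encoder input: 512 image dims + 2 timestamp dims + 1 past wind speed
        self.encoder = nn.LSTM(512 + 2 + 1, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        # decoder input: 512 image dims + 2 timestamp dims (no future wind speed)
        self.decoder = nn.LSTM(512 + 2, 2 * hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def embed(self, images, t):
        b, s = images.shape[:2]
        feats = self.cnn(images.flatten(0, 1)).view(b, s, -1)   # (b, s, 512)
        return torch.cat([feats, timestamp_embed(t)], dim=-1)   # (b, s, 514)

    def forward(self, past_images, past_t, past_wind, cur_images, cur_t):
        enc_in = torch.cat([self.embed(past_images, past_t),
                            past_wind.unsqueeze(-1)], dim=-1)
        _, (h, c) = self.encoder(enc_in)
        # fold the encoder's two directions into the decoder's initial state
        h = h.view(2, 2, -1, h.size(-1))           # (layers, dirs, b, hidden)
        h = torch.cat([h[:, 0], h[:, 1]], dim=-1)  # (layers, b, 2*hidden)
        c = c.view(2, 2, -1, c.size(-1))
        c = torch.cat([c[:, 0], c[:, 1]], dim=-1)
        out, _ = self.decoder(self.embed(cur_images, cur_t), (h, c))
        return self.head(out).squeeze(-1)           # predicted wind speeds
```

The transformer variants (2a-4a) would swap the two LSTMs for 2-layer multi-head attention stacks while keeping the same image and timestamp embeddings.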

During training, I use batch sizes of 8 to 32. Each sample is a 32-image sequence, randomly sampled in time (i.e. consecutive frames can be 0.5, 1, or even up to 8 hrs apart). The history length is also randomly sampled and can be from 0 to 31 frames.
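
Something like the following could implement that sampling scheme (a sketch; `storm_frames`, the 0.5 h base frame spacing, and the function name are my assumptions):

```python
import random

def sample_training_sequence(storm_frames, seq_len=32, base_step_hrs=0.5):
    """Sample one 32-frame training sequence with random temporal spacing
    and a random history/target split.

    storm_frames: hypothetical time-ordered list of
    (image, timestamp, wind_speed) records for one storm.
    """
    # random stride so consecutive sampled frames end up 0.5 h .. 8 h apart
    stride = random.randint(1, int(8 / base_step_hrs))
    max_start = max(0, len(storm_frames) - seq_len * stride)
    start = random.randint(0, max_start)
    seq = storm_frames[start : start + seq_len * stride : stride]

    # random history length: 0..31 frames feed the encoder (with wind speed);
    # the rest become decoder targets (real code would guard short storms)
    history_len = random.randint(0, seq_len - 1)
    return seq[:history_len], seq[history_len:]
```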

1 Like