First of all, before trying out different models and going for hyperparameter tuning, you should spend one or two weeks only on EDA. An okayish model is good enough for the beginning.
-
work on data cleaning
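A minimal sketch of one cleaning pass: flag masks whose coverage looks implausible for manual review. The directory layout and the 1%/99% cut-offs are my assumptions, not part of the competition setup.

```python
# Flag suspicious masks for manual review; paths and cut-offs are assumptions.
from pathlib import Path

import numpy as np
from PIL import Image

suspicious = []
for mask_path in Path("train/masks").glob("*.png"):  # hypothetical layout
    mask = np.array(Image.open(mask_path)) > 0
    coverage = mask.mean()  # fraction of positive (cloud) pixels
    # Empty or near-full masks are often annotation errors worth a look.
    if coverage < 0.01 or coverage > 0.99:
        suspicious.append((mask_path.name, coverage))

print(f"{len(suspicious)} masks flagged for manual review")
```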
-
error bucket technique:
- run inference on your dev set, pick out 100 of the worst predictions and try to sort them into error buckets (a minimal sketch follows this list)
- address the largest buckets first with augmentation/threshold techniques etc.
- more details about this method were shared by Landing AI
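A minimal sketch of that workflow, assuming a binary segmentation model: score every dev sample by IoU, then review the 100 worst by hand. `model`, `dev_loader` and the IoU helper are stand-ins for your own pipeline.

```python
# Rank dev-set predictions by IoU; `model` and `dev_loader` are assumptions.
from collections import Counter

import torch

def iou(pred, target, eps=1e-7):
    inter = (pred & target).sum()
    union = (pred | target).sum()
    return (inter + eps) / (union + eps)

scores = []
model.eval()
with torch.no_grad():
    for idx, (image, target) in enumerate(dev_loader):  # batch_size=1 assumed
        pred = model(image).sigmoid() > 0.5
        scores.append((iou(pred, target.bool()).item(), idx))

# Inspect the 100 worst predictions and assign each to a bucket by hand,
# e.g. "thin clouds", "bad label", "image artifact", ...
worst = sorted(scores)[:100]
buckets = Counter()  # fill while reviewing: buckets["bad label"] += 1
```

The bucket counts then tell you which failure mode to attack first.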
-
learn from similar competitions in the past, e.g. the 2019 Kaggle competition “Understanding Clouds from Satellite Images”.
-
use pseudo-labels:
- even after cleaning your data, at least ~5% of the samples with broken/bad masks are left to work on
- run inference on these samples with your best model (see the sketch after this list)
- reintroduce some of the predicted masks into your training set, but only if they make sense/are accurate
- this will introduce bias into your data, so be careful which masks you reintroduce
- since the hidden test set also consists of about 5% bad masks, you could later fine-tune your model on the original (uncleaned) dataset to reintroduce that error
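A minimal sketch of the pseudo-labelling step. `best_model`, `bad_mask_loader` and the 0.9 confidence cut-off are assumptions; the manual review pass is the important part.

```python
# Generate pseudo-labels for the bad-mask samples; names are assumptions.
import torch

pseudo_labels = {}
best_model.eval()
with torch.no_grad():
    for sample_id, image in bad_mask_loader:  # samples with broken masks
        probs = best_model(image).sigmoid()
        # Mean distance from the 0.5 decision boundary as a rough
        # confidence proxy; keep only masks the model is sure about.
        confidence = (probs - 0.5).abs().mean().item() * 2
        if confidence > 0.9:
            pseudo_labels[sample_id] = (probs > 0.5).cpu().numpy()

# Review pseudo_labels by hand before adding them back to the train set;
# careless reintroduction biases the data toward the model's own errors.
```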
-
copy & paste augmentation
- Papers with Code link
- the technique was originally introduced for instance segmentation, but I'm sure there is a way to make it work for semantic segmentation (one possible adaptation is sketched below)
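A hedged sketch of one way to adapt Copy-Paste to semantic segmentation: overwrite the target sample's pixels wherever a donor sample's mask is positive, and take the union of the masks. The matching-shape and binary-mask assumptions are mine, not from the paper.

```python
# Paste the positive pixels of a donor sample into a target sample.
import numpy as np

def copy_paste(image, mask, donor_image, donor_mask):
    """All arrays HxWxC (images) / HxW (binary masks) of matching size."""
    pasted_image = image.copy()
    pasted_mask = mask.copy()
    region = donor_mask.astype(bool)
    pasted_image[region] = donor_image[region]  # overwrite donor pixels
    pasted_mask[region] = 1                     # union of the two masks
    return pasted_image, pasted_mask
```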
-
model ensemble of two models for cloud detection, shared by Azavea (a minimal sketch follows)
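A minimal sketch of a two-model ensemble; `model_a`, `model_b` and simple probability averaging are my assumptions, not Azavea's exact setup.

```python
# Average the two models' probabilities, then binarize.
import torch

@torch.no_grad()
def ensemble_predict(image, model_a, model_b, threshold=0.5):
    probs_a = model_a(image).sigmoid()
    probs_b = model_b(image).sigmoid()
    return (probs_a + probs_b) / 2 > threshold
```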
-
additional useful links shared by the competition organizers are at the bottom of this page
What else? Feel free to share