
Feel stuck? Ideas that might help you proceed

First of all, before trying out different models and going for hyperparameter tuning, you should spend 1 or 2 weeks on EDA alone. An okayish model is good enough for the beginning.

  • work on data cleaning

  • error bucket technique:

    • run inference on your dev set. Pick out 100 samples and try to sort them into error buckets
    • address the largest buckets first, e.g. with augmentation or threshold techniques
    • more details about this method by Landing AI
  • learn from similar competitions in the past: “Understanding Clouds from Satellite Images” 2019 Kaggle competition.

  • use pseudo-labels:

    • after cleaning your data, the samples you set aside (at least 5% of the data) are still left to work on
    • use your best model to run inference on these samples with broken/bad masks
    • add some of the predicted masks back into your training data, but only if they make sense/are accurate
    • this will introduce bias into your data, so be careful which masks you reintroduce to your dataset
    • since the hidden test set also consists of about 5% bad masks, you could fine-tune your model later on the original dataset to reintroduce the error
  • copy & paste augmentation

    • paperswithcode link
    • the technique was initially introduced for instance segmentation, but I’m sure there is a way to make it work for semantic segmentation
  • model ensemble of two models for cloud detection shared by Azavea

  • additional useful links shared by the competition organizers are at the bottom of this page
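
For the error bucket idea above, the bookkeeping can be as simple as a tally. This is only a minimal sketch: the chip ids and bucket names are made up for illustration, and the labels are assumed to come from you looking at each of the ~100 dev samples by hand.

```python
from collections import Counter

# Hypothetical notes from manual inspection: one bucket label per dev sample.
# Chip ids and bucket names are invented for this example.
inspected = {
    "chip_001": "snow_as_cloud",
    "chip_002": "thin_cloud_missed",
    "chip_003": "snow_as_cloud",
    "chip_004": "bad_ground_truth",
    "chip_005": "snow_as_cloud",
    "chip_006": "thin_cloud_missed",
}


def rank_buckets(labels):
    """Return (bucket, count, share) tuples, largest bucket first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(bucket, n, n / total) for bucket, n in counts.most_common()]


# Print the buckets in the order you would want to address them.
for bucket, n, share in rank_buckets(inspected.values()):
    print(f"{bucket:20s} {n:3d}  {share:.0%}")
```

The largest bucket at the top of the list is the one to attack first.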
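
For the pseudo-label idea above, one possible way to pre-filter candidates is a confidence cut-off. This is a rough sketch under my own assumptions: the model is assumed to output a per-pixel cloud probability map, and the function names and the 0.9 cut-off are hypothetical. It does not replace checking by eye whether a mask makes sense.

```python
import numpy as np


def pseudo_mask(prob, threshold=0.5):
    """Binarize a per-pixel cloud probability map into a 0/1 mask."""
    return (prob >= threshold).astype(np.uint8)


def mean_confidence(prob):
    """Average distance from the 0.5 decision boundary, rescaled to [0, 1]."""
    return float(np.mean(np.abs(prob - 0.5) * 2.0))


def select_pseudo_labels(probs_by_chip, min_confidence=0.9):
    """Keep pseudo-masks only for chips where the model is confident overall.

    probs_by_chip maps a chip id to a 2D array of cloud probabilities.
    The 0.9 default is an arbitrary starting point, not a tuned value.
    """
    selected = {}
    for chip_id, prob in probs_by_chip.items():
        if mean_confidence(prob) >= min_confidence:
            selected[chip_id] = pseudo_mask(prob)
    return selected


# Toy example: one confidently cloudy chip, one uncertain chip.
predictions = {
    "chip_confident": np.full((4, 4), 0.98),
    "chip_uncertain": np.full((4, 4), 0.55),
}
kept = select_pseudo_labels(predictions)
print(sorted(kept))
```

A stricter cut-off keeps fewer masks but should add less bias, which is the trade-off mentioned in the list.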

What else? Feel free to share