No discussion for this competition?

Surprisingly, there is not much discussion happening in this competition. Either the competition is really easy or everyone is being too competitive about it.

Anyway, I am stuck at a point where I cannot get my loss below 0.5 on the leaderboard.
My guesses as to why this is happening:

  1. Varying image sizes or roof dimensions (i.e. small vs. big)
  2. Class imbalance; however, I tried common augmentation techniques and could improve my leaderboard rank.
  3. Doubts about misclassified images in the train dataset
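
For the class imbalance in point 2, one common alternative to augmentation is weighting each class by its inverse frequency in the loss. A minimal sketch, assuming integer class labels; the function name and the example label vector are made up for illustration and are not taken from the competition data:

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    # Classes that never appear get weight 0 instead of a division by zero.
    counts[counts == 0] = np.nan
    weights = labels.size / (n_classes * counts)
    return np.nan_to_num(weights, nan=0.0)

# Hypothetical labels: class 0 is common, class 2 is rare.
example_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
class_weights = inverse_frequency_weights(example_labels, n_classes=3)
```

The resulting array can be passed to whatever weighted loss your framework offers. Whether it helps on this dataset is something only local validation can tell.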

Feel free to add comments. Anyway, only one week is left, and I am trying to understand as much as I can before the competition ends.

~Sanket

This is my first contest and I’ve been a little surprised that no one communicates. It seems like Kaggle contests have more discussion.

I don’t think the variable image size is causing me problems, but maybe I just haven’t noticed.

The misclassified images are really problematic. I’ve tried a number of ways to handle it, including manually fixing the labels on ones that are obviously wrong. Manually fixing the worst offenders improves things a little, but I had to ensemble using the original labels (not the fixed labels). This project is an exercise in building a classifier to replicate the human subject matter expert’s errors. It’s a little disappointing.

Thank you for replying, @ocitalis. Good to know that there are competitors willing to explore.
Will definitely update on my progress.

To clarify, when discussing misclassified images here, are you referring to the unverified Castries and Gros Islet data or the other (verified) data?

I was referring to the verified data that appears to be incorrectly labeled despite being verified.

Yeah, life feels tough here when you come from Kaggle. We have to face uncleaned data and a discussion-less competition. As for the dirty verified data, just rely on your local CV and don’t overfit the public LB, as we don’t know the labelling and distribution of the public and private LB.
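
A rough sketch of what "rely on your local CV" could look like: assign samples to folds so that every fold keeps the class proportions, then score held-out predictions. This assumes the leaderboard metric is multiclass log loss, which the thread does not state explicitly, and both function names are hypothetical:

```python
import numpy as np

def stratified_folds(labels, n_folds=5, seed=0):
    """Return a fold index per sample, spreading each class evenly over folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(labels.size, dtype=int)
    for cls in np.unique(labels):
        # Shuffle the samples of this class, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(idx.size) % n_folds
    return fold_of

def multiclass_log_loss(labels, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    probs = probs / probs.sum(axis=1, keepdims=True)
    labels = np.asarray(labels)
    return float(-np.mean(np.log(probs[np.arange(labels.size), labels])))
```

Train on all folds but one, predict the held-out fold, and average `multiclass_log_loss` over the folds. If that number and the public LB disagree, the advice above is to trust the local number.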

Hello everyone. I am stuck as well; it is like I hit a brick wall at 0.45, and it has been very frustrating.
Using the non-verified data as is has not worked for me (~0.5).

I tried correcting the non-verified labels by training a classifier on verified data and using that to relabel the non-verified data. It doesn’t work (yet); I hit the same brick wall…
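
One variant of this relabeling idea, which is not necessarily what was tried above, is to keep a predicted label only when the classifier trained on verified data is confident about it. A minimal sketch, assuming you already have a matrix of predicted class probabilities for the non-verified images; the function name and the 0.9 threshold are illustrative assumptions:

```python
import numpy as np

def select_confident_pseudo_labels(probs, threshold=0.9):
    """Return (row indices, predicted labels) for confident predictions only."""
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)  # top class probability per sample
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)

# Hypothetical predictions for three non-verified images.
example_probs = [
    [0.95, 0.03, 0.02],  # confident: kept, relabeled as class 0
    [0.40, 0.35, 0.25],  # uncertain: dropped
    [0.05, 0.05, 0.90],  # confident: kept, relabeled as class 2
]
kept_rows, pseudo_labels = select_confident_pseudo_labels(example_probs)
```

The dropped rows can be left out of training entirely or kept with their original labels. Which works better here is an open question.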

There must be some trick that we are missing to go lower. 0.35 is so much lower that it seems impossible.

Agreed, I’ve leveled out at around 0.51 and can’t seem to squeeze much more performance out of my models. Hopefully we can find out after the competition what the top teams have done.