No discussion for this competition?

sanket10 · December 12, 2019, 8:45pm

Surprisingly, there is not much discussion happening in this competition. Either the competition is really easy or everyone is being too competitive about it.

Anyway, I am stuck at a point where I cannot get my loss below 0.5 on the leaderboard.
My guesses as to why this is happening:

Varying image sizes or roof dimensions (i.e. small vs. big)
Class imbalance; however, I tried common augmentation techniques and could improve my leaderboard rank.
Doubts about misclassified images in the train dataset
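For the class imbalance point, one common alternative to augmentation is to weight each class inversely to its frequency in the loss. Here is a minimal sketch, not tied to any particular framework. The function name and the label strings are hypothetical and only for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels, only to show the call.
example_labels = ["class_a"] * 3 + ["class_b"] * 1
weights = inverse_frequency_weights(example_labels)
```

The resulting dictionary can be passed to whatever per-class weighting option your training loop or loss function offers. Whether it helps on this leaderboard is something I have not checked.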

Feel free to add comments. Anyway, only one week is left, and I am trying to understand as much as I can before the competition ends.

~Sanket

ocitalis · December 12, 2019, 9:50pm

This is my first contest and I’ve been a little surprised that no one communicates. It seems like Kaggle contests have more discussion.

I don’t think the variable image size is causing me problems, but maybe I just haven’t noticed.

The misclassified images are really problematic. I’ve tried a number of ways to handle it, including manually fixing the labels on ones that are obviously wrong. Manually fixing the worst offenders improves things a little, but I had to ensemble using the original labels (not the fixed labels). This project is an exercise in building a classifier to replicate the human subject matter expert’s errors. It’s a little disappointing.

sanket10 · December 12, 2019, 10:10pm

Thank you for your reply, @ocitalis. Good to know that there are competitors willing to explore.
Will definitely update on my progress.

jearly · December 13, 2019, 11:47am

To clarify, when discussing misclassified images here, are you referring to the unverified Castries and Gros Islet data or the other (verified) data?

ocitalis · December 13, 2019, 2:33pm

I was referring to the verified data that appears to be incorrectly labeled despite being verified.

SamSepiol · December 14, 2019, 6:19am

Yeah, life feels tough here when you come from Kaggle. We have to face uncleaned data and a discussion-less competition. With regard to the dirty verified data, just rely on your local CV and don’t overfit the public LB, as we don’t know the labelling and distribution of the public and private LB.
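To make "rely on your local CV" concrete, here is a rough sketch of stratified fold assignment plus a log loss scorer in plain Python. It assumes the leaderboard metric is multi-class log loss, which the thread does not state explicitly. The function names and the dict-of-probabilities format are my own choices for illustration.

```python
import math
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to a fold, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [None] * len(labels)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idxs):
            fold_of[i] = pos % n_folds
    return fold_of


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    probs is a list of dicts mapping class label -> predicted probability.
    """
    total = 0.0
    for y, p in zip(y_true, probs):
        clipped = min(max(p[y], eps), 1 - eps)
        total += -math.log(clipped)
    return total / len(y_true)
```

Train on all folds but one, score the held-out fold with `log_loss`, and average across folds. If that average moves in the same direction as the public LB, the local score is probably the safer one to trust for choosing a final submission.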

gaetanbahl · December 16, 2019, 8:04am

Hello everyone. I am stuck as well. It feels like I hit a brick wall at 0.45, and it has been very frustrating.
Using the non-verified data as is has not worked for me (~0.5).

I tried correcting the non-verified labels by training a classifier on verified data and using that to relabel the non-verified data. It doesn’t work (yet), I hit the same brick wall…
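One variant of this relabeling idea is to overwrite a non-verified label only when the classifier trained on verified data is very confident, and to keep the original label otherwise. A minimal sketch follows; the function name, the 0.9 threshold, and the dict-of-probabilities input are all assumptions for illustration, and there is no claim that this gets past the wall described above.

```python
def relabel_confident(original_labels, predicted_probs, threshold=0.9):
    """Replace a label only when the model's top class clears the threshold.

    original_labels: list of class labels from the non-verified data.
    predicted_probs: list of dicts mapping class label -> probability,
        produced by a model trained on the verified data.
    Returns the new label list and the number of labels that changed.
    """
    new_labels = []
    n_changed = 0
    for old, probs in zip(original_labels, predicted_probs):
        top = max(probs, key=probs.get)
        if probs[top] >= threshold and top != old:
            new_labels.append(top)
            n_changed += 1
        else:
            new_labels.append(old)
    return new_labels, n_changed
```

Tracking `n_changed` is useful as a sanity check: if nearly every label flips, the threshold is likely too low or the model is overconfident.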

There must be some trick that we are missing to go lower. 0.35 is so much lower it seems impossible.

jearly · December 16, 2019, 9:55pm

Agreed, I’ve leveled out at around 0.51 and can’t seem to squeeze much more performance out of my models. Hopefully we can find out after the competition what the top teams have done.
