
Bad masks (ahfi, cutk, iboy ...)

Hi,

I’m just joining the competition. After a quick sanity check of the data, it looks like I’ve found some bad masks. For example, for chip_id = iboy, the mask is all zeros (no cloud), but the true color image has clouds. The same goes for cutk, hxhj, ahfi, and mpbf.
I’ve downloaded all the labels twice to make sure it was not a download issue.

Am I missing something, or do we have bad masks?
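
For anyone who wants to repeat this kind of sanity check, here is a minimal sketch. It assumes each chip is already loaded as a NumPy RGB array scaled to [0, 1] and a 0/1 mask array. The function name `suspicious_empty_mask` and both thresholds are hypothetical, and raw brightness is only a crude stand-in for "has clouds", so treat any hits as candidates for a visual check rather than confirmed bad labels.

```python
import numpy as np

def suspicious_empty_mask(rgb, mask, bright_thresh=0.8, min_bright_frac=0.05):
    """Return True when the mask has no cloud pixels but the image looks bright.

    rgb:  float array of shape (H, W, 3), values assumed to be in [0, 1].
    mask: array of shape (H, W), 1 = cloud, 0 = no cloud.
    Both thresholds are illustrative guesses, not tuned values.
    """
    brightness = rgb.mean(axis=-1)                       # per-pixel mean of R, G, B
    bright_frac = float((brightness > bright_thresh).mean())
    return bool(mask.sum() == 0) and bright_frac >= min_bright_frac

# Synthetic chip: the top half is bright (cloud-like), the bottom half is dark.
rgb = np.full((4, 4, 3), 0.2)
rgb[:2, :, :] = 0.95

empty_mask = np.zeros((4, 4), dtype=np.uint8)            # claims "no cloud"
good_mask = np.zeros((4, 4), dtype=np.uint8)
good_mask[:2, :] = 1                                     # labels the bright half

flag_empty = suspicious_empty_mask(rgb, empty_mask)      # flagged for review
flag_good = suspicious_empty_mask(rgb, good_mask)        # not flagged
```

Snow, sand, and other bright surfaces will trigger the same flag, which is why this only narrows down what to look at by eye.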

5 Likes

In some of my output during training I’ve noticed that there are more bad masks as well. I’d guess a couple of percent are wrong.

E.g.:

image

It would be really good if one of the organisers could comment on how clean the test data for the leaderboard is.
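
One way to surface such masks from training output is to compare each binarised prediction with its label and with the inverted label. This is only a rough sketch: the helper names `iou` and `label_check` and the two thresholds are made up for illustration, and it assumes the model is already reasonably accurate, otherwise it will mostly flag the model's own mistakes.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two binary masks (defined as 1.0 if both are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

def label_check(pred, label, low=0.2, high=0.8):
    """Classify a label by how well it agrees with a binarised prediction.

    low and high are illustrative thresholds, not tuned values.
    """
    direct = iou(pred, label)
    if direct >= low:
        return "ok"
    inverted = iou(pred, np.logical_not(label.astype(bool)))
    if inverted >= high:
        return "possibly inverted"
    return "disagrees with prediction"

# Synthetic prediction: clouds in the top half of a 4x4 chip.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[:2, :] = 1

result_same = label_check(pred, pred.copy())                         # label matches
result_inverted = label_check(pred, 1 - pred)                        # label is flipped
result_empty = label_check(pred, np.zeros((4, 4), dtype=np.uint8))   # label is all zeros
```

Logging the chip ID next to each flagged result makes it much easier to report specific examples later.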

3 Likes

Some are inverted like this one:

Some look very bad, like the one you’ve spotted:

2 Likes

In my opinion, 5-10% of the masks are bad, based on some short tests.

3 Likes

Thanks for bringing that up. In preparing the dataset, we definitely saw examples of noisy labels. Mostly we saw examples of noise like @MPWARE shared in the first post – chips with scattered clouds that were labeled as all 0 or 1. @dfulu the example you shared is interesting; could you share the chip ID for that one so we could look into it?
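
As a rough illustration of how that kind of noise could be shortlisted, the sketch below lists chips whose mask is a single constant value. The chip IDs and the `masks` dictionary are placeholders, not real competition data. A uniform mask is not wrong by itself, since fully clear and fully overcast chips are legitimate; the list is just a starting point for visual review.

```python
import numpy as np

def is_uniform(mask):
    """True when every pixel of the mask has the same value (all 0 or all 1)."""
    return bool(mask.min() == mask.max())

# Placeholder masks keyed by made-up chip IDs.
mixed = np.zeros((4, 4), dtype=np.uint8)
mixed[0, :] = 1
masks = {
    "chip_a": np.zeros((4, 4), dtype=np.uint8),   # all 0
    "chip_b": np.ones((4, 4), dtype=np.uint8),    # all 1
    "chip_c": mixed,                              # partial cloud
}

uniform_ids = sorted(cid for cid, m in masks.items() if is_uniform(m))
uniform_share = len(uniform_ids) / len(masks)     # fraction of chips to review
```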

2 Likes

Hi @rbgb, I don’t have the ID of that one handy. It was a random example I logged during training.

Here are a few examples I do have:

2 Likes

Here’s a bonus example (don’t know the ID)

Capture

1 Like

I would also say about 5% of the masks are bad.
Another example like dfulu’s:

@rbgb Can we assume the same for both the public and private test datasets?

3 Likes

@MPWARE Yes, you can assume a similar distribution of bad masks in the test set.

2 Likes

I also found afyr by accident; it seems you have already discovered this one. Thanks for sharing your observations.