No Idea is Working

I have experimented with various preprocessing steps, including removing false kelp and scaling the channels by dividing by 65535, and with different models such as UNet and DeepLab using various backbones and hyperparameter tuning; I have also tried pixel-wise classification. Despite these efforts, the results have been unsatisfactory, and I am struggling to identify the root cause and decide on the most effective approach. Any guidance or suggestions on how to diagnose and address these issues would be greatly appreciated.

Scaling by 65535 might not be the best way to go here.

If you read the page linked in the description about the USGS lvl2 Landsat products, it says that the valid range is:

7273-43636

So you can try clipping anything outside that range.
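
A minimal sketch of that in NumPy, assuming your bands are loaded as an array of raw DN values:

import numpy as np

VALID_MIN, VALID_MAX = 7273, 43636  # valid range for USGS Level-2 products

def clip_and_scale(bands):
    # Clip to the valid range, then rescale to [0, 1]
    bands = np.clip(bands.astype(np.float32), VALID_MIN, VALID_MAX)
    return (bands - VALID_MIN) / (VALID_MAX - VALID_MIN)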

If you want to get more involved, you can look at how it was done in this paper.
Their code is here, but basically they normalized with min and max values:

R_MIN=$(echo "(0 + 0.2) / 0.0000275" | bc -l)
R_MAX=$(echo "(0.3 + 0.2) / 0.0000275" | bc -l)

Which works out to roughly 7273 to 18182.
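
In Python, their scheme would look roughly like this (a sketch, not their actual code; the 0.0000275 scale and 0.2 offset are the same Level-2 reflectance constants used in the snippet above):

import numpy as np

R_MIN = (0.0 + 0.2) / 0.0000275  # ~7272.7
R_MAX = (0.3 + 0.2) / 0.0000275  # ~18181.8

def normalize(band):
    # Min-max normalize raw DNs into [0, 1], clipping the tails
    band = band.astype(np.float32)
    return np.clip((band - R_MIN) / (R_MAX - R_MIN), 0.0, 1.0)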

Got it, I’ll make sure to clip any values outside the specified range for both the satellite and mask images.

Also, I’m wondering if there are any preprocessing algorithms designed for Landsat data that might be useful here. Would it be worth checking them out, or would it be a waste of time?

You may already know this, but you need more than the RGB channels for this data; the other channels are more useful than RGB. There are negative values too, and those need to be handled (set to 0, the max, or whatever) before scaling. I’ve been scaling by 65535 and it’s not bad, but fnand’s suggestion is good.
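
A minimal sketch of that handling, assuming -32768 is the nodata value in your tiles:

import numpy as np

NODATA = -32768

def preprocess(bands):
    bands = bands.astype(np.float32)
    bands[bands == NODATA] = 0.0  # nodata -> 0 (or the band minimum)
    bands[bands < 0] = 0.0        # any remaining negatives
    return bands / 65535.0        # simple full-range scaling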

Yeah, I have tried removing the negative values (-32768) by replacing them with 0 so the model doesn’t focus on those pixels, scaled by 65535, removed false kelp pixels using the DEM and cloud masks, and also used false-color channels. But nothing is working out for me, and that is why I’m confused about what’s going wrong. My Dice score on the validation set tops out at 0.57 and won’t go above that. Can you help me with this, please?

I’m not removing any “false” pixels under clouds; I’m only ignoring 2 bad training images. My local validation scores are fairly close to my LB score. Maybe you have a bug in the code that creates your test predictions?

Also be careful with this assumption.
I’ve seen a few cases where the coregistration between the DEM and the spectral bands is way off (HH246316).

Oh. I thought that if a cloud is present in a region and kelp is marked there in the segmentation mask, then it must be a false segmentation.

Can you please explain a little bit more what you mean by that? Do you mean that the geolocation between the spectral bands and DEM doesn’t match?

I am using DeepLab with different backbones but am not getting good scores with it. Should I keep spending time on hyperparameter tuning for this model, or switch to vision transformers? Also, is it better to use the false-color channels together with the RGB channels for training, or should I train on them separately?

So, take my opinions with a pinch of salt as I’m not scoring that high, but I haven’t seen much of a correlation between model size and score.

Have you tried a simple, small-ish UNet?
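
For example, with segmentation_models_pytorch (the encoder name and channel count here are placeholders; adjust them to your data):

import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet18",  # small-ish backbone
    encoder_weights=None,     # multi-band input, so skip ImageNet weights
    in_channels=7,            # e.g. all spectral bands + DEM
    classes=1,                # binary kelp mask
)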

Try to use all the bands you can, and maybe calculate some vegetation indices.
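
For instance, NDVI and NDWI are cheap to compute per pixel (which array index holds which band is an assumption here, check how your tiles are stacked):

import numpy as np

def ndvi(nir, red, eps=1e-6):
    # Normalized Difference Vegetation Index
    return (nir - red) / (nir + red + eps)

def ndwi(green, nir, eps=1e-6):
    # Normalized Difference Water Index
    return (green - nir) / (green + nir + eps)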

From what I’ve seen in the spectral bands, there are values as low as 0 and as high as 65535 (disregarding nodata values). What are these, then? Saturated pixels? Is it safe to just clip them outright?
Also, I don’t understand that 7273-18182 range, when the page here says 7273-43636.

I’m not sure how you’re using the DEM, but keep in mind that a good 10 to 15% of the kelp pixels are between 0 and 15 meters of altitude. If you just set the mask to 0 wherever DEM > 0, you’re losing a significant portion of the kelp.

Also keep in mind that the highest peak of the Falkland Islands is at around 706 m (and no kelp reaches that high).
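
If you still want a DEM filter, a softer rule along these lines keeps those low-altitude kelp pixels (the 15 m threshold is just a guess from the stats above; tune it):

import numpy as np

MAX_KELP_ALTITUDE = 15.0  # metres; rough upper bound for kelp pixels

def dem_filter(pred_mask, dem):
    # Zero out predictions only well above a plausible kelp altitude,
    # instead of everywhere the DEM is > 0
    return np.where(dem > MAX_KELP_ALTITUDE, 0, pred_mask)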