Issue with training data points

srmsoumya · January 28, 2023, 5:24pm

Overlaying the training data points on a Sentinel mosaic shows some odd patterns: closely spaced points arranged in straight lines that don’t cover any water body.
Is this an issue with the data collection process? I see a number of instances of similar patterns, which might affect the training scores.

Points are color-coded by severity score.
[Attached GIF: ezgif.com-gif-maker]

kwetstone · January 30, 2023, 4:10pm

Hi @srmsoumya! That’s a great question.

It is possible that there is noise in the underlying data. The data was collected from a large number of public health and water quality managers in the real world, and may be subject to human error. However, these cases are rare and there are no underlying systematic issues with the competition data.

This mirrors most real-world problems, where data is not perfect. It is a good point, and it is worth considering whether there is a better way to identify and handle these cases.
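
One possible way to identify such cases is to flag points whose nearby neighbors fall almost exactly on a straight line. The sketch below is only an illustration, not the organizers' method: the function name `flag_linear_clusters` and the thresholds `radius_deg`, `min_neighbors`, and `max_width_ratio` are hypothetical, distances are measured in raw degrees rather than meters, and the neighbor search is brute force. It does not check whether a point actually sits on water, so flagged points would still need a visual check against the imagery.

```python
import numpy as np


def flag_linear_clusters(lat, lon, radius_deg=0.05, min_neighbors=4,
                         max_width_ratio=0.02):
    """Return a boolean mask of points whose local neighborhood is nearly a line.

    All thresholds are illustrative assumptions and would need tuning.
    """
    pts = np.column_stack([np.asarray(lon, dtype=float),
                           np.asarray(lat, dtype=float)])
    flagged = np.zeros(len(pts), dtype=bool)
    for i, p in enumerate(pts):
        # Brute-force neighbor search (O(n^2) overall); fine for a sketch.
        dist = np.hypot(*(pts - p).T)
        nbrs = pts[dist <= radius_deg]
        # The point itself is always included, hence the +1.
        if len(nbrs) < min_neighbors + 1:
            continue
        # Singular values of the centered neighborhood measure the spread
        # along the main axis (s[0]) and across it (s[1]).
        s = np.linalg.svd(nbrs - nbrs.mean(axis=0), compute_uv=False)
        if s[0] > 0 and s[1] / s[0] <= max_width_ratio:
            flagged[i] = True
    return flagged
```

Calling `flag_linear_clusters(lat, lon)` on the training coordinates returns a mask that can be used either to drop the flagged rows or to down-weight them during training. Which of the two works better is an empirical question.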

Good luck!
