Outside data sources?

To start, I understand the nature of the competition and why you don’t want to open up any and all data sources (since the test set is a mix of past data). I would still like to petition for a few additional data sources.

A: The use of NVI from the same Planetary Computer web resource you have shown for other imagery data. One of the example articles you listed as resources uses this: Mapping algal bloom dynamics in small reservoirs using Sentinel-2 imagery in Google Earth Engine (ScienceDirect). The NVI appears to cover the entire sample period at the monthly level. Can we use this data source, querying months prior to the sample measurement? It is just a derivative of the same satellite imagery you say is OK to use.

B: The use of NOAA’s hydrography vector dataset. This is useful for two reasons. First, smaller lakes have higher concentrations in the data I can see. Second, it provides bounding boxes for selecting the area of interest out of the raster data. I get that the survey for this vector dataset only starts in 2017 (as far as I can find), but given that it is generated by NOAA, I think NASA should be open to using it.

In the spirit of openness, here is a snippet of Python code that downloads a water body, given an input lat/lon, from a source hosted at ESRI.

# Example script downloading info for Lakes
import requests

# Example downloading lake used in the benchmark example
lat = 41.98006
lon = -110.65734

# Creating the request URL
url = 'https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Detailed_Water_Bodies/FeatureServer/0/query?'
url += 'where=1%3D1&outFields=*&geometry='
url += f'{lon}%2C{lat}%2C{lon}%2C{lat}'
url += '&geometryType=esriGeometryEnvelope&inSR=4326&spatialRel=esriSpatialRelIntersects&outSR=4326&f=json'

# Now can grab the result from ESRI
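response = requests.get(url, timeout=30)
response.raise_for_status()  # stop here on an HTTP error
lj = response.json()
# 'features' is empty when no water body intersects the point
data = lj['features'][0]['attributes']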
print(data) # Lake Viva Naughton
# see also lj['features'][0]['geometry']
# for the vector geometry
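
As a rough sketch of the bounding-box idea in point B: assuming the service returns the polygon in the usual Esri JSON form (a 'rings' list holding lists of [x, y] vertices), something like the following would turn that geometry into a lon/lat box for clipping raster data. The function name rings_to_bbox and the sample polygon are made up for illustration; they are not real lake coordinates.

```python
def rings_to_bbox(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for an Esri JSON polygon."""
    # Flatten every vertex from every ring (outer boundary and any holes)
    points = [pt for ring in geometry['rings'] for pt in ring]
    if not points:
        raise ValueError('geometry has no vertices')
    lons = [pt[0] for pt in points]
    lats = [pt[1] for pt in points]
    return min(lons), min(lats), max(lons), max(lats)

# Hypothetical polygon for illustration only
sample = {'rings': [[[-110.70, 41.95], [-110.62, 41.95],
                     [-110.62, 42.01], [-110.70, 42.01],
                     [-110.70, 41.95]]]}
bbox = rings_to_bbox(sample)
print(bbox)
```

With the real response this would be called as rings_to_bbox(lj['features'][0]['geometry']).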

Thank you, Andy Wheeler

Hi Andy! These are very good points, and we appreciate your thoughtful question and engagement with the data!

At this point, we unfortunately can’t add new approved data sources. The list of approved sources on the competition page is based on discussion with subject matter experts at NASA about the most important variables, combined with input from water quality managers about what is feasible for them to use as input to a model on a regular basis. The main constraint here is the latter: keeping the model usable for a broad range of public health groups.

I’d recommend looking through the approved sources to find the closest information to what you are interested in from NVI and the hydrography vector dataset. It sounds like there is a lot of overlap, and you can derive very similar information from the existing sources.

Best of luck, and we look forward to seeing your work!