Does a HUC -> site_id mapping need extra data approval?

As part of my solution, I have a .csv file that matches the HUCs from the SWANN data to the sites. I wanted to ensure this doesn’t count as an additional data source that needs approval.

The HUC IDs are based on data from

It’s just a CSV file with columns HUC and site_id.

Since the mapping is fixed and it doesn’t contain data that are directly used as features, I assumed uploading it with my solution would be okay. But I wanted to make sure it is okay to use it.
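For readers following along, here is a minimal sketch of how a two-column mapping file like the one described above could be loaded. The sample rows and the function name `load_huc_mapping` are made up for illustration and are not the poster's actual file or code. The one point worth showing is that HUC codes should stay strings, since they can start with a zero.

```python
import csv
import io

# Hypothetical file contents; real HUC codes and site_id values will differ.
SAMPLE_CSV = """HUC,site_id
06010201,site_a
17040204,site_b
17040204,site_c
"""

def load_huc_mapping(handle):
    """Return {site_id: HUC}, keeping HUC codes as strings."""
    reader = csv.DictReader(handle)
    return {row["site_id"]: row["HUC"] for row in reader}

mapping = load_huc_mapping(io.StringIO(SAMPLE_CSV))
print(mapping["site_a"])  # 06010201 (leading zero preserved)
```

With a real file, `open("mapping.csv", newline="")` would take the place of the `io.StringIO` stand-in; the filename here is also an assumption.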

Hi @kurisu, is not an approved data source, but I will follow up with the challenge organizers to clarify this case, with the understanding that this is static geographic metadata.

I am following up with you by email. Can you please respond to that email as soon as possible with a detailed explanation of how you created the HUCs to site_id mapping, and a copy of that file?

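Hi @jayqi, can we use USGS metadata from NWIS? It also contains HUC codes, and it can be retrieved easily with this code:

from dataretrieval import nwis
# usgs_ids: USGS site numbers for the challenge sites (defined elsewhere)
# get_info returns a tuple; the first element is the site metadata table
df_usgs_meta = nwis.get_info(sites=usgs_ids)[0]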
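As a rough sketch of the next step, the metadata rows could be reduced to a site-to-HUC mapping along these lines. The column names `site_no` and `huc_cd`, the 8-digit width, and the helper name `huc_by_site` are assumptions rather than anything confirmed in this thread, so check them against the table you actually get back. The sample rows are invented so the snippet runs without a network call.

```python
def huc_by_site(records, site_col="site_no", huc_col="huc_cd", width=8):
    """Build {site number: HUC code} from site-metadata records.

    Column names and the 8-digit width are assumptions, not confirmed here.
    """
    mapping = {}
    for record in records:
        huc = record.get(huc_col)
        if huc is None:
            continue  # skip sites without a HUC entry
        # Restore leading zeros in case the code was parsed as an integer.
        mapping[str(record[site_col])] = str(huc).zfill(width)
    return mapping

# Stand-in rows with made-up values; with pandas this could be
# df_usgs_meta.to_dict("records").
sample_records = [
    {"site_no": "00000001", "huc_cd": 1020003},
    {"site_no": "00000002", "huc_cd": "17040204"},
    {"site_no": "00000003", "huc_cd": None},
]
site_to_huc = huc_by_site(sample_records)
```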

Hi @jayqi

Thanks for the quick response. Exactly, it’s static geographic metadata. I use the mapping only to decide which HUCs correspond to which sites. Based on this list, I download only the relevant UA/SWANN CSV files from the “HUC spatially averaged data” source, as it is called on the approved data sources list.
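The selection step described here might look roughly like the sketch below: collect the distinct HUCs from the mapping and build the list of files to fetch. The file naming pattern `{huc}.csv`, the sample mapping, and the function name `files_to_download` are hypothetical; the actual UA/SWANN file names are not given in this thread.

```python
def files_to_download(mapping, filename_template="{huc}.csv"):
    """List one file name per distinct HUC in a {site_id: HUC} mapping."""
    hucs = sorted(set(mapping.values()))  # several sites can share a HUC
    return [filename_template.format(huc=huc) for huc in hucs]

# Made-up mapping for illustration only.
example_mapping = {
    "site_a": "06010201",
    "site_b": "17040204",
    "site_c": "17040204",
}
wanted = files_to_download(example_mapping)
print(wanted)  # ['06010201.csv', '17040204.csv']
```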

Hi @kurisu and @rasyidstat,

Here’s an update following discussion with the challenge organizers.

We are not requiring that participants source HUC definitions from any specific source, as long as it is reasonably official. (We are not treating HUC metadata like normal feature data that must be from approved sources.) However, your process should be clearly documented and reproducible in your model report and training code.
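One possible way to make the mapping step easy to reproduce and verify, offered only as a sketch and not as a requirement from the organizers: write the mapping in a fixed row order and record a checksum of the result in the model report. The helper names `mapping_to_csv` and `fingerprint` and the sample values are made up for illustration.

```python
import csv
import hashlib
import io

def mapping_to_csv(mapping):
    """Serialize {site_id: HUC} with a header and rows sorted by site_id."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["HUC", "site_id"])
    for site_id in sorted(mapping):
        writer.writerow([mapping[site_id], site_id])
    return buf.getvalue()

def fingerprint(text):
    """SHA-256 hex digest of the serialized mapping."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The same mapping built in a different order serializes identically.
first = mapping_to_csv({"site_b": "17040204", "site_a": "06010201"})
second = mapping_to_csv({"site_a": "06010201", "site_b": "17040204"})
print(fingerprint(first) == fingerprint(second))  # True
```

Sorting the rows means the checksum depends only on the mapping's contents, so someone rerunning the training code can confirm they regenerated the same file.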

Both the 250k HUC dataset from USGS ScienceBase and USGS NWIS / USGS Site Service are acceptable.

The above clarification has been added as a new section on the Problem Description page.