Using the USGS streamflow data source will probably be very important. But when you query the data for a USGS site, it returns the observed flow at that site, while the training data in the competition is the naturalized flow. Could you post details on how you calculated the naturalized flow in the training data, so we can calculate it ourselves from the original USGS data?
A monthly time series of past naturalized flow observations is now available on the data download page. We've added a new section discussing the dataset to the problem description page.
Regarding calculating naturalized flows from observed streamflow data: the challenge organizers do not want participants to reproduce or model the naturalization calculations. This is because the adjustments in the naturalization calculation depend on water management operations, and water management operations may in general be influenced by operational water supply forecasts, so the adjustments can indirectly carry information from those forecasts. We use naturalized flow as the target variable because it represents the volume of water without human influence, and it should be possible to model it directly from natural hydrological inputs such as snowmelt, climate, and weather.