When you run the download, are you able to partially download the CDEC data (i.e., it runs partway before you get a ConnectionError), or does it happen immediately and you aren’t able to download any of the CDEC data?
I just pushed an update where the download code will retry up to 5 times with a delay when it encounters a ConnectionError. In other situations, this has solved this kind of problem for me. Please update your repositories and try again, and let me know if you still encounter any issues. You should make sure that you have this commit.
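For anyone who wants to see the idea, here is a minimal sketch of retrying with a delay. It is not the actual code from the repository: the function name `retry_on_connection_error` and its parameters are hypothetical. Note that `requests.exceptions.ConnectionError` is not a subclass of Python's built-in `ConnectionError`, so when wrapping a `requests` call you would pass it explicitly via `retry_on`.

```python
import time


def retry_on_connection_error(
    func,
    max_attempts=5,
    delay_seconds=1.0,
    retry_on=(ConnectionError,),  # assumption: pass requests.exceptions.ConnectionError for requests calls
    sleep=time.sleep,
):
    """Call func(), retrying up to max_attempts times on the given exceptions."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay_seconds)  # wait before trying again
```

With `requests`, usage might look like `retry_on_connection_error(lambda: requests.get(url), retry_on=(requests.exceptions.ConnectionError,))`, where `url` is whatever endpoint you are downloading from.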
I also have a similar issue with USGS streamflow. I only got 18 files for 18 sites each year, while the description says we should have data for 25 (out of 26) sites.
@jayqi: Could you kindly let us know how to fix this issue? Or could you just upload the files somewhere (as they are not very big)?
That is correct. CDEC has snow monitoring stations in California, and only 3 of the forecast sites are in California: san_joaquin_river_millerton_reservoir, american_river_folsom_lake, and merced_river_yosemite_at_pohono_bridge. This is intended to be a supplement to the SNOTEL snow monitoring stations, which do not have as much coverage in California.
Can you please provide more detail about what errors you are seeing? For example, please post a stack trace or error log message.
It is the case that 25 of the 26 sites have associated USGS monitoring stations. However, not all 25 of the sites will have available data for every year (this varies year by year). For example, I have data for 22 sites for FY2005 and 18 sites for FY2023. We will publish a list soon of the files that will be present in the code execution runtime for the test split.
Regarding uploading the files: because of the wide range of possible locations and years that teams may want both for training and for testing, we are generally not planning to directly upload feature data for you to download.
But for me, it didn’t work with the new cdec.py.
I encountered the same error: requests.exceptions.ConnectionError.
The situation hasn’t changed: I can partially download the CDEC data, but then it fails.
I am using the same versions of the libraries as in the data_download repository, but I don’t understand why it keeps happening.
I’ve made an update to the CDEC code that should improve reliability. It now downloads data for a batch of up to 100 stations at a time, instead of for each station individually. It also now runs serially without multithreading. Based on my testing, the reduced number of network calls to CDEC servers should improve things. The new code is available as of this commit.
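To illustrate the batching idea, here is a rough sketch of splitting a station list into groups of up to 100 and handling them one after another. This is not the code from the commit: `chunk_stations` and the station ID format are hypothetical, and the actual request to CDEC is left out.

```python
def chunk_stations(station_ids, batch_size=100):
    """Yield consecutive lists of at most batch_size station IDs."""
    station_ids = list(station_ids)
    for start in range(0, len(station_ids), batch_size):
        yield station_ids[start:start + batch_size]


# Hypothetical station IDs, for illustration only
stations = [f"STN{i:03d}" for i in range(250)]

# Serial loop: one request per batch instead of one per station
batch_sizes = []
for batch in chunk_stations(stations):
    # A single download call covering every station in `batch` would go here
    batch_sizes.append(len(batch))
```

For 250 stations this makes 3 calls (100, 100, and 50 stations) rather than 250, which is where the reduction in network traffic comes from.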