Responding to a few different items in one place for transparency for everyone who is interested in using R.
Another R package useful for data download is dataRetrieval for download of USGS streamflow data. I
Thanks! I have sent the approval request for the two packages.
@tabumis (and CC @riverbend since you brought it up in thread): we received your request for the R packages snotelr and USGS's dataRetrieval to be made available. This is under review.
However, I do want to point out that in the code execution environment, we are rehosting predownloaded data files from both SNOTEL and USGS for locations associated with the forecast sites for the test years. These are just CSV files, and you can read these files however you want, such as using R.
For training, you can download data for the training years using either Python or R, and the packages you use to do so do not need to be included in the runtime environment. Your training environment is separate from the code execution runtime. The code execution runtime is only where you submit a trained model to perform inference.
If you are planning to download data for additional SNOTEL or USGS stations during the code execution run at inference time, then you are permitted to make network calls to the NRCS and USGS web service APIs. We will review the snotelr and dataRetrieval packages for possible inclusion in the runtime environment, but you are also free to use generic HTTP request libraries to download that data.
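As a rough illustration of the "generic HTTP request" route, here is a minimal Python sketch that uses only the standard library. It assumes the USGS daily values web service at `https://waterservices.usgs.gov/nwis/dv/` with the query parameters `format`, `sites`, `parameterCd`, `startDT`, and `endDT`, and parameter code `00060` for streamflow; please check these against the current USGS documentation before relying on them. The function names and the site number in the comment are placeholders chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the USGS daily values service; verify against USGS docs.
USGS_DV_URL = "https://waterservices.usgs.gov/nwis/dv/"


def build_usgs_daily_url(site, start, end, parameter_cd="00060"):
    """Build a daily-values request URL for one site and date range.

    parameter_cd "00060" is assumed here to mean streamflow (discharge).
    Dates are strings in YYYY-MM-DD format.
    """
    query = urlencode(
        {
            "format": "json",
            "sites": site,
            "parameterCd": parameter_cd,
            "startDT": start,
            "endDT": end,
        }
    )
    return f"{USGS_DV_URL}?{query}"


def fetch_usgs_daily(site, start, end):
    """Download the response and parse it as JSON (requires network access)."""
    with urlopen(build_usgs_daily_url(site, start, end), timeout=60) as response:
        return json.load(response)


# Example call (site number is only an illustration):
# data = fetch_usgs_daily("01646500", "2020-01-01", "2020-01-31")
```

The same request can be made from R with any HTTP client, since the service only needs a URL with query parameters.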
@riverbend
What version of R will we have access to in the runtime environment?
We will choose a relatively up-to-date version of R (4.0.0+) that is likely to be compatible with packages. This may be the current version of R (4.3.2). If you have known constraints (e.g., a package you need to use has particular requirements), please let us know.
How do we find out which R packages are available via conda?
You will need to determine whether conda-forge has a particular package available. The convention is typically that this is named "r-<packagename>", e.g., see r-dplyr. A Google search like "conda-forge r dplyr" typically will turn up the relevant package.
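If it helps when putting together a list of requested packages, the naming convention can be sketched in a few lines of Python. This is only a guess generator: it assumes the conda-forge name is "r-" followed by the lowercased R package name, and the helper name `conda_forge_name` is made up for illustration. Each result still needs to be confirmed on conda-forge, since not every R package is packaged there.

```python
def conda_forge_name(r_package):
    """Guess the conda-forge name for an R package.

    Assumption: conda-forge uses "r-" plus the lowercased package name.
    This does not check that the package actually exists on conda-forge.
    """
    return "r-" + r_package.strip().lower()


def candidate_names(r_packages):
    """Return guessed conda-forge names for a list of R package names."""
    return [conda_forge_name(name) for name in r_packages]


# Print the guesses for the two packages discussed in this thread.
for guess in candidate_names(["snotelr", "dataRetrieval"]):
    print(guess)
```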
it would be easiest for me if I could just provide a list of the requested packages in a forum post such as this
While we prefer pull requests to the runtime repository, providing a list in a GitHub issue will also be accepted.
I do not have any Python experience. Is there a way to run this completely in RStudio, the environment I typically use for my analyses? Alternatively, I do have team members who have Python experience, so we could probably figure out how to run R from within Python if that is allowed. From my brief research on this topic, it looks like the Python rpy2 package is the best way to run R within Python.
In the context of the code execution runtime, our recommendation for the simplest approach would be to conceptually follow the example that I previously posted (here). In this approach, you do whatever you like entirely in R (and you can use RStudio as your editor for writing and testing). Then, you would submit that R code along with a Python script that uses Python's subprocess.run function, which runs command-line shell commands, to call your R scripts as if you were calling them from the command line.
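To make the idea concrete, here is a minimal sketch of what such a Python wrapper could look like. It assumes the `Rscript` command is available on the PATH in the runtime, and the script and file names in the comment (`predict.R`, `input.csv`, `predictions.csv`) are hypothetical. This is not the required structure for a submission, just one way to hand work off to R.

```python
import subprocess


def build_rscript_command(script_path, args=()):
    """Return the argument list for running an R script with Rscript."""
    return ["Rscript", str(script_path), *[str(arg) for arg in args]]


def run_r_script(script_path, args=()):
    """Run an R script and return whatever it printed to standard output.

    check=True raises subprocess.CalledProcessError if R exits with a
    nonzero status, so a failure in the R code is not silently ignored.
    """
    result = subprocess.run(
        build_rscript_command(script_path, args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example call (hypothetical file names):
# output = run_r_script("predict.R", ["input.csv", "predictions.csv"])
```

Passing the command as a list rather than a single string avoids shell quoting problems with file paths. Inside the R script, the extra arguments can be read with `commandArgs(trailingOnly = TRUE)`.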
Using a framework like rpy2 is likely more difficult if you are not proficient in Python. When using rpy2, one actually is writing Python code against rpy2 APIs, and rpy2 turns that into R code that it calls with the R program under the hood.