ERA5-Land Issues: Accessing the cdsapi Python client in the runtime environment


Currently, accessing the ERA5-Land and ERA5-Land-T reanalysis data requires the Climate Data Store (CDS) API (cdsapi). Per the API instructions, this in turn requires a CDS account and a .cdsapirc file containing your CDS credentials.

Will we be able to use this account when executing the code in the remote runtime environment? Are there any restrictions on which type of account we should be using (e.g. is there an account created for this challenge so we can all have equal access to the data)? If we have to use our own credentials, how can we securely upload the required .cdsapirc file, given that this would require us to share our login credentials as plain text?

On a similar note, for the ERA5 data, the approved data sources only link to the hourly ERA5 data, which is a massive download. Should we assume that it is also okay to use monthly aggregations, even though that version of the ERA5 data is not specifically linked?

Finally, what is the storage capacity for runtime data downloads, and is there a limit on how long data downloads may take? Some of the downloadable data is quite large and may take time to download (e.g. the monthly aggregations for ERA5 still take about an hour to download locally).

Any insight that could be shared on this topic would be much appreciated :slight_smile:

Hi @jitters,

Regarding Copernicus CDS credentials: you will need to use your own credentials to download data in the remote runtime environment by including them as part of your submission ZIP archive. These credentials will be stored in plain text in DrivenData’s storage backend for competition code execution. DrivenData staff, and only DrivenData staff, will be able to access your submission contents. We will only use these credentials while running your code to download data from CDS as part of your submission. If you have further questions or concerns, please let us know.

There are a few different ways you can authenticate the cdsapi client during the code execution runtime:

  1. CDSAPI_RC environment variable (probably the simplest): cdsapi supports the environment variable CDSAPI_RC for setting the path to your .cdsapirc file. This means, for example, that you can include your .cdsapirc file in your submission, and then set os.environ["CDSAPI_RC"] = str(src_dir / ".cdsapirc") before you instantiate cdsapi.Client.
  2. CDSAPI_URL and CDSAPI_KEY environment variables: cdsapi will read these environment variables if they are set. Note that this happens when the cdsapi module is first imported, so you must set them before import cdsapi is ever run.
  3. Explicitly pass when instantiating Client: you can instantiate a cdsapi.Client with keyword arguments url and key, i.e., client = Client(url=..., key=...).
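
As a rough illustration of option 1, here is a minimal sketch. The helper names point_cdsapi_at_rc and make_client are hypothetical, and it assumes your .cdsapirc sits directly inside the directory you pass as src_dir. The cdsapi import is deferred into make_client so that the environment variable is already set by the time the module is loaded, which also sidesteps the import-order gotcha mentioned in option 2.

```python
import os
from pathlib import Path


def point_cdsapi_at_rc(src_dir: Path) -> Path:
    """Set CDSAPI_RC so cdsapi looks for credentials in src_dir/.cdsapirc.

    Hypothetical helper: the file location is an assumption about how
    you lay out your own submission.
    """
    rc_path = src_dir / ".cdsapirc"
    if not rc_path.is_file():
        # Fail early with a clear message, e.g. if the dotfile was
        # left out of the archive.
        raise FileNotFoundError(f"Expected credentials file at {rc_path}")
    os.environ["CDSAPI_RC"] = str(rc_path)
    return rc_path


def make_client():
    """Create a cdsapi client after the environment has been configured."""
    # Deferred import: cdsapi is only loaded once CDSAPI_RC is in place.
    import cdsapi

    return cdsapi.Client()
```

In your own entry point you might call point_cdsapi_at_rc(Path(__file__).parent) once at startup and then make_client() wherever a client is needed.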

One more gotcha: shell globs (* in shell commands) normally do not match files whose names begin with a dot. That means a command like zip src/* will not include a file like .cdsapirc. The make pack-submission command in the runtime repository uses a glob, so it will not include .cdsapirc in the archive it creates.
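
One way to avoid the dotfile problem is to build the archive from Python rather than from a shell glob. The sketch below is only an illustration: pack_submission is a hypothetical helper, and it assumes the files should sit at the root of the archive and that archive_path lies outside src_dir (otherwise the archive would try to include itself). It uses os.walk, which returns hidden files along with everything else.

```python
import os
import zipfile
from pathlib import Path


def pack_submission(src_dir: Path, archive_path: Path) -> list[str]:
    """Zip every file under src_dir, dotfiles included.

    Returns the names stored in the archive so they can be inspected.
    Assumes archive_path is located outside src_dir.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for filename in sorted(files):
                file_path = Path(root) / filename
                # Store paths relative to src_dir so files land at the
                # archive root (an assumption about the expected layout).
                arcname = file_path.relative_to(src_dir).as_posix()
                zf.write(file_path, arcname)
                names.append(arcname)
    return names
```

Checking that ".cdsapirc" appears in the returned list (or in the output of unzip -l) before uploading is a cheap sanity check.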

Regarding approval for the monthly aggregations for ERA5-Land, I will confirm with challenge organizers and follow up.

Regarding storage capacity: the hardware specifications for the runtime nodes are available on the code submission format page. These nodes have 180 GiB of disk in total. Not all of that will be available in practice given the space needed by the operating system, but you should be able to use most of it.
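
If you want to confirm how much space is actually free before starting a large download, something like the following sketch may help. The helper names and the 5 GiB safety margin are illustrative assumptions, not values from the competition rules; shutil.disk_usage is part of the Python standard library.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path: str = ".") -> float:
    """Return the free disk space, in GiB, on the filesystem holding path."""
    return shutil.disk_usage(path).free / GIB


def has_room_for(required_gib: float, path: str = ".",
                 safety_margin_gib: float = 5.0) -> bool:
    """Check whether a download of required_gib fits with some headroom.

    The default margin is an arbitrary illustrative value.
    """
    return free_gib(path) >= required_gib + safety_margin_gib
```

Calling has_room_for with your estimated download size before each request lets the code fail early with a clear message instead of partway through a transfer.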

@jitters The monthly-averaged version of ERA5-Land has been approved. Please see the latest announcement.