Forecast Data Question

Hi,
You state that the feature data during the Forecast stage will be downloaded the same way as in the Hindcast stage (locally available). If some data is not available at the time you run the download, what should we expect to see?

[Edit 2] - Also, do you plan on providing an updated supplementary_nrcs_train_monthly_naturized_flow.csv file that includes the test year data? It’s not currently in the forecast download.

Hi @jimking100,

For each issue date, we will run the data download bulk command to download whatever data is available from each data source. Depending on how the particular data source is written out, if some data for some date isn’t available, there will either be missing rows in a CSV file or missing files in a directory.

Re: updated supplementary_nrcs_train_monthly_naturized_flow.csv—we will work on making this available the week after New Year’s.
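Since unavailable data can show up either as a missing file or as missing rows, a loader may want to cover both cases. The following is only a minimal sketch using the standard library; the helper names, the key column, and the expected keys are hypothetical and not part of the competition code.

```python
import csv
from pathlib import Path


def read_rows_if_present(path):
    """Return the CSV rows as dicts, or an empty list if the file is absent."""
    path = Path(path)
    if not path.exists():
        # Missing file: the data source wrote nothing for this issue date.
        return []
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def missing_keys(rows, key_column, expected_keys):
    """Return the expected keys that have no row in the file."""
    present = {row[key_column] for row in rows}
    return [key for key in expected_keys if key not in present]
```

A caller can then decide how to fall back (for example, using older data) when `read_rows_if_present` returns an empty list or `missing_keys` reports gaps.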

When running predictions for 2024, I got an error “ValueError: No objects to concatenate”.

The error comes from this line

Could you kindly check whether the CPC data is there?

Hi,
I received two errors when the automatic 1/1/24 forecast was run, and I am hoping you can provide some more insight on them. One was for the monthly naturalized flow and the other was for the SOI data. In both cases, I assumed the files would exist and at least contain the past data, since these files and the past data are already in your possession, but it seems this is not the case? Can you shed some light on what actually exists on 1/1/24 in the data directory? Do you expect us to load the past data?

On a broader note, it would be very helpful if you could provide us with access to the 1/1/24 - 1/11/24 data directory during this test period so we can better understand what is or is not being provided. I suppose we could make repeated submissions to print this data in our logs, but submissions are limited and this does not seem like a very efficient method.

Hi @jimking100,

I just thought I’d chime in and mention that the SOI file (/code_execution/data/teleconnections/soi.txt) appears to have been loaded and parsed without issue by my submission. That piece of my code is unchanged from the hindcast stage. Hope that helps you narrow down where to look, if nothing else.

Hi @motoki,

Thanks for flagging this. It looks like there is actually no 2024 data available yet. The download pipeline saved a file that did not contain real data, and wsfr_read.climate.cpc_outlooks did not handle this case (since it never came up in Hindcast). The data download drive and runtime image have been updated with a fix.

You should now expect to get these errors if you try to load the files directly:

FileNotFoundError: [Errno 2] No such file or directory: '/code_execution/data/cpc_outlooks/cpcllftd.2024.dat'

FileNotFoundError: [Errno 2] No such file or directory: '/code_execution/data/cpc_outlooks/cpcllfpd.2024.dat'

The wsfr-read package functions have been updated to skip those files and log an expected warning that looks something like this:

2024-01-02 11:28:40.527 | WARNING | wsfr_read.climate.cpc_outlooks:read_cpc_outlooks_precip:292 - No CPC outlooks available for calender year 2024. Only data from calender year 2023 loaded.

I just reran your submission after these fixes were implemented, and it still failed (you should have received an email). It looks like you have your own copies of the data reading functions, rather than using the installed wsfr_read that is included in the runtime environment, so the errors that you got are expected.
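For anyone maintaining their own copy of the reading functions, one way to avoid both the FileNotFoundError and an empty-file failure is to check each yearly file before parsing it. This is a hedged sketch, not how wsfr_read does it: the function name is hypothetical, the file naming follows the paths in the errors above, and treating a zero-byte file as unavailable is an assumption.

```python
from pathlib import Path


def split_available_years(cpc_dir, prefix, years):
    """Split years into those with a usable yearly file and those without.

    A file counts as unusable if it is absent or has zero bytes (assumption).
    """
    available, skipped = [], []
    for year in years:
        # Matches names such as cpcllftd.2024.dat from the error messages.
        path = Path(cpc_dir) / f"{prefix}.{year}.dat"
        if path.is_file() and path.stat().st_size > 0:
            available.append(year)
        else:
            skipped.append(year)
    return available, skipped
```

If `available` comes back empty, returning early is safer than passing an empty list to pandas.concat, which raises "ValueError: No objects to concatenate".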

Hi,
So I’m using the logs to try to answer my previous questions, and I’ve resolved the soi.txt issue (an issue with my code). I also see the issues in the monthly naturalized flow data, but have questions:

  1. It appears the sweetwater data is missing entirely from the monthly naturalized flow. I would expect it to at least have some Oct or Nov data for 2023, or NaNs or zeros. Can you explain?
  2. There are zeros in many of the Dec entries. Is zero an actual value, or does it mean there is no data for that month? Are NaNs ever used to show no data?

Hi @jimking100,

sweetwater_r_nr_alcova indeed has no data available. Due to the way the data processing is set up, these rows are absent from the file rather than present with NA values. They are also missing from the raw CSV from NRCS.

I don’t see any zeros in the data, but there are a lot of missing values. Are you sure you’re not turning missing values into zero on your end?

Regarding your request for access to the mounted data, we will look into making this available.
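On the question of zeros versus missing values, the distinction is easy to lose during parsing. The sketch below uses made-up rows and assumed column names (site_id, year, month, volume) to show how an empty cell can be kept as None, and how a common shortcut silently turns it into 0.0.

```python
import csv
import io


def parse_volume(cell):
    """Parse a volume cell, keeping an empty cell as None instead of 0."""
    cell = cell.strip()
    return float(cell) if cell else None


# Made-up sample: the second row has no value for the month.
sample = "site_id,year,month,volume\nsite_a,2023,10,12.5\nsite_a,2023,11,\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# Missing stays missing.
volumes = [parse_volume(row["volume"]) for row in rows]

# Shortcut that hides the gap: the empty string falls back to 0.
zero_filled = [float(row["volume"] or 0) for row in rows]
```

Here `volumes` keeps the gap visible while `zero_filled` does not, which is one way zeros can appear in logs even though the file contains none.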

Thanks @jayqi. I managed to adapt my code and it works fine now.

Hi @jimking100,

Please see the latest announcement about access to mounted data and about the supplementary training data.

Hi everyone,

We’ve made an update to how test_monthly_naturalized_flow.csv is produced so that all 23 sites should show up with rows for every month since October 2023, even if there is no data. This should be reflected as of the 2024-01-08 issue date.

To confirm, for 2024-01-08, we still have no data available for three sites: pueblo_reservoir_inflow, sweetwater_r_nr_alcova, and ruedi_reservoir_inflow. You will see empty values for them for all three of the 2023-10, 2023-11, and 2023-12 rows.
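With rows now present for every site and month, a quick check for sites whose values are all empty might look like the sketch below. The column names and the sample values are assumptions for illustration, and the helper is hypothetical.

```python
import csv
import io
from collections import defaultdict


def sites_without_data(rows, site_column="site_id", value_column="volume"):
    """Return the sorted site ids whose value cells are all empty."""
    has_value = defaultdict(bool)
    for row in rows:
        # A site counts as having data once any of its cells is non-empty.
        has_value[row[site_column]] |= bool(row[value_column].strip())
    return sorted(site for site, ok in has_value.items() if not ok)


# Made-up sample with one site that has only empty values.
sample = (
    "site_id,year,month,volume\n"
    "sweetwater_r_nr_alcova,2023,10,\n"
    "sweetwater_r_nr_alcova,2023,11,\n"
    "hungry_horse_reservoir_inflow,2023,10,101.0\n"
    "hungry_horse_reservoir_inflow,2023,11,\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
empty_sites = sites_without_data(rows)
```

A model can use such a list to switch to a fallback feature set for those sites instead of failing.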

@jayqi: I think my submission hit the same issue? Could you recheck whether the CPC data is empty? I handle the missing file but not the empty file.

Hi @motoki,

The data looks fine to me. I believe the issue is with your code.

Here’s an example submission that I ran that loaded the data successfully using wsfr_read.climate.cpc_outlooks:

from loguru import logger
from wsfr_read.climate import cpc_outlooks


def predict(
    site_id,
    issue_date,
    assets,
    src_dir,
    data_dir,
    preprocessed_dir,
) -> tuple[float, float, float]:
    logger.info("site_id is {}", site_id)
    logger.info("issue_date is: {}", issue_date)

    df_precip = cpc_outlooks.read_cpc_outlooks_precip(issue_date, site_id)
    logger.info("df_precip.head():\n{}", df_precip.head())
    logger.info("df_precip.tail():\n{}", df_precip.tail())

    df_temp = cpc_outlooks.read_cpc_outlooks_temp(issue_date, site_id)
    logger.info("df_temp.head():\n{}", df_temp.head())
    logger.info("df_temp.tail():\n{}", df_temp.tail())
    raise Exception("Stop.")

Logs:

2024-01-23 15:16:28.315 | INFO     | src.solution:predict:13 - site_id is hungry_horse_reservoir_inflow
2024-01-23 15:16:28.315 | INFO     | src.solution:predict:14 - issue_date is: 2024-01-22
2024-01-23 15:16:29.584 | INFO     | src.solution:predict:17 - df_precip.head():
                               R   98.   95.   90.  ...  C MEAN    F SD  C SD  POWER
issue_date YEAR MN LEAD CD                          ...                             
2023-10-18 2023 10 1    20  0.34  0.85  0.98  1.12  ...    1.88  0.1036  0.11   0.29
                        21  0.37  1.86  2.15  2.43  ...    4.04  0.1303  0.14   0.30
                   2    20  0.22  0.73  0.87  1.01  ...    1.82  0.1169  0.12   0.29
                        21  0.22  1.72  2.01  2.29  ...    3.72  0.2733  0.28   0.52
                   3    20  0.07  0.99  1.14  1.29  ...    2.06  0.1197  0.12   0.33

[5 rows x 19 columns]
2024-01-23 15:16:29.599 | INFO     | src.solution:predict:18 - df_precip.tail():
                              R   98.   95.   90.  ...  C MEAN  F SD  C SD  POWER
issue_date YEAR MN LEAD CD                         ...                           
2024-01-17 2024 1  11   21  0.0  1.93  2.24  2.53  ...    3.72  0.28  0.28   0.52
                   12   20  0.0  1.07  1.24  1.40  ...    2.06  0.12  0.12   0.33
                        21  0.0  1.99  2.25  2.49  ...    3.49  0.13  0.13   0.34
                   13   20  0.0  1.48  1.82  2.12  ...    3.18  0.86  0.86   1.02
                        21  0.0  2.20  2.47  2.71  ...    3.70  0.16  0.16   0.41

[5 rows x 19 columns]
2024-01-23 15:16:30.428 | INFO     | src.solution:predict:21 - df_temp.head():
                               R    98.    95.  ...  C MEAN    F SD  C SD
issue_date YEAR MN LEAD CD                      ...                      
2023-10-18 2023 10 1    20  0.22  22.72  23.89  ...   27.40  2.8515  2.92
                        21  0.19  22.79  23.63  ...   25.93  2.0515  2.09
                   2    20  0.13  19.10  20.51  ...   24.95  3.4292  3.46
                        21  0.22  20.07  21.13  ...   24.11  2.5781  2.64
                   3    20  0.27  22.67  24.00  ...   28.45  3.2490  3.37

[5 rows x 18 columns]
2024-01-23 15:16:30.442 | INFO     | src.solution:predict:22 - df_temp.tail():
                              R    98.    95.  ...  C MEAN  F SD  C SD
issue_date YEAR MN LEAD CD                     ...                    
2024-01-17 2024 1  11   21  0.0  18.70  19.78  ...   24.11  2.64  2.64
                   12   20  0.0  21.54  22.92  ...   28.45  3.37  3.37
                        21  0.0  22.92  23.96  ...   28.11  2.53  2.53
                   13   20  0.0  27.52  28.91  ...   34.49  3.40  3.40
                        21  0.0  28.86  29.92  ...   34.13  2.57  2.57

[5 rows x 18 columns]

Many thanks for providing this example. I now just use your functions and everything seems fine.