Discrepancy between the training data and the submission format

We are given training data for annual streamflow values, but are being asked to provide forecasts for weekly streamflow values. I’m not sure how this is possible. Is there streamflow data that we are missing? Clarification would be appreciated.

I have the same question. Is it common in time series problems to train on yearly data and predict for a different time period? It seems odd.

Hi @yanteixeira, @jitters: please carefully review the “Forecasting task” section on the problem description page.

The forecast target variable is indeed an annual cumulative streamflow volume. The dates in the submission format correspond to the issue date of the forecast, i.e., the date that the forecast is being made. You are being asked to produce multiple forecasts per year. A forecast issued on January 1 for the April–July cumulative streamflow has very different predictor information available vs. a forecast issued on March 1 for the April–July cumulative streamflow.

In the context of this problem, the annual physical conditions (weather, snow, etc.) are different enough year-to-year that we treat each year as an independent set of observations. The task is not to forecast a contiguous time series across years, but to forecast within years using data from each year.
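As a rough illustration of the point above (one seasonal target per year, forecast from several issue dates), here is a minimal Python sketch. The helper name `issue_dates` and the choice of months in the example are assumptions made for illustration; the actual issue dates are the ones listed in the submission format.

```python
from datetime import date

def issue_dates(year, months, days=(1, 8, 15, 22)):
    """Return hypothetical forecast issue dates for one forecast year.

    `months` is supplied by the caller; which months actually have
    issue dates is defined by the submission format, not by this sketch.
    """
    return [date(year, m, d) for m in months for d in days]

# Illustrative only: issue dates in January through March of 2017.
dates_2017 = issue_dates(2017, months=(1, 2, 3))

# Every issue date in a year points at the same seasonal target for that year,
# so a single year yields many (issue_date, forecast_year) rows.
rows = [(d, d.year) for d in dates_2017]
```

Each row shares the same target value for its year; what changes from row to row is how much predictor data is available by the issue date.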

There is antecedent (past) streamflow data that is relevant but has not been provided yet. We will have more detail on this soon. In addition, successful solutions will be expected to incorporate a wide variety of other data. Keep an eye out on the “Approved data sources” page for updates.

@jayqi Thank you for clarifying!

For a forecast to be valid, should each prediction use only data that was measured before the target forecast date? For example, should the forecast for March 8th, 2017 exclude any data measured on or after March 8th, 2017?

Thank you for your response; it made things a bit clearer.
The problem lies in the training data, as we are supposed to forecast on the 1st, 8th, 15th, and 22nd of each month. We should have daily or at least weekly training data.

For a forecast to be valid, should each prediction use only data that was measured before the target forecast date?

@jitters Each prediction should only use data measured before the issue date. (Not inclusive of the issue date itself.) We’ve updated the problem description with a new section that discusses this explicitly in more detail.
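A minimal sketch of that cutoff rule in Python, assuming observations are stored as `(measurement_date, value)` pairs. The function name `usable_observations` and the sample values are hypothetical and only meant to show the strict "before, not including" comparison.

```python
from datetime import date

def usable_observations(observations, issue_date):
    """Keep only observations measured strictly before the issue date."""
    return [(d, v) for d, v in observations if d < issue_date]

# Made-up measurements around a March 8, 2017 issue date.
obs = [
    (date(2017, 3, 6), 1.2),
    (date(2017, 3, 7), 1.4),
    (date(2017, 3, 8), 1.9),  # measured on the issue date itself: excluded
    (date(2017, 3, 9), 2.1),  # measured after the issue date: excluded
]

kept = usable_observations(obs, date(2017, 3, 8))
```

Using `<` rather than `<=` is what makes the issue date itself off limits.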

The problem lies in the training data, as we are supposed to forecast on the 1st, 8th, 15th, and 22nd of each month. We should have daily or at least weekly training data.

@AnasBam Past monthly time series data for the target variable is now available on the data download page. Additionally, a new section discussing this data has been added to the problem description.

@jayqi Thanks for this clarification, it makes sense. Additionally, the problem statement mentions ‘Training on years in the test set is prohibited’, but what about training on the first months of the water year (October, November, December), which fall in the previous calendar year? For example, for an issue date of 01/01/2022, can we train on just the months of October, November, and December of the previous year (2021), even though 2021 is in the test set?
Judging from the new monthly naturalized flow data, I am guessing this is OK, but I want to make certain.

Additionally, the problem statement mentions ‘Training on years in the test set is prohibited’, but what about training on the first months of the water year (October, November, December), which fall in the previous calendar year? For example, for an issue date of 01/01/2022, can we train on just the months of October, November, and December of the previous year (2021), even though 2021 is in the test set?

Hi @saket. Yes, that is correct. The months Oct 2021, Nov 2021, and Dec 2021 are part of the 2022 water year, and accordingly associated with the 2022 seasonal water supply forecast. You may use feature data from those months when training on a 2022 seasonal water supply observation.

By similar logic, Oct 2022, Nov 2022, and Dec 2022 are associated with the 2023 seasonal water supply forecast, and 2023 is a test forecast year. You may not train on the 2023 seasonal water supply ground truth label for the Hindcast Stage.
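The water-year bookkeeping described above can be sketched in Python as follows. This assumes the usual convention that a water year runs from October 1 through September 30 and is named for the calendar year in which it ends. The names `water_year`, `may_train_on_label`, and `TEST_YEARS` are hypothetical, and `TEST_YEARS` contains only the two test years mentioned in this thread, not the full list.

```python
from datetime import date

def water_year(d):
    """Map a calendar date to its water year (Oct-Dec belong to the next year)."""
    return d.year + 1 if d.month >= 10 else d.year

# Illustrative subset only: the two test years mentioned in this thread.
TEST_YEARS = {2021, 2023}

def may_train_on_label(forecast_year, test_years=TEST_YEARS):
    """A seasonal ground-truth label may be used for training only if its
    forecast year is not a test year."""
    return forecast_year not in test_years

# Oct-Dec 2021 feature data belongs to the 2022 forecast year, whose label
# is available for training.
wy_a = water_year(date(2021, 11, 15))
ok_a = may_train_on_label(wy_a)

# Oct-Dec 2022 feature data belongs to the 2023 forecast year, a test year,
# so its label may not be used for training.
wy_b = water_year(date(2022, 11, 15))
ok_b = may_train_on_label(wy_b)
```

The key design choice is to key everything on the forecast (water) year rather than the calendar year of the feature data.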