Feature data from before the issue date rule still valid?

kamarain · January 10, 2024, 8:10pm

That rule makes sense in the hindcast context, but does it also apply to the real-time forecasts? Suppose I’m using predictor data X that has monthly time resolution and is published on the 1st of each month. Even if it is available on an issue date that falls on that same day, I cannot use it?

jayqi · January 10, 2024, 8:27pm

Hi @kamarain,

The Forecast Stage requirements regarding time and data use are documented here.

We still require that you use only predictor data observed through the day before the issue date. This is necessary to maximize consistency in the competition, as we have no way to guarantee that all participants’ jobs run at exactly the same time. Furthermore, because participants are allowed to fix failed submissions, some jobs may run several days after the designated issue date.

mmiron · January 10, 2024, 9:58pm

Hi @jayqi,

I just realized that it’s conceivable my last submission could be interpreted as violating that rule, even though it doesn’t. Let me explain: the portion that could be a violation is only using what’s available in the mounted data drive (that part isn’t pulling in any external data). So even though my code could be a violation if it found data from a future date, that can’t ever happen in the Forecast period because it’s only using what’s available in the runtime environment; am I correct in that appraisal?

I guess that’s what I get for trying to sidestep issues about what time zones my code gets run in.

jayqi · January 10, 2024, 10:25pm

Hi @mmiron,

While the data in the mounted data drive is controlled by DrivenData and should not have any disallowed “future” data, we still ask that you design your code so that you subset to the issue date appropriately. This makes the code requirements consistent across the Hindcast Stage, Forecast Stage, and final cross-validation. This will reduce the risk of any possible problems and help us audit your code for compliance.

Regarding timezones, most data sources used in the competition do not have meaningful sub-daily time resolution, and you should just subset based on the calendar date used in the raw data. If you have any questions about a specific data source, please ask.
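
For readers who want something concrete, here is a minimal sketch of the kind of subsetting described above. It is not official competition code: the function name `subset_to_issue_date`, the record layout, and the sample values are all made up for illustration. It compares plain calendar dates as they appear in the raw data, with no time zone conversion, and keeps only rows dated through the day before the issue date.

```python
from datetime import date, timedelta


def subset_to_issue_date(records, issue_date):
    """Keep records observed through the day before the issue date.

    `records` is a list of dicts with an ISO-format "date" string
    (hypothetical layout, chosen only for this example).
    """
    cutoff = issue_date - timedelta(days=1)
    # Compare calendar dates only; no time-of-day or time zone handling.
    return [r for r in records if date.fromisoformat(r["date"]) <= cutoff]


# Hypothetical sample data, not from any real competition source.
records = [
    {"date": "2024-01-08", "value": 1.2},
    {"date": "2024-01-14", "value": 1.5},
    {"date": "2024-01-15", "value": 1.7},  # same day as the issue date: excluded
    {"date": "2024-01-16", "value": 1.9},  # after the issue date: excluded
]

usable = subset_to_issue_date(records, date(2024, 1, 15))
print([r["date"] for r in usable])
```

Applying the same filter to every data source, including files on the mounted data drive, means that data stamped with the issue date itself (such as a monthly product published on the 1st when the issue date is also the 1st) is dropped no matter when the job actually runs.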

mmiron · January 10, 2024, 11:45pm

Given the looming deadline, thanks for your prompt reply, @jayqi. The issue I wanted to sidestep is not knowing what time of day the predictions will be made at, combined with uncertainty about the time zones of the data sources (which you’ve just cleared up for me); though now that you mention it, the issue_date doesn’t include a time of day to begin with. It seems my concern isn’t possible in the first place (I’m not perfect, what can I say.)
