Is the rule about only using feature data from before the issue date still valid?

That rule makes sense in the hindcast context, but does it also apply to the real-time forecasts? Let's assume I'm using predictor data X, which has monthly time resolution and is published on the 1st day of each month. Even though it would be available on an issue date that falls on that same day, I cannot use it?

Hi @kamarain,

The Forecast Stage requirements regarding time and data use are documented here.

We still require that you use only predictor data observed through the day before the issue date. This is necessary to maximize consistency in the competition, as we have no way to guarantee that all participants' jobs run at exactly the same time. Furthermore, because participants are allowed to fix failed submissions, some jobs may run several days after the designated issue date.

Hi @jayqi,

I just realized that it’s conceivable my last submission could be interpreted as violating that rule, even though it doesn’t. Let me explain: the portion that could be a violation is only using what’s available in the mounted data drive (that part isn’t pulling in any external data). So even though my code could be a violation if it found data from a future date, that can’t ever happen in the Forecast period because it’s only using what’s available in the runtime environment; am I correct in that appraisal?

I guess that’s what I get for trying to sidestep issues about what time zones my code gets run in. :expressionless:

Hi @mmiron,

While the data in the mounted data drive is controlled by DrivenData and should not have any disallowed “future” data, we still ask that you design your code so that you subset to the issue date appropriately. This makes the code requirements consistent across the Hindcast Stage, Forecast Stage, and final cross-validation. This will reduce the risk of any possible problems and help us audit your code for compliance.

Regarding timezones, most data sources used in the competition do not have meaningful sub-daily time resolution, and you should just subset based on the calendar date used in the raw data. If you have any questions about a specific data source, please ask.
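For illustration only, here is a minimal sketch of that kind of subsetting using just the Python standard library. The function names, the `date` key, and the sample records are hypothetical and not part of the competition runtime; the sketch also assumes each record's date is the observation date as it appears in the raw data.

```python
from datetime import date, datetime


def to_calendar_date(value) -> date:
    """Reduce a raw date value to its calendar date, with no timezone conversion."""
    if isinstance(value, datetime):
        return value.date()  # drop the time of day, keep the date as recorded
    if isinstance(value, date):
        return value
    return datetime.fromisoformat(str(value)).date()


def subset_to_issue_date(records, issue_date, date_key="date"):
    """Keep only records observed through the day before issue_date."""
    cutoff = to_calendar_date(issue_date)
    # Strictly earlier than the issue date: the issue date itself is excluded.
    return [r for r in records if to_calendar_date(r[date_key]) < cutoff]


# Hypothetical monthly predictor with one record dated the 1st of each month.
predictor = [
    {"date": "2024-01-01", "value": 1.2},
    {"date": "2024-02-01", "value": 0.8},
    {"date": "2024-03-01", "value": 1.5},
]

usable = subset_to_issue_date(predictor, issue_date="2024-03-01")
```

With an issue date of 2024-03-01, `usable` keeps the January and February records and drops the one dated on the issue date itself. Applying the same filter to data read from the mounted drive keeps the logic identical across stages, even when no later-dated rows are present.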

Thanks for your prompt reply, @jayqi, especially given the looming deadline. The issue I wanted to sidestep was not knowing what time of day the predictions will be made, combined with uncertainty about the time zones of the data sources (which you've just cleared up for me). Now that you mention it, though, the issue_date doesn't include a time of day to begin with, so it seems my concern couldn't have arisen in the first place. (I'm not perfect, what can I say.)