Forecast stage questions

Hi, I have a few questions related to the Forecast stage:

  1. Can we update or retrain the model after the testing phase of the Forecast stage?
  2. During the evaluation phase of the Forecast stage, what if the code fails? e.g. an API error, connection issue, change of data format, or other unexpected things that did not happen during the testing phase

Thanks


Hi @rasyidstat,

You will only be able to update and retrain the model during the testing phase. Once the testing phase ends and the evaluation phase begins, models should not be changed.

There will be ways to address unexpected issues, such as rerunning failed jobs or allowing limited fixes that do not substantively change the model. More details about this will be provided later.


Hi @jayqi, I have additional questions

  1. We are only allowed to fix one issue date. What if the next issue date also has errors, will we get disqualified? Or will the previous issue date’s forecast always be used?
  2. Data is already available in the mounted volume. When is the data updated or downloaded? Is it incremental or replaced? Do you always use skip_existing=True?
  3. Since two issue dates are used for testing, does that mean the Forecast stage result will be evaluated on only 26 issue dates, with a total of 26 x 26 sites = 676 records?
  4. For the bonus prizes (long lead time and regional), are they evaluated based on the Forecast stage result or the LOOCV Overall stage result?

Hi @rasyidstat,

  1. We are only allowed to fix one issue date. What if the next issue date also has errors, will we get disqualified? Or will the previous issue date’s forecast always be used?

This is not correct. You will be allowed to submit a fix for any and all issue dates that have an error.

The “one allowance” refers to there being a time limit to fix a given failed job—specifically, you will have until the next issue date. If you don’t make a fix within the time limit, you are given one allowance to make a late fix. If you have failed jobs that never get addressed, they will have previous predictions filled forward, so you will generally not get disqualified for failures.
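To illustrate what filling forward would mean (just a sketch; the actual mechanism is handled on the organizers’ side, and the column names here are hypothetical):

```python
import pandas as pd

# Hypothetical predictions table with one row per (site_id, issue_date);
# the 2024-01-22 job failed, so its prediction is missing.
predictions = pd.DataFrame({
    "site_id": ["site_a", "site_a", "site_a"],
    "issue_date": pd.to_datetime(["2024-01-15", "2024-01-22", "2024-02-01"]),
    "volume_50": [120.0, None, 130.0],
})

# Fill each site's missing predictions with the most recent prior prediction.
predictions = predictions.sort_values(["site_id", "issue_date"])
predictions["volume_50"] = predictions.groupby("site_id")["volume_50"].ffill()
```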

  2. Data is already available in the mounted volume. When is the data updated or downloaded? Is it incremental or replaced? Do you always use skip_existing=True?

The data will be downloaded on the issue date. It will replace existing data, meaning that if some data source updates previous values, then we will download the updated values.

  3. Since two issue dates are used for testing, does that mean the Forecast stage result will be evaluated on only 26 issue dates, with a total of 26 x 26 sites = 676 records?

That is close to correct. It’s actually 4 fewer (672 records) because detroit_lake_inflow's season ends in June, so it does not have any issue dates in July.

  4. For the bonus prizes (long lead time and regional), are they evaluated based on the Forecast stage result or the LOOCV Overall stage result?

The details for this aren’t fully finalized yet, but they are part of the Overall evaluation that includes the LOOCV.

Hi @jayqi, thanks for the answers.

The “one allowance” refers to there being a time limit to fix a given failed job—specifically, you will have until the next issue date. If you don’t make a fix within the time limit, you are given one allowance to make a late fix. If you have failed jobs that never get addressed, they will have previous predictions filled forward, so you will generally not get disqualified for failures.

I am afraid that a late fix may give a competitive advantage, since it would use newly downloaded data.

Imagine this scenario:

  • Issue date = 2023-01-08. Data is downloaded. Somehow, the latest record for USGS site A is 2023-01-05 (a 2-day gap)
  • Code error for issue date = 2023-01-08
  • Issue date = 2023-01-15. Data is downloaded and replaced. The latest record for USGS site A is now 2023-01-12 (a 2-day gap)
  • Code error again for issue date = 2023-01-15
  • Fix the errors for issue dates 2023-01-08 and 2023-01-15. The forecast for issue date 2023-01-08 now uses newly downloaded USGS site A data through 2023-01-07, while in reality other teams only had data up until 2023-01-05

Hi @jayqi, I have additional questions

  1. For what severity of issue should we return an error? A download issue, format issue, missing data issue?
  2. If there’s no error, can we update our submission code to change the preprocessing logic and handle missing data or other data issues? And can we also request a rerun of an issue date with a known data issue? Imagine a scenario where a weather site’s sensor has an error and the value is very abnormal, say it becomes 9999. Our code runs fine, but we want to fix it because the input data does not make sense, and rerun that issue date. Can we do that?
  3. At exactly what time do the mounted volume download code and the inference code run?
  4. It’s stated that the inference code can run later than the issue date. Is there a maximum time limit, e.g. no later than 4 days after the issue date? And does that mean the download code will also run at a later date?
  5. Is it permitted to download the data ourselves if the data is already available in the mounted volume? Should we always use the mounted volume data whenever possible?
  6. Is there a chance that a specific issue date will not be evaluated for scoring? Why could this happen?

Hi @rasyidstat,

Thanks for your patience while the challenge organizers have been reviewing your questions.

I am afraid that a late fix may give a competitive advantage, since it would use newly downloaded data.

Challenge organizers anticipate that the advantage gained from occasional fixes possibly having access to more recent data will be relatively small and limited. We will be monitoring job failures and fixes, and teams that are deliberately failing jobs in order to run submissions later to access more up-to-date data may be subject to disqualification.
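One way a team can avoid even accidentally benefiting from fresher data in a rerun is to cut the input data off at the issue date in their own code. A minimal sketch, assuming a pandas DataFrame with a hypothetical datetime column:

```python
import pandas as pd


def truncate_to_issue_date(df: pd.DataFrame, issue_date: str,
                           date_col: str = "datetime") -> pd.DataFrame:
    """Keep only records from before the issue date, so a late rerun sees the
    same effective history even if the mounted data has since been refreshed.
    The column name and cutoff convention are assumptions for illustration."""
    return df[df[date_col] < pd.Timestamp(issue_date)]
```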

  1. For what severity of issue should we return an error? A download issue, format issue, missing data issue?

In general, we are expecting that submissions should not deliberately fail for any reason. We expect that by default, your submission should run automatically without any manual intervention. Fixes are being allowed to address unforeseen runtime errors.
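As a sketch of what running without manual intervention might look like (the function names and fallback strategy here are hypothetical, not a requirement), you could wrap your pipeline so an unexpected error degrades to a simple fallback instead of failing the job:

```python
import logging

logger = logging.getLogger(__name__)


def predict_with_model(site_id: str, issue_date: str, data_dir: str) -> dict:
    """Placeholder for your real pipeline: load mounted data, run the model."""
    raise NotImplementedError


def climatology_fallback(site_id: str, issue_date: str) -> dict:
    """Hypothetical fallback, e.g. historical-average quantiles for the site."""
    return {"volume_10": 80.0, "volume_50": 120.0, "volume_90": 170.0}


def safe_predict(site_id: str, issue_date: str, data_dir: str) -> dict:
    # Try the full pipeline first; if something unexpected breaks, log it and
    # return a fallback prediction so the scheduled job still completes.
    try:
        return predict_with_model(site_id, issue_date, data_dir)
    except Exception:
        logger.exception("Fell back for %s on %s", site_id, issue_date)
        return climatology_fallback(site_id, issue_date)
```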

  2. If there’s no error, can we update our submission code to change the preprocessing logic and handle missing data or other data issues? And can we also request a rerun of an issue date with a known data issue? Imagine a scenario where a weather site’s sensor has an error and the value is very abnormal, say it becomes 9999. Our code runs fine, but we want to fix it because the input data does not make sense, and rerun that issue date. Can we do that?

For data issues that do not result in any runtime error, we are generally not permitting you to update your code. You may want to consider designing your solution defensively to handle such cases.
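For example, a defensive preprocessing step might look something like this (a sketch only; the sentinel value, threshold, and column name are assumptions, not documented behavior of any particular data source):

```python
import numpy as np
import pandas as pd


def sanitize_temperature(df: pd.DataFrame, col: str = "tmax_deg_c") -> pd.DataFrame:
    """Replace sentinel or physically implausible readings with NaN, then
    interpolate short gaps so one bad sensor value cannot skew the forecast."""
    df = df.copy()
    df[col] = df[col].replace([9999, -9999], np.nan)  # common sentinel codes
    df.loc[df[col].abs() > 60, col] = np.nan          # implausible air temperatures
    df[col] = df[col].interpolate(limit=3)            # fill short gaps only
    return df
```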

In exceptional cases, challenge organizers may consider interventions, such as rerunning submissions after a data source fixes a data issue or excluding an issue date. These situations will be evaluated on a case-by-case basis while considering the impact on the fairness and quality of the competition. In general, active interventions will only be made for exceptional and extreme circumstances, and you should generally not expect that they will happen.

  3. At exactly what time do the mounted volume download code and the inference code run?

  4. It’s stated that the inference code can run later than the issue date. Is there a maximum time limit, e.g. no later than 4 days after the issue date? And does that mean the download code will also run at a later date?

We are not guaranteeing any exact time for when the mounted volume data is downloaded or when admin-scheduled jobs occur. In general, we expect they will happen sometime during the day of the issue date, U.S. time.

  5. Is it permitted to download the data ourselves if the data is already available in the mounted volume? Should we always use the mounted volume data whenever possible?

We discourage you from redundantly downloading data that is already available in the mounted volume, and we encourage you to use the mounted volume data whenever possible. This will improve consistency between runs, no matter when they happen. It will also reduce the likelihood of a failure from a data issue, as well as the likelihood that you unintentionally do anything that is not permitted.
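A minimal sketch of that pattern (the directory layout and file names below are assumptions for illustration; refer to the runtime repository for the actual structure of the mounted volume):

```python
from pathlib import Path

import pandas as pd

DATA_DIR = Path("/data")  # hypothetical mount point for this example


def load_streamflow(site_id: str) -> pd.DataFrame:
    """Prefer the copy in the mounted volume; only fetch it yourself if it is
    genuinely unavailable there."""
    local_file = DATA_DIR / "streamflow" / f"{site_id}.csv"
    if local_file.exists():
        return pd.read_csv(local_file, parse_dates=["datetime"])
    # Falling back to your own download should be the exception, not the rule.
    raise FileNotFoundError(f"{local_file} is not in the mounted volume")
```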

  6. Is there a chance that a specific issue date will not be evaluated for scoring? Why could this happen?

Currently, only 2024-01-01 and 2024-01-08, which are trial issue dates during the open submission period, will be excluded from your score. However, challenge organizers reserve the possibility that other issue dates may be excluded as a result of unforeseen circumstances. This will be evaluated on a case-by-case basis and will consider the impact on the fairness and quality of the competition. This would be an exceptional situation, and you should assume by default that it won’t happen.
