(Short-term) Forecasting or (Concurrent) Estimation?

The problem description states:

“Remember that for each observation, you may only use input values that are available before this time.”

“Remember that you are not allowed to use future data, so the time_end of an granule must be before the datetime of a given observation.”

In other words: satellite_time_end < observation_datetime_start

Thus, the problem is a short-term forecasting problem.

However, the information provided by czsc in the discussion forum is different:

“Therefore, you can use satellite data with an endtime on or before 2019-02-01T07:59:00Z for a label with datetime 2019-01-31T08:00:00Z.”

“In short, you are allowed to use data up-through (i.e. including) the date of prediction. This means that yes, you can use satellite data from the same local date.”

This statement implies: satellite_time_end < observation_datetime_end.

Thus, according to czsc, the problem is a concurrent estimation problem, which contradicts the problem statement.

If satellite data from the same date as the observation are used, then we are no longer doing short-term (say day-ahead or hours-ahead) forecasting.

Please clarify.


Thanks for this question @sukantabasu. Sorry about the confusion. It is correct to think about this problem as a concurrent estimation or “nowcasting” problem, i.e. satellite_time_end < observation_datetime_end. We will clarify the language on the website.
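
For readers who want to apply this rule when filtering granules, here is a minimal sketch of the nowcasting condition next to the stricter forecasting condition from the original problem text. The function names are hypothetical, and it assumes that observation_datetime_end is the label timestamp plus 24 hours and that all timestamps use the YYYY-MM-DDTHH:mm:ssZ format.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:mm:ssZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def allowed_for_nowcasting(satellite_time_end: str, label_datetime: str) -> bool:
    """Confirmed rule: satellite_time_end < observation_datetime_end.

    Assumption: the observation window ends 24 hours after the label timestamp.
    """
    observation_end = parse_utc(label_datetime) + timedelta(hours=24)
    return parse_utc(satellite_time_end) < observation_end


def allowed_for_forecasting(satellite_time_end: str, label_datetime: str) -> bool:
    """Stricter reading of the original text: satellite_time_end < observation_datetime_start."""
    return parse_utc(satellite_time_end) < parse_utc(label_datetime)


label = "2019-01-31T08:00:00Z"
print(allowed_for_nowcasting("2019-02-01T07:59:00Z", label))   # True
print(allowed_for_nowcasting("2019-02-01T08:00:00Z", label))   # False
print(allowed_for_forecasting("2019-02-01T07:59:00Z", label))  # False
```

The first call reproduces the example quoted from czsc: a granule ending at 2019-02-01T07:59:00Z is usable for the label dated 2019-01-31T08:00:00Z under the nowcasting rule, but not under the forecasting rule.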



“We will clarify the language on the website.”
This has not been written yet, so please tell us clearly.

1. When should we forecast, and for how many days?

2. How many points should we forecast at each location, e.g. Los Angeles (South Coast Air Basin)?

Please clarify.
Thank you!

Hi @future-forecast. For details on inference, please refer to the submission format section of the problem description (pm2.5 and no2). Your submission should include a prediction for every row in submission_format.csv, where each row represents a 24-hour period for a given grid cell.

You should forecast the average concentration for the 24 hours following the timestamp. From the problem description:

“The UTC datetime of the measurement in the format YYYY-MM-DDTHH:mm:ssZ. A value represents the average between 12:00am to 11:59pm local time. The datetime provided represents the start of that 24 hour period in UTC time.”
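
As an illustration of how a label timestamp maps to a local day, the sketch below computes the 24-hour window and shows it in local time. It assumes a fixed UTC-8 offset (as in Los Angeles during winter) purely for the example; the constant and function names are hypothetical, and other locations or seasons would need their own time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: a fixed UTC-8 offset, no daylight saving.
ASSUMED_LOCAL_TZ = timezone(timedelta(hours=-8))


def averaging_window(label_utc: str):
    """Return (start, end) of the 24-hour period that a label timestamp opens."""
    start = datetime.strptime(label_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return start, start + timedelta(hours=24)


start, end = averaging_window("2019-01-31T08:00:00Z")
print(start.astimezone(ASSUMED_LOCAL_TZ).isoformat())  # 2019-01-31T00:00:00-08:00
print(end.astimezone(ASSUMED_LOCAL_TZ).isoformat())    # 2019-02-01T00:00:00-08:00
```

Under that assumption, the label 2019-01-31T08:00:00Z covers local midnight to local midnight on January 31.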

You should predict for every grid_id in submission_format.csv. You can find more information about the grid cells in the grid metadata file on the data download page.
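
One way to confirm that a submission covers every required row is sketched below. The column names (`datetime`, `grid_id`, `value`), the grid IDs, and the helper name are assumptions made for this toy example; check them against the actual submission_format.csv.

```python
import csv
import io


def missing_rows(format_csv: str, submission_csv: str, key_cols=("datetime", "grid_id")):
    """Return the (datetime, grid_id) keys present in the format file but absent from the submission."""
    def keys(text: str):
        return {tuple(row[c] for c in key_cols) for row in csv.DictReader(io.StringIO(text))}
    return keys(format_csv) - keys(submission_csv)


# Toy data with made-up grid IDs; real files would be read from disk instead.
toy_format = (
    "datetime,grid_id,value\n"
    "2019-01-31T08:00:00Z,A1,0.0\n"
    "2019-01-31T08:00:00Z,B2,0.0\n"
)
toy_submission = (
    "datetime,grid_id,value\n"
    "2019-01-31T08:00:00Z,A1,12.5\n"
)

print(missing_rows(toy_format, toy_submission))  # {('2019-01-31T08:00:00Z', 'B2')}
```

An empty set means every row in the format file has a matching prediction.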

Hope that clarifies some things!