The problem description states:
“Remember that for each observation, you may only use input values that are available before this time.”
“Remember that you are not allowed to use future data, so the time_end of an granule must be before the datetime of a given observation.”
In other words: satellite_time_end < observation_datetime_start
Thus, the problem is a short-term forecasting problem.
However, the information provided by czsc in the discussion forum is different:
“Therefore, you can use satellite data with an endtime on or before 2019-02-01T07:59:00Z for a label with datetime 2019-01-31T08:00:00Z.”
“In short, you are allowed to use data up-through (i.e. including) the date of prediction. This means that yes, you can use satellite data from the same local date.”
This statement implies: satellite_time_end < observation_datetime_end.
Thus, according to czsc, the problem is a concurrent estimation problem, which contradicts the problem statement.
If satellite data from the same date as the observation are used, then this is not short-term (say, day-ahead or hours-ahead) forecasting.
Thanks for this question, @sukantabasu. Sorry about the confusion. It is correct to think about this problem as a concurrent estimation or “nowcasting” problem, i.e. satellite_time_end < observation_datetime_end. We will clarify the language on the website.
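For anyone who wants to check the two readings side by side, here is a minimal sketch using the example timestamps quoted above. The function names are hypothetical and not part of any competition code, and it assumes the observation period ends exactly 24 hours after the label datetime.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the thread


def parse_utc(ts: str) -> datetime:
    """Parse a UTC timestamp string into a timezone-aware datetime."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


def usable_for_forecast(granule_end: str, obs_start: str) -> bool:
    """Strict reading: satellite_time_end < observation_datetime_start."""
    return parse_utc(granule_end) < parse_utc(obs_start)


def usable_for_nowcast(granule_end: str, obs_start: str) -> bool:
    """Clarified reading: satellite_time_end < observation_datetime_end.

    Assumption: the observation period ends 24 hours after obs_start.
    """
    obs_end = parse_utc(obs_start) + timedelta(hours=24)
    return parse_utc(granule_end) < obs_end


label = "2019-01-31T08:00:00Z"    # label datetime from the forum example
granule = "2019-02-01T07:59:00Z"  # granule end time from the forum example

forecast_ok = usable_for_forecast(granule, label)
nowcast_ok = usable_for_nowcast(granule, label)
print(forecast_ok, nowcast_ok)
```

Under the strict reading this granule is rejected; under the clarified reading it is accepted, which matches the example czsc gave.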
Hi @future-forecast. For details on inference, please refer to the submission format section of the problem description (pm2.5 and no2). Your submission should include a prediction for every row in submission_format.csv, where each row represents a 24-hour period for a given grid cell.
You should forecast the average concentration for the 24 hours following the timestamp. From the problem description:
“The UTC datetime of the measurement in the format YYYY-MM-DDTHH:mm:ssZ. A value represents the average between 12:00am to 11:59pm local time. The datetime provided represents the start of that 24 hour period in UTC time.”
You should predict for every grid_id in the submission_format.csv. You can find more information about the grid cells in the grid metadata file on the data download page.
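To make the datetime convention concrete, here is a rough sketch of turning a label timestamp into its 24-hour averaging window. The helper name and the fixed UTC offset are assumptions for illustration only; a fixed offset ignores daylight saving time, and the real local time zone depends on the grid cell's location.

```python
from datetime import datetime, timedelta, timezone


def averaging_window(label_utc: str, utc_offset_hours: float):
    """Return (start, end, local_start) for a label timestamp.

    label_utc is the UTC start of the 24-hour period, formatted as
    YYYY-MM-DDTHH:mm:ssZ. utc_offset_hours is an assumed fixed offset
    for the grid cell's location (hypothetical, no DST handling).
    """
    start = datetime.strptime(label_utc, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    end = start + timedelta(hours=24)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return start, end, start.astimezone(local_zone)


# Assumed example: a location at UTC-8, where 08:00 UTC is local midnight.
start, end, local_start = averaging_window("2019-01-31T08:00:00Z", -8)
print(local_start.isoformat())  # 2019-01-31T00:00:00-08:00
print(end.isoformat())          # 2019-02-01T08:00:00+00:00
```

With that assumed offset, the label datetime lands on local midnight, and the window closes at 08:00 UTC the next day.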