(Short-term) Forecasting or (Concurrent) Estimation?

sukantabasu · February 11, 2022, 10:05am

In the problem description, it has been mentioned that:

“Remember that for each observation, you may only use input values that are available before this time.”

“Remember that you are not allowed to use future data, so the time_end of an granule must be before the datetime of a given observation.”

In other words: satellite_time_end < observation_datetime_start

Thus, the problem is a short-term forecasting problem.

However, the information provided by cszc in the discussion forum is different:

“Therefore, you can use satellite data with an endtime on or before 2019-02-01T07:59:00Z for a label with datetime 2019-01-31T08:00:00Z.”

“In short, you are allowed to use data up-through (i.e. including) the date of prediction. This means that yes, you can use satellite data from the same local date.”

This statement implies: satellite_time_end < observation_datetime_end.

Thus, according to cszc, the problem is a concurrent estimation problem, which contradicts the problem statement.

If satellite data from the same date as the observation may be used, then we are not doing short-term (say, day-ahead or hours-ahead) forecasting.

Please clarify.

cszc · February 16, 2022, 9:00pm

Thanks for this question, @sukantabasu. Sorry about the confusion. It is correct to think about this problem as a concurrent estimation or “nowcasting” problem, i.e. satellite_time_end < observation_datetime_end. We will clarify the language on the website.
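
For readers who want to apply this rule in code, here is a minimal sketch. It assumes that observation_datetime_end is the label's datetime plus 24 hours, which matches the example quoted earlier in this thread (a granule ending at 2019-02-01T07:59:00Z is allowed for a label at 2019-01-31T08:00:00Z). The function names `parse_utc` and `is_usable` are hypothetical and not part of any competition code.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp in the YYYY-MM-DDTHH:mm:ssZ format as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_usable(granule_time_end: str, observation_start: str) -> bool:
    """Return True if the granule ends before the observation window ends.

    Assumption: the observation window is the 24 hours that start at
    the label's datetime.
    """
    window_end = parse_utc(observation_start) + timedelta(hours=24)
    return parse_utc(granule_time_end) < window_end
```

With this check, a granule whose time_end falls anywhere inside the 24-hour window is accepted, while one ending exactly at the end of the window, or later, is rejected.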

future-forecast · March 20, 2022, 3:53am

Hi!

“We will clarify the language on the website.”

This has not been done yet, so please tell us clearly:

1. When, and for how many days, should we forecast?
2. How many points should we forecast at each of these locations?
Los Angeles (South Coast Air Basin)
Delhi
Taipei

Please clarify.
Thank you!

cszc · March 20, 2022, 6:38pm

Hi @future-forecast. For details on inference, please refer to the submission format section of the problem description (pm2.5 and no2). Your submission should include a prediction for every row in submission_format.csv, where each row represents a 24-hour period for a given grid cell.

You should forecast the average concentration for the 24 hours following the timestamp. From the problem description:

“The UTC datetime of the measurement in the format YYYY-MM-DDTHH:mm:ssZ. A value represents the average between 12:00am to 11:59pm local time. The datetime provided represents the start of that 24 hour period in UTC time.”
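
As an illustration of how a local day maps to the UTC start timestamp, here is a small sketch. The helper `local_day_start_utc` is hypothetical, and the fixed UTC offsets are an assumption made for simplicity: they ignore daylight saving time, so a time zone database would be needed for locations and dates where it applies.

```python
from datetime import datetime, timedelta, timezone


def local_day_start_utc(year: int, month: int, day: int, utc_offset_hours: float) -> str:
    """Return local midnight of the given date as a UTC timestamp string.

    Assumption: the location uses a fixed UTC offset on that date.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_midnight = datetime(year, month, day, tzinfo=local_tz)
    return local_midnight.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Midnight on 2019-01-31 at UTC-8 (Pacific Standard Time)
print(local_day_start_utc(2019, 1, 31, -8))  # 2019-01-31T08:00:00Z
```

The printed value agrees with the label datetime used in the example earlier in this thread.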

You should predict for every grid_id in the submission_format.csv. You can find more information about the grid cells in the grid metadata file on the data download page.
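
One way to check that a submission covers every required row is sketched below. It assumes that both files have `datetime` and `grid_id` columns; those column names and the function `missing_predictions` are assumptions for illustration, so adjust them to the actual header of submission_format.csv.

```python
import csv
import io


def missing_predictions(format_csv: str, submission_csv: str) -> set:
    """Return the (datetime, grid_id) pairs required by the format file
    that do not appear in the submission.

    Both arguments are CSV text with 'datetime' and 'grid_id' columns
    (assumed column names).
    """
    def keys(text: str) -> set:
        reader = csv.DictReader(io.StringIO(text))
        return {(row["datetime"], row["grid_id"]) for row in reader}

    return keys(format_csv) - keys(submission_csv)
```

An empty result means every (datetime, grid_id) pair in the format file has a matching row in the submission.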

Hope that clarifies some things!
