Clarification on features' dates used for prediction

lzvam · January 31, 2022, 11:29am

It is unclear to me from the problem description whether we are allowed to use satellite data from the same day as the label value, or only from previous dates. For example, for a label with datetime 31st Jan 2019, are we allowed to use only satellite data taken before that date, or also data from the same date? Thanks.

Carl_Malings · February 1, 2022, 10:18pm

I don’t have an answer, unfortunately, but rather a related question. In the problem description it states:

“You may use historical ground truth training data as feature input to your model. Note that for verification, you may only use historical data up until the point of inference in order to make your prediction.”

Does that mean that we would have access to the ground-truth data from up to the day before we are trying to predict the concentrations for, which can be used as input data into the forecast algorithm? As a more concrete example, if I am trying to predict the concentration at a location on January 2, 2021, would I be able to know the concentration on January 1, 2021 and use that to inform my forecast? Or does this only apply to data from the training period provided for the competition, but not for the testing or for any validation periods?

cszc · February 1, 2022, 10:54pm

In short, you are allowed to use data up through (i.e. including) the date of prediction. This means that yes, you can use satellite data from the same local date.

The label’s datetime represents the start of a 24-hour period over which the air quality is averaged. For example, a label with datetime 2019-01-31T08:00:00Z represents an average taken from 2019-01-31T08:00:00Z to 2019-02-01T07:59:00Z (inclusive).

Therefore, you can use satellite data with an endtime on or before 2019-02-01T07:59:00Z for a label with datetime 2019-01-31T08:00:00Z. Note that this cutoff is 11:59pm local time (Pacific Time) on January 31.
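As a rough illustration of the rule above, here is a minimal sketch of how the cutoff could be computed and checked. The function names (`label_cutoff`, `is_usable`) and the idea of passing a granule's end time as a `datetime` are assumptions made for this example, not part of any official competition code.

```python
from datetime import datetime, timedelta, timezone


def label_cutoff(label_start: datetime) -> datetime:
    """Last minute of the 24-hour averaging window that starts at label_start."""
    return label_start + timedelta(hours=24) - timedelta(minutes=1)


def is_usable(granule_end: datetime, label_start: datetime) -> bool:
    """True if a satellite granule ending at granule_end may be used for this label.

    Hypothetical helper: assumes both datetimes are timezone-aware (UTC).
    """
    return granule_end <= label_cutoff(label_start)


# The example from this thread: a label with datetime 2019-01-31T08:00:00Z.
label = datetime(2019, 1, 31, 8, 0, tzinfo=timezone.utc)
cutoff = label_cutoff(label)  # 2019-02-01T07:59:00Z
```

A granule whose end time is 2019-02-01T07:59:00Z would pass this check, while one ending at 2019-02-01T08:00:00Z would not.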

cszc · February 1, 2022, 10:59pm

Yes, this is correct. You will have access to historical ground truth data to make your predictions, e.g. you can use the concentrations from January 1, 2021 to make a prediction for January 2, 2021. Edit: This is no longer true. The use of ground truth as input is disallowed.

Carl_Malings · February 2, 2022, 2:21pm

Thank you! What is the best source for ground truth data from the testing period, and how should we structure the code to accept these data for any validation cases?

cszc · February 2, 2022, 6:32pm

Sorry @Carl_Malings! I misspoke in my previous reply. ~~You can only use historical data for the training period. It will not be available for the test or validation periods. Apologies for the confusion - I’ve edited my previous post.~~ The use of ground truth as input data is disallowed.

Carl_Malings · February 2, 2022, 10:30pm

OK, that is what I initially suspected. Thank you!

cszc · February 3, 2022, 9:01pm

@Carl_Malings Sorry to keep changing my answer on this, but upon further review, we’ve decided not to allow any ground truth data as input to the model, for any of the train, test, or validation periods. The problem description is now updated to reflect this. We hope this simplifies things. Again, apologies for the confusion!

Carl_Malings · February 3, 2022, 10:28pm

OK. So, just to clarify: The ground-truth data is available for training, but once the forecasting method has been trained/calibrated, no ground-truth data can be used as an additional input to inform the forecasting.

Related to this: would it be allowable to “save” some of this ground data, or information derived from it (e.g., “what was the average concentration at this location across all Mondays during the training period?”) into some kind of “lookup table” which is used by the forecasting method?

cszc · February 4, 2022, 9:41pm

The ground truth from the training period is available for developing your model. You may use that model when running inference. The example you provided sounds like it incorporates information on weekly averages from the training period into your model, which is fine to do.
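For readers wondering what such a "lookup table" might look like in practice, here is a minimal sketch. It assumes training rows arrive as `(location_id, datetime, value)` tuples; that layout, the function names, and the numbers in the usage lines are all made up for illustration. The table is built once from training-period ground truth and then only read at inference time, so no ground truth from the prediction period is involved.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_weekday_lookup(train_rows):
    """Average training-period value per (location, weekday).

    train_rows: iterable of (location_id, datetime, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    grouped = defaultdict(list)
    for location_id, dt, value in train_rows:
        grouped[(location_id, dt.weekday())].append(value)  # Monday == 0
    return {key: mean(values) for key, values in grouped.items()}


def weekday_feature(lookup, location_id, dt, default=None):
    """Read the stored training average for this location and weekday."""
    return lookup.get((location_id, dt.weekday()), default)


# Made-up training rows for a hypothetical location "A".
train_rows = [
    ("A", datetime(2019, 1, 7), 10.0),   # Monday
    ("A", datetime(2019, 1, 14), 20.0),  # Monday
    ("A", datetime(2019, 1, 8), 30.0),   # Tuesday
]
lookup = build_weekday_lookup(train_rows)

# At inference time, only the saved table is consulted.
monday_avg = weekday_feature(lookup, "A", datetime(2021, 1, 4))
```

The `default` argument covers locations or weekdays that never appeared in training; what value to use there is a modelling choice left open here.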
