Test Data Questions - Pandemic

jimking100 · October 24, 2022, 7:59pm

Hi,

I have two questions about the test data used in Pandemic Forecasting.

1. In the baseline example, the test data is derived from the same data set used for training (essentially a subset of the training data). Can we assume this will remain the case in the contest, or could the test data be derived from an entirely different data set than the one used for training?
2. When we write the predictions to a .csv file, I’m unclear whether we write predictions for just the test data or for every individual in the original data set. Could you please clarify?

Thanks!

jayqi · October 25, 2022, 5:44pm

The “Prediction Target and Evaluation Metric” section has relevant details about the prediction target. Overall, the idea is that the prediction task is a time series problem on a single population, and the train/test split is timewise.

You should similarly expect that in the evaluation runtime, there will be a single population. Ground truth disease states will be provided for every individual for the first 56 days of the simulation. The “test split” will be predicting infection risk for every individual in the population for the following week of simulation time.
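
As a rough illustration of the timewise split described above, here is a minimal Python sketch. It is not the challenge's actual code or data format: the `(person_id, day, is_infected)` record layout, the 0-indexed days, the function names, and the toy "fraction of infected days" score are all assumptions made for this example. It only shows the shape of the task: one population, training on the first 56 days, and one risk score per individual for the following week.

```python
from typing import Dict, List, Set, Tuple

TRAIN_DAYS = 56   # days with ground truth, per the reply above
HORIZON_DAYS = 7  # the following week to be predicted

# Hypothetical record layout: (person_id, day, is_infected), days 0-indexed.
Record = Tuple[int, int, bool]


def timewise_split(
    records: List[Record],
    train_days: int = TRAIN_DAYS,
    horizon_days: int = HORIZON_DAYS,
) -> Tuple[List[Record], List[Record]]:
    """Split a single population's records by day, not by individual."""
    train = [r for r in records if r[1] < train_days]
    test = [r for r in records if train_days <= r[1] < train_days + horizon_days]
    return train, test


def naive_risk_scores(train: List[Record], population: Set[int]) -> Dict[int, float]:
    """Toy baseline (an assumption, not the contest baseline):
    the fraction of observed training days on which each person was infected.
    Returns one score for every individual in the population."""
    infected_days = {pid: 0 for pid in population}
    observed_days = {pid: 0 for pid in population}
    for pid, _day, infected in train:
        if pid in population:
            observed_days[pid] += 1
            infected_days[pid] += int(infected)
    return {
        pid: (infected_days[pid] / observed_days[pid]) if observed_days[pid] else 0.0
        for pid in population
    }


# Synthetic example: 3 people over 63 days; person 1 is infected from day 49 on.
records = [(pid, day, pid == 1 and day >= 49) for pid in (1, 2, 3) for day in range(63)]
train, test = timewise_split(records)
scores = naive_risk_scores(train, {1, 2, 3})
```

Note that `scores` has an entry for every individual in the population, not just for a sampled subset, which matches the description of the test split above.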
