Test Data Questions - Pandemic


I have two questions about the test data used in Pandemic Forecasting.

  1. In the baseline example, the test data is derived from the same data set used for training (essentially a subset of the training data). Can we assume that this will remain the case in the contest, or could the test data be derived from an entirely different data set than the one used for training?

  2. When we write the predictions to a .csv file, I’m unclear whether we write predictions for just the test data or for every individual in the original data set. Could you please clarify?


Hi @jimking100,

The “Prediction Target and Evaluation Metric” section has relevant details about the prediction target. Overall, the idea is that the prediction task is a time series problem on a single population, and the train/test split is timewise.

You should similarly expect that in the evaluation runtime, there will be a single population. Ground truth disease states will be provided for every individual for the first 56 days of the simulation. The “test split” will be predicting infection risk for every individual in the population for the following week of simulation time.
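To make the timewise split concrete, here is a rough sketch on toy data. It is not the challenge's actual data format or submission code: the field names (`pid`, `day`, `infected`), the `score` column header, the zero-based day numbering, and the placeholder risk score are all assumptions for illustration. The point is only that every individual appears on both sides of the split, and that the output has one row per individual.

```python
import csv
import io

TRAIN_DAYS = 56  # ground truth is provided for the first 56 days
TEST_DAYS = 7    # predictions cover the following week


def timewise_split(records, train_days=TRAIN_DAYS):
    """Split records by day, not by individual (days assumed to start at 0)."""
    train = [r for r in records if r["day"] < train_days]
    test = [r for r in records if r["day"] >= train_days]
    return train, test


def placeholder_risk(train):
    """Stand-in for a real model: fraction of training days each person was infected."""
    days_seen, days_infected = {}, {}
    for r in train:
        days_seen[r["pid"]] = days_seen.get(r["pid"], 0) + 1
        days_infected[r["pid"]] = days_infected.get(r["pid"], 0) + r["infected"]
    return {pid: days_infected[pid] / days_seen[pid] for pid in days_seen}


def write_predictions(risk_by_pid, out):
    """Write one row per individual in the population (hypothetical column names)."""
    writer = csv.writer(out)
    writer.writerow(["pid", "score"])
    for pid in sorted(risk_by_pid):
        writer.writerow([pid, risk_by_pid[pid]])


# Toy population: 3 individuals observed for 56 + 7 days.
# Individual 2 is infected from day 28 onward; the others never are.
records = [
    {"pid": pid, "day": day, "infected": int(pid == 2 and day >= 28)}
    for pid in (1, 2, 3)
    for day in range(TRAIN_DAYS + TEST_DAYS)
]

train, test = timewise_split(records)
risk = placeholder_risk(train)

buffer = io.StringIO()
write_predictions(risk, buffer)
rows = buffer.getvalue().splitlines()
```

In this sketch `train` and `test` contain the same three individuals, and `rows` holds a header plus one prediction per individual, which matches the answer to question 2: predictions are written for everyone in the population, not for a held-out subset of people.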