Data partitioning and preparation for Pandemic Track

Thank you for sharing the data and the code for the pandemic track.

Could you please help clarify the following points about the data preparation scripts for the logistic regression and GNN approaches?

  • Can we assume that, when partitioning the data, the provided scripts remove edges between individuals who fall in two different partitions?
  • Do these partitions reflect logical silos along any administrative units?
  • Should we assume that both the data partitioning and the data preparation methodologies are fixed, or would using innovative privacy-enhancing technologies as part of the data preparation step be within the scope of this challenge?


The code provided in the NSSAC/WH-challenge-baselines repository is for the centralized baselines and does not do any partitioning.

For general documentation on how partitioning will be done, please review the following two sections of the data overview page:

In Phase 2 evaluation, all teams' solutions will be evaluated on fixed sets of pre-made partitions. Solutions should not create partitions themselves; data in the evaluation environment will already be partitioned.

By the start of Phase 2, we will provide scripts for both data tracks to help teams create similar partitions from the development dataset for local development.
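To illustrate the cross-partition edge question above, here is a minimal sketch of splitting a contact network into partitions. It is not the official partitioning script. It assumes each individual has already been assigned a partition label (the `person_region` mapping and the `partition_contacts` helper are hypothetical). The official partitions may drop cross-partition edges, keep them, or handle them some other way, and this thread does not say which. The sketch therefore only separates within-partition edges from cross-partition edges so that either choice can be made locally.

```python
from collections import defaultdict


def partition_contacts(person_region, edges):
    """Split contact edges by partition (hypothetical helper for local testing).

    person_region: dict mapping person id -> partition label (an assumed
        attribute, e.g. a region code; not from the challenge data spec).
    edges: iterable of (person_a, person_b) contact pairs.

    Returns (within, cross):
        within: dict mapping partition label -> edges whose endpoints
            both belong to that partition.
        cross: edges whose endpoints belong to different partitions.
    """
    within = defaultdict(list)
    cross = []
    for a, b in edges:
        region_a, region_b = person_region[a], person_region[b]
        if region_a == region_b:
            # Both individuals are in the same silo: keep the edge locally.
            within[region_a].append((a, b))
        else:
            # The edge spans two silos: set aside so the caller decides.
            cross.append((a, b))
    return dict(within), cross


if __name__ == "__main__":
    person_region = {1: "A", 2: "A", 3: "B", 4: "B"}
    edges = [(1, 2), (2, 3), (3, 4)]
    within, cross = partition_contacts(person_region, edges)
    print(within)  # {'A': [(1, 2)], 'B': [(3, 4)]}
    print(cross)   # [(2, 3)]
```

Returning cross-partition edges separately, rather than discarding them silently, makes it easy to measure how many contacts a given partitioning would cut.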