Data partitioning and preparation for Pandemic Track

ambrish · September 1, 2022, 10:00pm

Thank you for sharing the data and the code for the pandemic track.

Could you please help clarify the following with respect to the data preparation scripts for the logistic regression and GNN approaches:

Can we assume that while partitioning the data, the provided scripts take care of removing edges between individuals across two different partitions?
Do these partitions reflect logical silos across any administrative unit?
Are we to assume that both the data partitioning and the data preparation methodologies are fixed, or would the use of innovative privacy-enhancing technologies as part of the data preparation step be within the scope of this challenge?

Thanks,

jayqi · September 1, 2022, 10:22pm

The code provided in the NSSAC/WH-challenge-baselines repository is for the centralized baselines and doesn’t do any partitioning.

For general documentation on how partitioning will be done, please review the following two sections of the data overview page:

In Phase 2 evaluation, all teams’ solutions will be evaluated on fixed sets of pre-made partitions. Creating the partitions is not something that solutions should be doing. Data in the evaluation environment will already be partitioned.

We will be providing scripts by the start of Phase 2 for both data tracks to help teams create similar partitions for local development using the development dataset.
