What are the epsilon values and dataset distributions in the final evaluation?

vvv214 · October 19, 2020, 1:29am

Hi, I just want to ask: what are the epsilon values and dataset distributions in the final evaluation? Are the epsilons still 1, 2, and 10? And what will the dataset look like? I assume the schema will be the same, but is the distribution similar (or even the same)?

Christine_Task · October 19, 2020, 7:38pm

The final epsilon evaluation values will include 1, 2, and 10, but we may also look at performance on epsilon values smaller than 1 if we need a clearer picture of the finalists’ privacy/utility trade-off. While you’re preparing, you might check your performance on epsilon values between 0.1 and 1.0.
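A minimal sketch of such a check, not the challenge's scoring code: it sweeps a few epsilon values, including some below 1, over a made-up list of counts and reports how far Laplace-noised counts drift from the true ones. The toy counts, the `mean_abs_error` helper, and the use of mean absolute error in place of the official metric are all assumptions for illustration.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_abs_error(counts, epsilon, sensitivity, trials, rng):
    """Average absolute error of Laplace-noised counts over several trials."""
    scale = sensitivity / epsilon
    errors = []
    for _ in range(trials):
        noisy = [c + laplace_noise(scale, rng) for c in counts]
        errors.append(statistics.mean(abs(n - c) for n, c in zip(noisy, counts)))
    return statistics.mean(errors)


# Hypothetical toy counts; a real check would use the development data
# and the challenge's own scoring metric.
toy_counts = [120, 45, 3, 0, 78]
rng = random.Random(0)
epsilons = [0.1, 0.25, 0.5, 1.0, 2.0, 10.0]

results = {
    eps: mean_abs_error(toy_counts, eps, sensitivity=20, trials=200, rng=rng)
    for eps in epsilons
}
for eps, err in results.items():
    print(f"epsilon={eps:<5} mean abs error={err:.2f}")
```

The error should grow as epsilon shrinks, which is the trade-off the smaller values are meant to expose.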

For the final datasets, the schema will be the same and it’ll still be real Baltimore police data; we won’t do anything to artificially shift the distribution of the data. But the distribution of the final data may naturally vary somewhat from the development phase data, and you should be cautious about overfitting.

There’s a common real-world use case for differential privacy, in which an organization has been releasing simply anonymized data for many years but now wants to convert to formally private data going forward. You can use the previously released anonymized data (in this case, the dataset we provide you during the development phase) while you’re developing and tuning your algorithm, but whatever you build should still be usable for several years to come without unduly biasing the new privatized data to continue to look like the old anonymized data.

It’s good to observe and use the big, significant patterns in the development data, but you don’t want to memorize fine details.

vvv214 · October 27, 2020, 2:37pm

Hi Christine,

A follow-up question: what is the `max_records_per_individual` value (currently 20) in the test dataset? I understand it is an input to the algorithm and can change. Can we know a rough range for it?

Christine_Task · October 28, 2020, 2:26pm

max_records_per_individual will remain at 20 for the final scoring. It has to be taken as input to the algorithm (rather than checked on the data itself) in order to satisfy differential privacy… but we will never change it (or the schema) for final scoring.

The schema and the value for max_records_per_individual in final scoring will always be the same as they were during the development phase.
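To illustrate why the bound is passed in rather than measured, here is a rough sketch under stated assumptions: a plain Laplace mechanism over a histogram, with the L1 sensitivity taken to be `max_records_per_individual` (one individual can change the counts by at most that many records in total). The function `noisy_histogram`, the toy records, and the bin names are hypothetical and not part of the challenge code.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(records, public_bins, epsilon, max_records_per_individual, rng):
    """Release noisy counts for every bin in the public schema."""
    # The noise scale depends only on public inputs. Computing the bound from
    # the records themselves would make the scale data-dependent and leak
    # information outside the privacy accounting.
    scale = max_records_per_individual / epsilon
    true_counts = Counter(records)
    # Iterate over the public bins, not the observed ones, so that empty
    # bins are also released with noise.
    return {b: true_counts[b] + laplace_noise(scale, rng) for b in public_bins}


# Hypothetical toy data: each record is just the bin it falls into.
toy_records = ["A", "A", "B"]
toy_bins = ["A", "B", "C"]
rng = random.Random(0)

released = noisy_histogram(
    toy_records, toy_bins, epsilon=1.0, max_records_per_individual=20, rng=rng
)
print(released)
```

Because the value stays at 20, a solution can treat it as a fixed parameter, but reading it from the provided input keeps the code consistent with the privacy argument.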

vvv214 · October 28, 2020, 3:04pm

Got it. Thank you Christine!

rmckenna · November 6, 2020, 7:01pm

Quick follow-up: will the exact set of neighborhood and incident types be the same in the final dataset and have the same encoding? Will the data be for a single year again or could it span multiple years?

Christine_Task · November 6, 2020, 9:12pm

Yep, the entire schema will be exactly the same in the final scoring as it is during the development phase. Neighborhoods and incident codes will be the same. And the input dataset will always be a single year’s worth of data (and your results should be aggregated by month).

However, to check for generalizability, we will evaluate your solutions twice, over two different years of data. I.e., your algorithm will be run on one year of data (at several epsilons, multiple times per epsilon). Then your algorithm will be run on a different year of data (at the same epsilons, multiple times per epsilon). It’s a fairly robust evaluation.

This was one of the motivations for the design of the prescreened arena leaderboard. Once you successfully submit to the prescreened arena, we’ll know your executable can run without error in our environment, and that significantly improves the efficiency of the final scoring process.
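The two-year evaluation described above could be mimicked locally with a loop like the following. This is only a sketch of the structure (years, then epsilons, then repeated runs), not the organizers' harness: `evaluate`, `toy_algorithm`, `toy_score`, the year labels, and the number of runs are all placeholders.

```python
import random
from statistics import mean

_rng = random.Random(0)


def toy_algorithm(data, epsilon):
    # Placeholder for a real submission: adds Laplace noise to each count,
    # assuming a sensitivity of 20 purely for illustration.
    scale = 20 / epsilon
    return [c + _rng.expovariate(1 / scale) - _rng.expovariate(1 / scale) for c in data]


def toy_score(data, output):
    # Placeholder metric (lower is better); the challenge uses its own metric.
    return mean(abs(o - d) for o, d in zip(output, data))


def evaluate(algorithm, score, datasets_by_year, epsilons, runs_per_epsilon):
    """Run the algorithm on each year and epsilon several times; average the scores."""
    results = {}
    for year, data in datasets_by_year.items():
        for eps in epsilons:
            scores = [score(data, algorithm(data, eps)) for _ in range(runs_per_epsilon)]
            results[(year, eps)] = mean(scores)
    return results


# Hypothetical stand-ins for two different years of monthly counts.
datasets_by_year = {
    "year_a": [100, 90, 110, 95],
    "year_b": [60, 150, 80, 120],
}
results = evaluate(toy_algorithm, toy_score, datasets_by_year, [1.0, 2.0, 10.0], runs_per_epsilon=5)
for (year, eps), avg in sorted(results.items()):
    print(year, eps, round(avg, 2))
```

Holding out part of the development data as a stand-in for the second year is one way to spot overfitting before final scoring.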

tliu64 · November 13, 2020, 4:41pm

Hi, even though the data for the evaluation is coming from different years, can we still expect the input schema to have Year=2019?

Christine_Task · November 13, 2020, 5:10pm

Hi tliu64, that’s a good thing to confirm. The final schema will be identical to the schema you have now for neighborhoods and call descriptions. But the value of ‘Year’ will change appropriately (for whichever year we’re testing on).
