Hi, just want to ask: what are the epsilon values and dataset distributions in the final evaluation? Are the epsilons still 1, 2, and 10? And what does the dataset look like? I assume the schema will be the same, but is the distribution similar (or even the same)?
So the final epsilon evaluation values will include 1, 2, and 10, but we may also look at performance on epsilon values smaller than 1 if we need a clearer picture of the finalists’ privacy/utility trade-off. While you’re preparing, you might check your performance on epsilon values between 0.1 and 1.0.
For the final datasets, it will be the same schema, and it’ll still be real Baltimore police data; we won’t do anything to artificially shift the distribution of the data. But the distribution of the final data may vary naturally somewhat from the development-phase data, so you should be cautious about overfitting.
There’s a common real-world use case for differential privacy in which an organization has been releasing simply anonymized data for many years, but now wants to convert to formally private data going forward. You can use the previously released anonymized data (in this case, the dataset we provide you during the development phase) while you’re developing and tuning your algorithm, but whatever you build should still be usable for several years to come without unduly biasing the new privatized data to continue to look like the old anonymized data.
It’s good to observe and use the big, significant patterns in the development data, but you don’t want to memorize fine details.
A follow-up question: what is the `max_records_per_individual` value (currently 20) in the test dataset? I understand it is an input to the algorithm and can change. Can we know a rough range for it?
max_records_per_individual will remain at 20 for the final scoring. It has to be taken as input to the algorithm (rather than checked on the data itself) in order to satisfy differential privacy… but we will never change it (or the schema) for final scoring.
The schema and the value for max_records_per_individual in final scoring will always be the same as they were during the development phase.
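To make concrete why it has to be a parameter rather than something measured from the data: one individual can affect up to that many records, so it bounds the sensitivity that the noise scale is calibrated to. Here’s a minimal sketch of that idea (my own illustration, not the challenge’s reference code):

```python
import numpy as np

def privatize_histogram(counts, epsilon, max_records_per_individual, seed=None):
    """Add Laplace noise to a histogram of record counts.

    max_records_per_individual is passed in as a parameter: one person can
    affect at most that many records, so it bounds the L1 sensitivity of the
    histogram. Reading the true per-person maximum off the data itself would
    make the noise scale data-dependent and break the DP guarantee.
    """
    rng = np.random.default_rng(seed)
    scale = max_records_per_individual / epsilon
    noisy = np.asarray(counts, dtype=float) + rng.laplace(scale=scale, size=len(counts))
    # Post-processing (always safe under DP): counts can't be negative.
    return np.clip(noisy, 0, None)

noisy = privatize_histogram([40, 7, 0, 125], epsilon=1.0,
                            max_records_per_individual=20, seed=42)
```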
Got it. Thank you Christine!
Quick follow-up: will the exact set of neighborhood and incident types be the same in the final dataset and have the same encoding? Will the data be for a single year again or could it span multiple years?
Yep, the entire schema will be exactly the same in the final scoring as it is during the development phase. Neighborhoods and incident codes will be the same. And the input dataset will always be a single year’s worth of data (and your results should be aggregated by month).
However, to check for generalizability, we will evaluate your solutions twice, over two different years of data. That is, your algorithm will be run on one year of data (at several epsilons, multiple times per epsilon), and then on a different year of data (at the same epsilons, multiple times per epsilon). It’s a fairly robust evaluation.
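In case it helps, the monthly aggregation mentioned above can be as simple as counting records per (month, neighborhood, incident) cell before privatizing. The record layout in this sketch is hypothetical, not the actual challenge format:

```python
from collections import Counter

# Hypothetical record layout: (month, neighborhood, incident_code).
records = [
    (1, "Downtown", "4A"),
    (1, "Downtown", "4A"),
    (2, "Fells Point", "3B"),
    (2, "Downtown", "4A"),
]

# Monthly counts per (month, neighborhood, incident_code) cell;
# these cell counts are what a DP mechanism would then add noise to.
monthly_counts = Counter(records)
```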
This was one of the motivations for the design of the prescreened arena leaderboard. Once you successfully submit to the prescreened arena, we’ll know your executable can run without error in our environment, and that significantly improves the efficiency of the final scoring process.
Hi, even though the data for the evaluation is coming from different years, can we still expect the input schema to have Year=2019?
Hi tlieu64, that’s a good thing to confirm. The final schema will be identical to the schema you have now for neighborhoods and call descriptions. But the value of `Year` will change appropriately (to whichever year we’re testing on).