Thanks for your hard work and great submissions. We’re seeing a variety of clever and creative approaches! We’ve now kicked off final scoring and differential privacy validation.
If we run into any questions, we’ll be reaching out by email, so keep an eye on your inbox for the next few weeks. We’ll be announcing winners next month, along with final scoring data (so you’ll get feedback on how your submissions scored for each final data set and epsilon value).
And then after the holidays, in January, we’ll be starting Sprint 2 (keep an eye on your email for the registration link, or check back with DrivenData). If you’d like to know what that might be like, here’s a bit of a spoiler: try registering for our sister competition, “A Better Meterstick for Differential Privacy” (on HeroX), and checking out the example demographic data-set here: https://www.herox.com/bettermeterstick/resource/560
While you’re over there, consider forwarding the metrics challenge to any data-science or data-user folks you know (or participating yourself!).
We’ll be sharing the complete Sprint 2 data-set on January 6th, when Sprint 2 kicks off. It will have a larger feature set than the metrics challenge example demographic data, along with simulated individuals (most with more than 5 records), time segments (years), and more map segments (PUMA). Because we’ll be using more features, we’ll also be changing your output data format and our scoring approach. We’ll be collecting a list of records within each time/map segment (i.e., synthetic data), rather than total counts of record types. And our scoring will be based on the 3-marginal metric (think randomized 3-dimensional pie chart; we’ll have more details on that later as well).
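To give a rough feel for the "randomized 3-dimensional pie chart" idea, here is a minimal Python sketch. It is only an illustration under our own assumptions, not the official Sprint 2 scoring code: the function names, the record format (a list of dicts), the number of sampled triples, and the use of total variation distance are all hypothetical choices, and it ignores the time/map segmentation entirely. It picks random triples of features, builds the joint distribution of values for each triple in the real and the synthetic records, and averages how far apart those distributions are.

```python
import random
from collections import Counter
from itertools import combinations


def marginal(records, features):
    """Normalized frequency of each value combination over the chosen features."""
    counts = Counter(tuple(r[f] for f in features) for r in records)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def three_marginal_distance(truth, synth, features, n_samples=10, seed=0):
    """Average total variation distance over randomly sampled feature triples.

    Hypothetical illustration only: 0.0 means the sampled 3-way marginals
    match exactly, 1.0 means they do not overlap at all.
    """
    triples = list(combinations(features, 3))
    if not triples:
        raise ValueError("need at least three features")
    rng = random.Random(seed)
    chosen = rng.sample(triples, min(n_samples, len(triples)))

    distances = []
    for triple in chosen:
        p = marginal(truth, triple)
        q = marginal(synth, triple)
        # Total variation distance: half the sum of absolute differences.
        keys = set(p) | set(q)
        tvd = 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
        distances.append(tvd)
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Made-up toy records; the feature names are placeholders.
    real = [{"a": 0, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 1}]
    fake = [{"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 0, "c": 0}]
    print(three_marginal_distance(real, fake, ["a", "b", "c"]))
```

In this toy version, lower is better, and a synthetic record list that reproduces the real 3-way marginals exactly scores 0. The actual metric may differ in how it samples, bins, and scales things, so treat this purely as intuition until the details are released.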
If you haven’t worked with synthetic data before, you might like to look through some of the solutions for the NIST 2018 Differential Privacy Synthetic Data Challenge, which you can find here (scroll down to the Challenge section):
A combination of the intuition and tricks you’ve all developed this sprint for dealing with temporal data, plus some of the intuition and tricks that are helpful for dealing with larger feature sets, will likely prove very handy as we ramp things up in the next sprint. We’ll be releasing some guidance and things to think about (as well as the Sprint 1 winners announcement!) between now and the end of the year. Note that participating in Sprint 1 isn’t a requirement: everyone is welcome to join for Sprint 2.
Have a good holiday break, and we’ll see you at Sprint 2 next year!