You’re already aware of this, but just to make it official: final submissions (with executable, source code, final write-up/proof, and code-guide) are due on the prescreened leaderboard by 8pm EST Monday, 2/22. A few people have already submitted to the prescreened leaderboard, and we’re looking forward to welcoming the rest of you. It’s a lovely weekend for privacy-preserving data science!
Let us know if you have any last-minute questions or run into any difficulty submitting. We’ll check in on this thread periodically over the weekend.
And one more note: remember that your final write-ups will be getting a very thorough read-through for differential privacy validation. We’ll be reaching out if we run into any issues (keep an eye on your inbox!). Feel free to break out the LaTeX, but remember that what you really want to aim for is saying things in simple, clear, well-defined steps. Imagine explaining them to your roommate. Here are some points to keep in mind:
(1) Write out your algorithm clearly, step by step, in straightforward English (and/or math).
You can have technical supporting material or background, but at some point in your write-up you should have explicitly written out the simple sequence of recipe steps that your algorithm performs. You want to imagine writing lecture notes for students, and try to make your process as easy for us to follow as possible, rather than writing as you would for formal publication.
(2) Explain, in simple terms, how your algorithm handles the fact that one individual can contribute up to seven records.
This is tricky and very easy to make mistakes on. So for your sake and ours, it’s best to write this down in very simple, clear terms that will make it easy to catch any oversights. Remember that we’re not providing privacy at the per-record level; we’re providing it at the per-individual level. Neighboring worlds can differ by up to seven records (all tied to the same individual). Note: if you realize at the last minute that you’ve made a serious mistake here, then rather than not submitting for final scoring, you can always fall back to pruning the input data down to a single year (each individual appears at most once per year in this survey dataset, so this behaves like one-record-per-individual), and then generating multiple years of data from the model you trained on that single year. That’s obviously not ideal as a real-world solution, but it will satisfy differential privacy.
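To make the per-individual accounting concrete, here is a minimal Python sketch (an illustration only, not a reference solution). It assumes a simple count histogram released with Laplace noise, and it assumes neighboring worlds are defined by adding or removing one individual; if your definition of neighbors is different, the sensitivity will be different too. The function names and the `individual_id` / `year` record fields are hypothetical and do not come from the challenge data.

```python
import numpy as np

# From the challenge rules: one individual can contribute up to seven records.
MAX_RECORDS_PER_INDIVIDUAL = 7


def laplace_scale(epsilon, max_records=MAX_RECORDS_PER_INDIVIDUAL):
    """Noise scale for a count histogram under per-individual privacy.

    Assumption: adding or removing one individual changes at most
    `max_records` records, and each record changes one bin count by 1,
    so the L1 sensitivity of the histogram is `max_records`.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return max_records / epsilon


def noisy_histogram(counts, epsilon, rng, max_records=MAX_RECORDS_PER_INDIVIDUAL):
    """Laplace mechanism with noise calibrated to per-individual sensitivity."""
    counts = np.asarray(counts, dtype=float)
    scale = laplace_scale(epsilon, max_records)
    return counts + rng.laplace(loc=0.0, scale=scale, size=counts.shape)


def prune_to_single_year(records, year):
    """Fallback: keep only the records from one year.

    `year` should be chosen in advance, without looking at the data.
    Since each individual appears at most once per year, the pruned data
    has at most one record per individual (so max_records becomes 1).
    """
    return [r for r in records if r["year"] == year]


# Hypothetical toy data, only to show how the pieces fit together.
rng = np.random.default_rng(0)
records = [
    {"individual_id": "a", "year": 2012},
    {"individual_id": "a", "year": 2013},
    {"individual_id": "b", "year": 2012},
]
one_year = prune_to_single_year(records, 2012)
released = noisy_histogram([len(one_year)], epsilon=1.0, rng=rng, max_records=1)
```

Under these assumptions the noise scale on the full data is 7/epsilon, while on the single-year fallback it drops to 1/epsilon, which is the trade you are making when you prune.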
(3) Include a "codeguide"
Either in your write-up or in a separate readme file, include a guide that maps your algorithm description to your source code. This should say where in the code we can find each major step of the algorithm, and also indicate which parts of the code handle (i) preprocessing (handles each individual separately and has sensitivity cost 0), (ii) privatization (processes the input data, accounts for sensitivity cost and adds noise accordingly), and (iii) post-processing (only touches the privatized data).
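As a rough picture of the three-stage split a codeguide could point to, here is a small Python sketch. It is an assumption-laden toy (a single clamped histogram with Laplace noise), and every function name and parameter in it is hypothetical; your own code will look different, but the same three labels should be easy to attach to it.

```python
import numpy as np


def preprocess(values, num_bins):
    """(i) Preprocessing: each record is mapped on its own, using only the
    public bin range, so this stage has sensitivity cost 0."""
    return [min(max(int(v), 0), num_bins - 1) for v in values]


def privatize(bin_indices, num_bins, epsilon, max_records_per_individual, rng):
    """(ii) Privatization: the only stage that aggregates the input data.

    Assumption: one individual contributes at most
    `max_records_per_individual` records, so the L1 sensitivity of the
    histogram (add/remove one individual) is that same number.
    """
    indices = np.asarray(bin_indices, dtype=np.int64)
    counts = np.bincount(indices, minlength=num_bins).astype(float)
    scale = max_records_per_individual / epsilon
    return counts + rng.laplace(loc=0.0, scale=scale, size=num_bins)


def postprocess(noisy_counts):
    """(iii) Post-processing: only touches the privatized counts."""
    clipped = np.clip(noisy_counts, 0, None)
    return np.rint(clipped).astype(int)


# Hypothetical toy run.
rng = np.random.default_rng(0)
bins = preprocess([-3, 2, 99, 2], num_bins=4)
noisy = privatize(bins, num_bins=4, epsilon=1.0,
                  max_records_per_individual=7, rng=rng)
synthetic_counts = postprocess(noisy)
```

A codeguide entry for something like this could be as short as "preprocessing: `preprocess`; privatization: `privatize` (sensitivity 7, Laplace noise); post-processing: `postprocess`", with file names and line ranges for the real code.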
(4) Cite references
If your algorithm is based on previously published research (either wholly, or you’re using particular results), please include pointers to those papers (links to publicly accessible PDFs are ideal). The DP validation process is thorough, and if we’re not already familiar with the results you’re using, we’ll be reading up on them ourselves. It’ll help a lot if you can include the citation up front!
(5) (remember to comment your source code)
This isn’t actually part of your write-up, but it definitely helps us with the review. Please be merciful to your reviewers and make sure your source code has comments and helpful function and variable names. Of course, you want to be cautious about any last-minute refactoring: it’s very easy to end up accidentally breaking everything if you try to clean up your code in a rush. But at least getting some good comments in there will help us out a lot, and that makes it more likely you won’t be getting “requests for clarification” from us in the next couple of weeks.
And that’s it. Happy privatizing! We’re looking forward to seeing what all you’ve got, and to handing out prizes next month.