I’m wondering if we could submit multiple times to test the algorithm for the final scoring. I know it may leak information about the sensitive data if multiple submissions are allowed during final scoring, but I am still wondering if there are data sources other than the 2019 data, or any other approaches, to evaluate the algorithm.
So you’re welcome to resubmit to the prescreened arena as much as you’d like; in that way it’s just like the open arena. In the prescreened arena your code is still running on the same publicly available 2019 data; the only difference is that we’re running your code for you, so you have a chance to catch any bugs, hard-coded file paths, etc. before final submission in early November.
We won’t let your algorithm touch the withheld private data until you’re invited to submit your final version (along with your final write-up/proof and source code) on November 15th for final scoring, at which point there will be no leaderboard updates until final scoring is complete. You won’t be able to see the results of your code on the withheld data until everyone has passed (or failed) final DP validation, scoring is complete for all submissions on all epsilons, and we announce the prizes.
You only have the 2019 data to work with while you’re developing your algorithm, which I understand is frustrating… but it’s also realistic for many real-world use cases (where it’s equally frustrating, but we have to deal with it). When you only have one data set but you wish you had more (in order to estimate variance between data sets), one trick you can use is called bootstrapping: https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
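To illustrate the bootstrapping idea mentioned above, here’s a minimal sketch in Python. It uses synthetic toy data as a stand-in for the single real data set (the `data` array and the choice of statistic are just illustrative assumptions, not part of the challenge setup): by resampling the one data set with replacement many times, you can estimate how much a statistic would vary across hypothetical alternative data sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the single data set you actually have
# (hypothetical values, just for illustration).
data = rng.normal(loc=50.0, scale=10.0, size=1000)

def bootstrap_variance(data, statistic, n_resamples=500, rng=None):
    """Estimate the sampling variance of `statistic` by repeatedly
    resampling the one available data set with replacement."""
    rng = rng if rng is not None else np.random.default_rng()
    estimates = [
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_resamples)
    ]
    return float(np.var(estimates, ddof=1))

# Variance of the sample mean, estimated from bootstrap resamples.
var_of_mean = bootstrap_variance(data, np.mean, rng=rng)
```

For a statistic like the mean, the bootstrap estimate should land close to the textbook value (sample variance divided by n), which is a quick sanity check that the resampling loop is doing what you expect.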
Thanks, Christine! Thanks for clarifying the rule. The bootstrapping method also looks useful!