Will the final dataset domain be the same as the provisional dataset?

I understand the PUMA codes will be different in the final dataset, but can we assume every other column has the same set of possible values in the final dataset as it does in the provisional dataset (according to the parameters.json file)?

That’s basically correct. PUMA will change, and it’s not inconceivable that there might be a small adjustment to YEAR, but all other columns will have the same set of possible values listed in the parameters.json file.
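For anyone who wants to act on this, here is a minimal sketch of reading column domains from the parameters file instead of inferring them from the provisional data, while treating PUMA and YEAR as subject to change. The file layout used here (a top-level "schema" key, with "values" or "min"/"max" per column), the sample values, and the helper names are all assumptions made for illustration; check them against the actual parameters.json before relying on this.

```python
import json

# Hypothetical layout and made-up values, for illustration only.
# The real parameters.json may be structured differently.
EXAMPLE_PARAMETERS = json.loads("""
{
  "schema": {
    "PUMA": {"values": ["17-3401", "17-3402"]},
    "YEAR": {"values": [2012, 2013]},
    "SEX":  {"values": [1, 2]},
    "AGE":  {"min": 0, "max": 120}
  }
}
""")

# Columns whose domain may differ between the provisional and final datasets.
UNSTABLE_COLUMNS = {"PUMA", "YEAR"}


def stable_domains(parameters, unstable=UNSTABLE_COLUMNS):
    """Return {column: spec} for columns whose domain is expected to stay fixed."""
    return {
        col: spec
        for col, spec in parameters["schema"].items()
        if col not in unstable
    }


def in_domain(value, spec):
    """Check a value against either an explicit value list or a min/max range."""
    if "values" in spec:
        return value in spec["values"]
    return spec["min"] <= value <= spec["max"]


def out_of_domain(record, parameters):
    """List the stable columns of a record whose values fall outside the domain."""
    domains = stable_domains(parameters)
    return [
        col
        for col, value in record.items()
        if col in domains and not in_domain(value, domains[col])
    ]
```

With the sample above, `out_of_domain({"PUMA": "99-9999", "YEAR": 2020, "SEX": 3, "AGE": 40}, EXAMPLE_PARAMETERS)` flags only SEX, because PUMA and YEAR are deliberately skipped rather than hard-coded.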

Are we allowed to use ACS data from before 2012 as public data?

Hi @tliu64 - Thanks for checking on this. This falls under the second bullet below, copied from the competition rules, so it would not be allowed.

External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:

  • the external data and pre-trained models are freely and publicly available to all participants under a permissive open source license;
  • the data does not have the same schema or source as the data set(s) provided in the contest; and
  • their source and usage are defined in the algorithm description, and they have been approved during the differential privacy Pre-screening Process.