Sprint 3: Question on column taxi_id

Hi,
Is the taxi_id column private?
Also, will the domain for taxi_id in the dataset provided during the development phase be the same as the domain in the dataset used for final scoring? This domain is not specified in data/parameters.json.

Ok, so, in order:

  • The taxi_id column is private.
  • The set of actual IDs in the data will not be the same between the public development data and final scoring data (and it’s not expected that the number of taxis will be the same either).
  • Its domain (which is the same for both the public and final scoring data) is the set of all 7-digit ID numbers (1000000 to 9999999).

You can intuitively think of taxi_id as equivalent to an individual’s name, and the variable’s domain as the set of all possible names of length 7.
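
In case it helps, here is a minimal sketch of how that domain could be written down in code. Only the bounds 1000000 and 9999999 come from the answer above; the names TAXI_ID_MIN, TAXI_ID_MAX, TAXI_ID_DOMAIN and in_taxi_id_domain are made up for illustration and are not part of the challenge code or data/parameters.json.

```python
# Hypothetical names; only the bounds 1000000..9999999 come from the thread.
TAXI_ID_MIN = 1_000_000
TAXI_ID_MAX = 9_999_999

# The full domain: every 7-digit ID, whether or not it appears in any dataset.
# A range object is lazy, so this does not allocate 9 million integers.
TAXI_ID_DOMAIN = range(TAXI_ID_MIN, TAXI_ID_MAX + 1)


def in_taxi_id_domain(taxi_id: int) -> bool:
    """Return True if taxi_id is a valid 7-digit ID."""
    return taxi_id in TAXI_ID_DOMAIN


# Number of possible IDs in the domain.
domain_size = len(TAXI_ID_DOMAIN)
```

The bounds are hard-coded rather than taken from the IDs seen in the development data, since the IDs that actually occur there will differ from those in the final scoring data.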

This domain is not specified in data/parameters.json.

Thanks, we updated this to be clear about the domain.