Questions about data set

I have a few questions about the data set:

  1. Am I correct that the Yelp data set only includes data and reviews from 2004-2011?

  2. For Phase II, we will receive a new Yelp database containing just the reviews file with reviews between 27.04.2015 - 13.06.2015?

  3. For Phase II, we will not receive any new inspection data, so there will be a 3-month gap in the inspections data since Phase I training data ends on 25.03.2015 and Phase II inspections start on 17.06.2015?


I’m very interested in these questions as well. The answers would greatly help us build a model for Phase II.

Any news regarding silogram’s questions? In addition, will more recent data be provided for “tips”? As of now, the tips seem to cover only two years (2009 to 2011).

Thanks in advance.

@silogram @dkay @LilianaMedina - sorry for the wait on this, but happy to have answers.

We took your questions/feedback as a challenge and went back to our friends at Yelp for more data, and they were actually able to give us an even more robust data set.

By now, you and everyone else who signed up for this competition will have received an e-mail about the update (the content is mirrored in an announcement). In short, we’ve released the new data and extended the time available for the competition.

Thanks for your patience, and hope the extra time helps!
