I acknowledge that the Yelp data is valuable. However, I’m curious whether anyone has had success ignoring the Yelp data and using only training_labels.txt. Has anyone beaten the linear regression benchmark score of 1.1386 with this approach?
If I remember correctly, my first submission for the phase 1 test set was exactly that, and I just barely beat the benchmark. However, that only works for the phase 1 test set and would not make any sense otherwise.