I downloaded the new data from the DrivenData download page, but the business_id.txt file is missing from the new folder. It was there in the old data.
Can anybody take a look at it?
Thanks
Hi @scigeek,
The business_id.txt file is just a list of the unique IDs from the Yelp dataset. We’ve put a new copy that matches the latest data release up on the data download page in case it’s useful!
Peter
@scigeek: couldn’t resist the opportunity to plug jq
Here’s the one-liner:
jq .business_id yelp_academic_dataset_business.json
Which results in:
"Jp9svt7sRT4zwdbzQ8KQmw"
"CgdK8DiyX9Y4kTKEPi_qgA"
"SAnMTC1rm-PhP8DQC4zeyg"
"75dtVyDb8Sfwb7dR0cBvdg"
"0uvgsJnwyCvNpjHOEYtlyQ"
"fYWIxI6kwuVqpPu1I1baWA"
"jAjZ6CXLrXW30zXM3ZEjhg"
"eBhOMstBTgGvJak8amU91g"
"283jjsb0TMPtkFB8AnUe_g"
"KktmGOBowhaKauCA2vPGFg"
...
If you don’t want the quotes, you can pass jq the option -r for raw output.
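If jq isn't installed, roughly the same thing can be done in Python. This is only a sketch: it assumes yelp_academic_dataset_business.json holds one JSON object per line, each with a business_id key, and the function names and the output path are made up for illustration.

```python
import json


def extract_business_ids(lines):
    """Yield the business_id from each non-empty line of JSON.

    Assumes one JSON object per line, each with a "business_id" key.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield json.loads(line)["business_id"]


def write_business_ids(json_path, out_path):
    """Write one unquoted ID per line, similar to `jq -r .business_id`."""
    with open(json_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for business_id in extract_business_ids(src):
            dst.write(business_id + "\n")
```

Calling `write_business_ids("yelp_academic_dataset_business.json", "business_id.txt")` would then produce a file with one ID per line, if the assumptions above hold.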
Note: there may be a couple of IDs here that aren’t in restaurant_ids_to_yelp_ids.csv - that’s the authoritative mapping, so I’d use that for any matching.
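To see which IDs fall outside the mapping, something like the sketch below might help. The CSV layout is an assumption on my part (a header row, a restaurant ID in the first column, and one or more Yelp IDs in the remaining columns), so check the real file before relying on it. The function names are hypothetical.

```python
import csv


def yelp_ids_in_mapping(csv_file):
    """Collect every Yelp ID found in the mapping file.

    Assumed layout: header row, first column is the restaurant ID,
    every other non-empty cell is a Yelp ID.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the (assumed) header row
    ids = set()
    for row in reader:
        ids.update(cell.strip() for cell in row[1:] if cell.strip())
    return ids


def ids_missing_from_mapping(business_ids, csv_file):
    """Return the business IDs that never appear in the mapping, in input order."""
    known = yelp_ids_in_mapping(csv_file)
    return [business_id for business_id in business_ids if business_id not in known]
```

Pass it the list of business IDs and an open file handle for restaurant_ids_to_yelp_ids.csv (opened with `newline=""`, as the csv module recommends).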