business_id.txt is missing from the new data

I downloaded the new data from the DrivenData download page, but business_id.txt is missing from the new folder. It was included in the old data.
Can anybody look into it?
Thanks

Hi @scigeek,

The business_id.txt file is just a list of the unique IDs from the Yelp dataset. We’ve put a new copy that matches the latest data release up on the data download page in case it’s useful!

Peter

@scigeek: couldn’t resist the opportunity to plug jq :smiley_cat:

Here’s the one-liner:

jq .business_id yelp_academic_dataset_business.json

Which results in:

"Jp9svt7sRT4zwdbzQ8KQmw"
"CgdK8DiyX9Y4kTKEPi_qgA"
"SAnMTC1rm-PhP8DQC4zeyg"
"75dtVyDb8Sfwb7dR0cBvdg"
"0uvgsJnwyCvNpjHOEYtlyQ"
"fYWIxI6kwuVqpPu1I1baWA"
"jAjZ6CXLrXW30zXM3ZEjhg"
"eBhOMstBTgGvJak8amU91g"
"283jjsb0TMPtkFB8AnUe_g"
"KktmGOBowhaKauCA2vPGFg"
...

If you don’t want the quotes you can pass jq the option -r for raw output.
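
If you'd rather not install jq, here is a rough Python sketch that does much the same thing as the -r version. It assumes the business file has one JSON object per line, and the function names (iter_business_ids, write_business_ids) are just ones I made up for illustration. Unlike jq, it raises a KeyError on a record without a business_id instead of printing null.

```python
import json


def iter_business_ids(path):
    """Yield the business_id of each record in a JSON-lines file.

    Assumes one JSON object per line; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            yield json.loads(line)["business_id"]


def write_business_ids(json_path, out_path):
    """Write one unquoted ID per line to out_path and return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for business_id in iter_business_ids(json_path):
            out.write(business_id + "\n")
            count += 1
    return count
```

Calling write_business_ids("yelp_academic_dataset_business.json", "business_id.txt") should then give you a file with one ID per line.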

Note: there may be a couple of IDs here that aren’t in restaurant_ids_to_yelp_ids.csv - that’s the authoritative mapping so I’d use that for any matching.
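
To see which IDs those are, something like the sketch below might help. I'm not assuming any particular column layout for restaurant_ids_to_yelp_ids.csv beyond a header row: it just collects every non-empty cell after the header into a set and reports the extracted IDs that never appear. Both function names are hypothetical.

```python
import csv


def load_mapping_ids(csv_path):
    """Collect every non-empty cell value from the mapping CSV.

    Assumes the first row is a header; column names are not relied on.
    """
    ids = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the (assumed) header row
        for row in reader:
            ids.update(cell.strip() for cell in row if cell.strip())
    return ids


def ids_missing_from_mapping(business_ids, mapping_ids):
    """Return the sorted business IDs that do not appear in the mapping."""
    return sorted(set(business_ids) - mapping_ids)
```

Feed it the IDs extracted from the business file plus load_mapping_ids("restaurant_ids_to_yelp_ids.csv"), and whatever comes back is the set of IDs to treat with caution when matching.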
