Python: Read Write JSON

In this article, I will show how to read data from a JSON file and write data to a new JSON file using the Python programming language.

Suppose I have the following JSON data saved in a file named sample-1.json. Each line of the file is a separate JSON object.

Download: sample-1.json


{"votes": {"funny": 25, "useful": 59, "cool": 29}, "user_id": "1", "name": "User A", "average_stars": 3.45614035087719, "review_count": 57, "type": "user"}
{"votes": {"funny": 3, "useful": 19, "cool": 10}, "user_id": "2", "name": "User B", "average_stars": 4.3653846153846096, "review_count": 52, "type": "user"}
{"votes": {"funny": 0, "useful": 7, "cool": 1}, "user_id": "3", "name": "User C", "average_stars": 3.7272727272727302, "review_count": 12, "type": "user"}
{"votes": {"funny": 29, "useful": 39, "cool": 22}, "user_id": "4", "name": "User D", "average_stars": 4.1153846153846096, "review_count": 26, "type": "user"}
{"votes": {"funny": 4, "useful": 6, "cool": 4}, "user_id": "5", "name": "User E", "average_stars": 3.4285714285714302, "review_count": 7, "type": "user"}
{"votes": {"funny": 0, "useful": 0, "cool": 1}, "user_id": "1", "review_id": "1", "stars": 5, "date": "2009-06-09", "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus rutrum auctor ante, eget condimentum purus pulvinar id.", "type": "review", "business_id": "6"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "2", "review_id": "2", "stars": 5, "date": "2011-02-17", "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus rutrum auctor ante, eget condimentum purus pulvinar id.", "type": "review", "business_id": "7"}
{"votes": {"funny": 1, "useful": 2, "cool": 1}, "user_id": "3", "review_id": "3", "stars": 4, "date": "2009-09-15", "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus rutrum auctor ante, eget condimentum purus pulvinar id.", "type": "review", "business_id": "8"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "4", "review_id": "4", "stars": 4, "date": "2010-09-19", "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus rutrum auctor ante, eget condimentum purus pulvinar id.", "type": "review", "business_id": "9"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "5", "review_id": "5", "stars": 4, "date": "2012-07-13", "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus rutrum auctor ante, eget condimentum purus pulvinar id.", "type": "review", "business_id": "10"}
{"business_id": "6", "categories": ["Massage", "Beauty and Spas"], "name": "Business A", "review_count": 4, "stars": 4.0, "type": "business"}
{"business_id": "7", "categories": ["Tattoo", "Beauty and Spas"], "name": "Business B", "review_count": 2, "stars": 4.0, "type": "business"}
{"business_id": "8", "categories": ["Music & DVDs", "Books, Mags, Music and Video", "Shopping"], "name": "Business C", "review_count": 3, "stars": 3.5, "type": "business"}
{"business_id": "9", "categories": ["Food", "Coffee & Tea"], "name": "Business D", "review_count": 85, "stars": 3.5, "type": "business"}
{"business_id": "10", "categories": ["Property Management", "Home Services", "Real Estate"], "name": "Business E", "review_count": 8, "stars": 3.5, "type": "business"}

As you can see from the above JSON data, each entry has a key 'type' whose value is user, review, or business. The type key tells us whether the entry describes a user, a business, or a review.
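
A quick way to check this is to count the entries per type. The sketch below is only an illustration and is not part of the script shown later; count_types and sample_lines are names made up for this example, and the two sample lines are shortened versions of the entries above.

```python
import json
from collections import Counter


def count_types(lines):
    """Count how many entries of each 'type' appear in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, which json.loads would reject
            continue
        entry = json.loads(line)
        counts[entry['type']] += 1
    return counts


# Two shortened entries in the same one-object-per-line layout as sample-1.json
sample_lines = [
    '{"user_id": "1", "name": "User A", "type": "user"}',
    '{"business_id": "6", "name": "Business A", "type": "business"}',
]
print(count_types(sample_lines))
```

An open file can be passed in place of the list, for example count_types(open('sample-1.json')). For the data shown above, that should report five entries of each type.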

From this JSON file, we will select entries based on "type", i.e. we will select the entries for users, the entries for businesses, and the entries for reviews. Then we will save each group in a separate JSON file.

We will create a new JSON file (sample-user.json) for users and save all user data in it. Similarly, we will create a new JSON file (sample-review.json) for reviews and save the review data in it.

However, the logic for businesses is slightly different. We will filter the businesses by category: only those businesses that fall under the Food or Beauty and Spas category are selected and saved to a new JSON file (sample-business.json).
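
A review entry carries no categories of its own; it only points to a business through business_id. So if reviews are to be limited to certain categories, one possible approach is to first collect the ids of the matching businesses and then keep the reviews that reference them. The following is a minimal sketch of that idea, not the code used in this article. WANTED, reviews_in_categories and sample_lines are hypothetical names, and the sample lines are shortened entries.

```python
import json

WANTED = {'Food', 'Beauty and Spas'}


def reviews_in_categories(lines, wanted=WANTED):
    """Return the review entries whose business has at least one wanted category."""
    entries = [json.loads(line) for line in lines if line.strip()]

    # First pass: ids of the businesses that match any wanted category
    wanted_ids = {
        e['business_id']
        for e in entries
        if e['type'] == 'business' and wanted.intersection(e['categories'])
    }

    # Second pass: keep the reviews that reference one of those businesses
    return [
        e for e in entries
        if e['type'] == 'review' and e['business_id'] in wanted_ids
    ]


sample_lines = [
    '{"user_id": "3", "review_id": "3", "stars": 4, "type": "review", "business_id": "8"}',
    '{"user_id": "4", "review_id": "4", "stars": 4, "type": "review", "business_id": "9"}',
    '{"business_id": "8", "categories": ["Shopping"], "name": "Business C", "type": "business"}',
    '{"business_id": "9", "categories": ["Food", "Coffee & Tea"], "name": "Business D", "type": "business"}',
]
for review in reviews_in_categories(sample_lines):
    print(review['review_id'], review['business_id'])  # prints: 4 9
```

All entries are loaded into memory first because, in sample-1.json, the reviews appear before the businesses they refer to.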

Here’s the full source code:


import json
from pprint import pprint

user_dict = {}
review_dict = {}
business_dict = {}

# Each line of sample-1.json is a separate JSON object, so parse it line by line
with open('sample-1.json', 'r') as f:
    for line in f:
        js = json.loads(line)

        if js['type'] == 'review':
            # setdefault keeps earlier reviews of the same business instead of overwriting them
            reviews = review_dict.setdefault(str(js['business_id']), {})
            reviews[str(js['user_id'])] = {
                'user_id': str(js['user_id']),
                'business_id': str(js['business_id']),
                'stars': str(js['stars']),
                # drop any non-ASCII characters from the review text
                'text': js['text'].encode('ascii', 'ignore').decode('ascii'),
            }

        elif js['type'] == 'user':
            user_dict[str(js['user_id'])] = {
                'user_id': str(js['user_id']),
                'name': js['name'],
                'review_count': str(js['review_count']),
            }

        elif js['type'] == 'business':
            # selecting only the 'Food' and 'Beauty and Spas' categories
            if 'Food' in js['categories'] or 'Beauty and Spas' in js['categories']:
                business_dict[str(js['business_id'])] = {
                    'business_id': str(js['business_id']),
                    'name': js['name'],
                    'review_count': str(js['review_count']),
                }

# Uncomment these lines if you want to print the dictionaries
# pprint(user_dict)
# pprint(business_dict)
# pprint(review_dict)

# Create a new JSON file named 'sample-user.json'
# and save the user information from user_dict
with open('sample-user.json', 'w') as myfile:
    for val in user_dict.values():  # .values() works on Python 2 and 3, unlike .iteritems()
        # json.dump(val, myfile)  # compact alternative
        json.dump(val, myfile, sort_keys=True, indent=4)  # more readable format
        myfile.write('\n')


# Create a new JSON file named 'sample-business.json'
# and save the business information from business_dict
with open('sample-business.json', 'w') as myfile:
    for val in business_dict.values():  # .values() works on Python 2 and 3, unlike .iteritems()
        # json.dump(val, myfile)  # compact alternative
        json.dump(val, myfile, sort_keys=True, indent=4)  # more readable format
        myfile.write('\n')


# Create a new JSON file named 'sample-review.json'
# and save the review information from review_dict
with open('sample-review.json', 'w') as myfile:
    # review_dict maps business_id -> {user_id -> review}, hence the nested loop
    for val in review_dict.values():  # .values() works on Python 2 and 3, unlike .iteritems()
        for v in val.values():
            # json.dump(v, myfile)  # compact alternative
            json.dump(v, myfile, sort_keys=True, indent=4)  # more readable format
            myfile.write('\n')
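
The script writes each entry with indent=4, which is pleasant to read. If the output should instead keep the same one-object-per-line layout as sample-1.json, so that it can be read back with the same kind of loop, a possible variation is sketched below. This is an assumption about what you might want, not something the script above does. write_json_lines, read_json_lines, the users list and the .jsonl file name are all made up for this example, and a temporary directory is used so that no real file is overwritten.

```python
import json
import os
import tempfile


def write_json_lines(path, records):
    """Write each record as one compact JSON object per line."""
    with open(path, 'w') as out:
        for record in records:
            out.write(json.dumps(record, sort_keys=True) + '\n')


def read_json_lines(path):
    """Read a file written by write_json_lines back into a list of dicts."""
    with open(path, 'r') as src:
        return [json.loads(line) for line in src if line.strip()]


# Hypothetical records shaped like the values stored in user_dict
users = [
    {'user_id': '1', 'name': 'User A', 'review_count': '57'},
    {'user_id': '2', 'name': 'User B', 'review_count': '52'},
]

path = os.path.join(tempfile.mkdtemp(), 'sample-user.jsonl')
write_json_lines(path, users)
print(read_json_lines(path) == users)  # True: the data survives the round trip
```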

Here are the user, business, and review JSON files created by the above code:

sample-user.json | sample-business.json | sample-review.json
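
Each of these files holds several indented JSON objects one after another rather than a single JSON document, so calling json.load on the whole file raises an "Extra data" error. One way to read such a file back is to decode the objects one at a time with json.JSONDecoder.raw_decode, as in the sketch below. iter_json_objects and the text variable are hypothetical names used only for this example.

```python
import json


def iter_json_objects(text):
    """Yield every JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it first
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two objects laid out the way json.dump(..., indent=4) plus '\n' writes them
text = (
    '{\n    "name": "User A",\n    "user_id": "1"\n}\n'
    '{\n    "name": "User B",\n    "user_id": "2"\n}\n'
)
for obj in iter_json_objects(text):
    print(obj['user_id'], obj['name'])
```

For a real file, the text can come from open('sample-user.json').read().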

Hope this helps.
Thanks.