In this article, I will show how to read data from a CSV file and write data to a new CSV file using the Python programming language.
Here’s a scenario: suppose I have a CSV file with three columns (user_id, item_id, and star_rating). It contains data for 100 users, and each user has rated 20 different items, so altogether there are 100 * 20 = 2000 entries in the CSV file.
Download: Input CSV File (CSV file which we will be working on)
Here’s what the CSV file looks like:
user_id | item_id | star_rating |
---|---|---|
1 | 31 | 3 |
1 | 28 | 3 |
1 | 20 | 4 |
1 | 34 | 1 |
Now, I have a requirement to select a single entry for each user, i.e. I have to select a total of 100 entries, one for each of the 100 users.
For this, I will first read the CSV file and create a Python dictionary holding a single entry for each user. Then, I will use that dictionary to write the values to a new CSV file.
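To make the reading step concrete, here is a minimal sketch that parses the four sample rows shown above from an in-memory string instead of a file. It assumes the file is comma-separated with a header row; the names `sample`, `rows`, and `first_per_user` are only for illustration. Note that `csv.DictReader` uses the first line as the field names, so the loop only ever sees data rows.

```python
import csv
import io

# Illustrative sample: the four rows from the table above,
# assuming the real file is comma-separated
sample = """user_id,item_id,star_rating
1,31,3
1,28,3
1,20,4
1,34,1
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)  # the header line becomes the field names, not a row

first_per_user = {}
for line in rows:
    # setdefault stores a row only the first time a user_id is seen
    first_per_user.setdefault(int(line['user_id']), line)

print(len(rows))          # 4 data rows
print(first_per_user[1])  # the first row read for user 1
```

All values come back as strings, which is why the user_id is converted with `int()` before it is used as a dictionary key.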
The script reads data from a CSV file named dataset-recsys.csv and writes to a new CSV file named dataset-recsys-new.csv.
Here’s the full source code:
import csv
from pprint import pprint

dataset = {}  # maps user_id -> {item_id: row}

# Read the input CSV. DictReader consumes the header row itself,
# so every line yielded by the loop is a data row (no manual skip needed).
with open('dataset-recsys.csv', newline='') as myfile:
    reader = csv.DictReader(myfile, delimiter=',')
    for line in reader:
        user_id = int(line['user_id'])
        if user_id in dataset:  # keep only the first row for each user_id
            continue
        row = {'user_id': line['user_id'], 'item_id': line['item_id'], 'star_rating': line['star_rating']}
        dataset[user_id] = {int(line['item_id']): row}

print('Reading Successful')
# pprint(dataset)  # uncomment to print the dictionary

fieldnames = ['user_id', 'item_id', 'star_rating']

# Write the selected rows to a new CSV file.
# Mode 'w' overwrites earlier output, so re-running does not duplicate the header.
with open('dataset-recsys-new.csv', 'w', newline='') as myfile:
    writer = csv.DictWriter(myfile, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()
    for key, val in dataset.items():
        for k, v in val.items():
            writer.writerow(v)

print('Writing Successful')
Download: Output CSV File (CSV file that is the output from above code)
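The same idea can also be packaged as a small reusable function that works on any pair of open file objects. The following is only a sketch under a few assumptions: the function name `first_row_per_user`, its `key` parameter, and the sample data are made up for illustration, and it keeps a set of seen user IDs instead of a nested dictionary.

```python
import csv
import io


def first_row_per_user(infile, outfile, key='user_id'):
    """Copy the first row seen for each value of `key` from infile to outfile."""
    reader = csv.DictReader(infile)
    # Reuse the input header so the output has the same columns
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    seen = set()
    for row in reader:
        if row[key] in seen:  # this user already has an entry
            continue
        seen.add(row[key])
        writer.writerow(row)
    return len(seen)  # number of distinct users written


# Made-up sample data, only for demonstration
source = io.StringIO(
    "user_id,item_id,star_rating\n"
    "1,31,3\n"
    "1,28,3\n"
    "2,20,4\n"
    "2,34,1\n"
)
target = io.StringIO()
count = first_row_per_user(source, target)
print(count)
print(target.getvalue())
```

With real files, the same function can be called with two handles opened via `open(..., newline='')`, the input in the default read mode and the output in `'w'` mode.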
Hope this helps.
Thanks.