Python: Graph plotting with Matplotlib (Line Graph)

This article deals with plotting line graphs with Matplotlib (a Python’s library).

In the article Machine Learning & Sentiment Analysis: Text Classification using Python & NLTK, I had described about evaluating three different classifiers’ accuracy using different feature sets. In this article, I will be using the accuracy result data obtained from that evaluation.

Below is the result data which we will use to plot line graph. The data is saved in a CSV file named result4-accuracy-blog.csv.

classifierf1-all-wordsf1-all-words-stopwordsf1-bigramsf1-bigrams-stopwords
nb-cv0.74330890870.82347607750.83936915610.8701525025
maxent-cv0.72531488380.80717900280.81746868970.8649529103
svm-cv0.87368060020.88224188340.87975462370.8942136239

As seen above, the CSV file contains accuracy result in terms of F1 measure while using cross-validation during evaluation.

Row 1 = Heading information
Row 2 = Accuracy result for Naive Bayes Classifier
Row 3 = Accuracy result for Maximum Entropy Classifier
Row 4 = Accuracy result for Support Vector Machine Classifier
Col 1 = Accuracy result using “all words” feature set
Col 2 = Accuracy result using “all words with stopword filter” feature set
Col 3 = Accuracy result using “bigram words” feature set
Col 4 = Accuracy result using “bigram words with stopword filter” feature set

Here is the full Python & Matplotlib code to plot the line graph:


import matplotlib.pyplot as plt
import csv

x = []

y_f1 = []
y_f1_nb = []
y_f1_maxent = []
y_f1_svm = []

var = [ ('all words', 'f1-all-words'), 
        ('all words+stopwords', 'f1-all-words-stopwords'), 
        ('bigrams', 'f1-bigrams'),
        ('bigrams+stopwords', 'f1-bigrams-stopwords'),      
      ]

my_x = []

for i,(key, val) in enumerate(var): 
    x.append(i+1)
    my_x.append(key)    

i = 0
with open('result4-accuracy-blog.csv') as myfile:   
    reader = csv.DictReader(myfile, delimiter=',')  
    for line in reader:     
        if i == 0:
            for (key,val) in var:
                y_f1_nb.append(float(line[val]))
        elif i == 1:
            for (key,val) in var:
                y_f1_maxent.append(float(line[val]))
        elif i == 2:
            for (key,val) in var:
                y_f1_svm.append(float(line[val]))       
        i += 1
        
fig = plt.figure()
ax = plt.subplot(111) # row x col x position (here 1 x 1 x 1)

plt.xticks(x, my_x, rotation=75) # rotate x-axis labels to 75 degree
ax.plot(x, y_f1_nb, label='f1 - naive bayes', marker='o', linestyle='-', linewidth=1.5)
ax.plot(x, y_f1_maxent, label='f1 - maximum entropy', marker='o', linestyle='--', linewidth=1.5)
ax.plot(x, y_f1_svm, label='f1 - svm', marker='o', linestyle='-.', linewidth=1.5)

plt.xlim(0, len(var)+1)
plt.tight_layout() # showing xticks (as xticks have long names)
ax.grid()

plt.title('F1 measure plot', color='#000000', weight="bold", size="large")
plt.ylabel('F1 MEASURE')
plt.xlabel('FEATURES')

ax.legend(loc='lower right', fancybox=True, shadow=True)

plt.show()

Here is the image of graph drawn:

line graph python matplotlib

Drawing two plots in a single figure

Below is the data from an experiment done for one of my projects on Recommender System. I have used Mean Absolute Error (MAE) as the evaluation metrics. 5-fold cross validation was done using different ratings (user ratings, review ratings, combined ratings). The 5 rows represent MAE value from the 5-fold cross validation result. The data is saved in a CSV file named result3-blog2.csv.

user-rating-maereview-rating-maecombined-rating-maereview-rating-mae-mlcombined-rating-mae-ml
0.76005473530.79023724190.64653611660.78369693620.7170221067
0.82378628340.78567033760.6723066410.76442750490.7477524197
0.79758648510.89033823380.72906562840.82047932970.692127153
0.95264987660.79191942160.83697514160.79316554630.8319407532
0.69294690330.862918940.61987575850.64312044880.6824413178

Here is the full Python & Matplotlib code to plot the line graph (2 line graphs in a single figure):


import matplotlib.pyplot as plt
import csv

x = []
y_stars_mae = []
y_text_round_mae = []
y_mean_round_mae = []
y_text_round_mae_ml_bigram = []
y_mean_round_mae_ml_bigram = []

i = 0
with open('result3-blog2.csv') as myfile:   
    reader = csv.DictReader(myfile, delimiter=',')  
    for line in reader: 
        i += 1
        x.append(i)
        
        y_stars_mae.append(float(line['user-rating-mae']))
        y_text_round_mae.append(float(line['review-rating-mae']))
        y_mean_round_mae.append(float(line['combined-rating-mae']))
        y_text_round_mae_ml_bigram.append(float(line['review-rating-mae-ml']))
        y_mean_round_mae_ml_bigram.append(float(line['combined-rating-mae-ml']))

fig = plt.figure()
ax = plt.subplot(121) # row x col x position
bx = plt.subplot(122)

#linestyle = '-', '-.', '--'
#color='green', 'blue', 'red'
ax.plot(x, y_stars_mae, label='user rating', linestyle='-', marker='o', linewidth=1.5)
ax.plot(x, y_text_round_mae, label='review rating', linestyle='-.', marker='o', linewidth=1.5)
ax.plot(x, y_mean_round_mae, label='combined rating', linestyle='--', marker='o', linewidth=1.5)
ax.grid()
ax.legend(loc='upper left', fancybox=True, shadow=True)
ax.set_xlim(0, 6)
ax.set_title('MAE plot 1', color='#000000', weight="bold", size="large")
ax.set_ylabel('MAE')
ax.set_xlabel('Data Sample')

bx.plot(x, y_stars_mae, label='user rating', linestyle='-', marker='o', linewidth=1.5)
bx.plot(x, y_text_round_mae_ml_bigram, label='review rating ml', linestyle='-.', marker='o', linewidth=1.5)
bx.plot(x, y_mean_round_mae_ml_bigram, label='combined rating ml', linestyle='--', marker='o', linewidth=1.5)
bx.grid()
bx.legend(loc='upper left', fancybox=True, shadow=True)
bx.set_xlim(0, 6)
bx.set_title('MAE plot 2', color='#000000', weight="bold", size="large")
bx.set_ylabel('MAE')
bx.set_xlabel('Data Sample')

plt.show()

Here is the image of graph drawn:

line graph python matplotlib

Hope this helps.
Thanks.