Python: Twitter Sentiment Analysis using TextBlob

This article shows how to perform Sentiment Analysis on Twitter tweet data using Python and TextBlob.

TextBlob provides an API that can perform different Natural Language Processing (NLP) tasks such as Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (Naive Bayes, Decision Tree), Language Translation and Detection, Spelling Correction, and more.

TextBlob is built upon Natural Language Toolkit (NLTK).

Sentiment Analysis means analyzing the sentiment of a given text or document and categorizing it into a specific class or category (like positive or negative). In the simplest case, the classification is done for two classes: positive and negative. However, we can add more classes like neutral, highly positive, highly negative, etc.

Installing TextBlob

Run the following commands to install TextBlob and download the corpora it depends on:

pip install -U textblob
python -m textblob.download_corpora

Simple TextBlob Sentiment Analysis Example

We will start with a simple TextBlob example that performs Sentiment Analysis on a given text. The sentiment property returns two scores for the text: polarity and subjectivity.

The polarity score is a float within the range [-1.0, 1.0], where a negative value indicates negative text and a positive value indicates positive text.

The subjectivity score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.

from textblob import TextBlob

text = TextBlob("It was a wonderful movie. I liked it very much.")

print(text.sentiment)
print('polarity: {}'.format(text.sentiment.polarity))
print('subjectivity: {}'.format(text.sentiment.subjectivity))
'''
Output:

Sentiment(polarity=0.62, subjectivity=0.6866666666666666)
polarity: 0.62
subjectivity: 0.6866666666666666
'''

text = TextBlob("I liked the acting of the lead actor but I didn't like the movie overall.")
print(text.sentiment)
'''
Output:

Sentiment(polarity=0.19999999999999998, subjectivity=0.26666666666666666)
'''

text = TextBlob("I liked the acting of the lead actor and I liked the movie overall.")
print(text.sentiment)
'''
Output:

Sentiment(polarity=0.3, subjectivity=0.4)
'''
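Since the polarity score is a plain float, it is easy to map it to the discrete classes mentioned earlier (positive, negative, neutral). Below is a minimal sketch; the helper name `polarity_to_label` and the neutral band of 0.05 are my own illustrative choices, not part of TextBlob:

```python
def polarity_to_label(polarity, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a discrete label.

    Scores within +/- neutral_band of zero are treated as 'neutral';
    the band width is an arbitrary choice for this sketch.
    """
    if polarity > neutral_band:
        return 'pos'
    if polarity < -neutral_band:
        return 'neg'
    return 'neutral'

print(polarity_to_label(0.62))   # pos (the polarity of the first example above)
print(polarity_to_label(-0.35))  # neg
print(polarity_to_label(0.0))    # neutral
```

Widening or narrowing the neutral band controls how strong a sentiment must be before it counts as positive or negative.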

Using NLTK’s Twitter Corpus

  • We use the twitter_samples corpus to train TextBlob's NaiveBayesClassifier.
  • Using the twitter_samples corpus, we create a train set and a test set containing a certain number of positive and negative tweets.
  • Then, we test the accuracy of the trained classifier.

from nltk.corpus import twitter_samples

print(twitter_samples.fileids())
'''
Output:

['negative_tweets.json', 'positive_tweets.json', 'tweets.20150430-223406.json']
'''

pos_tweets = twitter_samples.strings('positive_tweets.json')
print(len(pos_tweets)) # Output: 5000

neg_tweets = twitter_samples.strings('negative_tweets.json')
print(len(neg_tweets)) # Output: 5000

#all_tweets = twitter_samples.strings('tweets.20150430-223406.json')
#print(len(all_tweets)) # Output: 20000

# list of (tweet, label) tuples for positive tweets
pos_tweets_set = []
for tweet in pos_tweets:
    pos_tweets_set.append((tweet, 'pos'))

# list of (tweet, label) tuples for negative tweets
neg_tweets_set = []
for tweet in neg_tweets:
    neg_tweets_set.append((tweet, 'neg'))

print(len(pos_tweets_set), len(neg_tweets_set)) # Output: 5000 5000

# randomize pos_tweets_set and neg_tweets_set
# doing so will give a different accuracy result every time we run the program
from random import shuffle
shuffle(pos_tweets_set)
shuffle(neg_tweets_set)
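A side note on the randomization above: if you want the accuracy number to be reproducible across runs instead of different every time, seed the random number generator before shuffling. A minimal sketch (the seed value 42 is an arbitrary choice):

```python
import random

random.seed(42)  # any fixed seed makes subsequent shuffles deterministic

data = list(range(10))
random.shuffle(data)
print(data)  # the same ordering on every run with the same seed
```

Seeding is useful while tuning the train/test split, since otherwise accuracy changes on every run simply because different tweets land in each set.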

Create Train and Test Set

Just for this example, we create a small train and test set:

  • test set = 200 tweets (100 positive + 100 negative)
  • train set = 400 tweets (200 positive + 200 negative)

Note that a larger training set generally results in higher classification accuracy, at the cost of longer training time.

# test set = 200 tweets (100 positive + 100 negative)
# train set = 400 tweets (200 positive + 200 negative)
test_set = pos_tweets_set[:100] + neg_tweets_set[:100]
train_set = pos_tweets_set[100:300] + neg_tweets_set[100:300]

print(len(test_set), len(train_set)) # Output: 200 400

Training the Classifier & Calculating Accuracy

# train the classifier on the train set
from textblob.classifiers import NaiveBayesClassifier
classifier = NaiveBayesClassifier(train_set)

# calculate accuracy on the test set
accuracy = classifier.accuracy(test_set)
print(accuracy) # Output: 0.715

# show the most informative features
# (show_informative_features prints its output itself and returns None,
# so we call it directly instead of wrapping it in print)
classifier.show_informative_features(10)
'''
Output:

Most Informative Features
           contains(not) = True              neg : pos    =      6.6 : 1.0
          contains(love) = True              pos : neg    =      6.3 : 1.0
           contains(day) = True              pos : neg    =      5.7 : 1.0
            contains(no) = True              neg : pos    =      5.4 : 1.0
            contains(na) = True              neg : pos    =      5.0 : 1.0
        contains(Thanks) = True              pos : neg    =      3.7 : 1.0
           contains(why) = True              neg : pos    =      3.7 : 1.0
         contains(happy) = True              pos : neg    =      3.7 : 1.0
         contains(never) = True              neg : pos    =      3.7 : 1.0
        contains(though) = True              neg : pos    =      3.7 : 1.0
'''
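Overall accuracy hides where the classifier errs. One way to look closer is to count (gold, predicted) label pairs over the test set. Below is a minimal sketch in plain Python; the helper name `confusion_counts` and the tiny sample lists are illustrative only, and with the classifier trained above you would build the lists as shown in the comments:

```python
from collections import Counter

def confusion_counts(gold_labels, predicted_labels):
    """Count (gold, predicted) label pairs over two parallel label lists."""
    return Counter(zip(gold_labels, predicted_labels))

# With the trained classifier, the lists could be built as:
#   gold = [label for _, label in test_set]
#   predicted = [classifier.classify(text) for text, _ in test_set]
gold = ['pos', 'pos', 'neg', 'neg', 'pos']
predicted = ['pos', 'neg', 'neg', 'neg', 'pos']

counts = confusion_counts(gold, predicted)
print(counts[('pos', 'pos')])  # 2 correctly classified positives
print(counts[('pos', 'neg')])  # 1 positive misclassified as negative
```

Separating the ('pos', 'neg') and ('neg', 'pos') counts shows whether the classifier is biased toward one class rather than wrong uniformly.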

text = "It was a wonderful movie. I liked it very much."
print(classifier.classify(text)) # Output: pos

text = "I don't like movies having happy ending."
print(classifier.classify(text)) # Output: neg

text = "The script was predictable. However, it was a wonderful movie. I liked it very much."
blob = TextBlob(text, classifier=classifier)

print(blob) # Output: The script was predictable. However, it was a wonderful movie. I liked it very much.
print(blob.classify()) # Output: pos

for sentence in blob.sentences:
    print("{} ({})".format(sentence, sentence.classify()))
'''
Output:

The script was predictable. (neg)
However, it was a wonderful movie. (pos)
I liked it very much. (pos)
''' 

Hope this helps. Thanks.