This article shows how to perform sentiment analysis on Twitter tweet data using Python and TextBlob.
TextBlob provides an API that can perform different Natural Language Processing (NLP) tasks like Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (Naive Bayes, Decision Tree), Language Translation and Detection, Spelling Correction, etc.
TextBlob is built upon Natural Language Toolkit (NLTK).
Sentiment analysis means analyzing the sentiment of a given text or document and categorizing it into a specific class (such as positive or negative). The basic classification uses two classes, positive and negative, but you can add more classes, like neutral, highly positive, or highly negative.
Installing TextBlob
Run the following commands to install TextBlob and download the corpora it needs:
pip install -U textblob
python -m textblob.download_corpora
Simple TextBlob Sentiment Analysis Example
We will start with a simple TextBlob example that performs sentiment analysis on a given text. The sentiment property returns two scores for the text: polarity and subjectivity. The polarity score is a float within the range [-1.0, 1.0], where a negative value indicates negative text and a positive value indicates positive text. The subjectivity score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
from textblob import TextBlob
text = TextBlob("It was a wonderful movie. I liked it very much.")
print(text.sentiment)
print('polarity: {}'.format(text.sentiment.polarity))
print('subjectivity: {}'.format(text.sentiment.subjectivity))
'''
Output:
Sentiment(polarity=0.62, subjectivity=0.6866666666666666)
polarity: 0.62
subjectivity: 0.6866666666666666
'''
text = TextBlob("I liked the acting of the lead actor but I didn't like the movie overall.")
print(text.sentiment)
'''
Output:
Sentiment(polarity=0.19999999999999998, subjectivity=0.26666666666666666)
'''
text = TextBlob("I liked the acting of the lead actor and I liked the movie overall.")
print(text.sentiment)
'''
Output:
Sentiment(polarity=0.3, subjectivity=0.4)
'''
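As mentioned earlier, more classes such as neutral can be added on top of the two basic ones. One simple way to do that is to bucket the polarity score with thresholds. This is a sketch, not a TextBlob feature, and the 0.1 cutoff is an arbitrary choice:

```python
def polarity_to_label(polarity, neutral_band=0.1):
    # Map a polarity score in [-1.0, 1.0] to one of three classes.
    # The neutral_band cutoff of 0.1 is an arbitrary value for illustration.
    if polarity > neutral_band:
        return 'positive'
    elif polarity < -neutral_band:
        return 'negative'
    return 'neutral'

print(polarity_to_label(0.62))   # Output: positive
print(polarity_to_label(-0.35))  # Output: negative
print(polarity_to_label(0.05))   # Output: neutral
```

You could feed `text.sentiment.polarity` into such a function to get a three-class label instead of a raw score.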
Using NLTK’s Twitter Corpus
- We use the twitter_samples corpus to train TextBlob's NaiveBayesClassifier.
- From the twitter_samples corpus, we create a train set and a test set containing a certain number of positive and negative tweets.
- Then, we test the accuracy of the trained classifier.
from nltk.corpus import twitter_samples
print(twitter_samples.fileids())
'''
Output:
['negative_tweets.json', 'positive_tweets.json', 'tweets.20150430-223406.json']
'''
pos_tweets = twitter_samples.strings('positive_tweets.json')
print(len(pos_tweets)) # Output: 5000
neg_tweets = twitter_samples.strings('negative_tweets.json')
print(len(neg_tweets)) # Output: 5000
#all_tweets = twitter_samples.strings('tweets.20150430-223406.json')
#print(len(all_tweets)) # Output: 20000
# list of (tweet, label) tuples for positive tweets
pos_tweets_set = []
for tweet in pos_tweets:
    pos_tweets_set.append((tweet, 'pos'))
# list of (tweet, label) tuples for negative tweets
neg_tweets_set = []
for tweet in neg_tweets:
    neg_tweets_set.append((tweet, 'neg'))
print(len(pos_tweets_set), len(neg_tweets_set)) # Output: 5000 5000
# randomize pos_tweets_set and neg_tweets_set
# doing so will produce a different accuracy result every time we run the program
from random import shuffle
shuffle(pos_tweets_set)
shuffle(neg_tweets_set)
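If you prefer reproducible accuracy numbers across runs, you can seed the random generator before shuffling. A minimal sketch (the seed value 42 is arbitrary):

```python
from random import seed, shuffle

seed(42)  # fixing the seed makes every run produce the same shuffle order
data = list(range(10))
shuffle(data)
first_order = list(data)

seed(42)  # re-seeding and re-shuffling the same input reproduces that order
data = list(range(10))
shuffle(data)
assert data == first_order
```

The same `seed(...)` call placed before `shuffle(pos_tweets_set)` and `shuffle(neg_tweets_set)` would fix the train/test split and hence the accuracy.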
Create Train and Test Set
Just for this example, we create a small train and test set:
- test set = 200 tweets (100 positive + 100 negative)
- train set = 400 tweets (200 positive + 200 negative)
Note that a larger training set generally results in higher classification accuracy, so more training data is always better.
# test set = 200 tweets (100 positive + 100 negative)
# train set = 400 tweets (200 positive + 200 negative)
test_set = pos_tweets_set[:100] + neg_tweets_set[:100]
train_set = pos_tweets_set[100:300] + neg_tweets_set[100:300]
print(len(test_set), len(train_set)) # Output: 200 400
Training the Classifier & Calculating Accuracy
# train classifier
from textblob.classifiers import NaiveBayesClassifier
classifier = NaiveBayesClassifier(train_set)
# calculate accuracy
accuracy = classifier.accuracy(test_set)
print(accuracy) # Output: 0.715
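The accuracy reported above is simply the fraction of test tweets the classifier labels correctly. With some made-up predicted and actual labels (hypothetical, for illustration; the real labels come from classifying test_set), the computation looks like this:

```python
# Hypothetical labels; in practice, predicted comes from classifier.classify()
# and actual from the second element of each (tweet, label) tuple in test_set.
predicted = ['pos', 'neg', 'pos', 'neg', 'pos']
actual    = ['pos', 'neg', 'neg', 'neg', 'pos']

# fraction of positions where the predicted label matches the actual one
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(accuracy)  # Output: 0.8
```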
# show the most informative features
# (show_informative_features prints its output itself and returns None,
# so there is no need to wrap it in print)
classifier.show_informative_features(10)
'''
Output:
Most Informative Features
contains(not) = True neg : pos = 6.6 : 1.0
contains(love) = True pos : neg = 6.3 : 1.0
contains(day) = True pos : neg = 5.7 : 1.0
contains(no) = True neg : pos = 5.4 : 1.0
contains(na) = True neg : pos = 5.0 : 1.0
contains(Thanks) = True pos : neg = 3.7 : 1.0
contains(why) = True neg : pos = 3.7 : 1.0
contains(happy) = True pos : neg = 3.7 : 1.0
contains(never) = True neg : pos = 3.7 : 1.0
contains(though) = True neg : pos = 3.7 : 1.0
'''
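A ratio such as contains(not) = True neg : pos = 6.6 : 1.0 means tweets containing that word were about 6.6 times more likely in the negative class than in the positive class. With invented counts (not the actual corpus counts), the ratio is formed roughly like this:

```python
# Invented counts for illustration; the classifier derives the real ones
# from the training set.
neg_with_word, neg_total = 33, 200  # negative tweets containing "not"
pos_with_word, pos_total = 5, 200   # positive tweets containing "not"

p_word_given_neg = neg_with_word / neg_total  # 0.165
p_word_given_pos = pos_with_word / pos_total  # 0.025

print('neg : pos = {:.1f} : 1.0'.format(p_word_given_neg / p_word_given_pos))
# Output: neg : pos = 6.6 : 1.0
```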
text = "It was a wonderful movie. I liked it very much."
print(classifier.classify(text)) # Output: pos
text = "I don't like movies having happy ending."
print(classifier.classify(text)) # Output: neg
text = "The script was predictable. However, it was a wonderful movie. I liked it very much."
blob = TextBlob(text, classifier=classifier)
print(blob) # Output: The script was predictable. However, it was a wonderful movie. I liked it very much.
print(blob.classify()) # Output: pos
for sentence in blob.sentences:
    print("{} ({})".format(sentence, sentence.classify()))
'''
Output:
The script was predictable. (neg)
However, it was a wonderful movie. (pos)
I liked it very much. (pos)
'''
Hope this helps. Thanks.