This article shows how to perform sentiment analysis on real-time tweet data from Twitter using Python and TextBlob.
I have written an earlier article on a similar topic, Sentiment Analysis on Tweets using TextBlob. That article applied TextBlob sentiment analysis to NLTK’s Twitter corpus.
In this article, we will use the GetOldTweets-python package to fetch and search tweets.
GetOldTweets-python lets you:
– fetch tweets by any user
– search tweets for any text term
– search tweets between any dates
– get tweets by location
– and more…
GetOldTweets-python can also export tweets to a CSV file, so that you can save the tweets first and then process the saved data.
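The save-first workflow can be sketched with Python's standard csv module. This is only an illustration under stated assumptions, not GetOldTweets-python's own exporter: the file name tweets.csv, the column names, and the sample rows are all made up for the example, and the code targets Python 3.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; real rows would come from the fetched tweets.
sample_rows = [
    {"username": "user_one", "text": "What a great auction!"},
    {"username": "user_two", "text": "Not impressed, honestly."},
]


def save_tweets(rows, path):
    """Write tweet rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["username", "text"])
        writer.writeheader()
        writer.writerows(rows)


def load_tweets(path):
    """Read the saved rows back as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


# Save to a temporary directory, then read the file back for processing.
csv_path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
save_tweets(sample_rows, csv_path)
saved_tweets = load_tweets(csv_path)
print(saved_tweets[0]["text"])
```

Working from a saved file means the sentiment step can be rerun or adjusted without fetching the tweets again.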
TextBlob provides an API that can perform different Natural Language Processing (NLP) tasks like Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (Naive Bayes, Decision Tree), Language Translation and Detection, Spelling Correction, etc.
TextBlob is built on top of the Natural Language Toolkit (NLTK).
Sentiment analysis means analyzing the sentiment of a given text or document and assigning it to a specific class or category. In its basic form, the classification uses two classes: positive and negative. However, we can add more classes, such as neutral, highly positive, and highly negative.
Installing TextBlob
Run the following commands to install TextBlob and download the corpora it needs:
pip install -U textblob
python -m textblob.download_corpora
Simple TextBlob Sentiment Analysis Example
We will see a simple TextBlob example that does sentiment analysis on any given text. The sentiment property gives the sentiment scores for the given text. There are two scores: polarity and subjectivity.
The polarity score is a float within the range [-1.0, 1.0], where a negative value indicates negative text and a positive value indicates positive text. The subjectivity score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
from textblob import TextBlob
text = TextBlob("It was a wonderful movie. I liked it very much.")
print (text.sentiment)
print ('polarity: {}'.format(text.sentiment.polarity))
print ('subjectivity: {}'.format(text.sentiment.subjectivity))
'''
Output:
Sentiment(polarity=0.62, subjectivity=0.6866666666666666)
polarity: 0.62
subjectivity: 0.686666666667
'''
text = TextBlob("I liked the acting of the lead actor but I didn't like the movie overall.")
print (text.sentiment)
'''
Output:
Sentiment(polarity=0.19999999999999998, subjectivity=0.26666666666666666)
'''
text = TextBlob("I liked the acting of the lead actor and I liked the movie overall.")
print (text.sentiment)
'''
Output:
Sentiment(polarity=0.3, subjectivity=0.4)
'''
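The polarity scores shown above can also be bucketed into more than two classes, such as the neutral, highly positive, and highly negative classes mentioned earlier. Here is a minimal sketch; the function name polarity_to_label and the 0.5 cutoff are assumptions chosen for illustration and are not part of TextBlob.

```python
def polarity_to_label(polarity, strong=0.5):
    """Map a polarity score in [-1.0, 1.0] to one of five classes.

    The `strong` cutoff is an arbitrary, illustrative threshold.
    """
    if polarity >= strong:
        return "highly positive"
    if polarity > 0:
        return "positive"
    if polarity <= -strong:
        return "highly negative"
    if polarity < 0:
        return "negative"
    return "neutral"


# Try the mapping on a few example scores.
for score in (0.62, 0.3, 0.0, -0.2, -0.8):
    print(score, polarity_to_label(score))
```

A suitable cutoff depends on the data, so it is worth checking a sample of labelled tweets by hand before settling on a value.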
Using GetOldTweets-python to fetch Tweets
- You can simply clone the GetOldTweets-python repository
- Go inside the cloned repository directory
- Run the Main.py file, which contains the example code
python Main.py
Note:
At the time of writing this article, the GetOldTweets-python repository does not support adding a language filter to the search query. However, there is a pull request that adds support for language-based queries. It has not been merged into the main branch yet; hopefully, it will be merged soon.
You can refer to this fork of GetOldTweets-python for language search support.
Searching Tweets for our own Search Term
I am writing this code inside the GetOldTweets-python cloned repository folder.
First of all, we have to import the “got” package. GetOldTweets-python ships separate packages for Python 2 (got) and Python 3 (got3).
import sys
if sys.version_info[0] < 3:
    import got
else:
    import got3 as got
After this, we can write the code to fetch tweets.
Let’s search for the term IPLAuction between 2018-01-27 and 2018-01-28 and fetch a maximum of 15 tweets.
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('IPLAuction').setSince("2018-01-27").setUntil("2018-01-28").setMaxTweets(15)
# You can use "setLang" only if the package supports language-based search query
# I have written about the language-based query support in above Note
# tweetCriteria = got.manager.TweetCriteria().setQuerySearch('IPLAuction').setSince("2018-01-27").setUntil("2018-01-28").setMaxTweets(15).setLang('en')
# get first fetched tweet
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
# print result
print (tweet.username) # output: Live_All_Sports
print (tweet.text) # output: IPL's greatest batsman (Gayle) and greatest bowler (Malinga) goes UNSOLD!! #IPLAuction
print (tweet.retweets) # output: 1
print (tweet.mentions) # output:
print (tweet.hashtags) # output: #IPLAuction
# print all tweets
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print (tweet.text + '\n')
'''
Output:
IPL's greatest batsman (Gayle) and greatest bowler (Malinga) goes UNSOLD!! #IPLAuction
#CSK needs more chennai players @SPFleming7 .. Washington Sundar and Baba Aparjith please #IPLAuction #WhistlePodu
Pretty zinta is like shewag... Power play hitter.. purchased almost all the players in first one hour... #IPLAuction
How many people outside of India even know IPL is the biggest sports corruption on the face of the Earth??? Most senior players are aware but hush because they make BIG BUCKS! #IPLAuction
Who will get Evan Lewis in their side? #IPLAuction
Singh is King join Chennai super king#IPLAuction
"It's mind boggling really. This is comfortably life changing": Chris Woakes on IPL auction http:// ift.tt/2BxWDHv
One thing shocked me in #IPLAuction #DelhiDaredevils buy #ChrisMorris for Rs 1100 Lakh's And #SunrisersHyderabad takes #Yusufpathan for Rs 190 Lakh's @MazherArshad @bhogleharsha #ipl #IPL2018 #IPL2018Auction #IPLAuction2018
Buying players is so middle class, let's buy trophy #IPLAuction pic.twitter.com/HbhJEqibiP
Yeyyyyy.... CSK. Pension Scheme.. Bedi Prasanna please register to IPL Auction
Which @IPL Franchise has formulated perfect Squad for #IPL2018 ? #IPLAuction2018 #IPLRetention @ChennaiIPL @mipaltan @KKRiders @SunRisers #IPLAuction
#IPLAuction players ke shopping
Costliest player till now. 1900 cr #IPLAuction pic.twitter.com/bkRiw7hn8K
In Some Parallel Universe #IPLAuction
It's shameful that so much of money is wasted on shit like #IPLAuction . such sums of money can easily solve so many public problems in India. High time Indians came out of their addiction to cricket and movies.
'''
Clean Tweets
Let’s write a function to clean tweets. Using regular expressions, we remove the old-style retweet marker, URL links, mentions, and punctuation from the tweets. For hashtags, we remove only the # sign and keep the word itself.
import re # importing regex
import string
def clean_tweet(tweet):
    '''
    Remove unnecessary things from the tweet
    like mentions, hashtags, URL links, punctuations
    '''
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    # remove hyperlinks
    # note: this removes everything from the link up to the end of the line
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    # remove hashtags
    # only removing the hash # sign from the word
    tweet = re.sub(r'#', '', tweet)
    # remove mentions
    # (underscore is included because Twitter usernames can contain it)
    tweet = re.sub(r'@[A-Za-z0-9_]+', '', tweet)
    # remove punctuations like quote, exclamation sign, etc.
    # we replace them with a space
    # re.escape keeps characters such as ] and \ literal inside the character class
    tweet = re.sub(r'[' + re.escape(string.punctuation) + r']+', ' ', tweet)
    return tweet
# testing clean_tweet function
sample_tweet = "One thing shocked me in #IPLAuction #DelhiDaredevils buy #ChrisMorris for Rs 1100 Lakh's And #SunrisersHyderabad takes #Yusufpathan for Rs 190 Lakh's @MazherArshad @bhogleharsha #ipl #IPL2018 #IPL2018Auction #IPLAuction2018"
print (clean_tweet(sample_tweet))
'''
Output:
One thing shocked me in IPLAuction DelhiDaredevils buy ChrisMorris for Rs 1100 Lakh s And SunrisersHyderabad takes Yusufpathan for Rs 190 Lakh s ipl IPL2018 IPL2018Auction IPLAuction2018
'''
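Because mentions are deleted and punctuation is replaced with a space, the cleaned text can contain runs of consecutive spaces. If that matters for later processing, an optional follow-up step might look like the sketch below. The helper name collapse_whitespace is hypothetical and is not part of the code above.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace with one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace run
    # and drops empty strings, so joining with ' ' normalizes spacing.
    return " ".join(text.split())


cleaned = "Rs 190 Lakh s   ipl  IPL2018 "
print(collapse_whitespace(cleaned))  # Rs 190 Lakh s ipl IPL2018
```

This step could be applied to the value returned by clean_tweet before the text is passed on for sentiment scoring.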
Get Sentiment of the Tweet
We create a new function which returns the sentiment of a given tweet, i.e. whether a given tweet is positive, negative, or neutral.
We pass the cleaned tweet text to the TextBlob class, which creates a TextBlob object. Its sentiment property contains the polarity and subjectivity of the text. A polarity greater than zero is treated as positive, less than zero as negative, and equal to zero as neutral.
def get_tweet_sentiment(tweet):
    '''
    Get sentiment value of the tweet text
    It can be either positive, negative or neutral
    '''
    # create TextBlob object of the passed tweet text
    blob = TextBlob(clean_tweet(tweet))
    # get sentiment
    if blob.sentiment.polarity > 0:
        sentiment = 'positive'
    elif blob.sentiment.polarity < 0:
        sentiment = 'negative'
    else:
        sentiment = 'neutral'
    return sentiment
# testing tweet sentiment
sample_tweet = "One thing shocked me in #IPLAuction #DelhiDaredevils buy #ChrisMorris for Rs 1100 Lakh's And #SunrisersHyderabad takes #Yusufpathan for Rs 190 Lakh's @MazherArshad @bhogleharsha #ipl #IPL2018 #IPL2018Auction #IPLAuction2018"
print (get_tweet_sentiment(sample_tweet)) # Output: negative
Process Tweets
We create a new function which gets the sentiment of each tweet and returns a list of dictionaries, each holding the tweet text and its sentiment value.
def get_processed_tweets(tweets):
    '''
    Get array of processed tweets containing
    the tweet text and its sentiment value
    '''
    processed_tweets = []
    for tweet in tweets:
        tweet_dict = {}
        tweet_dict['text'] = tweet.text
        tweet_dict['sentiment'] = get_tweet_sentiment(tweet.text)
        # if the tweet contains retweet
        # then only append the single tweet
        # and don't append the retweets of the same tweet
        if tweet.retweets > 0:
            if tweet_dict not in processed_tweets:
                processed_tweets.append(tweet_dict)
        else:
            processed_tweets.append(tweet_dict)
    return processed_tweets
# getting tweets with sentiment value
# note: setLang works only if the package supports language-based search (see the Note above)
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('IPLAuction').setSince("2018-01-27").setUntil("2018-01-28").setMaxTweets(10).setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
tweets_with_sentiment = get_processed_tweets(tweets)
for item in tweets_with_sentiment:
    print (item)
print ('')
'''
Output:
{'text': "IPL's greatest batsman (Gayle) and greatest bowler (Malinga) goes UNSOLD!! #IPLAuction", 'sentiment': 'positive'}
{'text': 'Who will get Evan Lewis in their side? #IPLAuction', 'sentiment': 'neutral'}
{'text': 'Singh is King join Chennai super king#IPLAuction', 'sentiment': 'positive'}
{'text': '#CSK needs more chennai players @SPFleming7 .. Washington Sundar and Baba Aparjith please #IPLAuction #WhistlePodu', 'sentiment': 'positive'}
{'text': '#IPLAuction Kolkata Knight Riders https:// fb.me/8hEEFPASg', 'sentiment': 'neutral'}
{'text': 'Pretty zinta is like shewag... Power play hitter.. purchased almost all the players in first one hour... #IPLAuction', 'sentiment': 'positive'}
{'text': 'In Some Parallel Universe #IPLAuction', 'sentiment': 'neutral'}
{'text': 'Yeyyyyy.... CSK. Pension Scheme.. Bedi Prasanna please register to IPL Auction', 'sentiment': 'neutral'}
{'text': "One thing shocked me in #IPLAuction #DelhiDaredevils buy #ChrisMorris for Rs 1100 Lakh's And #SunrisersHyderabad takes #Yusufpathan for Rs 190 Lakh's @MazherArshad @bhogleharsha #ipl #IPL2018 #IPL2018Auction #IPLAuction2018", 'sentiment': 'negative'}
{'text': "Buying players is so middle class, let's buy trophy #IPLAuction pic.twitter.com/HbhJEqibiP", 'sentiment': 'neutral'}
'''
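The `tweet_dict not in processed_tweets` check scans the whole list for every tweet, which can get slow for large batches. A possible alternative is to track the texts already seen in a set. The sketch below is an assumption-laden variant rather than a drop-in replacement: the helper name dedupe_by_text is hypothetical, and unlike the function above it drops every repeated text, not only those with a retweet count above zero.

```python
def dedupe_by_text(items):
    """Keep the first item for each distinct tweet text, preserving order."""
    seen_texts = set()
    unique_items = []
    for item in items:
        if item["text"] not in seen_texts:
            seen_texts.add(item["text"])
            unique_items.append(item)
    return unique_items


# Hypothetical processed tweets, in the same shape as the output above.
sample_items = [
    {"text": "Great buy!", "sentiment": "positive"},
    {"text": "Great buy!", "sentiment": "positive"},
    {"text": "Who is next?", "sentiment": "neutral"},
]
print(len(dedupe_by_text(sample_items)))  # 2
```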
Get percentage of positive, negative, and neutral tweets
Above, we got the sentiment value of each fetched tweet. The sentiment value can be either positive, negative, or neutral.
Now, let’s get the percentage and count of positive, negative, and neutral tweets.
Here, we fetch 1000 tweets and process them.
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('IPLAuction').setSince("2018-01-27").setUntil("2018-01-28").setMaxTweets(1000).setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
tweets_with_sentiment = get_processed_tweets(tweets)
positive_tweets = []
for tweet in tweets_with_sentiment:
    if tweet['sentiment'] == 'positive':
        positive_tweets.append(tweet)
# The above for loop can be shortened using List Comprehension
# positive_tweets = [tweet for tweet in tweets_with_sentiment if tweet['sentiment'] == 'positive']
negative_tweets = []
for tweet in tweets_with_sentiment:
    if tweet['sentiment'] == 'negative':
        negative_tweets.append(tweet)
# The above for loop can be shortened using List Comprehension
# negative_tweets = [tweet for tweet in tweets_with_sentiment if tweet['sentiment'] == 'negative']
neutral_tweets = []
for tweet in tweets_with_sentiment:
    if tweet['sentiment'] == 'neutral':
        neutral_tweets.append(tweet)
# The above for loop can be shortened using List Comprehension
# neutral_tweets = [tweet for tweet in tweets_with_sentiment if tweet['sentiment'] == 'neutral']
# note: on Python 2 these are integer divisions (as in the output below); on Python 3 they give floats
positive_percent = 100 * len(positive_tweets) / len(tweets_with_sentiment)
negative_percent = 100 * len(negative_tweets) / len(tweets_with_sentiment)
neutral_percent = 100 * len(neutral_tweets) / len(tweets_with_sentiment)
print ('Positive Tweets | Count: {} , Percent: {} %'.format(len(positive_tweets), positive_percent))
print ('Negative Tweets | Count: {} , Percent: {} %'.format(len(negative_tweets), negative_percent))
print ('Neutral Tweets | Count: {} , Percent: {} %'.format(len(neutral_tweets), neutral_percent))
'''
Output:
Positive Tweets | Count: 515 , Percent: 51 %
Negative Tweets | Count: 80 , Percent: 8 %
Neutral Tweets | Count: 401 , Percent: 40 %
'''
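The three counting loops can also be folded into a single pass with collections.Counter from the standard library. The following is a minimal sketch under stated assumptions: the function name sentiment_summary and the sample data are made up for illustration, and the percentages are rounded to one decimal place instead of being truncated.

```python
from collections import Counter


def sentiment_summary(items):
    """Return {label: (count, percent)} for a list of processed tweets."""
    counts = Counter(item["sentiment"] for item in items)
    total = sum(counts.values())
    summary = {}
    for label in ("positive", "negative", "neutral"):
        count = counts[label]  # a Counter returns 0 for missing labels
        # float() forces true division on Python 2 as well as Python 3;
        # the guard avoids dividing by zero when no tweets were fetched.
        percent = 100 * float(count) / total if total else 0.0
        summary[label] = (count, round(percent, 1))
    return summary


# Hypothetical processed tweets, in the same shape as tweets_with_sentiment.
sample_items = [
    {"text": "a", "sentiment": "positive"},
    {"text": "b", "sentiment": "positive"},
    {"text": "c", "sentiment": "negative"},
    {"text": "d", "sentiment": "neutral"},
]
for label, (count, percent) in sentiment_summary(sample_items).items():
    print("{} Tweets | Count: {} , Percent: {} %".format(label.capitalize(), count, percent))
```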
Hope this helps. Thanks.