Below is a step-by-step beginner's guide to conducting experiments in Recommender System research that also involves Natural Language Processing (NLP). It can therefore also serve as a guide to NLP research, specifically Sentiment Analysis. Recommender System research can take the text of users' reviews and process it using Sentiment Analysis and Machine Learning techniques.
Before starting research on Recommender Systems or experimenting with NLP on data, we should be very clear about the points highlighted below. Once these are understood properly, it becomes easier to follow other research papers during the literature review.
Note: Any research work needs a baseline for the experiment. You then propose a new or modified framework on the assumption that it is better than the baseline. Next, you run the experiment and evaluate both the baseline and your proposed framework. The evaluation results show whether your proposed framework is better than the baseline or not.
I will list the important steps and the techniques used in each step. Detailed descriptions of the individual techniques and terms are not provided.
1) Data Preparation
a) A dataset can be custom-built, with data collected by applications or by people. Alternatively, freely available datasets can be found online.
b) Pre-processing the data in the dataset (converting every letter to lowercase, removing punctuation, etc.) is necessary before using it in the experiment.
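As a minimal sketch of the pre-processing mentioned above, the following Python snippet lowercases a review, removes ASCII punctuation, and splits it into words. The function name `preprocess` and the sample sentence are only illustrative; real experiments often add further steps such as stop-word removal or stemming.

```python
import string


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    lowered = text.lower()
    # str.translate with a deletion table removes every punctuation character.
    no_punctuation = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punctuation.split()


# Illustrative review text (not taken from any real dataset).
tokens = preprocess("The battery life is GREAT, but the screen is dim!")
print(tokens)
```

Note that removing punctuation this way also drops apostrophes, so a word like "isn't" becomes "isnt". Whether that is acceptable depends on the experiment.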
2) Training a Classifier & Sentiment Analysis
A set of labeled positive and negative reviews/texts can be used to create/train a classifier. A trained classifier is able to classify a document/text into different categories.
There are different types of classifiers, like:
a) Naive Bayes classifier
b) Logistic Regression classifier
c) Decision Trees classifier
d) Support Vector Machine (SVM) classifier
e) Maximum Entropy classifier
To train a classifier, a feature set is required.
There are different methods for building the feature set (feature extraction and selection), like:
a) Bag of Words
b) Term Frequency and Inverse Document Frequency (TF-IDF)
c) Word to Vector (Word2Vec)
d) Latent Semantic Indexing (LSI)
e) Latent Dirichlet Allocation (LDA)
f) Dependency Structure Tree Analysis
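To make the first two methods in this list more concrete, here is a small sketch that computes Bag of Words counts and TF-IDF weights in plain Python. It assumes the simplest textbook definitions (term frequency = count / document length, inverse document frequency = log(N / document frequency)). Libraries usually apply smoothed variants, so their numbers will differ. The function names and sample documents are only illustrative.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each word to the number of times it occurs in one document."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Uses unsmoothed tf * log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["good", "phone"], ["bad", "phone"]]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

In this toy example "phone" appears in every document, so its weight is zero, while "good" and "bad" each receive a positive weight. That is the intended effect of TF-IDF: words that occur everywhere carry little information.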
Once the classifier is trained with the fixed set of positive and negative text reviews, we can then use this trained classifier to classify new review text. The classifier should now be able to classify text reviews as positive or negative.
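The train-then-classify workflow described above can be sketched with a tiny Naive Bayes classifier written from scratch. This is only an illustration under stated assumptions: Bag of Words counts as features, add-one (Laplace) smoothing, and four made-up training reviews. The class and method names are hypothetical, and a real experiment would normally use an established library and far more data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of documents
        self.vocabulary = set()

    def train(self, labeled_docs):
        for tokens, label in labeled_docs:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def classify(self, tokens):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Work in log space to avoid multiplying many small numbers.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training reviews, already pre-processed into word lists.
training_data = [
    ("great phone love it".split(), "positive"),
    ("excellent battery great screen".split(), "positive"),
    ("terrible phone hate it".split(), "negative"),
    ("poor battery awful screen".split(), "negative"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_data)
print(classifier.classify("great battery".split()))
print(classifier.classify("awful phone".split()))
```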
Depending on how it is trained, the classifier can label individual words or pieces of text as positive, negative, or neutral. Next, we need to perform sentiment analysis on the whole review text.
Sentiment Analysis on texts can be performed in various ways, like:
a) Document level analysis
b) Sentence level analysis
c) Entity and Aspect level analysis
To analyze the overall sentiment (whether positive or negative) of a review, we can classify the review as a whole (“Document level analysis”) or use “Sentence level analysis”, where Sentiment Analysis is performed for each sentence of the review and the results are combined. If we need to analyze different entities and aspects of the review, then “Entity and Aspect level analysis” would be useful.
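One possible way to go from sentence level results to an overall review sentiment is sketched below. It assumes a tiny hand-made word list in place of a trained classifier and a simple majority rule for combining sentences. Both are illustrative choices, and `POSITIVE_WORDS`, `NEGATIVE_WORDS`, and the function names are hypothetical.

```python
import re

# Hypothetical mini-lexicon, standing in for a trained classifier.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}


def sentence_sentiment(sentence):
    """Label one sentence by counting positive and negative words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def review_sentiment(review):
    """Split a review into sentences, label each, and take the majority."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    labels = [sentence_sentiment(s) for s in sentences]
    balance = labels.count("positive") - labels.count("negative")
    if balance > 0:
        overall = "positive"
    elif balance < 0:
        overall = "negative"
    else:
        overall = "neutral"
    return overall, labels


overall, labels = review_sentiment(
    "The screen is great. I love the camera. The battery is poor."
)
print(overall, labels)
```

In a real experiment, `sentence_sentiment` would be replaced by the trained classifier, and the combination rule (majority, average score, weighting by sentence length, etc.) is itself a design decision worth reporting.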
3) Recommendation Techniques
In recommendation system research, recommendation techniques are used to recommend a certain number of items to users. Generally, there are three types of recommendation techniques:
a) Content Based Filtering (CBF) technique
b) Collaborative Filtering (CF) technique
i) User-based Collaborative Filtering
ii) Item-based Collaborative Filtering
c) Hybrid technique
The Hybrid technique is a combination of the Content Based and Collaborative Filtering techniques.
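To give a feel for how one of these techniques works, here is a minimal sketch of User-based Collaborative Filtering. It assumes explicit positive ratings, cosine similarity between users (with unrated items treated as zero), and a similarity-weighted average to predict a rating. The ratings table and all function names are made up for illustration.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating dicts (missing = 0)."""
    common = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[item] * ratings_b[item] for item in common)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        numerator += similarity * other_ratings[item]
        denominator += similarity
    return numerator / denominator if denominator else None


def recommend(ratings, user, top_n=2):
    """Return the top_n unseen items with the highest predicted rating."""
    all_items = {item for user_ratings in ratings.values() for item in user_ratings}
    candidates = all_items - set(ratings[user])
    scored = [(predict_rating(ratings, user, item), item) for item in candidates]
    scored = [(score, item) for score, item in scored if score is not None]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:top_n]]


# Made-up ratings on a 1 to 5 scale.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 4, "book_c": 5, "book_d": 1},
    "carol": {"book_a": 1, "book_c": 2, "book_d": 5},
}
print(recommend(ratings, "alice"))
```

Because alice's ratings look much more like bob's than carol's, bob's opinion dominates the prediction, and the item bob rated highly is ranked first.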
While using the above recommendation techniques, a similarity method needs to be used to compute the similarity between two items (or between two users).
There are different types of similarity methods, like:
a) Euclidean distance
b) Manhattan distance
c) Minkowski distance
d) Cosine similarity
e) Jaccard similarity
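The similarity methods listed above can each be written in a few lines of Python. The sketch below uses their standard definitions. It assumes that items are represented as equal-length numeric vectors for the distance and cosine measures, and as sets for Jaccard similarity. The function names and sample values are only illustrative.

```python
import math


def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean_distance(a, b):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski_distance(a, b, 2)


def manhattan_distance(a, b):
    """Manhattan distance is the Minkowski distance with p = 1."""
    return minkowski_distance(a, b, 1)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


print(euclidean_distance([0, 0], [3, 4]))
print(manhattan_distance([0, 0], [3, 4]))
print(cosine_similarity([1, 0], [0, 1]))
print(jaccard_similarity({"action", "drama"}, {"drama", "comedy"}))
```

Keep in mind that the first three are distances (smaller means more similar), while cosine and Jaccard are similarities (larger means more similar).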
4) Evaluation
After the recommender system is able to recommend items to users, the final task is to figure out how accurate the recommendations are.
Validation metrics are used to measure the accuracy of recommendation systems. There are two types of accuracy validation metrics:
a) Predictive Accuracy Metrics
This includes different types of accuracy metrics like:
i) Mean Absolute Error (MAE)
ii) Root Mean Square Error (RMSE)
b) Decision-support Accuracy Metrics
This includes different types of accuracy metrics like:
i) Precision
ii) Recall
iii) F1-measure (F1 Score)
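The metrics above follow standard formulas, sketched here in plain Python. MAE and RMSE compare predicted ratings with actual ratings, while precision, recall, and F1 compare the list of recommended items with the set of items the user actually found relevant. The function names and sample numbers are made up for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared rating difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 of a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ratings and recommendation lists.
actual_ratings = [4, 3, 5]
predicted_ratings = [3, 3, 3]
print(mean_absolute_error(actual_ratings, predicted_ratings))
print(root_mean_square_error(actual_ratings, predicted_ratings))
print(precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"]))
```

For MAE and RMSE a lower value is better. For precision, recall, and F1 a higher value is better. RMSE penalizes large errors more heavily than MAE because the differences are squared before averaging.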
5) Result & Conclusion
At the end, you need to compare the recommendation accuracy of your proposed framework to that of the baseline of your experiment. This comparison will show if your proposed framework works better than the baseline or not.
I hope this provides good insight for proceeding further in this field.