14 March 2016

Recommender System using Java & Apache Mahout

Apache Mahout is a project of the Apache Software Foundation. Mahout helps build scalable machine learning applications. It focuses primarily on collaborative filtering, classification, and clustering.

Here is a very nice video tutorial: Mahout Item Recommender Tutorial using Java and Eclipse. It thoroughly explains how to use the MovieLens dataset to create an item-based recommender system that recommends a certain number of the most similar items for each item.

Here’s another useful tutorial, Creating a User-Based Recommender in 5 minutes, which also covers evaluating the system. Here’s its video version: Mahout User Recommender Tutorial with Eclipse and Maven.

In this article, I will show code to evaluate a recommender system using both user-based and item-based filtering. I will also include code to recommend similar items.

I will be fetching data from a CSV file named dataset-recsys.csv.

Download: dataset-recsys.csv

The CSV file has 3 columns: user_id, item_id, and rating. The item_id can be the ID of anything, such as a hotel, movie, or book. The rating is the score a user gave to an item. It ranges from 1 to 5, where 5 is the best rating and 1 is the worst.
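
To make the file layout concrete, here is a minimal plain-Java sketch that parses rows in this format into a nested map of user to item to rating. It does not use Mahout. The class name RatingsCsv, the sample rows, and the assumption that the file has no header row are mine, not taken from dataset-recsys.csv.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch: parse "user_id,item_id,rating" lines into a nested map. */
final class RatingsCsv {

    /** Returns userId -> (itemId -> rating). Blank lines are skipped. */
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> ratings = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 3 columns: " + line);
            }
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            double rating = Double.parseDouble(parts[2].trim());
            // The article states that ratings range from 1 to 5.
            if (rating < 1.0 || rating > 5.0) {
                throw new IllegalArgumentException("Rating outside 1-5: " + line);
            }
            ratings.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, rating);
        }
        return ratings;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not taken from dataset-recsys.csv.
        List<String> sample = Arrays.asList("1,10,5", "1,11,3", "2,10,4");
        System.out.println(parse(sample));
    }
}
```

In a real run the lines would come from the file, for example via java.nio.file.Files.readAllLines.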

Creating a Data Model

We will use the FileDataModel class to create a data model from the CSV file.

Creating Similarity

How we compute similarity depends on the filtering approach. For user-based filtering, it means finding users similar to a particular user. For item-based filtering, it means finding items similar to a particular item.

Mahout provides several similarity measures, such as LogLikelihoodSimilarity, TanimotoCoefficientSimilarity, PearsonCorrelationSimilarity, and EuclideanDistanceSimilarity.

User-based Similarity
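
As an illustration of what a user-to-user measure computes, here is a plain-Java sketch of Pearson correlation over the items that two users have both rated. It is a textbook formulation written for this article, not Mahout's PearsonCorrelationSimilarity implementation. The class name UserPearson and the choice to return NaN when the correlation is undefined are assumptions.

```java
import java.util.Map;

/** Plain-Java sketch of Pearson correlation between two users' ratings. */
final class UserPearson {

    /**
     * Each map is itemId -> rating for one user.
     * Only items rated by both users are considered.
     * Returns NaN when fewer than 2 items overlap or either user has zero variance.
     */
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item not rated by the second user
            }
            double ra = e.getValue();
            double rb = other;
            n++;
            sumA += ra;
            sumB += rb;
            sumAA += ra * ra;
            sumBB += rb * rb;
            sumAB += ra * rb;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double covariance = sumAB - sumA * sumB / n;
        double varianceA = sumAA - sumA * sumA / n;
        double varianceB = sumBB - sumB * sumB / n;
        double denominator = Math.sqrt(varianceA * varianceB);
        return denominator == 0.0 ? Double.NaN : covariance / denominator;
    }
}
```

Two users whose ratings rise and fall together score close to 1, and users with opposite tastes score close to -1.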

Item-based Similarity
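
For item-to-item similarity, a simple measure to picture is the Tanimoto coefficient: the number of users who rated both items divided by the number of users who rated either one. The sketch below is plain Java written for this article, not Mahout's TanimotoCoefficientSimilarity code. The class name ItemTanimoto and the NaN result for two empty sets are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

/** Plain-Java sketch of the Tanimoto (Jaccard) coefficient between two items. */
final class ItemTanimoto {

    /**
     * Each set holds the ids of users who rated one item.
     * Returns |A and B| / |A or B|, or NaN when nobody rated either item.
     */
    static double similarity(Set<Long> usersOfA, Set<Long> usersOfB) {
        if (usersOfA.isEmpty() && usersOfB.isEmpty()) {
            return Double.NaN;
        }
        Set<Long> intersection = new HashSet<>(usersOfA);
        intersection.retainAll(usersOfB);
        int unionSize = usersOfA.size() + usersOfB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

This measure ignores the rating values themselves and only looks at who rated what.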

Neighborhood Strategy

For user-based filtering, the NearestNUserNeighborhood class computes a neighborhood consisting of the n users nearest to a given user.
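
The idea of a nearest-n neighborhood can be sketched in plain Java as "sort the other users by similarity and keep the top n". This is only an illustration of the concept and does not reproduce Mahout's NearestNUserNeighborhood. The class name NearestUsers and the rule of skipping NaN similarities are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch: pick the n users most similar to a target user. */
final class NearestUsers {

    /**
     * similarityToTarget maps otherUserId -> similarity with the target user.
     * Returns up to n user ids, most similar first. NaN similarities are skipped.
     */
    static List<Long> nearest(int n, Map<Long, Double> similarityToTarget) {
        List<Map.Entry<Long, Double>> candidates = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (!Double.isNaN(e.getValue())) {
                candidates.add(e);
            }
        }
        // Highest similarity first.
        candidates.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, candidates.size()); i++) {
            result.add(candidates.get(i).getKey());
        }
        return result;
    }
}
```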

Building Recommender System

For user-based filtering:

For item-based filtering:
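
Whichever variant is used, a common way to estimate an unknown rating is a similarity-weighted average of known ratings. In the user-based case the weights are the similarities of neighboring users and the ratings are their ratings of the target item. In the item-based case the weights are the similarities between the target item and items the user already rated. The plain-Java sketch below shows that step under these assumptions. The class name WeightedEstimate and the sample numbers are hypothetical, and it is not claimed to match the exact logic of Mahout's recommender classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of a similarity-weighted rating estimate. */
final class WeightedEstimate {

    /**
     * weights: id -> similarity to the target (a neighbor user or a similar item).
     * ratings: id -> known rating attached to that same id.
     * Ids without a rating, or with NaN or non-positive similarity, are ignored.
     * Returns NaN when nothing usable remains.
     */
    static double estimate(Map<Long, Double> weights, Map<Long, Double> ratings) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<Long, Double> e : weights.entrySet()) {
            Double rating = ratings.get(e.getKey());
            double weight = e.getValue();
            if (rating == null || Double.isNaN(weight) || weight <= 0.0) {
                continue;
            }
            weightedSum += weight * rating;
            totalWeight += weight;
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // User-based: keys are neighbor user ids, ratings are their ratings of one target item.
        Map<Long, Double> neighborSimilarity = new HashMap<>();
        neighborSimilarity.put(2L, 1.0);
        neighborSimilarity.put(3L, 0.5);
        Map<Long, Double> neighborRatings = new HashMap<>();
        neighborRatings.put(2L, 4.0);
        neighborRatings.put(3L, 1.0);
        System.out.println(estimate(neighborSimilarity, neighborRatings));

        // Item-based: keys are ids of items the user already rated,
        // weights are their similarity to the target item.
        Map<Long, Double> itemSimilarity = new HashMap<>();
        itemSimilarity.put(10L, 0.8);
        itemSimilarity.put(11L, 0.2);
        Map<Long, Double> ownRatings = new HashMap<>();
        ownRatings.put(10L, 5.0);
        ownRatings.put(11L, 1.0);
        System.out.println(estimate(itemSimilarity, ownRatings));
    }
}
```

Recommending then amounts to computing this estimate for every item the user has not rated and returning the highest-scoring ones.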

Evaluation

For evaluation, we use the evaluate function of the RMSRecommenderEvaluator class to compute RMSE, and the evaluate function of the GenericRecommenderIRStatsEvaluator class to compute Precision, Recall, and F1 Score.

Details about parameters of the evaluate function from RMSRecommenderEvaluator class:

double evaluate(RecommenderBuilder recommenderBuilder,
                DataModelBuilder dataModelBuilder,
                DataModel dataModel,
                double trainingPercentage,
                double evaluationPercentage)
         throws TasteException

trainingPercentage – percentage of each user’s preferences to use to produce recommendations; the rest are compared to estimated preference values to evaluate Recommender performance

evaluationPercentage – percentage of users to use in evaluation
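
The score itself is easy to picture: the held-out ratings are compared with the estimated ones, and the root of the mean squared difference is reported. Here is a minimal plain-Java sketch of that formula. It covers only the RMSE calculation, not the splitting of preferences that the evaluator performs, and the class name Rmse is an assumption.

```java
/** Plain-Java sketch of root mean square error between actual and estimated ratings. */
final class Rmse {

    /** Both arrays must be non-empty, have the same length, and be aligned by index. */
    static double rmse(double[] actual, double[] estimated) {
        if (actual.length == 0 || actual.length != estimated.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double difference = actual[i] - estimated[i];
            sumOfSquares += difference * difference;
        }
        return Math.sqrt(sumOfSquares / actual.length);
    }
}
```

A lower RMSE means the estimated ratings are closer to the real ones, and 0 means a perfect match.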

Details about parameters of the evaluate function from GenericRecommenderIRStatsEvaluator class:

IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                      DataModelBuilder dataModelBuilder,
                      DataModel dataModel,
                      IDRescorer rescorer,
                      int at,
                      double relevanceThreshold,
                      double evaluationPercentage)
               throws TasteException

dataModel – dataset to test on

rescorer – if any, to use when computing recommendations

at – as in, “precision at 5”. The number of recommendations to consider when evaluating precision, etc.

relevanceThreshold – items whose preference value is at least this value are considered “relevant” for the purposes of computations
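
To show how at and relevanceThreshold interact, here is a plain-Java sketch that marks items as relevant when their rating reaches the threshold and then computes precision, recall, and F1 over the first "at" recommendations. It uses the standard textbook definitions and is not a copy of what GenericRecommenderIRStatsEvaluator does internally. The class name IrStatsAtN and its method names are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of precision, recall, and F1 at a cut-off. */
final class IrStatsAtN {
    final double precision;
    final double recall;
    final double f1;

    private IrStatsAtN(double precision, double recall, double f1) {
        this.precision = precision;
        this.recall = recall;
        this.f1 = f1;
    }

    /** Items whose rating is at least the threshold count as relevant. */
    static Set<Long> relevantItems(Map<Long, Double> ratings, double relevanceThreshold) {
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> e : ratings.entrySet()) {
            if (e.getValue() >= relevanceThreshold) {
                relevant.add(e.getKey());
            }
        }
        return relevant;
    }

    /** Scores the first "at" entries of the ranked recommendation list. */
    static IrStatsAtN of(List<Long> recommended, Set<Long> relevant, int at) {
        if (at <= 0) {
            throw new IllegalArgumentException("at must be positive");
        }
        int considered = Math.min(at, recommended.size());
        int hits = 0;
        for (int i = 0; i < considered; i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        double precision = considered == 0 ? 0.0 : (double) hits / considered;
        double recall = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
        double f1 = (precision + recall) == 0.0
                ? 0.0
                : 2.0 * precision * recall / (precision + recall);
        return new IrStatsAtN(precision, recall, f1);
    }
}
```

With "precision at 5", for example, only the top 5 recommended items are checked against the relevant set.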

Here is the full source code for recommending items to users. The code recommends 5 items to each of the first 10 users, along with the recommendation value. The code is based upon this video tutorial.

Here is the full source code for creating and evaluating an item-based recommender system. The evaluation covers Root Mean Square Error (RMSE), Precision, Recall, and F1 Score. A certain number of items are also recommended for a particular user.

Here is the full source code for creating and evaluating a user-based recommender system. The evaluation covers Root Mean Square Error (RMSE), Precision, Recall, and F1 Score. A certain number of items are also recommended for a particular user.

Apache Mahout Video Tutorials – I

Apache Mahout Video Tutorials – II

Hope this helps.
Thanks.
