Recommender System using Java & Apache Mahout

Apache Mahout is a project of the Apache Software Foundation. Mahout helps in building scalable machine learning applications. It focuses primarily on collaborative filtering, classification, and clustering.

Here is a very nice video tutorial, Mahout Item Recommender Tutorial using Java and Eclipse. It thoroughly explains how to use the MovieLens dataset to create an item-based recommender system that recommends a certain number of the most similar items for each item.

Here’s another useful tutorial, Creating a User-Based Recommender in 5 minutes, which also covers evaluating the system. Its video version is Mahout User Recommender Tutorial with Eclipse and Maven.

In this article, I show code to build and evaluate a recommender system using both user-based and item-based filtering. I also include code to recommend similar items.

The data is read from a CSV file named dataset-recsys.csv.

Download: dataset-recsys.csv

The CSV file has three columns: user_id, item_id, and rating. The first column, user_id, identifies the user. The second column, item_id, can be the ID of anything, such as a hotel, movie, or book. The third column, rating, is the rating a user gave to an item. Ratings range from 1 to 5, where 5 is the best rating and 1 is the worst.
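To make the expected layout concrete, here is a small plain-Java sketch (no Mahout) that reads lines in this three-column format into a map of user ID to (item ID to rating). It is only an illustration, not how FileDataModel works internally; the class name RatingsCsvSketch and the sample rows are hypothetical. It assumes numeric IDs and no header row, which is also what FileDataModel expects, since it parses user and item IDs as numbers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch, not Mahout code: turns "user_id,item_id,rating" lines
// (no header row) into userID -> (itemID -> rating).
class RatingsCsvSketch {

    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> ratingsByUser = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = line.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            double rating = Double.parseDouble(fields[2].trim()); // 1 (worst) to 5 (best)
            ratingsByUser.computeIfAbsent(userId, id -> new TreeMap<>()).put(itemId, rating);
        }
        return ratingsByUser;
    }

    public static void main(String[] args) {
        // Hypothetical rows, not taken from dataset-recsys.csv
        List<String> sample = Arrays.asList("1,101,5", "1,102,3", "2,101,2", "2,103,4");
        System.out.println(parse(sample)); // prints {1={101=5.0, 102=3.0}, 2={101=2.0, 103=4.0}}
    }
}
```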

Creating a Data Model

We use the FileDataModel class to create a data model.


DataModel model = new FileDataModel(new File("data/dataset-recsys.csv"));

Creating Similarity

How similarity is computed depends on the filtering approach. In user-based filtering, we look for users who are similar to a given user. In item-based filtering, we look for items that are similar to a given item.

Mahout provides several similarity measures, such as LogLikelihoodSimilarity, TanimotoCoefficientSimilarity, PearsonCorrelationSimilarity, and EuclideanDistanceSimilarity.
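To give a feel for what two of these measures compute, here is a plain-Java sketch of textbook Pearson correlation (over the items both users rated) and the Tanimoto coefficient (which ignores rating values and compares only which items were rated). It is an illustrative sketch, not Mahout's implementation, which differs in details such as edge-case handling; the class SimilaritySketch and its sample data are hypothetical. The same functions work for item-item similarity if the maps are keyed by user ID instead of item ID.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Textbook similarity measures between two users, each given as itemID -> rating.
// Illustrative only; Mahout's own classes differ in details.
class SimilaritySketch {

    // Pearson correlation over the items both users rated.
    // Returns NaN if there are fewer than two co-rated items or no variance.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue; // item not rated by the other user
            }
            double x = e.getValue();
            n++;
            sumX += x;
            sumY += y;
            sumXY += x * y;
            sumX2 += x * x;
            sumY2 += y * y;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double covariance = sumXY - sumX * sumY / n;
        double varianceX = sumX2 - sumX * sumX / n;
        double varianceY = sumY2 - sumY * sumY / n;
        double denominator = Math.sqrt(varianceX * varianceY);
        return denominator == 0 ? Double.NaN : covariance / denominator;
    }

    // Tanimoto coefficient: size of intersection / size of union of the
    // two rated-item sets. Rating values are ignored.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical ratings for two users
        Map<Long, Double> user1 = new TreeMap<>();
        user1.put(101L, 5.0);
        user1.put(102L, 3.0);
        user1.put(103L, 1.0);
        Map<Long, Double> user2 = new TreeMap<>();
        user2.put(101L, 4.0);
        user2.put(102L, 1.0);
        user2.put(103L, 2.0);
        user2.put(104L, 5.0);
        System.out.println("Pearson:  " + pearson(user1, user2));
        System.out.println("Tanimoto: " + tanimoto(user1.keySet(), user2.keySet()));
    }
}
```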

User-based Similarity


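// Pick ONE measure; the commented-out lines are alternatives.
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// UserSimilarity similarity = new LogLikelihoodSimilarity(model);
// UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
// UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
// UserSimilarity similarity = new SpearmanCorrelationSimilarity(model); // user-based only
// GenericUserSimilarity precomputes and caches the similarities of another measure:
// UserSimilarity similarity = new GenericUserSimilarity(new PearsonCorrelationSimilarity(model), model);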

Item-based Similarity


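// Pick ONE measure; the commented-out lines are alternatives.
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
// ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
// ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
// ItemSimilarity similarity = new EuclideanDistanceSimilarity(model);
// GenericItemSimilarity (not GenericUserSimilarity) precomputes item-item similarities:
// ItemSimilarity similarity = new GenericItemSimilarity(new PearsonCorrelationSimilarity(model), model);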

Neighborhood Strategy

For user-based filtering, the NearestNUserNeighborhood class computes a neighborhood consisting of the n users nearest (most similar) to a given user. Item-based filtering does not use a neighborhood.


// n = 100: the neighborhood holds the 100 users most similar to a given user
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
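Conceptually, a nearest-N neighborhood keeps the N users with the highest similarity to the target user. Here is a minimal plain-Java sketch of that selection step, assuming the similarities to the target user have already been computed (for example with one of the measures above). The class NearestNSketch is hypothetical, and skipping undefined (NaN) similarities is a choice made in this sketch, not a description of Mahout's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative "nearest N" selection: given each other user's similarity to a
// target user, keep the IDs of the N most similar users (highest first).
// Not Mahout's NearestNUserNeighborhood implementation.
class NearestNSketch {

    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        return similarityToTarget.entrySet().stream()
                .filter(e -> !Double.isNaN(e.getValue())) // skip undefined similarities
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2..5 to some target user
        Map<Long, Double> sims = new HashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, 0.1);
        sims.put(4L, 0.5);
        sims.put(5L, Double.NaN);
        System.out.println(nearestN(sims, 2)); // prints [2, 4]
    }
}
```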

Building Recommender System

For user-based filtering:


Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

For item-based filtering:


Recommender recommender = new GenericItemBasedRecommender(model, similarity);
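Both kinds of recommender estimate a rating as a similarity-weighted average. In the user-based case, the neighbors' ratings of the target item are weighted by user-user similarity; in the item-based case, the user's own ratings of other items are weighted by their similarity to the target item. The sketch below uses a common textbook form that divides by the sum of absolute similarities. This is an assumption for illustration; Mahout's exact formula and edge-case handling may differ, and the class WeightedAverageSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative similarity-weighted average for estimating one rating.
// Textbook form (an assumption here); Mahout's exact formula may differ.
class WeightedAverageSketch {

    /**
     * @param neighbourRatings ratings keyed by neighbour ID: other users' ratings of the
     *                         target item (user-based), or the user's ratings of other
     *                         items (item-based)
     * @param similarity       similarity of each neighbour to the target user or item
     * @return the estimate, or NaN if no neighbour has a usable similarity
     */
    static double estimate(Map<Long, Double> neighbourRatings, Map<Long, Double> similarity) {
        double weightedSum = 0;
        double totalWeight = 0;
        for (Map.Entry<Long, Double> e : neighbourRatings.entrySet()) {
            Double sim = similarity.get(e.getKey());
            if (sim == null || Double.isNaN(sim)) {
                continue; // no usable similarity for this neighbour
            }
            weightedSum += sim * e.getValue();
            totalWeight += Math.abs(sim);
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical item-based case: the user rated items 201 and 202,
        // which have similarity 0.8 and 0.2 to the item being estimated.
        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(201L, 4.0);
        ratings.put(202L, 2.0);
        Map<Long, Double> sims = new HashMap<>();
        sims.put(201L, 0.8);
        sims.put(202L, 0.2);
        System.out.println(estimate(ratings, sims)); // about 3.6
    }
}
```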

Evaluation

For evaluation, we use the evaluate method of the RMSRecommenderEvaluator class to compute the root mean square error (RMSE), and the evaluate method of the GenericRecommenderIRStatsEvaluator class to compute precision, recall, and F1 score.


RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println("RMSE: " + score);

Parameters of the evaluate method of RMSRecommenderEvaluator:

double evaluate(RecommenderBuilder recommenderBuilder,
                DataModelBuilder dataModelBuilder,
                DataModel dataModel,
                double trainingPercentage,
                double evaluationPercentage)
         throws TasteException

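recommenderBuilder – object that can build the Recommender to test

dataModelBuilder – DataModelBuilder to use, or null to use a default DataModel implementation (the examples pass null)

dataModel – dataset to test on

trainingPercentage – percentage of each user’s preferences to use to produce recommendations, given as a fraction (0.7 means 70%); the rest are compared to estimated preference values to evaluate recommender performance

evaluationPercentage – percentage of users to use in evaluation, also given as a fraction (1.0 means all users)

The returned score is the RMSE: lower is better, and 0 means the estimates match the held-out ratings exactly.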
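The trainingPercentage split and the RMSE score can be sketched in plain Java as follows. This is a simplified illustration, not Mahout's evaluator: here each rating is sent to the training set with probability trainingPercentage, a recommender (not shown) would then estimate the held-out ratings, and RMSE is computed over the estimates that exist. The class RmseSketch and its sample numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Simplified illustration of an RMSE evaluation; not Mahout's evaluator.
class RmseSketch {

    // Splits one user's ratings: each rating goes to training with probability
    // trainingPercentage (e.g. 0.7); the rest are held out.
    // Returns a two-element list: [training, heldOut].
    static List<Map<Long, Double>> split(Map<Long, Double> prefs, double trainingPercentage, Random random) {
        Map<Long, Double> training = new TreeMap<>();
        Map<Long, Double> heldOut = new TreeMap<>();
        for (Map.Entry<Long, Double> e : prefs.entrySet()) {
            if (random.nextDouble() < trainingPercentage) {
                training.put(e.getKey(), e.getValue());
            } else {
                heldOut.put(e.getKey(), e.getValue());
            }
        }
        List<Map<Long, Double>> parts = new ArrayList<>();
        parts.add(training);
        parts.add(heldOut);
        return parts;
    }

    // Root mean square error between estimated and actual held-out ratings.
    // Entries whose estimate is NaN (no estimate possible) are skipped.
    static double rmse(double[] estimated, double[] actual) {
        double sumSquares = 0;
        int count = 0;
        for (int i = 0; i < estimated.length; i++) {
            if (Double.isNaN(estimated[i])) {
                continue;
            }
            double diff = estimated[i] - actual[i];
            sumSquares += diff * diff;
            count++;
        }
        return count == 0 ? Double.NaN : Math.sqrt(sumSquares / count);
    }

    public static void main(String[] args) {
        // Hypothetical estimates vs. actual held-out ratings
        double[] estimated = {3.5, 4.0, Double.NaN, 2.0};
        double[] actual = {4.0, 4.0, 5.0, 1.0};
        System.out.println("RMSE: " + rmse(estimated, actual));
    }
}
```

A fixed Random seed makes the split repeatable, which serves the same purpose as calling RandomUtils.useTestSeed() before a Mahout evaluation.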


RecommenderIRStatsEvaluator statsEvaluator = new GenericRecommenderIRStatsEvaluator();
IRStatistics stats = statsEvaluator.evaluate(recommenderBuilder, null, model, null, 10, 4, 0.7); // evaluate precision recall at 10
System.out.println("Precision: " + stats.getPrecision());
System.out.println("Recall: " + stats.getRecall());
System.out.println("F1 Score: " + stats.getF1Measure());        

Parameters of the evaluate method of GenericRecommenderIRStatsEvaluator:

IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                      DataModelBuilder dataModelBuilder,
                      DataModel dataModel,
                      IDRescorer rescorer,
                      int at,
                      double relevanceThreshold,
                      double evaluationPercentage)
               throws TasteException

dataModel – dataset to test on

rescorer – if any, to use when computing recommendations

at – as in, “precision at 5”. The number of recommendations to consider when evaluating precision, etc.

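relevanceThreshold – items whose preference value is at least this value are considered “relevant” for the purposes of computations; with the threshold of 4 used in the examples, ratings of 4 and 5 are relevant

evaluationPercentage – percentage of users to use in evaluation (0.7, i.e. 70%, in the examples)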
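For a single user, precision at N is the fraction of the N recommended items that are relevant, recall is the fraction of the relevant items that were recommended, and F1 is their harmonic mean. Here is a plain-Java sketch of these formulas, with relevance decided by a threshold as described above. It leaves out how Mahout chooses the held-out relevant items and averages the scores over users; the class IrStatsSketch and its sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Precision, recall and F1 for a single user's top-N list.
// Illustrative only; it does not reproduce how Mahout picks held-out items
// or averages the scores over users.
class IrStatsSketch {

    // Items whose rating is at least relevanceThreshold count as relevant.
    static Set<Long> relevantItems(Map<Long, Double> ratings, double relevanceThreshold) {
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> e : ratings.entrySet()) {
            if (e.getValue() >= relevanceThreshold) {
                relevant.add(e.getKey());
            }
        }
        return relevant;
    }

    // Number of recommended items that are also relevant.
    private static int hits(List<Long> recommended, Set<Long> relevant) {
        Set<Long> common = new HashSet<>(recommended);
        common.retainAll(relevant);
        return common.size();
    }

    // Fraction of the recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? Double.NaN : (double) hits(recommended, relevant) / recommended.size();
    }

    // Fraction of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? Double.NaN : (double) hits(recommended, relevant) / relevant.size();
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Hypothetical top-4 list and held-out ratings for one user
        List<Long> top4 = Arrays.asList(101L, 102L, 103L, 104L);
        Map<Long, Double> heldOut = new HashMap<>();
        heldOut.put(102L, 5.0);
        heldOut.put(104L, 4.0);
        heldOut.put(105L, 4.0);
        heldOut.put(106L, 2.0);
        Set<Long> relevant = relevantItems(heldOut, 4.0); // {102, 104, 105}
        double p = precision(top4, relevant); // 2 of 4 -> 0.5
        double r = recall(top4, relevant);    // 2 of 3
        System.out.println(p + " " + r + " " + f1(p, r));
    }
}
```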

Here is the full source code for finding similar items. For each of the first 10 items in the data model, it prints the 5 most similar items along with their similarity values, one line per pair in the form itemId,similarItemId,similarity. The code is based on this video tutorial.


package com.chapagain.itemrecommend;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemRecommend {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		try {			
			DataModel dm = new FileDataModel(new File("data/dataset-recsys.csv"));
			
			ItemSimilarity sim = new LogLikelihoodSimilarity(dm);
			//TanimotoCoefficientSimilarity sim = new TanimotoCoefficientSimilarity(dm); 
			//ItemSimilarity sim = new TanimotoCoefficientSimilarity(dm);
			//ItemSimilarity sim = new PearsonCorrelationSimilarity(dm);
			
			GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(dm, sim);			
			
			int processed = 0;
			for (LongPrimitiveIterator items = dm.getItemIDs(); items.hasNext();) {
				long itemId = items.nextLong();
				// The 5 items most similar to itemId; getValue() is the similarity score
				List<RecommendedItem> recommendations = recommender.mostSimilarItems(itemId, 5);
				
				for (RecommendedItem recommendation : recommendations) {
					System.out.println(itemId + "," + recommendation.getItemID() + "," + recommendation.getValue());
				}
				
				processed++;
				if (processed >= 10) break; // similar items for the first 10 items only
			}
			
		} catch (IOException e) {
			System.out.println("There was an error.");
			e.printStackTrace();
		} catch (TasteException e) {
			System.out.println("There was a Taste Exception.");
			e.printStackTrace();
		}

	}

}

Here is the full source code for creating and evaluating an item-based recommender system. It is evaluated with root mean square error (RMSE), precision, recall, and F1 score, and it also recommends a number of items to a particular user.


package com.chapagain.itemrecommend;

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.common.RandomUtils;

public class ItemBasedRecommender {
    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed(); // fixed seed so the evaluation results are repeatable across runs
        DataModel model = new FileDataModel(new File("data/dataset-recsys.csv"));
 
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            public Recommender buildRecommender(DataModel model) throws TasteException {            	
            	//ItemSimilarity similarity = new EuclideanDistanceSimilarity(model);
            	ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);            	
            			
            	//Optimizer optimizer = new NonNegativeQuadraticOptimizer();
                return new GenericItemBasedRecommender(model, similarity);                
            }
        };
 
        // Recommend certain number of items for a particular user
        // Here, recommending 5 items to user_id = 9
        Recommender recommender = recommenderBuilder.buildRecommender(model);
        List<RecommendedItem> recommendations = recommender.recommend(9, 5); // recommend(user_id, number_of_items_to_recommend)
        for (RecommendedItem recommendedItem : recommendations) {
            System.out.println(recommendedItem);    
        }
        
        RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println("RMSE: " + score);

        RecommenderIRStatsEvaluator statsEvaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = statsEvaluator.evaluate(recommenderBuilder, null, model, null, 10, 4, 0.7); // evaluate precision and recall at 10

        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
        System.out.println("F1 Score: " + stats.getF1Measure());
    }
}

Here is the full source code for creating and evaluating a user-based recommender system. It is evaluated with root mean square error (RMSE), precision, recall, and F1 score, and it also recommends a number of items to a particular user.


package com.chapagain.itemrecommend;

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.SpearmanCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

public class UserBasedRecommender {
    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed(); // fixed seed so the evaluation results are repeatable across runs
        DataModel model = new FileDataModel(new File("data/dataset-recsys.csv"));
 
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            public Recommender buildRecommender(DataModel model) throws TasteException {            	
            	
            	UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            	//SpearmanCorrelationSimilarity similarity = new SpearmanCorrelationSimilarity(model);
            	
                // neighborhood size = 100
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
            	return new GenericUserBasedRecommender(model, neighborhood, similarity);            	
            }
        };
 
        // Recommend certain number of items for a particular user
        // Here, recommending 5 items to user_id = 9
        Recommender recommender = recommenderBuilder.buildRecommender(model);
        List<RecommendedItem> recommendations = recommender.recommend(9, 5); // recommend(user_id, number_of_items_to_recommend)
        for (RecommendedItem recommendedItem : recommendations) {
            System.out.println(recommendedItem);    
        }
        
        RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println("RMSE: " + score);

        RecommenderIRStatsEvaluator statsEvaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = statsEvaluator.evaluate(recommenderBuilder, null, model, null, 10, 4, 0.7); // evaluate precision and recall at 10

        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
        System.out.println("F1 Score: " + stats.getF1Measure());
    }
}

Apache Mahout Video Tutorials – I

Apache Mahout Video Tutorials – II

Hope this helps.
Thanks.