Apache Mahout is a project of the Apache Software Foundation. It helps build scalable machine learning applications and focuses primarily on collaborative filtering, classification, and clustering.
Here is a very nice video tutorial: Mahout Item Recommender Tutorial using Java and Eclipse. It thoroughly explains how to use the MovieLens dataset to create an item-based recommender system that recommends a given number of the most similar items for each item.
Here’s another useful tutorial, Creating a User-Based Recommender in 5 minutes, which also covers evaluating the system. Its video tutorial is Mahout User Recommender Tutorial with Eclipse and Maven.
In this article, I will show code to evaluate a recommender system using both user-based filtering and item-based filtering. I will also include the code to recommend similar items.
I will be fetching data from a CSV file named dataset-recsys.csv.
Download: dataset-recsys.csv
The CSV file consists of 3 columns: user_id, item_id, and rating. The item_id can be the ID of anything, such as hotels, movies, or books. The rating is the rating given by a user to an item. It ranges from 1 to 5, where 5 is the best rating and 1 is the worst.
Creating a Data Model
We will be using the FileDataModel class to create a data model.
DataModel model = new FileDataModel(new File("data/dataset-recsys.csv"));
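To make the file layout concrete, here is a minimal sketch in plain Java that parses user_id,item_id,rating lines into a nested map of user to (item to rating). It is only an illustration of the data shape and not how FileDataModel is implemented; the class name RatingsCsvSketch and its parse method are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
class RatingsCsvSketch {

    // Parses "user_id,item_id,rating" lines into user -> (item -> rating).
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> ratings = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(",");
            if (parts.length < 3) {
                throw new IllegalArgumentException(
                        "Expected user_id,item_id,rating but got: " + line);
            }
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            float rating = Float.parseFloat(parts[2].trim());
            ratings.computeIfAbsent(userId, id -> new LinkedHashMap<>())
                   .put(itemId, rating);
        }
        return ratings;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the same 3-column layout as the CSV file.
        List<String> sample = Arrays.asList("1,10,5", "1,11,3", "2,10,4");
        System.out.println(parse(sample));
    }
}
```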
Creating Similarity
How similarity is computed depends on the filtering approach we follow. For the user-based filtering approach, it’s about finding users similar to a particular user. For the item-based filtering approach, it’s about finding items similar to a particular item.
Mahout provides several similarity measures, such as LogLikelihoodSimilarity, TanimotoCoefficientSimilarity, PearsonCorrelationSimilarity, and EuclideanDistanceSimilarity.
User-based Similarity
// choose one of the following
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserSimilarity similarity = new LogLikelihoodSimilarity(model);
UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
// GenericUserSimilarity wraps another UserSimilarity
UserSimilarity similarity = new GenericUserSimilarity(new PearsonCorrelationSimilarity(model), model);
UserSimilarity similarity = new SpearmanCorrelationSimilarity(model);
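To give an intuition for what a Pearson-style user similarity measures, here is a minimal plain-Java sketch that computes the Pearson correlation between two users over the items both have rated. It is a simplified illustration, not Mahout's PearsonCorrelationSimilarity implementation; the class PearsonSketch and the sample ratings are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
class PearsonSketch {

    // Pearson correlation over co-rated items (item_id -> rating maps).
    // Returns NaN when fewer than two items are shared or a user has no variance.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // only items rated by both users count
            }
            double x = e.getValue();
            double y = other;
            n++;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        if (!(denom > 0.0)) {
            return Double.NaN; // undefined when either user rated everything the same
        }
        return cov / denom;
    }

    public static void main(String[] args) {
        Map<Long, Double> u1 = new HashMap<>();
        u1.put(1L, 1.0); u1.put(2L, 2.0); u1.put(3L, 3.0);
        Map<Long, Double> u2 = new HashMap<>();
        u2.put(1L, 2.0); u2.put(2L, 4.0); u2.put(3L, 5.0);
        System.out.println("Pearson similarity: " + pearson(u1, u2));
    }
}
```

A value close to 1 means the two users rate their shared items in the same direction, and a value close to -1 means they tend to disagree.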
Item-based Similarity
// choose one of the following
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
ItemSimilarity similarity = new EuclideanDistanceSimilarity(model);
// GenericItemSimilarity wraps another ItemSimilarity
ItemSimilarity similarity = new GenericItemSimilarity(new PearsonCorrelationSimilarity(model), model);
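For item-based similarity, a measure such as the Tanimoto coefficient looks only at which users interacted with each item and ignores the rating values. The following plain-Java sketch shows the basic idea: the number of users who rated both items divided by the number of users who rated either one. It is an illustration under that assumption, not Mahout's code, and TanimotoSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
class TanimotoSketch {

    // |A ∩ B| / |A ∪ B| for the sets of users who rated item A and item B.
    static double tanimoto(Set<Long> usersOfA, Set<Long> usersOfB) {
        if (usersOfA.isEmpty() && usersOfB.isEmpty()) {
            return Double.NaN; // undefined when nobody rated either item
        }
        int intersection = 0;
        for (Long user : usersOfA) {
            if (usersOfB.contains(user)) {
                intersection++;
            }
        }
        int union = usersOfA.size() + usersOfB.size() - intersection;
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        // Made-up user IDs for two items.
        Set<Long> itemA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> itemB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println("Tanimoto similarity: " + tanimoto(itemA, itemB));
    }
}
```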
Neighborhood Strategy
For user-based filtering, the NearestNUserNeighborhood class computes a neighborhood consisting of the nearest n users to a given user.
// here n = 100, so the neighborhood contains up to the 100 most similar users
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
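The idea of a nearest-n neighborhood can be sketched in plain Java as follows: given the similarity of every other user to the target user, keep the n users with the highest similarity and drop users whose similarity is undefined. This is a simplified illustration rather than Mahout's implementation; NearestNSketch and the sample values are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Mahout.
class NearestNSketch {

    // Returns the IDs of the n most similar users, most similar first.
    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        return similarityToTarget.entrySet().stream()
                .filter(e -> !Double.isNaN(e.getValue())) // skip undefined similarities
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up similarities of users 2..5 to a target user.
        Map<Long, Double> sims = new HashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, Double.NaN);
        sims.put(4L, 0.1);
        sims.put(5L, 0.5);
        System.out.println("Neighborhood: " + nearestN(sims, 2));
    }
}
```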
Building Recommender System
For user-based filtering:
new GenericUserBasedRecommender(model, neighborhood, similarity);
For item-based filtering:
new GenericItemBasedRecommender(model, similarity);
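As a rough intuition for how a user-based recommender can turn a neighborhood into a predicted rating, the sketch below estimates a user's rating for an item as the similarity-weighted average of the neighbors' ratings for that item. It is a simplified illustration and not GenericUserBasedRecommender's actual code. Ignoring neighbors with non-positive similarity is my own simplification, and WeightedEstimateSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
class WeightedEstimateSketch {

    // neighborSimilarity: neighbor user_id -> similarity to the target user
    // neighborRatingForItem: neighbor user_id -> that neighbor's rating of the item
    static double estimate(Map<Long, Double> neighborSimilarity,
                           Map<Long, Double> neighborRatingForItem) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> e : neighborSimilarity.entrySet()) {
            Double rating = neighborRatingForItem.get(e.getKey());
            double sim = e.getValue();
            // Simplification: skip neighbors who did not rate the item
            // or whose similarity is undefined or not positive.
            if (rating == null || Double.isNaN(sim) || sim <= 0.0) {
                continue;
            }
            weightedSum += sim * rating;
            totalSimilarity += sim;
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(2L, 0.5);
        sims.put(3L, 1.0);
        sims.put(4L, 0.8); // user 4 has not rated the item
        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(2L, 4.0);
        ratings.put(3L, 1.0);
        System.out.println("Estimated rating: " + estimate(sims, ratings));
    }
}
```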
Evaluation
For evaluation, we use the evaluate method of the RMSRecommenderEvaluator class to compute RMSE, and the evaluate method of the GenericRecommenderIRStatsEvaluator class to compute Precision, Recall, and F1 Score.
RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println("RMSE: " + score);
Details about parameters of the evaluate function from RMSRecommenderEvaluator class:
double evaluate(RecommenderBuilder recommenderBuilder,
DataModelBuilder dataModelBuilder,
DataModel dataModel,
double trainingPercentage,
double evaluationPercentage)
throws TasteException
trainingPercentage – percentage of each user’s preferences to use to produce recommendations; the rest are compared to estimated preference values to evaluate Recommender performance
evaluationPercentage – percentage of users to use in evaluation
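With trainingPercentage = 0.7, about 70% of each user's ratings are used for training and the remaining ratings are compared against the estimated values. The RMSE over those held-out ratings is the square root of the mean squared difference between predicted and actual ratings. Here is a minimal plain-Java sketch of that formula; it is not the RMSRecommenderEvaluator implementation, and RmseSketch and the sample numbers are made up.

```java
// Hypothetical helper, not part of Mahout.
class RmseSketch {

    // Root mean squared error between predicted and actual ratings.
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumSquaredError += diff * diff;
        }
        return Math.sqrt(sumSquaredError / predicted.length);
    }

    public static void main(String[] args) {
        // Made-up predicted and actual ratings for three held-out items.
        double[] predicted = {4.0, 3.0, 5.0};
        double[] actual = {5.0, 3.0, 3.0};
        System.out.println("RMSE: " + rmse(predicted, actual));
    }
}
```

A lower RMSE means the estimated ratings are closer to the actual ratings, and 0 would be a perfect match.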
RecommenderIRStatsEvaluator statsEvaluator = new GenericRecommenderIRStatsEvaluator();
IRStatistics stats = statsEvaluator.evaluate(recommenderBuilder, null, model, null, 10, 4, 0.7); // evaluate precision recall at 10
System.out.println("Precision: " + stats.getPrecision());
System.out.println("Recall: " + stats.getRecall());
System.out.println("F1 Score: " + stats.getF1Measure());
Details about parameters of the evaluate function from GenericRecommenderIRStatsEvaluator class:
IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
DataModelBuilder dataModelBuilder,
DataModel dataModel,
IDRescorer rescorer,
int at,
double relevanceThreshold,
double evaluationPercentage)
throws TasteException
dataModel – dataset to test on
rescorer – if any, to use when computing recommendations
at – as in, “precision at 5”. The number of recommendations to consider when evaluating precision, etc.
relevanceThreshold – items whose preference value is at least this value are considered “relevant” for the purposes of computations
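To illustrate how these parameters fit together, the sketch below treats a user's items rated at or above relevanceThreshold as relevant, then computes precision, recall, and F1 over the first "at" recommendations. It is a simplified plain-Java illustration, not the GenericRecommenderIRStatsEvaluator implementation. IrStatsSketch, its method names, and the choice to return 0 when a denominator is zero are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
class IrStatsSketch {

    // Items whose rating is at least the threshold count as relevant.
    static Set<Long> relevantItems(Map<Long, Double> ratings, double relevanceThreshold) {
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> e : ratings.entrySet()) {
            if (e.getValue() >= relevanceThreshold) {
                relevant.add(e.getKey());
            }
        }
        return relevant;
    }

    // Returns {precision, recall, f1} for the first `at` recommendations.
    static double[] precisionRecallF1(List<Long> recommended, Set<Long> relevant, int at) {
        int considered = Math.min(at, recommended.size());
        int hits = 0;
        for (int i = 0; i < considered; i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        double precision = considered == 0 ? 0.0 : (double) hits / considered;
        double recall = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
        double f1 = (precision + recall) == 0.0
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Made-up held-out ratings of one user: item_id -> rating.
        Map<Long, Double> heldOut = new HashMap<>();
        heldOut.put(2L, 5.0);
        heldOut.put(4L, 4.0);
        heldOut.put(7L, 2.0);
        heldOut.put(9L, 4.0);
        Set<Long> relevant = relevantItems(heldOut, 4.0);
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        double[] stats = precisionRecallF1(recommended, relevant, 5);
        System.out.println("Precision: " + stats[0]);
        System.out.println("Recall: " + stats[1]);
        System.out.println("F1 Score: " + stats[2]);
    }
}
```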
Here is the full source code for finding similar items. The code finds the 5 most similar items for each of the first 10 items, along with the similarity value. The code is based upon this video tutorial.
package com.chapagain.itemrecommend;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
public class ItemRecommend {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            DataModel dm = new FileDataModel(new File("data/dataset-recsys.csv"));

            ItemSimilarity sim = new LogLikelihoodSimilarity(dm);
            //ItemSimilarity sim = new TanimotoCoefficientSimilarity(dm);
            //ItemSimilarity sim = new PearsonCorrelationSimilarity(dm);

            GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(dm, sim);

            int x = 1;
            for (LongPrimitiveIterator items = dm.getItemIDs(); items.hasNext();) {
                long itemId = items.nextLong();
                // find the 5 items most similar to the current item
                List<RecommendedItem> recommendations = recommender.mostSimilarItems(itemId, 5);
                for (RecommendedItem recommendation : recommendations) {
                    // output format: item_id,similar_item_id,similarity_value
                    System.out.println(itemId + "," + recommendation.getItemID() + "," + recommendation.getValue());
                }
                x++;
                if (x > 10) break; // generate similar items for the first 10 items only
            }
        } catch (IOException e) {
            System.out.println("There was an error.");
            e.printStackTrace();
        } catch (TasteException e) {
            System.out.println("There was a Taste Exception.");
            e.printStackTrace();
        }
    }
}
Here is the full source code for creating and evaluating an item-based recommender system. The evaluation covers Root Mean Square Error (RMSE), Precision, Recall, and F1 Score. The code also recommends a given number of items to a particular user.
package com.chapagain.itemrecommend;
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.common.RandomUtils;
public class ItemBasedRecommender {

    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed(); // use a fixed random seed so that evaluation results are repeatable
        DataModel model = new FileDataModel(new File("data/dataset-recsys.csv"));

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                //ItemSimilarity similarity = new EuclideanDistanceSimilarity(model);
                ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
                return new GenericItemBasedRecommender(model, similarity);
            }
        };

        // Recommend a given number of items to a particular user
        // Here, recommending 5 items to user_id = 9
        Recommender recommender = recommenderBuilder.buildRecommender(model);
        List<RecommendedItem> recommendations = recommender.recommend(9, 5); // recommend(user_id, number_of_items_to_recommend)
        for (RecommendedItem recommendedItem : recommendations) {
            System.out.println(recommendedItem);
        }

        // RMSE: train on 70% of each user's ratings, evaluate on all users
        RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println("RMSE: " + score);

        // Precision and recall at 10, relevance threshold 4, using 70% of users
        RecommenderIRStatsEvaluator statsEvaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = statsEvaluator.evaluate(recommenderBuilder, null, model, null, 10, 4, 0.7);
        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
        System.out.println("F1 Score: " + stats.getF1Measure());
    }
}
Here is the full source code for creating and evaluating a user-based recommender system. The evaluation covers Root Mean Square Error (RMSE), Precision, Recall, and F1 Score. The code also recommends a given number of items to a particular user.
package com.chapagain.itemrecommend;
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.SpearmanCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;
public class UserBasedRecommender {

    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed(); // use a fixed random seed so that evaluation results are repeatable
        DataModel model = new FileDataModel(new File("data/dataset-recsys.csv"));

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                //UserSimilarity similarity = new SpearmanCorrelationSimilarity(model);
                // neighborhood size = 100
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Recommend a given number of items to a particular user
        // Here, recommending 5 items to user_id = 9
        Recommender recommender = recommenderBuilder.buildRecommender(model);
        List<RecommendedItem> recommendations = recommender.recommend(9, 5); // recommend(user_id, number_of_items_to_recommend)
        for (RecommendedItem recommendedItem : recommendations) {
            System.out.println(recommendedItem);
        }

        // RMSE: train on 70% of each user's ratings, evaluate on all users
        RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println("RMSE: " + score);

        // Precision and recall at 10, relevance threshold 4, using 70% of users
        RecommenderIRStatsEvaluator statsEvaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = statsEvaluator.evaluate(recommenderBuilder, null, model, null, 10, 4, 0.7);
        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
        System.out.println("F1 Score: " + stats.getF1Measure());
    }
}
Apache Mahout Video Tutorials – I
Apache Mahout Video Tutorials – II
Hope this helps.
Thanks.