7 March 2016

Recommender System using Python & Crab

Crab, also known as scikits.recommender, is a Python framework for building recommender engines, integrated with the world of scientific Python packages (numpy, scipy, matplotlib).

Currently, Crab supports two Recommender Algorithms: User-based Collaborative Filtering and Item-based Collaborative Filtering.

Here is a tutorial on Introduction to Recommender Systems with Crab. It briefly explains what recommendation is, what the Collaborative Filtering and Content-based Filtering algorithms are, and how Crab is used to build and evaluate a recommender system. It also shows an example of implementing the User-based Collaborative Filtering algorithm on a sample movie dataset.

Here, I will be showing code to evaluate a recommender system using both user-based filtering and item-based filtering.

I will be fetching data from a CSV file. The CSV file consists of three fields (user_id, item_id, and star_rating). item_id can be the ID of anything: hotels, movies, books, etc. star_rating is the rating a user gives to an item, ranging from 1 to 5, where 5 is the best rating and 1 is the worst.

For this article, I have created a dummy CSV file named dataset-recsys.csv containing three columns (user_id, item_id, and star_rating). You can download the CSV file from here.

Here is the code to create the CSV file:
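A minimal sketch of such a script (the generated IDs and ratings below are random placeholders, not the actual contents of the downloadable file):

import csv
import random

# Write a small dummy dataset: 50 users, each rating a random subset of 20 items.
random.seed(0)
with open('dataset-recsys.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['user_id', 'item_id', 'star_rating'])  # header row
    for user_id in range(1, 51):
        for item_id in random.sample(range(1, 21), 8):
            writer.writerow([user_id, item_id, random.randint(1, 5)])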

Creating a Python dictionary

During the process of building and evaluating a recommender system, we will first read data from the CSV file and create a Python dictionary.
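A sketch of this step, assuming the dataset-recsys.csv layout described above (a header row followed by user_id, item_id, star_rating records):

import csv

# Build a nested dictionary {user_id: {item_id: rating, ...}, ...},
# which is the structure Crab's data models expect.
data = {}
with open('dataset-recsys.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for user_id, item_id, star_rating in reader:
        data.setdefault(int(user_id), {})[int(item_id)] = float(star_rating)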

Create a Data Model

Now, the dictionary is used to create a data model. In our example, we will use the MatrixPreferenceDataModel.

However, a Boolean data model can also be used.
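A sketch, using the model classes that ship with scikits.crab (MatrixPreferenceDataModel for numeric ratings, MatrixBooleanPrefDataModel for the Boolean variant):

from scikits.crab.models import MatrixPreferenceDataModel

# Build the preference data model from the {user_id: {item_id: rating}} dictionary
model = MatrixPreferenceDataModel(data)

# Alternatively, for like/dislike data without numeric ratings:
# from scikits.crab.models import MatrixBooleanPrefDataModel
# model = MatrixBooleanPrefDataModel(data)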

Creating Similarity

For user-based filtering, we use the UserSimilarity class, and for item-based filtering, we use the ItemSimilarity class. Crab provides implementations of several similarity measures, such as euclidean_distances, cosine_distances, and jaccard_coefficient.

User-based Similarity
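A sketch, assuming euclidean_distances as the similarity measure:

from scikits.crab.similarities import UserSimilarity
from scikits.crab.metrics import euclidean_distances

# Pairwise similarity between users, computed over the data model
user_similarity = UserSimilarity(model, euclidean_distances)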

Item-based Similarity
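And the item-based counterpart, again with euclidean_distances:

from scikits.crab.similarities import ItemSimilarity
from scikits.crab.metrics import euclidean_distances

# Pairwise similarity between items, computed over the data model
item_similarity = ItemSimilarity(model, euclidean_distances)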

Neighborhood Strategy

For user-based filtering:
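A sketch using Crab's NearestNeighborsStrategy:

from scikits.crab.recommenders.knn.neighborhood_strategies import NearestNeighborsStrategy

# Strategy that selects the neighborhood of similar users for a target user
nhood_strategy = NearestNeighborsStrategy()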

For item-based filtering:
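A sketch using Crab's ItemsNeighborhoodStrategy:

from scikits.crab.recommenders.knn.item_strategies import ItemsNeighborhoodStrategy

# Strategy that selects candidate items for recommendation
item_strategy = ItemsNeighborhoodStrategy()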

Building Recommender System

For user-based filtering:
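A sketch combining the model, similarity, and neighborhood strategy built above (with_preference=True asks Crab to return estimated ratings along with the recommended items):

from scikits.crab.recommenders.knn import UserBasedRecommender

recsys = UserBasedRecommender(model, user_similarity,
                              neighborhood_strategy=nhood_strategy,
                              with_preference=True)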

For item-based filtering:
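And the item-based counterpart:

from scikits.crab.recommenders.knn import ItemBasedRecommender

recsys = ItemBasedRecommender(model, item_similarity,
                              items_selection_strategy=item_strategy,
                              with_preference=True)

With either recommender, recommendations for a user can then be requested in the same way, e.g. recsys.recommend(5) for user_id 5.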

Evaluation

For evaluation purposes, we use the evaluate function from the CfEvaluator class of the Crab framework. Currently, it supports the following evaluation metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Normalized Mean Absolute Error (NMAE), Precision, Recall, and F1 Score.

Here are the details of each parameter of the evaluate function:

metric: [None|’rmse’|’f1score’|’precision’|’recall’|’nmae’|’mae’]
If metric is None, all available metrics will be evaluated.
Otherwise, only the specified metric is evaluated and returned.

sampling_users: float or sampling, optional, default = None
If a float is passed, it is the percentage of users to be evaluated.
If sampling_users is None, all users are used in the evaluation.
Specific sampling objects can also be passed; see the
scikits.crab.metrics.sampling module for the list of possible objects.

sampling_ratings: float or sampling, optional, default = None
If a float is passed, it is the percentage of ratings to be evaluated.
If sampling_ratings is None, 70% of the ratings will be used in the
training set and 30% in the test set. Specific sampling objects can
also be passed; see the scikits.crab.metrics.sampling module for the
list of possible objects.

at: integer, optional, default = None
This is the 'at' value, as in 'precision at 5'. For example, this
would mean precision or recall evaluated by removing the top 5
preferences for a user and then finding the percentage of those 5
items included in the top 5 recommendations for that user. If at is
None, the top 3 elements are considered.

Returns: a dictionary containing the evaluation results
(NMAE, MAE, RMSE, Precision, Recall, F1-Score).

The recommender system can be evaluated separately for each individual metric shown above, or for all metrics at once.

Evaluating each metric separately
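A sketch, using CfEvaluator's evaluate method with a single metric name per call:

from scikits.crab.metrics.classes import CfEvaluator

evaluator = CfEvaluator()

# One metric per call; each call returns a dictionary with that metric's score
rmse = evaluator.evaluate(recsys, 'rmse')
mae = evaluator.evaluate(recsys, 'mae')
precision = evaluator.evaluate(recsys, 'precision')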

Evaluating all metrics at once

Here, we use 70% of the data as the training set and 30% as the test set, and evaluate precision and recall at N, keeping N = 10.
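A sketch, relying on the parameters documented above (metric=None evaluates all metrics, sampling_ratings=0.7 gives the 70/30 split, and at=10 evaluates precision and recall on the top-10 recommendations):

# Evaluate all metrics at once on a 70% train / 30% test split, at N = 10
all_scores = evaluator.evaluate(recsys, metric=None, sampling_ratings=0.7, at=10)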

Evaluating with Cross Validation

Here, we use 5-fold cross-validation to evaluate the RMSE metric.
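A sketch, assuming CfEvaluator's evaluate_on_split accepts the number of folds via a cv parameter:

# Cross-validated RMSE (cv is assumed here to set the number of folds)
rmse_cv = evaluator.evaluate_on_split(recsys, 'rmse', cv=5)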

Here is the full source code for building and evaluating a recommender system using the Item-based Collaborative Filtering technique:
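A consolidated sketch of the item-based pipeline described above, under the same assumptions as the individual snippets:

import csv

from scikits.crab.models import MatrixPreferenceDataModel
from scikits.crab.metrics import euclidean_distances
from scikits.crab.similarities import ItemSimilarity
from scikits.crab.recommenders.knn import ItemBasedRecommender
from scikits.crab.recommenders.knn.item_strategies import ItemsNeighborhoodStrategy
from scikits.crab.metrics.classes import CfEvaluator

# 1. Read the CSV file into a {user_id: {item_id: rating}} dictionary
data = {}
with open('dataset-recsys.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for user_id, item_id, star_rating in reader:
        data.setdefault(int(user_id), {})[int(item_id)] = float(star_rating)

# 2. Build the data model
model = MatrixPreferenceDataModel(data)

# 3. Item-item similarity
item_similarity = ItemSimilarity(model, euclidean_distances)

# 4. Candidate-item selection strategy
item_strategy = ItemsNeighborhoodStrategy()

# 5. Item-based recommender
recsys = ItemBasedRecommender(model, item_similarity,
                              items_selection_strategy=item_strategy,
                              with_preference=True)

# 6. Recommend items for user 5
print(recsys.recommend(5))

# 7. Evaluate: RMSE alone, then all metrics at once
evaluator = CfEvaluator()
print(evaluator.evaluate(recsys, 'rmse'))
print(evaluator.evaluate(recsys, at=10))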

Hope this helps.
Thanks.

  • Anonymous

    Very informative article about Collaborative Filtering Recommender, using Python.

  • Anonymous

    Nice post. But you haven't mentioned how to recommend a particular item to a user (for example, user_id = 5).

    recsys.recommend(5)

    Is the above syntax the same for both the User-Based Recommender and the Item-Based Recommender?

  • Soufiane Fadil

    Thanks for this nice tutorial.
    By doing if (i == 1): continue at line 6, you skip the first record in the dataset. Please remove it from the code.

    Many thanks