Recommender systems research has used several types of measures for evaluating the quality of a recommender system. These measures fall mainly into two classes:

*Statistical accuracy metrics* evaluate the accuracy of a system by comparing the numerical recommendation scores against the actual user ratings for the user-item pairs in the test dataset. *Mean Absolute Error* (MAE) between ratings and predictions is a widely used metric. MAE measures the deviation of recommendations from their true user-specified values. For each ratings-prediction pair $\langle p_i, q_i \rangle$, this metric treats the absolute error between them, i.e., $|p_i - q_i|$, equally. The MAE is computed by first summing the absolute errors of the $N$ corresponding ratings-prediction pairs and then computing the average. Formally,

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N}$$

The lower the MAE, the more accurately the recommendation engine predicts user ratings.
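A minimal Python sketch of the MAE computation described above: sum the absolute errors of the $N$ ratings-prediction pairs, then divide by $N$. The function name `mean_absolute_error` and the sample ratings are hypothetical, chosen only for illustration, and the ratings are assumed to be on a five-point scale.

```python
from typing import Sequence


def mean_absolute_error(ratings: Sequence[float], predictions: Sequence[float]) -> float:
    """Average of |p_i - q_i| over the N ratings-prediction pairs (hypothetical helper)."""
    if len(ratings) != len(predictions):
        raise ValueError("ratings and predictions must have the same length")
    if len(ratings) == 0:
        raise ValueError("need at least one ratings-prediction pair")
    # Sum the absolute errors; every pair contributes with equal weight.
    total = sum(abs(p - q) for p, q in zip(ratings, predictions))
    # Divide by N to obtain the average deviation.
    return total / len(ratings)


# Hypothetical test-set ratings and the engine's predictions (five-point scale).
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
print(mean_absolute_error(actual, predicted))
```

Here the absolute errors are 0.5, 0, 1 and 1, so the MAE is 2.5 / 4 = 0.625.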

*Root Mean Squared Error* (RMSE) and *Correlation* are also used as statistical accuracy metrics.

*Decision support accuracy metrics* evaluate how effective a prediction engine is at helping a user select high-quality items from the set of all items. These metrics treat the prediction process as a binary operation: either items are predicted (good) or not (bad). Under this view, whether an item has a prediction score of 1.5 or 2.5 on a five-point scale is irrelevant if the user only chooses to consider predictions of 4 or higher. The most commonly used decision support accuracy metrics are *reversal rate*, *weighted errors*, and *ROC sensitivity* [23].
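The binary view of decision support metrics can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure used in [23]: it assumes an item is actually "good" when the user's rating is 4 or higher and predicted "good" when its prediction score is 4 or higher, reports sensitivity and specificity at that cut-off, and summarizes ROC performance as the area under the ROC curve (the probability that a good item receives a higher prediction than a bad one, with ties counted as one half). All function names and sample data are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical helpers; the cut-off of 4 is an assumption mirroring the text's example.


def confusion_at_threshold(
    ratings: Sequence[float], predictions: Sequence[float], good_threshold: float = 4.0
) -> Tuple[int, int, int, int]:
    """Binarize ratings and predictions at good_threshold; return (tp, fp, fn, tn)."""
    tp = fp = fn = tn = 0
    for r, q in zip(ratings, predictions):
        actually_good = r >= good_threshold
        predicted_good = q >= good_threshold
        if predicted_good and actually_good:
            tp += 1
        elif predicted_good:
            fp += 1
        elif actually_good:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(
    ratings: Sequence[float], predictions: Sequence[float], good_threshold: float = 4.0
) -> Tuple[float, float]:
    """Fraction of good items predicted good, and of bad items predicted bad."""
    tp, fp, fn, tn = confusion_at_threshold(ratings, predictions, good_threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def roc_area(
    ratings: Sequence[float], predictions: Sequence[float], good_threshold: float = 4.0
) -> float:
    """Area under the ROC curve via pairwise comparison of good vs. bad items."""
    good = [q for r, q in zip(ratings, predictions) if r >= good_threshold]
    bad = [q for r, q in zip(ratings, predictions) if r < good_threshold]
    if not good or not bad:
        raise ValueError("need at least one good and one bad item")
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(good) * len(bad))


# Hypothetical ratings and predictions on a five-point scale.
ratings = [5, 4, 2, 1, 3]
predictions = [4.5, 3.5, 1.5, 2.5, 4.0]
print(confusion_at_threshold(ratings, predictions))
print(sensitivity_specificity(ratings, predictions))
print(roc_area(ratings, predictions))
```

Because both fall below the cut-off, swapping the predictions 1.5 and 2.5 for the two low-rated items leaves the confusion counts unchanged, which is the insensitivity to sub-threshold scores that the decision support view intends.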

We chose MAE as the evaluation metric for reporting our prediction experiments because it is the most commonly used and the easiest to interpret directly. In our previous experiments [23], we found that MAE and ROC sensitivity produce the same ordering of the different experimental schemes in terms of prediction quality.