In the recommenderlab R package, when predicting with UBCF on a binary rating matrix, why does the code take the crossprod of the kNN (k nearest neighbors) similarities and the new input binary ratings for the items? I'm writing a study and I'm wondering why this is a good approach.
The predictions were very good for market basket recommendation, but I'm confused about why crossprod is useful here.
As described here, in UBCF the missing ratings are predicted as aggregates of the ratings of similar (neighboring) users.
Once the users in the neighborhood are found, their ratings are aggregated to form the predicted rating for the active user u_a (as shown below).
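Written as a formula, the aggregation used in the code further down in this answer is a similarity-weighted average over the neighbors that actually rated the item. The symbols are my own notation, not taken from the package: N_j(a) is the set of neighbors of u_a who rated item j, s_{ai} is the weight of neighbor i, and r_{ij} is that neighbor's rating of item j.

```latex
\hat{r}_{aj} \;=\;
  \frac{\sum_{i \in \mathcal{N}_j(a)} s_{ai}\, r_{ij}}
       {\sum_{i \in \mathcal{N}_j(a)} s_{ai}}
```

Setting every s_{ai} to 1 reduces this to the simple average of the available neighbor ratings.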
Now, crossprod() is used to compute the weighted average (it also yields the simple average when all weights are equal). Given matrices x and y, crossprod(x, y) computes the matrix cross-product t(x) %*% y, while the related tcrossprod(x, y) computes x %*% t(y) (from the documentation).
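A quick way to convince yourself of the equivalence is to compare both forms on a toy input. The matrices x and y below are made up for illustration only.

```r
# Two small matrices with the same number of rows, as crossprod() requires.
x <- matrix(1:6, nrow = 3)          # 3 x 2, columns (1,2,3) and (4,5,6)
y <- matrix(c(1, 0, 2), ncol = 1)   # 3 x 1, think of it as a weight vector

cp  <- crossprod(x, y)   # cross-product computed in one call
ref <- t(x) %*% y        # explicit transpose followed by matrix product

# Each row of the result is the weighted sum of one column of x:
# 1*1 + 2*0 + 3*2 = 7 and 4*1 + 5*0 + 6*2 = 16
print(cp)
print(all.equal(cp, ref))
```

So if the rows of x are neighbors and the columns are items, crossprod(x, weights) gives one weighted sum per item, which is exactly the numerator of a weighted average.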
Take the following example from the documentation, as shown in the next figure:
Here, u_1, u_2 and u_4 are the neighboring users of the active user u_a, whose ratings for 4 items are missing. Let's see how crossprod() can be used to compute the missing ratings as simple and weighted averages of the neighbors' ratings, respectively (using code similar to the original implementation).
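# ratings of the 3 neighbors (rows) for the 8 items (columns); NA = not rated
r_neighbors <- matrix(c(NA, 4.0, 4.0, 2.0, 1.0, 2.0, NA, NA,
                        3.0, NA, NA, NA, 5.0, 1.0, NA, NA,
                        4.0, NA, NA, 2.0, 1.0, 1.0, 2.0, 4.0), nrow=3, byrow=TRUE)
# known ratings of the active user; NA entries are the ones to predict
u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
# simple average of neighbor ratings, with all weights equal to 1
s_uk <- matrix(rep(1, 3), ncol=1)
# numerator: per-item weighted sum of ratings (NA counted as 0)
# denominator: per-item sum of weights of the neighbors who rated the item
r_a <- as(crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk), "matrix") /
       as(crossprod(!is.na(r_neighbors), s_uk), "matrix")
# fill in only the missing ratings of the active user
u_a[is.na(u_a)] <- r_a[is.na(u_a)]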
u_a
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 3.5 4 4 3 2.333333 1 2 5
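As a sanity check on the two crossprod() calls above, the equal-weight case should coincide with a plain column mean that skips NAs. The sketch below rebuilds the same neighbor matrix and compares the two; the variable names num, den and col_mean are mine, and 1 - is.na(...) is used as an explicit numeric indicator in place of !is.na(...).

```r
r_neighbors <- matrix(c(NA, 4, 4, 2, 1, 2, NA, NA,
                        3, NA, NA, NA, 5, 1, NA, NA,
                        4, NA, NA, 2, 1, 1, 2, 4), nrow = 3, byrow = TRUE)
s_uk <- matrix(1, nrow = 3, ncol = 1)   # equal weights

# Numerator: per-item sum of the available ratings (NA replaced by 0).
num <- crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk)
# Denominator: per-item number of neighbors who rated the item
# (with equal weights, the sum of weights is just a count).
den <- crossprod(1 - is.na(r_neighbors), s_uk)

r_a      <- drop(num / den)
col_mean <- colMeans(r_neighbors, na.rm = TRUE)

print(all.equal(r_a, col_mean))
```

The point of the crossprod() form is that swapping s_uk for real similarity values turns the same two lines into a weighted average, which colMeans() cannot do.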
The ratings above match exactly the ones computed in the figure. You can also reproduce the same predictions for the new user u_a with recommenderlab's predict(), as shown below:
library(recommenderlab)
# reuses r_neighbors from above as the training data
u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
# unweighted UBCF over all 3 neighbors, without normalization
rec <- Recommender(as(r_neighbors, "realRatingMatrix"), method = "UBCF",
                   param=list(nn=3, normalize=NULL, weighted=FALSE))
pred <- as(predict(rec, newdata=as(u_a, "realRatingMatrix"), type="ratings"), "matrix")
u_a[is.na(u_a)] <- pred[is.na(u_a)]
u_a
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 3.5 4 4 3 2.333333 1 2 5
If you want to use weights based on user-user similarity, the same code does the job, this time with similarity weights in s_uk:
u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
s_uk <- matrix(c(0.3, 1.0, 0.3), ncol=1)
r_a <- as(crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk), "matrix") /
       as(crossprod(!is.na(r_neighbors), s_uk), "matrix")
u_a[is.na(u_a)] <- r_a[is.na(u_a)]
u_a
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 3.230769 4 4 3 3.5 1 2 5
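For the binary case asked about in the question, the same idea can be sketched as follows. This is an illustration only, not the package's actual code: the 0/1 matrix b_neighbors and the weights are made up, and dividing by the total similarity is my assumption about a reasonable normalization.

```r
# Hypothetical 0/1 purchase matrix: 3 neighbors (rows) x 4 items (columns).
b_neighbors <- matrix(c(1, 0, 1, 0,
                        1, 1, 0, 0,
                        1, 0, 1, 0), nrow = 3, byrow = TRUE)
# Made-up similarities between the active user and each neighbor.
s_uk <- matrix(c(0.3, 1.0, 0.3), ncol = 1)

# Each entry sums the similarities of the neighbors who have the item;
# dividing by the total similarity scales the score to [0, 1].
scores <- drop(crossprod(b_neighbors, s_uk)) / sum(s_uk)

# Items ordered from strongest to weakest recommendation.
ranking <- order(scores, decreasing = TRUE)
print(scores)
print(ranking)
```

With binary data there are no NAs to mask out, so each crossprod() entry is simply a similarity-weighted vote for an item: items held by many similar neighbors score highest. That is a plausible reason why the approach does well on market basket data.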