Tags: r, recommendation-engine, recommenderlab

Different results from base R cor() function than similarity() function in recommenderlab package?


Can anyone explain why these two correlation matrices return different results?

library(recommenderlab)
data(MovieLense)
cor_mat <- as( similarity(MovieLense, method = "pearson", which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print( cor_mat_base[1:5, 1:5] )

Solution

  • recommenderlab's dissimilarity() with method = "pearson" is computed as 1 - pmax(cor(), 0), where cor() is the base R function. It is also important to specify the method explicitly in both calls so that they use the same one:

    library("recommenderlab")
    data(MovieLense)
    cor_mat <- as( dissimilarity(MovieLense, method = "pearson", 
                              which = "items"), "matrix" )
    cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), method = "pearson",
                                          use = "pairwise.complete.obs") )
    print( cor_mat[1:5, 1:5] )
    print(1- cor_mat_base[1:5, 1:5] )
    
    > print( cor_mat[1:5, 1:5] )
                      Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
    Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
    GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
    Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.0000000      0.8153877
    Get Shorty (1995)        0.8968647        0.7554443         1.0000000         0.0000000      1.0000000
    Copycat (1995)           0.6135248        0.7824406         0.8153877         1.0000000      0.0000000
    > print(1- cor_mat_base[1:5, 1:5] )
                      Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
    Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
    GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
    Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.2019687      0.8153877
    Get Shorty (1995)        0.8968647        0.7554443         1.2019687         0.0000000      1.2373503
    Copycat (1995)           0.6135248        0.7824406         0.8153877         1.2373503      0.0000000
    

    To understand this fully, check the documentation of both packages :).

    OP edit: Some entries still differ between dissimilarity() and 1 - cor(): in the second output a few values exceed 1 (for example 1.2019687 where dissimilarity() gives 1.0000000). This is because pmax(cor(), 0) floors negative correlations at 0, so dissimilarity() never exceeds 1, whereas 1 - cor() exceeds 1 whenever the correlation is negative (here cor is about -0.2019687). Separately, the cor() documentation (https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/cor) only guarantees the range of its result in one case:

    For r <- cor(*, use = "all.obs"), it is now guaranteed that all(abs(r) <= 1).

    The quoted passage does not say whether the same bound holds for use = "pairwise.complete.obs", so that should be checked.
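
    To see the effect of the 1 - pmax(cor(), 0) formula without loading recommenderlab, here is a minimal base R sketch. The `ratings` matrix is made-up toy data (an assumption for illustration, not taken from MovieLense), and `plain` and `floored` are just illustrative names. It compares 1 - cor() with 1 - pmax(cor(), 0) and also checks whether the pairwise correlations stay within [-1, 1] on this toy input.

    ```r
    # Toy ratings (hypothetical data): rows are users, columns are items,
    # NA marks a missing rating.
    ratings <- matrix(
      c(5,  4,  1,
        4, NA,  2,
        1,  2,  5,
        2,  1, NA),
      nrow = 4, byrow = TRUE,
      dimnames = list(paste0("u", 1:4), c("A", "B", "C"))
    )

    # Pairwise Pearson correlation between items, as in the answer above.
    r <- cor(ratings, method = "pearson", use = "pairwise.complete.obs")

    plain   <- 1 - r            # exceeds 1 wherever r is negative
    floored <- 1 - pmax(r, 0)   # negative correlations are set to 0, so the result is at most 1

    print(round(plain, 3))
    print(round(floored, 3))

    # Check (on this toy input only) that the correlations lie in [-1, 1].
    print(all(abs(r) <= 1 + 1e-12, na.rm = TRUE))
    ```

    The two matrices agree wherever the correlation is non-negative and differ only where it is negative, which matches the pattern in the MovieLense output above. The same range check could be run on cor_mat_base to evaluate the real data.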