Can anyone explain why these two correlation matrices return different results?
library(recommenderlab)
data(MovieLense)
cor_mat <- as( similarity(MovieLense, method = "pearson", which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print( cor_mat_base[1:5, 1:5] )
In recommenderlab, the Pearson dissimilarity() is computed as 1 - pmax(cor(), 0), where cor() is the
base R function. It is also important to specify the method explicitly
in both calls so that they use the same one:
library("recommenderlab")
data(MovieLense)
cor_mat <- as( dissimilarity(MovieLense, method = "pearson", which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), method = "pearson", use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print( 1 - cor_mat_base[1:5, 1:5] )
> print( cor_mat[1:5, 1:5] )
                  Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.0000000      0.8153877
Get Shorty (1995)        0.8968647        0.7554443         1.0000000         0.0000000      1.0000000
Copycat (1995)           0.6135248        0.7824406         0.8153877         1.0000000      0.0000000
> print( 1 - cor_mat_base[1:5, 1:5] )
                  Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.2019687      0.8153877
Get Shorty (1995)        0.8968647        0.7554443         1.2019687         0.0000000      1.2373503
Copycat (1995)           0.6135248        0.7824406         0.8153877         1.2373503      0.0000000
To understand it well, check the details of both packages :).
OP/ EDIT:
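It is important to point out that a few values still differ between dissimilarity() and 1 - cor(): wherever they disagree, 1 - cor() is greater than 1.
This is because dissimilarity() floors the correlation at 0 (i.e., negative correlations are treated as 0), so its result never exceeds 1, whereas cor() can be negative, which pushes 1 - cor() above 1.
For example, the Four Rooms / Get Shorty entry is 1.2019687 in 1 - cor(), i.e. a correlation of -0.2019687, but 1.0000000 in dissimilarity().
As for the range of cor() itself, the documentation (https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/cor) only specifies that
For r <- cor(*, use = "all.obs"), it is now guaranteed that all(abs(r) <= 1).
Whether the same bound holds with use = "pairwise.complete.obs" should be checked.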
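A minimal base-R sketch of the effect of that floor, assuming dissimilarity() really behaves as 1 - pmax(cor(), 0) as stated in the answer. The ratings matrix and the names item_a, item_b, item_c are made up for illustration and are not MovieLense data:

```r
# Toy ratings (hypothetical data): rows are users, columns are items,
# NA marks a missing rating.
ratings <- matrix(
  c(5, 1, 4,
    4, 2, NA,
    2, 4, 2,
    1, 5, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("item_a", "item_b", "item_c"))
)

# Pairwise Pearson correlation between items, ignoring missing ratings.
r <- cor(ratings, method = "pearson", use = "pairwise.complete.obs")

# Plain 1 - cor(): exceeds 1 wherever the correlation is negative.
plain_dissim <- 1 - r

# Floored version (the assumed behaviour of dissimilarity()):
# negative correlations are replaced by 0, so the result is capped at 1.
floored_dissim <- 1 - pmax(r, 0)

print(round(plain_dissim, 3))
print(round(floored_dissim, 3))
```

Here item_a and item_b are perfectly anti-correlated, so the two versions disagree only in that cell, which is the same pattern as the Get Shorty entries in the output above.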