I was getting NA for every entry except the diagonal when computing a correlation matrix with R's cor(), even though I asked for NAs to be removed pairwise. When I removed the NAs explicitly, I got the desired result. Have I misunderstood the arguments?
I tried
> c <- Result_table[,.SD,.SDcols=c("organic_account_countsession", "organic_account_countsession")]
> b <- cor(c, use="pairwise.complete.obs")
                             organic_account_countsession organic_account_countsession
organic_account_countsession                            1                           NA
organic_account_countsession                           NA                            1
Also tried this
> b <- cor(c, na.rm=TRUE)
Still got the same result.
Only when I do
c <- c[complete.cases(c)]
b <- cor(c)
                             organic_account_countsession organic_account_countsession
organic_account_countsession                            1                            1
organic_account_countsession                            1                            1
I get all 1s. I expect to get all 1s as I am finding the correlation of a variable with itself.
(Note: the variable has nonzero variance, so the NA is not caused by zero variance.)
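For comparison, here is a minimal sketch of what I expected, on made-up data. The vector `v` and the data frame `toy` are hypothetical stand-ins for my real column, and I use a plain data.frame rather than a data.table, so the row subsetting needs a trailing comma. The call is written as `stats::cor` so that it is unambiguous which function runs.

```r
# Hypothetical toy column with missing values, duplicated as in the question
v <- c(3, NA, 7, 1, NA, 9)
toy <- data.frame(a = v, b = v)

# Pairwise deletion, calling the stats implementation explicitly
r_pairwise <- stats::cor(toy, use = "pairwise.complete.obs")

# Manual deletion of incomplete rows first, then a plain correlation
toy_complete <- toy[complete.cases(toy), ]
r_manual <- stats::cor(toy_complete)

# The two routes should agree when stats::cor is the function being called
print(all.equal(r_pairwise, r_manual))
```

If the two results differ in a real session, that suggests the bare name cor is not resolving to the stats version.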
This turned out to be a different error altogether on my part.
I had imported the h2o package along with the stats package. Turns out there is a cor() function in h2o as well, with different behavior.
cor <- stats::cor
solved the problem.
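A quick way to check for this kind of masking is sketched below. It uses only base R and utils functions (`find`, `environment`, `environmentName`); the vectors `x` and `y` are made-up illustration data, not from my table. In a session where another package masks cor, I would expect the second call to report that package's name instead of stats.

```r
# List every attached package that defines `cor`, in search-path order
print(utils::find("cor"))

# Name of the namespace that the bare name `cor` resolves to right now
print(environmentName(environment(cor)))

# Hypothetical data with missing values in different positions
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)

# Prefixing the namespace sidesteps masking without reassigning `cor`
r <- stats::cor(cbind(x, y), use = "pairwise.complete.obs")
print(r)
```

Writing `stats::cor(...)` at each call site is an alternative to `cor <- stats::cor` that leaves the other package's function available under its bare name.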