Consider the following survey data:
library("dplyr")  # provides the %>% pipe used below
data <- replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)) %>%
  as.data.frame()
V1:V9 are variables where 1 = "Good", 2 = "Okay", 3 = "Not Good" and 4 = "Don't know", while V10 is an ordinal variable where 1 = "Good", 2 = "Not good", 3 = "Don't know" and 4 = "Don't want to answer".
I am interested in calculating a simple correlation matrix using cor() on these variables. However, I only want to calculate it between the values that actually mean something, that is, 1, 2, 3 for V1:V9 and 1, 2 for V10.
In other words, I want casewise deletion of any value > 3 for V1:V9, and likewise of any value > 2 for V10, within the cor() function itself.
Is there something similar to the use argument for this?
The only way I have managed to solve this is by recoding these values to NA:
library("dplyr")
# Recode the non-substantive codes to NA (4 for V1:V9; 3 and 4 for V10)
data_test <- data %>%
  mutate(across(-V10, ~ ifelse(.x > 3, NA, .x))) %>%
  mutate(V10 = ifelse(V10 > 2, NA, V10))
cor(data_test, use = "complete.obs")
But is there a better way, one that does not rely on modifying the data?
PS. There are, of course, more adequate ways of calculating correlation between ordinal variables.
The answer to this question was simpler than I thought.
As @zx8754 points out, you should be careful when choosing a correlation method for categorical variables.
Anyway, you just change the use argument to use = "pairwise.complete.obs" in cor().
However, you still need to recode the non-substantive values to NA first (4 for V1:V9, and 3 and 4 for V10).
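To make the difference between the two use settings concrete, here is a minimal base R sketch. It is only an illustration under stated assumptions: the seed and the names recoded, v1_to_v9, cor_pairwise, cor_casewise, cor_spearman and n_complete are made up for this example and are not from the question. The recoding is done on a copy, so the original data stays untouched.

```r
set.seed(1)  # assumption: a seed, only so the sketch is reproducible
data <- as.data.frame(replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)))

# Recode the non-substantive codes to NA on a copy; `data` itself is not modified.
recoded <- data
v1_to_v9 <- paste0("V", 1:9)
recoded[v1_to_v9] <- lapply(recoded[v1_to_v9], function(x) replace(x, x > 3, NA))
recoded$V10 <- replace(recoded$V10, recoded$V10 > 2, NA)

# Pairwise deletion: each coefficient uses every row where both variables are observed.
cor_pairwise <- cor(recoded, use = "pairwise.complete.obs")

# Casewise deletion, for comparison: only rows with no NA in any column are used.
cor_casewise <- cor(recoded, use = "complete.obs")
n_complete <- sum(complete.cases(recoded))

# A rank-based alternative for ordinal data, via the method argument of cor().
cor_spearman <- cor(recoded, use = "pairwise.complete.obs", method = "spearman")
```

With uniformly random codes like these, only about 0.75^9 × 0.5, or roughly 4%, of the rows are expected to be complete across all ten variables (n_complete shows the actual count), so "complete.obs" throws away most of the data, whereas "pairwise.complete.obs" keeps every row in which the two variables being compared are both observed.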