
Casewise deletion of values inside cor() in R


Consider the following survey data:

set.seed(123)  # any seed; only for reproducibility
data_test <- as.data.frame(replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)))

V1:V9 are variables coded 1 = "Good", 2 = "Okay", 3 = "Not good" and 4 = "Don't know", while V10 is an ordinal variable coded 1 = "Good", 2 = "Not good", 3 = "Don't know" and 4 = "Don't want to answer".

I am interested in calculating a simple correlation matrix with cor() on these variables. However, I only want it to use the values that actually mean something: 1, 2 and 3 for V1:V9, and 1 and 2 for V10.

In other words, I want cor() itself to delete casewise any value > 3 in V1:V9 and any value > 2 in V10.

Is there something like the use argument that does this?

The only way I have managed to do this is by recoding these values to NA first:

library("dplyr")
data_test <- data_test %>%
  mutate(across(-V10, ~ ifelse(.x > 3, NA, .x))) %>%  # V1:V9: code 4 -> NA
  mutate(V10 = ifelse(V10 > 2, NA, V10))              # V10: codes 3 and 4 -> NA

cor(data_test, use = "complete.obs")
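To see how much data use = "complete.obs" discards in this setup, a quick base-R check may help. It assumes the simulation from the question (each column drawn independently and uniformly from 1 to 4; the seed is arbitrary). After recoding, each of V1:V9 is observed with probability 3/4 and V10 with probability 1/2, so a row is fully observed with probability 0.75^9 * 0.5, roughly 0.0375, which means only about 38 of the 1000 rows would enter every correlation.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
data_test <- as.data.frame(replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)))

# Recode non-substantive codes to NA in base R:
# 4 -> NA in V1:V9, 3 and 4 -> NA in V10
data_test[1:9][data_test[1:9] > 3] <- NA
data_test$V10[data_test$V10 > 2] <- NA

# Rows that cor(data_test, use = "complete.obs") actually uses
n_complete <- sum(complete.cases(data_test))
n_complete

# Expected number of complete rows under this simulation
0.75^9 * 0.5 * 1000
```

With real survey data the missingness is rarely independent across items, so the actual loss will differ, but listwise deletion across many items can shrink the sample quickly.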

But is there a better way that does not rely on modifying the data?
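If the main goal is to leave the original data frame untouched, one option is to do the recoding inside a small helper. This is a hypothetical sketch: cor_valid() and its max_valid argument are illustrative names, not part of base R or dplyr. It still sets the non-substantive codes to NA, but only on the function's local copy; R's copy-on-modify semantics mean the caller's data frame keeps its original values.

```r
# cor_valid() is a hypothetical helper, not a base R or dplyr function.
# max_valid: named vector giving, per column, the largest meaningful code;
# anything above it is treated as missing, but only inside this function.
cor_valid <- function(df, max_valid, use = "pairwise.complete.obs", ...) {
  stopifnot(all(names(df) %in% names(max_valid)))
  for (v in names(df)) {
    x <- df[[v]]
    x[which(x > max_valid[[v]])] <- NA  # which() skips values that are already NA
    df[[v]] <- x                        # modifies the local copy only
  }
  cor(df, use = use, ...)  # extra arguments (e.g. method) are passed on to cor()
}

# Usage on data simulated as in the question
set.seed(1)
survey <- as.data.frame(replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)))
limits <- c(setNames(rep(3, 9), paste0("V", 1:9)), V10 = 2)

r <- cor_valid(survey, limits)
max(survey)  # still 4: the original data frame was not modified
```

The default use = "pairwise.complete.obs" follows the accepted answer; passing use = "complete.obs" instead gives listwise deletion.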

P.S. There are, of course, more appropriate ways of calculating correlations between ordinal variables.
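For reference, cor() already offers two rank-based coefficients through its method argument, "spearman" and "kendall". They use only the ordering of the codes instead of treating 1, 2, 3 as equally spaced numbers; heavily tied ordinal items are also often analysed with other measures, such as polychoric correlations. A minimal sketch on made-up vectors:

```r
# Two made-up ordinal items coded 1-3; NA marks a non-substantive answer
x <- c(1, 2, 2, 3, NA, 1, 3, 2)
y <- c(1, 1, 2, 3, 2, NA, 3, 3)

# Spearman's rho: Pearson correlation of the ranks (ties get average ranks)
rho <- cor(x, y, method = "spearman", use = "complete.obs")

# Kendall's tau, based on concordant and discordant pairs
tau <- cor(x, y, method = "kendall", use = "complete.obs")

c(spearman = rho, kendall = tau)
```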


Solution

  • The answer to this question was simpler than I thought.

    As @zx8754 points out, you should be careful when choosing a correlation method for categorical variables.

    Anyway, you just set use = "pairwise.complete.obs" in cor(), so that each correlation is computed from the rows where both of its variables are observed.

    However, you still need to recode the non-substantive codes (4 in V1:V9; 3 and 4 in V10) to NA first.
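A small made-up example shows what the switch changes. With use = "pairwise.complete.obs", each entry of the matrix is computed from the rows in which both of its variables are observed, so an NA in V10 no longer removes that row from the V1-V2 correlation. Because different entries can rest on different subsets of rows, the resulting matrix is not guaranteed to be positive semi-definite. The data frame toy and the object names below are purely illustrative.

```r
# Made-up data that has already been recoded: NA = non-substantive answer
toy <- data.frame(
  V1  = c(1, 2, 3, NA, 2, 1, 3, 2),
  V2  = c(2, NA, 3, 1, 2, 3, 3, 1),
  V10 = c(1, 2, NA, 2, NA, 1, 2, 1)
)

listwise <- cor(toy, use = "complete.obs")           # only the 4 fully observed rows
pairwise <- cor(toy, use = "pairwise.complete.obs")  # rows chosen separately per pair

# V1-V2 under pairwise deletion: the 6 rows where V1 and V2 are both present,
# including rows 3 and 5, where V10 is missing
both <- complete.cases(toy$V1, toy$V2)
sum(both)                          # 6
cor(toy$V1[both], toy$V2[both])    # same value as pairwise["V1", "V2"]
```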