r correlation mixed

How to find correlations in mixed data (continuous, categorical, and date types) in R


I have a data frame with columns of different types:

a <- data.frame(
  x = c("a","b","b","c","c","c","d","d","e","f"),
  y = c(1,2,2,2,3,1,4,7,10,2),
  m = c("a","d","ab","ac","ac","vc","ed","ed","e","df"),
  n = c(2,1,5,3,3,2,8,10,10,1)
)

The real data is more complex than this and will probably include dates as well. This is also an unsupervised problem, so there are no class labels, and I do not think I can use methods such as ANOVA. How can I find the correlation between each pair of columns?

P.S. I found a function called mixed.cor in the psych package, but I cannot work out how to use it.

Furthermore, correlation only captures linear relationships. What function should I use if I want to know the importance of each column?


Solution

  • The correlation measure most people use for numeric variables (Pearson correlation) is not defined for categorical data. To measure the association between a numeric variable and a categorical variable, you can use ANOVA; the categorical column itself serves as the grouping factor, so no separate class label is needed. To measure the association between two categorical variables, you can use a chi-squared test. If a categorical variable is ordered (e.g. low, medium, high), you can use Spearman rank correlation.
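
As a rough sketch of how these suggestions might look on the example data from the question, the base R code below applies each method to one pair of columns. Several things here are assumptions and not part of the answer: the `priority` column is invented to illustrate the ordered case, and eta-squared and Cramér's V are added as common effect-size summaries for ANOVA and the chi-squared test, since the tests alone report significance, not strength. With only 10 rows the numbers are illustrative at best.

```r
# Example data from the question; text columns are stored as factors.
a <- data.frame(
  x = c("a","b","b","c","c","c","d","d","e","f"),
  y = c(1,2,2,2,3,1,4,7,10,2),
  m = c("a","d","ab","ac","ac","vc","ed","ed","e","df"),
  n = c(2,1,5,3,3,2,8,10,10,1),
  stringsAsFactors = TRUE
)

# Numeric vs. numeric: Pearson correlation.
r_yn <- cor(a$y, a$n)

# Numeric vs. categorical: one-way ANOVA with x as the grouping factor.
# Eta-squared (between-group SS / total SS) is an assumed effect-size summary.
fit <- aov(y ~ x, data = a)
ss <- summary(fit)[[1]][["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)

# Categorical vs. categorical: chi-squared test on the contingency table.
# Cell counts are tiny here, so the p-value is simulated rather than
# taken from the chi-squared approximation.
set.seed(1)
tab <- table(a$x, a$m)
chi <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
# Cramer's V rescales the statistic to [0, 1] (assumed addition).
cramers_v <- sqrt(unname(chi$statistic) / (sum(tab) * (min(dim(tab)) - 1)))

# Ordered categorical vs. numeric: Spearman rank correlation.
# `priority` is a hypothetical column; the example data has no ordered variable.
priority <- factor(
  c("low","low","medium","low","medium","low","medium","high","high","low"),
  levels = c("low","medium","high"), ordered = TRUE
)
rho <- cor(as.integer(priority), a$y, method = "spearman")

print(c(pearson = r_yn, eta_sq = eta_sq, cramers_v = cramers_v, spearman = rho))
```

To cover every pair of columns, one option is to loop over `combn(names(a), 2)` and pick the method according to the two column classes.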