I am trying to draw a corrplot for my data, whose variables are a mix of binary, continuous and categorical. However, when I run this code, it keeps giving me errors. When I pass my data frame, called df2, the error is: Error in corrplot(df2) : The matrix is not in [-1, 1]!. How can I solve this?
When I compute the correlation, some variables return only NAs, even though they hold numeric and integer values.
Attached is an example of my data variables, where hh_code is the column used for identification.
How can I get the correlation between variables for my data in R? Thanks!
If you have a table in which some columns are numeric and others are categorical, you can use the function GGally::ggpairs to get an overview of the associations between these variables:
```r
library(GGally)
#> Loading required package: ggplot2
#> Registered S3 method overwritten by 'GGally':
#>   method from
#>   +.gg   ggplot2
data <- ggplot2::mpg[c(1,3,4,7,8)]
ggpairs(data)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
Created on 2022-05-17 by the reprex package (v2.0.0)
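As for the error quoted in the question: corrplot() expects a correlation matrix with entries in [-1, 1], not the raw data frame, so the usual fix is to compute the matrix with cor() first. The following is only a minimal sketch under stated assumptions: the built-in mtcars data stands in for df2, the `id` column is a hypothetical substitute for hh_code, and missing values are injected by hand to show one common cause of NA results. A constant column (zero variance) also yields NA, and that case is not handled here.

```r
# Stand-in for df2: mtcars ships with R and mixes continuous (mpg, wt),
# binary (am, vs) and count-like (cyl) columns.
df <- mtcars[c("mpg", "wt", "cyl", "am", "vs")]
df$id <- seq_len(nrow(df))   # hypothetical ID column, playing the role of hh_code
df$mpg[c(3, 7)] <- NA        # inject missing values to illustrate NA handling

# Keep numeric columns only and drop the identifier, which carries no signal
num <- df[sapply(df, is.numeric)]
num <- num[setdiff(names(num), "id")]

# With the default use = "everything", any NA in a column makes its
# correlations NA; pairwise deletion uses all complete pairs instead.
M <- cor(num, use = "pairwise.complete.obs")

# corrplot() takes the correlation matrix, not the data frame
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M)
}
```

Non-numeric columns (factors, characters) are dropped here rather than coerced; for those, the ggpairs overview above or the tests below are the more appropriate tools.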
If you need a little more rigour, you can use statistical tests to check for significant relationships between columns / variables (provided their assumptions hold):
covariate x | outcome y | test | R function |
---|---|---|---|
numeric | numeric | Pearson correlation | cor.test(x, y, method = "pearson") |
binary | numeric | t test | t.test(y ~ x) |
ordinal (ordered factor) | ordinal (ordered factor) | Spearman correlation | cor.test(as.numeric(x), as.numeric(y), method = "spearman") |
categorical (many levels) | numeric | ANOVA | anova(lm(y ~ x)) |
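
To make the table concrete, here is a rough sketch that runs each of the four tests once. It assumes the built-in mtcars data as a stand-in, and the choice of columns is purely illustrative: gear and carb are treated as ordinal only for the sake of the example. For the binary covariate, the formula interface t.test(y ~ x) is used so that the numeric outcome is compared between the two groups.

```r
d <- mtcars
d$am  <- factor(d$am, labels = c("automatic", "manual"))  # binary covariate
d$cyl <- factor(d$cyl)                                     # categorical, 3 levels

# numeric vs numeric: Pearson correlation
p_pearson <- cor.test(d$mpg, d$wt, method = "pearson")$p.value

# binary vs numeric: Welch two-sample t test, outcome ~ group
p_t <- t.test(mpg ~ am, data = d)$p.value

# ordinal vs ordinal: Spearman rank correlation
# (exact = FALSE avoids the warning about ties)
p_spearman <- cor.test(as.numeric(d$gear), as.numeric(d$carb),
                       method = "spearman", exact = FALSE)$p.value

# categorical vs numeric: one-way ANOVA
p_anova <- anova(lm(mpg ~ cyl, data = d))[["Pr(>F)"]][1]

# Collect the p-values in one named vector for a quick overview
p_values <- c(pearson = p_pearson, t_test = p_t,
              spearman = p_spearman, anova = p_anova)
print(signif(p_values, 3))
```

When many variable pairs are tested this way, the p-values should be adjusted for multiple comparisons, for example with p.adjust().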