
Pearson correlation and p-value in columns from a data frame


For example, if we calculate the Pearson correlation and p-value for two variables (mpg and disp) of the mtcars data set, the results look something like this:

Correlation value:

      mpg  disp    
mpg   1.00 -0.85 
disp -0.85  1.00  

P-value:

      mpg    disp   
mpg  0.0000 0.0000 
disp 0.0000 0.0000       

Instead of this, is there any way to get results like this:

            Corr.   p-value
mpg  mpg    1.00    0.0000
mpg  disp  -0.85    0.0000

I have more than 200 variables and want to generate results like this, then write them to a CSV file using the write.csv command. Thank you!


Solution

  • If we want pairwise cor.test results, we can use combn to apply the test to every pair of columns

    out <- combn(mtcars, 2, FUN = function(x) 
        cor.test(x[[1]], x[[2]], conf.level = 0.95), simplify = FALSE)
    names(out) <- combn(names(mtcars), 2, FUN = paste, collapse='_')
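
To get a feel for how many tests combn generates, here is a minimal sketch that only counts the pairs. The variable names (pair_names, n_pairs, first_pair, n_pairs_200) are made up for illustration and are not part of the answer's code.

```r
# combn() on the column names returns a 2-row matrix;
# each column of that matrix is one unordered pair of variables.
pair_names <- combn(names(mtcars), 2)

n_pairs <- ncol(pair_names)      # number of pairs for the 11 columns of mtcars
first_pair <- pair_names[, 1]    # the first pair that combn() produces

# For a data set with 200 variables, the number of pairwise tests is:
n_pairs_200 <- choose(200, 2)
```

With 200 variables this comes to 19900 pairwise tests, so the result is a long table rather than a matrix.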
    

    Each element of out is the output of cor.test, which is a list

    str(out[[1]])
    #List of 9
    # $ statistic  : Named num -8.92
    #  ..- attr(*, "names")= chr "t"
    # $ parameter  : Named int 30
    #  ..- attr(*, "names")= chr "df"
    # $ p.value    : num 6.11e-10
    # $ estimate   : Named num -0.852
    #  ..- attr(*, "names")= chr "cor"
    # $ null.value : Named num 0
    #  ..- attr(*, "names")= chr "correlation"
    # $ alternative: chr "two.sided"
    # $ method     : chr "Pearson's product-moment correlation"
    # $ data.name  : chr "x[[1]] and x[[2]]"
    # $ conf.int   : num [1:2] -0.926 -0.716
    #  ..- attr(*, "conf.level")= num 0.95
    

    The estimate and p-value can be extracted directly with the list extraction operators, i.e. $ or [[

    # one row per pair: pair label, correlation estimate, p-value
    mydata <- do.call(rbind, Map(cbind, corgroups = names(out),
        unname(lapply(out, function(x)
            data.frame(cor.value = x$estimate, cor.pvalue = x$p.value)))))
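
The question also asks for the two variable names in separate columns and for the result to be written with write.csv. One possible way to do that is sketched below. It rebuilds the table from the column names instead of reusing out, and the names res, var1, var2 and out_file are assumptions made for this example. The file path is a temporary file, so replace it with your own.

```r
# All unordered pairs of column names, as a list of length-2 character vectors
pairs <- combn(names(mtcars), 2, simplify = FALSE)

# Run cor.test() on each pair and keep only the estimate and the p-value
res <- do.call(rbind, lapply(pairs, function(p) {
  ct <- cor.test(mtcars[[p[1]]], mtcars[[p[2]]])
  data.frame(var1 = p[1], var2 = p[2],
             cor.value = unname(ct$estimate),  # drop the "cor" name
             cor.pvalue = ct$p.value)
}))

# Hypothetical output location; swap in a real path such as "correlations.csv"
out_file <- tempfile(fileext = ".csv")
write.csv(res, out_file, row.names = FALSE)
```

Self-pairs such as mpg with mpg are not included, since combn only returns pairs of distinct columns. Their correlation is always 1.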