r correlation pearson-correlation

Is correlation analysis between specific columns (in groups of three) possible?


Is it possible to calculate correlation coefficients and p-values in groups? For example, with this dataset:

df<-read.csv("http://renatabrandt.github.io/EBC2015/data/varechem.csv", row.names=1)

df

Is it possible to calculate the correlation between the first three columns (N, P, K) and the next three columns (Ca, Mg, S), and so on for the following columns?

I couldn't find any argument of the cor function that would let me specify this. I'm no expert, but I suspect I could first split my dataset into sub-datasets. Even then, I don't know how to apply the cor function to them.

Does anyone have an idea how I could go about this, maybe with another function? Or do I have to go the long way and compute each correlation separately? My real dataset is relatively large, so an automated solution would help me a lot!


Solution

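  • Perhaps this helps. The logical index is recycled across the columns, so alternating blocks of three columns end up on either side of cor: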

    i1 <- rep(c(TRUE, FALSE), each = 3)
    cor(df[i1], df[!i1])
                    Ca          Mg           S          Zn          Mo    Baresoil
    N        -0.27089189 -0.16377388 -0.26225073 -0.13216191 -0.05773585  0.10592277
    P         0.73720757  0.59793864  0.75272609  0.70231212  0.17246511  0.01389203
    K         0.66482924  0.62775899  0.84377126  0.60004281  0.06819748  0.16855851
    Al       -0.20586385 -0.11839842  0.35975081 -0.05510622  0.51031000 -0.39955068
    Fe       -0.33216455 -0.20221718  0.05651145 -0.31205825  0.22059378 -0.45746292
    Mn        0.44331615  0.25758427  0.27457837  0.36446210 -0.20497945  0.24617616
    Humdepth  0.24364456  0.37140078  0.15821813  0.13964926  0.05860733  0.59244384
    pH        0.09142358 -0.09252275 -0.18689505 -0.08692419 -0.17496367 -0.53181541

    Or maybe

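    # Split the columns into consecutive blocks of six
    lst1 <- split.default(df, as.integer(gl(ncol(df), 6, ncol(df))))
    # Keep only complete blocks (\(x) needs R >= 4.1), then correlate columns 1-3 with 4-6
    lapply(Filter(\(x) ncol(x) == 6, lst1), \(x) cor(x[1:3], x[4:6]))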
    $`1`
              Ca         Mg          S
    N -0.2708919 -0.1637739 -0.2622507
    P  0.7372076  0.5979386  0.7527261
    K  0.6648292  0.6277590  0.8437713
    
    $`2`
                Zn         Mo   Baresoil
    Al -0.05510622  0.5103100 -0.3995507
    Fe -0.31205825  0.2205938 -0.4574629
    Mn  0.36446210 -0.2049795  0.2461762
    

    Or, for the correlations within each group of three, it could be

    lapply(Filter(\(x) ncol(x)  == 6, lst1), \(x) list(cor(x[1:3]), cor(x[4:6])))
    

    -output

    $`1`
    $`1`[[1]]
               N          P          K
    N  1.0000000 -0.2511603 -0.1466368
    P -0.2511603  1.0000000  0.7540753
    K -0.1466368  0.7540753  1.0000000
    
    $`1`[[2]]
              Ca        Mg         S
    Ca 1.0000000 0.7982771 0.5395393
    Mg 0.7982771 1.0000000 0.6504151
    S  0.5395393 0.6504151 1.0000000
    
    
    $`2`
    $`2`[[1]]
               Al         Fe         Mn
    Al  1.0000000  0.8242146 -0.4704864
    Fe  0.8242146  1.0000000 -0.4360448
    Mn -0.4704864 -0.4360448  1.0000000
    
    $`2`[[2]]
                     Zn         Mo   Baresoil
    Zn       1.00000000 0.28200265 0.04450428
    Mo       0.28200265 1.00000000 0.03124277
    Baresoil 0.04450428 0.03124277 1.00000000
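
The question also asks for p-values, which cor does not return. A minimal sketch of one way to get them, assuming Pearson correlation with the default two-sided test from cor.test: the helper name cor_p_by_group and its argument k are hypothetical, not part of the answer above. It reuses the same idea of splitting the columns into complete blocks of 2 * k and pairing the first k columns with the next k.

```r
# Hypothetical helper (an illustration, not the original answer's code):
# for each complete block of 2 * k columns, correlate the first k columns
# with the next k and collect coefficients and cor.test() p-values.
cor_p_by_group <- function(data, k = 3) {
  # Assign consecutive columns to blocks of 2 * k (block ids start at 1)
  block_id <- (seq_along(data) - 1) %/% (2 * k) + 1
  blocks <- split.default(data, block_id)
  # Drop a trailing incomplete block, if any
  blocks <- Filter(function(b) ncol(b) == 2 * k, blocks)

  lapply(blocks, function(b) {
    x <- b[seq_len(k)]       # first k columns of the block
    y <- b[k + seq_len(k)]   # next k columns of the block
    # p-value matrix with the same layout as cor(x, y)
    p <- matrix(NA_real_, k, k, dimnames = list(names(x), names(y)))
    for (i in seq_len(k)) {
      for (j in seq_len(k)) {
        p[i, j] <- cor.test(x[[i]], y[[j]])$p.value
      }
    }
    list(r = cor(x, y), p = p)
  })
}
```

Calling cor_p_by_group(df) on the dataset from the question should return one list element per complete block of six columns, each holding a coefficient matrix r and a matching p-value matrix p. The p-values are unadjusted, so with many pairs a multiple-testing correction may be worth considering.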