
Using combn() in R to run every pairwise t-test: how do I see which variables were compared?


I have a data frame with a large number of variables, and I want to compare every variable with every other variable using a t-test.

A sample of my data, called trust_news:

row mean polity2 web rsf civil_liberties freedom_of_expression vdem_gov_censorship_effort vdem_self_censorship_effort vdem_freedom_of_expression ciri_freedom_of_speech_and_press media_integrity vdem_critical_press vdem_media_perspective vdem_media_bias vdem_media_corruption vdem_media_freedom
1 2.68 8 87.2661 25.69 0.785599008 0.758906967 0.731895466 0.742219428 1 1 0.81449235 0.889046047 0.782079459 0.693825991 0.733503755 1
2 2.8 8 94.8967 22.23 0.810742702 0.832891911 0.8447733 0.831499528 1 1 0.88417386 0.868772592 0.881994928 0.835622928 0.828566864 1
3 3.22 10 89.7391 14.6 0.821268417 0.83327835 0.883343829 0.805721471 1 1 0.829951651 0.917491749 0.725950972 0.709774199 0.874261064 1
5 2.96 10 74.3872 24.98 0.813949794 0.781986225 0.844615869 0.729330399 0.666666667 0.5 0.878769429 0.872387239 0.919019442 0.841939049 0.810193322 0.5
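For anyone who wants to run the code, the four sample rows above could be entered by hand as a data frame. This is a sketch that assumes the `row` label is the row names rather than a data column; that fits the 16 results (V1 to V16) that combn() returns further down.

```r
# The four sample rows from the question, typed in so the example is reproducible.
# Assumption: "row" holds row names (1, 2, 3, 5), not a variable to be tested.
trust_news <- data.frame(
  mean = c(2.68, 2.8, 3.22, 2.96),
  polity2 = c(8, 8, 10, 10),
  web = c(87.2661, 94.8967, 89.7391, 74.3872),
  rsf = c(25.69, 22.23, 14.6, 24.98),
  civil_liberties = c(0.785599008, 0.810742702, 0.821268417, 0.813949794),
  freedom_of_expression = c(0.758906967, 0.832891911, 0.83327835, 0.781986225),
  vdem_gov_censorship_effort = c(0.731895466, 0.8447733, 0.883343829, 0.844615869),
  vdem_self_censorship_effort = c(0.742219428, 0.831499528, 0.805721471, 0.729330399),
  vdem_freedom_of_expression = c(1, 1, 1, 0.666666667),
  ciri_freedom_of_speech_and_press = c(1, 1, 1, 0.5),
  media_integrity = c(0.81449235, 0.88417386, 0.829951651, 0.878769429),
  vdem_critical_press = c(0.889046047, 0.868772592, 0.917491749, 0.872387239),
  vdem_media_perspective = c(0.782079459, 0.881994928, 0.725950972, 0.919019442),
  vdem_media_bias = c(0.693825991, 0.835622928, 0.709774199, 0.841939049),
  vdem_media_corruption = c(0.733503755, 0.828566864, 0.874261064, 0.810193322),
  vdem_media_freedom = c(1, 1, 1, 0.5),
  row.names = c("1", "2", "3", "5")
)
```

With 16 columns, pairing every variable with every other one means choose(16, 2) = 120 t-tests.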

Then, I run this code on it:

trust_news_combos <- combn(trust_news, 1, t.test, simplify = TRUE)

First, is this code correct? I have no idea what to put for m in combn(). Anyway, that line gives me this:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1 c(t = 85.1670166474227) c(t = 15.9614095646055) c(t = 29.2365516170159) c(t = 11.0778062107689) c(t = 30.4673329981756) c(t = 26.8521522144486) c(t = 23.160185720972) c(t = 25.1063414199952) c(t = 17.1830959329723) c(t = 11.06502519693) c(t = 33.0841916129404) c(t = 29.3707961673045) c(t = 31.2455551028106) c(t = 39.1490231250879) c(t = 27.6089179039943) c(t = 14.0719508946058)
2 c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32)
3 2.69E-39 8.55E-17 1.18E-24 1.75E-12 3.29E-25 1.61E-23 1.46E-21 1.26E-22 1.03E-17 1.80E-12 2.55E-26 1.02E-24 1.51E-25 1.32E-28 6.88E-24 2.96E-15
4 c(3.00189912275063 3.14900996815846) c(7.56066019283154 9.77267314050179) c(73.5097801046279 84.5198259559781) c(19.628297122971 28.4729149982411) c(0.682586494865725 0.780396107679729) c(0.639468676034051 0.744449016935646) c(0.664192511270674 0.792289818305084) c(0.665160025455844 0.782621785210823) c(0.676674167771883 0.858679367682662) c(0.543941635486123 0.78939169784721) c(0.739756992152986 0.836824222392469) c(0.730937293702635 0.839876930600395) c(0.729509614919607 0.831257822777363) c(0.709894349786553 0.787820841122538) c(0.708427672557418 0.821287114048642) c(0.647915673315896 0.867235841835619)
5 c(mean of x = 3.07545454545455) c(mean of x = 8.66666666666667) c(mean of x = 79.014803030303) c(mean of x = 24.0506060606061) c(mean of x = 0.731491301272727) c(mean of x = 0.691958846484849) c(mean of x = 0.728241164787879) c(mean of x = 0.723890905333333) c(mean of x = 0.767676767727273) c(mean of x = 0.666666666666667) c(mean of x = 0.788290607272727) c(mean of x = 0.785407112151515) c(mean of x = 0.780383718848485) c(mean of x = 0.748857595454545) c(mean of x = 0.76485739330303) c(mean of x = 0.757575757575758)
6 c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0)
7 0.036110864 0.542976272 2.702603374 2.171062176 0.024009036 0.025769214 0.031443667 0.028832991 0.044676278 0.0602499 0.023826806 0.02674109 0.024975831 0.019128385 0.027703273 0.053835873
8 two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided
9 One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test
10 x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a]

Row 3 holds the p-values I'm looking for, but how do I tell which two columns each test compares?

Any help is appreciated and will be thanked in my final code!


Solution

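  • Use m = 2, so that combn() generates pairs of variables. With m = 1, each call receives a single column, so t.test() runs a one-sample test against a mean of 0; that is why your output says "One Sample t-test" and "mean = 0". Instead of passing the standard t.test function directly, write a small function that computes exactly what you need and labels the result with the pair of names. For example: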

    # get four column names
    cols <- names(mtcars)[1:4]   # replace mtcars with trust_news here and inside pval(), and keep all the names
    
    # compute the pval for a pair of names
    pval <- function(pair) {
      value <- t.test(mtcars[, pair[1]], mtcars[, pair[2]])$p.value
      names(value) <- paste(pair, collapse = " vs. ")
      value
    }
    
    # do it for all pairs.  Don't simplify, and it will keep the names
    combn(cols, 2, pval, simplify = FALSE)
    #> [[1]]
    #>  mpg vs. cyl 
    #> 9.507708e-15 
    #> 
    #> [[2]]
    #> mpg vs. disp 
    #> 7.978234e-11 
    #> 
    #> [[3]]
    #>   mpg vs. hp 
    #> 1.030354e-11 
    #> 
    #> [[4]]
    #> cyl vs. disp 
    #> 1.774454e-11 
    #> 
    #> [[5]]
    #>   cyl vs. hp 
    #> 8.321996e-13 
    #> 
    #> [[6]]
    #> disp vs. hp 
    #> 0.001545647
    

    Created on 2021-05-22 by the reprex package (v2.0.0)
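If you would rather get one table than a list, a variant of the same idea could build the name pairs once with combn(names(df), 2) and store them next to each p-value. This is a sketch: the helper name `pairwise_pvals` and the column names `var1`, `var2` and `p.value` are my own choices, not part of base R. Each column of the combn() matrix is one pair, and, as in the code above, t.test(x, y) runs with its defaults (a Welch two-sample test).

```r
# Hypothetical helper (not from base R): run a t-test for every pair of
# columns in `df` and return one labelled row per pair.
pairwise_pvals <- function(df) {
  # combn() with m = 2 gives a 2 x choose(k, 2) matrix; each column is a pair
  pairs <- combn(names(df), 2)
  # Welch two-sample t-test (t.test defaults) for each pair, keeping only p
  pvals <- vapply(seq_len(ncol(pairs)), function(j) {
    t.test(df[[pairs[1, j]]], df[[pairs[2, j]]])$p.value
  }, numeric(1))
  data.frame(var1 = pairs[1, ], var2 = pairs[2, ], p.value = pvals,
             stringsAsFactors = FALSE)
}

# Same four mtcars columns as above; pass trust_news for the real data
result <- pairwise_pvals(mtcars[, 1:4])
print(result)
```

The rows come out in the same order as the list above (mpg vs. cyl first, disp vs. hp last), and result[order(result$p.value), ] lists the smallest p-values first.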