rstatchi-squaredbonferroni

Multiple chi-square tests in R


Imagine I have the following data:

ID. Drug1. Drug2. Drug3. Drug4.
1. 1. 0. 0. 0.
2. 0. 0. 0. 1.
3. 0. 1. 0. 0.
4. 0. 0. 1. 0.
5. 1. 0. 0. 0.

Where ID is the number given to each patient and each Drug variable is a binary variable where 1 indicates that patient had a certain condition on that drug and 0 indicates he/she didn't.

In order to compare the proportion of the rate of condition between drugs, I want to perform chi-sqauare tests like: Drug1 vs Drug2, Drug1 vs Drug3, Drug1 vs Drug4, Drug2 vs Drug3, Drug2 vs Drug4, etc.

How can I do this in R in one line of code? Btw, is it necessary to implement correction for multiple comparisons (e.g., Bonferroni)?


Solution

  • Below is a tidyverse approach using {dplyr}. I first generate some data to run real tests with meaningful results. Then we can use the colnames of mydat with combn to get all pairs of drugs. Then we can use rowwise and mutate and apply chisq.test() to each row. Here we use the strings in V1 and V2 to subset the variables in mydat. Since we are in a data.frame we have to wrap the result in list if its a non-atomic vector. We can subset chisq_test with $p.value to get the p values.

    library(dplyr) 
    set.seed(123)
    
    mydat <- tibble(ID = 1:1000,
                    Drug1 = round(rnorm(1000, 0.8, sd = 0.5)),
                    Drug2 = round(rnorm(1000, 0.6, sd = 0.5), 0),
                    Drug3 = round(rnorm(1000, 0.5, sd = 0.5), 0),
                    Drug4 = round(rnorm(1000, 0.3, sd = 0.3), 0)
                    ) %>% 
      mutate(across(starts_with("Drug"), ~ case_when(.x >0 ~ 0,
                                                     .x <1 ~ 1,
                                                     TRUE ~ .x))
      )
    
    mydat %>% 
      select(-ID) %>% 
      colnames() %>% 
      combn(2) %>% 
      t() %>% 
      as_tibble() %>% 
      rowwise %>% 
      mutate(chisq_test = list(
        table(mydat[[V1]], mydat[[V2]]) %>% chisq.test()
        ),
        chisq_pval = chisq_test$p.value
        )
    
    #> Using compatibility `.name_repair`.
    #> # A tibble: 6 x 4
    #> # Rowwise: 
    #>   V1    V2    chisq_test chisq_pval
    #>   <chr> <chr> <list>          <dbl>
    #> 1 Drug1 Drug2 <htest>       0.00694
    #> 2 Drug1 Drug3 <htest>       0.298  
    #> 3 Drug1 Drug4 <htest>       0.926  
    #> 4 Drug2 Drug3 <htest>       0.998  
    #> 5 Drug2 Drug4 <htest>       0.574  
    #> 6 Drug3 Drug4 <htest>       0.895
    

    Created on 2022-04-04 by the reprex package (v2.0.1)


    Below is my old answer, which compares the distribution of 0 and 1 within each drug, which is not what the OP asked for, as @KU99 correctly pointed out in the comments.

    library(tibble) # for reading in your data
    
    mydat <-
      tribble(~ID, ~Drug1,  ~Drug2, ~Drug3,  ~Drug4,
               1, 1,      0,      0,      0,  
               2, 0,      0,      0,      1,  
               3, 0,      1,      0,      0,  
               4, 0,      0,      1,      0,  
               5, 1,      0,      0,      0
      )
    
    lapply(mydat[, -1], function(x) chisq.test(table(x)))
    
    #> $Drug1
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  table(x)
    #> X-squared = 0.2, df = 1, p-value = 0.6547
    #> 
    #> 
    #> $Drug2
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  table(x)
    #> X-squared = 1.8, df = 1, p-value = 0.1797
    #> 
    #> 
    #> $Drug3
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  table(x)
    #> X-squared = 1.8, df = 1, p-value = 0.1797
    #> 
    #> 
    #> $Drug4
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  table(x)
    #> X-squared = 1.8, df = 1, p-value = 0.1797
    

    Created on 2022-03-29 by the reprex package (v0.3.0)