r chi-squared contingency

Perform multiple chi-squared tests on a data frame based on a column value


I have a data frame of counts and I want to run chisq.test for each value of the variable Cluster. So basically, I need 4 contingency tables (one each for "A", "B", "C", "D") where rows = Category, columns = Drug, and values = Total. A chisq.test should then be run on each of the 4 tables.

Example data frame

df <- data.frame(Cluster = c(rep("A",8),rep("B",8),rep("C",8),rep("D",8)),
                 Category = rep(c(rep("0-1",2),rep("2-4",2),rep("5-12",2),rep(">12",2)),2),
                 Drug = rep(c("drug X","drug Y"),16),
                 Total = as.numeric(sample(20:200,32,replace=TRUE)))

Solution

  • First, use xtabs() to build a three-way table, which holds one Category × Drug contingency table per Cluster.

    tab <- xtabs(Total ~ Category + Drug + Cluster, df)
    tab
    
    # , , Cluster = A
    # 
    #         Drug
    # Category drug X drug Y
    #     >12      92     75
    #     0-1      33    146
    #     2-4     193     95
    #     5-12     76    195
    # 
    # etc.
    

    Then use apply() over the third dimension (Cluster) to run Pearson's chi-squared test on each stratum.

    apply(tab, 3, chisq.test)
    
    # $A
    # 
    #   Pearson's Chi-squared test
    # 
    # data:  array(newX[, i], d.call, dn.call)
    # X-squared = 145.98, df = 3, p-value < 2.2e-16
    #
    # etc.
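
    Because apply() returns a named list of test objects here, the per-cluster results can be gathered into one table. The following is a minimal sketch, not part of the original answer: the set.seed() call, the res data frame, and the choice of the Holm correction for the 4 tests are all assumptions made for illustration.

    ```r
    # Rebuild the example data; set.seed() is added only for reproducibility
    set.seed(1)
    df <- data.frame(Cluster  = rep(c("A", "B", "C", "D"), each = 8),
                     Category = rep(rep(c("0-1", "2-4", "5-12", ">12"), each = 2), 4),
                     Drug     = rep(c("drug X", "drug Y"), 16),
                     Total    = sample(20:200, 32, replace = TRUE))

    tab   <- xtabs(Total ~ Category + Drug + Cluster, df)
    tests <- apply(tab, 3, chisq.test)  # named list, one test result per Cluster

    # One row per cluster: test statistic, degrees of freedom, raw p-value
    res <- data.frame(Cluster   = names(tests),
                      statistic = sapply(tests, function(x) unname(x$statistic)),
                      df        = sapply(tests, function(x) unname(x$parameter)),
                      p.value   = sapply(tests, function(x) x$p.value),
                      row.names = NULL)

    # Holm adjustment is one possible multiple-testing correction (an assumption)
    res$p.adjusted <- p.adjust(res$p.value, method = "holm")
    res
    ```

    Whether a multiple-testing correction is appropriate, and which one, depends on the analysis; p.adjust() supports several other methods.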
    

    Furthermore, you can perform a Cochran-Mantel-Haenszel chi-squared test, which tests whether Category and Drug are conditionally independent given Cluster.

    mantelhaen.test(tab)
    
    #   Cochran-Mantel-Haenszel test
    # 
    # data:  tab
    # Cochran-Mantel-Haenszel M^2 = 59.587, df = 3, p-value = 7.204e-13
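
    As a possible alternative to calling apply() on the three-way table, the per-cluster tests can also be sketched by splitting the data frame on Cluster and building one two-way table per subset. This is an illustrative variant rather than the method used above; the set.seed() call and the object names tests and pvals are assumptions.

    ```r
    # Rebuild the example data; set.seed() is added only for reproducibility
    set.seed(1)
    df <- data.frame(Cluster  = rep(c("A", "B", "C", "D"), each = 8),
                     Category = rep(rep(c("0-1", "2-4", "5-12", ">12"), each = 2), 4),
                     Drug     = rep(c("drug X", "drug Y"), 16),
                     Total    = sample(20:200, 32, replace = TRUE))

    # Split on Cluster, then build and test one Category x Drug table per subset
    tests <- lapply(split(df, df$Cluster), function(d) {
      chisq.test(xtabs(Total ~ Category + Drug, data = d))
    })

    # Named vector of p-values, one per cluster
    pvals <- sapply(tests, function(x) x$p.value)
    pvals
    ```

    Each subset yields the same Category × Drug table as the corresponding stratum of the three-way table, so the test results should agree with those from apply().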