data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)
In this data frame, sex and married are the subgroup variables, and the rest are outcomes. My real data has more than 10 outcome variables and 5 subgroup variables.
At first, I wrote the following code:
chisq_test <- function(data, var1, var2) {
  # Cross-tabulate the two columns, then run the chi-square test on the table
  contingency_table <- table(data[[var1]], data[[var2]])
  test_result <- chisq.test(contingency_table)
  return(test_result)
}
chisq_test(data = data, var1 = "sex", var2 = "cagv_typ")
However, this is still very time-consuming if I have to enter each outcome and subgroup variable by hand. Is there a better way to run the chi-square test across all of these pairs?
Thank you in advance.
Best wishes
You can use expand.grid to get all the combinations you are looking for:
combos <- expand.grid(x = names(data)[c(1, 3)], y = names(data)[-c(1, 3)])
combos
#> x y
#> 1 sex ageid
#> 2 married ageid
#> 3 sex cagv_typ
#> 4 married cagv_typ
#> 5 sex sq5_1
#> 6 married sq5_1
#> 7 sex sq5_2
#> 8 married sq5_2
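One detail worth knowing: by default expand.grid converts character inputs to factors. If you plan to iterate over the combinations in other ways, plain strings may be easier to work with. Here is a minimal sketch that also selects columns by name rather than by position; group_vars and outcome_vars are hypothetical names introduced only for this illustration:

```r
# Toy data from the question
data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

# Hypothetical helper vectors: name the subgroup columns explicitly
# and treat every other column as an outcome
group_vars <- c("sex", "married")
outcome_vars <- setdiff(names(data), group_vars)

# stringsAsFactors = FALSE keeps the variable names as character strings
combos <- expand.grid(x = group_vars, y = outcome_vars,
                      stringsAsFactors = FALSE)
str(combos)
```

Selecting by name means the code should not need changing if the column order of your real data differs from the toy example.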
We can then use apply to iterate over the rows of this data frame and call your chisq_test function on each combination of variables. Extracting the p-value from each of the 8 chi-square tests gives a vector of 8 p-values, stored here as a new column:
combos$pval <- apply(combos, 1, function(x) chisq_test(data, x[1], x[2])$p.value)
combos
#> x y pval
#> 1 sex ageid 0.2231302
#> 2 married ageid 0.2231302
#> 3 sex cagv_typ 0.6650055
#> 4 married cagv_typ 0.6650055
#> 5 sex sq5_1 0.5637029
#> 6 married sq5_1 0.5637029
#> 7 sex sq5_2 0.6650055
#> 8 married sq5_2 0.6650055
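If you would rather keep the complete test objects (for example to report the statistic and degrees of freedom as well), one possible variation is to use Map and then pull out the pieces you need. This is only a sketch under the same toy data; the suppressWarnings call is my own addition to silence the small-sample warnings that chisq.test gives with three rows, and you may prefer to see those warnings on real data:

```r
# Toy data and helper function from the question
data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)
chisq_test <- function(data, var1, var2) {
  chisq.test(table(data[[var1]], data[[var2]]))
}

combos <- expand.grid(x = c("sex", "married"),
                      y = c("ageid", "cagv_typ", "sq5_1", "sq5_2"),
                      stringsAsFactors = FALSE)

# One full htest object per row of combos
tests <- Map(function(x, y) suppressWarnings(chisq_test(data, x, y)),
             combos$x, combos$y)

# Extract the statistic, degrees of freedom and p-value from each test
combos$statistic <- vapply(tests, function(res) unname(res$statistic),
                           numeric(1), USE.NAMES = FALSE)
combos$df <- vapply(tests, function(res) unname(res$parameter),
                    numeric(1), USE.NAMES = FALSE)
combos$pval <- vapply(tests, function(res) res$p.value,
                      numeric(1), USE.NAMES = FALSE)
combos
```

Note that sq5_1 has only one level in the toy data, so its table has a single column and chisq.test performs a goodness-of-fit test there rather than a test of independence. Outcomes with no variation are worth checking for in the real data.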
The same code scales to five x variables and 10 y variables; only the column selection passed to expand.grid needs to change.
Please remember that if you carry out 50 chi-square tests, the unadjusted p-values can no longer be taken at face value because of multiple hypothesis testing. At a 0.05 threshold you would expect 2 or 3 "significant" results purely by chance (50 × 0.05 = 2.5), so you will need a Bonferroni correction or similar to account for this.
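Base R's p.adjust handles this kind of correction. The sketch below applies it to the eight p-values shown above, copied in by hand so the snippet runs on its own; which method is appropriate for your study is a judgment call I can't make for you:

```r
# p-values copied from the output above
pval <- c(0.2231302, 0.2231302, 0.6650055, 0.6650055,
          0.5637029, 0.5637029, 0.6650055, 0.6650055)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonferroni <- p.adjust(pval, method = "bonferroni")

# Holm: a step-down method that is never less powerful than Bonferroni
p_holm <- p.adjust(pval, method = "holm")

# Benjamini-Hochberg: controls the false discovery rate instead
p_bh <- p.adjust(pval, method = "BH")

data.frame(pval, p_bonferroni, p_holm, p_bh)
```

With the combos data frame from above, the same call would be combos$padj <- p.adjust(combos$pval, method = "bonferroni").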
Created on 2023-09-12 with reprex v2.0.2