Imagine I have the following data:
ID. | Drug1. | Drug2. | Drug3. | Drug4. |
---|---|---|---|---|
1. | 1. | 0. | 0. | 0. |
2. | 0. | 0. | 0. | 1. |
3. | 0. | 1. | 0. | 0. |
4. | 0. | 0. | 1. | 0. |
5. | 1. | 0. | 0. | 0. |
Where ID is the number given to each patient and each Drug variable is a binary variable where 1 indicates that patient had a certain condition on that drug and 0 indicates he/she didn't.
In order to compare the proportion of the rate of condition between drugs, I want to perform chi-sqauare tests like: Drug1 vs Drug2, Drug1 vs Drug3, Drug1 vs Drug4, Drug2 vs Drug3, Drug2 vs Drug4, etc.
How can I do this in R in one line of code? Btw, is it necessary to implement correction for multiple comparisons (e.g., Bonferroni)?
Below is a tidyverse approach using {dplyr}.
I first generate some data to run real tests with meaningful results.
Then we can use the colnames
of mydat
with combn
to get all pairs of drugs. Then we can use rowwise
and mutate
and apply chisq.test()
to each row. Here we use the strings in V1
and V2
to subset the variables in mydat
. Since we are in a data.frame
we have to wrap the result in list
if its a non-atomic vector. We can subset chisq_test
with $p.value
to get the p values.
library(dplyr)
set.seed(123)
mydat <- tibble(ID = 1:1000,
Drug1 = round(rnorm(1000, 0.8, sd = 0.5)),
Drug2 = round(rnorm(1000, 0.6, sd = 0.5), 0),
Drug3 = round(rnorm(1000, 0.5, sd = 0.5), 0),
Drug4 = round(rnorm(1000, 0.3, sd = 0.3), 0)
) %>%
mutate(across(starts_with("Drug"), ~ case_when(.x >0 ~ 0,
.x <1 ~ 1,
TRUE ~ .x))
)
mydat %>%
select(-ID) %>%
colnames() %>%
combn(2) %>%
t() %>%
as_tibble() %>%
rowwise %>%
mutate(chisq_test = list(
table(mydat[[V1]], mydat[[V2]]) %>% chisq.test()
),
chisq_pval = chisq_test$p.value
)
#> Using compatibility `.name_repair`.
#> # A tibble: 6 x 4
#> # Rowwise:
#> V1 V2 chisq_test chisq_pval
#> <chr> <chr> <list> <dbl>
#> 1 Drug1 Drug2 <htest> 0.00694
#> 2 Drug1 Drug3 <htest> 0.298
#> 3 Drug1 Drug4 <htest> 0.926
#> 4 Drug2 Drug3 <htest> 0.998
#> 5 Drug2 Drug4 <htest> 0.574
#> 6 Drug3 Drug4 <htest> 0.895
Created on 2022-04-04 by the reprex package (v2.0.1)
Below is my old answer, which compares the distribution of 0
and 1
within each drug, which is not what the OP asked for, as @KU99 correctly pointed out in the comments.
library(tibble) # for reading in your data
mydat <-
tribble(~ID, ~Drug1, ~Drug2, ~Drug3, ~Drug4,
1, 1, 0, 0, 0,
2, 0, 0, 0, 1,
3, 0, 1, 0, 0,
4, 0, 0, 1, 0,
5, 1, 0, 0, 0
)
lapply(mydat[, -1], function(x) chisq.test(table(x)))
#> $Drug1
#>
#> Chi-squared test for given probabilities
#>
#> data: table(x)
#> X-squared = 0.2, df = 1, p-value = 0.6547
#>
#>
#> $Drug2
#>
#> Chi-squared test for given probabilities
#>
#> data: table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
#>
#>
#> $Drug3
#>
#> Chi-squared test for given probabilities
#>
#> data: table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
#>
#>
#> $Drug4
#>
#> Chi-squared test for given probabilities
#>
#> data: table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
Created on 2022-03-29 by the reprex package (v0.3.0)