First, I admit, I've asked this elsewhere (Crossvalidated), but I guess it is not necessarily the same people reading both forums.
I'm finishing a manuscript, but have one nagging problem which I'm unable to find an answer.
Study has two groups which receive different procedure for a disease. After the procedure, one patient may face multiple different adverse events (complications). I want to test if there is a difference in total number of adverse events between groups.
Obviously it would be all too easy with a simple single categorical variable, but I'm in doubt if using chi-squared or Fisher's test is the right approach. If I'm not mistaken, their presumptions differ as this is not simple on/off per case-situation.
One idea I got was to create a continuous dummy variable for total number of complications per case and compare those with t-test or something, but it still answers a bit different question imho.
The exact problem without reported answer can be found for example from the CHOCOLATE-trial (https://www.bmj.com/content/363/bmj.k3965.long) table 3, last 3 rows.
Here is a reproducible code of the end result.
# Values are strored in vectors where each level means if a complication occured. Level 0 means there was none
# All variables have similar levels
# If variable compl_1 would have level 0 (there was no complication on that case),
# variables compl_2 and compl_3 would also have in real dataset.
# This is not true for the following reproducible example,
# but it hopefully still delineates the problem.
# Here we have a sample size of 200 cases, 100 for each group
data <- data.frame(group = as.factor(ifelse(runif(200) > 0.5, 0, 1)),
compl_1 = c(rep(0, 140), rep(1, 40), rep(2, 14), rep(3,4), rep(4,2)),
compl_2 = c(rep(0, 182), rep(1, 10), rep(2, 7), rep(3,1), rep(4,0)),
compl_3 = c(rep(0, 187), rep(1, 5), rep(2, 7), rep(3,1), rep(4,0)))
# Then we count totals for each complication type by pivoting long, doing the count and pivoting back
table <- data |>
pivot_longer(compl_1:compl_3) |>
count(group, value) |>
pivot_wider(names_from = group, values_from = n, names_sort = T, values_fill = 0)
# Of course count for level 0 is bonkers as there were 100 cases per group
# So this is fixed by replacing it with the count of 0-levels in compl_1
no_complication <- data |>
dplyr::select(group, compl_1) |>
count(group, compl_1) |>
pivot_wider(names_from = group, values_from = n, names_sort = T, values_fill = 0) |>
slice_head(n=1)
table[1,] <- no_complication[1,]
# Next we add a row that contains the total number of complication (occurences of levels 1-4 for compl-variables)
complications_total <- table |>
slice(-1) |>
mutate(`0` = sum(`0`, na.rm = T),
`1` = sum(`1`, na.rm = T)) |>
slice(1)
#To make end result clearer, let's add some names to cases
complications_total$value <- factor(complications_total$value, labels = "Total complications")
table$value <- factor(table$value, labels = c("No complication",
"Head cut",
"Foot off",
"Hand torn",
"Eyes blown"))
#Then we add the row of totals and give names to operations
table <- table |>
add_row(complications_total[1,]) |>
arrange(desc(`0`)) |>
rename(Henry = `0`,
Martin = `1`)
So in other words, I want to add a new column with p-values and with the correct statistical test. Comparing group "No complications" is obviously easy, but for others I can't come up with anything.
Edit: After some extra thought, I think I've found a solution. Was probably overthinking this. I'll just take case based mean of each complication and compare those. I guess this is just a variant of what Brian suggested, so I'll flag that as the solution. Thanks for both!
As done in the paper, you could calculate the relative risk of a complication in the groups if you just wish to collapse all complications.
library(tidyverse)
library(epitools)
rr_res <- table %>% # From your final 'table' data above
# Create contingency table, collapsing all complications into one group
mutate(value = factor(ifelse(value == "No complication", "No complication", "Complication"),
# Order for risk of complication
levels = c("No complication", "Complication"))) %>%
group_by(value) %>%
summarise(Henry = sum(Henry), Martin = sum(Martin)) %>%
select(-value) %>%
as.matrix() %>%
t() %>% # Conditions in rows, outcome in columns
riskratio.wald()
rr_res$measure[2,] %>% round(2)
produces risk ratio with 95%CI of being in Martin group and developing complication.
estimate lower upper
0.82 0.67 0.99
You might also produce odds-ratio with test.