I would like to manually set groups for variables inside the call to tbl_summary. A group of variables should be reported both individually and as a group.
If I had a logical variable apple and a logical variable banana, I would like to report the frequencies of apple and banana, but also for fruits.
Please see below an example with data adapted from dplyr::starwars. For ease of posting, I included plain markdown tables in this example (not the actual rendered tbl_summary tables), but I believe they illustrate my issue well enough.
Example data
library(dplyr)
library(gtsummary)  # provides tbl_summary(), used below

data <-
  starwars |>
  slice_head(n = 20) |>
  select(gender, sex, eye_color, skin_color, species) |>
  # turn each characteristic into a logical indicator and drop the originals
  mutate(male_sex = sex == "male",
         masculine_gender = gender == "masculine",
         human_species = species == "Human",
         blue_eyes = eye_color == "blue",
         light_skin = skin_color == "light",
         .keep = "unused") |>
  # keep only rows without missing values
  {\(x) filter(x, complete.cases(x))}()

head(data)
# A tibble: 6 × 5
male_sex masculine_gender human_species blue_eyes light_skin
<lgl> <lgl> <lgl> <lgl> <lgl>
1 TRUE TRUE TRUE TRUE FALSE
2 FALSE TRUE FALSE FALSE FALSE
3 FALSE TRUE FALSE FALSE FALSE
4 TRUE TRUE TRUE FALSE FALSE
5 FALSE FALSE TRUE FALSE TRUE
6 TRUE TRUE TRUE TRUE TRUE
Vanilla tbl_summary() output with raw data
tbl_summary(data)
Characteristic | N = 19 |
---|---|
male_sex | 13 (68%) |
masculine_gender | 17 (89%) |
human_species | 12 (63%) |
blue_eyes | 6 (32%) |
light_skin | 4 (21%) |
This is what the final table should look like (note the hierarchical indenting/offsetting):
Characteristic | N = 19 |
---|---|
anything_at_all | 19 (100%) |
male_or_masculine | 17 (89%) |
male_sex | 13 (68%) |
masculine_gender | 17 (89%) |
human_species | 12 (63%) |
blue_or_light | 8 (42%) |
blue_eyes | 6 (32%) |
light_skin | 4 (21%) |
I could create the table above with some upstream data manipulation and manual editing of a regular markdown table. However, I would like to know whether gtsummary implements a solution for this kind of grouping. If I should stick with pre-processing the data, fine, but then how do I create the hierarchical indentation?
Code that reproduces the desired table (without the indentation):
data |>
  mutate(male_or_masculine = if_any(c(male_sex, masculine_gender)),
         .before = male_sex) |>
  mutate(blue_or_light = if_any(c(blue_eyes, light_skin)),
         .before = blue_eyes) |>
  mutate(anything_at_all = if_any(male_sex:light_skin),
         .before = everything()) |>
  tbl_summary()
I think the function bstfun::add_variable_grouping() will do what you need.
https://mskcc-epi-bio.github.io/bstfun/reference/add_variable_grouping.html
library(dplyr)  # for %>%

set.seed(11234)
add_variable_grouping_ex1 <-
  data.frame(
    race_asian = sample(c(TRUE, FALSE), 20, replace = TRUE),
    race_black = sample(c(TRUE, FALSE), 20, replace = TRUE),
    race_white = sample(c(TRUE, FALSE), 20, replace = TRUE),
    age = rnorm(20, mean = 50, sd = 10)
  ) %>%
  gtsummary::tbl_summary(
    label = list(race_asian = "Asian",
                 race_black = "Black",
                 race_white = "White",
                 age = "Age")
  ) %>%
  # group the three race indicators under one header row
  bstfun::add_variable_grouping(
    "Race (check all that apply)" = c("race_asian", "race_black", "race_white")
  )
With the example from the question:
data |>
  tbl_summary() |>
  bstfun::add_variable_grouping(
    "male_or_masculine" = c("male_sex", "masculine_gender"),
    "blue_or_light" = c("blue_eyes", "light_skin")
  )
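If the group rows should also carry their own counts, as in the desired table from the question, one option might be to keep the pre-processing step and let gtsummary do the indenting. The following is only a sketch: it assumes gtsummary 2.0 or later, where modify_column_indent() accepts an indent argument (older versions have a different signature), and the 4 and 8 space indent levels are my own choice. The group columns are built with a plain | instead of if_any(), which should give the same result here.

```r
library(dplyr)
library(gtsummary)

# Rebuild the example data from the question
data <-
  starwars |>
  slice_head(n = 20) |>
  select(gender, sex, eye_color, skin_color, species) |>
  mutate(male_sex = sex == "male",
         masculine_gender = gender == "masculine",
         human_species = species == "Human",
         blue_eyes = eye_color == "blue",
         light_skin = skin_color == "light",
         .keep = "unused") |>
  (\(x) filter(x, complete.cases(x)))()

# Add the group-level indicators upstream so they get their own n (%)
grouped <-
  data |>
  mutate(male_or_masculine = male_sex | masculine_gender,
         .before = male_sex) |>
  mutate(blue_or_light = blue_eyes | light_skin,
         .before = blue_eyes) |>
  mutate(anything_at_all = male_sex | masculine_gender | human_species |
           blue_eyes | light_skin,
         .before = everything())

tbl <-
  grouped |>
  tbl_summary() |>
  # first level: direct children of anything_at_all
  modify_column_indent(
    columns = label,
    rows = variable %in% c("male_or_masculine", "human_species",
                           "blue_or_light"),
    indent = 4L
  ) |>
  # second level: members of the two sub-groups
  modify_column_indent(
    columns = label,
    rows = variable %in% c("male_sex", "masculine_gender",
                           "blue_eyes", "light_skin"),
    indent = 8L
  )

tbl
```

The rows argument is evaluated against the table body of the gtsummary object, which is why the variable column can be used to pick the rows to indent.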