r gtsummary

Group variables under top-level variables within tbl_summary()


I would like to manually define groups of variables inside the call to tbl_summary(). Each group should be reported both as a whole and by its individual variables. For example, given a logical variable apple and a logical variable banana, I would like to report the frequencies of apple and banana, and also the frequency of fruits (either one is TRUE).

Below is an example with data adapted from dplyr::starwars. For ease of posting, I have included plain markdown tables (not the actual rendered tbl_summary() tables), but I believe they illustrate the issue well.

Example data

library(dplyr)
library(gtsummary)  # provides tbl_summary(), used below
data <-
    starwars |> 
    slice_head(n=20) |> 
    select(gender, sex, eye_color, skin_color, species) |> 
    mutate(male_sex = sex == "male",
           masculine_gender = gender == "masculine",
           human_species = species == "Human",
           blue_eyes = eye_color == "blue",
           light_skin = skin_color == "light",
           .keep = "unused") |> 
    {\(x) filter(x, complete.cases(x))}()  

head(data)
# A tibble: 6 × 5
  male_sex masculine_gender human_species blue_eyes light_skin
  <lgl>    <lgl>            <lgl>         <lgl>     <lgl>     
1 TRUE     TRUE             TRUE          TRUE      FALSE     
2 FALSE    TRUE             FALSE         FALSE     FALSE     
3 FALSE    TRUE             FALSE         FALSE     FALSE     
4 TRUE     TRUE             TRUE          FALSE     FALSE     
5 FALSE    FALSE            TRUE          FALSE     TRUE      
6 TRUE     TRUE             TRUE          TRUE      TRUE      

Vanilla tbl_summary() output with raw data

tbl_summary(data)
Characteristic N = 19
male_sex 13 (68%)
masculine_gender 17 (89%)
human_species 12 (63%)
blue_eyes 6 (32%)
light_skin 4 (21%)

This is what the final table should look like (note the hierarchical indentation):

Characteristic N = 19
anything_at_all 19 (100%)
  male_or_masculine 17 (89%)
    male_sex 13 (68%)
    masculine_gender 17 (89%)
  human_species 12 (63%)
  blue_or_light 8 (42%)
    blue_eyes 6 (32%)
    light_skin 4 (21%)

I could create the table above with some upstream data manipulation and manual editing of a regular markdown table. However, I would like to know whether gtsummary implements a solution for this kind of grouping. If I should stick with pre-processing the data, how do I then create the hierarchical indentation?

Code that reproduces the desired table (without the indentation):

data |> 
    mutate(male_or_masculine = if_any(c(male_sex, masculine_gender)),
           .before = male_sex) |> 
    mutate(blue_or_light = if_any(c(blue_eyes, light_skin)),
           .before = blue_eyes) |> 
    mutate(anything_at_all = if_any(male_sex:light_skin),
           .before = everything()) |> 
    tbl_summary()

Solution

  • I think the function bstfun::add_variable_grouping() will do what you need.

    https://mskcc-epi-bio.github.io/bstfun/reference/add_variable_grouping.html

    set.seed(11234)
    add_variable_grouping_ex1 <-
      data.frame(
        race_asian = sample(c(TRUE, FALSE), 20, replace = TRUE),
        race_black = sample(c(TRUE, FALSE), 20, replace = TRUE),
        race_white = sample(c(TRUE, FALSE), 20, replace = TRUE),
        age = rnorm(20, mean = 50, sd = 10)
      ) %>%
      gtsummary::tbl_summary(
        label = list(race_asian = "Asian",
                     race_black = "Black",
                     race_white = "White",
                     age = "Age")
      ) %>%
      bstfun::add_variable_grouping(
        "Race (check all that apply)" = c("race_asian", "race_black", "race_white")
      )
    


    With the example from the question:

    data |> 
        tbl_summary() |> 
        bstfun::add_variable_grouping("male_or_masculine" = c("male_sex",
                                                              "masculine_gender"),
                                      "blue_or_light" = c("blue_eyes", 
                                                          "light_skin"))
    
    

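If you would rather keep the pre-processing approach from the question, so that the group rows carry their own counts, the indentation can probably be added afterwards with gtsummary::modify_column_indent(). The following is only a sketch under some assumptions: it assumes gtsummary 2.0 or later (where, as far as I know, the indent argument takes a number of spaces), and the toy data frame and its values are made up for illustration rather than taken from starwars.

```r
library(dplyr)
library(gtsummary)

# Hypothetical toy data standing in for the question's `data`
toy <- tibble(
  male_sex         = c(TRUE, FALSE, TRUE, FALSE),
  masculine_gender = c(TRUE, TRUE, TRUE, FALSE),
  blue_eyes        = c(TRUE, FALSE, FALSE, FALSE),
  light_skin       = c(FALSE, FALSE, TRUE, FALSE)
)

# Variables that should appear indented beneath their group row
children <- c("male_sex", "masculine_gender", "blue_eyes", "light_skin")

tbl <- toy |>
  # Pre-compute each group column and place it just before its members
  mutate(male_or_masculine = male_sex | masculine_gender, .before = male_sex) |>
  mutate(blue_or_light = blue_eyes | light_skin, .before = blue_eyes) |>
  tbl_summary() |>
  # Indent the label cell of every child row by four spaces
  modify_column_indent(
    columns = label,
    rows = variable %in% children,
    indent = 4L
  )

tbl
```

A deeper level (such as anything_at_all in the question) should work the same way: add another pre-computed column and a second modify_column_indent() call with a larger indent for the innermost rows.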