rtidyversegtsummary

Variable grouping in gtsummary with dichotomous variable


I am trying to represent multiple dichotomous variables as a subgroup of another variable.

Below a sample of my current code:

set.seed(123)
library(tidyverse)
library(gtsummary)
library(bstfun)

d <- tidyr::tibble(
  id = c(1:20),
  et_yn = structure(factor(sample(seq(0,1), size = 20, replace = TRUE), levels = c(0,1), labels = c("No", "Yes")), label = "ET")
) %>%
  rowwise() %>%
  mutate(
    et_typ_1 = factor(ifelse(et_yn == "Yes", sample(seq(0,1), size = 1, replace = T), NA), levels = c(0,1), labels = c("No", "Yes")),
    et_typ_2 = factor(ifelse(et_yn == "Yes", sample(seq(0,1), size = 1, replace = T), NA), levels = c(0:1), labels = c("No", "Yes")),
    et_gp_1 = factor(ifelse(et_typ_2 == "Yes", sample(seq(0,1), size = 1, replace = T), NA), levels = c(0,1), labels = c("No", "Yes")),
    et_gp_2 = factor(ifelse(et_typ_2 == "Yes", sample(seq(0,1), size = 1, replace = T), NA), levels = c(0,1), labels = c("No", "Yes")),
    et_gp_3 = factor(ifelse(et_typ_2 == "Yes", sample(seq(0,1), size = 1, replace = T), NA), levels = c(0,1), labels = c("No", "Yes")),
  ) %>%
  data.frame() %>%
  mutate(
    et_typ_2 = structure(et_typ_2, label = "ET gp"),
    et_gp_1 = structure(et_gp_1, label = "St.a"),
    et_gp_2 = structure(et_gp_2, label = "St.b"),
    et_gp_3 = structure(et_gp_3, label = "St.c"),
  )

d %>%
  tbl_summary(
    data = .,
    include = -id,
    missing = "no",
  ) %>%
  add_n() %>%
  modify_header(stat_0 = "**n (%)**") %>%
  bstfun::add_variable_grouping(
    "GP" = c("et_gp_1", "et_gp_2", "et_gp_3"),
  ) %>%
  gtsummary::as_kable()
Characteristic N n (%)
ET 20 9 (45%)
et_typ_1 9 3 (33%)
ET gp 9 4 (44%)
GP
St.a 4 2 (50%)
St.b 4 1 (25%)
St.c 4 1 (25%)

Created on 2024-12-30 with reprex v2.1.1

However the output I would like is to have all St.<x> without an N and the GP row with that same N, like so:

Characteristic N n (%)
ET 20 9 (45%)
et_typ_1 9 3 (33%)
ET gp 9 4 (44%)
GP 4
St.a 2 (50%)
St.b 1 (25%)
St.c 1 (25%)

I tried appending the following code at the bottom of the previous code:

d %>%
  tbl_summary(
    data = .,
    include = -id,
    missing = "no",
  ) %>%
  add_n() %>%
  modify_header(stat_0 = "**n (%)**") %>%
  bstfun::add_variable_grouping(
    "GP" = c("et_gp_1", "et_gp_2", "et_gp_3"),
  ) %>%
  modify_table_body(
    fun = ~ .x %>%
      mutate(n = ifelse(variable == "GP", max(n), NA))
  )

which doesn't unfortunately result in the desired output and actually seems to have no effect.

What am I doing wrong?

The bstfun::add_variable_grouping() function is also a piece of code I would like to replace.

Thank you.


Solution

  • Here's an example of getting the table you're after.

    1. Build your basic table
    2. Indent the grouping variables
    3. Add a row for the grouping header
    4. Remove the Ns for the grouping variables, and add the N to the group header level.
    library(gtsummary)
    
    tbl <- trial |> 
      tbl_summary(
        include = c("age", "response", "death"),
        type = c("response", "death") ~ "dichotomous",
        missing = "no"
      ) |> 
      add_n()
    
    # print the table_body to see what is available to manipulate
    tbl$table_body
    #> # A tibble: 3 × 7
    #>   variable var_type    row_type var_label      label          n     stat_0     
    #>   <chr>    <chr>       <chr>    <chr>          <chr>          <chr> <chr>      
    #> 1 age      continuous  label    Age            Age            189   47 (38, 57)
    #> 2 response dichotomous label    Tumor Response Tumor Response 193   61 (32%)   
    #> 3 death    dichotomous label    Patient Died   Patient Died   200   112 (56%)
    
    # make final modificatons to the table
    tbl |> 
      # begin by indenting the sub variables
      modify_column_indent(
        columns = "label",
        rows = .data$variable %in% c("response", "death"),
        indent = 4
      ) |> 
      # next add a row for the grouping
      modify_table_body(
        ~ .x |> 
          tibble::add_row(label = "Treatment Outcome", .before = 2L)
      ) |> 
      # finally, remove Ns for the subgroups, and add an N to the grouping level
      modify_table_body(
        ~ .x |> 
          dplyr::mutate(
            n = dplyr::case_when(
              # put any number you want here, it's not always the case that the subgroups 
              # will have the same number of non-missing observations (as in this example)
              label == "Treatment Outcome" ~ stringr::str_glue("{nrow(trial)}"), 
              variable %in% c("response", "death") ~ NA,
              .default = .data$n
            )
          )
      ) |> 
      as_kable() # convert to kable to display on stackoverflow
    
    Characteristic N N = 200
    Age 189 47 (38, 57)
    Treatment Outcome 200
    Tumor Response 61 (32%)
    Patient Died 112 (56%)

    Created on 2024-12-30 with reprex v2.1.1