r gtsummary

GTSUMMARY Header counts with TBL_STRATA_NESTED_STACK


I am trying to generate a summary table for multiple parameters by Treatment. Below is a sample program.

# Load packages for program execution
pkg.load <- c("gtsummary", "cards", "labelled", "admiral", "tidyverse", "knitr")
invisible(lapply(pkg.load, library, character.only = TRUE))

# Apply GTSUMMARY theme
theme_gtsummary_compact()

# Load data
adsl <- pharmaverseadam::adsl |> 
  filter(SAFFL == "Y") |> 
  select(c(USUBJID, TRT01A))

# Load vital signs data
advs <- pharmaverseadam::advs %>%
  filter(SAFFL == "Y" & !is.na(ANRIND)) %>%
  select(c(USUBJID, TRT01A, PARAMCD, PARAM, AVISIT, AVISITN, ADT, AVAL, CHG, PCHG, ANRIND))

# Keep max for summary
advs.smr <- advs %>%
  group_by(USUBJID, TRT01A, PARAMCD, PARAM) %>%
  summarise(AVL.NRIND = max(ANRIND, na.rm = TRUE),
            .groups = 'drop') |> 
  arrange(USUBJID, PARAMCD) |> 
  filter((PARAMCD == 'SYSBP' & !grepl("2$", USUBJID)) | (PARAMCD == 'DIABP' & !grepl("3$", USUBJID))|
           (PARAMCD == 'TEMP' & !grepl("1$", USUBJID)))

# Table summary
tbl.smry <- advs.smr |> 
  tbl_strata_nested_stack(
    strata = PARAM
    ,.tbl_fun = ~ .x %>%
      tbl_summary(
        by = TRT01A,
        include = AVL.NRIND,
        type = list(AVL.NRIND ~ "categorical"),
        statistic = list(all_categorical() ~ "{n} ({p})"),
        digits = list(everything() ~ c(0, 1)),
        label = list(AVL.NRIND = "Normal Indicator"),
        missing = "ifany",
        percent = adsl
      ) |> 
      add_overall(last = TRUE))

# ADSL Freq counts
addmargins(table(adsl$TRT01A, useNA = "always"))
show_header_names(tbl.smry)

Output: (image of the rendered table)

There are a couple of challenges I am facing here.

  1. The header counts are not correct. I expected the percent argument of tbl_summary to dictate the header counts (i.e. the N's used as denominators would be shown in the header). This looks like a bug to me: the percentages in the table body are calculated correctly using ADSL, but the N in the header is taken from the first PARAMCD.
  2. I see a console message reporting the count differences between PARAMCDs. I don't see these counts saved anywhere in the gtsummary object (tbl.smry in the program). I want to use the corresponding counts to populate the header in the final output (DOCX). Is there any way to store the different N counts by PARAMCD?

(image of the console message)


Solution

  • In the past, table headers have been set independently of the tbl_summary(percent) value. We only recently added the flexibility to pass a data frame in the percent argument, and in that change we did not alter how the headers are calculated (they are constructed from the tbl_summary(data) argument). However, reading through your example, I agree with you that using tbl_summary(percent=<data.frame>) for the headers makes more sense when a data frame is passed. I updated the dev version of gtsummary with this behaviour, and it will be included in the next release. Thanks for posting here; it helps make the pkg better!

    The example below uses the dev version of the package. If you are unable to use the dev version, you can instead join the adsl data frame within each stratum to correct the denominators, e.g. ~ .x |> right_join(adsl, by = c("USUBJID", "TRT01A")) (joining on both columns avoids ending up with duplicated TRT01A.x and TRT01A.y columns). In the example below, I opted to use tbl_strata2() over tbl_strata_nested_stack() because the result is more compact while still including the necessary information. You can certainly still use the latter if that better suits your needs.

    # # install dev version
    # pak::pak("ddsjoberg/gtsummary")
    
    library(tidyverse)
    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '2.4.0.9002'
    
    # Apply GTSUMMARY theme
    theme_gtsummary_compact()
    #> Setting theme "Compact"
    
    # Load data
    adsl <- pharmaverseadam::adsl |> 
      filter(SAFFL == "Y") |> 
      select(c(USUBJID, TRT01A))
    
    # Load data
    advs <- pharmaverseadam::advs %>%
      filter(SAFFL == "Y", !is.na(ANRIND), PARAMCD %in% c('SYSBP', 'DIABP')) %>%
      select(c(USUBJID, TRT01A, PARAMCD, PARAM, AVISIT, AVISITN, ADT, AVAL, CHG, PCHG, ANRIND)) |> 
      # Keep max for summary
      slice_max(order_by = ANRIND, n = 1, with_ties = FALSE, by = c(USUBJID, PARAMCD))
    
    tbl <- advs |> 
      tbl_strata2(
        strata = PARAM,
        ~ .x |> 
          tbl_summary(
            include = ANRIND,
            by = TRT01A,
            label = list(ANRIND = .y),
            percent = adsl
          ) |> 
          add_overall(last = TRUE),
        .combine_with = "tbl_stack",
        .combine_args = list(group_header = NULL)
      )
    

    Created on 2025-10-05 with reprex v2.1.1
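
For anyone who cannot install the dev version, the join workaround mentioned above can be checked on toy data first. This is only a minimal sketch: `adsl_toy`, `stratum_toy` and `padded` are hypothetical names standing in for adsl and a single PARAM stratum, and joining on both USUBJID and TRT01A is an assumption made here to keep a single TRT01A column. It shows that after the join each treatment arm has as many rows as in adsl, with subjects absent from the stratum carried as NA.

```r
library(dplyr)

# Hypothetical stand-in for adsl: one row per subject
adsl_toy <- tibble(
  USUBJID = c("01", "02", "03", "04"),
  TRT01A  = c("Placebo", "Placebo", "Drug", "Drug")
)

# Hypothetical stand-in for one PARAM stratum, where two subjects have no record
stratum_toy <- tibble(
  USUBJID   = c("01", "03"),
  TRT01A    = c("Placebo", "Drug"),
  AVL.NRIND = c("NORMAL", "HIGH")
)

# right_join() keeps every adsl row; unmatched subjects get NA in AVL.NRIND
padded <- stratum_toy |>
  right_join(adsl_toy, by = c("USUBJID", "TRT01A"))

# Per-arm row counts for the padded stratum and for adsl, for comparison
count(padded, TRT01A)
count(adsl_toy, TRT01A)
```

Inside the strata function, the same join would go ahead of tbl_summary(); the padded subjects would then be expected to appear as missing values, which missing = "ifany" reports on a separate row.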