r gtsummary

How to get column and row percent when using tbl_summary in R?


I am attempting to create a descriptive table using tbl_summary in R that will show both the column percentage and the row percentage for each category. Consider the following example code using the iris R dataset:

library(dplyr)      # provides %>%, mutate(), if_else()
library(gtsummary)

iris_table <- iris %>%
  mutate(Sepal.Length.Cat = if_else(Sepal.Length > 5, "Big", "Small")) %>%
  tbl_summary(by = Species,
              include = c(Sepal.Length.Cat, Sepal.Width),
              type = list(
                Sepal.Width ~ "continuous2"
              ),
              statistic = list(
                all_continuous2() ~ c("{mean} ({sd})","{min} - {max}"),
                all_categorical() ~ "{n} ({p}%)"
              ),
              missing_text = "Missing") %>% 
  add_overall(col_label = "**Overall** <br>N = {n}") %>% 
  modify_footnote(update = everything() ~ NA)
print(iris_table)

With the code above, I get the default behavior of the {p} statistic, which is percent = "column". However, I would also like to include a row percent. For example, the cell for the setosa species with "Big" sepal length has n = 22. Right now, the cell shows 22 (44%), which is 22 / 50 setosa total. I would like it to show something like "n = 22 (44%; 19%)", with the 19% being the additional row percent (i.e., 22 / 118 total Big).
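To make the two denominators concrete, here is a minimal base R sketch (not gtsummary code) that computes both percentages for the same cross-tabulation. The object names `sepal_cat`, `counts`, `col_pct`, and `row_pct` are only illustrative.

```r
# Cross-tabulate the derived size category against Species
sepal_cat <- ifelse(iris$Sepal.Length > 5, "Big", "Small")
counts <- table(Sepal.Length.Cat = sepal_cat, Species = iris$Species)

# margin = 2 divides by column totals (per species): the tbl_summary default
col_pct <- prop.table(counts, margin = 2) * 100

# margin = 1 divides by row totals (all "Big" or all "Small")
row_pct <- prop.table(counts, margin = 1) * 100

counts["Big", "setosa"]             # 22
round(col_pct["Big", "setosa"], 1)  # 44   (22 / 50)
round(row_pct["Big", "setosa"], 1)  # 18.6 (22 / 118)
```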

I have tried the percent = "row" argument that is built into tbl_summary. However, that not only removes the column percent, it also changes the percentages in the Overall column to row percentages. I would like to keep the column percentages, and the Overall column should show column percentages only.


Solution

  • The package wasn't designed to show both row and column percentages. But newer versions of the package (the example below was run with 2.0.1.9002) offer more generalizable ways to create bespoke tables. In the example below, we first calculate all the statistics that will appear in the table as an ARD (analysis results dataset) using the cards package, then pass them to a new function called tbl_ard_summary(). It's not the easiest code to read, but it does get you both percentages.

    library(cards)
    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '2.0.1.9002'
    
    iris2 <- iris |> 
      dplyr::mutate(Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big","Small"))
    
    # create the primary ARD
    ard <- iris2 |>
      ard_stack(
        .by = Species,
        ard_continuous(variables = Sepal.Width),
        ard_categorical(variables = Sepal.Length.Cat),
        .missing = TRUE,
        .attributes = TRUE
      ) |> 
      # create ARD for row percentages
      bind_ard(
        ard_categorical(
          iris2,
          by = Species,
          variables = Sepal.Length.Cat,
          statistic = ~"p",
          denominator = "row"
        ) |>
          # rename the statistic so it does not clash with the column percent "p"
          dplyr::mutate(stat_name = ifelse(stat_name == "p", "p_row", stat_name))
      )
    
    
    # pass the ARD to gtsummary to create table
    ard |> 
      tbl_ard_summary(
        by = Species,
        include = c(Sepal.Length.Cat, Sepal.Width),
        type = list(
          Sepal.Width ~ "continuous2"
        ),
        statistic = list(
          all_continuous2() ~ c("{mean} ({sd})","{min} - {max}"),
          all_categorical() ~ "{n} (Column {p}%; Row {p_row}%)"
        ),
        missing_text = "Missing"
      ) |> 
      modify_footnote(all_stat_cols() ~ NA) |> 
      as_kable() # convert to kable to display on SO
    
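    | Characteristic   | setosa                       | versicolor                   | virginica                    |
    |:-----------------|:-----------------------------|:-----------------------------|:-----------------------------|
    | Sepal.Length.Cat |                              |                              |                              |
    | Big              | 22 (Column 44.0%; Row 18.6%) | 47 (Column 94.0%; Row 39.8%) | 49 (Column 98.0%; Row 41.5%) |
    | Small            | 28 (Column 56.0%; Row 87.5%) | 3 (Column 6.0%; Row 9.4%)    | 1 (Column 2.0%; Row 3.1%)    |
    | Sepal.Width      |                              |                              |                              |
    | Mean (SD)        | 3.4 (0.4)                    | 2.8 (0.3)                    | 3.0 (0.3)                    |
    | Min - Max        | 2.3 - 4.4                    | 2.0 - 3.4                    | 2.2 - 3.8                    |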

    Created on 2024-08-27 with reprex v2.1.1
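If it helps to see what the row-percentage ARD contains before it is bound to the primary ARD, the following sketch rebuilds just that piece and flattens it into a plain data frame. It assumes the cards and dplyr packages are installed and that the ARD stores the grouping level, variable level, and statistic in the list columns `group1_level`, `variable_level`, and `stat`. The names `ard_row` and `row_pct_check` are only illustrative.

```r
library(cards)

iris2 <- iris |>
  dplyr::mutate(Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))

# Same call as in the answer: only the "p" statistic, with row denominators
ard_row <- ard_categorical(
  iris2,
  by = Species,
  variables = Sepal.Length.Cat,
  statistic = ~"p",
  denominator = "row"
)

# ARD columns are list columns, so flatten them for a quick look
row_pct_check <- data.frame(
  Species = vapply(ard_row$group1_level, as.character, character(1)),
  Sepal.Length.Cat = vapply(ard_row$variable_level, as.character, character(1)),
  p_row = round(100 * unlist(ard_row$stat), 1)
)

row_pct_check
```

The `p_row` values should match the "Row" percentages in the table above; if they do not, the `by` and `variables` arguments are the first things to check.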