rtidyversegtsummarygtable

Ensuring n<5 cells in gtable do not provide results


I hope someone can help me with this.

In the example dataset below, participants are divided into Group A and B. The objective is to ensure that when the gtable is produced, when the n is below 5, there will be a blank cell. In this example, contrary to what the code below shows, the objective would be for all ethnicity groups characterising females in group A to be ommitted (because their n is 3 or 0), for the ethinicity group B, corresponding to males in group A to be ommitted (n = 3), etc.

The overall purpose is to implement what is called a statistical disclosure control, whereby if the n in a cell is below a number, then the results for that cell are not disclosed, to avoid potential identification of the participants.

In this example the columns for gender across group A and B all have n>5, namely 9, 27, 15, and 27. The objective would also be to have the entire column not have data if the n was below 5 (same rationale as the one applied on a per cell basis).

Any help would be much appreciated, thank you

Group <- c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
               "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
           "A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
           "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
           "A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
           "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")
Sex <- c("M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
           "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
         "M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
         "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
         "M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
         "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F")
Height <- c(170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182, 
            181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
            170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182, 
            181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
            170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182, 
            181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182)
Ethnicity <- c("A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
              "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
              "A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
              "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
              "A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
              "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F")

df <- data.frame(Group, Sex, Height, Ethnicity)

df %>%
  tbl_strata(strata = Group,
             .tbl_fun = ~ .x %>%
               tbl_summary(by = Sex,
                           statistic = list(all_continuous() ~ "{mean} ({sd})",
                                            all_categorical() ~ "{n} / {N} ({p}%)"),
                           digits = all_continuous() ~ 1,
                           missing_text = "(Missing Data)")) %>%
  modify_header(label ~ "**Relevant Data**") %>%
  modify_caption("**Table 1. Key descriptives**") %>%
  bold_labels()

Created on 2023-08-19 with reprex v2.0.2


Solution

  • Using gtsummary you can modify the table body, and perform your own operations to substitute empty cell data depending on values.

    For example, if you want to look at the table body, try:

    tab <- df %>%
      tbl_strata(strata = Group,
                 .tbl_fun = ~ .x %>%
                   tbl_summary(by = Sex,
                               statistic = list(all_continuous() ~ "{mean} ({sd})",
                                                all_categorical() ~ "{n} / {N} ({p}%)"),
                               digits = all_continuous() ~ 1,
                               missing_text = "(Missing Data)"))
    

    And then:

    tab$table_body
    

    You will see stat_1_1, stat_1_2, etc. as columns to modify:

      variable  var_label row_type label     var_type_1  stat_1_1     stat_2_1      var_type_2  stat_1_2     stat_2_2     
      <chr>     <chr>     <chr>    <chr>     <chr>       <chr>        <chr>         <chr>       <chr>        <chr>        
    1 Height    Height    label    Height    continuous  188.3 (10.0) 173.7 (9.3)   continuous  175.8 (8.4)  177.0 (10.6) 
    2 Ethnicity Ethnicity label    Ethnicity categorical NA           NA            categorical NA           NA           
    3 Ethnicity Ethnicity level    A         categorical 3 / 9 (33%)  6 / 27 (22%)  categorical 6 / 15 (40%) 0 / 27 (0%)  
    4 Ethnicity Ethnicity level    B         categorical 3 / 9 (33%)  3 / 27 (11%)  NA          NA           NA           
    5 Ethnicity Ethnicity level    C         categorical 3 / 9 (33%)  12 / 27 (44%) categorical 3 / 15 (20%) 0 / 27 (0%)  
    6 Ethnicity Ethnicity level    D         categorical 0 / 9 (0%)   6 / 27 (22%)  categorical 3 / 15 (20%) 9 / 27 (33%) 
    7 Ethnicity Ethnicity level    E         NA          NA           NA            categorical 0 / 15 (0%)  15 / 27 (56%)
    8 Ethnicity Ethnicity level    F         NA          NA           NA            categorical 3 / 15 (20%) 3 / 27 (11%) 
    

    Using modify_table_body can you mutate across those columns, and check if the value is less than 5.

    While there are multiple options to extract the number from a character value (in this case, with '/' and a percentage in parentheses), if we make the assumption you want the first number (which is 'n'), you can use parse_number from readr.

    Here is a complete example:

    library(gtsummary)
    library(tidyverse)
    
    df %>%
      tbl_strata(strata = Group,
                 .tbl_fun = ~ .x %>%
                   tbl_summary(by = Sex,
                               statistic = list(all_continuous() ~ "{mean} ({sd})",
                                                all_categorical() ~ "{n} / {N} ({p}%)"),
                               digits = all_continuous() ~ 1,
                               missing_text = "(Missing Data)")) %>%
      modify_header(label ~ "**Relevant Data**") %>%
      modify_caption("**Table 1. Key descriptives**") %>%
      modify_table_body(~ .x %>% 
                          mutate(across(starts_with("stat_"), 
                                        ~ifelse(parse_number(.) < 5, 
                                                "", 
                                                .)))) %>% 
      bold_labels()
    

    Table

    table with cell values omitted if less than 5