I hope someone can help me with this.
In the example dataset below, participants are divided into Group A and B. The objective is to ensure that when the gtable is produced, when the n is below 5, there will be a blank cell. In this example, contrary to what the code below shows, the objective would be for all ethnicity groups characterising females in group A to be ommitted (because their n is 3 or 0), for the ethinicity group B, corresponding to males in group A to be ommitted (n = 3), etc.
The overall purpose is to implement what is called a statistical disclosure control, whereby if the n in a cell is below a number, then the results for that cell are not disclosed, to avoid potential identification of the participants.
In this example the columns for gender across group A and B all have n>5, namely 9, 27, 15, and 27. The objective would also be to have the entire column not have data if the n was below 5 (same rationale as the one applied on a per cell basis).
Any help would be much appreciated, thank you
Group <- c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")
Sex <- c("M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
"M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
"M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
"M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
"M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
"M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F")
Height <- c(170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182)
Ethnicity <- c("A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
"E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
"A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
"E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
"A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
"E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F")
df <- data.frame(Group, Sex, Height, Ethnicity)
df %>%
tbl_strata(strata = Group,
.tbl_fun = ~ .x %>%
tbl_summary(by = Sex,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 1,
missing_text = "(Missing Data)")) %>%
modify_header(label ~ "**Relevant Data**") %>%
modify_caption("**Table 1. Key descriptives**") %>%
bold_labels()
Created on 2023-08-19 with reprex v2.0.2
Using gtsummary
you can modify the table body, and perform your own operations to substitute empty cell data depending on values.
For example, if you want to look at the table body, try:
tab <- df %>%
tbl_strata(strata = Group,
.tbl_fun = ~ .x %>%
tbl_summary(by = Sex,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 1,
missing_text = "(Missing Data)"))
And then:
tab$table_body
You will see stat_1_1
, stat_1_2
, etc. as columns to modify:
variable var_label row_type label var_type_1 stat_1_1 stat_2_1 var_type_2 stat_1_2 stat_2_2
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Height Height label Height continuous 188.3 (10.0) 173.7 (9.3) continuous 175.8 (8.4) 177.0 (10.6)
2 Ethnicity Ethnicity label Ethnicity categorical NA NA categorical NA NA
3 Ethnicity Ethnicity level A categorical 3 / 9 (33%) 6 / 27 (22%) categorical 6 / 15 (40%) 0 / 27 (0%)
4 Ethnicity Ethnicity level B categorical 3 / 9 (33%) 3 / 27 (11%) NA NA NA
5 Ethnicity Ethnicity level C categorical 3 / 9 (33%) 12 / 27 (44%) categorical 3 / 15 (20%) 0 / 27 (0%)
6 Ethnicity Ethnicity level D categorical 0 / 9 (0%) 6 / 27 (22%) categorical 3 / 15 (20%) 9 / 27 (33%)
7 Ethnicity Ethnicity level E NA NA NA categorical 0 / 15 (0%) 15 / 27 (56%)
8 Ethnicity Ethnicity level F NA NA NA categorical 3 / 15 (20%) 3 / 27 (11%)
Using modify_table_body
can you mutate
across
those columns, and check if the value is less than 5.
While there are multiple options to extract the number from a character value (in this case, with '/' and a percentage in parentheses), if we make the assumption you want the first number (which is 'n'), you can use parse_number
from readr
.
Here is a complete example:
library(gtsummary)
library(tidyverse)
df %>%
tbl_strata(strata = Group,
.tbl_fun = ~ .x %>%
tbl_summary(by = Sex,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 1,
missing_text = "(Missing Data)")) %>%
modify_header(label ~ "**Relevant Data**") %>%
modify_caption("**Table 1. Key descriptives**") %>%
modify_table_body(~ .x %>%
mutate(across(starts_with("stat_"),
~ifelse(parse_number(.) < 5,
"",
.)))) %>%
bold_labels()
Table