
How can I filter observations by their p-value after using `tbl_merge`?


I built a table by merging two tables with `tbl_merge()` and want to filter out the rows with any p-value greater than 5%. Here is an example:

library(cards)
library(dplyr)  # needed for filter()
library(gtsummary)

tbl_1 <- 
  tbl_hierarchical(
    data = cards::ADAE |> filter(SEX == "M"),
    denominator = cards::ADAE |> filter(SEX == "M"),
    id = USUBJID,
    by = TRTA,
    variables = c(AESOC, AEDECOD),
    overall_row = TRUE
  ) |> 
  sort_hierarchical("descending")

tbl_2 <- 
  tbl_hierarchical(
    data = cards::ADAE |> filter(SEX == "F"),
    denominator = cards::ADAE |> filter(SEX == "F"),
    id = USUBJID,
    by = TRTA,
    variables = c(AESOC, AEDECOD),
    overall_row = TRUE
  ) |> 
  sort_hierarchical("descending")

tbl <- tbl_merge(tbls = list(tbl_1, tbl_2),
                 tab_spanner = c("Male", "Female"))

tbl


How can I filter out the observations with p >= 5% at the end? `filter_hierarchical()` can only be used on `tbl_hierarchical()` objects, and I cannot filter the values before merging because some of the observations meeting the filter condition may exist in only one of the tables.

Many thanks for your help!


Solution

  • Note that these are not p-values, just the percentages of observations.

    You can filter the merged table's `table_body` directly. Because each cell is a formatted string in the default `{n} ({p}%)` format, we first need to extract the percentage from the string. Here I assume you want to keep a row if any percentage is at least 5%.

    library(cards)
    library(gtsummary)
    library(dplyr)
    
    tbl_1 <- 
      tbl_hierarchical(
        data = cards::ADAE |> filter(SEX == "M"),
        denominator = cards::ADAE |> filter(SEX == "M"),
        id = USUBJID,
        by = TRTA,
        variables = c(AESOC, AEDECOD),
        overall_row = TRUE
      ) |> 
      sort_hierarchical("descending")
    
    tbl_2 <- 
      tbl_hierarchical(
        data = cards::ADAE |> filter(SEX == "F"),
        denominator = cards::ADAE |> filter(SEX == "F"),
        id = USUBJID,
        by = TRTA,
        variables = c(AESOC, AEDECOD),
        overall_row = TRUE
      ) |> 
      sort_hierarchical("descending")
    
    tbl <- tbl_merge(tbls = list(tbl_1, tbl_2),
                     tab_spanner = c("Male", "Female"))
    
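    # Extract the percentage from each "n (p%)" string and check whether it is at least 5.
    # Cells without a percentage give NA, which is replaced with FALSE.
    # replace_na() comes from tidyr, which is not attached above, so call it with tidyr::
    get_percent <- function(value){
      tidyr::replace_na(as.numeric(stringr::str_extract(value, "(?<=\\()\\d+(?:\\.\\d+)?(?=%\\))")) >= 5, FALSE)
    }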
    
    tbl$table_body <- tbl$table_body |> 
      filter(
        get_percent(stat_1_1) | get_percent(stat_2_1)  | get_percent(stat_3_1) |
          get_percent(stat_1_2) | get_percent(stat_2_2)  | get_percent(stat_3_2) 
      )