r  r-markdown  frequency-distribution

Rounding in frequency tables in R


I am hoping someone proficient in R/R Markdown can help me with an issue. I want to generate a frequency table, and so far I have been using tableby() from the arsenal package, as it is easy and convenient to integrate into an R Markdown docx/html document. However, I have been asked to provide rounded frequencies (to the nearest 5 or 10) and have not found a way to do that.

I have generated a simple fake dataset, as I cannot share my data for confidentiality reasons. This is how I would produce a normal table:

set.seed(1234)

library(dplyr)
library(arsenal)

x1 <- c(rep("Man",40),rep("Woman",60)) %>% as.factor()
x2 <- sample(c("Sick","Healthy"),100,replace=TRUE) %>% as.factor()

df <- data.frame(x1,x2)

Control_notrounded <- tableby.control(digits=0,digits.pct=2,cat.stats=c("countpct","Nmiss2"))

table <- tableby(x1~x2,control=Control_notrounded,data=df)
print(summary(table))

However, although a traditional rounding function rounds to the nearest 10 when passed digits=-1, this does not work with tableby.control(): I get a warning indicating that digits must be >=0.

Control_rounded <- tableby.control(digits=-1,digits.pct=2,cat.stats=c("countpct","Nmiss2"))
table2 <- tableby(x1~x2,control=Control_rounded,data=df)
print(summary(table2))
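
For reference, the kind of rounding being asked for can be sketched in base R, outside of tableby. Note that round_to() is a hypothetical helper name used only for illustration; it is not part of arsenal or any other package.

```r
# Hypothetical helper: round to the nearest multiple of `base` (5, 10, ...).
round_to <- function(x, base = 5) {
  round(x / base) * base
}

counts <- c(17, 23, 44)   # made-up cell counts, for illustration only

round_to(counts, 5)    # 15 25 45
round_to(counts, 10)   # 20 20 40
round(counts, -1)      # 20 20 40, same as round_to(counts, 10) here
```

One caveat: R's round() sends exact halves to the even neighbour (round(2.5) is 2), so round_to(25, 10) gives 20 rather than 30.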

Is there any way to do this? Otherwise, can anyone suggest an alternative package that makes it relatively straightforward to create frequency tables with rounded values?


Solution

  • I can recommend using the gtsummary package for creating baseline tables instead, and then trying the round_5_gtsummary() function from this little GitHub package:

    set.seed(1234)
    library(dplyr)
    library(gtsummary)
    library(stringr)
    
    x1 <- c(rep("Man",40),rep("Woman",60)) %>% as.factor()
    x2 <- sample(c("Sick","Healthy"),100,replace=TRUE) %>% as.factor()
    df <- data.frame(x1,x2)
    
    # One-time setup: install the add-on package from GitHub
    install.packages("devtools")
    devtools::install_github("zheer-kejlberg/Z.gtsummary.addons")
    library(Z.gtsummary.addons)
    
    df %>% tbl_summary(by = "x1") %>% 
      add_overall(last = TRUE) %>% 
      round_5_gtsummary()  %>%
      add_p()
    
    



    WEIGHTED VERSION

    # Create IPT weights
    library(WeightIt)
    df$w <- weightit(x1~x2, data = df, estimand = "ATT", focal = "Man")$weights
    

    Use survey to create a svydesign object. Then apply tbl_svysummary() to that:

    library(survey)
    df %>% survey::svydesign(~1, data = ., weights = ~w) %>%
      tbl_svysummary(by = "x1", include=c(x2)) %>%
      add_overall(last = TRUE) %>%
      round_5_gtsummary() %>%
      add_p()
    

    ALTERNATIVE WAY:

    To use the built-in tbl_summary(digits=) argument to separately round the counts and percentages, you can do:

    library(gtsummary)
    library(dplyr)
    set.seed(1234)
    
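    # Round counts to the nearest 5 and percentages to the nearest 5 percent.
    # Heuristic: values below 1 are treated as proportions, everything else as counts.
    round_5 <- function(vec) {
      fun <- function(x) {
        if (x < 1) {
          round(x * 100 / 5) * 5  # proportion -> percentage, nearest 5
        } else {
          round(x / 5) * 5        # count, nearest 5
        }
      }
      purrr::map_vec(vec, .f = fun)  # return the rounded vector visibly
    }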
    
    df <- data.frame(
      x1 = c(rep("Man", 40), rep("Woman", 60)) %>% as.factor(),
      x2 = sample(c("Sick", "Healthy"), 100, replace = TRUE) %>% as.factor()
    )
    
    df %>% 
      tbl_summary(
        by = "x1",
        digits = all_categorical() ~ round_5
      ) %>% 
      add_overall(last = TRUE) %>% 
      add_p()
    


    Note that this version doesn't recalculate percentages after rounding the counts; rather, it just rounds both separately.
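
If the percentages need to be consistent with the rounded counts, one option is to round the counts first and then recompute the percentages from them. The following is only a minimal base-R sketch of that idea, outside gtsummary; round_to() is a hypothetical helper and the output is a plain character matrix, not a gtsummary table.

```r
set.seed(1234)
df <- data.frame(
  x1 = factor(c(rep("Man", 40), rep("Woman", 60))),
  x2 = factor(sample(c("Sick", "Healthy"), 100, replace = TRUE))
)

# Hypothetical helper: round to the nearest multiple of `base`.
round_to <- function(x, base = 5) round(x / base) * base

counts  <- table(df$x2, df$x1)                    # rows: x2 levels, columns: x1 groups
rounded <- round_to(counts, 5)                    # round every cell count to the nearest 5
pct     <- prop.table(rounded, margin = 2) * 100  # column percentages from the ROUNDED counts

# Combine into "n (p%)" cells, keeping the row and column labels.
formatted <- matrix(
  sprintf("%.0f (%.0f%%)", as.vector(rounded), as.vector(pct)),
  nrow = nrow(rounded),
  dimnames = dimnames(rounded)
)
formatted
```

With this approach the percentages in each column add up to 100, but the rounded counts in a column may no longer add up to the true group size.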