r purrr stargazer

Stargazer / iwalk error when printing summary statistics by group


I'm trying to print some summary statistics by a categorical variable and I keep getting the following error message:

Error in `map2()`:
ℹ In index: 1.
ℹ With name: Control.
Caused by error in `if (nchar(text.matrix[r, c]) > max.length[real.c]) ...`:
! missing value where TRUE/FALSE needed
Backtrace:
  1. ... %>% ...
  2. purrr::iwalk(...)
  3. purrr::walk2(.x, vec_index(.x), .f, ...)
  4. purrr::map2(.x, .y, .f, ..., .progress = .progress)
  5. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
  9. .f(.x[[i]], .y[[i]], ...)
 10. stargazer::stargazer(...)
 11. stargazer:::.stargazer.wrap(...)
 12. stargazer (local) .text.output(latex.code)
 13. stargazer (local) .text.column.width(t, c)

This is the code I'm running:

df %>%
  split(. $treat) %>% 
  iwalk(~ 
      stargazer(., 
        type = "text",
        flip = TRUE,
        title = "Table X: Balance tests by treatment status",
        covariate.labels = c(
          "Female (0/1)",
          "Year of birth",
          "Phone number (0/1)",
          "Registered Democrat (0/1)",
          "Registered Republican (0/1)",
          "Unaffiliated (0/1)"),
         align = TRUE)
      )

I've read other posts (e.g. this one and this one) which suggest the problem is with underscores in the covariate labels, but as you can see I have none.

I have tried removing the (0/1) from the labels, converting the treat variable to numeric (it's currently a factor), and renaming the variables to remove underscores. I get essentially the same error message in each case.

This data snippet can be used to repro the error:

df <- structure(list(treat = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), levels = c("Control", 
"Treatment 1", "Treatment 2", "Treatment 3"
), class = "factor"), female = structure(c(1L, 1L, 2L, 2L, 2L, 
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L), levels = c("female", 
"male"), class = "factor"), birth_year = c(1945, 1930, 1990, 
1984, 1992, 1957, 1996, 1977, 1975, 1985, 1936, 1992, 1958, 1939, 
1986, 1955, 1962, 1973, 1986, 1950), provided_phone_no = c(1, 
1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), dem = c(1, 
1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0), rep = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), uaf = c(0, 
0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    treat = structure(1:4, levels = c("Control", "Treatment 1", 
    "Treatment 2", "Treatment 3"), class = "factor"), 
    .rows = structure(list(1:5, 6:10, 11:15, 16:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -4L), .drop = TRUE, class = c("tbl_df", 
"tbl", "data.frame")))
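
One thing worth checking with data shaped like this is how many columns in each split are actually numeric, compared with how many labels are passed to covariate.labels. The sketch below uses a hypothetical miniature data frame (df_demo, not the real data) with the same column types as the snippet above. It assumes, as I understand stargazer's behaviour for data frames, that only numeric columns end up in the summary table; treat that as an assumption rather than a confirmed diagnosis of the error.

```r
# Hypothetical miniature of the data above: two factors plus five numeric columns.
df_demo <- data.frame(
  treat = factor(rep(c("Control", "Treatment 1"), each = 3)),
  female = factor(c("female", "male", "male", "female", "male", "male")),
  birth_year = c(1945, 1930, 1990, 1957, 1996, 1977),
  provided_phone_no = c(1, 1, 0, 1, 1, 0),
  dem = c(1, 1, 0, 0, 1, 1),
  rep = c(0, 0, 0, 0, 0, 0),
  uaf = c(0, 0, 1, 1, 0, 0)
)

# Split by group, as in the question.
pieces <- split(df_demo, df_demo$treat)

# List the numeric columns in each group.
numeric_cols <- lapply(
  pieces,
  function(d) names(d)[vapply(d, is.numeric, logical(1))]
)
print(numeric_cols)

# Compare the number of numeric columns with the number of labels supplied
# (six labels in the question's covariate.labels).
n_labels <- 6
n_numeric <- lengths(numeric_cols)
print(n_numeric)
```

If the two counts differ, the labels and the summarised columns may not line up, which is something to rule out before looking at the label text itself.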

Solution

  • modelsummary makes these kinds of tables really easy. It will warn you about what you need to include in your LaTeX preamble, but other than that there isn't much additional fiddling to do!

    library(dplyr)
    library(purrr)
    library(modelsummary)
    
    
    nice_data = df |> 
      select(`Female (0/1)` = female,
             `Birth Year` = birth_year,
             `Phone Number (0/1)` = provided_phone_no,
             `Registered Dem (0/1)` = dem,
             `Registered Rep (0/1)` = rep,
             `Unaffiliated (0/1)` = uaf,
             `Treatment` = treat)
    
    
    ## you can rename it in the table but 
    ## I tend to prefer renaming before 
    
    examp = datasummary_balance(~Treatment,
                data = nice_data,
                ## this turns off calculating the difference in means
                dinm = FALSE,
                output = 'balance_table.tex')
    
    
    ## If you want to iterate through
     
    test_split = split(nice_data, nice_data$Treatment)
    
    names_vec = names(test_split)
    
    ## this will save the tables
    tables = map2(test_split, names_vec, \(x,y) datasummary_balance(~Treatment,
                data = x,
                ## this turns off calculating the difference in means
                dinm = FALSE,
                output = paste0("balance_table_", y, ".tex")))
    
    

    EDIT: Assuming you want LaTeX output, then you need to pass some raw LaTeX code to your labels, like this:

    library(stargazer)
    
    nice_data %>%
      split(. $Treatment) %>% 
      iwalk(~ 
              stargazer(., 
                        type = "latex",
                        flip = TRUE,
                        title = "Table X: Balance tests by treatment status",
                        covariate.labels.include = FALSE,
                        covariate.labels = c(
                          ## backslashes must be doubled inside R strings
                          "Female $\\frac{0}{1}$",
                          "Year of birth",
                          "Phone number $\\frac{0}{1}$",
                          "Registered Democrat $\\frac{0}{1}$",
                          "Registered Republican $\\frac{0}{1}$",
                          "Unaffiliated $\\frac{0}{1}$"),
                        align = TRUE
                        )
      )
    
    

    Here the 0/1 is wrapped in LaTeX math mode. However, I strongly recommend modelsummary for its simplicity and flexibility when making tables.
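
Note that inside an R string literal a single backslash begins an escape sequence ("\f" is a form feed), so a LaTeX command in a label needs a doubled backslash to reach the output intact. A minimal base-R check, using one label as an illustrative example:

```r
# "\\" in R source is one literal backslash in the resulting string.
label <- "Female $\\frac{0}{1}$"

# cat() shows the string as it would be written to a .tex file.
cat(label, "\n")

# A single backslash followed by f is one character: a form feed.
single <- "\f"
print(nchar(single))

# The correctly escaped label contains the literal text \frac.
has_frac <- grepl("\\frac", label, fixed = TRUE)
print(has_frac)
```

The same doubling applies to any other LaTeX command placed in covariate.labels or title.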

    Created on 2024-06-10 with reprex v2.1.0