r qwraps2

How to add results from group comparisons in new rows using qwraps2 in R


I am following Peter DeWitt's great tutorial on qwraps2 and summary_table, but am unable to progress further.

Here's my data and code so far:

library(dplyr)    # %>%, mutate(), select(), group_by()
library(qwraps2)  # qsummary(), summary_table()

data(mtcars)

mtcars2 <- dplyr::mutate(mtcars,
                cyl_factor = factor(cyl,
                                    levels = c(6, 4, 8),
                                    labels = paste(c(6, 4, 8), "cylinders")),
                cyl_character = paste(cyl, "cylinders"),
                gear_factor = factor(gear,
                                     levels = c(3, 4, 5),
                                     labels = paste(c(3, 4, 5), "gears")))

# default summaries for two numeric variables and one factor
new_summary <- mtcars2 %>%
  dplyr::select(mpg, wt, gear_factor) %>%
  qsummary(.)

# one column of summaries per cylinder group
by_cyl <- mtcars2 %>%
  dplyr::group_by(cyl_factor) %>%
  summary_table(., new_summary)

In the tutorial he calculates p values for group comparisons and adds the p values to a new column in the tables. I would like to expand on this by adding more results from the comparisons (Cohen's d and the 95% CI, as well as the p value). I would then like to add these results in a new row under each variable, instead of in a new column next to each variable. The output should look something like this (I've made up the numbers for the group comparison test):

                 6 cylinders (N = 7)    4 cylinders (N = 11)    8 cylinders (N = 14)
mpg           
   minimum      17.80                   21.40                   10.40
   median (IQR) 19.70 (18.65, 21.00)    26.00 (22.80, 30.40)    15.20 (14.40, 16.25)
   mean (sd)    19.74 ± 1.45            26.66 ± 4.51            15.10 ± 2.56
   maximum      21.40                   33.90                   19.20
   comparison   d = 0.87, 95% CI [0.80, 0.94], p = 0.001
wt            
   minimum      2.62                    1.51                    3.17
   median (IQR) 3.21 (2.82, 3.44)       2.20 (1.88, 2.62)       3.75 (3.53, 4.01)
   mean (sd)    3.12 ± 0.36             2.29 ± 0.57             4.00 ± 0.76
   maximum      3.46                    3.19                    5.42
   comparison   d = 0.87, 95% CI [0.80, 0.94], p = 0.001

So I have two questions:

  1. How can I add a row to the tables and fill it with some content?

  2. How can I run group comparison tests, put it in the right format, and enter it into the tables?

My main problem is question 1; that's where I'm stuck now. If I can get help solving it, I might be able to figure out question 2 on my own by fiddling around with DeWitt's mpvals examples, though I would be very happy to get help with question 2 as well.

So far I have tried to add a blank row in qsummary(), but couldn't make that work. I tried manipulating the character matrix created by summary_table, but couldn't figure out how to manipulate it. Any help is appreciated!


Solution

  • Adding a comparison row to each row group is not something that qwraps2::summary_table supports directly. The obstacles are the limitations of markdown and the complexity of supporting all the different ways to span multiple columns of a table in LaTeX.

    Using qwraps2::summary_table to generate the primary table is a good starting point. Building the output table itself will require some other packages.

    With the release of qwraps2 version 0.5.0, the mtcars2 data is an exported data set and does not need to be built explicitly.

    library(qwraps2)
    options(qwraps2_markup = "markdown")
    
    summaries <- qsummary(mtcars2[, c("mpg", "wt", "gear_factor")])
    
    by_cyl <-
      summary_table(mtcars2, summaries = summaries, by = "cyl_factor")
    

    Note that the output from summary_table is a character matrix.

    str(by_cyl)
    #>  'qwraps2_summary_table' chr [1:11, 1:3] "17.80" "19.70 (18.65, 21.00)" ...
    #>  - attr(*, "dimnames")=List of 2
    #>   ..$ : chr [1:11] "minimum" "median (IQR)" "mean (sd)" "maximum" ...
    #>   ..$ : chr [1:3] "6 cylinders (N = 7)" "4 cylinders (N = 11)" "8 cylinders (N = 14)"
    #>  - attr(*, "rgroups")= Named int [1:3] 4 4 3
    #>   ..- attr(*, "names")= chr [1:3] "mpg" "wt" "gear_factor"
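
    Because the result is an ordinary character matrix with dimnames, rows can be spliced in with base R. The sketch below uses a small toy matrix rather than a real summary table, and insert_row_after is a hypothetical helper, not a qwraps2 function. It makes no attempt to update attributes such as rgroups, which a real summary table would need handled separately.

    ```r
    # Hypothetical helper (not part of qwraps2): insert one named row into a
    # character matrix directly after row index `after`.
    insert_row_after <- function(m, after, values, name) {
      stopifnot(is.matrix(m), after >= 1, after <= nrow(m),
                length(values) == ncol(m))
      new_row <- matrix(values, nrow = 1, dimnames = list(name, colnames(m)))
      top     <- m[seq_len(after), , drop = FALSE]
      bottom  <- m[seq_len(nrow(m) - after) + after, , drop = FALSE]
      rbind(top, new_row, bottom)
    }

    # Toy matrix shaped like a few rows of a summary table
    toy <- matrix(
      c("17.80", "21.40", "10.40",
        "21.40", "33.90", "19.20",
        "2.62",  "1.51",  "3.17"),
      ncol = 3, byrow = TRUE,
      dimnames = list(c("minimum", "maximum", "minimum"),
                      c("6 cylinders", "4 cylinders", "8 cylinders"))
    )

    # Put a "comparison" row after the second row; only the first cell has text
    toy2 <- insert_row_after(toy, after = 2,
                             values = c("placeholder text", "", ""),
                             name = "comparison")
    rownames(toy2)
    ```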
    

    Instead of Cohen’s d, I’ll report an F-statistic and p-value from an analysis of variance.

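    # F statistic and p-value from a one-way ANOVA of each variable on cyl_factor.
    # Note: with two length-one inputs `collapse` has no effect, so the two pieces
    # are joined by paste's default sep = " "; use sep = ", " if a comma is wanted.
    mpg_comp <-
      paste(extract_fstat(lm(mpg ~ cyl_factor, data = mtcars2)),
            extract_fpvalue(lm(mpg ~ cyl_factor, data = mtcars2)),
            collapse = ", ")
    
    wt_comp <-
      paste(extract_fstat(lm(wt ~ cyl_factor, data = mtcars2)),
            extract_fpvalue(lm(wt ~ cyl_factor, data = mtcars2)),
            collapse = ", ")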
    
    mpg_comp
    #> [1] "$F_{2, 29} = 39.70$ *P* < 0.0001"
    wt_comp
    #> [1] "$F_{2, 29} = 22.91$ *P* < 0.0001"
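
    If Cohen’s d is wanted after all, one possible sketch in base R follows. It assumes two independent groups, so a three-level factor such as cyl_factor would need pairwise comparisons. It uses the pooled standard deviation, a large-sample normal approximation for the 95% CI, and a pooled-variance t-test for the p-value. cohens_d_string is a hypothetical helper name, not a qwraps2 function, and the CI is only an approximation.

    ```r
    # Hypothetical helper (not part of qwraps2): Cohen's d for two independent
    # groups, formatted with an approximate 95% CI and a t-test p-value.
    cohens_d_string <- function(x, y, digits = 2) {
      n1 <- length(x)
      n2 <- length(y)
      # pooled standard deviation
      sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
      d  <- (mean(x) - mean(y)) / sp
      # large-sample standard error of d (an approximation)
      se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
      ci <- d + c(-1, 1) * qnorm(0.975) * se
      # pooled-variance t-test, matching the pooled SD used for d
      p  <- t.test(x, y, var.equal = TRUE)$p.value
      fmt <- function(v) formatC(v, digits = digits, format = "f")
      p_txt <- if (p < 0.001) "< 0.001" else paste("=", formatC(p, digits = 3, format = "f"))
      sprintf("d = %s, 95%% CI [%s, %s], p %s", fmt(d), fmt(ci[1]), fmt(ci[2]), p_txt)
    }

    # Example: mpg for 4-cylinder versus 8-cylinder cars
    four  <- mtcars$mpg[mtcars$cyl == 4]
    eight <- mtcars$mpg[mtcars$cyl == 8]
    d_comp <- cohens_d_string(four, eight)
    d_comp
    ```

    The resulting string is a length-one character vector, so it can be placed in a table cell in the same way as mpg_comp.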
    

    For building the table, there are many options. Spanning multiple columns in markdown is not trivial, and different flavors of markdown render tables differently: some support multicolumn spanning, others do not.

    For a markdown table, I recommend having a new column with the comparison reported. For the by_cyl table I would put the F stat and p-value on the rows where the mean is reported. This puts the statistical test and results on the line related to the summary statistic.

    by_cyl2 <- cbind(by_cyl, "comparison" = "&nbsp;")
    by_cyl2[grepl("mean", rownames(by_cyl2)), "comparison"] <- c(mpg_comp, wt_comp)
    
    by_cyl2
    #> 
    #> 
    #> |                             |6 cylinders (N = 7)  |4 cylinders (N = 11) |8 cylinders (N = 14) |comparison                       |
    #> |:----------------------------|:--------------------|:--------------------|:--------------------|:--------------------------------|
    #> |**mpg**                      |&nbsp;&nbsp;         |&nbsp;&nbsp;         |&nbsp;&nbsp;         |&nbsp;&nbsp;                     |
    #> |&nbsp;&nbsp; minimum         |17.80                |21.40                |10.40                |&nbsp;                           |
    #> |&nbsp;&nbsp; median (IQR)    |19.70 (18.65, 21.00) |26.00 (22.80, 30.40) |15.20 (14.40, 16.25) |&nbsp;                           |
    #> |&nbsp;&nbsp; mean (sd)       |19.74 &plusmn; 1.45  |26.66 &plusmn; 4.51  |15.10 &plusmn; 2.56  |$F_{2, 29} = 39.70$ *P* < 0.0001 |
    #> |&nbsp;&nbsp; maximum         |21.40                |33.90                |19.20                |&nbsp;                           |
    #> |**wt**                       |&nbsp;&nbsp;         |&nbsp;&nbsp;         |&nbsp;&nbsp;         |&nbsp;&nbsp;                     |
    #> |&nbsp;&nbsp; minimum         |2.62                 |1.51                 |3.17                 |&nbsp;                           |
    #> |&nbsp;&nbsp; median (IQR)    |3.21 (2.82, 3.44)    |2.20 (1.89, 2.62)    |3.75 (3.53, 4.01)    |&nbsp;                           |
    #> |&nbsp;&nbsp; mean (sd)       |3.12 &plusmn; 0.36   |2.29 &plusmn; 0.57   |4.00 &plusmn; 0.76   |$F_{2, 29} = 22.91$ *P* < 0.0001 |
    #> |&nbsp;&nbsp; maximum         |3.46                 |3.19                 |5.42                 |&nbsp;                           |
    #> |**gear_factor**              |&nbsp;&nbsp;         |&nbsp;&nbsp;         |&nbsp;&nbsp;         |&nbsp;&nbsp;                     |
    #> |&nbsp;&nbsp; 3 forward gears |2 (29)               |1 (9)                |12 (86)              |&nbsp;                           |
    #> |&nbsp;&nbsp; 4 forward gears |4 (57)               |8 (73)               |0 (0)                |&nbsp;                           |
    #> |&nbsp;&nbsp; 5 forward gears |1 (14)               |2 (18)               |2 (14)               |&nbsp;                           |
    

    If, instead, the comparison is a new row, I like the idea of adding a blank row to the summary and then writing the comparison into that blank row.

    summaries[[1]] <- c(summaries[[1]], "comparison" = ~ qwraps2::frmt(""))
    summaries[[2]] <- c(summaries[[2]], "comparison" = ~ qwraps2::frmt(""))
    
    by_cyl3 <- summary_table(mtcars2, summaries, by = "cyl_factor")
    
    by_cyl3[grepl("comparison", rownames(by_cyl3)), 1] <- c(mpg_comp, wt_comp)
    
    by_cyl3
    #> 
    #> 
    #> |                             |6 cylinders (N = 7)              |4 cylinders (N = 11) |8 cylinders (N = 14) |
    #> |:----------------------------|:--------------------------------|:--------------------|:--------------------|
    #> |**mpg**                      |&nbsp;&nbsp;                     |&nbsp;&nbsp;         |&nbsp;&nbsp;         |
    #> |&nbsp;&nbsp; minimum         |17.80                            |21.40                |10.40                |
    #> |&nbsp;&nbsp; median (IQR)    |19.70 (18.65, 21.00)             |26.00 (22.80, 30.40) |15.20 (14.40, 16.25) |
    #> |&nbsp;&nbsp; mean (sd)       |19.74 &plusmn; 1.45              |26.66 &plusmn; 4.51  |15.10 &plusmn; 2.56  |
    #> |&nbsp;&nbsp; maximum         |21.40                            |33.90                |19.20                |
    #> |&nbsp;&nbsp; comparison      |$F_{2, 29} = 39.70$ *P* < 0.0001 |                     |                     |
    #> |**wt**                       |&nbsp;&nbsp;                     |&nbsp;&nbsp;         |&nbsp;&nbsp;         |
    #> |&nbsp;&nbsp; minimum         |2.62                             |1.51                 |3.17                 |
    #> |&nbsp;&nbsp; median (IQR)    |3.21 (2.82, 3.44)                |2.20 (1.89, 2.62)    |3.75 (3.53, 4.01)    |
    #> |&nbsp;&nbsp; mean (sd)       |3.12 &plusmn; 0.36               |2.29 &plusmn; 0.57   |4.00 &plusmn; 0.76   |
    #> |&nbsp;&nbsp; maximum         |3.46                             |3.19                 |5.42                 |
    #> |&nbsp;&nbsp; comparison      |$F_{2, 29} = 22.91$ *P* < 0.0001 |                     |                     |
    #> |**gear_factor**              |&nbsp;&nbsp;                     |&nbsp;&nbsp;         |&nbsp;&nbsp;         |
    #> |&nbsp;&nbsp; 3 forward gears |2 (29)                           |1 (9)                |12 (86)              |
    #> |&nbsp;&nbsp; 4 forward gears |4 (57)                           |8 (73)               |0 (0)                |
    #> |&nbsp;&nbsp; 5 forward gears |1 (14)                           |2 (18)               |2 (14)               |
    

    This does not address spanning multiple columns. That issue is, at least to my knowledge, non-trivial. I would use different tools and methods depending on the target file format. If I’m going to build a .pdf I would work in LaTeX, not markdown, and use \multicolumn{}{}{} explicitly. If the target output is html I would build an html table explicitly; htmlTable is a great package for that. There are compatibility options which might help when building .docx or other Office style outputs.

    Created on 2020-09-15 by the reprex package (v0.3.0)
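
    For the LaTeX route, a spanning row can be written by hand. The sketch below builds one such row as a string in base R. multicolumn_row is a hypothetical helper, not a qwraps2 function, and the F statistic text is simply retyped from the output above in LaTeX math markup. The row would still need to be placed inside a tabular environment with a label column plus three data columns.

    ```r
    # Hypothetical helper (not part of qwraps2): one LaTeX table row whose text
    # spans `ncols` columns to the right of the row label.
    multicolumn_row <- function(label, text, ncols, align = "l") {
      sprintf("%s & \\multicolumn{%d}{%s}{%s} \\\\", label, ncols, align, text)
    }

    # Comparison text spanning the three cylinder columns
    mpg_row <- multicolumn_row("comparison",
                               "$F_{2, 29} = 39.70$, $P < 0.0001$",
                               ncols = 3L)
    cat(mpg_row, "\n")
    ```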

    devtools::session_info()
    #> ─ Session info ───────────────────────────────────────────────────────────────
    #>  setting  value                       
    #>  version  R version 4.0.2 (2020-06-22)
    #>  os       macOS Catalina 10.15.6      
    #>  system   x86_64, darwin17.0          
    #>  ui       X11                         
    #>  language (EN)                        
    #>  collate  en_US.UTF-8                 
    #>  ctype    en_US.UTF-8                 
    #>  tz       America/Denver              
    #>  date     2020-09-15                  
    #> 
    #> ─ Packages ───────────────────────────────────────────────────────────────────
    #>  package     * version date       lib source        
    #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
    #>  backports     1.1.9   2020-08-24 [1] CRAN (R 4.0.2)
    #>  callr         3.4.4   2020-09-07 [1] CRAN (R 4.0.2)
    #>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
    #>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
    #>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
    #>  devtools      2.3.1   2020-07-21 [1] CRAN (R 4.0.2)
    #>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
    #>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
    #>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
    #>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
    #>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
    #>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
    #>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
    #>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.0)
    #>  knitr         1.29    2020-06-23 [1] CRAN (R 4.0.0)
    #>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
    #>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
    #>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)
    #>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.0)
    #>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
    #>  processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.2)
    #>  ps            1.3.4   2020-08-11 [1] CRAN (R 4.0.2)
    #>  qwraps2     * 0.5.0   2020-09-14 [1] local         
    #>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
    #>  Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.0)
    #>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
    #>  rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
    #>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.0)
    #>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
    #>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
    #>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
    #>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
    #>  testthat      2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
    #>  usethis       1.6.1   2020-04-29 [1] CRAN (R 4.0.0)
    #>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
    #>  xfun          0.17    2020-09-09 [1] CRAN (R 4.0.2)
    #>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
    #> 
    #> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library