r pivot-table expss tidyselect

How to select row/column variables starting with a particular set of characters (e.g., Q4) in R package expss?


In survey research we often have blocks of variables whose names start with the same characters, such as "Q4a", "Q4b", etc. Is it possible to avoid manually entering the column names starting with "Q4" as arguments of the tab_cells function in the expss package, and instead use a selector similar to tidyselect's starts_with("Q4")? I know how to do this with the tidyverse, but expss offers many options that are not easily reproducible with other packages (e.g., nicely formatted tables with the results of pairwise significance tests of means or column proportions).

The following code tabulates the means of columns Q4a and Q4b, but it requires me to enter the column names (Q4a, Q4b) manually instead of selecting all variables starting with "Q4".

# load expss library
library(expss)

# create a sample data frame
data=data.frame(Q1=c(1,2,3),
                Q4a=c(3,4,5),
                Q4b=c(6,7,8))

# tabulate means of columns starting with "Q4" (entered manually as arguments of the tab_cells function)
data %>% 
    tab_cells(Q4a,Q4b) %>%
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()

Solution

  • The documentation for tab_cells (see ?tab_cells) says you can use mrset/mdset for multiple-response variables. See ?mrset for more details.

    data %>% 
        tab_cells(mrset_p("^Q4")) %>%    # similar to matches("^Q4") in <tidy-select>
        tab_cols(total()) %>% 
        tab_stat_fun(Mean = w_mean) %>%
        tab_pivot()
    
    data %>% 
        tab_cells(mrset(Q4a %to% Q4b)) %>%    # similar to Q4a:Q4b in <tidy-select>
        tab_cols(total()) %>% 
        tab_stat_fun(Mean = w_mean) %>%
        tab_pivot()
    
    Output
    # |     |      | #Total |
    # | --- | ---- | ------ |
    # | Q4a | Mean |      4 |
    # | Q4b | Mean |      7 |
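
As a quick sanity check that does not depend on expss, the same "^Q4" pattern can be tried in base R to see which columns it matches and what their means are. This is only a minimal sketch on the sample data from the question; the names q4_cols and q4_means are illustrative and not part of expss.

```r
# Sample data frame from the question
data <- data.frame(Q1  = c(1, 2, 3),
                   Q4a = c(3, 4, 5),
                   Q4b = c(6, 7, 8))

# Column names matching the regular expression "^Q4" (names starting with "Q4")
q4_cols <- grep("^Q4", names(data), value = TRUE)
print(q4_cols)

# Unweighted means of the matched columns, for comparison with the table above
q4_means <- colMeans(data[q4_cols])
print(q4_means)
```

If q4_cols lists exactly the variables you expect, the same pattern should be safe to pass to mrset_p() in the expss pipeline.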