rexpss

How to add column to show difference between variables in `expss` package


I'm a new user of expss package. I have a question about how to add a column to show difference.

For example I have an expss result as below. I would like to add a column to show the difference between V-engine and Straight engine (V-engine - Straight engine)

Code:

library(expss)
data(mtcars)

mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

mtcars %>% 
tab_cells(cyl, disp) %>% 
tab_cols(vs) %>% 
tab_stat_mean() %>% 
tab_pivot()
                                                              
 |                       |      |   Engine |                 |
 |                       |      | V-engine | Straight engine |
 | --------------------- | ---- | -------- | --------------- |
 |   Number of cylinders | Mean |      7.4 |             4.6 |
 | Displacement (cu.in.) | Mean |    307.1 |           132.5 |
> 

Many thanks


Solution

  • The output is a data.frame with etable class, thus we can create the column using any method that can work with data.frame

    library(dplyr)
    out <- mtcars %>% 
      tab_cells(cyl, disp) %>% 
      tab_cols(vs) %>% 
      tab_stat_mean() %>% 
      tab_pivot() %>%
      mutate(diff = `vs|0`- `vs|1`)
    

    -output

    > out
                                            
     |      |      |    vs |       |  diff |
     |      |      |     0 |     1 |       |
     | ---- | ---- | ----- | ----- | ----- |
     |  cyl | Mean |   7.4 |   4.6 |   2.9 |
     | disp | Mean | 307.1 | 132.5 | 174.7 |
     |      |      |       |       |       |
     |      |      |       |       |       |
    > str(out)
    Classes ‘etable’ and 'data.frame':  4 obs. of  4 variables:
     $ row_labels: chr  "cyl|Mean" "disp|Mean" NA NA
     $ vs|0      : num  7.44 307.15 NA NA
     $ vs|1      : num  4.57 132.46 NA NA
     $ diff      : num  2.87 174.69 NA NA
    

    With the table output showed, it would be

    mtcars %>% 
         tab_cells(cyl, disp) %>% 
         tab_cols(vs) %>% 
         tab_stat_mean() %>% 
         tab_pivot() %>%  mutate(diff = `Engine|V-engine` - `Engine|Straight engine`)
                                                                          
     |                       |      |   Engine |                 |  diff |
     |                       |      | V-engine | Straight engine |       |
     | --------------------- | ---- | -------- | --------------- | ----- |
     |   Number of cylinders | Mean |      7.4 |             4.6 |   2.9 |
     | Displacement (cu.in.) | Mean |    307.1 |           132.5 | 174.7 |
     |                       |      |          |                 |       |
     |                       |      |          |                 |       |
    

    Update

    If there are multiple groups of columns, then either we can use across or use map to apply the condition

    library(stringr)
    library(purrr)
    out1 <- mtcars %>% 
      tab_cells(cyl, disp) %>% 
      tab_cols(vs, am) %>% 
      tab_stat_mean() %>% 
      tab_pivot()  
    out2 <- bind_cols(out1, map_dfc(split.default(as.data.frame(out1[,-1]), 
         str_replace(names(out1)[-1], "\\|.*", "|diff")), reduce, `-`))
    

    -output

    > out2
                                                                                                                  
     |                       |      |   Engine |                 | Transmission |        | Engine | Transmission |
     |                       |      | V-engine | Straight engine |    Automatic | Manual |   diff |         diff |
     | --------------------- | ---- | -------- | --------------- | ------------ | ------ | ------ | ------------ |
     |   Number of cylinders | Mean |      7.4 |             4.6 |          6.9 |    5.1 |    2.9 |          1.9 |
     | Displacement (cu.in.) | Mean |    307.1 |           132.5 |        290.4 |  143.5 |  174.7 |        146.8 |
    > str(out2)
    Classes ‘etable’ and 'data.frame':  2 obs. of  7 variables:
     $ row_labels            : chr  "Number of cylinders|Mean" "Displacement (cu.in.)|Mean"
     $ Engine|V-engine       : num  7.44 307.15
     $ Engine|Straight engine: num  4.57 132.46
     $ Transmission|Automatic: num  6.95 290.38
     $ Transmission|Manual   : num  5.08 143.53
     $ Engine|diff           : num  2.87 174.69
     $ Transmission|diff     : num  1.87 146.85