r dplyr across

Setting `.names` for multiple columns in dplyr across


Background: I'm trying to use a modified version of this solution to create multiple lagged columns across multiple variables. The key part of the function in question is:

mutate(across(.cols = {{ var }}, .fns = map_lag, .names = "{.col}_lag{lags}"))

Where var and lags are arguments of the main function.

I've found that using a single column for var works fine and generates the correct .names for the output, as does selecting only a single value for lags (rather than a range, e.g. 1:5). However, feeding in a <tidy-select> set of columns as var AND a range for lags doesn't work with the current .names syntax (though the function otherwise does its main job).

Essentially, I think the problem boils down to specifying .names in across for both multiple values of {.col} and {lags}. Is there a way to specify .names such that it expands correctly?

Reprex:

library(dplyr)  # mutate(), across(), lag(), tibble(), %>%
library(purrr)  # map(), partial()

test <- tibble(x=1:10, y=21:30)

calculate_lags <- function(df, var, lags) {
  map_lag <- lags %>% map(~partial(lag, n = .x))
  return(df %>% mutate(across(.cols = {{ var }}, .fns = map_lag, .names = "{.col}_lag{lags}")))
}

## Works fine with just one variable and a range of lags
test  %>% calculate_lags(x, 3:5)
# A tibble: 10 × 5
       x     y x_lag3 x_lag4 x_lag5
   <int> <int>  <int>  <int>  <int>
 1     1    21     NA     NA     NA
 2     2    22     NA     NA     NA
 3     3    23     NA     NA     NA
 4     4    24      1     NA     NA
 5     5    25      2      1     NA
 6     6    26      3      2      1
 7     7    27      4      3      2
 8     8    28      5      4      3
 9     9    29      6      5      4
10    10    30      7      6      5

## Or with multiple variables and a single value for lag
test  %>% calculate_lags(x:y, 2)
# A tibble: 10 × 4
       x     y x_lag2 y_lag2
   <int> <int>  <int>  <int>
 1     1    21     NA     NA
 2     2    22     NA     NA
 3     3    23      1     21
 4     4    24      2     22
 5     5    25      3     23
 6     6    26      4     24
 7     7    27      5     25
 8     8    28      6     26
 9     9    29      7     27
10    10    30      8     28

## But not with multiple columns AND a range of lags
test  %>% calculate_lags(x:y, 2:4)
> Error in `mutate()`:
> ℹ In argument: `across(.cols = x:y, .fns = map_lag, .names = "{.col}_lag{lags}")`.
> Caused by error:
> ! Variables must be length 1 or 6
> Run `rlang::last_trace()` to see where the error occurred.

Solution

  • As mentioned in the help page for dplyr::across:

    .names
    A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.

    In this case, we are passing a list of functions, so .names can refer to each function by name through {.fn}.

    Therefore, we can name the list of functions after the lags and use "{.col}_lag{.fn}":

    calculate_lags2 <- function(df, var, lags) {
      map_lag <- lags %>% map(~partial(lag, n = .x))
      names(map_lag) <- lags
      return(df %>% mutate(across(.cols = {{ var }}, .fns = map_lag, .names="{.col}_lag{.fn}")))
    }
    
    test  %>% calculate_lags2(x:y, 2:4)
    
           x     y x_lag2 x_lag3 x_lag4 y_lag2 y_lag3 y_lag4
       <int> <int>  <int>  <int>  <int>  <int>  <int>  <int>
     1     1    21     NA     NA     NA     NA     NA     NA
     2     2    22     NA     NA     NA     NA     NA     NA
     3     3    23      1     NA     NA     21     NA     NA
     4     4    24      2      1     NA     22     21     NA
     5     5    25      3      2      1     23     22     21
     6     6    26      4      3      2     24     23     22
     7     7    27      5      4      3     25     24     23
     8     8    28      6      5      4     26     25     24
     9     9    29      7      6      5     27     26     25
    10    10    30      8      7      6     28     27     26
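
Why the original "{.col}_lag{lags}" version fails can be illustrated with glue directly. This is only a minimal sketch: it assumes across() builds one name per (column, function) pair, which is consistent with the "length 1 or 6" error in the question (2 columns × 3 lags = 6). The vectors `col` and `fn` are hypothetical stand-ins for what across() would supply as {.col} and {.fn}; they are not dplyr objects.

```r
library(glue)

# Hypothetical stand-ins for the values across() passes to the glue spec
# when 2 columns (x, y) are combined with 3 lag functions (2, 3, 4).
col  <- rep(c("x", "y"), each = 3)         # plays the role of {.col}, length 6
fn   <- rep(c("2", "3", "4"), times = 2)   # plays the role of {.fn},  length 6
lags <- 2:4                                # the outer variable,       length 3

# glue only recycles length-1 values, so mixing length 6 with length 3 errors.
res <- tryCatch(
  glue("{col}_lag{lags}", col = col, lags = lags),
  error = function(e) e
)
print(inherits(res, "error"))

# With {.fn}-style values of matching length, every name expands cleanly.
ok <- as.character(glue("{col}_lag{fn}", col = col, fn = fn))
print(ok)
```

This also suggests why the two working cases in the question succeed: with one column, {.col} and lags both have length 3, and with a single lag, lags has length 1 and is recycled.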