Background: I'm trying to use a modified version of this solution to create multiple lagged columns across multiple variables. The key part of the function in question is:
mutate(across(.cols = {{ var }}, .fns = map_lag, .names = "{.col}_lag{lags}"))
where var and lags are arguments of the main function.
I've found that using a single column for var works fine and generates the correct .names for the output, as does selecting only a single value for lags (rather than a range, e.g. 1:5). However, feeding in a <tidy-select> set of columns as var AND a range for lags doesn't work with the current .names syntax (although the lagging itself, the main purpose of the function, works).
Essentially, I think the problem boils down to specifying .names in across for both multiple values of {.col} and {lags}. Is there a way to specify .names such that it expands correctly?
Reprex:
library(dplyr)
library(purrr)
test <- tibble(x = 1:10, y = 21:30)
calculate_lags <- function(df, var, lags) {
map_lag <- lags %>% map(~partial(lag, n = .x))
return(df %>% mutate(across(.cols = {{ var }}, .fns = map_lag, .names = "{.col}_lag{lags}")))
}
## Works fine with just one variable and a range of lags
test %>% calculate_lags(x, 3:5)
# A tibble: 10 × 5
x y x_lag3 x_lag4 x_lag5
<int> <int> <int> <int> <int>
1 1 21 NA NA NA
2 2 22 NA NA NA
3 3 23 NA NA NA
4 4 24 1 NA NA
5 5 25 2 1 NA
6 6 26 3 2 1
7 7 27 4 3 2
8 8 28 5 4 3
9 9 29 6 5 4
10 10 30 7 6 5
## Or with multiple variables and a single value for lag
test %>% calculate_lags(x:y, 2)
# A tibble: 10 × 4
x y x_lag2 y_lag2
<int> <int> <int> <int>
1 1 21 NA NA
2 2 22 NA NA
3 3 23 1 21
4 4 24 2 22
5 5 25 3 23
6 6 26 4 24
7 7 27 5 25
8 8 28 6 26
9 9 29 7 27
10 10 30 8 28
## But not with multiple columns AND a range of lags
test %>% calculate_lags(x:y, 2:4)
> Error in `mutate()`:
> ℹ In argument: `across(.cols = x:y, .fns = map_lag, .names = "{.col}_lag{lags}")`.
> Caused by error:
> ! Variables must be length 1 or 6
> Run `rlang::last_trace()` to see where the error occurred.
As mentioned in the help page for dplyr::across:
.names
A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.
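To see why the original "{.col}_lag{lags}" specification fails, here is a minimal sketch using glue directly. It assumes that across() fills in {.col} and {.fn} with one element per output column (2 columns × 3 functions = 6 names), which matches the "length 1 or 6" wording in the error above; the vectors col and fn below are hypothetical stand-ins for those internal values, not dplyr's actual code.

```r
library(glue)

# Hypothetical stand-ins: one element per output column (2 columns x 3 lags)
col  <- rep(c("x", "y"), each = 3)
fn   <- rep(c("2", "3", "4"), times = 2)
lags <- 2:4  # only 3 elements

# Both vectors have length 6, so glue pairs them element by element
ok <- glue("{col}_lag{fn}")

# A length-3 vector can be matched with neither length 1 nor length 6,
# so glue raises an error instead of recycling it
failed <- tryCatch(
  {
    glue("{col}_lag{lags}")
    FALSE
  },
  error = function(e) TRUE
)
```

The single-column and single-lag cases work because {lags} then has either the same length as the output names or length 1.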
In this case we are passing a list of functions, so .names can use {.fn}, which across fills in from the names of that list.
Therefore, we can name the list of functions after the lags and use "{.col}_lag{.fn}":
calculate_lags2 <- function(df, var, lags) {
map_lag <- lags %>% map(~partial(lag, n = .x))
names(map_lag) <- lags
return(df %>% mutate(across(.cols = {{ var }}, .fns = map_lag, .names="{.col}_lag{.fn}")))
}
test %>% calculate_lags2(x:y, 2:4)
x y x_lag2 x_lag3 x_lag4 y_lag2 y_lag3 y_lag4
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 21 NA NA NA NA NA NA
2 2 22 NA NA NA NA NA NA
3 3 23 1 NA NA 21 NA NA
4 4 24 2 1 NA 22 21 NA
5 5 25 3 2 1 23 22 21
6 6 26 4 3 2 24 23 22
7 7 27 5 4 3 25 24 23
8 8 28 6 5 4 26 25 24
9 9 29 7 6 5 27 26 25
10 10 30 8 7 6 28 27 26
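The same idea can be sketched a little more compactly by naming the lags before mapping over them, so the resulting list of functions is already named. This is a variant under assumptions rather than the only way to do it: calculate_lags3 is a hypothetical name, and a plain function factory is used here in place of partial().

```r
library(dplyr)
library(purrr)

# Hypothetical variant of calculate_lags2
calculate_lags3 <- function(df, var, lags) {
  # set_names() names each lag after itself ("2", "3", ...); map() keeps
  # those names, so {.fn} expands to the lag value in .names
  lag_fns <- lags %>%
    set_names() %>%
    map(function(n) {
      force(n)  # fix the lag value for this particular function
      function(x) dplyr::lag(x, n = n)
    })

  df %>%
    mutate(across({{ var }}, lag_fns, .names = "{.col}_lag{.fn}"))
}

test <- tibble(x = 1:10, y = 21:30)
out <- calculate_lags3(test, x:y, 2:4)
names(out)
```

The columns come out grouped by variable (all lags of x, then all lags of y), the same order as in the output above.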