Start with this code:
set.seed(0)
the_df <- tibble(date=seq.Date(ymd('20230101'),ymd('20230101')+days(9), by='days'),
lead1=rep(1:10),
lead2=runif(10),
lead3=runif(10),
lead4=runif(10))
the_df %>%
mutate(lag2=lead2-dplyr::lag(lead1,1),
lag3=lead3-dplyr::lag(lead2,1),
lag4=lead4-dplyr::lag(lead3,1),)
The output will look like:
Now imagine a tibble with hundreds of columns. How can this lagged difference from one column the next be replicated in vector form for all columns?
I.e., if the tibble has lead1 to lead 100, the result would be lag2 to lag100.
The statement
the_df %>%
mutate(across(lead1:lead4, ~ dplyr::lag(.x,1), .names="d_{.col}"))
calculates the lag of each column within the column and creates a new column of data with that result. How do I calculate the difference between two columns within the across function as in the simple four-column example but for all columns?
the_df |> mutate(across(lead2:last_col(), ~ .x - lag(get(names(the_df)[which(cur_column() == names(the_df)) - 1])), .names = "lag_{.col}"))
Make the data long, then get the name of the previous column::
the_df |>
tidyr::pivot_longer(-date, names_to='lead')
dplyr::mutate(prev_col = lag(lead)) -> df
Left join this intermediate df with itself, getting the previous column's value as a new column. Then calculate the value minus the previous column's lagged value. Then remove the superfluous columns
df |>
dplyr::left_join(df, by = c("date", "prev_col" = "lead")) |>
dplyr::mutate(value = value.x - lag(value.y), .by = lead) |>
dplyr::select(date, lead, value) -> output
To make the data wide again:
# make the lag data wide
tidyr::pivot_wider(output, names_from = lead, names_glue = "{paste0('lag', lead)}", values_from = value, id_cols = date) |>
# remove the useless lag1 column
select(-lag1) |>
# join it with the lead data
full_join(the_df) |>
# then get the columns in the right order
select(date, starts_with("lead"), starts_with("lag"))
Output:
# A tibble: 10 × 8
date lead1 lead2 lead3 lead4 lag2 lag3 lag4
<date> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2023-01-01 1 0.0271 0.973 0.908 NA NA NA
2 2023-01-02 2 0.985 0.520 0.603 -0.0149 0.493 -0.369
3 2023-01-03 3 0.838 0.0891 0.0151 -1.16 -0.896 -0.505
4 2023-01-04 4 0.199 0.00293 0.776 -2.80 -0.835 0.687
5 2023-01-05 5 0.891 0.846 0.283 -3.11 0.647 0.280
6 2023-01-06 6 0.337 0.902 0.272 -4.66 0.0110 -0.574
7 2023-01-07 7 0.787 0.939 0.303 -5.21 0.602 -0.599
8 2023-01-08 8 0.747 0.159 0.133 -6.25 -0.628 -0.805
9 2023-01-09 9 0.704 0.753 0.00196 -7.30 0.00534 -0.157
10 2023-01-10 10 0.908 0.976 0.876 -8.09 0.272 0.123