rdplyrpanellaglead

Lag and lead a variable in a dataframe by 1 month and 6 business days for panel data


I have a large panel data set and I would like to lag and lead a variable by 1 month and 6 business days. I know, for instance, from dplyr there is the lag or lead function. However, I also need to group by data based on the "Names" in the panel data.

My data look like this:

structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019", 
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019", 
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A", 
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L, 
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L, 
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L, 
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))

It is just a simplified version. My real data lasts from 01.08.2018 to 31.12.2021. How can I only lag and lead the variable called "Rating" by 1 month and 6 business days?

My difficulty is that I have 1 month and 6 business days and not just one variable in the dataframe. All the other variables should not be adjusted.

So far I tried this:

Data_2  <- Data  %>%
  group_by(Name)  %>%
  lag('Rating')
Data_3  <- Data  %>%
  group_by(Name)  %>%
  lead('Rating')

But this is not what I am aiming for.

EDIT:

My output should look like this in the case of lead: (I just used the first 5 rows to illustrate)

structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018", 
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here", 
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here", 
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
)), class = "data.frame", row.names = c(NA, -5L))

So for row 1 I added 1 month and 6 business days which gives me 10.09.2018 and so on. The "Rating" will then be the one from 01.08.2018 but the "Size" will be the figure that was actually also reported on 10.09.2018. Then, I would like to do the same but go backwards 1 month and 6 business days.


Solution

  • Here's an approach that would work for "x days later." In this case I use 2 days later to demonstrate on your data, but 35 days later might be good to get the 5 week later #, with same day of week and so should be another "business day" most of the time.

    # Convert dates to a date format that can be calculated upon
    Data2 <- Data %>% mutate(Date = lubridate::dmy(Date))
             
    Data2 %>%
      mutate(Date_future = Date + 2) %>%
      left_join(Data2, by = c("Name", "Date_future" = "Date"), 
                suffix = c("_now", "_future"))
      # pipe into line below to just show selected columns
      # select(Date_future, Name, Rating_now, Size_future)
    

    Result

            Date Name Rating_now Size_now Date_future Rating_future Size_future
    1  2018-08-01    A          1     1234  2018-08-03             1          23
    2  2018-08-02    A          1    24123  2018-08-04             3           1
    3  2018-08-03    A          1       23  2018-08-05             3          23
    4  2018-08-04    A          3        1  2018-08-06            NA          NA
    5  2018-08-05    A          3       23  2018-08-07            NA          NA
    6  2019-04-06    A          4        3  2019-04-08             4           4
    7  2019-04-07    A          4       23  2019-04-09            NA          NA
    8  2019-04-08    A          4        4  2019-04-10            NA          NA
    9  2018-08-01    A          3      323  2018-08-03             1          23
    10 2018-08-02    B          3     3424  2018-08-04             2         234
    11 2018-08-03    B          2      523  2018-08-05            NA          NA
    12 2018-08-04    B          2      234  2018-08-06            NA          NA
    13 2019-04-06    B          2       35  2019-04-08             1          45
    14 2019-04-07    B          1      354  2019-04-09            NA          NA
    15 2019-04-08    B          1       45  2019-04-10            NA          NA
    16 2018-08-01    C          1       23  2018-08-03             3         456
    17 2018-08-02    C          3       46  2018-08-04             3         546
    18 2018-08-03    C          3      456  2018-08-05             5          24
    19 2018-08-04    C          3      546  2018-08-06            NA          NA
    20 2018-08-05    C          5       24  2018-08-07            NA          NA
    21 2019-04-07    C          5      134  2019-04-09            NA          NA
    22 2019-04-08    C          5        1  2019-04-10            NA          NA