rdata.table

Shift with dynamic n (number of position lead / lag by)


I have the below df:

df <- data.table(user = c('a', 'a', 'a', 'b', 'b')
                 , spend = 1:5
                 , shift_by = c(1,1,2,1,1)
                 ); df

   user spend shift_by
1:    a     1        1
2:    a     2        1
3:    a     3        2
4:    b     4        1
5:    b     5        1

I am looking to create a lead lag column only this time the n parameter in data.table's shift function is dynamic and takes df$shiftby as input. My expected result is:

df[, spend_shifted := c(NA, 1, 1, NA, 4)]; df

   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1             1
3:    a     3        2             1
4:    b     4        1            NA
5:    b     5        1             4

However, with the below attempt it gives:

df[, spend_shifted := shift(x=spend, n=shift_by, type="lag"), user]; df

   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1            NA
3:    a     3        2            NA
4:    b     4        1            NA
5:    b     5        1            NA

This is the closest example I could find. However, I need a group by and am after a data.table solution because of speed. Truly look forward to finding any ideas.


Solution

  • Maybe this could help

    > df[, spend_shifted := spend[replace(seq(.N) - shift_by, seq(.N) <= shift_by, NA)], user][]
       user spend shift_by spend_shifted
    1:    a     1        1            NA
    2:    a     2        1             1
    3:    a     3        2             1
    4:    b     4        1            NA
    5:    b     5        1             4