rdata.tablecumsum

Add 0 for cumulative sum based on condition in R


I have an example data.table and I am trying to code 3 different cumulative sums by ID, one for each "Type". My data table is also sorted by date.

DT<-data.table(ID=c(A,A,A,A,B,B,B,C,C,C,C),date=c("2008-01-01","2009-01-01","2010-01-01","2011-01-01","1978-01-01","1982-01-01","1985-01-01","2001-01-01","2011-01-01","2015-01-01","2019-01-01"),Type=c(1,2,2,3,3,1,1,2,2,1,3),Days=c(18,333,26,57,48,10,17,,212,55,64,18))

I tried:

DT[Type==1&order(date),cumdays1:=cumsum(Days),by=ID]
DT[Type==2&order(date),cumdays2:=cumsum(Days),by=ID]
DT[Type==3&order(date),cumdays3:=cumsum(Days),by=ID]

However, this gives me NA for when the Type is different to the one I was calculating.

    ID       date Type Days cumdays1 cumdays2 cumdays3
 1:  A 2008-01-01    1   18       18       NA       NA
 2:  A 2009-01-01    2  333       NA      333       NA
 3:  A 2010-01-01    2   26       NA      359       NA
 4:  A 2011-01-01    3   57       NA       NA       57
 5:  B 1978-01-01    3   48       NA       NA       48
 6:  B 1982-01-01    1   10       10       NA       NA
 7:  B 1985-01-01    1   17       27       NA       NA
 8:  C 2001-01-01    2  212       NA      212       NA
 9:  C 2011-01-01    2   55       NA      267       NA
10:  C 2015-01-01    1   64       64       NA       NA
11:  C 2019-01-01    3   18       NA       NA       18

I would like instead to continue calculate cumulative sum using Day=0 for these columns. I know I could code a separate Day variable for each Type, but is there a simpler code to do this at once?

My desired output

    ID       date Type Days cumdays1 cumdays2 cumdays3
 1:  A 2008-01-01    1   18       18       0        0
 2:  A 2009-01-01    2  333       18      333       0
 3:  A 2010-01-01    2   26       18      359       0
 4:  A 2011-01-01    3   57       18      359       57
 5:  B 1978-01-01    3   48       0        0        48
 6:  B 1982-01-01    1   10       10       0        48
 7:  B 1985-01-01    1   17       27       0        48
 8:  C 2001-01-01    2  212       0        212      0
 9:  C 2011-01-01    2   55       0        267      0
10:  C 2015-01-01    1   64       64       267      0
11:  C 2019-01-01    3   18       64       267      18

Solution

  • Using Base R:

    transform(dt[order(dt$ID,dt$date),],
          cumdays = apply(Days * diag(max(Type))[Type,], 
                           2, \(x)ave(x,ID, FUN = cumsum)))
    
        ID       date Type Days cumdays.V1 cumdays.V2 cumdays.V3
     1:  A 2008-01-01    1   18         18          0          0
     2:  A 2009-01-01    2  333         18        333          0
     3:  A 2010-01-01    2   26         18        359          0
     4:  A 2011-01-01    3   57         18        359         57
     5:  B 1978-01-01    3   48          0          0         48
     6:  B 1982-01-01    1   10         10          0         48
     7:  B 1985-01-01    1   17         27          0         48
     8:  C 2001-01-01    2  212          0        212          0
     9:  C 2011-01-01    2   55          0        267          0
    10:  C 2015-01-01    1   64         64        267          0
    11:  C 2019-01-01    3   18         64        267         18
    

    Tidyverse:

    dt %>%
       arrange(ID, date) %>%
       mutate(name = Type) %>%
       pivot_wider(id_cols = c(ID, date, Type),
                   names_prefix = 'cumdays',
                   values_from = Days, values_fill = 0) %>%
       mutate(across(starts_with('cumdays'), cumsum), .by = ID)
    
    # A tibble: 11 × 6
       ID    date        Type cumdays1 cumdays2 cumdays3
       <chr> <chr>      <int>    <int>    <int>    <int>
     1 A     2008-01-01     1       18        0        0
     2 A     2009-01-01     2       18      333        0
     3 A     2010-01-01     2       18      359        0
     4 A     2011-01-01     3       18      359       57
     5 B     1978-01-01     3        0        0       48
     6 B     1982-01-01     1       10        0       48
     7 B     1985-01-01     1       27        0       48
     8 C     2001-01-01     2        0      212        0
     9 C     2011-01-01     2        0      267        0
    10 C     2015-01-01     1       64      267        0
    11 C     2019-01-01     3       64      267       18