rdataframedata.tablelapplyrowsum

Conditional rolling sum across columns


I have a data frame of values across successive years (columns) for unique individuals (rows). A dummy data example is provided here:

dt = structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), `2015` = c(0, 
0.8219178, 0, 0.1369863, 0, 1.369863, 0.2739726, 0.8219178, 5, 
0), `2016` = c(0, 1.369863, 0, 0.2739726, 0, 0.2739726, 0, 3.2876712, 
0, 0), `2017` = c(0.6849315, 0, 0, 0.6849315, 0, 0.5479452, 0, 
0, 0, 0), `2018` = c(1.0958904, 0.5479452, 1.9178082, 0, 0, 0, 
0, 0, 0, 3), `2019` = c(0, 0, 0, 1.0958904, 0, 0.9589041, 0.5479452, 
0, 0, 0), `2020` = c(0.4383562, 0, 0, 0, 0.2739726, 0.6849315, 
0, 0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

I want to create a dataset where the maximum value for each individual that should appear for each year is 1. In cases where it exceeds this value, I want to carry the excess value over 1 into the next year (column) and sum it to that year's value for each individual and so on.

The expected result is:

dt_expected = structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), `2015` = c(0, 
0.8219178, 0, 0.1369863, 0, 1, 0.2739726, 0.8219178, 1, 0), `2016` = c(0, 
1, 0, 0.2739726, 0, 0.6438356, 0, 1, 1, 0), `2017` = c(0.6849315, 
0.369863, 0, 0.6849315, 0, 0.5479452, 0, 1, 1, 0), `2018` = c(1, 
0.5479452, 1, 0, 0, 0, 0, 1, 1, 1), `2019` = c(0.0958904, 0, 
0.9178082, 1, 0, 0.9589041, 0.5479452, 0.2876712, 1, 1), `2020` = c(0.4383562, 
0, 0, 0.0958904, 0.2739726, 0.6849315, 0, 0, 0, 1)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

I am at a total loss of where to start with this problem, so any assistance achieving this using data.table would be greatly appreciated. My only thought is to use lapply with an ifelse function for the conditional component. Then should I be using rowSums or Reduce to achieve my outcome of shifting excess values across columns?


Solution

  • A translation of Martin Morgan's answer to :

    for (i in 2:(ncol(dt) - 1)) {
      x = dt[[i]]
      set(dt, j = i, value = pmin(x, 1))
      set(dt, j = i + 1, value = dt[[i + 1L]] + pmax(x - 1, 0))
    }