Say I have the following dataset dt
and a constant constant
.
dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072,
-18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA)), class = c("data.table",
"data.frame"), row.names = c(NA, -6L))
constant <- 608383
print(dt)
var1 var2
1: -92186.747 -3.12
2: -19163.504 NA
3: -18178.840 NA
4: -9844.679 NA
5: -16494.780 NA
6: -17088.058 NA
The value of var2
depends on the value of the previous row, like so
for(i in 2:nrow(dt)){
prev <- dt[(i-1),]
dt[i, var2 := prev$var2 - var1/constant]
}
print(dt)
var1 var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4: -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238
Though this for loop provides the desired output, it is very slow on a big dataset. Is there a faster way to achieve this output? Preferably using data.table
.
Here is a solution using accumulate
function from purrr
package in case you were interested. In this solution .y
represents the current value of var1
that we would like to iterate over and .x
represents the accumulated value that we calculate and put in var2
column. As you might have noticed I excluded the first value of var1
as it we don't apply our formula on it.
library(dplyr)
library(purrr)
dt %>%
mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y /constant))
var1 var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4: -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238