I have a data.table representing three variables:
I've formed a dummy set of data to represent this below:
library(data.table)
library(dplyr)
Start_number <- 3
Duration <- 4
Table <- rbindlist(lapply(seq_len(Start_number),function(x) data.table(Start_time = x,
Time = as.integer(x + seq_len(Duration) - 1)) %>% mutate(Value = 2)))
(I've just put the "Value" column to be '2', however this will vary in my actual data set)
I need to create a table which has the following two columns of output:
Condition 1: We sum up the 'Value' column for rows where 'Time' > current row's value of 'Time' (e.g. if 'Time' = 1, we'd only sum up rows where 'Time' = 2, 3, 4, ...)
Condition 2: Where Condition 1 is met, we exclude summing up rows where 'Start_time' > 'Time'.
I'll explain what the intended output is to hopefully make more sense of the above.
Target_output <- data.table(Time = seq_len(Duration),Total_FV = c(6,10,12,6,2,0))
Row 1 of output: Need to sum up for all values of Time > 1, but only where Start_time <= 1. From the "Table" data.table, this gives us rows 2, 3 and 4 which sum up to 6.
Row 2 of output: Need to sum up for all values of Time > 2, but only where Start_time <= 2. This gives us rows 3 and 4 where Start_time = 1, and then rows 6 - 8 where Start_time = 2. In total, this gives us 10.
Row 3 of output: Need to sum up for all values of Time > 3, but only where Start_time <= 3. This gives us row 4 where Start_time = 1, rows 7 - 8 where Start_time = 2, and then rows 10 - 12 for Start_time = 3. In total, this gives us 12.
...and so on
I tried to achieve this by conducting using non-equi joins with 'Table' on itself as follows:
Result <- Table[Table,on = .(Start_time >= Time,Time > Time),.(Total_FV = sum(Value)),by = Time]
Unfortunately this looks to be a long way off.
Can someone please see if they have a data.table-like approach to this, presumably using joins?
Thanks!
This approach doesn't use joins, but might work for your needs.
ts = Table[, unique(Time)]
data.table(Time=ts, Total_FV = sapply(ts, \(t) Table[Time>t & Start_time<=t, sum(Value)]))
Output:
Time Total_FV
<int> <num>
1: 1 6
2: 2 10
3: 3 12
4: 4 6
5: 5 2
6: 6 0