There is a data.table, which has some data for different id's on yearly basis:
example_data <- data.table(
id = c('id1', 'id1', 'id1', 'id2', 'id2', 'id2', 'id2', 'id2'),
year = c(2010, 2011, 2012, 2010, 2011, 2012, 2014, 2015),
param1 = c(10, 10, 20, 10, 5, 7, 15, 15),
param2 = c(100, 150, 200, 50, 50, 75, 15, 10),
param3 = c(100, 100, 0, 500, 0, 400, 500, 1000)
)
> example_data
id year param1 param2 param3
1: id1 2010 10 100 100
2: id1 2011 10 150 100
3: id1 2012 20 200 0
4: id2 2010 10 50 500
5: id2 2011 5 50 0
6: id2 2012 7 75 400
7: id2 2014 15 15 500
8: id2 2015 15 10 1000
>
I need to add three columns, so that for each id I get a ratio of each param for previous year. I managed at least to make three new columns with just the values of params for previous year:
try01 <- example_data[example_data, on = .(id, year < year), mult="last"]
> try01
id year param1 param2 param3 i.param1 i.param2 i.param3
1: id1 2010 NA NA NA 10 100 100
2: id1 2011 10 100 100 10 150 100
3: id1 2012 10 150 100 20 200 0
4: id2 2010 NA NA NA 10 50 500
5: id2 2011 10 50 500 5 50 0
6: id2 2012 5 50 0 7 75 400
7: id2 2014 7 75 400 15 15 500
8: id2 2015 15 15 500 15 10 1000
This is ok, but here the original data is on the right side of the table, and previous values are on the left - it would be better to add new columns to the right instead.
And the original question is - how can I directly add not values from previous years, but the ratio " param current year / param previous year"? May be I could use shift or similar command?
You're on the right track with the suggestion of using shift()
. That will be a lot quicker than doing a (cartesian) non-equi join and then subsetting the last value.
param_cols <- grep("^param", names(example_data), value = TRUE)
new_cols <- paste0(param_cols, "_ratio")
example_data[, (new_cols) := lapply(
.SD,
\(x) x / shift(x)
),
by = id,
.SDcols = param_cols
]
# id year param1 param2 param3 param1_ratio param2_ratio param3_ratio
# <char> <num> <num> <num> <num> <num> <num> <num>
# 1: id1 2010 10 100 100 NA NA NA
# 2: id1 2011 10 150 100 1.000000 1.5000000 1.00
# 3: id1 2012 20 200 0 2.000000 1.3333333 0.00
# 4: id2 2010 10 50 500 NA NA NA
# 5: id2 2011 5 50 0 0.500000 1.0000000 0.00
# 6: id2 2012 7 75 400 1.400000 1.5000000 Inf
# 7: id2 2014 15 15 500 2.142857 0.2000000 1.25
# 8: id2 2015 15 10 1000 1.000000 0.6666667 2.00
I've assumed you want the first value to be NA
as there is no previous value. However, if not, you can change the fill
parameter in your shift()
call. For example, if you want it to always be 1, just divide by the first value by group, shift(x, fill = x[1])
.
Similarly, you will get Inf
rather than NA
if the previous value is 0
. If you don't want this you replace all the infinite values with NA
, e.g. after the previous stage:
example_data[,
(new_cols) := lapply(.SD, \(x) fifelse(is.infinite(x), NA, x)),
.SDcols = new_cols
]