I have two data.tables "commuters_old" and "commuter_new":
commuters_old: (1.703.000 observations)
from1 | to1 | amount1 |
---|---|---|
1001 | 1001 | 12 |
1001 | 1002 | 2 |
1001 | 1003 | 23 |
2000 | 1002 | 200 |
2001 | 2001 | 80 |
2001 | 2002 | 98 |
commuters_new: (1.400.000 observations)
from2 | to2 | amount2 |
---|---|---|
1000 | 1001 | 10 |
1001 | 1002 | 22 |
1001 | 1004 | 2 |
2000 | 1002 | 50 |
2001 | 2001 | 8 |
2001 | 2003 | 100 |
library(data.table)
commuters_old <- data.table(
from1 = c(1001L, 1001L, 1001L, 2000L, 2001L, 2001L),
to1 = c(1001L, 1002L, 1003L, 1002L, 2001L, 2002L),
amount1 = c(12L, 2L, 23L, 200L, 80L, 98L))
commuters_new <- data.table(
from2 = c(1000L, 1001L, 1001L, 2000L, 2001L, 2001L),
to2 = c(1001L, 1002L, 1004L, 1002L, 2001L, 2003L),
amount2 = c(10L, 22L, 2L, 50L, 8L, 100L))
I would like to add the column "amount1" to data.table "commuters_new" but only if "from1 == from2" AND "to1 == to2". If this condition is not met, I want to write 0, so that:
commuters_new:
from2 | to2 | amount2 | amount1 |
---|---|---|---|
1000 | 1001 | 10 | 0 |
1001 | 1002 | 22 | 2 |
1001 | 1004 | 2 | 0 |
2000 | 1002 | 50 | 200 |
2001 | 2001 | 8 | 80 |
2001 | 2003 | 100 | 0 |
After that merge, I want to add another column "factor" that gives me the factor of "amount2" and"amount1".
commuters_new:
from2 | to2 | amount2 | amount1 | factor |
---|---|---|---|---|
1000 | 1001 | 10 | 0 | 0 |
1001 | 1002 | 22 | 2 | 11 |
1001 | 1004 | 2 | 0 | 0 |
2000 | 1002 | 50 | 200 | 0.25 |
2001 | 2001 | 8 | 80 | 0.1 |
2001 | 2003 | 100 | 0 | 0 |
What I have tried:
commuters_new[, amount1 := ifelse(from1==from2 & to1==to2, commuters_old$amount1, 0)]
OR
commuters_new$amount1[, paste(ifelse(from1==from2 & to1==to2, commuters_old$amount1, 0))]
But I always get the error:
Supplied 1703144 items to be assigned to 1408969 items of column 'amount1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code
Another way, using dtplyr:
Steps:
amount1
column with 0
samount1
is 0, factor
is 0
. Otherwise, it is amount2 / amount1
dtplyr
to run the code by running as.data.table()
pacman::p_load(data.table, dtplyr, tidyverse)
commuters_new |>
lazy_dt() |>
left_join(commuters_old, by = c("from2" = "from1", "to2" = "to1")) |>
replace_na(list(amount1 = 0L))|>
mutate(factor = ifelse(!amount1, 0, amount2 / amount1)) |>
as.data.table()
Equivalent data.table code:
setnames(setcolorder(`_DT34`[`_DT33`, on = .(from1 = from2, to1 = to2),
allow.cartesian = TRUE], c(1L, 2L, 4L, 3L)), c("from1", "to1"
), c("from2", "to2"))[, `:=`(amount1 = fcoalesce(amount1, 0L))][,
`:=`(factor = fifelse(amount1 == 0, 0, amount2/amount1))]
# or
out <- (commuters_old[commuters_new, on = .(from1 = from2, to1 = to2)]
[, amount1 := fcoalesce(amount1, 0L)]
[, factor := ifelse(!amount1, 0, amount2 / amount1)])
Output:
from1 to1 amount1 amount2 factor
1: 1000 1001 0 10 0.00
2: 1001 1002 2 22 11.00
3: 1001 1004 0 2 0.00
4: 2000 1002 200 50 0.25
5: 2001 2001 80 8 0.10
6: 2001 2003 0 100 0.00