I have two data.tables "commuters" and "distance". Distance is a huge distance matrix with row and column indices. Commuters has a column "home" and "destination".
commuters (1646044 rows):
home | destination |
---|---|
2 | 2 |
1 | 2 |
3 | 3 |
1 | 4 |
1 | 4 |
distance (1187 rows):
1 | 2 | 3 | 4 | |
---|---|---|---|---|
1 | 238.23 | 453 | 263 | 1 |
2 | 21.2 | 1 | 238.23 | 238.23 |
3 | 577.98 | 238.23 | 4362 | 2443.22 |
4 | 234.12 | 987.98 | 89.93 | 12.21 |
I would like to add a new column "distance" to df commuters. So, if the value in "home" matches with the row index of "distance" and "destination" matches with the column index of "distance", I want to add the corresponding value in the matrix to this new column "distance" in "commuters". The problem is that "home" sometimes the same value multiple times (that is why there are more rows in commuters than in distance). R keeps giving me the error that the rows in "cpmmuters" do not match the rows in "distance". However, I would like to keep duplicates (as you can see in the following table: "1 | 4" appears twice).
desired output:
home | destination | distance |
---|---|---|
2 | 2 | 1 |
1 | 2 | 453 |
3 | 3 | 4362 |
1 | 4 | 1 |
1 | 4 | 1 |
What I have tried:
commuters$distance <- distance[cbind(commuters$home, commuters$destination)]
I get this error: i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.
How can I fix this?
No need to do anything fancy (no melting required): the base [
accessor for matrices allows for a 2-column matrix to specify row/column indices. Your error, however, is because your "matrix" (as you call it) is really a data.table
. If it were a frame or a matrix, it works:
as.data.table(distance)[cbind(commuters$home, commuters$destination)]
# Error in `[.data.table`(as.data.table(distance), cbind(commuters$home, :
# i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.
as.data.frame(distance)[cbind(commuters$home, commuters$destination)]
# [1] 1 453 4362 1 1
as.matrix(distance)[cbind(commuters$home, commuters$destination)]
# [1] 1 453 4362 1 1
So if commuters
really is a data.table
, we can do
commuters[, dist := as.matrix(distance)[cbind(home, destination)] ]
# home destination dist
# <int> <int> <num>
# 1: 2 2 1
# 2: 1 2 453
# 3: 3 3 4362
# 4: 1 4 1
# 5: 1 4 1
Data
commuters <- data.table::as.data.table(structure(list(home = c(2L, 1L, 3L, 1L, 1L), destination = c(2L, 2L, 3L, 4L, 4L)), row.names = c(NA, -5L), class = c("data.table", "data.frame")))
# the matrix version of distance ...
# to reproduce the error, convert this to a data.frame or data.table
distance <- structure(c(238.23, 21.2, 577.98, 234.12, 453, 1, 238.23, 987.98, 263, 238.23, 4362, 89.93, 1, 238.23, 2443.22, 12.21), dim = c(4L, 4L), dimnames = list(NULL, c("1", "2", "3", "4")))