I am trying to find the intersection of two list columns in a data.table object that I import into R. The data.table below reproduces the problem with the exact same values:
DT_1 <- data.table(
ego = as.integer(c(128320, 128320)),
list_ego = list(as.integer(c(1,4)), as.integer(c(1,4))),
alter = as.integer(c(48259, 167757)),
list_alter = list(as.integer(c(4,3,1,5)), as.integer(c(3,1,4,5)))
)
I then run the code below and get an error message:
> DT_1[, shared_list := mapply(FUN = intersect, list_ego, list_alter)]
Error in `[.data.table`(DT_1, , `:=`(shared_list, mapply(FUN = intersect, :
Supplied 4 items to be assigned to 2 items of column 'shared_list'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(DT_1, , `:=`(shared_list, mapply(FUN = intersect, :
2 column matrix RHS of := will be treated as one vector
Strangely enough, the same code works when I use other values:
> DF_2 <- data.table(
+ ego = as.integer(c(1, 1)),
+ list_ego = list(as.integer(c(100,200)), as.integer(c(100,200))),
+ alter = as.integer(c(2, 3)),
+ list_alter = list(as.integer(c(100, 300)), as.integer(c(200, 300)))
+ )
> DF_2[, shared_list := mapply(FUN = intersect, list_ego, list_alter)]
> DF_2
ego list_ego alter list_alter shared_list
1: 1 100,200 2 100,300 100
2: 1 100,200 3 200,300 200
I need this code to work for all values, as I will use it in a loop over many data.table objects imported from CSV files.
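The clue is in the warning that accompanies the error ("2 column matrix RHS"). By default, mapply() simplifies its result: when every intersection has the same length greater than one, as in DT_1, you get a matrix (here 2 x 2, hence "4 items") instead of a list. In DF_2 every intersection has length one, so the result simplifies to a plain vector of length 2 and the assignment happens to work. Use SIMPLIFY = FALSE to always get a list, whatever the lengths: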
DT_1[, shared_list := mapply(FUN = intersect, list_ego, list_alter, SIMPLIFY = FALSE)]
# ego list_ego alter list_alter shared_list
# <int> <list> <int> <list> <list>
# 1: 128320 1,4 48259 4,3,1,5 1,4
# 2: 128320 1,4 167757 3,1,4,5 1,4
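To see the simplification outside of data.table, here is a minimal base R sketch. The object names (ego_1, alter_1, and so on) are made up for illustration; the values mirror the list columns from the question.

```r
# Base R only; values copied from DT_1 and DF_2 in the question.
ego_1   <- list(c(1L, 4L), c(1L, 4L))
alter_1 <- list(c(4L, 3L, 1L, 5L), c(3L, 1L, 4L, 5L))
ego_2   <- list(c(100L, 200L), c(100L, 200L))
alter_2 <- list(c(100L, 300L), c(200L, 300L))

# Default SIMPLIFY = TRUE: the shape of the result depends on the data.
res_1 <- mapply(FUN = intersect, ego_1, alter_1)  # both results have length 2
res_2 <- mapply(FUN = intersect, ego_2, alter_2)  # both results have length 1

is.matrix(res_1)  # TRUE: a 2 x 2 matrix, i.e. 4 items for 2 rows
is.matrix(res_2)  # FALSE: a plain vector with one item per row

# SIMPLIFY = FALSE: always a list with one element per input pair.
res_1_list <- mapply(FUN = intersect, ego_1, alter_1, SIMPLIFY = FALSE)
is.list(res_1_list)  # TRUE
length(res_1_list)   # 2
```

Only the list form has one element per row in every case, which is what := needs for a list column.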
You can also use Map(), which is a thin wrapper around mapply(..., SIMPLIFY = FALSE) and therefore always returns a list:
DT_1[, shared_list := Map(f = intersect, list_ego, list_alter)]
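Since the same call has to run over many imported tables, it may be worth checking that Map() keeps one list element per row even when the intersections differ in length or are empty. A small base R sketch with made-up values (ego, alter and shared are illustrative names, not columns from the question):

```r
# Hypothetical inputs: intersections of length 2, 1 and 0.
ego   <- list(c(1L, 4L), c(100L, 200L), c(7L, 8L))
alter <- list(c(4L, 3L, 1L, 5L), c(100L, 300L), c(9L))

shared <- Map(f = intersect, ego, alter)

length(shared)   # 3: one element per input pair
lengths(shared)  # 2 1 0: the empty intersection is kept as integer(0)

# Map() gives the same result as mapply() with SIMPLIFY = FALSE.
same <- identical(shared, mapply(FUN = intersect, ego, alter, SIMPLIFY = FALSE))
```

Rows with no shared values end up holding an empty integer vector rather than being dropped, so the new column always lines up with the table's rows.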