r, data.table, intersect, mapply

Code that finds list intersection (using mapply) works for some values but not others


I am trying to find the intersection of two list columns in a data.table object that I import into R. I reproduce the data.table below with exactly the same values:

library(data.table)

DT_1 <- data.table(
  ego = as.integer(c(128320, 128320)),
  list_ego = list(as.integer(c(1,4)), as.integer(c(1,4))),
  alter = as.integer(c(48259, 167757)),
  list_alter = list(as.integer(c(4,3,1,5)), as.integer(c(3,1,4,5)))
)

I then run the code below and get an error message:

> DT_1[, shared_list := mapply(FUN = intersect, list_ego, list_alter)]
Error in `[.data.table`(DT_1, , `:=`(shared_list, mapply(FUN = intersect,  : 
  Supplied 4 items to be assigned to 2 items of column 'shared_list'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(DT_1, , `:=`(shared_list, mapply(FUN = intersect,  :
  2 column matrix RHS of := will be treated as one vector

Strangely enough, the same code works when I use other values:

> DF_2 <- data.table(
+   ego = as.integer(c(1, 1)),
+   list_ego = list(as.integer(c(100,200)), as.integer(c(100,200))),
+   alter = as.integer(c(2, 3)),
+   list_alter = list(as.integer(c(100, 300)), as.integer(c(200, 300)))
+ )
> DF_2[, shared_list := mapply(FUN = intersect, list_ego, list_alter)]
> DF_2
   ego list_ego alter list_alter shared_list
1:   1  100,200     2    100,300         100
2:   1  100,200     3    200,300         200

I need to have this code work for all values, as I will use it in a loop over many csv imported data.table objects.


Solution

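  • The clue is in the error message. mapply() defaults to SIMPLIFY = TRUE, so whenever every intersection has the same length it collapses the results into a vector or matrix. In DT_1 each intersection has two elements, which gives a 2 x 2 matrix (4 items for 2 rows). In DF_2 each intersection has one element, which gives a plain vector of length 2 that happens to fit. Use SIMPLIFY = FALSE to always get a list back: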

    DT_1[, shared_list := mapply(FUN = intersect, list_ego, list_alter, SIMPLIFY = FALSE)]
    
    #       ego list_ego  alter list_alter shared_list
    #     <int>   <list>  <int>     <list>      <list>
    # 1: 128320      1,4  48259    4,3,1,5         1,4
    # 2: 128320      1,4 167757    3,1,4,5         1,4
    

    You can also use Map(), which is a wrapper around mapply() that never simplifies its result:

    DT_1[, shared_list := Map(f = intersect, list_ego, list_alter)]
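To see why the two tables behave differently, it may help to look at what mapply() returns outside of data.table. The following is a minimal base R sketch using the same values as the question; the variable names (simplified_dt1, simplified_df2, kept_as_list) are made up for illustration.

```r
# Inputs copied from DT_1 and DF_2 in the question.
ego_1   <- list(c(1L, 4L), c(1L, 4L))
alter_1 <- list(c(4L, 3L, 1L, 5L), c(3L, 1L, 4L, 5L))
ego_2   <- list(c(100L, 200L), c(100L, 200L))
alter_2 <- list(c(100L, 300L), c(200L, 300L))

# Default SIMPLIFY = TRUE: both results have length 2, so they are
# bound into a 2 x 2 matrix, i.e. 4 items for a 2-row table.
simplified_dt1 <- mapply(FUN = intersect, ego_1, alter_1)
is.matrix(simplified_dt1)
length(simplified_dt1)

# Default SIMPLIFY = TRUE: both results have length 1, so they are
# collapsed into an atomic vector of length 2, which fits 2 rows.
simplified_df2 <- mapply(FUN = intersect, ego_2, alter_2)
is.list(simplified_df2)

# SIMPLIFY = FALSE: always one list element per row, whatever the lengths.
kept_as_list <- mapply(FUN = intersect, ego_1, alter_1, SIMPLIFY = FALSE)
is.list(kept_as_list)
length(kept_as_list)
```

Note that in the DF_2 case the default call produces an ordinary integer column rather than a list column, so using SIMPLIFY = FALSE (or Map()) also keeps the column type consistent across files when running this in a loop.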