rlistdata.tableaggregaterbindlist

rbindlist a list column of data.frames and select unique values


I have a data.table 'DT' with a column ('col2') that is a list of data frames:

require(data.table)
DT <- data.table(col1 = c('A','A','B'),
                 col2 = list(data.frame(colA = c(1,3,54, 23), 
                                        colB = c("aa", "bb", "cc", "hh")),
                             data.frame(colA =c(23, 1),
                                       colB = c("hh", "aa")), 
                             data.frame(colA = 1,
                                       colB = "aa")))

> DT
   col1         col2
1:    A <data.frame>
2:    A <data.frame>
3:    B <data.frame>

>> DT$col2
[[1]]
  colA colB
1    1   aa
2    3   bb
3   54   cc
4   23   hh

[[2]]
  colA colB
1   23   hh
2    1   aa

[[3]]
  colA colB
1    1   aa

Each data.frame in col2 has two columns colA and colB. I'd like to have a data.table output that binds each unique row of those data.frames based on col1 of DT. I guess it's like using rbindlist in an aggregate function of the data.table.

This is the desired output:

> #desired output
> output
   colA colB col1
1:    1   aa    A
2:    3   bb    A
3:   54   cc    A
4:   23   hh    A
5:    1   aa    B

The dataframe of the second row of DT (DT[2, col2]) has duplicate entries, and only unique entries are desired for each unique col1.

I tried the following and I get an error.

desired_output <- DT[, lapply(col2, function(x) unique(rbindlist(x))), by = col1]
# Error in rbindlist(x) : 
#   Item 1 of list input is not a data.frame, data.table or list

This 'works', though not desired output:

unique(rbindlist(DT$col2))
   colA colB
1:    1   aa
2:    3   bb
3:   54   cc
4:   23   hh

Is there anyway to use rbindlist in an aggregate function of a data.table?


Solution

  • Group by 'col1', run rbindlist on 'col2':

    unique(DT[ , rbindlist(col2), by = col1]) # trimmed thanks to @snoram
    #    col1 colA colB
    # 1:    A    1   aa
    # 2:    A    3   bb
    # 3:    A   54   cc
    # 4:    A   23   hh
    # 5:    B    1   aa