I have a data.table 'DT' with a column ('col2') that is a list of data frames:
require(data.table)
DT <- data.table(col1 = c('A','A','B'),
col2 = list(data.frame(colA = c(1,3,54, 23),
colB = c("aa", "bb", "cc", "hh")),
data.frame(colA =c(23, 1),
colB = c("hh", "aa")),
data.frame(colA = 1,
colB = "aa")))
> DT
col1 col2
1: A <data.frame>
2: A <data.frame>
3: B <data.frame>
>> DT$col2
[[1]]
colA colB
1 1 aa
2 3 bb
3 54 cc
4 23 hh
[[2]]
colA colB
1 23 hh
2 1 aa
[[3]]
colA colB
1 1 aa
Each data.frame in col2 has two columns colA and colB.
I'd like to have a data.table output that binds each unique row of those data.frames based on col1 of DT.
I guess it's like using rbindlist in an aggregate function of the data.table.
This is the desired output:
> #desired output
> output
colA colB col1
1: 1 aa A
2: 3 bb A
3: 54 cc A
4: 23 hh A
5: 1 aa B
The dataframe of the second row of DT (DT[2, col2]) has duplicate entries, and only unique entries are desired for each unique col1.
I tried the following and I get an error.
desired_output <- DT[, lapply(col2, function(x) unique(rbindlist(x))), by = col1]
# Error in rbindlist(x) :
# Item 1 of list input is not a data.frame, data.table or list
This 'works', though not desired output:
unique(rbindlist(DT$col2))
colA colB
1: 1 aa
2: 3 bb
3: 54 cc
4: 23 hh
Is there anyway to use rbindlist in an aggregate function of a data.table?
Group by 'col1', run rbindlist on 'col2':
unique(DT[ , rbindlist(col2), by = col1]) # trimmed thanks to @snoram
# col1 colA colB
# 1: A 1 aa
# 2: A 3 bb
# 3: A 54 cc
# 4: A 23 hh
# 5: B 1 aa