rdata.table

How do I create a nested vector from one data.table column data grouping by another column?


Given the following data.table:

Col_1 <- c("A", "A", "A", "B", "B", "B")
Col_2 <- c("AA", NA, "AA", NA, "BB", "BBB")
dt <- data.table(Col_1, Col_2)

I need to create a new column with a nested vector with the existing elements in Col_2 for each instance of Col_1, providing a desired output such as:

Col_1    c3
<char> <int>
1:      A     c("AA", NA, "AA")
2:      B     c(NA, "BB", "BBB")

I tried the code below...

dt[, .(c3= sapply(Col_2,c))
   , .(Col_1)]

... which outputs:

    Col_1     c3
   <char> <list>
1:      A     AA
2:      A     NA
3:      A     AA
4:      B     NA
5:      B     BB
6:      B    BBB

How do I get the desired result?


Solution

  • You should use nested .() to define it

    > dt[, .(c3 = .(Col_2)), by = Col_1]
        Col_1        c3
       <char>    <list>
    1:      A  AA,NA,AA
    2:      B NA,BB,BBB
    

    or you can use aggregate from base R like below (be aware of na.action = argument)

    > aggregate(cbind(c3 = Col_2) ~ Col_1, dt, list, na.action = na.pass)
      Col_1          c3
    1     A  AA, NA, AA
    2     B NA, BB, BBB