r data.table

Cannot update column value in R data.table


I have a data.table (let's call it mytable) that looks like this:

A   B    C
v1  v2    p1, p2, p3, p1, p2, p2
v3  v4    p4, p5, p1, p2, p1
....

I want to convert it to the table below, keeping only the unique values in column C within each row.

A   B    C
v1  v2    p1, p2, p3
v3  v4    p4, p5, p1, p2
...

When I call unique() on the column, every row ends up with p1, p2, p3, p4, p5.

I'm using the following code to get the data.table columns.

mytable[, .(A, B, paste(unique(unlist(strsplit(C, split=","))), collapse=", ") )] 

Solution

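  • You need to apply strsplit, unique, and paste row by row. In your code, unlist() pools the pieces from every row into one vector, so unique() runs over the whole column and each row gets the same string. Splitting on a comma plus optional whitespace (",\\s*") also keeps "p1" and " p1" from being counted as different values.

    library(data.table)
    
    # strsplit() returns one vector per row; sapply() de-duplicates each one separately
    mytable[, C := sapply(strsplit(C, ",\\s*"), function(x) paste(unique(x), collapse=", "))]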
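To see the difference between pooling and per-row de-duplication, here is a minimal reproducible sketch. The sample data is made up to mirror the table in the question, and the names parts, pooled, and dedup are only illustrative. It assumes the values are separated by a comma followed by optional whitespace, and it swaps in vapply() for sapply() so the result is guaranteed to be one string per row.

```r
library(data.table)

# Hypothetical sample data mirroring the table in the question
mytable <- data.table(
  A = c("v1", "v3"),
  B = c("v2", "v4"),
  C = c("p1, p2, p3, p1, p2, p2", "p4, p5, p1, p2, p1")
)

# strsplit() returns a list with one character vector per row
parts <- strsplit(mytable$C, ",\\s*")

# unlist() merges those vectors, so unique() sees the whole column at once
pooled <- paste(unique(unlist(parts)), collapse = ", ")

# vapply() keeps the rows separate and returns exactly one string per row
dedup <- function(x) paste(unique(x), collapse = ", ")
mytable[, C := vapply(parts, dedup, character(1))]

print(pooled)   # the single pooled string that ended up in every row
print(mytable)  # C now holds the unique values of each row only
```

Here pooled reproduces the unwanted result from the question, while the updated C column holds the per-row values.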