I have a data.table (let's call it mytable) that looks like this:
```
A   B   C
v1  v2  p1, p2, p3, p1, p2, p2
v3  v4  p4, p5, p1, p2, p1
...
```
I want to convert it to the following, keeping only the unique values in column C within each row:
```
A   B   C
v1  v2  p1, p2, p3
v3  v4  p4, p5, p1, p2
...
```
When I apply unique to the column, I get p1, p2, p3, p4, p5 in every row.
This is the code I'm using to build the output columns:

```r
mytable[, .(A, B, paste(unique(unlist(strsplit(C, split=","))), collapse=", ") )]
```
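A small sketch of why that code returns the same string in every row, using plain R with a hypothetical vector C standing in for the column: unlist() pools the pieces from all rows before unique() is applied, so only one combined string is produced, and data.table then recycles it across every row. It also shows that splitting on "," alone leaves a leading space on every piece after the first.

```r
# Hypothetical values mirroring column C from the question
C <- c("p1, p2, p3, p1, p2, p2", "p4, p5, p1, p2, p1")

# strsplit() returns a list with one character vector per row
parts <- strsplit(C, ",\\s*")

# unlist() flattens all rows into one vector, so unique() runs over the
# whole column and paste() yields a single string for the entire table
pooled <- paste(unique(unlist(parts)), collapse = ", ")
print(pooled)  # "p1, p2, p3, p4, p5"

# Splitting on "," alone keeps the space after each comma,
# so "p1" and " p1" would be treated as different values
naive <- strsplit("p1, p2, p1", ",")[[1]]
print(naive)   # "p1" " p2" " p1"
```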
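You need to apply strsplit, unique, and paste to each row separately rather than to the whole column. strsplit already returns a list with one element per row, so sapply can deduplicate and re-join each element on its own: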
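```r
library(data.table)

# strsplit() gives one character vector per row; splitting on ",\\s*" also
# drops the space after each comma, so "p1" and " p1" count as the same value
mytable[, C := sapply(strsplit(C, ",\\s*"), function(x) paste(unique(x), collapse = ", "))]
```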
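A self-contained version of the per-row approach, assuming C is a character column holding the two sample rows from the question. It swaps sapply for vapply, which is a design choice rather than a requirement: vapply(..., character(1)) stops with an error if any row fails to produce exactly one string. unique() keeps the first occurrence of each value, so the original order within each row is preserved.

```r
library(data.table)

# Sample data reconstructed from the question (C assumed to be character)
mytable <- data.table(
  A = c("v1", "v3"),
  B = c("v2", "v4"),
  C = c("p1, p2, p3, p1, p2, p2", "p4, p5, p1, p2, p1")
)

# Deduplicate each row's pieces independently, then re-join them;
# vapply() guarantees exactly one string per row
mytable[, C := vapply(strsplit(C, ",\\s*"),
                      function(x) paste(unique(x), collapse = ", "),
                      character(1))]

print(mytable)
```

After this runs, C holds "p1, p2, p3" for the first row and "p4, p5, p1, p2" for the second, matching the desired output.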