
R: search through each row and delete duplicate values (which may differ from row to row)

I have a dataset (df) like the following:

      val1                                val2               val3
1    ST  1.2  6.59 0.72 0       ST 1.2  6.59 0.72 0     PEACH 1.05  6.62 0.49 0
2   PEACH 1.05  6.62 0.49 0     ST 1.2  6.59 0.72 0     PEACH 1.05  6.62 0.49 0
3    ST  1.2  6.59 0.72 0       ST 1.2  6.59 0.72 0     PEACH 1.05  6.62 0.49 0

         val4                              val5
1 VANI 1.06 16.57 1.019 0    BB 1.0  6.75 0.45 0
2 VANI 1.06 16.57 1.019 0    BB 1.0  6.75 0.45 0
3 VANI 1.06 16.57 1.019 0    BB 1.0  6.75 0.45 0

Each row contains five character strings, two of which are duplicates (the duplicated columns may differ from row to row). I want to remove the duplicated value from each row.

I have tried unique(df[1,]) and duplicated(df[1,]), but both report that there are no duplicated values.

I checked df[1,1] == df[1,2] and it returns TRUE, so I don't understand why unique() and duplicated() do not work here.

Solution

df <- data.frame(x=c(1,2,1,1), y=c(1,4:5,1), z=c(1,7:8,1), w=c(1,2,1,1), t=c(3,4,5,3))
df
#   x y z w t
# 1 1 1 1 1 3
# 2 2 4 7 2 4
# 3 1 5 8 1 5
# 4 1 1 1 1 3

Notice that rows 1 and 4 are identical (1 1 1 1 3), and that columns 1 and 4 (x and w) are identical (1 2 1 1).

duplicated() can find both kinds of duplicate:

duplicated(df)
[1] FALSE FALSE FALSE  TRUE

duplicated() compares whole rows and returns one logical value per row; only the last row is flagged, because it repeats row 1.
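The same row-wise behaviour explains the symptom in the question. df[1, ] is still a data frame, just with a single row, and a single row can never repeat an earlier row, so duplicated() returns FALSE and unique() returns the row unchanged. A minimal sketch, reusing the toy df above: flatten the row into a plain vector with unlist() so that the individual values are compared instead.

```r
# Toy data frame from the answer above
df <- data.frame(x = c(1, 2, 1, 1), y = c(1, 4:5, 1), z = c(1, 7:8, 1),
                 w = c(1, 2, 1, 1), t = c(3, 4, 5, 3))

row1 <- df[1, ]        # still a data.frame, with one row
duplicated(row1)       # compares whole rows, so nothing can be flagged
# [1] FALSE

vals <- unlist(row1)   # named vector: x = 1, y = 1, z = 1, w = 1, t = 3
duplicated(vals)       # now compares the individual values
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```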

For a column search, which is what you are attempting, duplicated() does not seem to work at first:

duplicated(df, MARGIN=2)
[1] FALSE FALSE FALSE  TRUE

That is not what we wanted: it performed the same row-by-row search. Although MARGIN=2 asks for columns, the input is still a data.frame, and the data.frame method of duplicated() silently ignores MARGIN. Passing a matrix instead works:

duplicated(as.matrix(df), MARGIN=2)
[1] FALSE FALSE FALSE  TRUE FALSE

Now the search is column by column, and column 4 (w) is flagged as a duplicate of column 1 (x). You can also call the matrix method explicitly on the data frame:

duplicated.matrix(df, MARGIN=2)
[1] FALSE FALSE FALSE  TRUE FALSE
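Applied to the problem in the question, the same idea removes the repeated value within each row: turn each row into a vector and keep only the entries that duplicated() does not flag. This is a minimal sketch, assuming all columns hold values of a comparable type (for the question's data, character strings); drop_row_duplicates is a hypothetical helper name, not a base R function. Because rows can lose different numbers of entries, the result is a list with one named vector per row rather than a data frame.

```r
# Hypothetical helper: for each row, keep only the first occurrence of each value
drop_row_duplicates <- function(d) {
  lapply(seq_len(nrow(d)), function(i) {
    vals <- unlist(d[i, ])     # one row as a vector named by column
    vals[!duplicated(vals)]    # drop later repeats, keep the column names
  })
}

df <- data.frame(x = c(1, 2, 1, 1), y = c(1, 4:5, 1), z = c(1, 7:8, 1),
                 w = c(1, 2, 1, 1), t = c(3, 4, 5, 3))
res <- drop_row_duplicates(df)
res[[2]]   # row 2 was 2 4 7 2 4; the repeated 2 (w) and 4 (t) are dropped
```

If every row has exactly one duplicate, as the question states, all vectors have the same length and could be stacked with do.call(rbind, lapply(res, unname)); the names are dropped there because the removed column can differ from row to row.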