I have a dataset (df) like the following:
                    val1               val2                   val3
1     ST 1.2 6.59 0.72 0 ST 1.2 6.59 0.72 0 PEACH 1.05 6.62 0.49 0
2 PEACH 1.05 6.62 0.49 0 ST 1.2 6.59 0.72 0 PEACH 1.05 6.62 0.49 0
3     ST 1.2 6.59 0.72 0 ST 1.2 6.59 0.72 0 PEACH 1.05 6.62 0.49 0
                     val4               val5
1 VANI 1.06 16.57 1.019 0 BB 1.0 6.75 0.45 0
2 VANI 1.06 16.57 1.019 0 BB 1.0 6.75 0.45 0
3 VANI 1.06 16.57 1.019 0 BB 1.0 6.75 0.45 0
Each row contains five character strings, two of which are duplicates (which columns hold the duplicate can differ from row to row). I want to remove the duplicated value from each row.
I have tried unique(df[1,]) and duplicated(df[1,]), but both report that there are no duplicated values.
I checked with df[1,1] == df[1,2], which returns TRUE, so I don't understand why unique and duplicated do not work here.
df <- data.frame(x=c(1,2,1,1), y=c(1,4:5,1), z=c(1,7:8,1), w=c(1,2,1,1), t=c(3,4,5,3))
df
# x y z w t
# 1 1 1 1 1 3
# 2 2 4 7 2 4
# 3 1 5 8 1 5
# 4 1 1 1 1 3
Notice that rows 1 and 4 are the same (1 1 1 1 3), and that columns 1 and 4 (x and w) are also the same (1 2 1 1).
duplicated can locate both types:
duplicated(df)
[1] FALSE FALSE FALSE TRUE
The function compared the rows one by one and returned a logical vector, flagging the last row as a duplicate.
For the column search, which is what you are trying, it doesn't seem to work at first:
duplicated(df, MARGIN=2)
[1] FALSE FALSE FALSE TRUE
That was not expected. It did the exact same thing, a row-by-row search. I asked for columns, but I still passed a data.frame, so duplicated dispatched to its data.frame method, which has no MARGIN argument and silently ignored it. If I pass a matrix instead, it works:
duplicated(as.matrix(df), MARGIN=2)
[1] FALSE FALSE FALSE TRUE FALSE
That works: a column-by-column search. I can also call the matrix method of the function directly:
duplicated.matrix(df, MARGIN=2)
[1] FALSE FALSE FALSE TRUE FALSE
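The same reasoning explains the single-row case in the question: df[1,] is still a data.frame with one row, so the row-wise check has nothing to compare it against. Below is a minimal sketch of one way to drop the repeated value within each row. The data frame df2 and the name dedup are made up for illustration (the strings are shortened stand-ins for the real values), and the approach assumes every row has exactly one repeated value, so that each row shrinks to the same length.

```r
# Hypothetical stand-in for the question's data: five character columns,
# each row repeats exactly one value, in a position that varies by row.
df2 <- data.frame(
  val1 = c("ST", "PEACH", "ST"),
  val2 = c("ST", "ST", "ST"),
  val3 = c("PEACH", "PEACH", "PEACH"),
  val4 = c("VANI", "VANI", "VANI"),
  val5 = c("BB", "BB", "BB"),
  stringsAsFactors = FALSE
)

# A one-row data.frame is checked row-wise, so nothing is flagged.
one_row_check <- duplicated(df2[1, ])

# Flattening the row to a plain vector compares the values themselves.
within_row_check <- duplicated(unlist(df2[1, ]))

# apply() converts df2 to a character matrix and hands each row to the
# function as a vector; keep only the first occurrence of each value.
# unname() avoids carrying over column names that no longer line up.
dedup <- t(apply(df2, 1, function(r) unname(r[!duplicated(r)])))
dedup
```

If the number of repeats can differ between rows, apply() would return a list rather than a matrix, so the t() step would need to be replaced with something that handles rows of unequal length.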