
Find duplicated rows (based on 2 columns) in Data Frame in R


I have a data frame in R which looks like:

| RIC    | Date                | Open   |
|--------|---------------------|--------|
| S1A.PA | 2011-06-30 20:00:00 | 23.7   |
| ABC.PA | 2011-07-03 20:00:00 | 24.31  |
| EFG.PA | 2011-07-04 20:00:00 | 24.495 |
| S1A.PA | 2011-07-05 20:00:00 | 24.23  |

I want to know whether there are any duplicates with respect to the combination of RIC and Date. Is there a function for that in R?


Solution

  • You can pass those first two columns to the function duplicated:

    duplicated(dat[,1:2])
    

    assuming your data frame is called dat. For more information, consult the help page for the duplicated function by typing ?duplicated at the console. It describes the function as follows:

    Determines which elements of a vector or data frame are duplicates of elements with smaller subscripts, and returns a logical vector indicating which elements (rows) are duplicates.

    So duplicated returns a logical vector, which we can then use to extract a subset of dat:

    ind <- duplicated(dat[,1:2])
    dat[ind,]
    

    or you can skip the separate assignment step and simply use:

    dat[duplicated(dat[,1:2]),]
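
As a rough, self-contained sketch of the approach above: the sample data below is hypothetical (the question's four rows plus one made-up row that repeats a RIC/Date pair), and the variable names key, has_dups, later and all_dups are only for illustration. It also selects the columns by name instead of by position, which may be a little more robust if the column order changes. Note that duplicated only flags the later occurrences, so the sketch also shows one possible way to get every row involved in a duplicate by combining it with fromLast = TRUE.

```r
# Hypothetical sample data: the question's rows plus one invented row (row 5)
# that repeats the RIC/Date combination of row 1.
dat <- data.frame(
  RIC  = c("S1A.PA", "ABC.PA", "EFG.PA", "S1A.PA", "S1A.PA"),
  Date = c("2011-06-30 20:00:00", "2011-07-03 20:00:00",
           "2011-07-04 20:00:00", "2011-07-05 20:00:00",
           "2011-06-30 20:00:00"),
  Open = c(23.7, 24.31, 24.495, 24.23, 23.9),
  stringsAsFactors = FALSE
)

# Select the key columns by name rather than by position.
key <- dat[, c("RIC", "Date")]

# anyDuplicated() returns the index of the first duplicated row, or 0 if
# there is none, so this answers "are there any duplicates at all?".
has_dups <- anyDuplicated(key) > 0

# Rows that repeat an earlier RIC/Date combination (first occurrences
# are not flagged).
later <- dat[duplicated(key), ]

# Every row that belongs to a duplicated RIC/Date combination, including
# the first occurrence: combine a forward and a backward scan.
all_dups <- dat[duplicated(key) | duplicated(key, fromLast = TRUE), ]

print(has_dups)
print(later)
print(all_dups)
```

Which variant is appropriate depends on the goal: later is what you would drop to de-duplicate the data, while all_dups is more useful for inspecting the conflicting rows side by side.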