sasproc-sqlenterprise-guide4gl

How to select duplicates rows with duplicates in each column in SAS Enterprise Guide?


I have time in SAS Enterprise Guide like below:

COL1 COL2 COL3
10 12 44
15 8 2
10 12 44

And I need to select duplicated rows --> select duplicate rows, but only those where duplicates occur in each column

COL1 COL2 COL3
10 12 44
10 12 44

How can I do that in SAS Enterprise Guide / PROC SQL ?


Solution

  • If you want to eliminate the observations that occur only once then you could do something like:

     proc sort data=have out=want ;
        by col1 col2 col3 ;
     run;
     data want;
       set want;
       by col1 col2 col3 ;
       if not (first.col3 and last.col3) ;
     run;
    

    The unique rows are the only rows in their by grouping, so they are both the first row and the last row.

    Here is a trick to use if there are a lot of variables and you are not sure which one will end up last if you use the _all_ variable list. Just add any one to the end and use that.

     proc sort data=have out=want ;
        by _all_;
     run;
     data want;
       set want;
       by _all_ col2;
       if not (first.col2 and last.col2) ;
     run;