sas4gl

Remove duplicates inside a sequence of records in a group with SAS


Is it possible to remove duplicated records in sequence inside a specific group and output only last of them (based od date) with 4GL (SAS)? I have data like:

data example;
input obs id dt value WANT_TO_SELECT;
cards;
1 10 1 500 0
2 10 2 750 1
3 10 3 750 1
4 10 4 750 0
5 10 5 500 0
6 20 1 150 1
7 20 2 150 0
8 20 3 370 0
9 20 4 150 0
;
run;

As You see for id=10 I would like to have only one (last) record with value 750, because there is one after the other while value 500 can be twice because they are separated. I was trying use last/first but I am not sure how to sort the data.


Solution

  • Looks like a use case for the NOTSORTED keyword of the BY statement. This will let you use VALUE as a BY variable even though the data is not actually sorted by VALUE. That way the LAST.VALUE flag can be used.

    data want;
      set example;
      by id value notsorted;
      if last.value;
    run;
    

    Results:

                                       WANT_TO_
    Obs    obs    id    dt    value     SELECT
    
     1      1     10     1     500         0
     2      4     10     4     750         0
     3      5     10     5     500         0
     4      7     20     2     150         0
     5      8     20     3     370         0
     6      9     20     4     150         0