sasdelete-rowretainproc-sql

Removing rows between two values in SAS


For the following data I am trying to filter rows, of each group ID, based on these conditions:

Can we create the flag variable as shown in my want dataset? so that I can filter on Flag='Y'?

Have

  ID     Type     Date         Value
  001     F       1/2/2018      Y
  001     B       1/3/2018
  001     B       1/4/2018      Y
  001     B       1/5/2018
  001     B       1/6/2018
  001     F       1/6/2018      Y
  001     B       1/6/2018      
  001     B       1/7/2018
  001     B       1/8/2018      Y
  001     B       1/8/2018
  001     B       1/9/2018
  002     F       1/2/2018      Y
  002     B       1/3/2018
  002     B       1/4/2018

Want

  ID     Type     Date         Value   Flag
  001     F       1/2/2018      Y       Y
  001     B       1/3/2018              Y
  001     B       1/4/2018      Y       Y 
  001     B       1/5/2018
  001     B       1/6/2018
  001     F       1/6/2018      Y       Y
  001     B       1/6/2018              Y
  001     B       1/7/2018              Y
  001     B       1/8/2018      Y       Y 
  001     B       1/8/2018
  001     B       1/9/2018
  002     F       1/2/2018      Y       Y
  002     B       1/3/2018              Y
  002     B       1/4/2018              Y

I tried to do the following

data F;
set have;
where Type='F';run;

data B;
 set have;
 where Type='B';run;

 proc sql;
  create table all as select
  a.* from B as b
  inner join F as f
  on a.id=b.id
  and b.date >= a.date;
quit;

This includes all the rows from my have dataset. Any help is much appreciated.


Solution

  • The criteria for computing the state of a row as part of a contiguous sub-group (call it a 'run' of rows) within group ID are relatively simple, but a compromised state might occur or be indicated if some funny cases of data occur:

    data want(drop=run_:);
      SET have;
      BY id;
    
      run_first = (type='F' and value='Y');
      run_final = (type='B' and value='Y');
    
      * set flag state at criteria for start of contiguous sub-group criteria;
      run_flag + run_first;
    
      if first.id and NOT run_flag then
        put 'WARNING: first row in group ' id= ' is not F Y, this may be incorrect';
    
      if run_flag > 1 and run_first then 
        put 'WARNING: an additional F Y before a B Y at row ' _n_;
    
      if run_flag then
        OUTPUT;
    
      if run_flag = 0 and run_final then 
        put 'WARNING: an additional B Y before a F Y at row ' _n_;
    
      * reset flag at criteria for contiguous sub-group;
      if last.id or run_final then 
        run_flag = 0;
    run;