datesasproc-sqlenterprise-guide4gl

How to check max difference in dates between events and create new column in SAS Enterprise Guide?


I have table in SAS Enterprise Guide like below.

Data types and meaning:

Values in original dataset are not sorted, but it can be sorted if it is important for solution.

ID   | DT        | OFFER_1  | OFFER_2
-----|-----------|----------|----------
123  | 01MAY2020 | PR       | PR
123  | 05MAY2020 | PR       | P
123  | 10MAY2020 | P        | P
123  | 11MAY2020 | P        | P
123  | 20MAY2020 | P        | PR
123  | 21MAY2020 | PR       | M
123  | 25MAY2020 | M        | M
777  | 30MAY2020 | PR       | M
223  | 02JAN2020 | PR       | PR
223  | 15MAR2020 | PR       | PR
402  | 20MAR2020 | M        | M
33   | 11AUG2020 | M        | PR
11   | 20JAN2020 | PR       | M
11   | 05FEB2020 | M        | M

And I need to create new column "COL1" where will be information:

So, as a result I need something like below:

ID   | DT        | OFFER_1  | OFFER_2  | COL1
-----|-----------|----------|----------|---------
123  | 01MAY2020 | PR       | PR       | 15
123  | 05MAY2020 | PR       | P        | 15
123  | 10MAY2020 | P        | P        | 15
123  | 11MAY2020 | P        | P        | 15
123  | 20MAY2020 | P        | PR       | 15
123  | 21MAY2020 | PR       | M        | 15
123  | 25MAY2020 | M        | M        | 15
777  | 30MAY2020 | PR       | M        | 1
223  | 02JAN2020 | PR       | PR       | 0
223  | 15MAR2020 | PR       | PR       | 0
402  | 20MAR2020 | M        | M        | 0
33   | 11AUG2020 | M        | PR       | 0
11   | 20JAN2020 | PR       | M        | 16
11   | 05FEB2020 | M        | M        | 16

Because:

@Stu Sztukowski - Below exaplmes from my real data where your code does not work. I add also how it should be :)

Examples where code works wrong:

ID DT OFFER_1 OFFER_2 COL1
1020 01SEP2020 P P 1 -> number of days between 01SEP2020 and 02SEP2020 but there was not changes from PR to P or M, so this client does not interes for us, it should be 0
1020 02SEP2020 P P 1
2030 29JUL2022 P P 50 -> number of days between 20JUL2022 and 17SEP2022 but there was not changes from PR to P or M, so this client does not interes for us, it should be 0
2030 15JUL2022 P P 50
2030 17SEP2022 P P 50

How it should be: Because only change from PR to P or M are interested for us, example at the bottom of this post presents it. Below examples are similar like my examples ID = 223 or 402 from the bottom of this post :)

ID DT OFFER_1 OFFER_2 COL1
1020 01SEP2020 P P 0
1020 02SEP2020 P P 0
2030 29JUL2022 P P 0
2030 15JUL2022 P P 0
2030 17SEP2022 P P 0

How can I do that in SAS Enterprise Guide in PROC SQL or in normal SAS code ?


Solution

  • If offer_1 is P or M, then we just need to take the difference between the current date and the previous date. We'll use some rules to calculate the differences based on your rules. From there we can join the max values back with the original data.

    proc sort data=have;
        by id dt;
    run;
    
    data date_difs;
        set have;
        by id dt;
        retain flag_pr_pm start_dt;
    
        if(first.id) then call missing(flag_pr_pm, start_dt);
    
        if(offer_1 = 'PR' AND offer_2 IN('P', 'M') ) then do;
            flag_pr_pm = 1;
            start_dt   = dt;
        end;
    
        if(   (offer_1 IN('P', 'M') AND offer_2 = 'PR')
           OR (flag_pr_pm AND last.id AND NOT first.id)
          ) 
        then do;
            flag_pr_pm  = 0;
            total_days = dt - start_dt;
        end;
    
        if(    first.id 
           AND last.id 
           AND offer_1 = 'PR' 
           AND offer_2 IN ('P', 'M')
          ) 
        then total_days = 1;
    
        format start_dt date9.;
    run;
    

    Which gets you:

    id    dt         offer_1    offer_2   flag_pr_pm    start_dt    total_days
    11    20JAN2020  PR         M         1             20JAN2020   .
    11    05FEB2020  M          M         0             20JAN2020   16
    33    11AUG2020  M          PR        0             .           .
    123   01MAY2020  PR         PR        .             .           .
    123   05MAY2020  PR         P         1             05MAY2020   .
    123   10MAY2020  P          P         1             05MAY2020   .
    123   11MAY2020  P          P         1             05MAY2020   .
    123   20MAY2020  P          PR        0             05MAY2020   15
    123   21MAY2020  PR         M         1             21MAY2020   .
    123   25MAY2020  M          M         0             21MAY2020   4
    223   02JAN2020  PR         PR        .             .           .
    223   15MAR2020  PR         PR        .             .           .
    402   20MAR2020  M          M         .             .           .
    777   30MAY2020  PR         M         1             30MAY2020   1
    1020  01SEP2020  P          P         .             .           .
    1020  02SEP2020  P          P         .             .           .
    2030  15JUL2022  P          P         .             .           .
    2030  29JUL2022  P          P         .             .           .
    2030  17SEP2022  P          P         .             .           .
    

    Now join the max value by ID back to the original data:

    proc sql;
        create table want as
            select *, col1
            from have as t1
            LEFT JOIN
                 (select id, max(total_days) as col1
                  from date_difs
                  group by id
                 ) as t2
            ON t1.id = t2.id
        ;
    quit;
    

    This gives you the exact solution within your sample data.