sasdb2proc-sql

Merge/Group data in SAS


I have the following dataset

Id com typ cust bu tar item item_sufx part line dtn_cd geo_con sign
10 ARF 3 2585 12 100 4587 800 1 1 1 ZL-UK-23 +
10 ARF 3 2585 12 100 4587 800 1 2 1 ZL-US-22 -
10 ARF 3 2585 12 100 4587 800 1 3 1 ZL-JP-22 +
10 ARF 3 2585 12 100 4587 800 1 4 2 ZL-US-44 +
10 ARF 3 2585 12 100 4587 800 1 5 2 ZL-US-66 +
10 ARF 3 2585 12 100 4587 800 2 1 1 AP-TN-44 +
10 ARF 3 2585 12 100 4587 800 2 2 2 AP-TN-22 -
10 ARF 3 2585 12 100 4587 800 3 1 3 PQ-AR-44 +
10 ARF 3 2585 12 100 4587 800 3 2 4 PQ-AR-22 -
50 PFP 3 3000 12 100 9999 899 1 1 1 DE-NN-44 +
50 PFP 3 3000 12 100 9999 899 1 2 2 DE-NN-22 -
50 PFP 3 3000 12 100 9999 899 1 1 3 HK-DC-44 -
50 PFP 3 3000 12 100 9999 899 1 2 4 HK-DC-22 +

I want the following output

enter image description here

I tried the following code but it didn't work properly

data table_A;
set table_A;
merge
table_A (where= (DCTN_CD = '1') rename=(GEO_CON=FROM_GEO))
table_A (where= (DCTN_CD = '2') rename=(GEO_CON=TO_GEO))
table_A (where= (DCTN_CD = '3') rename=(GEO_CON=BETWEEN_GEO))
table_A (where= (DCTN_CD = '4') rename=(GEO_CON=AMONGST_GEO))
run;

Any idea how do that? I am open to SAS or PROC SQL (DB2) suggestions.


Solution

  • You will need to compute two synthetic values:

    Then you can merge by the line_group and dtn_seq.

    Example:

    data have;
    input
    Id  com$ typ cust  bu  tar item  item_sufx part  line  dtn_cd$ geo_con$ sign$ group; format _numeric_ 4.;
    datalines;
    10  ARF 3 2585  12  100 4587  800 1 1 1 ZL-UK-23  +  1
    10  ARF 3 2585  12  100 4587  800 1 2 1 ZL-US-22  -  2
    10  ARF 3 2585  12  100 4587  800 1 3 1 ZL-JP-22  +  3
    10  ARF 3 2585  12  100 4587  800 1 4 2 ZL-US-44  +  1
    10  ARF 3 2585  12  100 4587  800 1 5 2 ZL-US-66  +  2
    10  ARF 3 2585  12  100 4587  800 2 1 1 AP-TN-44  +  4
    10  ARF 3 2585  12  100 4587  800 2 2 2 AP-TN-22  -  4
    10  ARF 3 2585  12  100 4587  800 3 1 3 PQ-AR-44  +  5
    10  ARF 3 2585  12  100 4587  800 3 2 4 PQ-AR-22  -  5
    50  PFP 3 3000  12  100 9999  899 1 1 1 DE-NN-44  +  6
    50  PFP 3 3000  12  100 9999  899 1 2 2 DE-NN-22  -  6
    50  PFP 3 3000  12  100 9999  899 1 1 3 HK-DC-44  -  7
    50  PFP 3 3000  12  100 9999  899 1 2 4 HK-DC-22  +  7
    ;
    
    data have_grouped;
      set have;
      by id com typ cust bu tar item item_sufx part dtn_cd;
      
      line_decreased = line < lag(line) ;
      
      if first.part or line_decreased then line_group+1;
      if first.dtn_cd then dtn_seq=1; else dtn_seq+1;
    run;
    
    data want;
      set have_grouped;
      merge
        have_grouped (where= (DTN_CD = '1') rename=(GEO_CON=FROM_GEO SIGN=SIGN_FROM))
        have_grouped (where= (DTN_CD = '2') rename=(GEO_CON=TO_GEO   SIGN=SIGN_TO))
        have_grouped (where= (DTN_CD = '3') rename=(GEO_CON=BETWEEN_GEO SIGN=SIGN_BTWN))
        have_grouped (where= (DTN_CD = '4') rename=(GEO_CON=AMONGST_GEO SIGN=SIGN_AMNG))
      ;
      by line_group dtn_seq;
    run;
    

    You could get an ERROR about data not in order (because where= is not required to deliver rows in any particular order). If that happens you will need to merge four sorted views that do deliver the data in the line_group dtn_seq order.