I have the following dataset
Id | com | typ | cust | bu | tar | item | item_sufx | part | line | dtn_cd | geo_con | sign |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 1 | 1 | 1 | ZL-UK-23 | + |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 1 | 2 | 1 | ZL-US-22 | - |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 1 | 3 | 1 | ZL-JP-22 | + |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 1 | 4 | 2 | ZL-US-44 | + |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 1 | 5 | 2 | ZL-US-66 | + |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 2 | 1 | 1 | AP-TN-44 | + |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 2 | 2 | 2 | AP-TN-22 | - |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 3 | 1 | 3 | PQ-AR-44 | + |
10 | ARF | 3 | 2585 | 12 | 100 | 4587 | 800 | 3 | 2 | 4 | PQ-AR-22 | - |
50 | PFP | 3 | 3000 | 12 | 100 | 9999 | 899 | 1 | 1 | 1 | DE-NN-44 | + |
50 | PFP | 3 | 3000 | 12 | 100 | 9999 | 899 | 1 | 2 | 2 | DE-NN-22 | - |
50 | PFP | 3 | 3000 | 12 | 100 | 9999 | 899 | 1 | 1 | 3 | HK-DC-44 | - |
50 | PFP | 3 | 3000 | 12 | 100 | 9999 | 899 | 1 | 2 | 4 | HK-DC-22 | + |
I want the following output
I tried the following code but it didn't work properly
data table_A;
set table_A;
merge
table_A (where= (DCTN_CD = '1') rename=(GEO_CON=FROM_GEO))
table_A (where= (DCTN_CD = '2') rename=(GEO_CON=TO_GEO))
table_A (where= (DCTN_CD = '3') rename=(GEO_CON=BETWEEN_GEO))
table_A (where= (DCTN_CD = '4') rename=(GEO_CON=AMONGST_GEO))
run;
Any idea how do that? I am open to SAS or PROC SQL (DB2) suggestions.
You will need to compute two synthetic values:
Then you can merge by the line_group and dtn_seq.
Example:
data have;
input
Id com$ typ cust bu tar item item_sufx part line dtn_cd$ geo_con$ sign$ group; format _numeric_ 4.;
datalines;
10 ARF 3 2585 12 100 4587 800 1 1 1 ZL-UK-23 + 1
10 ARF 3 2585 12 100 4587 800 1 2 1 ZL-US-22 - 2
10 ARF 3 2585 12 100 4587 800 1 3 1 ZL-JP-22 + 3
10 ARF 3 2585 12 100 4587 800 1 4 2 ZL-US-44 + 1
10 ARF 3 2585 12 100 4587 800 1 5 2 ZL-US-66 + 2
10 ARF 3 2585 12 100 4587 800 2 1 1 AP-TN-44 + 4
10 ARF 3 2585 12 100 4587 800 2 2 2 AP-TN-22 - 4
10 ARF 3 2585 12 100 4587 800 3 1 3 PQ-AR-44 + 5
10 ARF 3 2585 12 100 4587 800 3 2 4 PQ-AR-22 - 5
50 PFP 3 3000 12 100 9999 899 1 1 1 DE-NN-44 + 6
50 PFP 3 3000 12 100 9999 899 1 2 2 DE-NN-22 - 6
50 PFP 3 3000 12 100 9999 899 1 1 3 HK-DC-44 - 7
50 PFP 3 3000 12 100 9999 899 1 2 4 HK-DC-22 + 7
;
data have_grouped;
set have;
by id com typ cust bu tar item item_sufx part dtn_cd;
line_decreased = line < lag(line) ;
if first.part or line_decreased then line_group+1;
if first.dtn_cd then dtn_seq=1; else dtn_seq+1;
run;
data want;
set have_grouped;
merge
have_grouped (where= (DTN_CD = '1') rename=(GEO_CON=FROM_GEO SIGN=SIGN_FROM))
have_grouped (where= (DTN_CD = '2') rename=(GEO_CON=TO_GEO SIGN=SIGN_TO))
have_grouped (where= (DTN_CD = '3') rename=(GEO_CON=BETWEEN_GEO SIGN=SIGN_BTWN))
have_grouped (where= (DTN_CD = '4') rename=(GEO_CON=AMONGST_GEO SIGN=SIGN_AMNG))
;
by line_group dtn_seq;
run;
You could get an ERROR about data not in order (because where= is not required to deliver rows in any particular order). If that happens you will need to merge four sorted views that do deliver the data in the line_group dtn_seq order.