suppose to have the following dataset:
data have;
input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2015 20JAN2015 1 2 4
0001 21FEB2015 31DEC2015 0 . 2
0001 21FEB2015 31DEC2015 0 3 .
0001 01JAN2019 31DEC2019 0 6 9
0002 01JAN2015 31DEC2015 0 2 .
0002 01JAN2019 31OCT2019 1 . 2
0002 01JAN2019 31OCT2019 0 . 2
;
Than, I have another dataset that looks like this and comes from other calculation:
data have1;
input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2015 20JAN2015 1 6 8
0002 01JAN2019 31OCT2019 1 4 2
;
Is there a way to get the following?
data have2;
input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2015 20JAN2015 1 6 8
0001 21FEB2015 31DEC2015 0 . .
0001 21FEB2015 31DEC2015 0 . .
0001 01JAN2019 31DEC2019 0 . .
0002 01JAN2015 31DEC2015 0 . .
0002 01JAN2019 31OCT2019 1 4 2
0002 01JAN2019 31OCT2019 0 . .
;
In other words I would like to replace values in have with values in have1 (where Index = 1). The remaining in have should be set to "." to have have3. The total number of variable: is 50. Here only 2 are shown for simplicity.
Thank you in advance
You can do this in two ways:
Let's take a look at both.
Simple Merge
We want to merge by the key, which is the combination of ID Admission Discharge Index
. First we'll sort both datasets and then merge them by this key. If the combination of keys match, we'll set Variable1
and Variable2
to missing.
proc sort data=have;
by id admission discharge index;
run;
proc sort data=have1;
by id admission discharge index;
run;
data want;
merge have(in=h)
have1(in=h1)
;
by id admission discharge index;
if(h NE h1) then call missing(of variable:);
run;
Hash Join
This is very similar to a simple merge but it does not require sorting and has the highest performance. We'll treat have1
as a lookup table and pull down values from it for each match. If we do not find a match, we'll set the values of Variable1
and Variable2
to missing.
data want;
set have;
if(_N_ = 1) then do;
dcl hash h1(dataset:'have1');
h1.defineKey('id', 'admission', 'discharge', 'index');
h1.defineData('variable1', 'variable2');
h1.defineDone();
end;
if(h1.Find() NE 0) then call missing(of variable:);
run;
ID Admission Discharge Index Variable1 Variable2
0001 13JAN2015 20JAN2015 1 6 8
0001 21FEB2015 31DEC2015 0 . .
0001 21FEB2015 31DEC2015 0 . .
0001 01JAN2019 31DEC2019 0 . .
0002 01JAN2015 31DEC2015 0 . .
0002 01JAN2019 31OCT2019 1 4 2
0002 01JAN2019 31OCT2019 0 . .