suppose to have the following:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Index1;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2017 25JAN2017 0
0001 22FEB2018 03MAR2018 0
0001 30JAN2019 04MAR2019 0
0001 30JAN2019 04MAR2019 1
0001 25DEC2020 02JAN2021 0
0002 15JAN2014 29JAN2014 1
0002 01FEB2015 10FEB2015 0
0002 01FEB2015 10FEB2015 0
0003 22MAR2019 23MAR2019 1
0003 22MAR2019 23MAR2019 0
0003 02JUL2019 28AUG2019 0
0004 01SEP2022 15SEP2022 1
0004 02DEC2023 05DEC2023 1
;run;
Is there a way to get the following? In other words I need to remove duplicated rows (identical admission-discharge date) and retain the inly one where Index1 = 1. I tried with proc sort to order by the date and the Index1 (descending) without success. Thank you in advance!
data DB1;
input ID :$20. Admission :date09. Discharge :date09. Index1;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2017 25JAN2017 0
0001 22FEB2018 03MAR2018 0
0001 30JAN2019 04MAR2019 1
0001 25DEC2020 02JAN2021 0
0002 15JAN2014 29JAN2014 1
0002 01FEB2015 10FEB2015 0
0003 22MAR2019 23MAR2019 1
0003 02JUL2019 28AUG2019 0
0004 01SEP2022 15SEP2022 1
0004 02DEC2023 05DEC2023 1
;run;
You can sort it by id admission discharge index1
which will ensure that if there are any duplicate values of the combination of id admission discharge
, index1
is last. From there, you can just use a DATA Step with by-group processing and only output the last value of the group of id admission discharge
.
Note that in the code below, if(last.discharge)
automatically takes into account the groups before it. In other words, last.discharge
means it's the last value within the combination of id admission discharge
.
proc sort data=db;
by id admission discharge index1;
run;
data want;
set db;
by id admission discharge;
if(last.discharge);
run;
Or, you can do it equivalently with a SQL having
clause:
proc sql;
create table want as
select *
from db
group by id, admission, discharge
having index1 = max(index1)
;
quit;
ID Admission Discharge Index1
0001 13JAN2017 25JAN2017 0
0001 22FEB2018 03MAR2018 0
0001 30JAN2019 04MAR2019 1
0001 25DEC2020 02JAN2021 0
0002 15JAN2014 29JAN2014 1
0002 01FEB2015 10FEB2015 0
0003 22MAR2019 23MAR2019 1
0003 02JUL2019 28AUG2019 0
0004 01SEP2022 15SEP2022 1
0004 02DEC2023 05DEC2023 1