I have a dataset at id level with some overlapping dates. All I need to find those rows and add an identifier to count the number overlapping records.
Data:
ID ITEM StrDate EndDate
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
Below is the Result that I am expecting
Want:
ID ITEM StrDate EndDate Indicator
1001 A121 02/01/2022 02/15/2022 N
1001 B121 03/10/2022 03/10/2022 N
1002 C121 02/01/2022 02/10/2022 Y
1002 D121 02/05/2022 02/15/2022 Y
1003 E121 03/10/2022 03/21/2022 Y
1003 F121 03/12/2022 03/21/2022 Y
1004 G121 01/12/2022 01/14/2022 N
I tried sorting the data first on StrDate
and EndDate
Proc sort data=Data; by ID StrDate EndDate;run;
Then I tried using lag
function to find the same id and subtract the dates but I figured that's not the correct way of doing.
I appreciate your help here. thanks
Simple overlap logic:
proc sql;
create table want as
select
a.*,
/* simple overlap logic */
case
when a.strdate <= b.strdate & a.enddate >= b.strdate then 'Y'
when b.strdate < a.strdate & b.enddate >= a.strdate then 'Y'
else 'N'
end as overlap
from
have a
left join
have b
on a.id = b.id /* join on same ids */
and a.item <> b.item /* but not the same item */
;
quit;
Result:
ID ITEM StrDate EndDate overlap
1001 B121 03/10/2022 03/10/2022 N
1001 A121 02/01/2022 02/15/2022 N
1002 D121 02/05/2022 02/15/2022 Y
1002 C121 02/01/2022 02/10/2022 Y
1003 E121 03/10/2022 03/21/2022 Y
1003 F121 03/12/2022 03/21/2022 Y
1004 G121 01/12/2022 01/14/2022 N
Overlap occurs if StartA <= StartB when:
StartA EndA>=StartB
|-------------|
|---------
StartB