I am trying to flag observations with the same ID and Date but different results. Not all observations with the same ID and Date will have a different result but a large fraction do. I have already created a flag variable for the observations with same ID and Date but having trouble creating the flag for different results.
Here is some example data.
data test;
input ID Dt :mmddyy10. result;
format dt mmddyy10.;
datalines;
1 4-27-24 1
1 4-27-24 1
1 4-30-24 2
1 4-30-24 2
2 5-5-24 1
2 5-5-24 2
3 6-6-24 1
3 6-6-24 1
4 7-7-24 1
4 7-7-24 2
4 7-17-24 2
4 7-17-24 2
;
run;
Thank you in advance!!
You can use by-group processing and the lag function to achieve this. We'll check if it's the last observation in the group of id dt
and ensure that there is more than one value of id dt
, then compare the previous value of result
to the current value. If it's the same, it's return a 0. If it isn't, it's return a 1. Because of some specific quirks about how the lag function works, we'll create the lag first and then set it to missing every time we get to the start of a new group.
data want;
set test;
by id dt;
lag_result = lag(result);
if(first.dt) then lag_result = .;
flag_dupe_and_dif_result = (NOT first.dt AND last.dt AND result NE lag_result);
drop lag_result;
run;
ID Dt result flag_dupe_and_dif_result
1 04/27/2024 1 0
1 04/27/2024 1 0
1 04/30/2024 2 0
1 04/30/2024 2 0
2 05/05/2024 1 0
2 05/05/2024 2 1
3 06/06/2024 1 0
3 06/06/2024 1 0
4 07/07/2024 1 0
4 07/07/2024 2 1
4 07/17/2024 2 0
4 07/17/2024 2 0