I have following dataset:
Data test;
Input id$ visit$ enrdate : mmddyy10. Vsdate : mmddyy10. ;
Format enrdate mmddyy10. Vsdate mmddyy10.;
Cards;
ABC01 00 1/2/2020 1/2/2020
ABC02 00 5/16/2020 5/16/2020
ABC02 06 5/16/2019 11/12/2019
CDC01 00 8/20/2019 8/20/2019
CDC01 06 8/20/2019 2/16/2020
EFG01 00 5/20/2020 5/20/2020
EFG02 00 12/2/2018 12/2/2018
EFG02 02 12/2/2018 1/31/2019
EFG02 06 12/2/2018 5/31/2019
EFG02 12 12/2/2018 12/2/2019
EFG03 00 3/3/2019 3/3/2019
EFG03 12 3/3/2019 3/2/2020
GFF04 00 6/2/2019 6/2/2019
GFF04 06 6/2/2019 .
;
Run;
I want to do the following:
Count how many of the currently enrolled participants still need to have their 6-months visit (v6), how many need their 12-months(v12), how many need their 18-months (v18) and how many need their 24-months (v24).
For example, we can see from mock dataset that all 7 participants would still need their v18 and v24 since none of them had these visits already, while participants ABC01 and EFG01 also need their v6 and v12, in addition to v18 and v24. Participants ABC02 and CDC01 only need their v12 in addition to the v18 and v24 etc. Participant GFF04, on the other hand, would need to be excluded from list of participants who still need their v6 since he was supposed to have that visit sometime in December 2019 but never did. However, I still want to count him as needing to have his v12, v18 and v24.
In general, we want to also exclude participants whose enrollment dates were a long time ago, yet they never had any other follow-up visits after that. For example, if a participant has an enrdate=1/20/2018 yet no other visits after that, then we wouldn’t count him as still needing any other visits, as he most likely dropped out since it’s now June 2020 and the only visit we have for him is that one enrollment visit.
Also, if a participant had the next visit entered but not the previous, we would not count this participant as needing that previous visit (i.e. if a participant already had v12, let’s say, but not v6, then he would not get counted as still needing that v6 and so on; i.e. participant EFG03)
Finally, I want a cutoff date of August 19, 2021. This means that a participant like CDC01 who was enrolled on 20 August 2019, would need to have their v24 on 20 August 2021 but that’s past the August 19, 2021 cutoff so this participant would only be counted as needing v12-months and v18. And so on.
This is what I’ve done so far but now I am stuck and don’t know how to continue coding to account for all the conditions above.
Data long;
Set test;
Where visit in (“00”, “06”, “12”, “18”, “24”);
If vsdate ne .;
Run;
Proc sort data=long out=longsort;
By id;
Run;
Data wide;
Set longsort;
By id;
Keep id enrdate vsdate00-vsdate24;
Retain vsdate00-vsdate24;
ARRAY avsdate(00:24) vsdate00-vsdate24;
if first.ID then do;
do i = 00 to 24;
avsdate (i) =.;
end;
end;
avsdate(visit)=vsdate;
if last.ID then output;
run;
data wide_0;
set wide (keep=ID ENRDATE VSDATE00 VSDATE06 VSDATE12 VSDATE18 VSDATE24);
run;
data wide_final;
set wide_0;
attrib
vsdate00 format=mmddyy10. Informat=anydtdte.
vsdate06 format=mmddyy10. Informat=anydtdte.
Vsdate12 format=mmddyy10. Informat=anydtdte.
Vsdate18 format=mmddyy10. Informat=anydtdte.
Vsdate24 format=mmddyy10. Informat=anydtdte.
;
If somebody can please help me with suggestions/sample code, that would be very helpful!
Thanks!
If you generate a dataset with a line for every combination of participant ID and visit, then you can proceed to classify each visit that has not (yet) been attended. This makes it easy to manually check and to count different kind of visits.
In the below code I have categorized a visit as skipped if its planned date was more than a week ago, and I have categorized a participant as a likely drop-out if their latest visit was more than a year ago. You may modify these rules.
Note that the result dataset will change over time, since the dates are compared to today()
.
* Define test data;
data have;
Input id$ visit$ enrdate : mmddyy10. Vsdate : mmddyy10. ;
Format enrdate mmddyy10. Vsdate mmddyy10.;
Cards;
ABC01 00 1/2/2020 1/2/2020
ABC02 00 5/16/2019 5/16/2019
ABC02 06 5/16/2019 11/12/2019
CDC01 00 8/20/2019 8/20/2019
CDC01 06 8/20/2019 2/16/2020
EFG01 00 5/20/2020 5/20/2020
EFG02 00 12/2/2018 12/2/2018
EFG02 02 12/2/2018 1/31/2019
EFG02 06 12/2/2018 5/31/2019
EFG02 12 12/2/2018 12/2/2019
EFG03 00 3/3/2019 3/3/2019
EFG03 12 3/3/2019 3/2/2020
GFF04 00 6/2/2019 6/2/2019
GFF04 06 6/2/2019 .
;
run;
* Define the planned visits;
data planned_visits;
Input visit$;
Cards;
00
06
12
18
24
;
run;
* List and categorize all visits for all participants;
proc sql;
create table id_visit_list as
select a.id
,a.enrdate
,b.visit
,intnx('month',a.enrdate,input(b.visit,best.),'s') as planned_date format mmddyy10.
,c.vsdate
,case
when not missing(vsdate) then "Visit attended"
when calculated planned_date + 7 < today() then "Visit skipped"
when calculated planned_date > '19AUG2021'D then "Visit after cut-off"
when max(vsdate) + 365 < today() then "Likely drop-out"
else "Scheduled visit"
end as visit_status
from (select distinct id, enrdate from have) as a
left join (select distinct visit from planned_visits) as b
on 1
left join have as c
on a.id = c.id and b.visit = c.visit
group by a.id
;
quit;
* Count number of scheduled visits;
proc sql;
create table want as
select distinct visit
,sum(visit_status = "Scheduled visit") as count
from id_visit_list
group by visit
;
quit;