I have a dataset where each row represents a hospital visit. However, if a patient returns to the same hospital, the system updates the previous discharge datetime instead of creating a new row.
I want to generate additional “return visit” rows so that each hospital stay has its own start and end datetimes.
There’s a variable indicating the transfer destination (a5_tnf_fltu), so I can identify hospital-to-hospital transfers, but I need help adjusting the date/times correctly when patients return to a prior hospital.
Logic to Update: If a patient goes A → B → A, I need:
Here is an example of what I 'Have' vs. what I 'Want'
HAVE:
| rec_id | hospital | a1_first_dttm | a5_disch_dttm |
|---|---|---|---|
| 1 | A | 29AUG21:03:30 | 02SEP21:06:35 |
| 1 | B | 29AUG21:11:30 | 30AUG21:01:30 |
| 1 | A | . | 02SEP21:06:35 |
| 2 | A | 29JUN22:19:39 | 30JUN22:11:50 |
| 2 | B | 30JUN22:13:00 | 07JUL22:13:05 |
| 2 | C | 30JUN22:23:00 | 03JUL22:16:40 |
| 2 | B | . | 07JUL22:13:05 |
WANT:
| rec_id | visit | hospital | a1_first_dttm | a5_disch_dttm |
|---|---|---|---|---|
| 1 | 1 | A | 29AUG21:03:30 | 29AUG21:11:30 |
| 1 | 2 | B | 29AUG21:11:30 | 30AUG21:01:30 |
| 1 | 3 | A | 30AUG21:01:30 | 02SEP21:06:35 |
| 2 | 1 | A | 29JUN22:19:39 | 30JUN22:11:50 |
| 2 | 2 | B | 30JUN22:13:00 | 30JUN22:23:00 |
| 2 | 3 | C | 30JUN22:23:00 | 03JUL22:16:40 |
| 2 | 4 | B | 03JUL22:16:40 | 07JUL22:13:05 |
Here is my current SAS code:
data admit_out3;
set admit_out2;
* translate transfer disposition to transfer facility;
if a5_tnf_fltu in (2,30) then trnsfr_fac = 1;
else if a5_tnf_fltu in (4,31) then trnsfr_fac = 2;
else if a5_tnf_fltu in (20,5,22,13,14,32) then trnsfr_fac = 3;
else if a5_tnf_fltu in (21,3,23,15,16,33) then trnsfr_fac = 4;
run;
* Sort by patient and count;
proc sort data=admit_out3;
by rec_id count;
run;
* Generate return rows;
data return_rows;
set admit_out3;
by rec_id;
* Arrays to track visited facilities and their discharge datetimes;
array fac_list[10] $50 _temporary_;
array disch_list[10] _temporary_;
retain fac_count;
if first.rec_id then do;
fac_count = 0;
do i = 1 to dim(fac_list);
fac_list[i] = '';
disch_list[i] = .;
end;
end;
* Check if trnsfr_fac is already visited;
found = 0;
do i = 1 to fac_count;
if trnsfr_fac = fac_list[i] then do;
found = i;
leave;
end;
end;
if found > 0 then do;
* Return visit;
a1_sndfac = a1_recvfac;
a1_recvfac = trnsfr_fac;
Hospital = trnsfr_fac;
a5_fl_dpo = .;
a5_tnf_fltu = .;
a5_disch_dttm = disch_list[found]; /* original discharge datetime */
a1_first_dttm = .; /* blank first_dttm for return */
count = count + 1;
tot_visits = count;
trnsfr_fac = .;
output;
end;
* store the current facility for future reference;
fac_count + 1;
fac_list[fac_count] = a1_recvfac;
disch_list[fac_count] = a5_disch_dttm;
drop i found;
run;
* Combine original and return rows;
data admit_out4;
set admit_out3 return_rows(in=b);
return_fac = b;
run;
* Sort final dataset;
proc sort data=admit_out4;
by rec_id count;
run;
How can I modify my SAS code so that it generates the correct “return visit” rows and aligns the start/end datetimes as shown in the WANT table?
Perhaps I have over simplified things because I am ignoring HOSPITAL.
You can use LAG() to look backwards at the previous observations discharge date and use it to fill in the missing admit date. And you can use an extra SET to mimic a LEAD() function to find the next admit date. But because of those missing admit dates you need to do it in two steps.
data step1;
set have;
by rec_id;
previous=lag(a5_disch_dttm);
if first.rec_id then call missing(previous,visit);
visit+1;
a1_first_dttm = coalesce(a1_first_dttm,previous);
drop previous ;
run;
data step2;
set step1;
by rec_id ;
set step1(firstobs=2 keep=a1_first_dttm rename=(a1_first_dttm=next))
step1(obs=1 drop=_all_)
;
if not last.rec_id and next < a5_disch_dttm then a5_disch_dttm=next;
drop next;
run;
proc compare data=want compare=step2;
id rec_id visit;
run;
So only change the discharge date when it is is AFTER the next admit date.
If you have a lot of data then make STEP1 a view.
79 data step1 / view=step1 ;
80 set have;
81 by rec_id;
82 previous=lag(a5_disch_dttm);
83 if first.rec_id then call missing(previous,visit);
84 visit+1;
85 a1_first_dttm = coalesce(a1_first_dttm,previous);
86 drop previous ;
87 run;
NOTE: DATA STEP view saved on file WORK.STEP1.
NOTE: A stored DATA STEP view cannot run under a different operating system.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
88
89 data step2;
90 set step1;
91 by rec_id ;
92 set step1(firstobs=2 keep=a1_first_dttm rename=(a1_first_dttm=next))
93 step1(obs=1 drop=_all_)
94 ;
95 if not last.rec_id and next < a5_disch_dttm then a5_disch_dttm=next;
96 drop next;
97 run;
NOTE: View WORK.STEP1.VIEW used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: View WORK.STEP1.VIEW used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: View WORK.STEP1.VIEW used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: There were 7 observations read from the data set WORK.STEP1.
NOTE: There were 6 observations read from the data set WORK.STEP1.
NOTE: There were 1 observations read from the data set WORK.STEP1.
NOTE: The data set WORK.STEP2 has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds