I have a dataframe where I want to return the number (proportion) of patients that have had a subsequent follow up after diagnosis of diseaseacross departments.
Example; a patient can have diag_yn
across multiple departments but I want to check has the patient had follow up like before but for each department.
So, Patient 1 would be recorded twice in example, 1 for Radiology and again for Respiratory as it had a follow up appt for both departments.
Original DF
dept | patient_id | app_date | diag_date | diag_yn |
|------------|------------|------------|------------|-----------|
Radiology | 1 | 2024-01-11 | NaT | NaN |
Radiology | 1 | 2024-03-14 | 2024-03-14 | 1 |
Radiology | 1 | 2024-04-09 | NaT | NaN |
Radiology | 1 | 2024-09-09 | NaT | NaN |
Respiratory | 1 | 2024-02-11 | NaT | NaN |
Respiratory | 1 | 2024-04-14 | 2024-04-14 | 1 |
Respiratory | 1 | 2024-06-09 | NaT | NaN |
Respiratory | 1 | 2024-09-09 | NaT | NaN |
Respiratory | 2 | 2024-01-11 | NaT | NaN |
Respiratory | 2 | 2024-03-14 | 2024-03-14 | 1 |
Respiratory | 2 | 2024-04-09 | NaT | NaN |
Respiratory | 2 | 2024-09-09 | NaT | NaN |
Desired Output
dept | patient_with_fup | count |
Radiology |------------------|-------|
| 1 | 1 |
| 0 | 0 |
Respiratory |------------------|-------|
| 1 | 2 |
| 0 | 0 |
Have been provided the following code but before the dept
column was added to the DF.
(df.groupby('patient_id')
.apply(lambda g: g['app_date'].gt(next(iter(g['diag_date'].dropna()),
pd.NaT)).any(),
include_groups=False)
.astype(int).value_counts()
.reindex([0, 1], fill_value=0).rename_axis('fup_yn')
.reset_index()
)
You could start like the original approach, using the two columns as group, then compute the counts with pivot_table
:
(df.groupby(['patient_id', 'dept'])
.apply(lambda g: g['app_date'].gt(next(iter(g['diag_date'].dropna()),
pd.NaT)).any(),
include_groups=False)
.astype(int).reset_index(name='patient_with_fup')
.pivot_table(index='patient_with_fup', columns='dept',
values='patient_id', aggfunc='count')
.reindex([0, 1], fill_value=0)
.unstack().reset_index(name='count')
)
Output:
dept patient_with_fup count
0 Radiology 0 1
1 Radiology 1 0
2 Respiratory 0 2
3 Respiratory 1 0
Example with more groups:
# let's add a new column
df['unit'] = ['A']*11+['B']
groups = ['dept', 'unit']
(df.groupby(['patient_id']+groups)
.apply(lambda g: g['app_date'].gt(next(iter(g['diag_date'].dropna()),
pd.NaT)).any(),
include_groups=False)
.astype(int).reset_index(name='patient_with_fup')
.pivot_table(index='patient_with_fup', columns=groups,
values='patient_id', aggfunc='count')
.reindex([0, 1], fill_value=0)
.unstack().reset_index(name='count')
)
Output:
dept unit patient_with_fup count
0 Radiology A 0 1
1 Radiology A 1 0
2 Respiratory A 0 2
3 Respiratory A 1 0
4 Respiratory B 0 1
5 Respiratory B 1 0