I work in healthcare, and need to produce a report that shows patient lab values at various timepoints. The timepoints are like the following:
Pre transplant:
1 year = 365 days +/- 30 days
3 months = 90 days +/- 14 days
1 month = 30 days +/- 7 days
Post transplant:
1 day = 24 hrs +/- 12 hrs
1 week = 7 days +/- 1 days
1 month = 30 days +/- 7 days
3 month = 90 days +/- 14 days
6 month = 180 days +/- 30 days
1 year = 365 days +/- 30 days
My data model has a lot of tables (results from SQL Server queries), but the main lab table looks like the following:
+-----------------------+-----------------+------------+-----------+
| Order ID | Episode ID | Transplant Date | Lab Date | Lab Value |
+----------+------------+-----------------+------------+-----------+
| 111 | 222 | 5/2/2018 | 1/22/2018 | 23 |
| 112 | 222 | 5/2/2018 | 1/27/2018 | 15 |
| 113 | 222 | 5/2/2018 | 5/3/2018 | 14 |
| 114 | 222 | 5/2/2018 | 10/19/2018 | 12 |
| 115 | 223 | 1/23/2019 | 1/24/2019 | 20 |
| 116 | 223 | 1/23/2019 | 1/25/2019 | 25 |
| 117 | 223 | 1/23/2019 | 1/31/2019 | 29 |
| 118 | 223 | 1/23/2019 | 4/23/2019 | 30 |
| 119 | 223 | 1/23/2019 | 3/1/2019 | 35 |
| 120 | 224 | 7/19/2019 | 7/19/2018 | 5 |
| 121 | 224 | 7/19/2019 | 7/24/2018 | 13 |
+-----------------------+-----------------+------------+-----------+
Order ID
is the unique identifier for the lab, Episode ID
is the unique identifier for the patient, and we're looking for labs relative to the Transplant Date
.
There's another table for patient data that looks something like this:
+------------+----------------+-----------------+
| Episode ID | Patient Name | Transplant Date |
+------------+----------------+-----------------+
| 222 | Alphers, Ralph | 5/2/2018 |
| 223 | Bethe, Hans | 1/23/2019 |
| 224 | Gammow, George | 7/19/2019 |
+------------+----------------+-----------------+
The resulting data should looke similar to this:
+------------+------------+--------------+-------------+------------+-------------+--------------+---------------+-------------+
| Episode ID | 1 year pre | 3 months pre | 1 month pre | 1 day post | 1 week post | 1 month post | 6 months post | 1 year post |
+------------+------------+--------------+-------------+------------+-------------+--------------+---------------+-------------+
| 222 | | 15 | | 14 | | | 12 | |
| 223 | | | | 20 | 29 | 35 | | |
| 224 | 5 | | | | | | | |
+------------+------------+--------------+-------------+------------+-------------+--------------+---------------+-------------+
Is there a best way to do this, taking into account both processing time (user experience) and development complexity?
Right now, here's how I'm doing it.
First, I'm using Power Query (M) to create the time points (e.g. Table.AddColumn(#"Changed Type", "Minutes to One Year Before Transplant", each Number.Abs(Duration.TotalMinutes(([Lab Date] - DateTime.From(Date.AddYears([Transplant Date], -1))))))
).
Then, I use DAX to find the number of days for the record that's closest to the right target date:
Labs shortest minutes to one year before transplant =
VAR EpisodeID = Patients[Episode ID]
VAR TargetDate = DATEADD(Patients[Transplant Date], 1, MONTH)
VAR WindowDays = 30
RETURN
CALCULATE(
MIN(Labs[Minutes to One Month After Transplant]),
FILTER(Labs, Labs[Episode ID] = EpisodeID),
FILTER(Labs, Labs[Lab Date] >= DATEADD(TargetDate, -WindowDays, DAY)),
FILTER(Labs, Labs[Lab Date] <= DATEADD(TargetDate, WindowDays, DAY))
)
Then, I use that number of minutes as an identifier to grab the Order ID
Lab Order ID closest to one year before transplant =
VAR EpisodeID = Patients[Episode ID]
VAR TargetDate = DATEADD(Patients[Transplant Date], 1, MONTH)
VAR WindowDays = 30
VAR DaysFrom = Patients[Labs shortest minutes to one year before transplant]
RETURN
CALCULATE(
MIN(Labs[Order ID]),
FILTER(Labs, Labs[Episode ID] = EpisodeID),
FILTER(Labs, Labs[Lab Date] >= DATEADD(TargetDate, -WindowDays, DAY)),
FILTER(Labs, Labs[Lab Date] <= DATEADD(TargetDate, WindowDays, DAY))
)
Finally, I can use that Order ID
to grab whatever I want from that lab, like the value:
Lab Value closest to one year before transplant =
VAR EpisodeID = Patients[Episode ID]
VAR OrderID = Patients[Lab Order ID closest to one year before transplant]
RETURN
CALCULATE(
MIN(Labs[Value]),
FILTER(Labs, Labs[Episode ID] = EpisodeID),
FILTER(Labs, Labs[Order ID] = OrderID)
)
And, I need to do this for 3 different labs, which means repeating this process like 30 times. And, the resulting report takes awhile to do the calculations. I can push a bunch of work back to SQL Server, but maybe that's not the best idea?
I would make a table with buckets that dates fall in. This way if someone requests different buckets it is simple to add.
CREATE TABLE [dbo].[table_Buckets](
[Bucket] [varchar](50) NULL,
[NumDaysLow] [int] NULL,
[NumDaysHigh] [int] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Pre-1Yr', -395, -335)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Pre-3Mth', -105, -75)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Pre-1Mth', -37, -21)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Day', 0, 2)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Wk', 6, 8)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Mth', 21, 37)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-3Mth', 76, 104)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-6Mth', 150, 210)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Yr', 335, 395)
GO
Now you can run the following sql query that will take the data, put a bucket date to the episode, take the lowest number per bucket and then pivot the table to the view that you desire. You will have to design your data around this structure.
select
EpisodeID
,[Pre-1Yr]
,[Pre-3Mth]
,[Pre-1Mth]
,[Post-1Day]
,[Post-1Wk]
,[Post-1Mth]
,[Post-3Mth]
,[Post-6Mth]
,[Post-1Yr]
from
(
--this select statement takes the lowest value if there are more than one value per bucket
select main.EpisodeID, main.Bucket, min(main.LabValue) as LabValue from
(--this select statement assigns the episode to a buckets
select
ml.EpisodeID
, (select Bucket from
table_Buckets
where
NumDaysLow <= datediff(d,pd.TransplantDate, ml.LabDate)
and NumDaysHigh >= datediff(d,pd.TransplantDate, ml.LabDate)
) AS Bucket
, ml.LabValue as LabValue
from
table_MainLab ML,
table_PatientData PD where ml.EpisodeID = pd.EpisodeID
) main
group by EpisodeID, Bucket) s
pivot
(avg(LabValue)
for [Bucket] in
([Pre-1Yr]
,[Pre-3Mth]
,[Pre-1Mth]
,[Post-1Day]
,[Post-1Wk]
,[Post-1Mth]
,[Post-3Mth]
,[Post-6Mth]
,[Post-1Yr])
) as pivottable