I am trying to dynamically get a count at a per month basis per customer and type. Where two of the counts are the first time we got an order based on the program and the other two are programs/customers we lost based on their last order date and 4 or 5 months have passed since.
The new customers per month is correct but the lost customers is not in both queries I have created.
There is a few issues with the data, one the invoicing data that I am using doesn't have a unique identifier so to create one I am using concat with First name, Last name, DOB and Type of program.
I.E FirstLast2024-09-12Type
Second issue I cant get around is the count for lost customers, I have been trying multiple ways and landed on the below two queries and have got different results for lost customers with each query. The first orders in my data set is 2021-01 so there should be no lost customers for those first months as no ones last order could be before 2021-01. I am hoping it is something small but even after taking a few days break from it I can't wrap my head around it.. First Query:
with dates as (
select convert(date, '2019-01-01') as dt
union all
select dateadd(month, 1, dt)
from dates
where dt < getdate()
)
Select left(dt,7) as [date],count(*) as [new_customers], sum(case when right(b.[id],3) = 'IPP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_IPP_customers],
sum(case when right(b.[id],3) = 'MAP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_MAP_customers],
sum(case when right(b.[id],3) = 'IPP' and datediff(m,dt,b.[Last]) > 1 then 1 else 0 end) as [Lost_IPP_customers],
sum(case when right(b.[id],3) = 'MAP' and datediff(m,dt,b.[Last]) > 1 then 1 else 0 end) as [Lost_MAP_customers]
from dates
Left Join(
Select CONCAT(a.[First Name],a.[Last Name],a.[DOB],a.Solution) as [id], min(dos) as [1st_order],max(llast) as [last]
FROM [MYTABLE] a
Left Join(
Select CONCAT([First Name],[Last Name],[DOB],Solution) as [id], dateadd(m,4,max(dos)) as [llast]
FROM [MYTABLE] a
Group by CONCAT([First Name],[Last Name],[DOB],[Solution])) b on b.id = CONCAT(a.[First Name],a.[Last Name],a.[DOB],a.Solution)
Group by CONCAT(a.[First Name],a.[Last Name],a.[DOB],a.Solution)) b on left(b.[1st_order],7) = left(dt,7)
where b.id is not null
Group by left(dt,7)
order by left(dt,7)
Results:
Second query: just tried different ways to join etc..
with dates as (
select convert(date, '2019-01-01') as dt
union all
select dateadd(month, 1, dt)
from dates
where dt < getdate()
)
Select left(dt,7) as [date],count(*) as [new_customers], sum(case when right(b.[id],3) = 'IPP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_IPP_customers],
sum(case when right(b.[id],3) = 'MAP' and left(b.[1st_order],7) = left(dt,7) then 1 else 0 end) as [new_MAP_customers],
sum(case when right(c.[id],3) = 'IPP' and datediff(m,dt,c.[Last_order]) > 5 then 1 else 0 end) as [Lost_IPP_customers],
sum(case when right(c.[id],3) = 'MAP' and datediff(m,dt,c.[Last_order]) > 5 then 1 else 0 end) as [Lost_MAP_customers]
from dates
Left Join(
Select a.[id], min(dos) as [1st_order]
From
(Select *,CONCAT([First Name],[Last Name],[DOB],Solution) as [id]
FROM [MYTABLE) a
Group by a.[id]) b on left(b.[1st_order],7) = left(dt,7)
Left Join(
Select a.[id], max(dos) [Last_order]
From
(Select *,CONCAT([First Name],[Last Name],[DOB],Solution) as [id]
FROM [MYTABLE]) a
Group by a.[id]) c on b.[id] = c.id
where c.id is not null
Group by left(dt,7)
order by left(dt,7)
Results: 2nd Query Results
Note: Count * was just to make sure the new customers added up for visual purposes.
Here is example of how the data set looks before joining to a calendar table and sum cases: First and Last orders
The new customers per month is correct but the lost customers is not in both queries I have created.
EDIT: I failed to notice the first query sum counts for first and lost are the same count as each other, the second query is what i started with and I tried to make it simpler and created the first query as the second query wasn't populating the lost customers correctly. Here is example data from "mytable" I blocked off parts of the data for privacy.. I also added two tables to the post of example data and What I am trying to accomplish with that data provided.
Example data:
First Name | Last Name | DOB | Member Number | DOS | Medication | Solution | Day Supply | Insurance | Amount Due | Payment Status | Invoice # | Inv# Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Name | Fake | 1968-01-11 | NULL | 2023-10-26 | Ocrevus | MAP | 180 | 0.00 | 9890.40 | PAID | 111 | 2023-11-09 |
Name | Fake | 1968-01-11 | NULL | 2022-10-31 | Ocrevus | MAP | 180 | 0.00 | 8723.9825 | PAID | 222 | 2022-11-18 |
Name | Fake | 1968-01-11 | NULL | 2023-04-29 | Ocrevus | MAP | 180 | 0.00 | 9890.40 | PAID | 333 | 2023-05-12 |
Name1 | Fake1 | 1998-03-22 | NULL | 2022-06-02 | Novolog Flextouch | IPP | 90 | 0.00 | 541.08 | PAID | 123456 | 2022-06-22 |
Name1 | Fake1 | 1998-03-22 | NULL | 2021-06-04 | Novolog | IPP | 90 | 0.00 | 541.62 | PAID | 123 | 2021-07-20 |
Name1 | Fake1 | 1998-03-22 | NULL | 2021-06-04 | Basaglar | IPP | 90 | 0.00 | 551.76 | PAID | 123 | 2021-07-20 |
Name1 | Fake1 | 1998-03-22 | NULL | 2022-03-08 | Lantus Solostar Pens | IPP | 90 | 0.00 | 665.97 | PAID | 12368 | 2022-04-20 |
Name2 | Fake2 | 1982-11-16 | NULL | 2022-09-29 | Creon | MAP | 90 | 0.00 | 1235.07 | PAID | 3698 | 2022-11-04 |
Name2 | Fake2 | 1982-11-16 | NULL | 2023-10-10 | Creon | IPP | 100 | 0.00 | 1874.56 | PAID | 516 | 2023-10-27 |
Name2 | Fake2 | 1982-11-16 | NULL | 2023-06-26 | Creon | MAP | 90 | 0.00 | 1238.00 | PAID | 44 | 2023-08-07 |
Using the example data above the output should look like below with the lost customer count being 4 months after the DOS (but the real results would have every year and month in order from x number of years ago to today's month, I shortened the results for viewing purposes.
date | new_IPP_customers | new_MAP_customers | Lost_IPP_customers | Lost_MAP_customers |
---|---|---|---|---|
2021-06 | 1 | 0 | 0 | 0 |
2022-09 | 0 | 1 | 0 | 0 |
2022-10 | 0 | 1 | 0 | 1 |
2023-10 | 1 | 0 | 0 | 1 |
2024-02 | 0 | 0 | 1 | 1 |
Something like this can be a good start perhaps:
--Data
SELECT *
INTO #data
FROM (
VALUES (N'Name', N'Fake', N'1968-01-11', NULL, N'2023-10-26', N'Ocrevus', N'MAP', 180, 0.00, 9890.40, N'PAID', 111, N'2023-11-09')
, (N'Name', N'Fake', N'1968-01-11', NULL, N'2022-10-31', N'Ocrevus', N'MAP', 180, 0.00, 8723.9825, N'PAID', 222, N'2022-11-18')
, (N'Name', N'Fake', N'1968-01-11', NULL, N'2023-04-29', N'Ocrevus', N'MAP', 180, 0.00, 9890.40, N'PAID', 333, N'2023-05-12')
, (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2022-06-02', N'Novolog Flextouch', N'IPP', 90, 0.00, 541.08, N'PAID', 123456, N'2022-06-22')
, (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2021-06-04', N'Novolog', N'IPP', 90, 0.00, 541.62, N'PAID', 123, N'2021-07-20')
, (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2021-06-04', N'Basaglar', N'IPP', 90, 0.00, 551.76, N'PAID', 123, N'2021-07-20')
, (N'Name1', N'Fake1', N'1998-03-22', NULL, N'2022-03-08', N'Lantus Solostar Pens', N'IPP', 90, 0.00, 665.97, N'PAID', 12368, N'2022-04-20')
, (N'Name2', N'Fake2', N'1982-11-16', NULL, N'2022-09-29', N'Creon', N'MAP', 90, 0.00, 1235.07, N'PAID', 3698, N'2022-11-04')
, (N'Name2', N'Fake2', N'1982-11-16', NULL, N'2023-10-10', N'Creon', N'IPP', 100, 0.00, 1874.56, N'PAID', 516, N'2023-10-27')
, (N'Name2', N'Fake2', N'1982-11-16', NULL, N'2023-06-26', N'Creon', N'MAP', 90, 0.00, 1238.00, N'PAID', 44, N'2023-08-07')
) t ([First Name],[Last Name],DOB,[Member Number],DOS,Medication,Solution,[Day Supply],Insurance,[Amount Due],[Payment Status],[Invoice #],[Inv# Date])
-- Code
;WITH dates AS (
SELECT CONVERT(date, '2019-01-01') AS dt
UNION ALL
SELECT dateadd(month, 1, dt)
FROM dates
WHERE dt < getdate()
)
SELECT CONVERT(VARCHAR(6), dt, 112)
, ISNULL(SUM(CASE WHEN solution = 'IPP' THEN x.gained END), 0) AS IPPGained
, ISNULL(SUM(CASE WHEN solution = 'IPP' THEN x.lost END), 0) AS IPPLost
, ISNULL(SUM(CASE WHEN solution = 'MAP' THEN x.gained END), 0) AS MAPGained
, ISNULL(SUM(CASE WHEN solution = 'MAP' THEN x.lost END), 0) AS MAPLost
FROM dates d
LEFT JOIN
(
SELECT sa.*
FROM (
SELECT CASE WHEN lag(dos) OVER(partition BY CONCAT([First Name], [Last Name], DOB, Solution) ORDER BY DOS) IS NULL THEN 1 ELSE 0 END AS first
, CASE WHEN LEAD(dos) OVER(partition BY CONCAT([First Name], [Last Name], DOB, Solution) ORDER BY DOS) IS NULL THEN 1 ELSE 0 END AS last
, *
FROM #data d
) d
CROSS apply (
SELECT solution, first AS gained, 0 AS lost, DOS
WHERE first = 1
UNION ALL
SELECT solution, 0, 1, DATEADD(MONTH, 4, Dos)
WHERE last = 1
) sa
) x
ON EOMONTH(x.DOS) = EOMONTH(d.dt)
GROUP BY CONVERT(VARCHAR(6), dt, 112)
The core of the idea is to create two flags, first and last order.
I use LAG/LEAD window function to do so, and PARTITION BY the identifiers like First, last, Solution to distinguish the different groups of customers.
Now, to simplify the gain/losses, I create two rows:
This simplifies our logic later.
Finally, i join back on the dates CTE and use conditional aggregation (CASE WHEN ...) to generate the tabulated format you're looking for.
For the date join, i use EOMONTH to simplify the date handling, this function returns the last date of the month ie. a single date. Feel free to use something else, i wasn't going for the most performant solution, but rather to demonstrate main concept.