I am trying to get all continuous period of customers from their contract start and end dates in Synapse SQL allowing up to 30 day break between contracts.
I have been trying with LAG and Rownumber but each fails because when sorted with startdate overlapping contracts give poor results
WITH RankedContracts AS (
SELECT
CustomerID,
ContractID,
StartDate,
EndDate,
-- Ensure correct order for computing PrevEndDate
LAG(EndDate) OVER (PARTITION BY CustomerID ORDER BY StartDate, ContractID) AS PrevEndDate
FROM Contracts
),
PeriodBreaks AS (
SELECT
CustomerID,
ContractID,
StartDate,
EndDate,
-- Check if the current StartDate is NOT contiguous with the previous EndDate
CASE
WHEN DATEDIFF(DAY, PrevEndDate, StartDate) > 30 THEN 1
ELSE 0
END AS IsNewPeriod
FROM RankedContracts
)
Row 3 is problematic. It should not be a new period because it is within 30 days of row 1. Is there a way to compare to all previous rows instead of just the previous row?
Result with given code:
CustomerID | ContractID | StartDate | EndDate | IsNewPeriod |
---|---|---|---|---|
1 | 1 | 2019-01-01T00:00:00.0 | 2020-06-30T00:00:00.0 | 0 |
1 | 4 | 2019-04-23T00:00:00.0 | 2019-11-29T00:00:00.0 | 0 |
1 | 41 | 2020-07-01T00:00:00.0 | 2023-06-30T00:00:00.0 | 1 |
1 | 42 | 2020-12-08T00:00:00.0 | 2021-03-18T00:00:00.0 | 0 |
2 | 22 | 2020-07-01T00:00:00.0 | 2023-06-30T00:00:00.0 | 0 |
2 | 23 | 2024-01-08T00:00:00.0 | 2024-03-18T00:00:00.0 | 1 |
2 | 24 | 2024-04-01T00:00:00.0 | 2025-06-30T00:00:00.0 | 0 |
Wanted result should be like
CustomerID | ContractID | StartDate | EndDate | IsNewPeriod |
---|---|---|---|---|
1 | 1 | 2019-01-01T00:00:00.0 | 2020-06-30T00:00:00.0 | 0 |
1 | 4 | 2019-04-23T00:00:00.0 | 2019-11-29T00:00:00.0 | 0 |
1 | 41 | 2020-07-01T00:00:00.0 | 2023-06-30T00:00:00.0 | 0 |
1 | 42 | 2020-12-08T00:00:00.0 | 2021-03-18T00:00:00.0 | 0 |
2 | 22 | 2020-07-01T00:00:00.0 | 2023-06-30T00:00:00.0 | 0 |
2 | 23 | 2024-01-08T00:00:00.0 | 2024-03-18T00:00:00.0 | 1 |
2 | 24 | 2024-04-01T00:00:00.0 | 2025-06-30T00:00:00.0 | 0 |
I believe, that using the extended functionality of t-sql window functions will solve your problem. You can extend the viewed window by bounded ranges like here:
WITH RankedContracts AS (
SELECT
CustomerID,
ContractID,
StartDate,
EndDate,
-- Get the most recent EndDate that is within 30 days prior to the current StartDate
MAX(EndDate) OVER (
PARTITION BY CustomerID
ORDER BY StartDate, ContractID
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS MaxPrevEndDate
FROM Contracts
),
PeriodBreaks AS (
SELECT
CustomerID,
ContractID,
StartDate,
EndDate,
-- Check if the current StartDate is more than 30 days after the MaxPrevEndDate
CASE
WHEN DATEDIFF(DAY, MaxPrevEndDate, StartDate) > 30 THEN 1
ELSE 0
END AS IsNewPeriod
FROM RankedContracts
)
-- Calculate the final result
SELECT
CustomerID,
ContractID,
StartDate,
EndDate,
SUM(IsNewPeriod) OVER (PARTITION BY CustomerID ORDER BY StartDate, ContractID) AS PeriodGroup
FROM PeriodBreaks
ORDER BY CustomerID, StartDate;
This should retrieve the desired result