I am using SQL Server version 2014. I am working on requirement where I have find consecutive dates and paid fee based on the dates. For instance, the SESSION_ID
7415 as the dates are not overlapping the query works as expected. The only requirement is if for any reason the consecutive dates have different paid fees based on the latest SESSION_ENTER_DATE
we should consider that particular PAID
value. For the SESSION_ID
7415 the START_DATE 2018-12-09 - END_DATE 2018-12-29
though this date range has the fees PAID
value of 37.02 & 63.6
but based on the latest SESSION_ENTER_DATE
it would be 63.6
.
START_DATE | END_DATE | PAID | SESSION_ID | SESSION_TYPE | SESSION_ENTER_DATE |
---|---|---|---|---|---|
2018-12-09 | 2018-12-28 | 37.02 | 7415 | 4963 | 2018-11-29 |
2018-12-29 | 2018-12-29 | 63.60 | 7415 | 4963 | 2019-01-02 |
SESSION_ID | SESSION_TYPE | START_DATE | END_DATE | PAID |
---|---|---|---|---|
7415 | 3256 | 2016-04-13 | 2016-05-24 | 32.53 |
7415 | 4963 | 2017-11-11 | 2018-03-02 | 43.59 |
7415 | 4963 | 2018-03-31 | 2018-04-27 | 43.59 |
7415 | 4963 | 2018-06-23 | 2018-11-09 | 43.59 |
7415 | 4963 | 2018-12-09 | 2018-12-29 | 63.60 |
But, the problem I run into is when there are overlapping dates. For instance, SESSION_ID 8642
has consecutive START_DATE
& END_DATE
2022-02-17 - 2022-05-12
& another set 2022-06-02 - 2022-07-27
but the output I would want is 2/17/2022 - 9/7/2022
as the aforementioned date ranges would fall in between.
START_DATE | END_DATE | PAID | SESSION_ID | SESSION_TYPE | SESSION_ENTER_DATE |
---|---|---|---|---|---|
2022-02-17 | 2022-03-02 | 48.58 | 8642 | 3256 | 2022-02-25 |
2022-02-17 | 2022-05-12 | 29.27 | 8642 | 3256 | 2022-07-27 |
2022-03-03 | 2022-03-09 | 46.60 | 8642 | 3256 | 2022-03-07 |
2022-03-10 | 2022-03-16 | 46.60 | 8642 | 3256 | 2022-03-14 |
2022-03-17 | 2022-03-23 | 46.60 | 8642 | 3256 | 2022-03-21 |
2022-03-24 | 2022-03-30 | 46.60 | 8642 | 3256 | 2022-03-28 |
2022-03-31 | 2022-04-06 | 46.60 | 8642 | 3256 | 2022-04-04 |
2022-04-07 | 2022-04-13 | 46.60 | 8642 | 3256 | 2022-04-11 |
2022-04-14 | 2022-04-20 | 46.60 | 8642 | 3256 | 2022-04-18 |
2022-04-21 | 2022-04-27 | 46.60 | 8642 | 3256 | 2022-04-25 |
2022-04-28 | 2022-05-04 | 46.60 | 8642 | 3256 | 2022-05-10 |
2022-05-05 | 2022-05-11 | 46.60 | 8642 | 3256 | 2022-05-10 |
2022-05-12 | 2022-05-12 | 109.80 | 8642 | 3256 | 2022-05-13 |
So for the SESSION_ID 8642
the output I would want to see is
2/17/2022 | 2022-05-12 | 29.27 | 3256 |
2022-06-02 | 2022-07-27 | 48.72 | 3256 |
4/19/2023 | 4/25/2023 | 61.19 | 3256 |
1/2/2024 | 1/7/2024 | 34.97 | 3256 |
2/22/2024 | 1/2/2025 | 71 | 4963 |
We need to grab all the other records based on the START_DATE
and if there are multiple START_DATE
grab the one with the latest SESSION_ENTER_DATE
by checking if the date range is consecutive or if there are any gaps in-between.
This is gaps and islands
task.
As a rule, we compare end_date
for current row and start_date
for next row.
Or compare start_date
with previous end_date
.
If datediff(...)>1 then this is gap.
Since we are comparing the current row with the next one, we use a range of rows in the window function as rows between unbounded preceding and 1 preceding
for calculating group number (running total).
Additional check is max_enter_date<next_start_date
for current row. If not - this is overlapped row, not gap. It is for this check that we compare the current row with the next one, and not the other way around.
with gaps as ( -- detect gaps
select *
,case when max_enter_date<next_start_date
and datediff(day,end_date
,next_start_date
)>1
then 1 else 0 end isGap
from(
select *
,lead(start_date,1,end_date)
over(partition by session_id,session_type order by start_date) next_start_date
,max(case when session_enter_date is null then getdate() else session_enter_date end)
over(partition by session_id,session_type order by start_date) max_enter_date
from session
)a
)
,t as( -- detect last paid -> rn=1
select *
,row_number()over(partition by session_id,session_type,grn order by start_date desc)rn
from( -- calculate group (island) num as running total
select *
,coalesce(sum(isGap)over(partition by session_id,session_type order by start_date
rows between unbounded preceding and 1 preceding)
,0) grn
from gaps
)grn
)
select session_id,session_type,grn
,min(start_date) start_date
,max(end_date) end_date
,max(case when rn=1 then paid end)paid
from t
group by session_id,session_type,grn
;
session_id | session_type | grn | start_date | end_date | paid |
---|---|---|---|---|---|
7415 | 3256 | 0 | 2016-04-13 | 2016-05-24 | 32.53 |
7415 | 4963 | 0 | 2017-11-11 | 2018-03-02 | 43.59 |
7415 | 4963 | 1 | 2018-03-31 | 2018-04-27 | 43.59 |
7415 | 4963 | 2 | 2018-06-23 | 2018-11-09 | 43.59 |
7415 | 4963 | 3 | 2018-12-09 | 2018-12-29 | 63.60 |
8642 | 3256 | 0 | 2022-02-17 | 2022-09-07 | 61.19 |
8642 | 3256 | 1 | 2023-04-19 | 2023-06-13 | 61.19 |
8642 | 3256 | 2 | 2024-01-02 | 2024-01-08 | 17.48 |
8642 | 3256 | 3 | 2024-12-20 | 2024-12-26 | 71.00 |
8642 | 4963 | 0 | 2024-02-22 | 2025-01-02 | 71.00 |
An additional assumption is used here
case when session_enter_date is null then getdate() else session_enter_date end
if session_enter_date is null, this payment has not been used and is valid now.