sqlsql-servert-sql

SQL Server combine overlapping date ranges


I am using SQL Server version 2014. I am working on requirement where I have find consecutive dates and paid fee based on the dates. For instance, the SESSION_ID 7415 as the dates are not overlapping the query works as expected. The only requirement is if for any reason the consecutive dates have different paid fees based on the latest SESSION_ENTER_DATE we should consider that particular PAID value. For the SESSION_ID 7415 the START_DATE 2018-12-09 - END_DATE 2018-12-29 though this date range has the fees PAID value of 37.02 & 63.6 but based on the latest SESSION_ENTER_DATE it would be 63.6.

START_DATE END_DATE PAID SESSION_ID SESSION_TYPE SESSION_ENTER_DATE
2018-12-09 2018-12-28 37.02 7415 4963 2018-11-29
2018-12-29 2018-12-29 63.60 7415 4963 2019-01-02
SESSION_ID SESSION_TYPE START_DATE END_DATE PAID
7415 3256 2016-04-13 2016-05-24 32.53
7415 4963 2017-11-11 2018-03-02 43.59
7415 4963 2018-03-31 2018-04-27 43.59
7415 4963 2018-06-23 2018-11-09 43.59
7415 4963 2018-12-09 2018-12-29 63.60

Previous Query

But, the problem I run into is when there are overlapping dates. For instance, SESSION_ID 8642 has consecutive START_DATE & END_DATE 2022-02-17 - 2022-05-12 & another set 2022-06-02 - 2022-07-27 but the output I would want is 2/17/2022 - 9/7/2022 as the aforementioned date ranges would fall in between.

START_DATE END_DATE PAID SESSION_ID SESSION_TYPE SESSION_ENTER_DATE
2022-02-17 2022-03-02 48.58 8642 3256 2022-02-25
2022-02-17 2022-05-12 29.27 8642 3256 2022-07-27
2022-03-03 2022-03-09 46.60 8642 3256 2022-03-07
2022-03-10 2022-03-16 46.60 8642 3256 2022-03-14
2022-03-17 2022-03-23 46.60 8642 3256 2022-03-21
2022-03-24 2022-03-30 46.60 8642 3256 2022-03-28
2022-03-31 2022-04-06 46.60 8642 3256 2022-04-04
2022-04-07 2022-04-13 46.60 8642 3256 2022-04-11
2022-04-14 2022-04-20 46.60 8642 3256 2022-04-18
2022-04-21 2022-04-27 46.60 8642 3256 2022-04-25
2022-04-28 2022-05-04 46.60 8642 3256 2022-05-10
2022-05-05 2022-05-11 46.60 8642 3256 2022-05-10
2022-05-12 2022-05-12 109.80 8642 3256 2022-05-13

So for the SESSION_ID 8642 the output I would want to see is

2/17/2022 2022-05-12 29.27 3256
2022-06-02 2022-07-27 48.72 3256
4/19/2023 4/25/2023 61.19 3256
1/2/2024 1/7/2024 34.97 3256
2/22/2024 1/2/2025 71 4963

We need to grab all the other records based on the START_DATE and if there are multiple START_DATE grab the one with the latest SESSION_ENTER_DATE by checking if the date range is consecutive or if there are any gaps in-between.

Modified code:


Solution

  • This is gaps and islands task.
    As a rule, we compare end_date for current row and start_date for next row.
    Or compare start_date with previous end_date.
    If datediff(...)>1 then this is gap.

    Since we are comparing the current row with the next one, we use a range of rows in the window function as rows between unbounded preceding and 1 preceding for calculating group number (running total).

    Additional check is max_enter_date<next_start_date for current row. If not - this is overlapped row, not gap. It is for this check that we compare the current row with the next one, and not the other way around.

    with gaps as ( -- detect gaps
    select * 
      ,case when max_enter_date<next_start_date 
           and datediff(day,end_date
                             ,next_start_date
                 )>1
       then 1 else 0 end isGap
    from(
      select *
        ,lead(start_date,1,end_date)
             over(partition by session_id,session_type order by start_date) next_start_date
        ,max(case when session_enter_date is null then getdate() else session_enter_date end)
            over(partition by session_id,session_type order by start_date) max_enter_date
      from session
    )a
    )
    ,t as( -- detect last paid -> rn=1
    select *
      ,row_number()over(partition by session_id,session_type,grn order by start_date desc)rn
    from( -- calculate group (island) num as running total
      select *
        ,coalesce(sum(isGap)over(partition by session_id,session_type order by start_date
            rows between unbounded preceding and 1 preceding) 
          ,0) grn
      from gaps
     )grn
    )
    select session_id,session_type,grn
      ,min(start_date) start_date
      ,max(end_date) end_date
      ,max(case when rn=1 then paid end)paid
    from t
    group by session_id,session_type,grn
    ;
    
    session_id session_type grn start_date end_date paid
    7415 3256 0 2016-04-13 2016-05-24 32.53
    7415 4963 0 2017-11-11 2018-03-02 43.59
    7415 4963 1 2018-03-31 2018-04-27 43.59
    7415 4963 2 2018-06-23 2018-11-09 43.59
    7415 4963 3 2018-12-09 2018-12-29 63.60
    8642 3256 0 2022-02-17 2022-09-07 61.19
    8642 3256 1 2023-04-19 2023-06-13 61.19
    8642 3256 2 2024-01-02 2024-01-08 17.48
    8642 3256 3 2024-12-20 2024-12-26 71.00
    8642 4963 0 2024-02-22 2025-01-02 71.00

    fiddle

    An additional assumption is used here

    case when session_enter_date is null then getdate() else session_enter_date end
    

    if session_enter_date is null, this payment has not been used and is valid now.