I have just 2 columns for partition and on basis of those two columns, need to select the appropriate dates.
Input Data:
strt_dt | end_dt | ID | Amt |
---|---|---|---|
2023-09-29 | 2023-10-01 | ID1 | 100 |
2023-10-02 | 2023-10-02 | ID1 | 100 |
2023-10-03 | 2023-10-03 | ID1 | 100 |
2023-10-04 | 2023-10-04 | ID1 | 200 |
2023-10-05 | 2023-10-05 | ID1 | 100 |
Expected Output:
strt_dt | end_dt | ID | Amt |
---|---|---|---|
2023-09-29 | 2023-10-03 | ID1 | 100 |
2023-10-04 | 2023-10-04 | ID1 | 200 |
2023-10-05 | 2023-10-05 | ID1 | 100 |
Actual Output:
strt_dt | end_dt | ID | Amt |
---|---|---|---|
2023-09-29 | 2023-10-05 | ID1 | 100 |
2023-10-04 | 2023-10-04 | ID1 | 200 |
I have used min(strt_dt) OVER (PARTITION by id, amt ORDER BY strt_dt)
which is the cause of the overlapping results.
Do I need to do anything differently?
It can be achieve in 2 ways, I will provide both simple and complex versions.
ROW_NUMBER()
will work for the requirement, however your need to use it twice (first without amount and then with amount). The below approach is a simple version,
WITH consecutive_blocks AS (
SELECT
strt_dt,
end_dt,
ID,
Amt,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY strt_dt) AS rn1,
ROW_NUMBER() OVER (PARTITION BY ID, Amt ORDER BY strt_dt) AS rn2
FROM
table_a
)
SELECT
MIN(strt_dt) AS strt_dt,
MAX(end_dt) AS end_dt,
ID,
Amt
FROM
consecutive_blocks
GROUP BY
ID, Amt, (rn1 - rn2)
ORDER BY
strt_dt;
There is an efficient way of achieving the same result, but it will be a more complex one,
WITH flagged_rows AS (
SELECT
strt_dt,
end_dt,
ID,
Amt,
CASE
WHEN LAG(Amt) OVER (PARTITION BY ID ORDER BY strt_dt) = Amt THEN 0
ELSE 1
END AS flag
FROM
table_a
),
grouped_rows AS (
SELECT
strt_dt,
end_dt,
ID,
Amt,
SUM(flag) OVER (PARTITION BY ID ORDER BY strt_dt ROWS UNBOUNDED PRECEDING) AS grp
FROM
flagged_rows
)
SELECT
MIN(strt_dt) AS strt_dt,
MAX(end_dt) AS end_dt,
ID,
Amt
FROM
grouped_rows
GROUP BY
ID, Amt, grp
ORDER BY
strt_dt;