[SOLVED] min/max date range within group

min/max date range within group

We have a transactional table that stores the data much like a historical table where any time a status changes (or other attributes) it gets effective dated.

Example:

  Product | Status   | Start Date | End Date
----------+------- --+------------+-----------
widget a  | active   | 02/01/2020 | 02/30/2020
widget a  | active   | 03/01/2020 | 03/19/2020
widget a  | inactive | 03/20/2020 | 05/01/2020
widget a  | active   | 05/02/2020 | 08/31/2020
widget b  | active   | 02/01/2020 | 05/31/2020
widget b  | inactive | 06/01/2020 | 06/31/2020

I am trying to roll up this data based on the min and max dates as the status changes (as I said, other attributes contribute to the record changing, but I am only concerned with status changes). So in above example, 'widget a' would have three records: active from 02/01/2020 - 03/19/2020, inactive from 03/20/2020 - 05/01/2020 and active from 05/02/2020 - 08/31/2020. This can easily be done using an ETL tool but I would like to get this into a view.

What is the best way to do this while being mindful of performance

This is postgresql 10

Solution

This is a gaps-and-islands problem, where you want to group together adjacent rows that have the same product and status.

Here is an approach that uses the difference between row numbers to build the groups:

select product, status, min(start_date) start_date, max(end_date) end_date,
    rn1 - rn2 as grp_no
from (
    select t.*, 
        row_number() over(partition by product order by start_date) rn1,
        row_number() over(partition by product, status order by start_date) rn2
    from orders t
) t
group by product,status, rn1 - rn2 order by product,start_date;