sqlpostgresqlgreatest-n-per-group

SQL query to get hourly data for highest revision number per day


I have two related tables as below.

Master table:

ID Date Rev
1 2022-01-01 1
2 2022-01-02 1
3 2022-01-02 2
4 2022-01-03 1

Detail table:

ID hour Capacity
1 1 1
1 2 2
1 3 3
1 4 4
2 1 5
2 2 6
2 3 7
2 4 8
3 1 9
3 2 10
3 3 11
3 4 12
4 1 13
4 2 14
4 3 15
4 4 16

The capacity for each day will be saved multiple times as revision. For some day there would be 1 revision, some other day 3 based on number of times data saved.

Now I need to take data from these two table in a single query with largest revision of each day. Expecting 12 rows of data for the 3 existing days in the chosen month.

I wrote the below query

select a1.wdcm_date as wdcm_date, c1.wdcd_block_no as wdcd_block_no, c1.wdcd_capacity as wdcd_capacity, 
      c1.wdcd_approval as wdcd_approval, a1.wdcm_revision_no as wdcm_revision_no
from wb_declared_capacity_master a1, wb_declared_capacity_detail c1
where a1.wdcm_internal_id = c1.wdcd_ref_id  and to_char(a1.wdcm_date,'MM yyyy')='01 2022' 
and wdcm_revision_no = (select max(wdcm_revision_no) from wb_declared_capacity_master where to_char(wdcm_date,'MM yyyy')='01 2022')

This returns only the data with revision number 3 dates.

My expected result should be

Date hour Capacity Rev
2022-01-01 1 1 1
2022-01-01 2 2 1
2022-01-01 3 3 1
2022-01-01 4 4 1
2022-01-02 1 9 2
2022-01-02 2 10 2
2022-01-02 3 11 2
2022-01-02 4 12 2
2022-01-03 1 13 1
2022-01-03 2 14 1
2022-01-03 3 15 1
2022-01-03 4 16 1

How to get all 12 rows (all hours for each qualifying revisions)?


Solution

  • Yet another case that's simpler and faster with DISTINCT ON:

    SELECT m.date, d.hour, d.capacity, m.rev
    FROM  (
       SELECT DISTINCT ON (date)    -- simple & fast
              id, date, rev
       FROM   master
       WHERE  date >= '2022-01-01'  -- ! sargable = faster
       AND    date <  '2022-01-02'  -- !
       ORDER  BY date, rev DESC
       ) m
    LEFT   JOIN detail d USING (id)
    ORDER  BY date, hour;
    

    See:

    And avoid expressions on columns before filtering like the plague.
    to_char(date,'MM yyyy') = '01 2022') is hugely expensive. It has to be computed for every row in the table before your filter can be applied. Use a "sargable" filter. Much faster, especially with applicable index, but even without. Related:

    Subtle detail: I use LEFT JOIN to preserve every qualifying day in the result, even if detail are missing. May or may not be possible with your data model. But won't hurt in any case.