sqlpostgresqlwindow-functionsgenerate-seriesforward-fill

Most recent status of each item as of the 1st of each month


I have a table that is structured in the following way: fiddle

create table test(id,status,datechange)as values
 ('011AVN', 11, '2024-06-21 08:27:13'::timestamp)
,('011AVN', 12, '2024-06-21 08:28:16')
,('011AVN', 21, '2024-07-04 15:01:21')
,('011AVN', 22, '2024-07-07 20:30:30')
,('011AVN', 31, '2024-07-11 17:38:01')
,('011AVN', 32, '2024-07-12 20:30:15')
,('011AVN', 33, '2024-07-31 20:58:54')
,('011AVN', 22, '2024-10-16 16:13:33')
,('011AVN', 22, '2024-12-02 18:03:19')
,('011AVN', 31, '2024-12-10 21:53:04')
,('011AVN', 32, '2024-12-11 22:04:26')
,('011AVN', 33, '2025-01-03 10:51:54');

I need to know what the status was on the first of each month. For instance:

I have been trying to join it with a series generated this way:

select generate_series('2024-07-01', now(), interval '1 month') as s;

But I'm a bit stuck.


Solution

  • (my answer elaborates on Laurenz Albe's one, who posted first and deserves the credits; it adds distinct by-id timelines)

    -- Each id gets its own timeline, starting with the debuting month of the first measured status: 
    with timelines as
    (
        select id, generate_series(date_trunc('month', min(dateChange)), now(), interval '1 month') d
        from tab
        group by id
    )
    -- Then for each month, lookup the last value it had before (well, the first value it had going backwards):
    SELECT S.d,
           (SELECT tab.status
            FROM tab
            WHERE tab.id = s.id and tab.datechange <= s.d
            ORDER BY tab.datechange DESC
            LIMIT 1)
    from timelines S;
    

    Here is an SQLFiddle to demo it.