postgresqlpartitioningstatusranking-functions

Track dates at which a status changes to a specific value


I have a table my_table like this:

                     id                  |          updated_at           | status 
-----------------------------------------+-------------------------------+--------
 ID1                                     | 2023-12-18 18:37:21.724825+00 | I
 ID1                                     | 2024-01-02 23:08:03.50741+00  | I
 ID1                                     | 2024-01-03 23:07:03.892617+00 | I
 ID1                                     | 2024-01-06 23:09:00.877018+00 | A
 ID1                                     | 2024-01-07 23:08:03.134111+00 | A
 ID1                                     | 2024-01-08 23:06:31.164932+00 | A
 ID1                                     | 2024-03-03 23:39:39.138697+00 | A
 ID1                                     | 2024-03-04 23:55:54.582789+00 | F
 ID1                                     | 2024-03-05 23:37:15.70314+00  | F
 ID1                                     | 2024-03-06 23:47:16.814729+00 | F
 ID1                                     | 2024-03-07 23:42:54.349137+00 | A
 ID1                                     | 2024-03-08 23:35:53.188792+00 | A
 ID1                                     | 2024-03-09 23:30:55.979575+00 | A
 ID1                                     | 2024-04-10 08:30:05.329453+00 | F
 ID2                                     | 2024-03-15 23:52:28.005298+00 | A
 ID2                                     | 2024-03-16 23:53:38.031816+00 | A
 ID2                                     | 2024-03-17 23:57:42.556396+00 | A
 ID2                                     | 2024-03-18 13:33:38.264444+00 | A
 ID2                                     | 2024-03-18 22:48:29.366489+00 | A
 ID2                                     | 2024-03-19 15:00:02.19655+00  | F
 ID2                                     | 2024-03-26 18:37:13.514644+00 | A
 ID2                                     | 2024-03-27 15:19:13.823159+00 | F
 ID2                                     | 2024-03-28 15:31:46.841049+00 | F
 ID2                                     | 2024-04-03 07:43:08.253604+00 | P

I would like to select from it the dates at which the status changes from any value to the value F, i.e I am expecting the following output:

ID1   | 2024-03-04 23:55:54.582789+00
ID1   | 2024-04-10 08:30:05.329453+00
ID2   | 2024-03-19 15:00:02.19655+00
ID2   | 2024-03-27 15:19:13.823159+00

So far, I managed to get the first time such a change occurs for each id, but I can't get subsequent changes. Here is my query:

    SELECT
        id, updated_at
    FROM (
        SELECT
            *
            , RANK() OVER (PARTITION BY id ORDER BY updated_at ASC) AS rank_
        FROM ( SELECT * FROM my_table WHERE status='F' ) t_sub
    ) T
    WHERE rank_ = 1

How to change it to get not only the first time each id turns to the status F, but also second, third, fourth, etc... changes to F?


Solution

  • Set up a subquery where you establish whether a row has status='F' and the value of lag(status)over(partition by id order by updated_at) window function is different from it, then only select those: demo

    select id,updated_at 
    from(select *
               ,status<>lag(status,1,status)over w1 and status='F' as changed_to_f
         from my_table
         window w1 as (partition by id order by updated_at) ) as subquery
    where changed_to_f;
    
    id updated_at
    ID1 2024-03-04 23:55:54.582789+00
    ID1 2024-04-10 09:30:05.329453+01
    ID2 2024-03-19 15:00:02.19655+00
    ID2 2024-03-27 15:19:13.823159+00

    It's a strict interpretation: this does not catch the rows that are the initial row of a given ID, because there's no prior row it would be changing from. If you do want to also catch those:

    select id,updated_at 
    from(select *,status is distinct from lag(status)over w1 and status='F' 
                  as changed_to_f
                 ,status<>lag(status,1,status||'.')over w1 and status='F' 
                  as also_changed_to_f
         from my_table
         window w1 as (partition by id order by updated_at) ) as subquery
    where changed_to_f
    order by 1,2;