postgresqllateral

Selecting events from the start of a cycle in Postgres 11.6


Using Postgres 11.6, I'm trying to write a query that filters out events based on their position in a sequence during grouping. We've got a lot of event data grouped in "cycles". Here's the table setup:

Here's the table setup:

CREATE TABLE IF NOT EXISTS data.event (
    id uuid NOT NULL,
    cycle_id uuid,
    dts timestamp without time zone,
    state citext,
    PRIMARY KEY(id)
);

Below are some sample records from three different cycles:

id                                      cycle_id                                dts                    state
f5c1ecef-b917-6c44-ae18-ed8c8b9e71d5    00000910-da83-7c4b-b47a-38be998471d1    2019-05-15 15:54:17    Boxed
dbf713c9-802d-a445-9cf8-ff2953d1ef6c    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 09:08:23    Assembly
93ea8e6a-2c41-d848-805b-dc83d572af1e    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 09:34:51    Waiting For Quality
bc30d154-8c72-7e40-b75d-048b7f2ca3d4    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 10:43:31    Quality
0b69895a-36fd-274b-8730-e81206636013    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 13:02:07    Waiting After Quality

550a0b9e-7731-4347-ba55-4ca222b3e69f    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 13:39:59    Boxed
1c43d256-807c-8643-82a8-1c32653cedb3    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 19:14:36    Waiting For Assembly
9b4e8e70-d383-4b47-aed4-4aa9de3b5b5d    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 20:13:59    Assembly
1d99218c-ea25-6c4d-91a5-07d5198971f3    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 20:16:11    Waiting For Quality
b9571373-4de1-cc4c-badc-58cbaca2de1a    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 21:36:59    Quality
1d233ed1-ea57-f741-af76-cf2a8630f5d7    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 23:40:04    Waiting After Quality
db305c40-485b-b542-8c05-739ea46fa006    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-16 22:15:13    Storage
2afa5d2c-f061-4c47-be85-010010a12785    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-16 23:01:28    Vendor Dropoff

43a35799-d58e-5049-b9ca-12e0980407b1    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-08 14:09:56    Boxed
8db57a1a-6f02-7c46-81ca-f1e6acb464cd    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 12:18:02    Assembly
74d53a8d-5e22-274d-b429-b3bfa89b0211    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 12:18:25    Waiting For Quality
8517b241-1a2f-2c47-80f5-f910a7b0729c    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 12:25:14    Quality
ebaaf219-7319-084d-9179-ec7b1f4bffdf    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 14:53:42    Waiting After Quality
3ee6ae5b-9e44-4944-9aad-b084552309af    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-14 00:08:19    Storage
cea8cfdb-533d-5e42-b136-b9bfd5d69eee    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-14 23:00:00    Transport
b27e43da-df4f-f642-8158-25efa2f841a4    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-14 14:09:17    Return to Vendor

In reality, the cycles include more steps, I've only included enough to make an example. The goal here is to filter out events after specific cycle phases, "Waiting After Quality" or "Storage". Every cycle has one or both of these events. Written imperatively, the logic for each cycle would be like this:

If (The cycle includes "Storage") then
    Keep only the cycle data up to and include "Storage"
Else
    Keep only the cycle data up to and including "Waiting After Quality"
End if

The target result includes only these rows:

id                                      cycle_id                                dts                    state
f5c1ecef-b917-6c44-ae18-ed8c8b9e71d5    00000910-da83-7c4b-b47a-38be998471d1    2019-05-15 15:54:17    Boxed
dbf713c9-802d-a445-9cf8-ff2953d1ef6c    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 09:08:23    Assembly
93ea8e6a-2c41-d848-805b-dc83d572af1e    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 09:34:51    Waiting For Quality
bc30d154-8c72-7e40-b75d-048b7f2ca3d4    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 10:43:31    Quality
0b69895a-36fd-274b-8730-e81206636013    00000910-da83-7c4b-b47a-38be998471d1    2019-05-16 13:02:07    Waiting After Quality

550a0b9e-7731-4347-ba55-4ca222b3e69f    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 13:39:59    Boxed
1c43d256-807c-8643-82a8-1c32653cedb3    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 19:14:36    Waiting For Assembly
9b4e8e70-d383-4b47-aed4-4aa9de3b5b5d    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 20:13:59    Assembly
1d99218c-ea25-6c4d-91a5-07d5198971f3    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 20:16:11    Waiting For Quality
b9571373-4de1-cc4c-badc-58cbaca2de1a    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 21:36:59    Quality
1d233ed1-ea57-f741-af76-cf2a8630f5d7    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-15 23:40:04    Waiting After Quality
db305c40-485b-b542-8c05-739ea46fa006    0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    2019-04-16 22:15:13    Storage

43a35799-d58e-5049-b9ca-12e0980407b1    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-08 14:09:56    Boxed
8db57a1a-6f02-7c46-81ca-f1e6acb464cd    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 12:18:02    Assembly
74d53a8d-5e22-274d-b429-b3bfa89b0211    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 12:18:25    Waiting For Quality
8517b241-1a2f-2c47-80f5-f910a7b0729c    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 12:25:14    Quality
ebaaf219-7319-084d-9179-ec7b1f4bffdf    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-10 14:53:42    Waiting After Quality
3ee6ae5b-9e44-4944-9aad-b084552309af    00010d69-51d0-8b4f-9acb-1573a97517cd    2019-08-14 00:08:19    Storage

A simple grouping query lays out the cycle events in time order:

  select cycle_id,
         string_agg(state::text, ', ' order by dts asc) as states
     from event   
 group by cycle_id 



cycle_id                                states
00000910-da83-7c4b-b47a-38be998471d1    Boxed, Assembly, Waiting For Quality, Quality, Waiting After Quality
0000cf14-e5d3-4c4c-9e44-b39bc5f7fa7b    Boxed, Waiting For Assembly, Assembly, Waiting For Quality, Quality, Waiting After Quality, Storage, Vendor Dropoff
00010d69-51d0-8b4f-9acb-1573a97517cd    Boxed, Assembly, Waiting For Quality, Quality, Waiting After Quality, Storage, Return to Vendor, Transport

I need to find the events from the start of each cycle until and including "Storage" or "Waiting After Quality", with "Storage" in case both are present.

And I'm stumped. I understand how to write this in a client-side language, but not in SQL. I'd much prefer to do it in SQL. If someone can show me how, or point me in the right direction, that would be fantastic.

The Answer

Here's a final solution with a couple of small syntax changes from the accepted answer.

SELECT event.*
FROM data.event
JOIN (
  SELECT  cycle_id,
         COALESCE(
            MIN(dts) FILTER (WHERE state = 'Storage'),
            MIN(dts) FILTER (WHERE state = 'Waiting After Quality')) as upto

     FROM data.event

 GROUP BY cycle_id

   ) last_events USING (cycle_id)

WHERE dts <= upto

ORDER BY cycle_id,
         dts

Solution

  • I'd suggest first finding the timestamps per cycle up-to-and-including which you want to select the events:

    SELECT
      cycle_id,
      COALESCE(
        MIN(dts) FILTER (WHERE state = "Storage"),
        MIN(dts) FILTER (WHERE state = "Waiting After Quality")
      ) as upto
    FROM data.event
    GROUP BY cycle_id
    

    Then you can join these results against all the events, and filter them by the respective condition:

    SELECT event.*
    FROM data.event
    JOIN (
      SELECT
        cycle_id,
        COALESCE(
          MIN(dts) FILTER (WHERE state = "Storage"),
          MIN(dts) FILTER (WHERE state = "Waiting After Quality")
        ) as upto
      FROM data.event
      GROUP BY cycle_id
    ) last_events USING (cycle_id)
    WHERE dts <= upto