sqlpostgresqllateral-joinpostgresql-16

Use results from one query in WHERE clause of the next query


This is how it currently works, first select query returns a list of parameters.

SELECT character_id, foe_id, location_id, date_time, damage, points 
FROM events 
ORDER BY date_time DESC 
LIMIT 100

Now, the following is iterated 100 times to complete the data, for example the first one returns:

  character_id = 12695
, location_id = 818
, foe_id = 33997
, date_time = '2024-05-02 23:00:00'

The following query is based on these results:

SELECT *, A1+B1-C1-D1-E1-F1+G1+H1 AS A2 
FROM 
    (SELECT * 
     FROM 
         (SELECT id_, cnt_7, date_diff_7 , NTH_VALUE (A0,1) OVER() AS A1, NTH_VALUE (A0,2) OVER() AS B1, NTH_VALUE (B0,1) OVER() AS C1, NTH_VALUE (B0,2) OVER() AS D1 
          FROM 
              (SELECT damage AS A0, points AS B0, RootQuery_7.id as id_, count(*) over() as cnt_7,  max(date_diff) over() as date_diff_7 
               FROM 
                   (SELECT *, extract(day from date_time - lag(date_time) over (ORDER BY date_time)) as date_diff 
                    FROM events 
                    WHERE character_id = 12695 
                      AND status = 1 
                      AND location_id = 818 
                      AND date_time < '2024-05-02 23:00:00' 
                    ORDER BY date_time DESC 
                    LIMIT 10 OFFSET 0) AS RootQuery_7) 
            AS BaseQuery7 LIMIT 1) wnd_1 
        JOIN (SELECT id_, cnt_8, date_diff_8 , NTH_VALUE (D0,2) OVER() AS H1, NTH_VALUE (C0,1) OVER() AS E1, NTH_VALUE (C0,2) OVER() AS F1, NTH_VALUE (D0,1) OVER() AS G1 FROM 
            (SELECT damage AS C0, points AS D0, RootQuery_8.id as id_, count(*) over() as cnt_8,  max(date_diff) over() as date_diff_8 FROM 
                (SELECT *, extract(day from date_time - lag(date_time) over (ORDER BY date_time)) as date_diff FROM events WHERE foe_id = 33997 AND status = 1 AND location_id = 818 AND date_time < '2024-05-02 23:00:00' ORDER BY date_time DESC LIMIT 15 OFFSET 0)
                    AS RootQuery_8)
            AS BaseQuery8 LIMIT 1) wnd_2
    ON wnd_1.id_ <> 0 WHERE cnt_7 >= 10 AND date_diff_7 <= 150 AND cnt_8 >= 10 AND date_diff_8 <= 150) AS layer_1

It works but is too inefficient and slow, please don't mind the verisimilitude of the data retrieved, since it is an example.

I am trying to get this list of results from a single query like this:

(SELECT character_id, foe_id, location_id, date_time, damage, points FROM events ORDER BY date_time DESC LIMIT 100) ref_
JOIN
SELECT *, A1+B1-C1-D1-E1-F1+G1+H1 AS A2 FROM 
    (SELECT * FROM 
        (SELECT id_, cnt_7, date_diff_7 , NTH_VALUE (A0,1) OVER() AS A1, NTH_VALUE (A0,2) OVER() AS B1, NTH_VALUE (B0,1) OVER() AS C1, NTH_VALUE (B0,2) OVER() AS D1 FROM 
            (SELECT damage AS A0, points AS B0, RootQuery_7.id as id_, count(*) over() as cnt_7,  max(date_diff) over() as date_diff_7 FROM 
                (SELECT *, extract(day from date_time - lag(date_time) over (ORDER BY date_time)) as date_diff FROM events WHERE character_id = ref_.character_id  AND status = 1 AND location_id = ref_.location_id AND date_time < location_id = ref_.date_time ORDER BY date_time DESC LIMIT 10 OFFSET 0)te_time
                    AS RootQuery_7) 
            AS BaseQuery7 LIMIT 1) wnd_1 
        JOIN (SELECT id_, cnt_8, date_diff_8 , NTH_VALUE (D0,2) OVER() AS H1, NTH_VALUE (C0,1) OVER() AS E1, NTH_VALUE (C0,2) OVER() AS F1, NTH_VALUE (D0,1) OVER() AS G1 FROM 
            (SELECT damage AS C0, points AS D0, RootQuery_8.id as id_, count(*) over() as cnt_8,  max(date_diff) over() as date_diff_8 FROM 
                (SELECT *, extract(day from date_time - lag(date_time) over (ORDER BY date_time)) as date_diff FROM events WHERE foe_id = ref_.foe_id  AND status = 1 AND location_id = ref_.location_id AND date_time < location_id = ref_.date_time ORDER BY date_time DESC LIMIT 15 OFFSET 0)
                    AS RootQuery_8)
            AS BaseQuery8 LIMIT 1) wnd_2
    ON wnd_1.id_ <> 0 WHERE cnt_7 >= 10 AND date_diff_7 <= 150 AND cnt_8 >= 10 AND date_diff_8 <= 150) AS layer_1
ON ...

But the result of the first query results cannot be used within the deepest layer of selects in the second query.


Solution

  • TLDR: Use a LATERAL join.

    LATERAL joins allow subqueries to the right to reference columns in FROM items to the left. See:

    Your query could look something like this:

    SELECT *, A1+B1-C1-D1-E1-F1+G1+H1 AS A2
    FROM  (
       SELECT character_id, foe_id, location_id, date_time, damage, points FROM events ORDER BY date_time DESC LIMIT 100
       ) prime
    JOIN LATERAL (  -- !
       SELECT id_, cnt_7, date_diff_7
            , nth_value(A0,1) OVER () AS A1, nth_value(A0,2) OVER () AS B1, nth_value(B0,1) OVER () AS C1, nth_value(B0,2) OVER () AS D1
       FROM  (
          SELECT damage AS A0, points AS B0, RootQuery_7.id AS id_, count(*) OVER () AS cnt_7,  max(date_diff) OVER () AS date_diff_7
          FROM  (
             SELECT *, extract(day FROM e.date_time - lag(e.date_time) OVER (ORDER BY e.date_time)) AS date_diff
             FROM   events e
             WHERE  e.status = 1
             AND    e.character_id = prime.character_id  -- LATERAL reference
             AND    e.location_id  = prime.location_id   -- LATERAL reference
             AND    e.date_time    < prime.date_time     -- LATERAL reference
             ORDER  BY e.date_time DESC
             LIMIT  10
             ) AS RootQuery_7
          ) AS BaseQuery7
          LIMIT 1
       ) wnd_1 ON wnd_1.id_ <> 0
    CROSS JOIN LATERAL (  -- !
       SELECT id_, cnt_8, date_diff_8 , nth_value(D0,2) OVER () AS H1, nth_value(C0,1) OVER () AS E1, nth_value(C0,2) OVER () AS F1, nth_value(D0,1) OVER () AS G1
       FROM  (
          SELECT damage AS C0, points AS D0, RootQuery_8.id AS id_, count(*) OVER () AS cnt_8,  max(date_diff) OVER () AS date_diff_8
          FROM  (
             SELECT *, extract(day from date_time - lag(date_time) OVER (ORDER BY date_time)) AS date_diff
             FROM   events
             WHERE  status = 1
             AND    foe_id      = prime.foe_id       -- LATERAL reference
             AND    location_id = prime.location_id  -- LATERAL reference
             AND    date_time   < prime.date_time    -- LATERAL reference
             ORDER  BY date_time DESC
             LIMIT  15
             ) RootQuery_8
          ) BaseQuery8
       LIMIT 1
       ) wnd_2
    WHERE  wnd_1.cnt_7 >= 10
    AND    wnd_1.date_diff_7 <= 150
    AND    wnd_2.cnt_8 >= 10
    AND    wnd_2.date_diff_8 <= 150;
    

    Can certainly be simplified further. But that's beyond the scope of this question.

    Two instances of LIMIT 1 without ORDER BY lead to arbitrary results. Should probably be fixed (or removed).

    Aside:
    Avoid mixed-case identifiers in Postgres if at all possible. See: