postgresqllateral-join

Lateral Join returns too many rows


I try to understand LATERAL JOIN with this query:

select m1.contributor_prescription, derniere_publication.date_publication_prescription 
from activite.metadonnee m1
left join lateral 
(select date_publication_prescription, m2.contributor_prescription 
from activite.metadonnee m2 
where m1.contributor_prescription = m2.contributor_prescription 
order by date_publication_prescription desc
limit 3) as derniere_publication
ON true
order by m1.contributor_prescription

The goal is to have for each contributor_prescription the 3 latest date_publication_prescription.

For now, the result is all contributor_prescription rows and the last date_publication_prescription for all.

contributor_prescription date_publication_prescription
john 22/10/2023
john 22/10/2023
(as many times as john exists in the table)
paul 24/10/2023
paul 24/10/2023
(as many times as paul exists in the table)

expected:

contributor_prescription date_publication_prescription
john 22/10/2023
john 21/10/2023
john 20/10/2023
paul 24/10/2023
paul 23/10/2023
paul 21/10/2023

Solution

  • If you want the top 3 most recent per contributor_prescription you can order them with row_number(), then keep only the top 3 using a plain where: demo

    with cte as (
        select contributor_prescription, 
               date_publication_prescription,
               row_number()over w1 as rn
        from activite.metadonnee m1
        window w1 as (partition by contributor_prescription
                          order by date_publication_prescription desc) )
    select contributor_prescription,
           date_publication_prescription
    from cte 
    where rn<=3
    order by 1,2 desc;
    
    contributor_prescription date_publication_prescription
    john 2023-10-22
    john 2023-10-21
    john 2023-10-20
    paul 2023-10-24
    paul 2023-10-23
    paul 2023-10-21

    Or you could do what you did, but don't join straight to the source table, but to distinct contributor_prescription:

    select m1.contributor_prescription, 
           derniere_publication.date_publication_prescription 
    from (select distinct contributor_prescription from activite.metadonnee) m1
    left join lateral 
    (   select  date_publication_prescription, 
                m2.contributor_prescription 
        from activite.metadonnee m2 
        where m1.contributor_prescription = m2.contributor_prescription 
        order by date_publication_prescription desc
        limit 3) as derniere_publication
    ON true
    order by m1.contributor_prescription
    

    Otherwise you're asking every single row to show you 3 most recent rows with the same contributor_prescription, needlessly multiplying everything. Note the performance of that isn't great.

    There's also a very brief method of collecting into an array, taking a slice and unnesting that:

    select contributor_prescription, unnest(arr)
    from (select contributor_prescription, 
                (array_agg(date_publication_prescription 
                           order by date_publication_prescription desc))[:3] arr
          from activite.metadonnee
          group by 1) a;
    

    But the most performant would be the infamous (still emulated) index skip scan: demo

    WITH RECURSIVE t AS (
       SELECT contributor_prescription,
              1 as rank_pos,
              max(date_publication_prescription) AS date_publication_prescription
       FROM activite.metadonnee 
       GROUP BY contributor_prescription
       UNION ALL
       SELECT contributor_prescription,
              rank_pos+1,
              (SELECT max(date_publication_prescription) 
               FROM activite.metadonnee 
               WHERE date_publication_prescription < t.date_publication_prescription
               AND contributor_prescription=t.contributor_prescription)
       FROM t 
       WHERE rank_pos<3
       )
    SELECT contributor_prescription, date_publication_prescription
    FROM t;