postgresqlaggregate-functions

In Postgres, does percentile_disc reuse calculations from other percentile_disc in the same query?


Lets say you have some table you're querying percentiles from, with the same grouping used for both columns:

SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY my_col) as p25,
       percentile_disc(0.75) WITHIN GROUP (ORDER BY my_col) as p75 from my_table

Does Postgres reuse the grouping calculated for the first column when calculating the second?


Solution

  • It does. You can see that in explain analyse verbose: no matter how many of those you add, you only loop once.
    demo at db<>fiddle

    explain analyse verbose
    SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY my_col) as p25
          ,percentile_disc(0.75) WITHIN GROUP (ORDER BY my_col) as p75
          ,percentile_disc(0.3) WITHIN GROUP (ORDER BY my_col) as p
        ,percentile_disc(0.4) WITHIN GROUP (ORDER BY my_col)
        ,percentile_disc(0.5) WITHIN GROUP (ORDER BY my_col)
        ,percentile_disc(0.6) WITHIN GROUP (ORDER BY my_col)
        ,percentile_disc(0.7) WITHIN GROUP (ORDER BY my_col)
    from my_table;
    
    Aggregate (cost=8464.74..8464.75 rows=1 width=56) (actual time=343.493..343.495 rows=1 loops=1)
    Output: percentile_disc('0.25'::double precision) WITHIN GROUP (ORDER BY my_col), percentile_disc('0.75'::double precision) WITHIN GROUP (ORDER BY my_col), percentile_disc('0.3'::double precision) WITHIN GROUP (ORDER BY my_col), percentile_disc('0.4'::double precision) WITHIN GROUP (ORDER BY my_col), percentile_disc('0.5'::double precision) WITHIN GROUP (ORDER BY my_col), percentile_disc('0.6'::double precision) WITHIN GROUP (ORDER BY my_col), percentile_disc('0.7'::double precision) WITHIN GROUP (ORDER BY my_col)
    -> Seq Scan on public.my_table (cost=0.00..7214.38 rows=500138 width=8) (actual time=0.016..39.669 rows=500000 loops=1)
    Output: my_col
    Planning Time: 0.122 ms
    Execution Time: 344.358 ms

    It also seems like it doesn't really matter how many of those you request. In this test I didn't see any clear correlation between exec time and the number of different percentile_disc() calls in the query. It ran 600 queries against a table with 100k rows, requesting 1-301 different, random percentiles. Those that asked for single percentile averaged around 32ms, same as those requesting 131 or 81 different percentiles.
    Whatever differences I saw initially must've been just noise.