postgresqlindexingsubquerysql-execution-plan

How do I make postgres use an index when the query condition references a subquery?


I have a table 'analytics_event' (with 9.7M rows) which I need to query for records 'created' after a given time. The table has various indexes which are being ignored in favour of a seq scan over the table. I've distilled the problem down to the following queries.

This query doesn't use the index (takes around 40s):

explain with batch_params as (
  select 
    (now() - '1 minute'::interval)::timestamptz as created
) select * from private.analytics_event
  where
    created > (select created from batch_params);

-- output --
Seq Scan on analytics_event  (cost=0.01..1620760.63 rows=3251510 width=1025)
  Filter: (created > $0)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..0.01 rows=1 width=8)
          ->  Result  (cost=0.00..0.01 rows=1 width=8)

Whereas this query DOES use the index (CTE kept, although useless) (takes a less than a ms):

explain with batch_params as (
  select 
    (now() - '1 minute'::interval)::timestamptz as created
) select * from private.analytics_event
  where
    created > (now() - '1 minute'::interval)::timestamptz;

-- output --
Index Scan using analytics_event_payload_event on analytics_event  (cost=0.44..7.33 rows=1 width=1025)
  Index Cond: (created > (now() - '00:01:00'::interval))

My index and table are as follows:

CREATE TABLE private.analytics_event (
    id uuid NOT NULL,
    environment_id varchar NOT NULL,
    payload jsonb NULL,
    created timestamptz(3) DEFAULT now() NOT NULL,
    CONSTRAINT analytics_event_pkey PRIMARY KEY (environment_id, id)
);

CREATE INDEX analytics_event_payload_event ON private.analytics_event USING btree (created, ((payload ->> 'foobar'::text)));

In original larger query, the batch_params queries a specific cursor table for those values.

I've tried:

  1. Adding a limit 1 on the CTE expression (no difference).

  2. Wrapping the batch_params in a function (no difference) e.g.

    CREATE OR REPLACE FUNCTION get_created_date() RETURNS timestamptz AS $BODY$ 
      select created from cursor_table;
    $BODY$ LANGUAGE SQL stable
    
  3. Wrapping the whole query in a function which has a declared variable created. The function first selects the value into the variable before querying the analytics table. This used the index, however it's much less maintainable, and its annoying that I can't explain analyze the query any more (since it's wrapped in a function).

A workaround I think, is to just use a transaction in my host application with multiple intermediate queries however I'd like to avoid this if possible.

How can I restructure this to utilise my index while still keeping my params dynamic?

Postgres Version: 15.6

Edit with explain:

explain (analyze, verbose, buffers, settings) ... <the slow query with sub-query>.
-- output --
Seq Scan on private.analytics_event  (cost=0.01..1620760.63 rows=3251510 width=1025) (actual time=52571.409..52571.410 rows=0 loops=1)
  Output: analytics_event.id, analytics_event.environment_id, analytics_event.payload, analytics_event.created
  Filter: (analytics_event.created > $0)
  Rows Removed by Filter: 9798348
  Buffers: shared hit=473037 read=1025792
  I/O Timings: shared read=49404.088
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
          Output: (now() - '00:01:00'::interval)
Settings: effective_cache_size = '7959688kB', jit = 'off', search_path = 'public, public, "$user"'
Query Identifier: -2998777079014550499
Planning Time: 0.087 ms
Execution Time: 52571.433 ms


explain (analyze, verbose, buffers, settings) ... <the fast/indexed query with inline condition>
-- output --
Index Scan using analytics_event_payload_event on private.analytics_event  (cost=0.44..7.33 rows=1 width=1025) (actual time=0.006..0.006 rows=0 loops=1)
  Output: id, environment_id, payload, created
  Index Cond: (analytics_event.created > (now() - '00:01:00'::interval))
  Buffers: shared hit=3
Settings: effective_cache_size = '7959688kB', jit = 'off', search_path = 'public, public, "$user"'
Query Identifier: -1698698247486258523
Planning:
  Buffers: shared read=5
  I/O Timings: shared read=3.082
Planning Time: 3.208 ms
Execution Time: 0.023 ms

Solution

  • The PostgreSQL optimizer is not smart enough to pull up the subquery, so you should do it yourself:

    WITH batch_params AS (
      SELECT current_timestamp - '1 minute'::interval AS created
    )
    SELECT *
    FROM private.analytics_event
       CROSS JOIN batch_params
    WHERE analytics_event.created > batch_params.created;