I have a table 'analytics_event' (with 9.7M rows) which I need to query for records 'created' after a given time. The table has various indexes which are being ignored in favour of a seq scan over the table. I've distilled the problem down to the following queries.
This query doesn't use the index (takes around 40s):
explain with batch_params as (
select
(now() - '1 minute'::interval)::timestamptz as created
) select * from private.analytics_event
where
created > (select created from batch_params);
-- output --
Seq Scan on analytics_event (cost=0.01..1620760.63 rows=3251510 width=1025)
Filter: (created > $0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.01 rows=1 width=8)
-> Result (cost=0.00..0.01 rows=1 width=8)
Whereas this query DOES use the index (CTE kept, although useless) (takes a less than a ms):
explain with batch_params as (
select
(now() - '1 minute'::interval)::timestamptz as created
) select * from private.analytics_event
where
created > (now() - '1 minute'::interval)::timestamptz;
-- output --
Index Scan using analytics_event_payload_event on analytics_event (cost=0.44..7.33 rows=1 width=1025)
Index Cond: (created > (now() - '00:01:00'::interval))
My index and table are as follows:
CREATE TABLE private.analytics_event (
id uuid NOT NULL,
environment_id varchar NOT NULL,
payload jsonb NULL,
created timestamptz(3) DEFAULT now() NOT NULL,
CONSTRAINT analytics_event_pkey PRIMARY KEY (environment_id, id)
);
CREATE INDEX analytics_event_payload_event ON private.analytics_event USING btree (created, ((payload ->> 'foobar'::text)));
In original larger query, the batch_params
queries a specific cursor table for those values.
I've tried:
Adding a limit 1
on the CTE expression (no difference).
Wrapping the batch_params
in a function (no difference) e.g.
CREATE OR REPLACE FUNCTION get_created_date() RETURNS timestamptz AS $BODY$
select created from cursor_table;
$BODY$ LANGUAGE SQL stable
Wrapping the whole query in a function which has a declared variable created
. The function first selects the value into
the variable before querying the analytics table. This used the index, however it's much less maintainable, and its annoying that I can't explain analyze
the query any more (since it's wrapped in a function).
A workaround I think, is to just use a transaction in my host application with multiple intermediate queries however I'd like to avoid this if possible.
How can I restructure this to utilise my index while still keeping my params dynamic?
Postgres Version: 15.6
Edit with explain:
explain (analyze, verbose, buffers, settings) ... <the slow query with sub-query>.
-- output --
Seq Scan on private.analytics_event (cost=0.01..1620760.63 rows=3251510 width=1025) (actual time=52571.409..52571.410 rows=0 loops=1)
Output: analytics_event.id, analytics_event.environment_id, analytics_event.payload, analytics_event.created
Filter: (analytics_event.created > $0)
Rows Removed by Filter: 9798348
Buffers: shared hit=473037 read=1025792
I/O Timings: shared read=49404.088
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
Output: (now() - '00:01:00'::interval)
Settings: effective_cache_size = '7959688kB', jit = 'off', search_path = 'public, public, "$user"'
Query Identifier: -2998777079014550499
Planning Time: 0.087 ms
Execution Time: 52571.433 ms
explain (analyze, verbose, buffers, settings) ... <the fast/indexed query with inline condition>
-- output --
Index Scan using analytics_event_payload_event on private.analytics_event (cost=0.44..7.33 rows=1 width=1025) (actual time=0.006..0.006 rows=0 loops=1)
Output: id, environment_id, payload, created
Index Cond: (analytics_event.created > (now() - '00:01:00'::interval))
Buffers: shared hit=3
Settings: effective_cache_size = '7959688kB', jit = 'off', search_path = 'public, public, "$user"'
Query Identifier: -1698698247486258523
Planning:
Buffers: shared read=5
I/O Timings: shared read=3.082
Planning Time: 3.208 ms
Execution Time: 0.023 ms
214M
.last_autoanalyze
was run today (the data was added a over a long period prior).The PostgreSQL optimizer is not smart enough to pull up the subquery, so you should do it yourself:
WITH batch_params AS (
SELECT current_timestamp - '1 minute'::interval AS created
)
SELECT *
FROM private.analytics_event
CROSS JOIN batch_params
WHERE analytics_event.created > batch_params.created;