postgresqlindexingpostgresql-performancevolatility

How to use date_trunc() with timestamptz in an index to support a join?


I have 2 tables in my database as defined below:

CREATE TABLE metric_events (
    id uuid PRIMARY KEY,
    metric_id integer NOT NULL,
    event_at timestamp with time zone NOT NULL,
    code text NOT NULL, 
    test boolean DEFAULT false NOT NULL,
    success boolean DEFAULT false NOT NULL
);

CREATE TABLE hourly_metric_event_counters (
    start_at timestamp with time zone NOT NULL,
    metric_id integer NOT NULL,
    code text NOT NULL,
    test boolean NOT NULL,
    total_count integer DEFAULT 0,
    success_count integer DEFAULT 0,
    PRIMARY KEY (metric_id, test, code, start_at)
);

CREATE UNIQUE INDEX metric_events_uuid_index ON metric_events(id);
CREATE INDEX metric_events_event_at_index on metric_events(event_at);
CREATE INDEX metric_events_metric_id_and_event_at_index ON metric_events(metric_id, event_at);
CREATE INDEX hourly_metric_event_counters_start_at_index ON hourly_metric_event_counters(start_at);

I run this query once daily:

SELECT date_trunc('hour', b.event_at),
       b.metric_id,
       b.code,
       b.test,
       COUNT(b.*) - MAX(bec.total_count) as total_diff,
       SUM(CASE WHEN b.success THEN 1 ELSE 0 END) - MAX(bec.success_count) as success_diff
FROM metric_events b
JOIN hourly_metric_event_counters bec
ON bec.metric_id = b.metric_id AND
   bec.test = b.test AND
   bec.code = b.code AND
   bec.start_at = date_trunc('hour', b.event_at)
WHERE event_at >= date_trunc('hour', NOW() - interval '24 hours') AND event_at < date_trunc('hour', NOW() - interval '1 hour')
GROUP BY 1, 2, 3, 4
HAVING COUNT(b.*) - MAX(bec.total_count) != 0 OR
       SUM(CASE WHEN b.success THEN 1 ELSE 0 END) - MAX(bec.success_count) != 0

I am trying to create an index on metric_events for columns metric_id, date_trunc('hour', event_at), test, code to make the join faster:

CREATE INDEX metric_events_composite_index on metric_events (metric_id, code, test, success, date_trunc('hour'::text, event_at));

However:

ERROR: functions in index expression must be marked IMMUTABLE SQL state: 42P17

Is there a way?


Solution

  • There are multiple overloaded versions of the function date_trunc(). All of them are IMMUTABLE - except the one you picked taking only timestamptz input, which is (necessarily) only STABLE. Some of the invocations (like date_trunc('day', event_at)) depend on the current timezone setting. There are even (incredibly stupid) time shifts involving fractions of an hour for the (stupid-by-design) "daylight saving time". So even truncating a point in time to the hour can depend on the time zone and time of the year.

    So you cannot create an index on date_trunc('hour', event_at). The expression is not IMMUTABLE. Casting to time or timestamp is also just STABLE for similar reasons. So an index involving date_trunc('hour', (event_at::time)) is also not possible.

    Postgres 12 added another overloaded variant date_trunc(text, timestamptz, text) taking the applicable time zone as 3rd parameter. That removes the dependency on the current timezone setting. Time zone definitions do change from time to time, however. And that could break an index relying on it. So this variant also started out as STABLE. The tiny residual risk had been deemed acceptable in other contexts, so it was eventually upgraded to IMMUTABLE with Postgres 16. The release notes:

    Change date_trunc(unit, timestamptz, time_zone) to be an immutable function (Przemyslaw Sztoch) §

    This allows the creation of expression indexes using this function.

    Now you can use date_trunc('hour', event_at, 'UTC') to declare for 'UTC' (for example), and build an index on it. Recreate such indexes after breaking changes to the underlying time zone definitions - which rarely happens, especially not for UTC, unsullied by "daylight saving" nonsense.

    That said, I doubt your index would be of much help, or even be used to begin with. If you were hoping for index-only scans, then move success to the INCLUDE section of the index. (It's not used for filtering or sorting.) You would also have to include the original column event_at to make index-only scans possible (a weakness of the query planner):

    CREATE INDEX metric_events_composite_index ON metric_events
     (metric_id, code, test, date_trunc('hour', event_at, 'UTC')) INCLUDE (success, event_at);
    

    Workarounds (for older versions)

    See: