I have a table where I record events. These events come into the system once a minute, per user - on the order of tens of thousands a minute. But I don't need to save all of them.
If an event comes in within 90 seconds of the previous event, I want to update the previous row. If it's been more than 90 seconds, I want to insert a new row.
Example table:
create table events (
  id serial,
  user_id int references users(id) not null,
  created_at timestamp not null default now(),
  updated_at timestamp not null default now(),
  ...some other event columns here...
);
create index idx_events_user_id_updated_at on events (user_id, updated_at desc);
And the pseudocode would be something like:

if the user's most recent updated_at is within the last 90 seconds:
    update that row, replacing updated_at with now()
else:
    insert a new row
Is there a way to do this with a single Postgres statement?
I'm aware of on conflict, but I don't think it'll work for this use-case. The (user_id, updated_at) pair could be defined as a unique constraint, which could be used to trigger an on conflict, but the timestamps are arbitrary. These events come in "every minute", but not exactly on the minute (or even exactly a minute apart, due to network latency, server latency, etc., hence using 90 seconds to give a 30-second buffer). Truncating the timestamps to the minute would reduce the feature's usefulness, so I'd hate to do that just to handle upserts more cleanly.
Can be done with an UPSERT command, but you need an EXCLUSION constraint on the timestamp range rather than a unique constraint. For the leading integer column user_id, the additional module btree_gist must be installed.
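The module only needs to be installed once per database; a minimal setup sketch:

```sql
-- One-time setup, per database (requires privileges to create extensions)
CREATE EXTENSION IF NOT EXISTS btree_gist;
```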
ALTER TABLE events
ADD CONSTRAINT user_90sec
EXCLUDE USING gist (user_id WITH =, tsrange(updated_at, updated_at + interval '90 sec') WITH &&);
Notably, the same doesn't work for timestamptz, because timestamptz + interval is only STABLE, while timestamp + interval is IMMUTABLE as required for the implied GiST index (or any index, for that matter).
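That volatility difference can be checked in the system catalog. A quick look, assuming the standard catalog names for the two underlying operator functions:

```sql
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT proname, provolatile
FROM   pg_proc
WHERE  proname IN ('timestamp_pl_interval', 'timestamptz_pl_interval');
```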
Query:
WITH input_rows(user_id, data) AS (
   VALUES
     (1, 'foo_new')  -- your input here
   , (2, 'bar_new')
   , (3, 'baz_new')
   -- more?
   )
, ins AS (
   INSERT INTO events (user_id, data)
   SELECT user_id, data FROM input_rows
   ON CONFLICT ON CONSTRAINT user_90sec DO NOTHING
   RETURNING user_id
   )
UPDATE events e
SET    updated_at = LOCALTIMESTAMP
     , data = i.data
FROM   input_rows i
LEFT   JOIN ins USING (user_id)
WHERE  ins.user_id IS NULL
AND    e.user_id = i.user_id
AND    e.updated_at > LOCALTIMESTAMP - interval '90 sec';
This assumes at most one row per user_id in the input. And no concurrent, competing writes. Else, there may be race conditions due to the tiny but unavoidable time gap between INSERT and UPDATE. (ON CONFLICT ... DO UPDATE is not allowed based on an exclusion constraint.)
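If concurrent writers for the same user are possible, one way to sidestep the race is to serialize them with a transaction-level advisory lock before running the statement. A sketch, assuming user_id fits in the bigint lock key:

```sql
BEGIN;
-- Transaction-scoped advisory lock keyed on the user_id being written
-- (here: 1), so competing writers for the same user queue up instead of racing
SELECT pg_advisory_xact_lock(1);
-- ... run the INSERT / UPDATE statement from above here ...
COMMIT;  -- the advisory lock is released automatically at commit or rollback
```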
Also, there must be no ambiguous data types in the input, or you must add explicit casts.
Depending on details of your case, there may well be faster solutions. Like just writing all incoming rows (with the substantially faster COPY command) and later consolidating in bulk operations.
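The consolidation step might use a gaps-and-islands query. A sketch, assuming a hypothetical staging table raw_events(user_id, created_at, data) that all incoming rows are copied into:

```sql
-- Mark rows that start a new "session": the first row per user, or any row
-- arriving more than 90 seconds after the previous one
WITH flagged AS (
   SELECT user_id, created_at, data
        , CASE WHEN created_at - lag(created_at) OVER w <= interval '90 sec'
               THEN 0 ELSE 1 END AS new_session
   FROM   raw_events
   WINDOW w AS (PARTITION BY user_id ORDER BY created_at)
   )
-- A running sum of the flags numbers the sessions per user
, numbered AS (
   SELECT *, sum(new_session) OVER (PARTITION BY user_id
                                    ORDER BY created_at) AS session
   FROM   flagged
   )
-- Collapse each session to a single consolidated row
SELECT user_id
     , min(created_at) AS created_at
     , max(created_at) AS updated_at
FROM   numbered
GROUP  BY user_id, session;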
While sticking with this solution, it may be faster to add a generated column for the tsrange and base the exclusion constraint (and its implied GiST index) on that. Bloats the table row a bit, though.
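A sketch of that variant (Postgres 12 or later for stored generated columns; the column name during is made up, and the constraint from above has to be dropped first since it reuses the name):

```sql
-- Replace the expression-based constraint with one on a stored generated column
ALTER TABLE events DROP CONSTRAINT IF EXISTS user_90sec;

ALTER TABLE events
   ADD COLUMN during tsrange
   GENERATED ALWAYS AS (tsrange(updated_at, updated_at + interval '90 sec')) STORED;

ALTER TABLE events
   ADD CONSTRAINT user_90sec
   EXCLUDE USING gist (user_id WITH =, during WITH &&);
```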