postgresqlnetezza

Find Out What Rows Where Deleted And Updated


Is it possible to somehow find out primary keys of rows deleted from a table?

Similarly, is it possible to find out primary keys of rows updated in a table?

Let's consider a simple use case:

I have two timestamps:

Give me primary keys of all rows updated/deleted between those two timestamps.

There is no information inside the tables to indicate they have been updated.


Solution

  • No, PostgreSQL doesn't provide anything like that built-in.

    To do so it'd have to waste space and I/O keeping track of something most people will never need.

    If your primary keys are contiguous and monotonic then you can just do a left anti-join over generate_series to find holes. However, this won't work if you're using serial (i.e. a sequence) to generate keys, because any rollback will create a gap even though nothing was really ever inserted in order to then be deleted.

    You'll need to add an audit trigger to record the actions you want to track to a separate audit table. You can't do this retroactively; you can't audit things that're already in the past. This is the easiest option by far.

    Another common solution is to flag rows as "deleted" at the application level, without actually deleting them from the table. For example, you might set a deleted_at and deleted_by field in the table. You can even make this application-transparent using a view with an ON INSERT OR UPDATE OR DELETE DO INSTEAD trigger.

    There's also a pg_audit extension in development to make this easier, but it's not ready yet.


    Semi-relevant side-note: In PostgreSQL 9.5 I think this might actually be possible for recently deleted rows, before VACUUM cleans them up, using a C extension. 9.5 will be capable of recording the timestamp of a commit, which 9.4 and below don't (they just record the transaction ID). So you could find out which transaction ID deleted a row by examining xmax and from there, determine the timestamp it was deleted at. You'd need a C extension because you'd have to do a "dirty read" on the table.