postgresqlautovacuum

How can I prevent aggressive autovacuuming from locking an insert-only table in Postgres 13?


I am running a Postgres 13 instance with an insert-only table. When the application writes to this table it manually acquires a "SHARE ROW EXCLUSIVE" lock (as a form of concurrency control to ensure that there are no inserts to the table between the client's read and write to the table). Most of the time this is fine, but when Postgres runs an aggressive autovacuum to prevent XID wraparound, that autovacuum blocks the writes to this table. The autovacuum can take multiple minutes to finish, which is an unacceptably long time to be unable to write to this table.

Based on this blog post, I thought that being on Postgres 13 would prevent this problem:

Different from other autovacuum runs, anti-wraparound autovacuum will not give up when it blocks a concurrent transaction. This will block even short operations that require an ACCESS EXCLUSIVE lock (like DDL statements on the table). Such a blocked operation will block all other access to the table, and processing comes to a standstill.

From PostgreSQL v13 on, the default settings should already protect you from this problem.

My current theory is that the regular autovacuum process isn't actually looking at most pages — from the docs:

[VACUUM] will skip pages that don't have any dead row versions even if those pages might still have row versions with old XID values

Since this is an insert-only table, presumably most/all pages don't have dead row versions, so would be skipped by the autovacuum process. Since the autovacuum never visits these pages, the rows are never frozen, so it ends up needing to do an aggressive autovacuum to prevent wraparound.

I would like to prevent autovacuum from blocking writes on this table.

One approach would be to set autovacuum_freeze_max_age to a very low value for this table, so that aggressive autovacuuming is kicked off more frequently (and should block for a shorter time, since there should be less work to do each time). But that would still block for some amount of time, which I'd like to avoid if possible.

I'm hopeful that there's a way to avoid the aggressive autovacuuming at all for this table. I'm hesitant to use something like a manual vacuum with DISABLE PAGE SKIPPING since it warns that:

is intended to be used only when...there is a hardware or software issue causing database corruption

It seems strange to me that visiting all the pages is coupled with preventing XID wraparound and the "aggressive" setting, so please let me know if I'm misunderstanding how the system works here! I was hoping there was a way to do a more "thorough" non-blocking autovacuum that would freeze all these rows.

How can I avoid blocking the writes on this table?


Solution

  • I think the blog post you cite really misstates the case in its "use case 1". While the new parameters in v13 do increasing vacuuming of insert-only tables, that increased vacuum doesn't do much for gentling the effects of anti-wrapround prevention. It is instead really only beneficial to the visibility map and hint bits. I think that the main motivation for introducing this feature was for the effect on the visibility map, not on anti-wraparound, although as I recall many people on the developer mailing list also did misunderstand it at the time.

    There has been some further progress on improving the anti-wraparound issues since v13, but as far as I know they still aren't as good as they really should be.

    If all your processes are reliably cooperating with each other, you could replace the "SHARE ROW EXCLUSIVE" lock with an advisory lock which has the same user-space effect, but without accidentally roping vacuum into the locking scheme. But I assume you chose "SHARE ROW EXCLUSIVE" because the database enforces that itself, meaning only the one process taking the lock needs to be aware of its intentions. that would make the advisory lock unreliable, as every piece of your user code would need to be audited to ensure they have been changed to use it properly.

    The aggressive vacuum should only run once every vacuum_freeze_table_age minus vacuum_freeze_min_age, which is a large number and so should take a long time to reach. If you have a maintenance window (overnight, weekends) where you can run VACUUM FREEZE manually on the table at least this often, perhaps with INDEX_CLEANUP off and with PARALLEL set, you should avoid any unplanned aggressive vacuuming outside the maintenance window.