postgresqlpartitioningvacuumautovacuum

Autovacuum and partitioned tables


Postgres doc tells that partitioned tables are not processed by autovacuum. But still I see that last_autovacuum column from pg_stat_user_tables is populated with recent timestamps for live partitions.

Does it mean that these timestamps are set by the background worker which only prevents transaction ID wraparound, without actually performing ANALYZE&VACUUM? Or whatever else could populate them?

Besides, taken that partitions are large and active enough, should I run the both ANALYZE and VACUUM manually on those partitions? If yes, does the order matter?

UPDATE

I'm trying to elaborate, thanks to the comments given.

  1. Taking that vacuum should work the same way on partition as on the regular table, what could be a reason for much faster growth of the occupied disk space after partitioning? Before partitioning it was nearly a linear function of records count.

  2. What is confusing as well, when looking for autovacuum processes running I see that those related to partitions are denoted with "to prevent wraparound", while others are not. Is it absolutely a coincidence or there is something to check?

  3. Documentation describes partitioned table as rather a virtual entity, without its own storage. What is the point in denoting that it is not vacuumed?


Solution

  • The statement from the documentation is true, but misleading. Autovacuum does not process the partitioned table itself, but it processes the partitions, which are regular PostgreSQL tables. So dead tuples get removed, the visibility map gets updated, and so on. In short, there is nothing to worry about as far as vacuuming is concerned. Remember that the partitioned table itself does not hold any data!

    What the documentation warns you about is ANALYZE. Autovacuum also launches automatic ANALYZE jobs to collect accurate table statistics. This will be work fine on the partitions, but there are no table statistics collected on the partitioned table itself, so you have to run ANALYZE manually on the partitioned table to get these data. In practice, I find that not to be a problem, since the optimizer generates plans for each individual partition anyway, and there it has accurate statistics.