postgresqlvacuum

Postgres: "vacuum" command does not clean up dead tuples


We have a postgres database in Amazon RDS. Initially, we needed to load large amount of data quickly, so autovacuum was turned off according to the best practice suggestion from Amazon. Recently I noticed some performance issue when running queries. Then I realized it has not been vacuumed for a long time. As it turns out many tables have lots of dead tuples.

enter image description here

Surprisingly, even after I manually ran vacuum commands on some of the tables, it did not seem to remove these dead tuples at all. vacuum full takes too long to finish which usually ends up timed out after a whole night.

Why does vacuum command not work? What are my other options, restart the instance?


Solution

  • Use VACUUM (VERBOSE) to get detailed statistics of what it is doing and why.

    There are four reasons why dead tuples cannot be removed:

    1. There is a long running transaction that has not been closed. You can find the bad boys with

       SELECT pid, datname, usename, state, backend_xmin
       FROM pg_stat_activity
       WHERE backend_xmin IS NOT NULL
       ORDER BY age(backend_xmin) DESC;
      

      You can get rid of a transaction with pg_cancel_backend() or pg_terminate_backend().

    2. There are prepared transactions which have not been commited. You can find them with

       SELECT gid, prepared, owner, database, transaction
       FROM pg_prepared_xacts
       ORDER BY age(transaction) DESC;
      

      User COMMIT PREPARED or ROLLBACK PREPARED to close them.

    3. There are replication slots which are not used. Find them with

       SELECT slot_name, slot_type, database, xmin
       FROM pg_replication_slots
       ORDER BY age(xmin) DESC;
      

      Use pg_drop_replication_slot() to delete an unused replication slot.

    4. There is a standby server with hot_standby_feedback = on and a long running query.

      Find standby servers that hold back VACUUM progress with

      SELECT application_name, client_addr, backend_xmin
      FROM pg_stat_replication
      ORDER BY age(backend_xmin) DESC;
      

    Look at my article for an in-depth discussion.