[SOLVED] PostgreSQL: even read access changes data files disk leading to large incremental backups using pgbackrest

PostgreSQL: even read access changes data files disk leading to large incremental backups using pgbackrest

We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day. Size of our database is around 1TB, a full backup is around 600GB and an incremental backup is also around 400GB!

We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.

Usually the files with low index names (e.g. 391089.1) change on read access.

On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.

Some more facts:

Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7

We see the phenomenon on multiple servers.

Can someone explain, why postgresql changes data files on pure read access? We tested on a pure database without any other resources accessing the database.

Solution

This is normal. Some cases I can think of right away are:

a SELECT or other SQL statement setting a hint bit

This is a shortcut for subsequent statements that access the data, so they don't have t consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions

These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions

This is necessary to prevent data corruption if the transaction ID counter wraps around.

The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:

never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions