[SOLVED] pg_wal directory is very large

I have a three-node PostgreSQL 13 cluster managed with Patroni and etcd.

The pg_wal directory keeps growing and now contains 5127 files. After searching online, I found an article suggesting that I check the following database parameters (these are their values at the time of the problem):

archive_mode off;
wal_level replica;
max_wal_size 1G;

Replication slots:

postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+------------
slot_name           | db2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 2247228
xmin                |
catalog_xmin        |
restart_lsn         | 2D/D0ADC308
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
-[ RECORD 2 ]-------+------------
slot_name           | db1
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 2247227
xmin                |
catalog_xmin        |
restart_lsn         | 2D/D0ADC308
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
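Both slots report the same restart_lsn, so a quick way to confirm they are not what is holding WAL back is to see which segment file that position falls in. The sketch below decodes the textual LSN into a byte position and builds the segment file name, roughly what pg_walfile_name() reports on the server (that function treats an LSN lying exactly on a segment boundary as part of the previous segment, a detail ignored here). It assumes the default 16 MiB wal_segment_size and uses timeline 17 from the patronictl output; the helper names are hypothetical.

```python
# Sketch: decode a pg_lsn such as "2D/D0ADC308" and derive the WAL segment
# file that contains it.
# Assumption: the default 16 MiB wal_segment_size chosen at initdb time.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size in bytes


def lsn_to_int(lsn: str) -> int:
    """Convert the textual 'XXXXXXXX/YYYYYYYY' LSN form to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs, like pg_wal_lsn_diff(newer, older)."""
    return lsn_to_int(newer) - lsn_to_int(older)


def walfile_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file holding the LSN (hypothetical helper)."""
    segno = lsn_to_int(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # 256 for 16 MiB segments
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


if __name__ == "__main__":
    restart_lsn = "2D/D0ADC308"  # restart_lsn reported by both slots
    print(walfile_name(17, restart_lsn))  # prints 000000110000002D000000D0
```

Slots only retain WAL from their restart_lsn onward, so segment files older than the one printed here are not being kept by these slots, which fits the zero lag shown by patronictl.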

Everything else in the Patroni cluster works normally (switchover, reinit, replication):

root@srvdb3:~# patronictl -c /etc/patroni/patroni.yml list
+ Cluster: mobile (7173650272103321745) --+----+-----------+
| Member | Host       | Role    | State   | TL | Lag in MB |
+--------+------------+---------+---------+----+-----------+
| db1    | 10.01.1.01 | Replica | running | 17 |         0 |
| db2    | 10.01.1.02 | Replica | running | 17 |         0 |
| db3    | 10.01.1.03 | Leader  | running | 17 |           |
+--------+------------+---------+---------+----+-----------+

Patroni configuration (patronictl edit-config):

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    checkpoint_timeout: 30
    hot_standby: 'on'
    max_connections: '1100'
    max_replication_slots: 5
    max_wal_senders: 5
    shared_buffers: 2048MB
    wal_keep_segments: 5120
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 100
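wal_keep_segments counts segment files, so the disk space it reserves depends on the segment size. A minimal sketch of that arithmetic, assuming the default 16 MiB segments; since PostgreSQL 13 replaced wal_keep_segments with wal_keep_size (specified in megabytes), it also prints the equivalent wal_keep_size value. The helper names are illustrative.

```python
# Sketch: how much disk space wal_keep_segments reserves in pg_wal.
# Assumption: default 16 MiB WAL segments (wal_segment_size is fixed at initdb).

MIB = 1024 * 1024
WAL_SEGMENT_SIZE = 16 * MIB  # assumed default


def wal_keep_bytes(segments: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Minimum bytes of old WAL kept in pg_wal for a given wal_keep_segments."""
    return segments * seg_size


def wal_keep_size_setting(segments: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Equivalent PostgreSQL 13+ wal_keep_size value, expressed in MB."""
    return f"{wal_keep_bytes(segments, seg_size) // MIB}MB"


if __name__ == "__main__":
    kept = wal_keep_bytes(5120)  # wal_keep_segments from the configuration
    print(f"{kept / 1024**3:.0f} GiB retained")  # prints 80 GiB retained
    print("wal_keep_size =", wal_keep_size_setting(5120))  # 81920MB
```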

Please help: what could be causing this?

This is what I see in pg_stat_archiver:

postgres=# select * from pg_stat_archiver;
-[ RECORD 1 ]------+------------------------------
archived_count     | 0
last_archived_wal  |
last_archived_time |
failed_count       | 0
last_failed_wal    |
last_failed_time   |
stats_reset        | 2023-01-06 10:21:45.615312+00

Solution

With wal_keep_segments set to 5120, it is completely normal to have 5127 WAL segments in pg_wal: PostgreSQL always retains at least 5120 old WAL segments, which is about 80 GB with the default 16 MB segment size. If that is too much for you, reduce the parameter. Since you are using replication slots, the standbys do not rely on wal_keep_segments to catch up; the only disadvantage of a lower value is that pg_rewind may then succeed only if it runs soon after a failover, before the WAL it needs has been removed.
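To choose a smaller value, one option is to work backwards from the disk space you are willing to give retained WAL. A minimal sketch, assuming 16 MiB segments; the 2 GiB budget and the helper names are purely illustrative. It only sizes the wal_keep_segments floor: other mechanisms, such as lagging replication slots or recycled segments bounded by max_wal_size, can still make pg_wal larger.

```python
# Sketch: derive a wal_keep_segments value from a disk budget for retained WAL.
# Assumptions: default 16 MiB segments; the budget below is illustrative only.

MIB = 1024 * 1024
WAL_SEGMENT_SIZE = 16 * MIB  # assumed default


def segments_for_budget(budget_bytes: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Largest wal_keep_segments whose retained WAL fits inside budget_bytes."""
    return budget_bytes // seg_size


def files_above_floor(files_in_pg_wal: int, keep_segments: int) -> int:
    """Number of files in pg_wal beyond the wal_keep_segments floor."""
    return max(files_in_pg_wal - keep_segments, 0)


if __name__ == "__main__":
    print(segments_for_budget(2 * 1024**3))  # 128 segments for a 2 GiB budget
    extra = files_above_floor(5127, 5120)    # counts from the question
    print(extra, "extra segments,", extra * WAL_SEGMENT_SIZE // MIB, "MiB")
```

With the numbers from the question, only 7 files (112 MiB) sit above the 5120-segment floor, which is consistent with the answer's point that the setting itself accounts for nearly all of pg_wal.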