postgresql database-replication wal checkpointing

PostgreSQL - checkpoint interval behaviour in different WAL levels

I couldn't find a definite answer for my concerns, so I might as well ask it from you guys!

Long story short: We need to perform an UPDATE command on roughly 400M rows. The command could be modified to work in batches I know, but that is a different topic. Our problem is that the WAL gets too big and we run out of disk space. I'm wondering how the checkpoint intervals work with different WAL levels. To put it simply the documentation says that a longer checkpoint interval "triggers" less full page writes, which results in a smaller WAL. What I can't find is how this change behaves with different wal_level settings.

DB version: Postgres14.4

1. Does it have any relevance with a minimal wal_level setting? (Considering it removes almost all logging.)

2. Does it break the replicas when the wal_level is set to replica or higher? (It isn't obvious to me based on different articles and the documentation, but I assume the replicas should be fine since all the changes are logged despite of fewer full page/block writes, and it also can be benefitial i.e. decreased WAL size.)

We are in a position where a full backup and shutdown of related application is possible, so the minimal wal_level setting could work, but I'm interested in different solutions as well, feel free to share some thoughts on it.

Cheers!

Solution

Does it have any relevance with a minimal wal_level setting? (Considering it removes almost all logging.)

It doesn't. With minimal, you only skip WAL logging of a few things, like COPY into a table which was created or truncated in the same transaction, or the creation of indexes. Those special cases wouldn't apply to a bulk UPDATE.

To solve the problem, you first need to figure out what the root problem is. Are you so close to the out-of-space condition under normal conditions than any stress at all can push you over? Do you have replication slots, and the standbys can't keep up? Do you have an archive_command that can't keep up? Is your IO system so overwhelmed that the checkpoints can't finish in time despite trying as fast as they can? Is you max_wal_size writing checks your harddrive can't cash?