postgresqltimescaledbread-replication

Postgres read replica tuning for parallel workloads. Queries canceled


I am currently setting up a read replica to offload a heavy parallel workload from master to a replica, so that the heavy parallel reads we are doing are not affecting other important task that the master should be doing. We are using timescaledb 2.17.2 and postgresql 14.15.

My issue is that when we are running our parallel workload on the read replica, our quieries are canceled with the error "canceling statement due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed."

After reading up on the issue, we though that our VACCUUM processes on the table that is queried was causing us to have to update the tables regularily. So we made sure no updates or deletes was done on the table in master, so that no vaccuum process needed to run. The issue persisted. We then set the max_standby_streaming_delay and max_standby_archive_delay parameters to 5 minutes. Our queries takes on average 16ms, and no transaction to the read replica should ever take more than 30 seconds. Still, we get the same issue, just every 5 minutes instead of the default 30 seconds. Hundreds of queries are just canceled while the database is applying changes.

A common suggestion in other posts is to enable hot_standby_feedback, but we do not see a reason for doing this, as the whole point of the read replica is to offload read queries that should not be affecting other processes.

The heavy parallel workload is run once an hour, and could take up to an hour to run. The work that is done does not really need more recent data than up to the time the work is started.

The question that remains is, do my application code really need to synchronously wait every "max_standby_streaming_delay and max_standby_archive_delay" - secounds, while wal changes are applied, or am i missing something crucial?


Solution

  • If you want your queries to finish, set max_standby_streaming_delay to -1 on the standby. The down side is that replication will occasionally start lagging. If that is a problem for you, you can enable hot_standby_feedback on the standby to reduce the frequency of these conflicts, but you will not be able to avoid them completely.

    See this article for a deeper discussion.