We have an AlloyDB instance set up with a read pool. Inside our application, we route database queries to either the primary node or the read pool, depending on whether the operation itself is a SELECT or not. This has been working well; however, we have occasionally encountered errors that seem to be the result of changes not being replicated to the read pool. Specifically:
It was my understanding that the replicas would wait until any relevant WAL logs were processed before processing a query, ensuring that their state was always in sync with the primary node. Are there conditions under which the read pool state could be stale or get out of sync with the primary node? We'd like to understand what could explain the behavior we're seeing and what can we do remedy it.
AlloyDB read pools are implemented (currently) the same as Postgres read replicas, so there IS a lag there depending on workload (e.g. a 64 vCPU primary with a 2 vCPU read replica might never catch up). The primary and replicas will be consistent as of the log sequence number it's caught up to, but there is a lag there. How much lag depends on workload and machine differentials. We're working on some improvements over out of the box replication lag as well so it won't be dependent on the primary's load. It's actively being worked on, but I'm not sure when that'll make it in exactly.