ruby-on-railsrubyconnection-poolingruby-on-rails-6.1read-replication

ActiveRecord::ConnectionTimeoutError for background jobs with read replica


We recently implemented a read-replica in our Rails 6 application. We are only using it for specific transactions by wrapping them in a block:

ActiveRecord::Base.connected_to(role: :reading) do
  # read query here
end

In application_record.rb we have

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :primary,
    reading: :primary_replica
  }
end

In our database.yaml we have:

production:
  primary:
    url: <%= ENV['DATABASE_CONNECTION_POOL_URL'] || ENV['DATABASE_URL'] %>
    pool: <%= ENV["DB_POOL"] || ENV['RAILS_MAX_THREADS'] || 5 %>
    reaping_frequency: <%= ENV["REAPING_FREQUENCY"] || 10 %>
    adapter: postgresql
    advisory_locks: false
  primary_replica:
    url: <%= ENV['REPLICA_DATABASE_POOL_URL'] %>
    pool: <%= ENV["DB_POOL"] || ENV['RAILS_MAX_THREADS'] || 5 %>
    reaping_frequency: <%= ENV["REAPING_FREQUENCY"] || 10 %>
    adapter: postgresql
    advisory_locks: false
    replica: true

RAILS_MAX_THREADS is set to 12 and REAPING_FREQUENCY is set to 10 on our environment.

Whenever we start calling a background jobs (via sidekiq) and our queue builds up we start getting failures that are throwing the error: ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.366 seconds); all pooled connections were in use

I cannot replicate this behavior in our mirrored development environments even if I queue up a bunch of jobs and workers to process them. I'm at a loss on what I am missing with our connection pooling config. Most of the documentation I've read for rails 6 says we don't need to set any specific config for pooling in sidekiq.rb or puma.rb

We are running ruby 3.1.2 and rails 6.1.6

Any help or guidance here would be greatly appreciated.


Solution

  • Our concurrency/threading needed to be tuned. Had nothing to do with the replica. This video was extremely straight forward and resolved our issue: https://www.youtube.com/watch?v=XOfELjqm188