sqlpostgresqlamazon-web-servicesread-replication

Index not used in Postgresql read-replica. Why?


I have a relatively large table on an Amazon RDS production database (in the range of 2M records). I wish to group on a number of fields, including the month of a date (server_time) in the table. To speed things up a bit, I have created an index on the master database as follows:

create index on build_requests(group_id, artifact_id, account_id, number_of_interfaces, date_trunc('month', server_build_time));

Then, as you would expect, the query to group the data uses the index on the master:

GroupAggregate  (cost=0.55..311308.09 rows=1633231 width=85)
  Group Key: group_id, artifact_id, account_id, number_of_interfaces, date_trunc('month'::text, server_build_time)
  ->  Index Scan using build_requests_group_id_artifact_id_account_id_number_of_in_idx on build_requests  (cost=0.55..262417.68 rows=1898335 width=85)

However, after waiting for over an hour, the read replica still does not use the index:

GroupAggregate  (cost=434678.88..488313.41 rows=1633179 width=85)
  Group Key: group_id, artifact_id, account_id, number_of_interfaces, (date_trunc('month'::text, server_build_time))
  ->  Sort  (cost=434678.88..439424.56 rows=1898274 width=85)
        Sort Key: group_id, artifact_id, account_id, number_of_interfaces, (date_trunc('month'::text, server_build_time))
        ->  Seq Scan on build_requests  (cost=0.00..55053.43 rows=1898274 width=85)

Logging into the read replica with pgadmin, I see the index is present however. This is a problem, since the slow speed of the query on the read replica (5 minutes vs 3 seconds), causes other queries that include this query through postgres_fdw (cross database query) to return ssl connection resets (probably timeouts?).

Any idea why the read replica does not pick up/use the index I define on the master, and how I can remedy this? The query I execute on both master and read replica is identical:

SELECT group_id, artifact_id, 
       account_id, number_of_interfaces, 
       date_trunc('month', server_build_time) as server_build_month, 
       count(*)
FROM build_requests
GROUP BY group_id, artifact_id, 
         account_id, number_of_interfaces, 
         date_trunc('month', server_build_time);

Thanks for your help!


Solution

  • The problem appeared to be with the types of the amazon RDS instances. The original RDS was a t2-medium, whereas the read replica was only a micro instance. After scaling the read replica to medium, the replica too, used the index.

    Also, even when the read replica did use the index, simply executing the above query through postgres_fdw caused connection timeouts. It wasn't until I created a view on the master using the index, that the query executed without problem.