postgresqlforeign-data-wrapper

slow performance with select query on postgresql remote fdw table


I have 2 postgresql instances on different computers(X and Y) on the same network. Y server has PgAdmin installed on and connected to both instances.

On the Y instance, I created FDW connection for a database on X.

My problem is It takes over 30 seconds when I run a select query over foreign table. If I run the same query into direct to X server from PgAdmin, it takes under 1 sec and I though the reason for slow performance couldn't be the network issue.

I tried both with on and off for use_remote_estimate but since my foreign table keeps updating with new data continuously, I decided to keep it off.

My query is below

SELECT q_num,
    count(*)::integer AS total_calls,
    count(1) FILTER (WHERE na_code = 0 AND fail_code = 0)::integer AS answered,
    count(1) FILTER (WHERE na_code = 0 AND fail_code = 0 AND (wait + poll) <= '00:00:15'::interval)::integer AS answered_15,
    count(1) FILTER (WHERE na_code = 0 AND fail_code = 0 AND (wait + poll) <= '00:00:20'::interval)::integer AS answered_20,
    count(1) FILTER (WHERE na_code = 0 AND fail_code = 0 AND (wait + poll) <= '00:00:30'::interval)::integer AS answered_30,
    count(1) FILTER (WHERE na_code <> 0 OR fail_code <> 0)::integer AS missed,
    count(1) FILTER (WHERE na_code <> 0 OR fail_code <> 0 AND (wait + poll) > '00:00:15'::interval)::integer AS missed_15,
    count(1) FILTER (WHERE na_code <> 0 OR fail_code <> 0 AND (wait + poll) > '00:00:20'::interval)::integer AS missed_20,
    count(1) FILTER (WHERE na_code <> 0 OR fail_code <> 0 AND (wait + poll) > '00:00:30'::interval)::integer AS missed_30,
    EXTRACT(epoch FROM sum(wait + poll))::integer AS total_waiting,
    EXTRACT(epoch FROM max(wait + poll))::integer AS max_waiting
FROM foreign_table
WHERE time_start > (CURRENT_TIMESTAMP - '03:15:00'::interval)
GROUP BY q_num

time_start column has index on the remote server.

How can I improve the performance?

EDIT: Just tried to increase fetch_size but it still takes more than 20 seconds.

@Frank Heikens Here is the index

CREATE INDEX IF NOT EXISTS original_table_time_start_idx
    ON public.original_table USING btree
    (time_start ASC NULLS LAST)
    TABLESPACE pg_default;

Postgresql version on X instance

PostgreSQL 11.9, compiled by Visual C++ build 1914, 64-bit

Postgresql version on Y instance

PostgreSQL 14.5, compiled by Visual C++ build 1914, 64-bit

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) on X instance

GroupAggregate  (cost=8.51..8.91 rows=2 width=49) (actual time=0.148..0.150 rows=1 loops=1)    
Output: q_num, (count(*))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0))))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0) AND ((wait + poll) <= '00:00:15'::interval))))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0) AND ((wait + poll) <= '00:00:20'::interval))))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0) AND ((wait + poll) <= '00:00:30'::interval))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR (fail_code <> 0))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR ((fail_code <> 0) AND ((wait + poll) > '00:00:15'::interval)))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR ((fail_code <> 0) AND ((wait + poll) > '00:00:20'::interval)))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR ((fail_code <> 0) AND ((wait + poll) > '00:00:30'::interval)))))::integer, (date_part('epoch'::text, sum((wait + poll))))::integer, (date_part('epoch'::text, max((wait + poll))))::integer
Group Key: original_table.q_num    
Buffers: shared hit=4   
->  Sort (cost=8.51..8.51 rows=3 width=45) (actual time=0.099..0.101 rows=3 loops=1) 
Output: q_num, na_code, fail_code, wait, poll 
Sort Key: original_table.q_num 
Sort Method: quicksort  
Memory: 25kB 
Buffers: shared hit=4
->  Index Scan using original_table_time_start_idx on public.original_table  (cost=0.43..8.48 rows=3 width=45) (actual time=0.052..0.060 rows=3 loops=1) 
Output: q_num, na_code, fail_code, wait, poll 
Index Cond: (original_table.time_start > (CURRENT_TIMESTAMP - '00:15:00'::interval)) 
Buffers: shared hit=4 
Planning Time: 1.383 ms 
Execution Time: 0.638 ms

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) on Y instance

GroupAggregate  (cost=48287.41..48295.41 rows=6 width=49) (actual time=22846.842..22846.847 rows=1 loops=1)   
Output: q_num, (count(*))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0))))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0) AND ((wait + poll) <= '00:00:15'::interval))))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0) AND ((wait + poll) <= '00:00:20'::interval))))::integer, (count(1) FILTER (WHERE ((na_code = 0) AND (fail_code = 0) AND ((wait + poll) <= '00:00:30'::interval))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR (fail_code <> 0))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR ((fail_code <> 0) AND ((wait + poll) > '00:00:15'::interval)))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR ((fail_code <> 0) AND ((wait + poll) > '00:00:20'::interval)))))::integer, (count(1) FILTER (WHERE ((na_code <> 0) OR ((fail_code <> 0) AND ((wait + poll) > '00:00:30'::interval)))))::integer, (EXTRACT(epoch FROM sum((wait + poll))))::integer, (EXTRACT(epoch FROM max((wait + poll))))::integer
Group Key: foreign_table.q_num   
->  Sort  (cost=48287.41..48287.59 rows=72 width=45) (actual time=22846.755..22846.759 rows=1 loops=1)
Output: q_num, na_code, fail_code, wait, poll
Sort Key: foreign_table.q_num
Sort Method: quicksort  
Memory: 25kB
->  Foreign Scan on public.foreign_table  (cost=100.00..48285.19 rows=72 width=45) (actual time=10927.554..22846.717 rows=1 loops=1)
Output: q_num, na_code, fail_code, wait, poll
Filter: (foreign_table.time_start > (CURRENT_TIMESTAMP - '03:15:00'::interval))
Rows Removed by Filter: 720077
Remote SQL: SELECT q_num, time_start, wait, poll, na_code, fail_code FROM public.original_table 
Planning Time: 0.522 ms 
Execution Time: 22849.716 ms

I hope these informations helps.


Solution

  • expressions using CURRENT_TIMESTAMP are not considered shippable to the foreign side. You could argue that it should be shippable, as you could just materialize it into a literal and then ship the literal, but that is now how it currently works. So instead all rows need to be read back and have their timestamps tested locally.

    EDIT: Just tried to increase fetch_size but it still takes more than 20 seconds.

    What did you increase it to? Maybe increase it yet more.