sqlpostgresqlpostgresql-9.0

A postgresql query won't finish


on postgresl 9.0 we have a sql query:

SELECT count(*) FROM lane 
WHERE not exists 
    (SELECT 1 FROM id_map 
    WHERE id_map.new_id=lane.lane_id  
    and id_map.column_name='lane_id' 
    and id_map.table_name='lane') 
and lane.lane_id is not null;

that usually takes somewhat around 1.5 seconds to finish. Here's the explain plan: http://explain.depesz.com/s/axNN

Sometimes though this query hangs and will not finish. It may run even for 11 hours without success. It then takes up 100% of the cpu.

The only locks this query takes are "AccessShareLock"s and they are all granted.

SELECT a.datname,
         c.relname,
         l.transactionid,
         l.mode,
         l.granted,
         a.usename,
         a.current_query, 
         a.query_start,
         age(now(), a.query_start) AS "age", 
         a.procpid 
    FROM  pg_stat_activity a
     JOIN pg_locks         l ON l.pid = a.procpid
     JOIN pg_class         c ON c.oid = l.relation
    ORDER BY a.query_start;

The query is run as a part of a java process that connects to a database using a connection pool and performs sequentially similar select queries of this format:

SELECT count(*) FROM {}  WHERE not exists (SELECT 1 FROM id_map WHERE id_map.new_id={}.{} and id_map.column_name='{}' and id_map.table_name='{}') and {}.{} is not null

no updates or delete are happening in parallel to this process so I don't think vacuuming can be the issue here. Prior to running the entire process (so before 6 queries of this sort are run) an analyze on all the tables were run.

postgres logs don't show any entry for the long running queries because they never finish and thus never get to be logged.

Any idea what may cause this kind of behavior and how to prevent it from happening?

the explain plan without analyze:

Aggregate  (cost=874337.91..874337.92 rows=1 width=0)
  ->  Nested Loop Anti Join  (cost=0.00..870424.70 rows=1565283 width=0)
        Join Filter: (id_map.new_id = lane.lane_id)
        ->  Seq Scan on lane  (cost=0.00..30281.84 rows=1565284 width=8)
              Filter: (lane_id IS NOT NULL)
        ->  Materialize  (cost=0.00..816663.60 rows=1 width=8)
              ->  Seq Scan on id_map  (cost=0.00..816663.60 rows=1 width=8)
                    Filter: (((column_name)::text = 'lane_id'::text) AND ((table_name)::text = 'lane'::text))

Solution

  • VACUUM ANALYZE VERBOSE;
    

    refreshing statistics shall help db to choose optimal plan - not nested loops, which I believe take 100% CPU