on postgresl 9.0 we have a sql query:
SELECT count(*) FROM lane
WHERE not exists
(SELECT 1 FROM id_map
WHERE id_map.new_id=lane.lane_id
and id_map.column_name='lane_id'
and id_map.table_name='lane')
and lane.lane_id is not null;
that usually takes somewhat around 1.5 seconds to finish. Here's the explain plan: http://explain.depesz.com/s/axNN
Sometimes though this query hangs and will not finish. It may run even for 11 hours without success. It then takes up 100% of the cpu.
The only locks this query takes are "AccessShareLock"s and they are all granted.
SELECT a.datname,
c.relname,
l.transactionid,
l.mode,
l.granted,
a.usename,
a.current_query,
a.query_start,
age(now(), a.query_start) AS "age",
a.procpid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
JOIN pg_class c ON c.oid = l.relation
ORDER BY a.query_start;
The query is run as a part of a java process that connects to a database using a connection pool and performs sequentially similar select queries of this format:
SELECT count(*) FROM {} WHERE not exists (SELECT 1 FROM id_map WHERE id_map.new_id={}.{} and id_map.column_name='{}' and id_map.table_name='{}') and {}.{} is not null
no updates or delete are happening in parallel to this process so I don't think vacuuming can be the issue here. Prior to running the entire process (so before 6 queries of this sort are run) an analyze on all the tables were run.
postgres logs don't show any entry for the long running queries because they never finish and thus never get to be logged.
Any idea what may cause this kind of behavior and how to prevent it from happening?
the explain plan without analyze:
Aggregate (cost=874337.91..874337.92 rows=1 width=0)
-> Nested Loop Anti Join (cost=0.00..870424.70 rows=1565283 width=0)
Join Filter: (id_map.new_id = lane.lane_id)
-> Seq Scan on lane (cost=0.00..30281.84 rows=1565284 width=8)
Filter: (lane_id IS NOT NULL)
-> Materialize (cost=0.00..816663.60 rows=1 width=8)
-> Seq Scan on id_map (cost=0.00..816663.60 rows=1 width=8)
Filter: (((column_name)::text = 'lane_id'::text) AND ((table_name)::text = 'lane'::text))
VACUUM ANALYZE VERBOSE;
refreshing statistics shall help db to choose optimal plan - not nested loops, which I believe take 100% CPU