ruby-on-railspostgresqldelayed-job

Error while reserving job: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly


Delayed jobs service is stopping after less than 1 hour of its starting time with the following log:

I, [2018-02-26T06:00:26.580458 #11439]  INFO -- : 2018-02-26T06:00:26+0400: [Worker(delayed_job host:myhost pid:11439)] Starting job worker
I, [2018-02-26T06:00:26.664929 #11439]  INFO -- : 2018-02-26T06:00:26+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41019) RUNNING
I, [2018-02-26T06:00:27.342994 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41019) COMPLETED after 0.6779
I, [2018-02-26T06:00:27.346526 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41020) RUNNING
I, [2018-02-26T06:00:27.470858 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41020) COMPLETED after 0.1242
I, [2018-02-26T06:00:27.474937 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41024) RUNNING
I, [2018-02-26T06:00:27.603043 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41024) COMPLETED after 0.1280
I, [2018-02-26T06:00:27.606702 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41025) RUNNING
I, [2018-02-26T06:00:27.725715 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41025) COMPLETED after 0.1189
I, [2018-02-26T06:00:27.728021 #11439]  INFO -- : 2018-02-26T06:00:27+0400: [Worker(delayed_job host:myhost pid:11439)] 4 jobs processed at 3.4871 j/s, 0 failed
I, [2018-02-26T06:14:48.287220 #11439]  INFO -- : 2018-02-26T06:14:48+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41027) RUNNING
I, [2018-02-26T06:14:48.414079 #11439]  INFO -- : 2018-02-26T06:14:48+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41027) COMPLETED after 0.1267
I, [2018-02-26T06:14:48.416335 #11439]  INFO -- : 2018-02-26T06:14:48+0400: [Worker(delayed_job host:myhost pid:11439)] 1 jobs processed at 7.3771 j/s, 0 failed
I, [2018-02-26T06:16:33.492435 #11439]  INFO -- : 2018-02-26T06:16:33+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41028) RUNNING
I, [2018-02-26T06:16:33.613684 #11439]  INFO -- : 2018-02-26T06:16:33+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41028) COMPLETED after 0.1211
I, [2018-02-26T06:16:33.615953 #11439]  INFO -- : 2018-02-26T06:16:33+0400: [Worker(delayed_job host:myhost pid:11439)] 1 jobs processed at 7.8121 j/s, 0 failed
I, [2018-02-26T06:22:33.853678 #11439]  INFO -- : 2018-02-26T06:22:33+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41030) RUNNING
I, [2018-02-26T06:22:33.967338 #11439]  INFO -- : 2018-02-26T06:22:33+0400: [Worker(delayed_job host:myhost pid:11439)] Job ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper (id=41030) COMPLETED after 0.1136
I, [2018-02-26T06:22:33.970307 #11439]  INFO -- : 2018-02-26T06:22:33+0400: [Worker(delayed_job host:myhost pid:11439)] 1 jobs processed at 8.2735 j/s, 0 failed
I, [2018-02-26T06:38:24.595215 #11439]  INFO -- : 2018-02-26T06:38:24+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:24.593926', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:24.593351' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:24.593398') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:29.597026 #11439]  INFO -- : 2018-02-26T06:38:29+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:29.596061', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:29.595477' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:29.595524') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:34.598775 #11439]  INFO -- : 2018-02-26T06:38:34+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:34.597856', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:34.597278' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:34.597325') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:39.600772 #11439]  INFO -- : 2018-02-26T06:38:39+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:39.599713', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:39.599063' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:39.599110') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:44.602546 #11439]  INFO -- : 2018-02-26T06:38:44+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:44.601568', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:44.601024' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:44.601072') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:49.604286 #11439]  INFO -- : 2018-02-26T06:38:49+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:49.603369', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:49.602808' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:49.602863') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:54.606189 #11439]  INFO -- : 2018-02-26T06:38:54+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:54.605111', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:54.604563' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:54.604613') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:38:59.608610 #11439]  INFO -- : 2018-02-26T06:38:59+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:38:59.607243', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:38:59.606483' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:38:59.606539') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:39:04.610465 #11439]  INFO -- : 2018-02-26T06:39:04+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:39:04.609457', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:39:04.608876' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:39:04.608926') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I, [2018-02-26T06:39:09.612201 #11439]  INFO -- : 2018-02-26T06:39:09+0400: [Worker(delayed_job host:myhost pid:11439)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-02-26 02:39:09.611263', locked_by = 'delayed_job host:myhost pid:11439' WHERE id IN (SELECT  "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-02-26 02:39:09.610721' AND (locked_at IS NULL OR locked_at < '2018-02-25 22:39:09.610770') OR locked_by = 'delayed_job host:myhost pid:11439') AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *

database.yml

production:
 adapter: postgresql
 encoding: unicode
 database: myapp
 port: 5432
 pool: 5
 username: username
 password: password
 reconnect: true

Please can anyone just explain the reason of this error and how to avoid it:

Error while reserving job: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly

Update:

I believe this issue is not related to delayed jobs, since I'm getting same error while I'm doing some normal DB queries. So DB is restarting for some reason and this why delayed job service is stopping.

As per commented by @LaurenzAlbe , below are some issues found in /var/log/postgresql/postgresql-9.3-main.log:

LOG:  connection received: host=10.10.10.15 port=57322
LOG:  replication connection authorized: user=MyDBUser
FATAL:  must be superuser or replication role to start walsender
LOG:  could not receive data from client: Connection reset by peer
LOG:  disconnection: session time: 0:06:18.911 user=MyDBUser database=MyDB host=127.0.0.1 port=34040
./systemd: 36: kill: Operation not permitted
WARNING:  skipping "delayed_jobs" --- only table or database owner can analyze it

Solution

  • After some research, I believe delayed jobs have a memory leak issue, which is solved by using config.cache_classes = true because delayed jobs keeps reloading classes every now and then.

    I had the same issue where delayed job process used more than 90%+ of memory and it crashed with the same error, even though I had cached classes, but I was running delayed jobs without RAILS_ENV, which caused them to load in development and ignore the other environment settings.