php laravel redis queue laravel-horizon

Timed out job hangs for 15 or 30 minutes and then runs


We are having strange errors with Horizon. Basically, this is what happens: a job is queued and starts processing, it times out, and then 15 or 30 minutes later it runs again.

This seems to happen with any kind of job. For example, if a mailable is queued, the user gets an email first, and then 15 or 30 minutes later gets the same email again.

Here are our config files:

config/database.php:

'redis' => [
    'client' => env('REDIS_CLIENT', 'predis'),
    'default' => [
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'password' => env('REDIS_PASSWORD', null),
        'port' => env('REDIS_PORT', 6379),
        'database' => 0,
    ],
],

config/queue.php:

'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('DEFAULT_QUEUE_NAME', 'default'),
    'retry_after' => 120, // 2 minutes
    'block_for' => null,
],

config/horizon.php:

'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => env('HORIZON_CONNECTION', 'redis'),
            'queue' => [env('DEFAULT_QUEUE_NAME', 'default')],
            'balance' => 'simple',
            'processes' => 10,
            'tries' => 3,
            'timeout' => 90,
        ],
    ],
]
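For reference, Laravel's queue documentation advises keeping a worker's timeout at least several seconds shorter than the connection's retry_after, so that a job is not released for a retry while the first attempt is still running. The sketch below checks that relationship for the values shown above (timeout 90, retry_after 120). The function name retryMargin and the trimmed-down arrays are hypothetical, for illustration only; they are not part of Laravel or Horizon.

```php
<?php
declare(strict_types=1);

/**
 * Seconds between the worker timeout and the queue's retry_after.
 * Zero or less means a job could be released for retry while the
 * first attempt is still running.
 * (Hypothetical helper, not a Laravel or Horizon API.)
 */
function retryMargin(array $queueConnection, array $supervisor): int
{
    return (int) $queueConnection['retry_after'] - (int) $supervisor['timeout'];
}

// Values copied from config/queue.php and config/horizon.php above.
$queueConnection = ['driver' => 'redis', 'retry_after' => 120];
$supervisor = ['tries' => 3, 'timeout' => 90];

$margin = retryMargin($queueConnection, $supervisor);

echo $margin > 0
    ? "OK: timeout is {$margin}s shorter than retry_after\n"
    : "Risk: timeout is not shorter than retry_after\n";
```

With the settings above the margin is 30 seconds, so this particular relationship does not look like the cause of the duplicate runs.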

Here is how it looks in the Horizon dashboard:

This is when the initial job times out. It stays like this in Recent Jobs while the retries are running.

[image: horizon1]

After almost half an hour it changes to this:

[image: horizon2]

The tags are the same; I just blacked out the names.

Here are the logs we are seeing (times are in UTC):

[2020-04-22 11:24:59][88] Processing: App\Mail\ReservationInformation

[2020-04-22 11:29:00][88] Failed: App\Mail\ReservationInformation

[2020-04-22 11:29:00][88] Processing: App\Mail\ReservationInformation

[2020-04-22 11:56:21][88] Processed: App\Mail\ReservationInformation

Note: With Predis we also see some log entries like "Error while reading line from the server. [tcp://REDIS_HOST:6379]", but with PHPRedis there were none.

We tried a lot of different combinations to eliminate the problem, and it happened with every one of them, so we think it must be something with Horizon.

We tried:
- Predis with Redis 5 and Redis 3

There is only one instance of Horizon running, and no other queue is handled by this Horizon instance.

Any information or tips to try are welcome!


Solution

  • For us, this turned out to be a configuration error in our systems. We were using OpenShift and Docker, and we adjusted these values in our containers/systems:

    net.ipv4.tcp_keepalive_intvl
    net.ipv4.tcp_keepalive_probes
    net.ipv4.tcp_keepalive_time
    

    and for now everything works normally.
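As a rough illustration of what these three settings control: on Linux, an idle connection with keepalive enabled is first probed after tcp_keepalive_time seconds, then every tcp_keepalive_intvl seconds, and is declared dead after tcp_keepalive_probes unanswered probes. The PHP sketch below reads the current values from /proc/sys and computes that worst-case detection time. The helper names are hypothetical, the specific values chosen are not given here, and these settings only affect sockets that have keepalive turned on.

```php
<?php
declare(strict_types=1);

/**
 * Worst-case seconds before the kernel declares an idle, unreachable
 * peer dead: initial idle time plus one interval per failed probe.
 * (Hypothetical helper, for illustration only.)
 */
function keepaliveDetectionSeconds(int $time, int $intvl, int $probes): int
{
    return $time + $intvl * $probes;
}

/**
 * Read one net.ipv4 keepalive setting from /proc/sys.
 * Returns null when the file is not available (for example, not on Linux).
 */
function readKeepaliveSetting(string $name): ?int
{
    $path = "/proc/sys/net/ipv4/{$name}";
    if (!is_readable($path)) {
        return null;
    }
    $raw = file_get_contents($path);

    return $raw === false ? null : (int) trim($raw);
}

$time = readKeepaliveSetting('tcp_keepalive_time');
$intvl = readKeepaliveSetting('tcp_keepalive_intvl');
$probes = readKeepaliveSetting('tcp_keepalive_probes');

if ($time === null || $intvl === null || $probes === null) {
    echo "Keepalive settings are not readable on this system\n";
} else {
    printf(
        "time=%d intvl=%d probes=%d, dead peer detected after about %d seconds\n",
        $time,
        $intvl,
        $probes,
        keepaliveDetectionSeconds($time, $intvl, $probes)
    );
}
```

With the usual Linux defaults (7200, 75, 9) this comes to 7875 seconds, so a connection that was silently dropped between the container and Redis could go unnoticed for a long time. Running the script inside the container shows which values are actually in effect there.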