phpmultiprocessingbeanstalkdpheanstalk

Pheanstalk reserving same beanstalkd job twice or more


I have a 3 beanstalkd queue processes running on the same ip but different ports. I have a separate server running supervisor spawning php workers in parallel (20 per beanstalkd port) to process the queue. My problem is that it seems like two processes can reserve the same job id on the same server in the same time.

Here is some sample output from my log:

2017-02-23 09:59:56 --> START JOB (port: 11301 | u: 0.45138600 1487861996 | jid:1695074 | pid:30019 | j:leads_to_tags_add | tr:1)
2017-02-23 09:59:57 --> START JOB (port: 11301 | u: 0.55024800 1487861997 | jid:1695074 | pid:30157 | j:leads_to_tags_add | tr:2)
2017-02-23 09:59:58 --> DEL   JOB (port: 11301 | u: 0.54731000 1487861998 | jid:1695074 | pid:30019 | j:leads_to_tags_add)
2017-02-23 09:59:58 --> DEL   JOB (port: 11301 | u: 0.58927900 1487861998 | jid:1695074 | pid:30157 | j:leads_to_tags_add)

It seems the two reserves happen right after each other, and the second reserve happens before the first process finishes and deletes the job.

I added a counter in redis per jobid and it's clear that by the time it reserves the second time, the counter went up once (tr). The TTRR is set to 3600, so there is no way that it expires before the first process completes.

This is what the job status looks like right after the second process reserve:

Pheanstalk\Response\ArrayResponse::__set_state(array(
   'id' => '1695074',
   'tube' => 'action-medium',
   'state' => 'reserved',
   'pri' => '0',
   'age' => '1',
   'delay' => '0',
   'ttr' => '3600',
   'time-left' => '3599',
   'file' => '385',
   'reserves' => '2',
   'timeouts' => '0',
   'releases' => '0',
   'buries' => '0',
   'kicks' => '0',
))

This behaviour is very random, sometimes only one process will be able to reserve until the job locks, sometimes 2, sometimes even 4 or more (rarely). Of course this creates an inconsistent number of duplicate jobs being executed.

Short version of the code:

$this->job = $this->pheanstalk->watch($tube)->reserve($timeout);


set_error_handler(function($errno, $errstr, $errfile, $errline, array $errcontext) {
    // error was suppressed with the @-operator
    if (0 === error_reporting()) {
        return false;
    }

    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});

$this->log_message('info', __METHOD__ . ": START JOB (" . $this->_logDetails() . " | tr:{$tries})");


if ($this->_process_job()) {
    $this->log_message('info', __METHOD__ . ": FINISHED JOB (" . $this->_logDetails() . ")");
    $this->_delete_job();
} else {
    $this->log_message('error', __METHOD__ . ": FAILED JOB (" . $this->_logDetails() . ")");
}

restore_error_handler();

and

protected function _delete_job()
{
    $this->pheanstalk->delete($this->job);
    $this->log_message('info', __METHOD__ . ": DELETED JOB (" . $this->_logDetails() . ")");
}

Solution

  • The issue was a TCP disconnection happening within the process_job bit of my code. The pheanstalk object would get overwritten somewhere down the execution chain.

    I have a library wrapping pheanstalk which checks for the emptiest beanstalkd server available before sending a put command. It would then create a new Pheanstalk instance with the best server's details, save it as the current connection, and then dispatch the command.

    Sometimes the worker can send a subjob to the queue. The library would be loaded in the worker initially to fetch a job, and again in process_job to send to the queue. Due to the dependency injection in Codeigniter, that library would refer to the same object, and inside process_job it would overwrite the worker's current connection and cause a TCP disconnect.