service, parallel-processing, upstart, gearman, gnu-parallel

Fork processes indefinitely using gnu-parallel, catching individual exit errors and respawning


I guess the title gives you this thought: another duplicate question.

Well, let me explain this in detail.

Okay, here we go.

I am using gearman to handle a stack of tasks. I have a gearman client which sends these tasks to workers. To run the tasks concurrently, there must be multiple workers handling tasks at the same time. Presently, I create as many workers as there are CPUs. In my case, that's 4. So, 4 processes:

./worker & ./worker & ./worker & ./worker
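
Just to illustrate the client/worker model (this is not my actual setup - the function name and files are made up, it assumes the gearman command-line tool is installed, and my real workers are the ./worker binary above):

# illustration only, not my real ./worker setup
gearman -w -f wordcount -- wc -l &      # worker: serves a "wordcount" function
gearman -f wordcount < some-file.txt    # client: submits one job and prints the result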

It is the same binary running concurrently, but I don't have the workers' respective PIDs or their exit codes. I want them to run forever. Also, these processes do not output anything on the console because they communicate in client-worker style. And the biggest problem is having to keep the terminal open. Remember, I want these processes running forever.
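
To make clear what is missing, here is a plain-shell baseline (not a solution - it still ties up a terminal and never respawns anything; it only captures each worker's PID and exit status):

pids=""
for i in $(seq 1 "$(nproc)"); do
    ./worker &                 # start one worker per CPU core
    pids="$pids $!"            # remember its PID
done
for pid in $pids; do
    wait "$pid"                # blocks until that worker exits
    echo "worker $pid exited with status $?"
done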

Now, to solve this problem, I decided to create an Upstart service which runs these processes in the background. But I want to make sure that all my workers are running. Then I came across gnu-parallel, which seems to be the perfect tool, but I can't find the right command and I don't have time to explore it all.

So, I want to do the following.

This is my upstart service:

# workon

description "worker load"

start on runlevel [2345]
stop on runlevel [!2345]

respawn

script
  cpu="$(nproc)"

  # build "./worker & ./worker & ... & ./worker", one worker per CPU
  line="./worker"
  for i in $(seq 2 "${cpu}"); do
      line="${line} & ./worker"
  done

  # note: $$ is expanded by this outer script shell, so test.log records its PID
  sh -c "echo $$ > test.log; ${line}"
end script

I need a parallel implementation in the above code.

The flaw in the above code is that the service respawns all 4 worker processes if the last worker gets killed. For example:

Name   | PID
-------+-----
worker | 1011
worker | 1012
worker | 1013
worker | 1014

If PID 1014 gets killed, the service respawns 4 more workers on top of the 3 old ones, which comes to 7 in total.

How do I use gnu-parallel to keep all 4 workers alive in a background service?

Thanks in advance.


Solution

  • GNU Parallel has --joblog, which may be helpful here:

    seq 1000000000000 | parallel -N0 --joblog out.log worker
    

    This will start one worker per CPU core. When a worker crashes, the exit code will be logged; the PID, however, will not. (A sketch of wiring this into the upstart job from the question, and of inspecting the joblog, follows after this answer.)

    The worker will not be restarted, but a new worker will be started so there will always be one per CPU core running. When 1000000000000 workers have crashed, then GNU Parallel will not start another. Increase 1000000000000 if you think it is too small (it is 1 for each second in 31700 years - it will be enough for most humans, but if you are Vulcan, things may be different).

    If you really need the PID, you can probably do something like:

    seq 1000000000000 | parallel -N0 --joblog out.log 'echo $$; exec worker' >pids
    

    If you only need the PID of GNU Parallel:

    seq 1000000000000 | parallel -N0 --joblog out.log worker &
    echo $!
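
    To tie this back to the upstart job from the question, the script stanza could probably be reduced to a single GNU Parallel invocation in place of the hand-rolled & loop. This is a rough, untested sketch; the working directory is an assumption:

    script
        cd /path/to/workers   # assumption: wherever ./worker lives
        # one job slot per CPU core; whenever a worker exits, the next job starts
        seq 1000000000000 | parallel -N0 --joblog out.log ./worker
    end script

    With respawn still set in the job, upstart will also restart GNU Parallel itself if it ever dies.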
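
    And as a quick way to check the exit codes that --joblog records, something like this should work (the column order - Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command - is GNU Parallel's documented joblog format, with Exitval in column 7):

    # print every logged job whose exit code was non-zero
    awk 'NR > 1 && $7 != 0 { print "job", $1, "exited with code", $7 }' out.log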