
God-Resque Process in Limbo


I'm currently using god to start 6 Resque worker processes. Resque shows that they are started and working, and everything runs fine. Occasionally, though, a worker process drops out of view and is no longer listed as a known Resque worker. I'm looking for a way to either restart that process or get resque-web to recognize it again. The odd part is that the process is still running in the background and forking children to work on jobs: I can see the number of queued jobs decrease in resque-web, yet it doesn't show any workers as running. I've looked at the stale.god script, but it doesn't help, because the process keeps pulling jobs after it drops out of resque-web's worker list. Here is my setup:

#resque-production.god

6.times do |num|
  God.watch do |w|
    w.name = "resque-#{num}"
    w.group = "resque"
    w.interval = 30.seconds
    w.env = { 'RAILS_ENV' => 'production' }
    w.dir = File.expand_path(File.dirname(__FILE__))
    w.start = "bundle exec rake environment RAILS_ENV=production resque:workers:start"
    w.start_grace = 10.seconds
    w.log = "/var/www/loadmax/shared/log/resque-worker.log"

    # restart if memory gets too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.above = 200.megabytes
        c.times = 2
      end
    end

    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
      end
    end

    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.interval = 5.seconds
      end

      # failsafe
      on.condition(:tries) do |c|
        c.times = 5
        c.transition = :start
        c.interval = 5.seconds
      end
    end

    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end 
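The transitions above can be read as a small state machine. The sketch below is a simplified, hypothetical model of the declared conditions, not god's own implementation: it omits the tries failsafe, treats c.times = 2 as two consecutive readings above the limit, and assumes a missing process takes precedence over high memory. Note that god evaluates these conditions against the process it started, which in this setup is the outer rake command rather than the resque:work child.

```ruby
# Simplified, hypothetical model of the transitions in resque-production.god.
# running:         whether the pid god is watching is alive
# limit:           stands in for c.above in the memory_usage condition
# memory_readings: most recent memory checks, oldest first
def next_state(state, running:, limit:, memory_readings: [])
  case state
  when :init
    # w.transition(:init, { true => :up, false => :start })
    running ? :up : :start
  when :start, :restart
    # Move to :up once the process is seen running (tries failsafe omitted).
    running ? :up : state
  when :up
    if !running
      :start # process_running with c.running = false
    elsif memory_readings.length >= 2 &&
          memory_readings.last(2).all? { |m| m > limit }
      :restart # c.times = 2, modeled as two readings in a row
    else
      :up
    end
  else
    raise ArgumentError, "unknown state: #{state.inspect}"
  end
end
```

For example, a watch in :up whose last two memory readings both exceed the limit moves to :restart, while a single spike leaves it in :up.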

The next file is used for connecting to a single Redis server and setting queue priorities:

#resque.rake 
require 'resque/tasks'
Dir.glob("#{Rails.root}/app/workers/*.rb") do |rb|
  require rb
end
task "resque:setup" => :environment do
  resque_config = YAML.load_file(Rails.root.join("config","resque.yml"))
  ENV['QUEUE'] = resque_config["priority"].join(",") if ENV['QUEUE'].nil?
end
task "resque:workers:start" => :environment do
  resque_config = YAML.load_file(Rails.root.join("config", "resque.yml"))
  queues = resque_config["priority"].join(",")
  # Run a single resque:work process from a thread and block until it exits.
  # god starts and watches this outer rake process; the actual worker is a
  # separate child process launched by %x[...].
  worker = Thread.new do
    %x[bundle exec rake environment RAILS_ENV=#{Rails.env} resque:work QUEUE=#{queues}]
  end
  worker.join
end
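The two rake tasks above build a comma-separated QUEUE list and then launch resque:work from inside a thread. The sketch below shows the same steps as plain helpers, under stated assumptions: queue_env, resque_work_argv and run_and_wait are hypothetical names, and spawning the child with Process.spawn is a swapped-in alternative to %x[...], not what the original task does. Resque checks queues in the order they are given, which is what makes the priority list work.

```ruby
require "yaml"

# Mirrors resque:setup: keep an existing QUEUE value, otherwise join the
# "priority" list from resque.yml with commas (order = priority).
def queue_env(yaml_text, current = nil)
  return current unless current.nil?
  YAML.safe_load(yaml_text).fetch("priority").join(",")
end

# Hypothetical helper: the resque:work command as an argv array, so it can
# be started without going through a shell.
def resque_work_argv(rails_env, queue_list)
  ["bundle", "exec", "rake", "environment", "RAILS_ENV=#{rails_env}",
   "resque:work", "QUEUE=#{queue_list}"]
end

# Hypothetical alternative to %x[...] in a Thread: spawn the child directly,
# wait for it, and return [pid, exit_status]. The child inherits this
# process's stdout/stderr instead of having its output captured in memory.
def run_and_wait(argv, env = {})
  pid = Process.spawn(env, *argv)
  _, status = Process.wait2(pid)
  [pid, status.exitstatus]
end
```

Inside the rake task this could be called as run_and_wait(resque_work_argv(Rails.env, queue_env(File.read(Rails.root.join("config", "resque.yml")), ENV["QUEUE"]))). Keeping the child's pid makes it possible to log it or write it to a pid file, which %x[...] does not expose.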

I've searched extensively for a solution, and the usual fixes for zombie, stale, and exiting processes don't seem to apply here. I start god with god -c /path/to/god.
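One way to pin down the process in limbo is to compare the workers Resque has registered with the worker pids actually running on the machine. The sketch below assumes the Resque 1.x worker id format hostname:pid:queues (the string returned by Resque::Worker#to_s, e.g. from Resque.workers.map(&:to_s)); the helper name, hostnames and pids are hypothetical, and collecting the running resque:work pids (for example from ps) is left out.

```ruby
# Resque 1.x identifies each registered worker as "hostname:pid:queues".
# Return the pids running on `hostname` that are missing from the
# registered list, i.e. processes resque-web will not show.
def unregistered_pids(worker_ids, running_pids, hostname)
  registered = worker_ids.each_with_object([]) do |id, pids|
    host, pid, _queues = id.split(":", 3)
    pids << pid.to_i if host == hostname
  end
  running_pids - registered
end
```

For instance, with registered ids web1:101 and web1:102 and running pids 101, 102 and 103 on web1, only 103 is reported.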

Let me know if I need to provide anything else or be more clear. Thanks for all the help!


Solution

  • I ended up putting Redis on the same box as the workers, and they have been working properly since.
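The answer doesn't say why moving Redis helped. If the network path between the workers and Redis is suspected, one way to check it is to time a PING from the worker machine. The sketch below uses only Ruby's socket library and Redis's inline command form (PING followed by CRLF, answered with +PONG); the helper name and the host/port are placeholders, and the timeout applies only to establishing the connection, not to waiting for the reply.

```ruby
require "socket"

# Send an inline PING to a Redis-compatible server and return
# [reply_line, elapsed_ms]. connect_timeout covers only the TCP connect.
def redis_ping(host, port, timeout: 1)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    sock.write("PING\r\n")
    reply = sock.gets&.chomp
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    [reply, elapsed_ms]
  end
end
```

Running this periodically from each worker box against the shared Redis host, and comparing the timings with a Redis instance on the same box, would show whether latency or dropped connections differ between the two setups.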