ruby god

God monitoring: Delay start after process exit


I am monitoring a Ruby program with god. When the program exits, I want god to wait 10 seconds before starting it again. Setting grace does not do this: after the process exits, it is restarted immediately, and god then waits for the 10-second grace period before it looks at the process again. If the process is killed during that grace period, god does not pick it up again and the process is never restarted.

I would like god to wait 10 seconds after an exit before it runs the start command. How would I do that?

I tried a transition on :process_exits in the watch, but I could not find the right place to set the wait time.

EDIT: After looking through the god sources, I suspect that a possible solution is to add a custom behavior that waits in its before_start method. Does that sound reasonable? (See below.)


More details:

When I use the grace feature in a watch, I get this behaviour:

 INFO: Loading simple.god
 INFO: Syslog enabled.
 INFO: Using pid file directory: /Users/fsc/.god/pids
 INFO: Started on drbunix:///tmp/god.17165.sock
 INFO: simple_god move 'unmonitored' to 'init'
DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007fe134dee140> in 0 seconds
 INFO: simple_god moved 'unmonitored' to 'init'
 INFO: simple_god [trigger] process is not running (ProcessRunning)
DEBUG: simple_god ProcessRunning [false] {true=>:up, false=>:start}
 INFO: simple_god move 'init' to 'start'
 INFO: simple_god start: ruby .../simple.rb
DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007fe134dedb00> in 0 seconds
 INFO: simple_god moved 'init' to 'start'
 INFO: simple_god [trigger] process is running (ProcessRunning)
DEBUG: simple_god ProcessRunning [true] {true=>:up}
 INFO: simple_god move 'start' to 'up'
 INFO: simple_god registered 'proc_exit' event for pid 42498
 INFO: simple_god moved 'start' to 'up'

Here I kill the process.

 INFO: simple_god [trigger] process 42498 exited (ProcessExits)
DEBUG: simple_god ProcessExits [true] {true=>:start}
 INFO: simple_god move 'up' to 'start'
 INFO: simple_god deregistered 'proc_exit' event for pid 42498
 INFO: simple_god start: ruby .../simple.rb

Here the grace period kicks in. At this point the process has already been started, but the god watch waits for the grace period to elapse before it looks at the process.

The next log line occurs 10 seconds (the grace) after the last log line from above:

DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007fe134dedb00> in 0 seconds
 INFO: simple_god moved 'up' to 'start'
 INFO: simple_god [trigger] process is running (ProcessRunning)
DEBUG: simple_god ProcessRunning [true] {true=>:up}
 INFO: simple_god move 'start' to 'up'
 INFO: simple_god registered 'proc_exit' event for pid 42501
 INFO: simple_god moved 'start' to 'up'
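
For orientation, a watch along the following lines would be consistent with the transitions in the log. It is a reconstruction from the condition names the log prints (ProcessRunning, ProcessExits), not the actual simple.god; the interval value in particular is an assumption, and the start path stays elided as it is in the log.

```ruby
# Sketch of a watch consistent with the log above (assumed, not the original simple.god).
God.watch do |w|
  w.name     = "simple_god"
  w.start    = "ruby .../simple.rb"   # path elided, as in the log
  w.interval = 5.seconds              # assumed polling interval
  w.grace    = 10.seconds             # the 10-second grace period

  # init: decide whether the process is already running
  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # start: move to up once the process is seen running
  w.transition(:start, :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # up: go back to start as soon as the process exits
  w.transition(:up, :start) do |on|
    on.condition(:process_exits)
  end
end
```

With a setup like this, the start command runs as soon as the up-to-start transition fires, and the grace period only delays the next check.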

EDIT:

The custom behavior:

module God
  module Behaviors

    class WaitBehavior < Behavior
      attr_accessor :delay

      def initialize
        super
        self.delay = 10
      end

      def valid?
        valid = true
        valid
      end

      def before_start
        if delay>0 then
          sleep delay
        end
      end

      def test
        true
      end
    end
  end
end

Using the behavior in the .god config:

w.behavior(:wait_behavior)

Solution

  • I think that should work, and the WaitBehavior class can be shorter:

    module God
      module Behaviors
        class WaitBehavior < Behavior
          attr_accessor :delay
    
          def before_start
            sleep delay.to_i if delay.to_i > 0
          end
        end
      end
    end
    

    In the .god config:

    # .god
    w.behavior(:wait_behavior) do |b|
      b.delay = 10
    end
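
    The `delay.to_i` guard can be checked without loading god. The sketch below is only an illustration: `StubBehavior` is a stand-in for God::Behavior and `effective_delay` is a hypothetical helper that is not part of the class above. It shows which values lead to a sleep and which do not.

    ```ruby
    # Stand-in for God::Behavior so the sketch runs without the god gem.
    class StubBehavior; end

    class WaitBehavior < StubBehavior
      attr_accessor :delay

      # Hypothetical helper: the whole seconds before_start would sleep.
      def effective_delay
        seconds = delay.to_i          # nil => 0, "10" => 10, 2.9 => 2
        seconds > 0 ? seconds : 0     # negative values mean no wait
      end

      def before_start
        sleep effective_delay if effective_delay > 0
      end
    end

    behavior = WaitBehavior.new
    p behavior.effective_delay        # delay never configured
    behavior.delay = 2.9
    p behavior.effective_delay        # to_i truncates fractional seconds
    ```

    An unset delay therefore means no wait at all, and a fractional delay is cut down to whole seconds; `to_f` would be the choice if sub-second waits matter.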
    

    Another way:

    Similar to WaitBehavior, we can define a StateFileBehavior that touches a file in its after_stop hook.

    require 'fileutils'
    
    module God
      module Behaviors
        class StateFileBehavior < Behavior
          attr_accessor :file
    
          def after_stop
            FileUtils.touch file
          end
        end
      end
    end
    

    And in the .god config:

    # .god
    stop_timestamp_file = '/path/to/file'
    
    w.behavior(:state_file_behavior) do |b|
      b.file = stop_timestamp_file
    end
    
    w.start_if do |on|
      on.condition(:file_mtime) do |c|
        c.interval = 2
        c.path = stop_timestamp_file
        c.max_age = 10
      end
    end
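
    The idea behind the timestamp file can be tried in isolation. The sketch below assumes that the file_mtime condition fires once the file's modification time is more than max_age seconds in the past; `stale?` is a hypothetical helper written for this illustration, not god's own code.

    ```ruby
    require 'fileutils'
    require 'tmpdir'

    # Hypothetical helper: true once the file's mtime is more than
    # max_age seconds before `now`.
    def stale?(path, max_age, now = Time.now)
      (now - File.mtime(path)) > max_age
    end

    Dir.mktmpdir do |dir|
      stamp = File.join(dir, 'stopped_at')
      FileUtils.touch(stamp)                  # what after_stop does
      puts stale?(stamp, 10)                  # just stopped: still inside the delay
      puts stale?(stamp, 10, Time.now + 11)   # the same check 11 seconds later
    end
    ```

    Under that assumption, the start is held back until the timestamp written at stop time is at least 10 seconds old, plus up to one polling interval.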
    

    Note: the second approach may not work correctly together with w.keepalive.