I am monitoring a Ruby program with god. When the Ruby program exits, I want god to wait 10 seconds before it is started again. When I use grace, the process is restarted immediately after it exits, but god waits for the 10-second grace period before it looks at the process again. If the process is killed before the grace period is over, god won't pick it up again and the process is never restarted.
I would like god to wait for 10 seconds until the start command is run after an exit. How would I do that?
I tried using transition on :process_exits in the watch, but I have difficulty finding a way to set the wait time in the right spot.
EDIT: After looking through the sources of god, I suspect that a possible solution is to add a custom behavior which waits in its before_start method. Does that sound reasonable? (See below)
More details:
When I use the grace feature in a watch, I get this behaviour:
INFO: Loading simple.god
INFO: Syslog enabled.
INFO: Using pid file directory: /Users/fsc/.god/pids
INFO: Started on drbunix:///tmp/god.17165.sock
INFO: simple_god move 'unmonitored' to 'init'
DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007fe134dee140> in 0 seconds
INFO: simple_god moved 'unmonitored' to 'init'
INFO: simple_god [trigger] process is not running (ProcessRunning)
DEBUG: simple_god ProcessRunning [false] {true=>:up, false=>:start}
INFO: simple_god move 'init' to 'start'
INFO: simple_god start: ruby .../simple.rb
DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007fe134dedb00> in 0 seconds
INFO: simple_god moved 'init' to 'start'
INFO: simple_god [trigger] process is running (ProcessRunning)
DEBUG: simple_god ProcessRunning [true] {true=>:up}
INFO: simple_god move 'start' to 'up'
INFO: simple_god registered 'proc_exit' event for pid 42498
INFO: simple_god moved 'start' to 'up'
Here I kill the process.
INFO: simple_god [trigger] process 42498 exited (ProcessExits)
DEBUG: simple_god ProcessExits [true] {true=>:start}
INFO: simple_god move 'up' to 'start'
INFO: simple_god deregistered 'proc_exit' event for pid 42498
INFO: simple_god start: ruby .../simple.rb
Here the grace period kicks in. At this point the process is already started. However, the god watch waits for the grace period until it looks at the process.
The next log line occurs 10 seconds (the grace) after the last log line from above:
DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007fe134dedb00> in 0 seconds
INFO: simple_god moved 'up' to 'start'
INFO: simple_god [trigger] process is running (ProcessRunning)
DEBUG: simple_god ProcessRunning [true] {true=>:up}
INFO: simple_god move 'start' to 'up'
INFO: simple_god registered 'proc_exit' event for pid 42501
INFO: simple_god moved 'start' to 'up'
EDIT:
The custom behavior:
module God
  module Behaviors
    class WaitBehavior < Behavior
      attr_accessor :delay

      def initialize
        super
        self.delay = 10 # default wait in seconds
      end

      def valid?
        true
      end

      # Runs before the start command; blocks for `delay` seconds.
      def before_start
        sleep delay if delay > 0
      end

      def test
        true
      end
    end
  end
end
Using the behavior in the .god config:
w.behavior(:wait_behavior)
I think it should work, and the WaitBehavior class could be shorter:
module God
  module Behaviors
    class WaitBehavior < Behavior
      attr_accessor :delay

      def before_start
        sleep delay.to_i if delay.to_i > 0
      end
    end
  end
end
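The shorter version drops the initialize method, so delay is nil until the config sets it. It still behaves sensibly because nil.to_i is 0 in Ruby. Here is a minimal sketch of that guard that runs without god installed; StubBehavior and WaitBehaviorSketch are hypothetical stand-ins for illustration, not part of god:

```ruby
# Hypothetical stand-in for God::Behavior so the sketch runs without the god gem.
class StubBehavior
  def before_start; end
end

class WaitBehaviorSketch < StubBehavior
  attr_accessor :delay

  # Sleeps for the configured delay and returns the number of seconds waited.
  def before_start
    seconds = delay.to_i          # nil.to_i is 0, so an unset delay means no wait
    sleep seconds if seconds > 0
    seconds
  end
end

unset = WaitBehaviorSketch.new
puts unset.before_start           # delay was never set, so nothing is slept

configured = WaitBehaviorSketch.new
configured.delay = 1
puts configured.before_start      # blocks for about one second first
```

Keep in mind that sleep blocks the thread that calls before_start for the whole delay.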
In the .god config:
# .god
w.behavior(:wait_behavior) do |b|
  b.delay = 10
end
Similar to WaitBehavior, we can define a StateFileBehavior that touches a file in after_stop.
require 'fileutils'

module God
  module Behaviors
    class StateFileBehavior < Behavior
      attr_accessor :file

      # Record the stop time by updating the file's mtime.
      def after_stop
        FileUtils.touch file
      end
    end
  end
end
And in the .god config:
# .god
stop_timestamp_file = '/path/to/file'

w.behavior(:state_file_behavior) do |b|
  b.file = stop_timestamp_file
end

w.start_if do |on|
  on.condition(:file_mtime) do |c|
    c.interval = 2
    c.path = stop_timestamp_file
    c.max_age = 10
  end
end
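The delay in this second approach comes from the age of the state file. Here is a rough sketch of the check, assuming the :file_mtime condition fires once the file's mtime is more than max_age seconds in the past; the stale? helper is hypothetical and only mirrors that assumed logic in plain Ruby:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: true once the file at `path` was last modified
# more than `max_age` seconds before `now`.
def stale?(path, max_age, now = Time.now)
  (now - File.mtime(path)) > max_age
end

Dir.mktmpdir do |dir|
  stamp = File.join(dir, 'stopped_at')
  FileUtils.touch(stamp)                    # what after_stop does in the behavior
  puts stale?(stamp, 10)                    # just touched: too fresh, no restart yet
  puts stale?(stamp, 10, Time.now + 11)     # 11 seconds later: old enough to restart
end
```

File.mtime raises Errno::ENOENT for a missing file, so under this assumption the state file would need to exist before the condition is first evaluated.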
Note: the second approach may not work well together with w.keepalive.