I have a Chef recipe with a ruby_block. The ruby_block should be retried until a socket connection can be established (retries 10). If no connection can be established, the ruby_block should not fail the Chef run (ignore_failure).
Example:
ruby_block 'wait for service' do
  block do
    require 'socket'
    require 'timeout'
    Timeout.timeout(2) do
      s = TCPSocket.new('127.0.0.1', 8080)
      s.close
    end
  end
  retries 10
  retry_delay 5
  ignore_failure true
  action :run
end
The Chef documentation isn't clear about whether the ruby_block is executed repeatedly or not when ignore_failure is set to true.
When the recipe runs and no service is listening on port 8080, the Chef run continues after the first attempt with the following message:
ERROR: ruby_block[wait for service] (cookbook::wait_for_service line 1) had an error: Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 8080; ignore_failure is set, continuing
...
Error executing action run on resource 'ruby_block[wait for service]'
Errno::ECONNREFUSED
-------------------
Connection refused - connect(2) for "127.0.0.1" port 8080
...
Given the ruby_block declaration, I would think that the block is executed 10 times before reporting an ERROR.
I've tested your scenario with Chef version 12.19.36, and it turns out that if both ignore_failure and retries are specified, only ignore_failure is applied while retries is ignored.
The Chef documentation isn't clear about this specific scenario either, so it is not possible to solve your issue with those two properties alone.
However, you can work around it by implementing the retries and retry_delay logic manually, as follows:
ruby_block 'wait for service' do
  block do
    require 'socket'
    require 'timeout'

    retry_delay = 5
    retries = 10
    connected = false

    1.upto(retries) do |n|
      err_msg = ''
      begin
        # Give each connection attempt at most retry_delay seconds.
        Timeout.timeout(retry_delay) do
          # Test target used in this answer; substitute the host and port
          # of your own service (127.0.0.1:8080 in the question).
          s = TCPSocket.new('8.8.8.8', 52)
          s.close
        end
        puts 'Service is listening'
        connected = true
      rescue Errno::ECONNREFUSED
        err_msg = 'Connection refused: no service is listening on the port'
      rescue Errno::EHOSTUNREACH
        err_msg = 'Unable to connect to the service'
      rescue Timeout::Error
        err_msg = 'Timeout reached'
      end

      # Leave the retry loop as soon as one attempt succeeds. A break inside
      # the Timeout block would only exit that block, not this loop.
      break if connected
      raise "Unable to reach server in #{retries} attempts" if n == retries

      puts "Failed to reach server on attempt [#{n}/#{retries}]. Cause is: [#{err_msg}]. Waiting #{retry_delay} seconds before retrying."
      sleep(retry_delay)
    end
  end
  ignore_failure true
  action :run
end
You can also improve the code by creating a common function execute_with_retry that takes a lambda as input, so you can easily reuse this logic in your recipes when needed.
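A minimal sketch of such a helper could look like the following. Only the name execute_with_retry comes from the suggestion above; the keyword arguments (retries, retry_delay, timeout, rescue_from, operation) and the default list of rescued exceptions are assumptions made for illustration, not an existing Chef API.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper: the signature is an assumption, not part of Chef.
# Calls `operation` (a lambda) up to `retries` times, waits `retry_delay`
# seconds between attempts, and returns the lambda's result on success.
# Each attempt is limited to `timeout` seconds (0 disables the limit).
def execute_with_retry(retries: 10, retry_delay: 5, timeout: retry_delay,
                       rescue_from: [SystemCallError, SocketError, Timeout::Error],
                       operation:)
  attempt = 0
  begin
    attempt += 1
    Timeout.timeout(timeout) { operation.call }
  rescue *rescue_from => e
    # Give up once the last allowed attempt has failed.
    if attempt >= retries
      raise "Operation failed after #{retries} attempts: #{e.class}: #{e.message}"
    end

    sleep(retry_delay)
    retry
  end
end

# Possible use inside a recipe (shown as a comment because ruby_block is
# only available within a Chef run):
#
#   ruby_block 'wait for service' do
#     block do
#       execute_with_retry(operation: -> { TCPSocket.new('127.0.0.1', 8080).close })
#     end
#     ignore_failure true
#   end
```

One place to keep such a helper is a file in the cookbook's libraries directory, so that it is loaded before the recipes run. Errno errors such as ECONNREFUSED and EHOSTUNREACH are subclasses of SystemCallError, so the default rescue_from list covers the cases handled in the loop above. Exceptions outside that list are not retried and propagate immediately.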