chef-infra chef-recipe

Is a ruby_block executed repeatedly when retries is above 0 and ignore_failure is true?


I have a Chef recipe with a ruby_block. The block should be retried until a socket connection can be established (retries 10). If no connection can be established, the ruby_block should not fail the Chef run (ignore_failure true).

Example:

ruby_block 'wait for service' do
  block do
    require 'socket'
    require 'timeout'
    Timeout.timeout(2) do
      s = TCPSocket.new('127.0.0.1', 8080)
      s.close
    end
  end
  retries 10
  retry_delay 5
  ignore_failure true
  action :run
end

The Chef documentation isn't clear about whether the ruby_block is retried when ignore_failure is set to true.

Update

When the recipe is run and no service is listening on port 8080, the Chef run continues after the first attempt with the following message:

ERROR: ruby_block[wait for service] (cookbook::wait_for_service line 1) had an error: Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 8080; ignore_failure is set, continuing

...

Error executing action run on resource 'ruby_block[wait for service]'

Errno::ECONNREFUSED
-------------------
Connection refused - connect(2) for "127.0.0.1" port 8080

...

Given the ruby_block declaration, I would think that the block is executed 10 times before an ERROR is reported.


Solution

  • I've tested your scenario with Chef version 12.19.36, and indeed, if both ignore_failure and retries are specified, only ignore_failure is applied and retries is ignored.

    The Chef documentation isn't clear about this specific scenario either, so the issue cannot be solved with these two properties alone.

    However, you can work around it by implementing the retries and retry_delay logic manually, as follows:

    ruby_block 'wait for service' do
      block do
        require 'socket'
        require 'timeout'

        retry_delay = 5
        retries = 10

        1.upto(retries) do |n|
          err_msg = nil
          begin
            # Host and port taken from the question; adjust to your service.
            Timeout.timeout(retry_delay) do
              s = TCPSocket.new('127.0.0.1', 8080)
              s.close
            end
          rescue Errno::ECONNREFUSED
            err_msg = 'Host is reachable but no service is listening on the port'
          rescue Errno::EHOSTUNREACH
            err_msg = 'Unable to reach the host'
          rescue Timeout::Error
            err_msg = 'Timeout reached'
          end

          # Leave the retry loop here, outside the Timeout block: a break
          # inside that block would only exit Timeout.timeout, not the loop.
          if err_msg.nil?
            puts 'Service is listening'
            break
          end

          raise "Unable to reach server in #{retries} attempts" if n == retries

          puts "Failed to reach server on attempt [#{n}/#{retries}]. Cause is: [#{err_msg}]. Waiting #{retry_delay} seconds before retrying."
          sleep(retry_delay)
        end
      end
      ignore_failure true
      action :run
    end
    

    You can also improve the code by extracting a common function, execute_with_retry, that takes a lambda as input, so that this logic can easily be reused in your recipes when needed.
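
    A minimal sketch of what such a helper could look like. Only the name execute_with_retry comes from the suggestion above; the signature, the keyword arguments and the messages are assumptions made for illustration. It relies on Ruby's retry keyword rather than an explicit loop, and it would have to live somewhere your recipes can see it (for example a file in the cookbook's libraries directory, which is also an assumption about your layout).

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper: calls `action` until it returns without raising,
# making at most `retries` attempts and sleeping `retry_delay` seconds
# between attempts. Returns whatever `action` returns on success.
def execute_with_retry(action, retries: 10, retry_delay: 5)
  attempt = 0
  begin
    attempt += 1
    action.call
  rescue StandardError => e
    # Timeout::Error and the Errno::* errors are all StandardError subclasses.
    if attempt >= retries
      raise "Unable to complete in #{retries} attempts (last error: #{e.class}: #{e.message})"
    end

    puts "Attempt [#{attempt}/#{retries}] failed: #{e.class}: #{e.message}. " \
         "Waiting #{retry_delay} seconds before retrying."
    sleep(retry_delay)
    retry
  end
end

# The connection check from the question, wrapped in a lambda.
check_service = lambda do
  Timeout.timeout(2) { TCPSocket.new('127.0.0.1', 8080).close }
end

# Inside the ruby_block's `block do ... end` you would then call:
#   execute_with_retry(check_service, retries: 10, retry_delay: 5)
```

    With ignore_failure true still set on the ruby_block, the exception raised after the last attempt is logged and the Chef run continues, as in the manual version.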