ruby-on-rails ruby parsing nokogiri handleerror

Rails/Nokogiri: handling errors when parsing multiple sites


I am developing on Rails 3 and using Nokogiri. My controller parses multiple sites and shows the results in my view. The problem is that when one of these sites is unavailable (a 403 or 503 error, for example), the whole web app crashes because of that one site.

My question: is there a way to check the availability of each page before Nokogiri opens it or, better, to skip over a site that is unavailable?

Thanks

Part of my Controller:

docvariable1 = Nokogiri::HTML(open("http://www.site1.com/"))
@variable1 = {}
docvariable1.xpath('//div[6]/h3/a').each do |link|
  @variable1[link.text.strip] = link['href']
end

docvariable2 = Nokogiri::HTML(open("http://www.site2.com/"))
@variable2 = {}
docvariable2.xpath('//div[6]/h3/a').each do |link|
  @variable2[link.text.strip] = link['href']
end

docvariable3 = Nokogiri::HTML(open("http://www.site3.com/"))
@variable3 = {}
docvariable3.xpath('//div[6]/h3/a').each do |link|
  @variable3[link.text.strip] = link['href']
end

Part of my View

<% if @variable1 %>
<% @variable1.each do |key, value| %>
<li><a href="<%= value %>" target="_blank"><%= key %></a></li>
<% end %>
<% end %>

<% if @variable2 %>
<% @variable2.each do |key, value| %>
<li><a href="<%= value %>" target="_blank"><%= key %></a></li>
<% end %>
<% end %>

<% if @variable3 %>
<% @variable3.each do |key, value| %>
<li><a href="<%= value %>" target="_blank"><%= key %></a></li>
<% end %>
<% end %>

PS: I know the code is far from "perfect" since it goes against the "DRY" principle; I'm still learning ;)


Solution

  • You can wrap each of those in its own begin/rescue block, so the request doesn't fail when one of the sites is unavailable. You can then handle the exception in the rescue clause if necessary.

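    begin
        docvariable1 = Nokogiri::HTML(open("http://www.site1.com/"))
        @variable1 = {}
        docvariable1.xpath('//div[6]/h3/a').each do |link|
            @variable1[link.text.strip] = link['href']
        end
    rescue => e
        # open raises OpenURI::HTTPError on a 403 or 503 response.
        # If open failed, @variable1 is still nil, so the view's
        # "if @variable1" check simply skips this site.
        Rails.logger.warn("site1 is unavailable: #{e.message}")
    end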
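
Since the question mentions the repetition, the same begin/rescue idea could be written once and reused for every site. The following is only a sketch: `collect_links`, the `sites` hash and the `errors` hash are hypothetical names that do not appear in the original code, and the list of rescued exception classes is an assumption about which failures you want to tolerate.

```ruby
require 'open-uri' # defines OpenURI::HTTPError, raised on 4xx/5xx responses
require 'socket'   # SocketError (e.g. host name cannot be resolved)
require 'timeout'  # Timeout::Error

# Hypothetical helper (not part of the original answer): yields each URL to
# the block and stores the block's result under the site's name. If the
# block raises one of the listed network errors, that site gets an empty
# hash and the error message is recorded instead of propagating.
def collect_links(sites)
  links  = {}
  errors = {}
  sites.each do |name, url|
    begin
      links[name] = yield(url)
    rescue OpenURI::HTTPError, SocketError, Timeout::Error, SystemCallError => e
      links[name]  = {}
      errors[name] = e.message
    end
  end
  [links, errors]
end

# Possible use in the controller, with the XPath from the question:
#
#   sites = { :site1 => "http://www.site1.com/",
#             :site2 => "http://www.site2.com/",
#             :site3 => "http://www.site3.com/" }
#   @links, @errors = collect_links(sites) do |url|
#     doc = Nokogiri::HTML(open(url))
#     doc.xpath('//div[6]/h3/a').each_with_object({}) do |link, found|
#       found[link.text.strip] = link['href']
#     end
#   end
```

With this shape the view could loop over `@links` once instead of repeating the same block three times, and `@errors` tells you which sites were skipped and why.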