ruby web-scraping proxy

Change IP address in Ruby


Right now I'm running a scraping program on my computer. It's massive, and unfortunately because of this my IP address has been banned from the site I need to scrape. Is there a way, in Ruby or by some other simple means, to switch my IP address so that I'm allowed back onto the site for scraping, or am I out of luck and have to resort to other solutions? I'm getting a 403 Forbidden error and, for whatever it's worth, I'm using Nokogiri and my user agent is ruby. Thanks.


Solution

  • You can connect through a proxy. If you have a list of proxy addresses, you can tell Ruby to switch to a different proxy every x minutes, which changes the IP address the website sees for you. Here is some code that scrapes Google search results through a single proxy; to use a proxy list, extend the code a bit.

    require 'rubygems'
    require 'mechanize'

    agent = Mechanize.new
    # Send every request through this proxy (example address; use one that works for you)
    agent.set_proxy '78.186.178.153', 8080
    page = agent.get('http://www.google.com/')

    # Fill in the search form and submit it
    google_form = page.form('f')
    google_form.q = 'new york city council'

    page = agent.submit(google_form, google_form.buttons.first)

    # Result links look like /url?q=<target>&...; print the target of each one
    page.links.each do |link|
      href = link.href.to_s
      next unless href =~ /url\?q=/
      puts href.split(/[=&]/)[1]
    end
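
To use a proxy list as described above, one option is a small helper that moves to the next proxy once the current one has been in use for a set time. This is only a sketch: `ProxyRotator`, `fetch_all`, and `PROXIES` are names made up for this example, the addresses are placeholders from a documentation-only range, and the 5-minute interval is an arbitrary choice. The only Mechanize calls it relies on are `set_proxy` and `get`, the same ones used in the code above.

```ruby
# Hypothetical helper: hands out proxies from a fixed list, moving on to the
# next one once the current proxy has been in use for `interval` seconds.
class ProxyRotator
  # clock is injectable so the rotation can be exercised without waiting.
  def initialize(proxies, interval:, clock: -> { Time.now.to_f })
    raise ArgumentError, 'proxy list is empty' if proxies.empty?

    @proxies = proxies
    @interval = interval
    @clock = clock
    @index = 0
    @switched_at = clock.call
  end

  # Returns [host, port] for the proxy that should be used right now.
  def current
    now = @clock.call
    if now - @switched_at >= @interval
      @index = (@index + 1) % @proxies.size # wrap around at the end of the list
      @switched_at = now
    end
    @proxies[@index]
  end
end

# Placeholder addresses (assumption); replace them with proxies you can use.
PROXIES = [
  ['203.0.113.10', 8080],
  ['203.0.113.11', 3128],
  ['203.0.113.12', 8080]
].freeze

# Fetches each URL through whichever proxy the rotator currently selects.
# `agent` is expected to be a Mechanize instance.
def fetch_all(agent, rotator, urls)
  urls.map do |url|
    host, port = rotator.current
    agent.set_proxy(host, port)
    agent.get(url)
  end
end
```

With Mechanize loaded (`require 'mechanize'`), a call might look like `fetch_all(Mechanize.new, ProxyRotator.new(PROXIES, interval: 5 * 60), urls)`, where `urls` is your own list of pages to scrape. Rotating by elapsed time matches the "every x minutes" idea; rotating after a fixed number of requests would work just as well.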