ruby-on-rails, web-scraping, ssl

How do you scrape from a website which requires credentials (SSL)?


I was wondering if anyone can point me in the right direction. I want to scrape the HTML/text content of an SSL-enabled website (https in the URL). The site has multiple pages spread across its directory tree.

My question is:

How do I go about providing credentials for the external website from within my Rails application?

Thanks!


Solution

  • require 'httpclient'
    require 'nokogiri'
    
    client = HTTPClient.new
    
    # Register the credentials against the https URL of the site
    # (set_auth scopes the credentials to this domain)
    client.set_auth("https://domain.com", "username", "password")
    
    # Fetch a page from the same domain and parse it with Nokogiri
    doc = Nokogiri::HTML(client.get_content("https://domain.com"))
    

    Hey guys, sorry about the late response; I've been swamped with a few things. The code above worked for me (after many tangos with Mechanize and some of the other Nokogiri-based gems). Some of the other gems, such as open-uri and Mechanize, were raising errors like "MD5 Unknown hashing algorithm". Thanks for your time and help.
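
    If pulling in the httpclient gem isn't an option, Ruby's standard library can do the same job. Here is a minimal sketch using Net::HTTP, assuming the site protects its pages with HTTP Basic auth over SSL; the URL, username, and password are placeholders to substitute with your own:

    ```ruby
    require 'net/http'
    require 'uri'

    # Build a GET request for an https URL with HTTP Basic credentials
    # attached (placeholder URL/credentials -- substitute your own).
    def build_authed_request(url, user, password)
      uri = URI(url)
      req = Net::HTTP::Get.new(uri)
      req.basic_auth(user, password) # adds the Authorization header
      req
    end

    # Send the request over SSL and return the response body
    # (HTML, ready to hand to Nokogiri::HTML).
    def fetch_protected(url, user, password)
      uri = URI(url)
      req = build_authed_request(url, user, password)
      Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
        http.request(req)
      end.body
    end
    ```

    The `use_ssl: true` flag is what makes Net::HTTP negotiate TLS before sending the request, so the credentials never travel in the clear.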