My attempts at building the simplest web crawler w/Capybara are failing. What am I doing wrong?

Capybara. Mechanize. Nokogiri. Selenium. Et cetera.

I've tried to build the simplest little Ruby program that does the following:

Opens a web browser
Navigates to a website
Clicks a link

. . . but have had basically no success.

Here's what I've tried:

crawler.rb

require "capybara"
require "capybara/dsl"

class Crawler
  include Capybara::DSL

  def initialize
    visit "http://www.google.com"
  end
end

crawler = Crawler.new

When I run that code, I get an error.

rack-test requires a rack application, but none was given (ArgumentError)

I read somewhere (not in the documentation) that this should fix it:

require "capybara"
require "capybara/dsl"

class Crawler
  include Capybara::DSL

  def initialize
    Capybara.default_driver = :selenium
    visit "http://www.google.com"
  end
end

crawler = Crawler.new

With that error solved, I get another one related to some other dependency.

Unable to find Mozilla geckodriver. Please download the server from https://github.com/mozilla/geckodriver/releases and place it somewhere on your PATH. More info at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver. (Selenium::WebDriver::Error::WebDriverError)

I downloaded the driver, but have no clue how to actually install the thing despite reading and following another set of elliptical directions. I already have the distinct sense that I'm down a path of yak-shaving that won't bear any fruit, because all I want to do is get Ruby to go to a stupid web page and click a stupid link.

I'm not trying to run this code as part of a test. I literally just want Ruby to open a web browser (that I can see) using Capybara (or whatever tool gets the job done, though preferably Capybara) and to do my bidding. But this for whatever reason is EXTREMELY difficult, even though it's apparently been done a billion times.

What am I doing wrong here?

Solution
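The geckodriver error in the question means Selenium could not find the driver executable in any directory listed in PATH. "Installing" a driver is nothing more than placing the downloaded binary in one of those directories (or adding its directory to PATH). As a rough sketch, assuming a Unix-like system, the following stdlib-only check shows whether Ruby can see a driver binary. The helper name find_on_path is made up for this example and is not part of Capybara or Selenium.

```ruby
# Hypothetical helper (not part of Capybara or Selenium): search each
# directory of a PATH-style string for an executable file called `name`.
# Assumes a Unix-like system; on Windows the file would be e.g. chromedriver.exe.
def find_on_path(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil # not found in any PATH directory
end

# Report where (or whether) each driver binary is visible to this process.
%w[chromedriver geckodriver].each do |driver|
  location = find_on_path(driver)
  puts location ? "#{driver}: #{location}" : "#{driver}: not found on PATH"
end
```

If a driver is reported as not found, moving the binary into one of the PATH directories, or adding its directory to PATH, should make the "Unable to find" error go away.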

selenium-webdriver recently released 3.0.0, which defaults to geckodriver when driving Firefox (the browser Capybara's :selenium driver uses by default), but that combination is still missing some functionality. For your use case I would recommend Chrome with chromedriver instead. Download the latest version of chromedriver and put it somewhere in your PATH. Then

require "capybara/dsl"
require "selenium-webdriver"

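# Register a driver that runs Chrome through chromedriver.
Capybara.register_driver :crawler_driver do |app|
  Capybara::Selenium::Driver.new(app, :browser => :chrome)
end
# Use it instead of the default :rack_test driver.
Capybara.default_driver = :crawler_driver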

class Crawler
  include Capybara::DSL

  def initialize
    visit "http://www.google.com"
  end
end

crawler = Crawler.new

should do what you're trying to do. You will run into problems as soon as you create a second Crawler instance, though, because both instances will share the same Capybara session and conflict. If you're not going to create multiple instances, you're fine. If you are, create a new Capybara::Session in each Crawler instance and call all Capybara methods on that session object, rather than including Capybara::DSL in your class, which would look more like this

class Crawler
  def initialize
    @session = Capybara::Session.new(:crawler_driver)
    @session.visit "http://www.google.com"
  end
end
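
The question also asks for clicking a link, which the snippets above stop short of. Here is a minimal sketch of that step in the same session-per-object style. The class name LinkClicker and the link text "About" are made up for illustration. The session it receives is assumed to be a Capybara::Session, whose visit, click_link and current_url methods are the only calls used.

```ruby
# Hypothetical class for illustration: drives one session to a page and
# clicks a link there. `session` is expected to be a Capybara::Session
# (any object responding to visit, click_link and current_url works).
class LinkClicker
  def initialize(session)
    @session = session
  end

  # Open `url`, click the link whose visible text is `link_text`,
  # and return the URL the browser ends up on.
  def run(url, link_text)
    @session.visit(url)
    @session.click_link(link_text)
    @session.current_url
  end
end

# Usage with a real browser (assumes the capybara and selenium-webdriver
# gems, chromedriver on PATH, and the :crawler_driver registered earlier):
#
#   session = Capybara::Session.new(:crawler_driver)
#   puts LinkClicker.new(session).run("http://www.google.com", "About")
```

Passing the session in rather than creating it inside the class keeps each object on its own session, as recommended above, and makes it possible to exercise the class without launching a browser.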