
Problems scraping a website with Elixir


I'm trying to get a simple Hound test working with my app. I've narrowed it down to an error involving Selenium. Here is the code:

In mix.exs:

defmodule Scraper.Mixfile do
  use Mix.Project

  def project do
    [app: :scraper,
     version: "0.0.1",
     elixir: "~> 1.0",
     build_embedded: Mix.env() == :prod,
     start_permanent: Mix.env() == :prod,
     deps: deps()]
  end

  # Configuration for the OTP application
  #
  # Type `mix help compile.app` for more information
  def application do
    [applications: [:logger, :httpoison, :hound]]
  end

  # Dependencies can be Hex packages:
  #
  #   {:mydep, "~> 0.3.0"}
  #
  # Or git/path repositories:
  #
  #   {:mydep, git: "https://github.com/elixir-lang/mydep.git", tag: "0.1.0"}
  #
  # Type `mix help deps` for more examples and options
  defp deps do
    [
      {:httpoison, "~> 0.7"},
      {:floki, "~> 0.7"},
      {:hound, "~> 0.7"}
    ]
  end
end

In lib/scraper.ex:

defmodule Example do
  use Hound.Helpers

  def run do
    Hound.start_session
    IO.inspect "Starting"
    navigate_to "http://akash.im"
    IO.inspect page_title()

    Hound.end_session
  end
end

In config/config.exs:

# This file is responsible for configuring your application
# and its dependencies with the aid of the Mix.Config module.
use Mix.Config

# This configuration is loaded before any dependency and is restricted
# to this project. If another project depends on this project, this
# file won't be loaded nor affect the parent project. For this reason,
# if you want to provide default values for your application for third-
# party users, it should be done in your mix.exs file.

# Sample configuration:
#
#     config :logger, :console,
#       level: :info,
#       format: "$date $time [$level] $metadata$message\n",
#       metadata: [:user_id]

# It is also possible to import configuration files, relative to this
# directory. For example, you can emulate configuration per environment
# by uncommenting the line below and defining dev.exs, test.exs and such.
# Configuration from the imported file will override the ones defined
# here (which is why it is important to import them last).
#
#     import_config "#{Mix.env}.exs"
# Define how long the application will wait between failed attempts (in milliseconds)
config :hound, retry_time: 100000
# Start with selenium driver (default)
config :hound, driver: "selenium"

Starting a WebDriver server:

java -jar selenium-server-standalone-2.45.0.jar

Running the app:

/scraper$ iex -S mix
Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

Interactive Elixir (1.0.5) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Example.run
** (exit) exited in: :gen_server.call(Hound.SessionServer, {:find_or_create_session, #PID<0.148.0>}, 60000)
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
            (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
            (hound) lib/hound/session_server.ex:22: Hound.SessionServer.handle_call/3
            (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
            (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

11:26:13.971 [error] GenServer Hound.SessionServer terminating
Last message: {:find_or_create_session, #PID<0.148.0>}
State: #HashDict<[]>
** (exit) an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
        (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
        (hound) lib/hound/session_server.ex:22: Hound.SessionServer.handle_call/3
        (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
        (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
        (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
     (stdlib) gen_server.erl:212: :gen_server.call/3
    (scraper) lib/scraper.ex:37: Example.run/0
iex(1)> 


Solution

  • The request timed out, as this line of the error shows:

    ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
    

    The stack trace shows where the error is raised:

    (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
    

    If you open the Hound source, line 43 of lib/hound/request_utils.ex is inside this block:

    case type do
      :get ->
        {:ok, resp} = HTTPoison.get(url, headers, @http_options)
      :post ->
        {:ok, resp} = HTTPoison.post(url, body, headers, @http_options)
      :delete ->
        {:ok, resp} = HTTPoison.delete(url, headers, @http_options)
    end
    

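    This code pattern-matches the HTTPoison result against {:ok, resp} and raises a MatchError on anything else. In your case HTTPoison returned {:error, %HTTPoison.Error{reason: :timeout}}, so the match failed and the crash propagated up through Hound.SessionServer.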

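    Please check that the Selenium server is up and reachable when you run the code, and retry. The failing request is made by Hound.SessionServer while it creates the session, before navigate_to runs (the first IO.inspect output never appears), so the request that timed out is the one sent to the Selenium server, not a request to the website itself.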
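To see the failure mode in isolation, here is a small sketch that runs without Hound or HTTPoison. MatchDemo and fake_request/1 are hypothetical names, and the error map only mimics the shape of %HTTPoison.Error{}. strict/1 reproduces the hard {:ok, resp} match used in send_req/4, while tolerant/1 shows the alternative of branching on both result shapes so the caller gets {:error, reason} back instead of a crashed process.

```elixir
defmodule MatchDemo do
  # Hypothetical stand-in for an HTTPoison call. The error map only
  # mimics the shape of %HTTPoison.Error{}; HTTPoison is not used here.
  def fake_request(:ok), do: {:ok, %{status_code: 200}}
  def fake_request(:timeout), do: {:error, %{id: nil, reason: :timeout}}

  # Same style as Hound's send_req/4: assert success with a hard match.
  # Any {:error, _} value raises MatchError instead of being returned.
  def strict(outcome) do
    {:ok, resp} = fake_request(outcome)
    resp
  end

  # Alternative: handle both shapes so the caller decides what to do.
  def tolerant(outcome) do
    case fake_request(outcome) do
      {:ok, resp} -> {:ok, resp}
      {:error, %{reason: reason}} -> {:error, reason}
    end
  end
end

try do
  MatchDemo.strict(:timeout)
rescue
  e in MatchError -> IO.puts("MatchError on: " <> inspect(e.term))
end

IO.inspect(MatchDemo.tolerant(:timeout))
```

This does not change Hound itself; it only illustrates why the GenServer terminated when the session request returned an error tuple.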
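Before calling Hound.start_session, one quick way to check that something is listening on the Selenium port is a plain TCP connect with Erlang's :gen_tcp. This is a sketch under assumptions: SeleniumCheck and check/3 are hypothetical names, and it assumes the standalone server listens on localhost:4444, its default port. A successful connect only proves the port is open; the session request itself can still time out if the server is slow to answer.

```elixir
defmodule SeleniumCheck do
  # Assumption: Selenium standalone listens on localhost:4444 (its default).
  @default_host ~c"localhost"
  @default_port 4444

  # Returns :ok if a TCP connection opens within timeout_ms,
  # otherwise {:error, reason}, e.g. {:error, :econnrefused} or {:error, :timeout}.
  def check(host \\ @default_host, port \\ @default_port, timeout_ms \\ 2_000) do
    case :gen_tcp.connect(host, port, [:binary, active: false], timeout_ms) do
      {:ok, socket} ->
        # Only the connection matters here; close it right away.
        :gen_tcp.close(socket)
        :ok

      {:error, reason} ->
        {:error, reason}
    end
  end
end

case SeleniumCheck.check() do
  :ok -> IO.puts("Selenium port is open")
  {:error, reason} -> IO.puts("Cannot reach Selenium: #{inspect(reason)}")
end
```

If this reports an error, start (or restart) the Selenium server before running Example.run again.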