ruby web-scraping http-post watir

Scraping a POST-request AJAX web page with watir-webdriver (or any other way in Ruby)?


I'm trying to get arrest data from the police blotter of the Palm Beach County Sheriff's Office.

I've limited my search to the city of West Palm Beach, going back as far as the data goes (Oct. 31, 1974).

I'm using Firefox.

When I get the results, I open Firebug, check the HTML tab, and I can see the info I want from the page (i.e., the arrested person's name, arrest address, charges, etc.).

I checked the Net>>XHR>>Post tab to find the POST request parameters, but putting that into my code does nothing. It probably doesn't help that I'm a complete newbie to watir-webdriver.

Here's my code:

require 'watir-webdriver'
require 'net/http'
require 'uri'

b = Watir::Browser.new
b.goto 'http://www.pbso.org/index.cfm?fa=blotter'
b.text_field(:name => 'start_date').set '01/01/1900'
b.text_field(:name => 'city_name').set 'West Palm Beach'
b.button(:name => 'process').click

Does anyone know if it's possible to get the response page HTML (i.e., the HTML that contains the name, address, crime, etc.)?


Solution

  • That one doesn't look so bad. I would use Mechanize instead:

    require 'mechanize'
    agent = Mechanize.new
    form = agent.get('http://www.pbso.org/index.cfm?fa=blotter').forms[0]
    form['captcha_id'] = -1
    
    # page 1 of results
    page = form.submit
    
    # page 2 of results
    form['fromrec'] = form['fromrec'].to_i + 5
    page = form.submit
    

    The problem with watir-webdriver and AJAX-updated results is the errors you get when a DOM element that was there one moment is suddenly gone.
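
    The paging step in the Mechanize snippet boils down to resubmitting the same form fields with a larger `fromrec` offset. Here is a rough sketch of that idea using only the standard library, with no network call. The helper name `blotter_post_body`, the starting offset of 1, and the step of 5 are assumptions for illustration; the real form likely carries more fields than the two shown here.

```ruby
require 'uri'

# Hypothetical helper: build the URL-encoded POST body for one page of results.
# 'captcha_id' and 'fromrec' come from the Mechanize snippet above;
# the default offset of 1 and the step of 5 are assumptions.
def blotter_post_body(fields, page, step = 5)
  start = fields.fetch('fromrec', 1).to_i + (page - 1) * step
  URI.encode_www_form(fields.merge('fromrec' => start))
end

fields = { 'captcha_id' => -1, 'fromrec' => 1 }

# Bodies for the first three pages of results.
bodies = (1..3).map { |page| blotter_post_body(fields, page) }
puts bodies
```

    Each string could then be sent as the body of a POST to the blotter URL, which is roughly what `form.submit` does for you in Mechanize.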