I am trying to parse an HTML table using Hpricot, but I am stuck: I cannot select the table element with a specific id from the page.
Here is my Ruby code:
require 'rubygems'
require 'mechanize'
require 'hpricot'
agent = WWW::Mechanize.new
page = agent.get('http://www.indiapost.gov.in/pin/pinsearch.aspx')
form = page.forms.find {|f| f.name == 'form1'}
form.fields.find {|f| f.name == 'ddl_state'}.options[1].select
page = agent.submit(form, form.buttons[2])
doc = Hpricot(page.body)
puts doc.to_html # Here the doc contains the full HTML page
puts doc.search("//table[@id='gvw_offices']").first # This is NIL
Can anyone help me identify what's wrong with this?
Mechanize uses Hpricot internally (it's Mechanize's default parser). What's more, it forwards search calls made on the page to that parser, so you don't have to build the Hpricot document yourself:
require 'rubygems'
require 'mechanize'
#You don't really need this if you don't use hpricot directly
require 'hpricot'
agent = WWW::Mechanize.new
page = agent.get('http://www.indiapost.gov.in/pin/pinsearch.aspx')
form = page.forms.find {|f| f.name == 'form1'}
form.fields.find {|f| f.name == 'ddl_state'}.options[1].select
page = agent.submit(form, form.buttons[2])
puts page.parser.to_html # page.parser returns the hpricot parser
puts page.at("//table[@id='gvw_offices']") # This passes through to hpricot
Also note that page.search("foo").first is equivalent to page.at("foo").
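To illustrate that last point without needing Mechanize or Hpricot installed, here is a minimal sketch that uses REXML as a stand-in parser. The markup is made up for the example (it is not the real page), and REXML::XPath.match / REXML::XPath.first only play the roles of search / at here; they are not the Hpricot API.

```ruby
require 'rexml/document'

# Hypothetical stand-in markup; the real page is not reproduced here.
html = <<~HTML
  <html><body>
    <table id="other"><tr><td>skip</td></tr></table>
    <table id="gvw_offices"><tr><td>Office A</td></tr></table>
  </body></html>
HTML

doc = REXML::Document.new(html)
xpath = "//table[@id='gvw_offices']"

# Plays the role of search: returns every node matching the expression.
all_matches = REXML::XPath.match(doc, xpath)

# Plays the role of at: returns only the first matching node, or nil.
first_match = REXML::XPath.first(doc, xpath)

# Taking .first of the full result set yields the same node object.
same_node = all_matches.first.equal?(first_match)

# When nothing matches, the single-node lookup returns nil, so check
# for nil before calling methods on the result.
missing = REXML::XPath.first(doc, "//table[@id='no_such_id']")
```

Either form works; the single-node lookup is just shorter when you only expect one element, as with an id.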