Tags: perl, css-selectors, web-scraping, www-mechanize, www-mechanize-firefox

Selecting only from a specific table on a page with WWW::Mechanize and CSS selectors?


Good day,

I am scraping a number of pages that display the data I need in tables. Each page contains multiple tables with the following markup:

<table class="dTable" cellspacing="1" cellpadding="1" border="0">

The values I want to scrape are in table cells like this:

<td class="dCell" align="right">

Unfortunately, many cells on the page share the same class, and some pages contain extra dCells with additional information. So selecting a cell by its position among all dCells on the page, like this:

my @thing = $mech->selector('td.dCell');

my $val = $thing[14]->text();

gives different results on different pages, i.e. index 14 does not always point at the value I want to scrape.

So, as a partial solution, I think it would be best to search only within the specific table:

my @table = $mech->selector('table.dTable');

my @required = $table[2]->selector('td.dCell');

# The info is in the third dTable on the page (index 2, since Perl arrays are zero-based).

# The third table's layout does not change between pages, so $required[1] always refers to the same cell.

I tried this and it does not work; I get this error:

MozRepl::RemoteObject::Object has no function selector

at the following line:

my @required = $table[2]->selector('td.dCell');

So at this point I'm stuck. Any help is appreciated.


Solution

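  • The objects returned by selector() are MozRepl::RemoteObject proxies for DOM nodes, not WWW::Mechanize::Firefox objects, which is why they have no selector method. Call selector on $mech instead and restrict the search with its node option:

    # options are passed as key/value pairs; ... stands for the dTable node to search within
    my @required = $mech->selector( 'td.dCell', node => ... );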
    

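    But why not use XPath? Note the parentheses: //table[@class="dTable"][3] only matches a dTable that is the third dTable child of its own parent, while (//table[@class="dTable"])[3] selects the third dTable on the whole page.

    my @required = $mech->xpath('(//table[@class="dTable"])[3]//td[@class="dCell"]');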
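Putting both suggestions together, here is a minimal sketch that has not been run against a live browser. It assumes WWW::Mechanize::Firefox with a working MozRepl connection, that selector() and xpath() take their options as a flat key/value list (as described in the module's documentation), and that the data sits in the third dTable. The helper names dcell_xpath and scrape_cells are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: XPath for the dCell cells of the $n-th dTable
# in document order (1-based). The parentheses make [$n] count over
# all matching tables on the page rather than per parent element.
sub dcell_xpath {
    my ($n) = @_;
    return sprintf '(//table[@class="dTable"])[%d]//td[@class="dCell"]', $n;
}

# Hypothetical helper: fetch the cells both ways. WWW::Mechanize::Firefox
# is loaded at run time, so dcell_xpath() is usable without Firefox.
sub scrape_cells {
    my ($url, $n) = @_;
    require WWW::Mechanize::Firefox;
    my $mech = WWW::Mechanize::Firefox->new();
    $mech->get($url);

    # 1) CSS: find the tables, then search inside the $n-th one only.
    #    Perl arrays are zero-based, so the third table is $tables[2].
    my @tables = $mech->selector('table.dTable');
    die "page has fewer than $n dTable tables\n" if @tables < $n;
    my @via_css = $mech->selector('td.dCell', node => $tables[$n - 1]);

    # 2) XPath: one query, no intermediate list of tables.
    my @via_xpath = $mech->xpath(dcell_xpath($n));

    return (\@via_css, \@via_xpath);
}

# Only contact the browser when run directly with a URL argument,
# e.g. perl scrape.pl http://example.com/page
if (!caller() && @ARGV) {
    my ($css, $xpath) = scrape_cells($ARGV[0], 3);
    printf "CSS: %d cells, XPath: %d cells\n", scalar @$css, scalar @$xpath;
}
```

Both queries should return the same cells; the XPath version avoids the intermediate list of tables, while the CSS version keeps the selectors you already have.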