
Perl HTML::TableExtract "out of range" error


I am having difficulty extracting data from an HTML table. Here is what I have:

use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::TableExtract qw(tree);
use WWW::Mechanize;

# $mech is a WWW::Mechanize object that has already fetched the page
# earlier in the script (not shown here)

my $d = 3;
my $c = 4;

my $te = HTML::TableExtract->new( depth => $d, count => $c ); # , decode => 1, gridmap => 1
$te->parse($mech->content);
print "\nDepth = $d, Count = $c \n\n";
my $table = $te->first_table_found;
my $table_tree = $table->tree();
my @rows = $table->rows();
my $rowcount = @rows;
my $colcount = @rows ? scalar @{ $rows[0] } : 0;
print "The row count is   : ".$rowcount,"\n";
print "The column count is: ".$colcount,"\n";
foreach my $row (@rows)
{
   my @read_row = $table->tree->row($row);
   foreach my $read (@read_row)
   {
      print $read, "\n";
   }
}

I get this error message:

"Rows(ARRAY(0x2987ef8)) out of range at test4.pl line 91."

Is there a better way to walk through the table and get the values? The table has no headers I can match on. I looked at HTML::Query, but I couldn't find it (or the Badger::Base module it requires) through PPM, and HTML::Element looks better suited to building tables than to reading them. I'm also using WWW::Mechanize earlier in the script.

Any help with the code above would be appreciated.


Solution

  • You don't really need tree extraction mode for most purposes.

    Please always use strict and use warnings at the top of every Perl program you write, and declare your variables as close as possible to their first point of use.

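    Your call $table->rows() returns a list of array references, one per row. That is also where the error comes from: $table->tree->row($row) expects a numeric row index, but you are passing it one of those array references, which is why the message shows Rows(ARRAY(0x2987ef8)). In the default (non-tree) mode you can read the cell values straight from the array references, like this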

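    use HTML::TableExtract;   # default mode: no qw(tree) needed

    my $te = HTML::TableExtract->new(depth => $d, count => $c); # , decode => 1, gridmap => 1
    $te->parse($mech->content);
    printf "\nDepth = %d, Count = %d\n\n", $d, $c;
    
    my $table = $te->first_table_found
        or die "No table found at depth $d, count $c\n";
    my @rows = $table->rows;
    
    for my $row (@rows) {
      # Each $row is an array reference. Some cells (e.g. empty or spanned
      # ones) may be undef, so map them to '' to avoid warnings in join.
      print join(', ', map { defined $_ ? $_ : '' } @$row), "\n";
    }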
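For reference, here is a self-contained sketch that runs without WWW::Mechanize. A small hard-coded HTML string (hypothetical, standing in for $mech->content) is parsed with depth => 1 and count => 0, which selects the first table nested one level inside another table. The helper name extract_rows is made up for this example. The sketch shows two ways of walking the result: through the array references directly, and by row and column index, which is what the original row($row) call was reaching for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page standing in for $mech->content: an outer
# table (depth 0) containing one nested table (depth 1, count 0).
my $html = <<'HTML';
<html><body>
<table>
  <tr><td>outer
    <table><tr><td>apple</td><td>3</td></tr><tr><td>pear</td><td>5</td></tr></table>
  </td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: return the rows of the table at the given
# depth/count as a list of array refs, with undef cells replaced by ''.
# Returns an empty list if no table matches.
sub extract_rows {
    my ($content, $depth, $count) = @_;
    my $te = HTML::TableExtract->new(depth => $depth, count => $count);
    $te->parse($content);
    my $table = $te->first_table_found or return;
    return map {
        my @cells = map { defined $_ ? $_ : '' } @$_;
        \@cells;
    } $table->rows;
}

my @rows = extract_rows($html, 1, 0);

# Walk the array references directly...
for my $row (@rows) {
    print join(', ', @$row), "\n";
}

# ...or by index, when you need the row and column numbers.
for my $r (0 .. $#rows) {
    for my $c (0 .. $#{ $rows[$r] }) {
        print "[$r][$c] = $rows[$r][$c]\n";
    }
}
```

Running it should print the two rows of the nested table, first as comma-separated lines and then cell by cell with their indices. Returning plain strings from the helper keeps the calling code free of HTML::Element objects, which is usually all you need when the table has no headers to match on.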