perl web-scraping

How to check whether an HTML element is an end node?


I am building an HTML parser in Perl. I would like to know whether an HTML element has no siblings.

Here is the HTML I would like to parse:

<span class="bold1">A:</span> ELementA<br />
<span class="bold1">B:</span> <a href="mailto:admin" class="bold1">mailto:admin</a><br />
<span class="bold1">C </span> 01/12<br />
<span class="bold1">D:</span> ELementC<br />
<span class="bold1">E:</span> ElementD<br />
<span class="bold1">F:</span> ElementE<br />

How can I check whether an element is the end element?

I am getting this error:

Can't call method "as_text" without a package or object reference at 

Any idea what could be wrong?

Here is the code snippet in Perl:

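use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::TreeBuilder;
use WWW::Mechanize;

# $url must be declared and set earlier in the script (not shown).
my $mech = WWW::Mechanize->new( autocheck => 1 );

# With autocheck on, get() dies on failure, so trap it with eval.
eval
{
    $mech->get($url);
};
if ($@)
{
    print "Error connecting to URL $url \n";
    exit(0);
}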

my $root = HTML::TreeBuilder->new_from_content(decode_utf8($mech->content));

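# Collect every <span> whose class attribute contains "bold1".
my @PageSections = $root->look_down(
    sub {
        my $class = $_[0]->attr('class');
        return ( $_[0]->tag() eq 'span'
              and defined $class
              and $class =~ m/bold1/i );
    }
);

my $temp2;

for my $ps (@PageSections)
{
    # Presumably the call that raises the error when uncommented:
    #  my $temp1= $ps->right()->as_text;
    $temp2 = $ps->as_text;

    # Inspect what right() returns for each span.
    my $temp3 = ref $ps->right();
    print defined $temp3 ? "defined \n" : "not defined\n";
}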

Thanks


Solution

  • It's hard to tell without seeing more of your code, but I'm guessing @PageSections contains objects of some home-brewed module, and that something happens there to make $_ point to something completely different. I'd go with

    for my $ps (@PageSections)
    {
        my $temp1= $ps->right()->as_text;
        my $temp2= $ps->as_text;
        print "$temp2  " . $temp1 . " \n";
    }
    

    instead.
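
Another possible cause, going by the HTML::Element documentation: text nodes in the tree are plain Perl strings, not objects. When the right sibling of a span is text (as with " ELementA"), right() returns a string, and calling as_text on a string fails with this kind of error. Below is a minimal sketch that guards the call with ref, assuming each value ends at the next <br>. The <div> wrapper, the inline sample HTML, and the helper name value_after are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup modelled on the question; the <div> wrapper is an assumption.
my $html = <<'HTML';
<div>
<span class="bold1">A:</span> ELementA<br />
<span class="bold1">B:</span> <a href="mailto:admin" class="bold1">mailto:admin</a><br />
<span class="bold1">C </span> 01/12<br />
</div>
HTML

my $root = HTML::TreeBuilder->new_from_content($html);

# Hypothetical helper: join everything to the right of $node up to the next <br>.
sub value_after {
    my ($node) = @_;
    my $text = '';
    for my $sib ( $node->right ) {    # list context: all right siblings
        last if ref $sib and $sib->tag eq 'br';
        # Elements are objects; text nodes are plain strings.
        $text .= ref $sib ? $sib->as_text : $sib;
    }
    $text =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    return $text;
}

my %value_for;
for my $span ( $root->look_down( _tag => 'span', class => qr/bold1/i ) ) {
    ( my $label = $span->as_text ) =~ s/^\s+|\s+$//g;
    $value_for{$label} = value_after($span);
    print "$label $value_for{$label}\n";
}

$root->delete;    # free the tree when done
```

In list context right() returns an empty list for an element with no right siblings, so value_after simply returns an empty string in that case. In scalar context right() returns undef instead, which is one way to test for an "end" element.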