perl html-treebuilder

Why does the look_down method in HTML::Element fail to find <section> elements?


The code below shows that look_down, called on an HTML::TreeBuilder tree, finds the <div> element but not the <section> element. Why?

use strict;
use warnings;
use HTML::TreeBuilder;

my $html =<<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

$tree->delete();

Output:

number of div elements found = 1
number of section elements found = 0


Solution

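  • By default HTML::TreeBuilder drops tags it does not recognize (its
    ignore_unknown attribute is true), and it apparently does not know the
    HTML5 <section> tag, so that element never makes it into the tree.
    new_from_content parses immediately, so the option has to be set on an
    object created with new, before parsing. This worked for me: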

    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0);  # <-- Include unknown elements in tree
    $tree->parse($html);
    $tree->eof;                # signal end of input so parsing is finalized
    my @divs = $tree->look_down('attrname', 'div');
    my @sections = $tree->look_down('attrname', 'section');
    print "number of div elements found = ", scalar(@divs), "\n";
    print "number of section elements found = ", scalar(@sections), "\n";
    $tree->delete;

    Output:

    number of div elements found = 1
    number of section elements found = 1
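
To see the effect of the setting side by side, here is a minimal sketch that parses the same document once with ignore_unknown left on and once with it turned off. The helper name count_attr_matches is made up for this illustration and is not part of HTML::TreeBuilder. Whether the first pass drops <section> may depend on which tags the installed HTML::Tagset version knows, so treat the default-case count as something to check locally rather than a guarantee.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): parse $html with the
# given ignore_unknown setting and count elements whose attrname is $value.
sub count_attr_matches {
    my ($html, $ignore_unknown, $value) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown($ignore_unknown);  # must be set before parsing
    $tree->parse($html);
    $tree->eof;                              # signal end of input

    # In list context look_down returns every matching element.
    my @found = $tree->look_down('attrname', $value);
    my $count = scalar @found;

    $tree->delete;                           # release the tree
    return $count;
}

my $html = <<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

# First pass uses the default (1 = ignore unknown tags), second pass keeps them.
for my $ignore (1, 0) {
    printf "ignore_unknown=%d: div=%d, section=%d\n",
        $ignore,
        count_attr_matches($html, $ignore, 'div'),
        count_attr_matches($html, $ignore, 'section');
}
```

If the ignore_unknown=1 line reports section=0 while the ignore_unknown=0 line reports section=1, the missing element is being discarded at parse time and look_down itself is not at fault.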