Suppose I have an HTML tree like this:
div
`- ul
   `- li (*)
   `- li (*)
   `- li (*)
   `- li (*)
      `- ul
         `- li
         `- li
         `- li
How do I select the <li> elements that are marked with (*)? They are direct descendants of the first <ul> element.
Here is how I find the first <ul> element:
my $ul = $div->look_down(_tag => 'ul');
Now I have $ul, but when I do something like:
my @li_elements = $ul->look_down(_tag => 'li');
It also finds <li> elements that are buried deeper in the HTML tree.
How do I find just the <li> elements that are direct descendants of the first <ul> element? There is an unknown number of them, so I can't simply select the first four as in the example.
You can get all the children of an HTML::Element object using the content_list method, so all the child nodes of the first <ul> element in the document would be:
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('my.html');
my @items = $tree->look_down(_tag => 'ul')->content_list;
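Note that content_list returns every child node, and text children come back as plain strings rather than element objects. Here is a minimal sketch that keeps only the <li> element children. It assumes the markup is parsed from an inline string (a hypothetical stand-in for my.html) so that it can be run on its own:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical inline markup standing in for my.html
my $html = <<'HTML';
<div>
  <ul>
    <li>one</li>
    <li>two</li>
    <li>three</li>
    <li>four
      <ul>
        <li>a</li>
        <li>b</li>
        <li>c</li>
      </ul>
    </li>
  </ul>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down is called in scalar context here, so it returns the first match
my $ul = $tree->look_down(_tag => 'ul');

# content_list returns direct children only, never deeper descendants.
# Text children are plain strings, so keep only element refs tagged 'li'.
my @li_elements = grep { ref $_ && $_->tag eq 'li' } $ul->content_list;

print scalar(@li_elements), " direct <li> children\n";

$tree->delete;
```

The grep is there because a <ul> may, depending on the markup and parser settings, have text nodes among its children; calling ->tag on a plain string would fail.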
But it is far more expressive to use HTML::TreeBuilder::XPath, which lets you find all <li> children of <ul> children of <div> elements anywhere in the document, like this:
use HTML::TreeBuilder::XPath;
# Build the tree with the XPath-aware subclass, as its documentation does
my $tree = HTML::TreeBuilder::XPath->new_from_file('my.html');
my @items = $tree->findnodes('//div/ul/li')->get_nodelist;
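If a <div> can contain more than one <ul> child, a positional predicate should narrow the match to the first one. The following is a sketch under that assumption, again using hypothetical inline markup rather than my.html:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup: a div with two sibling lists, the first one nested
my $html = <<'HTML';
<div>
  <ul>
    <li>one</li>
    <li>two
      <ul>
        <li>nested</li>
      </ul>
    </li>
  </ul>
  <ul>
    <li>other list</li>
  </ul>
</div>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# ul[1] picks the first <ul> child of each <div>; the trailing /li step
# matches only its direct <li> children. In list context findnodes
# returns the matching nodes as a plain list.
my @items = $tree->findnodes('//div/ul[1]/li');

print scalar(@items), " matching <li> elements\n";

$tree->delete;
```

The nested <li> is skipped because its parent <ul> is a child of an <li>, not of the <div>, and the second list is skipped by the [1] predicate.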