perl mojo

Finding the contents under div with specific id patterns using Mojo::DOM


I need to parse some HTML. The tag IDs follow these patterns:

<tr id="date">.....</tr>
<tr id="band01"><td>field1</td><td>field2</td></tr>
<tr id="band02">...contents...</tr>
.....
<tr id="(others)">.....

I'm using Perl's Mojo::DOM parser and want to extract every id that starts with "band" followed by a number, along with the contents of each matching element.

How could I achieve this?


Solution

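  • The E[foo^="bar"] selector matches any E element whose "foo" attribute value begins with "bar". It does not check what follows the prefix, so an id such as "bandage" would match as well; filter the results if the numeric suffix matters. To select the rows, you can use: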

    my $dom = Mojo::DOM->new($html);
    my $rows = $dom->find('tr[id^="band"]');
    

    $rows is a Mojo::Collection of Mojo::DOM objects, one per matching element, each giving access to that element's contents. For example, to get the list of matched IDs:

    my @ids = $rows->map(attr => 'id')->each;
    

    Or with more standard Perl:

    my @ids = map { $_->{id} } @$rows;
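
The question also asks for a number after "band" and for each row's contents, which the prefix selector alone does not cover. The following is a minimal sketch of one way to handle both, not the only approach: it narrows the prefix match with a regular expression and collects the text of each td cell. The sample HTML, the "bandage" row, and the %cells_by_id hash are made up for illustration, and the sketch assumes the numeric suffix consists only of digits.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Sample markup modelled on the question. The "bandage" row is an invented
# edge case: it starts with "band" but has no numeric suffix.
my $html = <<'HTML';
<table>
<tr id="date"><td>2020-01-01</td></tr>
<tr id="band01"><td>field1</td><td>field2</td></tr>
<tr id="band02"><td>field3</td></tr>
<tr id="bandage"><td>skip me</td></tr>
</table>
HTML

my $dom = Mojo::DOM->new($html);

# Maps each matching id to an array ref holding the text of its <td> cells.
my %cells_by_id;

$dom->find('tr[id^="band"]')                      # ids beginning with "band"
    ->grep(sub { $_->{id} =~ /\Aband\d+\z/ })     # keep only "band" + digits
    ->each(sub {
        my $row = shift;
        $cells_by_id{ $row->{id} } = $row->find('td')->map('text')->to_array;
    });

for my $id (sort keys %cells_by_id) {
    print "$id: @{ $cells_by_id{$id} }\n";
}
```

If the inner HTML of a row is wanted instead of the cell text, `$row->content` could be stored in place of the td lookup.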